fake ratings

  • 1
  • Question
  • Updated 4 weeks ago
  • Answered
Hi IMDb team,


I recently posted this question/suggestion in a closed topic, which probably explains the lack of reaction/response. The topic was: https://getsatisfaction.com/imdb/topics/fake-reviews-and-ratings. It dealt with an extraordinary accumulation of fake ratings and fake reviews for a specific movie: The Last Boy (tt4157728).

This example, i.e. The Last Boy, is sadly representative of a growing trend on IMDb: an irritating pollution from fake users, a parasitic activity with an obvious lack of relevance and/or honesty: fake ratings, fake reviews, ... Without wanting to appear disrespectful towards the director of this film, it probably does not deserve this avalanche of 9s and 10s. In my humble opinion, of course.

Before being critical, let me be clear on one point: I truly love the IMDb website, as well as the iPhone app. Despite its imperfections, it is clearly a worldwide reference. In order to improve its credibility, I think the IMDb team should harden the concept of a 'regular' user, one of the main objectives being to reduce or even eliminate such activity. The IMDb audience should consist of moviegoers, and moviegoers only, without forgetting the padawans who wish to become one of them. My email is motivated solely by this parasitic activity, which is as annoying as it is growing. Nothing more, nothing less.

So...

Five months ago, the rating distribution of The Last Boy looked like this:

Rating    Raw votes
10        2054
 9        1797
 8          72
 7         112
 6         152
 5         156
 4         130
 3         122
 2          84
 1         145
Thus, the >raw< average was approximately: 8.5 out of 10.


If we postulate that the fake ratings are concentrated on 1, 9 and 10, and that we roughly correct those three extreme values (1, 9 and 10 only), we obtain this new distribution:

Rating    Votes after a rough correction
10          15
 9          35
 8          72
 7         112
 6         152
 5         156
 4         130
 3         122
 2          84
 1          30
The >corrected< average is then approximately: 5.1 out of 10.

Meanwhile, the >weighted< average computed by IMDb was: 7.2 out of 10.
Better than 8.5, obviously.
But insufficiently weighted, undoubtedly.
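For transparency, the raw and corrected averages can be recomputed directly from the two vote tables above (a quick sketch in Python; the counts are simply those quoted in this post):

```python
# Vote counts for The Last Boy (tt4157728), as quoted above.
raw = {10: 2054, 9: 1797, 8: 72, 7: 112, 6: 152,
       5: 156, 4: 130, 3: 122, 2: 84, 1: 145}

# The same table after the rough manual correction of 1, 9 and 10.
corrected = {10: 15, 9: 35, 8: 72, 7: 112, 6: 152,
             5: 156, 4: 130, 3: 122, 2: 84, 1: 30}

def mean_rating(votes):
    """Unweighted (arithmetic) mean of a rating histogram."""
    total = sum(votes.values())
    return sum(rating * count for rating, count in votes.items()) / total

print(round(mean_rating(raw), 1))        # 8.5
print(round(mean_rating(corrected), 1))  # 5.1
```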


My first point is: because of this parasitic activity, the credibility of IMDb is somewhat at stake.
My first question is then: what >new< measures do you intend to put in place in order to reduce this issue?

My second and last point is: since Karlito's mail five months ago, no measurable improvement has occurred for this movie, even though you are fully aware of the issue. I do realize that 1) this kind of parasitic activity is addressed globally, not film by film, and 2) this is not an easy issue.
My second and last question is then: why do you seem to ignore the issue? Your answer to Karlito indeed looks minimalist. Very minimalist...


If I may, I would like to propose a draft. Each user should be rated according to a credibility index, between 0 (absolute mistrust) and 1 (optimal trust). The weight used for the weighted average and the helpfulness used for the reviews should both be derived from this CI (Credibility Index). Thus, if you are a weakly credible user, your ratings, reviews and tutti quanti should be only slightly taken into account, or even literally ignored in the case of a null value.

In order to assess the CI of a specific user, the IMDb team may use different key indicators such as:

1) an optional submission of an electronic copy of an identity card, as I did 6 years ago when registering with Airbnb. If not, the CI should be significantly lower than 0.5, no matter what. As long as the user is not irrevocably identified, (s)he remains a guest with lower privileges and thus a weak impact on IMDb.

2) a mathematical analysis of his/her set of ratings:
     2.1) the set of ratings must be quantitative (i.e. large enough to be statistically meaningful)
     2.2) the distribution curve of the ratings must follow these basic rules:
          2.2.1) it should be a Gaussian-like curve;
          2.2.2) if the user sees everything, or chooses movies randomly, the Gaussian curve should be centered on 5.5. Since this is probably not the case, it will be centered on a higher value such as 6 or even 6.5, but definitely not 8 or 9! Thus, the CI should be indexed on the value abs(average − 5.5);
          2.2.3) the ratings should not run only from 6 to 9, for instance; they must really span 1 up to 10.
         
Two extreme examples:
   *) If the distribution curve looks like a Gaussian centered on 5.5, spread between 1 and 10, and based on a quantitative set, the CI will then be optimal. For instance, the distribution curve of Col Needham (ur1000000) is close to perfection, with an ultra-quantitative set. His CI should be 0.999 or even 1.
   *) On the contrary, if the curve looks like a narrow Dirac delta function, the CI will decrease significantly. For instance, the distribution curve of Ariana Catarina (ur57470894) is weird, even though her set of ratings is quantitative enough. Her CI should be less than 0.1.
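To make the draft concrete, here is one possible shape for such an index (a toy sketch only: the ideal mean, the 200-vote threshold and the multiplicative combination are all invented for this illustration, not part of any IMDb algorithm):

```python
def credibility_index(counts, ideal_mean=6.0, min_votes=200):
    """Toy Credibility Index in [0, 1] built from a user's rating histogram.

    counts maps each rating (1-10) to the number of titles the user gave
    that rating. The three factors mirror criteria 2.1-2.2.3 above;
    ideal_mean, min_votes and the weights are purely illustrative.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mean = sum(r * n for r, n in counts.items()) / total

    # 2.1: enough ratings to be statistically meaningful
    volume = min(1.0, total / min_votes)
    # 2.2.2: mean close to the 'ideal' centre, i.e. abs(average - ideal)
    centering = max(0.0, 1.0 - abs(mean - ideal_mean) / 4.5)
    # 2.2.3: ratings actually spread across the whole 1-10 scale
    spread = sum(1 for r in range(1, 11) if counts.get(r, 0) > 0) / 10

    return volume * centering * spread

# A bell-shaped voter scores high; a '10s only' voter scores very low.
bell = {1: 2, 2: 5, 3: 12, 4: 30, 5: 60, 6: 80, 7: 60, 8: 30, 9: 12, 10: 4}
spiky = {10: 500}
print(round(credibility_index(bell), 2))   # close to 1
print(round(credibility_index(spiky), 2))  # close to 0
```

Any serious version would need tuning and iteration, of course; the point is only that the three criteria combine naturally into a single number.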


I hope this helps.


Best regards,
Stéphane


FrenchEddieFelson

  • 19 Posts
  • 18 Reply Likes

Posted 4 weeks ago


Ed Jones (XLIX)

  • 15396 Posts
  • 17576 Reply Likes
if we postulate that the fake ratings are concentrated on 1, 9 and 10 and that we roughly correct the 3 extreme values (1, 9 and 10 only), we obtain this new distribution:
What? Are you out of your mind? "Roughly correct"!!!
That's a nuclear solution!!!
I'll have the Kool-Aid you're drinking!!!

FrenchEddieFelson

  • 19 Posts
  • 18 Reply Likes
What impetuosity! I think you have not read my email carefully. As a simulation, I manually and approximately corrected the ratings in order to virtually eliminate the fake ones. This is purely a simulation, meant to demonstrate that the 'weighted average' computed by IMDb is not pertinent enough.

bderoes, Champion

  • 1342 Posts
  • 2125 Reply Likes
2 issues:
1) Just to clarify: did you eliminate ~4000 of 4800+ votes for your second distribution of The Last Boy to show how difficult/impossible it is to adjust a heavily skewed distribution? You weren't actually proposing those votes be eliminated? Or, if you were, how would that be determined?

2) I love your idea of a Credibility Index, and weighting each user's vote accordingly. I'd quibble with these choices: that the distribution must be centered at 5.5, and that the ratings must be spread from 1 to 10.

Even Col Needham, who has rated 11k+ titles, has a higher center than that (6.86):

10: 186 | 9: 798 | 8: 2,924 | 7: 4,387 | 6: 1,994 | 5: 825 | 4: 331 | 3: 161 | 2: 156 | 1: 111

My own distribution looks similar in shape but is centered quite differently (mean 6.33):

10: 15 | 9: 112 | 8: 559 | 7: 1,126 | 6: 2,336 | 5: 807 | 4: 91 | 3,2,1: 0
(I'm not thrilled with 6 being so dominant, but I decided long ago what falls there, and decided to remain consistent since I have no way to adjust them accurately.)

But my point is this: given the meanings I've assigned to the ratings 4-10, I'll probably never watch a film that I'll assign 3, 2, or 1. (I've managed to rate 14 titles with average between 3 and 4; none lower. My ratings for those were mostly 5s.) For me, a 4 has come to mean a film so irritating that I barely got through it.

Does seeing my distribution & explanation do anything to alter your recommendations, particularly 2.2.3?

Ed Jones (XLIX)

  • 15396 Posts
  • 17576 Reply Likes
Let's have an election. Let's NOT count the votes from Paris. Too many votes come from there and they are not to be trusted! But to be fair, we will give those Paris people a weighted vote of 0.7%.

You're dropping the 10 votes by 99.3%! That's absurd: 2054 → 15!
(Edited)

FrenchEddieFelson

  • 19 Posts
  • 18 Reply Likes
You should watch your tone and open your mind.

Ed Jones (XLIX)

  • 15396 Posts
  • 17576 Reply Likes
You should not be so draconian in tone!

FrenchEddieFelson

  • 19 Posts
  • 18 Reply Likes
Hello 'bderoes'
Hello everyone else


My wish is simple, and I think many users share it: I use IMDb, among other things, to decide whether I will see this or that film, thanks to the votes of other users. In this regard, IMDb must be credible.

The same goes for questions like: what are the best Argentine movies of the 70s? What are the best Japanese anime of the 90s? And so on... I remember the time before the internet; IMDb is definitely a gold mine! With my previous email, I wanted to highlight a growing and irritating problem and, if I may, propose solutions.


Focus on The Last Boy

As postulates:
     1) As soon as the set of ratings is quantitative enough, the distribution curve has to be a Gaussian. For a movie or for a user, the mathematical concept is strictly identical. Therefore, the 9 and 10 ratings for the movie The Last Boy are massively lacking in relevance and/or honesty, subjectively speaking. It's obvious and irritating.
     2) A specific rating can absolutely not be categorized as 'fake' or irrelevant (everyone sometimes rates with passion while being aware of exaggerating; I'm no exception, with a 10/10 for Black Panther, a 10/10 for The Red Sea Diving Resort, a 10/10 for La cage aux folles, ...), but a user can be categorized as trustworthy or untrustworthy, relevant or not, with a Credibility Index for instance. This is absolutely not easy, but it is mandatory in order to eradicate the 'noise' within a distribution curve.

So, with the cleaned distribution of my previous mail, the objective is to eliminate that 'noise'. Virtually removing 4,000 ratings is a way to estimate the real average, without the noise. It's approximate and rough. May I do it? Yes. Why? 900 ratings (i.e. 4,900 − 4,000) is definitely quantitative enough, and the result is now almost a Gaussian. Yeees!

The >raw< average was approximately: 8.5 out of 10 (with the noise)
The >corrected< average was approximately: 5.1 out of 10 (without the noise)
The >weighted< average computed by IMDb was: 7.2 out of 10

Conclusion: the IMDb team uses an undisclosed algorithm to compute the weighted average. It is kept private, like a black box, and that's an excellent idea. But this example clearly demonstrates the insufficiency of the algorithm: 7.2 is better than 8.5, but far, far away from 5.1.


With the noise: [distribution chart not preserved]

Without the noise: [distribution chart not preserved]
Focus on the Credibility Index

My main point is: the IMDb team must irrevocably identify its audience. Why? An IMDb account can be created in 50 seconds. You could therefore create 100 accounts in about an hour and then rate any movie 100 times with a 1 or a 10. Just like that! As long as IMDb does not irrevocably identify its users, the 'fake rating' issue will remain.

The mathematical analysis of the rating curve may complement the previous point. The IMDb team should evaluate the Credibility Index of each user and assess whether each specific user is relevant (a little, a lot, passionately, ... not at all) for the whole community. A 'beautiful' Gaussian centered on a relevant value (6.3, for instance, is excellent in my humble opinion) and spread from 1 up to 10 is an objective for every old-enough user. An objective, as a postulate, that everyone should share. You don't have to be shy about sometimes rating with a 10 or a 1.


Best regards,
Stéphane

bderoes, Champion

  • 1342 Posts
  • 2125 Reply Likes
Stéphane,
You have re-explained your original post.

I agree that the distribution of ratings for most films should look Normal (Gaussian), EXCEPT when films are incredibly good and skew to being centered at 9 or 10, when it should look like a truncated bell. Without having seen The Last Boy (2019), I have no reason to believe that the errant votes are not those BELOW 8, rather than those above it. Without seeing a list of the films voted on by the Top 1000 voters, I have no reason to trust their 41-vote (not enough) average (4.8), and the distribution is tri-modal.

Unfortunately that page does not break down the Top 1000 voters into their demographics, which might be helpful, especially with so few votes.

Regarding a Credibility Index: I still dislike being penalized for refusing to use 3, 2, 1. I'll never sit through a film that bad, and I refuse to rate a film I didn't finish. It's conceivable that a film I start out hating actually reveals something profound and redeems/explains the footage I thought was horrid. So when I quit watching, I don't rate.

Based on Col's data, on mine, and on the young user you cited, would you like to show us a trio of CI calculations? At one point you said Col would earn a .999 or 1, but his mean is 6.86, and you mentioned indexing the CI by subtracting that from an "ideal" mean (hopefully 6 or 6.5, not 5.5; we don't view films randomly).
Open questions:
  • how to test whether the votes are Normally distributed (is there a test that yields a coefficient describing the fit, which could be a factor in a CI calculation)?
  • what would be the minimum number of votes, and how would that contribute to the CI? Perhaps that is not separate from the previous point, or perhaps a shortfall simply prevents the calculation.
  • perhaps the distance from the ideal mean is already part of a Normal distribution test/coefficient
  • perhaps the absence of 3, 2, 1 scores would also be absorbed by a good Normal-fit coefficient
If you need a standard deviation to complete a computation, I exported my ratings and the mean is 6.32653, stdev is 1.020095 with N=5047.




FrenchEddieFelson

  • 19 Posts
  • 18 Reply Likes
I indeed rephrased my first post to avoid any misunderstanding.

Focus on The Last Boy
With 41 Top-1000 voters, we leave the world of statistics. Definitely! These data are fundamentally not exploitable. Not at all. Furthermore, the distribution curve with all the users is distinctly odd, with a double Dirac delta at the 9 and 10 ratings. Unfortunately, with a behavior based on thoughts such as "I have no reason to believe that the errant votes are not those BELOW 8, rather than those above it", the "fake rating" issue will last for a long, long time.

Focus on the Credibility Index
Assessing the Gaussian-likeness of a distribution curve in an IMDb context is undoubtedly an iterative process. It seems difficult to develop or synthesize such concepts in a post. The idea is there.
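That said, one minimal sketch of such a 'fit' coefficient is possible (improvised, not a standard statistical test; a chi-square goodness-of-fit or a D'Agostino normality test would be the textbook route): fit a normal curve using the histogram's own mean and standard deviation, then correlate observed against expected bin frequencies.

```python
import math

def normal_fit_coefficient(counts):
    """Crude 'Gaussian-likeness' score in [0, 1] for a 1-10 rating histogram.

    Fits a normal curve with the histogram's own mean and standard
    deviation, then returns the Pearson correlation between observed
    and expected bin frequencies, clamped at 0. Purely illustrative.
    """
    total = sum(counts.values())
    mean = sum(r * n for r, n in counts.items()) / total
    var = sum(n * (r - mean) ** 2 for r, n in counts.items()) / total
    sd = math.sqrt(var) or 1.0  # guard against a zero-variance histogram

    obs = [counts.get(r, 0) / total for r in range(1, 11)]
    dens = [math.exp(-((r - mean) ** 2) / (2 * sd * sd)) for r in range(1, 11)]
    exp = [d / sum(dens) for d in dens]

    # Pearson correlation between observed and expected shapes
    mo, me = sum(obs) / 10, sum(exp) / 10
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, exp))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in exp))
    if so == 0 or se == 0:
        return 0.0
    return max(0.0, cov / (so * se))

# The Last Boy's all-user histogram (quoted earlier in the thread)
# scores far worse than a bell-shaped one.
last_boy = {10: 2054, 9: 1797, 8: 72, 7: 112, 6: 152,
            5: 156, 4: 130, 3: 122, 2: 84, 1: 145}
bell = {1: 2, 2: 5, 3: 12, 4: 30, 5: 60, 6: 80, 7: 60, 8: 30, 9: 12, 10: 4}
print(normal_fit_coefficient(bell) > normal_fit_coefficient(last_boy))  # True
```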


No official response from IMDb? Moreover, this post has been polluted by Ed Jones with remarks as irreverent as they are insignificant. I'm done.


See you

Ed Jones (XLIX)

  • 15396 Posts
  • 17576 Reply Likes
See ya back!
You get two people that disagree with you and you're outta here!
Please acknowledge the effort of bderoes to debunk your overreactionary stand, and your heavy-handed and biased treatment of a certain segment of fans who love sci-fi: for them, an 8 is OK, a 9 is good, and a 10 is excellent. They are not fake accounts voting; that is their interpretation. To them, a 7 is bad. I fall into this category of voters. On average I rank most sci-fi higher because, as a general rule, it is more imaginative than Rocky or Jaws or Rambo type movies.

Ed Jones (XLIX)

  • 15396 Posts
  • 17576 Reply Likes
Stephane, you have proposed draconian changes based on a non-mathematical assumption that 99.3% of all 10 votes are not to be trusted. Since your formula is based on non-existent data, your computations are no more reliable than those 10 votes you have such disdain for.

Yes, I agree with your theory of vote stuffing by fake accounts. When IMDb develops a way to identify them, you and I will be the last to know; they will continue to keep that a secret. So expecting a reply when you did not get one the first time is flawed thinking. They will neither agree nor disagree that your idea is sound. Also, one may or may not assume that they are already working on this. There is another topic going on right now about the shift in ratings in the IMDb Top 250.

They may be using a variation of your formula as we speak.

But DO NOT think for a nanosecond that they will satisfy this topic with anything more than a canned reply, or a closing of this topic under the above heading of "Answered".