Why You Should Be Suspicious of Online Movie Ratings (fivethirtyeight.com)
An anonymous reader writes: Statistical news blog fivethirtyeight.com noticed some odd discrepancies in online movie ratings, which caused them to do some investigating. They found it was generally a bad idea to rely on such ratings, particularly from sites like Fandango. "When I focused on movies that had 308 or more user reviews, none of the 209 films had below a 3-star rating. Seventy-eight percent had a rating of 4 stars or higher." Further, "In a normal rounding system, a site would round to the nearest half-star — up or down. In the case of Ted 2 [which was displaying 4.5 stars], then, we'd expect the rating to be rounded down to 4 stars. But Fandango rounded the 'ratingValue' [4.1] up. I pulled the number of stars listed on the page of each film in our sample of 437 (with at least one user review), as well as the ratingValue listed on the page's source. And I found that Fandango doesn't round a rating down when we'd mathematically expect that ... Fandango.com's rounding methodology, even if it was just an innocent bug, is a good example of why you should be skeptical of online movie ratings, especially from companies selling you tickets."
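For anyone who wants to see the rounding difference in numbers, here is a minimal sketch. The function names are made up for illustration, and the "always round up to the next half-star" rule is only one guess at the behavior the article describes, since Fandango's actual code isn't public. Only the 4.1 ratingValue and the 4.5 displayed stars come from the summary.

```python
import math

def round_nearest_half(value):
    """Conventional rounding to the nearest half-star (ties go up)."""
    return math.floor(value * 2 + 0.5) / 2

def round_up_half(value):
    """Hypothetical 'always round up' rule: next half-star at or above value."""
    return math.ceil(value * 2) / 2

rating_value = 4.1  # Ted 2's ratingValue as quoted in the summary

print(round_nearest_half(rating_value))  # 4.0, what normal rounding gives
print(round_up_half(rating_value))       # 4.5, what the page displayed
```

Under these assumptions, any ratingValue strictly between two half-star marks gets bumped to the higher one, which matches the Ted 2 example.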
A friend is in the movie biz, and his reaction to any criticism of the recent Star Trek reboots is that Rotten Tomatoes is an objective measure. I can forgive him the logical error because he's in the industry and the financials matter more to him than they do to, say, you or me. To him, aggregated movie reviews that drive customer purchases indicate success.
However, as far as I know, Rotten Tomatoes has never published its weighting formula.
And it's owned by a movie studio.
This seems to me perfect for abuse.
---- The above post was generated by the Turing Institute. Maybe.
Rotten Tomatoes is the gold standard for movie quality measurement. Accept no substitutes.
Seriously, if someone is relying on Fandango to tell them if a movie is any good, they deserve to watch dreck.
Also remember that the minimum you are allowed to give on most sites is 1 out of 5, meaning that even if everyone hated the movie it would still show a "1". In other words, people think of "out of 5 stars" as a scale from zero to five stars, but in reality it runs from one star to five.
Example:
One person hates the movie and gives it "1"
One person loves the movie and gives it "5"
The average is "3", which visually is not the middle (three of five stars are filled in).
Despite only 50% of people liking the movie, it appears as if 60% like the movie.
Also, stars have a positive connotation. To me, 3 "stars" does not suggest that half of the people in the world hate it.
This effect skews the results more the worse the reviews are. It is nearly impossible for a movie to show only 1 star.
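To put numbers on the example above, here is a small sketch comparing how full the stars look with where the average actually sits on a 1-to-5 scale. The function names are just for illustration.

```python
def visual_fraction(avg, max_stars=5):
    """How full the row of stars looks: average divided by the maximum."""
    return avg / max_stars

def rescaled_fraction(avg, min_stars=1, max_stars=5):
    """Position of the average between the lowest and highest allowed vote."""
    return (avg - min_stars) / (max_stars - min_stars)

votes = [1, 5]  # one person hates it, one person loves it
avg = sum(votes) / len(votes)

print(visual_fraction(avg))    # 0.6, looks like 60% approval
print(rescaled_fraction(avg))  # 0.5, the real midpoint of the scale
```

A movie that everyone gives "1" still shows 20% of the stars filled in, even though it sits at 0% of the usable range.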
This sort of bias is so endemic to online polling that it's hopeless to try to correct it. All you can do is keep it in mind when you see ratings, and decide that Dark Knight is probably really around an 8.7, not a 9.0. And Shawshank Redemption must be really, really good if it's holding onto the #1 spot despite not appealing to a specific demographic.
I've seen some sites attempt to correct for this by assuming any "real" sample will be Gaussian (have a distribution which falls on a normal curve). If the votes something receives are skewed away from Gaussian (e.g. clustered towards the high end), the site attempts to correct for this by skewing the score down. No idea how accurate or reliable that is, but it is being done in some places.
Rather than try to come up with one universal rating which is implicitly applicable to everyone, Netflix's approach is probably more sensible. Depending on the movies you watch and the ratings you give them, Netflix builds up a profile of your preferences. They try to match your profile with those of other people who watched similar movies and gave them similar ratings, then make recommendations based on what those other people watched. So if you hated Dark Knight, then there's a good chance you're not really into movies based on comic bo^H^H^H^H^H^H^H^Hgraphic novels, and so Netflix will downrate them for you personally.
This does raise some privacy implications, but on balance I believe this is the more sensible approach to ratings. Giving up some privacy to greatly increase the signal-to-noise ratio of things like movie recommendations may be worth it in some cases. This also mostly corrects for self-selection bias, assuming your self-selection can be accurately measured.
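The profile-matching idea can be sketched roughly as below. To be clear, this is not Netflix's actual algorithm (which isn't described here); it's a toy user-based collaborative filter, and the users, ratings, the MIDPOINT constant and the function names are all made up for illustration.

```python
from math import sqrt

MIDPOINT = 3  # centre of a 1-to-5 star scale; ratings are measured from here

# Hypothetical ratings, invented for this example.
ratings = {
    "you":   {"Dark Knight": 1, "Iron Man": 2, "Shawshank Redemption": 5},
    "alice": {"Dark Knight": 1, "Iron Man": 1, "Shawshank Redemption": 5, "Watchmen": 1},
    "bob":   {"Dark Knight": 5, "Iron Man": 5, "Shawshank Redemption": 3, "Watchmen": 5},
}

def similarity(a, b):
    """Cosine similarity of two users' midpoint-centred ratings on shared movies."""
    shared = set(ratings[a]) & set(ratings[b])
    xs = [ratings[a][m] - MIDPOINT for m in shared]
    ys = [ratings[b][m] - MIDPOINT for m in shared]
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(xs, ys)) / norm

def predict(user, movie):
    """Similarity-weighted guess at how `user` would rate `movie`."""
    weighted, total = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        w = similarity(user, other)
        weighted += w * (their[movie] - MIDPOINT)
        total += abs(w)
    if total == 0:
        return None  # nobody comparable has rated this movie
    return MIDPOINT + weighted / total

print(predict("you", "Watchmen"))
```

With this made-up data, "you" agree with alice and disagree with bob, so bob's 5 stars for Watchmen count against it and the prediction lands at roughly 1 star, well below the 3-star simple average of the two votes.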
The biggest problem is that we're using an all-positive scale to capture positive and negative opinions. The results will never be interpreted correctly because that's a feature of the system. "How much did you like this movie: a little, somewhat, a good amount, quite a bit, or a lot?" It's Colbert's "Great President or the greatest President?" bit, only with the expectation of being taken seriously. The RT method at least addresses this, to some extent.
But there's still a lot missing in terms of magnitude. Without standard deviation bars, the star ratings don't tell you a whole lot. A bland but tolerable movie might get straight 3s, while something with strong niche appeal might get the same average with mostly 1s and 5s. There's a lot more useful data in the voting, especially if you can determine patterns in voting behavior and group movies based on who feels strongly about them.
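A quick sketch of the point about spread, using two invented sets of votes and Python's standard `statistics` module: the averages are identical, and only the standard deviation tells the movies apart.

```python
from statistics import mean, pstdev

# Made-up votes for illustration only.
bland = [3, 3, 3, 3, 3, 3]      # tolerable, nobody feels strongly
divisive = [1, 5, 1, 5, 1, 5]   # strong niche appeal, hated by the rest

for name, votes in (("bland", bland), ("divisive", divisive)):
    # pstdev is the population standard deviation of the votes
    print(name, "mean:", mean(votes), "std dev:", pstdev(votes))
```

Both lists average 3 stars, but the first has a standard deviation of 0 and the second a standard deviation of 2, which is the information a bare star rating throws away.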
And the final issue is the skewed distribution of samples. Presumably, the worst of the bunch never see the light of day. The movies that don't even deserve one star and would be liked by nobody should never be released into the wild. Some still are, but many more aren't. So of course most movies, especially the ones seen by the most people, have high ratings, as seen in the comparison plot in the article. It is essentially a letter grading system where a 3 equates to a C; once you get much lower, you flunk out.
What matters isn't how many people, on average, don't hate a movie; it's who likes it and how their opinion relates to yours. Expect ratings systems to get a lot more personal in the future, essentially becoming targeted advertising.