The Problem With Metacritic

Posted by Soulskill on Tuesday July 17, 2012 @07:54PM from the saturday-morning-game-developers dept.

Metacritic has risen to a position of prominence in the gaming community, but is it given more credit than it's due? This article delves into some of the problems with using Metacritic as a measure of quality or success. Quoting: "The scores used to calculate the Metascore have issues before they are even averaged. Metacritic operates on a 0-100 scale. While it's simple to convert some scores into this scale (if it's necessary at all), others are not so easy. 1UP, for example, uses letter grades. The manner in which these scores should be converted into Metacritic scores is a matter of some debate; Metacritic says a B- is equal to a 67 because the grades A+ through F- have to be mapped to the full range of its scale, when in reality most people would view a B- as being more positive than a 67. This also doesn't account for the different interpretations of scores that outlets have -- some treat 7 as an average score, which I see as a problem in and of itself, while others see 5 as average. Trying to compensate for these variations is a nigh-impossible task and, lest we forget, Metacritic will assign scores to reviews that do not provide them. ... The act of simplifying reviews into a single Metascore also feeds into a misconception some hold about reviews. If you browse into the comments of a review anywhere on the web (particularly those of especially big games), you're likely to come across those criticizing the reviewer for his or her take on a game. People seem to mistake reviews for something that should be 'objective.' 'Stop giving your opinion and tell us about the game' is a notion you'll see expressed from time to time, as if it is the job of a reviewer to go down a list of items that need to be addressed (objectively!) and nothing else."

22 of 131 comments

But it's all subjective anyway. by TheoGB · 2012-07-17 19:58 · Score: 3, Interesting

Personally I think anything less than 7 out of 10 isn't worth my while. That's just me and the amount of time I have. Friends of mine, however, would give a film a 5 out of 10 and say it's still decent enough to stick on one night when you want something to watch. Even if Metacritic showed exactly the score we agreed was 'accurate', it wouldn't really matter. Aggregation of this sort is as good as doing it by eye yourself, surely?
1. Re:But it's all subjective anyway. by Razgorov Prikazka · 2012-07-17 20:13 · Score: 4, Funny
  
  Obligatory XKCD reference: http://xkcd.com/937/
  
  --
  rm -rf --no-preserve-root / ...and let /dev/null sort them out...
2. Re:But it's all subjective anyway. by Razgorov Prikazka · 2012-07-17 22:50 · Score: 3, Funny
  
  Yeah, I surely hope that Randall Munroe makes a cartoon on the Raspberry Pi or Bitcoins; that would make the prediction a whole lot easier on a lot of /. stories :-)
  
  --
  rm -rf --no-preserve-root / ...and let /dev/null sort them out...
3. Re:But it's all subjective anyway. by mooingyak · 2012-07-18 01:35 · Score: 2
  
  Nobody rates lower than 0 or higher than 100 so it can't be a normal distribution because the tail would be cut off. This would primarily be a problem for large standard deviations or extreme scores.
  Not just that, but there are numbers in the low range that just aren't going to get used. Who rates something as a 3 out of 100? What does that mean? It was absolutely terrible except for one relatively minor part that was done right?
  
  --
  William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
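The truncation point in the reply above can be put into numbers. The following is a minimal Python sketch that assumes, purely for illustration, that a game's scores follow a normal distribution; it computes the share of that bell curve a 0-100 scale would cut off. The mean and standard-deviation pairs are invented, not taken from Metacritic.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of Normal(mu, sigma) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mass_outside_scale(mu: float, sigma: float,
                       lo: float = 0.0, hi: float = 100.0) -> float:
    """Probability that a Normal(mu, sigma) draw lands outside [lo, hi].

    This is the share of the bell curve that a bounded 0-100 scale cuts off.
    """
    return normal_cdf(lo, mu, sigma) + (1.0 - normal_cdf(hi, mu, sigma))


if __name__ == "__main__":
    # Hypothetical (mean, standard deviation) pairs, chosen only for illustration.
    for mu, sigma in [(70, 5), (70, 15), (90, 10)]:
        share = mass_outside_scale(mu, sigma)
        print(f"mean={mu}, sd={sigma}: {share:.4f} of the curve falls outside 0-100")
```

Under those made-up parameters, a mean of 70 with a standard deviation of 5 loses essentially nothing, while a mean of 90 with a standard deviation of 10 pushes about 16% of the curve past 100, which is the wide-spread, extreme-score case the reply singles out.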
Solve it with Machine Learning by LittleImp · 2012-07-17 20:02 · Score: 2

This would be perfect for machine learning. Just analyze _all_ the scores from a certain source and calculate a most probable score with a standard deviation. Then assign a score from 0-100 accordingly. I don't know that much about machine learning so I'm sure an expert could find a way better algorithm for that.

"Trying to compensate for these variations is a nigh-impossible task" - definitely not.
1. Re:Solve it with Machine Learning by bWareiWare.co.uk · 2012-07-17 21:18 · Score: 4, Interesting
  
  Your description is not so much machine learning as basic math. If you just want each scoring system to have equal weight on the results then compensating for the variation is trivial.
  Where machine learning would come in is finding underlying patterns in the data.
  This could be used to weed out reviewers who lazily copy scores, or who are subject to influence.
  It would also allow them to test your scores for some games against the population of reviewers to find reviewers with similar tastes.
  You could also use clustering algorithms to find niche games which got a few really strong scores but whose average was really pulled down because they don't have wide appeal.
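The clustering idea in the last point could look something like this minimal Python sketch. It runs an exact two-cluster k-means on one game's scores (in one dimension, the best split can be found by trying every cut point in sorted order) and flags the game when a small, high-scoring cluster sits under a low overall average. The thresholds `high_bar`, `max_fan_share` and `low_avg`, the function names and the sample scores are all hypothetical.

```python
from statistics import mean


def two_cluster_split(scores):
    """Exact two-cluster k-means for one-dimensional data.

    In 1-D the optimal 2-means clusters are contiguous in sorted order, so
    trying every split point and keeping the lowest within-cluster sum of
    squared errors finds the best partition.
    """
    xs = sorted(scores)
    best = None
    for i in range(1, len(xs)):
        low, high = xs[:i], xs[i:]
        m_low, m_high = mean(low), mean(high)
        sse = sum((x - m_low) ** 2 for x in low) + sum((x - m_high) ** 2 for x in high)
        if best is None or sse < best[0]:
            best = (sse, low, high)
    return best[1], best[2]


def looks_niche(scores, high_bar=85, max_fan_share=0.4, low_avg=75):
    """Flag a game whose small high-scoring cluster is hidden by a low average.

    All three thresholds are hypothetical defaults, not values from any real site.
    """
    if len(scores) < 2:
        return False
    _, high = two_cluster_split(scores)
    fan_share = len(high) / len(scores)
    return mean(high) >= high_bar and fan_share <= max_fan_share and mean(scores) < low_avg


if __name__ == "__main__":
    niche_game = [92, 95, 90, 55, 50, 60, 58, 52, 57, 61]   # invented scores
    broad_game = [78, 80, 82, 79, 81, 77, 80]               # invented scores
    print(looks_niche(niche_game), looks_niche(broad_game))
```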
Just use a bell curve by wjh31 · 2012-07-17 20:07 · Score: 5, Insightful

Sounds in principle like a fairly simple solution. Put together a separate histogram of the scores by each reviewer. From this you can estimate what an average score really is and how many standard deviations an individual score is above or below it. The meta-score then becomes the average number of sigmas the game is above or below the various averages. If necessary, this score can be sanitised to something easier to read for those less familiar with Gaussian statistics.
1. Re:Just use a bell curve by fph il quozientatore · 2012-07-17 20:37 · Score: 3, Insightful
  
  My thoughts exactly. This is not rocket science; it is sad to read that no one in their whole company seems to have a clue about statistics.
  
  --
  My first program:
  Hell Segmentation fault
2. Re:Just use a bell curve by Anonymous Coward · 2012-07-17 20:41 · Score: 2
  
  Well it is extremely likely that Metacritic has a system like this implemented, but the article is just ignorant.
3. Re:Just use a bell curve by Relic of the Future · 2012-07-18 07:04 · Score: 2
  
  The highest score for a game on metacritic last year was 96. Only 23 console games, out of over 260, scored above 90. I think people might be exaggerating a bit. http://www.metacritic.com/feature/best-video-games-of-2011
  
  --
  Those who fail to understand communication protocols, are doomed to repeat them over port 80.
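wjh31's per-reviewer normalization can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each reviewer's full score history is available as a dictionary, the population standard deviation stands in for the fitted bell curve, and the 50-plus-10-per-sigma display scale in `readable` is a hypothetical way of "sanitising" the result. The outlet names and scores are invented.

```python
from statistics import mean, pstdev


def reviewer_baselines(reviews):
    """Map each reviewer to (mean, population standard deviation) of all their scores."""
    return {name: (mean(list(scores.values())), pstdev(list(scores.values())))
            for name, scores in reviews.items()}


def z_meta_score(reviews, game):
    """Average number of standard deviations a game sits above or below each
    reviewer's own average. Reviewers who never scored the game, or whose
    scores never vary, are skipped."""
    baselines = reviewer_baselines(reviews)
    zs = []
    for name, scores in reviews.items():
        if game not in scores:
            continue
        mu, sd = baselines[name]
        if sd > 0:
            zs.append((scores[game] - mu) / sd)
    return mean(zs) if zs else None


def readable(z):
    """Hypothetical presentation scale: 50 points plus 10 per sigma, clamped to 0-100."""
    return max(0.0, min(100.0, 50.0 + 10.0 * z))


if __name__ == "__main__":
    # Invented outlets on different native scales.
    reviews = {
        "ten_point_site": {"A": 9.0, "B": 7.0, "C": 8.0},
        "hundred_point_site": {"A": 80, "B": 40, "C": 60},
    }
    for game in "ABC":
        z = z_meta_score(reviews, game)
        print(game, round(z, 3), round(readable(z), 1))
```

Because each outlet is measured against its own history, the 0-10 site and the 0-100 site land on the same z-score for game A without any explicit scale conversion. The approach still needs a reasonably long history per reviewer to estimate a stable average and spread.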
I don't get the point of it anyway by Riceballsan · 2012-07-17 20:27 · Score: 3, Insightful

For me anyway, when it comes to finding good reviews of things, I've always found a mass average entirely useless. Just because 10,000 out of 15,000 like something, it has no bearing on whether I will like it, and quite often leads to the opposite. Instead, I check reviews of the games, movies, shows etc. that I have already seen, and find the reviewers that are the closest match to how I felt about them. Then I check their reviews of what I haven't seen. It isn't a perfect system, but it works overall, and tends to be more accurate to my tastes than other methods I have tried. In addition, of course, I actually read detailed reviews with explanations of why the reviewers felt that way. If you are looking for a game with a deep story and the review is a 9/10 saying "Great explosions, incredible action at every turn, the graphics were spectacular, the story was a little weak but that is made up for by the incredible pace of the combat", odds are it isn't a good game for someone looking for a deep plot.
1. Re:I don't get the point of it anyway by thegarbz · 2012-07-17 21:22 · Score: 3, Funny
  
  Just because 10,000 out of 15,000 like something, it has no bearing on whether I will like it, and quite often leads to the opposite.
  
  So you're not interested in my remake of Twilight starring Justin Bieber?
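Riceballsan's approach of finding reviewers whose past scores match your own is essentially nearest-neighbour matching on taste. A minimal Python sketch, using Pearson correlation over the games both you and a reviewer have scored as the similarity measure (one reasonable choice among several, not the commenter's stated method); `min_overlap`, the function names and all scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_reviewers(my_scores, reviews, min_overlap=3):
    """Rank reviewers by how closely their scores track yours on games you both rated.

    Correlation ignores each reviewer's scale and offset, so a 0-10 site and a
    0-100 site can be compared directly. min_overlap is a hypothetical cutoff.
    """
    ranked = []
    for name, theirs in reviews.items():
        shared = [g for g in my_scores if g in theirs]
        if len(shared) < min_overlap:
            continue
        r = pearson([my_scores[g] for g in shared], [theirs[g] for g in shared])
        ranked.append((name, r))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def unseen_picks(my_scores, reviews, reviewer):
    """What a chosen reviewer said about games you have not played yet."""
    return {g: s for g, s in reviews[reviewer].items() if g not in my_scores}


if __name__ == "__main__":
    mine = {"A": 9, "B": 3, "C": 7, "D": 5}            # invented personal scores
    reviews = {                                        # invented reviewer scores
        "likeminded": {"A": 90, "B": 40, "C": 75, "D": 55, "E": 88},
        "contrarian": {"A": 4, "B": 9, "C": 5, "D": 7, "E": 3},
    }
    best, r = rank_reviewers(mine, reviews)[0]
    print(best, round(r, 2), unseen_picks(mine, reviews, best))
```

Correlation only looks at whether a reviewer's scores rise and fall with yours, so it works across outlets that use different scales; with only a handful of shared games, though, the correlation is noisy.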
Is that so? by Warma · 2012-07-17 20:35 · Score: 2

This line of thought seems faulty, but I have to admit that I feel the same way about most of the scores that review sites and magazines deal out. A score of less than 7 out of 10 is reserved for seriously failed works, and typically these works never merit a recommendation. However, if anything below 7 is not worth experiencing, you essentially only have five possible scores: bad, 7, 8, 9 and 10. The rest of the scale is simply wasted.
I always wondered if this is caused by school grades in youth influencing what people think of as acceptable ratings. For example, in Finland grade school grades go from 4 (failed) to 10 (best), where 7 ends up being an average score and anything below it is considered poor.
1. Re:Is that so? by SpooForBrains · 2012-07-17 22:15 · Score: 5, Interesting
  
  I work for a review platform. We have decided that you only really need four ratings, Bad, Poor, Good, Excellent. We don't have a neutral option because really neutral tends to mean bad.
  Of course, quite a lot of our users (and our marketing department) seem to prefer stars. Because an arbitrary scale is so much more useful than simply saying what you think of something. Apparently.
  
  --
  "The dew has clearly fallen with a particularly sickening thud this morning"
Bell curve doesn't work that well either... by F69631 · 2012-07-17 20:45 · Score: 5, Insightful

One reviewer might only rate highly hyped games which he expects to be good (nearly all fall in the 60-100 range), while another reviewer tries out pretty much everything he encounters to find the lone gems among less well-known indie games, etc. (let's say ranging from 20 to 95). We can't just take a bell curve of each and say "Game A is slightly above average on the first reviewer's scale and Game B is slightly above average on the second reviewer's scale... so they're probably about equally good!". Sure, with a large number of reviewers you can still see which games do well and which don't, but you have lost at least as much precision as you would have if you hadn't taken the bell curve in the first place.
That said, I don't know if reviews are that relevant anymore. I am an active gamer but don't remember the last time I read a full review... There have been two times recently when I bought newer games from series I had played years ago (Cossacks and Anno 1602). I just wanted to take a quick peek at whether the games were considered about as good as, better than, or worse than the ones I had liked, and whether they were very similar with just better graphics etc. or if some major concept had changed. That consisted mostly of looking the games up on Wikipedia and quickly glancing at the first reviews I found using Google. I think I also checked the Metascore, but it was more along the lines of "I'll buy it unless it turns out to have a Metascore under 60 or something". I didn't use it as an exact metric.
Most games I buy are ones recommended to me by my friends, those recommended by blogs I follow (e.g., the Penny Arcade guys' news feed... you could consider those reviews, but they don't mention the games they hated, don't give scores, etc., just mention "Hey, that was pretty good. Try it out.") or those that just seem fun and don't cost much (When I noticed Orcs Must Die on Steam for under 5 euros, I didn't start doing extensive research on the critical acclaim of the game.)
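F69631's objection to per-reviewer bell curves can be made concrete with two invented score pools. In this minimal Python sketch both reviewers give the same game an identical raw 70, yet normalizing against each reviewer's own pool puts the game below average for one and above average for the other, purely because of which games each chose to review. The pools and names are hypothetical.

```python
from statistics import mean, pstdev


def z_score(score, pool):
    """Standard score of `score` relative to a reviewer's own pool of scores."""
    return (score - mean(pool)) / pstdev(pool)


# Invented score pools for the two kinds of reviewer described above.
hype_only = [60, 70, 80, 90, 100]     # reviews only big, hyped releases
everything = [20, 40, 60, 80, 100]    # also digs through obscure indie games

# Both reviewers give the same game the same raw score of 70 ...
z_hype = z_score(70, hype_only)
z_all = z_score(70, everything)

# ... yet after per-reviewer normalization they appear to disagree:
# below average for the first reviewer, above average for the second.
print(round(z_hype, 3), round(z_all, 3))
```

With these numbers the hype-only reviewer's 70 comes out at roughly -0.71 standard deviations and the other reviewer's 70 at roughly +0.35.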
It can be used as an executive summary by Sycraft-fu · 2012-07-17 20:55 · Score: 4, Insightful

I find it useful for that. If there's something you have little knowledge or information about, it can give you a quick breakdown of what you might expect. For example, if a game has a 90 Metascore, you know it is something you should probably look into further; that is uncommonly high. If something has a 40 Metascore, you can pretty much give it a miss; that is uncommonly low.
What it'll then get you, for things you do want to look into further, is a list of reviews. So you can see what sites have reviewed it, and then go and read the specifics if you wish. Along those lines, it is a quick way to find good and bad reviews. When I'm on the fence about something I like to see what people thought was good and bad. I can then weigh for myself how much those matter to me.
Average ratings really can be of some use to filter. I just don't give enough of a shit about every game to go and read multiple full reviews on it and research it. So if it isn't a game I was already interested in, I want a sort of executive summary to decide if I should give it any more time. Metacritic helps with that.
Two recent examples:
1) Endless Space. I had never heard of this game, an indie 4X space game apparently, though rather well developed. OK, well, ambitious indie titles can be all over the map. Metascore is 78. That tells me it is worth looking at; it is on my list and I'll look at it more in depth when I feel like playing such a game.
2) Fray, a turn-based strategy sci-fi game. Again, something I hadn't heard of, but a kind of game I like, so maybe I'd be interested. Metascore of 32. So no, not wasting time on that.
For other games I won't bother with the Metascore and just use it to find reviews, like Orcs Must Die 2. I'm looking forward to that one, so I'll spend time researching it to see if I want to buy it. I liked the original enough that it'll be worth looking at reviews, no matter what the score, to see if I think I'll like the next one.
We need a meta-meta-critic by TechnoCore · 2012-07-17 20:55 · Score: 2

Even though Metacritic has become the standard for measuring the quality of a game, they sadly do not check the quality or sincerity of the reviewers they pick. I myself work at a smaller indie game studio. Our last project got reviews ranging from 2 to 10. That is possible for several reasons, the main one being that some reviewers didn't really review the game at all. They just scratched the surface of it, and Metacritic then used that score. Our game wasn't perfect, but neither was it crap. It is fun, addictive and beautiful, with a few bugs. But was it a 2 or a 10? Never.

I know that the larger companies in the business keep track of every journalist and blog that has been lucky enough to be picked up by Metacritic. If a reviewer is known for giving consistently low or bad reviews, they will never receive a copy for review. That doesn't stop people from buying the game at release and then reviewing it anyway, though it might keep those important first reviews from being bad, I guess. Guess we have to do the same at our little studio.

What is really needed is a meta-meta-critic. A site where journalists and reviewers themselves are rated based on their seriousness. Something like the system for rating comments here at Slashdot.
Depends on what you mean by using the range by Sycraft-fu · 2012-07-17 21:06 · Score: 3, Interesting

In most US schools, the scale is:
A: 100-90
B: 89-80
C: 79-70
D: 69-60
F (or sometimes E): 59-0
So while you can score anywhere from 0-100% on an assignment and on the final grade, 59% or below is failing. In terms of grades, an A means (or is supposed to mean) an excellent grasp of the material, a B a good grasp, a C an acceptable grasp, a D a below-average but still sufficient grasp, and an F an unsatisfactory grasp.
So translate that to reviews and you get the same system. Also it can be useful to have a range of bad. Anything under 60% is bad in grade terms but looking at the percentages can tell you how bad. A 55% means you failed, but were close to passing. A 10% means you probably didn't even try.
So games could be looked at the same way. The ratings do seem to get used that way too. When you see sites hand out ratings in the 60s (or 6/10) they usually are giving it a marginal rating, like "We wouldn't really recommend this, but it isn't horrible so maybe if you really like this kind of game." A rating in the 50s is pretty much a no recommendation but for a game that is just bad not truly horrible. When a real piece of shit comes along, it will get things in the 30s or 20s (maybe lower).
A "grade style" rating system does make some sense, particularly since we are not rating in terms of averages. I don't think anyone gives a shit if the game is "average" or not; they care if it is good. The "average" game could be good or bad, and that really isn't relevant. What is relevant is whether you want to play a specific game.
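The US band mapping listed above translates directly into a lookup. A minimal Python sketch, assuming review scores are first converted to a percentage and that a percentage falling between two integer bands (say 89.5) belongs to the lower band; the function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the D, C, B and A bands in the US scale listed above;
# anything below 60 is an F.
CUTOFFS = [60, 70, 80, 90]
LETTERS = "FDCBA"


def to_percent(score, out_of):
    """Convert a score such as 6 out of 10 into a 0-100 percentage."""
    return 100 * score / out_of


def letter_grade(percent):
    """Map a 0-100 percentage to a US-style letter grade."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return LETTERS[bisect_right(CUTOFFS, percent)]


if __name__ == "__main__":
    for score, out_of in [(9.5, 10), (8, 10), (6, 10), (5.5, 10), (2, 10)]:
        pct = to_percent(score, out_of)
        print(f"{score}/{out_of} -> {pct:.0f}% -> {letter_grade(pct)}")
```

Under this mapping a 6/10 review lands on a D, matching the "marginal" reading of scores in the 60s described above, and a 5.5/10 is already an F.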
Rottentomatoes by cheesecake23 · 2012-07-17 22:35 · Score: 4, Interesting

By that logic, Rottentomatoes (which averages reviews using only a binary fresh/rotten scale) should be utterly useless. Except it isn't. It's IMHO the most dependable rating site on the net.
It seems the magic lies not in the rating resolution, but in the quality and size of the reviewer pool (100+ for Rottentomatoes). In other words, make the law of averages work for you.
1. Re:Rottentomatoes by Dorkmaster Flek · 2012-07-18 01:31 · Score: 3, Interesting
  
  Rotten Tomatoes uses a different system though. In fact, I really like their system. They look at a review and decide ultimately whether the critic enjoyed the movie enough to recommend it or not. It's like Siskel & Ebert's thumbs up or down system; fresh or rotten. The only factor is whether they enjoyed the movie or not. There's none of this trying to take a letter grade and turn it into a number from 1-100 bullshit. The Rotten Tomatoes rating is simply the percentage of critics who liked the film enough to recommend it, out of the total number of reviews, which I find much more useful. It's still no substitute for the most reliable method, which somebody else above mentioned: find a reviewer whose taste agrees with yours on past films/games/whatever and see what they say about new ones. Rotten Tomatoes takes less time though.
  
  --
  I like to think of online DRM as something akin to a college -- you pay for lessons until you learn something.
2. Re:Rottentomatoes by Zaphod The 42nd · 2012-07-18 01:50 · Score: 2
  
  Exactly, came here to say this.
  
  Rottentomatoes rating is not a rating of how good a movie is. Rather, it is how likely you are to enjoy it. It is a probability!!
  
  A movie with 10% on rottentomatoes doesn't mean it's a movie worth a grade of 10; it means that only a niche audience enjoyed it. So you're less likely to be part of that 10%, but it's absolutely possible you'd still love the film, for very specific reasons. Similarly, a movie with a 98% rating isn't necessarily the best movie or a very high quality film; it just means that a large portion of the population will find it enjoyable overall.
  
  If you treat metacritic the same way, there's nothing wrong at all.
  
  --
  GCS/MU/P d- s:- a-- C++++$ UL++ P+ L++ E+ W++ N o K- w--- O M+ V- PS+++ PE Y+ PGP t+ 5- X R++ tv+ b++ DI++ D++ G+ e++ h-
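The Rotten Tomatoes figure described in this thread is a proportion, so the "law of averages" point can be quantified with a standard error. A minimal Python sketch, assuming each review has already been reduced to recommend or not; `binarize` and its cutoff of 60 are hypothetical, shown only to illustrate treating numeric scores the same way.

```python
from math import sqrt


def fresh_percentage(recommends):
    """Share of critics who recommend a title, plus a rough standard error.

    recommends: list of booleans, True meaning "fresh" (recommended).
    The standard error uses the simple binomial approximation sqrt(p(1-p)/n),
    which is only a rough guide for small pools.
    """
    n = len(recommends)
    if n == 0:
        raise ValueError("need at least one review")
    p = sum(recommends) / n
    return 100 * p, 100 * sqrt(p * (1 - p) / n)


def binarize(scores, threshold=60):
    """Turn 0-100 scores into recommend / don't-recommend votes.

    The threshold is a hypothetical cutoff, not one used by any real site.
    """
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    small_pool = [True] * 8 + [False] * 2
    large_pool = [True] * 80 + [False] * 20
    for label, pool in [("10 critics", small_pool), ("100 critics", large_pool)]:
        pct, se = fresh_percentage(pool)
        print(f"{label}: {pct:.0f}% fresh, +/- {se:.1f} points")
    print(fresh_percentage(binarize([85, 72, 55, 90, 40])))
```

With ten critics at 80% fresh the rough standard error is about 13 points; with a hundred critics at the same rate it drops to about 4, which is one way to see why the size of the reviewer pool matters.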
Value in variation of reviews, not the "score" by swb · 2012-07-18 02:05 · Score: 2

I think the value in metacritic isn't the "score" but the variation across all reviews. You could have two titles with identical "80" scores, which would otherwise indicate both titles are equally well liked.
That being said, one title could have all of its reviews be between 70 and 90, while the other could have a lot of low scores and a lot of high scores. The high variation in scores tells you that there's something about that title that's amiss.
It would be interesting to see statistics compiled for reviewers, too. Do some reviewers always deviate above the average? Below? I would think a reviewer with a higher variability of ratings would be more trustworthy than one who was consistent with their reviews.
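Both statistics swb asks for are straightforward to compute. A minimal Python sketch, assuming all scores are already on one common 0-100 scale and using the per-game mean of all reviewers as the "consensus"; the titles, reviewer names and scores are invented.

```python
from statistics import mean, pstdev


def spread(scores):
    """Mean and population standard deviation of one title's review scores."""
    return mean(scores), pstdev(scores)


def reviewer_bias(reviews):
    """Average signed gap between each reviewer's score and the per-game consensus.

    reviews: {reviewer: {game: score}}, assumed to be on one common 0-100 scale.
    Positive means the reviewer tends to score above everyone else.
    """
    games = {g for scores in reviews.values() for g in scores}
    consensus = {g: mean([s[g] for s in reviews.values() if g in s]) for g in games}
    return {name: mean([score - consensus[g] for g, score in scores.items()])
            for name, scores in reviews.items()}


def reviewer_variability(reviews):
    """Population standard deviation of each reviewer's own scores."""
    return {name: pstdev(list(scores.values())) for name, scores in reviews.items()}


if __name__ == "__main__":
    # Two invented titles with the same average but very different spreads.
    steady = [70, 75, 80, 85, 90]
    divisive = [50, 55, 100, 100, 95]
    print(spread(steady), spread(divisive))

    # Invented reviewers: one runs hot, one runs cold, one tracks the consensus.
    reviews = {
        "always_high": {"X": 90, "Y": 85},
        "always_low": {"X": 60, "Y": 55},
        "middle": {"X": 75, "Y": 70},
    }
    print(reviewer_bias(reviews))
    print(reviewer_variability(reviews))
```

The two invented titles share an average of 80, but the divisive one's standard deviation is more than three times the steady one's, which is the signal swb describes. The bias figure only says which way a reviewer leans, not whether that reviewer is trustworthy.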