The Problem With Metacritic


Posted by Soulskill on Tuesday July 17, 2012 @07:54PM from the saturday-morning-game-developers dept.

Metacritic has risen to a position of prominence in the gaming community, but is it given more credit than it's due? This article delves into some of the problems with using Metacritic as a measure of quality or success. Quoting: "The scores used to calculate the Metascore have issues before they are even averaged. Metacritic operates on a 0-100 scale. While it's simple to convert some scores into this scale (if it's necessary at all), others are not so easy. 1UP, for example, uses letter grades. The manner in which these scores should be converted into Metacritic scores is a matter of some debate; Metacritic says a B- is equal to a 67 because the grades A+ through F- have to be mapped to the full range of its scale, when in reality most people would view a B- as being more positive than a 67. This also doesn't account for the different interpretation of scores that outlets have -- some treat 7 as an average score, which I see as a problem in and of itself, while others see 5 as average. Trying to compensate for these variations is a nigh-impossible task and, lest we forget, Metacritic will assign scores to reviews that do not provide them. ... The act of simplifying reviews into a single Metascore also feeds into a misconception some hold about reviews. If you browse into the comments of a review anywhere on the web (particularly those of especially big games), you're likely to come across those criticizing the reviewer for his or her take on a game. People seem to mistake reviews for something which should be 'objective.' 'Stop giving your opinion and tell us about the game' is a notion you'll see expressed from time to time, as if it is the job of a reviewer to go down a list of items that need to be addressed (objectively!) and nothing else."
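To make the letter-grade conversion concrete, here is a minimal sketch of one way such a mapping could work. It assumes a hypothetical 13-step ladder from A down to F, spaced evenly across 0-100; this is an illustration, not Metacritic's published table, although it happens to reproduce the 67 quoted for a B-.

```python
# Hypothetical 13-step grade ladder (an assumption for illustration,
# not Metacritic's documented conversion table).
GRADES = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F+", "F"]


def letter_to_score(grade: str) -> int:
    """Map a letter grade onto 0-100 by spacing the ladder evenly."""
    position = GRADES.index(grade)      # 0 for "A", 12 for "F"
    steps = len(GRADES) - 1             # 12 equal steps between A and F
    return round(100 * (steps - position) / steps)


print(letter_to_score("B-"))  # 67 under this evenly spaced assumption
```

Spacing the grades evenly is what pushes a B- down to the high 60s: the bottom half of the ladder (D and F grades) takes up as much of the scale as the top half.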

14 of 131 comments

But it's all subjective anyway. by TheoGB · 2012-07-17 19:58 · Score: 3, Interesting

Personally I think anything less than 7 out of 10 isn't worth my while bothering with. That's about me and the time I have. Friends of mine, however, would give a film a 5 out of 10 and say it's still decent enough to stick on one night when you want something to watch. Even if Metacritic showed exactly the score that we agreed was 'accurate', it wouldn't really matter. Aggregation of this sort is as good as doing it by eye yourself, surely?
1. Re:But it's all subjective anyway. by Razgorov Prikazka · 2012-07-17 20:13 · Score: 4, Funny
  
  Obligatory XKCD reference: http://xkcd.com/937/
  
  --
  rm -rf --no-preserve-root / ...and let /dev/null sort them out...
2. Re:But it's all subjective anyway. by Razgorov Prikazka · 2012-07-17 22:50 · Score: 3, Funny
  
  Yeah, I surely hope that Randall Munroe makes a cartoon on the RaspberryPi or Bitcoins, that would make the prediction a whole lot easier on a lot of /. stories :-)
  
  --
  rm -rf --no-preserve-root / ...and let /dev/null sort them out...
Just use a bell curve by wjh31 · 2012-07-17 20:07 · Score: 5, Insightful

Sounds in principle like a fairly simple solution. Put together a separate histogram of the scores by each reviewer. From this you can estimate what an average score really is and how many standard deviations an individual score is above or below it. The meta-score then becomes the average number of sigmas the game is above or below the various averages. If necessary this score can be sanitised to something easier to read for those less familiar with Gaussian statistics.
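The normalisation described above can be sketched roughly as follows. The outlet names, game names and scores are made up for illustration, and the final 0-100 rescaling (50 plus 20 points per sigma) is an arbitrary choice, not something the comment specifies.

```python
from statistics import mean, pstdev

# Hypothetical scores, each on the outlet's own scale (illustrative data only).
reviews = {
    "outlet_a": {"game_x": 9.0, "game_y": 7.0, "game_z": 8.0},  # scores out of 10
    "outlet_b": {"game_x": 70, "game_y": 30, "game_z": 50},      # scores out of 100
}


def z_scores(scores):
    """Express each score as standard deviations from that outlet's own mean."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    return {game: (s - mu) / sigma if sigma else 0.0 for game, s in scores.items()}


def meta_sigma(all_reviews):
    """Average the per-outlet z-scores for every game that was reviewed."""
    per_outlet = [z_scores(scores) for scores in all_reviews.values()]
    games = set().union(*per_outlet)
    return {g: mean(z[g] for z in per_outlet if g in z) for g in games}


def to_readable(z):
    """Arbitrary rescaling of a sigma value onto 0-100 (an assumption)."""
    return max(0, min(100, round(50 + 20 * z)))


result = meta_sigma(reviews)
for game in sorted(result):
    print(game, round(result[game], 2), to_readable(result[game]))
```

With this toy data both outlets agree on the ordering once their different scales and averages are removed, so game_x lands about 1.22 sigma above average for each of them.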
1. Re:Just use a bell curve by fph il quozientatore · 2012-07-17 20:37 · Score: 3, Insightful
  
  My thoughts exactly. This is not rocket science; it is sad to read that no one in their whole company seems to have a clue about statistics.
  
  --
  My first program:
  Hell Segmentation fault
I don't get the point of it anyway by Riceballsan · 2012-07-17 20:27 · Score: 3, Insightful

For me anyway, when it comes to finding good reviews of things, I've always found a mass average entirely useless. Just because 10,000 out of 15,000 like something, it has no bearing on whether I will like it and quite often leads to the opposite. Instead, what I do is check reviews of the games, movies, shows, etc. that I have already seen and find the reviewers that are the closest match to how I felt about things in the past. Then I check their reviews of what I haven't seen. It isn't a perfect system, but it works overall, and tends to be more accurate to my tastes than other methods that I have tried. In addition, of course, I actually read detailed reviews with explanations of why they felt that way. If you are one who is looking for a game for a deep story and the review is 9/10 saying "Great explosions, incredible action at every turn, the graphics were spectacular, the story was a little weak but that is made up for by the incredible pace of the combat", odds are it isn't a good game for someone looking for a deep plot.
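The matching approach described above could be sketched like this. It assumes all scores are already on a common 0-10 scale and uses mean absolute difference over shared titles as the measure of agreement; the critic names, titles and numbers are hypothetical.

```python
def taste_distance(mine, theirs):
    """Mean absolute score difference over titles both parties have rated."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return None  # nothing in common, so no basis for comparison
    return sum(abs(mine[t] - theirs[t]) for t in shared) / len(shared)


def closest_reviewer(mine, reviewers):
    """Return the reviewer whose past scores sit closest to mine."""
    distances = {name: taste_distance(mine, scores) for name, scores in reviewers.items()}
    comparable = {name: d for name, d in distances.items() if d is not None}
    return min(comparable, key=comparable.get) if comparable else None


# Hypothetical ratings on a 0-10 scale.
my_scores = {"game_a": 9, "game_b": 4, "game_c": 7}
critics = {
    "critic_1": {"game_a": 8, "game_b": 5, "game_c": 7, "game_new": 9},
    "critic_2": {"game_a": 5, "game_b": 9, "game_new": 3},
}

best = closest_reviewer(my_scores, critics)
print(best, critics[best]["game_new"])  # the matched critic's verdict on the unseen title
```

Here critic_1 differs from the reader by well under a point on average, so their 9 for the unseen game carries more weight than critic_2's 3.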
1. Re:I don't get the point of it anyway by thegarbz · 2012-07-17 21:22 · Score: 3, Funny
  
  Just because 10,000 out of 15,000 like something, it has no bearing on whether I will like it and quite often leads to the opposite.
  
  So you're not interested in my remake of Twilight starring Justin Bieber?
Bell curve doesn't work that well either... by F69631 · 2012-07-17 20:45 · Score: 5, Insightful

One reviewer might only rate highly hyped games which he expects to be good (nearly all fall in the 60-100 range) while another reviewer tries out pretty much everything he encounters to find those lone gems among less well-known indie games, etc. (let's say ranging from 20 to 95). We can't just take a bell curve of each and say "Game A is slightly above average on the first reviewer's scale and Game B is slightly above average on the second reviewer's scale... so they're probably about equally good!". Sure, with a large number of reviewers, you can still see which games do well and which don't, but you have lost at least as much precision as you would have if you hadn't taken the bell curve in the first place.
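A small sketch of this objection, using two hypothetical review histories: both reviewers give the same game an 80, but because they sample different kinds of games, per-reviewer normalisation reads those identical scores very differently.

```python
from statistics import mean, pstdev


def z_score(score, history):
    """Standard deviations between a score and the reviewer's own average."""
    return (score - mean(history)) / pstdev(history)


# Hypothetical histories: one reviewer covers only hyped titles,
# the other tries nearly everything.
picky_history = [60, 70, 80, 90, 100]
broad_history = [20, 40, 60, 80, 95]

z_picky = z_score(80, picky_history)   # 80 is exactly this reviewer's average
z_broad = z_score(80, broad_history)   # 80 is well above this reviewer's average

print(round(z_picky, 2), round(z_broad, 2))
```

The normalised values differ (0 versus roughly 0.78 sigma) even though the raw scores agree, which is the loss of precision the comment is pointing at.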
That said, I don't know if reviews are that relevant anymore. I am an active gamer but don't remember the last time I read a full review... There have been two times recently when I bought newer games from series I had played years ago (Cossacks and Anno 1602). I just wanted to take a quick peek at whether the games were considered about equally good, better or worse than the ones I had liked, and whether they were very similar with just better graphics etc. or if some major concept had changed. That consisted mostly of looking the games up on Wikipedia and quickly glancing at the first reviews I found using Google. I think I also checked the metascore, but it was more along the lines of "I'll buy it unless it turns out to have a metascore under 60 or something". I didn't use that as an exact metric.
Most games I buy are ones recommended to me by my friends, those recommended by blogs I follow (e.g., the Penny Arcade guys' news feed... you could consider those reviews, but they don't mention the games they hated, don't give scores, etc., just mention "Hey, that was pretty good. Try it out.") or those that just seem fun and don't cost much (When I noticed Orcs Must Die on Steam for under 5 euros, I didn't start doing extensive research on the critical acclaim of the game.)
It can be used as an executive summary by Sycraft-fu · 2012-07-17 20:55 · Score: 4, Insightful

I find it useful for that. If there's something you have little knowledge or information about, it can give you a quick breakdown of what you might expect. For example, if a game has a 90 metascore, you know it is something that you should probably look into further; that is uncommonly high. If something has a 40 metascore, you can pretty much give it a miss; that is uncommonly low.
What it'll then get you for things you do want to look into further is a list of reviews. So you can see what sites have reviewed it, and then go and read the specifics if you wish. Along those lines it is a quick way to find good and bad reviews. When I'm on the fence about something I like to see what people thought was good and bad. I can then weigh for myself how much those matter to me.
Average ratings really can be of some use to filter. I just don't give enough of a shit about every game to go and read multiple full reviews on it and research it. So if it isn't a game I was already interested in, I want a sort of executive summary to decide if I should give it any more time. Metacritic helps with that.
Two recent examples:
1) Endless Space. I had never heard of this game, an indie 4X space game apparently, though rather well developed. OK, well, ambitious indie titles can be all over the map. Metascore is 78. That tells me it is worth looking at; it is on my list and I'll look at it more in depth when I feel like playing such a game.
2) Fray, a turn-based strategy sci-fi game. Again, something I hadn't heard of, however a kind of game I like so maybe I'd be interested. Metascore of 32. So no, not wasting time on that.
For other games I won't bother with the Metascore and just use it to find reviews. Like Orcs Must Die 2. Looking forward to that one, so I'll spend time researching it to see if I want to buy it. I liked the original enough that it'll be worth looking at reviews, no matter what the score, to see if I think I'll like the next one.
Depends on what you mean by using the range by Sycraft-fu · 2012-07-17 21:06 · Score: 3, Interesting

In most US schools, the scale is:
A: 100-90
B: 89-80
C: 79-70
D: 69-60
F (or sometimes E): 59-0
So while you can, percentage-wise, score anywhere from 0-100 on an assignment and on the final grade, 59% or below is failing. In terms of the grades, an A means (or is supposed to mean) an excellent grasp of the material, a B a good grasp, a C an acceptable grasp, a D a below average grasp but still enough, and an F an unsatisfactory grasp.
So translate that to reviews and you get the same system. Also it can be useful to have a range of bad. Anything under 60% is bad in grade terms but looking at the percentages can tell you how bad. A 55% means you failed, but were close to passing. A 10% means you probably didn't even try.
So games could be looked at the same way. The ratings do seem to get used that way too. When you see sites hand out ratings in the 60s (or 6/10) they usually are giving it a marginal rating, like "We wouldn't really recommend this, but it isn't horrible so maybe if you really like this kind of game." A rating in the 50s is pretty much a no recommendation but for a game that is just bad not truly horrible. When a real piece of shit comes along, it will get things in the 30s or 20s (maybe lower).
A "grade style" rating system does make some sense, in particular since we are not rating in terms of averages. I don't think anyone gives a shit if the game is "average" or not, they care if it is good. The "average" game could be good or bad, that really isn't relevant. What is relevant is whether you want to play a specific game.
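The school scale listed in this comment translates directly into a lookup. A minimal sketch, using exactly the cutoffs given above (90, 80, 70, 60) and treating everything below 60 as an F:

```python
def letter_grade(percent: float) -> str:
    """Convert a 0-100 score to a letter using the US school cutoffs listed above."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "F"


for score in (90, 78, 67, 55, 10):
    print(score, letter_grade(score))
```

Read this way, the 67 that the summary says Metacritic assigns to a B- would come back as a D, which is one way to see why that conversion strikes people as too harsh.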
Re:Solve it with Machine Learning by bWareiWare.co.uk · 2012-07-17 21:18 · Score: 4, Interesting

Your description is not so much machine learning as basic math. If you just want each scoring system to have equal weight on the results then compensating for the variation is trivial.
Where machine learning would come in is to find underlying patterns in the data.
This could be used to weed out reviewers who lazily copy scores or are subject to influence.
It would also allow them to test your scores for some games against the population of reviewers to find reviewers with similar tastes.
You could also use clustering algorithms to find niche games which got a few really strong scores but whose average was really pulled down because they don't have wide appeal.
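As a much simpler stand-in for the clustering idea above (this is a threshold heuristic, not a clustering algorithm), the sketch below flags games that drew at least a couple of very strong scores while their average stayed low. The thresholds, game names and scores are all assumptions chosen for illustration.

```python
from statistics import mean


def is_niche(scores, strong=85, min_strong=2, max_mean=65):
    """Flag a game with several very high scores but a low overall average."""
    strong_count = sum(1 for s in scores if s >= strong)
    return strong_count >= min_strong and mean(scores) < max_mean


# Hypothetical score lists on a 0-100 scale.
games = {
    "broad_hit": [82, 85, 80, 84, 83],
    "niche_gem": [95, 92, 40, 45, 35, 50],
    "plain_dud": [35, 40, 30, 45],
}

niche = [name for name, scores in games.items() if is_niche(scores)]
print(niche)
```

A real clustering approach would group reviewers by taste first and then look for games loved by one group; the heuristic here only shows what kind of score pattern such a method would be looking for.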
Re:Is that so? by SpooForBrains · 2012-07-17 22:15 · Score: 5, Interesting

I work for a review platform. We have decided that you only really need four ratings, Bad, Poor, Good, Excellent. We don't have a neutral option because really neutral tends to mean bad.
Of course, quite a lot of our users (and our marketing department) seem to prefer stars. Because an arbitrary scale is so much more useful than simply saying what you think of something. Apparently.

--
"The dew has clearly fallen with a particularly sickening thud this morning"
Rottentomatoes by cheesecake23 · 2012-07-17 22:35 · Score: 4, Interesting

By that logic, Rottentomatoes (which averages reviews using only a binary fresh/rotten scale) should be utterly useless. Except it isn't. It's IMHO the most dependable rating site on the net.
It seems the magic lies not in the rating resolution, but in the quality and size of the reviewer pool (100+ for Rottentomatoes). In other words, make the law of averages work for you.
1. Re:Rottentomatoes by Dorkmaster Flek · 2012-07-18 01:31 · Score: 3, Interesting
  
  Rotten Tomatoes uses a different system though. In fact, I really like their system. They look at a review and decide ultimately whether the critic enjoyed the movie enough to recommend it or not. It's like Siskel & Ebert's thumbs up or down system; fresh or rotten. The only factor is whether they enjoyed the movie or not. There's none of this trying to take a letter grade and turn it into a number from 1-100 bullshit. The Rotten Tomatoes rating is simply a percentage of the number of critics who liked the film enough to recommend it out of the total number of reviews, which I find much more useful. It's still no substitute for the most reliable method, which somebody else above mentioned: find a reviewer whose taste agrees with you on past films/games/whatever and see what they say about new ones. Rotten Tomatoes takes less time though.
  
  --
  I like to think of online DRM as something akin to a college -- you pay for lessons until you learn something.
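The percentage described in this comment is simple to compute once each review has been reduced to a recommend or don't-recommend verdict. A minimal sketch with made-up verdicts; how a site decides each verdict is not covered here.

```python
def percent_fresh(verdicts):
    """Share of critics who recommended the title, as a whole percentage."""
    if not verdicts:
        raise ValueError("need at least one review")
    return round(100 * sum(verdicts) / len(verdicts))


# Hypothetical verdicts: True means the critic recommended it.
verdicts = [True, True, False, True, True, False, True, True]
print(percent_fresh(verdicts))  # 6 of 8 critics recommend, so 75
```

Because each review contributes only a yes or a no, the differing scales and averages discussed elsewhere in the thread never enter the calculation.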