The Problem With Metacritic
Metacritic has risen to a position of prominence in the gaming community — but is it given more credit than it's due? This article delves into some of the problems with using Metacritic as a measure of quality or success. Quoting:
"The scores used to calculate the Metascore have issues before they are even averaged. Metacritic operates on a 0-100 scale. While it's simple to convert some scores into this scale (if it's necessary at all), others are not so easy. 1UP, for example, uses letter grades. The manner in which these scores should be converted into Metacritic scores is a matter of some debate; Metacritic says a B- is equal to a 67 because the grades A+ through F- have to be mapped to the full range of its scale, when in reality most people would view a B- as being more positive than a 67. This also doesn't account for the different interpretation of scores that outlets have -- some treat 7 as an average score, which I see as a problem in and of itself, while others see 5 as average. Trying to compensate for these variations is a nigh-impossible task and, lest we forget, Metacritic will assign scores to reviews that do not provide them. ... The act of simplifying reviews into a single Metascore also feeds into a misconception some hold about reviews. If you browse into the comments of a review anywhere on the web (particularly those of especially big games), you're likely to come across those criticizing the reviewer for his or her take on a game. People seem to mistake reviews for something which should be 'objective.' 'Stop giving your opinion and tell us about the game' is a notion you'll see expressed from time to time, as if it is the job of a reviewer to go down a list of items that need to be addressed -- objectively! -- and nothing else."
Personally I think anything less than 7 out of 10 isn't worth my while bothering with. That's just me and the amount of time I have. Friends of mine, however, would give a film a 5 out of 10 and say it's still decent enough to stick on one night when you want something to watch. Even if Metacritic showed exactly the score that we agreed was 'accurate', it wouldn't really matter. Aggregation of this sort is as good as doing it by eye yourself, surely?
This would be perfect for machine learning. Just analyze _all_ the scores from a certain source and calculate a most probable score with a standard deviation. Then assign a score from 0-100 accordingly. I don't know that much about machine learning so I'm sure an expert could find a way better algorithm for that.
"Trying to compensate for these variations is a nigh-impossible task" - definitely not.
It sounds, in principle, like there is a fairly simple solution. Put together a separate histogram of the scores by each reviewer. From this you can estimate what an average score really is and how many standard deviations an individual score is above or below it. The meta-score then becomes the average number of sigmas the game is above or below the various averages. If necessary this score can be sanitised to something easier to read for those less familiar with Gaussian statistics.
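For what it's worth, here is a rough sketch of that per-reviewer normalisation. The site names, the score histories, and the `sanitise` mapping (centre 50, 15 points per sigma) are all made up for illustration; nothing here is how Metacritic actually works.

```python
from statistics import mean, pstdev


def reviewer_stats(scores_by_reviewer):
    """Return {reviewer: (mean, std dev)} computed from each reviewer's score history."""
    return {
        reviewer: (mean(scores), pstdev(scores))
        for reviewer, scores in scores_by_reviewer.items()
    }


def meta_sigma(game_reviews, stats):
    """Average number of sigmas a game sits above or below each reviewer's own mean."""
    z_scores = []
    for reviewer, score in game_reviews.items():
        mu, sigma = stats[reviewer]
        if sigma == 0:
            continue  # this reviewer gives everything the same score, so it tells us nothing
        z_scores.append((score - mu) / sigma)
    return mean(z_scores) if z_scores else 0.0


def sanitise(z, centre=50.0, per_sigma=15.0):
    """Map a sigma value onto 0-100. The centre and per_sigma values are arbitrary choices."""
    return max(0.0, min(100.0, centre + per_sigma * z))


# Hypothetical score histories: one "soft" outlet and one "hard" outlet.
history = {
    "soft_site": [70, 80, 90],
    "hard_site": [30, 50, 70],
}
stats = reviewer_stats(history)

# An 80 from the soft site and a 50 from the hard site are both exactly average.
print(sanitise(meta_sigma({"soft_site": 80, "hard_site": 50}, stats)))
```

The point of the sketch is only that an 80 from a soft outlet and a 50 from a hard one can land on the same normalised value. It assumes every outlet reviews a comparable mix of games, which may well not hold.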
Except most people use more than just the numeric score to make a gaming decision. There are also both the professional reviewers and the regular gaming public, and their reviews.
This might all sound unimportant in terms of how it affects the industry, but as noted above, it's not just gamers who look at Metacritic. Publishers do, too, and in some cases they rely on these scores too heavily, as evidenced by well-publicized stories about bonuses being tied to Metacritic. Most famously, Obsidian's Chris Avellone revealed on Twitter earlier this year that the developer missed out on receiving a bonus for its work on Fallout: New Vegas, which it was not entitled to royalties on, because it failed to reach the required Metascore. 85+ was what was required to receive the bonus; the game ended up at 84.
Metacritic needs to die for the simple fact that the giant steaming pile of monkey shit that is The Elder Scrolls: Oblivion should get a 94. Stories like the above quote just add more reasons for it to disappear.
Metacritic, a scoring system that works for a lot of people, isn't perfect? News at 11. Really? So someone writes an opinion piece on it backed with opinion and this is interesting? The methods Metacritic uses seem to be fair and work, so who cares that someone doesn't think they are perfect?
I'll use the numbers as a guideline, but not as fact. Just like Wikipedia, it's a place to start. Now if the article was aimed at pointing out that the publishers put too much emphasis on the Metacritic score, then there should have been more documentation from that side of things. As it is, it's just someone standing back and saying this system sucks and suggesting that it needs to be scrapped without offering any constructive advice on how to fix it. People complain about the government too, but does that mean we should have no government and live in anarchy? Wouldn't it serve customers better to try and figure out how to fix Metacritic's problems rather than just complaining that Metacritic has too much power because people find it useful?
/* TODO: Spawn child process, interest child in technology, have child write a new sig */
For me anyway, when it comes to finding good reviews of things, I've always found a mass average entirely useless. Just because 10,000 out of 15,000 like something, it has no bearing on whether I will like it, and it quite often leads to the opposite. Instead, I check reviews of the games, movies, shows, etc. that I have already seen, and I find the reviewers that are the closest match to how I felt about things in the past. Then I check their reviews of what I haven't seen. It isn't a perfect system, but it works overall, and tends to be more accurate to my tastes than other methods that I have tried. In addition, of course, I actually read detailed reviews with explanations of why they felt that way. If you are one who is looking for a game with a deep story and the review is 9/10 saying "Great explosions, incredible action at every turn, the graphics were spectacular, the story was a little weak but that is made up for by the incredible pace of the combat", odds are it isn't a good game for someone looking for a deep plot.
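If you wanted to automate that "find the reviewers who match my past taste" step, a minimal sketch could look like the following. The reviewer names, game titles, and the choice of mean absolute difference as the distance are all assumptions for illustration.

```python
def affinity(my_ratings, reviewer_ratings):
    """Mean absolute score difference over titles both have rated (lower is closer).

    Returns None when there is no overlap to compare on.
    """
    shared = my_ratings.keys() & reviewer_ratings.keys()
    if not shared:
        return None
    return sum(abs(my_ratings[t] - reviewer_ratings[t]) for t in shared) / len(shared)


def closest_reviewers(my_ratings, all_reviewers, top_n=3):
    """Names of the reviewers whose past scores sit closest to mine."""
    scored = []
    for name, ratings in all_reviewers.items():
        distance = affinity(my_ratings, ratings)
        if distance is not None:
            scored.append((distance, name))
    scored.sort()
    return [name for _, name in scored[:top_n]]


# Hypothetical data, all on a 0-10 scale.
mine = {"Game A": 9, "Game B": 4}
reviewers = {
    "alice": {"Game A": 9, "Game B": 5, "Game C": 8},
    "bob": {"Game A": 5, "Game B": 9, "Game C": 3},
}
print(closest_reviewers(mine, reviewers, top_n=1))
```

Once you have the closest reviewers, you would look up what they said about "Game C", which is the part you haven't played. It still doesn't replace reading why they scored it that way.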
This line of thought seems faulty, but I have to admit that I feel the same way for most scores reviewing sites and magazines deal out. A score of less than 7 out of 10 is reserved for seriously failed works, and typically these works never merit a recommendation. However, if anything below 7 is not worth experiencing, you essentially only have five possible scores: bad, 7, 8, 9 and 10. The rest of the scale is simply wasted.
I always wondered if this is caused by school grades in youth influencing what people think of as acceptable ratings. For example in Finland, grade school grades go from 4 (failed) to 10 (best), where 7 ends up being an average score and anything below it is considered poor.
Of course it can be wrong, but it's a good indicator of what others think of the game.
Nothing on earth can tell you if YOU will like the game unless you play it.
I personally sometimes enjoy playing terrible games (or games with terrible reviews) and find them quite charming.
- http://www.milkme.co.uk
They shouldn't worry, Slashdot beats Metacritic hands down for subjectively erroneous (mod) scoring.
Mod me up for the hell of it.
tnx.
Who cares, it's sunny outside!
This article is a classic example of why game and movie ratings are so terrible nowadays. Since when is a 67 a terrible score? In a proper scoring system it should be at least a pass. It seems sites tend to rate even trash within a range of 60-100. A B- or a 67 is NOT a terrible score; if scoring is done correctly this should be above average but not great. What is the point of having a 0-100 scale if you are not using the range?
I'd never, ever let a metacritic score determine whether or not I buy a game. That's not to say I don't let reviews (and review scores) influence a purchase, but I find metacritic useless.
When I check reviews of a game to work out whether I want to buy it, I'll look at the score, but it's only one small factor. What I'm actually looking for are certain factors that might be picked up in a review that will be highly likely to influence whether or not I like a game.
For example, I hate - and I do mean absolutely hate - being forced to replay long sections of a game after a death. If an overall positive review criticises a game for poor checkpointing, then I know that the game is highly likely to annoy me and I'm correspondingly less likely to buy it.
Then at other times, there are factors that drive a review score down, but don't bother me at all. A good recent example here is Lollipop Chainsaw. This one had a real spread of review scores, from 9/10 right down to about 4/10. It's clear that some of the reviewers didn't buy into the theme of the game. Others were disappointed that it wasn't (contrary to appearances) a button mashing hack-and-slash. I'd already worked out that the combat was basically a more Arkham Asylum-style methodical brawler and was quite happy with that. And the plot and setting struck me as hilarious. So I bought the game and loved it.
There's also the fact that not all sites mark to the same scale. IGN tends to be fairly "soft" in its marking - review scores tend to cluster around the 7.5+ range. But that's fine, because the review text still tends to pick up the major issues. Eurogamer tend to mark hard on most games and I generally trust their reviews - but their reviewers do feel like fanboys for a couple of companies (Blizzard and Nintendo in particular), so I know to discount them in those cases.
Anyway, tl;dr version - the factors that affect whether an individual will like a game will vary considerably depending on the individual. Trying to capture that in a single meta-score is never going to be workable.
One reviewer might only rate highly hyped games which he expects to be good (nearly all fall in the 60-100 range) while another reviewer tries out pretty much everything he encounters to find those lone gems among less well-known indie games, etc. (let's say ranging from 20 to 95). We can't just take a bell curve of each and say "Game A is slightly above average on the first reviewer's scale and Game B is slightly above average on the second reviewer's scale... so they're probably about equally good!". Sure, with a large number of reviewers, you can still see which games do well and which won't, but you have lost at least as much precision as you would have if you hadn't taken the bell curve in the first place.
That said, I don't know if reviews are that relevant anymore. I am an active gamer but don't remember when I last read a full review... There have been two times recently when I bought newer games from series I had played years ago (Cossacks and Anno 1602). I just wanted to take a quick peek at whether the games were considered about equally good, better or worse than the ones I had liked, and whether they were very similar with just better graphics etc. or if some major concept had changed. That consisted mostly of looking the games up on Wikipedia and quickly glancing at the first reviews I found using Google. I think I also checked the metascore, but it was more along the lines of "I'll buy it unless it turns out to have a metascore under 60 or something". I didn't use that as an exact metric.
Most games I buy are ones recommended to me by my friends, those recommended by blogs I follow (e.g., the Penny Arcade guys' news feed... you could consider those reviews, but they don't mention the games they hated, don't give scores, etc., just mention "Hey, that was pretty good. Try it out.") or those that just seem fun and don't cost much (When I noticed Orcs Must Die on Steam for under 5 euros, I didn't start doing extensive research on the critical acclaim of the game.)
Given the range A+ to F-, how do they get that B- equals 67? If each letter has 3 options (A+, A and A-), shouldn't B- be 72 (100/18*13)?
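To make that arithmetic concrete, here is a small sketch of the formula from the comment above (100 / number of grades * rank from the bottom). It is only a guess at a linear mapping; the thread does not say how Metacritic really builds its conversion table, and the function name is invented.

```python
def linear_grade_score(grade, letters):
    """Score a letter grade by its rank from the bottom: 100 / N * rank.

    `letters` lists the letters in use from best to worst, e.g. "ABCDEF".
    Each letter gets a minus, plain, and plus variant.
    """
    ladder = [
        letter + suffix
        for letter in reversed(letters)
        for suffix in ("-", "", "+")
    ]
    rank = ladder.index(grade) + 1  # 1 for the worst grade, N for the best
    return 100 / len(ladder) * rank


# Six letters (A through F including E): 18 grades, B- is 13th from the bottom.
print(round(linear_grade_score("B-", "ABCDEF"), 1))

# Five letters (no E): 15 grades, B- is 10th from the bottom.
print(round(linear_grade_score("B-", "ABCDF"), 1))
```

With six letters the formula gives roughly 72, as in the comment. If you assume there is no E grade, the same formula gives roughly 66.7, which rounds to 67. That may or may not be where the 67 comes from. Note that this formula never maps the lowest grade to 0, so it does not quite cover "the full range" either.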
I find it useful for that. If there's something you have little knowledge or information about, it can give you a quick breakdown of what you might expect. For example if a game has a 90 metascore, you know it is something that you should probably look into further; that is uncommonly high. If something has a 40 metascore, you can pretty much give it a miss; that is uncommonly low.
What it'll then get you for things you do want to look into further is a list of reviews. So you can see what sites have reviewed it, and then go and read the specifics if you wish. Along those lines it is a quick way to find good and bad reviews. When I'm on the fence about something I like to see what people thought was good and bad. I can then weigh for myself how much those matter to me.
Average ratings really can be of some use to filter. I just don't give enough of a shit about every game to go and read multiple full reviews on it and research it. So if it isn't a game I was already interested in, I want a sort of executive summary to decide if I should give it any more time. Metacritic helps with that.
Two recent examples:
1) Endless Space. I had never heard of this game, an indie 4X space game apparently, though rather well developed. Ok, well, ambitious indie titles can be all over the map. Metascore is 78. That tells me it is worth looking at; it is on my list and I'll look at it more in depth when I feel like playing such a game.
2) Fray, a turn-based strategy sci-fi game. Again, something I hadn't heard of, however a kind of game I like so maybe I'd be interested. Metascore of 32. So no, not wasting time on that.
For other games I won't bother with the Metascore, and just use it to find reviews. Like Orcs Must Die 2. Looking forward to that one, so I'll spend time researching it to see if I want to buy it. I liked the original enough that it'll be worth looking at reviews, no matter what the score, to see if I think I'll like the next one.
Even though Metacritic has become the standard for measuring the quality of a game, they sadly do not check the quality or sincerity of the reviewers they pick. I myself work at a smaller indie game studio. Our last project got reviews ranging from 2 to 10. That is possible due to several factors, the main one being that some reviewers didn't really review the game at all. They just scraped at the surface of it, and Metacritic then used that score. Our game wasn't perfect, neither was it crap. It is fun, addictive, beautiful, with a few bugs. But was it a 2 or a 10? Never.
I know that the larger companies in the business keep track of every journalist and blog that has been lucky enough to have been taken up at Metacritic. If the reviewer is known for giving constantly low or bad reviews they will never receive a copy for reviewing. That doesn't hinder people from buying the game at release and then reviewing it anyway, though it might stop those important first reviews from being bad I guess. Guess we have to do the same at our little studio.
What is really needed is a meta-meta-critic. A site where journalists and reviewers themselves are rated based on their seriousness. Something like the system for rating comments here at Slashdot.
In most US schools, the scale is:
A: 100-90
B: 89-80
C: 79-70
D: 69-60
F (or sometimes E): 59-0
So while you can percentage wise score anywhere from 0-100 on an assignment and on the final grade, 59% or below is failing. In terms of the grades an A means (or is supposed to mean) an excellent grasp of the material, a B a good grasp, a C an acceptable grasp, a D a below average grasp but still enough, and an F an unsatisfactory grasp.
So translate that to reviews and you get the same system. Also it can be useful to have a range of bad. Anything under 60% is bad in grade terms but looking at the percentages can tell you how bad. A 55% means you failed, but were close to passing. A 10% means you probably didn't even try.
So games could be looked at the same way. The ratings do seem to get used that way too. When you see sites hand out ratings in the 60s (or 6/10) they usually are giving it a marginal rating, like "We wouldn't really recommend this, but it isn't horrible so maybe if you really like this kind of game." A rating in the 50s is pretty much a no recommendation but for a game that is just bad not truly horrible. When a real piece of shit comes along, it will get things in the 30s or 20s (maybe lower).
A "grade style" rating system does make some sense, also in particular since we are not rating in terms of averages. I don't think anyone gives a shit if the game is "average" or not, they care if it is good. The "average" game could be good or bad, that really isn't relevant. What is relevant is do you want to play a specific game.
They should give you an option to give more weight the later a review came out. I just find myself generally distrustful of 0-day reviews because they usually mean:
a) The reviewer didn't actually spend enough time with the game to give it a meaningful review and/or
b) the reviewer had access to the game early, which of course raises questions about objectivity...
The best reviews IMO are those that come out at least a week after the game's release...
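A sketch of what such an option might compute, assuming each review carries its publication delay in days. The one-week ramp and the 0.25 floor for day-0 reviews are numbers I picked for illustration, not anything Metacritic offers.

```python
def time_weighted_score(reviews, ramp_days=7, min_weight=0.25):
    """Weighted average that trusts later reviews more.

    `reviews` is a list of (score, days_after_release) pairs. A review's weight
    rises linearly from `min_weight` on release day to 1.0 at `ramp_days`.
    Pre-release reviews (negative days) are treated like day-0 reviews.
    """
    total = 0.0
    weight_sum = 0.0
    for score, days in reviews:
        progress = min(max(days, 0), ramp_days) / ramp_days
        weight = min_weight + (1.0 - min_weight) * progress
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no reviews to average")
    return total / weight_sum


# Hypothetical: a glowing day-0 review and a cooler one a week later.
print(time_weighted_score([(90, 0), (70, 7)]))
```

A plain average of those two reviews would be 80; the weighted version lands closer to the later review. Whether a linear ramp is the right shape is anyone's guess.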
Monstar L
And that is how maths for students works :-)
bash$
Diablo 3 on Metacritic is the 2nd highest rated current game.
Don't take averages for truth, they're just averages. Use Metacritic as a source of reviews, find the reviewers (people) who you have the most affinity with over time, and then focus on what their own scores are.
Metacritic is good for avoiding games that are complete crap. Other than that, you really have to read some of the reviews to decide which game you will like more.
I just played The Last Story (metascore 82) after buying it instead of Xenoblade Chronicles (92). After reading some reviews I was sure that I would prefer The Last Story even though Xenoblade has a greater score and the games are in the same genre. Xenoblade is usually praised e.g. for having lots of stuff to do, but I really wanted a game that I can end someday.
So a number does not replace really making an informed decision, duh.
There are links to all of the collated reviews. They're there so that you, the consumer, can perform your due diligence more easily. Personally, I like to scan the middling and bad reviews to get an idea of the kind of warts I should expect if I buy it. One reviewer's selling point may be a no-sale for me-- and I wouldn't know about it if I hadn't taken the opportunity to read it.
The metascore itself is handy at a glance, but using it as a final arbiter of worth and taste is a terrible idea. Just ask Obsidian.
Fortunately, all smart, discerning, handsome, virile, gamers can still get their properly graded reviews from 1UP. Phew!
If you were blocking sigs, you wouldn't have to read this.
By that logic, Rottentomatoes (which averages reviews using only a binary fresh/rotten scale) should be utterly useless. Except it isn't. It's IMHO the most dependable rating site on the net.
It seems the magic lies not in the rating resolution, but in the quality and size of the reviewer pool (100+ for Rottentomatoes). In other words, make the law of averages work for you.
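As a rough illustration of why coarse votes can still add up to something useful, here is a sketch of a fresh/rotten style aggregate. The verdicts are invented, and this is just the percentage-of-positive-reviews idea as described above, not the site's real pipeline.

```python
def fresh_percentage(verdicts):
    """Percentage of reviews that were positive. `verdicts` is a list of booleans."""
    if not verdicts:
        raise ValueError("no reviews to aggregate")
    return 100 * sum(verdicts) / len(verdicts)


# Hypothetical pool of 100 reviewers, 83 of whom liked the film.
pool = [True] * 83 + [False] * 17
print(fresh_percentage(pool))
```

Each individual vote carries only one bit, but with 100+ reviewers the aggregate still resolves to about one percentage point.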
Yeah, that's why Rockpapershotgun have failed...oh wait.
Hmmmm...I hadn't realized this issue with Metacritic before. I now rate Metacritic's quality as 4.7 kumquats, down from 2-8 exahogsheads per quadriliter :(
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
As a ten+ year game reviewer (shameless plug: game-over.net), I see the problem from the other side. Even on a single review board, there are variations in how "hard" individual reviewers score. Over the years we have tried to implement a scoring system, giving XX/20 for graphics, XX/20 for story, that kind of thing, but found disagreements among reviewers as to the weightings. Are good graphics as important as a good plot? What about good music and sound effects? What one reviewer sees as retro, another may call dated. Artsy, bland, exciting, dull - all terribly non-quantifiable adjectives. Aggregation is fine, but over the long haul you're better off finding a single reviewer or just a few, and gauging their opinions for yourself.
The problem with metacritic is that they don't take into account the Law of Truly Large Numbers.
(I don't know what that means, but I've decided I'm going to say this in all discussions involving statistics).
You are welcome on my lawn.
If you're going to use some crazy scheme to rate games, then perhaps those websites should provide some kind of a translation value so that Metacritic can correctly identify the intention of the crazy review schemes.
But ultimately if you are getting reviews from hundreds of websites, the aggregate value should be fairly accurate. I think a game that is rated 50% is bad compared to a game rated 90%, but I don't think people really care about the perceived quality difference between two games if their ratings are like 85% and 88%.
Of course, "professional" reviewers are among the most useless professions on the planet due to the sheer amount of online public opinion that comes with every game or movie release.
I haven't thought of anything clever to put here, but then again most of you haven't either.
Larger/louder/more voices drown out smaller/quieter/fewer voices -- regardless of the authority or quality of comment. (Unlike on /., which has moderation and meta-moderation based on content.)
True story: My sister and brother-in-law left their kids with their grandmother and escaped to see a movie and relative peace for a few hours. My sister came back, really angry with her husband. "But sweetie," he said, "_2012_ got a good score on Metacritic!"
Really. Happened.
Oh wait... they are (directors/actors/game reviewers were the beta; I think they stopped when they realized they were directly affecting people's ability to do their job).
I can't wait until I walk into an interview and get told my metacritic score isn't high enough to be chosen for a job.
Watch this http://www.escapistmagazine.com/videos/view/jimquisition/3607-Metacritic-Isnt-the-Problem
I think the value in metacritic isn't the "score" but the variation across all reviews. You could have two titles with identical "80" scores, which would otherwise indicate both titles are equally well liked.
That being said, one title could have all of its reviews be between 70 and 90, while the other could have a lot of low scores and a lot of high scores. The high variation in scores tells you that there's something about that title that's amiss.
It would be interesting to see statistics compiled for reviewers, too. Do some reviewers always deviate above the average? Below? I would think a reviewer with a higher variability of ratings would be more trustworthy than one who was consistent with their reviews.
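A quick sketch of both ideas: the spread of scores for one title, and how far a given reviewer tends to sit from the consensus. All the scores and names below are invented for illustration.

```python
from statistics import mean, pstdev


def summarise(scores):
    """Mean plus spread for one title's review scores."""
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),
        "low": min(scores),
        "high": max(scores),
    }


def reviewer_bias(reviewer_scores, consensus):
    """Average signed gap between a reviewer and the consensus on shared titles.

    Positive means the reviewer tends to score above the average.
    """
    shared = reviewer_scores.keys() & consensus.keys()
    if not shared:
        return None
    return mean(reviewer_scores[t] - consensus[t] for t in shared)


# Two hypothetical titles with the same average but very different spreads.
steady = summarise([70, 80, 90, 80])
divisive = summarise([60, 100, 100, 60])
print(steady["mean"], round(steady["spread"], 1))
print(divisive["mean"], round(divisive["spread"], 1))

# A hypothetical reviewer who always lands 10 points above consensus.
print(reviewer_bias({"Game A": 90, "Game B": 70}, {"Game A": 80, "Game B": 60}))
```

Both titles average 80, but the second one splits reviewers, which is the signal the single number hides.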
The problem isn't how you average the scores, it's that they average the scores at all. If you actually read a bunch of different reviews from different reviewers, you'll learn that they all have their own proclivities and eccentricities, different things that they like or dislike, some are more likely to try to score the game against its target audience regardless of their personal preferences and some will not, things like that. All of those things are super subjective and can't really be accounted for in a mathematical model.
from sales figures to quality. I think it's a good thing even if the method isn't at all scientific.
http://www.ign.com/articles/2012/07/16/is-metacritic-ruining-the-games-industry
tl;dr - Project managers only get their bonus if they get a metascore over X.
It's not one about 100 to 0 or A to F. It's one where people are hardly objective or, rather, how people rate games.
Take a look around. Games will get a rating of 90 to 100 if they're really good, 80 to 90 if they're halfway decent and 70 to 80 if they're kinda lukewarm. Then there's a big nothing until the 0-10 bracket for the stinkers. WTF?
This doesn't make any sense at all. But that's how things run today. Every game maker presses to get a 90+ review. Even if the game doesn't really deserve it. What is a 90+, actually? Or rather, what SHOULD it represent? Now, if game reviewing was anything like grades gotten in a class, a 90+ would mean something akin to "showing performance that surpasses the ordinary or shows understanding beyond what has been taught". By that analogy, a 90+ would only be available to games that set a new standard in some way, be it gameplay, graphics or a whole new genre.
Now, how many 90+ games of 2012 can you name that actually come even close to doing that?
Instead, every halfway good game gets a 90+. So what do we do with the really stellar one-a-year hits that redefine something? They get 95+ ratings, of course. So the rating system is now kinda like 95-98 (because 99 or 100 are just simply out, because, well, once a game got that there's no way to get over it) for the games that SHOULD be 90-100, 90-95 for games that SHOULD get a 70-90 and 80-90 for games that should have been "well received but nothing to write home about", which would probably be better suited in a 50-70 rating.
But 50-70 is already viewed as a failure. Despite representing what should be deemed above average (that's what it actually is), any game that got a 54 is a dud. Not because it got a 54, but because we learned that games that get a rating below 80 are actually games that we should stay away from.
That's what's wrong here. Not the grading system itself, but the fact that the system is out of proportion.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Oversimplifying things leaves them... what's that word? Right. "Oversimplified".
Who could have guessed that?
An 80 game is still, when ranked by a bunch of people, better than a 70 game. On average.
You can do all kinds of statistical analysis on each reviewer for each platform and each genre of game and make adjustments to them, but at the end of the day you still put the pig in the sausage grinder to get a single number. And that number will not be significantly different than what's there now and the ranking of games will not change.
Compare Metacritic's movie review numbers to Rotten Tomatoes'. They both use different metrics to normalize their data, and yet they both agree on which movies stink and which are great.
Reviews are subjective things to begin with. Any aggregate is just intended to be a loose heuristic, not some auditable fact.
I've found Metacritic's scores to be pretty good when it's pulling from a large number of reviews. It's further enhanced by the inclusion of a separate user score. Score inflation is a problem with nearly all reviews so it isn't like Metacritic is really suffering from any inconsistency. I think at this point your average person is well aware of that and assesses scores accordingly.
My problem isn't so much that most scores float above 75 except when they're exceptionally bad. My problem is with blatant inflation and herd mentality, a problem that is especially prevalent in gaming; movie critics seem to generally be more demanding. Game critics, however, while generally less forgiving with small time and indie developers, are far too generous with anything from the big publishers. Grand Theft Auto 4 is probably one of the most egregious examples of this. It's got a 98 average on Metacritic. While it's a good game, it's far from deserving a 98. But this is where those user scores provide better balance. Amongst the 1300+ users the game only averages a sensible 7.6.
In general I can't say I've disagreed too much with Metacritic's scores. But then I also tend to pick out reviews across the scoring range to get a more detailed assessment.
Where's the transcript of this Jimquisition video?
X-Blades got a metascore of 50. Crappy game, right?
Well, if you are in the mood to just unwind with a mindless button-masher that features a hot girl running around in her underwear slaughtering armies of monsters, the game is awesome. It is not $50 worth of awesome, but you can get it for cheap now.
Sometimes, that low rating indicates the game is exactly what you are looking for.
That sounds like a good idea. Give me a moment and I'll patent it.
Seriously though, that idea is mostly good but I think that the passing of time might be a problem. Deus Ex was an awesome game when it came out, but if it were to come out today it wouldn't be all that good because video games have taken a huge leap forward in the last 12 years. So, should I vote it worse than nearly all new games and completely ignore the context, how much it influenced the genre, etc.? In general, it's very hard to compare older games to newer ones because even if you make a decision on how to deal with the context, your memory might not be accurate (I remember having liked many games years ago but I can't remember how good they actually were compared to new ones).
You could of course only allow people to rate games they've played recently... or you could analyze that user A tends to think that older games aren't much worse than newer ones but B thinks that they are, see how much of B's ratings can be explained through the age of the game and correct that when showing his ratings to A... but it'd still be complex.
People were complaining about this stuff years ago. Nothing beats the time they told a reviewer he didn't understand his own score system.
It should be the common understanding among the public that results from MetaCritic (or any other sites of such nature) must bring to mind Slashdot's poll caveat, i.e.,:
"This whole thing is wildly inaccurate. Rounding errors, ballot stuffers, dynamic IPs, firewalls. If you're using these numbers to do anything important, you're insane"