Videogame Reviews - Playing With Numbers?
Thanks to NTSC-uk for its editorial discussing the possible confusion in using numbered rating schemes for videogame reviews. The author rhapsodizes: "No number can possibly capture the striking vision of the sun setting over Hyrule Field or the ingenious brilliance of Metal Gear Solid's interactive references to reality", before going on to conclude: "Treated as numbers with a defined value, they will always be looked down upon as having deficiencies. Yet when you read them as you would a word and open it up to your own interpretation, they begin to fully deliver the explanatory potential that is locked within." Do you think numbered ratings have an important place at the end of game reviews?
I think that number scores are important for what they are: a distillation of opinion. That's all a score is, and all it should be treated as. If you want the justification for the opinion, then that is what the review is for. The way I think about it is like this: a random art critic can say 'Van Gogh is the greatest artist, ever.', and that is like a numeric opinion. If I want to know why he thinks that, then I'll read further. If I hate Van Gogh, then I'll be curious about how his opinion could be so different from mine, and if I love him, well, to be honest I probably won't care too much to read a gushing review of the man's work.
I think that the review method in Zzap was the best I have ever seen. They would give a game a % rating for several categories and an overall rating, then if they thought it was worth buying for most people, it got a Silver medal. If you didn't like that type of game then maybe you still wouldn't like it, but the very best games got a Gold medal. Those should at least be looked at by everyone. The best bit, though, was the boxes where the other members of staff commented on the game. This meant that you got three or four different opinions on each game. That made their review method the best in my opinion.
You're absolutely right. Rotten Tomatoes recognizes the problem that you raise, explicitly, when they determine if a game got enough 'good' reviews to be rated "Fresh".
From the FAQ:
Why is the cutoff for a Fresh Tomato so much higher for individual game reviews?
Although most publishers rate games on a 1-10 scale, it is a rarity for a game to get a score below 6. Because game reviews are mostly positive (a very high majority fall in the 7-10 range), the cutoff for a Fresh Tomato is raised to 8/10. This higher cutoff actually produces a wider spread of Tomatometer scores that is equivalent to movies; otherwise, almost all games are recommended!
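A rough sketch of the effect the FAQ describes, in Python. The review scores are made up for illustration, the 6/10 comparison cutoff is an assumption (the FAQ quoted here only gives the 8/10 figure for games), and counting a review as Fresh when it is at or above the cutoff is also an assumption.

```python
def tomatometer(scores, cutoff):
    """Percentage of reviews at or above the cutoff (both on a 0-10 scale)."""
    fresh = sum(1 for s in scores if s >= cutoff)
    return 100.0 * fresh / len(scores)

# Hypothetical review scores for two games, clustered in the 7-10 range.
game_a = [7, 7, 8, 7, 9, 7, 8, 7]
game_b = [9, 8, 9, 10, 8, 9, 7, 9]

for cutoff in (6, 8):
    print(cutoff, tomatometer(game_a, cutoff), tomatometer(game_b, cutoff))
```

With the lower cutoff both games come out at 100%, so the number tells you nothing; with the 8/10 cutoff they separate.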
The problem is that setting the bar this high has become a de facto standard. Any review site or magazine that starts doing what you suggest (and you're absolutely right, they should) will stick out like a sore thumb as an outlet that routinely gives low scores. That means publishers will stop sending them review copies, which means they can't compete (especially if they're a magazine) with the other reviewers.
I have a lot of opinions about Cyborgs and Architects
A discerning gamer should never base their impression of a game (and ultimately their decision on whether to buy... or sadly, pirate) solely on these singular values, because they abstract away all the qualitative properties of a game. That said, *-star ratings and final numbers /10 or /100 or percentages are all there to give a very quick and summative value on a product.
As someone mentioned earlier, many people want a general impression of what they're about to read. Personally, I like how sites like Gamespot and Gamespy throw the rating right up front, whereas a place like Firingsquad with its insightful yet girthy reviews requires navigating through a drop-down list to check out the "final verdict." I suspect most would rather spend time reading and learning about a "4-star" game than a "1-star" one.
Of course, that leads to the perceived notion that there is some grand quantitative scale when you see something like 79 and 81 / 100. Is the 81 game really better than the 79 reviewed on another site? Ultimately it's up to the reader. It's sometimes good to have bias -- if you're a hardcore genre or platform player, you may be more inclined to accept the given idiosyncrasies (i.e. directed linear levels vs. free-roaming, checkpoint saves vs. save anywhere, etc.).
These are ordinal values at heart, and should not be compared at interval levels.
Now with respect to that article, the author makes a good point about reinforcing the qualitative, descriptive musings of the reviewer. However, it's often preferable to offer your information at different levels of abstraction to pull in a greater volume of readers. The rating/percentage is a good start. It's doubtful that many readers will engage with a lengthy game review (no matter how elegantly written) without having a hint of the final mark. Why read eight pages if it's a really crusty game? Conversely, why do that with a game that's already known from other sources to be great? Just a quick check to verify assumptions, and you're off to go get it. Game reviewers are not supposed to write elaborate and astounding essays whose effect is lost when abstracted into a single value. They are supposed to aid in (and perhaps entertain) the decision to acquire a game, and the player will ultimately decide whether or not it is of good or sufficient quality.
It's necessary to have and utilize both a summative value and a qualitative review. Relying exclusively on a single value leads to game misconceptions, while a written piece alone cannot realistically convey your information to all but the committed (or bored) readers.
-Victor Chow (Elder_MMHS)
However I think we're running into trouble now, because at many points in the past milestone games have received, with hindsight, what we can now see as slightly overinflated scores which have owed more to their groundbreaking new effects than actual gameplay. Perusing the scores of early PS2 games can easily confirm this, as the typical average score has come down (I particularly hate when games get marked down because they're only marginal improvements over predecessors). I'm sceptical as to whether games are actually going to look that much more impressive in the next generation, and contention in the industry is squeezing the poorer software houses and titles out, which should lead to an overall level of quality increase.
This will (hopefully, IMO) lead to production quality being a virtual given, and allow scores to more accurately represent how good the game is, with 5/10 being average, and worth getting for fans of the genre (like I believe how most Final Fantasy games should score!). For example, if you were a big fan of a certain band, then you'd probably buy their CD even if a selection of music magazines gave them 4/10, as you know you want to hear it anyway; however if Mario 128 came out and received a 4/10 average you'd definitely have to think carefully about getting it.
Numbers don't always suck though.
There are plenty of sites that average review scores. Others have already pointed out GameTab and Rotten Tomatoes. There are also GameStats, Gaming Chart and Game Rankings.
Game Rankings in particular is good because they include a "difference" listing for each site to compare how far their reviews are from the average of each game.
For example, you can see that the average PSX Nation review is 8.5% higher than the average.
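One way such a "difference" figure could be computed, sketched in Python. This is a guess at the idea, not Game Rankings' actual method; the site names, game names and scores are all hypothetical.

```python
from statistics import mean

# Hypothetical data: game -> {site: score as a percentage}.
scores = {
    "Game A": {"Site X": 90.0, "Site Y": 80.0, "Site Z": 70.0},
    "Game B": {"Site X": 85.0, "Site Y": 75.0, "Site Z": 80.0},
}

def site_differences(scores):
    """For each site, the average of (site score - game average)."""
    diffs = {}
    for by_site in scores.values():
        game_avg = mean(by_site.values())
        for site, pct in by_site.items():
            diffs.setdefault(site, []).append(pct - game_avg)
    return {site: mean(values) for site, values in diffs.items()}

differences = site_differences(scores)
for site, diff in differences.items():
    print(f"{site}: {diff:+.1f}")
```

A positive figure means the site tends to score above the crowd, a negative one below.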
You take every review of the game and convert the score to a percentage (4/5 stars = 80%; 9/10 = 90% etc), then average all the available scores.
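The conversion and averaging step is simple enough to sketch in a few lines of Python. The function names and the sample reviews are made up for illustration; each review is assumed to be a (score, maximum of the scale) pair.

```python
def to_percent(score, scale_max):
    """Convert a score on any scale to a percentage, e.g. 4 out of 5 -> 80.0."""
    return 100.0 * score / scale_max

def average_percent(reviews):
    """Average a list of (score, scale_max) reviews after converting each to a percentage."""
    percents = [to_percent(score, scale_max) for score, scale_max in reviews]
    return sum(percents) / len(percents)

# Hypothetical reviews of one game: 4/5 stars, 9/10, and 85/100.
reviews = [(4, 5), (9, 10), (85, 100)]
print(average_percent(reviews))
```

Note that this weights every outlet equally, whatever its scale or reputation.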
If a game can get 150 reviews and can average >95% then you know it's probably a pretty damn good title. Only about a dozen games have achieved 95% average ratings since the website started: Zelda:OOT, Halo, MGS2, GTA III, Metroid Prime are some examples and they represent some of the best titles those platforms have to offer.
Generally
>90% = excellent
>80% = good
>70% = mediocre
anything else is probably pretty poor.
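The rule of thumb above, written out as a small Python function. The function name is hypothetical, and the thresholds are taken as strict "greater than", as written.

```python
def band(average_percent):
    """Map an averaged review score (0-100) to the rough bands listed above."""
    if average_percent > 90:
        return "excellent"
    if average_percent > 80:
        return "good"
    if average_percent > 70:
        return "mediocre"
    return "probably pretty poor"

for pct in (95, 85, 75, 60):
    print(pct, band(pct))
```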
No single reviewer with a bone to pick has undue influence on the overall ranking.
The problem is not the range of the system, but the ingrained perception in just about every (don't know about other countries, but) American kid that 7/10 is average (read: C, or if you're in my school system D). Therefore, if a game is average, they'll give it a 7, when they should give it a 5. And, of course, since they only have 7, 8, 9, and 10 to work with, many game review sites went to a decimal system some time ago to give themselves more room to work with.
The reason they don't give it the 5 it deserves, in addition, is that the publishers complain about such a negative review... after all, they have this same perception that 7 is average and 5 is failing.
The problem gets even worse on a user review site such as GameFAQs, where just about anything which looks like it took more than a minute to write and has halfway decent spelling and grammar gets posted. If you look at the reviews for any halfway popular game, 97% of them will be 9s or 10s, with a couple people who didn't like the game giving it a 7, and one troll giving it something below that. I even see reviews with headlines like "This game has a few flaws" and the score 10! Yes, some of this is selection bias, but sometimes it just gets ridiculous.
The reason the review numbers are less helpful than they could be is because the scale is skewed. I think only a few games should get 10s... maybe a few dozen in all of video game history, and that's being generous. A game that gets a 7 should be worth playing, and a game that gets a 5 is "average".
This is my sig. There are many others like it, but this one is mine.
Rating with numbers or percentages is dangerous, because it seems to be a rule that all games are rated between 80% and 100%, and if any game receives a lower rating, it is automatically labelled a bad game even if the game is brilliant and the lower rating was given only for technical reasons (bugs etc.).
In gaming magazines and websites that are craptacular enough to not have any kind of set ratings policy or enforce any kind of ratings consistency, it's unlikely that the writing will be any better than the numerical designations. Most of the popular gaming outlets, specifically EGM and I believe also GameSpot, have a ratings policy and enforce some kind of consistency in the ratings, i.e. keeping a reviewer from claiming that one game is better than another and then giving it a lower score, making sure that 9s and 10s are reserved for serious overachievement, making sure that the scores match the tone of the articles in a uniform manner, ensuring that the scores don't needlessly fall into a specific range (like 6-10), etc. If this sort of basic editing isn't performed, then you might as well stop reading that outlet's reviews anyway, because they're clearly either A) a bunch of shills that are afraid to piss any of their advertisers off, or B) just amateurs.
Also, it's tempting to compare numbers between different reviews even if there isn't any common rule set between different gaming magazines for giving these ratings (so the comparing is actually pointless).
I actually agree with you on this point. It's worthwhile to check the score that one reviewer (probably one that you really agree with) gave some older games against the score that he gave to one that you're thinking about buying, but systems like GameRankings are ludicrous. Comparing the reviews from EGM, GameSpot, 1up.com, and other reputable, professional sources against GameSpy, GamePro, or IGN is like averaging out the opinions between a group of college professors and the judging panel for a wet T-shirt contest.
I don't think that's fair at all. While granted, GamePro is not exactly a bastion of gaming insight, they put out a decent product.
GamePro doesn't even take the time to perform basic proofreading. Not only is every issue filled with misspellings and grammatical errors, but in last month's issue, they actually claimed that Metal Gear Solid: The Twin Snakes was, in big bold print, "A remake of the 1988 PS1 classic." This is the sort of effort that goes into their magazine.
While I will concede that their features are usually very interesting, including the letters section and the Watchdog articles, their reviews usually skip over any details or complaints and generally offer you little more than "It rocks" or "It sucks, so I'm giving it a 3.5 out of 5."
And I was only using the terms "professional" and "amateur" in terms of how a publication carries itself. In my opinion, sites like Games Are Fun act much more professionally than GameSpy or IGN ever will.
The beauty of this is that the job of sorting out a huge variety of games (and other things) can be handed right over to a computer, which can access huge databases of this information and collect meaningful results. As of yet, computers cannot read reviews and understand what elements of a game might make it good if it were described in writing.
What good does this do? Well, it saves the prospective game buyer a lot of time and effort; he can easily pull up a game that has a general reputation for being good. Even without a computer to examine the values for him, he can find this information out at a single glance. If a game looks good, he can then delve further and take more into consideration.
As was said, you really do need to have an in-depth review if you are going to make a final decision, but a number allows a person to get to that stage quickly and painlessly without suffering the tedium of sifting through a big pile of titles whose quality he has no clue about--not even an arbitrary clue. Numbers may be more arbitrary, but they're enough of a clue to use as a jumping-off point.
There are holes in this way of working; people might potentially miss out on a game that they would actually love, even if the reviewer didn't like it that much. The pros outweigh the cons, however, and personal experience has shown me that many of the best games I have played were indeed widely reputed as good (and therefore scored as such more often). This saves me time, and as long as I can find a few good games to keep me happy I don't have to worry about the others that slip by, even if they are super fantastic.
The only thing that keeps the number system from being perfect is that different people have different ideas on what a good game is. If we all felt the same way about all of them then there would be no titles "slipping by". So, in order to make the number system more reliable, one has to be personally matched to a specific reviewer. If you can find a like-minded person who will rate games in a manner that you would, then you'll have infinitely more success than with any random critic.
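Finding a like-minded reviewer is itself something you could hand to a computer. Here is a minimal Python sketch, assuming you have scored a few games yourself on the same scale the reviewers use; the names and scores are hypothetical, and the average gap over shared games is just one simple measure of agreement among many.

```python
def mean_abs_gap(mine, theirs):
    """Average absolute score difference over games both parties have rated."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return None  # nothing in common, so no basis for comparison
    return sum(abs(mine[g] - theirs[g]) for g in shared) / len(shared)

def closest_reviewer(mine, reviewers):
    """Name of the reviewer whose scores sit closest to yours, or None."""
    gaps = {name: mean_abs_gap(mine, scores) for name, scores in reviewers.items()}
    gaps = {name: gap for name, gap in gaps.items() if gap is not None}
    return min(gaps, key=gaps.get) if gaps else None

# Hypothetical scores out of 10.
mine = {"Game A": 9, "Game B": 4, "Game C": 7}
reviewers = {
    "Reviewer 1": {"Game A": 8, "Game B": 5, "Game C": 7},
    "Reviewer 2": {"Game A": 6, "Game B": 9},
}
print(closest_reviewer(mine, reviewers))
```

The more games you have in common with a reviewer, the more that gap actually means.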
We might not have the time to personally try every game in existence, but we do have time to check out games that are reputed to be good, find reviewers and scorers that tend to think the same way we do, and even try a few random games "just for the heck of it" (just in case we might find one of those elusive masterpieces). This is what people do in the real world, and when you look at the system it doesn't really leave that much to be desired; all of the bases are adequately covered. Granted, they aren't covered perfectly, but they are covered to a degree that is satisfactory for everyone.
Taking all of this into account, the number system serves as one of the bases for a larger system that has not failed me yet, and I'm probably not missing out on much because of it. If I am, it's only to a negligible degree. ;)
The thing is, people interpret numbers differently. It's a numeracy issue. Logically, 50% should indicate "average", 75% "better than most" and 90%+ "act of God". Yet people will look at a score less than 75% and have the exact reaction that you have to a 2/10 score. Games reviewers move to account for this: hence, anything worth playing receives at least 85% as its score. Which is daft: this leaves practically nothing to choose between the games in the 0-50% category.
I respect the system of UK multiformat gaming magazine Edge: games are marked out of ten, and each mark has a rough sentiment associated with it, from zero (nothing), through five (average) and seven (distinguished), up to ten (revolutionary). Only four 10s have ever been awarded (five if you count GoldenEye). I trust their reviews enough that anything with 6 or higher is at least worth reading the review for.
qntm.org