Fixing Steam's User Rating Charts
lars_doucet writes: Steam's new search page lets you sort by "user rating," but the algorithm they're using is broken. For instance, a DLC pack with a single positive review appears above a major game with a 74% score and 15,000+ ratings.
The current "user rating" ranking system seems to divide everything into big semantic buckets ("Overwhelmingly Positive", "Positive", "Mixed", etc.), stack those in order, then sort each bucket's contents by the total number of reviews per game. Given that Steam reviews skew massively positive (about half of games are rated "Very Positive" or higher), this is virtually indistinguishable from a standard "most popular" chart.
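The bucket-then-popularity ordering described above can be sketched roughly as follows. The bucket thresholds are hypothetical and simplified (Steam's real labels also depend on how many reviews a game has), so this illustrates the described sort order, not Steam's actual code. The `Game` type and the example numbers are likewise illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds (share of positive reviews), best bucket first.
# These are NOT Steam's published rules; they ignore review-count requirements.
BUCKETS = [
    ("Overwhelmingly Positive", 0.95),
    ("Very Positive", 0.85),
    ("Positive", 0.80),
    ("Mostly Positive", 0.70),
    ("Mixed", 0.40),
    ("Negative", 0.0),
]


@dataclass
class Game:
    name: str
    positive: int
    total: int  # assumed > 0


def bucket_index(game: Game) -> int:
    """Index of the first (best) bucket whose threshold the game meets."""
    share = game.positive / game.total
    for i, (_label, threshold) in enumerate(BUCKETS):
        if share >= threshold:
            return i
    return len(BUCKETS) - 1  # unreachable while the last threshold is 0.0


def bucket_rank(games: list[Game]) -> list[Game]:
    """Sort by bucket (best first), then by review count (most first)."""
    return sorted(games, key=lambda g: (bucket_index(g), -g.total))


if __name__ == "__main__":
    games = [
        Game("dlc", 1, 1),              # one positive review
        Game("major", 11100, 15000),    # 74% of 15,000
        Game("hit", 9000, 10000),       # 90%
        Game("gem", 480, 500),          # 96%
    ]
    print([g.name for g in bucket_rank(games)])
```

Because the bucket decides the order before review counts are even consulted, the single-review DLC lands above both the 90% game and the 74% game with 15,000 reviews.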
Luckily, there's a known solution to this problem: use statistical sampling to account for disparate numbers of user reviews. This gives less popular "hidden gems" with statistically significant high positive ratings a fighting chance against games that are already dominating the charts.
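One common way to do this (not necessarily the exact formula used in the linked post) is to rank each game by the lower end of the Wilson score confidence interval for its share of positive reviews. Few reviews mean a wide interval and therefore a low lower bound. The `z = 1.96` default (a two-sided 95% interval) and the example review counts are assumptions for illustration.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the true positive share.

    z = 1.96 corresponds to a two-sided 95% interval (an assumed choice).
    Games with no reviews get a score of 0.
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def rank_by_confidence(games: dict[str, tuple[int, int]]) -> list[str]:
    """Sort game names by Wilson lower bound, best first.

    `games` maps a name to (positive_reviews, total_reviews).
    """
    return sorted(games, key=lambda name: wilson_lower_bound(*games[name]), reverse=True)


if __name__ == "__main__":
    games = {"dlc": (1, 1), "major": (11100, 15000), "gem": (480, 500)}
    for name in rank_by_confidence(games):
        print(name, round(wilson_lower_bound(*games[name]), 3))
```

With these numbers the single-review DLC scores only about 0.21, well below the 74% game (about 0.73), while a well-liked game with 500 reviews (about 0.94) can still beat a much more popular one.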
As a developer with a title on Steam, I can say ratings are just one of many issues with the store.
I can't tell you how many support tickets I have to go through because Steam can't properly launch or update the main binaries that are covered under its DRM. Most of the time it's antivirus issues, but once in a blue moon it's Steam borkin' out. (That works out to roughly 0.25-0.5% of sales.)
In the backend, access to a specific game's information can't be limited to specific users. So I'm locked out of my real-time statistics because my publisher has other titles from other developers...
But the worst, the god-awful worst, are Early Access games that are pure shit or have been abandoned... They drag the entire system down and majorly screw over legitimate Early Access titles. IMO Steam should have a purging system for these titles, and perhaps even give coupon codes to users who bought games that have been purged.
As for the rating system, it definitely needs to be weighted. But there should also be an incentive to leave ratings (even if the review itself is left blank), and there should be at least a "maybe" option. The thumbs up/thumbs down system doesn't really do it for me, especially since I have negative ratings on my project such as "Game needs a German translation. 5/10" :\
Actually, the problem they are wrestling with here is one that science has had to deal with for a long time: the uncertainty on a measurement. The user ratings are a measurement of how well liked a game is, so what you are really asking is "given the ratings it received, which game is best?".
Unfortunately, with a finite statistical sample you always have some degree of uncertainty, and within this uncertainty your data does not provide any ranking at all: you simply do not know which game is best to any sensible degree of certainty. However, while correct, this would lead to really confusing rankings, since to be fair you would need to randomize the order of any games whose scores agree within their uncertainties. This would be complex and confusing for users!
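To make the "no ranking within the uncertainty" point concrete, here is a minimal sketch that computes a 95% Wilson score interval for each game and flags two games as tied when their intervals overlap. Interval overlap is a crude, conservative criterion chosen for illustration (not a formal test for the difference of two proportions), and the review counts are assumptions.

```python
import math


def wilson_interval(positive: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for the true share of positive reviews.

    Assumes total > 0; z = 1.96 gives roughly 95% coverage.
    """
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    denom = 1 + z2 / total
    return (centre - margin) / denom, (centre + margin) / denom


def statistically_tied(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the two (positive, total) intervals overlap.

    Overlap means this crude check cannot tell which game is better.
    """
    lo_a, hi_a = wilson_interval(*a)
    lo_b, hi_b = wilson_interval(*b)
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    print(statistically_tied((1, 1), (11100, 15000)))      # one review: can't tell
    print(statistically_tied((480, 500), (11100, 15000)))  # clearly separated
```

A single positive review gives an interval stretching from about 0.21 to 1.0, so it cannot be ordered against the 74% game at all. Randomizing within such ties is what a "fair" ranking would require.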
Instead, what they suggest is using a confidence limit: what is the lowest score that we can say, with 95% confidence, the game's true rating lies above? We do this all the time in particle physics when we put limits on some new physics which we looked for and did not see. For example, LEP, the precursor to the LHC, set a 95% confidence lower limit of 114.4 GeV/c² on the mass of the Higgs boson. There are better ways to do this than the method they quote, but since this is just a game rating and not science, it's a fine method to use.
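As a worked illustration of a one-sided 95% lower limit, here is the Wilson score lower bound with z ≈ 1.645 (since Φ(1.645) ≈ 0.95), applied to the two games from the summary. Which exact method the linked article quotes isn't stated here, and treating the major game as exactly 74% of 15,000 reviews is an assumption.

```latex
\theta_{\text{low}} =
  \frac{\hat p + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}
       {1 + \dfrac{z^2}{n}},
\qquad z \approx 1.645

% DLC pack: \hat p = 1, n = 1 (the square-root term reduces to z^2/2)
\theta_{\text{low}} = \frac{1}{1 + z^2} = \frac{1}{3.706} \approx 0.270

% Major game: \hat p = 0.74, n = 15000
\theta_{\text{low}} \approx \frac{0.74009 - 0.00589}{1.00018} \approx 0.734
```

Ranked by this limit, the 74% game with 15,000 ratings sits comfortably above the DLC pack with a single positive review.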
I've noticed one big difference between the "professional" reviews on major sites and user reviews on Steam/Amazon etc.
By and large, the professional game reviews tend to cluster their scores in the 6/10 - 8/10 range. You have to be exceptionally good to get above that level or exceptionally bad to fall below it. You also - in most cases - get relatively little variation between professional review scores. A game might be 8/10 on one site and 9/10 on another, but it is rare to see a gap larger than 2 or at most 3 points. It does happen - Alien: Isolation has had professional reviews ranging from 4/10 to 10/10 - but generally only with unusual games that go outside the usual templates (like Alien: Isolation).
User reviews, on the other hand, tend to be much more polarised. It's by no means unusual for games to pick up 10/10s from some users and 1/10s from others. Personal biases are much more likely to feature in user reviews ("I'm giving this game a 1/10 because I don't like something the developer said on twitter" or "I'm giving this game a 10/10 because I've spent the last 2 years boring everybody rigid about how good it is going to be and don't want to backtrack"). Often, the scores tend to average out in more or less the same place as the professional reviews once you have enough of both, but with much more divergence on the user reviews.
So which is more useful?
By and large - and with some important caveats - I find the professional reviews more honest and useful. A lot of people complain about the clustering of scores in the 6/10 to 8/10 range, but the nature of the modern games industry (quite risk-averse, with a lot of project oversight) means that most commercially produced games tend to fall into that range. If you assume a 6/10 is "not great, but overall more good than bad" and an 8/10 is "highly enjoyable but not ground-breaking", then you're left with a spectrum into which most major releases fit. The industry does throw out the occasional piece of brilliance - which is usually recognised. And sometimes, things go wrong and it throws out the odd turkey (Aliens: Colonial Marines being perhaps the most recent example). When those things happen, most of the big review sites do seem to reflect them.
But those caveats I mentioned before are important. The first is that, at the end of the day, the people doing the professional reviews are still human, and they still have their own biases, preconceptions and agendas. True, they have people watching them to make sure that they don't give free rein to those... but occasionally, those checks and balances fail. In fact, most of the big review sites have a few known quirks that you learn to watch for. Eurogamer, for instance (which, despite the criticism I'm about to hand out, I do generally rate highly), has a real Nintendo-nostalgia fetish and a habit of over-scoring first-party Nintendo games. At the same time, until fairly recently, it went through a phase of trying to shoehorn political correctness into its reviews and marking down a few games that committed real or perceived transgressions (though I've noticed less of this recently).
The next big caveat with professional reviews is around bugs. The big review sites are often given pre-release copies of games, so that the reviews can go live before release. Indeed, a lack of pre-release reviews is often an early sign that a game will be a turkey (again... Aliens: Colonial Marines had a review embargo until its release day). Thing is, sometimes those review copies are unfinished code. And sometimes they aren't. But regardless, there is a tendency for professional reviewers to ignore bugs, or to be instructed to ignore them, on the basis that "they'll be fixed for release". And, surprise surprise, they often aren't fixed for release. User reviews are often your first warning that a game is a buggy mess - though on PC you do have to try to separate out the inevitable complaints that pop up on every new release's forums to the effect that "it won't run on my 8 year old PC running a
Exactly. I have enjoyed some games that many considered truly terrible. "You Are Empty" springs to mind: while it wasn't anything new or revolutionary, the Russian writers created a story that was as batshit as a David Lynch movie, and it had TWENTY-FOOT-TALL MUTANT ATTACK CHICKENS... Seriously, how could I not love a game that considered those a serious enemy type?
But when I score a game like that, I make it clear it is not getting a good review because it's some amazing game; instead, it's strictly for the cheesy goodness. Take a game like "Two Worlds": sure, the gameplay is just generic hack-and-slash like every other Diablo clone... but the dialog, that dialog was fricking great! It was like a fifth grader had all these "old English" words like "pray" and "forsooth" and had NO fucking idea what they meant but said "Yeah, that sounds all cool and Hamlet and shit, let's use 'em!" And when combined with actors who sound like they come from a dinner theater in SD doing Macbeth? Priceless!
On the flip side you have GTA 4, which reviewers tripped over themselves to praise... Dafuq? That game is irritating as shit! Too damned many of the missions can only be gotten by kissing the correct ass, and that involves hauling their stupid asses around and doing stupid mini-games to make their worthless asses happy... Why in the fuck am I taking a dope dealer out for dinner like I'm trying to bang him? Just give me the damned mission and STFU! Hell, it was so bad there are fricking parody songs making fun of it, yet what did the majority of critics give it? 9s and 10s!
So give me real gamers ANY day of the week. I'd much rather hear "It's decent, just don't buy it for more than $5 because X/Y/Z makes it not worth more" than listen to the critics, who seem more and more to just kiss the industry's booty.