Fixing Steam's User Rating Charts
lars_doucet writes: Steam's new search page lets you sort by "user rating," but the algorithm they're using is broken. For instance, a DLC pack with a single positive review appears above a major game with a 74% score and 15,000+ ratings.
The current "user rating" ranking system seems to divide everything into big semantic buckets ("Overwhelmingly Positive", "Positive", "Mixed", etc.), stack those in order, then sort each bucket's contents by the total number of reviews per game. Given that Steam reviews skew massively positive (about half are "very positive" or higher), this is virtually indistinguishable from a standard "most popular" chart.
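To make the described behaviour concrete, here is a minimal Python sketch of a bucket-then-review-count sort. It is only a guess at the general shape of such a ranking: the cutoff values, the bucket labels beyond those quoted above, and the names `BUCKETS`, `bucket_index` and `bucket_rank` are all assumptions for illustration, not Steam's actual rules.

```python
# Hypothetical cutoffs: the post does not give Steam's real thresholds.
BUCKETS = [
    (0.95, "Overwhelmingly Positive"),
    (0.80, "Very Positive"),
    (0.70, "Positive"),
    (0.40, "Mixed"),
    (0.00, "Negative"),
]

def bucket_index(positive, total):
    """Return the position of the first bucket whose cutoff the game meets."""
    share = positive / total
    for i, (cutoff, _label) in enumerate(BUCKETS):
        if share >= cutoff:
            return i
    return len(BUCKETS) - 1

def bucket_rank(games):
    """Sort (name, positive, total) tuples by bucket, then by review count."""
    return sorted(games, key=lambda g: (bucket_index(g[1], g[2]), -g[2]))

# Made-up example data.
games = [
    ("Major Game", 11100, 15000),  # 74% positive
    ("Tiny DLC", 1, 1),            # one positive review
    ("Hidden Gem", 290, 300),      # about 97% positive
]
order = [name for name, _pos, _total in bucket_rank(games)]
print(order)
```

With these made-up numbers the single-review DLC lands in the top bucket and so sorts above the 74% game, which is the complaint in the summary.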
Luckily, there's a known solution to this problem — use statistical sampling to account for disparate numbers of user reviews, which gives "hidden gems" with statistically significant high positive ratings, but less popularity, a fighting chance against games that are already dominating the charts.
Like all things Valve does on Valve Time, Steam is _slowly_ getting better, so I'd imagine this will get fixed ... eventually.
At least we can give a thumbs up or down to games. The ability to write reviews takes advantage of the best kind of marketing:
Word of Mouth.
Yep there's one about it!
It made me not get very enthusiastic about app stores and such.
It's not an algorithm, except in the trivial sense. It's a formula for calculating an adjusted rating value that discounts extreme ratings for items with small numbers of reviewers.
This actually matches what you do intuitively when you see an item with a single rating of 5.0 at the top of a list, just above another item with an average rating of 4.9 from a thousand users. You mentally deduct a bit from the "top rated" item because you know it's probably too high. Likewise a 1.0 rating from a single user is probably too low, so you mentally add a bit to that.
The question is, how much to deduct from or add to the score?
The approach suggested is to ask a slightly different question. Instead of "what is the average rating of the product", you ask "what percentage of positive ratings can I be 95% certain the product would at least have if *everyone* rated it?" It turns out there are a number of mathematical formulas that are supposed to tell you precisely that.
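One such formula is the lower bound of the Wilson score interval. The sketch below is a plain Python transcription of the standard formula. The function name `wilson_lower_bound` and the default `z=1.96` (the usual two-sided 95% value; a strictly one-sided 95% bound would use roughly 1.645) are my assumptions, not something the summary specifies.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a share of positive ratings.

    z=1.96 is an assumed default (two-sided 95% confidence).
    """
    if total == 0:
        return 0.0  # no ratings: nothing can be claimed
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)

# Made-up example data: (positive, total).
single_review = wilson_lower_bound(1, 1)
big_game = wilson_lower_bound(11100, 15000)
print(round(single_review, 3), round(big_game, 3))
```

A single positive review comes out at roughly 0.21, while 11,100 positives out of 15,000 stays close to its raw 74%, so the well-reviewed game sorts first.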
There's still a lot of arbitrariness in this approach. Why 95%? I'm reasonably sure that results would be just as intuitively reasonable if we chose 80% instead. But if 95% seems to generate intuitively reasonable results there's no particular reason to monkey with that parameter.
BUT, I think, the level of arbitrariness involved probably means we could choose a simpler approximation than the Wilson interval if we could dream one up. The more familiar Wald interval taught in basic statistics courses is somewhat simpler, but not so much that it's worth worrying about, at least not if you're doing the calculation on a database server which typically has a few CPU cycles to spare.
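For comparison, a sketch of the Wald lower bound in Python (the name `wald_lower_bound`, the clamp at zero, and the `z=1.96` default are my own choices for illustration). One thing worth knowing before swapping it in: when every rating is positive, the Wald interval has zero width, so a single thumbs-up still scores a perfect 1.0.

```python
import math

def wald_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wald (normal approximation) interval, clamped at 0."""
    if total == 0:
        return 0.0
    p = positive / total
    return max(0.0, p - z * math.sqrt(p * (1 - p) / total))

# Made-up example data: (positive, total).
wald_single = wald_lower_bound(1, 1)
wald_big = wald_lower_bound(11100, 15000)
print(wald_single, round(wald_big, 3))
```

So the Wald version only helps with the single-review problem if it is paired with something like a minimum review count.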
If I were to attempt something like this on a massive scale in an environment where CPU cycles were precious, I'd probably devise some kind of simple algebraic scaling formula that tweaked scores toward the mean, depending on the number of ratings. The results wouldn't be quite as good as the Wald or Wilson intervals, but maybe not so much less good that anyone would notice.
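Something along these lines might do as the "simple algebraic scaling formula": pretend every item starts with a handful of phantom ratings at the site-wide mean. This is only a sketch in Python; `shrunk_score`, `prior_mean=0.8` and `prior_weight=20` are made-up names and values that would need tuning against real data.

```python
def shrunk_score(positive, total, prior_mean=0.8, prior_weight=20):
    """Pull the positive share toward prior_mean; the pull fades as total grows.

    prior_mean and prior_weight are hypothetical tuning values.
    """
    return (positive + prior_mean * prior_weight) / (total + prior_weight)

# Made-up example data: (positive, total).
one_review = shrunk_score(1, 1)        # ends up near the assumed mean
hidden_gem = shrunk_score(290, 300)    # stays close to its raw share
print(round(one_review, 3), round(hidden_gem, 3))
```

It costs one addition and one division per item. Note that a single-review item lands near the assumed mean rather than at the bottom, so where it ends up relative to a big 74% game depends entirely on the `prior_mean` you pick.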
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.