Fixing Steam's User Rating Charts
lars_doucet writes: Steam's new search page lets you sort by "user rating," but the algorithm they're using is broken. For instance, a DLC pack with a single positive review appears above a major game with a 74% score and 15,000+ ratings.
The current "user rating" ranking system seems to divide everything into big semantic buckets ("Overwhelmingly Positive", "Positive", "Mixed", etc.), stack those in order, then sort each bucket's contents by the total number of reviews per game. Given that Steam reviews skew massively positive (about half are "very positive" or higher), this is virtually indistinguishable from a standard "most popular" chart.
Luckily, there's a known solution to this problem — use statistical sampling to account for disparate numbers of user reviews, which gives "hidden gems" with statistically significant high positive ratings, but less popularity, a fighting chance against games that are already dominating the charts.
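To see why the bucket-then-count ordering described above behaves like a popularity chart, here is a rough sketch. The bucket cutoffs and the three sample games are made up for illustration; the summary does not say what thresholds Steam actually uses.

```python
# Sketch of the "bucket first, then review count" ordering described above.
# The cutoffs below are hypothetical, not Steam's real thresholds.

def bucket(positive, negative):
    """Map a game to a coarse bucket index from its positive fraction (0 = best)."""
    fraction = positive / (positive + negative)
    if fraction >= 0.80:
        return 0  # the "positive" family of labels
    if fraction >= 0.70:
        return 1  # "mostly positive"
    if fraction >= 0.40:
        return 2  # "mixed"
    return 3      # the "negative" family of labels


def bucket_sort(games):
    """Order games by bucket, then by total review count (largest first)."""
    return sorted(games, key=lambda g: (bucket(g[1], g[2]), -(g[1] + g[2])))


# (name, positive reviews, negative reviews); invented numbers
games = [
    ("Major game", 11100, 3900),  # 74% positive, 15,000 reviews
    ("DLC pack", 1, 0),           # a single positive review
    ("Hit game", 50000, 2000),
]

for name, pos, neg in bucket_sort(games):
    print(name, pos + neg)
```

With these numbers the DLC pack with one review lands above the 74% game with 15,000 reviews, because the bucket is decided before sample size is ever considered.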
Just a bad sort statement after the SQL
http://www.evanmiller.org/how-... This was linked before on Slashdot IIRC. This is basically a blog post saying "hey this can be used on steam too, not just amazon!"
Like all things Valve does on Valve Time, Steam is _slowly_ getting better so I'd imagine this will get fixed ... eventually.
At least we can give a thumbs up or down to games. The ability to write reviews takes advantage of the best kind of marketing:
Word of Mouth.
Rise of the Triad (2013).
Probably one of the worst games of last year, yet thumbed up by millennials who are blind to pretension, giving it "mostly positive".
Why are they suggesting skipping straight to this hot mess instead of using a simple and well tested algorithm?
http://www.files.fortressofdoors.com/images/steamratings.jpg
What mechanisms may account for a distribution like that? You need to figure that out and model it in order to tease out the relevant info for each user.
A friend of mine gets a really high steam level by going into every game and letting it sit idle on the cover screen of the game overnight.
It's really too bad the way Valve has screwed the pooch with Steam over the last few years. They literally had The gaming platform for PC all locked up. There was a time where I was desperately hoping they'd have an IPO so I could invest. But they tried to make the store so user friendly to Game controllers... a use case that may very well never become popular... that it's almost useless. Now, the only reason I think they are still relevant is because no one has bothered to try and challenge them. But I think they are 1 clever startup away from losing their position for good.
There are games on steam to this day, that I cannot find... even using Google searches with the site:steampowered.com modifier. I have to go to the damned game's external website and use their link to get to the thing I want to buy. I want to buy it and Steam's own search doesn't bring it up because their search algorithm is so broken. I try to browse games and it limits what I can browse to a few dozen. Yet, when I go back 2 days later, it's the same few dozen... why doesn't it just show me game after game until I've seen them all? There are over 4000 games on Steam!!
And you know... I know what people are going to reply to me with... "You didn't click X!" or "You moron, you have to go to the blah page!" or whatever... I'm sure it's entirely my fault for not knowing how to do it right. But let me tell you something... the biggest moron on the planet can walk into Walmart and leave with less money. That's the key to their success. You cannot enter a Walmart and avoid seeing something you need to buy today. You don't have enough money? No worries there's a god damned bank on premises to give you a loan! It's easy to find something you like, it's easy to get it to the register and its easy to get it to your car.
Why Valve? Why is it so god damned hard to give you my money?!?! I can go on the google play store and spend money with one damned finger! My 4yr old spent $20 on Angry birds slingshots before my wife locked her phone. He couldn't even figure out how to launch a game from your damned app!
... they allow devs way too much control over the store page and forums. Uber entertainment and the planetary annihilation disaster proves it. They were banning people left and right because of Uber going back on its promises of being DRM free. Then they removed the forums from the store page.
Valve is sucking developer dick at this point. The review system is mostly bullshit because steams userbase tends to be retarded and there's no checks and balances on false accounts/bots inflating "positive review" scores. Everything on the net needs to be taken with a huge grain of salt because everything is pseudo-nonymous. You don't know if that is a real review, a bot, some 8 year old kid, etc.
There's no standards. Metacritic is easily gamed by game review sites simply upping their score and upping the average, the reality is that metacritic now works on a 70-95 system, 71% garbage, 80, mediocre, 85-89 - OK - 92 Great.
Or use something like the Condorcet method to put all the games in order from most to least liked, and then assign each game a percentile ranking based on its position on the list.
Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.
Why are Rotten Tomatoes ratings taken as gospel? It's owned by a movie studio, in an industry known for buying reviews and yet people scream how wonderful Rotten Tomatoes ratings are when pure dreck gets stellar ratings.
As a developer with a title on Steam, I can say ratings are just 1 of many issues with the store.
I can't tell you how many support tickets I have to go through because steam can't properly launch or update the main binaries that are covered under their drm. Most of the time it's AV issues, but once in a blue moon it's steam borkin' out. (0.25-0.5% of sales)
In the backend specific game information can't be shown to specific users. So I'm locked out of my real time statistics because my publisher has other titles from other developers...
But the worst, the god awful worst, are games on Early Access that are pure shit or have been abandoned... They drag the entire system down and majorly screw over legitimate titles that are in Early Access. IMO Steam should have a purging system for these titles. Perhaps even give coupon codes to users who bought games that have been purged.
As for the rating system. It definitely needs to be weighted. But there should also be incentive to give a rating (even if they give a review anything) and there should be at least a "maybe" option. The thumbs up/thumbs down system doesn't really do it for me. Especially since I have negative ratings on my project such as "Game needs a German translation. 5/10" :\
If the only two choices are positive/negative (or thumbs up/thumbs down or some other equivalent 0/1 scheme), here's a formula that should work fairly well:
(n_positive + 1) / (n_positive + n_negative + 2)
So a single positive review gives you a score of .6667, and a single negative review gives you .3333. For large numbers of reviews, the score quickly converges to the actual fraction. If you don't have any reviews, you are at .5000.
The mathematical justification for this formula is that if you try to use a Bayesian approach to estimating the true probability of getting a positive review, and you start with a flat prior, this formula gives you the average of the posterior probability after observing the given number of positive and negative reviews. The full posterior distribution is a beta distribution with parameters alpha=n_positive+1 and beta=n_negative+1.
This formula is often used when applying Monte Carlo techniques to the game of go. I believe a lot of programmers simply start the counters of wins and losses at 1 to avoid corner cases (like division by 0), and they accidentally use the correct formula.
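For anyone who wants to try the formula above, a minimal sketch follows. The function name is made up here; the arithmetic is just the (n_positive + 1) / (n_positive + n_negative + 2) expression from the comment, i.e. the mean of a Beta(n_positive + 1, n_negative + 1) posterior.

```python
def smoothed_score(n_positive, n_negative):
    """Posterior mean of the positive-review probability under a flat prior."""
    return (n_positive + 1) / (n_positive + n_negative + 2)


# The cases mentioned in the comment above
print(smoothed_score(0, 0))  # no reviews
print(smoothed_score(1, 0))  # a single positive review
print(smoothed_score(0, 1))  # a single negative review

# Invented example: one positive review versus 74% positive out of 15,000
print(smoothed_score(1, 0) < smoothed_score(11100, 3900))
```

With only one review the score stays close to the prior mean of one half, so a game with a single thumbs up no longer outranks a game with thousands of mostly positive reviews.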
Actually the problem they are wrestling with here is one that science has had to deal with for a long time: the uncertainty on a measurement. The star ratings are a measure of the popularity of a game so what you are really asking is "given the ratings it received which game is best?".
Unfortunately with a finite statistical sample you always have some degree of uncertainty and, within this uncertainty your data does not provide any ranking at all: you simply do not know which game is best to any sensible degree of certainty. However while correct this would lead to really confusing rankings since to be fair you would need to randomize the order within the uncertainty of each game's score. This would be complex and confusing to users!
Instead what they suggest is using a confidence level limit: what score can I be confident that 95% of people would rate the game higher than? We do this all the time in particle physics when we put limits on some new physics which we looked for and did not see. For example the precursor to the LHC, LEP, had a result that it was 95% confident that the Higgs boson had a mass higher than 116 GeV/c^2 (IIRC). There are better ways to do this than the method they quote but since this is just a game rating and not science it's a fine method to use.
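A rough sketch of ranking by a lower confidence limit is below, using the Wilson score interval that another comment in this thread names as the method in question. The function name is made up, and z = 1.96 (the usual two-sided 95% interval) is an assumption rather than something stated in the article.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower end of the Wilson score interval for the positive fraction.

    z = 1.96 corresponds to a two-sided 95% interval (an assumption here).
    """
    if total == 0:
        return 0.0  # no reviews: nothing can be claimed with confidence
    p = positive / total
    denominator = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator


# Invented examples: one positive review versus 74% positive out of 15,000
print(wilson_lower_bound(1, 1))
print(wilson_lower_bound(11100, 15000))
```

A single positive review gives a lower bound of roughly 0.21, while 74% positive over 15,000 reviews gives roughly 0.73, so sorting on this value puts the well-reviewed game first.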
That whole article is about how TotalBiscuit didn't sell out while others did. WTF?
Read the first comment. The article forgets to mention that he had a particularly interesting time of it on Gamasutra.
What makes you think they're interested in fixing their rating system?
Don't you think they have software engineers with CS degrees working on their systems?
Seriously the nerds working there are complete idiots, they could fix their rating system if they actually wanted to.
Ok, so let me get this straight, we have an article basically rehashing a bad explanation of the Wilson score interval as a suggested fix for interpreting multi level ranking data? That is really stupid. If you are having data quality problems (saturated scores in this case) the first thing you don't do is throw away most of your data. So what's he do? Reduce each ranking to a positive vs negative ranking (1 bit).
Summary of algorithm (I think, correct me if I'm wrong here):
Reduce scores to 1 bit (0 or 1) based on threshold. Implicitly assume all ratios of 1 vs 0 scores are equally likely (Frequentist bullshit). Use lower bound of 95% confidence interval as ranking factor. Even the source's own page on a Bayesian approach calls this "a hack"! It has no statistical justification.
Sure, cited source suggests a Bayesian approach in the footnote (see previous link) but he wimps out again and still does a reduction to positive vs negative ratings! Why? I have no idea: it's not hard.
Also, "Bayesian statistics, like quantum theory, sounds completely nuts on its face". Really? I never thought assuming probability distributions for the unknowns sounded nuts... (I mean how can that even be on par with treating the state of all particles as a complex wave function? If you can't take Bayes' rule, you shouldn't even go near the Copenhagen interpretation of quantum mechanics!). I don't recommend getting your explanation of trivial topics from a guy who finds them confusing.
If you want an example of a decent and not overly complex Bayesian estimator used for rankings, you can check out anime news network. Just hit the little formula link for details (quoted below). IMDb seems to use a similar approach but is less clear about it.
This rating only includes titles that have at least 13 votes. The bayesian estimate is a statistical technique used to reduce the noise due to low sample counts. In effect, the less a title has votes, the more it is pulled towards the mean (7.5354). In other words, these are the titles that many people agree are great. (formula)
bayesian rating = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the anime
v = number of votes for the anime
m = minimum votes required to be listed (currently 13)
C = the mean vote across the whole report (currently 7.5354)
In practice, I've found their rankings to work very well. They have similar saturation problems to steam, but it works out fine.
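The quoted formula is short enough to try directly. A minimal sketch, with a made-up function name and the quoted values m = 13 and C = 7.5354 as defaults:

```python
def bayesian_rating(R, v, m=13, C=7.5354):
    """Weighted blend of a title's own average R and the global mean C.

    v is the title's vote count; m and C default to the values quoted above.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Invented examples: the same 10.0 average with different vote counts
print(bayesian_rating(10.0, 13))
print(bayesian_rating(10.0, 1300))
```

With 13 votes the title's own average and the global mean get equal weight, and as v grows the result approaches the title's own average.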
The said comment is complete and utter bullshit. When he did his Guns of Icarus thing with other people, he always disclosed it in video description. After some desperate gamasutra folks (who notably have massive vested interest in sinking Bain, NerdCubed and a couple of other big youtube gaming commentators, because they have been massively eating into their audience numbers) started whining that disclosure in video description was not enough (according to the laws, yes it is), he added a short message to the beginning of every such video where he states any potential sources of interest he may have, down to having received a review copy.
Notably, gamasutra itself does not do this, and has been central to the whole gamergate scandal which revolves exactly around this kind of acting, only with gamasutra folks not disclosing their conflicts of interest anywhere. Not in descriptions, not in topics, not in articles. Nowhere.
Frankly, Bain goes to ridiculous lengths to disclose any potential conflict of interest he has nowadays, to the point of holding 3-hour talk marathons with developers (including CEO of the company that bought the sponsored content you talk about) and games media journalists partially on this topic and then puts those on his channel:
https://www.youtube.com/watch?...
If you really care about the issue, join #gamergate and go after people who actually do this crap.
STEAM!
One of my favorite reads was the reviews for the newest Train Simulator. Bad, BAD reviews, canning the game, bemoaning its terrible qualities, and cruddy replay values...
By people who needed over 800 hours played to get bored.
I hope Valve never goes public; nothing good ever comes from companies that make art needing to answer to stockholders.
Knowing my own taste in games, if there is a huge amount of positive reviews I most likely won't like the game. Gimme turn based strategy, damnit. Every 1st person game is right out. Every Over the shoulder game is right out. Those are the types that seem to be the mass favourites nowadays, so I'm forced to mostly indie titles. Civilizations etc excluded.
They have to disclose their financial ties to the products in question:
Are there rules for Steam Curators?
Yes. We have a couple of rules that govern the use of the Steam Curator feature:
Whatever you write when making a recommendation is subject to the same rules and guidelines as other Steam User-Generated Content, as stipulated in the Rules and Guidelines For Steam: Discussions, Reviews, and User Generated Content.
A recommendation should not link to or promote any stores other than Steam.
If you’ve accepted money or other compensation for making a product review or for posting a recommendation, you must disclose this fact in your recommendation.
Yep there's one about it!
It made me not get very enthusiastic about app stores and such.
"But the worst, the god awful worst, are games on Early Access that are pure shit or have been abandoned"
So.. pretty much all of them? What steam direly needs is a "DO NOT FUCKING SHOW ME EARLY ACCESS GAMES" tickbox. I'm not going to pay for public beta. Those things used to be free. I used to play them, and give feedback, report bugs etc. I'm not going to be paying to test someone's goddamn game :D
I can see exactly one use for early access that I can agree with; Put your game on early access like two weeks before real release to test the servers with real world loads. Then you wipe everything after that two weeks. You make it clear all character development, in game money etc. is gone after that two weeks, so everyone starts again when the release is actually made. This both lets you test the servers, lets the most enthusiastic people test drive the game beforehand, and balances the downloads because the ones who got early access don't have to rush the download when the real release is made.
I've noticed one big difference between the "professional" reviews on major sites and user reviews on Steam/Amazon etc.
By and large, the professional game reviews tend to cluster their scores in the 6/10 - 8/10 range. You have to be exceptionally good to get above that level or exceptionally bad to fall below it. You also - in most cases - get relatively little variation between professional review scores. A game might be 8/10 on one site and 9/10 on another, but it is rare to see a gap larger than 2 or at most 3 points. It does happen - Alien Isolation has had professional reviews ranging from 4/10 to 10/10 - but generally only with unusual games that go outside the usual templates (like Alien Isolation).
User reviews on the other hand, tend to be much more polarised. It's by no means unusual for games to pick up 10/10s from some users and 1/10s for another. Personal biases are much more likely to feature in user reviews ("I'm giving this game a 1/10 because I don't like something the developer said on twitter" or "I'm giving this game a 10/10 because I've spent the last 2 years boring everybody rigid about how good it is going to be and don't want to backtrack"). Often, the scores tend to average out in more or less the same place as the professional reviews once you have enough of both, but with much more divergence on the user reviews.
So which is more useful?
By and large - and with some important caveats - I find the professional reviews more honest and useful. A lot of people complain about the clustering of scores in the 6/10 to 8/10 range, but the nature of the modern games industry (quite risk-averse, with a lot of project oversight) means that most commercially produced games tend to fall into that range. If you assume a 6/10 is "not great, but overall more good than bad" and an 8/10 is "highly enjoyable but not ground-breaking", then you're left with a spectrum into which most major releases fit. The industry does throw out the occasional piece of brilliance - which is usually recognised. And sometimes, things go wrong and it throws out the odd turkey (Aliens: Colonial Marines being perhaps the most recent example). When those things happen, most of the big review sites do seem to reflect them.
But those caveats I mentioned before are important. The first is that at the end of the day, the people doing the professional reviews are still human and they still have their own biases, preconceptions and agendas. True, they have people watching them to make sure that they don't give free rein to those... but occasionally, those checks and balances fail. In fact, most of the big review sites have a few known quirks that you learn to watch for. Eurogamer, for instance (which despite the criticism I'm about to hand out, I do, in general, rate highly), has a real Nintendo-nostalgia fetish and a habit of over-scoring first party Nintendo games. At the same time, until fairly recently, it went through a phase of trying to shoehorn political correctness into its reviews and marking down a few games which committed real or perceived transgressions (though I've noticed less of this recently).
The next big caveat with professional reviews is around bugs. The big review sites are often given pre-release copies of games, so that the reviews can go live before release. Indeed, a lack of pre-release reviews is often an early sign that a game will be a turkey (again... Aliens: Colonial Marines had a review embargo until its release day). Thing is, sometimes those review copies are unfinished code. And sometimes they aren't. But regardless, there is a tendency for professional reviewers to either ignore or to be instructed to ignore bugs, on the basis that "they'll be fixed for release". And, surprise surprise, they often aren't fixed for release. User reviews are often your first warning that a game is a buggy mess - though on PC you do have to try to separate out the inevitable complaints that pop up on every new release's forums to the effect that "it won't run on my 8 year old PC running a
And they should have stuck to the damn Metascore.
Valve is extremely lazy, and Steam has allowed that. They have by far the majority of digital PC game sales, and most PC sales are digital these days. So they make tons of money doing very little work. This has allowed them to do what they really like doing: Faffing about with random projects, not worrying about any kind of deliverables.
Unfortunately, lacking any real competition, Steam has no reason to really get better. The only store I would say is actually better than Steam is GOG, but they have the problem of not using DRM, which means many publishers won't release on them.
Steam isn't likely to get much better unless it has to. If some other company makes a good games store, as in one that actually hosts and deals with distribution, not just one that sells Steam keys (as many do) and it starts to eat in to Steam's market, then maybe Valve will give a shit, get a bigger dev team, hire some CS people, etc. However until then, they'll continue to just mess around.
They aren't beholden to investors, they make so much money (like $40 million PER EMPLOYEE per year) they don't have to adhere to any kind of schedule, and they are the one and only place many will go to buy games. As such, they can have it be a big shitfest and it doesn't really matter.
Personally, I go out of my way to buy any game I can on GOG, rather than Steam, if said game is available on GOG. They actually curate their store, make sure minimum quality standards are met, make sure old titles run, and don't put out unfinished games (aka Early Access). However, there's a lot I can't get there, and I'm one of the few people who do. Most gamers have this "Steam is the only place to buy, Gabe is god, all hail Valve," attitude.
It hasn't even started yet.
https://www.youtube.com/watch?...
And in case you're actually interested: here's the current status of the boycott campaign known as #gamergate:
https://gitorious.org/gamergat...
12 misogynist kids apparently have indeed moved on. The >12.000 adults pissed off at utter lack of ethics and betrayal by people they expected to stand by them and their hobby on the other hand are just getting started.
What we really need is something like Pandora's system. Something where my own ratings, as a user, are factored in. If 10,000 random users rank a game as 9/10 but I thought it was a 3/10, we obviously disagree on some things. Match me against people who review similarly to me if you want to help me find games to buy. Use the statistical algorithm as a fallback.
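One way to read the "match me against people who review similarly" idea is a simple similarity-weighted average. The sketch below is only an illustration under assumptions: 1-10 scores, a made-up agreement measure, invented users, and no claim that Pandora or Steam works this way.

```python
def similarity(a, b):
    """Agreement in [0, 1] between two users' 1-10 scores on games both rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[g] - b[g]) for g in shared) / len(shared)
    return 1.0 - mean_gap / 9.0  # 9 is the largest gap possible on a 1-10 scale


def predict(me, others, game, fallback):
    """Similarity-weighted average of other users' scores for one game.

    Falls back to a global statistic when nobody comparable has rated it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other in others:
        if game in other:
            weight = similarity(me, other)
            weighted_sum += weight * other[game]
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else fallback


# Invented users: the first agrees with me, the second mostly does not
me = {"A": 3, "B": 9}
others = [
    {"A": 3, "B": 9, "C": 2},
    {"A": 9, "B": 3, "C": 10},
]

print(predict(me, others, "C", fallback=6.0))  # pulled toward the like-minded user
print(predict(me, others, "D", fallback=6.0))  # nobody rated it: use the fallback
```

The fallback argument is where a global score, such as one of the statistical rankings discussed elsewhere in the thread, would plug in.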
If instead of talking about Steam, we were talking about iTunes Store or Google Play or XBox Live, 100% of the Steam users here would immediately start laughing about how stupid "those people" are, to be using the store to determine what to buy. That is obviously the very last intel source that you'd use. THAT WOULD BE STUPID.
But somehow, if you're a Steam user, all your common sense happens to be inapplicable, whenever we happen to be talking about Steam (and you get your common sense back whenever you talk about the iTunes store or XBox Live). You and they can each look down on each other, correctly secure that you're wiser than the other, and oblivious to the fact that you're also dumber than the other.
And you both chuckle at the guy who uses Amazon's star ratings to determine which widget to buy from Amazon. How fucking moronic is that guy? Doesn't he know how to Google for reviews? He stares back at you, being dumb in his Amazon purchases, yet shaking his head at how idiotic you two behave, when you're shopping for software.
But anyway, no, obviously of course, you wouldn't ever actually use Steam, to determine what games to buy on Steam. (Steam's rating system is totally irrelevant, because they're selling the things they're trying to rate. It's impossible for anyone to do a good job of that, unless you define the job as Fuck The Users.) To determine what to buy on Steam, you use the same method as you'd use for any other store: you go read disinterested third party reviews published on disinterested third party media, just like you expect the Amazon and Apple and Microsoft and Google customers to do, and you shake your head with sadness and despair for humanity's dim future, every time you see people doing it exactly, perfectly wrong.
But NOOOO, the one store I use, happens to also be the first store in history to have done it right and be trustworthy! Because I am SPECIAL!!! My vendors never have conflicts of interest!
"Dumb all over, yes we are, dumb all over, near and far, dumb all over, black and white. People, we is not wrapped tight." -- FZ
"Believe me!" -- Donald Trump