Chess Ratings — Move Over Elo


Posted by timothy on Wednesday August 4, 2010 @09:02AM from the checkmate-and-perhaps-match dept.

databuff writes "Less than 24 hours ago, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo rating system. The competition requires entrants to build their rating systems based on the results of more than 65,000 historical chess games. Entrants then test their algorithms by predicting the results of another 7,809 games. Already three teams have managed to create systems that make more accurate predictions than the official Elo approach. It's not a surprise that Elo has been outdone; after all, the system was invented half a century ago, before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been bettered so quickly!"


Indeed by mooingyak · 2010-08-04 09:05 · Score: 5, Funny

However, it is a big surprise that Elo has been bettered done so quickly!
Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
1. Re:Indeed by Braintrust · 2010-08-04 09:13 · Score: 3, Funny
  
  Indubitably. It filled with hope the one that no one thought Elo would have been bettered done so quickly.
  
  --
  Years later, a doctor will tell me that I have an I.Q. of 48, and am what some people call "mentally retarded".
2. Re:Indeed by Lord+Byron+II · 2010-08-04 09:13 · Score: 5, Funny
  
  Timothy is the bettered done editor of Slashdot!
3. Re:Indeed by camperdave · 2010-08-04 09:29 · Score: 5, Funny
  
  ELO hasn't done all that well since the big hair rock days of the late 1970s/early 1980s, pretty much since the drummer left to join Black Sabbath. I'm surprised at the band's connection to chess.
  
  --
  When our name is on the back of your car, we're behind you all the way!
4. Re:Indeed by Hognoxious · 2010-08-04 09:53 · Score: 2, Insightful
  
  The first time I heard Bev Bevan had joined Sabbath I kind of went "WTF?". But they're all Brummies, along with a lot of heavy metal bands around that time. Priest, Magnum ... they probably all played in pubs together when they were 15.
  Similarly you couldn't be a serious goth in the 80s unless you were from Leeds, or a flare-wearing floppy-mopped tossbag in the 90s if you weren't a Manc.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
been bettered done THAT quickly??? by boneclinkz · 2010-08-04 09:08 · Score: 5, Funny

Elo-L
umm by buddyglass · 2010-08-04 09:08 · Score: 4, Informative

However, it is a big surprise that Elo has been bettered done so quickly!
Not really. Jeff Sagarin has had two systems of rating sports teams for a while now. One, ELO_CHESS, is based purely on win-loss, while the other, PURE POINTS, takes into account margin of victory. According to him, the latter is better at predicting future results. From his analysis:

In ELO CHESS, only winning and losing matters; the score margin is of no consequence, which makes it very "politically correct". However it is less accurate in its predictions for upcoming games than is the PURE POINTS, in which the score margin is the only thing that matters. PURE POINTS is also known as PREDICTOR, BALLANTINE, RHEINGOLD, WHITE OWL and is the best single PREDICTOR of future games.
Submission error by TubeSteak · 2010-08-04 09:09 · Score: 2, Informative

Already three teams have managed create systems that make more accurate predictions than the official Elo approach.
1 EdR* 0.729125
2 whiteknight* 0.731656
3 Elo Benchmark* 0.738107 <-- The "official Elo approach"
Maybe we're counting from zero and they forgot to put it on the leaderboard?

--
[Fuck Beta]
o0t!
1. Re:Submission error by databuff · 2010-08-04 09:53 · Score: 4, Informative
  
  The Elo Benchmark was submitted a second time. I wrote to Sonas about this. Apparently the rating system has to be seeded. He tried a different approach to calculating seed ratings and this performed better - pushing him one place higher in the rankings.
2. Re:Submission error by Martian_Kyo · 2010-08-04 18:15 · Score: 2, Informative
  
  1 Elo BenchmarkOpen 0.723834
  2 EdROpen 0.729125
  3 whiteknightOpen 0.731656
  so at this moment elo is back on top.
  Could it be that people have been done some quickly jumpening to conclusions?
  I guess george is working at /. now.
Less than 24 hours ago by LearnToSpell · 2010-08-04 09:14 · Score: 5, Funny

Less than 24 hours ago, the readers of Slashdot launched a competition to find an editing algorithm that performs better than the official "editors" of the site. The competition requires entrants to build their comment systems based on the results of over 9,000 historical submissions. Entrants then test their algorithms by predicting the results of the next 7,809 dup^H^H^Hstories. Already three teams have managed to create systems that make more accurate predictions than the official /. approach. It's not a surprise that Timothy has been outdone -- after all, he was invented half a century ago before English had been standardized. However, it is no big surprise that Slashdot has been bettered done so quickly! The winner: Texas Instruments!

--
Haida Manga
In other news... by Last_Available_Usern · 2010-08-04 09:14 · Score: 2, Funny

Organized crime members linked to gambling rackets have been indicted for kidnapping a busload of nerds after they refused to program similar algorithms in exchange for Warcraft game time and photoshopped Natalie Portman porn.

We all know that's not true though. They totally would have done it.
1. Re:In other news... by easterberry · 2010-08-04 09:38 · Score: 2, Funny
  
  They bettered have done it!
differences are minute by l2718 · 2010-08-04 09:20 · Score: 4, Interesting

Looking at the table, the differences in predictive power are small enough that it's not obvious they aren't due to chance alone; there needs to be some calculation showing that the differences are meaningful, to validate the claim that the alternative methods actually extract more information than Elo does. Perhaps there is enough inherent randomness in chess that even simple predictive models can extract most of the systematics, so that what remains after Elo is mostly noise?
1. Re:differences are minute by l2718 · 2010-08-04 11:31 · Score: 2, Insightful
  
  No. Chess has no random elements to it. You play against an opponent, with a very strict set of rules.
  I don't think you understand what the discussion in this post is about. The game of chess has no element of randomness -- but the players do, and it's the players we are trying to model. Just because, on average, player A is better than player B, doesn't mean that player A will win every game. The fact is that the same player will play at different levels of ability on different days, and that is the randomness that is relevant to models trying to predict outcomes of chess games.
  Basically all rating systems are based on the assumption that players' ability for a given game fluctuates around an "average ability level" according to some distribution, and the goal of the rating system is to discover the average (and perhaps spread) of this individual distribution. So even under best conditions the most the system can do is predict the outcome with an error coming from the distribution of abilities. Now assume the distributions are relatively wide -- then there will be a large statistical error even for the best system.
  Returning to the main point, the discussion of the last paragraph has nothing to do with the fact that chess is deterministic. In fact, the fact that there is no randomness in chess makes things easier.
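A rough sketch of that argument, under loudly stated assumptions: each player's per-game performance is drawn from a normal distribution around a fixed mean, the higher performance wins, and draws are ignored. The function names, the spread of 200 and the ratings are all made up for illustration. Even a predictor that knows the true win probability ends up with a non-zero error, and the error grows with the spread.

```python
import math
import random

def win_probability(mean_a, mean_b, spread):
    """P(A outperforms B) when each performance is Normal(mean, spread)."""
    # The difference of two such normals has standard deviation spread * sqrt(2).
    return 0.5 * (1.0 + math.erf((mean_a - mean_b) / (2.0 * spread)))

def simulate_rmse(mean_a, mean_b, spread, games=20000, seed=1):
    """RMSE of the ideal predictor (true win probability) over simulated games."""
    rng = random.Random(seed)
    p = win_probability(mean_a, mean_b, spread)
    total = 0.0
    for _ in range(games):
        a_wins = rng.gauss(mean_a, spread) > rng.gauss(mean_b, spread)
        total += (p - (1.0 if a_wins else 0.0)) ** 2
    return math.sqrt(total / games)

# Wide fluctuations leave a large error even with perfect knowledge of the means.
print(simulate_rmse(2500, 2400, spread=200))
# Narrow fluctuations make the same 100-point gap almost decisive.
print(simulate_rmse(2500, 2400, spread=20))
```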
2. Re:differences are minute by shimage · 2010-08-04 11:34 · Score: 2, Informative
  
  Bullshit. Mistakes are roughly stochastic, ergo, there are random elements in chess players' performance. This is why chess matches involve more than just two games.
3. Re:differences are minute by phantomfive · 2010-08-04 17:00 · Score: 3, Interesting
  
  Mikhail Tal, one of the best players ever, would differ; because it's impossible to see deeply enough to know what the outcome of a move will be. He makes the point here, and I'll quote a small piece:
  
  Tal: - "Yes. For example, I will never forget my game with GM Vasiukov on a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.
  
  Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."
  
  Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".
  You will find that lots of chess players have reported making similarly intuitive moves.
  
  --
  Qxe4
More like commenter error by Anonymous Coward · 2010-08-04 09:20 · Score: 3, Informative

That number is "Root Mean Square Error", so lower is better
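For anyone unsure what that means, here is a minimal sketch of the metric. The contest's exact scoring procedure is not described in this thread, so the function name and the sample numbers below are made up purely for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: White's score per game (1 = win, 0.5 = draw, 0 = loss).
actual = [1.0, 0.5, 0.0, 1.0]
sharp = [0.8, 0.5, 0.2, 0.7]   # predictions that lean the right way
vague = [0.5, 0.5, 0.5, 0.5]   # predictions that never commit

# The better predictor has the lower error.
print(rmse(sharp, actual) < rmse(vague, actual))
```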
1. Re:More like commenter error by digitig · 2010-08-04 10:37 · Score: 3, Insightful
  
  Yes, and count how many of them are better than the ELO approach.
  
  --
  Quidnam Latine loqui modo coepi?
how are victory margins relevant to chess? by l2718 · 2010-08-04 09:27 · Score: 4, Insightful

Indeed, Sagarin has shown that applying Elo in sports where the winner is based on points scored is not optimal, since the average margin of victory is a better predictor of strength than won-loss record. But this has nothing to do with applying the Elo method to its original setting of chess, where the outcome of the game is only "win/draw/loss" and there is no margin of victory.
1. Re:how are victory margins relevant to chess? by thousandinone · 2010-08-04 09:44 · Score: 5, Insightful
  
  This is pretty ridiculous. Margin of victory? Is there a committee overseeing ethical treatment of chess pieces now? If I sacrifice everything but my King and a Bishop to checkmate you, why is that intrinsically a better strategy than sparing some of my pieces?
  
  There are definite merits to a sacrificial strategy- it's all about board control. Long as there's more than one or two legal moves available to your opponent, you can't really predict where he'll send his pieces. A queen in the middle of the board can cover a lot of distance and do some impressive maneuvers, but any given piece only occupies one spot. Control where your opponent moves, control the game. Not to mention that fewer pieces on the board gives you more options for where to move with your remaining pieces, and by allowing your pieces to be taken, you have a measure of control over where the free space on the board is.
  
  Indeed, given the rules of the game, I would say a strategy that goes to great lengths to preserve as many of one's own pieces as possible is flawed...
2. Re:how are victory margins relevant to chess? by databuff · 2010-08-04 09:51 · Score: 2, Informative
  
  Data only shows results - so there's no scope for gauging the margin of victory.
3. Re:how are victory margins relevant to chess? by friedo · 2010-08-04 10:04 · Score: 2, Insightful
  
  If some metric X is a statistically reliable method of predicting future success, then X can be defined as a margin of victory. Whether X is a function of the "values" of remaining pieces, or their positions on the board, or the number of moves, or whatever, is immaterial.
4. Re:how are victory margins relevant to chess? by SomeJoel · 2010-08-04 10:09 · Score: 3, Insightful
  
  Sorry, but... You can't checkmate with only a king and a bishop.
  The hell you can't. It turns out, your opponent has pieces too! Have you ever even played chess?
  
  --
  <Complete your profile by adding a signature!>
5. Re:how are victory margins relevant to chess? by phantomfive · 2010-08-04 10:19 · Score: 3, Informative
  
  You know, you're really asking for it when you take a small point that isn't even relevant to his main point and attack it. Sorry, YOU'RE WRONG!!!!!.
  
  If you ever find yourself in a game where you can sacrifice all your pieces to get to that position, DO IT!
  
  --
  Qxe4
So they've got better... by frank_adrian314159 · 2010-08-04 09:28 · Score: 4, Interesting

Are the better entries as transparent? ELO's a pretty simple way to do this - add or subtract a few points from the rating after a win or a loss, based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
Also, are the new models' improvements in prediction statistically relevant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.
Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? Fairness of tournaments having small numbers of games has a lot to do with rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).
All-in-all, there's a lot of problems coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and looking at a single scorecard based solely on predictability is nowhere near sufficient.
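For reference, a sketch of the textbook Elo update, which is simple enough to do by hand. This is not necessarily the exact variant behind the contest's Elo Benchmark, and the K-factor of 20 is only an illustrative choice (federations use various values).

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=20):
    """Return both new ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating in the pool is conserved.
    return rating_a + delta, rating_b - delta

# The drunk 2200 player loses to a 1600 "patzer".
new_strong, new_weak = update(2200, 1600, score_a=0.0)
print(round(new_strong, 1), round(new_weak, 1))
```

With these assumed numbers the upset costs the stronger player about 19 points, so one bad night does not drag him down to the other player's level.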

--
That is all.
1. Re:So they've got better... by greg1104 · 2010-08-04 10:10 · Score: 2, Interesting
  
  Developers of stock trading systems, who are also trying to rank things based on historical data, face this same persistent problem, and there's been waaay more research into it there than into chess rankings. If you train them on a bunch of historical data, you will discover the best system is invariably one that essentially does a giant curve fitting job on that exact data. One thing trading system developers do to address this is use techniques like walk forward testing, where the system gets trained on one set of data but is only evaluated on a second set.
  Luckily, this chess rating competition is using that sort of technique: "Competitors train their rating systems using a training dataset of over 65,000 recent results for 8,631 top players. Participants then use their method to predict the outcome of a further 7,809 games." In fact, the current leaderboard reflects results on only 1/10 of the training set. So long as real ranking is ultimately based on the unseen data set, not the training one, there's little risk of them fitting the noise in the training set and still winning.
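A minimal sketch of the idea of evaluating only on later, unseen games. The "month" field, the function name and the 10% holdout are hypothetical choices for illustration, not the contest's actual data layout.

```python
def chronological_split(games, holdout_fraction=0.1):
    """Split games into (training, holdout) by date, never by shuffling."""
    ordered = sorted(games, key=lambda g: g["month"])
    holdout_size = int(len(ordered) * holdout_fraction)
    cut = len(ordered) - holdout_size
    return ordered[:cut], ordered[cut:]

# Hypothetical records: only the ordering field matters for this sketch.
games = [{"month": m, "white": "A", "black": "B", "score": 0.5}
         for m in (7, 2, 9, 4, 1, 10, 3, 8, 6, 5)]
training, holdout = chronological_split(games)

# A rating system is fitted on `training` and judged only on `holdout`.
print(len(training), len(holdout))
```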
Re:Bettered Done So Quickly by Anonymous Coward · 2010-08-04 09:31 · Score: 2, Funny

Battered done --> basted
Re:Apples and oranges? by mooingyak · 2010-08-04 09:32 · Score: 2, Informative

Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?
And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Elo in non-chess games by LambdaWolf · 2010-08-04 09:45 · Score: 4, Insightful

Ah man, no matter how inadequate the Elo system may be for chess, it's much worse seeing it applied to other games where it doesn't belong, which happens regrettably often. The trouble is that the Elo system depends on the premise that nothing affects the outcome of a game other than the skill of each player (and who gets the white pieces).
In chess, that assumption is a pretty good approximation to reality, since every tournament game is run the same way. But many games do have variations in rules or format across different events, such as different maps or races in a real-time strategy game, or different card pools in Magic: The Gathering. Then Elo ratings are biased by how often a player has the chance to play to his strong areas. Players in turn are compelled to game the system: "I should avoid this event because they're using Format X and my rating will stay stronger if I stick to Format Y." The Elo system is meant precisely to obviate that kind of gamesmanship: chess players should need to think only about the strengths of their opponents, which (in principle) will be weighted fairly when calculating rating adjustments. But if there are other competitive factors, which is true for most any popular game invented in the last 30 years, Elo ratings become that much less meaningful.

--
"This algorithm runs in constant time. Come on, 2,147,483,648 is a constant..."
Re:Apples and oranges? by vlm · 2010-08-04 09:46 · Score: 2, Funny

And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?
Historical achievement, the glory of the grind. Much as my lower UID implies this comment should be more valuable than your high UID comment.

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Allow me to clarify by jamrock · 2010-08-04 09:55 · Score: 3, Funny

Three teams done bettered Elo with betterer done algorithms, and the submitter is surprised that it was bettered done so quickly. I'm done. Was that better?

He sounds like Lady Macbeth on crack.
Re:Apples and oranges? by mooingyak · 2010-08-04 10:07 · Score: 2, Interesting

Much as my lower UID implies this comment should be more valuable than your high UID comment.
I used to think of myself as having a particularly high UID... until I realized that mine is actually lower than a majority of the total UIDs. Weirded me out a little. There are UIDs that are farther from the 1,000,000 mark than I am from Taco.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Microsoft's TrueSkill beat Elo before this comp by Maarx · 2010-08-04 11:07 · Score: 3, Informative

Not to belittle what Microsoft did, but in the interest of giving credit where credit is due:

Here’s the problem with Battle.net 2.0: 2002’s Warcraft III: Reign of Chaos is one of the most underrated video games ever created. And that’s before you learn its online apparatus is the foundation for modern matchmaking, where Blizzard Entertainment should get royalties every time you brag about your X-Box Live Trueskill rating. (Then again, I shouldn’t be giving Blizzard ideas right now.)
Here’s how Warcraft III matchmaking worked: Everyone starts at level one. The maximum level is fifty. You play players within six levels of your own. Win five games, gain a level. Lose five games, lose a level. The penalty for losing is reduced during levels one to nine. Thus, players who win half their games will become level ten.
It was simple and transparent. That was the hook, and people choked on it. It turned Warcraft III ladder play into what ICCUP serves for Starcraft players, a stomping ground so competitive that climbing the food chain gave you a shot at the guys who played for a living. That’s what a good online gaming system does.
The quote comes from Battle.net 2.0: The Antithesis of Consumer Confidence. I would encourage you to read the entire thing, but for reasons completely unrelated to this thread.
Mx-doctor by martin-boundary · 2010-08-04 12:19 · Score: 2, Funny

However, it is a big surprise that Elo has been bettered done so quickly!
Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.
Is it because elo would have been bettered done so quickly that you came to me?
Elo Anecdote by afabbro · 2010-08-04 14:24 · Score: 4, Informative

Not relevant specifically to this story, but I always laugh at the story of how a prisoner manipulated the Elo system via closed pool ratings inflation.
Short summary: said prisoner only played against other prisoners, who he'd trained. Due to careful scheduling of the games, he rose from his true strength (probably sub-master) to being the second-highest rated player in the U.S. in 1996.

--
Advice: on VPS providers
Re:Elo Benchmark is #1 at this moment by daveime · 2010-08-04 16:39 · Score: 2, Funny

Pleased to say I jumped straight into the money at #7 with my first submission :-)
Where AM I going to spend a whole 50 Euros ? Maybe I'll donate it to Greece, seems like they need it.
Re:Apples and oranges? by Olivier+Galibert · 2010-08-04 19:28 · Score: 2, Informative

No we don't. This is not the crawler you're looking for.
OG.
Re:Apples and oranges? by Abstrackt · 2010-08-05 01:55 · Score: 2, Funny

Sometimes I suspect low UID users have a crawler that looks for people referencing low UIDs...
I had no idea COBOL was so powerful.

--
They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett