Chess Ratings — Move Over Elo

← Back to Stories (view on slashdot.org)

Chess Ratings — Move Over Elo

Posted by timothy on Wednesday August 4, 2010 @09:02AM from the checkmate-and-perhaps-match dept.

databuff writes "Less than 24 hours ago, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo rating system. The competition requires entrants to build their rating systems based on the results of more than 65,000 historical chess games. Entrants then test their algorithms by predicting the results of another 7,809 games. Already three teams have managed create systems that make more accurate predictions than the official Elo approach. It's not a surprise that Elo has been outdone — after all, the system was invented half a century ago before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been bettered so quickly!"

3 of 133 comments (clear)

Min score:

Reason:

Sort:

differences are minute by l2718 · 2010-08-04 09:20 · Score: 4, Interesting

Looking at the table, the differences in predictive power are small enough that it's not obvious they aren't due to chance alone; there needs to be some calculation that shows that the differences are meaningful validating the claim that the alternative methods actually extract more information than Elo does. Perhaps there is enough inherent randomness in Chess that even simple predictive models can extract most of the systematics so that what remains after Elo is mostly noise?
1. Re:differences are minute by phantomfive · 2010-08-04 17:00 · Score: 3, Interesting
  
  Mikhail Tal, one of the best players ever, would differ; because it's impossible to see deeply enough to know what the outcome of a move will be. He makes the point here, and I'll quote a small piece:
  
  Tal: - "Yes. For example, I will never forget my game with GM Vasiukov on a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.
  
  Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."
  
  Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".
  You will find that lots of chess players have reported making similarly intuitive moves.
  
  --
  Qxe4
So they've got better... by frank_adrian314159 · 2010-08-04 09:28 · Score: 4, Interesting

Are the better entries as transparent? ELO's a pretty simple way do do this - add or subtract a few points from the rating based on a win or a loss based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
Also, are the new models' improvements in prediction statistically relevant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.
Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? Fairness of tournaments having small numbers of games has a lot to do with rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).
All-in-all, there's a lot of problems coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and looking at a single scorecard based solely on predictability is nowhere near sufficient.

--
That is all.