Slashdot Mirror


Chess Ratings — Move Over Elo

databuff writes "Less than 24 hours ago, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo rating system. The competition requires entrants to build their rating systems based on the results of more than 65,000 historical chess games. Entrants then test their algorithms by predicting the results of another 7,809 games. Already three teams have managed to create systems that make more accurate predictions than the official Elo approach. It's not a surprise that Elo has been outdone; after all, the system was invented half a century ago, before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been bettered so quickly!"

6 of 133 comments

  1. differences are minute by l2718 · · Score: 4, Interesting

    Looking at the table, the differences in predictive power are small enough that it's not obvious they aren't due to chance alone. There needs to be some calculation showing that the differences are meaningful, to validate the claim that the alternative methods actually extract more information than Elo does. Perhaps there is enough inherent randomness in chess that even simple predictive models can extract most of the systematics, so that what remains after Elo is mostly noise?
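
One hedged way to do the kind of calculation asked for here is a paired bootstrap over per-game prediction errors. The sketch below assumes you already have two lists of squared errors for the same test games, one per rating system; the function name `bootstrap_diff_interval` and its inputs are hypothetical, and nothing here uses the competition's data or its actual scoring metric.

```python
import random


def bootstrap_diff_interval(errors_a, errors_b, n_resamples=2000, seed=0):
    """Approximate 95% interval for mean(errors_a - errors_b).

    errors_a and errors_b are per-game errors of two models on the SAME
    games, in the same order (hypothetical inputs for illustration).
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)
    # Pair the errors game by game, so game difficulty cancels out.
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample games with replacement and record the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return low, high
```

If the returned interval comfortably excludes zero, the gap between the two systems is unlikely to be resampling noise; if it straddles zero, the leaderboard ordering may just be chance.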

    1. Re:differences are minute by phantomfive · · Score: 3, Interesting
      Mikhail Tal, one of the best players ever, would differ, because it's impossible to see deeply enough to know what the outcome of a move will be. He makes the point here, and I'll quote a small piece:

      Tal: - "Yes. For example, I will never forget my game with GM Vasiukov on a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.

      Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."

      Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".

      You will find that lots of chess players have reported making similarly intuitive moves.

      --
      Qxe4
  2. So they've got better... by frank_adrian314159 · · Score: 4, Interesting

    Are the better entries as transparent? Elo's a pretty simple way to do this: add or subtract a few points from the rating after a win or a loss, scaled by the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
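
For reference, the "add or subtract a few points" rule can be sketched in a few lines. This uses the commonly published logistic form of the Elo expected score; the K-factor of 20 is an illustrative assumption (federations use different values for different players), and the function names are made up for this example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both players' new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed to be 20 here) caps how far one game can move a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool is unchanged.
    return rating_a + delta, rating_b - delta
```

Under these assumptions, a 2400-rated player who loses a single game to a 1600-rated player drops by a little under 20 points, since one result can never move a rating by more than K.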

    Also, are the new models' improvements in prediction statistically significant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.

    Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? Fairness of tournaments having small numbers of games has a lot to do with rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).

    All in all, there are a lot of problems in coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and looking at a single scorecard based solely on predictability is nowhere near sufficient.

    --
    That is all.
    1. Re:So they've got better... by greg1104 · · Score: 2, Interesting

      Developers of stock trading systems, which also try to rank things based on historical data, have this same persistent problem, and there's been waaay more research into it there than in chess rankings. If you train them on a bunch of historical data, you will discover the best system is invariably one that essentially does a giant curve-fitting job on that exact data. One thing trading system developers do to address this is use techniques like walk-forward testing, where the system gets trained on one set of data but is only evaluated on a second set.

      Luckily, this chess rating competition is using that sort of technique: "Competitors train their rating systems using a training dataset of over 65,000 recent results for 8,631 top players. Participants then use their method to predict the outcome of a further 7,809 games." In fact, the current leaderboard reflects results on only 1/10 of the training set. So long as real ranking is ultimately based on the unseen data set, not the training one, there's little risk of them fitting the noise in the training set and still winning.
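
The train-on-one-set, score-on-another idea can be sketched as a single chronological split. Everything below is an assumption made for illustration: the `(month, white, black, score)` tuple layout, the plain Elo trainer with K = 20 and a 1500 starting rating, and RMSE as the error measure. It is not the competition's actual data format or scoring rule.

```python
from collections import defaultdict
from math import sqrt


def expected(r_white: float, r_black: float) -> float:
    """Logistic Elo expected score for the white player."""
    return 1.0 / (1.0 + 10.0 ** ((r_black - r_white) / 400.0))


def train(games, k=20.0, start=1500.0):
    """Fit ratings by replaying games in month order."""
    ratings = defaultdict(lambda: start)
    for _, white, black, score in sorted(games, key=lambda g: g[0]):
        delta = k * (score - expected(ratings[white], ratings[black]))
        ratings[white] += delta
        ratings[black] -= delta
    return ratings


def holdout_rmse(games, cutoff_month):
    """Train on games up to cutoff_month, then score only later games."""
    train_games = [g for g in games if g[0] <= cutoff_month]
    test_games = [g for g in games if g[0] > cutoff_month]
    if not test_games:
        raise ValueError("no games after the cutoff to evaluate on")
    ratings = train(train_games)
    # Ratings are frozen here: test results never feed back into training.
    squared = [(score - expected(ratings[w], ratings[b])) ** 2
               for _, w, b, score in test_games]
    return sqrt(sum(squared) / len(squared))
```

A model that merely memorizes the training games gets no credit in `holdout_rmse`, because the only games scored are ones it never saw.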

  3. Re:Apples and oranges? by mooingyak · · Score: 2, Interesting

    Much as my lower UID implies this comment should be more valuable than your high UID comment.

    I used to think of myself as having a particularly high UID... until I realized that mine is actually lower than a majority of the total UIDs. Weirded me out a little. There are UIDs that are farther from the 1,000,000 mark than I am from Taco.

    --
    William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
  4. Re:how are victory margins relevant to chess? by buddyglass · · Score: 1, Interesting

    If I sacrifice everything but my King and a Bishop to checkmate you, why is that intrinsically a better strategy than sparing some of my pieces?

    Winning with only a king and a bishop remaining is no "better" than winning with all your pieces remaining. A win is a win. That said, winning a game while having many more pieces remaining than one's opponent may imply that the difference between your skill and your opponent's is greater than if you won with only a king and bishop left. There may be some merit to working that into an algorithm if the goal is to predict the outcome of future matches.

    Another data point that might be valuable is simply "how many moves did the game take before checkmate"? Without any other knowledge, the guy who beats me in 10 moves is likely to be a better player than the guy who takes 50 moves to beat me.