Slashdot Mirror


Elo Chess Rating System Topped By Proposed Replacements

databuff writes "About six weeks ago, Slashdot reported a competition to find a chess rating algorithm that performed better than the official Elo rating system. The competition has just reached the halfway mark and the best entries have outperformed Elo by over 8 per cent. The leader is a Portuguese physicist, followed by an Israeli mathematician and then a pair of American computer scientists."

79 of 102 comments

  1. Sweet by Anonymous Coward · · Score: 2, Funny

    Castle this.

    1. Re:Sweet by robthebloke · · Score: 1

      i think you mean: castRelo this!

      The leader is after all a portRugese person from portRugal.....

  2. Re:what now? by cappp · · Score: 5, Interesting

    To be fair, that owning represents a difference of 0.000629 in the RMSE between the two of them - hardly the sound thrashing those snooty mathematicians rightly deserve.
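    For readers wondering what "a difference of 0.000629 in the RMSE" measures: root-mean-square error compares predicted scores with actual ones, and lower is better. Below is a minimal Python sketch. It assumes predictions and results are plain lists of scores (1 = win, 0.5 = draw, 0 = loss). The competition's own aggregation (for example, per player and month) may differ, and the sample numbers are invented.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical results for four games: 1 = win, 0.5 = draw, 0 = loss.
actual = [1.0, 0.5, 0.0, 1.0]
model_a = [0.8, 0.5, 0.2, 0.6]
model_b = [0.7, 0.5, 0.3, 0.7]
print(rmse(model_a, actual), rmse(model_b, actual))
```

    A single prediction can be off by as much as 1.0 on this scale. Against that, a gap of 0.000629 between two leaderboard entries is small, which is cappp's point.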

  3. Re:what now? by jhoegl · · Score: 3, Funny

    Yes, I agree. We should also fight amongst professions because we simply do not have enough to fight about.

    Long live Physicists and their physicisteries!

  4. Re:Interesting by JoshuaZ · · Score: 4, Informative

    This is a chess rating algorithm. The goal is to predict, given a matchup between two players with known histories, how they will likely fare in a game or series of games against each other. Elo is the standard rating system and has been for some time. These algorithms are improvements on that: they better predict who will win. They have nothing to do with playing actual chess. So the Turk is irrelevant to this discussion (aside from the not minor issue that the operator has been dead for some time.)
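    To make "predict who will win" concrete: Elo turns the rating difference into an expected score, commonly written E_A = 1 / (1 + 10^((R_B - R_A) / 400)). Below is a minimal Python sketch of that prediction step. The function name is our own, and the 400-point logistic scale is the conventional choice rather than something stated in this thread.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the common logistic Elo curve.

    Returns a value in (0, 1), counting a win as 1, a draw as 0.5 and a loss as 0.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 200-point favourite is expected to score roughly 0.76 per game.
print(round(expected_score(1800, 1600), 2))  # 0.76
```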

  5. Re:what now? by cappp · · Score: 5, Funny

    Whoah there partner, we don't want a full-scale fight between all professions - some of those guys are pretty buff. Pick off the mathematicians and physicists first because the law of the playground must be respected - the small, weak, bifocaled, or curiously gifted with numbers should be taken down first. Then nap time.

  6. Errata by Nuno Sa · · Score: 1

    "Portrugese"?
    Did you mean "Portuguese"? :-)

    1. Re:Errata by Chuck Chunder · · Score: 4, Funny

      Not any more.

      --
      Boffoonery - downloadable Comedy Benefit for Bletchley Park
    2. Re:Errata by Anonymous Coward · · Score: 2, Funny

      No, Portrugal. Between Spairn and the Atlantirc.

    3. Re:Errata by robthebloke · · Score: 1

      Not if you're from the west-country.... Reminds me of the graffiti outside my old school that simply read: "Nirvarnar"

  7. Can't be so by Waffle Iron · · Score: 5, Funny

    A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

    1. Re:Can't be so by 93 Escort Wagon · · Score: 1

      A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

      I wonder how many people on Slashdot are old enough to get this... at least 4, apparently!

      --
      #DeleteChrome
    2. Re:Can't be so by definate · · Score: 3, Funny

      I don't get it.

      REVEAL YOUR SECRETS!

      Wow, Slashdot won't allow me to post with that ratio of non-caps to caps. So I need to write all of this to correct the ratio. The error says "Filter error: Don't use so many caps. It's like YELLING.".

      Dear robotic automated moderating overlord,
      I know it's like yelling, that's the effect I was going for. Obviously your algorithm is shit, because you don't seem to understand context... or love.
      Sincerely,
      definate

      --
      This is my footer. There are many like it, but this one is mine.
    3. Re:Can't be so by halestock · · Score: 2, Funny

      I dunno, I heard the new system has an IQ of 1001, has a jumpsuit on, and is also a telephone.

    4. Re:Can't be so by LongearedBat · · Score: 1

      I want to mod you funny, but I've got no mod points. :)

      Anyway, I suspect the answer is...
      http://en.wikipedia.org/wiki/Electric_Light_Orchestra

    5. Re:Can't be so by somersault · · Score: 1

      I found it funny without realising it's a reference. In fact, finding out it's a reference to something makes it a little less funny... though not as bad as when I make a joke and people are like "what's that from?".

      --
      which is totally what she said
    6. Re:Can't be so by Paradise Pete · · Score: 2, Informative

      REVEAL YOUR SECRETS!

      His post is chock full o' snippets of ELO songs.

    7. Re:Can't be so by TheCycoONE · · Score: 1

      To be fair it's not like he took it from one source and changed a couple words around. Stringing together song and album titles:

      A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

      There's probably some I'm missing.

    8. Re:Can't be so by somersault · · Score: 1

      Ah, I guess I'm just too young and have never been exposed to ELO's music. That's definitely a worthy set of references, though the post would still be funny even if the band ELO never existed.

      --
      which is totally what she said
    9. Re:Can't be so by Dabido · · Score: 1

      Mister Blue Sky told me. What a Discovery.

      --
      Sure enough, the cow costume was hanging up next to the superhero outfit and sailors uniform. (S,Spud)
  8. Not surprising at all by IICV · · Score: 5, Insightful

    This is entirely unsurprising. The Elo system was, in a sense, designed to be easily calculable in a time before things like computers or databases or data mining were especially common (after all, it was adopted by the US Chess Federation in 1960!), and it hasn't been revised much if at all since then. Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it by a lot - this isn't like the Netflix Prize, where a bunch of teams were competing to improve something that had been in active development up until that very year.
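    The "easily calculable" part is the update. After each game, a rating moves by K times (actual score minus expected score), which is simple enough to do with pencil and paper. Below is a minimal sketch that assumes a fixed K = 32 purely for illustration. Real federations choose K based on rating, age or experience.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is A's result: 1 for a win, 0.5 for a draw, 0 for a loss.
    K = 32 is an illustrative assumption, not any federation's official value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500, 1500, 1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

    The update needs nothing but the two ratings and the result. That made it practical in 1960, and it also limits how much of a player's history the system can exploit.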

    1. Re:Not surprising at all by phantomfive · · Score: 1

      But the point of the story is to get more people interested in their contest by putting it on the front page of Slashdot. Which it probably will do.

      --
      Qxe4
    2. Re:Not surprising at all by Vintermann · · Score: 1

      There have been attempts to improve Elo over the years as well. Glicko and TrueSkill (from Microsoft Research, used on the Xbox) are the most commonly mentioned. Also, a lot of game sites have developed variants on decayed-history Elo by trial and error. The one at KGS, for instance, is pretty impressive. There's also less-known academic research, such as Remi Coulom's paper on Whole History Rating.

      Deciding which is the better chess player from what they've won in the past is also a far simpler problem than predicting someone's tastes based on what he's liked in the past. Intransitivities are probably negligible. Data mining is probably overkill.

      --
      xkcd is not in the sudoers file. This incident will be reported.
    3. Re:Not surprising at all by Jurily · · Score: 1

      Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it

      You mean data miners can predict the database they built their algorithms on? Wow!

      A true test would be to accurately predict results in the next ten years.

    4. Re:Not surprising at all by tepples · · Score: 1

      You mean data miners can predict the database they built their algorithms on?

      As far as I can tell, the principle of the test works similarly to the following: Take a database with multiple years of results, train the algorithm on all but the final year, and predict the final year. Someone who cares enough about chess skill rankings to have read the article carefully could fill in more details.

    5. Re:Not surprising at all by sjames · · Score: 1

      There's not much choice but to start there. I'm sure they are interested in seeing how it does over the next 10 years of results, but unless you have a technology I don't know about, that'll take 10 years.

    6. Re:Not surprising at all by rm999 · · Score: 1

      That's not how prediction competitions work, obviously.

      Everyone is given a "training" dataset, which contains the results. The contestants mine this dataset to determine their algorithm, which is then applied to a "test" dataset that has hidden results (i.e. who won the game). The contestants are judged by how well they do on the test set.

    7. Re:Not surprising at all by retchdog · · Score: 1

      I'm participating in the contest. The training set is 100 months; the test set is months 101-105.
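      The split retchdog describes (train on months 1-100, score on months 101-105) can be sketched as follows. The (month, white, black, score) record layout and the sample games are hypothetical, not the competition's actual file format.

```python
from typing import NamedTuple


class Game(NamedTuple):
    month: int    # 1-based month index
    white: str
    black: str
    score: float  # white's result: 1, 0.5 or 0


TRAIN_LAST_MONTH = 100          # months 1-100 are used for fitting
TEST_MONTHS = range(101, 106)   # months 101-105 are held out


def split_by_month(games):
    """Separate games into a training list and a held-out test list."""
    train = [g for g in games if g.month <= TRAIN_LAST_MONTH]
    test = [g for g in games if g.month in TEST_MONTHS]
    return train, test


games = [
    Game(1, "alice", "bob", 1.0),
    Game(100, "bob", "carol", 0.5),
    Game(101, "carol", "alice", 0.0),
    Game(105, "alice", "carol", 1.0),
]
train, test = split_by_month(games)
print(len(train), len(test))  # 2 2
```

      In the real contest, entrants never see the test-month results. The point of the sketch is only that nothing from months 101-105 feeds into fitting.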

      --
      "They were pure niggers." – Noam Chomsky
  9. Whole History Rating by Vintermann · · Score: 4, Interesting

    The French computer scientist Remi Coulom, well known for the pioneering computer go program Crazy Stone, has published some very interesting research on this issue. He claims to beat not only Elo but also Glicko, Microsoft's TrueSkill and decayed-history approaches.

    I was going to see if I could implement his ideas for the competition, since he's not going to participate himself. But it doesn't look like I have time for it.

    Here's the paper in case anyone wants to give it a try. I suspect the approach is a bit more solid than the ad-hoc approaches of the quants.

    --
    xkcd is not in the sudoers file. This incident will be reported.
    1. Re:Whole History Rating by databuff · · Score: 1

      According to the leaderboard, Glicko is being beaten by ~5 per cent. Coulom's system better be pretty good!

    2. Re:Whole History Rating by Vintermann · · Score: 3, Informative

      Glicko isn't designed to take advantage of all the information that's available in this competition. To calculate your new Glicko rating, you just need the Glicko ratings of both players + the result. I bet all serious contenders in the competition use the whole history somehow. (I talked with one who uses a decayed history scheme; he beats Glicko).

      As to the leaderboard, it's really not so clear. Almost certainly, some of the contenders are accidentally overfitting to the leaderboard test data.
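      One simple way to use more of the history than a Glicko-style update does is to weight each past game by its age. The sketch below is a hypothetical decayed-history scheme, not the one used by the contestant mentioned above. Each game's weight halves every `half_life` months. The weighted score fraction is then turned into a performance rating by inverting the logistic Elo curve against the weighted mean opponent rating, which is an approximation. The 12-month half-life and the clamping bounds are arbitrary assumptions.

```python
import math


def decayed_performance(games, current_month, half_life=12.0):
    """Approximate performance rating from (month, opponent_rating, score) tuples.

    Each game is weighted by 0.5 ** (age / half_life), so a game played
    half_life months ago counts half as much as one played this month.
    """
    total_w = weighted_score = weighted_opp = 0.0
    for month, opp_rating, score in games:
        w = 0.5 ** ((current_month - month) / half_life)
        total_w += w
        weighted_score += w * score
        weighted_opp += w * opp_rating
    if total_w == 0.0:
        raise ValueError("no games to rate")
    frac = weighted_score / total_w
    # Clamp so a perfect (or zero) score does not give an infinite rating.
    frac = min(max(frac, 0.01), 0.99)
    mean_opp = weighted_opp / total_w
    # Invert the logistic Elo curve E = 1 / (1 + 10 ** ((R_opp - R) / 400)).
    return mean_opp + 400.0 * math.log10(frac / (1.0 - frac))
```

      For example, `decayed_performance([(10, 1500, 1.0), (10, 1500, 0.5)], current_month=10)` scores 75% against 1500-rated opposition and comes out at about 1691. The half-life trades responsiveness against noise, and it is exactly the kind of knob that can end up overfitting a public leaderboard.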

      --
      xkcd is not in the sudoers file. This incident will be reported.
  10. Obvious question by glwtta · · Score: 3, Funny

    So, how did they rank the entries?

    --
    sic transit gloria mundi
  11. Re:what now? by pjt33 · · Score: 1

    How many people in a playground wear bifocals?! (Teachers don't count.)

  12. What is the punchline? by snookerhog · · Score: 1
    A Portuguese physicist, an Israeli mathematician and two American programmers walk into the bar.

    The bartender says:

    1. Re:What is the punchline? by Anonymous Coward · · Score: 1, Funny

      Elo, elo, elo, what's going on 'ere then?

      (He's a part time policeman as well)

    2. Re:What is the punchline? by Anonymous Coward · · Score: 1, Funny

      A Portuguese physicist, an Israeli mathematician and two American programmers walk into the bar.

      The bartender says:

      Sorry lads, read the sign: "Cheques are NOT accepted."

      *rim-shot*

    3. Re:What is the punchline? by JamesP · · Score: 1

      "Whoa, is this some kind of a joke?!"

      --
      how long until /. fixes commenting on Chrome?
  13. Portrugese by Anonymous Coward · · Score: 5, Funny
    True facts about Portrugese:
    1. More than 250 million peoprle spek Portruguese, making it the firfth most sproken language in the wrorld.
    2. Portrugese is an adjective describing thrings relatd to Portrugal.
    3. Christropher Colurmbus spoke Portrugese.
    4. Portrugese is the officiral langurage of ther Repulic rof Angorlra.
    5. Hery trhe Navgatror, a Portugese prirnce, was in lrge partr resposible for Portugese effortrs durirng the age of explorartion.
    1. Re:Portrugese by xtracto · · Score: 5, Funny

      Hery trhe Navgatror, a Portugese prirnce, was in lrge partr resposible for Portugese effortrs durirng the age of explorartion.

      Wait just a second! you cannot go changing the subject suddenly like that... focus!, we are talking about Portrugese here!

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    2. Re:Portrugese by Anonymous Coward · · Score: 5, Funny

      That's just a typo! Don't be such a grammar nazi!

    3. Re:Portrugese by Stihdjia · · Score: 1

      Portrugese alrso usres mrore R'sr thranr anry otrher lrangruage.

      --
      I see the fnords!
  14. Electric Light Orchestra reference....no? by JDmetro · · Score: 1

    Well, that popped into my head as soon as I saw Elo in the story headline. And I'm only 30 and 2 days, and I actually have one of their 8-tracks and a couple of CDs.

  15. and what about rock/paper/scissors by Paradigma11 · · Score: 2, Interesting

    Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.

    1. Re:and what about rock/paper/scissors by srussia · · Score: 1

      Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.

      In other words, styles make fights.

      --
      Set your phasers on "funky"!
    2. Re:and what about rock/paper/scissors by Vintermann · · Score: 1

      If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament.

      That's true, but that's not because of intransitivities in the game; it's because there's so little difference between human players. Just because there are intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.

      --
      xkcd is not in the sudoers file. This incident will be reported.
    3. Re:and what about rock/paper/scissors by Paradigma11 · · Score: 1

      If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament.

      That's true, but that's not because of intransitivities in the game; it's because there's so little difference between human players. Just because there are intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.

      That's true; I should have said intransitivities in the player-vs-player outcomes rather than in the game mechanics.

      I am certain that the SC2 story is FAR more complicated than the rankings, even though every race setup in 1v1 should be balanced in theory.

    4. Re:and what about rock/paper/scissors by comic-not · · Score: 1

      I am not quite certain that I follow. Depending on tournament type R/P/S is (mostly) a game of chance, chess isn't. The only way I can see R/P/S applying is if they represent the players themselves, not the game. In other words, instead of having individual strength ratings that can be measured in isolation, player R's style of play might be naturally stronger against player S's style than that of player P, in which case we could only express the relative strengths of any pairing. This would allow a scenario where R is stronger than S is stronger than P is stronger than R.

      However, given the pairing list of a tournament and the respective pairwise relative strengths, it is still possible to predict the outcome. As a very simple example, let's say that in the first round we have the pairs (rock_a, paper) and (rock_b, scissors). Winners of the first round (paper, rock_b) fight for gold, losers (rock_a, scissors) for bronze. Paper wins, followed by rock_b; rock_a takes bronze and leaves scissors last.

      Applied to the problem at hand, one would suspect that there exists an algorithm not far removed from Google's PageRank that can identify all the possible playstyles and their relative strengths. In that case, the simplest predictive model would contain the playstyle preference and proficiency of the players combined with style coupling constants (it is likely that a player can play several styles with varying degrees of skill, and tries to use the one that they believe gives them the best chances against what they know of their opponent). Just my hunch, I'm not really into this stuff.
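      comic-not's bracket example can be checked mechanically. This sketch assumes a hypothetical `BEATS` table in which each style deterministically beats exactly one other (rock > scissors > paper > rock), and a four-player event with a final and a bronze match. Real pairwise strengths would be probabilities, not certainties.

```python
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def style(player: str) -> str:
    """'rock_a' -> 'rock': players are named after their style."""
    return player.split("_")[0]


def winner(p: str, q: str) -> str:
    """Deterministic result of one game under the cyclic BEATS relation."""
    if BEATS[style(p)] == style(q):
        return p
    if BEATS[style(q)] == style(p):
        return q
    raise ValueError(f"{p} vs {q} is a mirror match; no prediction")


def predict_final_order(pair1, pair2):
    """First-round pairs -> predicted finishing order (1st to 4th)."""
    w1, w2 = winner(*pair1), winner(*pair2)
    l1 = pair1[1] if w1 == pair1[0] else pair1[0]
    l2 = pair2[1] if w2 == pair2[0] else pair2[0]
    gold = winner(w1, w2)             # first-round winners play the final
    silver = w2 if gold == w1 else w1
    bronze = winner(l1, l2)           # first-round losers play for bronze
    fourth = l2 if bronze == l1 else l1
    return [gold, silver, bronze, fourth]


print(predict_final_order(("rock_a", "paper"), ("rock_b", "scissors")))
# ['paper', 'rock_b', 'rock_a', 'scissors']
```

      Under these rules, rock_b reached the final and therefore takes silver, while rock_a beats scissors in the bronze match.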

      --
      Existence usually comes as a surprise (Idem)
    5. Re:and what about rock/paper/scissors by TheRaven64 · · Score: 1

      R/P/S is an interesting game for precisely this reason. A purely random player will, on average, score 50% (winning, drawing and losing a third of the time each), but no one actually plays completely randomly (even if they try to, humans are terrible random number generators). Both players are trying to model the other player's strategy: if you can predict what the other player will do in the next round, you can win it. For example, if he always follows rock with scissors, you can follow his rock with rock and win. Of course, if he realises that you are modelling his strategy in this way, then he will follow rock with paper, and so on.

      This makes it interesting because it's a highly simplified version of the way the stock market works - you make money on the stock market by working out what other people are going to do, and they are constantly trying to do the same thing to you. In both games, an algorithm describing the perfect strategy would have to be more complex than itself, and therefore cannot exist.
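      TheRaven64's "he always follows rock with scissors" example is a first-order opponent model. Count what the opponent played after each of their moves, predict the most frequent follow-up, and throw whatever beats it. Below is a minimal sketch; the class and method names are hypothetical. As the comment notes, an opponent who notices this can exploit it in turn.

```python
from collections import Counter, defaultdict

BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class FollowUpPredictor:
    """Predicts an opponent's next move from what followed their last move."""

    def __init__(self):
        # previous move -> Counter of the moves that came right after it
        self.follow_ups = defaultdict(Counter)
        self.last = None

    def observe(self, move: str) -> None:
        """Record the opponent's latest move."""
        if self.last is not None:
            self.follow_ups[self.last][move] += 1
        self.last = move

    def counter_move(self) -> str:
        """Return the move that beats the opponent's most likely next move."""
        history = self.follow_ups.get(self.last)
        if not history:
            return "rock"  # arbitrary default until there is any data
        predicted, _ = history.most_common(1)[0]
        return BEATEN_BY[predicted]


bot = FollowUpPredictor()
for move in ["rock", "scissors", "rock", "scissors", "rock"]:
    bot.observe(move)
print(bot.counter_move())  # rock: he follows rock with scissors, so play rock
```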

      For determining a ranking, it is possible that different playing styles have a non-transitive advantage. This is also true for chess, but to a lesser extent. I used to play chess at school, before I got bored with deterministic games, and I found that there were quite often cycles in the rankings. I could consistently beat some people, who could consistently beat other people who consistently beat me. The goal of this sort of ranking is to decide, based on prior matches, who will win a given game. A system which does not assume a strong ordering can potentially be more accurate, but will be significantly harder to design.
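      The cycle TheRaven64 describes (A beats B, B beats C, C beats A) is exactly what a single number per player cannot express, since any rating puts the three in a line. Below is a small sketch that finds such 3-cycles in a head-to-head record. The input is assumed to be a set of hypothetical (winner, loser) pairs meaning "consistently beats", and the player names are invented.

```python
from itertools import permutations


def three_cycles(dominates):
    """Return each 3-cycle (a, b, c) where a beats b, b beats c and c beats a.

    dominates is a set of (stronger, weaker) pairs.
    """
    players = sorted({p for pair in dominates for p in pair})
    found = []
    for a, b, c in permutations(players, 3):
        # Report each cycle once, starting from its alphabetically smallest member.
        if a == min(a, b, c) and {(a, b), (b, c), (c, a)} <= dominates:
            found.append((a, b, c))
    return found


record = {("me", "ann"), ("ann", "bob"), ("bob", "me"), ("me", "cat")}
print(three_cycles(record))  # [('ann', 'bob', 'me')]
```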

      --
      I am TheRaven on Soylent News
    6. Re:and what about rock/paper/scissors by Paradigma11 · · Score: 1

      You are of course correct; I meant the players and not the game.

  16. Ole! by poptones · · Score: 1

    Had to be said...

    1. Re:Ole! by Nighttime · · Score: 1

      With milk?

      --
      I've got a fever and the only prescription is more COBOL.
  17. Ah confusion... such a terrible shame... by kale77in · · Score: 1

    Confusion. It's such a terrible shame.
    Confusion. You don't know what you're sayin'.
    You've lost your love and you just can't carry on.
    You know there's no-one for you to lean on.
    To le-ee-an on.

    -- ELO

  18. ELO isn't just for chess by Twinbee · · Score: 1

    The Elo rating system isn't just used for chess, but for many other competitive games (including video games). Therefore, this new 'improvement' may not apply so well to other games if they've only used chess win/loss data. Sometimes the simplest formulas are the best and most general.

    Even within the Elo system, tweaks can be made, though FIDE still uses the original system for whatever reasons.

    --
    Why OpalCalc is the best Windows calc
  19. Re:Portrugese? by somersault · · Score: 1

    That's Jesus fucking Christ to you, FFSMS!

    --
    which is totally what she said
  20. Re:what now? by nusuth · · Score: 1

    Written like an engineer. To the mathematician the magnitude does not mean a thing, the ordering does.

    --
    Gentlemen, you can't fight in here, this is the War Room!

  21. Re:what now? by drewhk · · Score: 2, Informative

    Yeah, and his name is Él (lowercase o-double acute), not Elo, but I understand that "Hungarian umlauts" cause significant cognitive stress :)

    Even for Slashdot it seems...

  22. Re:what now? by mutu310 · · Score: 2, Informative

    Actually he was born Él Árpád Imre but changed his name to a more Americanized Arpad Emrick Elo.

  23. What is surprising... (Re:Not surprising at all) by dmgxmichael · · Score: 1

    Is that with the best tech (both machines and math techniques) Elo has only been bested by 8%. You'd think it would be at least in the low 20s. Whether or not Elo is retained, this is a testament to its genius.

    Incidentally, folks, chess is only the best-known user of Elo ratings. Many other competitive games use them as well.

  24. Re:what now? by sleeping143 · · Score: 3, Informative

    Careful, those physicists have arsenals of powerful lasers at their disposal...

  25. Re:More people interested by TaoPhoenix · · Score: 1

    But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  26. Emo Chess by Joebert · · Score: 1

    I could have sworn it said "emo chess". I was going to ask what the goal of the game was: to decide who gets to play black?

    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    1. Re:Emo Chess by DQKennard · · Score: 1

      Fighting over who plays black in emo chess would be far too much effort. Who really cares, anyway, about the game or its outcome. It's all just a pointless metaphor for the pointless struggle that is life. There might be, I suppose, some small momentary fascination with the inexplicable passions people seem to hold to in chess or life. sighhhhhhhhhhhh. /emo-mode

  27. Re:what now? by Another, completely · · Score: 3, Insightful

    But do they have sharks on which to mount them?

  28. Re:what now? by drewhk · · Score: 1

    It is funny that Slashdot swallows Hungarian characters: "Él" is certainly not what you wanted to write :)

  29. Re:what now? by turbidostato · · Score: 2, Informative

    "But do they have sharks on which to mount them?"

    We must keep them from teaming up with biologists at all costs!

  30. Re:More people interested by BlackCreek · · Score: 1

    But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?

    Probably because people here have more interest in algorithms than in chess itself?

  31. Don't Bring Me Down by rossdee · · Score: 1

    ELO ?

    I didn't know the Electric Light Orchestra was still around

  32. And once again in dead last... by elrous0 · · Score: 1

    A Saudi Arabian mathematician who insists that Allah will guide his way to victory and a Liberty University physicist who insists that the universe revolves around the earth.

    --
    SJW: Someone who has run out of real oppression, and has to fake it.
  33. Octopus by skyggen · · Score: 1

    I say use the Soccer Octopus.

  34. Re:Bruce? by Waffle+Iron · · Score: 1

    Is that you, Bruce?

    No, my name is actually "Grroosss".

  35. now... by buddyglass · · Score: 1

    I wish Sagarin would replace his ELO rating with the eventual winner of this contest. It would be interesting to see how much closer the "ELO replacement" gets to the performance of his PREDICTOR method (which takes point differentials into account).

  36. Re:More people interested by phantomfive · · Score: 1

    Uh, which championship? As far as I can tell, he took second to Carlsen in the Arctic Securities Chess Stars championship in August. Besides that, it shouldn't be newsworthy that the current world champion wins a tournament.....

    --
    Qxe4
  37. Re:Interesting by neo · · Score: 1

    This is a chess rating algorithm. The goal is to predict, given a matchup between two players with known histories, how they will likely fare in a game or series of games against each other. Elo is the standard rating system and has been for some time. These algorithms are improvements on that: they better predict who will win. They have nothing to do with playing actual chess. So the Turk is irrelevant to this discussion (aside from the not minor issue that the operator has been dead for some time.)

    You don't understand, the winning system is using a midget to guess the outcomes.

  38. Re:what now? by egamma · · Score: 1

    Whatever you do, do not piss off the janitors.

  39. Re:what now? by turgid · · Score: 1

    Wake me up when a biologist puts in a credible challenge.

  40. Re:Chess Championship by TaoPhoenix · · Score: 1

    My point exactly... *The* World Chess Championship - the classical time control match with Topalov.

    Our ever-friendly Wiki link:
    https://secure.wikimedia.org/wikipedia/en/wiki/World_Chess_Championship_2010

    "Arctic Securities Chess Stars" is, to quote Chessbase,
    "This rapid chess tournament is taking place in Kristiansund from Saturday, August 28th to Monday, August 30th 2010. It is a double round robin with four players: Magnus Carlsen, Viswanathan Anand, Judit Polgar and Jon Ludvig Hammer. On Monday there follows the finals between the two leading players, together with the bronze final for third place. Time controls are 20 minutes + 10 seconds increment per move."
    http://www.chessbase.com/newsdetail.asp?newsid=6641

    So, respectfully, your remark isn't formally logical. However, I'll give you total leeway for being confused, because the chess world has been a mess of "championship tournaments" for about 10 years. But the Arctic Securities was a typical publicity event. World Champions do occasionally fail to take first in alternate time controls like Rapid.

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  41. Re:Chess Championship by phantomfive · · Score: 1

    Dude, the world championship ended in May. Why would you expect anyone to post in September about a tournament that ended in May? And even in May the result wasn't very interesting.

    --
    Qxe4
  42. Re:Interesting by Tiger4 · · Score: 1

    So the Turk is irrelevant to this discussion (aside from the not minor issue that the operator has been dead for some time.)

    So now we'll never know the answer to the Istanbul - Constantinople naming question!

    --
    Behold, this dreamer cometh. Come now, and let us slay him... and we shall see what will become of his dreams.