Chess Ratings — Move Over Elo

Indeed by mooingyak · 2010-08-04 09:05 · Score: 5, Funny

However, it is a big surprise that Elo has been bettered done so quickly!

Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.

Re:Indeed by Peach+Rings · 2010-08-04 09:08 · Score: 0

Does Timothy even glance at the stories he approves or is it pure pin the tail on the donkey?
Re:Indeed by Braintrust · 2010-08-04 09:13 · Score: 3, Funny

Indubitably. It filled with hope the one that no one thought Elo would have been bettered done so quickly.

--
Years later, a doctor will tell me that I have an I.Q. of 48, and am what some people call "mentally retarded".
Re:Indeed by Lord+Byron+II · 2010-08-04 09:13 · Score: 5, Funny

Timothy is the bettered done editor of Slashdot!
Re:Indeed by camperdave · 2010-08-04 09:29 · Score: 5, Funny

ELO hasn't done all that well since the big hair rock days of the late 1970s/early 1980s, pretty much since the drummer left to join Black Sabbath. I'm surprised at the band's connection to chess.

--
When our name is on the back of your car, we're behind you all the way!
Re:Indeed by Hognoxious · 2010-08-04 09:41 · Score: 0, Offtopic

# He's a bettered done kid
[bettered done baby]
Battered dome kid
[battered dome baby]
ooh ooooh ooh ooooh oo oo ooh ohh a hooway hooway hoowah hoowah/#
Fuck me, I'd forgotten what a pile of shite Deacon Park South Texas were. Thanks a bastarding bunch for reminding me, you heiferflap.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Indeed by Anonymous Coward · 2010-08-04 09:52 · Score: 0

Has anyone ever really been so far as to have had been bettered done so quickly?
Re:Indeed by Hognoxious · 2010-08-04 09:53 · Score: 2, Insightful

The first time I heard Bev Bevan had joined Sabbath I kind of went "WTF?". But they're all Brummies, along with a lot of heavy metal bands around that time. Priest, Magnum ... they probably all played in pubs together when they were 15.
Similarly you couldn't be a serious goth in the 80s unless you were from Leeds, or a flare-wearing floppy-mopped tossbag in the 90s if you weren't a Manc.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Indeed by Anonymous Coward · 2010-08-04 10:14 · Score: 0

Feeew. As a non-English native I was quite disturbed by yet another language construct which I don't understand.
Re:Indeed by mike260 · 2010-08-04 11:05 · Score: 0, Offtopic

Hopefully next time he will bettered posted checked done more carefully.
Re:Indeed by Anonymous Coward · 2010-08-04 12:31 · Score: 0

I'm not surprised at all. Anyone with a good grasp of mathematics and probability can see that it just comes down to how complicated you make your formula to be. It's extremely like finding a function given a list of coordinates.
Re:Indeed by Anonymous Coward · 2010-08-04 19:22 · Score: 0

I have no idea what you just wrote? Does it actually have any meaning?
Re:Indeed by Anonymous Coward · 2010-08-05 00:54 · Score: 0

Somebody please translate this into American for the AC..
Re:Indeed by Anonymous Coward · 2010-08-05 02:35 · Score: 0

The first time I heard Bev Bevan had joined Sabbath I kind of went "WTF?". But they're all Brummies, along with a lot of heavy metal bands around that time. Priest, Magnum ... they probably all played in pubs together when they were 15.

Similarly you couldn't be a serious goth in the 80s unless you were from Leeds, or a flare-wearing floppy-mopped tossbag in the 90s if you weren't a Manc.

Somebody please translate this into American for the AC..

I'm Canadian, so I reckon I'm close enough to give it a shot...

"WTF" - An acronym of an expression used when something is discovered to be out of place, or at odds with the utterer's concept of how it should be. For example, a person discovering a rash the size of a dinner plate on his abdomen, when there wasn't one there before, might utter the expression "WTF"?

Heavy Metal Bands - a band is a group of musicians. In this case, they focus on Heavy Metal.

Pubs - Bars

80s, 90s - short way of writing a decade. Eg 80s means the years 1980-1989.

Leeds - A city in England.

Hope that helped. :-)
Re:Indeed by Anonymous Coward · 2010-08-05 23:13 · Score: 0

Thanks but can you clarify 'Heavy Metal', 'pubs', 'decade' and 'England' because I'm an utter fuckwit.

Bettered Done So Quickly by Anonymous Coward · 2010-08-04 09:06 · Score: 0

However, it is a big surprise that Elo has been bettered done so quickly!"

*facepalm*

Re:Bettered Done So Quickly by Shikaku · 2010-08-04 09:10 · Score: 1

bettered done --> bested
Re:Bettered Done So Quickly by HeckRuler · 2010-08-04 09:16 · Score: 1

Yeah, "bettered done"?
I was JUST ranting about how we shouldn't care about trivial things like spacing after periods, but this is just a sad excuse for journalism.

"beaten"
"surpassed"
"out done"
"blown out of the water"
And while it seems a little old-fashioned, "bested" would work.

Come on people, read it twice before submitting for millions to read.
Git 'er bettered done'ed!
Re:Bettered Done So Quickly by Anonymous Coward · 2010-08-04 09:31 · Score: 2, Funny

Battered done --> basted

First Chess then the BCS! by Shadow+Wrought · 2010-08-04 09:07 · Score: 1

Not that they'd use it, but it certainly couldn't hurt.

--
If brevity is the soul of wit, then how does one explain Twitter?

Re:First Chess then the BCS! by PRMan · 2010-08-04 11:12 · Score: 1

Technically, they are already using ELO-CHESS in the BCS, because Jeff Sagarin uses it in his rating system. So all that has to happen is for Jeff Sagarin to change his method.

--
Peter predicted that you would "deliberately forget" creation 2000 years ago...
Re:First Chess then the BCS! by maxume · 2010-08-04 13:35 · Score: 1

You really think they are ever going to take special status away from Notre Dame and implement a playoff?

--
Nerd rage is the funniest rage.

been bettered done THAT quickly??? by boneclinkz · 2010-08-04 09:08 · Score: 5, Funny

Elo-L

umm by buddyglass · 2010-08-04 09:08 · Score: 4, Informative

However, it is a big surprise that Elo has been bettered done so quickly!

Not really. Jeff Sagarin has had two systems of rating sports teams for a while now. One, ELO_CHESS, is based purely on win-loss, while the other, PURE POINTS, takes into account margin of victory. According to him, the latter is better at predicting future results. From his analysis:

In ELO CHESS, only winning and losing matters; the score margin is of no consequence, which makes it very "politically correct". However it is less accurate in its predictions for upcoming games than is the PURE POINTS, in which the score margin is the only thing that matters. PURE POINTS is also known as PREDICTOR, BALLANTINE, RHEINGOLD, WHITE OWL and is the best single PREDICTOR of future games.

Submission error by TubeSteak · 2010-08-04 09:09 · Score: 2, Informative

Already three teams have managed to create systems that make more accurate predictions than the official Elo approach.

1 EdR* 0.729125
2 whiteknight* 0.731656
3 Elo Benchmark* 0.738107 <-- The "official Elo approach"

Maybe we're counting from zero and they forgot to put it on the leaderboard?

--
[Fuck Beta]
o0t!

Re:Submission error by Anonymous Coward · 2010-08-04 09:22 · Score: 0

No, the posted leaderboard is just predictions of which prediction approach would be more accurate... unfortunately, the meta-prediction algorithm isn't bettered done yet, hence the inaccurate results.
Re:Submission error by databuff · 2010-08-04 09:53 · Score: 4, Informative

The Elo Benchmark was submitted a second time. I wrote to Sonas about this. Apparently the rating system has to be seeded. He tried a different approach to calculating seed ratings and this performed better - pushing him one place higher in the rankings.
Re:Submission error by socsoc · 2010-08-04 13:14 · Score: 1

It changes and isn't done... Elo is 1st at the moment.
Re:Submission error by Martian_Kyo · 2010-08-04 18:15 · Score: 2, Informative

1 Elo BenchmarkOpen 0.723834
2 EdROpen 0.729125
3 whiteknightOpen 0.731656
so at this moment elo is back on top.
Could it be that people have been done some quickly jumpening to conclusions?
I guess george is working at /. now.

When I see the word Elo by Ukab+the+Great · 2010-08-04 09:10 · Score: 0, Offtopic

I can't think of anything other than 70's cheese and largest white afro up until the release of Bobobo-bo Bo-bobo.

Less than 24 hours ago by LearnToSpell · 2010-08-04 09:14 · Score: 5, Funny

Less than 24 hours ago, the readers of Slashdot launched a competition to find an editing algorithm that performs better than the official "editors" of the site. The competition requires entrants to build their comment systems based on the results of over 9,000 historical submissions. Entrants then test their algorithms by predicting the results of the next 7,809 dup^H^H^Hstories. Already three teams have managed to create systems that make more accurate predictions than the official /. approach. It's not a surprise that Timothy has been outdone -- after all, he was invented half a century ago before English had been standardized. However, it is no big surprise that Slashdot has been bettered done so quickly! The winner: Texas Instruments!

--
Haida Manga

Re:Less than 24 hours ago by Anonymous Coward · 2010-08-04 09:29 · Score: 0

"Glicko Scoring" was suggested some years ago, but I don't know if anyone uses it.
Re:Less than 24 hours ago by roman_mir · 2010-08-04 09:46 · Score: 1

so what you are saying is that /. editing algorithm has been bettered done quickly?

--
You can't handle the truth.

In other news... by Last_Available_Usern · 2010-08-04 09:14 · Score: 2, Funny

Organized crime members linked to gambling rackets have been indicted for kidnapping a busload of nerds after they refused to program similar algorithms in exchange for Warcraft game time and photoshopped Natalie Portman porn.

We all know that's not true though. They totally would have done it.

Re:In other news... by easterberry · 2010-08-04 09:38 · Score: 2, Funny

They bettered have done it!

differences are minute by l2718 · 2010-08-04 09:20 · Score: 4, Interesting

Looking at the table, the differences in predictive power are small enough that it's not obvious they aren't due to chance alone. There needs to be some calculation showing that the differences are meaningful, validating the claim that the alternative methods actually extract more information than Elo does. Perhaps there is enough inherent randomness in Chess that even simple predictive models can extract most of the systematics, so that what remains after Elo is mostly noise?
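One way to do the kind of check asked for here is a paired bootstrap over the test games. The sketch below is only an illustration, not anything the contest organisers describe: the function name `rmse_gap_interval`, the 95% percentile interval, and the idea of having per-game errors for both systems on hand are all assumptions. If the interval it returns straddles zero, the gap between two systems could plausibly be chance.

```python
import math
import random

def rmse_gap_interval(errors_a, errors_b, n_resamples=2000, seed=0):
    """Paired bootstrap: 95% percentile interval for RMSE(a) - RMSE(b).

    errors_a and errors_b are per-game prediction errors (predicted minus
    actual) of two rating systems on the same games, in the same order.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty error lists of equal length")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(errors_a)
    gaps = []
    for _ in range(n_resamples):
        # Resample games with replacement, keeping the two systems paired.
        idx = [rng.randrange(n) for _ in range(n)]
        rmse_a = math.sqrt(sum(errors_a[i] ** 2 for i in idx) / n)
        rmse_b = math.sqrt(sum(errors_b[i] ** 2 for i in idx) / n)
        gaps.append(rmse_a - rmse_b)
    gaps.sort()
    low = gaps[int(0.025 * n_resamples)]
    high = gaps[int(0.975 * n_resamples) - 1]
    return low, high
```

Resampling whole games, rather than each system separately, keeps the comparison paired, which matters because both systems are scored on the same games.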

Re:differences are minute by Monkeedude1212 · 2010-08-04 10:29 · Score: 1

Perhaps there is enough inherent randomness in Chess that even simple predictive models can extract most of the systematics so that what remains after Elo is mostly noise?
No. Chess has no random elements to it. You play against an opponent, with a very strict set of rules.
Now sometimes the rules differ from game to game (such as timing, whether they use something like 3/5 Fischer or 20 moves an hour sort of thing), which can have drastic changes to the outcome. For example if you do something like 20 moves an hour, sometimes chess players will be running short on time, and they'll deliberately try to speed up their 18th, 19th and 20th moves to get that extra hour of time.
The only other thing that could be considered random is who plays black and who plays white (some players are stronger at one than the other). But in most tournaments, it's round robin with everyone playing both sides anyways.
Aside from that - it's not random at all. You play against an opponent, with the same setup every game, and the only things left to chance are your strategies.
Re:differences are minute by Shadow+Wrought · 2010-08-04 10:33 · Score: 1

I realize they are "predicting" games that have already taken place, but how would this affect a realtime match? How much would it change your moves knowing you've been predicted to lose? Or to win?

--
If brevity is the soul of wit, then how does one explain Twitter?
Re:differences are minute by l2718 · 2010-08-04 11:31 · Score: 2, Insightful

No. Chess has no random elements to it. You play against an opponent, with a very strict set of rules.
I don't think you understand what the discussion in this post is about. The game of chess has no element of randomness -- but the players do, and it's the players we are trying to model. Just because, on average, player A is better than player B, doesn't mean that player A will win every game. The fact is that the same player will play at different levels of ability on different days, and that is the randomness that is relevant to models trying to predict outcomes of chess games.
Basically all rating systems are based on the assumption that players' ability for a given game fluctuates around an "average ability level" according to some distribution, and the goal of the rating system is to discover the average (and perhaps spread) of this individual distribution. So even under best conditions the most the system can do is predict the outcome with an error coming from the distribution of abilities. Now assume the distributions are relatively wide -- then there will be a large statistical error even for the best system.
Returning to the main point, the discussion of the last paragraph has nothing to do with the fact that chess is deterministic. In fact, the fact that there is no randomness in chess makes things easier.
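To make the "fluctuates around an average" idea concrete, here is a minimal sketch of the standard Elo expected-score curve and rating update. The function names are made up for illustration, and `k=32` is just an assumed K-factor; federations use different values.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B on the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=32):
    """Return A's new rating after one game; score_a is 1, 0.5 or 0."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Equal ratings give an expected score of 0.5, and even a 400-point
# favourite is expected to drop points now and then.
print(expected_score(1500, 1500))
print(expected_score(1900, 1500))
print(update(1500, 1500, 1))
```

The expected score is a probability-like number, not a verdict, which is why a wide spread in day-to-day form limits how well any system of this kind can predict a single game.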
Re:differences are minute by shimage · 2010-08-04 11:34 · Score: 2, Informative

Bullshit. Mistakes are roughly stochastic, ergo, there are random elements in chess players' performance. This is why chess matches involve more than just two games.
Re:differences are minute by hoytak · 2010-08-04 11:34 · Score: 1

... and the only things left to chance are your strategies.
and whether you had too much coffee that morning, failed to see that move 10 steps ahead, etc. In high level chess, it seems that these kinds of things have enormous effects on the outcome of the game and are not things that can be easily modeled except as random effects. Thus there is definitely a random element in the outcome of the game; Kasparov vs. Deep Blue was a mix of wins and losses; definitely not a deterministic outcome.

--
Does having a witty signature really indicate normality?
Re:differences are minute by dakameleon · 2010-08-04 14:16 · Score: 1

Welcome to the world of probability theory. In particular, get started with Bayes and work your way from there.

--
Man who leaps off cliff jumps to conclusion.
Re:differences are minute by phantomfive · 2010-08-04 17:00 · Score: 3, Interesting

Mikhail Tal, one of the best players ever, would differ; because it's impossible to see deeply enough to know what the outcome of a move will be. He makes the point here, and I'll quote a small piece:

Tal: - "Yes. For example, I will never forget my game with GM Vasiukov on a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.

Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."

Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".
You will find that lots of chess players have reported making similarly intuitive moves.

--
Qxe4
Re:differences are minute by Monkeedude1212 · 2010-08-05 07:03 · Score: 1

That's not random though, and that kind of intuition is what makes the rankings.
What I mean is - if you were to take something like WoW, put 2 identical players against each other, have them perform the exact same moves at the exact same time - one will likely lose before the other. Because there is too much random generation in the game, like crit chances and things like that.
Chess does not have any of those elements. Yes, you may have tons of moves available to you with far reaching implications but ultimately the decision is yours and that will accurately reflect your rating. Something the new algorithm won't have to adjust for, as my original point. A random choice of moves will only go so far once you reach a certain level in chess.
Re:differences are minute by phantomfive · 2010-08-05 07:12 · Score: 1

Nah, haven't you ever heard of the chess god Caissa, that chess players pray to? The choice to make a move is yours, but in many situations there is no logical reason to make one move above another. It comes down to luck. In an average, typical position, there are something like 3 moves that are all equally good. Which one you choose might as well be completely random (and indeed, that is how I choose between three moves that I can't tell which is best: as randomly as I can to make myself unpredictable).

--
Qxe4
Re:differences are minute by Monkeedude1212 · 2010-08-05 07:15 · Score: 1

You clearly don't understand the point I'm trying to make then.
A mistake is an element in my performance and can happen at any time - and it will affect my ranking.
What Elo does is put me up against everyone else who is JUST as affected by these events as I am - There is nothing to say it's an unfair battle. I do not have fewer pieces, I do not have a weaker position to start. Now there will be stronger players, and they will have higher rankings, weaker players will have lower rankings.
What the GP was saying was that it's difficult to predict on ELO alone - Someone with a higher ranking by a mere 3 points is not definitely going to win the match.
And he is correct - in that ELO isn't perfect, but you can't just assume that this "noise" that would be generated, ie, someone with a lower ranking beating someone with a higher ranking - is due to any form of random chance - it is by skill and skill alone, and whether your skill is dependent on your intuition, your opponent's condition, etc - either way there is nothing in or on the chess board that is random to change that.
Re:differences are minute by Monkeedude1212 · 2010-08-05 07:51 · Score: 1

Which is part of your strategy - which would reflect on how well you play chess, no? If you are actively trying to seem more random - and it works, that will make your chess rating go up.
Re:differences are minute by sahonen · 2010-08-05 14:23 · Score: 1

When you roll a die or spin a roulette wheel or deal a hand of cards, the outcome is governed purely by the laws of physics, yet you treat the result as random anyway. The outcome of a chess game is the same way. Even if the outcome of the game is decided solely by player skill within the rules of the game, the result is treated as a statistically random phenomenon.

--
Make me a friend and I'll mod you up
Re:differences are minute by Monkeedude1212 · 2010-08-06 02:07 · Score: 1

Why is it considered a random phenomenon when a player makes decisions in chess?
When it comes to something like Poker, you don't know that you will ever get a good hand, you are stuck trying to play against your opponent. You can go through the entire game without getting a solid winning hand compared to your opponent, and if your opponent pushes you at every turn - you've already lost and no matter how "skillful" you are, your bad luck would cause a loss even if your opponent plays stupidly calling bluffs all the time.
In Chess there isn't anything like that. You can't be dealt a bad hand, you can't NOT predict the future. Chess masters often think around 16 to 20 moves ahead of where they are in the game. They know that when they make a certain move, their opponent will usually go down 1 of 3 or 4 routes. It's so NOT random there is a high amount of predictability in Chess. This is why most chess masters try their best to be unpredictable instead of making "the right" move.
Re:differences are minute by sahonen · 2010-08-06 11:05 · Score: 1

You missed my point. I'll say it again. Every single physical phenomenon above the level of quantum physics is governed by deterministic physical laws, yet for the purpose of statistical analysis we treat them as random because we don't have the ability to know them exactly.

The poker analogy was talking about a *single hand.* When you shuffle a deck of cards, they will come out in an order which is precisely determined by the actions taken to shuffle them, yet we treat the order of the cards after shuffling as random for the purposes of the game, because no player has a way to know what order the cards came out in.

How about horse racing? You can't say that the outcome of a horse race is determined by any sort of random factor, it's simply a matter of which horse/jockey combination is the fastest... Yet people bet on horse racing. The future outcome of the race is being treated as random because it is impossible to know without just running the race.

In chess, we are not treating the individual moves as random phenomena, we are treating the overall outcome (win/loss/draw) as a random phenomenon for statistical purposes. Between players of equal skill, we treat the outcome of a game which hasn't been played yet as having an equal random chance of being won by either player, because we simply don't, and can't, know which player will win without simply playing the game.

This is not saying that the outcome of the game *is* random, it's saying that we simply don't have the predictive model to treat it as anything other than random. We don't have the ability to predict the exact result of the match with any certainty, we can only analyze past performance and assign a probability to certain future results occurring.

--
Make me a friend and I'll mod you up

More like commenter error by Anonymous Coward · 2010-08-04 09:20 · Score: 3, Informative

That number is "Root Mean Square Error", so lower is better
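For anyone unsure what that means in practice, here is a minimal sketch of root mean square error, assuming predictions are expected scores between 0 and 1 and actual results are 1, 0.5 or 0 (win, draw, loss). The function name `rmse` and the sample numbers are invented for illustration; the contest's exact scoring procedure may aggregate games differently.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: predicted score for White vs. what actually happened.
predicted = [0.5, 0.5, 0.5, 0.5]
actual = [1.0, 0.0, 0.5, 1.0]
print(rmse(predicted, actual))
```

A perfect predictor scores 0, so a system further down a list sorted by this number is doing worse, not better.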

Re:More like commenter error by digitig · 2010-08-04 10:37 · Score: 3, Insightful

Yes, and count how many of them are better than the ELO approach.

--
Quidnam Latine loqui modo coepi?
Re:More like commenter error by Anonymous Coward · 2010-08-04 13:12 · Score: 0

42?
Re:More like commenter error by Anonymous Coward · 2010-08-04 15:44 · Score: 1, Informative

The leaderboard changes over time, and also consider this:

Update: The team Elo Benchmark (see the leaderboard), uses the Elo rating system. Note, the method for creating seed ratings for Elo Benchmark is being refined, so don't be surprised if the benchmark improves a little in the competition's first week.

Well, everyone knows by Duradin · 2010-08-04 09:23 · Score: 1

Well, everyone knows that arena is serious business.

how are victory margins relevant to chess? by l2718 · 2010-08-04 09:27 · Score: 4, Insightful

Indeed, Sagarin has shown that applying Elo in sports where the winner is based on points scored is not optimal, since the average margin of victory is a better predictor of strength than won-loss record. But this has nothing to do with applying the Elo method to its original setting of chess, where the outcome of the game is only "win/draw/loss" and there is no margin of victory.

Re:how are victory margins relevant to chess? by buddyglass · 2010-08-04 09:33 · Score: 1

It's not inconceivable that one might apply an artificial means of gauging "margin of victory" to the domain of chess. Some sort of differential in the "value" of the pieces remaining for each contestant when the game ends. For the three teams that beat ELO, do their ratings systems only take "win/loss" as input, or do they also get the board's configuration at the point when the game ended?
Re:how are victory margins relevant to chess? by thousandinone · 2010-08-04 09:44 · Score: 5, Insightful

This is pretty ridiculous. Margin of victory? Is there a committee overseeing ethical treatment of chess pieces now? If I sacrifice everything but my King and a Bishop to checkmate you, why is that intrinsically a better strategy than sparing some of my pieces?

There are definite merits to a sacrificial strategy- it's all about board control. Long as there's more than one or two legal moves available to your opponent, you can't really predict where he'll send his pieces. A queen in the middle of the board can cover a lot of distance and do some impressive maneuvers, but any given piece only occupies one spot. Control where your opponent moves, control the game. Not to mention that fewer pieces on the board gives you more options for where to move with your remaining pieces, and by allowing your pieces to be taken, you have a measure of control over where the free space on the board is.

Indeed, given the rules of the game, I would say a strategy that goes to great lengths to preserve as many of one's own pieces as possible is flawed...
Re:how are victory margins relevant to chess? by databuff · 2010-08-04 09:51 · Score: 2, Informative

Data only shows results - so there's no scope for gauging the margin of victory.
Re:how are victory margins relevant to chess? by Marcx77 · 2010-08-04 10:01 · Score: 1

Sorry, but... You can't checkmate with only a king and a bishop.
Re:how are victory margins relevant to chess? by friedo · 2010-08-04 10:04 · Score: 2, Insightful

If some metric X is a statistically reliable method of predicting future success, then X can be defined as a margin of victory. Whether X is a function of the "values" of remaining pieces, or their positions on the board, or the number of moves, or whatever, is immaterial.
Re:how are victory margins relevant to chess? by SomeJoel · 2010-08-04 10:09 · Score: 3, Insightful

Sorry, but... You can't checkmate with only a king and a bishop.
The hell you can't. It turns out, your opponent has pieces too! Have you ever even played chess?

--
<Complete your profile by adding a signature!>
Re:how are victory margins relevant to chess? by phantomfive · 2010-08-04 10:19 · Score: 3, Informative

You know, you're really asking for it when you take a small point that isn't even relevant to his main point and attack it. Sorry, YOU'RE WRONG!!!!!.

If you ever find yourself in a game where you can sacrifice all your pieces to get to that position, DO IT!

--
Qxe4
Re:how are victory margins relevant to chess? by TubeSteak · 2010-08-04 11:02 · Score: 1

But this has nothing to do with applying the Elo method to its original setting of chess, where the outcome of the game is only "win/draw/loss" and there is no margin of victory.
You can easily keep track of a "margin" by assigning point values for the pieces that have been taken.
http://en.wikipedia.org/wiki/Chess_piece_relative_value
That metric loses some relevance since someone behind on points can easily have a strategic victory,
but there may still be some information of value gained from crunching the numbers.
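A rough sketch of that bookkeeping, assuming the final position is available as the board field of a FEN string (the contest data reportedly only includes results, so this is hypothetical). The values are the conventional 1/3/3/5/9 scale; `material_margin` is an invented name.

```python
# Conventional piece values (pawn = 1); the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_margin(fen_board):
    """White material minus Black material for the board field of a FEN string."""
    margin = 0
    for ch in fen_board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' rank separators
        # FEN uses uppercase letters for White and lowercase for Black.
        margin += value if ch.isupper() else -value
    return margin

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_margin(start))
```

This only counts wood. It says nothing about position, so a winning sacrifice shows up as a negative margin for the winner.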

--
[Fuck Beta]
o0t!
Re:how are victory margins relevant to chess? by buddyglass · 2010-08-04 11:03 · Score: 1, Interesting

If I sacrifice everything but my King and a Bishop to checkmate you, why is that intrinsically a better strategy than sparing some of my pieces?
Winning with only a king and a bishop remaining is no "better" than winning with all your pieces remaining. A win is a win. That said, winning a game while having many more pieces remaining than one's opponent may imply that the difference between your skill and your opponent's is greater than if you won with only a king and bishop left. There may be some merit to working that into an algorithm if the goal is to predict the outcome of future matches.
Another data point that might be valuable is simply "how many moves did the game take before checkmate"? Without any other knowledge, the guy who beats me in 10 moves is likely to be a better player than the guy who takes 50 moves to beat me.
Re:how are victory margins relevant to chess? by buddyglass · 2010-08-04 11:04 · Score: 1

I take back what I said, then. It is moderately surprising that there have already been three solutions that outperform ELO based solely on win/loss.
Re:how are victory margins relevant to chess? by microbox · 2010-08-04 14:27 · Score: 1

Except that if such a metric were used in the future - it would punish the most entertaining and thrilling form of play.

In chess, you win or lose. If players started "grinding" just to raise their ratings - ick.

--

Like all pain, suffering is a signal that something isn't right
Re:how are victory margins relevant to chess? by magarity · 2010-08-04 15:49 · Score: 1

When chess nerds talk about end game strategies it is implied that "a king and a __ " ending is one where the other player has just a king.
Re:how are victory margins relevant to chess? by DavidShor · 2010-08-04 16:08 · Score: 1

I thought so too, but looking at the site, it seems relatively trivial to set up a Bayesian structural equation model that models the evolution of individual players' ability. That will produce a ton of parameters, but hierarchical magic can take care of that. In fact, they even mention that in the official hint. It's clear to see why that would outperform ELO.
Re:how are victory margins relevant to chess? by bryonak · 2010-08-05 00:25 · Score: 1

I'm a programmer and a tournament chess player.
Your points just reflect the grossly skewed views laymen and players under maybe 2200 ELO (candidate degree) have of chess.
There is little sense in correlating piece delta with winning margin.
Especially when sacrificing a piece in order to gain tactical advantage, this measurement would return a lot of false positives.
Let's try one of those beloved car analogies:
Car A and car B are both faster than car X, but Car B is lighter, so Car A must be better because it's heavier but still won against car X. No, I don't see the sense in that either.
Also the count of moves until one gives up*** might give some insight into the player strength difference, but still isn't a convincing way of measuring performance.
My very last game some days ago was essentially over after 35 moves, but my opponent dragged it to the 62nd before giving up. During that time, he had almost no chance of not losing, but hoped for lack of concentration / a blunder on my part (which is not noble, but a valid strategy).
On the other hand, in a game a few weeks ago, I've surrendered in the 21st move because the game turned hopeless for me. My opponent was nominally about the same strength as I am.
Also note that this is a very fast game, and 10 moves usually only happen in quickdraw games. And ending a game by checkmate in 2 moves arguably requires more skill than ending it in 4.
One generally accepted method of measuring the margin is comparison with an AI. Look at Crafty's source for ideas on how to measure and quantify the position, but take into account that these methods have long been surpassed by much more powerful AIs (notably Rybka) whose source code isn't accessible.
The main problem with this is: which vendor's engine should one use? How to guarantee fairness?
*** The count to checkmate is completely unrealistic anyway.
Re:how are victory margins relevant to chess? by dylan_- · 2010-08-05 02:10 · Score: 1

If you ever find yourself in a game where you can sacrifice all your pieces to get to that position, DO IT!
Unless 1.Bh4 b3
I get your point, but you need rid of that queen.

--
Igor Presnyakov stole my hat
Re:how are victory margins relevant to chess? by mdielmann · 2010-08-05 02:51 · Score: 1

So...what you're saying is it takes more skill to win the game with more of your pieces. Which means you'd be a better player than someone who needs to get rid of those pieces first. Which means the margin of victory would be a good predictor of future outcomes? Am I right?
Or, to put it another way: if a model is derived that accurately represents previous behaviour, and accurately predicts future behaviour, then the model is reasonably accurate. Your not liking it doesn't mean it's wrong.

--
Sure I'm paranoid, but am I paranoid enough?
Re:how are victory margins relevant to chess? by phantomfive · 2010-08-05 04:33 · Score: 1

Heh, you're looking at the puzzle backwards. It's actually solvable, you can click on the pieces and solve it. The black pawns are coming towards the bottom of the board.

--
Qxe4
Re:how are victory margins relevant to chess? by Anonymous Coward · 2010-08-05 05:27 · Score: 0

Yeah, it's possible if the other guy helps you out. If you are beating someone with a king and a bishop, your opponent's ranking is the least of their problems.
Re:how are victory margins relevant to chess? by dylan_- · 2010-08-05 21:23 · Score: 1

Oh, I did solve it, but I thought black was playing the wrong move! I realised what I had wrong later last night as I was mixing up yet another lemsip.
In my defense, I'm loaded with the cold at the moment and can't think straight. Someone posted an old school picture on facebook yesterday, and it took me about 3 minutes to figure out which one was me!

--
Igor Presnyakov stole my hat

So they've got better... by frank_adrian314159 · 2010-08-04 09:28 · Score: 4, Interesting

Are the better entries as transparent? ELO's a pretty simple way to do this - add or subtract a few points from the rating based on a win or a loss based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
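
For reference, the basic update described here can be sketched in a few lines. This is a minimal sketch using the standard logistic expected-score formula with a 400-point scale; the K-factor of 20 and the function names are illustrative assumptions, not any federation's official rules.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both players' new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k=20 is an assumed K-factor; real rating bodies use various values.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # A 2000-rated player loses to an 1800-rated one.
    print(update(2000, 1800, 0.0))
```

Under these assumptions a single game can never move a rating by more than K points, whatever the result.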

Also, are the new models' improvements in prediction statistically relevant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.

Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? Fairness of tournaments having small numbers of games has a lot to do with rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).

All-in-all, there are a lot of problems in coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and looking at a single scorecard based solely on predictability is nowhere near sufficient.

--
That is all.

Re:So they've got better... by greg1104 · 2010-08-04 10:10 · Score: 2, Interesting

Developers of stock trading systems, which also try to rank things based on historical data, face this same persistent problem, and there's been waaay more research into it there than in chess rankings. If you train them on a bunch of historical data, you will discover the best system is invariably one that essentially does a giant curve fitting job on that exact data. One thing trading system developers do to address this is use techniques like walk forward testing, where the system gets trained on one set of data but is only evaluated on a second set.
Luckily, this chess rating competition is using that sort of technique: "Competitors train their rating systems using a training dataset of over 65,000 recent results for 8,631 top players. Participants then use their method to predict the outcome of a further 7,809 games." In fact, the current leaderboard reflects results on only 1/10 of the training set. So long as real ranking is ultimately based on the unseen data set, not the training one, there's little risk of them fitting the noise in the training set and still winning.
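The walk-forward idea can be sketched as follows. This is a minimal illustration, not the competition's actual evaluation code: the `(month, white, black, score)` tuple layout and the `walk_forward` name are assumptions made for the example.

```python
from typing import Iterator, List, Tuple

# Hypothetical record layout: (month, white player, black player, white's score)
Game = Tuple[int, str, str, float]


def walk_forward(games: List[Game], n_folds: int = 3) -> Iterator[Tuple[List[Game], List[Game]]]:
    """Yield (train, test) pairs in which no test game is earlier than any training game.

    Games are sorted by month, cut into n_folds + 1 equal chunks, and each
    chunk after the first is used as a test window for everything before it.
    """
    ordered = sorted(games, key=lambda g: g[0])
    fold = len(ordered) // (n_folds + 1)
    for i in range(1, n_folds + 1):
        yield ordered[: i * fold], ordered[i * fold : (i + 1) * fold]


if __name__ == "__main__":
    toy = [(month, "a", "b", 0.5) for month in range(8)]  # made-up data
    for train, test in walk_forward(toy):
        print(len(train), "training games ->", len(test), "test games")
```

Each fold is scored only on games the system has not seen, so a method that merely memorises the training results gets no credit for it.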
Re:So they've got better... by Anonymous Coward · 2010-08-04 16:12 · Score: 0

The weird part is that the complexity found in the Elo system (specifically, the non-linear expected win% stuff) is not beneficial. Sonas himself (the guy who made this contest) actually got better results by simplifying Elo.
http://www.chessbase.com/newsdetail.asp?newsid=562
How stable the ratings are is irrelevant -- it's measuring the accuracy of predictions based on the rating system. If a rating system dropped you to patzer because you totally flubbed one game, then the predictions would not be accurate, and the system would get a poor mean squared error, etc. In general though, I think it's recognized that Elo was too static. I think FIDE changed the rating rules not that long ago, using a larger K factor (bigger rating changes per game) for players below a certain level.
I agree this is nowhere near sufficient, but it's not like all the chess federations are going to just drop Elo and go with the winner. The results may offer some innovative approaches though.
Re:So they've got better... by DavidShor · 2010-08-04 16:43 · Score: 1

Walk-forward testing (a special type of something more general called cross-validation) is a bit over-rated. It's very intuitive, which is why it's used so much in the technical analysis crowd. But statistically, it's really roughly equivalent to multiplying the standard error by a constant factor unless you have severe model mis-specification (in which case you're doing something very wrong!)
In general, it's good to parametrize a range of plausible models, test the assumptions of the model, and conservatively build up if it makes theoretical sense, and don't be afraid to move beyond OLS. That does a lot more to protect against over-fitting than a couple of choice statistics. This can't always be done, Machine-Learning is a necessary evil in some fields, but for something low-dimensional like financial time series or Chess? Should be doable...
Re:So they've got better... by Kijori · 2010-08-04 20:21 · Score: 1

Are the better entries as transparent? ELO's a pretty simple way to do this - add or subtract a few points from the rating based on a win or a loss based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
Take a look at the formulae used - Elo, particularly for tournament play, is already complicated enough that it's beyond the reach of a "back-of-the-napkin" calculation to work out your rating change. That's seen as one of the big advantages of the English Chess Federation's rating system; it's very simple, so you can just work out the change yourself.

Apples and oranges? by DerekLyons · 2010-08-04 09:28 · Score: 1

Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?

Re:Apples and oranges? by mooingyak · 2010-08-04 09:32 · Score: 2, Informative

Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?
And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Apples and oranges? by vlm · 2010-08-04 09:46 · Score: 2, Funny

And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?
Historical achievement, the glory of the grind. Much as my lower UID implies this comment should be more valuable than your high UID comment.

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Re:Apples and oranges? by databuff · 2010-08-04 09:56 · Score: 1

how do you test current relative rankings without using them to make predictions?
Re:Apples and oranges? by mooingyak · 2010-08-04 10:07 · Score: 2, Interesting

Much as my lower UID implies this comment should be more valuable than your high UID comment.
I used to think of myself as having a particularly high UID... until I realized that mine is actually lower than a majority of the total UIDs. Weirded me out a little. There are UIDs that are farther from the 1,000,000 mark than I am from Taco.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Apples and oranges? by Orion · 2010-08-04 11:16 · Score: 1

Historical achievement, the glory of the grind. Much as my lower UID implies this comment should be more valuable than your high UID comment.
Wow... 69,642 is a low UID?
I *am* a god!
Re:Apples and oranges? by DerekLyons · 2010-08-04 11:23 · Score: 1

Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?
And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win?
That depends on the relative difference between the ranks. A narrow difference implies you might win, a wider difference implies you will win - and between the two lies a spectrum of gradual shifts from may to will. It's not an absolute quantitative measurement.
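That spectrum can be written down with the standard Elo expected-score formula (the usual logistic form with a 400-point scale; the symbols R_A, R_B and E_A are just notation for this example):

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
```

With equal ratings E_A is 0.5; a 200-point edge gives roughly 0.76 and a 400-point edge roughly 0.91. Note that E_A is an expected score with a draw counted as half a point, so it is not strictly a win probability.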
Re:Apples and oranges? by DerekLyons · 2010-08-04 11:25 · Score: 1

That's a damn good question, and one I don't know the answer to.
Re:Apples and oranges? by DavidShor · 2010-08-04 16:46 · Score: 1

Sometimes I suspect low UID users have a crawler that looks for people referencing low UIDs...
Re:Apples and oranges? by Olivier+Galibert · 2010-08-04 19:28 · Score: 2, Informative

No we don't. This is not the crawler you're looking for.
OG.
Re:Apples and oranges? by Abstrackt · 2010-08-05 01:55 · Score: 2, Funny

Sometimes I suspect low UID users have a crawler that looks for people referencing low UIDs...
I had no idea COBOL was so powerful.

--
They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
Re:Apples and oranges? by Abcd1234 · 2010-08-05 04:28 · Score: 1

You're picking nits. His point still remains: The ranking system *should* provide a prediction of future performance, as it's supposed to be an indicator of relative skill. Of course, if two ranks are close together, that means your error bars will be wider, but that doesn't change the basic fact that a higher rank should fundamentally translate to a higher likelihood of winning.
Re:Apples and oranges? by rnj · 2010-08-05 04:33 · Score: 1

I'm sure you're aware that "should" is not the same as "will". I've played in events where a grandmaster has lost to a mere expert. (And then there's the whole issue of draws as well as specifics of style match-ups. In team competitions it's not unusual to carry specialists: guys who are good at getting wins with black against weaker opposition, for instance.)
The old USCF formula was ((Wins-Losses)/Games)*400. ELO is more complex, but this still gives a pretty good sense of the meaning of any ratings difference. It takes a fairly long time for a rating difference as small as 10 points to show up head to head.
The biggest problem with any rating system is with rapidly improving (usually young) players. The current relative rank may well be wildly off for the current player's actual ability level.
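To put numbers on the formula quoted above, here is a small sketch. Reading the result as a rating gap relative to the opposition is an assumption made for this example (the comment only gives the formula), and both function names are made up.

```python
def linear_gap(wins: int, losses: int, games: int) -> float:
    """Rating gap from ((wins - losses) / games) * 400, as quoted in the comment."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (wins - losses) / games * 400.0


def games_per_net_win(gap: float) -> float:
    """Games needed, on average, for a given gap to show up as one net win."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    return 400.0 / gap


if __name__ == "__main__":
    print(linear_gap(6, 2, 10))     # 6 wins, 2 losses, 2 draws
    print(games_per_net_win(10.0))  # a 10-point difference
```

By this formula a 10-point difference corresponds to a single net win every 40 games, which is why such a small gap takes a long time to show up head to head.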
Re:Apples and oranges? by mooingyak · 2010-08-05 05:54 · Score: 1

My main point though was that Elo is actually predictive. Not that it's perfect.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Apples and oranges? by rnj · 2010-08-09 05:34 · Score: 1

Probably too late but ...
Elo is predictive in terms of tournament standings (as long as we're talking established ratings. Any kind of provisional rating and ... well it's better than nothing, but Elo felt that provisional ratings were only accurate to within 20 or so points). My point though was that when you're talking specifically head to head they are much less so.
Styles make fights in boxing and the same seems to be largely true in chess
Re:Apples and oranges? by mooingyak · 2010-08-09 05:57 · Score: 1

Probably too late but ...
Never too late for a good discussion! Who cares if we have no audience?

Elo is predictive in terms of tournament standings (as long as we're talking established ratings. Any kind of provisional rating and ... well it's better than nothing, but Elo felt that provisional ratings were only accurate to within 20 or so points). My point though was that when you're talking specifically head to head they are much less so.
Styles make fights in boxing and the same seems to be largely true in chess
I will happily concede that Elo is imperfect and that there are factors such as style that it won't adequately account for. Given that, I believe that if you take a pool of non-provisionally Elo ranked players and randomly pit them against each other, picking the winner based on who has the higher rank will perform better than a coin toss with statistical significance. On this basis I submit that Elo is predictive, albeit with flaws.
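
The claim is easy to check on real results. A minimal sketch, assuming each game is recorded as a hypothetical `(rating_a, rating_b, score_a)` tuple and using a plain normal approximation for the comparison with a coin toss; the function names are made up for the example.

```python
import math


def higher_rated_accuracy(games):
    """Count how often the higher-rated player won.

    games: iterable of (rating_a, rating_b, score_a) with score_a in {1.0, 0.5, 0.0}.
    Draws and games between equally rated players are skipped.
    Returns (hits, decisive_games).
    """
    hits = total = 0
    for rating_a, rating_b, score_a in games:
        if score_a == 0.5 or rating_a == rating_b:
            continue
        total += 1
        if (rating_a > rating_b) == (score_a == 1.0):
            hits += 1
    return hits, total


def z_vs_coin_toss(hits: int, total: int) -> float:
    """Normal-approximation z-score of the hit count against a fair coin."""
    return (hits - 0.5 * total) / math.sqrt(0.25 * total)


if __name__ == "__main__":
    # 60 correct picks out of 100 decisive games (illustrative numbers only).
    print(z_vs_coin_toss(60, 100))
```

A z-score above about 2 is conventionally read as better than chance; whether Elo clears that bar is a question for actual game data, not for this sketch.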

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.

For a $50 voucher? by pongo000 · 2010-08-04 09:28 · Score: 1

I don't think so. The time I'd spend on this project is worth a bit more than $50...

Re:For a $50 voucher? by Scatterplot · 2010-08-04 09:43 · Score: 1

$50 is bettered than nothin!

Let Him Know How You Feel by Anonymous Coward · 2010-08-04 09:30 · Score: 0

Does Timothy even glance at the stories he approves or is it pure pin the tail on the donkey?

Timothy's e-mail address is timothy@monkey.org according to his home page. Tired of the half-assed submissions where he couldn't bother to read it over before submitting it for millions to read? Send him an e-mail.

I can see why it's such a surprise... by Anubis+IV · 2010-08-04 09:40 · Score: 1

After all, it's not like other ideas haven't already been created in the meantime to address Elo's perceived shortcomings, right?

The consequences... by Anonymous Coward · 2010-08-04 09:41 · Score: 0

Timothy bettered done goofed

Elo in non-chess games by LambdaWolf · 2010-08-04 09:45 · Score: 4, Insightful

Ah man, no matter how inadequate the Elo system may be for chess, it's much worse seeing it applied to other games where it doesn't belong, which happens regrettably often. The trouble is that the Elo system depends on the premise that nothing affects the outcome of a game other than the skill of each player (and who gets the white pieces).

In chess, that assumption is a pretty good approximation to reality, since every tournament game is run the same way. But many games do have variations in rules or format across different events, such as different maps or races in a real-time strategy game, or different card pools in Magic: The Gathering. Then Elo ratings are biased by how often a player has the chance to play to his strong areas. Players in turn are compelled to game the system: "I should avoid this event because they're using Format X and my rating will stay stronger if I stick to Format Y." The Elo system is meant precisely to obviate that kind of gamesmanship: chess players should need to think only about the strengths of their opponents, which (in principle) will be weighted fairly when calculating rating adjustments. But if there are other competitive factors, which is true for most any popular game invented in the last 30 years, Elo ratings become that much less meaningful.

--
"This algorithm runs in constant time. Come on, 2,147,483,648 is a constant..."

Re:Elo in non-chess games by selven · 2010-08-04 11:41 · Score: 1

Yes, linear ranking systems fail hard at anything as complex as, let alone more complex than, rock-paper-scissors.
Re:Elo in non-chess games by Anonymous Coward · 2010-08-04 15:07 · Score: 0

Yes, linear ranking systems fail hard at anything as complex as, let alone more complex than, rock-paper-scissors.
Elo is non-linear, so your point, although it may be valid, is irrelevant to the current discussion.
Re:Elo in non-chess games by jhol13 · 2010-08-04 15:49 · Score: 1

The Elo system does not depend on the premise "nothing affects the outcome of a game other than the skill of each player".
Sure, it is modelled according to that, but in practise it is very untrue even for chess. There are a lot of examples where player A has beaten player B N out of M times although, according to the rating difference, a very different outcome should have happened.
The chess events are not similar, I have played a few and they do vary considerably (number of games per day, travel, lighting, temperature, players, mood, ...).
Elo rating is much more meaningful than ratings used e.g. in snooker or tennis. No, I am not saying they should change their rating, it would not make much sense to change the culture of the game, after all the rating is just a bloody number.
Re:Elo in non-chess games by selven · 2010-08-04 22:27 · Score: 1

Elo is a single number. The real number line is linear.
Re:Elo in non-chess games by Anonymous Coward · 2010-08-16 10:12 · Score: 0

Sounds right, not to mention some games can/should (but don't) incorporate handicaps (especially if there is a small player base).

I've recently played a game (League of Legends) that has a small player base that just matches to the closest ELO for its small community. The problem is when you have level 30s (with tier 3 items/setups) far more handicapped than the level 1-20s matched on a "ELO is close enough" scale. So not only are you competing with someone more difficult, but they have a 15-30% advantage over you. So the 30s are constantly bouncing from impossible to free win games constantly. Don't get me started on the fact that you can get up to 2 consecutive rematches as well, earning you a potential 3 free losses back to back.

In reality ELO is a simple answer for a difficult problem. Skill needs to be factored, not overall win/loss ratio. If it took you an entire day to beat someone, then it shouldn't be considered a win, compared to a 5 minute game where it was too easy (assuming you account for thrown matches).
ELO in games is meant to provide a fun and "balanced" game as well. If an actual handicap was applied to the rating difference (and the changes in rating were based on that handicap applied), then it would be a near 50% win/loss ratio, not a free win/free loss ratio.

Allow me to clarify by jamrock · 2010-08-04 09:55 · Score: 3, Funny

Three teams done bettered Elo with betterer done algorithms, and the submitter is surprised that it was bettered done so quickly. I'm done. Was that better?

He sounds like Lady Macbeth on crack.

Cheese! by future+assassin · 2010-08-04 10:06 · Score: 0

Man I was like WTF? Cheese ratings? Got confused with seeing the Pac-Man icon.

--
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*

Microsoft's TrueSkill beat Elo before this comp by blaisethom · 2010-08-04 10:08 · Score: 1

I believe the algorithm used by Microsoft to match players for X-Box games was already beating Elo before this competition. They have a description of their algorithm at http://research.microsoft.com/en-us/projects/trueskill/

Re:Microsoft's TrueSkill beat Elo before this comp by Maarx · 2010-08-04 11:07 · Score: 3, Informative

Not to belittle what Microsoft did, but in the interest of giving credit where credit is due:

Here’s the problem with Battle.net 2.0: 2002’s Warcraft III: Reign of Chaos is one of the most underrated video games ever created. And that’s before you learn its online apparatus is the foundation for modern matchmaking, where Blizzard Entertainment should get royalties every time you brag about your X-Box Live Trueskill rating. (Then again, I shouldn’t be giving Blizzard ideas right now.)
Here’s how Warcraft III matchmaking worked: Everyone starts at level one. The maximum level is fifty. You play players within six levels of your own. Win five games, gain a level. Lose five games, lose a level. The penalty for losing is reduced during levels one to nine. Thus, players who win half their games will become level ten.
It was simple and transparent. That was the hook, and people choked on it. It turned Warcraft III ladder play into what ICCUP serves for Starcraft players, a stomping ground so competitive that climbing the food chain gave you a shot at the guys who played for a living. That’s what a good online gaming system does.
The quote comes from Battle.net 2.0: The Antithesis of Consumer Confidence. I would encourage you to read the entire thing, but for reasons completely unrelated to this thread.

Shouldn't it be "Roll over ELO" by Anonymous Coward · 2010-08-04 10:17 · Score: 0

by Beethoven

Little early yet by Anonymous Coward · 2010-08-04 10:42 · Score: 0

It looks to me like the data-set is rather small and so are the differences in the results. I don't see a clear winner yet by any means.

Steven, 2,156 Elo at my best.

Surreal Gone Kid by Dogtanian · 2010-08-04 11:18 · Score: 1

Fuck me, I'd forgotten what a pile of shite Deacon Park South Texas were. Thanks a bastarding bunch for reminding me, you heiferflap.

WTF? Is this what happens when some late-1980s Scottish bands get mixed up in a transporter with a popular animation series?

If something that tenuous links to "Real Gone Kid" in your head, you must have some major trauma :-/

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).

Glicko (better than Elo) has been around for years by whatteaux · 2010-08-04 12:17 · Score: 1

The Glicko chess rating system and its successor Glicko2 (creative, huh?) are better than Elo and have been around for years. Various online chess sites use it, as does the Australian Chess Federation.

Mx-doctor by martin-boundary · 2010-08-04 12:19 · Score: 2, Funny

However, it is a big surprise that Elo has been bettered done so quickly!

Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.

Is it because elo would have been bettered done so quickly that you came to me?

validity and reliability of criterion by Hartmut · 2010-08-04 12:39 · Score: 1

The problem is not just to find another _method_ to predict game results, but to construct and evaluate a better workable scientific model of chess ability. That's hard, because the criterion 'game result' itself possibly is not a valid indicator of the quality of game play, and the stability of playing strength over time, which is reliability. To estimate these criteria, it is necessary to design the data collection, as scientists do e.g. in experimental design.
In addition, the available tests of logistic models, like ELO, are not sufficient.

Elo Benchmark is #1 at this moment by schneidafunk · 2010-08-04 12:55 · Score: 1

1 Elo Benchmark 0.723834 3 6:03pm, Wednesday 4 August 2010
2 EdROpen 0.729125 2 11:47pm, Tuesday 3 August 2010
3 whiteknightOpen 0.731656 4 2:29am, Wednesday 4 August 2010

--
Some people die at 25 and aren't buried until 75. -Benjamin Franklin

Re:Elo Benchmark is #1 at this moment by daveime · 2010-08-04 16:39 · Score: 2, Funny

Pleased to say I jumped straight into the money at #7 with my first submission :-)
Where AM I going to spend a whole 50 Euros ? Maybe I'll donate it to Greece, seems like they need it.
Re:Elo Benchmark is #1 at this moment by daveime · 2010-08-04 16:58 · Score: 1

Damnit ... $50 USD ... that's only 38.50 Euros.

Elo Anecdote by afabbro · 2010-08-04 14:24 · Score: 4, Informative

Not relevant specifically to this story, but I always laugh at the story of how a prisoner manipulated the Elo system via closed pool ratings inflation.

Short summary: said prisoner only played against other prisoners, who he'd trained. Due to careful scheduling of the games, he rose from his true strength (probably sub-master) to being the second-highest rated player in the U.S. in 1996.

--
Advice: on VPS providers

System Feedback by physicsphairy · 2010-08-04 16:21 · Score: 1

The problem with this kind of modeling is that many "good fitting" algorithms would, if implemented, change the system itself. There's more to competition chess than just the rules on how to move pieces. For example, while a game in isolation would almost always be played to win, there are many times that because of information from ratings (or due to the method of the tournament) you would start the game being equally happy to draw, which will affect how you play.

Now, even if the difference in the number of pieces remaining (e.g.) is a much better predictor of who will win than the ELO system, if you were ever to actually implement it you would no longer be playing the game the ELO system was trying to track--suddenly you have made players more conservative, not as willing to sacrifice pieces for a better mating position. Possibly some would say you had ruined the game.

--
When things get complex, multiply by the complex conjugate.

Complexity vs performance by drmofe · 2010-08-04 19:35 · Score: 1

Given that ELO is relatively simple, it is more surprising that more complex algorithms with the benefit of access to a lot of historical data only marginally outperform it. i.e. the transparency and simplicity of ELO combined with a relatively accurate outcome is better.

Small but maybe significant differences? by Terje+Mathisen · 2010-08-04 21:16 · Score: 1

The differences are indeed quite small, but it seems obvious that you should be able to do better than ELO by splitting it into two parts:

Games played as White and games played as Black.

In fact, this seems so obvious that I suspect there's something I have overlooked! :-)

As the contest site mentions, there's a very significant advantage to White, enough so that in their training data set White has 30+% win vs 20+% for Black.

I suggest that taking the normal ELO-predicted outcome and then biasing it according to this known historical trend, would have to result in slightly better predictions than the naked ELO number.

Terje

--
"almost all programming can be viewed as an exercise in caching"

Re:Small but maybe significant differences? by Chris+Mattern · 2010-08-05 01:23 · Score: 1

Pointless. Every official ELO rating is (and any rating system that replaces it will be) calculated off 50% games as White and 50% games as Black because officially rated games are played in tournaments and matches in which each player is assigned an equal number of games as each. Since every ELO rating has the same White/Black ratio, there is no "bias" from one rating to the next to be corrected for.
Re:Small but maybe significant differences? by Terje+Mathisen · 2010-08-05 01:57 · Score: 1

Not pointless at all!
Tournament results are what ELO really tries to predict, and there you are absolutely correct, i.e. everyone plays both White and Black equally often.
However, the current challenge is NOT to predict how well each player is going to do in the aggregate, but to minimize the error for each individual game. THIS IS CRUCIAL!
I.e. the error term is the RMS of the difference between the predicted and actual result for each individual game, not the sum of the normal pair of games against each competitor. ELO otoh is really trying to predict the result of the pair of games, not each of them individually, since it is the sum which correlates with the final result.
Simply biasing the ELO-prediction by something like +/-5% depending upon White/Black would correct for the known advantage of playing White.
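A minimal sketch of that idea, assuming a flat bonus of 0.05 added to White's expected score and a per-game RMS error as described above. The function names and the toy results are made up for illustration, and the contest's exact scoring may differ.

```python
import math


def expected_score(r_white: float, r_black: float) -> float:
    """Standard Elo expected score for White (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_black - r_white) / 400.0))


def biased_prediction(r_white: float, r_black: float, white_bonus: float = 0.05) -> float:
    """Elo expected score for White plus an assumed flat bonus, clamped to [0, 1]."""
    return min(1.0, max(0.0, expected_score(r_white, r_black) + white_bonus))


def rmse(predictions, outcomes) -> float:
    """Root-mean-square error over individual games."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions))


if __name__ == "__main__":
    # Made-up results between equally rated players: White scores 0.55 on average.
    outcomes = [1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0]
    plain = [expected_score(1500, 1500)] * len(outcomes)
    biased = [biased_prediction(1500, 1500)] * len(outcomes)
    print(rmse(plain, outcomes), rmse(biased, outcomes))
```

On this toy data the biased prediction has the slightly lower per-game error; if White wins about 30% and Black about 20% with the rest drawn, White's average score is about 0.55, which is where a bonus of this size would come from.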
Does this make sense?
Terje

--
"almost all programming can be viewed as an exercise in caching"

What we need now.... by GargamelSpaceman · 2010-08-05 02:12 · Score: 1

What we need now is a chess rating system rating system. Then chess rating systems can compete with each other and be rated as to how well they rate chess.

--
...

ELO Rankings by oldManSquad · 2010-08-07 10:37 · Score: 1

Thought I'd list the official ELO Chess rankings for reference:

Mr Blue Sky
Rock And Roll is King
Way Life's Meant to be
So Serious
I'm Alive
Getting to the Point
Don't Walk Away
Turn to Stone
Confusion
Last Train to London

Currently I'm ranked 7th (Turn to Stone - yeah I suck), but managed to beat a 6th ranked player after a close match (luckily he lived up to his title), so I should hopefully rank up soon. Can't wait to meet Mr Wood and Mr Lynne at the ranking ceremony which is held every month at The Ship Inn, Frimley, UK.
