Building A Homemade Chess Supercomputer
nado writes "There's a new article on Chessbase.com which has GM John Nunn showing you his chess-orientated PC upgrade to a double Xeon system, with some Fritz benchmarks." Elsewhere in the article, John Nunn discusses the unique computer needs for chess computation: "One of the problems with currently available processors is that they are not particularly well suited to the integer calculations used for chess. A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed."
And Chessmaster 2000 kicked my arse on a 486!
I've got no chance.
-Joe
If we're all god's children, what's so special about Jesus? - Jimmy Carr
No thanks, I still get my ass kicked when I play chess on my Pocket PC, let alone on a chess supercomputer. I'm lucky I can even win at Othello :(
The ultimate network admin tool needs HELP!
I'm working my way up to chess. I'm starting by becoming a tic tac toe master.
- Joe
Does this actually surprise anyone? The P4 was only an exercise in marketing by Intel - redesign the chipset so it can be clocked nice and high (so it appeals to the average consumer) and to hell with the performance...
As this computer was to be focussed on chess, video performance was not important.
Hardcore Slashdot Games readers cringe...We recently had heard in the office over one of the Yellow Machine that's made by Anthology Solutions.
IBM's Deep Blue used special purpose chips, so it shouldn't really come as too much of a surprise that general-purpose processors aren't the best for chess computers.
A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed
...
Just imagine the chess performance of an 8086 at 1GHz. And you get a space heater too, for those cold chess-playing winter nights
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
You should obviously change the game to take advantage of the hardware. Imagine it! Three dimensional chess where each piece has weapons, or magical attacks, deformable terrain, and lots of special effects to make use of the latest video cards! I can't wait!
Yes, it is.
Screw Tic-Tac-Toe, I'm gonna go play Global Thermonuclear War.
Sincerely,
W.O.P.R.
anything i tell you will cloud your opinion.
First of all, the whole point of the P4 is to rev up the clock speed, so there are not and cannot be any "equivalent" P3s available (excepting early versions of the P4, which are way obsolete today anyway and irrelevant to the problem at hand).
Secondly, the Athlons are well known for their stellar integer performance, so who'd use P4s when high IP is needed?
Sure it is.
I'm checking my 1974 edition of the Merriam-Webster Dictionary right here, and on page 494, it clearly states that "orientated" is the past tense of the verb "orientate".
I suspect that you mistook the intended verb to be "orient", with a past tense of "oriented". However, when reading the sentence, one will clearly see that "John Nunn" is the subject of the sentance, and the the "PC" is the subject, with "chess" being the indirect object, upon which the "PC" is oriented towards.
You are completely correct that a subject is oriented towards a direct object.
However, as I understand it, a direct object is orientated towards an indirect object, by a subject.
Fritz is multithreaded. FritzMark, the benchmarking program that uses instruction sequences similar to those in Fritz, is not.
I've had this sig for three days.
Theoretically, a dual processor machine for chess WOULD be twice as fast as a single processor machine, unlike in normal tasks where dual doesn't mean double. Chess is full of integer operations, but at the same time, conditionals up the ass. To calculate the best move, the computer has to check every possibility a move can have and the possible consequences several moves ahead. The nice thing about a dual processor machine is that each processor can focus on the branches of moves pending from different pieces. While one is calculating what one of the rooks can do, the other can calculate what one of the knights can do. One thing I see, though, is that hyperthreading would probably not do any good for such a game b/c all of the integer ALUs on a processor would be used by one thread, so there wouldn't be any ALUs open for another thread. I think in this sort of application of the Xeon, turning hyperthreading off would help boost performance, although I can't be 100% sure of it. Just a thought.
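To make the "one processor per branch" idea concrete, here is a rough sketch of splitting the root moves of a search between two workers. It is only an illustration under loud assumptions: the game is a toy take-away game rather than chess, the names negamax, search_subset and best_move_split are made up for this example, and nothing here reflects how Fritz actually divides its work. Real engines also share search bounds between processors, so a clean 2x speedup is not guaranteed.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for chess (an assumption for illustration only):
# players alternately take 1-3 stones; whoever takes the last stone wins.
MOVES = (1, 2, 3)

def legal_moves(stones):
    return [m for m in MOVES if m <= stones]

def negamax(stones):
    """Score for the side to move: +1 = forced win, -1 = forced loss."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the mover has lost
    return max(-negamax(stones - m) for m in legal_moves(stones))

def search_subset(stones, moves):
    """One worker: fully search only its own share of the root moves."""
    return [(-negamax(stones - m), m) for m in moves]

def best_move_split(stones, workers=2):
    """Deal the root moves out to the workers, then merge their results."""
    root = legal_moves(stones)
    shares = [root[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda share: search_subset(stones, share), shares)
    scored = [pair for chunk in chunks for pair in chunk]
    return max(scored)  # (score, move); ties go to the larger move

if __name__ == "__main__":
    print(best_move_split(10))
```

Note that Python threads will not actually run this CPU-bound search on two processors at once; the sketch only shows how the work would be partitioned and merged.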
I came, I saw, She conquered.
Dear Grammar Nazi,
In your third paragraph you misspelled "sentence" as "sentance" the second time you used it.
Sincerely,
Spelling Nazi
Know what I like about atheists? I've yet to meet one that believes God is on their side.
can we promote you to /. editor?
I just turned up a dual Xeon 2.4 rack-mount server for work and its BIOS warned us to turn off Hyperthreading for anything other than Windows XP or Linux 2.4 (yeah, mention of Linux in BIOS! :).
:)
Anyways, since I am using Linux 2.4, two hyperthreaded Xeons look like four processors to the box. I'm sure it's not the same performance as four separate processors, but I'm hoping it's at least slightly better than two non-Xeons.
The writer of the article wrote that for Windows he prefers 2000 over XP. I am curious if XP (or Linux 2.4) and thus Hyperthreading might help his already built computer with a bit more performance...
Please send all UCE to scally@devolution.com so I can f
So this Brit (who's REALLY good at chess) put together a machine that overall isn't all that stunning, specifically to play chess.
Let me get this straight: he didn't select a purpose-designed processor, he didn't even do a survey of available processors (forget including non-Intel architectures) to see which would give him the best integer performance for the task, he doesn't consider chipset, he doesn't consider memory architecture, he's willing to accept one hardware-caused crash per month, he seems to think that configuring a machine and having his brother put it together is "building" one, and thinks that a purpose-built machine should be able to accept the OS and data (read: disk contents) from a previous machine without hiccough. While perhaps interesting to the chess aficionados, I fail to see the relevance on Slashdot.
Why are we seeing this article instead of something on any one of the serious chess machines? Why is this article more newsworthy than, say, Anandtech or SharkyExtreme or Tom's Hardware's pick for the baddest machine you can currently build? Just because a Grand Master did it?
To be fair, I have great respect for anyone who can attain the Grand Master level -- that's something I'll never do in my lifetime. He's clearly shown tremendous talent and devotion to chess, and my hat is off to John Nunn for that. But he's a computer hardware expert? A supercomputer architect? Are we at the start of a new series of Slashdot articles on computers of the Rich and Famous? What's next, diet tips from RMS? Health advice from Linus? The EFF Cookbook?
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Damn, those Pentium 4 Xeons are slow!
No trees were harmed in your post, but twenty-seven clueless echelon operators were terribly inconvenienced.
Free as in mason.
First of all, the P4 is quite superior at doing tasks that are very mundane and repetitive. So simulators, counters, anything that performs the same operation on multiple data sets time and time again run very well on the P4.
Especially true with RDRAM, which has tremendous throughput but horrible latency.
The classic example of something the P4 is very good at: encoding frames of video into a compressed format such as MPEG-2. It's just cranking away through a big heap of data in a linear fashion.
Secondly, with branch prediction, the P4 out races competitors at some computer games,
Athlons do branch prediction, too. And they have a lower penalty for failure since their pipelines are shorter.
Branch prediction is very helpful also in the field of doing anything more than once because it knows what to expect next, and preps the processor for it.
What?!? Um, actually, branch prediction just keeps the chip's pipeline full. Branch prediction doesn't magically adapt the P4 to process data better, it simply allows the P4 to keep pipelining instructions after a conditional branch. When a prediction is wrong, it must be backed out, which is expensive... but most of the time the prediction is good. (For example, a loop that does something 1000 times will have a conditional branch that will branch the same way 1000 times in a row, and then branch the other way the 1001st time. The prediction would be wrong that 1001st time, but would be correct for most of the other 1000.)
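The loop example can be played out with a small simulation. This is a sketch of the textbook two-bit saturating counter, which is an assumption chosen for illustration and not a description of the P4's or Athlon's real predictor; the function name count_mispredictions is made up here.

```python
def count_mispredictions(outcomes, state=2):
    """Textbook two-bit saturating counter (illustrative, not any real chip).

    States 0-1 predict "not taken", states 2-3 predict "taken".
    Each actual outcome nudges the counter one step toward itself.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

# The loop branch from the example: taken 1000 times, then falls through once.
loop_branch = [True] * 1000 + [False]

# A branch that flips every time, standing in for hard-to-predict code.
alternating = [True, False] * 500

if __name__ == "__main__":
    print(count_mispredictions(loop_branch))    # only the final exit is missed
    print(count_mispredictions(alternating))    # every second branch is missed
```

With the counter starting in a "taken" state, the loop branch costs a single misprediction in 1001 executions, while the alternating branch is mispredicted half the time. The second case is closer to the conditional-heavy code the chess posts above are worried about.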
especially those that are optimised for P4 use.
It is hardly surprising that a P4 would do better than an Athlon at running P4-optimized code. However, this isn't a useless point, because Intel is the 800-pound gorilla and there are games optimized for the P4, and none for Athlons.
But AMD isn't about innovation, they are about making money plain and simple. Instead of making engines that try to predict the next move, they just built their processors with the very minimum everything, strapped on a few extra math units and away we go. This technique is very fast, but it's also expensive as most AMD users have learned, because all those extra adders do is add a LOT of ambient heat as the processor clocks up.
Actually, if you check the Thermal Design Power specs for equivalent-performing AMD and Intel chips, the AMD chips run cooler.
So the P4 was for the mainstream user, to help spare some time from the physics boundary of the processor technology, and to improve on the things we do most on our computers today (music, videos, games).
Pure revisionist history. The P4 was designed for super high clock rates. They ripped too much stuff out of the design, so the P4 has some bad weaknesses it didn't need to have. That's why it's so critical to optimize code specifically for the P4 -- if you don't work around the flaws in the P4, it really hurts.
The Athlon, while it gets more work done per clock than the P4, isn't perfect. Its biggest problem is that it is physically very easy to destroy: you can fry it, or you can even crack its die trying to install a heat sink. The P4 with its heat spreader is much tougher, and with its built-in thermal throttling is more robust. AMD has learned its lesson, though, and the Opteron is robust.
Intel has aggressively marketed the P4 as The Multimedia Chip, but really an Athlon or a P4 will do well for multimedia stuff. The Opteron, for some specific kinds of tasks, will crush either one, and for other kinds of tasks will be slightly faster. I'm just guessing -- I haven't run benchmarks -- but I suspect that the Opteron will do very well on chess.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
If you look at SPECint2000, you will find an integer benchmark called 'crafty'. This is a chess simulator with code sequences that are probably similar to what this guy used.
Intel D875PBZ motherboard (3.0 GHz, Pentium 4 processor with HT Technology) scores 1137
ASUS A7N8X Motherboard rev. 2.0, AMD Athlon (TM) XP 3200+ scores 1324
You'll find that P6 derivatives (Banias, Athlon, Opteron etc...) do better on this benchmark. There are lots of unpredictable conditional branches in this application, so the incidence of mispredictions is higher than normal. You would think that this is the main contributor to poor P4 performance, but actually that is a second order effect, because the predictor on the P4 is far better than on other machines. It's the fact that the code will not fit inside the trace cache, but will fit nicely within Athlon's 64KB I-Cache.
That the software doesn't (seem) to exist to use a cluster instead.
No, really, this isn't one of the "imagine a Beowulf of these..." posts. Here's my point: For the cost of just one of the *processors* that he bought, you can build an *entire machine*, happily running an AthlonXP 2700+. An ENTIRE MACHINE. So, for the cost of the two processors, you've got two machines. For the cost of the SuperMicro motherboard and chassis, you can build two MORE machines. With the cost for the rest of the stuff, there's a fifth machine thrown in to boot.
So, what will be faster - a dual 2.8 GHz Xeon, or 5 AthlonXP 2700+ machines? My money's on the cluster, for this particular application. The Xeon machine has 533 MHz of total memory bandwidth, split between two processors, effectively 266 MHz each. The AthlonXP systems, with 333 MHz each, would have a combined bandwidth of 1,665 MHz - about three times that of the Xeon system.
To make it better, the Athlon is MUCH better than the P3 OR the P4 for integer work, which makes me wonder why he would choose the P4 in the first place. Furthermore, not only does the Athlon do much more in a clock cycle than a P4, you'd have a combined clock speed of 10.8 GHz with the Athlons instead of the 5.6 GHz of the Xeons. Twice the clock speed, AND more work per cycle!
Now, of course, being able to actually USE that clock speed would be dependent upon actually transmitting the messages back and forth, and efficiently dividing the work between the machines. In this sort of situation, where for any one point in time, there would be a great deal of possibilities to compute, it would seem like it would divide up very well.
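For anyone who wants to check the sums above, here is the arithmetic spelled out. Every input is a figure quoted in this post; the variable names are made up for the example, and bus clock in MHz is used as a stand-in for memory bandwidth exactly as the post does, which is a simplification rather than a real bandwidth measurement.

```python
# Figures as quoted in the post (not independently measured).
xeon_count, xeon_ghz, xeon_bus_mhz = 2, 2.8, 533
athlon_count, athlon_bus_mhz = 5, 333
athlon_total_ghz = 10.8  # the post's combined clock figure for five Athlons

xeon_total_ghz = xeon_count * xeon_ghz              # combined Xeon clock
athlon_ghz_each = athlon_total_ghz / athlon_count   # implied clock per Athlon
xeon_bus_each_mhz = xeon_bus_mhz / xeon_count       # shared bus, per processor
cluster_bus_mhz = athlon_count * athlon_bus_mhz     # one bus per machine, summed

bus_ratio = cluster_bus_mhz / xeon_bus_mhz          # cluster vs. Xeon "bandwidth"
clock_ratio = athlon_total_ghz / xeon_total_ghz     # cluster vs. Xeon clock

if __name__ == "__main__":
    print(f"Xeon total clock:   {xeon_total_ghz:.1f} GHz")
    print(f"Athlon clock each:  {athlon_ghz_each:.2f} GHz")
    print(f"Xeon bus per CPU:   {xeon_bus_each_mhz:.1f} MHz")
    print(f"Cluster bus total:  {cluster_bus_mhz} MHz")
    print(f"Bus ratio:          {bus_ratio:.2f}x")
    print(f"Clock ratio:        {clock_ratio:.2f}x")
```

The ratios come out a little over 3x for the bus figure and a little under 2x for raw clock, which matches the "about three times" and "twice the clock speed" claims. None of it says anything about how much of that a chess search could actually use across a network.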
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
I am a go player. When I play chess I raise the pieces high in the air then slam them down on whatever grid intersection I feel like.
I am not popular at chess clubs.
graspee
This thesis shows a system that a guy from McGill University built to use Field Programmable Gate Arrays to generate possible moves. Since FPGAs allow you to do many simple tasks in parallel instead of trying to do one thing at a time very fast as in software, he was able to get an order-of-magnitude speed increase. Special chess computers like Deep Blue used custom-designed ASICs for this same purpose, but FPGAs are a much more accessible solution and will blow a software solution out of the water.
___
Cogito cogito, ergo cogito sum.