Building A Homemade Chess Supercomputer
nado writes "There's a new article on Chessbase.com which has GM John Nunn showing you his chess-orientated PC upgrade to a double Xeon system, with some Fritz benchmarks." Elsewhere in the article, John Nunn discusses the unique computer needs for chess computation: "One of the problems with currently available processors is that they are not particularly well suited to the integer calculations used for chess. A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed."
And Chessmaster 2000 kicked my arse on a 486!
I've got no chance.
-Joe
If we're all god's children, what's so special about Jesus? - Jimmy Carr
No thanks, I still get my ass kicked when I play chess on my pocket PC, let alone on a chess supercomputer. I'm lucky I can even win in Othello :(
The ultimate network admin tool needs HELP!
I'm working my way up to chess. I'm starting by becoming a tic tac toe master.
- Joe
Does this actually surprise anyone? The P4 was only an exercise in marketing by Intel - redesign the chipset so it can be clocked nice and high (so it appeals to the average consumer) and to hell with the performance...
I'm not sure this should have been said:
"A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed."
That's too easy to distort.
I'm sure a marketing group or some such, for Intel competitors or even PPC, will say
"A Pentium 4 will be slower ... than a Pentium 3 of an equivalent clock speed."
And then use it to justify their own means.
Hmmm?
As this computer was to be focussed on chess, video performance was not important.
Hardcore Slashdot Games readers cringe...
IBM's Deep Blue used special purpose chips, so it shouldn't really come as too much of a surprise that general-purpose processors aren't the best for chess computers.
A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed
...
Just imagine the chess performance of an 8086 at 1GHz. And you get a space heater too, for those cold chess-playing winter nights
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
You should obviously change the game to take advantage of the hardware. Imagine it! Three dimensional chess where each piece has weapons, or magical attacks, deformable terrain, and lots of special effects to make use of the latest video cards! I can't wait!
Yes, it is.
Screw Tic-Tac-Toe, I'm gonna go play Global Thermonuclear War.
Sincerely,
W.O.P.R.
anything i tell you will cloud your opinion.
Software to examine chess games would be a perfect example of the major performance improvements to be had with multi-threading. A new thread per processor, with each thread examining different possible move paths, would give dramatic speed gains.
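For anyone who wants to see what "a thread per processor, each on a different move path" looks like, here is a minimal sketch, not how Fritz or any real engine does it. It splits a made-up toy game tree at the root and hands each root move to its own worker. The tree, the function names and the worker count are all invented for illustration. Note that in CPython threads will not actually run CPU-bound work in parallel (swapping in ProcessPoolExecutor is the usual workaround), and real engines use alpha-beta pruning, where splitting the search loses some pruning information, so the gain is normally less than one full speedup per processor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy game tree: inner nodes are lists of children, leaves are
# integer scores from the point of view of the player moving at the root.
TOY_TREE = [
    [[3, 5], [6, 9]],
    [[1, 2], [0, -1]],
    [[7, 4], [8, 2]],
]


def minimax(node, maximizing):
    """Plain minimax over the nested-list tree (no pruning)."""
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


def best_root_move(tree, workers=2):
    """Search each root move in its own worker and pick the best score."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # After our move at the root, the opponent (the minimizer) moves.
        scores = list(pool.map(lambda child: minimax(child, False), tree))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    move, scores = best_root_move(TOY_TREE)
    print("scores per root move:", scores, "best move index:", move)
```

Splitting only at the root is the simplest scheme; it works well when there are many root moves of similar cost and poorly when one move dominates the search time.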
"God, root, what is difference?" - Pitr, userfriendly
First of all, the whole point of the P4 is to rev up the clockspeed, so there are not and cannot be any "equivalent" P3s available (excepting early versions of the P4 which are way obsolete today anyway and irrelevant to the problem at hand).
Secondly, the Athlons are well known for their stellar integer performance, so who'd use P4s when high IP is needed?
Sure it is.
I'm checking my 1974 edition of the Merriam-Webster Dictionary right here, and on page 494, it clearly states that "orientated" is the past tense of the verb "orientate".
I suspect that you mistook the intended verb to be "orient", with a past tense of "oriented". However, when reading the sentence, one will clearly see that "John Nunn" is the subject of the sentance, and the "PC" is the subject, with "chess" being the indirect object, upon which the "PC" is oriented towards.
You are completely correct that a subject is oriented towards a direct object.
However, as I understand it, a direct object is orientated towards an indirect object, by a subject.
From America in 2003, where you damn well better DIY or DW (do without). Then, write it up and sell it as a big 'hint'...
Theoretically, a dual processor machine for chess WOULD be twice as fast as a single processor machine, unlike in normal tasks where dual doesn't mean double. Chess is full of integer operations, but at the same time, conditionals up the ass. To calculate the best move, the computer has to check every possibility a move can have and the possible consequences several moves ahead. The nice thing about a dual processor machine is that each processor can focus on the branches of moves pending from different pieces. While one is calculating what one of the rooks can do, the other can calculate what one of the knights can do. One thing I see, though, is that hyperthreading would probably not do any good for such a game b/c all of the integer ALUs on a processor would be used by one thread, so there wouldn't be any ALUs open for another thread. I think in this sort of application of the Xeon, turning hyperthreading off would help boost performance, although I can't be 100% sure of it. Just a thought.
I came, I saw, She conquered.
See my earlier post, asking how old FritzMark is, because the article says that it only uses one processor - ie: It's not a multi-threaded app.
"God, root, what is difference?" - Pitr, userfriendly
he could get a Mac and play chess.app on his "supercomputer."
Dear Grammar Nazi,
In your third paragraph you misspelled "sentence" as "sentance" the second time you used it.
Sincerely,
Spelling Nazi
Know what I like about atheists? I've yet to meet one that believes God is on their side.
can we promote you to /. editor?
I just turned up a dual Xeon 2.4 rack-mount server for work and its BIOS warned us to turn off Hyperthreading for anything other than Windows XP or Linux 2.4 (yeah, mention of Linux in BIOS! :).
:)
Anyways, since I am using Linux 2.4, two hyperthreaded Xeons look like four processors to the box. I'm sure it's not the same performance as four separate processors, but I'm hoping it's at least slightly better than two non-Xeons.
The writer of the article wrote that for Windows he prefers 2000 over XP. I am curious if XP (or Linux 2.4) and thus Hyperthreading might help his already built computer with a bit more performance...
Please send all UCE to scally@devolution.com so I can f
Pentium 4 clock speed vs. performance discussion...
Seconds before G3, G4 or PPC970 is mentioned:
3...
Sigh, another P4 troller. But let's examine what all the P4 has to offer.
First of all, the P4 is quite superior at doing tasks that are very mundane and repetitive. So simulators, counters, anything that performs the same operation on multiple data sets time and time again run very well on the P4.
Secondly, with branch prediction, the P4 outraces competitors at some computer games, especially those that are optimised for P4 use. Branch prediction is very helpful also in the field of doing anything more than once because it knows what to expect next, and preps the processor for it.
What the P4 is bad at are things that change a lot during operation. Things that use different resources at different times, things that seemingly fire random calls for resources, like word processing, desktop editing (like making a website or newspaper or the such), and the like.
Now that we've cleared that up, my second point. Look at what all AMD has taken from Intel processors. MMX, SSE, and various other byte level optimizations have made the Athlon quite the processor. But AMD isn't about innovation, they are about making money plain and simple. Instead of making engines that try to predict the next move, they just built their processors with the very minimum everything, strapped on a few extra math units and away we go. This technique is very fast, but it's also expensive as most AMD users have learned, because all those extra adders do is add a LOT of ambient heat as the processor clocks up. Intel's processors stay relatively cool and run nearly twice as fast. So the P4 was for the mainstream user, to help spare some time from the physics boundary of the processor technology, and to improve on the things we do most on our computers today (music, videos, games).
You have been taught a lesson grasshoppa, use it wisely.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
the computer beats YOU! ...wait a sec...
It seems you could make about 3 dual Athlon ~2GHz systems for the price of one 2x2.8 Xeon and that the cluster would outperform. Or maybe build, like, a 20-processor VIA C3 system that would perform the same and use less power.
it is running linux, you clod.
RTFA and LATFP (look at the fucking pictures).
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
So this Brit (who's REALLY good at chess) put together a machine that overall isn't all that stunning, specifically to play chess.
Let me get this straight: he didn't select a purpose-designed processor, he didn't even do a survey of available processors (forget including non-Intel architectures) to see which would give him the best integer performance for the task, he doesn't consider chipset, he doesn't consider memory architecture, he's willing to accept one hardware-caused crash per month, he seems to think that configuring a machine and having his brother put it together is "building" one, and thinks that a purpose-built machine should be able to accept the OS and data (read: disk contents) from a previous machine without hiccough. While perhaps interesting to the chess aficionados, I fail to see the relevance on Slashdot.
Why are we seeing this article instead of something on any one of the serious chess machines? Why is this article more newsworthy than, say, Anandtech or SharkyExtreme or Tom's Hardware's pick for the baddest machine you can currently build? Just because a Grand Master did it?
To be fair, I have great respect for anyone who can attain the Grand Master level -- that's something I'll never do in my lifetime. He's clearly shown tremendous talent and devotion to chess, and my hat is off to John Nunn for that. But he's a computer hardware expert? A supercomputer architect? Are we at the start of a new series of Slashdot articles on computers of the Rich and Famous? What's next, diet tips from RMS? Health advice from Linus? The EFF Cookbook?
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
My home desktop machines both have ECC memory. I never open the boxes. Haven't had a crash on either the Windows 2000 machine or the QNX machine in over a year.
Price (as mentioned) and performance in specific applications, if applicable.
"Current platform" is of course also a big reason to stick with one or the other when upgrading, though that might be a little bit more relevant for people upgrading on the Athlon-line, since AMD stuck with SocketA for a long time while Intel enjoyed going from Slot-1/whatever to SocketX/Y/Z forcing motherboard replacement between processors.
Belief is the currency of delusion.
Why is it that no one has written a chess benchmarking program for the Mac (i.e. *nix)?
I mean, for number crunching and math and calcs, the Mac seems to rule close to the top...
just my 2cents
"an eye for an eye only makes the whole world blind"
Damn, those Pentium 4 Xeons are slow!
No trees were harmed in your post, but twenty-seven clueless echelon operators were terribly inconvenienced.
Free as in mason.
First of all, the P4 is quite superior at doing tasks that are very mundane and repetitive. So simulators, counters, anything that performs the same operation on multiple data sets time and time again run very well on the P4.
Especially true with RDRAM, which has tremendous throughput but horrible latency.
The classic example of something the P4 is very good at: encoding frames of video into a compressed format such as MPEG-2. It's just cranking away through a big heap of data in a linear fashion.
Secondly, with branch prediction, the P4 outraces competitors at some computer games,
Athlons do branch prediction, too. And they have a lower penalty for failure since their pipelines are shorter.
Branch prediction is very helpful also in the field of doing anything more than once because it knows what to expect next, and preps the processor for it.
What?!? Um, actually, branch prediction just keeps the chip's pipeline full. Branch prediction doesn't magically adapt the P4 to process data better, it simply allows the P4 to keep pipelining instructions after a conditional branch. When a prediction is wrong, it must be backed out, which is expensive... but most of the time the prediction is good. (For example, a loop that does something 1000 times will have a conditional branch that will branch the same way 1000 times in a row, and then branch the other way the 1001st time. The prediction would be wrong that 1001st time, but would be correct for most of the other 1000.)
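To put numbers on the loop example, here is a rough sketch using the textbook 2-bit saturating counter. This is an assumption for illustration only; it is not the actual predictor in the P4 or the Athlon, both of which are far more elaborate. The function name and the starting states are made up.

```python
def simulate_loop_branch(iterations=1000, start_state=2):
    """Count mispredictions of a 2-bit saturating counter on a loop branch.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    The branch is taken `iterations` times, then not taken once (loop exit).
    Returns (mispredictions, total_branches).
    """
    outcomes = [True] * iterations + [False]
    state = start_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions, len(outcomes)


if __name__ == "__main__":
    for start in (0, 2):
        missed, total = simulate_loop_branch(start_state=start)
        print(f"start state {start}: {missed} mispredictions out of {total}")
```

Even when the counter starts out guessing the wrong way, it only misses a couple of times while warming up plus once at the loop exit. The trouble for chess code is that its branches are nothing like this tidy loop.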
especially those that are optimised for P4 use.
It is hardly surprising that a P4 would do better than an Athlon at running P4-optimized code. However, this isn't a useless point, because Intel is the 800-pound gorilla and there are games optimized for the P4, and none for Athlons.
But AMD isn't about innovation, they are about making money plain and simple. Instead of making engines that try to predict the next move, they just built their processors with the very minimum everything, strapped on a few extra math units and away we go. This technique is very fast, but it's also expensive as most AMD users have learned, because all those extra adders do is add a LOT of ambient heat as the processor clocks up.
Actually, if you check the Thermal Design Power specs for equivalent-performing AMD and Intel chips, the AMD chips run cooler.
So the P4 was for the mainstream user, to help spare some time from the physics boundary of the processor technology, and to improve on the things we do most on our computers today (music, videos, games).
Pure revisionist history. The P4 was designed for super high clock rates. They ripped too much stuff out of the design, so the P4 has some bad weaknesses it didn't need to have. That's why it's so critical to optimize code specifically for the P4 -- if you don't work around the flaws in the P4, it really hurts.
The Athlon, while it gets more work done per clock than the P4, isn't perfect. Its biggest problem is that it is physically very easy to destroy: you can fry it, or you can even crack its die trying to install a heat sink. The P4 with its heat spreader is much tougher, and with its built-in thermal throttling is more robust. AMD has learned its lesson, though, and the Opteron is robust.
Intel has aggressively marketed the P4 as The Multimedia Chip, but really an Athlon or a P4 will do well for multimedia stuff. The Opteron, for some specific kinds of tasks, will crush either one, and for other kinds of tasks will be slightly faster. I'm just guessing -- I haven't run benchmarks -- but I suspect that the Opteron will do very well on chess.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
not sure if this is off-topic...mod me thus if you must.
I play free internet chess at the Free Internet Chess Server. Find them at...you guessed it: www.freechess.org.
All you CLI guys out there will love the fact that using a graphical client is optional! For those of us who are sane, there are a handful of graphical boards available to complement the irc-ish interface that allows people to find opponents.
It's fairly popular already, but I sure wouldn't mind a bigger crowd...cause all the guys on there kick my arse consistently. I've got a whopping 1300-something rating right now, and I'm already 0-3 for the night...sheesh.
peace.
@ASP.NET's parent-teacher meeting: "Little Johnny.NET is very bright, but he doesn't play well with others."
If you look at SPECint2000, you will find an integer benchmark called 'crafty'. This is a chess simulator with code sequences that are probably similar to what this guy used.
Intel D875PBZ motherboard (3.0 GHz, Pentium 4 processor with HT Technology) scores 1137
ASUS A7N8X Motherboard rev. 2.0, AMD Athlon (TM) XP 3200+ scores 1324
You'll find that P6 derivatives (Banias, Athlon, Opteron etc...) do better on this benchmark. There are lots of unpredictable conditional branches in this application, so the incidence of mispredictions is higher than normal. You would think that this is the main contributor to poor P4 performance, but actually that is a second order effect, because the predictor on the P4 is far better than on other machines. It's the fact that the code will not fit inside the trace cache, but will fit nicely within Athlon's 64KB I-Cache.
That the software doesn't (seem) to exist to use a cluster instead.
No, really, this isn't one of the "imagine a Beowulf of these..." posts. Here's my point: For the cost of just one of the *processors* that he bought, you can build an *entire machine*, happily running an AthlonXP 2700+. An ENTIRE MACHINE. So, for the cost of the two processors, you've got two machines. For the cost of the SuperMicro motherboard and chassis, you can build two MORE machines. With the cost for the rest of the stuff, there's a fifth machine thrown in to boot.
So, what will be faster - a dual 2.8 GHz Xeon, or 5 AthlonXP 2700+ machines? My money's on the cluster, for this particular application. The Xeon machine has 533 MHz of total memory bandwidth, split between two processors, effectively 266 MHz each. The AthlonXP systems, with 333 MHz each, would have a combined bandwidth of 1,665 MHz - about three times that of the Xeon system.
To make it better, the Athlon is MUCH better than the P3 OR the P4 for integer work, which makes me wonder why he would choose the P4 in the first place. Furthermore, not only does the Athlon do much more in a clock cycle than a P4, you'd have a combined clock speed of 10.8 GHz with the Athlons instead of the 5.6 GHz of the Xeons. Twice the clock speed, AND more work per cycle!
Now, of course, being able to actually USE that clock speed would be dependent upon actually transmitting the messages back and forth, and efficiently dividing the work between the machines. In this sort of situation, where for any one point in time, there would be a great deal of possibilities to compute, it would seem like it would divide up very well.
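The back-of-the-envelope numbers above can be reproduced with a few lines. This is only a sketch of the arithmetic, not a performance model: adding up clock speeds or bus MHz across machines says nothing about how well the work actually divides. The 2.167 GHz actual clock for the AthlonXP 2700+ is an assumption (the post only gives the 10.8 GHz total), and the names are made up.

```python
# Figures quoted in the post; ATHLON_CLOCK_GHZ is an assumed actual clock
# for the AthlonXP 2700+ (the post only states the 10.8 GHz combined total).
XEON_CLOCK_GHZ = 2.8
XEON_COUNT = 2
XEON_BUS_MHZ = 533        # one bus shared by both Xeons

ATHLON_CLOCK_GHZ = 2.167  # assumption
ATHLON_COUNT = 5
ATHLON_BUS_MHZ = 333      # one bus per machine


def cluster_vs_xeon():
    """Return the summed-up figures used in the comparison."""
    xeon_bus_per_cpu = XEON_BUS_MHZ / XEON_COUNT
    athlon_bus_total = ATHLON_BUS_MHZ * ATHLON_COUNT
    return {
        "xeon_clock_total_ghz": XEON_CLOCK_GHZ * XEON_COUNT,
        "athlon_clock_total_ghz": ATHLON_CLOCK_GHZ * ATHLON_COUNT,
        "xeon_bus_per_cpu_mhz": xeon_bus_per_cpu,
        "athlon_bus_total_mhz": athlon_bus_total,
        "bus_ratio": athlon_bus_total / XEON_BUS_MHZ,
    }


if __name__ == "__main__":
    for name, value in cluster_vs_xeon().items():
        print(f"{name}: {value:.2f}")
```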
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
I am a go player. When I play chess I raise the pieces high in the air then slam them down on whatever grid intersection I feel like.
I am not popular at chess clubs.
graspee
Regarding your sig. I checked out /. japan a while ago but I thought I'd take another look. My limited Japanese got me nowhere; Babelfish got me further, but still wouldn't translate a whole page of comments.
What struck me, as I read a FreeBSD post, was the complete lack of trolls and crapflooders. Everything was like "Score: 3 it is interesting". The lowest was 0 for all the AC posts.
In a way it was both refreshing and disappointing. I had been looking forward to Babelfish trying to cope with goatse and "is dying" posts...
Yes, I know this is offtopic. (sarcasm) Forgive me, oh fascist moderators. (/sarcasm)
graspee
Unfortunately, performance is not measured in work-done-per-clock. It's measured in absolute time.
Not always. Performance may be measured in main loop executions per hour, but sometimes it is more useful to measure main loop executions per megajoule (speed vs. energy consumption; there are 3.6 MJ in 1 kWh) or main loop executions per cubic meter hour (speed vs. rack space). And if increasing work done per clock can increase the rate of work done for a given amount of electric power or rented rack space, then bean-counters would find increasing work done per clock a worthy goal.
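As a quick sketch of the executions-per-megajoule idea: the two machines below are entirely hypothetical numbers picked to show that the slower box can still win on energy. The only fixed fact used is that 1 W is 1 J/s, so one hour at P watts is P * 3600 joules (and 1 kWh is 3.6 MJ).

```python
SECONDS_PER_HOUR = 3600
MJ_PER_KWH = 3.6  # 1 kWh = 3.6 MJ


def loops_per_megajoule(loops_per_hour, average_watts):
    """Main-loop executions per megajoule of electrical energy."""
    joules_per_hour = average_watts * SECONDS_PER_HOUR  # 1 W = 1 J/s
    return loops_per_hour / (joules_per_hour / 1e6)


def loops_per_kwh(loops_per_hour, average_watts):
    """Same figure expressed per kilowatt-hour."""
    return loops_per_megajoule(loops_per_hour, average_watts) * MJ_PER_KWH


if __name__ == "__main__":
    # Hypothetical machines: (main loop executions per hour, average watts).
    machines = {"fast_hot": (1_000_000, 200), "slow_cool": (600_000, 100)}
    for name, (rate, watts) in machines.items():
        print(name, round(loops_per_megajoule(rate, watts)), "loops/MJ")
```

With these made-up figures the slower machine does more work per megajoule, which is exactly the case where the bean-counters stop caring about raw speed.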
Will I retire or break 10K?
This thesis shows a system that a guy from McGill University built to use Field Programmable Gate Arrays to generate possible moves. Since FPGAs allow you to do many simple tasks in parallel instead of trying to do one thing at a time very fast as in software, he was able to get an order-of-magnitude speed increase. Special chess computers like Deep Blue used custom-designed ASICs for this same purpose, but FPGAs are a much more accessible solution and will blow a software solution out of the water.
___
Cogito cogito, ergo cogito sum.