Slashdot Mirror


Building A Homemade Chess Supercomputer

nado writes "There's a new article on Chessbase.com which has GM John Nunn showing you his chess-orientated PC upgrade to a double Xeon system, with some Fritz benchmarks." Elsewhere in the article, John Nunn discusses the unique computer needs for chess computation: "One of the problems with currently available processors is that they are not particularly well suited to the integer calculations used for chess. A Pentium 4 will be slower at chess than a Pentium 3 of an equivalent clock speed."

25 of 282 comments

  1. special purpose hardware by robindmorris · · Score: 5, Informative

    IBM's Deep Blue used special purpose chips, so it shouldn't really come as too much of a surprise that general-purpose processors aren't the best for chess computers.

    1. Re:special purpose hardware by bpmcdermott · · Score: 0, Informative

      Special purpose chips? Not really.
      RS/6000 SPs use PowerPC processors.

  2. "ORIENTATED" IS NOT A WORD! by LaminatorX · · Score: 0, Informative
    Perhaps those who seem to have learned their grammar from semi-literate HR staffers need to go through some further orientation. Once they have been properly oriented, they will no longer embarrass themselves when addressing public forums.

    1. Re:"ORIENTATED" IS NOT A WORD! by gantrep · · Score: 5, Informative
    2. Re:"ORIENTATED" IS NOT A WORD! by VoidEngineer · · Score: 4, Informative

      Sure it is.

      I'm checking my 1974 edition of the Merriam-Webster Dictionary right here, and on page 494, it clearly states that "orientated" is the past tense of the verb "orientate".

      I suspect that you mistook the intended verb to be "orient", with a past tense of "oriented". However, when reading the sentence, one will clearly see that "John Nunn" is the subject of the sentence, and the "PC" is the direct object, with "chess" being the indirect object, towards which the "PC" is orientated.

      You are completely correct that a subject is oriented towards a direct object.

      However, as I understand it, a direct object is orientated towards an indirect object, by a subject.

  3. Not so far from the truth... by Thinkit3 · · Score: 1, Informative

    Internally at Intel, we had an employee bonus tied to the "race to a gigahertz" with AMD. And have you seen the price differences for small speed increases at the high end? More RAM would make a much bigger difference.

    --
    -Libertarian secular transhumanist
  4. Actually by Anonymous Coward · · Score: 1, Informative

    Actually, it's not that straightforward.

    There are two routes to performance: a high IPC (instructions per clock) or a high clock rate. AMD chose high IPC, while Intel went for high clock and low IPC. The result: you get the same performance at a higher clock, and better publicity among unsuspecting customers. But trust me, whether you buy a P4 2.0 GHz or an Athlon 2000+, you get the same performance. Don't forget the SSE2 instruction set, though, which really helps in some applications.
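
    The IPC-versus-clock trade-off comes down to one multiplication. The sketch below is only an illustration: the IPC values are made up, and the 1.667 GHz clock for the Athlon is an assumption rather than a figure from this thread.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Throughput is instructions per clock multiplied by clocks per second."""
    return ipc * clock_hz


# Hypothetical numbers, chosen only to show how the two designs can tie.
high_clock_design = instructions_per_second(ipc=1.0, clock_hz=2.0e9)
high_ipc_design = instructions_per_second(ipc=1.2, clock_hz=1.667e9)

# Relative gap between the two throughputs (well under 1% here).
relative_gap = abs(high_clock_design - high_ipc_design) / high_clock_design
print(f"{high_clock_design:.3e} vs {high_ipc_design:.3e}, gap {relative_gap:.4%}")
```

    With these assumed values the lower-clocked chip keeps pace because each of its clocks does about 20% more work.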

    Anyway, don't take me for an Intel fanatic. I actually prefer and use AMD, but truth is truth, no matter how much we want to protect one over another.

  5. That CPU comment looks stupid. by Anonymous Coward · · Score: 4, Informative

    First of all, the whole point of the P4 is to rev up the clock speed, so there are not and cannot be any "equivalent" P3s available (excepting early versions of the P4, which are way obsolete today anyway and irrelevant to the problem at hand).

    Secondly, the Athlons are well known for their stellar integer performance, so who'd use P4s when high IP is needed?

  6. Re:P3 faster then P4 at same clock speed? by Alizarin+Erythrosin · · Score: 2, Informative

    Exactly. A P4 has a longer pipeline than a PIII, so any branch misprediction will result in a longer time penalty for a pipeline flush. The PIII 1 GHz I have sitting on my floor over there --> is the equivalent of about a P4 1.8 GHz.

    Although the longer pipe does allow for ramping clock speeds higher than before (part of the reason AMD added 2 more stages to the Opteron and by association the Athlon64), it needs to be complemented with a more efficient branch prediction algorithm.

    --
    There are only 10 kinds of people in this world... those who understand binary and those who don't
  7. supercomputer? by Anonymous Coward · · Score: 0, Informative

    Did anyone read the freakin' article? It's not a supercomputer, it's a dual CPU (2X 2.8GHz Xeon) machine running Windows. And the guy's brother put it together, fer cryin' out loud! Supercomputer, my arse.

  8. Re:FritzMark by addaon · · Score: 4, Informative

    Fritz is multithreaded. FritzMark, the benchmarking program that uses instruction sequences similar to those in Fritz, is not.

    --

    I've had this sig for three days.
  9. It comes as no surprise that he used Dual Xeons by KingArthur10 · · Score: 4, Informative

    Theoretically, a dual processor machine for chess WOULD be twice as fast as a single processor machine, unlike in normal tasks where dual doesn't mean double. Chess is full of integer operations, but at the same time, conditionals up the ass. To calculate the best move, the computer has to check every possibility a move can have and the possible consequences several moves ahead. The nice thing about a dual processor machine is that each processor can focus on the branches of moves pending from different pieces. While one is calculating what one of the rooks can do, the other can calculate what one of the knights can do. One thing I see, though, is that hyperthreading would probably not do any good for such a game b/c all of the integer ALUs on a processor would be used by one thread, so there wouldn't be any ALUs open for another thread. I think in this sort of application of the Xeon, turning hyperthreading off would help boost performance, although I can't be 100% sure of it. Just a thought.

    --
    I came, I saw, She conquered.
  10. Re:FritzMark by abucior · · Score: 3, Informative

    To be precise, I think it's Deep Fritz that's the multiprocessor version. Fritz by itself is just a single processor version. To quote their blurb from Deep Fritz 7:

    "Deep Fritz is the multi-processor version of Fritz7, which leads the world ranking list since four years. Deep Fritz 7 will run in computers with between one and eight processors. On a dual system the increase in speed is around 85% compared to a single processor of equivalent speed. But even if you have a single processor system the playing strength is greater than that of the regular Fritz7. The 'Deep' version has been improved and enhanced, it has more positional understanding and additional endgame knowledge. This has been achieved without diminishing the program's legendary tactical power. Deep Fritz 7 comes with the full Fritz7 interface and gives you full access to the playchess server."

    Interestingly, I can't find a Deep Fritz 8, which makes me think that either Fritz 8 is inherently multi-processor (which I doubt, since it's cheaper than Deep Fritz 7), or they haven't released a multi-processor version of 8.
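
    For a rough feel of what the quoted 85% figure implies, here is a back-of-the-envelope sketch. It assumes Amdahl's law applies, which is a simplification: the blurb gives no such model, parallel chess search carries extra overhead that Amdahl's law ignores, and the function names are made up for illustration.

```python
def parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to recover p."""
    return (1 - 1 / speedup) * n / (n - 1)


def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n processors when a fraction p of the work runs in parallel."""
    return 1 / ((1 - p) + p / n)


# An 85% speed increase on two processors means a speedup factor of 1.85.
p = parallel_fraction(1.85, 2)
eight_way = amdahl_speedup(p, 8)
print(f"parallel fraction ~ {p:.3f}, projected 8-way speedup ~ {eight_way:.1f}x")
```

    Under that assumption roughly 92% of the work parallelizes, and eight processors would give a bit over 5x rather than 8x.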

  11. Re:P3 faster then P4 at same clock speed? by eddy · · Score: 5, Informative

    That's not insightful, that's dumb.

    There is no great difference in performance between the AthlonXP and Pentium 4 lines. The small difference that exists is largely due to platform specific optimizations in the specific software benchmarked. That's relevant in the real world, sure, but it's not a measure of raw performance.

    I don't think that it is in dispute that Intel went for low IPC/high clock at least partly because it was seen as good for PR -- with the MHz-race and MHz-myth and all.

    It's with some humor we now see them back-pedal as they try to sell their high-performance low-energy processors, which are clocked much lower than the P4s but, like the AthlonXP line, have a higher IPC.

    --
    Belief is the currency of delusion.
  12. Re:P4 vs. P3 by ciroknight · · Score: 3, Informative

    Sigh, another P4 troller. But let's examine what all the P4 has to offer.

    First of all, the P4 is quite superior at doing tasks that are very mundane and repetitive. So simulators, counters, anything that performs the same operation on multiple data sets time and time again run very well on the P4.

    Secondly, with branch prediction, the P4 outraces competitors at some computer games, especially those that are optimised for P4 use. Branch prediction is very helpful also in the field of doing anything more than once because it knows what to expect next, and preps the processor for it.

    What the P4 is bad at are things that change a lot during operation. Things that use different resources at different times, things that seemingly fire random calls for resources, like word processing, desktop editing (like making a website or newspaper or the such), and the like.

    Now that we've cleared that up, my second point. Look at what all AMD has taken from Intel processors. MMX, SSE, and various other byte level optimizations have made the Athlon quite the processor. But AMD isn't about innovation, they are about making money plain and simple. Instead of making engines that try to predict the next move, they just built their processors with the very minimum everything, strapped on a few extra math units and away we go. This technique is very fast, but it's also expensive as most AMD users have learned, because all those extra adders do is add a LOT of ambient heat as the processor clocks up. Intel's processors stay relatively cool and run nearly twice as fast. So the P4 was for the mainstream user, to help spare some time from the physics boundary of the processor technology, and to improve on the things we do most on our computers today (music, videos, games).

    You have been taught a lesson grasshoppa, use it wisely.

    --
    "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
  13. Re:I don't know about you guys... by deuist · · Score: 1, Informative

    Actually, you can learn all about mastering tic-tac-toe from this manual. Or you can test your skills at this Java game.

  14. Re:If newer pc's aren't well suited for chess... by MillionthMonkey · · Score: 2, Informative

    You should obviously change the game to take advantage of the hardware. Imagine it! Three dimensional chess where each piece has weapons, or magical attacks, deformable terrain, and lots of special effects to make use of the latest video cards! I can't wait!

    Been there, done that. See The Chess Variant Pages. It lists about a hundred chess variants, some of which are three dimensional, and has links to places where you can download software to play variants (commercial and otherwise). The site has an applet that you can play different variants on (although it plays a horrible game).

  15. Re:huh? by Anonymous Coward · · Score: 1, Informative
    Gee, in the article he says he's gonna run Win, he never mentions Linux in the article, and Fritz isn't available for Linux. Additionally, your comment was just plain rude.


    I think Slashdot needs an apology mod. When you get so many apology mods, you must apologize.


    I hereby mod this (-1) RTFA abuse

  16. Re:XP vs 2000 in the application (Hyperthreading) by kc8apf · · Score: 4, Informative

    Having just done some serious testing on a couple of hyperthreading capable machines (dual Xeon 3.06GHz and 3.0GHz P4), I can say a bit about its effects on programs. If the code is multi-threaded (I didn't read the article to see if his is and this is meant to be really general) it will be distributed over all the "processors" equally. This works great for programs that have 2 very different threads. However, for an app that is very int or very fp intensive in multiple threads, hyperthreading actually hinders overall throughput.

    This is due to the fact that hyperthreading is still limited to the number of functional units in the processor. For code that is very intensive on a particular type of unit (int or fp), you basically end up with a stall condition on the virtual processor while all the functional units of that type are used by the first processor.

    Hyperthreading is better suited to cases such as a user using a 3d modeling program and an MP3 player. The MP3 player will hopefully end up on one virtual processor and use the int units while the 3d modeling will end up on the other and use the fp units. This would allow both to run in parallel on the same processor.

    So, if you are using a very int or very fp intensive, multi-threaded app, turn off hyperthreading. If you are a typical user running many programs that use both int and fp, then turn it on.

    --
    kc8apf
  17. Re:P4 vs. P3 by steveha · · Score: 5, Informative

    First of all, the P4 is quite superior at doing tasks that are very mundane and repetitive. So simulators, counters, anything that performs the same operation on multiple data sets time and time again run very well on the P4.

    Especially true with RDRAM, which has tremendous throughput but horrible latency.

    The classic example of something the P4 is very good at: encoding frames of video into a compressed format such as MPEG-2. It's just cranking away through a big heap of data in a linear fashion.

    Secondly, with branch prediction, the P4 outraces competitors at some computer games,

    Athlons do branch prediction, too. And they have a lower penalty for failure since their pipelines are shorter.

    Branch prediction is very helpful also in the field of doing anything more than once because it knows what to expect next, and preps the processor for it.

    What?!? Um, actually, branch prediction just keeps the chip's pipeline full. Branch prediction doesn't magically adapt the P4 to process data better, it simply allows the P4 to keep pipelining instructions after a conditional branch. When a prediction is wrong, it must be backed out, which is expensive... but most of the time the prediction is good. (For example, a loop that does something 1000 times will have a conditional branch that will branch the same way 1000 times in a row, and then branch the other way the 1001st time. The prediction would be wrong that 1001st time, but would be correct for most of the other 1000.)
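
    The loop example can be checked with a toy simulation. This is a minimal sketch assuming a textbook two-bit saturating counter, which is not a description of the P4's or the Athlon's actual predictor; the function name and starting states are made up for illustration.

```python
def count_mispredictions(outcomes, state=2):
    """Simulate a two-bit saturating counter branch predictor.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Returns how many branch outcomes were predicted wrongly.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter one step towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# The loop branch goes the same way 1000 times, then the other way once on exit.
loop_branch = [True] * 1000 + [False]
warm_misses = count_mispredictions(loop_branch)           # counter starts at "weakly taken"
cold_misses = count_mispredictions(loop_branch, state=0)  # counter starts at "strongly not taken"
print(warm_misses, cold_misses)
```

    Starting from a "weakly taken" state the only miss is the loop exit; starting cold, the predictor also misses the first two iterations before it catches on, so either way almost all of the 1001 branches are predicted correctly.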

    especially those that are optimised for P4 use.

    It is hardly surprising that a P4 would do better than an Athlon at running P4-optimized code. However, this isn't a useless point, because Intel is the 800-pound gorilla and there are games optimized for the P4, and none for Athlons.

    But AMD isn't about innovation, they are about making money plain and simple. Instead of making engines that try to predict the next move, they just built their processors with the very minimum everything, strapped on a few extra math units and away we go. This technique is very fast, but it's also expensive as most AMD users have learned, because all those extra adders do is add a LOT of ambient heat as the processor clocks up.

    Actually, if you check the Thermal Design Power specs for equivalent-performing AMD and Intel chips, the AMD chips run cooler.

    So the P4 was for the mainstream user, to help spare some time from the physics boundary of the processor technology, and to improve on the things we do most on our computers today (music, videos, games).

    Pure revisionist history. The P4 was designed for super high clock rates. They ripped too much stuff out of the design, so the P4 has some bad weaknesses it didn't need to have. That's why it's so critical to optimize code specifically for the P4 -- if you don't work around the flaws in the P4, it really hurts.

    The Athlon, while it gets more work done per clock than the P4, isn't perfect. Its biggest problem is that it is physically very easy to destroy: you can fry it, or you can even crack its die trying to install a heat sink. The P4 with its heat spreader is much tougher, and with its built-in thermal throttling is more robust. AMD has learned its lesson, though, and the Opteron is robust.

    Intel has aggressively marketed the P4 as The Multimedia Chip, but really an Athlon or a P4 will do well for multimedia stuff. The Opteron, for some specific kinds of tasks, will crush either one, and for other kinds of tasks will be slightly faster. I'm just guessing -- I haven't run benchmarks -- but I suspect that the Opteron will do very well on chess.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  18. Look to SPECint2000 for fast chess machines by akuma(x86) · · Score: 4, Informative

    If you look at SPECint2000, you will find an integer benchmark called 'crafty'. This is a chess simulator with code sequences that are probably similar to what this guy used.

    Intel D875PBZ motherboard (3.0 GHz, Pentium 4 processor with HT Technology) scores 1137

    ASUS A7N8X Motherboard rev. 2.0, AMD Athlon (TM) XP 3200+ scores 1324

    You'll find that P6 derivatives (Banias, Athlon, Opteron etc...) do better on this benchmark. There are lots of unpredictable conditional branches in this application, so the incidence of mispredictions is higher than normal. You would think that this is the main contributor to poor P4 performance, but actually that is a second order effect, because the predictor on the P4 is far better than on other machines. It's the fact that the code will not fit inside the trace cache, but will fit nicely within Athlon's 64KB I-Cache.

  19. Re:It comes as no surprise that he used Dual Xeon by Anonymous Coward · · Score: 1, Informative

    This is completely wrong.

    Alpha-beta search is difficult to parallelize because the branches are *not* independent. This is the trivial mistake that everyone who isn't familiar with computer chess always seems to fall into.
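
    To see why the branches are not independent, here is a minimal alpha-beta sketch on a toy tree of nested lists. The tree, the leaf scores, and the function name are all made up for illustration; a real chess engine is far more elaborate.

```python
def alphabeta(node, alpha, beta, maximizing, visited):
    """Plain alpha-beta minimax over nested lists; integers are leaf scores."""
    if isinstance(node, int):
        visited.append(node)  # record every leaf that actually gets evaluated
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent already has a better option elsewhere
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:  # the bound found in an earlier sibling cuts this subtree
            break
    return best


tree = [[3, 5], [2, 9], [1, 8]]  # three candidate moves, two replies each
visited = []
value = alphabeta(tree, float("-inf"), float("inf"), True, visited)

# Searching the second move in isolation, without the bound from the first move:
alone = []
alphabeta(tree[1], float("-inf"), float("inf"), False, alone)
print(value, visited, alone)
```

    In the sequential search the second and third moves are abandoned after one reply each, because the first move already guaranteed a score of 3. Handed to another processor with no knowledge of that bound, the second move costs both replies, so a naive split does extra work instead of halving the time.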

  20. Performance != speed by yerricde · · Score: 2, Informative

    Unfortunately, performance is not measured in work-done-per-clock. It's measured in absolute time.

    Not always. Performance may be measured in main loop executions per hour, but sometimes it is more useful to measure main loop executions per megajoule (speed vs. energy consumption; there are 3.6 MJ in 1 kWh) or main loop executions per cubic meter hour (speed vs. rack space). And if increasing work done per clock can increase the rate of work done for a given amount of electric power or rented rack space, then bean-counters would find increasing work done per clock a worthy goal.
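
    The unit conversion is easy to sketch. The two machines below are hypothetical, with made-up throughput and power figures, and the function name is an assumption for illustration only.

```python
JOULES_PER_KWH = 1000 * 3600  # 1 kW sustained for 3600 s, i.e. 3.6 MJ


def executions_per_megajoule(executions_per_hour: float, power_watts: float) -> float:
    """Convert a speed figure into an energy-efficiency figure."""
    megajoules_per_hour = power_watts * 3600 / 1e6
    return executions_per_hour / megajoules_per_hour


# Hypothetical machines: B is faster per hour, but A does more work per megajoule.
machine_a = executions_per_megajoule(executions_per_hour=1.0e9, power_watts=100)
machine_b = executions_per_megajoule(executions_per_hour=1.5e9, power_watts=200)
print(f"A: {machine_a:.3e} per MJ, B: {machine_b:.3e} per MJ")
```

    With these assumed numbers the slower machine wins on work per megajoule, which is the kind of comparison a bean-counter paying the power bill would care about.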

    --
    Will I retire or break 10K?
  21. Using FPGAs to really speed things up by Space+Coyote · · Score: 4, Informative

    This thesis shows a system that a guy from McGill University built to use Field Programmable Gate Arrays to generate possible moves. Since FPGAs allow you to do many simple tasks in parallel instead of trying to do one thing at a time very fast as in software, he was able to get an order-of-magnitude speed increase. Special chess computers like Deep Blue used custom-designed ASICs for this same purpose, but FPGAs are a much more accessible solution and will blow a software solution out of the water.

    --
    ___
    Cogito cogito, ergo cogito sum.
  22. Re:So why didn't he get ECC memory? by richy+freeway · · Score: 1, Informative

    Nice to see you're reading the article so thoroughly.

    I quote from the article :

    After reviewing the various possibilities, I decided on the following basic components:

    * Supermicro X5DAL-G dual Xeon motherboard
    * Two 2.8 GHz Xeon processors
    * 200 GB hard disc
    * 1 GB ECC Registered RAM
    * Supermicro SC762-420 case