Slashdot Mirror


Truly Off-The-Shelf PCs Make A Top-500 Cluster

SLiDERPiMP writes: "Yahoo! News is reporting that HP created an 'off-the-shelf' supercomputer, using 256 e-pc's (blech!). What they ended up with is the 'I-Cluster,' a Mandrake Linux-powered [Mandrake, baby ;) ] cluster of 225 PCs that has benchmarked its way into the list of the top 500 most powerful computers in the world. Go over there to check out the full article. It's a good read. Should I worry that practically anyone can now build a supercomputer? Speaking of which, anyone wanna loan me $210,000?" Clusters may be old hat nowadays, but the interesting thing about this one is the degree of customization that HP and France's National Institute for Research in Computer Science did to each machine to make this cluster -- namely, none.

60 of 231 comments (clear)

  1. Can you imagine? (obligatory) by Andorion · · Score: 5, Funny

    Can you imagine a Beowulf cluster of these... erm... clusters?

    -Berj

  2. What was even cooler... by UserChrisCanter4 · · Score: 4, Funny

    Was when the HP-powered cluster started assimilating some of the Compaq multi-Alpha machines as its own.

  3. $210,000 ?? by a.out · · Score: 5, Interesting

    How about $0? Baldric, a student-run Beowulf cluster at the University of Western Ontario, was built on hardware donations. It's not exactly top 500 but it still kicks ass.

  4. Practically anyone? I think not. by Pulzar · · Score: 3, Insightful

    Should I worry that practically anyone can now build a supercomputer?

    Unless "practically anyone" has the funds, the storage room, and the manpower to maintain this monstrosity, there is nothing to worry about.

    And even if anyone could build a supercomputer, what's there to worry about? We don't live in the "War Games" world where supercomputers play chess, tic-tac-toe, and start nuclear wars for fun.

    --
    Never underestimate the bandwidth of a 747 filled with CD-ROMs.
    1. Re:Practically anyone? I think not. by Pulzar · · Score: 3, Informative

      Actually, the worry about the PS2 machines was that their imaging capabilities are strong enough to be used in missile guidance systems. I think he never actually attempted to get any of them, but the US blocked shipments to Iraq just in case.

      --
      Never underestimate the bandwidth of a 747 filled with CD-ROMs.
    2. Re:Practically anyone? I think not. by archen · · Score: 2, Funny

      I think I'd be more worried about someone building a cluster using AMD Athlons, and thus reducing everything within a 500 meter radius to a smoking pile of ash.

    3. Re:Practically anyone? I think not. by KyleCordes · · Score: 3, Insightful

      My guess is that people who understand how to use computers to do modeling for nuclear weapons design would be somewhat harder to come up with than the appropriate degree of computing power.

      Knowing nothing about it, I would nonetheless guess that it's rather non-trivial.

      Keep in mind that nukes were invented without the aid of a Beowulf or a Cray.

  5. Putting them to work. by nairnr · · Score: 5, Informative

    Well, it seems like super clusters are becoming very easy to build hardware-wise. If you throw enough commodity at a problem, it becomes easier. I would think the biggest problem with supercomputers is no longer the hardware itself, but networking, and the programming to take advantage of the hardware. These computers still only really work for something that distributes easily. The biggest factors are now the ability to distribute, and schedule work for each node. The more nodes you engage, the more you hope your problem is CPU bound, so it will scale more.

    Data transfer and message passing are such a big issue that I believe the most important developments are in the networking topologies and hardware for these environments.
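
A rough way to put numbers on the point about distribution is Amdahl's law. The post does not name it, but it is a standard model: if a fraction p of the work parallelizes and the rest is serial, n nodes give a speedup of 1 / ((1 - p) + p / n). The sketch below assumes only that model. It ignores network latency and scheduling cost entirely, and the 225-node figure is simply the cluster size from the story.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` machines when only part of the job parallelizes.

    Assumption: Amdahl's law, with no communication or scheduling overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Illustrative parallel fractions, not measurements of any real workload.
    for p in (0.50, 0.95, 0.99, 1.00):
        speedup = amdahl_speedup(p, 225)
        print(f"parallel fraction {p:.2f}: {speedup:.1f}x on 225 nodes")
```

In this model a job that is 95% parallel gets less than a 20x speedup from 225 nodes, which is one way to see why the problem has to distribute easily before more nodes help.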

    That said, I still want one in my basement :-)

  6. This just goes to show you by bstrahm · · Score: 4, Insightful
    How powerful standard desktop computers are. There are only two orders of magnitude between a normal desktop computer (I refuse to call a Pentium III 733 outdated) and a mainframe computer.


    Now all we need are ways of getting local connections significantly faster (Did someone say Gig Ethernet) to allow faster communication between the nodes and we will be able to scale beyond several hundred and break the top 100. I hear 1gig NICs will be falling in price to under $100 US retail soon...


    How fast do you connect to your cluster ?

    1. Re:This just goes to show you by kindbud · · Score: 3, Insightful

      I hear 1gig NICs will be falling in price to under $100 US retail soon...

      I hear Gigabit switches won't be...

      --
      Edith Keeler Must Die
    2. Re:This just goes to show you by BrentN · · Score: 4, Informative
      The problem with Ethernet in clustering isn't bandwidth, it's the latency.

      The real issue is how parallel-efficient your algorithms are. We do molecular dynamics (MD) on large clusters, and we can get away with slow networks because each node of the cluster has data that is relatively independent of all other nodes - only neighboring nodes must communicate. If you have a case (and most cases are like this) where every node must communicate to every other node, it becomes a more difficult problem to manage. To deal with this, you need a high-speed, low-latency switch like the interconnects in a Cray. The only real choice for that is a crossbar switch, like Myrinet.

      And Myrinet is tres expensive.
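
The difference between the neighbor-only case and the every-node-to-every-node case can be sketched with a simple message count. Everything numeric here is an assumption for illustration: 6 neighbors per node (as in a 3D grid decomposition), sends that are serialized on each node, and a per-message latency you supply yourself. It is not a model of Myrinet or of any particular MD code.

```python
def messages_sent_per_node(nodes: int, pattern: str, neighbors: int = 6) -> int:
    """Messages one node sends per timestep under a communication pattern."""
    if nodes < 2:
        return 0
    if pattern == "nearest-neighbor":
        # Each node only talks to a fixed number of neighbors.
        return min(neighbors, nodes - 1)
    if pattern == "all-to-all":
        # Each node talks to every other node.
        return nodes - 1
    raise ValueError(f"unknown pattern: {pattern!r}")


def latency_cost_per_step(nodes: int, pattern: str,
                          latency_seconds: float, neighbors: int = 6) -> float:
    """Seconds per timestep a node spends on per-message latency alone.

    Assumption: sends are serialized, so latencies simply add up.
    """
    return messages_sent_per_node(nodes, pattern, neighbors) * latency_seconds


if __name__ == "__main__":
    latency = 100e-6  # hypothetical 100 microseconds per message
    for pattern in ("nearest-neighbor", "all-to-all"):
        cost = latency_cost_per_step(225, pattern, latency)
        print(f"{pattern}: {cost * 1e3:.2f} ms of latency per step")
```

With a fixed neighbor count the latency cost per step stays flat as the cluster grows, while in the all-to-all case it grows with the number of nodes, so the same slow network hurts far more.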

    3. Re:This just goes to show you by GlobalEcho · · Score: 2

      I'm curious. Given that neighbor communications are the most important, how do you network the machines? I mean, it seems that a useful design might be to have extra network cards in all the machines, to make overlapping network topologies reminiscent of the physical dependencies.

      Or is the latter too problem-dependent to make this practical?

  7. worry? by motherhead · · Score: 3, Funny

    Should I worry that practically anyone can now build a supercomputer?

    Yes, you should probably worry that practically anyone can build a supercomputer. But you could mitigate all that fear with the fact that not practically anyone can whip up software that takes full advantage of it.

    Thank god there isn't any off the shelf "missile trajectory" software in the CDW catalog. you would hope that any society that can whip together motivated coders to write such code already has access to some pretty spiffy kit.

    (yeah i said "kit"... and I'm from Chicago... I feel like such a wanker.)

  8. An interesting project by bstrahm · · Score: 2, Interesting
    I have not had the chance to play with Beowulf clusters at all. Do I still get a local desktop on certain clustered computers ???


    The ultimate Linux selling tool, every linux box in your company is a node in a cluster, add a few servers for extra speed, add a few computers to provide file I/O and backup capability, and you have one of the fastest supercomputers available to your company without having to spend an extra dime (everyone needs a desktop anyway). Can you imagine the extra cycles available for simulation, whatever when people start going home at 5 PM.

    1. Re:An interesting project by bstrahm · · Score: 2, Insightful
      Absolutely none I would hope... The dataset resides on a centrally managed server, and because they are running a Linux desktop I get to laugh at what a trojan horse or virus can do to a user account on a Unix box. This can also be removed as a problem by putting a keyboard, mouse and monitor on the desktop and locking the PC into a cabinet under each desk... What the user can't touch the user can't screw up.


      That is a serious problem though, and one I assume Beowulf clusters will take care of, what if a node goes down in the middle of processing, how does the cluster respond to it ?

    2. Re:An interesting project by tolldog · · Score: 2

      Software such as Platform's LSF takes care of this magically... it even allows for checkpointing, assuming your task allows it. Because my render software didn't really do checkpointing, I had to add that in to my wrapper code.
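
A minimal sketch of what wrapper-level checkpointing can look like, assuming the job is a list of independent frames. This is not LSF's API and not the poster's actual wrapper: `run_with_checkpoint`, `render_frame`, and the JSON checkpoint file are all hypothetical names made up for the illustration.

```python
import json
import os


def run_with_checkpoint(frames, render_frame, checkpoint_path):
    """Render each frame once, recording finished frames so a rerun can resume.

    `render_frame` is any callable taking a frame number (hypothetical here).
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = set(json.load(fh))

    for frame in frames:
        if frame in done:
            continue  # already finished in an earlier run
        render_frame(frame)  # if this raises or the node dies, nothing is recorded
        done.add(frame)
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(sorted(done), fh)
        os.replace(tmp_path, checkpoint_path)
    return done
```

If a node goes down halfway through, rerunning the same call with the same checkpoint file skips the frames that already finished and picks up at the first unfinished one.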

      We do use desktops at night to work with our render farm. Platform has some cool tools to work with for such environments. I have never tried LSF in conjunction with PVM or MPI but they have support for it, so I imagine it does pretty well.

      --
      -I just work here... how am I supposed to know?
  9. Re:Imagine... by Bonker · · Score: 3

    A Beowulf cluster of E-machines?

    I dunno. It's kinda lacking when you compare it to all the other Beowulf clusters we've considered.

    --
    The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
  10. Yahoo! news by Anonymous Coward · · Score: 3, Funny

    Shouldn't that be Yahoo! Serious News now?

  11. Spare cycle approach is interesting... by alexhmit01 · · Score: 2

    They mention in the end that they are working with Microsoft to support this approach. They also suggest using spare cycles. Unlike SETI@home, where you download some stuff, work on it, send it back, this appears to be a system where the power scales linearly with nodes.

    Windows support makes a difference. Take a large company (10,000+ in a single location) that has some intensive projects. In this case, they could just drop the $210,000 (call it $750,000 with installation, support, etc.) and put it in a room.

    However, a smaller shop, say, 50-250 employees, could install this software on the staff's machines. They rarely use their computers to capacity, and can probably contribute 90% of the CPU 90% of the time. This approach could let people doing giant calculations do so cheaply.
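
As a back-of-the-envelope check on the spare-cycle idea, the 90%-of-the-CPU, 90%-of-the-time guess can be turned into "dedicated machine equivalents". The sketch assumes those two fractions simply multiply and that the desktops are comparable to dedicated nodes; it ignores network and scheduling overhead.

```python
def machine_equivalents(desktops: int,
                        cpu_share: float = 0.9,
                        time_share: float = 0.9) -> float:
    """Dedicated-node equivalents donated by a pool of office desktops.

    Assumption: the CPU share and time share are independent and multiply.
    """
    return desktops * cpu_share * time_share


if __name__ == "__main__":
    for seats in (50, 250):
        print(f"{seats} desktops ~ {machine_equivalents(seats):.1f} dedicated machines")
```

Under these assumptions a 250-seat shop comes out at roughly 200 machine equivalents, in the same range as the 225-node cluster in the story.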

    The real question, however, is who needs that kind of horsepower. For those that need the horsepower, are the savings with off-the-shelf components meaningful?

    It's a tremendous accomplishment, and I wonder how many of the changes were new (vs. the Beowulf clusters that we always hear about). However, if this fills a need, congratulations, it's an impressive accomplishment regardless.

    Alex

  12. okay um.. by Wakko+Warner · · Score: 2

    i'll give you $210,000 so you can do exactly what with your new supercomputer?

    also, who will pay your power bills?

    i don't get this "drool factor" thing some people have for supercomputers... sure, they're cool and all, but they can do exactly nothing you would want or need to do on a day-to-day basis...

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
    1. Re:okay um.. by dillon_rinker · · Score: 2

      Photo realistic first-person shooters, 75 frames per second.

    2. Re:okay um.. by sharkey · · Score: 2

      The combined vibration of the cooling fans would make timothy's pr0n browsing that much better.

      THAT'S the drool factor, so to speak.

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
    3. Re:okay um.. by fgodfrey · · Score: 2

      Kinda depends on what you do for a living, doesn't it? I'd like to see you predict the weather accurately (well, I guess I should say "as accurately as the standard weather forecast") over the coming week or two without a supercomputer. Or design an aerodynamic car. Or an airplane. Or a new medical drug. Or a spaceship. Or any number of defense/military applications. There is plenty that a supercomputer can do that people do on a day to day basis. You just have to be in the right line of work. (And by the way, a supercomputer *can* do everything a desktop machine could do, it'd just be rather pointless to use a machine that big as a desktop...)

      --
      Go Badgers! -- #include "std/disclaimer.h"
  13. Okay, so I'm curious: by Saint+Aardvark · · Score: 3, Funny

    Sez the cost was $210k US w/o cabling...why the qualification? What *would* cabling for 225-odd boxen cost?

  14. It's at #385 in the list by CormacJ · · Score: 4, Informative
    The latest top 500 list is here : http://www.top500.org/list/2001/06/

    The cluster is at #385

  15. Re:Imagine... by xkenny13 · · Score: 2, Insightful
    • A Beowulf cluster of E-machines?

      I dunno. It's kinda lacking when you compare it to all the other Beowulf clusters we've considered.

    Maybe so, but this cluster still made it into the top 500 most powerful computers.

    Now, imagine a cluster of Athlon 1.4GHz machines doing the same thing ... now there's a drool factor, and probably cheaper to boot!!

  16. distributed supercomputer cluster by maxpublic · · Score: 2, Interesting

    What I'd like to see is a shot at a distributed supercomputer cluster utilizing the spare cpu cycles of computers on high-speed internet connections (cable or DSL). Since efficiency would be remarkably degraded by slow communication times and the fact that many of these computers would be running Office (ahem), you'd have to scale up at least one order of magnitude.

    Technically I can't see why this wouldn't be feasible. It would be beyond SETI and protein folding in that the 'control center' could change what problem was being worked on at any time. It may not be incredibly practical compared to setting up specific machines in a single large room, but it would be free and have a potential user base in the hundreds of thousands or millions.

    Imagine: instead of the same SETI screen output time and again, you'd get a message on your SS saying "would you like to see what your computer is working on right now? How about high-pressure fluid dynamics in environment x?"

    Max

    --
    My god carries a hammer. Your god died nailed to a tree. Any questions?
  17. Re:Technicly.. by flegged · · Score: 2, Informative

    Yes, we all saw the Apple ads for the G4 being capable of 1GFlop. What you didn't see, was that the Pentium III 500 was capable of ~2GFlop. Now that can run at 1GHz. You also didn't see that AMD's Athlon, having a superscalar FPU, is faster than a P3. And now they can run at 1.6GHz. The P4 has new instructions to speed up certain types of multimedia processing as well. By contrast, the G4 is only now approaching 1GHz. Go figure (as you Americans say.. :o)

    An Apple is not a supercomputer.

    RISC does not mean faster. It allows for simpler design which can lead to increased speed, but as we have seen, Apple have consistently failed to compete with Intel and AMD (not that they even make their own chips...). CISC is actually a good idea, since with the huge speed differential between CPU and memory, and the introduction of cache, the bottleneck in any system is the memory bandwidth. Think for a moment: why did Intel add instructions to the x86 architecture in every iteration? Because it's faster having one instruction doing something complex than many simple ones, simply because of the reduced frequency of memory access. In today's computers, RISC doesn't mean anything, since memory, storage and network bandwidth are the bottleneck.

    The moral of this story:
    1: Don't believe Apple's advertising.
    2: Don't believe what a Mac Zealot will tell you about RISC or some other claptrap.
    3: Get ppc Mandrake if you're unfortunate enough to have actually bought a G4.

    Yes, I use Macs. Daily. And I hate Apple. But my PHB is a Mac zealot. It frightens the hell out of me seeing all our company's work being stored on a Mac (OS9 (no pre-emption, memory protection, RAID, journalling, or anything you would want for a server...)).

    --

    "I think he was truly surprised at how little I cared about how big a market the Mac had" - Linus on Jobs
  18. Re: RISC by Turq · · Score: 2, Informative

    While I agree with what much of flegged said, his/her post implies that modern Intel/AMD CPUs -are- largely CISC devices. This simply isn't the case. Both (the AMD moreso though) make heavy use of RISC-type design and technique.

    RISC does matter, or Intel and AMD wouldn't be using it.

    --
    - Turq - "That's TRON, he fights for the users."
  19. Power Usage by Xunker · · Score: 2, Interesting

    I wasn't able to get hard facts about this, so I'm going to throw out the question for general "gee whiz" value.

    I was pondering the computrons per watt of a cluster such as this versus a real honest-to-Bob supercomputer (something from Cray/Tera/SGI, for example). We can assume that each machine in HP's cluster uses probably 60-80 watts (because they're sans monitor), so you're looking at roughly 13.5 to 18 kilowatts for 225 machines to power this thing. I'm not sure what a Cray T3E uses, but I have to think it's nowhere near that because of all the redundancy that PC clusters use (one power supply, chipset, etc. per core).

    Though, I'm sure if you can afford either a Cray or 256 PCs, you can afford the power bills, too. If you have to ask how much it will cost you, you can't afford it. But while CIP (Cluster of Inexpensive PCs) is cheaper, is it as efficient?
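
A quick sketch of the arithmetic behind this kind of estimate, assuming the 60-80 watt per-machine guess from the post and 225 nodes. Switches, cooling, and power-supply losses are left out, so the real figure would be higher. Note that watts measure the draw at any instant, while kilowatt-hours only appear once you multiply by running time.

```python
def cluster_power_kw(nodes: int, watts_per_node: float) -> float:
    """Continuous power draw of the cluster in kilowatts (nodes only)."""
    return nodes * watts_per_node / 1000.0


def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy consumed over a period, in kilowatt-hours."""
    return power_kw * hours


if __name__ == "__main__":
    for watts in (60, 80):  # per-node guesses taken from the post
        kw = cluster_power_kw(225, watts)
        print(f"{watts} W/node: {kw:.1f} kW, {energy_kwh(kw, 24):.0f} kWh per day")
```

At 225 nodes the range works out to 13.5 to 18 kW of continuous draw, or 324 to 432 kWh for each day the cluster runs flat out.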

    --
    Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
  20. On the preponderance of "Kit" by ColGraff · · Score: 2

    As we all know, "kit" is a british slang term for computer hardware. What many people may not know is that it is also the secret weapon in a British campaign of cultural assimilation.

    Yes, you heard me right. Cultural assimilation. The brits are sick of seeing Mickey Mouse and Donald Duck and the sexy chick from Enterprise on TVs all over the world, and they're going to do something about it.

    The British invented the English language, and in many circles certain British accents are perceived as more sophisticated or upper-class. They're capitalizing on that by inventing slang terms - "kit" being among the forerunners - that other English-speaking peoples appropriate. Thus it begins.

    Soon, British TV will move off of PBS, where it belongs. British computer games and hardware will surpass American in popularity. And there is nothing - absolutely nothing - we can do about it.

    (In case you hadn't realized it, yes this is a joke. And yes, I know it's offtopic and will be moderated as such. But this was fun to write.)

    --
    I'm the stranger...posting to /.
    1. Re:On the preponderance of "Kit" by Seanasy · · Score: 2

      Your plan is futile.


      Like most Americans, when I hear the word "kit" used in a technological context I instantly think of the popular 80's television show "Knight Rider" starring David Hasselhoff.



      Now if you were German...

  21. what about... by Phalkin · · Score: 2, Interesting

    using a bunch of those 1U dual athlon rackmount boxes for this? seems like it would reduce the overall footprint by several orders of magnitude, as well as easily doubling (if not tripling) the power. comments, anyone?

    --
    I stole this sig.
  22. Air strikes against computers? by Dr.+Spork · · Score: 2, Interesting
    I bet you HP and many other tech companies have people who called the government telling them they should bomb "enemy" computers because they are "weapons systems." With this cluster, we see this justification could apply to any computer whatsoever.

    Then, the US gets tired of bombing, and HP sells them new machines. Soon thereafter, we decide their new "good" dictator is just as bad as their old "bad" dictator, and the cycle begins again.

    1. Re:Air strikes against computers? by WillSeattle · · Score: 2

      So the question then is, is this good for Open Source computing? I mean, this gives us dollar metrics like the IDG and other measurement people want, but the end work product couldn't be described as beneficial, so it's really not that good that this happens.

      Of course, the same could be argued about a Win2K or WinXP hacked clone - but the utility in solving nuclear equations and modeling explosions is not as high.

      Even a suitcase bomb requires that you have:
      1. component parts - detonator, nuclear material, shaping material
      2. supercomputer to model the charge shape and impact velocities
      3. willingness to deliver the material (even if lead shielded)

      We know that they have #3, this HP open source supercomputer may give them #2, now they only have to pick up #1 - maybe Pakistan or the Taliban have such and will sell them to raise cash or create more problems for us.

      --
      --- Will in Seattle - What are you doing to fight the War?
    2. Re:Air strikes against computers? by Dr.+Spork · · Score: 2
      I honestly doubt that terrorist organizations would go through all the trouble of sending off people to get educated in physics, having them build a supercluster and compose simulation software, throw up and test some designs, etc. Much more likely, they'll buy finished bombs, or they'll at least buy pre-tested blueprints.

      This might be different for a country like Iraq, which already has many educated physicists and a realistic chance of actually doing all this work from scratch. Of course, the IAEA is doing random inspections there all the time, but maybe they could "disguise" their number-crunching supercomputer as 256 separate workstation terminals for all the government clerks who write email. By night, though, it's Linuxtime.

      You're right about the missing material, but I'm sure someone somewhere will be willing to sell 10kg of it... (the more people we bomb, the more likely that seems).

      spork

  23. Re:what scares me is... by Fencepost · · Score: 2

    It's probably out of date because processors that speed are either already unavailable or will be shortly. They could presumably underclock, but it makes more sense to just tweak the model number slightly.

    --
    fencepost
    just a little off
  24. An idea for a business plan by motherhead · · Score: 2

    Hey remember all those completely and hopelessly out of work Russian PhD CS grads sitting around and starving and writing strong crypto software for the Russian Mafia? You might even have heard that the Russian Mafia is always looking to explore new business ideas and strategy.

    Well hell wouldn't this be a great business opportunity for both of them? Call it RMBM (Russian Mafia Business Machines), and then build cheap super-clusters and turnkey code for "specialized" clients. The possibilities are endless.

    This is where you get them now: Support. You sell them the machines at a 25% markup and then charge a ridiculous annual service agreement.

    From the presentation:
    "Using "borrowed" Post-CCCP Mi-8TV assault/commando choppers RMBM support staff can be deployed to your corner of the desert in a matter of hours! Lets see IBM match that! Not even Larry Ellison and his personal Mig can touch that! (canned laugh track)"

    I don't know, maybe not.

  25. Downturn clustering by Daniel+Quinlan · · Score: 2, Funny

    I guess this is what you do with all of that extra inventory. Clusters coming from Gateway and Dell next.

  26. Notice by poot_rootbeer · · Score: 2, Funny


    Anyone who posts a comment containing the word "Beowulf" will be shot.

    Including me.

    Uh-oh.

  27. Kit == Rig by kindbud · · Score: 2, Informative

    As we all know, "kit" is a british slang term for computer hardware.

    No it isn't. That's just the only context in which you've heard it used (translation: you read too much Slashdot, and should get out more often). "Kit" is the British equivalent of the American "rig" when used in this context. It is not used specifically to refer to computers.

    --
    Edith Keeler Must Die
  28. amd for less by Jaeger- · · Score: 2, Interesting

    i think i could build a better supercomputer
    for less money with amd procs/mobos/etc

    1gig tbird $100
    decent cheap amd mobo w/integrated vid/snd/net $100
    256meg ram $25
    15gig ide $50
    floppy drive (needed??) $15
    cdrom (needed??) $25
    decent nic $20
    cheap case $40

    total $375
    subtract 10% (due to quantity purchase) gives less than $350 total each

    pay a bunch of college kids $10/hour
    they'd build 2 machines/hour
    so 125hrs total to build comps is $1250

    $350 x 250 machines is $87,500
    add in (8) good quality 32 port switches @ $200 each and you're up another $2k
    add in 250ish cat5 cables for another $1k (who wants to make them, buy for $3-4 ea)

    your total cost is way under $100k

    or even better
    use the new SMP durons, 1gig each

    not much more $$ since durons are cheap
    add like $50 for the 2nd proc (total $150 for 2 duron 1gig smps, unsure if thats reasonable pricing) and another $50 to mobo cost for dual smp mobo

    thats $450 ea box
    250 x $450 gives us $112.5k for the boxes
    add in networking stuff etc
    less than $125k prob
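
The estimate above can be tallied in a few lines. The part prices, the 10% quantity discount, the $10/hour labor at 2 machines per hour, and the 8 switches at $200 each all come from the post. The $4 cable price is the top of the post's "$3-4 ea" range, and the function names are just made up for this sketch.

```python
# Part prices per node, as listed in the post (US dollars).
PARTS = {
    "1 GHz T-bird CPU": 100,
    "motherboard with integrated video/sound/net": 100,
    "256 MB RAM": 25,
    "15 GB IDE disk": 50,
    "floppy drive": 15,
    "CD-ROM": 25,
    "NIC": 20,
    "case": 40,
}


def node_cost(parts: dict, discount: float = 0.10) -> float:
    """Per-node hardware cost after a quantity discount."""
    return sum(parts.values()) * (1.0 - discount)


def cluster_cost(nodes: int, per_node: float, hourly_wage: float = 10.0,
                 builds_per_hour: float = 2.0, switches: int = 8,
                 switch_price: float = 200.0, cable_price: float = 4.0) -> float:
    """Total cost: node hardware + assembly labor + switches + one cable per node."""
    hardware = nodes * per_node
    labor = nodes / builds_per_hour * hourly_wage
    network = switches * switch_price + nodes * cable_price
    return hardware + labor + network


if __name__ == "__main__":
    print(f"parts per node: ${sum(PARTS.values())}")
    print(f"after 10% discount: ${node_cost(PARTS):.2f}")
    print(f"250 single-CPU nodes at $350: ${cluster_cost(250, 350):,.0f}")
    print(f"250 dual-CPU nodes at $450: ${cluster_cost(250, 450):,.0f}")
```

The parts add up to $375, or $337.50 after the discount, which the post rounds up to $350. At $350 per node the 250-node total is $91,350, and at $450 per node it is $116,350, so both of the post's "under $100k" and "under $125k" claims hold on its own numbers.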

    man i want to do this
    need someone with $$ =P

    --wayne =)

    --
    E V E R Y T H I N G I W R I T E I S F A L S E
    1. Re:amd for less by Graspee_Leemoor · · Score: 2, Insightful

      why in god's name would you want a cd and a floppy for each node ?

      Also why have a mb with integrated sound (and video) - this is a beowulf cluster, not a John Cage album...

      graspee

  29. Re:The amusing implication of this by geekoid · · Score: 2

    if they were easy to get, don't you think they would have used one?

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  30. Yawn. Another One. by 4of12 · · Score: 3, Insightful

    You know this Beowulf business is getting to be pretty staid and routine by now.

    In fact, I'd almost say it would be newsworthy if there were any organization (university, company, govt lab) that had not yet built "a supercomputer from the COTS components".

    What I'd like to see now is more metrics (some of which the article does, admittedly, reveal).

    1. hardware cost per FLOP (everyone already tells you this)
    2. FLOPS per human time to build
    3. FLOPS per sysadmin time to maintain
    4. FLOPS per kilowatt of electricity
    5. FLOPS per cubic foot of rack space
    6. can it run smoothly if Bad Andy goes behind the rack and unplugs a few network connections, a few power cords to some nodes?
    Everyone knows that you can spend your own time scouring dumpsters for cast-off computers and coaxing them to life, bringing up an old 486 with an ISA 10bT card as a member of your cluster. Unless you're doing it for your own educational benefit, it's just not worth it.

    Don't get me wrong. I love these clusters and want to use them. It's just that, in 2001, their mere existence is no longer as exciting as it was in the mid 1990s.

    Nowadays, I care more about ease of use and ease of maintenance, taking the low cost of a Beowulf cluster as a given.

    With the size of these clusters going up and the ratio of hardware cost to human time constantly decreasing, I'd be more impressed to see how a system with many hundreds of nodes was brought up in a short time, never rebooted for a year, even as 13 of the nodes developed various problems and became unproductive members of the cluster.

    --
    "Provided by the management for your protection."
  31. News worthy? by tolldog · · Score: 2

    I fail to see what is impressive about this.
    It looks like the wheel reinvented several times.
    For cluster installs on several machines, use system imager.
    For using and controlling a cluster of machines for various tasks, use LSF.

    The number of machines is pathetic too ... 225 @ 733 MHz? That makes it to #385?
    How sad. I need to benchmark our render farm (200+ boxes, 120 are dual 1 GHz) and see what we can come up with. I know it is higher than that... and we have a smaller install for the industry.

    I looked for info to spec our machines but I couldn't find any info.... any help?

    --
    -I just work here... how am I supposed to know?
  32. Incorrect citation by kindbud · · Score: 2
    Yahoo! News is reporting ...

    No, it isn't. Yahoo! News is repeating a story which, if you'd bother to read the byline they wrote, was

    By CNET News.com Staff CNET News.com

    The article on CNET's site should be getting the Slashdot treatment, don't you think?
    --
    Edith Keeler Must Die
  33. Yes, it was = you're right by ColGraff · · Score: 2

    I refer to "the british" meaning the residents of the island now known as England, not in the sense of citizens of the modern political entity. I probably shouldn't have done that, but hey - it wasn't meant to be accurate anyway.

    --
    I'm the stranger...posting to /.
    1. Re:Yes, it was = you're right by pmc · · Score: 2

      But there is no island now known as England.

  34. Thanks for the info by ColGraff · · Score: 2

    You're right, I should get out more.

    Of course, you know that makes the threat presented by the word even more insidious. If non-techies can use it well - I shudder to think of the potential for linguistic infiltration!

    --
    I'm the stranger...posting to /.
  35. For those wondering about nuclear testing... by segfaultcoredump · · Score: 5, Insightful

    No, a beowulf cluster is the last thing that one would use for nuclear simulation.

    While great at highly parallel tasks that require very little synchronization between threads (think code cracking), nuclear testing (and almost all other fluid dynamic problems) generally requires all of the cpu's to have high speed access to all of the memory. So one needs a huge shared memory system (think Cray or Sun StarCat).

    And for this reason, I find the top 500 list to be a bit misleading in these days of massively parallel systems. It's great as a test of how many flops the system can crank out, but it does not take into account the memory bandwidth between the cpu's, and that is often more important than raw cpu horsepower.

    1. Re:For those wondering about nuclear testing... by tolldog · · Score: 3, Informative

      Ahh... Somebody else who gets it...

      I find too that people assume that an "X" type of cluster will solve all problems, regardless of what they are. Each cluster type serves a purpose. Cray and then SGI spent time developing the Cray Link for a reason. Sun, IBM, HP and others have gotten into the game as well. Sometimes you need a ton of procs with access to the same memory, sometimes the task divides well.

      I see this from almost the opposite side of the spectrum with rendering. To render a shot, you can divide the task amongst ignorant machines. They just need to know what they are working on. The cleverness goes into the management of these machines. A place where the massively parallel machines would be nice is rendering a single frame. After the renderer's initial load of the scene file and prepping for render, the task can be divided amongst many processors on the same machine. To divide it beowulf style would throttle the network with the memory sharing over the ethernet ports.

      So from my experience:

      big data, long time ... massively parallel machine
      big data, short time ... generic cluster with smart master
      little data, long time ... beowulf style cluster
      little data, short time ... generic cluster with smart master

      --
      -I just work here... how am I supposed to know?
  36. Power consumption of old CPUs by muffel · · Score: 2, Insightful
    I'm wondering: if you use many old CPUs (486, early Pentiums) vs. not so many recent ones (PIII/Athlon, ~1GHz), wouldn't you pay more on your electricity bill than you saved on the hardware?

    Is there anything like a MIPS/Wh rating for CPUs? (Would thermodynamics dictate a certain minimum?)

    With a separate power supply and hard disk per CPU (i.e. complete box) I would imagine that old PCs generate a *lot* of heat per CPU cycle.

    Has anybody done measurements/calculations on this?

    --

    bla
    1. Re:Power consumption of old CPUs by jmv · · Score: 3, Informative

      Actually, the best MIPS/Wh is probably with the slower versions of the current laptop chips. Maybe portable G3/G4?

      Also, I don't think you'd get much useful stuff done with early Pentiums and 486s. Consider that a P4 2 GHz has 20 times the clock speed and probably does twice as much per cycle, so it's ~40X faster. Now, if you connect 40 P100s together, unless your problem is completely parallel (like breaking keys, as opposed to most linear algebra), you're going to lose at least a factor of 2 there. This means that in order to equal 1 P4 @ 2 GHz, you'll need almost 100 Pentium 100 MHz. This means that 10 P4s would be like a thousand Pentiums. At these numbers, it's going to cost so much in networking and power...
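
      A back-of-the-envelope version of that estimate, using only the guesses above (20x clock, 2x work per cycle, a factor-of-2 loss to parallelism). The function and parameter names are made up for illustration, and the penalty is "at least" 2, so this is a lower bound:

```python
def p100s_per_p4(clock_ratio=20.0, work_per_cycle_ratio=2.0, parallel_penalty=2.0):
    """Estimate how many Pentium 100s it takes to match one P4 at 2 GHz.

    All three defaults are the rough guesses from the comment, not measurements.
    """
    raw_speedup = clock_ratio * work_per_cycle_ratio  # ~40x on a single CPU
    return raw_speedup * parallel_penalty             # nodes needed once parallel loss is counted


nodes = p100s_per_p4()
print(nodes)       # 80.0, i.e. "almost 100"
print(10 * nodes)  # 800.0, i.e. roughly a thousand Pentiums for 10 P4s
```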

      I'd say (pure opinion here) the slowest you'd want to have today is something like a Duron 1 GHz and the best MIPS/$ is probably with a bunch of dual Athlon 1.4 GHz (a dual is not cheaper than 2 singles, but you get more done because of parallelism issues).

    2. Re:Power consumption of old CPUs by Paul+Komarek · · Score: 3, Informative

      My experience doesn't suggest that the P4 does twice as much per cycle. I'm seeing P4s do a fair bit less than the P3 per cycle, and the P3, P2, and PPro cores didn't seem *that* much faster per clock than the original Pentiums. My gut tells me that the P4 doesn't do any more than the original Pentiums per clock cycle, and the only thing they have going for them is Intel's ability to manufacture them at high clock speeds.

      If you really want a CPU that does a lot in a single cycle, look at the IBM POWER series. IIRC, on the floating point side, a 2xx MHz POWER III is not too far from an Alpha 21264 at 733 MHz. And now there are 1.1GHz and 1.@GHz POWER IV chips, in the new IBM p690 machines. I don't know how they compare to the POWER III per cycle, though, because the POWER IV opens a whole new (good) can of worms.

      -Paul Komarek

    3. Re:Power consumption of old CPUs by jmv · · Score: 2

      I'm seeing P4s do a fair bit less than the P3 per cycle, and the P3, P2, and PPro cores didn't seem *that* much faster per clock than the original Pentiums.

      For "unoptimized" applications, that may be approximately true. However, if you're going to build a cluster, you're also going to optimize your code for it. What kills the P4 is branch misprediction. However, by carefully writing your code, you can avoid most of these problems. Also, most of the big clusters are for numerical code, for which branch prediction does well (plus you can do lots of loop unrolling).

      Another thing that the P4 (and PIII) has is SSE. On a P4 2 GHz, you can theoretically do 8 gflops. In practice, if you write good code, you'll get between 1 and 2 gflops. On a plain Pentium, the FPU is not pipelined, so a P100 (I'm guessing here) probably has a theoretical maximum of ~25 mflops with a performance for real code around ~10 mflops. That means the P4 is probably 100-200 times faster at floating point than a P100.
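
      The same numbers as a quick sketch. The 4 flops per cycle is just what the 8 gflops figure implies at 2 GHz (one packed SSE operation on four single-precision floats per cycle), and the P100 figures are the guesses above, not benchmarks:

```python
def peak_flops(clock_hz, flops_per_cycle):
    """Theoretical peak: clock rate times floating-point operations per cycle."""
    return clock_hz * flops_per_cycle


# P4 at 2 GHz, assuming 4 single-precision flops per cycle with SSE.
p4_peak = peak_flops(2e9, 4)

# Sustained figures quoted in the comment (guesses, in flops).
p4_real_low, p4_real_high = 1e9, 2e9
p100_real = 10e6

print(p4_peak / 1e9)             # theoretical gflops
print(p4_real_low / p100_real)   # low end of the speedup over a P100
print(p4_real_high / p100_real)  # high end of the speedup over a P100
```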

      Of course, you're right in saying that other architectures are probably faster than the P4. ...and by the way I'm not saying that the P4 is great... but if you're doing numerical stuff and using SSE, it's VERY fast (in my experience, 3DNow! has been faster than SSE at the same clock rate, but 2GHz is too much higher than the fastest Athlon).

    4. Re:Power consumption of old CPUs by Paul+Komarek · · Score: 2

      My experience with optimizing research code for a particular platform: it will never happen, unless you're starting from scratch and expect your code to have a short lifetime. We've got libraries that are several years old, written to be portable across various unix and Windows platforms, running on MIPS, Alpha, x86, SPARC and PA-RISC. These libraries aren't optimized for any particular platform, and nobody has time to mess with platform-specific optimizations.

      I've never tried optimizing for SSE, but someone in lab did once. He reported higher performance when doing his computations element-at-a-time than vector-at-a-time. His conclusion, for his particular application, was that memory latency was killing SSE. He was better off doing lots of work on a few numbers, than he was doing fancy stuff to a lot of numbers with SSE. On the other hand, some people have had some luck with SSE optimized FFTs, or so I've heard.

      At 2GHz, I'll bet that you're better off doing element computations than vector computations because of the radical difference in memory-versus-processor performance, if the P4's L1 takes more than a cycle to feed the registers. Otherwise, do whatever fits in L1 and can be prefetched -- like elementwise computations on long vectors. Anyone have any real or anecdotal evidence to refute or support this theory?

      In the end I think that platform-specific optimizations are a waste of time for research code. I seem to remember some people eventually including hooks in BLAS or LAPACK to allow the user to specify cache sizes; and FFTW does some runtime experiments for optimization. But my guess is that, overall, SSE, 3DNow!, and even AltiVec are irrelevant to most computer researchers. I'll bet they're highly relevant to most embedded engineers or many robotics researchers, i.e. people targeting specific hardware.

      -Paul Komarek

    5. Re:Power consumption of old CPUs by jmv · · Score: 2

      I've never tried optimizing for SSE, but someone in lab did once. He reported higher performance when doing his computations element-at-a-time than vector-at-a-time. His conclusion, for his particular application, was that memory latency was killing SSE.

      Maybe his problem was really special, but most likely he didn't know how to write SSE code. First, if you write your code correctly, the worst you can do is a bit better than the x87, because you can use SSE with scalars and take advantage of the linear register model (as opposed to a stack).

      The only time I've converted some code for SSE, I got a 2-3x improvement in speed. There is one thing you really have to be careful about when writing SSE code: ALIGN EVERYTHING TO 16 BYTES. That's very important. The "movaps" (move aligned packed single) is more than twice as fast as the "movups" (unaligned) when data is unaligned (movups on aligned is not too bad). That makes all the difference. Also, sometimes you need to change the way you do your computations.
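
      To make the 16-byte rule concrete, here's the address arithmetic only, sketched in Python rather than assembly. The helper names are made up, and the addresses are arbitrary examples; real code would get aligned buffers from the compiler or allocator.

```python
ALIGNMENT = 16  # bytes: one SSE register holds four 4-byte floats


def is_aligned(addr, alignment=ALIGNMENT):
    """True if addr is a multiple of the alignment."""
    return addr % alignment == 0


def align_up(addr, alignment=ALIGNMENT):
    """Round addr up to the next multiple of alignment (must be a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


addr = 0x1004                     # arbitrary unaligned example address
print(is_aligned(addr))           # False
print(hex(align_up(addr)))        # 0x1010
print(is_aligned(align_up(addr))) # True
```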

      For the case I have (optimizing a speech recognition engine), just by changing the order of the loops (inner vs. outer), we got a 2-3x improvement (still with x87) because of increased L1 hit rate. Then when switching to SSE, it ended up as a 5x improvement over the original code. Had I just re-written the code in SSE (without cache optimization), the gain would have been around 25%, because memory would still be the bottleneck.
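
      Here's a toy model of why loop order matters for the L1 hit rate. It is not the speech code: it walks a hypothetical 256x256 row-major float matrix through a simulated direct-mapped cache, and the 8 KB size and 64-byte line are assumptions picked for the sketch.

```python
LINE_BYTES = 64      # assumed cache line size
CACHE_BYTES = 8192   # assumed L1 data cache size
FLOAT_BYTES = 4
N = 256              # hypothetical N x N row-major matrix of floats


def hit_rate(addresses):
    """Hit rate of a toy direct-mapped cache over a sequence of byte addresses."""
    num_sets = CACHE_BYTES // LINE_BYTES
    cached_line = [None] * num_sets   # which memory line each cache slot holds
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % num_sets
        if cached_line[slot] == line:
            hits += 1
        else:
            cached_line[slot] = line  # miss: load the line, evicting the old one
        total += 1
    return hits / total


def row_order():
    """Inner loop over columns: consecutive addresses."""
    for i in range(N):
        for j in range(N):
            yield (i * N + j) * FLOAT_BYTES


def column_order():
    """Inner loop over rows: each access jumps a full row ahead."""
    for j in range(N):
        for i in range(N):
            yield (i * N + j) * FLOAT_BYTES


row_rate = hit_rate(row_order())
col_rate = hit_rate(column_order())
print(row_rate, col_rate)
```

      Same data, same amount of arithmetic; only the order of the two loops changes, and the row-order walk reuses each cache line while the column-order walk keeps evicting it.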

      As for libraries not being optimized, just look at FFTW (www.fftw.org) or ATLAS (www.netlib.org/atlas) and you might change your mind.

  37. Yes there is by ColGraff · · Score: 2

    The United Kingdom consists of Wales, England, the Isles of Man, Scotland, and part of Ireland. So There!

    Hail Britannia!

    --
    I'm the stranger...posting to /.
  38. Re:CPU not the limiting factor by jmv · · Score: 2

    On the other hand, for a weather simulation, I would bet on the cluster.

    No way. Weather simulations involve lots of linear algebra on huge matrices. When you parallelize that, you need a lot of communication between the nodes. With a cluster of P100s, communication will kill you right there (10x or so penalty). It's not so much the network bandwidth as the latency. Weather simulation is one of the hardest problems to parallelize and that's why, until recently, SMP was preferred to MPP (and of course to clusters of small workstations).
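
    A crude cost model shows how latency can swamp the compute time. Every number below is hypothetical and picked only for illustration, and the function name and parameters are made up; it's a sketch, not a benchmark.

```python
def step_time(work_s, nodes, messages, latency_s, bytes_sent, bytes_per_s):
    """Time for one simulation step on a cluster, in seconds.

    work_s      serial compute time for the step
    nodes       number of machines sharing the work
    messages    messages each node must exchange per step
    latency_s   fixed cost per message
    bytes_sent  data each node sends per step
    bytes_per_s link bandwidth
    """
    compute = work_s / nodes
    latency_cost = messages * latency_s
    bandwidth_cost = bytes_sent / bytes_per_s
    return compute + latency_cost + bandwidth_cost


# Hypothetical: 1 s of work over 40 nodes, 1000 small messages per step,
# 100 microseconds per message, 1 MB sent over a 100 Mbit/s link.
total = step_time(1.0, 40, 1000, 1e-4, 1e6, 12.5e6)
print(total)
```

    With these made-up inputs the per-message latency term alone is several times the compute term, and adding more nodes only shrinks the compute part.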

    As for the memory bandwidth, depending on the problem, sometimes the L1 is really effective.