Slashdot Mirror


North America's Fastest Linux Cluster Constructed

SeanAhern writes "LinuxWorld reports that 'A Linux cluster deployed at Lawrence Livermore National Laboratory and codenamed 'Thunder' yesterday delivered 19.94 teraflops of sustained performance, making it the most powerful computer in North America - and the second fastest on Earth.'" Thunder sports 4,096 Itanium 2 processors in 1,024 nodes, some big iron by any standard.

32 of 325 comments

  1. Imagine a ... by Anonymous Coward · · Score: 5, Funny
    pineapple on a monkey.

    And you thought I was going to say something else...

    1. Re:Imagine a ... by JPriest · · Score: 5, Funny

      I see your pineapple monkey and raise you a rabbit with a pancake on its head.

      --
      Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
  2. Very great and all... by irokitt · · Score: 5, Interesting

    But why did they use Itanium processors? Were they acquiring parts before Opterons were available? Did they have a problem with Xeon processors? Or did they have too much cash lying around?

    --
    If my answers frighten you, stop asking scary questions.
    1. Re:Very great and all... by MBCook · · Score: 5, Insightful
      I like the Opteron as much as the next guy and I'm no fan of the Itanic. But the fact is that for some types of calculations the Itanium can smoke Opterons. If you want the fastest, in many cases you want the Itanium. If you want the best value (which still performs quite close to the fastest), you want an Opteron. I don't remember which operations are better on which, so you'll have to look that up (or someone will reply with the answer).

      Depending on budget, price (I wouldn't be surprised if Intel cut them a sweet deal to get this cluster publicized to help out their product's sales), and other factors, the Itanium could have been a good choice.

      Especially if they were using software that had been designed for the Itanium (like they were replacing an older cluster) then they wouldn't have to port the software which would have saved real money.

      I'm not a fan of Intel lately, but the Itanium isn't overpriced garbage no matter what. That smacks of fanboyism. Interesting you didn't add G5s to your list, BTW.

      ALSO: Don't forget that the Itanium 2 was DESIGNED FOR big iron, while the Opteron was designed for servers and small iron. They can be used in other ways (you could run a web site off an Itanium 2), but the Itanium was designed for these kinds of applications.

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    2. Re:Very great and all... by tap · · Score: 5, Informative

      Do you have any kind of benchmark where the Itanium smokes the Opteron? The Itanium does have a greater memory bandwidth, but not by a lot. If you look at the SPEC benchmarks, it can be faster on some of them, but not by a lot. However, the Itanium is a lot more expensive!

      Compared to a Xeon or AthlonMP cluster, the Itanium fared poorly in price/performance. The only reason to use Itaniums was if you needed 64 bits for more than 4GB of memory, or needed high single CPU performance for a poorly parallelized application. (Of course if your application parallelizes poorly, a cluster is probably a bad choice to begin with). Then Opteron came out and changed all that. It's 64 bits, it's fast, and it's a fraction of the price of the Itanium2.

      I just purchased a new Beowulf cluster. The decision was between Xeons vs Opterons. The Opterons had better price/performance, but the Xeons would fit in better with our existing Pentium3 Beowulf, other ia32 servers, and existing software. In the end, we went with Opterons. Itanium2 was never even in contention. Just one look at the price and performance of an Itanium2 system was all it took to cross it off the list.

    3. Re:Very great and all... by tap · · Score: 4, Informative

      Ok, checked them again. The best 1.5 GHz Itanium2 SPECfp2000 score is 2148 while the Opteron 248 is 1691. That's 27% faster. I'd hardly call that smoked.

      The Opteron 248 is $670 on pricewatch, while the 1.5 GHz It2 is $5200! The motherboards are like $1400 vs $400.

      You have to keep in mind that this isn't a single machine, it's a cluster. You could take the money spent on an Itanium2 cluster, and buy an Opteron cluster with five times as many processors. I am well aware that one does not get perfect scaling. But if you are running something on a cluster in the first place, I have a hard time imagining something that is faster with one fifth as many processors that are each 27% faster. Yes, there are codes that would be faster on 1000 Itanium2s vs 5000 Opterons, but you would never run these on a cluster, because they would be faster still on a shared-memory system.
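The arithmetic in this comment can be checked with a few lines of Python. This is only a rough sketch: the scores and prices are the ones quoted above, the variable names are made up for illustration, and the aggregate figure assumes perfect scaling, which the comment itself says does not hold in practice.

```python
# Figures quoted in the comment above (SPECfp2000 scores, Pricewatch CPU prices).
ITANIUM2_SPECFP = 2148
OPTERON248_SPECFP = 1691
ITANIUM2_CPU_USD = 5200
OPTERON248_CPU_USD = 670


def relative_advantage(fast, slow):
    """Fractional advantage of `fast` over `slow` (0.25 means 25% faster)."""
    return fast / slow - 1.0


per_cpu_advantage = relative_advantage(ITANIUM2_SPECFP, OPTERON248_SPECFP)
cpu_price_ratio = ITANIUM2_CPU_USD / OPTERON248_CPU_USD

# Assumption: the "five times as many processors" figure from the comment,
# combined with ideal (linear) scaling, which real clusters do not achieve.
OPTERON_COUNT_MULTIPLIER = 5
ideal_aggregate_ratio = (
    OPTERON_COUNT_MULTIPLIER * OPTERON248_SPECFP / ITANIUM2_SPECFP
)

print(f"Itanium2 per-CPU advantage: {per_cpu_advantage:.1%}")
print(f"CPU price ratio (It2 / Opteron): {cpu_price_ratio:.2f}x")
print(f"Ideal aggregate ratio (5x Opterons / It2): {ideal_aggregate_ratio:.2f}x")
```

Under these assumptions the larger Opteron cluster has a lot of headroom to lose to imperfect scaling before the smaller Itanium2 cluster catches up.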

    4. Re:Very great and all... by SuperQ · · Score: 4, Insightful

      The problem is not that you couldn't get the processors; the problem is scale.

      A system like this will use a high-speed interconnect, not GigE. The popular choice right now is InfiniBand, and that stuff isn't cheap, and it also has limits on the number of ports per IB switch. The system at LLNL has 4 procs per node, which reduces the number of IB switches involved. 5000 processors in dual-proc machines (you suggest the Opteron 248) would require 2500 IB ports, instead of 1024.

      Now if you considered the Opteron 848 ($1300), in 8-proc nodes, that would be something to think about: reduce the number of IB ports by half, and be able to double the processors.

      The other consideration is processor scale. The 27% per CPU is significant, because even with dual-proc SMP, you lose some % of the CPU time. There was a posting on an article about how processors scale this way. I forget how the principle works.
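The port counts in this comment follow from dividing processors by processors per node. A small sketch, assuming one interconnect port per node (the function name is hypothetical, and the last two configurations are just readings of the 8-proc suggestion):

```python
def interconnect_ports(total_processors, processors_per_node):
    """Ports needed, assuming one port per node; rounds up for a partial node."""
    return -(-total_processors // processors_per_node)


thunder_ports = interconnect_ports(4096, 4)          # 1024
dual_opteron_ports = interconnect_ports(5000, 2)     # 2500

# Two readings of the 8-processor-node idea:
same_cpus_8way_ports = interconnect_ports(4096, 8)   # 512, half the ports
double_cpus_8way_ports = interconnect_ports(8192, 8) # 1024, twice the CPUs

print(thunder_ports, dual_opteron_ports,
      same_cpus_8way_ports, double_cpus_8way_ports)
```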

  3. "Most" powerful by Alomex · · Score: 4, Interesting

    Look, any way you cut it, the 100K computers Google is reputed to have are the most powerful Linux cluster anywhere in the world.

    1. Re:"Most" powerful by 0xC0FFEE · · Score: 5, Insightful

      If Google's cluster is interconnected via Ethernet, there is a whole range of computational problems it can't tackle. If you want to simulate a spatial phenomenon with a lot of things going back and forth in a volume, you're bound to have a _lot_ of communications. The cost of the interconnect system in those simulation systems is often a substantial proportion of the total cost of the installation.

    2. Re:"Most" powerful by irokitt · · Score: 4, Interesting

      4,096 Itanium processors versus ~8,000 boxes sporting Pentium II, III, and 4 processors. But remember that the interest Google has is in disk access and redundancy, not complex mathematical computation. So it isn't configured as a 'supercomputer' per se.

      --
      If my answers frighten you, stop asking scary questions.
    3. Re:"Most" powerful by smitty45 · · Score: 4, Insightful

      Powerful = fastest computation, not biggest. A roomful of Chevettes does not make a Corvette.

    4. Re:"Most" powerful by tap · · Score: 5, Informative

      I think you've got that backwards: Quadrics is the performance leader, not the price/performance leader. Myrinet, SCI, and Infiniband all beat it in price/performance. Quadrics is faster, and scales to more nodes than the others.

      According to Quadrics' latest price list, the cards are $1200 each, $913 per port for a 64 node switch, and $185-$265 for a cable. That's $2300/node.

      Myrinet cards are $595, the switch is $400 per port for 64 nodes, and the cables are ~$50. That's $1050/node.

      Quadrics' price for a 1024 node interconnect is $4,176,094. That's hardly chump change. The bandwidth is about 10x higher than gigabit ethernet, and the latency about 100x lower.
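The per-node totals above are simple sums of card, switch port, and cable prices. A minimal sketch using only the prices quoted in this comment (the helper name is made up for illustration):

```python
def per_node_cost(card_usd, switch_port_usd, cable_usd):
    """Interconnect cost per node: one card, one switch port, one cable."""
    return card_usd + switch_port_usd + cable_usd


# Quadrics, 64-node switch, with the cheapest and priciest cable quoted.
quadrics_low = per_node_cost(1200, 913, 185)    # 2298
quadrics_high = per_node_cost(1200, 913, 265)   # 2378

# Myrinet, 64-node switch, cable price given as roughly $50.
myrinet = per_node_cost(595, 400, 50)           # 1045

# The quoted 1024-node Quadrics price, spread evenly over the nodes.
quadrics_1024_per_node = 4_176_094 / 1024

print(quadrics_low, quadrics_high, myrinet, round(quadrics_1024_per_node))
```

The rounded "$2300" and "$1050" figures in the comment match these sums, and the 1024-node price works out to noticeably more per node than the 64-node configuration.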

  4. how fast is it? by chickenrob · · Score: 4, Funny

    Is it fast enough to run all the latest spyware, adware, and viruses and not slow down your solitaire game?

    --
    People say my sig is the best thing about me.
  5. but but but by Anonymous Coward · · Score: 4, Funny

    Can it run Windows?

  6. Re:Awesome! by MrRuslan · · Score: 5, Funny

    It's all reserved for Doom III on Longhorn.

  7. Did hell freeze over? by SuperBanana · · Score: 4, Funny

    LLNL built a supercomputer, and it's going to do things besides simulate nuclear weapons?

    Quick, someone ring Satan and ask how the sno-cones are.

    1. Re:Did hell freeze over? by geek · · Score: 4, Insightful

      I grew up in Livermore, the lab was some 500 yards from my bedroom window. They work on a lot more than nuke simulations, including alternate fuels (my brother-in-law was driving a hydrogen fuel car from the lab 10 years ago as a test), laser technology and about a million other things. Why is it that people like you hear "Nuke" and rant on and on like biased little children and post inflammatory things like this?

      The lab is a GOOD thing damnit. Do you even know what nukes are? What nuclear research has done for us? Grow up man.

  8. I don't care what anyone says by MrRuslan · · Score: 5, Funny

    this thing should do Doom 3 with a software renderer at a very playable 47 FPS...

  9. Re:Whoa. by TravisWatkins · · Score: 4, Informative

    That would be the Earth Simulator in Japan.

    --

    "But I'm still right here, giving blood and keeping faith. And I'm still right here."
  10. Another Article by Flashbck · · Score: 4, Interesting

    And only 55 people were needed to build it!

  11. 2nd fastest supercomputer by m1kesm1th · · Score: 5, Funny

    Also in completely unrelated news, Bill Gates announced the first fully installed test of Longhorn happened today.

  12. apple's response will be interesting by Twid · · Score: 4, Insightful

    If I calculate right, they are claiming an Rmax of 19.94 teraflops with 4096 processors.

    The Virginia Tech cluster for Apple had an Rmax of 10.28 teraflops with 2200 processors.

    So, the Itanium 2 delivered about 4.9 gigaflops per processor, the G5 delivered about 4.7 gigaflops per processor.

    This seems like a pretty poor showing for Itanium 2, overall. It's a much hotter chip than the Opteron or the G5, so cooling and power costs are likely much higher than a comparable Apple cluster. The Xserve G5 is also likely cheaper than a similarly equipped Itanium 2 server, given that the Itanium 2 is $1398 per chip on Pricewatch, and a dual processor Xserve G5 cluster node is $2,999 list. Even with 4 cpus in a single box, I think the Itanium 2 server would easily top $6,000.

    But anyway, good game to Lawrence Livermore. I'll be curious to see if Apple has another volley to fire before the Top500 list closes for this round.
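The per-processor figures can be reproduced from the Rmax values and processor counts quoted in this comment. A quick sketch (the function name is hypothetical):

```python
def gflops_per_processor(rmax_teraflops, processors):
    """Sustained LINPACK gigaflops per processor."""
    return rmax_teraflops * 1000.0 / processors


thunder_per_cpu = gflops_per_processor(19.94, 4096)
virginia_tech_per_cpu = gflops_per_processor(10.28, 2200)

print(f"Thunder (Itanium 2): {thunder_per_cpu:.2f} GFLOPS per processor")
print(f"Virginia Tech (G5):  {virginia_tech_per_cpu:.2f} GFLOPS per processor")
```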

    --
    - "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
    1. Re:apple's response will be interesting by prockcore · · Score: 4, Insightful

      This seems like a pretty poor showing for Itanium 2, overall.

      It does? You know that clustered computing doesn't scale linearly. If Virginia Tech were to double the number of processors used, they wouldn't double their performance.
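One common way to model this kind of non-linear scaling is Amdahl's law, where some fraction of the work cannot be parallelized. The sketch below is only an illustration: the parallel fraction is an assumed value chosen for the example, not a measurement of either machine, and LINPACK in particular scales better than this simple model suggests.

```python
def amdahl_speedup(processors, parallel_fraction):
    """Amdahl's law: speedup over one processor for a fixed-size problem."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumption for illustration only: 99.9% of the work parallelizes perfectly.
PARALLEL_FRACTION = 0.999

base = amdahl_speedup(2200, PARALLEL_FRACTION)
doubled = amdahl_speedup(4400, PARALLEL_FRACTION)
gain_from_doubling = doubled / base

print(f"Speedup at 2200 processors: {base:.0f}x")
print(f"Speedup at 4400 processors: {doubled:.0f}x")
print(f"Gain from doubling the processor count: {gain_from_doubling:.2f}x")
```

Under this assumed fraction, doubling the processor count buys well under a 2x improvement, which is the point being made above.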

    2. Re:apple's response will be interesting by Anonymous Coward · · Score: 4, Insightful

      Actually, there's more to it than that. Virginia Tech's machine only gets ~55% of its peak performance, whereas Thunder gets 87%. Given that Thunder has twice as many processors, that's an EXCELLENT showing. Remember, the actual work that's going to run on Thunder won't scale anywhere near as well as the easily scaled LINPACK benchmark, so the performance gap between "benchmark" and "real world" will only get wider in practice.
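The efficiency percentages here are the ratio of sustained (Rmax) to theoretical peak performance. Working backwards from the numbers quoted in this thread gives the peak each figure implies. A rough sketch, using only the Rmax values, processor counts, and percentages from the comments above (the ~55% is approximate, so the Virginia Tech result is too):

```python
def implied_peak_teraflops(rmax_teraflops, efficiency):
    """Peak performance implied by a sustained figure and an efficiency ratio."""
    return rmax_teraflops / efficiency


thunder_peak = implied_peak_teraflops(19.94, 0.87)
virginia_tech_peak = implied_peak_teraflops(10.28, 0.55)

# Implied peak per processor, in gigaflops.
thunder_peak_per_cpu = thunder_peak * 1000.0 / 4096
virginia_tech_peak_per_cpu = virginia_tech_peak * 1000.0 / 2200

print(f"Thunder implied peak: {thunder_peak:.1f} TFLOPS "
      f"({thunder_peak_per_cpu:.1f} GFLOPS per processor)")
print(f"Virginia Tech implied peak: {virginia_tech_peak:.1f} TFLOPS "
      f"({virginia_tech_peak_per_cpu:.1f} GFLOPS per processor)")
```

Seen this way, the two machines deliver similar sustained gigaflops per processor, but Thunder gets there from a lower per-processor peak, which is what the higher efficiency figure reflects.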

      Thunder is an absolutely remarkable machine.

  13. Rejoicing at Intel by Animats · · Score: 4, Funny

    "We sold the Inaniums! We sold the Inaniums!"

  14. Re:vs google by complete+loony · · Score: 4, Informative
    Google have lots of little (in comparison only) jobs that have to process heaps of data. Google's cluster(s) wouldn't perform well in the Top500 list since they don't concentrate on link speed, which is the main factor in performance for supercomputers, but on raw data processing power.

    The GFS article that appeared a while back said they used standard 100 Mbit Ethernet; this is not going to get you a good score in any supercomputer benchmark.

    --
    09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  15. before everyone starts shouting at once... by painehope · · Score: 4, Insightful

    yes, they're hot as hell and eat power the way Oprah eats Twinkies, and yes, Intel has handled the Itanium line poorly, but the Itanium architecture is very interesting, and is actually very appropriate for an HPC environment. Not the part of the HPC market that clusters dominate, but the segment that Cray, SGI, HP AlphaServers, etc. have traditionally dominated. The segment that doesn't give a shit about cooling, power consumption, or price-performance, but just needs to get the job done as quickly as possible.

    Some of the coolest features of the Itanium are also some of the reasons why a lot of people don't want to use it. The EPIC ISA, for example. It was designed ( along w/ the physical hardware ) to expose a lot of the internal workings of the processor to the user. But rather than recompile and re-optimize their code, people would rather bitch about migration. That's fine for workstations and servers, but in an HPC environment, you want the nifty features, you want to occasionally hand-tune code segments in assembler, etc.

    Anyways, I'm not a fanboy ( well, maybe an AMD and MIPS fanboy ), just wanted to get in a few honest points before everyone started shooting holes in the Itanic.

    --
    PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
    1. Re:before everyone starts shouting at once... by slamb · · Score: 5, Informative
      Some of the coolest features of the Itanium are also some of the reasons why a lot of people don't want to use it. The EPIC ISA, for example. It was designed ( along w/ the physical hardware ) to expose a lot of the internal workings of the processor to the user. But rather than recompile and re-optimize their code, people would rather bitch about migration. That's fine for workstations and servers, but in an HPC environment, you want the nifty features, you want to occasionally hand-tune code segments in assembler, etc.

      I just coded some IA-64 assembly and from what I've seen, this comment is dead-on. They've got a lot of interesting features:

      • Speculation. The idea is to do memory fetches far in advance to avoid waiting for the (much slower) memory system. You can do an LD.S operation that tells the machine something like "I might want the value from this memory address in a few instructions." It fetches it from memory, if it's in a good mood. If the address is paged out, it doesn't get it. (Instead, it sets a NaT (Not a Thing) bit to tell you nothing useful is there.) Later, you do a CHK.S. If it turns out that the speculative load failed, it jumps to some "recovery" code which gets it for real.
      • Lots of registers. 128 general-purpose 64-bit registers. Floating point registers. Some specialized ones, I think.
      • EPIC. (Explicitly Parallel Instruction Computing.) It has different types of instructions, aimed at different execution units. In the current incarnation, there are two sets of these in each processor. You give it bundles of three instructions, more broadly divided into groups. Instructions in a group don't depend on any earlier results calculated by the group, so they can be executed in parallel.
      • Rotating registers. This lets you make different iterations of the same loop work with different registers, to take advantage of EPIC more fully.
      • Predicated instructions. There are a bunch (16? 64? don't remember) of predicate bits, set by the CMP instruction and the like. Every instruction has an associated predicate. (p0 is hardcoded to true, so you normally don't notice.) So you can do conditional execution without jumping. More efficient, especially if it's just a few instructions that differ.
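The speculation item in the list above can be pictured with a toy model. This is a sketch in Python, not IA-64 code: the class, the function names, and the "paged out" dictionary are all invented for illustration, and real hardware tracks NaT as a bit attached to a register rather than as a special value.

```python
NAT = object()  # stand-in for the NaT ("Not a Thing") marker


class ToyMemory:
    """Toy memory: some addresses are resident, others are 'paged out'."""

    def __init__(self, resident, paged_out):
        self.resident = dict(resident)
        self.paged_out = dict(paged_out)

    def speculative_load(self, addr):
        """Like LD.S: never faults; yields NAT if the value is not resident."""
        return self.resident.get(addr, NAT)

    def load(self, addr):
        """Ordinary load: the slow path that pages the value in if needed."""
        if addr not in self.resident:
            self.resident[addr] = self.paged_out.pop(addr)
        return self.resident[addr]


def check(value, recover):
    """Like CHK.S: if the speculative load failed, run the recovery code."""
    return recover() if value is NAT else value


mem = ToyMemory(resident={0x10: 7}, paged_out={0x20: 9})

# Issue the loads early, well before the values are needed.
early_a = mem.speculative_load(0x10)
early_b = mem.speculative_load(0x20)  # not resident, so this comes back as NAT

# ... unrelated work would overlap with the memory latency here ...

# At the point of use, check each value and recover only where needed.
a = check(early_a, lambda: mem.load(0x10))
b = check(early_b, lambda: mem.load(0x20))
print(a + b)
```

The point of the pattern is that the common case (the value was available) pays nothing extra at the point of use, and only the failed speculation takes the slower recovery path.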

      If you just have a simple sequence of operations, each dependent on the one before, you can't really take advantage of these capabilities. (My code was like this. Even though performance wasn't my reason for writing assembly, it was a little disappointing that I couldn't play with the new toys.) If you're expecting these features to make Word start faster, you'll probably be disappointed.

      But if you're doing intensive computations in a tight loop, you can do amazing things. If you can get all the execution units working simultaneously, it will fly. And the features like rotating registers are designed to make that possible. You need a very good compiler or a very smart person to hand-tune it. You may need to recompile to tune if your memory latency changes (affecting how many iterations to run at once) or they come out with a new chip with more sets of execution units. But in a situation like this, none of that is a problem. They'll have applications designed to run as fast as possible on this machine. They may never be run anywhere else.

  16. What about SCO? by watsondk · · Score: 4, Funny


    do they have the nerve to go after this cluster?

    after all they are trying extortion by lawyer against other large Linux users

  17. Sadly... by System.out.println() · · Score: 5, Funny

    "We sold the Inaniums! We sold the Inaniums!"

    "The Itaniums, however, remain unsold."

    *hopes that was not an actual mistake but rather a poorly conceived pun on "inane"...*

  18. Big Iron? by nacturation · · Score: 4, Funny

    Thunder sports 4,096 Itanium 2 processors in 1,024 nodes, some big iron by any standard.

    If the government gets a hold of that, we're going to need some big tinfoil...

    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  19. Itanium vs Opteron by vlad_petric · · Score: 4, Insightful
    Itanium's instruction set is actually a lot more geared towards scientific computing than server benchmarks. Scientific stuff usually is made of very regular code that is quite easily schedulable by the compiler. Server stuff is generally memory-bound and very irregular, so the processor usually gets less than one instruction executed per cycle - bundling instructions (static scheduling by the compiler) is completely pointless.

    "Big Iron" is a very vague term - server benchmarks behave very differently than scientific computation as far as performance is concerned; if you don't believe me I can easily point you to a couple of research papers analyzing them.

    The humongous on-die caches make the Itanium perform well on servers, and definitely not the instruction-set architecture. So "WAS DESIGNED FOR" is only 50% true.

    --

    The Raven