North America's Fastest Linux Cluster Constructed

Posted by CowboyNeal on Thursday May 13, 2004 @02:53PM from the where-was-the-lightning dept.

SeanAhern writes "LinuxWorld reports that 'A Linux cluster deployed at Lawrence Livermore National Laboratory and codenamed 'Thunder' yesterday delivered 19.94 teraflops of sustained performance, making it the most powerful computer in North America - and the second fastest on Earth.'" Thunder sports 4,096 Itanium 2 processors in 1,024 nodes, some big iron by any standard.

Re:"Most" powerful by 0xC0FFEE · 2004-05-13 15:04 · Score: 5, Insightful

If Google's cluster is interconnected via Ethernet, there is a whole range of computational problems it can't tackle. If you want to simulate a spatial phenomenon with a lot of things going back and forth in a volume, you're bound to have a _lot_ of communication. The cost of the interconnect in those simulation systems is often a substantial proportion of the total cost of the installation.
Re:Very great and all... by MBCook · 2004-05-13 15:06 · Score: 5, Insightful

I like the Opteron as much as the next guy and I'm no fan of the Itanic. But the fact is that for some types of calculations the Itanium can smoke Opterons. If you want the fastest, in many cases you want the Itanium. If you want the best value (which still performs quite close to the fastest), you want an Opteron. I don't remember which operations are better on which, so you'll have to look that up (or someone will reply with the answer).
Depending on budget, price (I wouldn't be surprised if Intel cut them a sweet deal to get this cluster publicized to help out their product's sales), and other factors, the Itanium could have been a good choice.
Especially if they were using software that had already been designed for the Itanium (say, they were replacing an older cluster), they wouldn't have to port the software, which would have saved real money.
I'm not a fan of Intel lately, but the Itanium isn't overpriced garbage no matter what. That smacks of fanboyism. Interesting you didn't add G5s to your list, BTW.
ALSO: Don't forget that the Itanium 2 was DESIGNED FOR big iron, while the Opteron was designed for servers and small iron. They can be used in other ways (you could run a web site off an Itanium 2), but the Itanium was designed for these kinds of applications.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
apple's response will be interesting by Twid · 2004-05-13 15:10 · Score: 4, Insightful

If I calculate right, they are claiming an Rmax of 19.94 teraflops with 4096 processors.

The Virginia Tech cluster for Apple had an Rmax of 10.28 teraflops with 2200 processors.

So, the Itanium 2 delivered about 4.9 gigaflops per processor, and the G5 delivered about 4.7 gigaflops per processor.
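
For anyone who wants to check the division, here is a minimal sketch using only the Rmax and processor counts quoted above. The function name and variable names are made up for illustration; nothing here comes from the benchmark submissions themselves.

```python
# Per-processor LINPACK throughput from the figures quoted in this comment.
# The helper name and labels are illustrative, not from any official source.

def gflops_per_processor(rmax_teraflops: float, processors: int) -> float:
    """Convert a cluster-wide Rmax in teraflops to gigaflops per processor."""
    return rmax_teraflops * 1000.0 / processors

thunder = gflops_per_processor(19.94, 4096)  # Itanium 2 cluster at LLNL
vt_g5 = gflops_per_processor(10.28, 2200)    # Virginia Tech G5 cluster

print(f"Thunder:       {thunder:.2f} GFLOPS per processor")
print(f"Virginia Tech: {vt_g5:.2f} GFLOPS per processor")
```

The two results land within a few percent of each other, which is the gap the comparison rests on.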

This seems like a pretty poor showing for Itanium 2, overall. It's a much hotter chip than the Opteron or the G5, so cooling and power costs are likely much higher than for a comparable Apple cluster. The Xserve G5 is also likely cheaper than a similarly equipped Itanium 2 server, given that the Itanium 2 is $1398 per chip on Pricewatch, and a dual processor Xserve G5 cluster node is $2,999 list. Even with 4 CPUs in a single box, I think the Itanium 2 server would easily top $6,000.

But anyway, good game to Lawrence Livermore. I'll be curious to see if Apple has another volley to fire before the Top500 list closes for this round.

--
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
1. Re:apple's response will be interesting by prockcore · 2004-05-13 15:53 · Score: 4, Insightful
  
  This seems like a pretty poor showing for Itanium 2, overall.
  
  It does? You know that clustered computing doesn't scale linearly. If Virginia Tech were to double the number of processors used, they wouldn't double their performance.
2. Re:apple's response will be interesting by Anonymous Coward · 2004-05-13 17:18 · Score: 4, Insightful
  
  Actually, there's more to it than that. Virginia Tech's machine only gets ~55% of its peak performance, whereas Thunder gets 87%. Given that Thunder has twice as many processors, that's an EXCELLENT showing. Remember, the actual work that's going to run on Thunder won't scale anywhere near as well as the easily scaled LINPACK benchmark, so the performance gap between "benchmark" and "real world" will only get wider in practice.
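
A rough sketch of what those efficiency figures imply, assuming efficiency means Rmax divided by theoretical peak. The Rmax values and processor counts are the ones quoted elsewhere in this thread; the peak numbers are back-calculated from the percentages above, not taken from any published spec sheet, and the function name is made up.

```python
# Back out the theoretical peak implied by "Rmax is X% of peak".
# Assumption: efficiency = Rmax / Rpeak. All inputs are figures from this thread.

def implied_peak_teraflops(rmax_teraflops: float, efficiency: float) -> float:
    """Peak performance implied by a sustained figure and an efficiency ratio."""
    return rmax_teraflops / efficiency

thunder_peak = implied_peak_teraflops(19.94, 0.87)  # Thunder, 87% of peak
vt_peak = implied_peak_teraflops(10.28, 0.55)       # Virginia Tech, ~55% of peak

# Implied peak per processor, in gigaflops.
thunder_peak_per_cpu = thunder_peak * 1000.0 / 4096
vt_peak_per_cpu = vt_peak * 1000.0 / 2200

print(f"Thunder implied peak:       {thunder_peak:.2f} TFLOPS "
      f"({thunder_peak_per_cpu:.2f} GFLOPS per processor)")
print(f"Virginia Tech implied peak: {vt_peak:.2f} TFLOPS "
      f"({vt_peak_per_cpu:.2f} GFLOPS per processor)")
```

Under that assumption the G5 has the higher implied peak per processor, but the cluster sustains a smaller fraction of it on LINPACK.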
  
  Thunder is an absolutely remarkable machine.
Re:"Most" powerful by smitty45 · 2004-05-13 15:18 · Score: 4, Insightful

Powerful = fastest computation, not biggest. A roomful of Chevettes does not make a Corvette.
Re:Did hell freeze over? by geek · 2004-05-13 15:27 · Score: 4, Insightful

I grew up in Livermore; the lab was some 500 yards from my bedroom window. They work on a lot more than nuke simulations, including alternative fuels (my brother-in-law was driving a hydrogen fuel car from the lab 10 years ago as a test), laser technology and about a million other things. Why is it that people like you, who hear "Nuke", rant on and on like biased little children and post inflammatory things like this?

The lab is a GOOD thing, dammit. Do you even know what nukes are? What nuclear research has done for us? Grow up, man.
before everyone starts shouting at once... by painehope · 2004-05-13 15:33 · Score: 4, Insightful

yes, they're hot as hell and eat power the way Oprah eats Twinkies, and yes, Intel has handled the Itanium line poorly, but the Itanium architecture is very interesting, and is actually very appropriate for an HPC environment. Not the part of the HPC market that clusters dominate, but the segment that Cray, SGI, HP Alphaservers, etc. have traditionally dominated. The segment that doesn't give a shit about cooling, power consumption, or price-performance, but just needs to get the job done as quickly as possible.

Some of the coolest features of the Itanium are also some of the reasons why a lot of people don't want to use it. The EPIC ISA, for example. It was designed ( along w/ the physical hardware ) to expose a lot of the internal workings of the processor to the user. But rather than recompile and re-optimize their code, people would rather bitch about migration. That's fine for workstations and servers, but in an HPC environment, you want the nifty features, you want to occasionally hand-tune code segments in assembler, etc.

Anyways, I'm not a fanboy ( well, maybe an AMD and MIPS fanboy ), just wanted to get in a few honest points before everyone started shooting holes in the Itanic.

--
PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
Itanium vs Opteron by vlad_petric · 2004-05-13 16:25 · Score: 4, Insightful

Itanium's instruction set is actually a lot more geared towards scientific computing than server benchmarks. Scientific code is usually very regular and quite easily scheduled by the compiler. Server code is generally memory-bound and very irregular, so the processor usually gets less than one instruction executed per cycle, and bundling instructions (static scheduling by the compiler) is completely pointless there.
"Big Iron" is a very vague term - server benchmarks behave very differently from scientific computation as far as performance is concerned; if you don't believe me I can easily point you to a couple of research papers analyzing them.
It's the humongous on-die caches that make the Itanium perform well on servers, definitely not the instruction-set architecture. So "WAS DESIGNED FOR" is only 50% true.

--
The Raven
Re:Very great and all... by SuperQ · 2004-05-13 23:55 · Score: 4, Insightful

the problem is not that you couldn't get the processors, the problem is scale.

A system like this will use a high-speed interconnect, not GigE. The popular choice right now is InfiniBand, and that stuff isn't cheap, and it also has limits on the number of ports per IB switch. The system at LLNL has 4 procs per node, which reduces the number of IB switches involved. 5000 procs in dual-proc machines (you suggest the Opteron 248) would require 2500 IB ports, instead of 1024.

now if you considered the Opteron 848 ($1300), in 8-proc nodes, that would be something to think about: cut the number of IB ports in half, and be able to double the processors.
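
the port arithmetic above can be sketched like this, assuming one IB port per node (my simplification; real fabrics also need switch-to-switch links, which this ignores). The function name is made up, and the 8-proc cases are one reading of the trade-off: either halve the ports at the same processor count, or keep the ports and double the processors.

```python
# Count interconnect ports needed for a cluster, assuming one port per node.
# This ignores switch uplinks and topology; it only counts node-facing ports.

def ib_ports(total_processors: int, processors_per_node: int) -> int:
    """Number of nodes (and so node-facing ports), rounded up."""
    return -(-total_processors // processors_per_node)  # ceiling division

thunder_ports = ib_ports(4096, 4)     # 4-proc nodes, as at LLNL: 1024
dual_proc_ports = ib_ports(5000, 2)   # 5000 procs in dual-proc nodes: 2500
eight_way_same = ib_ports(4096, 8)    # same procs in 8-proc nodes: 512
eight_way_double = ib_ports(8192, 8)  # double the procs, 8-proc nodes: 1024

for label, ports in [
    ("4096 procs, 4 per node", thunder_ports),
    ("5000 procs, 2 per node", dual_proc_ports),
    ("4096 procs, 8 per node", eight_way_same),
    ("8192 procs, 8 per node", eight_way_double),
]:
    print(f"{label}: {ports} ports")
```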

the other consideration is also processor scale. the 27% per CPU is significant, because even with dual-proc SMP, you lose some % of the CPU time. There was a posting on an article about how processors scale this way. I forget how the principle works.