BlueGene/L Puts the Hammer Down

Posted by Hemos on Thursday March 24, 2005 @07:29PM from the welcome-to-the-machine dept.

OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 Teraflops by doubling its processors. That seems likely to keep it at No. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how an HPC system will perform with different types of applications."

Wait another year... by Anonymous Coward · 2005-03-24 20:00 · Score: 5, Interesting

That's like, what, 527 Cell processors?
Obviously that number's based on an unrealistic, 100% efficient scaling factor. But still. The 135.3 TFlops are coming from 64,000 processors.
It's fun to think about what's just around the corner.
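
A rough back-of-the-envelope check of the figures above, as a sketch only: it divides the 135.3 teraflops from the story by the 64,000 processors and by the 527 Cell chips quoted in this comment. The helper name `per_unit_gflops` is made up for illustration, and the result says nothing about real-world scaling.

```python
TERA = 1e12
GIGA = 1e9

def per_unit_gflops(total_tflops, units):
    """Average GFLOPS each unit must deliver, assuming perfect (100%) scaling."""
    return total_tflops * TERA / units / GIGA

# Figures taken from the story and the comment; perfect scaling is an assumption.
print(per_unit_gflops(135.3, 64000))  # average per BlueGene/L processor
print(per_unit_gflops(135.3, 527))    # what each of 527 Cells would have to sustain
```

The first number is the average each BlueGene/L processor contributes; the second is what every one of the 527 hypothetical Cell chips would need to sustain with no losses at all.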
Re:But What about the Crays? by yennieb · 2005-03-24 21:59 · Score: 2, Interesting

You're confused and lost. According to the top 500 rankings referenced by the article, the highest ranking Cray (an X1) puts out less than 6 TFLOPS.

So try... a cluster of 25+ X1s and then we'll talk =)!
More than Teraflops by gtsili · 2005-03-24 22:39 · Score: 2, Interesting

What would also be interesting are the power consumption and heat production figures of those systems, both idle and under heavy load, along with the load statistics.

In other words, what is the cost of the quest for performance?
Re:Windows HPC by Jules Labrie · 2005-03-24 22:44 · Score: 2, Interesting

Well, if you had Windows on this machine (but be serious, please!), it would only be on one node in every 64. Let me explain why.

Blue Gene is known to run Linux. True, but there are in fact two types of nodes in Blue Gene: the compute nodes and the I/O nodes. There is 1 I/O node for every 63 compute nodes, so in a 64,000-node cluster only 1,000 nodes actually run Linux. The other 63,000 run an ultra-light runtime environment (with MPI and other essentials) to maximize speed. Even Linux is too heavy for that! So Windows might not hurt performance all that much... but I doubt IBM ever considered that option!
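
The node split described above can be sketched in a few lines. This only restates the comment's own ratio (1 I/O node per 63 compute nodes); the function name `split_nodes` is hypothetical, and the real machine's layout may differ.

```python
def split_nodes(total_nodes, compute_per_io=63):
    """Split a cluster into I/O and compute nodes at one I/O node per group."""
    group_size = compute_per_io + 1          # 63 compute nodes + 1 I/O node
    io_nodes = total_nodes // group_size     # nodes running Linux
    compute_nodes = total_nodes - io_nodes   # nodes running the light runtime
    return io_nodes, compute_nodes

io_nodes, compute_nodes = split_nodes(64000)
print(io_nodes, compute_nodes)
```

With the comment's 64,000-node figure, each group of 64 contributes exactly one Linux node, which is where the "one in every 64" remark comes from.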
One in every home by BeerCat · 2005-03-24 23:18 · Score: 2, Interesting

Several decades ago, a computer filled an entire room, and the thinking was, "I think there is a world market for maybe five computers."

A few decades ago, people thought Bill Gates was wrong when he reckoned there would soon be a time when there was a computer in every home.

Now, a supercomputer fills an entire room. So how long before someone reckons that there will come a time when there will be a supercomputer in every home?

--
"She's furniture with a pulse"
hpc test? other types of apps? by Bongzilla · 2005-03-24 23:57 · Score: 1, Interesting

I think the whole point of using a machine of this size is that you write your custom application specifically with it in mind. I would be highly surprised if, after you lease one (or a share of one), IBM didn't provide documentation on how to write an application that takes advantage of the machine's architecture.

--
;///////////////////////////////////////////////// /
Re:Cell vs HPC by shizzle · 2005-03-25 00:57 · Score: 3, Interesting

2) DP Matrix-Matrix multiplies. IBM added DP support to their VMX set for Cell (though at 10% the execution rate), check.
[...]
...clearly Cell is meant as a supercomputer first and a PS3 second.
I think you've refuted your own argument there: double precision floating point performance is critical for true supercomputing. (In supercomputing circles DP and SP are often referred to as "full precision" and "half precision", respectively, which should give you a better idea of how they view things.)
In contrast, SP provides plenty of accuracy for things like rendering and game physics, since (very loosely speaking) as long as you're within a fraction of a pixel of the right answer you don't need any more accuracy.
I'd say the Cell architecture is very well suited for supercomputing as well as gaming, but the announced Cell implementation appears to me to be clearly targeted at the PS3. They'll have to come out with a "Cell HPC Edition" that has much better DP performance before they take over supercomputing. Not that I don't expect that they're working on that as we speak...
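
To make the SP-versus-DP point concrete, here is a minimal sketch that emulates single precision in plain Python by rounding through the standard `struct` module. The helper names `to_single`, `sum_single`, and `sum_double` are invented for this illustration, and a long running sum is just one simple way rounding error can build up; it is not a model of any particular HPC code.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_single(value, count):
    """Accumulate `value` `count` times, rounding to single precision after every add."""
    step = to_single(value)
    total = 0.0
    for _ in range(count):
        total = to_single(total + step)
    return total

def sum_double(value, count):
    """Accumulate `value` `count` times in ordinary double precision."""
    total = 0.0
    for _ in range(count):
        total += value
    return total

exact = 10000.0  # 0.1 added 100,000 times, in exact arithmetic
print(abs(sum_single(0.1, 100_000) - exact))  # single-precision error
print(abs(sum_double(0.1, 100_000) - exact))  # double-precision error
```

The single-precision error is far larger than the double-precision one. That gap is harmless when the target is a pixel position, but it compounds over the long iteration counts typical of scientific simulation.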
Scalar performance -- Unimpressed! by tarpitcod · 2005-03-25 03:28 · Score: 3, Interesting

What's the scalar performance of one of these beasties?

Can an Athlon 64 / P4 beat it on scalar code? The whole HPC world has gotten boring since Cray died. Here's why I say that:

The Cray 1 had the best SCALAR and VECTOR performance in the world.

The Cray 2 was an ass kicker, the Cray 3 was a real ass kicker (if only they could build them reliably).

Cray pushed the boundaries; at some points he pushed them too far, designing and trying to build machines that couldn't be made reliable.

So it'll be a cold day in hell before I get all fired up over the fact that someone else managed to glue together a bazillion 'killer micros' and win at Linpack...
Now if someone would bring back the idea of transputers, or we saw some *real* efforts at Dataflow and FP, then I'd be excited. I'd love a PC with 8 small, simple, fast, in-order, tightly bound CPUs. Don't say Cell: all indications are that it will be a *real* PITA to program to get any decent performance out of.
Re:Cell vs HPC by tarpitcod · 2005-03-25 03:53 · Score: 2, Interesting

I don't think they thought that at all ("Let's build a supercomputer"). I think it follows naturally from the problem they were trying to solve.

This is because when you have the following conditions:

-- Lots of memory bandwidth needed
-- Fast floating point
-- Parallelizable code
-- Hand tuned kernels OK

You end up with something that looks a lot like a supercomputer. You just turned your compute bound problem into an IO bound problem. We may want to revise that saying -- and say 'You turned your compute bound problem into a coding problem'. Supercomputer performance seems more bound by the feasibility of extracting decent performance from the iron than it used to be -- judging by what I have read from the old hands.