BlueGene/L Puts the Hammer Down
OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 Teraflops by doubling its processors. That seems likely to keep it at no. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how an HPC system will perform with different types of applications."
Obviously that number's based on an unrealistic, 100% efficient scaling factor. But still. The 137 TFlop is coming from 64,000 processors.
It's fun to think about what's just around the corner.
You're confused and lost. According to the top 500 rankings referenced by the article, the highest ranking Cray (an X1) puts out less than 6 TFLOPS.
So try... a cluster of 25+ X1s and then we'll talk =)!
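A back-of-envelope check on that count, as a rough sketch only: it assumes perfect linear scaling (which is unrealistic) and takes 6 TFLOPS as an upper bound for a single X1, since the rankings only show "less than 6". The variable names are made up for illustration.

```python
import math

BLUEGENE_TFLOPS = 135.3   # figure quoted in the story summary
X1_TFLOPS_MAX = 6.0       # assumed upper bound; the comment says "less than 6"

# With perfect linear scaling this is the minimum number of X1 systems needed.
min_x1_systems = math.ceil(BLUEGENE_TFLOPS / X1_TFLOPS_MAX)
print(min_x1_systems)  # 23
```

Since a real X1 delivers less than the assumed 6 TFLOPS and scaling is never perfect, the true count would be higher, which fits the "25+" above.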
What would also be interesting are the power consumption and heat production figures of those systems when idle and under heavy load, along with the load statistics.
In other words, what is the cost of the quest for performance?
Well, if you had Windows on this machine (but be serious, please!)... it would only be on one node in every 64. Let me explain why.
Blue Gene is known to run Linux. True, but... In fact, there are two types of nodes in Blue Gene: the computing nodes and the IO nodes. There is 1 IO node for every 63 computing nodes. So for a 64000-node cluster, there are in fact only 1000 processors that run Linux. The other 63000 run an ultra-light runtime environment (with MPI and other essentials) to maximize speed. Even Linux is too heavy for that! So Windows might not make the performance that much worse... But I don't believe IBM ever considered this option!
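The node split is simple arithmetic. A quick sketch, assuming the 1-to-63 ratio and the round 64000 figure from this comment (the function name is hypothetical, not anything from IBM):

```python
def split_nodes(total_nodes, compute_per_io=63):
    """Split a node count into (io_nodes, compute_nodes), assuming one
    IO node serves `compute_per_io` computing nodes, as the comment states."""
    group = compute_per_io + 1           # 1 IO node + 63 computing nodes
    io_nodes = total_nodes // group      # nodes that would run Linux
    compute_nodes = total_nodes - io_nodes
    return io_nodes, compute_nodes

io_nodes, compute_nodes = split_nodes(64000)
print(io_nodes, compute_nodes)  # 1000 63000
```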
Several decades ago, a computer filled an entire room, and the thinking was "I think there is a world market for maybe five computers."
A few decades ago, people thought Bill Gates was wrong when he reckoned there would soon be a time when there was a computer in every home.
Now, a supercomputer fills an entire room. So how long before someone reckons that there will come a time when there will be a supercomputer in every home?
"She's furniture with a pulse"
I think the whole point of using a machine of this size is that you write your custom application specifically with it in mind. I would be highly surprised if, after you lease one or a share of one, IBM didn't provide documentation on how to write an application that takes advantage of the machine's architecture.
In contrast, SP is accurate enough for things like rendering and game physics, since (very loosely speaking) as long as you're within a fraction of a pixel of the right answer you don't need any more accuracy.
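To put a rough number on the "fraction of a pixel" point, here is a small sketch that measures the gap between adjacent single-precision values near a screen coordinate. It assumes SP means IEEE 754 32-bit floats; the coordinate 1920 and the helper name are arbitrary choices for illustration.

```python
import struct

def next_float32(x):
    """Return the smallest float32 strictly greater than x (x positive, finite)."""
    # Reinterpret the float32 bit pattern as an unsigned int and add one.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

coord = 1920.0  # hypothetical pixel coordinate near the edge of a wide screen
spacing = next_float32(coord) - coord
print(spacing)  # gap to the next representable float32, in pixels
```

At that magnitude the gap is 2**-13 of a pixel, roughly 0.00012, so rounding error in a single value is far below anything visible. Error that accumulates over long calculations is a separate matter.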
I'd say the Cell architecture is very well suited for supercomputing as well as gaming, but the announced Cell implementation appears to me to be clearly targeted at the PS3. They'll have to come out with a "Cell HPC Edition" that has much better DP performance before they take over supercomputing. Not that I don't expect that they're working on that as we speak...
What's the scalar performance of one of these beasties?
Can an Athlon 64 / P4 beat it on scalar code? The whole HPC world has gotten boring since Cray died. Here's why I say that:
The Cray 1 had the best SCALAR and VECTOR performance in the world.
The Cray 2 was an ass kicker, the Cray 3 was a real ass kicker (if only they could build them reliably).
Cray pushed the boundaries, and at some points he pushed them too far -- designing and trying to build machines that couldn't be made reliable.
So it'll be a cold day in hell before I get all fired up over the fact that someone else managed to glue together a bazillion 'killer micros' and win at Linpack...
Now if someone would bring back the idea of transputers, or we saw some *real* efforts at Dataflow and FP then I'd be excited. I'd love a PC with 8 small, simple, fast, in-order tightly bound cpus. Don't say CELL, all indications are that they will be a *real* PITA to program to get any decent performance out of.
I don't think they thought that at all (Let's build a supercomputer). I think it followed naturally from the problem they were trying to solve.
This is because when you have the following conditions:
-- Lots of memory bandwidth needed
-- Fast floating point
-- Parallelizable code
-- Hand tuned kernels OK
You end up with something that looks a lot like a supercomputer. You just turned your compute bound problem into an IO bound problem. We may want to revise that saying -- and say 'You turned your compute bound problem into a coding problem'. Supercomputer performance seems more bound by the feasibility of extracting decent performance from the iron than it used to be -- judging by the stuff I have read by the old hands.
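One common way to picture the compute bound vs. IO bound trade-off is a roofline-style estimate: attainable speed is the smaller of the floating-point peak and memory bandwidth times the work done per byte moved. This is only a sketch of that idea, not anything from the article; the machine numbers and names below are made up.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: capped by the floating-point peak or by
    memory bandwidth times arithmetic intensity, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Made-up machine: 10 GFLOPS peak, 5 GB/s memory bandwidth.
PEAK, BW = 10.0, 5.0
naive = attainable_gflops(PEAK, BW, 0.25)  # streams data with little reuse
tuned = attainable_gflops(PEAK, BW, 4.0)   # hand-tuned kernel with heavy reuse
print(naive, tuned)  # 1.25 10.0
```

In this toy case the untuned code is stuck at the bandwidth limit, and only the hand-tuned kernel reaches the peak, which is the "coding problem" in a nutshell.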