IBM Sets Supercomputer Speed Record

Posted by timothy on Wednesday September 29, 2004 @12:49AM from the hi-gene dept.

T.Hobbes writes "IBM's BlueGene/L has set a new speed record at 36.01 TFlops, beating the Earth Simulator's 35.86 TFlops, according to internal IBM testing. 'This is notable because of the fixation everyone has had on the Earth Simulator,' said Dave Turek, I.B.M.'s vice president for the high-performance computing division. The AP story is here; the NY Times' story is here."

14 of 308 comments

Tecord? by el_benito · 2004-09-29 00:54 · Score: 5, Interesting

A new tecord?!? That's timpossible! But more seriously, does anyone know if there's an impartial 3rd party that ever confirms these measurements? I'm all for improving technology, but how do they verify their "tecords"?

--
http://liquidben.com - Aspiring to an 'under construction' gif
I read all three articles but couldn't find... by halivar · 2004-09-29 00:56 · Score: 2, Interesting

...what operating system it uses. Anybody know?
1. Re:I read all three articles but couldn't find... by Harbinjer · 2004-09-29 01:04 · Score: 2, Interesting
  
  Actually, I do think it's Linux. I live in Rochester and know some of the IBMers.
  
  I wonder if they let normal people see this thing? I'll ask.
36 TFlops ? by MadX · 2004-09-29 00:56 · Score: 4, Interesting

I wonder if that is sustained ??
I know that when the Mac G5 Cluster was developed they claimed tremendous speed, but when the sustained rate was calculated, it turned out to be much lower ...
Huh? by attam · 2004-09-29 01:02 · Score: 2, Interesting

From TFA:
the Blue Gene/L system next year with 130,000 processors and 64 racks, half a tennis court in size.

The prototype for which IBM claimed the speed record is located in Rochester, Minn., has 16,250 processors and takes up eight racks of space.

So does this mean the finished product, with almost 10x as many procs will be much faster still? Or am I reading this wrong?
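A quick back-of-envelope check of the figures quoted above, as a minimal sketch. It assumes throughput scales perfectly linearly with processor count, which real machines rarely manage, so the result is an upper bound rather than a prediction. The function name is made up for illustration.

```python
# Figures quoted in the story summary and the article excerpt above.
PROTOTYPE_PROCS = 16_250
PROTOTYPE_TFLOPS = 36.01
FULL_SYSTEM_PROCS = 130_000


def linear_scaling_estimate(base_tflops, base_procs, target_procs):
    """Upper-bound estimate: assumes throughput grows linearly with processors.

    This is an idealization; interconnect and software overheads
    usually pull the real figure below this number.
    """
    if base_procs <= 0:
        raise ValueError("base_procs must be positive")
    return base_tflops * target_procs / base_procs


proc_ratio = FULL_SYSTEM_PROCS / PROTOTYPE_PROCS
estimate = linear_scaling_estimate(
    PROTOTYPE_TFLOPS, PROTOTYPE_PROCS, FULL_SYSTEM_PROCS
)
print(f"processor ratio: {proc_ratio:.1f}x")
print(f"ideal linear estimate: {estimate:.2f} TFlops")
```

By these numbers the full system has 8x the processors (and 8x the racks) rather than 10x, so an ideal linear scale-up would land at roughly 288 TFlops.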
Off the shelf configuration by erick99 · 2004-09-29 01:05 · Score: 3, Interesting

From the article:
Unlike the more specialized architecture of the Japanese supercomputer, IBM's BlueGene/L uses a derivative of commercially available off-the-shelf processors. It also uses an unusually large number of them. The resulting computer is smaller and cooler than other supercomputers, reducing its running costs, said Hirschfeld. He did not have a dollar figure for how much lower Blue Gene's costs will be than other supercomputers.
This is the most interesting part of the article to me. Makers of supercomputers are going to go back and forth for the speed record. However, holding the speed record with off the shelf components seems like a separate achievement in and of itself. The article did mention, however, that the IBM system is not as capable as other supercomputers.

--
http://www.busyweather.com/
The most interesting part: by onetrueking · 2004-09-29 01:11 · Score: 5, Interesting

From the NY Times article:

"The new system is notable because it packs its computing power much more densely than other large-scale computing systems. BlueGene/L is one-hundredth the physical size of the Earth Simulator and consumes one twenty-eighth the power per computation, the company said."

1/100th the size and 1/28th the power. Now if that isn't a beautiful thing, I don't know what is.
Smart machines by fionbio · 2004-09-29 01:29 · Score: 5, Interesting

I've heard that the neural network of the human brain has a calculation speed of 4.4 TFLOPS. How soon will these machines start to THINK? Seems like what we need now is just more storage capacity and some well-written "thinking" software...
1. Re:Smart machines by Doesn't_Comment_Code · 2004-09-29 01:59 · Score: 4, Interesting
  
  You're getting into some pretty deep issues now. Can a computer ever think? How would we know if it was thinking? At what point does the computer start thinking instead of just following instructions? No matter how complex its instructions are or how fast it executes them, isn't it still just following instructions? What about us? Are we just following instructions?
  
  Timeout-- my head hurts.
  
  Which brings me to my next point. If a computer ever could think, it would eventually start to think about how it thinks... And then it would overheat or explode.
  
  --
  
  Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Re:way to catch up guys. by joib · 2004-09-29 01:56 · Score: 2, Interesting

I expect a lot of this system's performance depends on the scalability of the system software, and the compilers / libraries.

Blue Gene is an all-out MPI machine. System software scalability is not that crucial, since every compute node kernel only controls 2 CPUs. With this modest number of CPUs per node, I'd guess it doesn't require any extreme trickery from the scalability point of view to achieve near-hardware performance.

Software-wise, all the scalability problems lie in the design of the applications.
Va. Tech cluster not on current Top 500 list by Troy Baer · 2004-09-29 02:49 · Score: 3, Interesting

The peak of VT's System X cluster was about 17 Tflops, and the sustained rate was just over 10 (which earned it third place on the Top500 list).

Except that it's not on the most recent Top 500 list anywhere.
Remember how Va. Tech replaced all 1100 G5 nodes with G5 XServes a few months ago? Well, when you do something like that, you have to rerun and resubmit the benchmark. Va. Tech were not able to get the machine back together soon enough to rerun the benchmark in time to make the last list; there's even a big caveat about it on the Top 500 home page.
(It's also not clear that the original version of the Va. Tech machine ever did anything other than run that benchmark, but that's another matter.)
--Troy
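
For reference, the gap between peak and sustained rates discussed in this thread can be expressed as a simple efficiency ratio. A minimal sketch, using the approximate System X figures quoted at the top of this comment ("about 17" peak, "just over 10" sustained); the function name is hypothetical and the inputs are rounded, so the result is only indicative.

```python
def sustained_efficiency(sustained_tflops, peak_tflops):
    """Fraction of theoretical peak actually delivered on a benchmark run."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return sustained_tflops / peak_tflops


# Rounded figures as quoted in the comment above (assumed, not exact).
system_x = sustained_efficiency(sustained_tflops=10.0, peak_tflops=17.0)
print(f"System X delivered about {system_x:.0%} of its peak")
```

With those rounded inputs the cluster sustains a bit under 60% of its theoretical peak, which is why a headline peak number alone says little about what a machine delivers.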

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Re:Thats nice for IBM but real computing power.. by museumpeace · 2004-09-29 03:22 · Score: 2, Interesting

Oh, I forgot to mention: I don't think you can fit Blue Gene into any UAV unless you count running a Boeing 747 on autopilot and stuffing it to the gills with computers and diesel generators. The computers proposed at HPEC are mostly very mobile.

--
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
Re:way to catch up guys. by flaming-opus · 2004-09-29 03:29 · Score: 2, Interesting

That's only sort-of true. Blue Gene, like ASCI Red, the Cray T3E, the Paragon, etc., uses a microkernel OS to control the compute nodes. This is basically a couple of network stacks that allow the application to use the interconnect network, and some hooks for the larger OS which runs on dedicated OS nodes. The microkernel mostly just gets out of the way, and lets the application run balls-out on the compute nodes. Blue Gene was even designed so cleverly that MPI barriers and all-reduce are implemented as part of the interconnect network.

But then the application does something like write(file, offset, &buffer). That can't be handled by the microkernel, and must be handled by an OS node. The system call might even be handled by a different node from the I/O node connected to the disk drive. The system call is performed by a "server" on the OS node that may be part of that node's operating system, or might be a user-space daemon. Since there is only 1 thread on the compute node, it blocks until the I/O request is serviced.

This is not a hard thing to do if there are 60 compute nodes, 2 OS nodes and 2 I/O nodes. But with 100,000 compute nodes, there would have to be hundreds or thousands of OS nodes. Far too many to run with a monolithic kernel. Scalability within this pool of OS nodes is a tricky problem. Previous MPP designs have demonstrated that it's really easy to get the common case working, but much, much harder for a few corner cases, like concurrent unstructured writes to the same file (which tend to happen at the beginning and end of many big MPI programs). Remember that you don't solve a problem any faster if the machine runs 30 Tflops for 2 days, and then spends 25 days putting the output data together.
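
To put a number on that last remark, here is a minimal sketch of the effective rate once output time is counted. It treats the I/O phase as contributing no useful computation, which is a simplifying assumption, and the function name is made up for illustration.

```python
def effective_tflops(compute_tflops, compute_days, io_days):
    """Average rate over the whole job, counting I/O time as zero useful work."""
    total_days = compute_days + io_days
    if total_days <= 0:
        raise ValueError("total job time must be positive")
    return compute_tflops * compute_days / total_days


# The hypothetical job from the paragraph above: 30 Tflops for 2 days,
# followed by 25 days spent assembling the output.
rate = effective_tflops(compute_tflops=30.0, compute_days=2.0, io_days=25.0)
print(f"effective rate: {rate:.2f} Tflops")
```

Under that assumption the 30 Tflops machine averages only a little over 2 Tflops across the whole job.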

On a machine that large, check-point / restart is a big deal. Node failures are going to be common when that many components are involved. You end up with huge amounts of data, all of which needs to be written quickly, while the machine sits idle.

These problems are well understood. MPP designers have been wrestling with them for almost 20 years now. But any new system will have some kinks and bugs to deal with. I'm sure IBM is working hard to get them solved. They may have it all working already, for all I know.

You're right though, that the performance of the inner loops depends a lot on the application developers.
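
The write-forwarding behaviour described in this comment (a compute node ships the system call to an OS node and blocks until it is serviced) can be sketched roughly as below. This is a toy model using threads and an in-memory dictionary, not Blue Gene's actual system software; the `IONode` class and `forwarded_write` function are hypothetical names invented for the illustration.

```python
import queue
import threading


class IONode:
    """Toy stand-in for an OS/I-O node that services forwarded writes."""

    def __init__(self):
        self.requests = queue.Queue()
        self.storage = {}  # filename -> bytearray, standing in for a disk
        worker = threading.Thread(target=self._serve, daemon=True)
        worker.start()

    def _serve(self):
        # Service requests one at a time, in arrival order.
        while True:
            filename, offset, data, done = self.requests.get()
            buf = self.storage.setdefault(filename, bytearray())
            end = offset + len(data)
            if len(buf) < end:
                # Grow the "file" with zero bytes so the write fits.
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
            done.set()  # unblock the compute node that issued the call


def forwarded_write(io_node, filename, offset, data):
    """Compute-node side: ship the call to the I/O node and block until done."""
    done = threading.Event()
    io_node.requests.put((filename, offset, data, done))
    done.wait()  # the single compute thread sits idle here
    return len(data)


io_node = IONode()
forwarded_write(io_node, "out.dat", 0, b"hello")
forwarded_write(io_node, "out.dat", 5, b" world")
print(bytes(io_node.storage["out.dat"]))
```

With a single server thread every write is serialized, which is easy to get right but is exactly the bottleneck the comment warns about once thousands of compute nodes write at the same time.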
IBM vs. SGI by nboscia · 2004-09-29 05:08 · Score: 2, Interesting

I wonder how this compares to the one NASA is building in collaboration with Intel and SGI. Since you can't base performance simply on the number of processors, it should be interesting.