NNSA Supercomputer Breaks Computing Record


Posted by timothy on Sunday April 3, 2005 @05:16AM from the we're-simulating-a-stockpile dept.

Lecutis writes "National Nuclear Security Administration (NNSA) Administrator Linton F. Brooks announced that on March 23, 2005, a supercomputer developed through the Advanced Simulation and Computing program for NNSA's Stockpile Stewardship efforts has performed 135.3 trillion floating point operations per second (teraFLOP/s) on the industry standard LINPACK benchmark, making it the fastest supercomputer in the world."

22 of 266 comments

and its only half the machine too! by rebelcool · 2005-04-03 05:21 · Score: 5, Informative

Wait till it's fully online.

--
-
1. Re:and its only half the machine too! by einhverfr · 2005-04-03 12:02 · Score: 2, Informative
  
  You have a point. Nuclear weapons were a heavily stabilizing force in the Cold War because they made it impossible for any leader to consider all-out war with the other country. Things are more complex now, and nuclear proliferation is a different issue. Yet it is not a simple issue. On one hand, nuclear weapons continue to help prevent horrible war crimes like the firebombing of civilian population centers (Dresden, Tokyo) because it is simply too risky to do this. Yet they themselves are effective simply because they represent this risk.
  
  And the real risk is what happens if a group that is not beholden to a public body, such as an international terrorist group, obtains such a device. They would be able to strike with one of these weapons but be immune to any counterattack.
  
  --
  
  LedgerSMB: Open source Accounting/ERP
From the press release... by Zebra_X · 2005-04-03 05:22 · Score: 3, Informative

This performance was achieved at Lawrence Livermore National Laboratory (LLNL) at only the half-system point of the IBM BlueGene/L installation. Last November, just one-quarter of BlueGene/L topped the TOP500 List of the world's top supercomputers.

Is there anything that will be able to touch this when it's complete?
1. Re:From the press release... by As+Seen+On+TV · 2005-04-03 05:55 · Score: 5, Informative
  
  The X1E isn't intended to be a fastest-in-the-world supercomputer. It's intended to be a low-cost scalable vector system. The fact that it's fast is great, but it's not its main design feature.
  
  Now, the X2, on the other hand, is a whale. They're talking 150 TFLOPS at roll-out next year (unimpressive) and 300 TFLOPS after the block 10 update the year after that (very impressive).
  
  Of course, the X2 isn't working yet, so who the hell knows. But it's fun to think about.
Blue Gene? by eth8686 · 2005-04-03 05:22 · Score: 2, Informative

Didn't IBM push Blue Gene to 180-something teraflops recently? News story here.
1. Re:Blue Gene? by EBorisch · 2005-04-03 05:40 · Score: 5, Informative
  
  This is Blue Gene. Read the article...
Did you RTFA? by Donny+Smith · 2005-04-03 05:24 · Score: 5, Informative

> has performed 135.3 trillion floating point operations per second (teraFLOP/s) on the industry standard LINPACK benchmark, making it the fastest supercomputer in the world."

Did you read the fucking article?

"This performance was achieved at Lawrence Livermore National Laboratory (LLNL) at only the half-system point of the IBM BlueGene/L installation. Last November, just one-quarter of BlueGene/L topped the TOP500 List of the world's top supercomputers."

See, this is the SAME supercomputer that already topped the list last November, so the latest record did NOT make it the fastest supercomputer in the world.

It already had been the fastest supercomputer in the world.
Link to the list by dnaboy · 2005-04-03 05:39 · Score: 4, Informative

FYI the top 500 supercomputers list is maintained at http://www.top500.org/.
Re:hmmmmm... by Yartrebo · 2005-04-03 05:43 · Score: 3, Informative

With SSE instructions, you can process 4 floats at once, so I'm guessing that a 3.2 GHz processor can do a few gigaflops.
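The "few gigaflops" guess can be made concrete with the usual back-of-the-envelope peak formula: clock rate times floating-point operations retired per cycle times cores. The sketch below assumes, for illustration only, one 4-wide single-precision SSE operation per cycle; with 2 double-precision flops per cycle, the same formula gives the 6.4 GFLOPS figure quoted for a 3.2 GHz Xeon later in the thread. Sustained rates on real code are lower.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak = clock (GHz) x flops retired per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


# Assumption: one packed 4-wide single-precision SSE op completes per cycle.
print(peak_gflops(3.2, 4))
# Assumption: 2 double-precision flops per cycle (SSE2, 2 doubles per register).
print(peak_gflops(3.2, 2))
```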
Dupe by karvind · 2005-04-03 05:45 · Score: 3, Informative

Didn't we cover this before?
This *is* Blue gene. by daveschroeder · 2005-04-03 05:49 · Score: 2, Informative

RTFA

Or, at least the article's title:

"NNSA Supercomputer Breaks Computing Record: Exceeds 100 TERAFLOPS DOE/NNSA and IBM partnership on BlueGene/L, a tool for national security"
Re:More important issues by tgamblin · 2005-04-03 06:05 · Score: 5, Informative

Despite the fact that BlueGene/L is being built to simulate nukes, this kind of research does impact some of these other issues, and there is government money going into them. Here are some examples... The National Center for Atmospheric Research uses supercomputers to simulate effects of pollution and global warming, and projects like LEAD are using grids with supercomputers attached to predict weather. Check out some of the projects at RENCI, as well. There's NIH-sponsored genetic research in addition to the weather stuff.
It may be sad that we live in a world where nuclear weapons research is driving the computing power, but it doesn't mean that the power of BlueGene/L isn't going to be used for thousands of other peaceful scientific applications, too.
Re:hmmmmm... by tgamblin · 2005-04-03 06:13 · Score: 5, Informative

Depends on the problem and the memory performance as much as it does on the GPU. There's no good answer to that question. For kicks though, this paper has some measurements for matrix multiply using ATLAS. It's comparing a Pentium 4 to an NV40 GPU. The P4 wins at about 7 GFlops, and the NV40 loses due to horrible memory performance. That's pretty ironic considering that the NV40 has quite a few more FPUs, and that they work in parallel. It's a good example of why you can't ever say for sure how a processor's going to perform until you test it on a real workload.
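Figures like "about 7 GFlops" for matrix multiply come from dividing an operation count by wall-clock time. The minimal pure-Python sketch below illustrates only how that metric is computed; it is not ATLAS, it is orders of magnitude slower, and the matrix size is an arbitrary choice.

```python
import random
import time


def matmul(a, b):
    """Naive triple-loop product of an n x k and a k x m matrix (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        ai, ci = a[i], c[i]
        for p in range(k):
            aip, bp = ai[p], b[p]
            for j in range(m):
                ci[j] += aip * bp[j]  # one multiply + one add
    return c


def matmul_flops(n: int) -> int:
    """An n x n times n x n product performs n^3 multiplies and n^3 adds here."""
    return 2 * n ** 3


def measure_gflops(n: int = 64) -> float:
    """Time one n x n product on random data and report GFLOP/s."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    return matmul_flops(n) / elapsed / 1e9


print(f"{measure_gflops():.4f} GFLOP/s")
```

As the comment says, the number depends as much on memory behaviour as on the arithmetic units, so the same formula gives very different results for different implementations.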
Re:hmmmmm... by Swedentom · 2005-04-03 06:19 · Score: 2, Informative

An Apple Xserve G5 does 30+ gigaflops.

--
Sig Nature
Re:hmmmmm... by fafalone · 2005-04-03 06:36 · Score: 2, Informative

A 3.2 GHz Intel Xeon processor peaks at 6.4 GFLOPS, but clock speed isn't the only determining factor.
Re:Neat by brsmith4 · 2005-04-03 07:03 · Score: 5, Informative

That's not how Linpack works. Sure, increasing your number of nodes will give definite performance advantages to coarse-grained, embarrassingly parallel applications, but Linpack is not one of these applications. As well, Linpack should not be used as a guide to raw floating-point performance; it is much better suited to gauging throughput.

Linpack does its benchmarks using a more fine-grained algorithm, creating lots of message-passing communication to share segments of dense matrices for rather large linear systems. Not only is the number of nodes a factor, but so is the interconnect speed. If that cluster were using GigE for its interconnect, its Linpack benchmarks would not be nearly as impressive. Haven't RTFA, but it's likely that BlueGene/L is using Myranet or Infinband for its interconnect (or possibly a more proprietary backplane-style interconnect, though that cluster is way too big for that).

These latest generations of high-speed interconnects (esp. Infinband) have brought clusters close to shared-memory performance, and hence Linpack is more of a throughput test than anything else.

This description of the HPL benchmark (The "official" name for the Linpack benchmark) should provide some clarity as to how memory-dependent Linpack actually is:

The algorithm used by HPL can be summarized by the following keywords:

- Two-dimensional block-cyclic data distribution
- Right-looking variant of the LU factorization with row partial pivoting featuring multiple look-ahead depths
- Recursive panel factorization with pivot search and column broadcast combined
- Various virtual panel broadcast topologies
- Bandwidth reducing swap-broadcast algorithm
- Backward substitution with look-ahead of depth 1
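The first keyword, two-dimensional block-cyclic distribution, decides which node holds which piece of the matrix. A minimal sketch, assuming an nb x nb block size on a p x q process grid (the same roles as HPL's NB, P and Q tuning parameters); the function names are hypothetical. Dealing blocks out round-robin keeps every process busy as the LU factorization works through a shrinking trailing submatrix.

```python
def block_cyclic_owner(i: int, j: int, nb: int, p: int, q: int) -> tuple[int, int]:
    """Process-grid coordinates (row, col) owning global element (i, j).

    Elements are grouped into nb x nb blocks; block rows are dealt round-robin
    over the p process rows and block columns over the q process columns.
    """
    return (i // nb) % p, (j // nb) % q


def local_index(i: int, nb: int, p: int) -> int:
    """Position of global row (or column) i inside its owner's local array."""
    block = i // nb
    return (block // p) * nb + i % nb


# Hypothetical 8 x 8 matrix with nb = 2 on a 2 x 2 grid:
# print the owner of each 2 x 2 block, one block row per line.
for bi in range(4):
    print([block_cyclic_owner(bi * 2, bj * 2, 2, 2, 2) for bj in range(4)])
```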

http://www.netlib.org/benchmark/hpl/

They took a lot of time to make Linpack less dependent on shared memory, for example by adding the swap-broadcast algorithm (which I'm fairly certain was absent in the old mainframe version of Linpack), to make it more "fair" to run on a cluster versus a shared-memory setup. However, on a typical cluster, Linpack can push your interconnect pretty hard, esp. if you are stuck on GigE. On the other hand, Linpack has _lots_ of settings and parameters to "tune" the benchmark for your particular cluster.
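The row swaps that swap-broadcast has to ship around the cluster come from partial pivoting in the LU factorization. Below is a minimal serial, unblocked sketch of right-looking LU with row partial pivoting. It has no blocking, no look-ahead and no distribution, so it only shows the arithmetic and the swaps, not how HPL actually implements them.

```python
def lu_partial_pivot(a):
    """Right-looking LU with row partial pivoting on a copy of square matrix a.

    Returns (perm, lu): lu holds U on and above the diagonal and the
    unit-lower-triangular L multipliers below it; perm[i] is the original row
    now in position i, so that the row-permuted A equals L @ U.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pivot search: largest |entry| in column k at or below the diagonal.
        piv = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        # Row swap: on a cluster, this is the exchange swap-broadcast communicates.
        lu[k], lu[piv] = lu[piv], lu[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        # Right-looking step: store multipliers, then update the trailing matrix.
        for r in range(k + 1, n):
            m = lu[r][k] / lu[k][k]
            lu[r][k] = m
            for c in range(k + 1, n):
                lu[r][c] -= m * lu[k][c]
    return perm, lu
```

The trailing-matrix update is where nearly all of the roughly (2/3)n^3 flops go, which is why it dominates Linpack's run time once the matrix is large.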

My point: Linpack/HPL is not an overall flops benchmark for a cluster. It measures not only double-precision CPU performance but also the performance of a cluster's interconnect.
Re:AMazing by brsmith4 · 2005-04-03 07:14 · Score: 2, Informative

A slightly larger dose of logic would tell you that NASA has nothing to do with this cluster; it belongs to the NNSA, the National Nuclear Security Administration. They are probably more interested in testing new reactor designs or running simulations to demonstrate the effects of an aircraft crashing into one of their reactor domes (though I honestly doubt anyone really believes that will happen).
Re:Human Intelligence? by myukew · 2005-04-03 08:40 · Score: 2, Informative

No. The human brain has about 10^11 neurons, each with about 1000 connections to other neurons. Every neuron can fire about 200 times a second. So scientists expect the human brain to have about 20 PFlop/s. Still a little faster than blue gene...
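The estimate above is just a product of three numbers. The sketch below reproduces that arithmetic and compares it with the 135.3 TFLOP/s LINPACK result from the story. Treating one synaptic event as one floating-point operation is the comment's implicit assumption, not an established equivalence.

```python
NEURONS = 10 ** 11               # from the comment
CONNECTIONS_PER_NEURON = 1_000   # from the comment
FIRINGS_PER_SECOND = 200         # from the comment

# Assumption: one synaptic event counts as one "operation".
brain_ops = NEURONS * CONNECTIONS_PER_NEURON * FIRINGS_PER_SECOND

bluegene_half = 135.3e12         # half-system BlueGene/L LINPACK result, FLOP/s

print(f"{brain_ops / 1e15:.0f} PFLOP/s")
print(f"about {brain_ops / bluegene_half:.0f}x the half-system BlueGene/L run")
```

So "a little faster" works out to roughly two orders of magnitude under these assumptions.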

--
See pictures of tits
Re:DOE's Senior Activity Center by T5 · 2005-04-03 09:38 · Score: 2, Informative

DOE's stewardship program is not for retired scientists, but current ones. The laboratory directors at the nuclear labs (Sandia/LLNL/maybe others) are required to certify the stockpile as being ready to go each year. Their supercomputers are the only way to test the aging stockpile without actually detonating a few to see which designs age better than others.

And let's remember that almost everything in the current arsenal was designed and actually tested, not just worked up via computer. It takes a whole lot more computing power to run the thermodynamic and nuclear codes for simulation than it does to validate designs.
Re:Neat by Anonymous Coward · 2005-04-03 09:44 · Score: 1, Informative

"Blue Gene/L uses a federation of five different networks that were engineered specifically for this system. The networks are assembled in a modified hypercube configuration which is optimized for inter-rack communications as well as in rack messages."

Straight from my notes during the product brief at Yorktown by George Chiu, Senior Manager, Advanced Server Hardware Systems Research Division
Re:Neat by kayak334 · 2005-04-03 11:35 · Score: 2, Informative

Myranet or Infinband

Just some minor corrections and information for those interested.

Myricom is the company; Myrinet is the protocol. InfiniBand is an open protocol. Myrinet has a maximum speed of 2.2 Gb/s, while InfiniBand can scale up to 30 Gb/s on a 16x PCI-E card and a 12x port on the switch.
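Raw link speed is only half the story. A common first-order way to compare interconnects is the latency-bandwidth (Hockney) model, t = latency + size / bandwidth. The sketch below plugs in the peak rates quoted above plus GigE's nominal 1 Gb/s. The 5 microsecond latency is a made-up placeholder applied to every link, and real sustained bandwidth is below these signaling rates.

```python
def transfer_time_us(message_bytes: int, bandwidth_gbit_s: float, latency_us: float) -> float:
    """Hockney model: t = latency + size / bandwidth, in microseconds."""
    bits = message_bytes * 8
    # 1 Gbit/s = 1e9 bits/s = 1e3 bits per microsecond.
    return latency_us + bits / (bandwidth_gbit_s * 1e3)


LINKS_GBIT_S = {"GigE": 1.0, "Myrinet": 2.2, "InfiniBand 12x": 30.0}
PLACEHOLDER_LATENCY_US = 5.0  # hypothetical; not a measured value for any link

for name, bw in LINKS_GBIT_S.items():
    t = transfer_time_us(1 << 20, bw, PLACEHOLDER_LATENCY_US)  # one 1 MiB panel
    print(f"{name:>15}: {t:8.1f} us")
```

For large panels the bandwidth term dominates; for the many small messages in pivot searches and swaps, latency matters more, which is one reason GigE clusters fare poorly on Linpack.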

As for what BlueGene/L uses, I don't think I'm at liberty to discuss that.
Re:Neat by brsmith4 · 2005-04-03 11:50 · Score: 2, Informative

Were you correcting my spelling? Because I always make that mistake (myranet... it's myrinet damn it!). You know what I meant though ;) It looks like BlueGene/L is using a hybrid backplane/hypertorus interconnect where a whole bunch of "machines" (more like system-on-a-chip) are connected via a backplane, then that case of "machines" is connected to another case in the same rack on some number of layers of interconnect. Then the racks are connected using some other protocol. Though you may not "be at liberty" to discuss this, the top500 site already disclosed an ample amount of information on the subject for any beowulfer to get the general idea of what type of interconnect topology/setup BlueGene/L is using.

And I quote:

The nodes are interconnected through multiple complementary high-speed low-latency networks, including a 3D torus network and a combining tree network. The physical machine architecture is targeted to be most closely tied to the 3D torus, a simple 3-dimensional nearest neighbor interconnect which is "wrapped" at the edges. An independent combining tree network provides for fast global operations, such as global max or global sum.

http://www.top500.org/sublist/System.php?TB=2&id=7101

Enjoy.
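The "simple 3-dimensional nearest neighbor interconnect which is 'wrapped' at the edges" in the quoted description is a 3D torus. The sketch below computes a node's six neighbours and the minimum hop count between two nodes. The 4 x 4 x 4 dimensions in the example are hypothetical, not BlueGene/L's actual torus shape.

```python
def torus_neighbors(x, y, z, dims):
    """The six nearest neighbours of node (x, y, z) in a 3D torus.

    Coordinates wrap at the edges, so in a dimension of size n the
    '-1' neighbour of coordinate 0 is n - 1.
    """
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimum hops between nodes a and b: the shorter way round each ring."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


DIMS = (4, 4, 4)  # hypothetical partition size
print(torus_neighbors(0, 0, 0, DIMS))
print(torus_hops((0, 0, 0), (3, 0, 0), DIMS))  # the wrap-around link makes this 1 hop
```

The wrap-around links halve the worst-case distance in each dimension compared with a plain mesh, which is what makes nearest-neighbour traffic cheap on this kind of machine.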