NNSA Supercomputer Breaks Computing Record
Lecutis writes "National Nuclear Security Administration (NNSA) Administrator Linton F. Brooks announced that on March 23, 2005, a supercomputer developed through the Advanced Simulation and Computing program for NNSA's Stockpile Stewardship efforts has performed 135.3 trillion floating point operations per second (teraFLOP/s) on the industry standard LINPACK benchmark, making it the fastest supercomputer in the world."
Wait till it's fully online.
-
Just imagine running Fractint on this puppy!
Paleotechnologist and connoisseur of pretty shiny things.
There was another machine that had already beaten that record, but unfortunately failed a diagnostic test for banned substances...
> has performed 135.3 trillion floating point operations per second (teraFLOP/s) on the industry standard LINPACK benchmark, making it the fastest supercomputer in the world."
Did you read the fucking article?
"This performance was achieved at Lawrence Livermore National Laboratory (LLNL) at only the half-system point of the IBM BlueGene/L installation. Last November, just one-quarter of BlueGene/L topped the TOP500 List of the world's top supercomputers."
See, this is the SAME supercomputer that has already topped the list last November, so the latest record did NOT make it the fastest supercomputer in the world.
It already had been the fastest supercomputer in the world.
The closest I've heard of is the Cray X1E, but even that only claims 147 TFLOPS.
Just for a point of reference, does anybody know how many floating point operations a 3.2 GHz processor can do per second?
I know it's not 3.2 billion, because most micro-operations take at least 3 or 4 clock cycles.
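A rough way to bound the answer, as a sketch only: theoretical peak is the clock rate times the number of floating point operations the chip can complete per cycle. The flops-per-cycle value below is an assumption, not a vendor or measured figure, so check the documentation for the actual chip and expect real workloads to land well below the peak.

```python
def peak_flops(clock_hz, flops_per_cycle):
    """Theoretical peak: cycles per second times flops completed per cycle."""
    return clock_hz * flops_per_cycle

CLOCK_HZ = 3.2e9        # the 3.2 GHz part from the question
FLOPS_PER_CYCLE = 2     # ASSUMPTION for illustration, not a vendor figure

peak = peak_flops(CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"peak: {peak / 1e9:.1f} GFLOP/s")

# How many such chips, all running at peak, would match 135.3 TFLOP/s?
chips = 135.3e12 / peak
print(f"chips needed at peak: {chips:.0f}")
```

Latency (the 3 or 4 cycles per operation) does not set the rate by itself, because pipelined units can start a new operation before the previous one finishes. That is why the per-cycle throughput figure is the one that matters here.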
FYI the top 500 supercomputers list is maintained at http://www.top500.org/.
This is Blue Gene. Read the article...
The X1E isn't intended to be a fastest-in-the-world supercomputer. It's intended to be a low-cost scalable vector system. The fact that it's fast is great, but it's not its main design feature.
Now, the X2, on the other hand, is a whale. They're talking 150 TFLOPS at roll-out next year (unimpressive) and 300 TFLOPS after the block 10 update the year after that (very impressive).
Of course, the X2 isn't working yet, so who the hell knows. But it's fun to think about.
It may be sad that we live in a world where nuclear weapons research is driving the computing power, but it doesn't mean that the power of BlueGene/L isn't going to be used for thousands of other peaceful scientific applications, too.
Depends on the problem and the memory performance as much as it does on the GPU. There's no good answer to that question. For kicks though, this paper has some measurements for matrix multiply using ATLAS. It's comparing a Pentium 4 to an NV40 GPU. The P4 wins at about 7 GFlops, and the NV40 loses due to horrible memory performance. That's pretty ironic considering that the NV40 has quite a few more FPUs, and that they're in parallel. It's a good example of why you can't ever say for sure how a processor's going to perform until you test it on a real workload.
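To make the memory point concrete, here is a roofline-style sketch: the attainable rate is capped either by the arithmetic units or by how fast memory can feed them, whichever is lower. Both chips and every number below are hypothetical, invented for illustration; they are not measurements of the Pentium 4 or the NV40.

```python
def attainable_flops(peak_flops, mem_bandwidth_bytes, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_flops, mem_bandwidth_bytes * flops_per_byte)

# HYPOTHETICAL chips, invented for illustration only.
chips = {
    "few FPUs, well fed": dict(peak_flops=6e9, mem_bandwidth_bytes=20e9),
    "many FPUs, starved": dict(peak_flops=40e9, mem_bandwidth_bytes=5e9),
}
FLOPS_PER_BYTE = 0.5  # ASSUMPTION: work done per byte actually moved

for name, c in chips.items():
    rate = attainable_flops(flops_per_byte=FLOPS_PER_BYTE, **c)
    print(f"{name}: {rate / 1e9:.1f} GFLOP/s attainable")
```

With these made-up inputs the chip with far more arithmetic hardware comes out slower, because its memory system cannot keep the units busy.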
One would say this supercomputer is already more than twice as smart as Data!
You're right, I wouldn't steal a car. But if it were possible, I sure as hell would download one!
Having massive computing power in the hands of Lawrence Livermore scientists reduces or even eliminates the need for U.S. nuclear forces to actually detonate nuclear and thermonuclear explosions.
Of course, some people would prefer to see the United States undertake unilateral nuclear disarmament, something they've been advocating since SANE/FREEZE was telling us we could trust the Soviet Union in the 1980s. Only today they claim we can trust Kim Jong Il and the mullahs of Iran more than the democratically elected government of the United States, just as they claimed we could trust Leonid Brezhnev and Yuri Andropov more than we could trust Ronald Reagan. Their views are every bit as ill-conceived now as they were then.
Lawrence Person (lawrencepersonh@gmailh.com; remove all "h"s to mail)
http://www.lawrenceperson.com/
That's not how Linpack works. Sure, increasing your number of nodes will give definite performance advantages to coarse-grained, embarrassingly parallel applications, but Linpack is not one of these applications. Also, Linpack should not be used as a guide to raw floating-point performance; it is much better suited to gauging throughput.
Linpack runs its benchmark using a more fine-grained algorithm, generating a lot of message-passing communication to share segments of dense matrices for rather large linear systems. Not only is the number of nodes a factor, but so is the interconnect speed. If that cluster were using GigE for its interconnect, its Linpack results would not be nearly as impressive. I haven't RTFA, but it's likely that BlueGene/L is using Myrinet or InfiniBand for its interconnect (or possibly a more proprietary backplane-style interconnect, though that cluster is way too big for that).
These latest generations of high-speed interconnects (esp. InfiniBand) have brought clusters close to shared-memory performance, and hence Linpack is more of a throughput test than anything else.
This description of the HPL benchmark (the "official" name for the Linpack benchmark) should provide some clarity as to how memory-dependent Linpack actually is:
The algorithm used by HPL can be summarized by the following keywords: Two-dimensional block-cyclic data distribution - Right-looking variant of the LU factorization with row partial pivoting featuring multiple look-ahead depths - Recursive panel factorization with pivot search and column broadcast combined - Various virtual panel broadcast topologies - bandwidth reducing swap-broadcast algorithm - backward substitution with look-ahead of depth 1.
http://www.netlib.org/benchmark/hpl/
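For anyone unfamiliar with the first keyword in that list, here is a minimal sketch of a two-dimensional block-cyclic distribution: the matrix is cut into square blocks, and the blocks are dealt out round-robin over a grid of processes in both dimensions. The function name, the grid shape, and the block size are my own illustrative choices, not taken from the HPL source.

```python
def owner(i, j, nb, p, q):
    """Grid coordinates of the process holding matrix entry (i, j).

    nb is the block size; the process grid is p rows by q columns.
    Assumes the distribution starts at process (0, 0).
    """
    return (i // nb) % p, (j // nb) % q

# Show the owner of each nb x nb block of an 8 x 8 matrix on a 2 x 2 grid.
N, NB, P, Q = 8, 2, 2, 2  # illustrative values only
for bi in range(0, N, NB):
    print(" ".join(f"{owner(bi, bj, NB, P, Q)}" for bj in range(0, N, NB)))
```

Because ownership cycles through the grid, every process keeps a share of the shrinking trailing matrix as the factorization proceeds, which keeps the load balanced but also means every step involves communication between processes.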
They took a lot of time to make Linpack less shared-memory dependent, for example by adding the swap-broadcast algorithm (which I'm fairly certain was absent in the old mainframe version of Linpack), to make it more "fair" to run on a cluster versus a shared-memory setup. Even so, on a typical cluster Linpack can push your interconnect pretty hard, esp. if you are stuck on GigE. Linpack also has _lots_ of settings and parameters to "tune" the benchmark for your particular cluster.
My point: Linpack/HPL is not an overall flops benchmark for a cluster. It measures not only double-precision CPU performance, but also the performance of the cluster's interconnect.
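To show what the benchmark is counting, here is a toy single-process sketch: solve a dense system by Gaussian elimination with row partial pivoting, time it, and divide the operation count usually quoted for LINPACK (2/3 n^3 + 2 n^2) by the elapsed time. This is not HPL. It is pure Python, has no look-ahead, no blocking, and no message passing, so it exercises none of the interconnect behavior described above; the function names are my own.

```python
import random
import time

def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with row partial pivoting.

    a is a list of row lists and b a list; both are modified in place.
    """
    n = len(a)
    for k in range(n):
        # Partial pivoting: largest entry in column k at or below row k.
        piv = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[piv] = a[piv], a[k]
        b[k], b[piv] = b[piv], b[k]
        # Eliminate column k from the rows below.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            a[i][k] = 0.0
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_flops(n):
    """Operation count usually quoted for LINPACK: 2/3 n^3 + 2 n^2."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

n = 120  # tiny; real runs use the largest n that fits in memory
random.seed(0)
a = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]  # right-hand side whose solution is all ones

start = time.perf_counter()
x = lu_solve(a, b)
elapsed = time.perf_counter() - start

err = max(abs(xi - 1.0) for xi in x)
rate = linpack_flops(n) / elapsed / 1e6
print(f"n={n} max error={err:.2e} rate={rate:.1f} MFLOP/s")
```

On a cluster the same arithmetic is spread over many nodes, and the time in the denominator then includes every pivot search, row swap, and panel broadcast that crosses the network, which is why the reported number reflects the interconnect as well as the CPUs.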
"making it the fastest supercomputer in the world"
Or rather the fastest supercomputer with published LINPACK results. There are a number of reasons that agencies with supercomputers might not want to publish results.
Government of the people, by corporate executives, for corporate profits.
135.3 trillion floating point operations per second
Does this mean we can't slashdot it?
-Alex. http://bit.ly/1iVPtfA