LLNL/RPI Supercomputer Smashes Simulation Speed Record
Lank writes "A team of computer scientists from Lawrence Livermore National Laboratory and Rensselaer Polytechnic Institute have managed to coordinate nearly 2 million cores to achieve a blistering 504 billion events per second, over 40 times faster than the previous record. This result was achieved on Sequoia, a 120-rack IBM Blue Gene/Q normally used to run classified nuclear simulations. Note: I am a co-author of the coming paper to appear in PADS 2013."
Was I the only one who thought for a second that this was about a Raspberry Pi cluster?
I was already running Warp 3 in 1995! :-)
(OS/2 Warp 3, to be exact)
The Tao of math: The numbers you can count are not the real numbers.
I clicked hoping to read the paper, but the actual paper doesn't seem to be posted, only the abstract. The ACM copyright policy explicitly allows authors to "Post the Accepted Version of the Work on ... the Author's home page", so there is no legal barrier to the authors putting a PDF online. Doing so would of course increase readership of the paper, so ought to benefit everyone.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
No, those events are Who. Simulating is How. What is calculated.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Cats.
This is a simulation, and the figure counts events. The summary doesn't say what the events are, but they are probably more complicated than just testing a key.
Besides, brute-forcing a key wouldn't be best done on general-purpose CPUs or even GPUs. An ASIC would be the fastest, and you can be confident such chips would be easily within the capability of any major government and a lot of not-so-major ones. So you're looking at a chip that can do, as a back-of-the-envelope figure, a key every cycle and clocked at 1.2GHz - standard for a lot of systems, as a sort of performance-per-watt peak. Times 64 cores per chip, times eight chips per PCI-e card, times eight processor cards per 2U case, times 42/3=14 systems per rack (leave space for cooling and switch), that's 1.2 * 64 * 8 * 8 * 14 = 68,812.8 GK/s (billions of keys per second) per rack.
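For anyone who wants to check the multiplication, here is a rough sketch of that back-of-the-envelope estimate in Python. The constants are the figures from the comment; the `seconds_to_sweep` helper and its `key_bits` parameter are my own hypothetical additions, since the comment does not name a key size.

```python
# Back-of-the-envelope rack throughput, using the figures from the comment.
CLOCK_GHZ = 1.2            # assumes one key tested per cycle per core
CORES_PER_CHIP = 64
CHIPS_PER_CARD = 8
CARDS_PER_CASE = 8
CASES_PER_RACK = 42 // 3   # 2U case plus 1U for cooling/switch in a 42U rack


def rack_gigakeys_per_second() -> float:
    """Keys tested per second by one rack, in billions (GK/s)."""
    return (CLOCK_GHZ * CORES_PER_CHIP * CHIPS_PER_CARD
            * CARDS_PER_CASE * CASES_PER_RACK)


def seconds_to_sweep(key_bits: int, racks: int = 1) -> float:
    """Hypothetical helper: time to try every key of the given size."""
    keys_per_second = rack_gigakeys_per_second() * 1e9 * racks
    return 2 ** key_bits / keys_per_second


print(f"{rack_gigakeys_per_second():.1f} GK/s per rack")
```

The estimate is an upper bound under the one-key-per-cycle assumption; a real design would lose some of that to key scheduling and I/O.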
I'd be interested in seeing if this system could run our full Poliovirus simulations (consisting of around 3.5 million atoms). I've run our simulations on the BlueGene/Q at VLSCI using 32,768 cores (65,536 threads) and have been getting a very respectable 11.2 nanoseconds per day of simulation data using NAMD. Some data on our full virus simulations can be found here... (VIDRL supercomputer simulation page). Hey Lank, maybe you can help me figure out a way to crack the millisecond mark for our full-virus sims??? Great work and cheers from down under :-)
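To put the millisecond mark in perspective, a quick sketch of the arithmetic, assuming "cracking the millisecond mark" means accumulating one millisecond of simulated time and that the 11.2 ns/day rate stays constant. The function names and the 30-day budget are hypothetical, chosen only for illustration.

```python
NS_PER_MS = 1_000_000      # nanoseconds in one millisecond
CURRENT_NS_PER_DAY = 11.2  # rate quoted in the comment (NAMD, 32,768 cores)


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns of simulated time."""
    return target_ns / ns_per_day


def speedup_needed(target_ns: float, ns_per_day: float, wall_days: float) -> float:
    """Factor by which the rate must grow to finish within wall_days."""
    return days_to_simulate(target_ns, ns_per_day) / wall_days


days = days_to_simulate(NS_PER_MS, CURRENT_NS_PER_DAY)
print(f"{days:.0f} days, roughly {days / 365.25:.0f} years at the current rate")
print(f"speedup for a 30-day run: {speedup_needed(NS_PER_MS, CURRENT_NS_PER_DAY, 30):.0f}x")
```

At the quoted rate that works out to roughly 89,000 days, so more cores alone would have to buy a speedup of several thousand to make a millisecond practical.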
The title to this piece is wrong. The supercomputer in question was Sequoia, the Blue Gene/Q supercomputer located at Lawrence Livermore National Laboratory. Some preliminary work was done on a smaller RPI BG/Q machine, however. (I am a coauthor of the paper.)
It runs a custom IBM OS specifically designed for Blue Gene/Q. It provides an API very similar to Linux, but with some restrictions, e.g. static limits on threads, no process forking, and custom MPI messaging instead of a TCP/IP stack.
The simulation was a well-known parallel discrete event benchmark called PHold. It is not a model of any particular physical system, but is more of a stress and scalability test for the simulator, in this case the ROSS simulator developed at RPI. PHold has particularly fine-grained events, which stresses the synchronization mechanism known as Time Warp, implemented in ROSS with support for reverse computation. It also stresses the scalability of the Global Virtual Time commitment mechanism (used for I/O, error detection, storage management, and termination detection). And because PHold has no locality in its communication, it greatly stresses the underlying communication layer, MPI. The general idea is that a simulator that can achieve high performance on PHold at very large parallel scale can achieve high performance on just about any realistic, load-balanced discrete event simulation at that scale.
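For readers who haven't seen PHold, here is a minimal sequential sketch of a PHold-style workload in Python. It is not ROSS code and has no Time Warp, rollback, or MPI; it only illustrates why the events are fine-grained and why there is no communication locality. All names and parameter values (`run_phold`, `lookahead`, `mean_delay`, and so on) are assumptions chosen for illustration.

```python
import heapq
import random


def run_phold(num_lps=8, events_per_lp=4, end_time=100.0,
              lookahead=0.1, mean_delay=1.0, seed=0):
    """Sequential PHold-style loop; returns events processed per logical process."""
    rng = random.Random(seed)
    pending = []  # min-heap of (timestamp, sequence number, destination LP)
    seq = 0       # tie-breaker so heap entries always compare deterministically

    def schedule(now, dest):
        nonlocal seq
        stamp = now + lookahead + rng.expovariate(1.0 / mean_delay)
        heapq.heappush(pending, (stamp, seq, dest))
        seq += 1

    # Seed every logical process (LP) with a fixed initial event population.
    for lp in range(num_lps):
        for _ in range(events_per_lp):
            schedule(0.0, lp)

    processed = [0] * num_lps
    while pending:
        now, _, lp = heapq.heappop(pending)
        if now > end_time:
            break
        processed[lp] += 1
        # The event does almost no work: it only forwards one new event to a
        # uniformly random LP, so there is no locality for a partitioner to use.
        schedule(now, rng.randrange(num_lps))
    return processed


counts = run_phold()
print(sum(counts), "events processed across", len(counts), "LPs")
```

In a parallel run each LP would live on some core and the random destination would usually be remote, which is where the pressure on synchronization and on the messaging layer comes from.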
I have to disagree. PHold was not designed to run well under Time Warp. It was designed as a stress test for any parallel discrete event simulator, whether based on Time Warp or not, and in particular originally to compare optimistic to conservative synchronization algorithms. Also, Sequoia is much less biased toward regular geometry continuum simulations than other world-class supercomputers. It has no GPUs, for example. Machines of this class will be used more and more in the future for discrete simulations such as network models, or agent-based models, or for huge data problems, or for mixed continuous-discrete models such as of the power grid.
It's good to see that you've thought this through properly.
To have a right to do a thing is not at all the same as to be right in doing it