Software to Make Blue Gene Top 200 Teraflops

Posted by ryuzaki0 on Friday June 23, 2006 @08:33AM from the crunching-rather-than-taste dept.

An anonymous reader writes "New Scientist has a story about the most intensive computer program ever created. It runs on IBM's big beast, Blue Gene/L, at Lawrence Livermore National Laboratory in California and carries out 207.3 teraflops (trillion calculations per second). The program, called Qbox, performs very complex quantum calculations to simulate the behaviour of thousands of atoms in three dimensions. Wow."

Specs by neonprimetime · 2006-06-23 08:39 · Score: 5, Informative

Specs here and yes, Suse
1. Re:Specs by Anonymous Coward · 2006-06-23 09:10 · Score: 1, Informative
  
  Not quite. The front-end and service nodes run Suse, but the compute nodes run CNK (the compute node kernel), which supports only a subset of the Linux system calls and can run only one process per core (i.e. two per compute node). That means your code can't fork().
  
  Lots of good reading material here: http://www.research.ibm.com/journal/rd49-23.html
Just wait... by Raul654 · 2006-06-23 08:48 · Score: 3, Informative

BlueGene/L has a sister project, Cyclops64 (formerly known as BlueGene/C), due out sometime late in 2006 or early 2007. My research group is (a) helping IBM do hardware verification on it, and (b) developing the systems software for it [esp. the compiler]. Cyclops64 could very well blow BlueGene/L out of the water.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
1. Re:Just wait... by Raul654 · 2006-06-23 10:08 · Score: 3, Informative
  
  What you are describing has already been done, and was done quite a while ago. Around 1990, NASA realized that the way we do parallel benchmarks sucks. The way most benchmarks (including the parallel ones) work is that some organization posts the code, and people have to compile and run the code as-is. There's not much room there for optimization (other than tweaking the compiler flags, some trivial hardware settings, etc.), and optimization is essential to getting good parallel performance (because parallel machines vary so widely). So performance was tied very closely to the implementation, over which nobody had any control.
  
  NASA approached the problem differently. Their numerical analysis group put out a set of "paper and pencil" benchmarks (based on real-world problems that one would encounter, for example, fluid dynamics). The actual implementation was left up to the individual companies. This is what we know today as the NAS benchmark suite.
  
  --
  
  To make laws that man cannot, and will not obey, serves to bring all law into contempt.
  --E.C. Stanton
it doesn't work like that by tpjunkie · 2006-06-23 08:52 · Score: 4, Informative

It doesn't take .2 teraflops to model one atom, or even two atoms, even accounting for effects on the quantum level. However, when you take into account that each atom will more or less interact with every other atom, you have a massive amount of interactions to model. That's what takes so much processing power.
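
To put a rough number on that: if every atom interacts with every other atom, the count of distinct pairs is N(N-1)/2. A minimal Python sketch, assuming purely pairwise interactions (a simplification; the name `pair_count` is made up for illustration and has nothing to do with Qbox itself):

```python
def pair_count(n_atoms: int) -> int:
    """Number of distinct unordered pairs among n_atoms bodies: N(N-1)/2."""
    return n_atoms * (n_atoms - 1) // 2


# The pair count grows roughly with the square of the atom count.
for n in (1, 2, 10, 1000):
    print(f"{n:>5} atoms -> {pair_count(n):>7} pairwise interactions")
```

One atom has nothing to interact with and two atoms share a single pair, which is why the cost per atom only looks large once you have a thousand of them.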
Re:Only the most intensive USEFUL program by frank_adrian314159 · 2006-06-23 08:53 · Score: 2, Informative

while(1); uses no FLOPS. OTOH, if you used while (1.0);...
(And for those of you who are humor-impaired, I do realize that neither would use any FLOPS because they would both be optimized into L1: jmp L1).

--
That is all.
HPCWire Interview by multimediavt · 2006-06-23 08:56 · Score: 4, Informative

http://www.hpcwire.com/hpc/699401.html

There's some additional info about BlueGene and what Livermore thinks of it here. What this interview neglects to mention is the millions of dollars being spent on IBM and internal developers to get this code (and any others) working on BlueGene. I was briefed by the hardware and software teams that built BlueGene and I can tell you, it's no easy task to bring apps to that platform. Kuznezov seems to trivialize it in the interview and I'm gonna have to go back and review the process again. Maybe it has changed since my briefing in early 2004, but somehow I doubt it.
Re:...wow... by mhore · 2006-06-23 08:57 · Score: 5, Informative

So in essence, it takes about .2 teraflops per atom... And that was only after spending a lot of time condensing the algorithms. This makes me wonder two things. First, what do these equations look like such that it takes 200 gigaflops just to model one atom. Second, over what timeframe does this simulation take place? Are we talking real-time, calculating for 50 years, what?
0.2 TFlops per atom, yes. But there are 1000 atoms, and it's molybdenum, which has 42 electrons... so that's 42,000 particles that all interact with each other. Still... that's not too many. But maybe they're considering interactions between nuclei, too. Who knows...
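
The arithmetic above as a quick Python check. The 207.3 teraflops figure comes from the story summary, and the 1000 atoms with 42 electrons each are the numbers assumed in this comment; the variable names are only illustrative.

```python
SUSTAINED_TFLOPS = 207.3   # figure quoted in the story summary
N_ATOMS = 1000             # atom count assumed in this comment
ELECTRONS_PER_ATOM = 42    # molybdenum has atomic number 42

# Naive "per atom" share of the machine's sustained rate.
tflops_per_atom = SUSTAINED_TFLOPS / N_ATOMS
gflops_per_atom = tflops_per_atom * 1000.0

# Electrons only; nuclei are not counted here.
total_electrons = N_ATOMS * ELECTRONS_PER_ATOM

print(f"{tflops_per_atom:.4f} TFlops per atom ({gflops_per_atom:.1f} GFlops)")
print(f"{total_electrons} electrons in total")
```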
As for your question about what the equations look like? They're probably very nasty integrals of sines and cosines and what not to various odd (read: strange) powers and stuff. I do fairly computationally intensive simulations on some big IBM machines and just simple equations can amount to quite a bit of calculations. Nothing like what these guys are doing, though.
Finally... what time frame is the simulation over? I'd wager VERY SHORT times, maybe nanoseconds or something like that. Even casual "molecular dynamics" simulations can only probe very short timeframes. Their coarse-grained cousins can maybe do microseconds or milliseconds.
Mike.

--
Mmmm......sacrelicious.
Re:...wow... by exp(pi*sqrt(163)) · 2006-06-23 09:17 · Score: 2, Informative

In a classical physical system the time to compute what happens to N particles typically grows as a polynomial in N. The velocities and positions of the particles form a 6N-dimensional space (3 for velocity, 3 for position) and you're typically trying to trace a path through that 6N-dimensional space.
In quantum mechanics the state of the system is defined by a wavefunction on a 3N-dimensional space. The state of a system is no longer a point; it's a *function* on a 3N-dimensional space. That means that at any position in this space the function can take any value, so you need to compute the value of this function at every point in this 3N-dimensional space. Suppose we model this really crudely: instead of considering a wavefunction that varies continuously through this 3N-dimensional space, 'discretize' the space and consider just 10 points along each of the 3N axes rather than the infinite number required by quantum mechanics. We can then model the system by computing values of the wavefunction at 10^(3N) points. Suppose we're dealing with 1000 atoms. Let's model the atoms really crudely as one nucleus and one electron each. That means 2000 particles and a 6000-dimensional space. So we need to compute the wavefunction at 100000000...000 points, where we have a '1' followed by 6000 zeroes.
I know that physicists have a few tricks up their sleeves but it seems pretty obvious to me that these guys are actually cutting a lot of corners, and to accurately model this many atoms on a computer anything like what they have sounds pretty implausible to me.
There's a quick and easy way to look at this. When you combine two classical systems the work required to simulate the combination is typically the sum of the work required to simulate them separately (modulo a polynomial). When you combine two quantum systems you need to multiply the amount of work. Combining 1000 quantum systems borders on the insane...
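
A small Python sketch of that counting argument, assuming the same crude model as above (10 grid points per axis, 3 axes per particle). The function names are hypothetical, and the grid size is reported as a power of ten rather than written out, since the number itself has thousands of digits.

```python
import math


def classical_state_size(n_particles: int) -> int:
    """Numbers needed for a classical state: 3 positions + 3 velocities each."""
    return 6 * n_particles


def log10_grid_points(n_particles: int, points_per_axis: int = 10) -> float:
    """log10 of the wavefunction grid size, points_per_axis ** (3 * N)."""
    return 3 * n_particles * math.log10(points_per_axis)


for n in (1, 2, 10, 2000):
    print(f"N={n:>4}: classical state = {classical_state_size(n):>5} numbers, "
          f"quantum grid = 10^{log10_grid_points(n):.0f} points")
```

Going from N to 2N particles doubles the classical state but squares the number of grid points, which is the "add versus multiply" contrast in the paragraph above.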

--
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists.
Quantum Monte Carlo by poszi · 2006-06-23 09:23 · Score: 2, Informative

First, what do these equations look like such that it takes 200 gigaflops just to model one atom.
The article is light on details but I suppose the only quantum algorithm that can handle 1000 atoms is Quantum Monte Carlo. The problem is that the algorithm is cubic in the number of particles (and has a huge prefactor). So in essence 1000 atoms is 1000^3 = 10^9 times more time-consuming than one. And I'm sure they still use dramatic simplifications, even though they have the most powerful computer. They probably do not consider all electrons; instead they use pseudopotentials. And the Quantum Monte Carlo is likely in a fixed-node variant, which is approximate. How long does it take? It's hard to tell, but probably a few hours or days each, and they are performing several of those with different conditions.
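
The cubic estimate, sketched in Python. This only assumes cost proportional to N^3 and ignores the prefactor entirely; `relative_cost` is an illustrative name, not anything from a real Quantum Monte Carlo code.

```python
def relative_cost(n_atoms: int, exponent: int = 3) -> int:
    """Cost relative to a single atom, assuming cost scales as N**exponent."""
    return n_atoms ** exponent


for n in (1, 10, 100, 1000):
    print(f"{n:>5} atoms -> {relative_cost(n):>13,} x the one-atom cost")

# Doubling the system size multiplies the cost by 2**3 = 8.
print(relative_cost(2000) // relative_cost(1000))
```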

--
Save the bandwidth. Don't use sigs!
Re:Slight clarification by Bill Barth · 2006-06-23 12:53 · Score: 2, Informative

It's not "fake" so much as it's an approximation. I guarantee you they know by exactly how much they are in error (but not in what direction!). The Schroedinger equation that is at the heart of this represents the probability (well, the squared modulus of its solution does, at least) of something as a continuous function of space and time. These scientists make errors in that the equations that they use are discrete (in terms of mathematical degrees of freedom, strictly speaking, by discretizing space and time directly) models of the Schroedinger equation and in that the initial and boundary conditions are not perfectly well known. That doesn't constitute "faking it" in my book. If they were faking it, they'd be making pretty pictures with no predictive value, and presumably their work makes good predictions, which, as you note, puts it in the category of "good science."
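
As a toy illustration of a discretization whose error is well understood (this is not the Qbox method, just a standard central-difference sketch in Python; `second_derivative_fd` is a made-up name): approximating a second derivative on a grid of spacing h gives an error that shrinks like h^2, so halving h should cut the error by about a factor of four.

```python
import math


def second_derivative_fd(f, x: float, h: float) -> float:
    """Central-difference approximation of f''(x) with grid spacing h."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


x0 = 1.0
exact = -math.sin(x0)  # the second derivative of sin(x) is -sin(x)

errors = []
for h in (0.1, 0.05, 0.025):
    approx = second_derivative_fd(math.sin, x0, h)
    errors.append(abs(approx - exact))
    print(f"h={h:<6} approx={approx:.8f} error={errors[-1]:.2e}")

# Each halving of h should reduce the error by roughly 4x (second-order accuracy).
print([errors[i] / errors[i + 1] for i in range(len(errors) - 1)])
```

The result is never exact, but how the error behaves as the grid is refined is predictable, which is the difference between an approximation and "faking it".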

--
Yes...I am a rocket scientist.