Stanford Uses Million-Core Supercomputer To Model Supersonic Jet Noise
coondoggie writes "Stanford researchers said this week they had used a supercomputer with 1,572,864 compute cores to predict the noise generated by a supersonic jet engine. 'Computational fluid dynamics simulations test all aspects of a supercomputer. The waves propagating throughout the simulation require a carefully orchestrated balance between computation, memory and communication. Supercomputers like Sequoia divvy up the complex math into smaller parts so they can be computed simultaneously. The more cores you have, the faster and more complex the calculations can be. And yet, despite the additional computing horsepower, the difficulty of the calculations only becomes more challenging with more cores. At the one-million-core level, previously innocuous parts of the computer code can suddenly become bottlenecks.'"
Pfft. I can simulate supersonic jet noise just by overclocking my Radeon 7970.
everything is in the subject
http://Lenny.com
4 great justice!
Pfft is my simulation of jet noise
Slashdotters don't have sex, and so they cannot have slashdaughters. Ergo, slashdaughters do not exist. QED.
In Soviet Russia, Jesus asks: "What Would You Do?"
There's that sound again.
Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
But searching for "5-d torus interconnect" turns up nothing on Wikipedia. Here's an explanation of the 2-dimensional version: http://en.wikipedia.org/wiki/Torus_interconnect
The K computer by Fujitsu at RIKEN uses a 6-d (six-dimensional) torus network. So how does a 5-d torus interconnect lead to 2**19 + 2**20 cores, or possibly 2**17 + 2**18 CPUs? I'm not seeing it clearly in my head. Off to a paper napkin to sketch it out!
If each core connects forward and back in each of the 5 dimensions, that gives 10 links, one to each of the 10 neighbors a single hop away. But the number of cores factors only into twos and a single three (1,572,864 = 3 * 2^19), so I'm not seeing the construction...
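One way to make the napkin sketch concrete is below. It rests on two assumptions that nobody in this thread has confirmed: that the torus links join multi-core nodes rather than individual cores (16 cores per node here), and that the dimensions need not be equal in length. The shape `(16, 16, 16, 12, 2)` is picked only because its product fits the core count; `TORUS_SHAPE`, `CORES_PER_NODE` and `torus_neighbors` are made-up names for illustration.

```python
from math import prod

# Hypothetical torus shape (an assumption, not taken from the article).
# The dimension of length 12 = 3 * 2^2 is what supplies the factor of three.
TORUS_SHAPE = (16, 16, 16, 12, 2)
CORES_PER_NODE = 16  # assumption: links join 16-core nodes, not single cores


def torus_neighbors(coord, shape):
    """Return the distinct coordinates one hop from coord, with wraparound."""
    neighbors = set()
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            hop = list(coord)
            hop[axis] = (hop[axis] + step) % size  # wrap around the ring
            if tuple(hop) != tuple(coord):
                neighbors.add(tuple(hop))
    return neighbors


node_count = prod(TORUS_SHAPE)
core_count = node_count * CORES_PER_NODE
origin_neighbors = torus_neighbors((0, 0, 0, 0, 0), TORUS_SHAPE)
```

Under these assumptions `core_count` comes out to 3 * 2^19. A dimension of length 2 is a degenerate ring: stepping forward and stepping back land on the same node, so a node in this shape has 9 distinct neighbors rather than 10.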
simulate the Matrix?
One. Actually, you could do it with rocks.
When our name is on the back of your car, we're behind you all the way!
You get some pretty interesting problems when you increase the number of cores in your computer.
A couple of years ago, we replaced a 4-core IBM P5 with a 32-core HP DL 580. We tested it for a couple of months with just a user or two at a time. Then we took a day and tested with the entire company (roughly 250 users). Thank goodness we did that before we put it into production because, for some people, it was actually slower than the P5. It looked like it was going to be a disaster.
Fortunately, I had seen this problem before (on a Sequent Symmetry, of all things). I ran "strace" on the offending process, and sure enough, we were having problems with lock contention. We talked to our software vendor and, while it took a while for them to admit it was their problem (and probably cost us multiple thousands of dollars to have them fix it), they rewrote the code to use fewer locks. Problem solved.
Sit, Ubuntu, sit. Good dog.
Most of these CFD problems are time-marching problems governed by hyperbolic differential equations. Basically, the state of the fluid at some point X at time t is influenced only by the state of the fluid prior to that time. So when marching from t to t+delta(t), only the solution at the previous time step matters. Even in space, only a small region at t-delta(t) affects any given point at t. Such problems have local data dependencies, so they lend themselves to parallelism. This is not to minimize what they have achieved: if it were that easy, it would have been done a long time ago. Physics governed by elliptic (and to some extent parabolic) equations is not that lucky.
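A toy version of that locality argument, as a sketch only: a first-order upwind step for 1-D advection on a periodic grid, which is far simpler than whatever the Stanford code actually solves. Each new value depends only on a cell and its upstream neighbor at the previous step, so a chunk of the grid needs just one "halo" value from the chunk next door. The function names and the chunking scheme are illustrative assumptions.

```python
def upwind_step(u, c):
    """One explicit time step on a periodic grid; c is the Courant number (0 < c <= 1)."""
    # u[i - 1] wraps to the last cell when i == 0, giving periodic boundaries.
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]


def upwind_step_chunked(u, c, chunks):
    """Same step, computed chunk by chunk as independent workers would do it."""
    size = len(u) // chunks  # assumes chunks divides len(u) evenly
    new = []
    for k in range(chunks):
        start = k * size
        halo = u[start - 1]            # the only value needed from a neighbor chunk
        local = u[start:start + size]  # data this chunk owns
        upstream = [halo] + local[:-1]
        new.extend(x - c * (x - p) for x, p in zip(local, upstream))
    return new


pulse = [0.0] * 16
pulse[3] = 1.0
serial = upwind_step(pulse, 0.5)
chunked = upwind_step_chunked(pulse, 0.5, 4)
```

Here `serial` and `chunked` hold identical values, and no chunk ever reads more than one cell it does not own. In a real code that single read becomes a message over the interconnect, which is where the balance between computation, memory and communication comes in.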
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
"At the one-million-core level, previously innocuous parts of the computer code can suddenly become bottlenecks."
When they say this, they mean it. To put this in perspective: with 1,572,864 cores, an application which is 99.9999% parallelizable will get useful work out of LESS THAN HALF of the hardware! Its speedup tops out at roughly 611,000x, about 39% of the ideal, so in effect over 60% of the hardware will be tied up waiting for that 0.0001% of serial code to execute.
This problem is explained by Amdahl's law, an important (yet depressing) observation which shows just how difficult writing an effective parallel algorithm actually is -- even when you're only writing for 4 cores.
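For anyone who wants to check the arithmetic, here is a minimal sketch of Amdahl's law in its textbook form, speedup = 1 / (s + (1 - s) / N), where s is the serial fraction and N the number of cores. It assumes the serial part does not shrink as cores are added and ignores communication cost, so real codes will do worse. The function names are my own.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only parallel_fraction of the work can be spread out."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def efficiency(parallel_fraction, cores):
    """Fraction of the hardware doing useful work (speedup divided by cores)."""
    return amdahl_speedup(parallel_fraction, cores) / cores


sequoia_cores = 1_572_864
big = efficiency(0.999999, sequoia_cores)  # roughly 0.39
small = efficiency(0.95, 4)                # roughly 0.87
```

The second call is the 4-core case: a program that is 95% parallel already gives up about 13% of a quad-core machine.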