A Look Inside Oak Ridge Lab's Supercomputing Facility

Posted by timothy on Tuesday September 11, 2012 @03:01AM from the plus-a-fun-museum dept.

1sockchuck writes "Three of the world's most powerful supercomputers live in adjacent aisles within a single data center at Oak Ridge National Laboratory in Tennessee. Inside this facility, technicians are busy installing new GPUs into the Jaguar supercomputer, the final step in its transformation into a more powerful system that will be known as Titan. The Oak Ridge team expects the GPU-accelerated machine to reach 20 petaflops, which should make it the fastest supercomputer in the Top 500. Data Center Knowledge has a story and photos looking at this unique facility, which also houses the Kraken machine from the University of Tennessee and NOAA's Gaea supercomputer."

Can we build ... by PPH · 2012-09-11 03:07 · Score: 3, Funny

... a Beowulf cluster of these?

--
Have gnu, will travel.
1. Re:Can we build ... by smittyoneeach · 2012-09-11 03:33 · Score: 2
  
  And if they need BFD power, they can release the Biden.
  
  --
  Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
2. Re:Can we build ... by mcgrew · 2012-09-11 06:25 · Score: 2
  
  With all due respect, Dr. Cooper, are you on crack?
  -- George Smoot
  (Oh, for God's sake, I clicked the "post anonymously" button 25 minutes after the last comment I posted and it said I didn't wait long enough. Good way to spoil a joke... No, I'm not really Dr. Smoot. Posting under my real name is the only way to post this. Are you on Slashdot staff, Dr. Cooper?)
  Oh, BTW, BAZINGA!
  
  --
  Free Martian Whores!
Re:What? by serviscope_minor · 2012-09-11 03:23 · Score: 2

Yeah, WTF does that mean, "...which should make it the fastest supercomputer in the Top 500..."
Note the capitalisation of T in Top 500, which you quoted correctly! It's the name of the list of fastest supercomputers.
Now, please hand in your nerd card on the way out.

--
SJW n. One who posts facts.
Re:What? by WhitePanther5000 · 2012-09-11 03:23 · Score: 4, Informative

The Top 500 is a specific list: http://top500.org/
It's more correct to say it's the fastest on the list, than the fastest in the world. There are any number of metrics you can use to compare supercomputers. Top 500 just uses the most popular metric. Another machine could easily be the fastest on a different list, like http://www.graph500.org/.
Re:What? by Sarten-X · 2012-09-11 03:25 · Score: 2

Unless there are faster ones that aren't considered for comparison, such as secret military projects.

--
You do not have a moral or legal right to do absolutely anything you want.
Topology matters more than GFLOPS by bratmobile · 2012-09-11 03:38 · Score: 5, Insightful

I really, really wish articles would stop saying that computer X has Y GFLOPS. It's almost meaningless, because when you're dealing with that much CPU power, the real challenge is to make the communications topology match the computational topology. That is, you need the physical structure of the computer to be very similar to the structure of the problem you are working on. If you're doing parallel processing (and of course you are, for systems like this), then you need to be able to break your problem into chunks, and map each chunk to a processor. Some problems are more easily divided into chunks than other problems. (Go read up on the "parallel dwarves" for a description of how things can be divided up, if you're curious.)
I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), then you can map regions of space to different processors. Then you run your simulation by having all the processors run for X time period (on your simulated timescale). At the end of the time period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then, you run the simulation for the next time chunk. Depending on your data set, you may spend *FAR* more time sending the intermediate results between all the different processors than you do actually running the simulation. That's what I mean by matching the physical topology to the computational topology. In a system where the communications cost dominates the computation cost, then adding more processors usually doesn't help you *at all*, or can even slow down the entire system even more. So it's really meaningless to say "my cluster can do 500 GFLOPS", unless you are talking about the time that is actually spent doing productive simulation, not just time wasted waiting for communication.
Here's a (somewhat dumb) analogy. Let's say a Formula 1 race car can do a nominal 250 MPH. (The real number doesn't matter.) If you had 1000 F1 cars lined up, side by side, then how fast can you go? You're not going 250,000 MPH, that's for sure.
I'm not saying that this is not a real advance in supercomputing. What I am saying is that you cannot measure the performance of any supercomputer with a single GFLOPS number. It's not an apples-to-apples comparison unless you really are working on the exact same problem (like molecular dynamics). And in that case, you need some unit of measurement that is specific to that kind of problem. Maybe for molecular dynamics you could quantify the number of atoms being simulated, the average bond count, and the length of simulated time in every "tick" (the simulation time unit). THEN you could talk about how many of those units your system can do per second, rather than a meaningless number like GFLOPS.
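
One way to picture the problem-specific unit suggested above is the following sketch. The two metrics and all names here (`atom_steps_per_second`, `simulated_ns_per_day`, `tick_fs`) are illustrative assumptions, not measurements or definitions from the comment; they are computed from measured wall-clock time, so time spent waiting on communication counts against the score.

```python
def atom_steps_per_second(n_atoms, n_steps, wall_seconds):
    """Atom updates completed per second of wall-clock time."""
    return n_atoms * n_steps / wall_seconds


def simulated_ns_per_day(n_steps, tick_fs, wall_seconds):
    """Nanoseconds of simulated time produced per day of wall-clock time.

    tick_fs is the length of one simulation tick in femtoseconds.
    """
    simulated_ns = n_steps * tick_fs / 1e6  # 1 ns = 1e6 fs
    return simulated_ns * 86400 / wall_seconds


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 atoms, 50,000 ticks of 2 fs, 3,600 s of wall time.
    print(atom_steps_per_second(1_000_000, 50_000, 3600.0))
    print(simulated_ns_per_day(50_000, 2.0, 3600.0))
```

Two machines with the same peak GFLOPS can score very differently on numbers like these if one of them spends most of each tick exchanging data.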
1. Re:Topology matters more than GFLOPS by Overunderrated · 2012-09-11 07:36 · Score: 3, Informative
  
  I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), then you can map regions of space to different processors. Then you run your simulation by having all the processors run for X time period (on your simulated timescale). At the end of the time period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then, you run the simulation for the next time chunk. Depending on your data set, you may spend *FAR* more time sending the intermediate results between all the different processors than you do actually running the simulation. That's what I mean by matching the physical topology to the computational topology. In a system where the communications cost dominates the computation cost, then adding more processors usually doesn't help you *at all*, or can even slow down the entire system even more. So it's really meaningless to say "my cluster can do 500 GFLOPS", unless you are talking about the time that is actually spent doing productive simulation, not just time wasted waiting for communication.
  Considering that computational fluid dynamics, molecular dynamics, etc., break down into linear algebra operations, I'd say that the FLOPS count on a LINPACK benchmark is probably the best single metric available. In massively parallel CFD, we don't match the physical topology to the computational topology, because we don't (usually) build the physical topology. But I can and do match the computational topology to the physical one.
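
For context on the LINPACK figure mentioned in the reply above: the rate is derived from the time taken to solve a dense n-by-n linear system. The sketch below keeps only the leading (2/3)n^3 term of the operation count for LU factorization and ignores lower-order terms; the function name `linpack_gflops` and the example numbers are hypothetical.

```python
def linpack_gflops(n, seconds):
    """Approximate sustained GFLOPS for solving a dense n-by-n system.

    Uses only the leading (2/3) * n**3 term of the LU operation count.
    """
    flops = (2.0 / 3.0) * n ** 3
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical run: a 30,000 x 30,000 system solved in 60 seconds.
    print(linpack_gflops(30_000, 60.0))
```

Because the figure comes from a timed run, it is a sustained rate on that one workload, not the hardware's theoretical peak.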
Re:What? by jeffmeden · 2012-09-11 03:52 · Score: 3, Informative

The Top 500 is a specific list: http://top500.org/
It's more correct to say it's the fastest on the list, than the fastest in the world. There are any number of metrics you can use to compare supercomputers. Top 500 just uses the most popular metric. Another machine could easily be the fastest on a different list, like http://www.graph500.org/.
The other specific consideration is that the list ONLY covers machines whose operators volunteer to run the Linpack benchmark and wish to publicize the results. It is presumed that governments with classified computing facilities withhold this information, so there are likely many "supercomputers" (perhaps even a "fastest") that will never be part of the Top 500. The US NSA, for example, is widely believed to operate facilities that would rank at or near the top of the list, but they appear nowhere on it.