SGI & NASA Build World's Fastest Supercomputer

Posted by timothy on Tuesday October 26, 2004 @02:45PM from the but-does-it-run-windows dept.

GarethSwan writes "SGI and NASA have just rolled out the world's new number-one supercomputer. Its LINPACK performance result of 42.7 teraflops easily outclasses the previous mark of 35.86 teraflops set by Japan's Earth Simulator AND the 36.01 teraflops set by IBM's new BlueGene/L experiment. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster?"

16 of 417 comments

What is the stumbling block? by Dancin_Santa · 2004-10-26 14:58 · Score: 5, Insightful

Why does it take so long to build a super computer and why do they seem to be redesigned each time a new one is desired?

It's a little like how Canada's and France's nuclear power plant systems are built around standardized power stations, cookie cutter if you will. The cost to reproduce a power plant is negligible compared to the initial design and implementation, so the reuse of designs makes the whole system really cheap. The drawback is that it stagnates the technology, and the newest plants may not get the newest and best technology. Contrast this with the American system of designing each power plant with the latest and greatest technology. You get really great plants each time, of course, but the cost is astronomical and uneconomical.

So too, it seems, with supercomputers. We never hear about how these things are thrown into mass production, only about how the latest one gets 10 more teraflops than the last, and all the slashbots wonder how well Doom 3 runs on it or whether Longhorn will run at all on such an underpowered machine.

But each design of a supercomputer is a massive success of engineering skill. How much cheaper would it become if, instead of redesigning the machines each time someone wants to feel more manly than the current speed champion, the current design were rebuilt for a generation (in computer years)?
1. Re:What is the stumbling block? by anon+mouse-cow-aard · 2004-10-26 15:34 · Score: 4, Insightful
  
  Thought experiment: Order 10,000 PCs. Time how long it takes to get them installed, with power, network cabling, and cooling, in racks, and loaded with the same OS.
  
  Second thought experiment: Imagine the systems are built out of modular bricks that are identical to deskside servers, so that they can sell exactly the same hardware with anywhere from 2 to 512 processors by just plugging the same standard bricks together, and they all get the same shared memory and run one OS. Rack after rack after rack. That is SGI's architecture. It is absolutely gorgeous.
  
  So they install twenty of the biggest boxes they have, and network those together.
  
  Bang for the buck? I dunno. Is shared memory really a good idea? Probably not, but it is absolutely gorgeous, and no one can touch them in that shared-memory niche that they have.
2. Re:What is the stumbling block? by Geoff-with-a-G · 2004-10-26 15:44 · Score: 4, Insightful
  
  Why does it take so long to build a super computer and why do they seem to be redesigned each time a new one is desired?
  
  Well, are we talking about actual supercomputers, not just clusters? 'Cause if you're just trying to break these Teraflops records, you can just cram a ton of existing computers together into a cluster, and voila! lots of operations per second.
  
  But it's rare that someone foots the bill for all those machines just to break a record. Los Alamos, IBM, NASA, etc. want the computer to do serious work when it's done, and a real supercomputer will beat the crap out of a commodity cluster at most of that real work. Which is why they spend so much time designing new ones. Because supercomputers aren't just regular computers with more power. With an Intel/AMD/PowerPC CPU, jamming four of them together doesn't do four times as much work, because there's overhead and latency involved in dividing up the work and exchanging the data between the CPUs. That's where the supercomputers shine: in the coordination and communication between the multiple procs.
  
  So the reason so much time and effort goes into designing new supercomputers is that if you need something twice as powerful as today's supercomputer, you can't just take two and put them together. You first have to design a new architecture that is even better at handling vast numbers of procs.
3. Re:What is the stumbling block? by sloth+jr · 2004-10-26 15:59 · Score: 2, Insightful
  
  I don't build supercomputers, but I do build systems that look a lot like them in very similar infrastructures. I'm not sure why it took them 120 days (okay, "under" 120 days), but when we build out a datacenter with 70 to 100 machines, it usually takes a bit of time:
  a) obtain space. Usually, raised floors, rack systems, with adequate HVAC for the huge thermal load you're about to throw into a few racks. For collocation, it'll take some time for your provider to wire together a cage for your installation, especially if you need earthquake bracing. Expect two weeks.
  
  b) obtain power. For our production environment, each redundant power supply needs to be served by a separate circuit. The way most redundant power supplies seem to work is they split the load between the two circuits, so each circuit has to be able to handle the full load. Supercomputers may not have the same production requirements, but they probably do: lost cycles are lost money. Anyway, this is contracted out in almost all cases; expect two weeks minimum. b usually depends on a, though some providers may perform the buildout concurrently. Not much of an issue if you use Equinix: very cool overhead power systems (imagine a very large power track system, with drops wherever you need them).
  
  c) obtain equipment. Delivery time runs from one week to 5 weeks.
  
  d) it takes some time to unbox 100 machines and rack them. Throw people at it, or throw time at it.
  
  e) network infrastructure. Do it yourself, and you spend a lot of time cutting cables to length. Contract it out, and you get very neat work, at expense, and usually only to rack-specific patch panels. Buy lots of different-length cables and forgo contracting, and you save time, but you end up with a cage that looks like hell and is easy to snag.
  
  f) configure 100 machines. This is probably the easiest part - set up your DHCP server and PXE boot server, roll up some kickstart system, and deploy - 100 machines can be done in a few hours. There's obviously some setup and thought that needs to be put into the installation scripts, but that can be done ahead of time.
  
  In my experience, buildout of production datacenters is very difficult to do in less than 6 weeks.
  
  sloth jr
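
One rough way to see why steps a through c alone eat up weeks is to treat them as a small dependency schedule. The sketch below is only illustrative: the function name `weeks_until_racking` and its parameters are hypothetical, it assumes the equipment is ordered on day one in parallel with the site work (the comment does not say so), and it leaves out steps d through f because no durations are given for them.

```python
def weeks_until_racking(space_weeks, power_weeks, delivery_weeks,
                        concurrent_power=False):
    """Earliest week at which machines can start going into racks.

    Assumption (not stated in the comment): equipment is ordered at week 0,
    so delivery runs in parallel with obtaining space and power.
    """
    if concurrent_power:
        # Some providers build out space and power at the same time.
        site_ready = max(space_weeks, power_weeks)
    else:
        # Usual case from the comment: power (b) waits for space (a).
        site_ready = space_weeks + power_weeks
    # Racking needs both a finished site and delivered hardware.
    return max(site_ready, delivery_weeks)


# Two weeks each for space and power, delivery between one and five weeks.
for delivery in (1, 5):
    print(delivery, weeks_until_racking(2, 2, delivery))
```

With the comment's figures, the site is the bottleneck when delivery is fast and the hardware is the bottleneck when delivery is slow. Either way racking cannot begin before week 4 or 5, and unboxing, cabling, and configuration still come on top of that.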
Re:Ok, what is the point of this? by servognome · 2004-10-26 15:24 · Score: 4, Insightful

Really, given the fact that most popular computers have enough processing power to handle anything, and the fact that clustering technology has evolved and is usable in case they aren't...what is the point in the "super computer"?
The super computer is a cluster (10k+ processors in 20 nodes).
Not all applications/computations scale by just adding computers to the cluster.
An example would be solving for z: x=84+19, y=5*3, z=x+y
The ultimate solution z is limited by how fast x and y can be solved. You can have one computer solve for x and another solve for y in parallel. But no matter how many more computers you add, none of them can solve z until x and y are solved first, and none of them would speed up the computation of x and y.
After a certain scale, you do not get the benefits of parallel processing, so the only way to speed things up is to make each individual computer faster.
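
The x, y, z example can be written out as a small sketch. The function names here are hypothetical, and the second half brings in Amdahl's law, a standard textbook model that the comment does not name, to put a number on the "after a certain scale" point. The threads only illustrate the dependency structure; they are not meant as a real performance measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_x():
    return 84 + 19


def compute_y():
    return 5 * 3


def solve_z(workers):
    """x and y are independent and can run in parallel; z must wait for both."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        future_x = pool.submit(compute_x)
        future_y = pool.submit(compute_y)
        # Only two tasks exist, so workers beyond the second sit idle.
        return future_x.result() + future_y.result()


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: the serial part of a job caps the overall speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


print(solve_z(2))                   # 118
print(solve_z(64))                  # still 118, and no sooner
print(amdahl_speedup(0.9, 4))       # roughly 3.08
print(amdahl_speedup(0.9, 10_000))  # approaches but never reaches 10
```

Under this model, a job that is 90% parallel can never run more than 10 times faster, however many processors are added. Past that point only a faster individual processor helps.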

--
D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
Re:70.93 TeraFLOPs by Anonymous Coward · 2004-10-26 15:33 · Score: 1, Insightful

Except running SETI is akin to playing Minesweeper on Columbia. Really, who gives a flying fuck about analyzing the universe's background EM noise in some pointless massive-scale sci-fi masturbatory nerd fantasy?

Do something useful, like folding@home, for fuck's sake.
Imagine? by macz · 2004-10-26 15:41 · Score: 2, Insightful

Imagine having your own cluster...
I seriously doubt that anyone but the very edge of the bell curve could make good use of this much CPU horsepower, even given the upper limits of academia. While we, as a species, have been good at developing bigger, better, stronger, faster computing machines, we have not advanced very far in asking them meaningful questions.
Inevitably someone will say "we can finally predict the weather..." and in true Futurama Farnsworth fashion I say PSHAW! We don't even know how to properly frame the QUESTION of how to predict the weather, much less get closer to an "Answer" like "The hurricane will hit EXACTLY here, at EXACTLY this time. Only the people on these specific streets are boned."
Still, I bet I could get like 1 billion FPS on UT2004 at 3600x4800!
Seriously though, I want to see small improvements. Better, easier to grasp programming languages. More critical thinking skills taught in schools. And a cluster like this dedicated to uber-porn. I'm talking full frame, Hi Def, ggg stuff. (did I type that last part out loud?)

--
...But I digress. TREMBLE PUNY HUMANS! ONE DAY MY SPECIES WILL DESTROY YOU ALL!
Re:Read on to the next paragraph by luvirini · 2004-10-26 16:00 · Score: 3, Insightful

"NASA Secures Approval in 30 Days" Knowing how governmental processes normally go, this part really seems incredible. Normally even the "fluffy" pre-study would take that long (or way more) before anyone actually sits down to discuss actual details and such. Especially the way most everything with NASA seems to be over budget and way late. It is indeed good to see that there is still some hope, so let's hope they get the procurement processes in general working better.
Re:Ok, what is the point of this? by BottleCup · 2004-10-26 16:02 · Score: 1, Insightful

Yes of course the answer is 42. This computer was built to find out what the question is.
it's the wetware by Doc+Ruby · 2004-10-26 16:08 · Score: 4, Insightful

Weather prediction, it turns out, is *not at all* like playing chess. Chess is a deterministic linear process operating on rigid, unchanging rules. There is always a "best move" for every board state, which a sufficiently fast and capacious database could search for. Weather is chaotic, a nonlinear process. It feeds back its state into its rules, in that some processes increase the sensitivity to change of other simultaneous processes. Chaos cannot be merely "solved", like a linear equation; it must be simulated and iterated through its successive states to identify more states.

Of course, we're just getting started with chaos dynamics. We might find chaotic mathematical shortcuts, just like we found algebra to master counting. And studying weather simulation is a great way to do so. Lorenz first formally specified chaos math by modeling weather. While we're improving our modeling techniques to better cope with the weather on which we depend, we'll be sharpening our math tools. Weather applications are therefore some of the most productive apps for these new machines, now that they're fast enough to model real systems, giving results predicting not only weather, but also the future of mathematics.
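
The idea that a chaotic system has to be stepped through its successive states, and that tiny differences in the starting state blow up, can be shown with a toy example. The sketch below uses the logistic map, a standard one-line chaotic system, as a stand-in. It is not a weather model and not Lorenz's equations, and the function names and starting values are chosen only for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x); each state depends on the previous one."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states


def divergence(x0, delta, steps, r=4.0):
    """Gap at each step between two runs whose starting points differ by delta."""
    run_a = logistic_trajectory(x0, steps, r)
    run_b = logistic_trajectory(x0 + delta, steps, r)
    return [abs(a - b) for a, b in zip(run_a, run_b)]


# Two starting states that differ by one part in ten billion.
gaps = divergence(0.2, 1e-10, 60)
for step in (0, 5, 20, 40, 60):
    print(step, gaps[step])
```

The gap starts out negligible and grows until the two runs bear no resemblance to each other, which is why a forecast has to be simulated forward step by step from the best available measurement rather than solved once in closed form.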

--
--
make install -not war
Re:Read on to the next paragraph by luvirini · 2004-10-26 16:10 · Score: 2, Insightful

Indeed it is the hardware, and no, you cannot directly claim it is an open source victory, except for one small thing...
Wonder why they run open source instead of a proprietary operating system on this? Maybe the multitude of answers to that question can show you why it can be considered an open source victory.
Re:Intent of NASA... by harlows_monkeys · 2004-10-26 16:35 · Score: 3, Insightful

With all of the new private space industry, NASA has been set free to explore the further reaches of space
What new private space industry? Spaceship One, for example, reached space. That's a long way from being able to do anything useful in space. They were nowhere near orbital velocity, for example. We're still many years, if not decades, away from private industry being able to take over NASA's near-earth space role.
Re:hmmmm...... by Shag · 2004-10-26 16:52 · Score: 4, Insightful

"...with Columbia, scientists are discovering they can potentially predict hurricane paths a full five days before the storms reach landfall."
You don't live somewhere that gets hurricanes, do you? 'Cause scientists can already "potentially predict hurricane paths a full five days before the storms reach landfall." Hell, I can do that. A freakin' Magic 8 Ball can potentially do that.
Maybe they're trying to say something about doing it with a better degree of accuracy, or being right more of the time, or something like that, but it doesn't sound like it from that quote.
"Hey, guys, look at this life-sized computer-generated stripper I'm rendering in real-ti... oh, what? Um, tell the reporter we think it'd be good for hurricane prediction."

--
Village idiot in some extremely smart villages.
Re:Read on to the next paragraph by RageEX · 2004-10-26 17:28 · Score: 5, Insightful

Good job NASA? Yeah I'd agree. But what about good job SGI? Why does SGI always seem to have bad marketing and not get the press/praise they deserve?

This is an SGI system. SGI laid out plans for terascale computing (stupid marketing speak for huge ccNUMA systems) a while ago. I'm sure NASA and SGI worked together, but this is essentially an 'Extreme' version of an off-the-shelf SGI system.
Re:The worst thing about this... by general_re · 2004-10-26 17:44 · Score: 3, Insightful

...just about any system with the same number of... well, gosh, almost any processor except an Itanium would be even faster...
Like what? Go out and look up SPEC results next time you're bored. I think you'll find that I2 is quite a bit more capable than you make out. IBM's dual-core POWER5 is just about the only thing out there that's even close to (a single-core) I2 in FP performance, and Opteron isn't even in the game at that level.
Is it a commercial failure? Probably, but so was Alpha - commercial success is not an indicator of actual performance.

--
ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
Re:Read on to the next paragraph by lweinmunson · 2004-10-27 02:14 · Score: 2, Insightful

Umm, not true. Sun can hold up to 106 processors in its Sun Fire 15K product, or 72 dual-core processors in the E25K.

SGI's Origin systems are equally large I believe. And manufacturers like IBM also have large SMP machines.

There's a difference between SMP and the NUMA used in the big iron. SMP is normally a shared bus or switch topology with the processors connected to each other with little or no arbitration logic. So if you get above 4, you normally max out the buses as the CPUs try to figure out who's doing what and what instruction comes next. NUMA architecture is somewhere between SMP and clustering. The SGI boxes use c-bricks of 4 CPUs and, I think, 8GB of RAM. Each c-brick is connected to one or more routers via CrayLink cables. Get enough of these together and you've got your 512-CPU monster. Sun uses the same idea, but is unfortunately a LOT slower with its interconnect technology. I've seen 16x SMP boxes before, but they really didn't scale at all. Anything over the standard 4-8 SMP is a waste of CPUs and money.
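
Taking the figures in the comment at face value (4 CPUs per c-brick, and the tentative 8GB of RAM per brick), the size of a 512-CPU system is simple arithmetic. The constant and function names below are hypothetical, and the memory figure inherits the comment's own "I think".

```python
CPUS_PER_BRICK = 4  # from the comment
GB_PER_BRICK = 8    # the comment's tentative figure ("I think")


def bricks_needed(total_cpus, cpus_per_brick=CPUS_PER_BRICK):
    """Number of c-bricks needed to reach total_cpus (ceiling division)."""
    return -(-total_cpus // cpus_per_brick)


def shared_memory_gb(total_cpus):
    """Total RAM across all bricks, seen as one shared memory by a single OS."""
    return bricks_needed(total_cpus) * GB_PER_BRICK


print(bricks_needed(512))     # 128 bricks
print(shared_memory_gb(512))  # 1024 GB
```

So one 512-processor system would be on the order of 128 bricks, all of which have to be stitched together by the routers into one memory space. That is where the interconnect, rather than the CPUs, becomes the hard part.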