Slashdot Mirror


SGI & NASA Build World's Fastest Supercomputer

GarethSwan writes "SGI and NASA have just rolled out the new world number one fastest supercomputer. Its performance test (LINPACK) result of 42.7 teraflops easily outclasses the previous mark set by Japan's Earth Simulator of 35.86 teraflops AND that set by IBM's new BlueGene/L experiment of 36.01 teraflops. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster?"

20 of 417 comments

  1. That's nothing... by Anonymous Coward · · Score: 5, Funny

    ...when they hit the "TURBO" button on the front of the boxes they'll really scream.

    1. Re:That's nothing... by jm92956n · · Score: 5, Informative

      when they hit the "TURBO" button on the front of the boxes they'll really scream.

      They did! According to the CNET article, they "quietly submitted another, faster result: 51.9 trillion calculations per second" (equivalent to 51.9 teraflops).

      --
      An effective signature identifies a particular user amongst a base of thousands.
  2. 20 system cluster?!? by Emugamer · · Score: 5, Funny

    I have one of those... in a spare room!

    Who cares about a 20-system cluster? I want one 512-processor machine!

    or 20, I'm not that picky

  3. Everyone needs one! by Dzimas · · Score: 5, Funny

    Just what I need to model my next H-bom... uhh... umm.... I mean render my next feature film. I call it "Kaboom."

  4. Wow---- by ZennouRyuu · · Score: 5, Funny

    I bet gentoo wouldn't be such a b**ch to get running with all of that compiling power behind it :)

  5. its not the hardware thats important by fender_rock · · Score: 5, Funny

    If the same software is used, it's not going to make weather predictions more accurate. It's just going to give them the wrong answer, faster.

  6. Photos of System by erick99 · · Score: 5, Informative

    This page contains images of the NASA Altix system. After reading the article I was curious as to how much room 10K or so processors take up.

    --
    http://www.busyweather.com/
    1. Re:Photos of System by RadioheadKid · · Score: 5, Funny

      You'd think with all that super-computing power they'd be able to figure out that zipping JPEGs is pointless.

      --
      "Karma can only be portioned out by the cosmos." -Homer Simpson
    2. Re:Photos of System by cnkeller · · Score: 5, Interesting
      After reading the article I was curious as to how much room 10K or so processors take up.

      I don't have a square footage number, but it's the overwhelming majority of the server floor. We had to "clear the floor" earlier this summer to make room.

      --

      there are no stupid questions, but there are a lot of inquisitive idiots

  7. Re:hmmmm...... by Anonymous Coward · · Score: 5, Funny
    Today we predict a high of +3 Funny, with localised Trolling.

    Tomorrow looks like developing a slight rise in Insightful posts, but a drop in overall Informative. "First Post" will remain as a constant pattern.

  8. NASA.org? by lnoble · · Score: 5, Funny

    Wow, I didn't know the NewAdvancedSearchAgent had such an interest in, or budget for, supercomputing. I'd think they'd be able to afford their own web server, though, instead of being parked at domainspa.com and having to fill their entire page with advertisements.

    Try NASA.GOV.

  9. What is the stumbling block? by Dancin_Santa · · Score: 5, Insightful

    Why does it take so long to build a super computer and why do they seem to be redesigned each time a new one is desired?

    It's a little like how Canada's and France's nuclear power plant systems are built around standardized power stations, cookie cutter if you will. The cost to reproduce a power plant is negligible compared to the initial design and implementation, so the reuse of designs makes the whole system really cheap. The drawback is that it stagnates the technology and the newest plants may not get the newest and best technology. Contrast this with the American system of designing each power plant with the latest and greatest technology. You get really great plants each time, of course, but the cost is astronomical and uneconomical.

    So too, it seems, with supercomputers. We never hear about how these things are thrown into mass production, only about how the latest one gets 10 more teraflops than the last and all the slashbots wonder how well Doom 3 runs on it or whether Longhorn will run at all on such an underpowered machine.

    But each design of a supercomputer is a massive success of engineering skill. How much cheaper would it become if, instead of redesigning the machines each time someone wants to feel more manly than the current speed champion, the current design were rebuilt for a generation (in computer years)?

  10. This time there really is a turbo button! by Dink+Paisy · · Score: 5, Informative
    This result was from the partially completed cluster, at the beginning of October. At that time only 16 of the 20 machines were online. When the result is taken again with all 20 of the machines there will be a sizeable increase in that lead.
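
    As a back-of-the-envelope sketch of what that increase could look like, here is the 42.7 teraflop result scaled from 16 machines to 20. It assumes perfectly linear scaling, which real LINPACK runs rarely achieve, and the function name is made up for illustration.

```python
def extrapolate_linear(measured_tflops: float, nodes_used: int, nodes_total: int) -> float:
    """Scale a measured result to more nodes, assuming perfectly linear scaling."""
    return measured_tflops / nodes_used * nodes_total


# 42.7 teraflops measured on 16 of the 20 machines (figures from the thread).
estimate = extrapolate_linear(42.7, nodes_used=16, nodes_total=20)
print(f"{estimate:.1f} teraflops")
```

    That is an upper bound under the linear assumption; the 51.9 teraflop figure quoted earlier in the thread comes in a little below it.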

    There's also a dark horse in the supercomputer race; a cluster of low-end IBM servers using PPC970 chips that is in between the BlueGene/L prototype and the Earth Simulator. That pushes the last Alpha machine off the top 5 list, and gives Itanium and PowerPC each two spots in the top 5. It's amazing to see the Earth Simulator's dominance broken so thoroughly. After so long on top, in one list it goes from first to fourth, and it will drop at least two more spots in 2005.

    --

    Whoever corrects a mocker invites insult;
    whoever rebukes a wicked man incurs abuse.
    --Proverbs 9:7
  11. Cost by MrMartini · · Score: 5, Interesting

    Does anyone know how much this system cost? It would be interesting to see how good a teraflops-per-million-dollars ratio they achieved.

    For example, I know the Virginia Tech cluster (1,100 Apple Xserve G5 dual 2.3 GHz boxes) cost just under $6 million and runs at a bit over 12 teraflops, so it gets a bit over 2 teraflops per million dollars.

    Other high-ranking clusters would be interesting to evaluate in terms of teraflops per million dollars, if anyone knows any.
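
    For anyone who wants to run the same comparison on other machines, the ratio is easy to sketch. The inputs below are the rounded Virginia Tech figures from above (12 teraflops, $6 million), so the true value is a bit higher; the function name is hypothetical.

```python
def tflops_per_million(tflops: float, cost_usd: float) -> float:
    """Return sustained teraflops per million dollars of system cost."""
    return tflops / (cost_usd / 1_000_000)


# Rounded figures: "a bit over 12 teraflops" for "just under $6 million".
ratio = tflops_per_million(12.0, 6_000_000)
print(f"{ratio:.2f} teraflops per million dollars")
```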

  12. Ya know... by Al+Al+Cool+J · · Score: 5, Funny
    It's getting to the point where I'm going to have to call shenanigans on the whole freakin' planet. Am I really supposed to believe that an OS started by a Finnish university student a decade ago, and designed to run on a 386, is now running the most powerful computer ever built? I mean, come on!

    Seriously, am I on candid camera?

  13. 70.93 TeraFLOPs by chessnotation · · Score: 5, Interesting

    SETI@home is currently reporting 70.93 TeraFLOPS. It would be Number One if the list were a bit more inclusive.

  14. Read on to the next paragraph by jd · · Score: 5, Interesting
    There it talks of a third run, at 61 teraflops, slightly over the estimated 60 teraflops predicted.


    Ok, so we have Linux doing tens of teraflops in processing, FreeBSD doing tens of petabits in networking, ... What other records can Open Source smash wide open?

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Read on to the next paragraph by RageEX · · Score: 5, Insightful

      Good job NASA? Yeah I'd agree. But what about good job SGI? Why does SGI always seem to have bad marketing and not get the press/praise they deserve?

      This is an SGI system. SGI laid out plans for terascale computing (stupid marketing speak for huge ccNUMA systems) a while ago. I'm sure NASA and SGI worked together, but this is essentially an 'Extreme' version of an off-the-shelf SGI system.

    2. Re:Read on to the next paragraph by jd · · Score: 5, Informative
      Hardware only takes you so far. Scalability comes largely from the efficiency of the software. Poor software results in large amounts of communication between nodes, slowing down a cluster.


      This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more. It's just not practical, using current methods, to directly wire up more than 8 processors in such a tight package.


      Let's say you have N processors, each capable of executing I instructions per second. Your total theoretical throughput would be N x I. However, this would only be the case if the system were 100% parallel and no processor needed to communicate with any other. That is rarely the case.


      In practice, performance as a function of processor count follows a curve that looks a bit like a squished bell curve. As you increase the number of processors, the performance gain from each extra one decreases, reaches zero, and actually becomes negative. At that point, adding more CPUs will actually SLOW the computer down.
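
      A toy model makes the shape of that curve easier to see. This is only a sketch under made-up assumptions: the work divides evenly across n processors, and each processor pays a fixed communication cost per peer. The parameter names and values are illustrative, not measurements of any real machine.

```python
def speedup(n: int, compute: float = 1.0, comm_per_peer: float = 1e-4) -> float:
    """Speedup over one processor in a toy model with per-peer communication cost."""
    # Compute time shrinks as 1/n; communication time grows with the peer count.
    run_time = compute / n + comm_per_peer * (n - 1)
    return compute / run_time


# Find the processor count where adding more stops helping.
best_n = max(range(1, 1001), key=speedup)

for n in (1, 10, 100, 500, 1000):
    print(f"{n:5d} processors -> speedup {speedup(n):.1f}")
print("best processor count in this model:", best_n)
```

      With these numbers the speedup peaks at 100 processors and falls off beyond that, which is the "adding CPUs slows it down" regime. A smaller communication cost pushes the peak further out.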


      The exact shape and size of the curve is partly a function of the way the components are laid out. A good layout keeps the amount of traffic on any given line to a minimum, minimizes the distances between nodes, and minimizes the management and routing overheads.


      However, layout isn't everything. If your software can't take advantage of the hardware and the topology, then all the layout in the world won't gain you a thing. To take advantage of the topology, though, the software has to comprehend some very complex networking issues. It has to send data by efficient pathways.


      If connections are not all the same speed or latency, then the most efficient pathway may NOT be the shortest. This means that the software must understand the characteristics of each path and how to best utilize those paths, by appropriate load-balancing and traffic control techniques.
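
      As an illustration of "most efficient is not always shortest", here is a minimal sketch using Dijkstra's algorithm over link latencies. The four-node topology and its latency values are invented for the example and are not taken from any real interconnect.

```python
import heapq


def lowest_latency_path(links, source, target):
    """Dijkstra's algorithm: return (path, total latency) from source to target."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a better route was already found
        for neighbor, latency in links.get(node, {}).items():
            candidate = d + latency
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor chain back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical topology: one slow direct link A->B, and a three-hop fast route.
links = {
    "A": {"B": 10.0, "C": 1.0},
    "C": {"D": 1.0},
    "D": {"B": 1.0},
}
path, total = lowest_latency_path(links, "A", "B")
print(path, total)
```

      Here the three-hop route wins over the direct link because its total latency is lower, even though it crosses more nodes.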


      If you look at extreme-end networking hardware, it can be crudely split into two camps: those where the bandwidth is phenomenal, at the expense of latency, and those where the latency is practically zero but so's the bandwidth.


      The "ideal" supercomputer is going to mix these two extremes. Some data you just need to get to point B fast, and sometimes you're less worried about speed, but do need to transfer an awful lot of information. This means you're going to have two physical networks in the computer, to handle the two different cases. And that means you need something capable of telling which case is which fast enough to matter.
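
      The "which case is which" decision can be sketched with the usual first-order cost model, where transfer time is latency plus size divided by bandwidth. The two networks below and all of their numbers are hypothetical, chosen only to show the crossover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    latency_s: float              # fixed cost per message, in seconds
    bandwidth_bytes_per_s: float  # sustained throughput

    def transfer_time(self, size_bytes: int) -> float:
        """First-order estimate: fixed latency plus time on the wire."""
        return self.latency_s + size_bytes / self.bandwidth_bytes_per_s


def pick_network(networks, size_bytes):
    """Choose the network with the lowest estimated transfer time."""
    return min(networks, key=lambda net: net.transfer_time(size_bytes))


# Made-up figures for the two camps described above.
networks = [
    Network("low-latency", latency_s=1e-6, bandwidth_bytes_per_s=1e8),
    Network("high-bandwidth", latency_s=1e-4, bandwidth_bytes_per_s=1e10),
]

small = pick_network(networks, 64)          # a short control message
large = pick_network(networks, 10_000_000)  # a bulk data transfer
print(small.name, large.name)
```

      Small messages go to the low-latency network and bulk transfers to the high-bandwidth one. A real system would also have to account for congestion and for the cost of making the decision itself.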


      Even when only one type of network is used, latency is a real killer. Software, being the slowest component in the machine, is where most of the latency is likely to accumulate. Nobody in their right mind is going to build a multi-billion dollar machine with superbly optimized hardware if the software adds so much latency to the system that they might as well be using a 386SX with Windows 3.1.


      And that means Linux has damn good traffic control and very very impressive latencies. And it looks like these are areas the kernel is going to be improving in still further...

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  15. Re:Ok, what is the point of this? by HermesHuang · · Score: 5, Interesting

    The answer here is "complexity". I do some scientific computing (have done chemistry, then materials science, now doing photonic devices) and there's always more you want to be able to consider. Of course, the best I've used is an 8-processor SGI machine (although that one was a bit old - I think the 2-processor Opteron system I'm using now is actually better). But especially with the materials studies, ideally we wanted to do everything with full quantum-mechanical calculations, which turn into gigantic matrices, even for a system of 100 atoms or so. And even then we put strict limits on what orbitals we consider and all that good stuff.

    Slightly more concrete example - right now with my photonics simulations (finite element) on my dual-Opteron rig the max I can handle is about 180,000 elements (which means a (4*180000)x(4*180000) matrix with complex elements needs to be diagonalized, among other things), and it takes about half an hour for a standing-wave calculation. To do any time propagation, repeat the same calculation in picosecond increments. And with the gridding I can do, for a 100 micron disc resonator in 2-D I have to use light at about 40 microns. To go to the 320nm wavelength these resonators are operating at, I'd need roughly 2 orders of magnitude more memory. There's also the time factor to be considered. As with any design process, one must iterate. Tweak a little here, run the program, rinse, repeat. How long are you willing to spend in this process before you feel something is "good enough"? The faster the computer spits the answer out, the more things you can try, and the more you can think things over and hopefully make it better.
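
    To put rough numbers on those figures, here is a small sketch of the arithmetic. The 16 bytes per entry assumes double-precision complex values, and the dense-storage figure is only a reference point: the comment does not say how the matrix is stored, and a finite element solver would normally keep it sparse.

```python
elements = 180_000
unknowns_per_element = 4  # from the "(4*180000)" figure in the comment
dimension = unknowns_per_element * elements

# Assumption: complex double precision, 16 bytes per matrix entry.
bytes_per_entry = 16
dense_bytes = dimension * dimension * bytes_per_entry

# How much finer the grid must resolve to go from 40 micron to 320 nm light.
current_wavelength_um = 40.0
target_wavelength_um = 0.320
linear_refinement = current_wavelength_um / target_wavelength_um

print(f"matrix dimension: {dimension:,}")
print(f"dense storage: {dense_bytes / 1e12:.1f} TB")
print(f"linear refinement factor: {linear_refinement:.0f}x")
```

    How memory actually grows with that refinement depends on the mesh and the solver, neither of which is specified here.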

    And this is a single component in what can be a fairly complex integrated-photonics chip. [And might I mention again I've been working in 2-D this entire time instead of doing a full 3-D simulation?] You give me the computational power and I'll use it. And I'm an experimentalist doing fairly basic research who just wants to check some stuff in the computer before sinking a lot of time and effort into fabricating a test device.

    On the other hand, I actually don't want to have one of the T100 supercomputers in our lab. That would mean I'd be spending all day writing code and designing complex simulations instead of in the lab getting my hands dirty.

    And as for the commonality of problems requiring such computational power, I think almost any sort of simulation can easily use it. Consider more terms (everything I've done to date is horribly linearized - let's see some more terms in the Taylor expansion) to account for nonlinear behavior, grid things up finer to get more accurate results, consider more possibilities when dealing with chaotic behavior... I would hope any good scientist would find the possibilities endless.