Slashdot Mirror


Sandia's Red Storm Detailed Architecture

Roland Piquepaille writes "Bill Camp & Jim Tomkins, from Sandia National Laboratories, have published a 77-page document about the architecture of the Red Storm supercluster being built by Cray Inc. The new nickname for the 40 teraflops system is "Thor's Hammer." Please read the full presentation if you have the time (PDF format, 3.54 MB). This technical analysis gives you the major characteristics of the system which will be operational by August 2004. With its 108 compute cabinets and its 10,368 compute node processors (AMD Opteron running at 2.0 GHz), it is expected to reach 20 teraflops on MP-Linpack. The report also looks at scalability and reliability, which are essential for a system which will be expanded to 30,000 processors in the future."
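The "40 teraflops" figure can be sanity-checked with simple peak arithmetic. A minimal Python sketch; the 2 double-precision flops per clock per Opteron is an assumption, not something stated in the summary:

```python
# Back-of-envelope peak for Red Storm's compute partition, using the
# processor count and clock rate quoted in the summary.
# ASSUMPTION: each 2.0 GHz Opteron retires 2 double-precision flops per
# cycle (one add plus one multiply); this figure is not from the summary.
PROCESSORS = 10_368
CLOCK_HZ = 2.0e9
FLOPS_PER_CYCLE = 2      # assumed, see note above
LINPACK_TFLOPS = 20.0    # expected MP-Linpack result from the summary


def peak_tflops(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in teraflops: processors x clock x flops/cycle."""
    return processors * clock_hz * flops_per_cycle / 1e12


peak = peak_tflops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
efficiency = LINPACK_TFLOPS / peak

print(f"peak               ~ {peak:.1f} TF")
print(f"Linpack efficiency ~ {efficiency:.0%}")
```

Under that assumption the peak comes out near 41.5 TF, and the expected 20 TF on MP-Linpack would be roughly 48% of peak.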

18 comments

  1. in related news by Geno+Z+Heinlein · · Score: 5, Funny

    The new nickname for the 40 teraflops system is "Thor's Hammer."

    In related news, Sandia National Laboratories has laid off all but one of its Jaffa technicians, citing diplomatic and security concerns.

    1. Re:in related news by grendel_x86 · · Score: 1

      I think I may be the only person who got that... If only they were Naquadah-powered...

      --
      I'm glad /. isn't the real world; that would really suck.
    2. Re:in related news by Jeff+DeMaagd · · Score: 1

      It got modded up to 5, so I am betting that at least three people "got" it.

      I actually did see that episode. I forget the stunt they pulled to get Teal'c through. That part seemed like a cop-out.

      I guess I'll have to get the DVDs someday.

  2. Powerpoint? by dk.r*nger · · Score: 1

    They may have a Very Large Computer, so very large that it doesn't even qualify for "...a Beowulf cluster of those", but I can't take it seriously when the presentation is so incredibly ugly. And in Comic Sans...

    1. Re:Powerpoint? by Anonymous Coward · · Score: 0

      I think some people just don't get it. I really do. You can argue all you want about whether Keynote is a good presentation program, but for my money it was worth it just for the typography alone.

      I swear, that presentation looks like something out of the early 1990s.

  3. Imagine by Konster · · Score: 0, Redundant

    Imagine a Beowulf cluster of...

    Er, wait! Never mind!

  4. network bandwidth by martin · · Score: 3, Insightful

    As usual, the difficult bit of clusters (especially at this level) isn't the code (quote: "can easily be done" at the right level of efficiency) or physically linking the boxes...

    but building the network that links the whole thing up.

    # Sustained file system bandwidth of 50 GB/s for each color
    # Sustained external network bandwidth of 25 GB/s for each color

    Wow! That's not peak but sustained... for me that's the impressive bit.
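To put the sustained file-system figure in per-processor terms, here is a rough Python sketch. It assumes, as a simplification, that all 10,368 compute processors from the story summary write at once and split one color's 50 GB/s evenly; the 10 TB dataset is purely hypothetical:

```python
# Per-processor share of the sustained file-system bandwidth quoted above.
# ASSUMPTION: all 10,368 compute processors (from the story summary) write
# at once and split one color's 50 GB/s evenly; in practice a color may
# hold only part of the machine.
# The 10 TB dataset is hypothetical, chosen only for illustration.
FS_BANDWIDTH_BYTES_S = 50e9   # 50 GB/s sustained, per color (decimal GB)
PROCESSORS = 10_368
DATASET_BYTES = 10e12         # hypothetical 10 TB write

per_proc_mb_s = FS_BANDWIDTH_BYTES_S / PROCESSORS / 1e6
write_seconds = DATASET_BYTES / FS_BANDWIDTH_BYTES_S

print(f"per-processor share ~ {per_proc_mb_s:.1f} MB/s")
print(f"10 TB at 50 GB/s    ~ {write_seconds:.0f} s")
```

Under these assumptions each processor's share is about 4.8 MB/s, and the hypothetical 10 TB write would take about 200 seconds.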

  5. uses "only" 2 MegaWatts for power and cooling. by anon+mouse-cow-aard · · Score: 1


    It's pretty impressive.
    2 megawatts and 3,000 sq. ft. is quite good.

    Frightening, but given the power, quite good.
    Imagine the UPS.

    On the other hand... "100 hour MTBI is desirable"
    Ack! It is hoped that it won't crash more than
    once every three days? That is up from 40 hours
    on the current one. Oh. They're putting in lots of
    RAS features, and they still can't target higher than that. Depressing.

    Custom interconnect. That is the exciting part.
    It looks like a lot of fun. The directions are good
    and make sense. A connectionless API (mcast & bcast only?) is going to take some getting used to...

    Hope they can pull it off.
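The power and floor-space figures in the comment above can be turned into density and efficiency numbers. A minimal Python sketch, assuming the 2 MW is a constant draw (real load varies) and taking the 40 TF peak from the story summary:

```python
# Rough facility-level figures from the numbers quoted in this thread.
# 2 MW (power and cooling) and 3,000 sq ft come from the comment above;
# the 40 TF peak comes from the story summary.
# ASSUMPTION: the 2 MW is treated as a constant draw; real load varies.
POWER_W = 2e6
FLOOR_SQFT = 3_000
PEAK_FLOPS = 40e12
HOURS_PER_DAY = 24

watts_per_sqft = POWER_W / FLOOR_SQFT
# Peak flops per facility watt (cooling included), so this understates
# the efficiency of the processors themselves.
mflops_per_watt = PEAK_FLOPS / POWER_W / 1e6
# Energy drawn in one day at a constant 2 MW, in megawatt-hours.
energy_mwh_per_day = POWER_W * HOURS_PER_DAY / 1e6

print(f"power density   ~ {watts_per_sqft:.0f} W/sq ft")
print(f"peak efficiency ~ {mflops_per_watt:.0f} MFLOPS/W")
print(f"daily energy    ~ {energy_mwh_per_day:.0f} MWh")
```

That works out to roughly 667 W per square foot, about 20 peak MFLOPS per facility watt, and about 48 MWh per day, which gives some idea of what the UPS would be up against.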

    1. Re:uses "only" 2 MegaWatts for power and cooling. by hawkstone · · Score: 1

      on the other hand... "100 hour MTBI is desirable"
      Ack! It is hoped that it won't crash more than
      once every three days?


      I'm sure they're talking about a single node failure, not the whole machine. Most people aren't running jobs on more than a few hundred processors, so a compute node failing will take out only one of a few dozen running jobs. And it's more like having your program crash, anyway; resubmit your job and it will simply run on a different set of nodes.

      My question is: what is this operating system they've got running on the compute nodes? It's called "LWK (Catamount)". Their previous Red machines were running Cougar/TFLOPS, a home-grown OS, on the compute nodes, and it didn't support some basic (IMO) features like sockets. That makes it really hard to get data out of those compute nodes except by writing to disk. I wonder if this LWK is the same thing...

    2. Re:uses "only" 2 MegaWatts for power and cooling. by Anonymous Coward · · Score: 0

      Catamount is the next iteration of Cougar, but I don't have any particular details. Anybody out there with more info?

    3. Re:uses "only" 2 MegaWatts for power and cooling. by hawkstone · · Score: 1

      I'm sure I can make a phone call and get more, but I'm too damn busy..... I'll just wait until the next story. ('cuz you know there's gonna be one)

      It won't really matter until it comes time to port my code to the thing anyway. I just have a feeling sockets are going to be the deal breaker.

    4. Re:uses "only" 2 MegaWatts for power and cooling. by convolvatron · · Score: 1

      Catamount is a slightly rewarmed Cougar.

      TFLOPS was a DSM OS called OSF/AD; it's being replaced by SuSE with some minimal cluster extensions.

    5. Re:uses "only" 2 MegaWatts for power and cooling. by Anonymous Coward · · Score: 0

      If you've got over 10,000 nodes, you're gonna have boxes falling over now and again. One node in 10,000 failing every 100 hours is a rather respectable reliability figure for such a complex cluster.

      -AC
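The relationship between whole-machine MTBI and per-node reliability can be made concrete. A minimal Python sketch, assuming independent, exponentially distributed node failures; the 10,000-node and 100-hour figures come from the comments above, while the 300-node, 24-hour job is a hypothetical example in the spirit of "a few hundred processors":

```python
import math

# Relating whole-machine MTBI to per-node reliability.
# ASSUMPTIONS: node failures are independent and exponentially distributed,
# and every node failure counts as one interrupt. The 10,000-node and
# 100-hour figures come from the comments above; the 300-node, 24-hour
# job is hypothetical.
NODES = 10_000
SYSTEM_MTBI_H = 100.0
HOURS_PER_YEAR = 8766  # 365.25 days


def node_mtbf_hours(nodes: int, system_mtbi_h: float) -> float:
    """With N independent nodes the failure rates add, so MTBF_node = N * MTBI_system."""
    return nodes * system_mtbi_h


def job_interrupt_probability(job_nodes: int, hours: float, node_mtbf_h: float) -> float:
    """Chance that at least one of the job's nodes fails during the run."""
    return 1.0 - math.exp(-job_nodes * hours / node_mtbf_h)


mtbf = node_mtbf_hours(NODES, SYSTEM_MTBI_H)
p = job_interrupt_probability(300, 24.0, mtbf)

print(f"per-node MTBF ~ {mtbf:,.0f} h (~{mtbf / HOURS_PER_YEAR:.0f} years)")
print(f"300-node, 24 h job interrupted with p ~ {p:.2%}")
```

Under these assumptions a 100-hour system MTBI implies each node lasts about a million hours (over a century) between failures, and a job on a few hundred nodes has well under a 1% chance of being hit during a day-long run.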

  6. "Thor's Hammer"? by Resound · · Score: 3, Insightful

    Why not go all the way and call it Mjollnir? (And while they're at it, nickname the two head techs "Brok" and "Eitri".)

  7. Nickname by red_dragon · · Score: 1

    The new nickname for the 40 teraflops system is "Thor's Hammer".

    Ah, curious. I guess what goes around comes around. But, shouldn't it be "Thor's Hammers"? It's got 10,368 Hammers, y'know.

    --
    In Soviet Russia, Jesus asks: "What Would You Do?"
  8. Catamount? by Anonymous Coward · · Score: 0

    This is mentioned on slides 6 and 72 as the OS that will run on the compute nodes. Does anyone know more about it?

  9. MOD PARENT UP by Anonymous Coward · · Score: 0



    We don't all have to be illiterate.