Slashdot Mirror


World's Fastest Supercomputer To Be Built At ORNL

Homey R writes "As I'll be joining the staff there in a few months, I'm very excited to see that Oak Ridge National Lab has won a competition within the DOE's Office of Science to build the world's fastest supercomputer at Oak Ridge National Lab in Oak Ridge, Tennessee. It will be based on the promising Cray X1 vector architecture. Unlike many of the other DOE machines that have at some point occupied #1 on the Top 500 supercomputer list, this machine will be dedicated exclusively to non-classified scientific research (i.e., not bombs)." Cowards Anonymous adds that the system "will be funded over two years by federal grants totaling $50 million. The project involves private companies like Cray, IBM, and SGI, and when complete it will be capable of sustaining 50 trillion calculations per second."

25 of 230 comments

  1. good stuff by Anonymous Coward · · Score: 4, Interesting


    Personally I'm happy to see Cray still making impressive machines. Not every problem can be solved by "divide and conquer" clusters.

    1. Re:good stuff by sotonboy · · Score: 4, Insightful

      I disagree. There is a huge difference. Bolting a load of boxes together with ethernet and all the associated overheads can never be as efficient as dedicated hardware for connecting, and sharing the processing load.

      Obviously there is a lot more that could affect the performance, such as how memory is implemented. In general though, the system will perform best when each processor is performing calculations, rather than looking after ethernet connections.

    2. Re:good stuff by adam872 · · Score: 4, Informative

      Some problems are easily partitioned up and distributed to separate nodes. In particular, code where the nodes do not need to talk to each other much is ripe for clusters, as the interconnect speed is less important. Therefore, you can build a commodity cluster fairly cheaply.

      For other problems, where interprocess/node communication is high or very high, you need a high speed interconnect (like NUMAflex in SGI's machines) to get you the scalability you need as the number of processors/nodes and the size of the data set increase. The big systems like Crays and the bigger SGIs and IBM Power series have those high speed interconnects and will allow you to scale more efficiently than the clusters. They cost a lot more though :)

      A good book to read on the subject of HPC is High Performance Computing by Severance and Dowd (O'Reilly). It's a little old now, but it covers a lot of the concepts you need to know about building a truly HPC system (architecture as well as code).

    3. Re:good stuff by Jeremy+Erwin · · Score: 4, Interesting

      But Virginia Tech's cluster doesn't use Ethernet as its primary network. It uses Infiniband. As for the cost not scaling linearly, ask yourself whether Big Mac's performance scales linearly.

  2. Wow... by nother_nix_hacker · · Score: 3, Funny
    The project involves private companies like Cray, IBM, and SGI, and when complete it will be capable of sustaining 50 trillion calculations per second."
    Outlook with no slowdown!
    1. Re:Wow... by FenwayFrank · · Score: 5, Funny

      It's so fast, the blue screen shifts to red!

  3. Qualifier by andy666 · · Score: 5, Insightful

    As usual, there should be a qualifier as to what is meant by fastest. According to their definition they are, but not according to NEC's, for example.

  4. Hmm by LaserLyte · · Score: 5, Funny

    > ...capable of sustaining 50 trillion calculations per second.

    Hmm...I wonder if I could borrow it for a few days to give my dnet stats a boost :D

  5. Shamelessly plagiarized by Anonymous Coward · · Score: 3, Funny

    Wow, 50 trillion calculations per second. That's almost fast enough to finish an infinite loop in under ten hours.

  6. Re:50 trillion by WindBourne · · Score: 3, Insightful

    I wonder if that processing power could be used for rendering like was done by Weta and how the performance could compare to their renderfarm.
    Sure, but the real question is why would you? The cost of this on a per-MIPS basis is sure to be much higher than a renderfarm. In addition, ray tracing lends itself to parallelism. There are many other problems out there that do not, and those can use this kind of box.

    --
    I prefer the "u" in honour as it seems to be missing these days.
  7. Doom III by MrRuslan · · Score: 4, Funny

    at an impressive 67fps on this baby...

  8. They better hurry ... by realSpiderman · · Score: 5, Interesting
    ... or this is going to beat them hard.

    Still a whole year until they have a full machine, but the 512-way prototype reached 1.4 TFlops (LinPack). The complete machine will have 128 times the nodes and 50% higher frequency. So even with pessimistic scalability, this will be more than twice as fast.
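
A quick back-of-the-envelope check of those numbers, as a sketch only: it assumes LinPack throughput scales with node count and clock frequency, and the `efficiency` factor is a made-up knob standing in for scaling losses. The function names are illustrative, not from any benchmark tool.

```python
def projected_tflops(prototype_tflops, node_multiplier, clock_multiplier, efficiency=1.0):
    """Scale a prototype's LinPack result up to the full machine.

    efficiency is a hypothetical fudge factor (1.0 means perfect scaling).
    """
    return prototype_tflops * node_multiplier * clock_multiplier * efficiency


def break_even_efficiency(target_tflops, prototype_tflops, node_multiplier, clock_multiplier):
    """Scaling efficiency the full machine needs in order to reach target_tflops."""
    ideal = projected_tflops(prototype_tflops, node_multiplier, clock_multiplier)
    return target_tflops / ideal


# Figures from the comment: 1.4 TFlops prototype, 128x the nodes, 1.5x the clock.
ideal = projected_tflops(1.4, 128, 1.5)
# "More than twice as fast" as a 50 TFlops machine means clearing 100 TFlops.
needed = break_even_efficiency(2 * 50.0, 1.4, 128, 1.5)

print(f"ideal scaling: {ideal:.1f} TFlops")
print(f"efficiency needed to double 50 TFlops: {needed:.0%}")
```

With perfect scaling that comes to roughly 269 TFlops, so the "more than twice as fast" claim holds as long as the full machine keeps a bit over a third of its ideal throughput.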

    1. Re:They better hurry ... by flaming-opus · · Score: 4, Informative

      Two radically different designs will probably solve very different sorts of problems. Linpack is extremely good at giving a computer an impressive number. It's the sort of problem that fills up execution pipelines to their maximum. Blue Gene was originally designed to do protein-folding calculations. While many other tasks will work well on that machine, others will work very poorly.

      It's a mesh of a LOT of microcontroller-class processors. The theory being that these processors give you the best performance per transistor. Thus you can run them at a moderate clock, get decent performance out of them, and cram a whole hell of a lot of them into a cabinet. It's a cool design, I'm interested to see what it will be able to do, once deployed. However, for the problems they have at ORNL, I'm sure the X1 was a better machine. Otherwise they would have bought IBM. They already have a farm of p690s, so they have a working relationship.

  9. 50 trillion calcs/sec...how fast really? by Debian+Troll's+Best · · Score: 4, Insightful
    I love reading about these kinds of large supercomputer projects...this is really cutting edge stuff, and in a way acts as a kind of 'crystal ball' for the types of high performance technologies that we might expect to see in more common server and workstation class machines in the next 10 years or so.

    The article mentions that the new supercomputer will be used for non-classified projects. Does anyone have more exact details of what these projects may involve? Will it be a specific application, or more of a 'gun for hire' computing facility, with CPU cycles open to all comers for their own projects? It would be interesting to know what types of applications are planned for the supercomputer, as it may be possible to translate a raw measure of speed like the quoted '50 trillion calculations per second' into something more meaningful, like 'DNA base pairs compared per second', or 'weather cells simulated per hour'. Are there any specialists in these kinds of HPC applications who would like to comment? How fast do people think this supercomputer would run apt-get for instance? Would 50 trillion calculations per second equate to 50 trillion package installs per second? How long would it take to install all of Debian on this thing? Could the performance of the system actually be measured in Debian installs per second? I look forward to the community's response!

  10. Re:Talking out my ass here, but by Waffle+Iron · · Score: 4, Insightful

    There are still a few computing problems that can't be efficiently split into a large number of subproblems that can be executed in parallel. For those cases, a cluster of small machines won't help.
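
The usual way to put a number on this is Amdahl's law (named here as the standard textbook model, not something the poster cited). A minimal sketch, where `parallel_fraction` is an assumed property of the workload:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Illustrative workloads: 50%, 90% and 99% of the work parallelizable.
for fraction in (0.50, 0.90, 0.99):
    speedups = [round(amdahl_speedup(fraction, n), 1) for n in (16, 1024, 65536)]
    print(fraction, speedups)
```

However many nodes you add, the speedup never exceeds 1 / (1 - parallel_fraction), which is why a mostly serial problem gains little from a big cluster.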

  11. Re:Maybe it's me. by henryhbk · · Score: 4, Informative

    Yes, DOE is the Federal Government's Department of Energy. Oak Ridge is a large federal govt. lab.

  12. 2 Years? by XMyth · · Score: 3, Informative

    I don't think Crays that were built 5 years ago are considered obsolete by anyone's standards.

    Clusters solve different jobs than supercomputers. Sometimes they bleed into one another, but there are some things supercomputers will always be better at (because of higher memory bandwidth for one thing).

  13. Re:Talking out my ass here, but by flaming-opus · · Score: 4, Interesting

    If you care to, read the pdf on their early impressions of the X1. The Army High Performance Computing Research Center (www.arc.umn.edu) did an analysis of their application and found that the X1 was actually MORE cost effective than a commodity cluster.

    Firstly, the X1 had greater per-processor performance by a factor of 4. Then you add an interconnect that has half the latency, and 50 times the bandwidth of myrinet or infiniband. It also has memory and cache bandwidth enough to actually fill the pipelines, unlike a Xeon which can do a ton of math on whatever will fit in the registers. Some problems just don't work real well on clustered PCs; they need this kind of big iron.

    Secondly, some problems cannot tolerate a failure in a compute node. If you cluster together 10,000 PCs, the average failure rate means that one of those nodes will fail about every 4 hours. If your problem takes three days to complete, the cluster is worthless to you. A renderfarm can tolerate this sort of failure rate, just send those frames to another node. Some problems can't handle it.
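
To put rough numbers on that failure-rate argument, here is a minimal sketch. It assumes node failures are independent and exponentially distributed, and the 40,000-hour per-node MTBF is simply back-computed from the figures above (10,000 PCs, one failure about every 4 hours); it is not a measured value. Function names are made up for illustration.

```python
import math


def cluster_mtbf_hours(node_mtbf_hours, nodes):
    """Mean time between failures anywhere in the cluster."""
    return node_mtbf_hours / nodes


def prob_no_failure(job_hours, node_mtbf_hours, nodes):
    """Chance that no node fails during the job (independent exponential failures)."""
    return math.exp(-job_hours * nodes / node_mtbf_hours)


NODE_MTBF_HOURS = 40_000   # assumption implied by "10,000 PCs, one failure every 4 hours"
NODES = 10_000
JOB_HOURS = 3 * 24         # the three-day job from the comment

print("cluster MTBF (hours):", cluster_mtbf_hours(NODE_MTBF_HOURS, NODES))
print("P(three-day job sees no failure):", prob_no_failure(JOB_HOURS, NODE_MTBF_HOURS, NODES))
```

Under these assumptions a 72-hour run that cannot survive a node failure almost never finishes, while a renderfarm that just resubmits the lost frames barely notices.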

    Oak ridge is very concerned with getting the most bang for the buck.

  14. Being Snide Here by Seanasy · · Score: 4, Insightful

    I think ORNL and PSC know a lot more about supercomputing than you (or Internet rag pundits) do. As others have noted, there are real reasons for Big Iron.

    Clusters are great for certain problems but for heavy computation -- think simulating two galaxies colliding or earthquake modeling -- off the shelf clusters don't cut it.

    They're not wasting tax-payer money unless you consider basic research a waste.

  15. 3D torus topology by elwinc · · Score: 4, Informative
    I checked out the topology of the Cray X1; they call it an "enhanced 3D torus." A 3D torus would be if you made an NxNxN cube of nodes, connected all adjacent nodes (top, bottom, left, right, front, back), and then connected all the processors on one face thru to the opposite face. I can't tell what an "enhanced" torus is. (Each X1 node, by the way, has four 12.8 gflop MSPs, and each MSP has eight 32-stage, 64 bit floating point pipelines.)

    So each node is directly connected to six adjacent nodes. Contrast this with the Thinking Machines Connection Machine CM2 topology, which had 2^N nodes connected in an N dimensional hypercube. So each node in a 16384 node CM2 (2^14 nodes) was directly connected to 14 other nodes. There's a theorem that you can always embed a lower dimensional torus in an N dimensional hypercube, so the CM2 had all the benefits of a torus and more. This topology was criticized because you never needed as much connectivity as you got in the higher node-count machines, so the CM2 was in effect selling you too much wiring.

    Thinking Machines changed the topology to fat trees in the CM5. One of the cool things about the fat tree is it allows you to buy as much connectivity as you need. I'm really surprised that it seems to have died when Thinking Machines collapsed. On the other hand, any kind of 3D mesh is probably pretty good for simulating physics in 3D. You can have each node model a block of atmosphere for a weather simulation, or a little wedge of hydrogen for an H-bomb simulation. But it might be useful to have one more dimension of connection for distributing global results to the nodes.
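
For anyone who wants to play with the two topologies, here is a small sketch of how neighbours fall out in a plain 3D torus and in a hypercube. It models only the textbook versions described above (not whatever Cray's "enhanced" variant adds), assumes a torus side of at least 3 so the six neighbours are distinct, and uses hypothetical function names.

```python
def torus_neighbors(node, size):
    """Six neighbours of (x, y, z) in a size x size x size torus with wraparound."""
    x, y, z = node
    return [
        ((x + 1) % size, y, z), ((x - 1) % size, y, z),
        (x, (y + 1) % size, z), (x, (y - 1) % size, z),
        (x, y, (z + 1) % size), (x, y, (z - 1) % size),
    ]


def hypercube_neighbors(node, dimensions):
    """Neighbours of a node in a hypercube: flip each bit of its index in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


# A corner node of a 4x4x4 torus wraps around to the opposite faces.
print(torus_neighbors((0, 0, 0), 4))
# Node 0 of a 4-dimensional hypercube (16 nodes) has 4 neighbours.
print(hypercube_neighbors(0, 4))
```

The torus keeps a fixed six links per node no matter how big the machine gets, while the hypercube's link count grows with the number of dimensions, which is the "too much wiring" complaint in a nutshell.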

    --
    --- Often in error; never in doubt!
  16. as a former DOE employee by bsDaemon · · Score: 5, Interesting

    I worked in Instrumentation and Control for the Free Electron Laser project at the Thomas Jefferson National Accelerator Facility. We also host the CEBAF (Continuous Electron Beam Accelerator Facility), which is a huge ass particle accelerator.
    The DOE does a lot of basic research in nuclear physics, quantum physics, et cetera. The FEL was used to galvanize power rods for VPCO (now Dominion Power) and made them last 3 times as long. Some William & Mary people use it for doing protein research, splicing molecules and stuff.
    The DOE does a lot of very useful things that need high amounts of computing power, not just simulating nuclear bombs (although Oak Ridge does that sort of stuff, as does Los Alamos). We only had a lame Beowulf cluster at TJNAF. I wish we would have had something like this beast.
    I want to know how it stacks up to the Earth Simulator.

  17. NOT the fastest! by VernonNemitz · · Score: 4, Interesting

    It seems to me that as long as multiprocessor machines qualify as supercomputers, then the Google cluster counts as the fastest right now, and will still count as the fastest long after this new DOE computer is built.

  18. Re:Fighting the temptation ... by flaming-opus · · Score: 3, Informative

    The SGI Altix runs a hacked-up version of Linux that's part 2.4 with a lot of backported 2.6 stuff as well as the Irix SCSI layer. They are migrating to a pure 2.6 OS soon. The IBM system runs AIX 5.2. The Cray runs Unicos, which is a derivative of Irix 6.5, though they seem to be moving to Linux also. I'm gonna guess that they run TotalView as their debugger. They use DFS as their network filesystem. They have published plans to hook all these systems up to the StorNext filesystem, which does Hierarchical Storage Management. MPI and PVM are likely important libraries for a lot of their apps.

    For these sorts of machines, one can buy utilities for data migration, backup, debugging, etc. However, the production code is written in-house, and that's the way they want it. Weather forecasting, for example, uses software called MM5, which has been evolving since the Cray-2 days, at least. A lot of this code is passed around between research facilities. It's not open source exactly, but the DOD plays nice with the DOE, etc.

    The basic algorithms have been around for a long time. In the early 90's, when MPPs and then clusters came onto the scene, a lot of work was done in structuring the codes to run on a large number of processors. Sometimes this works better than other times. Most of the work isn't in writing the code, but rather in optimising it. Trying to minimize the synchronous communication between nodes is of great importance.

  19. Re:Talking out my ass here, but by Waffle+Iron · · Score: 4, Informative
    I'm sorry dude, but this machine is going to have more than 1 CPU in it, and the work will have to be split among the processors and run in parallel.

    The number of processors isn't as important as the memory architecture. Clusters of workstation-class machines have isolated memory spaces connected by I/O channels. Many non-clustered supercomputers have a single unified memory space where all processors have equal access to all of the memory in the system. This can be important for algorithms that heavily use intermediate results from all parts of the problem space.

    Even so, for a given number of FLOPS, a vector machine would generally require fewer CPUs than a cluster of general-purpose machines. This reduces the amount of splitting that has to be done to the problem in the first place.
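
As a rough illustration of the "fewer CPUs" point, this sketch counts the processors needed to reach a target peak rate. The 12.8 gflop per-MSP figure and the 50 trillion calculations per second target are quoted elsewhere in this thread; the 3.2 gflop commodity CPU is a hypothetical number picked to match the factor-of-4 per-processor claim, and peak rates are not the same thing as sustained rates.

```python
def processors_needed(target_mflops, per_processor_mflops):
    """Smallest processor count whose combined peak rate reaches the target."""
    if per_processor_mflops <= 0:
        raise ValueError("per_processor_mflops must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_mflops // per_processor_mflops)


TARGET_MFLOPS = 50_000_000      # 50 trillion calculations per second
X1_MSP_MFLOPS = 12_800          # 12.8 gflops per MSP, as quoted in the thread
COMMODITY_CPU_MFLOPS = 3_200    # hypothetical: one quarter of an MSP

print("X1 MSPs:", processors_needed(TARGET_MFLOPS, X1_MSP_MFLOPS))
print("commodity CPUs:", processors_needed(TARGET_MFLOPS, COMMODITY_CPU_MFLOPS))
```

Fewer, faster processors means the problem gets cut into fewer pieces, so there are fewer boundaries to communicate across.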

  20. Un-classified research uses by bradbury · · Score: 3, Interesting
    One of the major un-classified research uses is for molecular modeling for the study of nanotechnology. This really consumes a lot of computer time because one is dealing with atomic motion over pico-to-nano-second time scales. An example is the work done by Goddard's group at Caltech on simulating rotations of the Drexler/Merkle Neon Pump. If I recall properly they found that when you cranked the rotational rate up to about a GHz it flew apart. (For reference macro-scale parts like turbochargers or jet engines don't even come close...)

    In the long run one would like to be able to get such simulations from the 10,000 atom level up to the billion-to-trillion (or more) atom level so you could simulate significant fractions of the volume of cells. Between now and then molecular biologists, geneticists, bioinformaticians, etc. would be happy if we could just get to the level of accurate folding (Folding@Home is working on this from a distributed standpoint) and eventually to be able to model protein-protein interactions so we can figure out how things like DNA repair -- which involves 130+ proteins cooperating in very complex ways -- operate so we can better understand the causes of cancer and aging.