Slashdot Mirror


Fastest Commercial Supercomputer To Be Built

Zeus305 writes: "Today NuTec Sciences, Inc. will be announcing its purchase of the world's fastest commercial supercomputer, second overall only to ASCI White. NuTec will use and lease time on the 1,250 clustered IBM servers to analyse genes decoded by the human genome project to try to better understand the causes of diseases like cancer by running month-long algorithms that analyse the relationships between different areas of the genome. This beast will have 2.5 terabytes of RAM and 50 TB of disk space."

29 of 106 comments

  1. Re:Ho-hum by Anonymous Coward · · Score: 2

    Actually, supercomputing often has major disk I/O problems. This is not because it needs to move data to and from disk during compute time. It is because the files are frequently enormous and have to be moved to archival storage, visualization, or wherever. ASCI White, for example, is slated to produce data sets averaging 5-8 TB per session, and all of that data has to be piped through a network for storage and visualization. They have fat network pipes (lots of fiber for those things), but disks often have a hard time keeping up in those situations. How many hours are we talking about to pull a single 8 TB chunk of data off the drive arrays? Spending extra on faster disks doesn't necessarily speed up computation, but it makes a lot of other work more efficient at such a massive scale.
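    Here is a rough answer to the 8 TB question as a minimal C sketch. The throughput figures are illustrative assumptions, not measured numbers for ASCI White's arrays. It uses decimal units (1 TB = 10^12 bytes).

```c
#include <assert.h>
#include <stdio.h>

/* Hours to stream `bytes` at a sustained `bytes_per_sec`. */
double transfer_hours(double bytes, double bytes_per_sec)
{
    assert(bytes > 0.0 && bytes_per_sec > 0.0);
    return bytes / bytes_per_sec / 3600.0;
}

/* Print the time for one data set at a few assumed sustained rates. */
void print_transfer_table(double bytes)
{
    const double rates[] = { 100e6, 1e9, 10e9 };  /* bytes/s, assumed */
    for (size_t i = 0; i < sizeof rates / sizeof rates[0]; i++)
        printf("%6.1f GB/s: %7.2f hours\n",
               rates[i] / 1e9, transfer_hours(bytes, rates[i]));
}
```

    For 8 TB, print_transfer_table(8e12) gives about 22 hours at 100 MB/s, about 2.2 hours at 1 GB/s and roughly 13 minutes at 10 GB/s. That is why sustained disk throughput matters even when it doesn't speed up the computation itself.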

    Of course, I'm not sure what type of storage requirements they're going to have in that field or if they're planning on keeping the majority of the archival storage on the main system's 50TB or if that's just a "scratch pad" for huge jobs that will eventually be moved to storage arrays.

    :::shrug:::

  2. Re:Ho-hum by tolldog · · Score: 2

    I don't think they are looking for fast drives either. But for storage of something this important, I would bet that a lot of money went into fancy SCSI drives in a RAID system.
    If the data is really important, a redundant system is needed, and that of course can double the price. I have looked at good 10TB RAID systems and they are not cheap.
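    The "can double the price" point is mostly about raw capacity. Below is a minimal sketch that assumes disk cost scales with raw capacity and ignores controllers, cabinets and spares. It compares mirroring against parity RAID.

```c
#include <assert.h>

/* Raw capacity needed for `usable_tb` of usable space under RAID 1
 * mirroring: every block is stored twice. */
double raid1_raw_tb(double usable_tb)
{
    assert(usable_tb >= 0.0);
    return 2.0 * usable_tb;
}

/* Raw capacity under RAID 5 with `disks_per_group` disks per group:
 * one disk's worth of space in each group holds parity. */
double raid5_raw_tb(double usable_tb, int disks_per_group)
{
    assert(usable_tb >= 0.0 && disks_per_group >= 3);
    return usable_tb * disks_per_group / (disks_per_group - 1);
}
```

    For a 10 TB system, raid1_raw_tb(10.0) comes to 20 TB of raw disk, while raid5_raw_tb(10.0, 5) comes to 12.5 TB. Parity saves disks at the cost of slower small writes.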

    --
    -I just work here... how am I supposed to know?
  3. Protein folding? by swingkid · · Score: 2

    Last I heard, wasn't IBM working on a supercomputer to be used for calculating protein folding? If I recall, that was supposed to be the fastest when complete. Of course, I could be wrong.

    1. Re:Protein folding? by cperciva · · Score: 2

      wasn't IBM working on a supercomputer to be used for calculating protein folding?

      You're thinking of Blue Gene, which is supposed to hit 1 petaflop.

      Unfortunately it isn't supposed to be operational until 2005.

  4. Re:Parallel computing & computer science... by Natedog · · Score: 2

    "Compute the Fibonacci sequence (without solving it for x) and race your PIII with this computer - and you might win."

    Or, transform the generic Fibonacci recurrence into a dynamic programming problem and solve it in linear time :) -- this assumes, of course, unit time for each addition, so solving fib for small numbers is not very interesting. However, for arbitrarily large numbers (say 1 meg integers) you would be much better off distributing the dynamic Fibonacci program (which operates on an array: for(array[0]=1,array[1]=1,i=2; i<n; printf("%i ",array[i++])) array[i]=array[i-1]+array[i-2];). One master computer would manage the array and the other computers would act as adders: each computer in a group would act as a sub-adder, and there would have to be some degree of carry lookahead.
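    The dynamic-programming idea can be sketched in C. This is a sequential sketch, not a distributed one. It stores big numbers as base-10^9 "limbs" so that F(n) can grow past 64 bits, and it keeps only the last two values instead of the whole array. The limb-by-limb addition with carry is the step the poster proposes to split across machines acting as adders. The limb size, the MAX_LIMBS bound and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BASE 1000000000u  /* each limb holds 9 decimal digits */
#define MAX_LIMBS 64      /* room for numbers up to 576 decimal digits */

typedef struct {
    int len;                   /* limbs in use */
    uint32_t limb[MAX_LIMBS];  /* limb[0] is the least significant */
} bignum;

/* out = a + b, one limb at a time. The carry out of each limb is the
 * value that would have to travel (or be looked ahead) between machines
 * if each machine owned a range of limbs. */
void big_add(const bignum *a, const bignum *b, bignum *out)
{
    int n = a->len > b->len ? a->len : b->len;
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t x = i < a->len ? a->limb[i] : 0;
        uint32_t y = i < b->len ? b->limb[i] : 0;
        uint32_t s = x + y + carry;  /* < 2 * BASE, fits in 32 bits */
        carry = s >= BASE;
        out->limb[i] = carry ? s - BASE : s;
    }
    if (carry) {
        assert(n < MAX_LIMBS);
        out->limb[n++] = carry;
    }
    out->len = n;
}

/* Writes F(n) in decimal to buf, with F(1) = F(2) = 1 as in the comment.
 * Only the last two values are kept, so memory stays constant in n. */
void fib_decimal(int n, char *buf, size_t bufsize)
{
    bignum a = {1, {1}};  /* F(i-1) */
    bignum b = {1, {1}};  /* F(i)   */
    bignum c = {0};
    assert(n >= 1 && bufsize > 0);
    for (int i = 2; i < n; i++) {
        big_add(&a, &b, &c);
        a = b;
        b = c;
    }
    /* Top limb unpadded, every lower limb zero-padded to 9 digits. */
    size_t pos = (size_t)snprintf(buf, bufsize, "%u", (unsigned)b.limb[b.len - 1]);
    for (int i = b.len - 2; i >= 0 && pos < bufsize; i--)
        pos += (size_t)snprintf(buf + pos, bufsize - pos, "%09u", (unsigned)b.limb[i]);
}
```

    For example, fib_decimal(100, buf, sizeof buf) writes 354224848179261915075.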

    I guess my point is that many problems that appear to be "0% vector-optimized" actually are not.

    --
    \forall code \in C, \frac{\Delta readability(code)}{\Delta t} < 0
  5. fastest COMMERCIAL computer by peter303 · · Score: 2

    The ASCI series are owned by the National Energy Labs. The ASCI series are sub-commercial proof-of-concept computers. That is, the mainstream makers are always bragging that they can configure a teraflop computer, but no customer can afford one. So Uncle Sam kicks in a few bucks to take them at their word. Everyone wins. The government gets something really fast. The computer companies get an R&D test at government expense. The second customer, a commercial site, gets a more affordable computer.

    1. Re:fastest COMMERCIAL computer by damiangerous · · Score: 2
      The ASCI series are owned by the National Energy Labs.

      They're actually leased by the DoE, not owned, according to this CNet story.

      ASCI = Accelerated Strategic Computing Initiative

      Their site is here. More on ASCI White, including a picture, is here.

  6. Ah, but when will we have streamlined processors? by Mr.+Flibble · · Score: 2

    As discussed in the book Cracking DES, chips made specifically to handle the DES algorithm are much faster than Alpha or x86 chips at brute-force cracking of DES keys. I don't know if this method applies to DNA, but my guess is that it might.
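    Brute-force key search suits special-purpose hardware partly because it splits cleanly. Each chip can be handed its own slice of the 2^56 DES keys and never needs to talk to the others. The minimal sketch below shows that partitioning only. The chip count and key rate are hypothetical parameters, and the actual key test is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DES_KEYSPACE (UINT64_C(1) << 56)  /* 56-bit DES keys */

/* Gives chip `chip` of `n_chips` the half-open key range [*first, *last).
 * Ranges are contiguous, non-overlapping, and cover the whole keyspace;
 * the remainder is spread one extra key at a time over the first chips. */
void key_range(uint64_t chip, uint64_t n_chips, uint64_t *first, uint64_t *last)
{
    assert(n_chips > 0 && chip < n_chips);
    uint64_t per_chip = DES_KEYSPACE / n_chips;
    uint64_t extra = DES_KEYSPACE % n_chips;
    *first = chip * per_chip + (chip < extra ? chip : extra);
    *last = *first + per_chip + (chip < extra ? 1 : 0);
}

/* Worst-case days to try every key at a combined `keys_per_sec`. */
double worst_case_days(double keys_per_sec)
{
    assert(keys_per_sec > 0.0);
    return (double)DES_KEYSPACE / keys_per_sec / 86400.0;
}
```

    At an assumed combined 10^9 keys per second, worst_case_days(1e9) comes to about 834 days. Dedicated chips win by raising that combined rate, and the partitioning means adding more chips raises it almost linearly.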

    Using a method similar to the one outlined in the book, I suspect it is possible to build chips custom-made to understand DNA and how it acts (or at least the inherent algorithms that will be used in studying DNA). Ironically, they probably will not be built until we understand genes and DNA better than we currently do. Once machines like this start to be built, how far off will we be from the machines described in Michael Crichton's book Jurassic Park? Some scientists have proposed that it will take 100 years to understand the human genome (or other genomes), but I think we are closer than that.

    --
    Try to hack my 31337 firewall!
  7. How does this compare to projects like seti@home? by IpSo_ · · Score: 2

    IBM claims this computer will do 7.5 teraflops.
    Compare this to SETI@home's 26.11 teraflops.

    Why wouldn't NuTec develop the software so that every Joe Blow and his handheld could run a distributed client that does this? I personally have a hard time justifying the time spent installing distributed.net or SETI@home clients on all the machines I have access to, because I know my boss wouldn't understand the importance of cracking encryption or searching for aliens on company time. ;)

    But searching for cures to human illnesses: who wouldn't want to do that? With a good piece of software and some proper advertising, there's no doubt they would surpass 30 or even 50 teraflops.

    Though this may not be a possibility if huge amounts of data are required for the calculations. Anyone have some ideas about this?

    --
    Open Source Time and Attendance, Job Costing a
  8. Hardware specs of ASCI White by SeanAhern · · Score: 2
    And in fact, ASCI White uses 375 MHz Power3 SMP processors, which are somewhat slower in MHz than what most people have on their desks.

    Here are the specs from a web page I found at LLNL:

    • 512 16-CPU nodes (8192 processing units)
    • 375 MHz Power3 SMP processors
    • Will have 6 TB total of system memory
      • 8 or 16 GB per node
    • Will have 484 batch and 8 debug nodes
    Data from http://www.llnl.gov/computing/tutorials/jw.lcres/tsld014.htm.
  9. Re:useful? by SEWilco · · Score: 2
    "think of what all that processing power can do. is this really what we need it to do? raw power like that, available only to those who can construct it. oh well."

    What do you want, to give the results of this effort to those who don't need it and can't use it? "Here are your food stamps and your array processing ration card."

  10. Re:How does this compare to projects like seti@hom by travisd · · Score: 2

    Gene matching is very data intensive -- basically one of the things that they're going to be doing is matching each part against every other part to see where things match. This means having to have a copy of all of the data available to each client. Seti/distributed/etc are really just compute-intensive apps -- they don't need to see a huge data set to do their work.
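    To make "matching each part against every other part" concrete, here is a minimal C sketch using Smith-Waterman local alignment, a standard sequence-matching algorithm. Nothing in the story says NuTec uses it. The scoring values and helper names are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN  256
#define MATCH     2   /* illustrative scores, not a tuned matrix */
#define MISMATCH -1
#define GAP      -1

/* Smith-Waterman local alignment score with a linear gap penalty:
 * the best-scoring matching region anywhere in s and t.
 * Uses a static table, so it is not safe to call from several threads. */
int local_align_score(const char *s, const char *t)
{
    static int H[MAX_LEN + 1][MAX_LEN + 1];
    size_t n = strlen(s), m = strlen(t);
    int best = 0;
    assert(n <= MAX_LEN && m <= MAX_LEN);
    for (size_t j = 0; j <= m; j++)
        H[0][j] = 0;
    for (size_t i = 1; i <= n; i++) {
        H[i][0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int h = 0;  /* a local alignment may restart anywhere */
            int diag = H[i - 1][j - 1] + (s[i - 1] == t[j - 1] ? MATCH : MISMATCH);
            int up = H[i - 1][j] + GAP;
            int left = H[i][j - 1] + GAP;
            if (diag > h) h = diag;
            if (up > h) h = up;
            if (left > h) h = left;
            H[i][j] = h;
            if (h > best) best = h;
        }
    }
    return best;
}

/* All-against-all comparison: every fragment is scored against every
 * other one, so whoever runs this needs the whole set of fragments.
 * Returns -1 (and leaves *bi, *bj untouched) if k < 2. */
int best_pair(const char *frags[], int k, int *bi, int *bj)
{
    int best = -1;
    for (int i = 0; i < k; i++)
        for (int j = i + 1; j < k; j++) {
            int score = local_align_score(frags[i], frags[j]);
            if (score > best) { best = score; *bi = i; *bj = j; }
        }
    return best;
}
```

    The nested loop in best_pair is the data-access problem in miniature. With k fragments there are k(k-1)/2 comparisons, and each one touches two fragments that could live anywhere in the data set.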

  11. doomed to fail by Capt+Dan · · Score: 2

    2.5 Terabytes? Piff. That's not enough ram. Tell those boys to call back when they're ready to play.

    --
    Sig:
    Barbeque is a noun. Not a verb.
  12. Re:I'm glad to see... by FattMattP · · Score: 2
    The next big issue will be how fast will they be churning out cures/treatments?
    It probably won't be anywhere near as fast as they will be churning out patents, unfortunately.
    --
    Prevent email address forgery. Publish SPF records for y
  13. Re:What a ratio! by jmv · · Score: 2

    What's the point of virtual memory when you've got RAM like that

    That's the thing: I'm guessing that this machine doesn't use swap at all. There is so much memory to swap and (relatively) very little disk bandwidth to swap it through. For that reason, if you want any performance, you need to keep almost everything in memory. The disk is only there for permanent storage.
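    Dividing the announced totals by the server count shows the scale per node. This is a minimal sketch that assumes RAM and disk are spread evenly over the 1,250 servers and uses decimal units. The disk throughput is a hypothetical parameter.

```c
#include <assert.h>

/* Totals from the story; assumes they are spread evenly over the
 * servers, in decimal units. */
#define NODES      1250.0
#define RAM_BYTES  2.5e12
#define DISK_BYTES 50e12

double ram_per_node(void)  { return RAM_BYTES / NODES; }
double disk_per_node(void) { return DISK_BYTES / NODES; }

/* Seconds for one node to write all of its RAM to its own disks at
 * `disk_bytes_per_sec`: a floor on the cost of swapping everything out
 * once, before any random-access seeks are counted. */
double swap_out_seconds(double disk_bytes_per_sec)
{
    assert(disk_bytes_per_sec > 0.0);
    return ram_per_node() / disk_bytes_per_sec;
}
```

    That works out to 2 GB of RAM and 40 GB of disk per server. At an assumed 20 MB/s of sustained disk throughput, swap_out_seconds(20e6) is 100 seconds just to write one node's memory out once. A job that actually leaned on swap would crawl.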

  14. Re:Ho-hum by john@iastate.edu · · Score: 2
    Regardless of whether they need "super-duper-ultra-scsi7" or not, you've still forgotten about controllers, cabinets, power supplies, cables, and stuff. And my guess is, with something like this you are talking hot-swappable RAID arrays, and that stuff isn't quite at spitting-distance-from-free yet.

    --
    Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
  15. finally a good cause by ^chuck^ · · Score: 2

    I'm sick of all the money being spent on supercomputers that simply make sure dollars and cents add up (for instance, VISA's supercomputers). It's heartening to see people (or companies) motivated and willing to spend the dough to solve cancer. My father narrowly escaped it a few years ago, and I wish a cheaper and less nasty treatment could be developed to prevent it. Others have said it, but it needs repeating: hopefully this will spur competition in the medical market as pharmaceutical companies try to beat each other to better cures.

    --

    Lemure, wtf! Don't you mean Lemur?
  16. But could they.... by slashdoter · · Score: 2
    I have a question about these clusters: is it possible to add extra nodes after the cluster is up and running? It looks to me as though you could just keep adding nodes year after year to keep it up to date. Did I miss something?


    ________

    --
    Does anyone actually have a Java program designed to control air traffic, or for the operation of a nuclear facility?
    1. Re:But could they.... by Flat5 · · Score: 2

      You could, but the key to the power of this type of machine is not the number of nodes, it's the switch. The switch is the bottleneck, and adding nodes to a machine this size doesn't buy you much for the kinds of problems they're running.

      Flat5

    2. Re:But could they.... by tolldog · · Score: 3

      Not always.
      The switch could be a bottleneck, but it doesn't have to be. It all depends on how much data has to be transferred.
      I know that with a render farm, which is a NOW (network of workstations), the switch is not the limiting factor. Each machine pulls in data, thinks, and spits out data in small chunks (a couple K at a time). Any switch should be able to handle this.

      Also, if more of an interaction is needed between machines, they can be networked in hypercube configs by adding a few more switches.
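      For reference, hypercube wiring follows a simple rule. Here is a minimal sketch, independent of any particular switch hardware and with hypothetical function names: each node is linked to the nodes whose ids differ from its own in one bit.

```c
#include <assert.h>

/* In a d-dimensional hypercube of 2^d nodes, a node is wired to the
 * d nodes whose ids differ from its own in exactly one bit. */
unsigned hypercube_neighbor(unsigned node, unsigned dim)
{
    return node ^ (1u << dim);
}

/* Minimum hops between two nodes = number of differing id bits. */
int hypercube_hops(unsigned a, unsigned b)
{
    int hops = 0;
    for (unsigned x = a ^ b; x != 0; x &= x - 1)  /* clear lowest set bit */
        hops++;
    return hops;
}
```

      With 1,024 nodes (d = 10), any node reaches any other in at most 10 hops while needing only 10 links of its own.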

      The backbone can get in the way, but if that is what the limiting factor is, then maybe smaller groups should be used, each working on different segments. Or, more stuff should be stored locally with some sort of smart push script.
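      Both replies in this sub-thread fit a crude throughput model. Here is a minimal sketch: it ignores latency and contention, and all the numbers in the usage note are hypothetical. The farm is limited either by its CPUs or by its switch, whichever runs out first.

```c
#include <assert.h>

/* Crude throughput model for `nodes` workers sharing one switch.
 * Each task computes for `compute_s` seconds and moves `bytes` through
 * the switch; latency and contention effects are ignored. Throughput is
 * capped by whichever runs out first: CPUs or switch bandwidth. */
double tasks_per_second(int nodes, double compute_s, double bytes,
                        double switch_bytes_per_s)
{
    assert(nodes > 0 && compute_s > 0.0 && bytes > 0.0 && switch_bytes_per_s > 0.0);
    double cpu_limit = nodes / compute_s;
    double switch_limit = switch_bytes_per_s / bytes;
    return cpu_limit < switch_limit ? cpu_limit : switch_limit;
}
```

      Consider a render farm of 100 nodes spending 60 s per frame and moving 4 KB per frame over a 100 MB/s switch. It is CPU-bound at about 1.7 frames/s, so adding nodes helps. A tightly coupled job moving 1 MB every 10 ms on a 1 GB/s switch is switch-bound at 1,000 tasks/s whether it has 1,000 or 2,000 nodes.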

      --
      -I just work here... how am I supposed to know?
  17. Re:Payment by update() · · Score: 2

    Uhh, yeah. This sort of thing is precisely why patents exist. This isn't Linux - biotech venture capitalists aren't stupid enough to sink billions into harebrained schemes to recoup costs by selling hard drive space or airplane tickets.

  18. This could be dangerous. by Verteiron · · Score: 2

    They keep this kind of thing up and soon we'll be finishing stuff before we start it, and who knows what sorts of embarrassment that will cause.

    --
    End of lesson. You may press the button.
  19. long time by Spit_Fire1 · · Score: 2

    It would take a traditional computer 447 years to solve the first equations that were constructed, according to Michael Mott, a spokesperson for NuTec Sciences, Inc. That's a long time to wait for the health miracles promised by decoding DNA.

    I'm sure that in only 47 years, not 447, there will be PCs that can put ASCI White to shame.
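    Here is a back-of-the-envelope check of the 47-year guess as a minimal C sketch. It rests on three assumptions. The 447-year figure refers to a single PC. The new machine does the same job in about a month (the story's "month-long" runs). PC speed doubles every 18 months.

```c
#include <assert.h>

/* Years until a machine that is `slowdown` times too slow catches up,
 * if its speed doubles every `doubling_years`. Counts whole doublings. */
double years_to_catch_up(double slowdown, double doubling_years)
{
    assert(slowdown >= 1.0 && doubling_years > 0.0);
    int doublings = 0;
    for (double speedup = 1.0; speedup < slowdown; speedup *= 2.0)
        doublings++;
    return doublings * doubling_years;
}
```

    years_to_catch_up(447.0 * 12.0, 1.5) gives 13 doublings, or 19.5 years. Under these assumptions, the 47-year guess is, if anything, conservative.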

    --

    "The secret of success is to know something nobody else knows." -Aristotle Onassis
  20. Finally! by Steejee · · Score: 2

    At last a decent machine for playing Quake 3 on!

  21. Re:Parallel computing & computer science... by acacia · · Score: 3

    The advances in parallel computing today are driven by three factors, as I see it. Marketing, homogeneous (relatively) environments, and Mathematics & Methodology.

    1.) Marketing - Computer manufacturers and systems integrators/consultancies purport to be able to solve bigger/more ambitious problems. Moreover, it makes good business sense to be able to do so. Business got a hit from the crack pipe of information and they got hooked. These problems now fall outside of the realm of national security. Furthermore, government work can be precarious for many companies, and by diversifying their wares and selling to public corporations, vendors spread the risk around.

    2.) Homogeneous (relatively) dev/prod environments - Not too many people can claim knowing how to program for a Cray or Thinking Machines box, but a lot of intelligent people can move around in/administrate a UNIX environment, and some of them can code to a messaging interface. For that matter, some know tools like Ab Initio or Orchestrate and can create parallel applications very easily.

    3.) Mathematics and methodology have changed - People now recognize the conceptual and practical challenges of parallel computing, and can tailor the algorithms, hardware, and OS to accommodate the challenges of that paradigm.

    Seeing random posters on Slashdot recognize that compartmentalized data and code are necessary for distributed computing to be effective is a tribute to how far this field has come. The engineering has come a long way, as have the marketing and the overall level of consciousness.

    As for the new adjective, I would say that wider is O.K., but you have to recognize that a machine does not have to be uniformly wide. I think of parallel programming through a stream metaphor. It has speed (CPU), width (# of CPUs, or units of work/data ways parallel) and depth (depth of queue/instructions between checkpoints). How about liters? :-p

    --
    ~Religion is O.K., as long as it gets you laid.
  22. Re:I'm glad to see... by Verteiron · · Score: 3

    Well, here's a link to IBM's story on the thing. It delves a LITTLE more into the technical side of it but not much more than the CNN article.

    --
    End of lesson. You may press the button.
  23. Sorry to be cynical, but. . . by Sialagogue · · Score: 3

    I was surprised to read the responses and not see any discussion of these increasingly "super" computers' ability to crack strong encryption.

    Believe me, I'm not an Area-51-head, but in the few short years since strong encryption became widely available to concerned citizens and terrorists alike, there seem to be many more huge supercomputers getting built, each with a greater altruistic purpose attached. "It will allow us to test nuclear weapons without building them! It will cure cancer!"

    It's wholly reasonable to assume that there are military initiatives to ensure that we can't be snuck up on by PGP-wielding bad guys. As someone not wanting to be blown up, I hope there are, anyway. It's also wholly reasonable to assume that the military couldn't amass the kind of hardware necessary to do this without lighting up some analyst's bat-computer.

    But does anyone feel that the initiative could survive being entirely in the light of day? What would the /. response be to an announcement that says "Military announces supercomputer initiative to break strong encryption in real time, promises to leave private citizens alone!"

    Of course I'm not saying that every computer faster than a Pentium 4 is part of an arms program, there are serious economic incentives to making progress in cancer treatment. I just think that we would expect to see the military arming itself with the weapons du jour, and my guess is that a few are probably sitting in plain view.

    --
    The only acceptable defense of scientific results is to say that they were the product of the Scientific Method.
  24. I'm glad to see... by glebite · · Score: 4

    That very powerful distributed systems are starting to become more mainstream. It's about flipping time that companies made use of computing resources beyond their previous wildest dreams.

    Estimates of 447 years compressed to 1 month timeframes are awesome! The next big issue will be how fast will they be churning out cures/treatments? If this helps speed this up, there will certainly be a great number of lives saved.

    Hopefully, though, more companies will jump to the forefront, try to outdo each other (you know they will), and come up with more radical applications and solutions.

    I was curious - it had been asked - what OS are these beasts running?

    --
    I donate all spillover Karma to the charity of my choice... Ada was still a babe despite what people may say...
  25. Parallel computing & computer science... by mwalker · · Score: 5

    Isn't this trend towards faster supercomputers being driven by advances in Computer Science, rather than Engineering?

    Remember the Cray Y-MP? The world's fastest computers used to be built around extremely fast CPUs, packed into compact cabinets to shorten wire lengths, and liquid-cooled. Parallel computing was possible then - the problem was that we couldn't break down the problems we wanted to solve into parallel events.

    Today's brand of parallel supercomputers exist to solve a different kind of problem - a problem in which the "search space" can be compartmentalized and distributed- like the RSA challenge, fluid dynamics, chess, and -of course- the human genome.

    The thing to remember when we read about ever-faster parallel computers is that, for all intents and purposes, when you have to solve a truly sequential problem (what the Cray folks would have called a 0% vector-optimized problem) - today's supercomputers usually aren't any faster than the desktop computer you're sitting in front of. Compute the Fibonacci sequence (without solving it for x) and race your PIII with this computer - and you might win.
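    Amdahl's law is a standard result that the comment doesn't cite, but it puts numbers on the "0% vector-optimized" point. Here is a minimal C sketch. The function name is an assumption, and the formula is the usual textbook form.

```c
#include <assert.h>

/* Amdahl's law: speedup on `n` processors when `parallel_fraction` of
 * the run time can be spread across them and the rest stays sequential. */
double amdahl_speedup(double parallel_fraction, int n)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n);
}
```

    Take n = 1250, the server count from the story, ignoring CPUs per server. A purely sequential job (parallel_fraction = 0.0) gets a speedup of exactly 1, and even a 95% parallel job tops out near 20x.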

    Just something I wish they'd point out. We need a better adjective than faster for parallel computers. They're something else. Maybe... wider.

    Suggestions?