Slashdot Mirror


Supercomputers To Move To Specialization?

lucasw writes "The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors and advanced interconnect technology. This spawned multiple government reports that many suspected would ask for more funding in the U.S. for custom supercomputer architectures and less emphasis on clustering commodity hardware. One report released yesterday suggests a balanced approach."

41 of 174 comments

  1. Cost comparison? by Tyrdium · · Score: 4, Interesting

    Ignoring size, how does the cost of a cluster of fewer, highly specialized computers (with special interconnects, etc.) compare with that of a cluster of more, less specialized computers?

    1. Re:Cost comparison? by ybmug · · Score: 5, Insightful

      The problem is that it may not be possible to match the computation of a cluster with specialized interconnects using just commodity hardware, no matter how many machines you throw at it. If a simulation has a low computation-to-communication ratio, its scalability is bound by the performance of the interconnects. In this case, throwing more commodity machines at the problem will actually increase the total time required to run the experiment.
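
To put rough numbers on that last point, here is a toy cost model. It is only a sketch under an assumed shape: compute time shrinks as work divided by node count, while communication overhead grows linearly with node count. The function names and the constants are made up for illustration and are not measurements of any real machine.

```python
def run_time(nodes, compute_work=1000.0, comm_cost_per_node=0.5):
    """Toy model of total run time on `nodes` machines.

    compute_work / nodes    : perfectly divisible computation (assumed)
    comm_cost_per_node * n  : communication overhead growing with n (assumed)
    """
    return compute_work / nodes + comm_cost_per_node * nodes


def best_node_count(max_nodes, **model_params):
    """Return the node count in 1..max_nodes with the lowest modelled run time."""
    return min(range(1, max_nodes + 1),
               key=lambda n: run_time(n, **model_params))


if __name__ == "__main__":
    best = best_node_count(200)
    print("best node count:", best)
    print("time at best   :", run_time(best))
    print("time at 200    :", run_time(200))
```

With these made-up constants the modelled run time bottoms out at a few dozen nodes and then climbs again, which is the "more machines makes it slower" effect. A faster interconnect corresponds to a smaller `comm_cost_per_node`, which pushes the sweet spot out to more nodes.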

    2. Re:Cost comparison? by mfago · · Score: 4, Informative

      The interconnects are (usually) not commodity parts -- just the servers.

      As an example, the first IBM SP "supercomputers" were essentially just common Power workstations bolted into racks, but connected with a custom made SP switch.

      Nevertheless, EarthSimulator has shown what can be done by designing the entire server from the ground-up with the application in mind.

      We'll have to see how ASCI Purple performs...

    3. Re:Cost comparison? by theedge318 · · Score: 2, Informative

      I recently had the opportunity to speak with the designers of ASCI Purple and Lightpath ... and there is definitely a reason that they can't use stock parts.

      Currently the interconnects are the biggest setback ... all of the supercomputers are designed with two-dimensional floorplans ... with the goal of minimizing distances between the various parts of the computer throughout the room.

      Lightpath, which is designed to be a "low" cost supercomputer, is based upon a bio-med computer out of NY (probably Cornell ... but I can't recall). Even with this low-cost design, each machine will be a custom-made dual-processor unit. The communications protocols will actually be on the processors. To further reduce distances and communications issues, each rack will hold 2 clusters off the midplane. The curious part about Lightpath is that it is not connected with switches ... each computer is connected in 5 directions, 1 vertical and 4 horizontal. The machines on the end loop around back to the other end. Because of this manner of networking, the machine can reboot in minutes instead of the 12 hours that it takes most supercomputers, because there is no hierarchy and precedence.

      Common workstation modules can no longer just be bolted into specialized switches ... the communications need to be on the chips.

      Furthermore, after ASCI Purple and Lightpath, they are planning to build three-dimensionally, although there are quite a few construction and maintenance issues to be resolved.

      Performance-wise, both machines are expected to perform on the order of hundreds of teraflops ... however, we might be seeing the end of the ASCI line of supercomputers if Lightpath works out ... there will be an order of magnitude difference in cost for on-par performance.

      --
      Sig Nazi- "No Sig for you, come back 1 year."
  2. performance vs cost by harmless_mammal · · Score: 4, Interesting

    Teraflops per dollar is important, let's not forget that.

  3. Benchmarking by Anonymous Coward · · Score: 3, Insightful

    How does one go about benchmarking a supercomputer specialized to do a certain task versus cheap computers in a cluster? Now we need to spend more money to develop specialized supercomputers even though the scenario presented in Japan might not hold true for other applications? Seems a little too soon to start making recommendations.

  4. Oh! No! End of the World! by Goalie_Ca · · Score: 3, Funny

    Skynet had 60 Teraflops IIRC and they're talking about 100!

    Let's hope this isn't tied into Nukes somehow. Wait a sec, a massive virus has already spread disabling millions of computers!

    RUN HIDE! THE END IS UPON US!!!!!!!

    --

    ----
    Go canucks, habs, and sens!
    1. Re:Oh! No! End of the World! by BabyDave · · Score: 5, Funny

      There's a far more important thing to worry about - could this be the end of "Imagine a Beowulf Cluster ..." jokes? After all, the phrase "Imagine a custom-built supercomputer utilising similar technology (albeit more specialised) to that found in one of those!" doesn't exactly roll off the tongue, does it?

  5. Someone who's knowledge please tell me by Raul654 · · Score: 3, Interesting

    The Japan Earth Simulator outperformed a computer at Los Alamos (previously the world's fastest) by a factor of three while using fewer, more specialized processors...
    What is the difference between a processor designed to simulate earthquakes (et al) and an ordinary, off-the-shelf processor? I mean - so they optimized floating point operations. Is that it?

    --


    To make laws that man cannot, and will not obey, serves to bring all law into contempt.
    --E.C. Stanton
    1. Re:Someone who's knowledge please tell me by Boone^ · · Score: 3, Informative

      Ordinary off the shelf microprocessors don't have the bandwidth to memory or bandwidth to other processors to simulate complex problems. NEC's machine is a Vector architecture (SX-6), similar to the kind you see from the Cray X1. Vector architectures are a SIMD-style processor.

    2. Re:Someone who's knowledge please tell me by QuantumRiff · · Score: 3, Interesting
      Generic processors are inefficient. Imagine having the fastest processor on earth, and then take that chip and use it to do the calculation of x1++ (that's x1 = x1 + 1 for you non-C'ers), looping it a few trillion times. Then take a processor that is designed specifically to do x1++, and only that calculation. You can run a hell of a lot faster, and you don't need to worry about having to multiply, divide, etc. They're smaller, and cooler, and after the cost of engineering them, cheaper.

      Can't remember the link, but somebody made a board with a few FPGA chips (I think) that cracked a 56bit DES key in a few days or less, and distributed.net had how many computers working on it for how many years?

      It's all about designing the chip for the application. The ones they are referring to would probably be designed to do mass computation of heavy physics, and only be able to run custom Nuke Simulation software.

      The thing I am interested in, as an ex-Computer Systems Engineering major, is whether they are interested in designing and fabbing processors from the ground up, or using an assload of FPGAs or something from a company like Altera and programming them.

      --

      What are we going to do tonight Brain?
    3. Re:Someone who's knowledge please tell me by ihowson · · Score: 2, Insightful
      Can't remember the link, but somebody made a board with a few FPGA chips (I think) that cracked a 56bit DES key in a few days or less, and distributed.net had how many computers working on it for how many years?

      I think you're thinking of the EFF's DES cracking machine. It used a custom gate array chip - it took advantage of the cheapness of an ASIC, but not the extra efficiency (they couldn't afford to have the first round of chips not work properly - a large proportion of the chips didn't work properly anyway). IIRC, it searched the keyspace in 3.5 days.

      There have been many other groups to attack DES on FPGAs, but none have achieved the same scale as the EFF machine. I will be attempting it myself very, very soon (as soon as I can get the key buffer in my design to work on actual hardware, we're all set - today, if I'm lucky!). Some (extremely) preliminary figures suggest that we might be able to match the EFF machine on larger Xilinx FPGAs for only a few tens of thousands of dollars (it cost almost $250k).

      I'll be looking at the problem specifically mentioned in the blurb - comparing the price/performance ratio of FPGAs vs. software. At the moment FPGAs are looking like they'll come out well ahead, but I have hope for bitslicing techniques to narrow the gap a bit. There are also ciphers that are designed to run well in software and are hence difficult to attack in hardware (DES was designed to run well in hardware).
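
For readers unfamiliar with bitslicing, here is a minimal sketch of the idea. It is not a DES implementation. The trick is to pack one bit from each of many independent instances into a single machine word, so that one bitwise instruction evaluates a logic gate for every instance at once. The majority function and the helper names (`pack`, `unpack`, `majority_bitsliced`) are illustrative choices, not anything from the post.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer; bit i holds instance i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


def unpack(word, count):
    """Inverse of pack: recover `count` individual bits from the word."""
    return [(word >> i) & 1 for i in range(count)]


def majority_bitsliced(a_word, b_word, c_word):
    """Majority-of-three for every packed instance, using only AND/OR gates."""
    return (a_word & b_word) | (a_word & c_word) | (b_word & c_word)


def majority_scalar(a, b, c):
    """Reference version: one instance at a time."""
    return 1 if a + b + c >= 2 else 0


if __name__ == "__main__":
    a = [0, 1, 1, 0]
    b = [1, 1, 0, 0]
    c = [1, 0, 1, 0]
    sliced = unpack(majority_bitsliced(pack(a), pack(b), pack(c)), len(a))
    print(sliced)
```

A bitsliced cipher attack would express the whole cipher as a gate network like this, so a 64-bit CPU tries 64 keys per pass. Software still pays for every gate in sequence, though, while an FPGA lays the gates out side by side.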

  6. *flops not necessarily important... by Shenkerian · · Score: 3, Funny

    What if you care only about integer operations?

    --
    You tell me how "whilst" differs from "while," and I'll stop calling you a pretentious jackass.
  7. Specialization by bersl2 · · Score: 4, Interesting

    If you're going to have a supercomputer do one thing, of course specialize it. An Earth simulation surely has a set number of formulae whose calculations are to be optimized as much as possible, even to the hardware level.

    But if you want a versatile, general-purpose supercomputer, why not go with the clustering solution?

    1. Re:Specialization by jstott · · Score: 3, Informative
      But if you want a versatile, general-purpose supercomputer, why not go with the clustering solution?

      Because some problems don't work on clusters--things like large-scale molecular dynamics simulations with long-range spatial interactions.

      Problems that require the nodes to share massive amounts of data (gigabytes per second and up--these problems often have N^2 behaviors) don't do so well on a cluster since they tend to saturate the network. A shared-memory system, like a supercomputer, on the other hand, can provide much better memory access times (top of the line Crays have a peak memory transfer rate of 204 GB per sec per node [yes, 204 gigabytes per second]) and since there's only one copy of the memory, there can often be a lower peak bandwidth requirement.

      In short, it all depends on the problem you need to solve. Some problems work very well on clusters, others do not.

      -JS

      --
      Vanity of vanities, all is vanity...
  8. The motivation is a tad depressing by Faust7 · · Score: 4, Insightful

    The two studies resulted, in part, from NEC Corp.'s May 2002 announcement of the Earth Simulator, a custom-built supercomputer that delivers 35.8 teraflops. That system packed five times the performance of the fastest U.S. supercomputer at that time...

    "The Earth Simulator created a tremendous amount of interest in high-performance computing and was a sign the U.S. may have been slipping behind what others were doing," said Jack Dongarra...

    Graham said researchers should not overreact to NEC Corp.'s Earth Simulator that blindsided many in the high-performance computing community eighteen months ago by delivering a custom-built system five to seven times more powerful than the more off-the-shelf clusters developed in the U.S.


    I don't mean to draw a crude analogy here, but I really can't help but read this and be reminded of the space race.

    It took Sputnik to kickstart our spacemindedness; I for one consider it sad that a "tremendous amount of interest" -- and the funding that comes with it -- in high-performance computing seems only to have arisen/regenerated with the influence of competitive international politics. Are we really so hardly advanced that our respective national egos are still the driving force behind enthusiasm, financial or otherwise, in certain areas of science?

    1. Re:The motivation is a tad depressing by Pharmboy · · Score: 4, Interesting

      It took Sputnik to kickstart our spacemindedness; I for one consider it sad that a "tremendous amount of interest" -- and the funding that comes with it -- in high-performance computing seems only to have arisen/regenerated with the influence of competitive international politics. Are we really so hardly advanced that our respective national egos are still the driving force behind enthusiasm, financial or otherwise, in certain areas of science?

      I don't really see that as bad. Yes, it may look like pure ego, but the space race gave us so much that filtered into the commercial/private sector. From advanced computers to Velcro(tm). From my perspective, being the most advanced nation in as many areas as possible is a good defense, both economically and in a homeland security sense.

      Frankly, I don't want the fastest computer chips on the desktop to be designed by a company in another country (even if Intel makes them outside of the US) and I would rather that the cutting edge, be cut here, in my native country. I am sure other people in other countries feel the same, that pushed all of us to new heights. In the end, the technologies are shared anyway. Most anyone in the world can buy Intel chips, for example.

      If no one cared who could race a bicycle the fastest, Lance Armstrong would be just some guy who had cancer. Instead, our desire to compete and excel and outdo our neighbors has benefited EVERYONE a great deal. It can bring out the bad side from time to time, but the benefits far outweigh the costs. This urge to compete and win is not unique to America by any means, it is part of being human: man the animal.

      I say bring on the computer chip wars: Let's all compete, Japanese, Americans, Europeans, Russians, come one come all. In the end, we will all benefit, no matter who has the bragging rights for a day.

      --
      Tequila: It's not just for breakfast anymore!
  9. Relative speed by silmarildur · · Score: 2, Insightful

    Is there a way to really compare the speed of a supercomputer and commodity hardware? If anyone could give either a quick explanation or a link to the relationships between bogomips, teraflops, MHz, and the whole lot, I would be very appreciative.

    --
    -Silmarildur
  10. Specialized always outperforms... by I'm+a+racist. · · Score: 5, Insightful

    Specialized hardware (almost) always outperforms commodity stuff.

    I use custom-designed amplifiers because they work better for my application. I could buy off-the-shelf stuff (~$500~$10,000 range), but that won't be exactly what I want. I use custom software too... know why? Because it's designed specifically for the job. That same software shouldn't really be used for other fields of research, neither should my amplifiers. The thing about this stuff is that it takes a lot of time to maintain (plus initial development). That means grad students, postdocs, and technicians who may spend over 90% of their time just keeping systems in working order and/or adding features. The benefits of customized hardware/software, in this instance, are worth the headaches associated with it.

    All of my optics is commodity stuff (some is rare/exotic, but it's still basically black-box purchasing). I don't have the facilities to make coated optics, nor do I need anything that specialized, so... I just buy it.

    When I was in telecom, we used Oracle and Solaris and Apache. It worked, and the cost of developing the same functionality in-house was ridiculously high (plus we'd never get to designing our products that sit on top of it).

    Eventually, it always comes down to a comparison between the cost (man hours, equipment, etc) of custom building and of integrating stuff from OEMs.

    So, the question our labs need to answer is, does clustered COTS hardware get the job done? Supplementary to that, is it cost-effective to buy/design it in light of the previous answer?

    In any field where you are pushing the limits of technology, you have to make such trade-offs. Personally, I don't care who has the absolute fastest supercomputer (measured in flops, factoring-time, whatever)... what really counts is, who does the best research with the supercomputers.

    --


    Down with Saudi Arabia!!!
  11. Specialization by bytesmythe · · Score: 4, Insightful

    Specialized systems are almost always going to outperform generalized systems when you're dealing with similar levels of technology (for instance, specialized abacuses vs. a generalized Cray T3E).

    The great thing about generalized systems is you can use them to explore new areas, then design a specialized system to take advantage of specific optimizations the generalized one can't support.

    I'm glad for the report suggesting a "balanced approach". I can't imagine forsaking one type of system for the other, as each has its place. (Uhoh... generalized systems have a "place"? Does that mean they're specialized at being generalized? Oh, the irony! ;))

    --
    bytesmythe
    Hypocrisy is the resin that holds the plywood of society together.
    -- Scott Meyer
  12. trigonometry? by SHEENmaster · · Score: 3, Informative

    I assume that hard-coding trig functions into the tertiary processors would be advantageous for this. I know it violates the spirit of RISC in general-purpose computing, but for such a large-scale system with so many processors it could be advantageous.

    Do HP's Saturn or other such special-purpose processors have hard-coded higher-level functions?

    --
    You can't judge a book by the way it wears its hair.
    1. Re:trigonometry? by Gherald · · Score: 5, Funny

      Do HP's Saturn or other such special-purpose processors have hard-coded higher-level functions?

      Indeed, functions Cost_an_arm_and_a_leg() and Fork_over_much_dough() are hard-coded, and always return a value of "1".

  13. Invest in Cray by Teahouse · · Score: 2, Interesting

    Cray is back and getting back into the government contract game. Surprisingly, they are doing it just as the DOD is realizing that they need specialized hardware like they used to when Cray was one of their best suppliers. Look for little ol' Cray to be back in the black real quick, and pick up a few shares now.

    --
    "Curiosity killed the cat, but for a while I was a suspect."- Steven Wright
  14. Good question by Anonymous Coward · · Score: 2, Funny

    Good question there man.

    I am also wondering, which should I get? I mean, with Doom III on its way, to get decent frames should I go specialized supercomputer, or a linux beowulf cluster?

  15. This greatly surprises me by ikewillis · · Score: 4, Interesting
    As an employee of an atmospheric modelling group I am very surprised to hear this. Our atmospheric modelling program, the Regional Atmospheric Modelling System, is not I/O bound in the slightest and is instead very much CPU bound. We currently use 100BaseT for the interconnect on our cluster, and have tried moving to Gigabit with negligible performance gains.

    The main area in which we saw benefit was switching from the Portland Group Fortran Compiler to the Intel Fortran Compiler, which cut the timestep (simulation time/real time) nearly in half.

    Every cluster in the department is assembled from commodity x86 components. Groups here have been moving from proprietary Unix architectures to Linux/x86 systems and clusters. Our group started out on RS/6000s, then moved to SPARC, and is now moving to x86. In terms of price/performance there really is no comparison.

    As for TCO, the lifetimes of clusters here are relatively short, one or two years at the most. Thus a high initial outlay cannot be offset by a lower cost of operation.

    1. Re:This greatly surprises me by bathmatt · · Score: 3, Interesting

      I also work in the geophysical modeling arena, and you will find that one of the biggest differences in using a purpose-built S/C versus a lot of OTS equipment is memory speed. It is typical to reach only 10% of peak efficiency when running an application, even with nice structured problems like you are running. While you claim that you are CPU bound, you really are not. For example, if you run on a slower CPU but with a better memory subsystem or a larger cache (example: SGI vs Intel/Linux) you will find that the SGI will win even though it is a much slower machine on paper. This is because of the memory throughput and large cache. Now, to explain why you didn't notice a speedup when you went from a 100 Mbit to a 1 Gbit network: your latency did not change much in that move. You are typically sending lots of small packets (assuming you are not doing variable packing, and the only atm model I know doing this is WRF), so you never really get out of the latency-dominated regime and don't see much improvement in communication speed. This is why people use Myrinet (SP): it can be accessed from the application without going through the kernel, and so can start a transfer much quicker. (For typical latency/bandwidth numbers for a structured grid halo exchange, google wallcraft halo and you will get numbers for all different types of machines and code to test yours.)
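
The latency argument can be sketched with the usual first-order model, message time = latency + size / bandwidth. The 100 microsecond latency below is a hypothetical round number chosen for illustration, not a measurement of any particular network, and it is assumed to be the same on both links, as the comment suggests.

```python
LATENCY_S = 100e-6            # hypothetical per-message latency, same on both links
FAST_ETHERNET_BPS = 100e6 / 8  # 100 Mbit/s expressed in bytes per second
GIGABIT_BPS = 1e9 / 8          # 1 Gbit/s expressed in bytes per second


def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus serialization time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def gigabit_speedup(message_bytes):
    """How much faster one message gets when only the bandwidth improves 10x."""
    slow = transfer_time(message_bytes, LATENCY_S, FAST_ETHERNET_BPS)
    fast = transfer_time(message_bytes, LATENCY_S, GIGABIT_BPS)
    return slow / fast


if __name__ == "__main__":
    for size in (64, 1024, 1_000_000):
        print(f"{size:>9} bytes: {gigabit_speedup(size):.2f}x")
```

Under these assumptions a 64-byte halo message barely speeds up at all, while a megabyte-sized transfer gets close to the full 10x. That fits a code that sends lots of small packets seeing negligible gains from Gigabit.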

    2. Re:This greatly surprises me by FullyIonized · · Score: 5, Interesting
      And I'm surprised to hear that you are surprised since fluid modeling is one of the applications that do very well with the vector processors that the Earth Simulator uses. I attended a lecture by Dr. Sato, head of the Earth Simulator, who stated that the best application usage was 65% peak usage (the theoretical peak which assumes that the processor always has data to crunch and no branches) and the average was 30% of theoretical peak. By contrast, typical fluid-like codes on current U.S. machines typically get less than 10% of peak usage if they have any type of implicitness (currently the magnetohydrodynamics code I use gives about 6% usage on an IBM SP that is #5 on the Top 500 supercomputer list).

      I get tired of seeing figures that compare peak flop rates and then don't mention that actual code usage isn't keeping up at all. The Japanese (and Europeans who are allowed to buy NEC machines) are absolutely spanking the US when it comes to fluid codes (for climate modeling for example) and it is largely because they are using vector machines with their old highly optimized Fortran (or High Performance Fortran) codes. The MPP revolution in the U.S. has been manna for the CompSci community, but has set the computational physics community back by 10 years (except for those lucky bastards with embarrassingly parallel jobs).

      I would give up an unnecessary body part for an Earth Simulator.

      --
      Sigs are bad for you.
  16. Why oh why? by tomstdenis · · Score: 3, Interesting

    Definitely a really huge super-computer would be neat to have but honestly are they putting the ones we already have to good use?

    From what I've heard [anecdotally] computers like the earth simulator go vastly under utilized for the most part.

    So given that most nations [including the US] have budget problems specially concerning education couldn't people think of better uses for money?

    And before anyone throws a "it's the technology of it" argument my way, I'd like to add that if anything I'd rather have the money spent on researching how to make high performance low power processors [and memory/etc] instead. E.g. an Athlon XP 2GHz that runs at 15W would be wicked more impressive than a 50,000 processor super computer that runs a highly efficient idle loop 99% of the time.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:Why oh why? by mfago · · Score: 4, Informative

      computers like the earth simulator go vastly under utilized for the most part

      From first-hand experience, such computers are running jobs almost 24x7. Due to job scheduling details there are times when some of the machine is idle, but this is still a small percentage. These machines are used for a vast array of applications, not just the advertised ones.

      Now the utilization as a percentage of peak theoretical is another matter. For some algorithms, 20% of peak performance (IIRC) is considered good (ie. a particular code might only get 2 TFlops on a machine rated for 10).

    2. Re:Why oh why? by gsabin · · Score: 2, Interesting

      From what I've heard [anecdotally] computers like the earth simulator go vastly under utilized for the most part.

      From my experience that is mostly untrue, yet widely publicized. Yes, if you look at utilization as (used-proc*sec)/(totaltime*numprocs), the number can be relatively low (~60-70%). However, that includes system time, rebooting the machine, weekends, holidays, etc. Further, when it comes down to it, the researchers need to have a reasonable turnaround time during the day for their development runs (when the utilization is much higher than 60%). Further, since these machines generally run jobs of different sizes from many different users, there is an upper bound on utilization.
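
The utilization formula quoted above, (used-proc*sec)/(totaltime*numprocs), is easy to sketch. The job list and machine size below are invented purely to show the arithmetic.

```python
def utilization(jobs, total_seconds, num_procs):
    """Fraction of available processor-seconds actually consumed by jobs.

    jobs          : iterable of (processors_used, seconds_run) pairs
    total_seconds : wall-clock length of the accounting window
    num_procs     : processors in the machine
    """
    used_proc_seconds = sum(procs * seconds for procs, seconds in jobs)
    return used_proc_seconds / (total_seconds * num_procs)


if __name__ == "__main__":
    # Hypothetical two-hour window on a hypothetical 128-processor machine.
    jobs = [(64, 3600), (32, 7200)]
    print(utilization(jobs, total_seconds=7200, num_procs=128))
```

Any time the window includes reboots, maintenance, or processors left idle because the waiting jobs don't fit the free slots, the denominator keeps growing while the numerator doesn't. That is why the headline figure can look low on a machine that is busy around the clock.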

    3. Re:Why oh why? by tomstdenis · · Score: 2, Insightful

      I don't get what you are saying. Before my Athlon I owned a K6-2. Before that I owned a MII 300, before that a MI 166 and before that a 486SX.

      Each time I bought a new computer it wasn't because I wanted to rival a local supercomputer. It was because newer technology existed that was faster than what I had. The newer processor allowed me to do more.

      If AMD could make a 2400+ which generates half the heat I would use it. And such a decision would have nothing to do with the local super-computer capabilities.

      That all being said a super-computer which uses off the shelf processors doesn't really "fuel" the science of electrical engineering. In fact of more importance in supercomputing would be reliability, uptime, maintaining sufficient inter-processor communication bandwidth, etc.

      None of which I'll ever use in a desktop processor [perhaps for 50 years or so]. Even in this age of computing multi-processor desktop boxes are fairly rare.

      So I think it's hard to say that the ability to cram more Xeons in a room really advances processor design. [or substitute another off the shelf processor].

      Tom

      --
      Someday, I'll have a real sig.
  17. Re:Interesting.. by RevMike · · Score: 2, Insightful
    So I guess my Athlon XP won't fit in their CPU socket will it? Damn... So much for cheap AMD CPUs for supercomputers.

    Stop being silly. The cooling requirements of an Athlon based massively powerful supercomputer would eat up the savings from using standard parts.

    Seriously, though - I would guess, actually, that if one were to build a supercomputer from a "desktop" processor, the PPC970 (aka G5) chips would be a good choice. They have a solid vector unit, are RISCier, have a wider bus, and a better pipeline design. Plus IBM's fabrication capabilities are excellent - which helps in reliability and upgradability.

  18. That's what I mean by Faust7 · · Score: 2, Insightful

    Frankly, I don't want the fastest computer chips on the desktop to be designed by a company in another country (even if Intel makes them outside of the US) and I would rather that the cutting edge, be cut here, in my native country.

    Good lord, why? Is it just national/istic pride? I see that as something to be outgrown with respect to driving, receiving, and appreciating scientific discoveries and technological advancements. Honestly, if Japan were to come out with, say, the first mass-produced DNA computer, I wouldn't be the slightest bit bitter, or reluctant to take advantage of it. I regularly praise other countries for doing things the U.S. hasn't.

    German physicists were primarily responsible for breakthroughs in their field in the 19th and early 20th centuries, and during that period there was quite a bit of resentment from American politicians and scientists whose feelings boiled down to nothing more than "We should have gotten there first." I won't argue that fierce competition has been beneficial to mankind at large (we've seen it in the computer industry, after all) but I don't think I'm wrong in wanting the motivation to be something a little less self-centered, political, immature. An idealistic vision? Hardly. It's not too much of an expectation for us to evolve beyond petty glare-throwing.

    1. Re:That's what I mean by Pharmboy · · Score: 2, Interesting

      Speaking as one who has played Civilization until the late hours of the morning, I can confidently say that the country with the most advanced technology, wins.

      That term makes a lot of people uncomfortable: win.

      People assume that when you have winners, you must have losers. While this is true in Civilization, it need not be true in life. It is true that when America innovates, it may benefit more, but everyone else that uses the product can benefit as well.

      America put more money into developing the Internet, through DARPA, starting in 1969, and many of the companies (not all) that build equipment for using the Internet, from computers to routers, are American companies. But this has created tons of jobs in China and other countries, sparked competition in Europe and the Pacific Rim, and has created many jobs along the way. America certainly didn't do it alone, but it was the Cold War and the space race that fueled much of DARPA, and now, in its adolescence, the internet is just as accessible in England, France, Japan, Brazil or America, and it's getting better every year for poorer countries. In this respect, there are winners, and those who are doing better.

      We win in that we develop the most technology, but since it is shared, there are very few losers. Some have a problem with the fact that we benefit more, at least initially. Some will always have problems when one group benefits more. I just don't share their world view. I think it was Winston Churchill that said "Capitalism is wealth distributed unequally. Socialism is misery shared equally." (something like that)

      Like most of us, I have no issue with sharing technology and helping others, but I still want to be on the winning team.

      --
      Tequila: It's not just for breakfast anymore!
  19. Link to the Earth Simulator Center by GeoGreg · · Score: 3, Interesting

    If you'd like to see what these people are up to for yourself, here is a link to their website. Lots of performance data, lists of projects, etc.

  20. Not just for climate modeling by GeoGreg · · Score: 3, Informative

    There seems to be an impression in some comments that this machine has some sort of special design that's only applicable to climate modeling problems. In fact, this is a vector-based supercomputer, applicable to any problem where you need to perform vector operations (i.e., operating on large arrays of numbers in parallel).

    Certain numerical operations can be performed blindingly fast on these types of machines. Each arithmetic processor on this machine has 72 vector registers, each of which can hold 256 elements. Then you can perform operations on all 256 elements of 1 or more registers simultaneously! If the algorithm can keep the vector units fed, they will scream.
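
As a rough illustration of how a long array maps onto 256-element vector registers, here is a sketch of "strip-mining" in plain Python. It only mimics the control structure: the inner step, which a vector unit would issue as a single instruction over the whole strip, is an ordinary loop here. The function names are hypothetical.

```python
VECTOR_LENGTH = 256  # elements per vector register, per the comment above


def strips_needed(element_count):
    """Number of register-sized strips required to cover the array."""
    return -(-element_count // VECTOR_LENGTH)  # ceiling division


def vector_add(a, b):
    """Add two equal-length lists one register-sized strip at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = []
    for start in range(0, len(a), VECTOR_LENGTH):
        strip_a = a[start:start + VECTOR_LENGTH]  # "load" into a vector register
        strip_b = b[start:start + VECTOR_LENGTH]
        # On vector hardware this whole strip would be one add instruction.
        result.extend(x + y for x, y in zip(strip_a, strip_b))
    return result


if __name__ == "__main__":
    a = list(range(1000))
    b = [1] * 1000
    print(strips_needed(len(a)), vector_add(a, b)[:5])
```

The loop over strips is also where the "keep the vector units fed" point bites: every strip needs its operands delivered from memory in time, or the arithmetic hardware sits idle.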

    Since keeping data flowing to the processors is critical to speed, the high-speed interconnects (~12GB/s) are a must for any problem that is not completely localized. It's all about matching the problem to the hardware. There may well be problems for which a commodity cluster just can't get the job done like this can. Remember that each node of a cluster consumes power, produces heat, and takes up space. The raw cost of hardware is not the only consideration.

  21. Nuts to that by DeathPenguin · · Score: 2, Informative

    Earth Simulator is impressive in its own regard, but in no way is the majority of clustering apps going toward these 'specialized' systems. Governments, research labs, etc. want powerful computers that are dirt cheap. Los Alamos's ASCI Q (Installment 1, the Alpha servers) cost over $100,000,000 to build, while their Pink cluster cost about $6,000,000 in hardware. On paper, Pink and ASCI Q are both around 10 teraflops. ASCI Q runs Quadrics on 64-bit 66MHz PCI, Pink is getting an upgrade to Myrinet Lanai 10 on PCI-X (from Lanai 9 on 64/66 PCI). Not only that, but Pink runs the open-source, 100% GPL'd Clustermatic software and can be booted in a matter of seconds rather than hours like ASCI Q.
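
Taking the figures in the paragraph above at face value (over $100,000,000 and about $6,000,000, both around 10 teraflops on paper), the price per paper teraflop works out as follows. This is only back-of-the-envelope arithmetic on the quoted numbers. It says nothing about sustained performance on real codes.

```python
def dollars_per_teraflop(cost_dollars, teraflops):
    """Hardware cost divided by peak (paper) teraflops."""
    return cost_dollars / teraflops


# Figures as quoted in the comment; both are approximate.
asci_q = dollars_per_teraflop(100_000_000, 10)
pink = dollars_per_teraflop(6_000_000, 10)
ratio = asci_q / pink

if __name__ == "__main__":
    print(f"ASCI Q: ${asci_q:,.0f} per teraflop")
    print(f"Pink  : ${pink:,.0f} per teraflop")
    print(f"ratio : {ratio:.1f}x")
```

On these numbers the commodity cluster is roughly 17 times cheaper per paper teraflop. The comparison would shift if it were redone using the fraction of peak each machine actually sustains on a given code.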

    The fact is, systems like ASCI Q and the Earth Simulator just aren't practical. They may look great on paper, but there's not much that they can do that can't be done on x86. Rather than paying over a hundred million for a proprietary cluster that might not even be all that reliable (*cough*Q*cough*) and requires expensive software and maintenance contracts, buyers can go to companies like Linux Networx, which offer high-power clusters on common hardware and free software at a fraction of the cost.

    As far as reliability goes, don't get suckered into thinking that proprietary and expensive mean quality. Q's failure rate is almost as high as my old Windows '98 machine hahaha. With the exception of a few missing chillers, Pink seems relatively healthy with only a few minor failures.

    If CRAY and NEC want to get into a pissing contest in specs, that's fine. If they offer something that Intel can't, more power to them. Otherwise, the five organizations in the world that own their systems can be proud that they have the most powerful computer on paper for a year or two before someone builds a cheaper x86 cluster that matches or out-performs them.

    1. Re:Nuts to that by afidel · · Score: 2, Insightful

      Actually, the customized vector machines will usually achieve a MUCH higher percentage of their theoretical peak computational capacity on certain "hard" problems than a cluster of commodity machines. The nearness of the nodes dictates that: if the average near-neighbor latency is an order of magnitude lower, then problems that are communications-bound are going to achieve much higher throughput on a tightly coupled cluster of faster, more specialized nodes than they would on a larger, more loosely coupled cluster of commodity systems. If your problem happens to be one which is trivially parallelized and you are not hamstrung by limitations like the 4GB limit on 32-bit CPUs, then of course you should use the cluster of cheap systems. But if you have a problem which has no such mapping, then the only way to effectively achieve your goals might be a custom machine like the Cray SV series or the NEC SX series. Just because a particular machine has a bad track record doesn't mean that a whole class of systems should be condemned; on the contrary, many supercomputer centers have had good luck with their vector machines.
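
      A back-of-the-envelope model of the latency point, assuming each time step is a fixed chunk of arithmetic plus a number of neighbor messages that each pay one full latency. All the numbers below are hypothetical, chosen only to show the effect of an order-of-magnitude latency gap.

```python
def parallel_efficiency(compute_s, messages_per_step, latency_s):
    """Fraction of a time step spent computing rather than waiting on messages."""
    comm_s = messages_per_step * latency_s
    return compute_s / (compute_s + comm_s)


COMPUTE_S = 1e-3  # hypothetical: 1 ms of arithmetic per step
MESSAGES = 100    # hypothetical: neighbor exchanges per step

tight = parallel_efficiency(COMPUTE_S, MESSAGES, 1e-6)  # 1 microsecond latency
loose = parallel_efficiency(COMPUTE_S, MESSAGES, 1e-5)  # 10x higher latency

print(f"tightly coupled: {tight:.0%}, loosely coupled: {loose:.0%}")
```

      With these made-up numbers the tightly coupled system spends about 91% of each step computing and the loosely coupled one about 50%. A trivially parallel job (zero messages) would score 100% on both.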

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  22. Yes, they DO offer something that "intel" can't. by mbkennel · · Score: 3, Insightful

    That is the whole point.

    I have the feeling the DOE (nuclear weapon simulation etc) simulation program is not going anywhere near as well as it was sold.

    Massive commodity clusters boast big numbers, but they do not boast great throughput of USEFUL RESULTS. (Also, with massive clusters you have to be able to deal with inevitable hardware failures.)

    You have a certain fluid problem---there is a certain speed of sound, and a certain physical geometry. What you want to do is to be able to simulate the real thing at ever smaller grid-sizes, that is, with greater numerical approximations to the physical fields.

    Ideally, if your problem were embarrassingly parallel and clusterizable, then you could put any number of grid points on each CPU and crunch away. You want more grid points? Buy more CPUs.

    The problem is that in actual physics the length scale of 'interaction' per time step does NOT go down---remember, speed of sound is constant as is physical geometry---imagine for instance the, uh, radiation-driven implosion of a certain unspecified dense material in spherical or cylindrical geometry into one unspecified not-dense material.

    So when you scale-up in the scientifically useful sense---and not the computer nerd sense---then a problem which used to be solvable efficiently on clusters NO LONGER IS SO. There is just too much communication, and this is driven by physical reality.
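
    One way to read that argument is as a toy model: keep a fixed cubic block of grid cells per CPU, hold the physical interaction length constant, and shrink the cell size. The halo of remote cells each CPU needs then grows relative to the cells it owns. The function and all values are illustrative assumptions, not taken from any real code.

```python
import math


def halo_to_interior_ratio(cells_per_side, interaction_length, cell_size):
    """Remote (halo) cells needed per locally owned cell for a cubic block.

    cells_per_side: cells along each edge of the block owned by one CPU.
    interaction_length: physical reach of one time step (held fixed).
    cell_size: physical width of one grid cell (shrinks as resolution rises).
    """
    halo_width = math.ceil(interaction_length / cell_size)  # in cells
    interior = cells_per_side ** 3
    with_halo = (cells_per_side + 2 * halo_width) ** 3
    return (with_halo - interior) / interior


# Hypothetical: 32^3 cells per CPU, interaction length of 1.0 in physical units.
for h in (1.0, 0.5, 0.25):
    print(h, round(halo_to_interior_ratio(32, 1.0, h), 3))
```

    In this sketch the ratio climbs from roughly 0.2 at a cell size of 1.0 to nearly 1.0 at 0.25, i.e. each CPU ends up fetching almost as many cells as it owns.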

    It is not 'OK' to just say "change your code". The codes are developed with mathematical methods and based on experimental data gleaned over literally decades at great expense.

    Programming for these is not easy---but it is quite a bit easier for the large vector old-skool cray type machines than the clusters, where the human has to do almost all the scutwork (e.g. MPI).

    The problem is actually more severe with the DOE fluids problem---there are fundamental mathematical issues in the nearly inviscid flow (singular perturbation theory baby) which have not yet been resolved. And they appear at smaller and smaller grid sizes.

    This requires rapid development of models and validation at the physically important resolutions and you can't do this with a cluster.

    I have no inside information whatsoever but I smell that the sudden DOE and DOD interest in back-to-the-future retrosupercomputing is because of some major failures in the recent cluster efforts.

  23. In the real world it's a bit more complicated... by depeche · · Score: 4, Insightful

    There is also a direct trade-off between more general purpose systems and systems custom tailored to a task. Good examples are Deep Blue and Blue Gene. Both of these systems are designed with a particular task in mind (i.e. chess and protein folding) and therefore are able to leverage knowledge about the problem space to constrain the kind of hardware, the particular low-level instructions and the information flow within the system while achieving significantly greater performance on a small class of problems.

    I work with clusters that are used in scientific communities that have various researchers working on various problems. In these cases, the questions are about basic applicability of a particular problem to a particular architecture. For example, a cluster with high-speed interconnects made of good COTS hardware will allow a user with a very granular problem to effectively use the cluster, and it will also allow a user who needs the high-speed interconnect because the problem space demands a high degree of internal communication. But the first researcher might also be able to make use of a grid of (for instance) many more computers with a total lower cost because (s)he doesn't need the high-speed interconnect.

    The Earth Simulator gains a lot of performance (on a class of problems) because of the underlying vector processor architecture. Given the right internal bus, it is conceivable that adding vector processor daughter boards to the next generation of COTS clusters could achieve similar results--but, of course, only for problem spaces that make efficient use of such processors and aren't bottlenecked by the communication requirements.

    Real answers are always more complicated. For example: the equations needed for nuclear simulation will probably require dedicated hardware (as the need for protein folding has led to Blue Gene) to achieve the results that the Pentagon needs. But for many supercomputing tasks, the flexibility of COTS clusters will still be compelling, especially for areas where the algorithms are not yet fully developed (e.g. brain simulation). An interesting keynote at OLS 2003 argued that (some of) the problems are not going to be the local computing power but the need to move large quantities of data between research labs across the world and combine computational systems using the 'grid.' (For down-home examples of problems that have been successfully tackled through coarse-grained distribution, just look at SETI@Home and Distributed.Net.) So it's not just the flops anymore...

  24. Re:It does matter by afidel · · Score: 2, Interesting

    Would you rather they simulate weapons or resume detonation testing of new designs?? The fact is the US has a VERY large and ever-aging supply of weapons. Most of the cycle time so far from the ASCI projects has gone towards stewardship of the existing crop of weapons, making sure that the stockpiles are safe and also that they will be effective (if, God forbid, they should be needed). Also, reduced consumption is the only thing that will reduce our environmental "problems". Personally I think anyone who thinks the US has much of an environmental problem needs to get out of LA/New York/whichever big city they live in. I have spent a lot of time enjoying the national parks of this great country and I can tell you that there are a lot of pristine wilderness areas and a lot of generally green land here (in fact the US landmass is one of the least densely populated non-desert areas in the world).

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.