Slashdot Mirror


On the Supercomputer Technology Crisis

scoobrs writes "Experts claim America has been eating our 'supercomputer feed corn' by developing clusters rather than new supercomputer processors and interconnects. Forbes says America is playing catch-up and that the new federal budget items are too little too late. Cray is laying people off due to decreased federal spending and claims lower margin products have forced them to create products based on commodity parts. Red Storm, one of their new Linux-based products, is being delayed to next year."

23 of 347 comments

  1. it makes sense by dncsky1530 · · Score: 4, Insightful

    When you can build a top 5 supercomputer for under 6 million dollars using off-the-shelf parts, why spend hundreds of millions of dollars?

    1. Re:it makes sense by Otter · · Score: 4, Insightful
      If you RTFA, an administration panel on high-end computing claims that clusters are inappropriate for certain tasks. I don't necessarily trust the claims of what I assume is an industry-heavy panel, but then I don't necessarily trust the supercomputing expertise of a bunch of Lunix fanboys "administering a network" in their parents' basement either.

      My inclination is to let the market sort itself out, although if supercomputer makers go under, they won't necessarily reappear the moment they're needed.

    2. Re:it makes sense by Rosco+P.+Coltrane · · Score: 4, Funny

      Well, I don't have that kind of money, but if I did, I'd rather get me an older Cray than a cluster of beige-box PCs. If nothing else, Cray machines are classy and impressive, and when we have clients visit the premises, we can go "oh, and this is the Cray computer, crunching numbers for you" and watch the customer be impressed, as opposed to "oh yeah, that pile of nondescript computers, that's the 1000-node Beowulf cluster of AMD Cheaperon computers". I'm sure the extra value in marketing would be worth it...

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    3. Re:it makes sense by badboy_tw2002 · · Score: 5, Insightful

      Then trust the fact that not all problems are easily attacked from a parallel perspective. This means problems where working on one section of the dataset affects large amounts of data in other sections. There's a lot of locking and waiting for tasks in other parts of the system to complete, and a lot of data transfer and need for shared memory, which is going to be slow if you're bussing between cluster components.

      This doesn't mean that clusters don't have some use in these regards, it just means that for these types of problems no one has figured out an efficient parallel algorithm to use on them.
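
      One hedged way to put numbers on this is Amdahl's law, which the comment above does not name. The Python sketch below uses illustrative names (`amdahl_speedup`, `parallel_fraction`, `workers`) and assumes the serial fraction stays fixed as workers are added, which ignores the extra locking and data-transfer cost described above.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job in which 95% of the work parallelizes cleanly.
    for workers in (2, 16, 1024, 10240):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

      With 5% of the work serial, the bound never exceeds 20x however many nodes are added, and real clusters land below it once communication is counted.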

    4. Re:it makes sense by ch-chuck · · Score: 4, Insightful

      Because sometimes you need two strong oxen instead of 10240 chickens.

      --
      try { do() || do_not(); } catch (JediException err) { yoda(err); }
    5. Re:it makes sense by grawk · · Score: 5, Insightful

      As someone who works for a supercomputing center, I can say that some things work VERY well on cheap Unix-based clusters. I am the primary admin on a 5 TFLOP cluster. We've also got a Cray X1, and while it's only 2.6 TFLOPs, it will eat my IBM's lunch when it comes to some specifically tuned tasks. Much in the same way, we can outperform Mac clusters that have significantly higher floating point performance, because of the speeds of the interconnects. Supercomputing is about a LOT more than just raw CPU power.
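
      A rough sketch of why the interconnect can matter more than peak FLOPs, using the common latency-plus-bandwidth cost model for a message. Every number below is a hypothetical placeholder, not a measurement of the machines mentioned above, and the model assumes communication does not overlap with computation.

```python
def message_time(n_bytes: float, latency_s: float,
                 bandwidth_bytes_per_s: float) -> float:
    """Time to send one message: fixed latency plus size over bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def compute_efficiency(compute_s: float, n_messages: int, n_bytes: float,
                       latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Fraction of each step spent computing rather than communicating."""
    comm_s = n_messages * message_time(n_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Hypothetical per-step workload: 1 ms of arithmetic, 8 messages of 64 KiB.
STEP = dict(compute_s=1e-3, n_messages=8, n_bytes=64 * 1024)

# Hypothetical interconnects: a commodity network and a custom fabric.
commodity = compute_efficiency(**STEP, latency_s=50e-6, bandwidth_bytes_per_s=100e6)
custom = compute_efficiency(**STEP, latency_s=5e-6, bandwidth_bytes_per_s=10e9)

if __name__ == "__main__":
    print(f"commodity network: {commodity:.0%} of each step spent computing")
    print(f"custom fabric:     {custom:.0%} of each step spent computing")
```

      Under these assumptions the same processors deliver very different useful throughput, which is the point about tuned tasks on the X1.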

  2. Expected fallout from the Beowulf takeover by beee · · Score: 4, Insightful

    This is an expected and predicted fallout from the recent rise in popularity of Beowulf clusters. Slowly but surely managers are realizing that, yes, it is possible to have a supercomputer on mass-market hardware, running a free OS.

    Don't see this as bad news... it's a sign that we're winning.

    --


    + Donald Gunth
    + Email: dgunth@quicktek.net
    "Caffeine is the greatest lubricant ever created." -ESR
    1. Re:Expected fallout from the Beowulf takeover by susano_otter · · Score: 4, Insightful

      Since clusters are so much cheaper than mainframes, it's often the case that clusters still offer better performance for the money spent than a mainframe would, even if the cluster isn't really optimized the way the mainframe is, for the task at hand.

      That being the case, wouldn't it make more sense to invest heavily in R&D to solve the cluster's problems and remove its limitations, than to invest heavily in R&D into next-gen mainframes?

      --

      Any sufficiently well-organized community is indistinguishable from Government.

  3. Law of Diminishing Returns by Billobob · · Score: 4, Insightful
    It appears to me as if we have reached the point where supercomputers aren't really as practical as they were before. Fewer and fewer industries need and prefer supercomputers to a cluster of cheap PCs, and the market is simply heading in that direction - nothing really unique happening here other than capitalism.

    Of course people are going to cry that companies like Cray are falling by the wayside, but the truth is that their services simply aren't as needed as they were in years past.

    --
    If you have to ask, you'll never know.
    1. Re:Law of Diminishing Returns by susano_otter · · Score: 4, Interesting

      I suppose the counter-argument would go something like this:

      It's true that supercomputers aren't really all that useful or necessary these days. However, it may be that a future computing problem shall arise, which requires a next-generation supercomputer to solve. So we'd be well-served to have a next-generation supercomputer fresh from R&D, to apply to the problem.

      We may only encounter one or two more supercomputer-class problems, but they might be important ones. We should be prepared.

      On the other hand, we may encounter a problem that can only be solved by horses. But we don't see a lot of buggy-whip subsidies these days...

      --

      Any sufficiently well-organized community is indistinguishable from Government.

  4. "Feed" Corn? by jmckinney · · Score: 4, Insightful

    I think that should have been "Seed Corn."

  5. Expert complains: by Anonymous Coward · · Score: 4, Insightful

    Free market success might lead to us actually having to pay for our own supercomputer research that we use in profit making ventures.

  6. I Need A RAIS by grunt107 · · Score: 5, Interesting

    Random Array of Inexpensive Servers.

    If the 'supercomputers' of today are increasing performance, does it really matter the design?

    Maybe that is a signal that monolithic computer tasks are best handled in a hive mentality - have the Queen issue the big orders, have the warriors performing security, have the workers transporting the goodies (data), and have the requisite extra daughters and suitors to grow the hive and assure its viability (redundancy).

    The fact that it is cost-effective is even better.

  7. Re:Inevitable by mfago · · Score: 4, Informative

    a mesh of nodes on a network will do just as well

    In some cases.

    Unfortunately, some problems are particularly unsuitable for clusters of commercial computers, and really benefit from specialized architectures such as shared memory or vector processors.

    A while ago it was decided by the US government to essentially abandon such specializations, and buy COTS. It is certainly cheaper, but not necessarily effective.

  8. You are all missing the point by Anonymous Coward · · Score: 5, Insightful

    It's the fact that clusters require higher skill to program efficiently for than do single processor systems. Plus you have all of the wasted processing power used for communication between the nodes. Granted, many problems lend themselves well to distributed computing (essentially what a cluster is, but the nodes are closer and communicate faster), but there are also problems that are handled better by a smaller amount of specialized hardware. The other point is that by using off the shelf parts, we are not really innovating in this space like we should be. We are allowing the commodity computer market to determine the direction of the supercomputer market.

  9. Trickle Down by Anonymous Coward · · Score: 4, Interesting

    Technology first developed on the high end slowly works its way down into the low end. What happens when the high end is no longer there?

    Not that many people really need a race car, but advances in fuels, materials, and engineering in race cars eventually lead to better passenger cars. And for raw performance, strapping together a bunch of Festivas will not get you the same as an Indy racer.

  10. Without a market you can't survive long term by wintermute42 · · Score: 4, Insightful

    There seems to be some historical revisionism going on regarding the demise of the "supercomputer industry". People are coming out of the woodwork now saying that lack of government support caused the great supercomputer die off.

    As Eugene Brooks predicted in his paper Attack of the Killer Micros, the supercomputer die off was caused by the increasing performance of microprocessor based systems. Many of us now own what used to be called supercomputers (e.g., 3GHz Pentium processors, capable of hundreds of megaFLOPs).

    The problem with supercomputers is that high performance codes must be specially designed for the supercomputer. This is very expensive. As people were able to fill their needs with high performance microprocessors they quit buying supercomputers.

    Many people who need supercomputer levels of performance for specialized applications (e.g., rendering Finding Nemo or The Lord of the Rings) are able to use walls of processors or clusters.

    There are, of course, groups where putting together off-the-shelf supercomputers will not suffice. But these groups are few and far between. As far as I can tell they consist of the government and a few corporations doing complex simulations. The problem is that this is not much of a market. Even if the government funds computer and interconnect architectural research, there does not seem to be a market to sustain the fruits of this research.

    In the heyday of supercomputers there were those who argued that when cheap supercomputers were available the market would develop. The problem is, again, programming. High performance supercomputer codes tend to be specialized for the architecture. Also, no supercomputer architecture is equally efficient for all applications. It is difficult to build a supercomputer that is good at doing fluid flow calculations for Boeing and VLSI netlist simulation for Intel (the first application tends to be SIMD, the second, MIMD). The end result of these problems tends to suppress any emerging supercomputer market.

    The reality right now seems to be that those who are doing massive computation must build specialized systems and throw a lot of talent into developing specialized codes.

  11. What tasks require high-speed interconnects? by Bruce+Perens · · Score: 5, Insightful
    One of the nice things about clusters is that they encourage people to consider how to decompose a problem so that it can work without a large high-speed shared data memory. Some of the older supercomputers were important because scientists hadn't done this work because there wasn't the economic incentive back then. Now there is one.

    So, what tasks still require a high-speed shared data memory? Answer that, and you'll understand where you can still sell a supercomputer.

    Bruce

  12. About time... by 14erCleaner · · Score: 4, Informative
    The surprising thing about this is that there are still companies making big-iron vector supercomputers. I worked in this industry from about 1980 to 1995, and when I left it was dying already. Even then, the majority of scientific computer users would rather have their own mini or microcomputer than get a small share of some behemoth Cray mainframe. It gave them more flexibility, and if they could use it 24 hours per day it was also more effective.

    For things like weather forecasting, maybe big vector machines still have an edge, but I suspect that's changing as the weather guys get more experience in using machines with large numbers of micros. This seems to have already occurred, in fact; NCAR appears to have mostly IBM RS6000 and SGI computers these days, with nary a Cray in sight.

    The most common term I used to hear in the early 90's was Killer Micros; I think the term dates back to David Bailey in the 80's sometime. If you want more evidence that the death of the supercomputer has been going on for a long time, check out The Dead Supercomputer Society, which lists dozens of failed companies and projects over the years; this page was apparently last updated 6 years ago!

    --
    Have you read my blog lately?
  13. Complex issues that have to be solved by Anonymous Coward · · Score: 5, Informative

    I've been in this field over 25 years, been in public position at a major lab now for 8.

    If this was a simple issue, the HPC community would already have completely moved to clusters and never looked back 3 or 4 years ago. But it's not, kiddies.

    Want to run a physics projection for more than 1 microsecond? That takes real horsepower that clusters cannot provide, even distributed. Just too much damn data. Chem codes that include REAL data for usable time slices? Too slow for clustered memory. Every auto maker in the world (almost) has been whining about the lack of BIG horsepower for a few years now (crash codes and FEA). I could go on forever. Sure, some problems work awesome on clusters, which is why we have them. But definitely not all of them.

    The problem is partly diminishing returns, partly the pathetic amount of usable memory on a cluster and its joke of a memory throughput, partly the growth in power of the low end and clustered networking, and partly the ridiculously long development cycles involved in High Performance Computing and the low $ returns.

    One of the biggest things Congress sees is that this country will more than likely NEVER again lead the world in computing power for defense and research.

    And that's something we ought to do as the last real Superpower.

    The national labs TRIED clusters; they don't get all the jobs done they wanted (see testimony before Congress, writings in HPC journals, and the last couple of RFPs from US gov. labs; heck, every auto maker in the world). People in HPC _know_ it now, but having let what little there was of the supercomputer industry die out, there isn't much of an industry left to turn to now. It just may be too darned late. HPC hasn't been a money-making industry since the early 80s.
    Heck, even Intel abandoned the clustered machine they custom built for the government.

    Most folks in HPC will readily admit the Top500 is kind of a joke. The HPC-challenge #s are a little more realistic for the tests, but we really do need something that approximates real-world applications, not just a 70s CPU benchmark.

    For those that think this is a 'Linux wins' issue, consider that mostly it was fast interconnect networks that allowed clustering, not the OS. Examine the history of clusters and you'll see this is true. Btw, the last few SC companies are already mostly moving to Linux anyway (NEC, Fujitsu, Cray; IBM dabbles in HPC).

    Hopefully the industry will survive long enough to allow for even better mergers of supercomputing power with low end cost, but at this point I doubt it. Cray has been on the ropes since '96, Fujitsu's SC division is a loss leader, and NEC has been trying to get out of it for a while for something with a margin.

    Ed -gov labs HPC research punk
    -former Cray-on
    -former CDC type

    1. Re:Complex issues that have to be solved by jsac · · Score: 4, Interesting

      Here's the problem. On codes which need lots of data interchange, communication speed becomes the bottleneck. I don't know of anyone running a serious fluid dynamics or weather code, which are this kind of data-interchange-limited application, who gets anything near peak performance on "real-world" problems using ASCI machines. Sure, ASCI White (a 10000-node cluster) was billed as a 10-Teraflops supercomputer. Who cares, when you get 10% of peak performance if you're lucky?

      NOAA wanted to buy a supercomputer in the mid-90s, for weather and climate simulations. They did the requirements analysis and decided that a Japanese vector supercomputer was what they needed -- nobody in the U.S. made them anymore. Seymour Cray flipped out -- a government organization buying foreign supercomputers? heresy! -- pulled a bunch of strings, and very soon thereafter Japanese supercomputers faced a stiff tariff because the Japanese were "dumping" their product on the U.S. market. Of course, that meant NOAA couldn't get their NEC. They ended up buying some American-made cluster and getting their piss-poor 5% of peak performance.

      Well, two years ago, Japan brought Earth Simulator online. It's a cluster of 5000 vector processors; it boasted 30 Teraflops peak performance, which was 3 times as fast as the then-current number one machine, ASCI White. And a group from NOAA went over to Japan on invitation to check the machine out. They spent on the order of a week adapting some of their current codes to the ES architecture and fired them up. And got 66% of peak performance right off the bat.

      How'd that happen? Well, ES cost on the order of $100 million. (By the way, as a rule, if your 'supercomputer' cost less than $10 million, it's not really a supercomputer.) Of that, about $50 million went into developing the processor interconnect -- it's a 5000-way(!) crossbar, for you EE types. With an interconnect that big and fast, the communication bottleneck which dooms the big physics codes suddenly disappears.

      So, yeah, the U.S. supercomputer market ate its own seed corn. To see Earth Simulator jump to the top of the Top 500 was something of a slap in the face; to see it get 20 Teraflops on real-world problems was a terrible blow to the prestige of the U.S. supercomputing community. And not one we're going to easily recover from.
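
      The peak-versus-delivered arithmetic in this comment can be checked in a few lines of Python. The figures are the ones quoted above, taken at face value rather than independently verified, and `sustained_tflops` is just an illustrative helper name.

```python
def sustained_tflops(peak_tflops: float, fraction_of_peak: float) -> float:
    """Delivered performance given a peak rating and the fraction achieved."""
    return peak_tflops * fraction_of_peak


# Figures as quoted in the comment above (not independently verified).
asci_white = sustained_tflops(peak_tflops=10.0, fraction_of_peak=0.10)
earth_simulator = sustained_tflops(peak_tflops=30.0, fraction_of_peak=0.66)

if __name__ == "__main__":
    print(f"ASCI White:      {asci_white:.1f} TFLOPS sustained")
    print(f"Earth Simulator: {earth_simulator:.1f} TFLOPS sustained")
    print(f"peak ratio:      {30.0 / 10.0:.1f}x")
    print(f"sustained ratio: {earth_simulator / asci_white:.1f}x")
```

      On these numbers a 3x gap in peak rating becomes a gap of roughly 20x in delivered performance, consistent with the "20 Teraflops on real-world problems" figure above.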

      --
      "The urge to fly from modern systems, instead of moving through them to even greater, fairer things is, I think, an indi
  14. Clusters and supercomputers... by gillbates · · Score: 5, Interesting

    I've seen a lot of naive comments suggesting that supercomputers are being replaced by clusters. The truth is, anyone who can replace their supercomputer with a cluster didn't need a supercomputer in the first place:

    (compared to a supercomputer):
    1. The prime advantage of an x86-based server is that it is cheap, and it has a fast processor. It is only fast for applications in which the whole dataset resides in memory - and even then, it is still the slowest of the group.
    2. Clusters are a little better, but suffer from severe scalability problems when driving IO-bound processes. As with the x86 server, if you can't put the full dataset into memory, you might as well forget using a cluster. The node to node throughput is several orders of magnitude slower than the processor bus in multiple CPU systems. (6.4GB/s vs 17MB/s for regular ethernet, or 170MB/s for Gigabit)
    3. Multiple CPU servers do better, but still lack the massive storage capacity of the mainframe. They work better than clusters for parallel algorithms requiring frequent synchronization, but still suffer from a lack of overall data storage capacity and throughput.
    4. Mainframes, OTOH, possess relatively modest processors, but the combined effect of having several of them, and the massive IO capability makes them very good for data processing. However, their processors aren't fast at anything, and often run at 1/2 or 1/3 the speed of their desktop counterparts.
    5. Supercomputers combine the IO throughput of a mainframe with the fast processors typically associated with RISC architectures (if you can still consider anything RISC or CISC nowadays). They have faster processors, more memory, and much greater IO throughput than any other category.
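
    To make the throughput comparison in the list above concrete, here is a small Python sketch that turns the quoted bandwidths into transfer times. The bandwidth figures are the comment's own, the 100 GB dataset size is a hypothetical chosen for illustration, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
GB = 1e9  # decimal gigabyte, assumed
MB = 1e6  # decimal megabyte, assumed

# Bandwidths as quoted in the list above, in bytes per second.
LINKS = {
    "processor bus": 6.4 * GB,
    "regular ethernet": 17 * MB,
    "gigabit ethernet": 170 * MB,
}

DATASET_BYTES = 100 * GB  # hypothetical dataset size


def transfer_seconds(n_bytes: float, bytes_per_s: float) -> float:
    """Time to move n_bytes over a link, ignoring latency and protocol overhead."""
    return n_bytes / bytes_per_s


if __name__ == "__main__":
    for name, bandwidth in LINKS.items():
        seconds = transfer_seconds(DATASET_BYTES, bandwidth)
        print(f"{name:>17}: {seconds:10.1f} s")
```

    The gap between the bus and the network links is what makes the "fits in memory or forget it" rule of thumb plausible for IO-bound work.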
    It used to be that the prime reason for faster computers came from the scientific and business communities. But now that the internet has turned computers into glorified televisions, the challenges have gone from that of crunching numbers to serving content:
    1. Clusters are great for serving read-only content, because there's very little active synchronization required between nodes, and the aggregate IO capacity scales well.
    2. Mainframes reign when it comes to IO throughput - companies that formerly had use for a supercomputer are finding that their role is shifting to more of an information-provider role; faster processors are no longer as important as fast IO subsystems.
    3. Scientists aren't being trained to use the computer as a tool; most think of a computer more or less as a means of verifying their hypothesis, rather than a means of discovering possible explanations. Their primary work is done with a calculator and pencil, and only later, when they need something to back up their ideas, do they turn to a computer simulation. The computer is a verification tool, not a means of discovery.

    As our economy has shifted away from a technological base to an entertainment one, the need for supercomputers has begun to evaporate. We outsource innovation overseas so that we can lounge around on the couch watching tv and drinking beer (or surfing the net and drinking beer). The primary purpose of technological innovation has shifted from that of discovering the universe to merely bringing us better entertainment.

    --
    The society for a thought-free internet welcomes you.
  15. Re:It's bad news for Cray by DarkMan · · Score: 4, Informative

    Uh, Cray have a backlog of orders. A backlog to the tune of $153 million, if I recall correctly.

    That's not the sign of a dying business model. If they are having problems, it's down to the management, not lack of demand.

    There are problems that don't work well on clusters, but rocket on a proper supercomputer. These include a lot of interesting areas, so there will always be demand for a few pieces of big iron. At the risk of echoing the ghost of IBM CEOs past, I think there is demand for somewhere around 20-30 serious top end supercomputers in the world [0]. Most of the rest of the jobs will do just fine on high end clusters.

    If you read the article, there are no quotes from Cray people. The quotes come from the people who used to get to play with special hardware, and who now admin those clusters.

    It's toys for the boys, not a buggy whip issue.

    [0] That's informed by being someone who uses high performance computing, both cluster and supercomputer.