Slashdot Mirror


Cray CTO: Linux clusters don't play in HPC

jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada."

20 of 435 comments

  1. Marketing by Allen Zadr · · Score: 5, Insightful

    While Paul Terry makes some good points in his statements, look at the fuller version of the quote from the post: "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."

    Remember to take this with a grain of salt. The inflammatory nature of the comment is nothing more than a marketing ploy to increase visibility of, and sell, the new Cray XD1.

    --
    Kinetic stupidity has a new brand leader: Allen Zadr.
    1. Re:Marketing by Hoser McMoose · · Score: 4, Insightful

      The Top500 list uses Linpack exclusively for its test. Linpack can be split to run on clusters VERY easily; it could even fall under the category of "embarrassingly parallel" problems. These sorts of tasks do exist in reality, but they definitely aren't the only kinds of problems you'll encounter.

      If you need to access remote memory in a super cluster, such as the ones mentioned above, you take a BIG hit in terms of performance. Think about running from swap space vs. running an application out of memory and you'll be on the right track. In these sorts of situations a system like that Cray down in slot 19 could easily beat out nearly anything above it on that list (almost all of which are superclusters except for Earth Simulator at #1).

      As others have mentioned, the guy was clearly talking from a marketing standpoint rather than a "choose the best solution for the job" standpoint; however, what he said isn't entirely without value. There are a lot of tasks out there where that Big Mac supercluster that people keep touting would suck-ass. Even with their high-bandwidth, low-latency InfiniBand interconnect you're still looking at a good 3 orders of magnitude lower performance for remote memory vs. local memory.
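
The remote-versus-local memory point above can be put into rough numbers. The sketch below is not from the comment: it assumes a simple weighted-average model of access time, and the 100 ns local latency is a made-up round figure chosen only so that the remote latency is three orders of magnitude larger, as the comment suggests.

```python
def effective_access_time(local_ns, remote_ns, remote_fraction):
    """Average time per memory access when some fraction of accesses go to another node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Assumed, illustrative latencies (not measurements of any real system).
LOCAL_NS = 100.0
REMOTE_NS = 100_000.0  # three orders of magnitude slower than local

for fraction in (0.0, 0.001, 0.01, 0.1):
    t = effective_access_time(LOCAL_NS, REMOTE_NS, fraction)
    print(f"{fraction:.1%} remote accesses: {t:.1f} ns average ({t / LOCAL_NS:.1f}x local)")
```

Under these assumed figures, sending just 1% of accesses to a remote node makes the average access roughly 11 times slower than purely local memory, which is why workloads that share a lot of data between nodes suffer on a cluster.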

  2. And in other news... by heironymouscoward · · Score: 5, Insightful

    Oracle dismisses MySQL and PostgreSQL as "toy databases", Microsoft claims that "Apache cannot be used for real web serving", and Sun announces that "Intel and Linux simply cannot be used for enterprise computing".

    So all those supercomputing labs that use Linux clustering (that invented Linux clustering, even) have been wasting their time?

    --
    Ceci n'est pas une signature
    1. Re:And in other news... by dasmegabyte · · Score: 4, Insightful

      All of those statements are true. And a cluster is not a mainframe, and the products sold by Oracle, Microsoft and Sun *DO* go far beyond their Open Source competitors in terms of functionality.

      The problem for these guys is that, in terms of real world enterprise usage, not everybody needs the features they offer. My business doesn't need the easy management and clustering features in IIS; heck, the website hasn't been updated in months and this time last year nobody even knew which machine it ran on. We don't need the task scheduling, file striping, data transformation, replication or XML features of Oracle. In fact, we only need a tiny sliver of the possible functionality of these great products...but we're unable to pay a sliver of the price. With OSS ramping up its feature set daily, for a lot of companies with our needs it makes more sense to train a guy on Linux than to drop five digits on Windows Server 2003 and SQL Server.

      As for supercomputing...well, a cluster is NOT a mainframe. They're two similar, but different things, with the main difference being the databus. If your task is to perform a lot of calculations on a trivial dataset, clustering is the way to go. If your task is to perform a few calculations on a massive dataset, you want a mainframe. The mainframe is simply more efficient at processing massive inputs and providing massive outputs because it was designed to efficiently pass data between processors -- give the same dataset to a cluster and most of your time is wasted negotiating the network.

      Of course, these days networking is so fast that a cluster will probably do for most of the things people used to do on mainframes...but a cluster is still best for tasks which are easy to split apart and process in pieces.

      --
      Hey freaks: now you're ju
  3. VA Cluster yet to be used by Anonymous Coward · · Score: 4, Insightful

    Regardless of whether I agree with the article or not I feel compelled to point out that:

    The 1100 node Apple G5 cluster in Virginia has yet to run any real scientific code. So far it has only run benchmarks.

  4. What do you expect him to say? by WarlockD · · Score: 3, Insightful

    "We are dropping our line of Cray supercomputers and replacing them with a rack-mounted Beowulf cluster of 486s!"

    I am not saying Cray isn't worth it, but there is something to be said for replacing/fixing your supercomputer with over-the-counter parts.

  5. Sure... by avalys · · Score: 4, Insightful

    In other news...

    "Despite assertions made by Toyota salesmen, a Lexus sedan is not a luxury car," said Bill Taylor, CEO of Mercedes-Benz.

    --
    This space intentionally left blank.
  6. He's got a point by PissingInTheWind · · Score: 4, Insightful

    Clusters can get high performance on some types of tasks. But sometimes, you need fine-grained parallelism that just isn't available on a cluster.

    On the other hand, high performance usually comes through special hardware. And on that hardware, I think Linux could be the right thing (modulo some patches).

    --

    A message from the system administrator: 'I've upped my priority. Now up yours.'
  7. Re:Seymour Cray by Anonymous Coward · · Score: 3, Insightful

    You could just as well ask, though:

    "If you were building an ant's nest, which would you rather use? 1024 ants or a bulldozer?"

    Perhaps he shouldn't be comparing plowing fields to high performance computing.

  8. Re:Seymour Cray by wowbagger · · Score: 4, Insightful
    The analogy USED to be valid; however, times have changed, as microprocessors are now much more powerful.

    The analogy now would be more like:

    Which would you rather use to plow a field - one big tractor or 1024 little tractors?


  9. Just because we love Linux.... by foooo · · Score: 5, Insightful

    Just because we love Linux doesn't mean that clusters are HPCs.

    There are real issues that differentiate mainframe/supercomputers from large, powerful, clusters.

    Of course this all depends on your definition of an HPC. But I believe that it's reasonable to say that if parts of your computer are connected with low bandwidth connections (10/100, gigabit) they just can't handle the same kinds of transactions that a computer with parts connected by 10 gigabit or 1000 gigabit connections, or whatever it is nowadays, can.

    As far as I know, if you're deploying a large database it's still advisable to have a big huge IBM mainframe or a Unisys box or a Sun 10k instead of 4, 8 or 16 clustered 8-processor machines.

    My point is there are valid arguments for not including clusters of commodity hardware in the HPC category.

    In my mind they aren't High Performance Computers... they are High Performance Clusters of Commodity Computers.

    ~foooo

  10. Partly right, partly wrong.... by ERJ · · Score: 4, Insightful

    How well a cluster will do depends on the application that it is performing. Some problems can be divided into several small problems with little reliance on other parts of the problem (SETI / Encryption breaking). These things can be easily distributed to hundreds or thousands of "small" boxes for processing and are what a Beowulf cluster would be good at.

    Other applications require the breakneck interconnect speeds that large Cray / Sun / etc. systems are built on. When the data being calculated on one CPU requires data from CPU2 to continue its calculations, you don't want to have it wait for 100 Mbit or even 1 Gbit Ethernet speeds. Even quicker interconnects such as SCALI are going to be slowed by PC bus speeds.

    Cray fills an important niche for those who can afford it.

  11. Different tools by BoneFlower · · Score: 4, Insightful

    The comment was stupid, yes, but not all jobs that you'd use supercomputers for can be broken down into many threads as others can. A Linux cluster will do well for some jobs, a Cray box will do well for others. There *will* be times when a Cray system is so far superior to anything you could do with Linux that it becomes the only real option.

    However, dismissing Linux cluster technology automatically is dumb. In many cases, it provides more than enough CPU power and I/O bandwidth to support your reason for getting a supercomputer, and probably at less cost than the other options.

    It's all a matter of determining what you need the computer to do, determining your budget, and getting the best system in your budget for the uses you have for it. Sometimes that will be a Cray, sometimes a Linux cluster.

  12. Can you multithread your application? by LostCluster · · Score: 5, Insightful

    Clusters can rival a supercomputer when they are assigned a task that's suitable for distributed computing. That is, work units can be divided up and worked on in any sequence... the result of segment 45 doesn't depend on knowing the result of 44 and such. Effectively, you can have the sum of all of the processors minus just a little overhead for the clustering.

    What Cray's rightfully pointing out is that for most business applications, however, distributed computing is not a viable option. When processing on a transaction basis, the transactions often need to be posted in the exact order they were received, which means they must be taken serially. In those situations, the programs can't multithread work out to the other processors so well, and the cluster will end up running at roughly the speed of just one processor while the others waste clock cycles waiting for something to do.

    The cluster isn't the solution to everything. Nor is the supercomputer. You've gotta think about the job, then figure out which tool is right for the task.
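
The two extremes described above (fully independent work units versus strictly serial transactions) are the end points of Amdahl's law. The comment does not name that law; it is brought in here as a standard model, and the sketch ignores the clustering overhead the comment mentions. The processor count of 1024 is just an example value.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup on `processors` CPUs when only `parallel_fraction` of the work can be split up."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


PROCESSORS = 1024  # example cluster size, not taken from the thread

for fraction in (0.0, 0.5, 0.99, 1.0):
    speedup = amdahl_speedup(fraction, PROCESSORS)
    print(f"{fraction:.0%} parallel: {speedup:.1f}x faster than one processor")
```

A fully serial job (0% parallel) gets a speedup of exactly 1, matching the comment's "roughly the speed of just one processor", while a job that is only half parallel never quite reaches 2x no matter how many boxes are added.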

  13. Cray has some points. by Saeed al-Sahaf · · Score: 4, Insightful
    While Dr. Paul Terry's comments are obviously self-serving, especially since the Cray XD1 is, in a way, based on multiple AMD processors rather than proprietary Cray processors, he does have a point about the overhead of running the OS on each machine in a cluster, and in the statement "The Cray XD1 is not a traditional cluster; it does not use I/O interfaces for memory and message passing semantics."

    In truth, such a machine will always have a certain performance advantage over traditional clusters. The question is, will the price point be low enough to invalidate the idea of just adding more boxes to the traditional cluster?

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  14. Re:Seymour Cray by xdroop · · Score: 3, Insightful
    Use the right tool for the job.

    If you are plowing fields, use the bull.

    If you are making eggs, use the chickens.

    This isn't a one-size-fits-all world any more. Only those deluded enough to think that Windows should be the world's standard desktop think otherwise.

    --
    you should read everything on the internet as if it had "but I'm probably talking out of my ass" appended to it.
  15. Marketing BS, but he has a point by Anonymous Coward · · Score: 3, Insightful

    The "interconnect" latency (especially) and bandwidth in a cluster, even using very high-end network hardware, is much worse than that of a Cray-style supercomputer. This does make certain applications run slower, especially if not specifically tailored to clustered architecture. Some applications are very difficult to break down into small pieces and require extensive memory sharing between nodes, which clusters just can't do well.

  16. Re:Help me here... by maan · · Score: 4, Insightful

    You're right in saying that the Virginia Tech cluster is the 3rd fastest supercomputer (LINPACK tests). I think that for some other tasks however, it would be slower. Sure, they use InfiniBand as an interconnect (very fast & low latency), but that doesn't change the fact that it's many separate nodes, each with its own memory. So if one processor were to access some memory on a different node, it would slow down things a little.

    So depending on the task at hand, the cluster might perform very well, or perhaps a little less well. Cray supercomputers are a big number of processors all in the same machine, and more importantly all sharing the same memory. Each processor has the same delay to access any memory content.

    The argument in favor of clusters, however, is that it's still cheaper to throw more computers in than to buy a Cray that would perform the same task in less time.

    In the end, there's a lot of marketing involved in all of this...

    Hope this helps (and that I'm not completely wrong!),

    Maan

  17. Re:Are too by dead sun · · Score: 5, Insightful
    Pixar doesn't need telling; their problem breaks up so miraculously well that they'll see the best performance you could possibly expect from a cluster. The big problem, rendering a movie, decomposes into thousands of small problems, rendering a frame. Each machine in their cluster can handle a group of frames at a time with zero need to communicate or worse, share computation, with other machines in the cluster. It's the best case scenario.

    Many other computing problems don't decompose nearly so nicely. So there are certainly problems that probably won't see more than 8% of peak performance. If you were particularly inclined you could probably invent a problem that had to be done serially, leaving percent of peak performance equal to what percent of your cluster one box was. Cray is right to that extent, and if you're solving a problem that falls into the category of not easily parallelized then perhaps one of their machines is the better tool for the job. But, like you mention, there are instances where the cluster is a great tool and cost effective to boot.

    Heck, ever check out some of the faster interconnects like Myrinet? They're insane and exist because fast ethernet just doesn't cut it in some places. Just using a slow interconnect is enough to bring real performance down below theoretical peak. Luckily for Pixar off the shelf fast or gigabit ethernet is likely enough.

    Anyway, use the best tool available. If your problem falls into the category of trivially parallelizable, like rendering a movie is, then don't bother wasting your money on a Cray. If your problem isn't suited to a cluster, however, then maybe a cluster isn't the right answer. If you have a big problem that needs serious computation, take the time to figure out what you need before taking a marketing drone's spiel for gospel in your situation.
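
The frame-by-frame decomposition described above can be sketched in a few lines. Everything here is hypothetical and only illustrates the shape of the problem: `render_frame` is a stand-in for real rendering work, and a local thread pool plays the role of the cluster nodes. It says nothing about how Pixar actually schedules its render farm.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(total_frames, nodes):
    """Divide frame indices 0..total_frames-1 into near-equal contiguous chunks, one per node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    base, extra = divmod(total_frames, nodes)
    chunks = []
    start = 0
    for node in range(nodes):
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def render_frame(index):
    """Hypothetical stand-in for rendering: the result depends only on the frame's own index."""
    return index * index


def render_chunk(frames):
    """Render one node's group of frames without talking to any other node."""
    return [render_frame(i) for i in frames]


def render_movie(total_frames, nodes):
    """Hand each chunk to its own worker and stitch the results back together in frame order."""
    chunks = split_frames(total_frames, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        rendered = list(pool.map(render_chunk, chunks))
    return [frame for chunk in rendered for frame in chunk]


print(render_movie(10, 3))
```

Because no chunk needs anything from another chunk, the only coordination is the initial split and the final concatenation, which is what makes this kind of job a good fit for commodity interconnects.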

    --
    If not now, when?
  18. Re:Well.. by s00p41337h4x0r · · Score: 5, Insightful
    How could Cray be wrong. I mean just because Linux is running some of the top 500 computers there is no reason to consider HPC right. What a self serving statement Cray makes....they still don't get it .... their way is a dead-end...
    That's right. Dataflow vector processing has been shown to be a dead end. The fact that the fastest computer in the world is a dataflow machine is a statistical anomaly, right?

    Oh, here's the TOP500 list, btw.