Cray CTO: Linux clusters don't play in HPC
jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada."
I guess that the simple problem is just that the algorithm applied is usually not suitable for massively parallel computing.
The Cray CTO makes the point that Linux clusters get, at best, just under 10% of peak as sustained performance, and uses this as a justification that Linux clusters are not HPCs. This is a reasonable criticism. Let's take the percentage he cites as real for a moment. Now what is the cost difference between a Linux cluster and a Cray (not some future offering, but today), and how much more of a Linux cluster could you afford? Would that offset the quoted inefficiency? Would the flexibility of being able to use commodity components further offset any advantage Cray might have? What about 24hr or same-day parts replacement without a hyper-expensive service contract? At the end of the day, I suspect the Linux cluster wins out even given the sub-10% efficiency figure Cray cites. --Pat / zippy@cs.brandeis.edu
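For what it's worth, the trade-off asked about above can be put into a back-of-the-envelope sketch. Only the 10% cluster efficiency comes from the figure quoted in the thread; the 40% supercomputer efficiency, the peak numbers, and both prices are made-up placeholders, and the function names are just for illustration.

```python
# Rough price/performance comparison. Every number in the demo at the
# bottom is a hypothetical placeholder except the 10% cluster efficiency,
# which is the figure quoted in the thread.

def sustained_gflops_per_dollar(peak_gflops, efficiency, cost_dollars):
    """Sustained GFLOPS delivered per dollar of purchase price."""
    return peak_gflops * efficiency / cost_dollars


def break_even_price_ratio(cluster_efficiency, supercomputer_efficiency):
    """How many times cheaper (per peak GFLOPS) the cluster has to be
    before it matches the supercomputer on sustained GFLOPS per dollar."""
    return supercomputer_efficiency / cluster_efficiency


if __name__ == "__main__":
    # Hypothetical machines with the same peak but very different prices.
    cluster = sustained_gflops_per_dollar(1000.0, 0.10, 500_000.0)
    big_iron = sustained_gflops_per_dollar(1000.0, 0.40, 5_000_000.0)
    print("cluster  sustained GFLOPS per dollar:", cluster)
    print("big iron sustained GFLOPS per dollar:", big_iron)
    print("break-even price ratio:", break_even_price_ratio(0.10, 0.40))
```

Under these assumed numbers the cluster only has to be about four times cheaper per peak GFLOPS to break even. This says nothing about problems that simply won't run well on a cluster at any price.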
It all depends on the problem you are trying to solve. I have been doing some work of late that would not complete in my lifetime on the 108-node cluster that we have. But when programmed for and run on two Cray X1s, it should complete inside of a week.
Granted, there are many codes (and more every day) that will run on clusters, but the big iron will never die.
I happen to work in a facility that has had both large supercomputers (Cray T3E, J90, SGI) and Linux and *nix based clusters (Beowulf/Linux, Compaq/Tru64). The Cray CTO is correct that you can't just call every Linux cluster out there HPC. Just about anyone with networking and Linux knowledge can build a Linux cluster.
What really makes a difference between an HPC cluster and your normal everyday cluster is the hardware interconnects used. There is a comment in the article that refers to not using I/O for memory and message passing. I am not quite sure what he means by that, but I am guessing that he is saying that the network is not used for shared memory/message passing (MPI/OpenMP/SHMEM).
If a cluster can limit the impact of latency between nodes, either through smarter software or faster interconnects, then I can't see any reason not to consider a Linux cluster as HPC.
Clusters without smarter software tend to be really difficult coding platforms. Some developments with things like globally shared memory might make the difference, but there will still be the problem of latency between nodes.
So depending on the task at hand, the cluster might perform very well, or perhaps a little less well.
Surely what you meant to say is that, depending on the task at hand, a cluster might perform very well, or perhaps perform atrociously. :-)
Clusters tend to work well when the various nodes don't need to communicate very often but you need lots of cycles for the subtasks, while dedicated supercomputers tend to perform very well in tasks requiring vast amounts of internode communications bandwidth along with large numbers of cycles. If you need vast bandwidth and relatively low numbers of cycles, your pricepoint is likely a mainframe. And if you don't need either, you get a cheap desktop machine.
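To put a rough number on that, here is a toy model (my own simplification, not anything from the article): each step has some compute work that divides evenly across the nodes, plus a communication cost per step that does not shrink as nodes are added. The function name and the timings are hypothetical.

```python
# Toy model of how communication cost eats parallel efficiency.
# Assumption: compute divides perfectly across nodes, while each step
# also pays a fixed communication cost that does not shrink with nodes.

def parallel_efficiency(compute_seconds, comm_seconds_per_step, nodes):
    """Fraction of ideal speedup achieved on `nodes` nodes.

    compute_seconds: time one node would need for a step's compute work.
    comm_seconds_per_step: time each node spends communicating per step.
    """
    ideal = compute_seconds / nodes          # perfect scaling, no communication
    actual = ideal + comm_seconds_per_step   # what the step really takes
    return ideal / actual


if __name__ == "__main__":
    # Hypothetical timings: 100 s of compute per step on 100 nodes.
    for comm in (0.01, 1.0, 10.0):
        eff = parallel_efficiency(100.0, comm, 100)
        print(f"comm = {comm:5.2f} s/step -> efficiency {eff:.3f}")
```

With rare, cheap communication the nodes stay busy almost all the time. Once the per-step communication cost dwarfs the per-node compute time, efficiency collapses, which is where a faster interconnect earns its price.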
Certain problems parallelize well on a cluster ... others don't. Some don't parallelize at all, and a cluster won't do you a darn bit of good. The different machines are designed for different uses ... and one should be careful not to push a "one size fits all" solution. The Cray guy clearly got it wrong on that point, and likely knows it, but he was marketing, not teaching a course in choosing hardware for the task at hand.
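The usual way to quantify "some don't parallelize at all" is Amdahl's law, which nobody in the thread names, so take this as my own illustration. The parallel fractions below are hypothetical; the node count of 108 is borrowed from the cluster size mentioned earlier in the discussion.

```python
# Amdahl's law: the speedup on n nodes when only a fraction p of the
# work can run in parallel. Communication costs are ignored here, so
# real clusters will do worse than this upper bound.

def amdahl_speedup(parallel_fraction, nodes):
    """Upper bound on speedup for a job with the given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical parallel fractions on a 108-node cluster.
    for p in (0.0, 0.5, 0.95, 0.999):
        print(f"p = {p:5.3f} -> speedup at most {amdahl_speedup(p, 108):.1f}x")
```

A job with no parallel part gets a speedup of exactly 1 no matter how many nodes you throw at it, and even a 95% parallel job tops out well below 20x on 108 nodes.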
"Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."
Although this statement reeks of FUD, he's right about one thing: a cluster is not an HPC... that's why it's called a cluster. But to say that a cluster is 'unmanaged' is one hell of a stretch IMO. All in all, he's just arguing semantics: nothing to see here, put down your flamethrowers, move along folks.
Since this is Slashdot, I'll add that the rest of the article is full of choice quotes, all of which point squarely at basic FUD + marketing spin for their new cluster-cost-like product.
It seems to me that Cray is just plain bitter that Linux (through all the cluster solution providers) has managed to steal Cray's thunder at a mere fraction of the cost. Cray's probably even more bitter that folks are willing to sacrifice performance (at least from Cray's perspective) just to save a buck.
Okay, this is Cray we're talking about here: people are saving millions of bucks all over the place by using clusters instead of big expensive machines.
And guess who wants 'their' slice of the pie back.