Cray CTO: Linux clusters don't play in HPC
jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada."
While Paul Terry makes some good points in his statements, the post gives only a partial quote. The full version: "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."
Remember to take this with a grain of salt. The inflammatory nature of the comment is nothing more than a marketing ploy to increase visibility of, and sell, the new Cray XD1.
Kinetic stupidity has a new brand leader: Allen Zadr.
Oracle dismisses MySQL and PostgreSQL as "toy databases", Microsoft claims that "Apache cannot be used for real web serving", and Sun announces that "Intel and Linux simply cannot be used for enterprise computing".
So all those supercomputing labs that use Linux clustering (that invented Linux clustering, even) have been wasting their time?
Ceci n'est pas une signature
Regardless of whether I agree with the article or not I feel compelled to point out that:
The 1100-node Apple G5 cluster in Virginia has yet to run any real scientific code. So far it has only run benchmarks.
In other news...
"Despite assertions made by Toyota salesmen, a Lexus sedan is not a luxury car," said Bill Taylor, CEO of Mercedes-Benz.
This space intentionally left blank.
Clusters can get high performance on some types of tasks. But sometimes, you need fine-grained parallelism that just isn't available on a cluster.
On the other hand, high performance usually comes through special hardware. And on that hardware, I think Linux could be the right thing (modulo some patches).
A message from the system administrator: 'I've upped my priority. Now up yours.'
The analogy now would be more like:
www.eFax.com are spammers
Just because we love Linux doesn't mean that clusters are HPCs.
There are real issues that differentiate mainframes and supercomputers from large, powerful clusters.
Of course this all depends on your definition of an HPC. But I believe it's reasonable to say that if parts of your computer are connected by low-bandwidth connections (10/100, gigabit), they just can't handle the same kinds of transactions as a computer whose parts are connected by 10 gigabit or 1000 gigabit connections, or whatever it is nowadays.
As far as I know, if you're deploying a large database it's still advisable to have a big huge IBM mainframe or a Unisys box or a Sun 10k instead of 4, 8, or 16 clustered 8-processor machines.
My point is there are valid arguments for not including clusters of commodity hardware in the HPC category.
In my mind they aren't High Performance Computers... they are High Performance Clusters of Commodity Computers.
~foooo
How well a cluster will do depends on the application it is running. Some problems can be divided into many small problems with little reliance on other parts of the problem (SETI, encryption breaking). These can easily be distributed to hundreds or thousands of "small" boxes for processing, and are what a Beowulf cluster is good at.
Other applications require the breakneck interconnect speeds that large Cray, Sun, etc. machines are built on. When the data being calculated on one CPU requires data from CPU2 to continue its calculations, you don't want it to wait at 100 Mbit or even 1 Gbit Ethernet speeds. Even quicker interconnects such as SCALI are going to be slowed by PC bus speeds.
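To put rough numbers on the interconnect point, here is a toy cost model in Python. It is only a sketch: the function names and every figure in it (work per step, node count, per-message latencies) are made-up assumptions for illustration, not measurements of any real cluster or Cray machine.

```python
# Toy cost model. All numbers below are illustrative assumptions, not measurements.

def step_time(work_seconds, nodes, messages_per_step, latency_seconds):
    """Time for one step: compute is split across nodes, but every
    message exchange adds a fixed latency that does not shrink."""
    return work_seconds / nodes + messages_per_step * latency_seconds


def speedup(work_seconds, nodes, messages_per_step, latency_seconds):
    """Speedup relative to one node doing all the work with no messaging."""
    return work_seconds / step_time(
        work_seconds, nodes, messages_per_step, latency_seconds
    )


if __name__ == "__main__":
    work = 1.0   # seconds of compute per step on a single node (assumed)
    nodes = 100  # cluster size (assumed)
    # Hypothetical per-message latencies for a slow and a fast interconnect.
    for name, latency in [("slow link", 100e-6), ("fast link", 5e-6)]:
        for messages in (0, 1000):
            s = speedup(work, nodes, messages, latency)
            print(f"{name}, {messages} messages/step: speedup {s:.1f}x")
```

With no messages per step the link speed is irrelevant, which is the SETI-style case. Once each step needs many exchanges, the fixed latency term dominates and the slow link loses most of the benefit of the extra nodes.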
Cray fills an important niche for those who can afford it.
The comment was stupid, yes, but not every job you'd use a supercomputer for can be broken down into many threads the way others can. A Linux cluster will do well for some jobs, a Cray box will do well for others. There *will* be times when a Cray system is so far superior to anything you could do with Linux that it becomes the only real option.
However, dismissing Linux cluster technology automatically is dumb. In many cases, it provides more than enough CPU power and I/O bandwidth to support your reason for getting a supercomputer, and probably at less cost than the other options.
It's all a matter of determining what you need the computer to do, determining your budget, and getting the best system in your budget for the uses you have for it. Sometimes that will be a Cray, sometimes a Linux cluster.
Clusters can rival a supercomputer when they are assigned a task that's suitable for distributed computing. That is, work units can be divided up and worked on in any sequence... the result of segment 45 doesn't depend on knowing the result of 44 and such. Effectively, you can have the sum of all of the processors minus just a little overhead for the clustering.
What Cray's rightfully pointing out is that for most business applications, however, distributed computing is not a viable option. When processing on a transaction basis, the transactions often need to be posted in the exact order they were received, which means they must be taken serially. In those situations, the programs can't multithread work out to the other processors so well, and the cluster will end up running at roughly the speed of just one processor while the others waste clock cycles waiting for something to do.
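A small Python sketch of the difference between the two kinds of work described above. The function names and the sample transactions are invented for illustration. The first function is an independent work unit that can be handed out in any order; the second depends on the result of the previous step, so reordering it changes the answer.

```python
import random


def process_segment(segment):
    """Independent work unit: the result depends only on its own input."""
    return sum(x * x for x in segment)


def apply_transactions(balance, transactions):
    """Order-dependent work: each step needs the balance left by the previous one."""
    for kind, amount in transactions:
        if kind == "deposit":
            balance += amount
        elif kind == "interest":  # percentage applied to the current balance
            balance += balance * amount / 100
    return balance


if __name__ == "__main__":
    # Ten independent segments; any node could take any of them, in any order.
    segments = [list(range(i, i + 10)) for i in range(0, 100, 10)]
    in_order = sum(process_segment(s) for s in segments)
    shuffled = segments[:]
    random.shuffle(shuffled)
    any_order = sum(process_segment(s) for s in shuffled)
    print(in_order == any_order)  # True: integer sums do not depend on order

    # The same two transactions, posted in different orders.
    txns = [("deposit", 100), ("interest", 10)]
    print(apply_transactions(0, txns))                  # 110.0
    print(apply_transactions(0, list(reversed(txns))))  # 100.0
```

The segment sums could be farmed out to as many boxes as you have. The transaction log cannot, at least not without extra machinery to preserve ordering.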
The cluster isn't the solution to everything. Nor is the supercomputer. You've gotta think about the job, then figure out which tool is right for the task.
In truth, such a machine will always have a certain performance advantage over traditional clusters. The question is, will the price point be low enough to invalidate the idea of just adding more boxes to the traditional cluster?
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
You're right in saying that the Virginia Tech cluster is the 3rd fastest supercomputer (LINPACK tests). I think that for some other tasks, however, it would be slower. Sure, they use InfiniBand as an interconnect (very fast & low latency), but that doesn't change the fact that it's many separate nodes, each with its own memory. So if one processor were to access some memory on a different node, it would slow things down a little.
So depending on the task at hand, the cluster might perform very well, or perhaps a little less well. Cray supercomputers are a big number of processors all in the same machine, and more importantly all sharing the same memory. Each processor has the same delay to access any memory content.
The argument in favor of clusters, however, is that it's still cheaper to throw more computers in than to buy a Cray that would perform the same task in less time.
In the end, there's a lot of marketing involved in all of this...
Hope this helps (and that I'm not completely wrong!),
Maan
Many other computing problems don't decompose nearly so nicely. So there are certainly problems that probably won't see more than 8% of peak performance. If you were particularly inclined, you could probably invent a problem that had to be done serially, leaving the percent of peak performance equal to the share of your cluster that one box represents. Cray is right to that extent, and if you're solving a problem that falls into the category of not easily parallelized, then perhaps one of their machines is the better tool for the job. But, like you mention, there are instances where the cluster is a great tool and cost effective to boot.
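The "percent of peak" argument can be sketched with Amdahl's law, which the comment does not name but which fits the reasoning. The Python below is an idealized model that ignores communication overhead entirely; the parallel fractions and the node count are arbitrary assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Amdahl's law: the serial part runs on one node, the rest is split evenly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)


def percent_of_peak(parallel_fraction, nodes):
    """Achieved speedup as a percentage of the ideal speedup (one per node)."""
    return 100.0 * amdahl_speedup(parallel_fraction, nodes) / nodes


if __name__ == "__main__":
    nodes = 100  # assumed cluster size
    for fraction in (0.0, 0.5, 0.99, 1.0):  # assumed parallel fractions
        pct = percent_of_peak(fraction, nodes)
        print(f"parallel fraction {fraction:.2f}: {pct:.1f}% of peak")
```

For a fully serial problem (parallel fraction 0) the model gives exactly one box's share of the cluster, 100/nodes percent, which is the case described above. A perfectly parallel problem gets the full 100%.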
Heck, ever check out some of the faster interconnects like Myrinet? They're insane and exist because fast ethernet just doesn't cut it in some places. Just using a slow interconnect is enough to bring real performance down below theoretical peak. Luckily for Pixar off the shelf fast or gigabit ethernet is likely enough.
Anyway, use the best tool available. If your problem falls into the category of trivially parallelizable, like rendering a movie, then don't bother wasting your money on a Cray. If your problem isn't suited to a cluster, however, then maybe a cluster isn't the right answer. If you have a big problem that needs serious computation, take the time to figure out what you need before taking a marketing drone's spiel for gospel in your situation.
If not now, when?
Oh, here's the TOP500 list, btw.