Cray CTO: Linux clusters don't play in HPC
jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. 'Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer,' said Dr. Paul Terry, CTO of Cray Canada."
While I certainly disagree that you can't build a very high performance computer out of a cluster of computers (Linux or otherwise), there is a lot of merit to the point that clusters just don't scale well for certain classes of applications. Hence the renaissance of the vector supercomputer (à la the Earth Simulator).
Obviously, this guy is plugging the new Cray X1 architecture, which really is quite promising. For instance, check out this paper by some folks at Oak Ridge National Lab that appeared in Supercomputing 2003.
Of course, since this is Slashdot, I expect that there will be a deluge of posts decrying everything about the new Cray machine because it commits the cardinal sin of NOT USING LINUX. Oh, the horror!
Cray holds only 19th place right now because they have only deployed X1 systems with up to 256 nodes. When the number of nodes is increased, you will certainly see the Cray moving up the TOP500 list -- the architecture is VERY scalable.
The HOWTO from way back in the day:
http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Beowulf-HOWTO.html
It has a great explanation using a grocery store analogy that makes it really easy to understand what kinds of tasks will work well and what kinds will suck. And unlike the cheerleaders who have been showing up since clusters became big business, it is very balanced about it.
Still worth reading.
Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
Cray could easily be at or close to the top of the TOP500 list; their X1 architecture will scale that far. However, for a lot of really important supercomputing codes, it's no contest: the Cray will trounce the clusters (Linux or otherwise). Those #19 Crays have only 256 processors. To get similar performance, a stack of Xeons requires thousands of processors. Some tasks just can't be split apart that easily.
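One standard way to see why "some tasks just can't be split apart" is Amdahl's law. The poster doesn't name it; it's swapped in here purely as an illustration. A minimal sketch, assuming a hypothetical job whose parallelizable fraction is 0.95: once the serial part dominates, piling on more processors barely helps, which is why fewer, faster processors can win.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup over one processor when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# Hypothetical workload: 95% parallelizable (an assumption, not a measured code).
P = 0.95
for n in (1, 256, 1024, 4096):
    print(f"{n:5d} processors -> {amdahl_speedup(P, n):6.2f}x")
# The ceiling is 1 / (1 - P) = 20x, no matter how many processors are added.
```

Going from 256 to 4096 processors in this model buys only about 7% more speedup, so making each processor faster (or communication cheaper) matters more than adding nodes.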
A Cray processor has eight floating-point units running at 800 MHz. The Big Mac cluster (for example) uses G5 processors, which have 2 FPUs at 2000 MHz. Thus the Cray has roughly a 60% edge in raw floating-point throughput (8 × 800 = 6400 vs. 2 × 2000 = 4000 FPU-MHz; put the other way, the G5 has ~40% less). However, the G5 processor has ~4GB/s of memory bandwidth, while the Cray has ~50GB/s. If you have a problem that needs to do a HUGE amount of math on a tiny amount of data, the G5 will rock. If you have a problem that needs to do a HUGE amount of math on a GINORMOUS amount of data, buy the Cray. (For a GINORMOUS amount of money, too.)
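The arithmetic above can be made explicit with a roofline-style bound, a standard performance model the post doesn't mention by name: attainable throughput is the lesser of peak compute and arithmetic intensity (flops per byte) times memory bandwidth. This is a rough sketch using the per-processor figures quoted in the post, assuming each FPU retires one flop per cycle; real peak rates for both chips differ (fused multiply-add, vector pipelines), so treat the numbers as illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    fpus: int
    clock_ghz: float
    mem_bw_gbs: float  # memory bandwidth in GB/s, as quoted in the post

    @property
    def peak_gflops(self) -> float:
        # Assumption: one flop per FPU per cycle.
        return self.fpus * self.clock_ghz

    def attainable_gflops(self, flops_per_byte: float) -> float:
        """Roofline bound: limited by peak compute or by memory traffic, whichever is lower."""
        return min(self.peak_gflops, flops_per_byte * self.mem_bw_gbs)


cray = Processor("Cray X1", fpus=8, clock_ghz=0.8, mem_bw_gbs=50.0)
g5 = Processor("PowerPC G5", fpus=2, clock_ghz=2.0, mem_bw_gbs=4.0)

print(f"peak ratio: {cray.peak_gflops / g5.peak_gflops:.2f}")
for intensity in (0.1, 1.0, 10.0):  # flops performed per byte moved from memory
    c = cray.attainable_gflops(intensity)
    g = g5.attainable_gflops(intensity)
    print(f"{intensity:5.1f} flop/byte: Cray {c:5.2f}  G5 {g:5.2f}  ratio {c / g:5.2f}")
```

At 0.1 flop/byte (lots of data, little math per byte) both machines are bandwidth-bound and the Cray's edge is about 12.5x, essentially the bandwidth ratio; at 1 flop/byte or more both hit their compute ceilings and the gap shrinks to the 1.6x peak ratio.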
Similarly, InfiniBand (à la the Big Mac) is really hot in the cluster interconnect space because it gives 2.5GB/s per node. The Cray gives you 51GB/s.
You need to move a little data, buy a cluster. You need to move a lot of data, buy the Cray.
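The "move a little data vs. a lot" rule of thumb can be sketched with the per-node bandwidths quoted above. This is a bandwidth-only model: it ignores latency, contention, and overlap of compute with communication (all assumptions), and the 1-second compute step and the data volumes are hypothetical.

```python
# Per-node interconnect bandwidths quoted in the post, in GB/s.
INFINIBAND_GBS = 2.5
CRAY_X1_GBS = 51.0


def comm_seconds(gigabytes: float, bw_gbs: float) -> float:
    """Transfer time from bandwidth alone (no latency or contention modeled)."""
    return gigabytes / bw_gbs


def comm_fraction(compute_s: float, gigabytes: float, bw_gbs: float) -> float:
    """Share of each step spent communicating, assuming no overlap with compute."""
    comm = comm_seconds(gigabytes, bw_gbs)
    return comm / (compute_s + comm)


COMPUTE_S = 1.0  # hypothetical compute time per step
for gb in (0.01, 10.0):  # hypothetical data exchanged per node per step
    ib = comm_fraction(COMPUTE_S, gb, INFINIBAND_GBS)
    x1 = comm_fraction(COMPUTE_S, gb, CRAY_X1_GBS)
    print(f"{gb:6.2f} GB/step: InfiniBand {ib:.1%} comm, Cray X1 {x1:.1%} comm")
```

With 0.01 GB per step, communication is well under 1% of the step on either machine; at 10 GB per step, the cluster spends about 80% of each step moving data versus roughly 16% on the Cray.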
There's no one solution for all problems.
It was announced that VA Tech actually purchased the G5 Xserves before they were in production, and was instead lent G5 towers so the cluster could be built in time for the rankings.
The cluster remains; they have not shut it down and are swapping out individual racks for the upgrade (something like one rack of Xserves replaces three racks of towers).
I don't think it's been published whether they have run any data besides benchmarks.
Post: Sigged, for your pleasure.
For the love of Christ, people, it's a simple thing.
Format links like this: <a href="http://somelink">link text</a>
It takes virtually no extra time and we don't have to trim the fucking slashcode spaces.
Oh, and here's the link.
LOAD "SIG",8,1
LOADING...
READY.
RUN
Actually, that crossbar memory bus is just the local bus for each cabinet; they also have low-latency interconnects that allow globally shared memory and a single system image. Otherwise they wouldn't be working on a 1024-CPU installation. A clue for you: the technology used in the Origin machines was originally developed by Cray, and it runs 1024-CPU installations with globally shared memory and a single system image.
As for research, it's more a case of researchers doing the old "Damn, I'll have to make do with this". And Origin and Altix systems are still selling well in the research market.
And don't forget, Cray is backed by US government agencies such as the NSA. The X1 received a lot of such support, which Cray itself acknowledges: http://www.cray.com/products/systems/x1/