Cray CTO: Linux clusters don't play in HPC
jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada."
Paul Terry makes some good points in his statements, and the post quotes only part of one: "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."
Still, take this with a grain of salt. The inflammatory comment is nothing more than a marketing ploy to increase the visibility of, and sell, the new Cray XD1.
How could Cray be wrong? I mean, just because Linux is running some of the Top 500 computers, there is no reason to consider it HPC, right? What a self-serving statement Cray makes... they still don't get it... their way is a dead end...
Oracle dismisses MySQL and PostgreSQL as "toy databases", Microsoft claims that "Apache cannot be used for real web serving", and Sun announces that "Intel and Linux simply cannot be used for enterprise computing".
So all those supercomputing labs that use Linux clustering (that invented Linux clustering, even) have been wasting their time?
Regardless of whether I agree with the article or not I feel compelled to point out that:
The 1100-node Apple G5 cluster in Virginia has yet to run any real scientific code. So far it has only run benchmarks.
"We are dropping our line of Cray supercomputers and replacing them with rack mounted Beowulf cluster of 486's!"
I am not saying Cray isn't worth it, but there is something to be said for replacing or fixing your supercomputer with off-the-shelf parts.
In other news...
"Despite assertions made by Toyota salesmen, a Lexus sedan is not a luxury car," said Bill Taylor, CEO of Mercedes-Benz.
Clusters can get high performance on some types of tasks. But sometimes, you need fine-grained parallelism that just isn't available on a cluster.
On the other hand, high performance usually comes through special hardware. And on that hardware, I think Linux could be the right thing (modulo some patches).
You could just as well ask, though:
"If you were building an ants' nest, which would you rather use? 1024 ants or a bulldozer?"
Perhaps he shouldn't be comparing plowing fields to high performance computing.
Not all problems are solvable by the "divide and conquer" approach. That's where you need the heavy iron of a supercomputer.
The analogy now would be more like:
Surely he has never seen OctigaBay's computers:
http://www.octigabay.com/
I do not work for them, I was simply amazed by what you can do with these things and how they interconnect with up to 1,000 boxes.
Oh, shit. I just went to their web site and THEY WERE BOUGHT BY CRAY!!!!
Hahahaha!!!! The ultimate Linux HPC is now a Cray product.....This is too funny...
"Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer,"
Maybe so, but not everyone can pull a Cray out of his ass when they need horsepower. A Linux cluster is affordable; a Cray is the stuff of wet dreams.
Just because we love Linux doesn't mean that clusters are HPCs.
There are real issues that differentiate mainframe/supercomputers from large, powerful, clusters.
Of course this all depends on your definition of an HPC. But I believe it's reasonable to say that if parts of your computer are connected with low-bandwidth links (10/100 Mbit, gigabit), they just can't handle the same kinds of transactions as a computer whose parts are connected by 10 gigabit or 1000 gigabit links, or whatever it is nowadays.
As far as I know, if you're deploying a large database it's still advisable to have a big IBM mainframe, a Unisys box, or a Sun 10k instead of 4, 8, or 16 clustered 8-processor machines.
My point is there are valid arguments for not including clusters of commodity hardware in the HPC category.
In my mind they aren't High Performance Computers... they are High Performance Clusters of Commodity Computers.
How well a cluster will do depends on the application it is running. Some problems can be divided into several small problems with little reliance on other parts of the problem (SETI, encryption breaking). These can easily be distributed to hundreds or thousands of "small" boxes for processing, and they are what a Beowulf cluster is good at.
Other applications require the breakneck interconnect speeds that large Cray, Sun, etc. machines are built on. When the calculation on one CPU needs data from CPU 2 to continue, you don't want it waiting on 100 Mbit or even 1 Gbit Ethernet. Even quicker interconnects such as Scali's are going to be slowed by PC bus speeds.
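A minimal Python sketch of the "divide into small independent problems" idea, assuming a toy brute-force search over 10,000 integer keys. The names `partition`, `search_chunk` and `secret` are hypothetical, and a plain loop stands in for the separate boxes; the point is only that each chunk needs nothing from the others.

```python
from typing import Callable, List, Optional, Tuple


def partition(total: int, nodes: int) -> List[Tuple[int, int]]:
    """Split range(total) into `nodes` contiguous half-open [start, end) chunks."""
    base, extra = divmod(total, nodes)
    chunks = []
    start = 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append((start, start + size))
        start += size
    return chunks


def search_chunk(start: int, end: int, is_match: Callable[[int], bool]) -> Optional[int]:
    """The work one node would do: scan its own chunk, needing nothing from other nodes."""
    for key in range(start, end):
        if is_match(key):
            return key
    return None


secret = 7_654  # hypothetical key we are searching for
chunks = partition(10_000, nodes=16)
# On a real cluster each chunk would be shipped to a different box; here we just loop.
hits = [search_chunk(s, e, lambda k: k == secret) for s, e in chunks]
print([h for h in hits if h is not None])  # prints [7654]
```

Because no chunk waits on another, adding boxes mostly just shrinks each chunk, which is why this kind of job suits a Beowulf cluster.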
Cray fills an important niche for those who can afford it.
The comment was stupid, yes, but not all jobs you'd use a supercomputer for can be broken down into many threads the way others can. A Linux cluster will do well for some jobs; a Cray box will do well for others. There *will* be times when a Cray system is so far superior to anything you could do with Linux that it becomes the only real option.
However, dismissing Linux cluster technology automatically is dumb. In many cases it provides more than enough CPU power and I/O bandwidth to support your reason for getting a supercomputer, and probably at lower cost than the other options.
It's all a matter of determining what you need the computer to do, determining your budget, and getting the best system within that budget for your uses. Sometimes that will be a Cray, sometimes a Linux cluster.
Clusters can rival a supercomputer when they are assigned a task that's suitable for distributed computing. That is, work units can be divided up and worked on in any sequence... the result of segment 45 doesn't depend on knowing the result of segment 44, and so on. Effectively, you get the sum of all the processors minus a little overhead for the clustering.
What Cray is rightfully pointing out, however, is that for most business applications distributed computing is not a viable option. When processing on a transaction basis, the transactions often need to be posted in the exact order they were received, which means they must be handled serially. In those situations the programs can't farm work out to the other processors very well, and the cluster ends up running at roughly the speed of a single processor while the others waste clock cycles waiting for something to do.
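To make the ordering problem concrete, here is a toy Python sketch assuming a hypothetical account that rejects any debit which would overdraw it. `post_in_order` and the transaction amounts are made up for illustration; the point is that posting the same transactions in a different order gives a different answer, so step n cannot start until step n-1 is done.

```python
from typing import List, Tuple


def post_in_order(balance: int, txns: List[int]) -> Tuple[int, List[int]]:
    """Apply transactions serially, rejecting any debit that would overdraw the account."""
    rejected = []
    for amount in txns:
        if balance + amount < 0:
            rejected.append(amount)  # this decision depends on every earlier posting
        else:
            balance += amount
    return balance, rejected


received = [+100, -80, +50, -60]          # hypothetical order of arrival
print(post_in_order(0, received))         # (10, [])
print(post_in_order(0, received[::-1]))   # (150, [-60, -80])
```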
The cluster isn't the solution to everything. Nor is the supercomputer. You've gotta think about the job, then figure out which tool is right for the task.
His rhetoric is quite predictable, actually. He talks at some length about how and why clusters of PCs can't get the job done, and how clustering is inherently inferior to a REAL supercomputer, then goes on to describe how their new product (which sounds surprisingly like a cluster of proprietary machines) can work. Repeat the above for the management software.
If clustering doesn't work, and Supers are better / cheaper, explain why large companies (Pixar, NVidia,
Note that this does NOT mean that clusters are suitable for ALL traditional SuperComputing tasks. It really depends on the problem. If the problem is better solved with a vector processor, then a vector machine (like a Cray) is what you want. If the problem is solvable in parallel, then a cluster might be the right answer.
How about a real tractor? Like an International.
I assume you are referring to the Altix systems. They are single system image (SSI) systems, which means they are built like a supercomputer, not like a cluster. They are big and expensive.
SSI is where all CPUs can see all memory as if it were local. They are also NUMA (non-uniform memory access) machines, which means not all of the memory a CPU sees is equally fast to reach, but really ALL single systems are like this. For example, each CPU can address the entire TB of memory in the system, but reading from one memory location might take 100 cycles, while another might be closer to 1000 cycles.
I'm a layman... I have no idea what I'm talking about, but of course that doesn't stop me.
I know I keep coming back to Virginia Tech, but aren't all those G5s linked together to make the third-fastest supercomputer a cluster themselves? Or are they considered something else?
And if it IS considered a cluster, then why couldn't a Linux-based cluster (like the *BSD-based G5s) make a fast supercomputer?
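The effect of the 100 versus 1000 cycle figures above can be worked out as a weighted average. This is a rough Python sketch that assumes every access costs exactly one of those two latencies (illustrative figures, not measurements); `average_access_cycles` is a hypothetical helper.

```python
def average_access_cycles(local_pct: int, local_cycles: int = 100, remote_cycles: int = 1000) -> float:
    """Expected cycles per memory access when local_pct percent of accesses hit local memory."""
    remote_pct = 100 - local_pct
    return (local_pct * local_cycles + remote_pct * remote_cycles) / 100


for pct in (100, 90, 50):
    print(f"{pct}% local -> {average_access_cycles(pct)} cycles on average")
```

With 90% of accesses local, the average is already 190 cycles, nearly double the local figure, so where the data lives matters even inside one SSI box.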
If so, then what Paul Terry is spouting is just FUD and marketing to help sell his product, yes?
Just wondering.
In truth, such a machine will always have a certain performance advantage over traditional clusters. The question is whether the price point will be low enough to invalidate the idea of just adding more boxes to the traditional cluster.
Tell that to PIXAR. I don't believe it either.
Ya beat me to that one. I won't post it because it would be modded redundant, but I would have mentioned Google also.
Cray is losing market share and wants people to believe that their expensive-to-maintain-and-operate machines are better and 'cheaper' than a cluster of regular PCs running together. My question is: does anyone with experience of both systems back him up?
If you are plowing fields, use the bull.
If you are making eggs, use the chickens.
This isn't a one-size-fits-all world any more. Only those deluded enough to think that Windows should be the world's standard desktop think otherwise.
That's funny, but since plowing a field is very parallelizable it doesn't make a very good analogy. Especially since in that analogy the Cray isn't two strong oxen, it's more like a machine that can plow all/many of the rows at once, and the linux cluster is a machine that can plow one row at a time, but you can afford to buy a bunch and plow as many at a time as you have $$s to spend.
The "interconnect" latency (especially) and bandwidth in a cluster, even using very high-end network hardware, is much worse than that of a Cray-style supercomputer. This does make certain applications run slower, especially if not specifically tailored to clustered architecture. Some applications are very difficult to break down into small pieces and require extensive memory sharing between nodes, which clusters just can't do well.
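A common way to reason about this is a latency-plus-bandwidth (Hockney-style) model: time = start-up latency + message size / bandwidth. The Python sketch below uses hypothetical round numbers for two interconnects, not figures for any real product, to show that small messages are dominated by latency while large ones are dominated by bandwidth.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbit_s: float) -> float:
    """Latency-plus-bandwidth estimate for one point-to-point message, in microseconds."""
    bits = message_bytes * 8
    bits_per_us = bandwidth_gbit_s * 1000.0  # 1 Gbit/s is 1000 bits per microsecond
    return latency_us + bits / bits_per_us


# Hypothetical round-number interconnects, not measurements of any product.
links = {
    "commodity (50 us, 1 Gbit/s)": (50.0, 1.0),
    "low-latency (5 us, 10 Gbit/s)": (5.0, 10.0),
}
for name, (lat, bw) in links.items():
    small = transfer_time_us(8, lat, bw)          # one double-precision value
    large = transfer_time_us(8_000_000, lat, bw)  # an 8 MB block
    print(f"{name}: 8 B -> {small:.3f} us, 8 MB -> {large:.0f} us")
```

Tightly coupled codes that exchange many small messages live almost entirely on the first term, which is why latency matters more than headline bandwidth for them.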
Many other computing problems don't decompose nearly so nicely, so there are certainly problems that probably won't see more than 8% of peak performance. If you were particularly inclined, you could probably invent a problem that had to be done serially, leaving your percent of peak performance equal to the fraction of the cluster that a single box represents. Cray is right to that extent, and if you're solving a problem that isn't easily parallelized, then perhaps one of their machines is the better tool for the job. But, as you mention, there are instances where the cluster is a great tool and cost-effective to boot.
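The fully serial case described above is the extreme end of Amdahl's law, a standard textbook model that was not cited in the thread. This Python sketch assumes a fixed problem size, a perfectly parallel portion, and zero interconnect cost, so real clusters will do worse than it suggests.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: speedup on `nodes` processors when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def percent_of_peak(parallel_fraction: float, nodes: int) -> float:
    """Speedup as a percentage of the ideal `nodes`-fold speedup."""
    return 100.0 * amdahl_speedup(parallel_fraction, nodes) / nodes


for p in (1.0, 0.99, 0.9, 0.0):
    print(f"parallel fraction {p:.2f}: {percent_of_peak(p, 64):.1f}% of peak on 64 boxes")
```

With a parallel fraction of 0, the result is 100/64, about 1.6% of peak: exactly the "one box's share of the cluster" figure.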
Heck, ever check out some of the faster interconnects like Myrinet? They're insane, and they exist because Fast Ethernet just doesn't cut it in some places. Just using a slow interconnect is enough to pull real performance well below theoretical peak. Luckily for Pixar, off-the-shelf Fast or Gigabit Ethernet is likely enough.
Anyway, use the best tool available. If your problem is trivially parallelizable, like rendering a movie, then don't bother wasting your money on a Cray. If your problem isn't suited to a cluster, however, then a cluster probably isn't the right answer. If you have a big problem that needs serious computation, take the time to figure out what you need before taking a marketing drone's spiel as gospel for your situation.
"Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer,"
That's like saying that the automobile is not a high performance team of clydesdales. That's true, but it may be irrelevant. If it can get you there faster or better, I guess it doesn't matter.
For some tasks distributed clusters are better, for others ultra-high-bandwidth Cray-type monsters are better. So what's new?
To hell with the analogy.
Further clarification:
One big computer with 1024 processors that allows every processor to access all of the system memory at full speed, and that has been designed from the ground up to work as one system; or a cluster of 1024 processors spread across some number of computers with two or more processors per box, where memory is only fast for the box it's plugged into, that has been built to work as one system.
Cray is dying. The days of fat government sales have been over for a long time. It's only logical to discredit your competitors, especially if you stand to lose a lot because of them.
This is nothing new, nor anything special. For instance, if you've looked at the latest computer magazines, Microsoft is making the same kind of "it sucks" argument against anything related to Linux on a wide front. For example, Apache lost to IIS in a review, and IIS became the Editor's Choice in one magazine. In the next issue of that magazine there will be some kind of "debunking Linux myths" article. (This particular computer magazine is nothing special, even though it has become nothing short of an unfunny joke written by people who don't have a clue. Some of their readers do have a clue; that's why they cancel their subscriptions.)
So, to sum it up: it doesn't matter what the reality is; the people who decide only see the image that is created for them. Even if that image is wrong, it's the only thing the decision makers are going to see. They don't have the time or the energy to investigate things thoroughly on their own. This is why Microsoft pays the magazines to write garbage. This is why a Cray executive talks garbage.
Lobbying is pretty powerful stuff.
A lot of what he said isn't much of a surprise; however, his definitive statements about clusters not being supercomputers and being unmanaged, loose collections of machines are a bit overblown. Management software exists for clusters, and they are fairly easy to program for using popular, industrial-strength libraries.
Moreover, many HPC applications actually scale quite well on clusters of Linux systems. Affordable interconnect infrastructure keeps increasing in bandwidth and decreasing in latency, further broadening the range of problems these clusters can tackle. In addition, each node can now comfortably have two or four processors, giving even better bandwidth between CPUs sharing a node. With 64-bit processors and operating systems now available, the final barriers to very impressive, easy-to-use HPC Linux clusters have been removed, which is exactly why Cray now sees them as a threat. Now is probably the worst time to argue that a cluster is not a supercomputer. Clusters form a class of supercomputer that can now handle most supercomputer tasks. True, there are classes of problems at which the dedicated supercomputer systems Cray sells will excel; however, clusters are useful workhorses in the supercomputer world and hold their own.
Today's supercomputer problems are tomorrow's computer problems, and Cray must continue to find new classes of problems to solve, as they always have, rather than attacking competing technologies. People will use clusters where clusters meet their needs.
There are certain types of computing which simply cannot be done with microprocessor based platforms including clustering. One of these calculation types is vector processing. A Cray supercomputer is a vector processing based unit. When comparing a cluster of PC systems being used to calculate what a single Cray is designed to calculate, the Cray CTO is perfectly correct in his statement.
This isn't a one-size-fits-all world any more. Only those deluded enough to think that Windows should be the world's standard desktop think otherwise.
Actually there is a second group that thinks it is a one-size-fits-all, or better, a one-solution-fits-all world: Linux cluster advocates, as in:
- You don't need that mainframe, use a Linux cluster instead. (Ignore IO requirements)
- You don't need that supercomputer, use a Linux cluster instead. (Ignore CPU/memory latency)
- You don't need that midrange box, use a Linux cluster instead. (Ignore big memory requirements)
For way too many people who should have at least an inkling of a clue, if not in fact know better, Linux clusters are a knee-jerk answer to almost any problem, regardless of whether they are suitable or not. What makes it even more endearing is the arrogant attitude of "you must be stupid if you aren't using the only solution I know."
Um, John Deere cheap? Are you looking at the same stores I'm looking at? I do know farmers who swear by both. I know a lot of people who own green tractors, a lot who own red tractors, and not a few who own white tractors; of course, the really fancy tractors are the yellow ones. All these colored paints seem to be expensive, though.
Never mind me, I forgot the point of this rant.
Every other machine in the top 10 is built from standard processors. The old DEC Alpha, PowerPCs, and IA-32 predominate, with a few Itanium machines.
Because supercomputers today have several thousand processors, they can't even be big shared-memory multiprocessors. Speed of light lag in the interconnects would slow everything down. It just takes too long for the signals to make it across the room.
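Rough arithmetic for the speed-of-light point, as a Python sketch. The 30 m room, the 2 GHz clock and the 0.66 velocity factor for signals in cable are all assumptions chosen for illustration, and switch or router delays are ignored; `one_way_delay_cycles` is a hypothetical helper.

```python
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # metres per nanosecond in vacuum


def one_way_delay_cycles(distance_m: float, clock_ghz: float, velocity_factor: float = 1.0) -> float:
    """Clock cycles that elapse while a signal crosses distance_m.

    velocity_factor is the signal speed as a fraction of c (1.0 means vacuum).
    """
    delay_ns = distance_m / (SPEED_OF_LIGHT_M_PER_NS * velocity_factor)
    return delay_ns * clock_ghz  # a 1 GHz clock ticks once per nanosecond


# Hypothetical machine room: 30 m across, 2 GHz processors.
print(f"vacuum:   {one_way_delay_cycles(30, 2.0):.0f} cycles one way")
print(f"in cable: {one_way_delay_cycles(30, 2.0, velocity_factor=0.66):.0f} cycles one way")
```

Even in a vacuum a signal needs about 100 ns to cross 30 m, roughly 200 cycles at 2 GHz, and a request that needs a reply pays that twice.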
So all supercomputers today are clusters of one kind or another, fast machines with slower interconnects between them. The hardware architecture revolves around interconnect schemes. The software architecture revolves around working around the limitations of the interconnect schemes. Tightly coupled problems don't map well to such machines.
Bear in mind that we're talking about clusters of uniform machines located near each other with gigabit or better interconnects. We're not talking about "clusters" consisting of spare-time programs out at the end of Internet connections. Those are useful only for problems with almost no coupling between parts. Such problems are usually low hit rate search problems, like cryptanalysis, SETI@HOME, and such.
Yes, there's the Cray X1, the last of the liquid-cooled monsters, but it looks like the only customers who bought one were Government agencies with old Cray machines.
Most readers have the right idea - you don't listen to a competitor's opinion when judging whether something is viable or not. It is very easy to twist the words to be "true" while misleading.
A cluster isn't a supercomputer, by definition, but for many jobs can be equal or better. In other words: Those 2 oxen cost more, consume more resources, are only useful for the one job (pulling a plow) and only benefit a single owner. Those 1024 chickens cost less, consume less resource, are useful for many jobs besides the one (including laying eggs) and benefit their many owners.