Cray CTO Says Cray Computers Are Great
Jan Stafford writes "Linux clusters can not offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."
The thing is, makers of big supercomputers are scared of clustering technology. Look at Google. It's a large cluster, and if one of the machines dies, you don't worry about it. Every once in a while you go and replace those that died. If only a small portion die, you haven't seriously impacted your production. However, if your supercomputer goes down... well, you're screwed. 1000 machines are more reliable than 1 big machine.
The difference is that Linux clusters aren't really designed for supercomputing... more for distributed computing. Cray specializes in it. Of course they're going to come out on top....
Dude, the makers of "big supercomputers" invented clustering. I don't think they're afraid of it.
There are tasks that a cluster of Linux shitboxen will do well, and tasks where the cluster will not hold up so well against a real supercomputer. Google is an example of a perfect application for networked Linux servers. If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.
Cretin - a powerful and flexible CD reencoder
FUD = Fear, Uncertainty, Doubt. Can you provide examples of any of those three in his statements?
P.S. You are so l33t for using TT.
In other news, a Porsche performs better than a Ford Focus. Nevermind the 'slight' price difference.
But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.
The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.
I really think this is what Cray mean here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.
(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)
Well, supercomputing can be either of two issues
a) (google-like) jobs well suited to a high degree of parallel processing.
b) complicated problems that can't easily be broken down to make use of a large number of CPUs, but require a lot of operations to be completed in the proper sequence.
On the first, a cluster is a great idea.
On the second, a reaaaaaallly fast CPU is a great idea.
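The split between (a) and (b) is roughly what Amdahl's law describes: if a fraction p of a job can run in parallel, n processors give a speedup of at most 1 / ((1 - p) + p / n). A minimal Python sketch, with the function name and the sample fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Upper bound on speedup when only part of a job parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)

# Illustrative fractions, not measurements of any real workload.
for p in (0.5, 0.95, 0.999):
    print(p, round(amdahl_speedup(p, 1024), 1))
```

With half the work stuck in sequence, 1,024 nodes never get past a 2x speedup, which is the case (b) argument for one really fast CPU.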
If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.
In this case the right tool is a vector based supercomputer like the SV1 (8 vector processors at 2Gflops each . . . MMmmmmmmm). A cluster based approach will waste more processing time with the message passing than anything else. Cheaper maybe, but grossly inefficient.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false
It would be if he'd said it, so it's a good thing he didn't. He even commented that there are applications (embarrassingly parallel algorithms) that clusters do very well at. And Google is a perfect example of that.
That is, for a Linux cluster to keep up with a supercomputer, the cluster needs faster communications between processors. The bottleneck of going from processor to South Bridge to PCI Bus to Ethernet card, and back again at another processor, is the problem.
So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!
Clusters are nice for some problems, but message passing and memory copying over a network is not ideal even when you have what *you* think is a lot of bandwidth. Latency, cache coherency, and having a single image system can be critical factors in some classes of supercomputing problem, not to mention ease of use and the specialized FP vector instructions that are often supported. The topology in large systems is often built (flexibly) into the memory controller hardware: the CPU writes to memory and the write finds the right node. Page migration and process affinity, along with other advanced features like hardware-level cache coherency, help these systems outperform clusters with ease given the right problems.
The coolest thing about this IMHO is that Cray are using Linux for their single image systems.
Yep, the performance of computers is always on the increase, but there will always be demand for more compute. The question is where you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems with increasing detail and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can for their hardware, and as hardware improves they redefine the types of problem they want to solve.
Yup clusters are cheap and they're on the top 500 but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems, the list is just for bragging rights.
Networked clusters are useful only when the task is parallelizable, and each subtask is computable on a single node. Cloud physics is not like that. Cracking RC5, for instance, is.
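Key cracking parallelizes so well because the keyspace can be cut into ranges that never need to talk to each other. A minimal sketch in Python, assuming a hypothetical split_keyspace helper and a toy 16-bit keyspace rather than RC5's real one:

```python
def split_keyspace(total_keys, n_nodes):
    """Return one half-open (start, stop) key range per node."""
    base, extra = divmod(total_keys, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional key each.
        stop = start + base + (1 if node < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

# Toy example: 2**16 keys over 10 nodes.
for node, (start, stop) in enumerate(split_keyspace(2**16, 10)):
    print(node, start, stop)
```

Each node can grind through its own range and only report back if it finds the key. There is no equivalent clean cut for a simulation where every cell depends on its neighbours at every time step.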
MS says their operating system is great. McDonald's says their food is great *and* cheap.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.
Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.
It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.
It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.
They don't scale for applications that require shared memory access.
..offtopic..
Something like SETI@home could scale almost infinitely. The data elements are completely unrelated.
But if every node needed access to the same chunk of data, then the more nodes you add, the more they "fight" over that chunk of data.
Ultimately, with a PC cluster solution, only one node at a time can be accessing any given section of "shared" memory.
That's what he means, and he's right.
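A toy model makes the "fighting" concrete. Assume, purely for illustration, that each node's share of the compute shrinks as work / n, while accesses to the shared chunk are serialized and so grow with n:

```python
def run_time(n_nodes, work=1000.0, shared_access=1.0):
    """Toy model: parallel compute plus serialized shared-memory access.

    work and shared_access are made-up time units, not measurements.
    """
    compute = work / n_nodes              # splits evenly across nodes
    contention = shared_access * n_nodes  # one node at a time
    return compute + contention

for n in (1, 8, 32, 128, 1024):
    print(n, round(run_time(n), 1))
```

In this model the total time bottoms out at around 32 nodes and gets worse after that. With shared_access set to 0 (the SETI@home case) it just keeps dropping as nodes are added.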
Look at the slashbots who can't understand the article throwing a fit because of a perceived "diss" against Linux. This place really makes me laugh sometimes. Hell, Cray's new gear is using Linux. Cray is a card-carrying Linux-loving company, and has been for quite a while.
And Cray's got some friggin crazy tech. I can't wait to see what they have to kick back into the kernel.
I don't need no instructions to know how to rock!!!!
That's a nice theory, but Cray's XD1 "supercomputer" uses the same Mellanox switch chips as some "clusters". Cray is splitting hairs to justify their product.
(BTW, I get 100 us ping time on my GigE network, but you're right that that's still 100x too slow for HPC.)
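To see why latency rather than bandwidth is the killer, here is a back-of-the-envelope sketch in Python. It treats the 100 us figure above as the per-message latency and assumes a nominal 1 Gbit/s link; the 1 us comparison is just the "100x" from the comment, not a spec for any particular interconnect.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbit):
    """Time to deliver one message: fixed latency plus wire time."""
    # 1 Gbit/s is 1000 bits per microsecond.
    wire_time_us = message_bytes * 8 / (bandwidth_gbit * 1000.0)
    return latency_us + wire_time_us

# A small 64-byte message at the two latencies being compared.
for latency in (100.0, 1.0):
    print(latency, transfer_time_us(64, latency, 1.0))
```

For a 64-byte message the wire time is about half a microsecond, so almost the entire cost is latency. A tightly coupled code that swaps many small messages per step pays that cost every time.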
Cray makes at least two types of supercomputers according to their SEC forms. These include massively parallel clusters and vector-based supercomputers. In general, massively parallel clusters are less expensive for the number of calculations per second than the vector-based supercomputers. However, for many applications, the vector-based supercomputers will massively outperform the clusters.
Cray's competitors in the cluster markets include IBM, and their main competitor in the vector-based market is NEC.
I remember reading an article about how the US is losing the supercomputer technology war. But this criticism is best directed at companies other than Cray who are pushing cluster-based solutions to the exclusion of others. It is true, however, that the only company I am aware of in the US which markets these supercomputers is Cray.
LedgerSMB: Open source Accounting/ERP
If your goal is to run simulations where each piece of the simulation depends on a large subset of the other pieces, then you will need ridiculous interconnect speeds, and you're likely to end up with something you could have bought from Cray or SGI or some of the other remaining supercomputer manufacturers for a fraction of the price.
Luckily for you and the rest of us many problems can be split into relatively independent pieces, in which case a Beowulf cluster or similar is more than adequate.
If you seriously believe that clusters can compete with supercomputers for every type of problem, you need to think again.
Of course, it really does depend on the problem you're facing. Most people who pay for results, though, want results as fast as possible, and that's why supercomputers win for problems that aren't "embarrassingly parallel".
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)
In a way he's right. Reading the whole article, it seems apparent that he's talking about certain high performance applications. Clusters are not always the best way to solve a problem. For problems that can be broken down into small independent tasks like SETI, clusters are a good solution. Clusters do have their optimization challenges with latency, bottlenecks, etc. For simulations where the tasks are dependent on each other, these bottlenecks add up. The individual nodes spend as much time communicating with each other as they do computing. There are also problems that cannot be distributed. In these cases clusters are not the right solution and it may not be cost effective to use a cluster.
Well, there's spam egg sausage and spam, that's not got much spam in it.
He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....
No, he's right in the way he intended.
He just leaves out a lot of information. The business environment determines what is or is not expensive. The computational environment determines what will or will not run fast. Together, the two give a measure of how expensive something is.
If you are crunching a big continuous stream of numbers with multiple small results which are then looped in and crunched more (think major statistics, math, language interpretation etc...), these might be quicker on a single machine. If you are in an environment where (time=money)^2 (think casinos, trading floors, JIT manufacturing etc...), the lag of shared resources becomes MORE expensive than the single Cray. However, under hard analysis that statement is actually a no-brainer, and he is hoping that no one will question it enough to notice that for the other 85% (best guess statistic, blast it if you want to) of applications the cluster will offer considerable savings.
He's also hoping that no one will go down the obvious road that someone who has worked on the major 85% of apps will. Which I am guessing (no offence) is what happened with you. If you read it too critically, his statements don't seem to hold any water. However, they do; it's just the amount of water that is questionable.
Both clusters and big iron have their place. I am a meteorology professor and my current research involves high-resolution numerical modeling of thunderstorms. For a problem where the domain decomposition is straightforward and internode communication isn't your bottleneck, clusters are great. One huge advantage of clusters is that they are cheap and it isn't too big of a deal to get a grant together to buy the hardware, and it's YOURS and nobody else's. A huge disadvantage to big iron is that you have to share it with about a hundred other researchers. Waiting in a queue for three days only to find you goofed up in your startup script (and the model exits immediately) is NO FUN (cf the Regatta at NCSA).
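One way to see when domain decomposition stays cheap is the surface-to-volume ratio of each node's piece of the grid. The sketch below is a simplification, assuming a 3D grid cut into cubes with a one-cell-deep halo exchanged across each face; it is not taken from any particular model code.

```python
def comm_to_compute_ratio(side):
    """Halo cells exchanged per interior cell for a cubic subdomain.

    Assumes a 3D grid split into cubes of side**3 cells, with a
    one-cell-deep halo swapped across each of the 6 faces.
    """
    compute_cells = side ** 3
    halo_cells = 6 * side ** 2
    return halo_cells / compute_cells

for side in (8, 32, 128):
    print(side, round(comm_to_compute_ratio(side), 3))
```

The ratio works out to 6 / side, so large subdomains per node keep communication small relative to compute. Cut the same grid across many more nodes and the halo traffic starts to matter.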
I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OMP but OMP sucks and is no substitute for code which is written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.
In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
Because clusters are cheaper, per raw unit power.
But if the supercomputer is more efficient per raw unit of power, then the price per unit doesn't matter.
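Put another way, what matters is the price per unit of work the application actually gets done, not the price per peak unit. A small Python sketch with made-up prices and efficiencies, only to show the arithmetic:

```python
def cost_per_sustained_gflop(price, peak_gflops, efficiency):
    """Price divided by the performance the application actually gets."""
    return price / (peak_gflops * efficiency)

# Made-up numbers for illustration only, not real prices or benchmarks.
cluster = cost_per_sustained_gflop(price=100_000, peak_gflops=1000, efficiency=0.05)
big_iron = cost_per_sustained_gflop(price=500_000, peak_gflops=1000, efficiency=0.50)
print(cluster, big_iron)
```

With these invented figures the cluster is five times cheaper per peak Gflop but twice as expensive per sustained Gflop. Flip the efficiencies for an embarrassingly parallel job and the cluster wins easily.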
I work with HPC for a living, both with clusters and with large SMP machines. The cluster is nice, but there are some things that can _only_ be run on a large SMP machine or are much, much faster on an SMP.
You and him, you're saying the same thing, you're spinning it your own way, but the actual content is the same. So why are you describing his as FUD?
Many posts have pointed out the fact that supercomputers are better for certain jobs that are not suited to clustered solutions (and vice versa).
Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.
So what the original article is, is a message from one executive to other executives trying to clarify the situation. Basically saying "hey, just because Wired ran a story that says linux clusters are the next best thing since sliced bread, doesn't mean that this is the best solution for you. Now, let us talk about what you need."
I see nothing wrong with this. I read the article, and found nothing in it that was false.
It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs because of something said exec read in Scientific American.
Welcome to corporate america boys and girls.
(Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about Linux clusters. Both are fine publications.)
Did you buy a Neuros today?
That's all very nice, but have you ever heard of latency?