Cray CTO Says Cray Computers Are Great

Posted by michael on Friday August 20, 2004 @03:06AM from the couldn't-be-any-other-way dept.

Jan Stafford writes "Linux clusters can not offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."

18 of 338 comments

Imagine... by Rosco+P.+Coltrane · 2004-08-20 03:07 · Score: 5, Funny

no nevermind.

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
The issues are progress and long-term usefulness by Space+cowboy · 2004-08-20 03:07 · Score: 5, Informative

Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics :-) are perfectly capable of doing what supercomputers do today. Of course, there'll be new really-super computers then, but that's a different story :-)

It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...

I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap, speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that IT'S CHEAP...

Simon.

--
Physicists get Hadrons!
NO WAY! by FortKnox · 2004-08-20 03:07 · Score: 5, Funny

The CTO from Cray said Crays are great machines and are priced competitively!

Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!

--
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
1. Re:NO WAY! by gcaseye6677 · 2004-08-20 03:19 · Score: 5, Insightful
  
  In other news, a Porsche performs better than a Ford Focus. Nevermind the 'slight' price difference.
Re:*Shock* by Nos. · 2004-08-20 03:09 · Score: 5, Insightful

The thing is, makers of big supercomputers are scared of clustering technology. Look at Google. A large cluster, and if one of the machines dies, you don't worry about it. Every once in a while you go and replace those that died. If only a small portion die, you haven't seriously impacted your production. However, if your supercomputer goes down... well, you're screwed. 1000 machines are more reliable than 1 big machine.
Linux vs. linux by Anonymous Coward · 2004-08-20 03:10 · Score: 5, Funny

Is MS somehow involved? Who am I supposed to hate? Editors?
Re:*Shock* by Anonymous Coward · 2004-08-20 03:10 · Score: 5, Informative

No, no, you misunderstand.
He's saying that linux-based *supercomputers* are faster than linux-based *clusters*.
(although, you can probably cluster those supercomputers...)
Re:*Shock* by ohad_l · 2004-08-20 03:13 · Score: 5, Informative

Uhh, no, he's not dissing Linux at all. He's saying that one big supercomputer (running Linux, perhaps) will get you more price-performance (bang per buck, I guess) than a Linux cluster.

--
If it weren't for fog, the world would run at a really crappy framerate.
Re:*Shock* by krog · 2004-08-20 03:14 · Score: 5, Insightful

Dude, the makers of "big supercomputers" invented clustering. I don't think they're afraid of it.

There are tasks that a cluster of Linux shitboxen will do well, and tasks where the cluster will not hold up so well against a real supercomputer. Google is an example of a perfect application for networked Linux servers. If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.

--
Cretin - a powerful and flexible CD reencoder
Re:*Shock* by Anonymous Coward · 2004-08-20 03:15 · Score: 5, Insightful

FUD = Fear, Uncertainty, Doubt. Provide examples in his statements of any of those three?

P.S. You are so l33t for using TT.
Re:The issues are progress and long-term usefulnes by PythonCodr · 2004-08-20 03:22 · Score: 5, Informative

It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.

His point is building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE nics at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will quickly show negative scaling in fact (where the more processes you add, the *slower* your code will run.) MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and different sizes on different machines.

Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)
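
A rough way to see both points above (latency dominating small messages, and negative scaling) is a toy cost model. This is only a sketch: the 50 microsecond latency, the 125 MB/s bandwidth, the all-to-all exchange pattern, and every function name here are illustrative assumptions, not measurements of any real interconnect or code.

```python
def message_time(size_bytes, latency_s=50e-6, bandwidth_bytes_per_s=125e6):
    """Time for one message: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def run_time(procs, work_s=1000.0, steps=1000, msg_bytes=64):
    """Toy model: compute time shrinks with procs, communication grows with procs."""
    compute = work_s / procs
    # Assumed pattern: at every step each process exchanges one small
    # message with every other process, one after another.
    communicate = steps * (procs - 1) * message_time(msg_bytes)
    return compute + communicate


if __name__ == "__main__":
    small = message_time(64)
    print(f"64-byte message: {small * 1e6:.1f} us, latency share {50e-6 / small:.0%}")
    for procs in (16, 64, 128, 256, 1024):
        print(f"{procs:5d} processes: {run_time(procs):8.2f} s")
```

Under these assumed numbers the modelled run time falls until somewhere past a hundred processes and then climbs again, which is the negative scaling described above. A lower-latency interconnect pushes that turning point further out.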
No ... by gstoddart · 2004-08-20 03:22 · Score: 5, Informative

There are entire classes of computational problems which are classed as Embarrassingly Parallel.

It means it is so trivial to parallelize the problem and get gains from it (think SETI@Home) that it's a no-brainer.

Other computational problems don't just simply fan out to the bazillions of nodes with tiny independent pieces of data.
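
As a small illustration of what "embarrassingly parallel" looks like in practice, here is a sketch that counts primes by splitting the range into chunks that never need to talk to each other. The function names and the choice of problem are assumptions made for the example. It uses threads only to keep the sketch short and portable; in CPython threads will not speed up this CPU-bound loop, so the point is the structure (independent pieces, one final sum), not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; fine for a small illustration."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi). Needs no other chunk's data."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def parallel_count(limit, chunks, workers=4):
    """Split [0, limit) into independent chunks, count each, then add up."""
    step = (limit + chunks - 1) // chunks  # ceiling division
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, ranges))


if __name__ == "__main__":
    print(parallel_count(100, 4))
```

The only communication is the final sum. Problems where each chunk needs its neighbours' results at every step do not decompose this way, and that is where the interconnect starts to matter.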

Your assertion that the Cray CTO is talking FUD when he uses the actual term is just plain wrong and unfair to him. He actually knows what he's talking about.

--
Lost at C:>. Found at C.
Re:The issues are progress and long-term usefulnes by ctr2sprt · 2004-08-20 03:23 · Score: 5, Insightful

You're right, the key is "cheap." Clusters don't offer the same level of performance as supercomputers. I don't think you'd disagree with that statement. What they do is offer a similar level of performance - once unattainable by desktops or even high-end servers, and here I mean real high-end servers instead of just quad Opterons or the like - for probably a tenth the cost.
But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.
The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.
I really think this is what Cray means here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.
(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)
Not quite so simple really is it? by Anonymous Coward · 2004-08-20 03:27 · Score: 5, Informative

I don't think the Cray assertion is that crazy.

For a 12-CPU Opteron unit the academic pricing (admittedly lower than commercial but where most of their sales will go) is about 45K. That's not too shabby. Before you bounce up and down and say I can build four times the cluster for that price, it should be noted that the XD1 gives you a single system image, which simplifies programming and makes shared memory applications (increasingly important for areas such as bioinformatics) easier.

We have a cluster with dolphinics wulfkit; using distributed shared memory slows us down. It's not an end-of-the-world slowdown, but it's a factor. Our cluster is a sixteen-node, dual Xeon 2.2GHz with wulfkit 3D torus interconnects. It cost us, at academic prices, $50K. Admittedly that's more CPU power than the 12 Opterons, but we find ourselves using distributed shared memory a lot (wulfkit is great here), and that would probably be much better on the XD1. Had the XD1 been available a year ago we might have bought one instead.

It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.
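
The "depends on your application" point can be put as back-of-envelope arithmetic using the two prices quoted above. This is a sketch under loud assumptions: it reads "45K" as dollars, treats an Opteron and a Xeon as equal per CPU, and folds everything application-specific into a single made-up efficiency factor. The function names are hypothetical.

```python
def cost_per_effective_cpu(price, cpus, efficiency=1.0):
    """Price divided by the CPUs the application can actually keep busy."""
    return price / (cpus * efficiency)


def break_even_efficiency(price_a, cpus_a, price_b, cpus_b):
    """Efficiency of system B, relative to A, at which cost per useful CPU is equal."""
    return (price_b / cpus_b) / (price_a / cpus_a)


if __name__ == "__main__":
    xd1 = cost_per_effective_cpu(45_000, 12)       # 12-CPU XD1 at the quoted price
    cluster = cost_per_effective_cpu(50_000, 32)   # 16 dual-CPU nodes
    print(f"XD1:     ${xd1:,.2f} per CPU")
    print(f"Cluster: ${cluster:,.2f} per CPU")
    print(f"Break-even cluster efficiency: {break_even_efficiency(45_000, 12, 50_000, 32):.1%}")
```

On these figures the cluster is cheaper per CPU on paper, and stays cheaper per useful CPU unless the application runs at well under half the per-CPU efficiency it would get on the XD1. Whether a given shared-memory code falls above or below that line is exactly the application-dependent part.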

Also, buying Cray is about getting access to their software technology too.

R-S
Re:Clusters don't scale, huh? by argent · 2004-08-20 03:33 · Score: 5, Insightful

for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false

It would be if he'd said it, so it's a good thing he didn't. He even commented that there are applications (embarrassingly parallel algorithms) that clusters do very well at. And Google is a perfect example of that.
Re:The issues are progress and long-term usefulnes by Wesley+Felter · 2004-08-20 03:35 · Score: 5, Informative

Good clusters don't use IP; they use Infiniband, Myrinet, or Quadrics, which all have OS bypass and transport offload features so that the app can talk directly to the NIC. In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.
Re:*Shock* by ranrub · 2004-08-20 03:35 · Score: 5, Informative

Have you ever worked with supercomputers?

However, if your supercomputer goes down... well, you're screwed

Cray supercomputers have built-in redundancies. All the subsystems are separate from the processors and memory, which are actually "clustered" (depends on model). Even the OS has built-in means to survive the harshest hardware catastrophe by checkpointing the running jobs regularly, to off-site disks.

1000 machines are more reliable than 1 big machine

Wrong again. With 1000 lousy cheap machines, you need an on-site team of technicians to keep them all up. Supercomputers (with built-in redundancy etc.) have equal or less maintenance requirements.
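
The reliability disagreement can be made concrete with a little probability. This is a sketch that assumes node failures are independent and uses an invented failure rate of 0.1% per node per day; neither assumption comes from the thread, and the function names are hypothetical.

```python
def p_any_failure(nodes, p_fail):
    """Probability that at least one of `nodes` independent machines fails."""
    return 1.0 - (1.0 - p_fail) ** nodes


def expected_failures(nodes, p_fail):
    """Average number of failed machines over the same period."""
    return nodes * p_fail


if __name__ == "__main__":
    nodes, p_fail = 1000, 0.001  # assumed: 0.1% chance per node per day
    print(f"P(at least one node down today): {p_any_failure(nodes, p_fail):.1%}")
    print(f"Expected nodes down per day:     {expected_failures(nodes, p_fail):.1f}")
```

Under those assumptions something in a 1000-node cluster breaks on most days, which supports the maintenance argument. Whether that hurts depends on the workload: a search farm can shrug off a dead node, while a tightly coupled job that needs every process alive cannot.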
It ain't religion. by Performer+Guy · 2004-08-20 03:51 · Score: 5, Insightful

It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.

Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.

It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.

It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.