Cray CTO Says Cray Computers Are Great

Theyyyyyy'rrrrrre Great! by ackthpt · 2004-08-20 03:07 · Score: 3, Funny

I thought that was Tony the Tiger.

I wonder how Cray computers are in milk...

--

A feeling of having made the same mistake before: Deja Foobar

Imagine... by Rosco P. Coltrane · 2004-08-20 03:07 · Score: 5, Funny

no nevermind.

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash

The issues are progress and long-term usefulness by Space cowboy · 2004-08-20 03:07 · Score: 5, Informative

Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics :-) are perfectly capable of doing what supercomputers do today. Of course, there'll be new really-super computers then, but that's a different story :-)

It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
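For anyone wondering what "hypercube formation" means in wiring terms, here is a minimal sketch. It assumes nodes are numbered 0 to 2**dims - 1 and that each node has one network card per dimension; the function names are made up for illustration.

```python
def hypercube_neighbors(node, dims):
    """Nodes directly wired to `node` in a dims-dimensional hypercube.

    Node IDs are integers 0 .. 2**dims - 1; two nodes share a link
    exactly when their IDs differ in a single bit.
    """
    return [node ^ (1 << bit) for bit in range(dims)]


def hops(a, b):
    """Minimum number of links between nodes a and b (count of differing bits)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    dims = 3  # 8 nodes, 3 network cards per node
    print(hypercube_neighbors(0, dims))
    print(hops(0, 7))
```

The trade-off is that doubling the node count costs every node one more card, while the worst-case hop count grows by only one.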

I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap, speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that IT'S CHEAP...

Simon.

--
Physicists get Hadrons!

NO WAY! by FortKnox · 2004-08-20 03:07 · Score: 5, Funny

The CTO from Cray said Crays are great machines and are priced competitively!

Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!

--
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!

Re:NO WAY! by Rosco P. Coltrane · 2004-08-20 03:09 · Score: 2, Funny

Apparently, since Cray uses Linux in clusters now, I'm sure SCO thinks Cray machines would be even greater if they cost $699 more per node...

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash

Re:NO WAY! by gcaseye6677 · 2004-08-20 03:19 · Score: 5, Insightful

In other news, a Porsche performs better than a Ford Focus. Never mind the 'slight' price difference.

How about... by OxygenPenguin · 2004-08-20 03:09 · Score: 3, Funny

a Linux cluster of Crays?

--
Read the only personal Runyon page out there.

Re:*Shock* by Nos. · 2004-08-20 03:09 · Score: 5, Insightful

The thing is, makers of big supercomputers are scared of clustering technology. Look at Google. A large cluster, and if one of the machines dies, you don't worry about it. Every once in a while you go and replace those that died. If only a small portion die, you haven't seriously impacted your production. However, if your supercomputer goes down... well, you're screwed. 1000 machines are more reliable than 1 big machine.

Linux vs. linux by Anonymous Coward · 2004-08-20 03:10 · Score: 5, Funny

Is MS somehow involved? Who am I supposed to hate? Editors?

Re:*Shock* by Anonymous Coward · 2004-08-20 03:10 · Score: 5, Informative

No, no, you misunderstand.
He's saying that linux-based *supercomputers* are faster than linux-based *clusters*.
(although, you can probably cluster those supercomputers...)

Re:*Shock* by Crazy_MYKL · 2004-08-20 03:11 · Score: 2, Informative

No, he's saying you should buy their Linux-based supercomputer instead of a Linux cluster. If you don't RTFA, at least skim the summary.

--

<jedi> There is something funny here. You laugh. </jedi>

The difference by rwven · 2004-08-20 03:11 · Score: 4, Insightful

The difference is that linux clusters aren't really designed for supercomputing... more of distributed computing. Cray specializes in it. Of course they're going to come out on top....

Re:The difference by trifakir · 2004-08-20 03:27 · Score: 2, Informative

With high-speed interconnects (i.e. infiniband/myrinet), it is very feasible.
Hm, I haven't played with infiniband, but I have access to a small Myrinet cluster and it takes a hell of a lot of effort to write your application in such a way as to overcome the big disparity between CPU power and network throughput and to have some normal speed-up.
Paul Terry is right - if they remove the PCI bottleneck it will be much easier to write scalable high-performance applications and then the costs will decrease.

editor training by Knights who say 'INT · 2004-08-20 03:11 · Score: 2, Interesting

You really shouldn't place commentary on a story title, unless it's an "it's funny, laugh" one.

Oh, by the way, everyone who has a slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.

Re:editor training by drooling-dog · 2004-08-20 03:41 · Score: 2, Funny

I just have to ask, why would you want to print a /. discussion?
Well, if I make a particularly witty comment, of course I'd like to frame it and hang it on the wall behind my desk...

Re:*Shock* by ohad_l · 2004-08-20 03:13 · Score: 5, Informative

Uhh, no, he's not dissing Linux at all. He's saying that one big supercomputer (running Linux, perhaps) will get you more price-performance (bang per buck, I guess) than a Linux cluster.

--
If it weren't for fog, the world would run at a really crappy framerate.

Re:*Shock* by krog · 2004-08-20 03:14 · Score: 5, Insightful

Dude, the makers of "big supercomputers" invented clustering. I don't think they're afraid of it.

There are tasks that a cluster of Linux shitboxen will do well, and tasks where the cluster will not hold up so well against a real supercomputer. Google is an example of a perfect application for networked Linux servers. If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.

--
Cretin - a powerful and flexible CD reencoder

A better angle would have been... by Linker3000 · 2004-08-20 03:14 · Score: 4, Funny

...Your square boxes will never look as sexy as our 'Love Seat'

--
AT&ROFLMAO

Re:*Shock* by Anonymous Coward · 2004-08-20 03:15 · Score: 5, Insightful

FUD = Fear, Uncertainty, Doubt. Provide examples in his statements of any of those three?

P.S. You are so l33t for using TT.

Re:The issues are progress and long-term usefulnes by Marx_Mrvelous · 2004-08-20 03:15 · Score: 4, Interesting

There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I highly doubt it is), you still have a logical layer on top of it (TCP/IP/UDP etc). This slows it down.

For some applications, a cluster of slow PCs is ok. But if you want to do real time-intensive computation, you really can't beat a good internal bus.

--

Moderation: Put your hand inside the puppet head!

Dupe! by Xpilot · 2004-08-20 03:16 · Score: 4, Informative

Yeah, no wonder this post looked familiar. Yup, it's a dupe, folks.

--
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds

Re:Dupe! by glenrm · 2004-08-20 03:29 · Score: 4, Funny

Cray must have bought the two-posting ad package.

--
Onward to the Aether Sphere!

Maybe "APPLE" will buy another Cray! by callipygian-showsyst · 2004-08-20 03:17 · Score: 2, Interesting

Remember when Apple bought a Cray? It was mostly for show, so their R&D group could have the blinkenlights.

However it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.)

And now for the REST of the story:

Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn EPROMs or design printed circuit boards that runs on MacOS, so the hardware group has a bunch of Windows PCs!

So now you know the *rest* of the story!

--
Best Buy can have you arrested

Re:Maybe "APPLE" will buy another Cray! by Thagg · 2004-08-20 04:00 · Score: 4, Interesting

As usual, there is more to the story. Apple brought my company in on a project back in the mid 80's when they bought the Cray. While we had to sign an NDA in blood, I doubt anybody will mind me talking about it now, almost 20 years later.

Apple was trying to design a new cpu chip. It would have had vector processing capabilities not all that different from the Cray, so they bought the Cray both to do circuit simulations on the chip and as a model for their own design.

The chip was going to be a 100 MHz chip (an astonishing speed for the time) with a four-pipeline vector processing unit.

They considered hiring us (but eventually declined) to develop some kind of 3D desktop for the Mac. The idea was that this would distinguish the Mac further from other computing systems, which wouldn't be able to emulate the interface because they didn't have the horsepower.

Anyway, that's the Apple-Cray story as I understand it. I'm sure that there is a lot more to the story than I know, of course.

Thad Beier

--
I love Mondays. On a Monday, anything is possible.

Re:The issues are progress and long-term usefulnes by vondo · 2004-08-20 03:18 · Score: 4, Informative

The latency on Ethernet is too high for many tightly coupled applications (lattice QCD for example). This is why people who need better networking use something like Myrinet. I would assume that these Cray machines have very high bandwidth, low-latency communications. This is where super-computers distinguish themselves from clusters.

Re:Unfuglify by AndroSyn · 2004-08-20 03:20 · Score: 2, Informative

Viola, un-fuglied version.

Not to nitpick but a Viola is a string instrument in the violin family, the word you want is voilà.

Re:The issues are progress and long-term usefulnes by PythonCodr · 2004-08-20 03:22 · Score: 5, Informative

It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.

His point is building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE nics at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will quickly show negative scaling in fact (where the more processes you add, the *slower* your code will run.) MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and different sizes on different machines.
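A toy model of how negative scaling can arise. This is only a sketch with made-up constants, not a measurement of any real MPI code: it assumes the compute part divides evenly across processes while the communication cost grows linearly with the process count.

```python
def run_time(procs, work=1000.0, comm_per_proc=0.5):
    """Toy run time: compute shrinks as work/procs, communication grows with procs.

    `work` and `comm_per_proc` are arbitrary illustrative constants.
    """
    return work / procs + comm_per_proc * procs


def best_proc_count(max_procs, **model):
    """Process count in 1..max_procs that minimises the toy run time."""
    return min(range(1, max_procs + 1), key=lambda p: run_time(p, **model))


if __name__ == "__main__":
    for p in (1, 16, 64, 256, 1024):
        print(p, round(run_time(p), 1))
    print("sweet spot:", best_proc_count(1024))
```

In this model, past the sweet spot every added process makes the job slower, which is why you would run at a size smaller than the whole machine. A different interconnect changes `comm_per_proc` and so moves the sweet spot, which would explain different sizes on different machines.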

Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)

No ... by gstoddart · 2004-08-20 03:22 · Score: 5, Informative

There are entire classes of computational problems which are classed as Embarrassingly Parallel.

It means it is so trivial to parallelize the problem and get gains from it (think SETI@Home) that it's a no-brainer.

Other computational problems don't just simply fan out to the bazillions of nodes with tiny independent pieces of data.

Your assertion that the Cray CTO is talking FUD when he uses the actual term is just plain wrong and unfair to him. He actually knows what he's talking about.

--
Lost at C:>. Found at C.

Clusters don't scale, huh? by FyRE666 · 2004-08-20 03:22 · Score: 2, Informative

Scaling or upgrading these systems requires much more than simply ordering more parts; it opens up the whole integration exercise. From an application perspective, clusters limit application scaling. Bandwidth and latency restrictions significantly constrain performance as more processors are applied to a problem.

Has this guy ever heard of Google? I can see his point to an extent; in fact his whole q&a session/blatant advert really boiled down to a single point: If you need to move a lot of data between processors, then a cluster will fare worse than one of Cray's supercomputers which have (obviously) more bandwidth between the CPUs and shared memory. It really does depend on the application, but for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false...

--
Code, Hardware, stuff like that.

Re:Clusters don't scale, huh? by argent · 2004-08-20 03:33 · Score: 5, Insightful

for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false

It would be if he'd said it, so it's a good thing he didn't. He even commented that there are applications (embarrassingly parallel algorithms) that clusters do very well at. And Google is a perfect example of that.

Re:Yes he's talking FUD by CommieOverlord · 2004-08-20 03:23 · Score: 3, Informative

Are you being funny or serious?

There's an entire class of parallel applications which are labeled "embarrassingly parallel". This description simply means that such programs are trivially parallelized and achieve as close to linear speedup as possible when scaled across many nodes. This is because of the low inter-node communications.
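A minimal sketch of what that looks like in code. The work function is hypothetical, and a local thread pool stands in for cluster nodes, so this shows the shape of the problem rather than real speedup: each chunk is processed without reference to any other chunk, and the only communication is handing out inputs and collecting results.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    """Hypothetical work unit: the result depends only on this chunk's own data."""
    return sum(x * x for x in chunk)


def run_independent(chunks, workers=4):
    """Farm chunks out to workers; no worker ever talks to another worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_chunk, chunks))


# Eight independent chunks of 1000 numbers each.
chunks = [range(start, start + 1000) for start in range(0, 8000, 1000)]
results = run_independent(chunks)
```

Because no chunk needs another chunk's result, the order and placement of the work do not matter, and that is the property that lets this kind of job spread across cheap nodes.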

For "embarrassingly parallel" applications, a cluster is a really good tool. For programs that don't parallelize as nicely, a nice big vector or SMP machine will do nicely. Some code will run better on a small 20-CPU SMP machine than on a 1000-node cluster.

Re:The issues are progress and long-term usefulnes by ctr2sprt · 2004-08-20 03:23 · Score: 5, Insightful

You're right, the key is "cheap." Clusters don't offer the same level of performance as supercomputers. I don't think you'd disagree with that statement. What they do is offer a similar level of performance - once unattainable by desktops or even high-end servers, and here I mean real high-end servers instead of just quad Opterons or the like - for probably a tenth the cost.

But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.

The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.

I really think this is what Cray means here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.

(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)

Geez by iamdrscience · 2004-08-20 03:23 · Score: 4, Informative

Being the CTO of Cray, can you expect him to say anything less? Now while his points are often valid, I think his conclusion, that supercomputers outshine linux clusters, is a little inaccurate. Rather, I think the real conclusion is that linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends ultimately, on the specific details of your problem. Again, though, being the CTO of the company, can you really expect him to give a balanced opinion like that, rather than the skewed opinion that his company is always on top?

Cray is a great company, but I really hate that they have to come out with things like this every now and then. Most people in need of a lot of computing power already know the difference between your products and linux clusters and really, they're going to choose whichever's most appropriate for their problem regardless of what your CTO says.

Re:Geez by argent · 2004-08-20 03:38 · Score: 4, Informative

I think the real conclusion is that linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends ultimately, on the specific details of your problem

Indeed. He actually made that point himself: "There are some applications where a well-designed Linux cluster can deliver good price/performance on a particular application; those embarrassingly parallel applications where processors spend little time exchanging data."

Correction by Leomania · 2004-08-20 03:26 · Score: 2, Funny

Cray CTO Says Cray Computers Are Great

Actually, I think he said that "Cray computers rock, eh?" or perhaps it was "Cray computers kick ass, eh?" or something like that.

- Leo

--
You don't use science to show that you're right, you use science to become right.

Not quite so simple really is it? by Anonymous Coward · 2004-08-20 03:27 · Score: 5, Informative

I don't think the Cray assertion is that crazy.

For a 12 CPU opteron unit the academic pricing (admittedly lower than commercial but where most of their sales will go) is about 45K. That's not too shabby. Before you bounce up and down and say I can build four times the cluster for that price, it should be noted that the XD1 gives you a single systems image, which simplifies programming and makes shared memory applications easier (increasingly important for areas such as bioinformatics).

We have a cluster with dolphinics wulfkit; using distributed shared memory slows us down. It's not an end-of-the-world type of slowdown, but it's a factor. Our cluster is a sixteen node, dual xeon 2.2GHz with wulfkit 3d torus interconnects. It cost us, at academic prices, $50K. Admittedly more CPU power than the 12 Opterons but we find ourselves using distributed shared memory a lot, wulfkit is great here, and that would probably be much better on the XD1. Had the XD1 been available a year ago we may have bought one instead.

It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.

Also, buying Cray is about getting access to their software technology too.

R-S

Re:*Shock* by beh · 2004-08-20 03:28 · Score: 3, Insightful

Well, supercomputing can be either of two issues

a) (google-like) jobs well suited to a high degree of parallel processing.

b) complicated problems that can't easily be broken down to make use of a large number of CPUs, but require a lot of operations to be completed in the proper sequence.

On the first, a cluster is a great idea.
On the second, a reaaaaaallly fast CPU is a great idea.

Re:*Shock* by networkBoy · 2004-08-20 03:28 · Score: 4, Insightful

If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.

In this case the right tool is a vector based supercomputer like the SV1 (8 vector processors at 2Gflops each . . . MMmmmmmmm). A cluster based approach will waste more processing time with the message passing than anything else. Cheaper maybe, but grossly inefficient.
-nB

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump

Re:Whee! by the_mad_poster · 2004-08-20 03:34 · Score: 2, Informative

Hi, clueless Slashbot. This is a quick rundown of why your post was stupid, and why Cray supercomputers do, in fact, do some things better than a PC cluster regardless of price.

If you have a supercomputer, you have a very, very, very fast internal bus handling all necessary data transfer. Even with the advent of PCI Express, a cluster of PCs must run in a network model. Therefore, any data crunching that occurs must pass through a network layer, the bus, the physical medium, and back through those limiters once more on the next system. Therefore, if you are doing number crunching that truly cannot afford the delays caused by the data transfer limitation of a PC cluster, a self-contained supercomputer is far and away the best option, even if it's more expensive.

Therefore, contrary to the idiotic drivel you just spouted, Cray does, in fact, have something to offer that no PC cluster currently can.

We now return you to your regular informed diatribe in the name of the self-gratifying masturbatory stupidity that is Slashdot.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!

Re:The issues are progress and long-term usefulnes by Wesley Felter · 2004-08-20 03:35 · Score: 5, Informative

Good clusters don't use IP; they use Infiniband, Myrinet, or Quadrics, which all have OS bypass and transport offload features so that the app can talk directly to the NIC. In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.

Re:*Shock* by ranrub · 2004-08-20 03:35 · Score: 5, Informative

Have you ever worked with supercomputers?

However, if your supercomputer goes down... well, you're screwed

Cray supercomputers have built-in redundancies. All the subsystems are separate from the processors and memory, which are actually "clustered" (depends on model). Even the OS has built-in means to survive the harshest hardware catastrophe by checkpointing the running jobs regularly, to off-site disks.

1000 machines are more reliable than 1 big machine

Wrong again. With 1000 lousy cheap machines, you need an on-site team of technicians to keep them all up. Supercomputers (with built-in redundancy etc.) have equal or less maintenance requirements.
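Some back-of-the-envelope arithmetic on the "1000 machines" argument. This is a sketch under loud assumptions: every machine fails independently, with the same probability per maintenance period, and the 1% figure is invented for illustration.

```python
def p_any_failure(machines, p_fail):
    """Probability that at least one of `machines` independent boxes fails."""
    return 1.0 - (1.0 - p_fail) ** machines


def expected_failures(machines, p_fail):
    """Average number of boxes that fail in one period."""
    return machines * p_fail


if __name__ == "__main__":
    p = 0.01  # assumed per-machine failure probability per period
    print(p_any_failure(1, p), p_any_failure(1000, p))
    print(expected_failures(1000, p))
```

With those assumed numbers, a 1000-node cluster is almost certain to have something broken in any given period, which is the maintenance point. On average it also loses only about 1% of its nodes, which is the earlier poster's point about production barely noticing. Which view matters depends on whether the job can carry on without the dead nodes.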

The argument by manavendra · 2004-08-20 03:40 · Score: 4, Informative

is based on :

Heritage and resultant architecture: In Linux clusters, processors are typically connected through I/O links, whereas in supercomputing machines processors exchange data and instructions through shared memory.
PCI bottlenecks: This is the key argument made: the bottlenecks introduced by PCI communication. He goes on to say that performance problems in any given such cluster tend to remain with any other such cluster. I agree with that.
High Availability: He then goes on to talk about the reliability, availability and manageability of the supercomputers against typical clusters. I think this is where the FUD creeps in, along with marketing BS.

In all fairness, he does raise a critical point. Overall, however, considering the relative ease and popularity of building, administering and growing a cluster these days, I think the cost-effectiveness of a single monolithic machine is a moot point.

--
http://efil.blogspot.com/

He basically said faster communications needed by VernonNemitz · 2004-08-20 03:40 · Score: 2, Insightful

That is, for a Linux cluster to keep up with a supercomputer, the cluster needs faster communications between processors. The bottleneck of going from processor to South Bridge to PCI Bus to Ethernet card, and back again at another processor, is the problem.

So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means Somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!

Re:He basically said faster communications needed by vidarh · 2004-08-20 04:14 · Score: 2, Interesting

And in doing so you are essentially building a super computer. However you'd have to keep in mind that it isn't all about total bandwidth - latency also needs to be extremely low. That said, HP is working on open source Single System Image clustering support for Linux on "normal" hardware.

Re:The issues are progress and long-term usefulnes by Performer Guy · 2004-08-20 03:42 · Score: 4, Insightful

Clusters are nice for some problems but message passing and memory copying over a network is not ideal even when you have what *you* think is a lot of bandwidth. Latency and cache coherency and having a single image system can be critical factors in some classes of supercomputing problem, not to mention ease of use and specialized fp vector instructions that are often supported. The topology in large systems is often built (flexibly) into the memory controller hardware: the CPU writes to memory and it finds the right node. Page migration and process affinity, along with other advanced features like hardware level cache coherency, help these systems outperform clusters with ease given the right problems.

The coolest thing about this IMHO is that Cray are using Linux for their single image systems.

Yep the performance of computers is always on the increase but there will always be demand for more compute, the question is where do you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems with increasing detail and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can for their hardware, and as hardware improves they redefine the types of problem they want to solve.

Yup clusters are cheap and they're on the top 500 but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems, the list is just for bragging rights.

Re:The issues are progress and long-term usefulnes by NoMoreNicksLeft · 2004-08-20 03:43 · Score: 2, Funny

Don't suppose anyone has an old YMP or whatever that they'd be willing to give to a good home in Virginia?

Or for that matter, a warezed copy of Unicos....

I for one ... by cascadingstylesheet · 2004-08-20 03:43 · Score: 2, Funny

I, for one, welcome our new story-duplicating, supercomputer-mocking, Slashdot editor overlords ...

Re:*Shock* by krog · 2004-08-20 03:47 · Score: 4, Insightful

Networked clusters are useful only when the task is parallelizable, and each subtask is computable on a single node. Cloud physics is not like that. Cracking RC5, for instance, is.

--
Cretin - a powerful and flexible CD reencoder

In other news... by mrjb · 2004-08-20 03:48 · Score: 4, Insightful

MS says their operating system is great. McDonald's says their food is great *and* cheap.

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book

I don't have enough money.. by essreenim · 2004-08-20 03:50 · Score: 4, Funny

for a Cray, you insensitive clod.

It ain't religion. by Performer Guy · 2004-08-20 03:51 · Score: 5, Insightful

It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.

Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.

It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.

It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.

Latency by khrtt · 2004-08-20 03:52 · Score: 3, Informative

It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...

The main reason for supercomputers to exist is not the high bandwidth, it's the latency of the switch. The network hardware that is used in clusters as the interconnect medium (switch) can provide very high bandwidth, but the latency is high simply because you can not have low latency over large distance, and the network hardware is designed to connect over large distances. Even if you put your nodes in the same rack, the 1000000 gigabit ethernet or whatnot stock solution you use to interconnect them, will still take milliseconds ping time.

The supercomputers run on a custom, specially designed switch instead. This design includes a lot of cost and complexity just to get the latency down. This may not make any difference for your typical web-server application, but that's not what the supercomputers are designed for.

Some scientific computations have very low dependency between parts of the dataset. For example, pretty much any simulation or search application does fine on a cluster. Anything that allows you to split the work into a large number of independent tasks runs fine on a cluster. Some scientific applications do not allow the work to be split into independent pieces. Sometimes you just need random access all over your distributed data space, and for such applications the speed of computation is determined mostly by network latency. This is where you need a supercomputer, and no cheap cluster would help.
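A rough way to see why latency rather than bandwidth decides this is the usual "latency plus size over bandwidth" estimate for one message. The numbers below are assumptions picked for illustration, not measurements of any particular product: both links get the same bandwidth and differ only in latency.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost of sending one message: fixed latency plus serialisation time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Assumed, illustrative figures; only the latency differs between the two links.
COMMODITY = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 125e6}
LOW_LATENCY = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 125e6}


def slowdown(size_bytes):
    """How many times longer the commodity link takes for one message of this size."""
    return transfer_time(size_bytes, **COMMODITY) / transfer_time(size_bytes, **LOW_LATENCY)


if __name__ == "__main__":
    print(slowdown(64))           # tiny message: dominated by latency
    print(slowdown(100_000_000))  # bulk transfer: dominated by bandwidth
```

Under these assumptions the bulk transfer takes essentially the same time on both links, while the 64-byte message is many times slower on the high-latency one. A code that does random access all over distributed data sends mostly tiny messages.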

Re:Latency by Wesley+Felter · 2004-08-20 04:04 · Score: 3, Insightful

That's a nice theory, but Cray's XD1 "supercomputer" uses the same Mellanox switch chips as some "clusters". Cray is splitting hairs to justify their product.

(BTW, I get 100 us ping time on my GigE network, but you're right that that's still 100x too slow for HPC.)

Re:A little inaccurate... by stratjakt · 2004-08-20 03:53 · Score: 3, Insightful

They don't scale for applications that require shared memory access.

Something like SETI@home could scale almost infinitely. The data elements are completely unrelated.

But if every node needed access to the same chunk of data, then the more nodes you add, the more they "fight" over that chunk of data.

Ultimately, with a PC cluster solution, only one node at a time can be accessing any given section of "shared" memory.
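One standard way to put numbers on that "fighting" is Amdahl's law, which this comment does not name. The sketch below assumes some fixed fraction of the run is serialised because only one node at a time can touch the shared data; the 10% figure is invented for illustration.

```python
def amdahl_speedup(nodes, serial_fraction):
    """Amdahl's law: speedup on `nodes` when `serial_fraction` of the work cannot overlap."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


if __name__ == "__main__":
    for nodes in (10, 100, 1000):
        # 0.0 models fully independent work units; 0.1 is an assumed contention share.
        print(nodes, amdahl_speedup(nodes, 0.0), amdahl_speedup(nodes, 0.1))
```

With no serialised part the speedup just equals the node count. With 10% of the time spent waiting on the shared chunk, the speedup can never reach 10 however many nodes are added.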

That's what he means, and he's right. ..offtopic..

Look at the slashbots who can't understand the article throwing a fit because of a perceived "diss" against linux. This place really makes me laugh sometimes. Hell, Cray's new gear is using linux. Cray is a card-carrying linux loving company, and have been for quite a while.

And Cray's got some friggin crazy tech. I can't wait to see what they have to kick back into the kernel.

--
I don't need no instructions to know how to rock!!!!

...and in yet more news... by pandrijeczko · 2004-08-20 03:54 · Score: 4, Funny

...drain cover manufacturer says their product is grate...

--
Gentoo Linux - another day, another USE flag.

Re:No, they're SUPER by decepty · 2004-08-20 03:57 · Score: 2, Funny

Well, hey, thanks for asking.

--
Be careful! Bears shouldn't consume large furry dogs.

Re:The issues are progress and long-term usefulnes by Wesley+Felter · 2004-08-20 03:59 · Score: 3, Informative

Only in routed Infiniband networks, which no one uses. The normal Infiniband protocol is very lean and totally different from TCP/IP.

My understanding of what Cray is actually saying by einhverfr · 2004-08-20 04:06 · Score: 2, Insightful

Cray makes at least two types of supercomputers according to their SEC forms. These include massively parallel clusters and vector-based supercomputers. In general massively parallel clusters are less expensive for the number of calculations per sec than the vector-based supercomputers. However, for many applications, the vector-based supercomputers will massively outperform the clusters.

Cray's competitors in the cluster markets include IBM, and their main competitor in the vector-based market is NEC.

I remember reading an article about how the US is losing the supercomputer technology war. But this criticism is best directed at companies other than Cray who are pushing cluster-based solutions to the exclusion of others. It is true, however, that the only company I am aware of in the US which markets these supercomputers is Cray.

--

LedgerSMB: Open source Accounting/ERP

Re:The issues are progress and long-term usefulnes by drinkypoo · 2004-08-20 04:06 · Score: 2, Informative

While your comment is largely informative you are still confusing PCI-Express with PCI-X. They are different things. I know that it's inherently confusing, but still...

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:However by lpp · 2004-08-20 04:07 · Score: 2, Informative

In short, if clustering provides a better/cheaper solution, go with it.

Um, yes. The grandparent and ggp were (I think) implying though that for that particular application you actually won't be able to be both better and cheaper with a clustering solution.

i.e. if you throw enough Linux boxes into the cluster to be able to achieve the "better (faster)" solution, you will no longer be cheaper.

But I don't think anyone was arguing that even if a cluster is cheaper and faster you should still go with a supercomputer instead.

Re:agreed by jedidiah · 2004-08-20 04:07 · Score: 2, Interesting

I'm not sure you do either.

A NUMA machine is just a cluster where the wire is in the form of a bus rather than copper or fibre cabling. The communications protocol for the bus may be better optimized for "supercomputing". However, you can do the same thing for a MPP optimized network protocol.

It's all ultimately just wires and protocols.

The total lack of process migration between nodes in a cluster might actually give clusters an edge over some NUMA implementations.

Watching a single process dance around a number of bricks in a Sun 15K can be rather entertaining.

--
A Pirate and a Puritan look the same on a balance sheet.

Re:Two words: by vidarh · 2004-08-20 04:09 · Score: 3, Insightful

Cray supports Linux. In fact, the supercomputer platform Cray's CTO was pushing in the interview is running Linux. What he's saying is really that supercomputers can handle classes of problems that clusters have problems with.

If your goal is to run simulations where each piece of the simulation depends on a large subset of the other pieces, then you will need ridiculous interconnect speeds, and you're likely to end up with something you could have bought from Cray or SGI or some of the other remaining supercomputer manufacturers for a fraction of the price.

Luckily for you and the rest of us many problems can be split into relatively independent pieces, in which case a Beowulf cluster or similar is more than adequate.

If you seriously believe that clusters can compete with supercomputers for every type of problem, you need to think again.

Re:*Shock* by ThosLives · 2004-08-20 04:16 · Score: 3, Insightful

Uh, I think he understands, but do you know how long a single current-day [Linux] node would take to compute a cloud simulation? The reason you use supercomputers for this is because it's a really huge set of simultaneous (possibly nonlinear) floating-point equations. Most (if not all) desktop / server type computers are not designed for that type of computation; they're better at nice sequential stuff (like RC5). For example, trying to compute one car crash on my desktop would probably take it on the order of weeks (if not months). A Cray will typically do that type of computation in about 12-24 hours. So, do you have 15 computers each taking 2 weeks to crunch 15 different simulations and not get any result for 2 weeks, or do you run 1 simulation a day for 15 consecutive days and make decisions based on the current result? The latter makes much more sense for most of the applications.
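The 15-simulation comparison can be tabulated directly. A small sketch using only the figures in the comment (15 runs, 14 days each on a desktop, roughly 1 day each on the fast machine); the function names are made up for illustration.

```python
def finish_days_all_at_once(n_sims, days_per_sim):
    """Each simulation gets its own machine, so every result lands on the same day."""
    return [days_per_sim] * n_sims


def finish_days_one_after_another(n_sims, days_per_sim):
    """One fast machine runs the simulations back to back."""
    return [days_per_sim * (i + 1) for i in range(n_sims)]


def results_ready_by(finish_days, day):
    """Count how many simulations have finished on or before the given day."""
    return sum(1 for d in finish_days if d <= day)


if __name__ == "__main__":
    desktops = finish_days_all_at_once(15, 14)
    fast_machine = finish_days_one_after_another(15, 1)
    for day in (1, 7, 14, 15):
        print(
            f"day {day:2d}: desktops {results_ready_by(desktops, day):2d} done, "
            f"fast machine {results_ready_by(fast_machine, day):2d} done"
        )
```

Total throughput is nearly identical (all 15 done by day 14 versus day 15). The difference is that the serial machine hands over a result every day, so each run can inform the next one.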

Of course, it really does depend on the problem you're facing. Most people who pay for results, though, want results as fast as possible, and that's why supercomputers win for problems that aren't "embarrassingly parallel".

--
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraphrase)

Re:Two words: by CommieOverlord · 2004-08-20 04:24 · Score: 2

Hmmm...

1. Cray is definitely pro-linux. It's what their XD1 runs. Though not their bigger computers.

2. There are some problems for which a cluster cannot even come close to achieving the performance of a supercomputer. For a lot of problems yes, for some maybe if you spend a fortune on fancy interconnects, and for some no.

3. If you're commercially building clusters let me know which company it is. I'm in the market for a 128CPU cluster and I want to know who not to buy from.

Re:The issues are progress and long-term usefulnes by einhverfr · 2004-08-20 04:24 · Score: 3, Interesting

You might want to read the latest 10-K form from CRAY.

http://www.sec.gov/Archives/edgar/data/949158/000089102004000325/v96761e10vk.htm

Here they discuss the limitations of clusters and vector-based supercomputing.

Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. Not really sure what multithreaded means in this context (Microkernel capable of threading itself across many processors i.e. UNICOS/mk?) but they do a decent job of explaining the whole thing:

Cray Research pioneered the use of vector systems, from the Cray-1 to the Cray C90 and T90 systems. These systems typically use a moderate number (one to 32) of very fast custom processors in connection with a shared memory. Vector processing has proven to be highly effective for many scientific and engineering application programs which over the years have been written to maximize the number of long vectors. Traditional vector systems do not scale effectively (that is, increase performance by increasing the number of processors) past a limited number of processors. We currently market one classic vector supercomputer, the Cray SX-6 system.

Massively parallel processing architectures typically link tens, hundreds or thousands of standard or commodity processors to act either on multiple tasks at the same time or together in concert on a single computationally-intensive task. Type T systems connect each processor directly to its own private memory and the programmer must manage the movement of data among memory units and processors. Consequently these systems can be difficult to program. Type C massively parallel systems, unlike low bandwidth clusters, have high bandwidth and low latency interconnect systems and are said to be "tightly coupled" -- the Cray T3E, Red Storm and the OctigaBay product are examples of balanced high bandwidth purpose built systems that employ standard microprocessors.

The Cray X1 system is revolutionary in that it is the first supercomputer that combines the attributes of both vector and high bandwidth massively parallel systems. The Cray X1 system has up to 64 processors per cabinet and a shared memory. The Cray X1 system can run small problems as a vector processor would or, by focusing many processors on a task, the Cray X1 system operates as a massively parallel system with a system-wide shared memory and a single-system image. The Cray X1 system is designed to provide efficient scalability and high bandwidth to run complex applications at high sustained speeds. The Cray X1E system furthers this architectural design with increased processor speed and capability.

Our MTA-2 project for NRL is designed to have sustainable high speed, be broadly applicable and easy to program, provide scalability as systems increase in size and have balanced I/O capability. The multithreading processors make the MTA-2 system latency tolerant and, with the system's flat shared memory, able to address data anywhere in the system.

--

LedgerSMB: Open source Accounting/ERP

Re:The issues are progress and long-term usefulnes by LurkerXXX · 2004-08-20 04:24 · Score: 2, Informative

Wow. 8+ GB/s. Nice.

Unless I'm now out of date, the last figures I saw said the CrayLink Interconnect can do 102 GB/sec. That's just a tad bit more, don't you think? No messing with masses of gig ethernet to crossconnect them. It's just done.

Let's do some bandwidth math... by JBMcB · 2004-08-20 04:26 · Score: 3, Interesting

From Cray (From XD1 page):
"A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."

-So for a dual-opteron XD1 processor unit, there is 8GB/s total bandwidth available.

Total aggregate PCI bandwidths (Accepted standards):

PCI32 33MHz = 133MB/s
PCI32 66MHz = 266MB/s
PCI64 33MHz = 266MB/s
PCI64 66MHz = 533MB/s
PCI-X 133MHz = 1066MB/s
PCI Express = 200MB/s (Per slot)
PCI Express x16 = 3000MB/s (Usable bandwidth)

-So for PCI Express x16 we're talking 3GB/second

SMP Opteron with two PCI Express x16 slots can do 6GB/second aggregate bandwidth. A couple of Infiniband links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
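The arithmetic behind the figures above can be reproduced in a few lines. This is only a sketch: the parallel-bus numbers assume peak bandwidth is bus width times clock, with nominal clocks of 33.33, 66.66 and 133.33 MHz (an assumption, the table only gives rounded values), and the PCI Express and XD1 figures are taken from the comment as given rather than derived.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz):
    """Peak MB/s of a parallel bus moving width_bits on every clock cycle."""
    return width_bits / 8 * clock_mhz


# (bus width in bits, assumed nominal clock in MHz)
BUSES = {
    "PCI32 33MHz": (32, 33.33),
    "PCI32 66MHz": (32, 66.66),
    "PCI64 33MHz": (64, 33.33),
    "PCI64 66MHz": (64, 66.66),
    "PCI-X 133MHz": (64, 133.33),
}

# Figures quoted in the comment, in GB/s.
XD1_LINKS_PER_SMP = 4
XD1_GB_PER_LINK = 2
PCIE_X16_USABLE_GB = 3
PCIE_X16_SLOTS = 2

if __name__ == "__main__":
    for name, (width, clock) in BUSES.items():
        print(f"{name:13s} = {int(parallel_bus_mb_per_s(width, clock))}MB/s")
    xd1 = XD1_LINKS_PER_SMP * XD1_GB_PER_LINK
    pcie = PCIE_X16_SLOTS * PCIE_X16_USABLE_GB
    print(f"XD1 per two-way SMP: {xd1} GB/s, dual PCIe x16: {pcie} GB/s")
```

On these numbers the gap in raw bandwidth per two-way node is 8 GB/s against 6 GB/s. Nothing in this calculation says anything about latency.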

--
My Other Computer Is A Data General Nova III.

Re:Let's do some bandwidth math... by Anonymous Coward · 2004-08-20 08:54 · Score: 2, Insightful

That's all very nice, but have you ever heard of latency?

Taken a little out of context by UnknowingFool · 2004-08-20 04:28 · Score: 3, Insightful

In a way he's right. Reading the whole article, it seems apparent that he's talking about certain high performance applications. Clusters are not always the best way to solve a problem. For problems that can be broken down into small independent tasks like SETI, clusters are a good solution. Clusters do have their optimization challenges with latency, bottlenecks, etc. For simulations where the tasks are dependent on each other, these bottlenecks add up. The individual nodes spend as much time communicating with each other as they do computing. There are also problems that cannot be distributed. In these cases clusters are not the right solution and it may not be cost effective to use a cluster.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Re:He's right!! by mr_z_beeblebrox · 2004-08-20 04:40 · Score: 2, Insightful

He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....

No, he's right in the way he intended.
He just leaves out a lot of information. The business environment determines what is or is not expensive. The computational environment determines what will or will not run fast, the two make a measure of how expensive something is.
If you are crunching a big continuous stream of numbers with multiple small results which are then looped in and crunched more (think major statistics, math, language interpretation etc...) These might be quicker on a single machine. If you are in an environment where (time=money)^2 (think casinos, trading floors, JIT manufacturing etc...) the lag of shared resources becomes MORE expensive than the single Cray. However, that statement is actually under hard analysis a no brainer and he is hoping that no one will question his statement enough to notice that for the other 85% (best guess statistic, blast it if you want to) of applications the cluster will offer considerable savings.
He's also hoping that no one will go down the road of the obvious that someone who has worked on the major 85% apps will do. Which I am guessing (no offence) is what happened with you. If you read it too critically his statements don't seem to hold any water. However they do, it's just the amount of water that is questionable.

Re:Two words: by iPaul · 2004-08-20 04:52 · Score: 4, Interesting

Not quite true. First off, you get much higher bandwidth between processors using proprietary (NUMA) based interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second you can exploit things like cache-coherency between processors (even if they're in different "nodes") and therefore true shared memory. So, a 1024 processor SGI Altix, or a 256 processor Cray is one computer as far as the OS and user-land stuff is concerned.

There's another advantage Cray has on the SV and X series and that's a vector unit on the processor. That allows you to conduct operations on arrays of numbers at once instead of having to cycle through the numbers in a loop. For example, the dot_product between two small arrays might be accomplished with one or two instructions, as opposed to a loop. Apple's AltiVec is also a vector unit.
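To make the loop-versus-vector point concrete, here is a toy model that counts "instructions" for a dot product. It is only a sketch: the `lanes` parameter is a hypothetical vector width, and one modeled vector instruction is assumed to multiply and accumulate a whole chunk at once, which glosses over how any real vector unit actually issues and reduces.

```python
def dot_scalar(a, b):
    """Dot product with one multiply-add per element; returns (result, steps)."""
    total = 0.0
    steps = 0
    for x, y in zip(a, b):
        total += x * y
        steps += 1
    return total, steps


def dot_vector(a, b, lanes):
    """Dot product in chunks of `lanes` elements; returns (result, steps).

    Each chunk stands in for a single modeled vector instruction.
    """
    total = 0.0
    steps = 0
    for i in range(0, len(a), lanes):
        total += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return total, steps


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print("scalar:", dot_scalar(a, b))
    print("vector, 4 lanes:", dot_vector(a, b, lanes=4))
```

Both versions return the same value; the eight-element example takes eight scalar steps but only two modeled vector steps with four lanes.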

If you took money out of the picture it would be easier to deal with a big-honkin' super computer like an SGI or Cray rather than a cluster. One computer is easier to manage and you could always use threads and plain old heap memory (which is much faster than message passing over a network).

Add money back in and 500,000 goes a lot farther in raw compute power when you're buying racks of DELLs and infiniband interconnects. However, depending on the application, you may be faster, slower, or even dog-slow compared to the cray. If you need the answer today, and the $ is not a factor, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.

Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
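A back-of-the-envelope model shows how message count comes to dominate. This is a sketch under loud assumptions: run time is modeled as perfectly divided compute plus one full latency per message with no overlap, and the two latencies (100 microseconds and 1 microsecond) are illustrative stand-ins for a commodity network and a purpose-built interconnect, not measured values for any product.

```python
def run_time_s(total_compute_s, nodes, messages_per_node, latency_s):
    """Seconds per run: evenly split compute plus un-overlapped message latency."""
    return total_compute_s / nodes + messages_per_node * latency_s


if __name__ == "__main__":
    commodity_latency = 100e-6  # assumed: 100 microseconds per message
    custom_latency = 1e-6       # assumed: 1 microsecond per message
    for messages in (0, 10_000, 1_000_000):
        slow = run_time_s(1000.0, 100, messages, commodity_latency)
        fast = run_time_s(1000.0, 100, messages, custom_latency)
        print(f"{messages:>9,d} msgs/node: commodity {slow:7.2f}s, custom {fast:7.2f}s")
```

With no messages the two are identical at 10 seconds. At a million messages per node the commodity case spends about 100 of its 110 seconds waiting on the network.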

--
Leave the gun, take the cannoli -- Clemenza, The Godfather

As always, it depends on the application by Orp · 2004-08-20 04:53 · Score: 4, Insightful

Both clusters and big iron have their place. I am a meteorology professor and my current research involves high-resolution numerical modeling of thunderstorms. For a problem where the domain decomposition is straightforward and internode communication isn't your bottleneck, clusters are great. One huge advantage of clusters is that they are cheap and it isn't too big of a deal to get a grant together to buy the hardware, and it's YOURS and nobody else's. A huge disadvantage to big iron is that you have to share it with about a hundred other researchers. Waiting in a queue for three days only to find you goofed up in your startup script (and the model exits immediately) is NO FUN (cf the Regatta at NCSA).

I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OMP but OMP sucks and is no substitute for code which is written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.

In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.

--
A squid eating dough in a polyethylene bag is fast and bulbous, got me?

Doom III by Yousef · 2004-08-20 05:22 · Score: 3, Funny

Finally, a machine capable of running Doom 3!

--
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."

Re:The issues are progress and long-term usefulnes by Tiosman · 2004-08-20 05:25 · Score: 2, Interesting

I work a lot with it, like ~3000 customers, almost half of them are industry (non academic or gvt).

You found bugs ? Care to share them ? Hardware failed ? Did you get it replaced ?

Can you give me the tech support ticket numbers so I can see if your complaints are reasonable (and have been addressed) or are just plain FUD ?

Re:*Shock* by CommieOverlord · 2004-08-20 05:39 · Score: 2, Insightful

Because clusters are cheaper, per raw unit power.

But if the supercomputer is more efficient per raw unit of power, then the price per unit doesn't matter.

I work for a living with HPC, both with clusters and with large SMP machines. The cluster is nice, but there are some things that can _only_ be run on a large SMP machine or are much, much faster on a SMP.

So why did you say it was FUD? by argent · 2004-08-20 06:12 · Score: 2, Insightful

You and him, you're saying the same thing, you're spinning it your own way, but the actual content is the same. So why are you describing his as FUD?

Target audience... by umshaggy · 2004-08-20 07:12 · Score: 4, Insightful

Many posts have pointed out the true fact that supercomputers are better for certain jobs that are not suited to clustered solutions (and vice versa).

Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.

So what the original article is, is a message from one executive to other executives trying to clarify the situation. Basically saying "hey, just because Wired ran a story that says linux clusters are the next best thing since sliced bread, doesn't mean that this is the best solution for you. Now, let us talk about what you need."

I see nothing wrong with this. I read the article, and found nothing in it that was false.
It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs because of something said exec read in Scientific American.

Welcome to corporate america boys and girls.

(Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about linux clusters. Both are fine publications.)

--
Did you buy a Neuros today?

Exploiting parallelism vs. efficient computation by billstewart · 2004-08-20 09:39 · Score: 2, Interesting

If you're trying to run 1024 cases with different starting conditions, then a 1024-processor cluster lets you run them all at once. A supercomputer with the same price as the cluster probably has only 1/10th the raw GFLOPS as the cluster, because supercomputer designs are much more complex and commodity cluster hardware is dirt cheap.

So if each cluster CPU can run a single instance of the problem efficiently, it's 10 times as cost-effective to use the cluster.
On the other hand, if a single instance of the problem doesn't really fit in a cluster CPU, it might be 1/10th as efficient as the supercomputer CPU, because you're spending more time doing swapping or communications to get the numbers to crunch than you are crunching them, in which case it's a tie with the supercomputer.
But on yet another tentacle, if it's 1/100th as efficient to use the cluster CPU as the supercomputer CPU, because you have to spend a LOT more time swapping, then the supercomputer is a big win, 10 times as cost-effective as the cluster.
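The three cases reduce to one multiplication. A small sketch using only the comment's own round numbers (10x raw GFLOPS per dollar for the cluster, and per-CPU efficiency ratios of 1, 1/10 and 1/100); the function name is made up for illustration.

```python
def cluster_cost_effectiveness(raw_gflops_ratio, relative_cpu_efficiency):
    """Useful work per dollar on the cluster, relative to the supercomputer.

    raw_gflops_ratio: cluster raw GFLOPS per dollar / supercomputer raw GFLOPS per dollar.
    relative_cpu_efficiency: fraction of raw speed a cluster CPU actually delivers
    on this problem, relative to a supercomputer CPU.
    """
    return raw_gflops_ratio * relative_cpu_efficiency


if __name__ == "__main__":
    for label, efficiency in (
        ("fits on one CPU", 1.0),
        ("some swapping/communication", 0.1),
        ("mostly swapping/communication", 0.01),
    ):
        ratio = cluster_cost_effectiveness(10.0, efficiency)
        print(f"{label:30s}: cluster is {ratio:g}x as cost-effective")
```

A ratio above 1 favors the cluster, exactly 1 is the tie, and below 1 favors the supercomputer, so the three cases come out as 10x, 1x and 0.1x.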

Back in the mid-80s, my department had a huge VAX 780 with 4 MB of RAM (16KB chips, I think), and we were working on a network simulation system that needed 12-14 MB RAM to run. I spent a while playing with different versions of 4.1BSD and Unix System VR2, but fundamentally the machine spent all its time swapping data in and out of disk, and the main performance win was helping the physics jocks who wrote the application get better algorithms and better localization and good checkpointing because the computer didn't always stay running for the full week it took to finish a simulation run. A year or two later, we got the budget to buy another 4MB of RAM (in 64KB chips, about $50K IIRC), which helped a bit, and a year or two after that, we got enough budget to buy another 8MB of RAM (maybe 256KB chips? not sure. Also about $50K), and suddenly the application could complete in under an hour instead of a week, because RAM really is a couple orders of magnitude faster than disk drives with a couple more orders of magnitude less latency, so our problem changed from being disk-bound to being CPU-bound.

That speedup not only improved the utilization of the equipment, it made a qualitative difference in the kinds of problems we could address because of the way we could interact with it. That's why people buy supercomputers if they need them - it really can be orders of magnitude faster for some problems. The first year or so, we really had all the RAM that could fit in the double-refrigerator-sized VAX cabinet. Once the denser RAM chips became available, we probably should have spent a bit more manager time beating up on the accounting department, because an extra $50K for hardware could have more than doubled the efficiency of 3-4 physicists, but of course the accounting droids don't think in terms of efficient use of physicists unless it lets you buy half as many of them, which was _not_ the objective here...

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
