Cray CTO Says Cray Computers Are Great
Jan Stafford writes "Linux clusters can not offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."
So he's saying we shouldn't buy their latest offerings involving Linux
:(
because it's not as good as their non-Linux offerings?
I never expected FUD from Cray.
feh. stuff.
I wonder how Cray computers are in milk...
A feeling of having made the same mistake before: Deja Foobar
no nevermind.
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics
It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap, speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that IT'S CHEAP...
Simon.
Physicists get Hadrons!
The CTO from Cray said Crays are great machines and are priced competitively!
Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
Viola, un-fuglied version. ;)
feh. stuff.
a Linux cluster of Cray's?
Read the only personal Runyon page out there.
Is MS somehow involved? Who am I supposed to hate? Editors?
Wait... I thought apple pc's were already super computers?
--------========+++Dont Feed The Lab Techs+++========--------
Most /.'ers are probably dying to mod this whole article down to troll.
I guess they can take their anger out on this anonymous post instead.
The difference is that linux clusters aren't really designed for supercomputing... more of distributed computing. Cray specializes in it. Of course they're going to come out on top....
Bear found defecating in a wooded region.
Pope reveals his membership of the Catholic Church.
You really shouldn't place commentary on a story title, unless it's an "it's funny, laugh" one.
Oh, by the way, everyone who has a slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.
And it is, too.
I bet the cray can maintain higher fps in doom 3 than a cluster ever could.
Nothing like a CTO of a company that's got slipping marketshare coming out and masturbating in public to get more attention. But I guess that's the best they can do...
...Your square boxes will never look as sexy as our 'Love Seat'
AT&ROFLMAO
Compare this article with http://it.slashdot.org/article.pl?sid=04/04/13/1452255&tid=126&tid=106
Bill Gates says Windows XP is great!
Linus Torvalds says Linux is great!
etc, etc.
There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I highly doubt it is), you still have a logical layer on top of it (TCP/IP/UDP etc). This slows it down.
For some applications, a cluster of slow PCs is OK. But if you want to do real time-intensive computation, you really can't beat a good internal bus.
Moderation: Put your hand inside the puppet head!
He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....
Yeah, no wonder this post looked familiar. Yup, it's a dupe, folks.
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds
However it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.)
And now for the REST of the story:
Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn eproms or design printed circuit boards that runs on MacOS, so the hardware group has a bunch of Windows PCs!.
So now you know the *rest* of the story!
Best Buy can have you arrested
The latency on Ethernet is too high for many tightly coupled applications (lattice QCD for example). This is why people who need better networking use something like Myrinet. I would assume that these Cray machines have very high band-width, low-latency communications. This is where super-computers distinguish themselves from clusters.
I saw this MST3k blooper once where Tom called out "Cray" instead of "Crow". Still in character, And with false modesty, Crow replied with "Well that's very nice of you, Tom. I'm really more of a PC though."
(Not a verbatim quote.)
"Derp de derp."
You could look to SGI. Their Altix range is up to 1024 Itanium 2 processors in a single supercomputer, and they are putting 20 512 * processor nodes together in a cluster of linux supercomputers for NASA while also working on doubling up the maximum single machine cpu count to 2048.
Never underestimate the dark side of the Source
I don't think the guy who you responded to even knows what NUMA means...
clusters will never replace the supercomputer, ever. If you can build a fast cluster, you can build an even faster and better shared memory supercomputer. There are many applications that do not break up well for clusters....
It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.
His point is that building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE NICs at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will quickly show negative scaling in fact (where the more processes you add, the *slower* your code will run.) MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and different sizes on different machines.
Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)
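The negative scaling described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the function names, the 1000 seconds of total work, and the 0.1 seconds of communication overhead per process are all made up for illustration, and real MPI codes have far messier communication patterns. The point is just that once a communication term grows with the process count, there is a process count beyond which adding more makes the run slower.

```python
def run_time(procs, work_s=1000.0, comm_s_per_proc=0.1):
    """Toy model (hypothetical numbers): compute time shrinks as
    work_s / procs, while communication cost grows linearly with
    the number of processes."""
    return work_s / procs + comm_s_per_proc * procs


def best_proc_count(max_procs, **kw):
    """Process count in 1..max_procs with the lowest modelled run time."""
    return min(range(1, max_procs + 1), key=lambda p: run_time(p, **kw))


if __name__ == "__main__":
    # Run time falls at first, then climbs again as communication dominates.
    for p in (1, 10, 100, 1000):
        print(p, "processes:", round(run_time(p), 2), "seconds")
    print("sweet spot:", best_proc_count(1000), "processes")
```

In this model the sweet spot sits where the two terms balance, which is one way to read the remark about running codes at sizes smaller than the whole machine.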
In other news, Bill Gates says Windows is secure...
There are entire classes of computational problems which are classed as Embarrassingly Parallel.
It means it is so trivial to parallelize the problem and get gains from it (think SETI@Home) that it's a no-brainer.
Other computational problems don't just simply fan out to the bazillions of nodes with tiny independent pieces of data.
Your assertion that the Cray CTO is talking FUD when he uses the actual term is just plain wrong and unfair to him. He actually knows what he's talking about.
Lost at C:>. Found at C.
Scaling or upgrading these systems requires much more than simply ordering more parts; it opens up the whole integration exercise. From an application perspective, clusters limit application scaling. Bandwidth and latency restrictions significantly constrain performance as more processors are applied to a problem.
Has this guy ever heard of Google? I can see his point to an extent; in fact his whole q&a session/blatant advert really boiled down to a single point: If you need to move a lot of data between processors, then a cluster will fare worse than one of Cray's supercomputers which have (obviously) more bandwidth between the CPUs and shared memory. It really does depend on the application, but for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false...
Code, Hardware, stuff like that.
WOW COOL
a). What does this have to do with TFA?
b). Who the fuck cares...
Are you being funny or serious?
There's an entire class of parallel applications which are labeled "embarrassingly parallel". This description simply means that such programs are trivially parallelized and achieve as close to linear speedup as possible when scaled across many nodes. This is because of the low inter-node communications.
For "embarrassingly parallel" applications, a cluster is a really good tool. For programs that don't parallelize as nicely, a nice big vector or SMP machine will do nicely. Some code will run better on a small 20-CPU SMP machine than on a 1000-node cluster.
But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.
The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.
I really think this is what Cray mean here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.
(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)
Being the CTO of Cray, can you expect him to say anything less? Now while his points are often valid, I think his conclusion, that supercomputers outshine linux clusters, is a little inaccurate. Rather, I think the real conclusion is that linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends, ultimately, on the specific details of your problem. Again, though, being the CTO of the company, can you really expect him to give a balanced opinion like that, rather than the skewed opinion that his company is always on top?
Cray is a great company, but I really hate that they have to come out with things like this every now and then. Most people in need of a lot of computing power already know the difference between your products and linux clusters and really, they're going to choose whichever's most appropriate for their problem regardless of what your CTO says.
When will we see headlines that say "linus says linux is great"???
"embarrassingly parallel" is just a phrase used to describe certain types of problems. Ok, I'll give you the fact that the phrase is a bit biased, but it's still just a phrase. Clusters work well for problems that can be broken up into smaller chunks that can be independently solved and then combined to produce a final result, but for problems that require significant amounts of communication and data transfer between processors, clusters just don't cut it. Crays and other supercomputers use specially designed communication networks between processors and memory and such, and that's why they're so much more useful for those types of applications.
-James
Cray CTO Says Cray Computers Are Great
Actually, I think he said that "Cray computers rock, eh?" or perhaps it was "Cray computers kick ass, eh?" or something like that.
- Leo
You don't use science to show that you're right, you use science to become right.
you sure post an awful lot
I don't think the Cray assertion is that crazy.
For a 12 CPU Opteron unit the academic pricing (admittedly lower than commercial but where most of their sales will go) is about 45K. That's not too shabby. Before you bounce up and down and say I can build four times the cluster for that price, it should be noted that the XD1 gives you a single system image, which simplifies programming and makes shared memory applications practical (increasingly important for areas such as bioinformatics).
We have a cluster with dolphinics wulfkit, and using distributed shared memory slows us down. It's not an end-of-the-world type of slowdown but it's a factor. Our cluster is a sixteen node, dual xeon 2.2GHz with wulfkit 3d torus interconnects. It cost us, at academic prices, $50K. Admittedly that's more CPU power than the 12 Opterons, but we find ourselves using distributed shared memory a lot; wulfkit is great here, and that would probably be much better on the XD1. Had the XD1 been available a year ago we may have bought one instead.
It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.
Also, buying Cray is about getting access to their software technology too.
R-S
Linus Torvalds claims Linux is "better than Microsoft."
"Embarrassingly parallel" is a term referring to parallel computations involving minimal or no communication across the computing nodes. :))
It is not to be interpreted as "parallelism is embarrassing to him".
oh and by the way, what he said is not wrong !!
-ram
> why is parallel "embarrassing"? oh I see, facts aren't good for your bottom line.
Actually that is a common expression, and not something he pulled out of his heinie - peek here
I think you misunderstand what he said. The term "embarrassingly parallel" has been in common use for many years to describe problems that require so little communication between processors that they can be scaled up more or less indefinitely just by adding more computers. The ultimate examples of "embarrassingly parallelizable problems" are things like the human genome project or SETI-at-home, where it's practical to farm it out to completely disconnected computers to do bits of the work in isolation.
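A minimal sketch of what "embarrassingly parallel" looks like in code, assuming a made-up workload (counting primes in a range). The names `count_primes`, `split` and `parallel_count` are hypothetical, and threads are used here only as a stand-in for separate machines so the example stays self-contained. Each chunk is processed without reading anything from any other chunk; the only communication is handing out the chunk bounds and adding up the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; slow on purpose, it is only here to be 'work'."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Count primes in [lo, hi). Needs no data from any other chunk."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split(lo, hi, parts):
    """Cut [lo, hi) into at most `parts` contiguous, non-overlapping chunks."""
    step = (hi - lo + parts - 1) // parts
    return [(a, min(a + step, hi)) for a in range(lo, hi, step)]


def parallel_count(lo, hi, workers=4):
    chunks = split(lo, hi, workers)
    # Threads stand in for cluster nodes (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print(parallel_count(0, 100))
```

Because the workers never talk to each other, the interconnect hardly matters for this kind of job, which is why a pile of loosely connected machines handles it well.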
I think it was Big Gay Al that said they were SUPER.
Or is it really?
Crays are hardware, Linux is software.
A cray could conceivably run Linux.
Maybe an off-the-shelf cray performs faster in some cases than a linux cluster, but you could adapt Linux to run on a Cray, what's the issue here, other than sales?
I don't know the meaning of the word 'don't' - J
Good clusters don't use IP; they use Infiniband, Myrinet, or Quadrics, which all have OS bypass and transport offload features so that the app can talk directly to the NIC. In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.
- Heritage and resultant architecture: In Linux clusters, processors are typically connected through I/O links, whereas in supercomputing machines processors exchange data and instructions through shared memory.
- PCI bottlenecks: This is the key argument made: the bottlenecks introduced by PCI communication. He goes on to say that performance problems in any given such cluster tend to remain with any other such cluster. I agree with that.
- High Availability: He then goes on to talk about the reliability, availability and manageability of the supercomputers against typical clusters. I think this is where the FUD creeps in, along with marketing BS.
In all fairness, he does raise a critical point. Overall, however, considering the relative ease and popularity of building, administering and growing a cluster these days, I think the cost-effectiveness of a single monolithic machine is a moot point.
http://efil.blogspot.com/
That is, for a Linux cluster to keep up with a supercomputer, the cluster needs faster communications between processors. The bottleneck of going from processor to South Bridge to PCI Bus to Ethernet card, and back again at another processor, is the problem.
So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!
PCI sucks
Learning HOW to think is more important than learning WHAT to think.
Cray is so great! Cray is so great! Cray is so great!
G-R-A-T!
I mean G-R-E-A-T...
Clusters are nice for some problems but message passing and memory copying over a network is not ideal even when you have what *you* think is a lot of bandwidth. Latency and cache coherency and having a single image system can be critical factors in some classes of supercomputing problem, not to mention ease of use and specialized fp vector instructions that are often supported. The topology in large systems is often built (flexibly) into the memory controller hardware, the CPU writes to memory and it finds the right node, page migration and process affinity along with other advanced features like hardware level cache coherency helps these systems outperform clusters with ease given the right problems.
The coolest thing about this IMHO is that Cray are using Linux for their single image systems.
Yep the performance of computers is always on the increase but there will always be demand for more compute, the question is where do you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems with increasing detail and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can for their hardware, and as hardware improves they redefine the types of problem they want to solve.
Yup clusters are cheap and they're on the top 500 but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems, the list is just for bragging rights.
Don't suppose anyone has an old YMP or whatever that they'd be willing to give to a good home in Virginia?
Or for that matter, a warezed copy of Unicos....
I, for one, welcome our new story-duplicating, supercomputer-mocking, Slashdot editor overlords ...
Do you even know what FUD means?
On the other hand, supercomputers are purpose-built to handle HPC applications, which place enormous demands on both processing power and inter-processor communication. Their design includes high performance interconnects that provide high bandwidth, low-latency communications across the entire system, regardless of the number of processors required.
Why can't Linux clusters use the same high performance interconnects? Is it because of cable overhead (length, signal travel, insulation, etc...) or is it because of slow electronic switching? Why can't optical linkage provide the same low-latency interconnect performance as that of supercomputers. Somebody tell me, please. I need to know.
This just in! Company exec. says their products are great!!!
Seriously, this is news?
Ignorance is the root of all evil.
now, perhaps i missed the point, but i can afford the beowulf cluster in my basement. But, i don't think i can afford even a used cray:p
While many things that the Cray CTO said are true, I think the issue (obviously) has been skewed somewhat. It really depends on the problem you are solving. Some problems will need to have data shared between all of the nodes, but others will require that each node only has access to the data that is important to the small part of the problem that it solves. Also, the CTO mentioned that clusters don't scale very well. I don't really know what made him think this, but it seems to me that clusters do scale pretty well. For instance, ILM supposedly uses all of its employees' workstations at night to help do the daily renders. This way all of the CPUs sitting on desks don't go unused during off hours.
SIGFAULT
Infiniband uses a variant of IPv6 for addressing, and I believe the protocol is IPv6 based (It's been a few years since I looked at IB).
The only reason we have the rights we have is that people just like us died to gain those rights. -- Cheerio Boy
MS says their operating system is great. McDonald's says their food is great *and* cheap.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
He didn't say parallel was embarrassing. He said that there are applications where it's embarrassing how parallel they are.
for a Cray, you insensitive clod.
It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.
Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.
It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.
It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.
It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
The main reason for supercomputers to exist is not the high bandwidth, it's the latency of the switch. The network hardware that is used in clusters as the interconnect medium (switch) can provide very high bandwidth, but the latency is high simply because you can not have low latency over large distance, and the network hardware is designed to connect over large distances. Even if you put your nodes in the same rack, the 1000000 gigabit ethernet or whatnot stock solution you use to interconnect them, will still take milliseconds ping time.
The supercomputers run on a custom, specially designed switch instead. This design includes a lot of cost and complexity just to get the latency down. This may not make any difference for your typical web-server application, but that's not what the supercomputers are designed for.
Some scientific computations have very low dependency between parts of the dataset. For example, pretty much any simulation or search application does fine on a cluster. Anything that allows you to split the work into a large number of independent tasks runs fine on a cluster. Some scientific applications do not allow the work to be split into independent pieces. Sometimes you just need random access all over your distributed data space, and for such applications the speed of computation is determined mostly by network latency. This is where you need a supercomputer, and no cheap cluster would help.
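The latency argument above can be made concrete with the usual first-order model for sending one message: time = latency + size / bandwidth. The sketch below assumes placeholder figures (50 microseconds of latency and 125 MB/s for a commodity link); they are illustrative assumptions, not measurements of any particular hardware, and the function names are made up for this example.

```python
def transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus streaming time."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s


def latency_share(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is pure latency."""
    total = transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    # Assumed, illustrative link parameters (not measured values).
    latency = 50e-6      # seconds per message
    bandwidth = 125e6    # bytes per second
    for size in (64, 64 * 1024, 100 * 1000 * 1000):
        share = latency_share(size, latency, bandwidth)
        print(size, "bytes: latency is", round(100 * share, 2), "% of the time")
```

For tiny messages nearly all of the time is latency, so extra bandwidth does not help; for bulk transfers latency is negligible. That is the difference between random access across a distributed data space and shipping large independent work units.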
...is also a name that rings a bell :)
...drain cover manufacturer says their product is grate...
Gentoo Linux - another day, another USE flag.
if you can get a big enough cluster that will get the work done faster than the supercomputer and still be cheaper, doesn't this override the inefficiency factor?
"I'm just here to regulate funkiness."
n/t
Only in routed Infiniband networks, which no one uses. The normal Infiniband protocol is very lean and totally different from TCP/IP.
You dork, Crays are running linux. And a cluster is not a supercomputer.
And you've obviously never worked with either.
As for your explanations, this is exactly the place for it. Tell us how you can magically share memory over a PCI bus as fast as a Cray does.
I don't need no instructions to know how to rock!!!!
Cray makes at least two types of supercomputers according to their SEC forms. These include massively parallel clusters and vector-based supercomputers. In general massively parallel clusters are less expensive for the number of calculations per sec than the vector-based supercomputers. However, for many applications, the vector-based supercomputers will massively outperform the clusters.
Cray's competitors in the cluster markets include IBM, and their main competitor in the vector-based market is NEC.
I remember reading an article about how the US is losing the supercomputer technology war. But this criticism is best directed at companies other than Cray who are pushing cluster-based solutions to the exclusion of others. It is true, however, that the only company I am aware of in the US which markets these supercomputers is Cray.
LedgerSMB: Open source Accounting/ERP
While your comment is largely informative you are still confusing PCI-Express with PCI-X. They are different things. I know that it's inherently confusing, but still...
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
If your goal is to run simulations where each piece of the simulation depend on large subset of the other pieces, then you will need ridiculous interconnect speeds, and you're likely to end up with something you could have bought from Cray or SGI or some of the other remaining supercomputer manufacturers for a fraction of the price.
Luckily for you and the rest of us many problems can be split into relatively independent pieces, in which case a Beowulf cluster or similar is more than adequate.
If you seriously believe that clusters can compete with supercomputers for every type of problem, you need to think again.
...1/2 wrong. Supercomputers, and before they were called that, just whopper mainframes, used to do "all of the above" computing tasks because that's all there was. Now, stand alone PCs and clusters of them can probably do at least 1/2 the jobs out there, and maybe even a higher percentage. So, of course they (Cray and the others) are concerned for their market, losing half your market has to hurt, and daily the ways you can use smallish commodity hardware computers increases, both in complexity of job and in numbers of jobs. It's also a function of cost. If I can mangle some negatives here, PCs are not getting worse, nor more expensive, nor able to handle fewer tasks of less complexity. If that was true, so called "super" computers would not be in any threat, but they are, because it's happening. PCs and clusters are getting much mo bettah, at a fantastic rate. There is not a 100% replacement for "super" computers yet, but just in the last ten years I would bet a LOT of jobs previously only possible on "super" computers are now being handled easier/better/cheaper on "normal" PCs and clusters of them, and that trend will only continue, and there's only one place that market can come from, and that is the mainframe-stand alone "super" computer of yesteryear.
But to do this, wouldn't Linux first have to be a "world-class" operating system? And we all know that there isn't much chance of that ever happening, right?
Why you will always be wrong. Basically, clusters make use of existing networking technologies (higher overhead) and supercomputers (SIMD, MIMD, whatever) are designed to make that overhead as low as possible.
Thus, by definition, for HPC applications that can't be parallelized enough to overcome the communications overhead, super computers will always beat out clusters and have a place.
In the future, I would want to not be isolated from my friends in the Space Station.
Does this mean I can run linux on my Cray?
Hmmm...
1. Cray is definitely pro-linux. It's what their XD1 runs. Though not their bigger computers.
2. There are some problems for which a cluster cannot even come close to achieving the performance of a supercomputer. For a lot of problems yes, for some maybe if you spend a fortune on fancy interconnects, and for some no.
3. If you're commercially building clusters let me know which company it is. I'm in the market for a 128-CPU cluster and I want to know who not to buy from.
http://www.sec.gov/Archives/edgar/data/949158/000
Here they discuss the limitations of clusters and vector-based supercomputing.
Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. Not really sure what multithreaded means in this context (microkernel capable of threading itself across many processors, i.e. UNICOS/mk?) but they do a decent job of explaining the whole thing:
LedgerSMB: Open source Accounting/ERP
Unless I'm now out of date, the last figures I saw said the CrayLink Interconnect can do 102 GB/sec. That's just a tad bit more, don't you think? No messing with masses of gig ethernet to crossconnect them. It's just done.
From Cray (From XD1 page):
"A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."
-So for a dual-Opteron XD1 processor unit, there is 8 GB/s of total bandwidth available.
Total aggregate PCI bandwidths (Accepted standards):
PCI32 33MHz = 133MB/s
PCI32 66MHz = 266MB/s
PCI64 33MHz = 266MB/s
PCI64 66MHz = 533MB/s
PCI-X 133MHz = 1066MB/s
PCI Express = 200MB/s (Per slot)
PCI Express x16 = 3000MB/s (Usable bandwidth)
-So for PCI Express x16 we're talking 3GB/second
An SMP Opteron with two PCI Express x16 slots can do 6GB/second aggregate bandwidth. A couple of Infiniband links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
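The arithmetic in the comparison above can be written out as a short sketch. All figures are copied from the comment itself (the Cray quote and the poster's PCI list) and are not independently verified; the constant and function names are made up for illustration.

```python
# Bandwidth figures in MB/s, taken from the comment above (the poster's
# numbers, not independently verified).
PCI_MB_S = {
    "PCI32 33MHz": 133,
    "PCI32 66MHz": 266,
    "PCI64 33MHz": 266,
    "PCI64 66MHz": 533,
    "PCI-X 133MHz": 1066,
    "PCI Express x16 (usable)": 3000,
}

XD1_LINK_MB_S = 2000       # "2 GB per second links" from the Cray quote
XD1_LINKS_PER_SMP = 4      # "four ... links to each two-way SMP"


def aggregate_mb_s(links, per_link_mb_s):
    """Total bandwidth of several identical links used side by side."""
    return links * per_link_mb_s


xd1_per_smp = aggregate_mb_s(XD1_LINKS_PER_SMP, XD1_LINK_MB_S)
dual_x16 = aggregate_mb_s(2, PCI_MB_S["PCI Express x16 (usable)"])

if __name__ == "__main__":
    print("XD1 per two-way SMP:", xd1_per_smp, "MB/s")
    print("Two PCI Express x16 slots:", dual_x16, "MB/s")
    print("Ratio:", round(xd1_per_smp / dual_x16, 2))
```

On these numbers the gap in raw bandwidth is modest; the sketch says nothing about latency, which other comments in this thread argue is the bigger difference.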
My Other Computer Is A Data General Nova III.
SGI may have something to say about those ideas.
Free Mac Mini Yeah, it's
In a way he's right. Reading the whole article, it seems apparent that he's talking about certain high performance applications. Clusters are not always the best way to solve a problem. For problems that can be broken down into small independent tasks like SETI, clusters are a good solution. Clusters do have their optimization challenges with latency, bottlenecks, etc. For simulations where the tasks are dependent on each other, these bottlenecks add up. The individual nodes spend as much time communicating with each other as they do computing. There are also problems that cannot be distributed. In these cases clusters are not the right solution and it may not be cost effective to use a cluster.
Well, there's spam egg sausage and spam, that's not got much spam in it.
I suggest you read up on supercomputers and mainframes before you embarrass yourself further.
Blar.
While your statements may be true and valid, I don't think that I would want to base a business on selling a machine that costs seven figures and would only be marketable to that remaining 0.1%, and expect to be around in a few years. I think that the flexibility of the PC in clusters is good enough for most applications, and they can throw student labor and people with PhDs at the rest on the software end. Just like the PC market itself: Windows is good enough to do the job but nobody would say that it is perfect.
Got hosting
don't forget the SGI Altix range - scaling to 256 processors in a single image
and there was the recent PR about the contract with Nasa for a 10,240 processor server (though that's not 10,240 in a single image...)
"we demand rigidly defined areas of doubt and uncertainty!"
Bell?
sorry....
Not quite true. First off, you get much higher bandwidth between processors using proprietary (NUMA) based interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second, you can exploit things like cache-coherency between processors (even if they're in different "nodes") and therefore true shared memory. So, a 1024 processor SGI Altix, or a 256 processor Cray, is one computer as far as the OS and user-land stuff is concerned.
There's another advantage Cray has on the SV and X series and that's a vector unit on the processor. That allows you to conduct operations on arrays of numbers at once instead of having to cycle through the numbers in a loop. For example, the dot_product between two small arrays might be accomplished with one or two instructions, as opposed to a loop. Apple's AltiVec is also a vector unit.
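To make the loop-versus-vector idea concrete, here is a toy sketch in Python. It only counts "instruction steps" and says nothing about real hardware timing; VECTOR_LEN is an assumed register length and both function names are invented for the example:

```python
VECTOR_LEN = 64  # assumed vector register length, for illustration only

def dot_scalar(a, b):
    """Scalar style: one multiply-add per loop iteration."""
    total = 0.0
    steps = 0
    for x, y in zip(a, b):
        total += x * y
        steps += 1
    return total, steps

def dot_vector(a, b):
    """Vector style: each step handles up to VECTOR_LEN elements at once."""
    total = 0.0
    steps = 0
    for i in range(0, len(a), VECTOR_LEN):
        # Stands in for one vector multiply plus one vector sum.
        chunk_a = a[i:i + VECTOR_LEN]
        chunk_b = b[i:i + VECTOR_LEN]
        total += sum(x * y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return total, steps

a = [float(i) for i in range(100)]
b = [2.0] * 100
print(dot_scalar(a, b))   # 100 steps for 100 elements
print(dot_vector(a, b))   # 2 steps for the same 100 elements
```

Both give the same answer; the difference is how many times you go around the loop to get it.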
If you took money out of the picture it would be easier to deal with a big-honkin' super computer like an SGI or Cray rather than a cluster. One computer is easier to manage and you could always use threads and plain old heap memory (which is much faster than message passing over a network).
Add money back in and $500,000 goes a lot farther in raw compute power when you're buying racks of Dells and Infiniband interconnects. However, depending on the application, you may be faster, slower, or even dog-slow compared to the Cray. If you need the answer today, and the $ is not a factor, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.
Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
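A back-of-the-envelope way to see that trade-off, sketched in Python. The model is deliberately crude (compute divides perfectly, every message costs one latency), and the workload and latency figures are made-up placeholders, not measurements of any real interconnect:

```python
def run_time(work_s, nodes, messages_per_node, latency_s):
    """Toy model: perfectly divided compute plus time spent waiting on messages."""
    return work_s / nodes + messages_per_node * latency_s

work = 3600.0  # one hour of serial compute (hypothetical workload)
for label, latency in [("commodity network", 50e-6), ("tight interconnect", 2e-6)]:
    for msgs in (1_000, 10_000_000):
        t = run_time(work, nodes=64, messages_per_node=msgs, latency_s=latency)
        print(f"{label:20s} {msgs:>10,d} msgs -> {t:8.1f} s")
```

With few messages the two cases come out nearly identical; with millions of them the latency term swamps the compute term.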
Leave the gun, take the cannoli -- Clemenza, The Godfather
Of course the cost of this kind of networking technology does eat up quite a lot of the cheapness factor. In many clusters the interconnect costs more than, sometimes several times more than, the processors, memory, etc.
Both clusters and big iron have their place. I am a meteorology professor and my current research involves high-resolution numerical modeling of thunderstorms. For a problem where the domain decomposition is straightforward and internode communication isn't your bottleneck, clusters are great. One huge advantage of clusters is that they are cheap and it isn't too big of a deal to get a grant together to buy the hardware, and it's YOURS and nobody else's. A huge disadvantage to big iron is that you have to share it with about a hundred other researchers. Waiting in a queue for three days only to find you goofed up in your startup script (and the model exits immediately) is NO FUN (cf the Regatta at NCSA).
I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OMP but OMP sucks and is no substitute for code which is written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.
In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
does anyone stop to think about what that 200 PC cluster costs in power? quite a bit i think ...
There is a very fast NUMA (non-uniform memory access) interconnect in each case (about the size of a washing machine). So you can access memory on another board only slightly slower than on your local memory.
You can have up to 4 processors per board. Then you can connect together multiple washing machines with (I think) Infiniband.
You still want to access local memory if you can as that gives lowest latency. Work is going on in the kernel to better support this kind of architecture. Linux (or at least open source) is really important to these machines because you do need to be able to modify the kernels.
A bunch of PCs work real well when your problem can be partitioned. What kills you is high levels of synchronisation activity - whether signalling or updating because that's when latency kills you. For some apps you may have as much or even more compute horsepower in your PCs than the supercomputer but it spends all its time twiddling thumbs.
So for many hard applications these machines really are the bee's rollerskates.
Squirrel!
Current development looked to be 512 CPU's before too terribly long. I like the system, but it's EXPENSIVE when you think about the fact that you're still buying x86 architecture - $6 - $8K per CPU.
Every Cray is sa - cred.
Every Cray is greaaat...
If a Cray is was-ted,
Paul gets quite i-raaate!
(Thank you, thank you very much.)
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
The point is, though, that the Cray supercomputers are vector supercomputers, whereas Linux clusters and other similar machines are not. Currently it seems to me that most clusters are very reminiscent of computers of old days where you run programs in batch mode. Cray is pointing out that running stuff in batch mode, even massively parallel, often cannot match the flexibility of the Cray vector system.
on a side note, have you worked much with myrinet? personally, i find it to be the most buggy thing i've ever seen. their hardware seems to fail more than an antique chevette. i just wonder if anybody else has a similar experience with myricom's products, since at this moment i doubt i'll ever again invest in their hardware.
--- d'oh
No! What he is trying to explain here is that a Linux cluster using "standard" hardware, (eg x86 based), suffers from the usual PCI related bottlenecks that standard hardware has. Therefore it cannot be as efficient as a system specifically designed for supercomputing which has no PCI bottlenecks.
If you read the article you would note that Cray is promoting their new LINUX BASED SUPERCOMPUTER....
My hyperlinks aren't worth the paper they're printed on.
In linear algebra, there are often many algorithms that could be used to solve a problem, but the obvious algorithms require many more calculations than the clever algorithms. For instance, you don't solve A*x=b by calculating inv(A)*b.
Just as it would be embarrassing for the mathematician to recommend calculating inv(A) for a one-off solution of A*x=b, it would be embarrassing for a computer scientist to recommend a freon cooled million dollar supercomputer when, with a slight optimization of the algorithm, the solution could be calculated with a cheap cluster of PCs interconnected with 100baseT.
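For anyone wondering what "not calculating inv(A)" looks like in practice, here is a minimal sketch in plain Python: Gaussian elimination with partial pivoting, which solves A*x=b directly and never forms the inverse. The function name and the 2x2 test system are just for illustration; a real code would call a tuned library instead.

```python
def solve(A, b):
    """Solve A*x = b by Gaussian elimination with partial pivoting (no inverse)."""
    n = len(A)
    # Work on an augmented copy so the caller's data is untouched.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

print(solve([[2, 1], [1, 3]], [3, 5]))
```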
I'm not a subject matter expert but it seems like the Cray is a M/m/X (X>=8) system while Linux clusters are multiple M/m/x (x=4) systems.
It seems to me that the mathematical limitation of how much workload a Cray can handle is a lot worse than a Linux cluster.
Can it be that the price/performance issue that he is talking about is just for specific applications?
Finally, a machine capable of running Doom 3!
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."
Well, to be fair, if you use sendfile correctly you ARE talking directly to the NIC. If you're using UDP over IP over 1000/baseT, you can drive a NIC pretty damn fast. Start bonding those puppies and you can approach system bus speeds.
The real problem is NOT speed, it's the efficiency of scheduling, and Linux running on a Cray is likely to have similar problems there as Linux on a cluster will.
What's nice about both is that Linux is so ultra-generic that you can mix your highly specialized app with the hundreds of generic parts that you get with the OS.
It was poor reading skills rather than confusion. I simply read your post, incorrectly, as referring to PCI-X. Mea culpa.
I work a lot with it, like ~3000 customers, almost half of them are industry (non academic or gvt).
You found bugs ? Care to share them ? Hardware failed ? Did you get it replaced ?
Can you give me the tech support ticket numbers so I can see if your complaints are reasonable (and have been addressed) or are just plain FUD ?
There was a guy about 90 miles away, offering one on ebay for $7000, it never sells (he tries every 12 months or so). If I suddenly landed a job for $60k a year, I'd almost certainly buy it from him. Rent a Uhaul or something, go pick it up. I've heard of universities practically junking them.
Yes, Cray's are one of my saved ebay searches...
I don't see that your rant says anything different, other than you're giving more emphasis to problems that are more parallelisable, and he's giving more emphasis to the ones that aren't.
Oh, and you're implying Cray's product's vaporware, and he's implying clusters are less reliable, so I'll grant you both one FUD point. Happy?
Hmmm. I would also say that first byte latency is very important in a lot of/most workloads. Clusters can mask this on some workloads through parallelization. Introduce interdependencies and it loses some of its advantages. I know I will get flamed for this but I think Sun understands this quite well, hence the philosophy behind throughput computing and their next gen core designs (Niagara, Niagara2, Rock)
It depends what your TCO estimates for an installation are. Typically, your buying costs aren't the significant part of fielding a solution. You need to look at how much it is going to cost to run (power/heat/real estate/maintenance) and how much it costs you when it ain't running. Also, just throwing this out there as I don't have any specs to hand, does anyone know if commodity hardware is accurate enough (i.e. IEEE FP precision etc.) to be used in all cases a 'super computer' (sic) is used?
You wanna play house?
Who'd you rather be.. the mommy or the daddy?
imagine a Beowulf cluster of these...
"Software is like sex... it's better when it's free"
Flamebait? That's what you get for posting something that requires a sense of humor!
Read carefully. I did not say it was vaporware. I saw a real system. What I said was, trying to keep up with the commodity curve is a battle that has been lost by many an HPC vendor. The key is time to market. By the time you get your latest and greatest to market, the commodity market has passed you by.
HPC for Primates. Read Cluster Monkey
I would. That 0.1% has little choice but to pay big for these computers. If you are the only one making them, it gets even better. Cray has been selling to a niche market for decades.
'SBEMAIL!' is better than a goat!!
You and him, you're saying the same thing, you're spinning it your own way, but the actual content is the same. So why are you describing his as FUD?
An independent research team found it was more cost-effective to pwn a super computer, than a cluster computer.
Supercomputers already do that - this is sometimes referred to as "multi-ported memory".
Frequently (Cray Y series), each processor has up to 4 memory busses-one for the instruction stream, two for data input, and one for data output. Note - this does NOT mention the one or more ports also used for the base I/O.
The big issue that is external to "just a DIMM" is synchronization between processors - cache coherency and such. Using "just a DIMM" slot won't cut it - the memory emulation must also be able to request cache flush/invalidate when updates are done.
IT: Slashdot points out obvious sales pitches
In the news today, Slashdot points out that owners of a company will say that their product is the best. Holy schnikes, could you believe this outrage!
It will always be the right tool for the job. If a company wants many machines and has a service agreement with someone to monitor, replace (or not), etc., then that works for them. If a company wants all that in one power draw in a large box, they get a supercomputer. There are also many processing models that clusters of smaller machines can not provide solutions to.
That having been said, of course Cray is going to say that smaller boxen clusters = bad.
Thanks for pointing out the Cray SEC filing. That was very interesting.
Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
Serious question here (yes, wrong place to ask one of those): "Hypercube formation"? Is this just a cubic lattice where the nodes are relatively densely packed so each can communicate with several others over not-too-far distances? What makes it "hyper"? Someone with an expertise in communications/computing theory help me out here!
A Cray XD1 is quite different from an SGI Altix.
Yes you are right,
We have on our linux cluster right now 36 Serial jobs (one machine)
and 75 parallel (more than one) in the queue. This is on ethernet.
But what about Infiniband and Myrinet?
http://www.myri.com/
Both of these right now plug into 64bit pci and keep the cpus full up to 80% vs gigabit ethernet doing only like 50%. So the statement that PCI is what is slowing down clusters is false as far as I have seen in my work here.
A nasty thing called physics says that latency will always be higher when moving messages over the cable than over the machine's internal bus because the cable is longer.
That said, isn't the obvious solution to this problem a "smart" clustering software that puts the processes that exchange the most messages with each other into the same computer ? A bit like NUMA, but replace "memory" with "message".
Of course, if someone absolutely must write code that passes around a zillion messages, then it's going to be slow no matter what... So our smart clustering software should be really smart and arrange the threads so that a single machine contains threads that are likely to block on messaging at different times so you can run one as the other waits.
And of course, if you get enough bandwidth and low enough latency, you can treat a cluster as a big NUMA machine (syncing shared memory areas over wire as necessary).
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Many posts have pointed out the true fact that supercomputers are better for certain jobs that are not suited to clustered solutions (and vice versa).
Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.
So what the original article is, is a message from one executive to other executives trying to clarify the situation. Basically saying "hey, just because Wired ran a story that says linux clusters are the next best thing since sliced bread, doesn't mean that this is the best solution for you. Now, let us talk about what you need."
I see nothing wrong with this. I read the article, and found nothing in it that was false.
It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs because of something said exec read in Scientific American.
Welcome to corporate america boys and girls.
(Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about linux clusters. Both are fine publications.)
Did you buy a Neuros today?
That's true, there aren't a lot of choices, but IBM and Sun are also making big iron. Cray is not the only way to go nowadays. Even if you land an order for one machine, the only income you will have as a company is a support contract. How long are you going to have a company if you only sell 2 units or fewer a year? Don't get me wrong, I would love for the kind of research that these machines excel in to pick up in the US, but it doesn't seem to be happening. Worse than that, I would hate to see Cray computing become another government subsidy!
Got hosting
A train does not have the same price-performance ratio as a ship.
But you don't use a ship when a train would do, and vice versa.
As the parent poster said: clusters and supercomputers are not the same.
>Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics :-) are perfectly capable of doing what supercomputers do today. Of course, there'll be new really-super computers then, but that's a different story :-)
This is not true. The issue being addressed here is the data transfer rate between nodes. Yes, PCI and other technologies advance, but there will always be a data passing technique that is more expensive but far faster than we can put in the home pc. For someone architecting the home pc, these technologies are out of reach. For someone architecting a supercomputer they are not only within reach but their benefits far outstrip the cost.
In other words, the data transfer abilities of a more expensive component decrease the dollars-per-mips number. The same goes for storage or any other component.
Like it or not, there will always be a supercomputer whose performance far outstrips what can be put on a desktop. When PC's can perform the tasks we use supercomputers for now, someone will have invented a new problem that requires the new supercomputer. It's inevitable.
C) getting >30fps in Doom III
In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.
No, Cray doesn't have "better NICs." In fact it doesn't really have "NICs" at all, not in the sense that we think of them. Your typical Infiniband card hangs off a PCI bus. PCI bus = major bottleneck, especially when you're talking a couple dozen Infiniband connections.
The XD1 is cool because the Infiniband is right on the hypertransport bus of the Opteron CPUs. It's damn fast.
In a hypercube computer architecture, you put a node at each vertex of an n-dimensional cube, and a communication channel to each of the n adjacent vertices. That way, you don't need a huge number of communication channels per processor, i.e. only log2(number of processors), at a cost of sometimes having to pass data up to n hops.
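In code the hypercube wiring is just bit-twiddling on node addresses. A small Python sketch, assuming nodes are numbered 0 to 2**n - 1 so that two nodes are wired together exactly when their numbers differ in one bit (the function names are made up):

```python
def neighbours(node, dims):
    """Nodes one hop away: flip each of the dims address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dims)]

def hops(src, dst):
    """Minimum hop count is the number of address bits that differ."""
    return bin(src ^ dst).count("1")

dims = 4                           # 2**4 = 16 processors, 4 channels each
print(neighbours(0b0101, dims))    # the four nodes wired to node 5
print(hops(0b0000, 0b1111))        # opposite corners: worst case, 4 hops
```

So nothing "hyper" has to exist physically; the dimensions only describe which cables go where.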
There are other popular architectures out there. Simple 2-D grids match a lot of applications, and require 4 comm channels per processor no matter how many processors you have. The old Transputers were built specifically for this. A minor extension to the 2-D grid is the torus, which you make by connecting the top and bottom of your grid together and the left and right ends together. (It basically doesn't cost any extra, since you had the spare channels at the processors at the edge, plus you get to say "ooohhh, donuts!"). And there are a bunch of applications with dense clusters of processors (for instance, N-way shared-memory nodes) with the clusters connected in hypercubes. Butterfly networks are another shape that was popular for a while - they look sort of FFT-like, and they basically keep the log-n number of communication channels while reducing the bottlenecks.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I have run a decent sized cluster using Myrinet. I did have many problems with hardware and software failure. However, Myrinet was good at handling the problems in a reasonable time. With that in mind I would never count on Myrinet in a mission critical system. However, for a computational cluster it served its purpose.
There are some blade servers that use low-power CPUs like Transmeta to get the tradeoff of more MIPS per watt. E.g. for a 200-processor cluster, 50 watts per processor gives you 10 kW, as opposed to 300 watts -> 60 kW. At 10 cents per kWh, a 10 kW cluster is about $1/hour, which is cheaper than the grad student you've got managing the thing. (In practice, you often need to double or triple the power costs, because you also need cooling to get rid of all the heat from the CPUs.)
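The arithmetic is easy to rerun with your own numbers. A small Python sketch; the 200-node count comes from the question being answered, while the function name and the cooling_factor parameter are just illustrative ways of expressing the "double or triple it" rule of thumb:

```python
def cluster_power(nodes, watts_per_node, dollars_per_kwh=0.10, cooling_factor=1.0):
    """Return (kilowatts, dollars per hour) for a cluster."""
    kw = nodes * watts_per_node / 1000.0 * cooling_factor
    return kw, kw * dollars_per_kwh

for watts in (50, 300):
    kw, cost = cluster_power(200, watts)
    print(f"{watts} W/node: {kw:.0f} kW, ${cost:.2f}/hour")

# Same 300 W nodes with cooling assumed to double the bill.
print(cluster_power(200, 300, cooling_factor=2.0))
```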
Obviously a supercomputer is a bit different, because you don't need all the disk drives, but CPU and RAM are using an increasing amount of power compared to disk drives. (So does high-end video, which obviously you don't need unless you're playing games like using the video processor for number-crunching instead of the main CPU.) But the power problems are still just as annoying. If you're doing anything custom-built for supercomputing, you'd obviously build boards with multiple CPUs and faster interconnects and skip all or most of the disk drive stuff, so that lets you fit more CPU per 1U or 3-4U of rack space. And you might build a system with lots of DSPs instead of general-purpose CPUs, which would probably get you more MIPS per watt.
Database supercomputers, on the other hand, look surprisingly like blade servers. The old Teradata machines had something like 488 CPU+disk units connected by a fancy back-end switched network, plus a front-end set of CPUs for managing work and communicating to the outside, with algorithms designed to split up queries intelligently across the processors. And of course there were the same kinds of arguments about database machine clusters vs. big iron mainframes vs. loosely-coupled clusters.
Bill Stewart
I have a friend who was one of the Cray 'Evangelists' at Apple and he told me this story: When they had trouble with injection molding of the case components they were able to run a Finite Element model on the Cray and reduce the number of reject cases. This use alone justified the cost of the Cray for a full year. I'm sure it had many other uses as well.
Back in the mid-80s, my department had a huge VAX 780 with 4 MB of RAM (16KB chips, I think), and we were working on a network simulation system that needed 12-14 MB RAM to run. I spent a while playing with different versions of 4.1BSD and Unix System VR2, but fundamentally the machine spent all its time swapping data in and out of disk, and the main performance win was helping the physics jocks who wrote the application get better algorithms and better localization and good checkpointing because the computer didn't always stay running for the full week it took to finish a simulation run. A year or two later, we got the budget to buy another 4MB of RAM (in 64KB chips, about $50K IIRC), which helped a bit, and a year or two after that, we got enough budget to buy another 8MB of RAM (maybe 256KB chips? not sure. Also about $50K), and suddenly the application could complete in under an hour instead of a week, because RAM really is a couple orders of magnitude faster than disk drives with a couple more orders of magnitude less latency, so our problem changed from being disk-bound to being CPU-bound.
That speedup not only improved the utilization of the equipment, it made a qualitative difference in the kinds of problems we could address because of the way we could interact with it. That's why people buy supercomputers if they need them - it really can be orders of magnitude faster for some problems. The first year or so, we really had all the RAM that could fit in the double-refrigerator-sized VAX cabinet. Once the denser RAM chips became available, we probably should have spent a bit more manager time beating up on the accounting department, because an extra $50K for hardware could have more than doubled the efficiency of 3-4 physicists, but of course the accounting droids don't think in terms of efficient use of physicists unless it lets you buy half as many of them, which was _not_ the objective here...
Bill Stewart
Sure you've got enough money for a Cray! Cray J932SE supercomputer (dual IOS, 3 cabinet) for $4500, not including disk drives.
Bill Stewart
Back in the 80s I was doing a telecomm project for a large research lab that had a number of Cray supercomputers on one side of campus, and their campus backbone was a 30 Mbps baseband cable system feeding a bunch of 10 Mbps Ethernets, and a few of their buildings were starting to get brand-new 100 Mbps FDDI. They were getting very worried about what would happen if too many people _did_ imagine what they could do with a Cray, and wanted to do it from the other side of campus... Fortunately, the number of people who had access to the Cray was small enough that a variant on "sneakernet" worked fine - not using the sneakers to carry floppy disks around, but using the sneakers to carry the users to the building where the Cray lived :-)
Bill Stewart
... I do SO know. Supercomputers are mainframes that have radioactive spiders building webs inside them, and the cooties slop over and give them SUPAH POWAHZ.
neener neener
What he's talking about here is more than just a commodity system with a low latency interconnect, it's other Really Useful Stuff(tm) like:
Hardware Checkpointing: Your job is happily running on a CPU when it dies. The system notices, takes the last hardware checkpoint and restarts the job on a different one. If you're running a 3 month MPI job (like some of our users do) and it dies due to a HW failure 2.5 months into its run it's painful - even if you're doing software checkpointing, having to requeue your job on a busy cluster can be a significant penalty.
Processing time lost in MPI Barriers: MPI programs make extensive use of barriers to make sure that all processes are synchronised in their work. The problem is that if some other process (cron, sync, gm_mapper for Myrinet GM2) comes along and steals some CPU time from your process on one node then the other processes in your MPI job have to wait for it to catch up. Cray have synchronised the Linux scheduler across nodes so that all nodes take time out for non-MPI tasks at the same point, so when an MPI barrier is set up all tasks should be pretty much at the same point.
Note that I'm not dissing Linux clusters (I run one that's on the Top500 and I love it), but if you can get this sort of functionality without paying a massive premium it's well worth considering.
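The barrier point is easy to see in a toy simulation. This Python sketch is not how Cray's scheduler works, just a model of the effect: each step ends in a barrier, so it costs as long as the slowest node, and every parameter value (step time, interruption length, interruption probability) is an invented placeholder:

```python
import random

def run(steps, nodes, work=1.0, noise=0.2, p_noise=0.05, synchronised=False, seed=1):
    """Total wall time for `steps` compute phases, each ended by a barrier."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        if synchronised:
            # Every node pauses for housekeeping in the same step, or none does.
            hit = rng.random() < p_noise
            times = [work + (noise if hit else 0.0)] * nodes
        else:
            # Each node is interrupted independently of the others.
            times = [work + (noise if rng.random() < p_noise else 0.0)
                     for _ in range(nodes)]
        # A barrier releases only when the slowest node arrives.
        total += max(times)
    return total

print(run(1000, 64, synchronised=False))
print(run(1000, 64, synchronised=True))
```

Each node does the same amount of housekeeping on average in both cases; the unsynchronised run is slower only because with 64 nodes somebody is nearly always late to the barrier.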
I'll have to dig through our ticketing system and my desk to find the ticket numbers, so you may have to wait until Monday. None of them were 'bugs', simply hardware failing. Both myrinet cards in computational nodes, and switch cards. Hardware did get replaced, and I'm about to call in for a replacement of 10+ cards plus probably half of the cards in the switch. For a two year old cluster to have 20% of the myrinet hardware replaced, I find that not acceptable.
--- d'oh
I seriously doubt that Cray can put faster circuits down on silicon than Intel. Part of the nature of the silicon foundry is that stuff doesn't start getting good and fast unless you make A LOT of it. It also gets cheap at this point.
I see no architectural difference between a "cluster" and a "supercomputer". The links between different CPUs are just conventionally made using different technology.
There's a lot of rubbishing of PCI (hey it's 10 years old now, and there are MUCH faster new versions happening), and what is the point of saying unquantified/unsubstantiated crap like "CRAYS HAVE VERY FAST SHARED MEMORY BUS".
Yeah - HOW FAST THEN? I'd be surprised if they are 128 bit running at 2 GHz.
Shared memory can mean one of a number of things, also:
You can have one CPU sharing say a 4 meg block with each of 25 other CPUs. The first CPU acts as the hub for communication between the other CPUs.
You could have 27 CPUS in a 3 x 3 x 3 cube, each CPU sharing memory with up to 6 neighbours.
You could have 5 processors in a line with each one sharing memory with (up to 2) neighbours.
Or you could have a bunch of core memory that 4 processors share (they might have their own memory too).
The same thing goes for a cluster - you could have PCs with up to 6 network cards (or with unidirectional custom ethernet protocol, even 12 network cards) linking to neighbours in a 27 CPU cube, and so on.
The topology will affect how the program is written for maximum speed, but also which tasks the computer is suited for. I think you could make very very fast links between ordinary PCs with say full duplex gigabit running a custom protocol (TCP has latency by the way, UDP has none since it doesn't wait to assemble packets in buffers in the kernel).
It's hard to imagine a task that is so i/o bound (in my mind this is the opposite of embarrassingly parallel problems) as to require more than 100 megabytes/second between each node, when each CPU node has a memory bandwidth of 12 gigabytes per second (based on 32 bit core of Pentium 4 at 3 GHz, assuming roughly 1 transfer per clock cycle, which in itself is unlikely).
In other words, a cluster using off the shelf gigabit ethernet hardware could transfer 1% as much data as the CPU could do with RAM.
Note if the CPU is in a 27 CPU cube the combined 6 gigabit ether cards would be transferring 6% as much as the CPU could. I guess it is possible to get motherboards with larger numbers of PCI slots, say 12 in which case you could run two streams of gigabit ethernet between each CPU giving you 12% as much data being transferred over ethernet as the CPU can transfer in and out of memory (not including cache flushing from CPU to RAM).
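Those percentages can be checked with a few lines of Python. This sketch uses the poster's own assumption of a 32 bit transfer per clock at 3 GHz, plus the raw gigabit line rate of 125 MB/s per link (which seems to be what the 1%, 6% and 12% figures were rounded from); the function name is made up:

```python
def interconnect_share(links, link_mb_s=125.0, cpu_bits=32, cpu_ghz=3.0):
    """Fraction of CPU-to-RAM bandwidth that `links` network links can carry."""
    memory_mb_s = cpu_bits / 8 * cpu_ghz * 1000.0   # 32 bit at 3 GHz = 12,000 MB/s
    return links * link_mb_s / memory_mb_s

for links in (1, 6, 12):
    print(f"{links:2d} x gigabit: {interconnect_share(links):.1%} of memory bandwidth")
```

Note this is a bandwidth ratio only; it says nothing about latency, which is the other half of the argument in this thread.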
Once again, what problems require such a huge amount of communication with other nodes that say 12% as much bandwidth between nodes versus CPU-memory is not sufficient?
Say 12% isn't high enough: what CPUs, data bus widths, and shared memory speeds are used then?
Arguments people have made so far are so light on detail, and using terms like "much faster" instead of giving a figure, it sounds like FUD.
Remember parallel links between devices on chips can exhibit data skew, lowering data rate compared with a fast serial link. In fact there is talk of (and I personally suggested it a long time ago on a newsgroup) using light to get signals from one chip to another. (probably mainly serial, but not necessarily exclusively).
I am detecting a slight conflict of interest here.
The point is, though, that the Cray supercomputers are vector supercomputers, whereas Linux clusters and other similar machines are not.
The article is about the Cray XD1, which is not a vector system. In fact, the XD1 is remarkably similar to an Opteron/Infiniband/Linux cluster...
It is called a hypercube because it is a mapping of a four-dimensional (and higher) cube into three-dimensional space.
It has nothing to do with communications or computing, but with topology.
Right, right. But if it's a mapping (1-1, as it seems) then there's an isomorphism, and the need to do anything in hyperspace isn't there. Just call it an arrangement with a certain number of lines from each node and be done with it. If it can be done in meat space there's no need to hyper- it up.
Until PCs have as wide busses, it probably won't matter how well the chips do.. Crays probably require less maintenance than a PC cluster.. It's like saying a bunch of fast motorcycles are better than a stock-car. Sure, but if you get in a wreck, I'd rather be in the stock car..
Just say no to license servers!!
did I hear right, Crays are using a type of Linux kernel....
When you've got your install disks you know you haven't been screwed
I think that this configuration gives the best price performance. On one end of the spectrum you have a completely interconnected mesh, on the other hand you have all systems on a single bus, in between you have a whole lot of possible topologies.
In the hypercube system, you need in N dimensions N communication channels per processor, and the maximum distance that any packet has to travel is N hops.
In a completely interconnected mesh with P processors, you need P - 1 communication channels per processor. While your maximum hop count is in that case only 1, you need (P - 1)*P channel ends in total (that is, P*(P - 1)/2 links), which is quadratic.
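Putting numbers on that comparison, as a quick Python sketch (function names invented; a "link" here is one cable shared by its two end points, so the total is half the number of channel ends):

```python
def hypercube(dims):
    """(processors, channels per processor, total links, max hops)."""
    p = 2 ** dims
    return p, dims, p * dims // 2, dims

def full_mesh(p):
    """Same tuple for a fully connected network of p processors."""
    return p, p - 1, p * (p - 1) // 2, 1

for dims in (3, 6, 10):
    p = 2 ** dims
    print("hypercube", hypercube(dims))
    print("full mesh", full_mesh(p))
```

At 1024 processors the hypercube needs 10 channels per node where the full mesh needs 1023, in exchange for up to 10 hops instead of 1.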