Cray CTO Says Cray Computers Are Great
Jan Stafford writes "Linux clusters can not offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."
I wonder how Cray computers are in milk...
A feeling of having made the same mistake before: Deja Foobar
no nevermind.
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics
It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap, speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that IT'S CHEAP...
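For anyone wondering what "hypercube formation" means in practice, here is a rough sketch. The function names and the node numbering are my own assumptions for illustration, not anything from the article. The idea is that in a d-dimensional hypercube each box needs d network links, and two boxes are directly wired when their ids differ in exactly one bit.

```python
def hypercube_neighbors(node, dim):
    """Return the ids of the nodes directly wired to `node`
    in a dim-dimensional hypercube (2**dim nodes in total)."""
    if not 0 <= node < 2 ** dim:
        raise ValueError("node id out of range for this dimension")
    # Flipping each bit of the id in turn gives one neighbor per link.
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(a, b):
    """Minimum number of links a message crosses between a and b:
    the number of bits in which the two ids differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # 16 boxes with 4 NICs each (hypothetical setup).
    print(hypercube_neighbors(0, 4))
    print(hops(0, 15))
```

So with 4 NICs per box you can wire 16 boxes and never be more than 4 hops apart, though every extra hop adds latency on top of whatever the NIC already costs you.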
Simon.
Physicists get Hadrons!
The CTO from Cray said Crays are great machines and are priced competitively!
Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!
Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
a Linux cluster of Cray's?
Read the only personal Runyon page out there.
The thing is, makers of big supercomputers are scared of clustering technology. Look at Google. A large cluster, and if one of the machines dies, you don't worry about it. Every once in a while you go and replace those that died. If only a small portion die, you haven't seriously impacted your production. However, if your supercomputer goes down... well, you're screwed. 1000 machines are more reliable than 1 big machine.
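A back-of-envelope way to look at that reliability claim, as a sketch only: the per-node downtime probability below is a made-up number, and the model assumes node failures are independent, which real machine rooms don't guarantee.

```python
def prob_any_node_down(n_nodes, p_down):
    """Chance that at least one of n independent nodes is down."""
    return 1.0 - (1.0 - p_down) ** n_nodes


def expected_nodes_down(n_nodes, p_down):
    """Average number of nodes that are down at any given moment."""
    return n_nodes * p_down


if __name__ == "__main__":
    p = 0.001  # assumed per-node downtime probability, purely illustrative
    for n in (1, 1000):
        print(n, prob_any_node_down(n, p), expected_nodes_down(n, p))
```

Under those assumptions a 1000-node cluster loses only about one node's worth of capacity on average, which is fine for Google-style work. But the chance that something is broken at any given moment is far higher than for a single box, and that matters if your job needs every node alive at once.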
Is MS somehow involved? Who am I supposed to hate? Editors?
No, no, you misunderstand.
He's saying that Linux-based *supercomputers* are faster than Linux-based *clusters*.
(although, you can probably cluster those supercomputers...)
Most /.'ers are probably dying to mod this whole article down to troll.
I guess they can take their anger out on this anonymous post instead.
No, he's saying you should buy their Linux-based supercomputer instead of a Linux cluster. If you don't RTFA, at least skim the summary.
<jedi> There is something funny here. You laugh. </jedi>
The difference is that Linux clusters aren't really designed for supercomputing... more for distributed computing. Cray specializes in it. Of course they're going to come out on top....
Bear found defecating in a wooded region.
Pope reveals his membership of the Catholic Church.
You really shouldn't place commentary on a story title, unless it's an "it's funny, laugh" one.
Oh, by the way, everyone who has a Slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.
And it is, too.
Uhh, no, he's not dissing Linux at all. He's saying that one big supercomputer (running Linux, perhaps) will get you more price-performance (bang per buck, I guess) than a Linux cluster.
If it weren't for fog, the world would run at a really crappy framerate.
I bet the cray can maintain higher fps in doom 3 than a cluster ever could.
Dude, the makers of "big supercomputers" invented clustering. I don't think they're afraid of it.
There are tasks that a cluster of Linux shitboxen will do well, and tasks where the cluster will not hold up so well against a real supercomputer. Google is an example of a perfect application for networked Linux servers. If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.
Cretin - a powerful and flexible CD reencoder
...Your square boxes will never look as sexy as our 'Love Seat'
AT&ROFLMAO
Compare this article with http://it.slashdot.org/article.pl?sid=04/04/13/1452255&tid=126&tid=106
FUD = Fear, Uncertainty, Doubt. Provide examples in his statements of any of those three?
P.S. You are so l33t for using TT.
There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I highly doubt it is), you still have a logical layer on top of it (TCP/IP/UDP etc). This slows it down.
For some applications, a cluster of slow PCs is OK. But if you want to do real time-intensive computation, you really can't beat a good internal bus.
Moderation: Put your hand inside the puppet head!
He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....
Yeah, no wonder this post looked familiar. Yup, it's a dupe, folks.
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds
However it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.)
And now for the REST of the story:
Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn eproms or design printed circuit boards that runs on MacOS, so the hardware group has a bunch of Windows PCs!.
So now you know the *rest* of the story!
Best Buy can have you arrested
The latency on Ethernet is too high for many tightly coupled applications (lattice QCD for example). This is why people who need better networking use something like Myrinet. I would assume that these Cray machines have very high band-width, low-latency communications. This is where super-computers distinguish themselves from clusters.
I saw this MST3k blooper once where Tom called out "Cray" instead of "Crow". Still in character, and with false modesty, Crow replied with "Well that's very nice of you, Tom. I'm really more of a PC though."
(Not a verbatim quote.)
"Derp de derp."
Not to nitpick but a Viola is a string instrument in the violin family, the word you want is voilà.
You could look to SGI. Their Altix range is up to 1024 Itanium 2 processors in a single supercomputer, and they are putting 20 512-processor nodes together in a cluster of Linux supercomputers for NASA while also working on doubling up the maximum single machine CPU count to 2048.
Never underestimate the dark side of the Source
It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.
His point is that building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE NICs at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will quickly show negative scaling in fact (where the more processes you add, the *slower* your code will run). MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and different sizes on different machines.
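To make the negative scaling point concrete, here is a toy cost model. It is only a sketch under stated assumptions: the compute part splits perfectly across processes, while communication overhead grows linearly with the process count. The names `run_time` and `best_proc_count` and all the numbers are made up for illustration and are not measurements of any real MPI code.

```python
def run_time(procs, work=1000.0, per_proc_overhead=0.1):
    """Toy model: perfectly divisible compute time plus a
    communication cost that grows with the number of processes."""
    return work / procs + per_proc_overhead * procs


def best_proc_count(max_procs, **model_params):
    """Process count in 1..max_procs that minimizes the modeled run time."""
    return min(range(1, max_procs + 1),
               key=lambda p: run_time(p, **model_params))


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        print(p, run_time(p))
    print("sweet spot:", best_proc_count(1024))
```

In this model the run time bottoms out at 100 processes and climbs again after that, which is the shape you see when people pick a job size smaller than the whole machine. A faster interconnect shrinks the overhead term and pushes the sweet spot further out.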
Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)
In other news, Bill Gates says Windows is secure...
There are entire classes of computational problems which are classed as Embarrassingly Parallel.
It means it is so trivial to parallelize the problem and get gains from it (think SETI@Home) that it's a no-brainer.
Other computational problems don't just simply fan out to the bazillions of nodes with tiny independent pieces of data.
Your assertion that the Cray CTO is talking FUD when he uses the actual term is just plain wrong and unfair to him. He actually knows what he's talking about.
Lost at C:>. Found at C.
Scaling or upgrading these systems requires much more than simply ordering more parts; it opens up the whole integration exercise. From an application perspective, clusters limit application scaling. Bandwidth and latency restrictions significantly constrain performance as more processors are applied to a problem.
Has this guy ever heard of Google? I can see his point to an extent; in fact his whole Q&A session/blatant advert really boiled down to a single point: If you need to move a lot of data between processors, then a cluster will fare worse than one of Cray's supercomputers which have (obviously) more bandwidth between the CPUs and shared memory. It really does depend on the application, but for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false...
Code, Hardware, stuff like that.
Are you being funny or serious?
There's an entire class of parallel applications which are labeled "embarrassingly parallel". This description simply means that such programs are trivially parallelized and get as close to linear speedup as possible when scaled across many nodes. This is because of the low inter-node communications.
For "embarrassingly parallel" applications, a cluster is a really good tool. For programs that don't parallelize as nicely, a nice big vector or SMP machine will do nicely. Some code will run better on a small 20-CPU SMP machine than on a 1000-node cluster.
But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.
The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.
I really think this is what Cray mean here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.
(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)
Being the CTO of Cray, can you expect him to say anything less? Now while his points are often valid, I think his conclusion, that supercomputers outshine Linux clusters, is a little inaccurate. Rather, I think the real conclusion is that Linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends, ultimately, on the specific details of your problem. Again, though, being the CTO of the company, can you really expect him to give a balanced opinion like that, rather than the skewed opinion that his company is always on top?
Cray is a great company, but I really hate that they have to come out with things like this every now and then. Most people in need of a lot of computing power already know the difference between your products and linux clusters and really, they're going to choose whichever's most appropriate for their problem regardless of what your CTO says.
Who cares? This is /.! Nobody reads the article, and I got modded up instantly! All it takes is a few lines of text with a few links in it. Why bother doing any more?
Best Buy can have you arrested
"Embarrassingly parallel" is just a phrase used to describe certain types of problems. OK, I'll give you the fact that the phrase is a bit biased, but it's still just a phrase. Clusters work well for problems that can be broken up into smaller chunks that can be solved independently and then combined to produce a final result, but for problems that require significant amounts of communication and data transfer between processors, clusters just don't cut it. Crays and other supercomputers use specially designed communication networks between processors and memory and such, and that's why they're so much more useful for those types of applications.
-James
Cray CTO Says Cray Computers Are Great
Actually, I think he said that "Cray computers rock, eh?" or perhaps it was "Cray computers kick ass, eh?" or something like that.
- Leo
You don't use science to show that you're right, you use science to become right.
I don't think the Cray assertion is that crazy.
For a 12-CPU Opteron unit the academic pricing (admittedly lower than commercial, but where most of their sales will go) is about $45K. That's not too shabby. Before you bounce up and down and say you can build four times the cluster for that price, it should be noted that the XD1 gives you a single system image, which simplifies programming and makes shared memory applications (increasingly important for areas such as bioinformatics) easier to run.
We have a cluster with Dolphinics Wulfkit, and using distributed shared memory slows us down. It's not an end-of-the-world type of slowdown, but it's a factor. Our cluster is sixteen nodes of dual 2.2GHz Xeons with Wulfkit 3D torus interconnects. It cost us, at academic prices, $50K. Admittedly that's more CPU power than the 12 Opterons, but we find ourselves using distributed shared memory a lot (Wulfkit is great here), and that would probably be much better on the XD1. Had the XD1 been available a year ago we may have bought one instead.
It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.
Also, buying Cray is about getting access to their software technology too.
R-S
Linus Torvalds claims Linux is "better than Microsoft."
Well, supercomputing can be either of two issues
a) (google-like) jobs well suited to a high degree of parallel processing.
b) complicated problems that can't easily be broken down to make use of a large number of CPUs, but require a lot of operations to be completed in the proper sequence.
On the first, a cluster is a great idea.
On the second, a reaaaaaallly fast CPU is a great idea.
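The split between (a) and (b) is roughly what Amdahl's law puts a number on: if only a fraction of the work can be spread over many CPUs, the sequential remainder caps the speedup no matter how many nodes you add. A quick sketch (the function name and the example fractions are just for illustration):

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: speedup = 1 / ((1 - f) + f / n), where f is the
    fraction of the work that can run in parallel on n CPUs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


if __name__ == "__main__":
    for f in (1.0, 0.99, 0.5):
        print(f, amdahl_speedup(f, 1024))
```

With a job that is only half parallelizable, 1024 CPUs get you less than a 2x speedup, which is why type (b) problems want the fastest single CPU (or vector unit) you can find.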
If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.
In this case the right tool is a vector based supercomputer like the SV1 (8 vector processors at 2Gflops each . . . MMmmmmmmm). A cluster based approach will waste more processing time with the message passing than anything else. Cheaper maybe, but grossly inefficient.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
"Embarrassingly parallel" is a term referring to parallel computations involving minimal or no communication across the computing nodes. :)) ;
it is not to be interpreted as "parallelism is embarrassing to him"
oh and by the way, what he said is not wrong !!
-ram
> why is parallel "embarrassing"? oh I see, facts aren't good for your bottom line.
Actually that is a common expression, and not something he pulled out of his heinie - peek here
I think you misunderstand what he said. The term "embarrassingly parallel" has been in common use for many years to describe problems that require so little communication between processors that they can be scaled up more or less indefinitely just by adding more computers. The ultimate examples of "embarrassingly parallelizable problems" are things like the human genome project or SETI-at-home, where it's practical to farm it out to completely disconnected computers to do bits of the work in isolation.
No, the inventors of big supercomputers (couple million dollars a pop) are definitely scared of clustering.
If you want a Cray supercomputer, you have to buy it from Cray. If you want a Linux cluster, you can buy it (or build it) from anyone.
I'm sure there are applications for a supercomputer, but I see universities, production studios (Pixar!), and research labs moving toward clusters. The supercomputer companies will do anything it takes to either stop that from happening or to gain in that market.
Didn't read the article did you? He's pushing one of their machines running Linux and even talking about when a supercomputer isn't the best solution.
Clusters are really only good when what you're doing is massively parallelizable, like 3D rendering or folding@home types of applications.
For stuff that's not, algorithms that only work sequentially, nothing beats a crazy-insane-fast CPU, memory and system bus.
You'd think that'd be a no-brainer to the "computer experts" here at Slashdot, but you'd be wrong to assume that /.ers are actually "computer experts" when they're merely crusaders for their favorite OS.
Cray isn't "scared" of linux clusters. They just can't do what a Cray does.
I don't need no instructions to know how to rock!!!!
I think it was Big Gay Al that said they were SUPER.
Actually, I like http://developers.slashdot.org/article.pl?sid=04/08/20/1337224&tid=137&tid=163 better.
This sig under construction. Please check back later.
Hi, clueless Slashbot. This is a quick rundown of why your post was stupid, and why Cray supercomputers do, in fact, do some things better than a PC cluster regardless of price.
If you have a supercomputer, you have a very, very, very fast internal bus handling all necessary data transfer. Even with the advent of PCI Express, a cluster of PCs must run in a network model. Therefore, any data crunching that occurs must pass through a network layer, the bus, the physical medium, and back through those limiters once more on the next system. Therefore, if you are doing number crunching that truly cannot afford the delays caused by the data transfer limitation of a PC cluster, a self-contained supercomputer is far and away the best option, even if it's more expensive.
Therefore, contrary to the idiotic drivel you just spouted, Cray does, in fact, have something to offer that no PC cluster currently can.
We now return you to your regular informed diatribe in the name of the self-gratifying masturbatory stupidity that is Slashdot.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Good clusters don't use IP; they use Infiniband, Myrinet, or Quadrics, which all have OS bypass and transport offload features so that the app can talk directly to the NIC. In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.
Have you ever worked with supercomputers?
However, if your supercomputer goes down... well, you're screwed
Cray supercomputers have built-in redundancies. All the subsystems are separate from the processors and memory, which are actually "clustered" (depends on model). Even the OS has built-in means to survive the harshest hardware catastrophe by checkpointing the running jobs regularly, to off-site disks.
1000 machines are more reliable than 1 big machine
Wrong again. With 1000 lousy cheap machines, you need an on-site team of technicians to keep them all up. Supercomputers (with built-in redundancy etc.) have equal or lower maintenance requirements.
If you had RTA you would have learned that these new Cray systems run Linux as their OS.
http://www.electricstate.com/articles/defuglify-slashdot/
Found this a while back, and now have it in my Firefox Toolbar - works great.
- Heritage and resultant architecture: In Linux clusters, processors are typically connected through I/O links, whereas in supercomputing machines processors exchange data and instructions through shared memory.
- PCI bottlenecks: This is the key argument made: the bottlenecks introduced by PCI communication. He goes on to say that performance problems in any given such cluster tend to remain with any other such cluster. I agree with that.
- High Availability: He then goes on to talk about the reliability, availability and manageability of the supercomputers against typical clusters. I think this is where the FUD creeps in, along with marketing BS.
In all fairness, he does raise a critical point. Overall, though, considering the relative ease and popularity of building, administering and growing a cluster these days, I think the cost-effectiveness of a single monolithic machine is a moot point.
http://efil.blogspot.com/
That is, for a Linux cluster to keep up with a supercomputer, the cluster needs faster communications between processors. The bottleneck of going from processor to South Bridge to PCI Bus to Ethernet card, and back again at another processor, is the problem.
So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means Somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!
PCI sucks
Learning HOW to think is more important than learning WHAT to think.
So the claim is "Our supercomputer performs better than a PC?"
No kidding.
Even for geeks, this isn't really news.
I don't know the meaning of the word 'don't' - J
Clusters are nice for some problems, but message passing and memory copying over a network is not ideal even when you have what *you* think is a lot of bandwidth. Latency, cache coherency and having a single-image system can be critical factors in some classes of supercomputing problem, not to mention ease of use and the specialized FP vector instructions that are often supported. The topology in large systems is often built (flexibly) into the memory controller hardware: the CPU writes to memory and the write finds the right node. Page migration and process affinity, along with other advanced features like hardware-level cache coherency, help these systems outperform clusters with ease given the right problems.
The coolest thing about this IMHO is that Cray are using Linux for their single image systems.
Yep the performance of computers is always on the increase but there will always be demand for more compute, the question is where do you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems with increasing detail and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can for their hardware, and as hardware improves they redefine the types of problem they want to solve.
Yup clusters are cheap and they're on the top 500 but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems, the list is just for bragging rights.
It depends. If you're simulating cloud physics across a wide range of starting conditions, for example, a cluster will be your best bet.
There are very few situations in which there isn't a single parallelizable task, and if there is one, a cluster is probably your best bet.
No matter how kind you are, German children are kinder.
Don't suppose anyone has an old YMP or whatever that they'd be willing to give to a good home in Virginia?
Or for that matter, a warezed copy of Unicos....
I, for one, welcome our new story-duplicating, supercomputer-mocking, Slashdot editor overlords ...
On the other hand, supercomputers are purpose-built to handle HPC applications, which place enormous demands on both processing power and inter-processor communication. Their design includes high performance interconnects that provide high bandwidth, low-latency communications across the entire system, regardless of the number of processors required.
Why can't Linux clusters use the same high performance interconnects? Is it because of cable overhead (length, signal travel, insulation, etc...) or is it because of slow electronic switching? Why can't optical linkage provide the same low-latency interconnect performance as that of supercomputers. Somebody tell me, please. I need to know.
This just in! Company exec. says their products are great!!!
Seriously, this is news?
Ignorance is the root of all evil.
now, perhaps i missed the point, but i can afford the beowulf cluster in my basement. But, i don't think i can afford even a used cray:p
As I've mentioned elsewhere, all you need is a single parallelizable task for a cluster to be worth it. For example, if you need a simulation to be run over a range of starting conditions, a cluster is probably your best bet.
Unless you have a single monolithic entangled run, you don't need a supercomputer - hence, the surging popularity of clusters. Yes, not everything is suited for clusters... but most things are, because most have parallelizable components at least *somewhere* in the process.
No matter how kind you are, German children are kinder.
Networked clusters are useful only when the task is parallelizable, and each subtask is computable on a single node. Cloud physics is not like that. Cracking RC5, for instance, is.
Cretin - a powerful and flexible CD reencoder
While many things that the Cray CTO said are true, I think the issue (obviously) has been skewed some. It really depends on the problem you are solving. Some problems will need to have data shared between all of the nodes, but others will require that each node only has access to the data that is important to the small part of the problem that it solves. Also, the CTO mentioned that clusters don't scale very well. I don't really know what made him think this, but it seems to me that clusters do scale pretty well. For instance, ILM supposedly uses all of its employees' workstations at night to help do the daily renders. This way all of the CPUs sitting on desks don't go unused during off hours.
SIGFAULT
Infiniband uses a variant of IPv6 for addressing, and I believe the protocol is IPv6 based (It's been a few years since I looked at IB).
The only reason we have the rights we have is that people just like us died to gain those rights. -- Cheerio Boy
MS says their operating system is great. McDonald's says their food is great *and* cheap.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Please reread my post. I mentioned "a range of starting conditions" as being one possible (of many) mitigating case. Do you disagree? If there is a range of starting conditions, each condition's simulation can be assigned to a node.
That's the thing about clustering - you only need *one* parallelizable cpu-intensive task, and a cluster becomes worth it.
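Here is roughly what "one simulation per starting condition" looks like in code. This is a toy sketch: `simulate` is a hypothetical stand-in (a logistic-map iteration, nothing to do with cloud physics), and worker threads stand in for cluster nodes. On a real cluster each condition would go to a separate machine through a batch scheduler or similar.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x0, steps=100, r=3.7):
    """Hypothetical stand-in for an expensive simulation: iterate the
    logistic map from starting condition x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def sweep(starting_conditions, workers=4):
    """Run one independent simulation per starting condition.
    No run talks to any other, so they can be handed out freely;
    results come back in the same order as the inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, starting_conditions))


if __name__ == "__main__":
    for x0, final in zip([0.1, 0.2, 0.3], sweep([0.1, 0.2, 0.3])):
        print(x0, final)
```

The point is that the runs never exchange data, so interconnect latency doesn't enter into it. In plain Python the threads won't actually speed up CPU-bound work; they only illustrate how the work divides.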
No matter how kind you are, German children are kinder.
for a Cray, you insensitive clod.
It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.
Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.
It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.
It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.
You really don't have any business making any comments regarding what a "computer expert" should or shouldn't know.
Depending on a particular Cray, the tech may or may not be significantly different than a Beowulf cluster. Let's take NUMA as an example. NUMA started at Cray, was acquired by SGI and then sold to Sun.
In those examples, the "supercomputer" is nothing more than what amounts to a fancy cluster. The interconnects are faster. However, you are still just tying together a bunch of big bricks that look remarkably familiar (IOW, like a fat PC).
Don't automatically assume that what Cray is pushing is qualitatively different than a Beowulf or an Altix.
A Pirate and a Puritan look the same on a balance sheet.
No, the claim is a supercomputer generally outperforms a cluster. The fact that Linux is running on either/both/neither has no relevancy.
It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
The main reason for supercomputers to exist is not the high bandwidth, it's the latency of the switch. The network hardware that is used in clusters as the interconnect medium (switch) can provide very high bandwidth, but the latency is high simply because you can not have low latency over large distance, and the network hardware is designed to connect over large distances. Even if you put your nodes in the same rack, the 1000000 gigabit ethernet or whatnot stock solution you use to interconnect them, will still take milliseconds ping time.
The supercomputers run on a custom, specially designed switch instead. This design includes a lot of cost and complexity just to get the latency down. This may not make any difference for your typical web-server application, but that's not what the supercomputers are designed for.
Some scientific computations have very low dependency between parts of the dataset. For example, pretty much any simulation or search application does fine on a cluster. Anything that allows you to split the work into a large number of independent tasks runs fine on a cluster. Some scientific applications do not allow the work to be split into independent pieces. Sometimes you just need random access all over your distributed data space, and for such applications the speed of computation is determined mostly by network latency. This is where you need a supercomputer, and no cheap cluster would help.
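A simple way to see why latency rather than bandwidth dominates for that kind of access pattern is the usual "latency plus size over bandwidth" estimate. The sketch below uses made-up link figures (50 microseconds and 125 MB/s for a commodity link, 2 microseconds and 1 GB/s for a purpose-built one). They are assumptions chosen only to show the shape of the problem, not measurements of any real hardware.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to deliver one message: fixed latency plus
    the time to push the payload through the link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_fraction(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Share of the total transfer time that is pure latency."""
    total = transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# Assumed, illustrative link parameters: (latency in s, bandwidth in B/s).
COMMODITY_LINK = (50e-6, 125e6)
LOW_LATENCY_LINK = (2e-6, 1e9)

if __name__ == "__main__":
    for size in (64, 100_000_000):
        print(size,
              transfer_time(size, *COMMODITY_LINK),
              transfer_time(size, *LOW_LATENCY_LINK))
```

With these numbers a 64-byte message on the commodity link is about 99% latency, so more bandwidth buys almost nothing. For a 100 MB bulk transfer the latency is lost in the noise. Codes that do random access all over a distributed data space live in the first regime.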
...drain cover manufacturer says their product is grate...
Gentoo Linux - another day, another USE flag.
if you can get a big enough cluster that will get the work done faster than the supercomputer and still be cheaper, doesn't this override the inefficiency factor?
"I'm just here to regulate funkiness."
Only in routed Infiniband networks, which no one uses. The normal Infiniband protocol is very lean and totally different from TCP/IP.
then I just assign this filter to URLs
You dork, Crays are running linux. And a cluster is not a supercomputer.
And you've obviously never worked with either.
As for your explanations, this is exactly the place for it. Tell us how you can magically share memory over a PCI bus as fast as a Cray does.
I don't need no instructions to know how to rock!!!!
Cray makes at least two types of supercomputers according to their SEC forms. These include massively parallel clusters and vector-based supercomputers. In general massively parallel clusters are less expensive for the number of calculations per second than the vector-based supercomputers. However, for many applications, the vector-based supercomputers will massively outperform the clusters.
Cray's competitors in the cluster markets include IBM, and their main competitor in the vector-based market is NEC.
I remember reading an article about how the US is losing the supercomputer technology war. But this criticism is best directed at companies other than Cray who are pushing cluster-based solutions to the exclusion of others. It is true, however, that the only company I am aware of in the US which markets these supercomputers is Cray.
LedgerSMB: Open source Accounting/ERP
While your comment is largely informative you are still confusing PCI-Express with PCI-X. They are different things. I know that it's inherently confusing, but still...
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I'm not sure you do either.
A NUMA machine is just a cluster where the wire is in the form of a bus rather than copper or fibre cabling. The communications protocol for the bus may be better optimized for "supercomputing". However, you can do the same thing for an MPP-optimized network protocol.
It's all ultimately just wires and protocols.
The total lack of process migration between nodes in a cluster might actually give clusters an edge over some NUMA implementations.
Watching a single process dance around a number of bricks in a Sun 15K can be rather entertaining.
A Pirate and a Puritan look the same on a balance sheet.
Funny claim to make:
The supercomputer would have to be structured internally just like a cluster, with multiple processors overseeing different parts of the calculation, just like any cluster would do.
Also, I don't believe a supercomputer can REALLY beat a really big cluster.
Of course it depends on how big the cluster is and the latency between nodes.
I don't know the meaning of the word 'don't' - J
I've seen guys claim their Dell PCs are better than an SGI (as it's 600 MHz), so nothing would surprise me.
Just a thought... if the original experiment (particle level modelling of cloud physics) was better done on a supercomputer instead of a cluster, why would having to run multiples of that problem (differing starting conditions) run any better on a cluster? Mind, in this case I'm thinking of a cluster of lower powered machines. A cluster of supercomputers would obviously work out. :)
If your goal is to run simulations where each piece of the simulation depends on a large subset of the other pieces, then you will need ridiculous interconnect speeds, and you're likely to end up with something you could have bought from Cray or SGI or some of the other remaining supercomputer manufacturers for a fraction of the price.
Luckily for you and the rest of us many problems can be split into relatively independent pieces, in which case a Beowulf cluster or similar is more than adequate.
If you seriously believe that clusters can compete with supercomputers for every type of problem, you need to think again.
...1/2 wrong. Supercomputers, and before they were called that, just whopper mainframes, used to do "all of the above" computing tasks because that's all there was. Now, stand alone PCs and clusters of them can probably do at least 1/2 the jobs out there, and maybe even a higher percentage. So, of course they (Cray and the others) are concerned for their market, losing half your market has to hurt, and daily the ways you can use smallish commodity hardware computers increases, both in complexity of job and in numbers of jobs. It's also a function of cost. If I can mangle some negatives here, PCs are not getting worse, nor more expensive, nor able to handle fewer tasks of less complexity. If that was true, so called "super" computers would not be in any threat, but they are, because it's happening. PCs and clusters are getting much mo bettah, at a fantastic rate. There is not a 100% replacement for "super" computers yet, but just in the last ten years I would bet a LOT of jobs previously only possible on "super" computers are now being handled easier/better/cheaper on "normal" PCs and clusters of them, and that trend will only continue, and there's only one place that market can come from, and that is the mainframe-stand alone "super" computer of yesteryear.
But to do this, wouldn't Linux first have to be a "world-class" operating system? And we all know that there isn't much chance of that ever happening, right?
Of course, it really does depend on the problem you're facing. Most people who pay for results, though, want results as fast as possible, and that's why supercomputers win for problems that aren't "embarrassingly parallel".
"There are a dozen opinions on a matter until you know the truth. Then there is only one." - CS Lewis (paraprhase)
Why you will always be wrong. Basically, clusters make use of existing networking technologies (higher overhead) and supercomputers (SIMD, MIMD, whatever) are designed to make that overhead as low as possible.
Thus, by definition, for HPC applications that can't be parallelized enough to overcome the communications overhead, super computers will always beat out clusters and have a place.
In the future, I would want to not be isolated from my friends in the Space Station.
Does this mean I can run linux on my Cray?
Hmmm...
1. Cray is definitely pro-linux. It's what their XD1 runs. Though not their bigger computers.
2. There are some problems for which a cluster cannot even come close to achieving the performance of a supercomputer. For a lot of problems yes, for some maybe if you spend a fortune on fancy interconnects, and for some no.
3. If you're commercially building clusters let me know which company it is. I'm in the market for a 128-CPU cluster and I want to know who not to buy from.
http://www.sec.gov/Archives/edgar/data/949158/000
Here they discuss the limitations of clusters and vector-based supercomputing.
Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. Not really sure what multithreaded means in this context (a microkernel capable of threading itself across many processors, i.e. UNICOS/mk?) but they do a decent job of explaining the whole thing:
LedgerSMB: Open source Accounting/ERP
Unless I'm now out of date, the last figures I saw said the CrayLink Interconnect can do 102 GB/sec. That's just a tad bit more, don't you think? No messing with masses of gig ethernet to crossconnect them. It's just done.
From Cray (From XD1 page):
"A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."
-So for a dual-Opteron XD1 processor unit, there is 8 GB/s of total bandwidth available.
Total aggregate PCI bandwidths (Accepted standards):
PCI32 33MHz = 133MB/s
PCI32 66MHz = 266MB/s
PCI64 33MHz = 266MB/s
PCI64 66MHz = 533MB/s
PCI-X 133MHz = 1066MB/s
PCI Express = 200MB/s (Per slot)
PCI Express x16 = 3000MB/s (Usable bandwidth)
-So for PCI Express x16 we're talking 3GB/second
An SMP Opteron with two PCI Express x16 slots can do 6GB/second aggregate bandwidth. A couple of Infiniband links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
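For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. The bandwidth figures are simply the ones quoted in this comment (not independently verified), and the function names are made up for illustration.

```python
# Figures as quoted in the comment above, in MB/s; not independently verified.
PCI_BANDWIDTH_MB_S = {
    "PCI32 33MHz": 133,
    "PCI32 66MHz": 266,
    "PCI64 33MHz": 266,
    "PCI64 66MHz": 533,
    "PCI-X 133MHz": 1066,
    "PCI Express x16": 3000,
}

XD1_LINK_GB_S = 2       # per link, from the quoted Cray XD1 text
XD1_LINKS_PER_SMP = 4   # links to each two-way SMP, same source


def xd1_smp_bandwidth_gb_s():
    """Fabric bandwidth available to one two-way SMP in the XD1."""
    return XD1_LINK_GB_S * XD1_LINKS_PER_SMP


def pcie_aggregate_gb_s(slots):
    """Aggregate bandwidth of `slots` PCI Express x16 slots, in GB/s."""
    return slots * PCI_BANDWIDTH_MB_S["PCI Express x16"] / 1000


if __name__ == "__main__":
    print("XD1 per SMP:", xd1_smp_bandwidth_gb_s(), "GB/s")
    print("2 x PCIe x16:", pcie_aggregate_gb_s(2), "GB/s")
```

Note that this only compares peak bandwidth; it says nothing about latency, which is what most of this thread is arguing about.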
My Other Computer Is A Data General Nova III.
No, it's not a funny claim to make. One of the defining features of a supercomputer vs. a cluster is tightly integrated low latency high bandwidth interconnects, whereas clusters typically rely on software support (networking etc.) to shuffle data over relatively low speed, high latency communications channels.
SGI may have something to say about those ideas.
Free Mac Mini Yeah, it's
In a way he's right. Reading the whole article, it seems apparent that he's talking about certain high performance applications. Clusters are not always the best way to solve a problem. For problems that can be broken down into small independent tasks like SETI, clusters are a good solution. Clusters do have their optimization challenges with latency, bottlenecks, etc. For simulations where the tasks are dependent on each other, these bottlenecks add up. The individual nodes spend as much time communicating with each other as they do computing. There are also problems that cannot be distributed. In these cases clusters are not the right solution and it may not be cost effective to use a cluster.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Hi flametroll. Please re-examine my statement. Note that I did not say Linux clusters were superior to Crays in all aspects. I merely mentioned that Cray's marketshare was slipping and that, par for the IT course, the CTO came out and stroked his ego in public. In the future, please do not read into the statements of others.
While your statements may be true and valid, I don't think that I would want to base a business off selling a machine that costs seven figures and would only be marketable to that remaining 0.1% and expect to be around in a few years. I think that the flexibility of the PC in clusters is good enough for most applications, and they can throw student labor and people with PhDs at the rest on the software end. Just like the PC market itself: Windows is good enough to do the job, but nobody would say that it is perfect.
Got hosting
don't forget the SGI Altix range - scaling to 256 processors in a single image
and there was the recent PR about the contract with Nasa for a 10,240 processor server (though that's not 10,240 in a single image...)
"we demand rigidly defined areas of doubt and uncertainty!"
Because clusters are cheaper, per raw unit power. Because they're made of commodity mass-produced parts.
No matter how kind you are, German children are kinder.
If you're going to run over a range of inputs, you're going to need the total CPU power anyway. The only advantage you'd get with a supercomputer is that you'd get your first results out sooner than the later results, instead of all results coming out at the same time from a cluster.
And forget not the reason *Why* you'd want to use a cluster. Namely, commodity hardware is *cheap*. Incredibly bloody cheap, compared to supercomputers, for the same amount of total cpu processing time.
No matter how kind you are, German children are kinder.
Not quite true. First off, you get much higher bandwidth between processors using proprietary (NUMA) based interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second, you can exploit things like cache-coherency between processors (even if they're in different "nodes") and therefore true shared memory. So, a 1024 processor SGI Altix, or a 256 processor Cray is one computer as far as the OS and user-land stuff is concerned.
There's another advantage Cray has on the SV and X series and that's a vector unit on the processor. That allows you to conduct operations on arrays of numbers at once instead of having to cycle through the numbers in a loop. For example, the dot_product between two small arrays might be accomplished with one or two instructions, as opposed to a loop. Apple's AltiVec is also a vector unit.
If you took money out of the picture it would be easier to deal with a big-honkin' super computer like an SGI or Cray rather than a cluster. One computer is easier to manage and you could always use threads and plain old heap memory (which is much faster than message passing over a network).
Add money back in and $500,000 goes a lot farther in raw compute power when you're buying racks of Dells and Infiniband interconnects. However, depending on the application, you may be faster, slower, or even dog-slow compared to the Cray. If you need the answer today, and the $ is not a factor, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.
Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
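A rough way to see this is a toy cost model: total time is compute time plus, for every message, a fixed latency and a transfer time. The Python sketch below is only an illustration; the model (no overlap of communication and computation) and the example numbers in the `__main__` block are assumptions, not measurements of any real machine.

```python
def run_time(compute_s, messages, latency_s, bytes_per_message, bandwidth_bytes_s):
    """Toy model: compute time plus per-message latency and transfer time.

    Assumes communication never overlaps with computation, which is
    pessimistic for real codes but keeps the arithmetic obvious.
    """
    per_message_s = latency_s + bytes_per_message / bandwidth_bytes_s
    return compute_s + messages * per_message_s


if __name__ == "__main__":
    # Hypothetical interconnects: (latency in seconds, bandwidth in bytes/s).
    commodity = (50e-6, 100e6)
    custom = (2e-6, 2e9)

    for label, messages in (("few messages", 1_000), ("many messages", 10_000_000)):
        for name, (latency, bandwidth) in (("commodity", commodity), ("custom", custom)):
            total = run_time(100.0, messages, latency, 1024, bandwidth)
            print(f"{label:14s} {name:10s} {total:10.1f} s")
```

With few messages both interconnects give nearly the same total; with many small messages the latency term dominates and the commodity network falls far behind.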
Leave the gun, take the cannoli -- Clemenza, The Godfather
Of course the cost of this kind of networking technology does eat up quite a lot of the cheapness factor. In many clusters the interconnect costs more than, sometimes several times more than, the processors, memory, etc.
Both clusters and big iron have their place. I am a meteorology professor and my current research involves high-resolution numerical modeling of thunderstorms. For a problem where the domain decomposition is straightforward and internode communication isn't your bottleneck, clusters are great. One huge advantage of clusters is that they are cheap and it isn't too big of a deal to get a grant together to buy the hardware, and it's YOURS and nobody else's. A huge disadvantage to big iron is that you have to share it with about a hundred other researchers. Waiting in a queue for three days only to find you goofed up in your startup script (and the model exits immediately) is NO FUN (cf the Regatta at NCSA).
I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OMP but OMP sucks and is no substitute for code which is written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.
In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
does anyone stop to think about what that 200 PC cluster costs in power? quite a bit i think ...
There is a very fast NUMA (non-uniform memory access) interconnect in each case (about the size of a washing machine). So you can access memory on another board only slightly slower than on your local memory.
You can have up to 4 processors per board. Then you can connect together multiple washing machines with (I think) Infiniband.
You still want to access local memory if you can as that gives lowest latency. Work is going on in the kernel to better support this kind of architecture. Linux (or at least open source) is really important to these machines because you do need to be able to modify the kernels.
A bunch of PCs work real well when your problem can be partitioned. What kills you is high levels of synchronisation activity - whether signalling or updating because that's when latency kills you. For some apps you may have as much or even more compute horsepower in your PCs than the supercomputer but it spends all its time twiddling thumbs.
So for many hard applications these machines really are the bee's rollerskates.
Squirrel!
Current development looked to be 512 CPU's before too terribly long. I like the system, but it's EXPENSIVE when you think about the fact that you're still buying x86 architecture - $6 - $8K per CPU.
Every Cray is sa - cred.
Every Cray is greaaat...
If a Cray is was-ted,
Paul gets quite i-raaate!
(Thank you, thank you very much.)
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
The point is, though, that the Cray supercomputers are vector supercomputers, whereas Linux clusters and other similar machines are not. Currently it seems to me that most clusters are very reminiscent of the computers of old, where you ran programs in batch mode. Cray is pointing out that running stuff in batch mode, even massively parallel, often cannot match the flexibility of the Cray vector system.
on a side note, have you worked much with myrinet? personally, i find it to be the most buggy thing i've ever seen. their hardware seems to fail more than an antique chevette. i just wonder if anybody else has a similar experience with myricom's products, since at this moment i doubt i'll ever again invest in their hardware.
--- d'oh
No! What he is trying to explain here is that a Linux cluster using "standard" hardware, (eg x86 based), suffers from the usual PCI related bottlenecks that standard hardware has. Therefore it cannot be as efficient as a system specifically designed for supercomputing which has no PCI bottlenecks.
If you read the article you would note that Cray is promoting their new LINUX BASED SUPERCOMPUTER....
My hyperlinks aren't worth the paper they're printed on.
In linear algebra, there are often many algorithms that could be used to solve a problem, but the obvious algorithms require many more calculations than the clever algorithms. For instance, you don't solve A*x=b by calculating inv(A)*b.
Just as it would be embarrassing for the mathematician to recommend calculating inv(A) for a one-off solution of A*x=b, it would be embarrassing for a computer scientist to recommend a freon cooled million dollar supercomputer when, with a slight optimization of the algorithm, the solution could be calculated with a cheap cluster of PCs interconnected with 100baseT.
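To make the A*x=b point concrete, here is a small pure-Python sketch that solves the system directly by Gaussian elimination with partial pivoting, without ever forming inv(A). It is only an illustration (the `solve` name is made up here); for real work you would reach for a tuned linear algebra library.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.

    `a` is a square matrix as a list of rows, `b` the right-hand side.
    The inverse of `a` is never computed.
    """
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


if __name__ == "__main__":
    print(solve([[2, 1], [1, 3]], [3, 5]))
```

Elimination plus back substitution does the work once for the given b, whereas going through inv(A) amounts to solving for every column of the identity first and then multiplying.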
I'm not a subject matter expert but it seems like the Cray is a M/m/X (X>=8) system while Linux clusters are multiple M/m/x (x=4) systems.
It seems to me that the mathematical limitation of how much workload a Cray can handle is a lot worse than a Linux cluster.
Can it be that the price/performance issue that he is talking about is just for specific applications?
Finally, a machine capable of running Doom 3!
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."
I work a lot with it, like ~3000 customers, almost half of them are industry (non academic or gvt).
You found bugs ? Care to share them ? Hardware failed ? Did you get it replaced ?
Can you give me the tech support ticket numbers so I can see if your complaints are reasonable (and have been addressed) or are just plain FUD ?
beowolf (is that spelt right) is not the only type of cluster. there are clusters such as openmosix that will cluster applications that aren't even written specially - its a kernel module. if i wanted a stupidly fast kde desktop, for example, i could get a room full of cheap x86's and cluster them using linux and openmosix.
Right and no. If engineers/scientists need to wait for results, then two things happen:
1. Run a cluster and wait two weeks. Twiddle thumbs in mean time.
2. Run on supercomputer. Start processing next day.
For a lot of companies, it is justifiable to spend more money to achieve #2.
Not to mention that a lot of large programs that can't be parallelized have high memory requirements. If your program needs 50GB of memory and can't be parallelized to run across a cluster then you have no choice but to run on a supercomputer, since no cheap system can handle 50GB.
There was a guy about 90 miles away, offering one on ebay for $7000, it never sells (he tries every 12 months or so). If I suddenly landed a job for $60k a year, I'd almost certainly buy it from him. Rent a Uhaul or something, go pick it up. I've heard of universities practically junking them.
Yes, Crays are one of my saved ebay searches...
Because clusters are cheaper, per raw unit power.
But if the supercomputer is more efficient per raw unit of power, then the price per unit doesn't matter.
I work with HPC for a living, both with clusters and with large SMP machines. The cluster is nice, but there are some things that can _only_ be run on a large SMP machine or are much, much faster on an SMP.
I don't see that your rant says anything different, other than you're giving more emphasis to problems that are more parallelisable, and he's giving more emphasis to the ones that aren't.
Oh, and you're implying Cray's product's vaporware, and he's implying clusters are less reliable, so I'll grant you both one FUD point. Happy?
Hmmm. I would also say that first byte latency is very important in a lot of/most workloads. Clusters can mask this on some workloads through parallelization. Introduce interdependencies and it loses some of its advantages. I know I will get flamed for this but I think Sun understands this quite well, hence the philosophy behind throughput computing and their next gen core designs (Niagara, Niagara2, Rock).
Actually, it was just your regular run-of-the-mill "I hate slashbots, so I'm going to point out the obvious to you like you're a small child because you displayed the cognitive ability of a small child" troll.
And, actually, truth-be-told, you said this as the follow up to the stroking-ego thing:
But I guess that's the best they can do...
Which is wrong. For the reasons I outlined. Now, please go back to your regular DMCA-RIAA-MPAA-Microsoft bashing while I slink back into the woodwork.
k thx!
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
It depends what your TCO estimates for an installation are. Typically, your buying costs aren't the significant part of fielding a solution. You need to look at how much it is going to cost to run (power/heat/real estate/maintenance) and how much it costs you when it ain't running. Also, just throwing this out there as I don't have any specs to hand, does anyone know if commodity hardware is accurate enough (i.e. IEEE FP precision etc.) to be used in all cases a 'super computer' (sic) is used?
In other news:
Microsoft CEO Bill Gates announces today that their Windows Operating System is *really* great!
Read carefully. I did not say it was vaporware. I saw a real system. What I said was, trying to keep up with the commodity curve is a battle that has been lost by many an HPC vendor. The key is time to market. By the time you get your latest and greatest to market, the commodity market has passed you by.
HPC for Primates. Read Cluster Monkey
I would. That 0.1% has little choice but to pay big for these computers. If you are the only one making them, it gets even better. Cray has been selling to a niche market for decades.
'SBEMAIL!' is better than a goat!!
You and him, you're saying the same thing, you're spinning it your own way, but the actual content is the same. So why are you describing his as FUD?
They're not really scared of clustering for reliability reasons. You can do big machines that are far more reliable than the cluster made of cheap components. In fact, it's being done all the time.
First of all, the Google setup isn't the type of cluster that is being compared to Cray supercomputers, and it's also a simple form, with little communication between nodes. Secondly, in an HPC cluster, the reliability of each node becomes far more important, since a failure of a node can mean hours or even days of having to perform calculations again. If you're unlucky, and you have a project where each node needs a lot of data from the other nodes, and a node goes down, you've stalled the entire project for quite a while. If a CPU dies in a supercomputer, process state etc. can be migrated over to other CPUs.
It will always be the right tool for the job. If a company wants many machines and has a service agreement with someone to monitor, replace (or not), etc., then that works for them. If a company wants all that in one power draw in a large box, they get a supercomputer. There are also many processing models that clusters of smaller machines can not provide solutions to.
That having been said, of course Cray is going to say that smaller boxen clusters = bad.
no, because in a cloud simulation, for each raw unit of power that you use there's a vast amount of communication that has to go on between nodes before they can keep working (checking for collisions after recent movement of local particles that were computed on another node, for example) that is either orders of magnitude faster on a supercomputer or perhaps just not necessary due to a large shared memory in a supercomputer. so yes, for each raw CPU cycle the cluster price is cheaper, but in simulations where the outcome of a computation on one processor affects the computation on other processors, many many many of those cycles are wasted in commodity clusters because they just can't talk fast enough between nodes to work efficiently and will therefore take far longer. sure, the overall problem may be parallelizable, but when the underlying details are so bloody complex, you
-- D-23994, Muff#2613
Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
Serious question here (yes, wrong place to ask one of those): "Hypercube formation"? Is this just a cubic lattice where the nodes are relatively densely packed so each can communicate with several others over not-too-far distances? What makes it "hyper"? Someone with an expertise in communications/computing theory help me out here!
Not if your task needs 10+ GB of ram.
Think of it this way: you're simulating 100 plane crashes for a new design.
The plane takes 40GB of memory (current position and stresses on each person and component).
Let's say each "frame" of the crash takes 10 Gflops of CPU time to finish and you need 5,000 frames to run the simulation. Now you could run this on a cluster of 100 systems, BUT if they don't have 50 GB of RAM it's going to take 1000x as long, because you need to fetch the whole simulation from disk 5,000 times, vs. a Cray, which would load it into RAM once and be done with it.
A Cray XD1 is quite different from an SGI Altix.
A nasty thing called physics says that latency will always be higher when moving messages over the cable than over the machine's internal bus because the cable is longer.
That said, isn't the obvious solution to this problem a "smart" clustering software that puts the processes that exchange the most messages with each other into the same computer ? A bit like NUMA, but replace "memory" with "message".
Of course, if someone absolutely must write code that passes around a zillion messages, then it's going to be slow no matter what... So our smart clustering software should be really smart and arrange the threads so that a single machine contains threads that are likely to block on messaging at different times so you can run one as the other waits.
And of course, if you get enough bandwidth and low enough latency, you can treat a cluster as a big NUMA machine (syncing shared memory areas over wire as necessary).
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Many posts have pointed out the true fact that supercomputers are better for certain jobs that are not suited to clustered solutions (and vice versa).
Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.
So what the original article is, is a message from one executive to other executives trying to clarify the situation. Basically saying "hey, just because Wired ran a story that says linux clusters are the next best thing since sliced bread, doesn't mean that this is the best solution for you. Now, let us talk about what you need."
I see nothing wrong with this. I read the article, and found nothing in it that was false.
It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs because of something said exec read in Scientific American.
Welcome to corporate america boys and girls.
(Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about linux clusters. Both are fine publications.)
Did you buy a Neuros today?
There are also tasks which are quite parallelizable and quite sequential...but that are so heavily interrelated that they require a fast interconnect bus.
The major benefit of Linux clustering kind of falls apart when the bulk of the cost of your project becomes the interconnect. Mainframes are designed to communicate large data between multiple processors whereas standard servers are designed to communicate over a relatively slow I/O bus. If you're paying a few hundred grand for a top of the line control system and interconnects and that's STILL your major bottleneck...well, you should have had a Cray.
Of course, the Cray also has limited expandability...I guess the point is, do your fucking research, analyze the needs from a total cost perspective and don't go with Linux just because you can imagine a Beowulf cluster.
Hey freaks: now you're ju
"Networked clusters are useful only when the task is parallelizable, and each subtask is computable on a single node. Cloud physics is not like that. Cracking RC5, for instance, is."
The issue isn't parallelizability. *All* supercomputers are designed to exploit large amounts of parallelization (and *all* supercomputers will do poorly if the task isn't parallelizable).
The issue is internode communication. If your simulation requires a *lot* of transmitting data between each processing stage (like "cloud physics" probably would) then a more tightly-integrated parallel machine (like a cray) would be better. If you can do more computation before having to communicate (like cracking RC5), then a cluster will be more cost effective.
The following sentence is true. The preceding sentence was false.
That's true, there aren't a lot of choices, but IBM and Sun are also making big iron. Cray is not the only way to go nowadays. Even if you land an order for one machine, the only income you will have as a company is a support contract. How long are you going to have a company if you only sell 2 units or less a year? Don't get me wrong, I would love for the kind of research that these machines excel in to pick up in the US but it doesn't seem to be happening. Worse than that, I would hate to see Cray computing become another government subsidy!
Got hosting
Once again, a person misses the fact that I *Did Not Suggest Running A Single Cloud Simulation On Multiple Computers.* I suggested, given a number of starting inputs for cloud simulations that need to be run, a separate simulation on each element of the cluster. Networking bandwidth is essentially negligible.
This is just one example of how having a single parallelizable task makes a cluster the cost-efficient choice.
No matter how kind you are, German children are kinder.
Then get twice as many commodity PCs and run every simulation on two separate machines in case one dies. Get 5 times as many. Get 10 times as many. It doesn't matter, it'll still be cheaper than the supercomputer.
No matter how kind you are, German children are kinder.
A train does not have the same price-performance ratio as a ship.
But you don't use a ship when a train would do, and vice versa.
As the parent poster said: clusters and supercomputers are not the same.
This makes little sense. Parallelizability is not a simple yes or no question as you seem to think. Rather, it's a matter of degree: you express the parallelizability of a computation as an integer indicating the maximum number of parallel threads of execution it can be split into.
A computation whose dependencies permit you to benefit from running two separate threads of execution is technically parallelizable, but it's not going to benefit from a cluster unless your "cluster" consists of two machines and you're comparing the cluster to a uniprocessor with the same performance as one node of the "cluster".
Do you have any idea how monstrously complex these simulations are and how long they take to process? The point of using a supercomputer is that these problems take too damned long (read: years) to run on a PC. No, sorry.. commodity hardware just does not cut it in these types of situations.
-- D-23994, Muff#2613
In fact, Cray's XD1 "supercomputer" uses the same Infiniband interconnect as some "clusters"; Cray just has better NICs.
No, Cray doesn't have "better NICs." In fact it doesn't really have "NICs" at all, not in the sense that we think of them. Your typical Infiniband card hangs off a PCI bus. PCI bus = major bottleneck, especially when you're talking a couple dozen Infiniband connections.
The XD1 is cool because the Infiniband is right on the HyperTransport bus of the Opteron CPUs. It's damn fast.
Whee! Could we mod this funniest flame war? I want to watch the flames burn some more. Haha, funny slashbots. And remember folks, the easiest way to win the Turing test is to pick dumber test subjects.
-Those who would give up essential liberty to purchase temporary safety deserve neither. -Ben Franklin
In a hypercube computer architecture, you put a node at each vertex of an n-dimensional cube, and a communication channel to each of the n adjacent vertices. That way, you don't need a huge number of communication channels per processor, only log2(number of processors), at a cost of sometimes having to pass data up to n hops.
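For anyone who wants to poke at it, here is a small Python sketch of that wiring. It uses the usual convention (an assumption on my part, the description above doesn't spell it out) of numbering nodes in binary so that neighbours differ in exactly one bit; the function names are just for illustration.

```python
# Hypercube addressing sketch: node IDs are integers 0 .. 2**dims - 1,
# and two nodes are wired together when their IDs differ in exactly one bit.

def neighbours(node, dims):
    """Return the dims nodes directly connected to `node`."""
    return [node ^ (1 << bit) for bit in range(dims)]

def hops(a, b):
    """Minimum number of hops between two nodes: the count of differing bits."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    dims = 4                      # 16 processors, 4 channels each
    print(neighbours(0, dims))    # the four nodes wired to node 0
    worst = max(hops(0, n) for n in range(2 ** dims))
    print("worst-case hops:", worst)
```

Each node has exactly `dims` channels, and the worst case is a message between two nodes whose IDs differ in every bit.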
There are other popular architectures out there. Simple 2-D grids match a lot of applications, and require 4 comm channels per processor no matter how many processors you have. The old Transputers were built specifically for this. A minor extension to the 2-D grid is the torus, which you make by connecting the top and bottom of your grid together and the left and right ends together. (It basically doesn't cost any extra, since you had the spare channels at the processors at the edge, plus you get to say "ooohhh, donuts!"). And there are a bunch of applications with dense clusters of processors (for instance, N-way shared-memory nodes) with the clusters connected in hypercubes. Butterfly networks are another shape that was popular for a while - they look sort of FFT-like, and they basically keep the log-n number of communication channels while reducing the bottlenecks.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
There are some blade servers that use low-power CPUs like Transmeta to get the tradeoff of more MIPS per watt. E.g. with 200 processors, 50 watts per processor gives you 10 kW, as opposed to 300 watts -> 60 kW. At 10 cents per kWh, a 10 kW cluster is about $1/hour, which is cheaper than the grad student you've got managing the thing. (In practice, you often need to double or triple the power costs, because you also need cooling to get rid of all the heat from the CPUs.)
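The arithmetic is easy to redo for your own numbers. A quick Python sketch, assuming 200 processors (the count implied by 10 kW at 50 W each) and treating the cooling overhead as a simple multiplier; the function names are mine.

```python
# Back-of-the-envelope power cost for a cluster.
# The 200-processor count is inferred from 10 kW / 50 W; treat it as an assumption.

def cluster_power_kw(processors, watts_each):
    """Total electrical draw in kilowatts."""
    return processors * watts_each / 1000

def hourly_cost(power_kw, dollars_per_kwh, cooling_factor=1.0):
    """Dollars per hour; cooling_factor of 2 or 3 accounts for air conditioning."""
    return power_kw * dollars_per_kwh * cooling_factor

if __name__ == "__main__":
    for watts in (50, 300):
        kw = cluster_power_kw(200, watts)
        print(f"{watts} W/CPU: {kw:.0f} kW, "
              f"${hourly_cost(kw, 0.10):.2f}/h bare, "
              f"${hourly_cost(kw, 0.10, cooling_factor=3):.2f}/h with 3x cooling")
```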
Obviously a supercomputer is a bit different, because you don't need all the disk drives, but CPU and RAM are using an increasing amount of power compared to disk drives. (So does high-end video, which obviously you don't need unless you're playing games like using the video processor for number-crunching instead of the main CPU.) But the power problems are still just as annoying. If you're doing anything custom-built for supercomputing, you'd obviously build boards with multiple CPUs and faster interconnects and skip all or most of the disk drive stuff, so that lets you fit more CPU per 1U or 3-4U of rack space. And you might build a system with lots of DSPs instead of general-purpose CPUs, which would probably get you more MIPS per watt.
Database supercomputers, on the other hand, look surprisingly like blade servers. The old Teradata machines had something like 488 CPU+disk units connected by a fancy back-end switched network, plus a front-end set of CPUs for managing work and communicating to the outside, with algorithms designed to split up queries intelligently across the processors. And of course there were the same kinds of arguments about database machine clusters vs. big iron mainframes vs. loosely-coupled clusters.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
But, the bigger the cluster becomes, the more nodes and interconnect fabric you need just to keep up with the degradation of performance in the cluster for tasks that are not easily made parallel. As it is, for non-trivial tasks, a supercomputer is considered to have good performance if it gets 60-70% of the theoretical peak performance. With clusters, you're lucky if you come anywhere close to that. And note that I said non-trivial tasks. Simulating gas flow or particle physics is a non-trivial problem, for example, while tasks such as SETI@Home and RC5-72 are quite trivial: they just require CPU power but can easily be split into separate chunks, and the only communication needed is between the individual nodes and the master node handing out and receiving the chunks.
I'll use an example from my experience again: I sometimes freelance doing architectural visualizations, air flow simulations in buildings due to heating etc. I had the opportunity to try out my own code on both an Athlon cluster running on Myrinet and on an SGI Origin 3800. On both, I used 16 CPUs. Despite the superior FP performance of the Athlon CPUs, the Origin finished the work in less than half the time the clustered Athlons needed, despite me making specific changes that would allow the code to work better on the cluster.
Back in the mid-80s, my department had a huge VAX 780 with 4 MB of RAM (16KB chips, I think), and we were working on a network simulation system that needed 12-14 MB RAM to run. I spent a while playing with different versions of 4.1BSD and Unix System VR2, but fundamentally the machine spent all its time swapping data in and out of disk, and the main performance win was helping the physics jocks who wrote the application get better algorithms and better localization and good checkpointing, because the computer didn't always stay running for the full week it took to finish a simulation run. A year or two later, we got the budget to buy another 4MB of RAM (in 64KB chips, about $50K IIRC), which helped a bit, and a year or two after that, we got enough budget to buy another 8MB of RAM (maybe 256KB chips? not sure. Also about $50K), and suddenly the application could complete in under an hour instead of a week, because RAM really is a couple orders of magnitude faster than disk drives with a couple more orders of magnitude less latency, so our problem changed from being disk-bound to being CPU-bound.
That speedup not only improved the utilization of the equipment, it made a qualitative difference in the kinds of problems we could address because of the way we could interact with it. That's why people buy supercomputers if they need them - it really can be orders of magnitude faster for some problems. The first year or so, we really had all the RAM that could fit in the double-refrigerator-sized VAX cabinet. Once the denser RAM chips became available, we probably should have spent a bit more manager time beating up on the accounting department, because an extra $50K for hardware could have more than doubled the efficiency of 3-4 physicists, but of course the accounting droids don't think in terms of efficient use of physicists unless it lets you buy half as many of them, which was _not_ the objective here...
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Sure you've got enough money for a Cray! Cray J932SE supercomputer (dual IOS, 3 cabinet) for $4500, not including disk drives.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Back in the 80s I was doing a telecomm project for a large research lab that had a number of Cray supercomputers on one side of campus, and their campus backbone was a 30 Mbps baseband cable system feeding a bunch of 10 Mbps Ethernets, and a few of their buildings were starting to get brand-new 100 Mbps FDDI. They were getting very worried about what would happen if too many people _did_ imagine what they could do with a Cray, and wanted to do it from the other side of campus... Fortunately, the number of people who had access to the Cray was small enough that a variant on "sneakernet" worked fine - not using the sneakers to carry floppy disks around, but using the sneakers to carry the users to the building where the Cray lived :-)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
... I do SO know. Supercomputers are mainframes that have radioactive spiders building webs inside them, and the cooties slop over and give them SUPAH POWAHZ.
neener neener
Sorry, but I can't believe you got modded as 'insightful'. (well, actually, this is /., so I guess I can.)
Look at google. A large cluster, and if one of the machines dies, you don't worry about it.
The most important thing to remember about Google is that they deal with NON-CRITICAL INFORMATION. Who cares if you get different results when looking for Janet Jackson Superbowl pics from one day to the next, or even one hour to the next? You're getting scads of results, most are probably good, so you're happy and think Google is perfect. Google's *actual* reliability and consistency would get them NOWHERE in banking or airline reservations, let alone weather prediction, nuclear blast modeling, etc etc etc. Just because a computer can do one job well does not mean it can do another, same way that Indy cars make really lousy drag racers and Top Fuel cars don't turn laps at the Brickyard.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
I'll have to dig through our ticketing system and my desk to find the ticket numbers, so you may have to wait until Monday. None of them were 'bugs', simply hardware failing: both Myrinet cards in computational nodes and switch cards. Hardware did get replaced, and I'm about to call in for a replacement of 10+ cards plus probably half of the cards in the switch. For a two-year-old cluster to have 20% of the Myrinet hardware replaced is not acceptable to me.
--- d'oh
I seriously doubt that Cray can put faster circuits down on silicon than Intel. Part of the nature of the silicon foundry is that stuff doesn't start getting good and fast unless you make A LOT of it. It also gets cheap at this point.
I see no architectural difference between a "cluster" and a "supercomputer". The links between different CPUs are just conventionally made using different technology.
There's a lot of rubbishing of PCI (hey, it's 10 years old now, and there are MUCH faster new versions happening), and what is the point of saying unquantified/unsubstantiated crap like "CRAYS HAVE VERY FAST SHARED MEMORY BUS"?
Yeah - HOW FAST THEN? I'd be surprised if they are 128 bit running at 2 GHz.
Shared memory can mean one of a number of things, also:
You can have one CPU sharing say a 4 meg block with each of 25 other CPUs. The first CPU acts as the hub for communication between the other CPUs.
You could have 27 CPUs in a 3 x 3 x 3 cube, each CPU sharing memory with up to 6 neighbours.
You could have 5 processors in a line with each one sharing memory with (up to 2) neighbours.
Or you could have a bunch of core memory that 4 processors share (they might have their own memory too).
The same thing goes for a cluster - you could have PCs with up to 6 network cards (or, with a unidirectional custom ethernet protocol, even 12 network cards) linking to neighbours in a 27 CPU cube, and so on.
The topology will affect how the program is written for maximum speed, but also which tasks the computer is suited for. I think you could make very very fast links between ordinary PCs with say full duplex gigabit running a custom protocol (TCP adds latency, by the way; UDP adds far less, since it doesn't wait to coalesce data or for acknowledgements).
It's hard to imagine a task that is so I/O bound (in my mind this is the opposite of embarrassingly parallel problems) as to require more than 100 megabytes/second between each node, when each CPU node has a memory bandwidth of 12 gigabytes per second (based on the 32-bit core of a Pentium 4 at 3 GHz, assuming roughly 1 transfer per clock cycle, which in itself is unlikely).
In other words, a cluster using off the shelf gigabit ethernet hardware could transfer 1% as much data as the CPU could do with RAM.
Note if the CPU is in a 27 CPU cube the combined 6 gigabit ether cards would be transferring 6% as much as the CPU could. I guess it is possible to get motherboards with larger numbers of PCI slots, say 12 in which case you could run two streams of gigabit ethernet between each CPU giving you 12% as much data being transferred over ethernet as the CPU can transfer in and out of memory (not including cache flushing from CPU to RAM).
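Those percentages can be checked in a few lines of Python. This is just the same back-of-the-envelope estimate written out, under the same assumptions as the text: 4 bytes per transfer at 3 GHz for memory, and gigabit ethernet running at its full nominal line rate with no protocol overhead.

```python
# Ratio of network bandwidth to CPU-memory bandwidth, using the estimates above.
# Both figures are theoretical peaks, not measured throughput.

MEM_BYTES_PER_S = 4 * 3e9      # 32-bit transfers at 3 GHz -> 12 GB/s
GIGE_BYTES_PER_S = 1e9 / 8     # 1 Gbit/s -> 125 MB/s per link

def link_share(links):
    """Fraction of memory bandwidth matched by `links` gigabit ethernet links."""
    return links * GIGE_BYTES_PER_S / MEM_BYTES_PER_S

if __name__ == "__main__":
    for links in (1, 6, 12):
        print(f"{links:2d} link(s): {link_share(links):.1%} of memory bandwidth")
```

One link comes out at roughly 1%, six at about 6%, and twelve at about 12%, matching the figures quoted.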
Once again, what problems require such a huge amount of communication with other nodes that say 12% as much bandwidth between nodes versus CPU-memory is not sufficient?
Say 12% isn't high enough: what CPUs, data bus widths, and shared memory speeds are used then?
Arguments people have made so far are so light on detail, and using terms like "much faster" instead of giving a figure, it sounds like FUD.
Remember parallel links between devices on chips can exhibit data skew, lowering data rate compared with a fast serial link. In fact there is talk of using light to get signals from one chip to another (and I personally suggested it a long time ago on a newsgroup). It would probably be mainly serial, but not necessarily exclusively.
I am detecting a slight conflict of interest here.
1000 machines are more reliable then 1 big machine.
Depends.
If any machine(s) are up, the system works. This is Google's advantage.
If any machine hiccups, the system fails. This is supercomputer turf.
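A quick Python sketch of those two cases. It assumes every machine fails independently and is up with the same probability `p`; the 99% figure is a made-up example, not a claim about any real hardware.

```python
# Availability of an n-machine system under two extreme requirements.
# Assumes independent failures and identical per-machine availability p.

def any_up(p, n):
    """System works if at least one machine is up (the Google-style case)."""
    return 1 - (1 - p) ** n

def all_up(p, n):
    """System works only if every machine is up (the tightly coupled case)."""
    return p ** n

if __name__ == "__main__":
    p, n = 0.99, 1000
    print(f"at least one of {n} up: {any_up(p, n):.6f}")
    print(f"all {n} up at once:     {all_up(p, n):.6f}")
```

Under those assumptions the same 1000 machines are either practically always available or practically never, depending purely on which requirement the job imposes.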
A cluster is a fast efficient way to solve a few problems. A very small portion of the problems would be my guess. To the extent that the next computation depends on something non-local, which in turn depends in part on the results of your last computation, supercomputers have an advantage somewhat like the difference between cache and disk access speeds.
The point is, though, that the Cray supercomputers are vector supercomputers, whereas Linux clusters and other similar machines are not.
The article is about the Cray XD1, which is not a vector system. In fact, the XD1 is remarkably similar to an Opteron/Infiniband/Linux cluster...
It is called a hypercube because it is a mapping of a four-dimensional (and higher) cube into three-dimensional space.
It has nothing to do with communications or computing, but with topology.
Right, right. But if it's a mapping (1-1, as it seems) then there's an isomorphism, and the need to do anything in hyperspace isn't there. Just call it an arrangement with a certain number of lines from each node and be done with it. If it can be done in meat space there's no need to hyper- it up.
Until PCs have buses as wide, it probably won't matter how well the chips do. Crays probably require less maintenance than a PC cluster. It's like saying a bunch of fast motorcycles are better than a stock car. Sure, but if you get in a wreck, I'd rather be in the stock car.
Just say no to license servers!!
Did I hear right, Crays are using a type of Linux kernel....
When you've got your install disks you know you haven't been screwed
I think that this configuration gives the best price performance. On one end of the spectrum you have a completely interconnected mesh, on the other end you have all systems on a single bus, and in between you have a whole lot of possible topologies.
In the hypercube system, you need in N dimensions N communication channels per processor, and the maximum distance that any packet has to travel is N hops.
In a completely interconnected mesh with P processors, you need P - 1 communication channels per processor. While your maximum hop count is in that case only 1, you need (P - 1)*P/2 links in total (or (P - 1)*P channel endpoints), which is quadratic.
The default behavior should be sensible.
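To see how fast the two diverge, here is a small Python sketch that tabulates both. It counts each link once (so a link between two processors is one link, not two), and the function names are mine, not anything standard.

```python
# Wiring cost of a hypercube vs. a fully interconnected mesh with P processors.
# Each tuple is (channels per processor, total links, worst-case hops).

def hypercube_stats(p):
    """P must be a power of two; the cube has N = log2(P) dimensions."""
    if p < 1 or p & (p - 1):
        raise ValueError("hypercube size must be a power of two")
    n = p.bit_length() - 1
    return n, n * p // 2, n

def full_mesh_stats(p):
    """Every processor is wired directly to every other one."""
    return p - 1, p * (p - 1) // 2, 1

if __name__ == "__main__":
    for p in (8, 64, 1024):
        print(p, "hypercube", hypercube_stats(p), "full mesh", full_mesh_stats(p))
```

At 1024 processors the hypercube needs 10 channels per processor against 1023 for the full mesh, at the price of up to 10 hops instead of 1.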
Migrating a single process on a mostly idle NUMA box across system boards is NOT sensible.
The whole "locality of reference" concept is CompArch 101 stuff.
A Pirate and a Puritan look the same on a balance sheet.