Cray XT-3 Ships
anzha writes "Cray's XT-3 has shipped. Using AMD's Opteron processor, it scales to a total of 30,580 CPUs. The starting price is $2 million for a 200 processor system. One of its strongest advantages over the standard Linux cluster is that it has an excellent interconnect built by Cray. Sandia National Labs and Oak Ridge National Labs are among the very first customers. Read more here."
single node of those.
A few more years of advances like this and we might have a machine capable of running Longhorn!
or else!
How are the Opterons at standard FPU operations in double precision? SSE2 and friends are nice, unless you have to make compromises in your simulations.
I ask, because I remember that the Athlons beat the pants off the Pentium 4's in FPU operations, so all the benchmarks were rewritten to use SSE2.
Dimensions (cabinet): H 80.50 in. (2045 mm) x W 22.50 in. (572 mm) x D 56.75 in. (1441 mm)
Weight (maximum): 1529 lbs per cabinet (694 kg)
http://www.cray.com/products/xt3/specifications.html
It seems that the XT-3 not only uses Opteron processors but also uses PowerPC 440 co-processors from IBM to offload inter-processor communication from the main computing CPUs. Quite an interesting setup.
The XT-3's biggest competitor in this segment must be the BlueGene/L type supercomputer made by IBM. The processors in Blue Gene/L are a custom-built dual-core version of the PowerPC 440 with built-in high-speed interconnects.
Just like IBM have a finger in all the future game consoles, they seem to have a finger in several of the next generation super computers also. Nice going IBM.
- Henrik
- when the Shadows descend -
What kind of operating system runs on this beast?
UNICOS is usually a safe bet. In this case the specs say UNICOS/lc, which is made up of "SUSE(TM) Linux(TM), Cray Catamount Microkernel, CRMS and SMW software"
I'm not entirely clear how to interpret that, but I think it runs as follows: It runs the Catamount Microkernel as the kernel, and uses SUSE for everything else (so we have SUSE Linux, without the Linux - all of a sudden that GNU/Linux stuff starts to make sense). The CRMS is their interconnect management and monitoring software, and SMW is the System Management Workstation - which I'm guessing is their administration frontend.
It's worth noting that that's some pretty serious software there (because Cray has a lot of experience dealing with large systems) - you can bet that the management and monitoring software is some very serious stuff.
This thing is to a beowulf cluster what a dual G5 PowerMac is to a homebuilt PC system running Linux From Scratch. It's going to work flawlessly "out of the box" with a smooth and polished interface that lets you get done everything you want to do simply and easily. You can of course make your home built PC with LFS work just as well, it's just going to take you an awful lot of effort.
Jedidiah.
Craft Beer Programming T-shirts
So, how does this compare to running Apple's Xserve? Bang per buck? Heat? Space? Etc etc....
There's not a lot to compare. We're talking apples and oranges. It's like asking to compare a PowerMac G5 with a bunch of PC parts scattered on the floor as desktop machines. Sure, you can put the PC together, load it with Linux, tinker with it to get everything working, etc. but that's a fair amount of work compared to taking the PowerMac out of the box, plugging it in, turning it on, and having everything work perfectly.
Read the specs, particularly with regard to the interconnect, system administration, and hardware and software reliability features. This thing is seriously engineered to be a massively parallel system with top of the line hardware and software to support and maintain that, as well as extremely impressive reliability features.
Jedidiah.
Cray never went "belly up". It was acquired by SGI in 1996, then divested in 2000 and merged with Tera, who renamed the resultant entity "Cray Inc.".
Although it's true that Cray was not growing strongly before the SGI buy-out, it was not failing either. It could have kept running quite happily for many years, but in the bizarro-world of Wall Street, a company which is not growing is dying. I so love it when economists use biological terminology for corporations. In Wall Street's thinking, the only healthy growth would be a cancerous tumor.
Anyway....
The whole SGI-period of Cray is actually quite fascinating, and I suspect the true story will never be fully known. Lots of SGI engineers had their non-Cray technology branded with Cray marketing names, most egregiously LegoNet becoming CrayLink. Lots of Cray folks - aka. Crayons - felt that the core of their company was gutted by an SGI operation which didn't care for the extreme high-ends of HPC.
One rumor I heard, from a well-placed source, is that the Cray merger with SGI was primarily arranged by the USG. The intelligence services have huge investments in both companies' products, so the merger between them made sense. I was told that as a quid pro quo, the USG had an in-principle agreement to continue purchasing Cray gear to provide enough revenue inside SGI to keep both Cray architectures alive. However, certain parts of SGI felt that the US government didn't live up to their agreement, negotiations to rectify that weren't successful, and so SGI management defunded significant aspects of the Cray engineering work.
Also, FYI, Cray is one of those companies which will never totally go "belly up" anyway. Given the sensitivity of the work which they did, their support databases alone are full of sensitive and/or classified information. Should the company cease trading, it would be acquired by a shelf company whose sole function is to ensure this data would remain private. That's been the fate of almost all of the now-defunct supercomputer and high-end graphics companies who formerly supplied the defence and intelligence market.
More interesting is this spec:
Acoustical Noise Level: 75 dBa at 3.3 ft (1.0 m)
For comparison, that's roughly the same as an average vacuum cleaner when you're operating it, or maybe a good-sized pickup truck passing you in the next lane.
And remember, this value is *per cabinet*. You have to combine the levels of all the cabinets in an installation (decibels add logarithmically, not linearly) to get a true dB level. I wonder whether the maintenance people will have to use noise-level exposure limits for this baby.
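For a rough idea of how the per-cabinet figure adds up, here is a minimal Python sketch. It assumes the cabinets are incoherent sources that each contribute 75 dB at the listener's position, which a real machine room will not satisfy because the cabinets sit at different distances. The function name and the cabinet counts are made up for illustration.

```python
import math

def combined_level_db(levels_db):
    """Combine incoherent sound sources, each given as a level in dB.

    Levels are converted to relative power, summed, and converted back.
    """
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

# Hypothetical installations of 1, 2, 10 and 100 cabinets at 75 dB each.
for cabinets in (1, 2, 10, 100):
    level = combined_level_db([75.0] * cabinets)
    print(f"{cabinets:4d} cabinets: {level:.1f} dB")
```

Under those assumptions each doubling of the cabinet count adds about 3 dB, and each tenfold increase adds 10 dB.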
And here I was, complaining about the quiet whine of my PC's fan.
Time flies like an arrow. Fruit flies like a banana.
Power: 14.8 kVA (14.5 kW) per cabinet.
that's amazing. how did the cray guys get a kilovolt-ampere that is not equal to a kilowatt? just goes to show you the power of fast interconnects.
Disclaimer: IANACEBIATAPEC (I Am Not A Cray Engineer But I Am Taking A Power Engineering Course)
It's fairly common to get kVA != kW.
Overall power used by a load is expressed as S=P+jQ, where P is the "real" power and Q is the reactive power (capacitive/inductive from motors, fluorescent lamp ballasts, etc).
While the "units" of S, P, and Q are power=voltage*current, S is generally expressed in VA, P in W, and Q in VAR (volt-ampere reactive) to differentiate the variables. Because the magnitude of S = sqrt(P^2 + Q^2), S will always be greater than or equal to P (in this case, 14.8 kVA = sqrt((14.5 kW)^2 + (+-2.965 kVAR)^2)).
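The numbers can be checked with a few lines of Python. This is only a sketch of the power-triangle arithmetic using the two figures from the spec sheet; the variable names are mine, and the sign of Q (inductive or capacitive) cannot be recovered from the magnitudes alone.

```python
import math

apparent_kva = 14.8  # |S|, from the Cray spec
real_kw = 14.5       # P, from the Cray spec

# |S|^2 = P^2 + Q^2, so |Q| = sqrt(|S|^2 - P^2)
reactive_kvar = math.sqrt(apparent_kva ** 2 - real_kw ** 2)

# Power factor is the ratio of real power to apparent power.
power_factor = real_kw / apparent_kva

# The same relation using S = P + jQ as a complex number.
s = complex(real_kw, reactive_kvar)

print(f"|Q| = {reactive_kvar:.3f} kVAR")
print(f"power factor = {power_factor:.3f}")
print(f"|S| = {abs(s):.1f} kVA")
```

That works out to a power factor of roughly 0.98, so the load is close to purely resistive.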
--- You shall know the truth, and the truth shall make you mad- Neal (not Cowboy) Boortz
There are two prominent applications for these machines. The first is nuclear weapons simulation. Personally, I don't see the point to that. The other application is in weather prediction.
Oh, please. Buy a clue, will ya? There's lots and lots and lots of applications that use supercomputers, or could use if they were more affordable. A few examples from the top of my head:
Materials science, that is ab initio simulations, moldyn, you name it. This alone probably uses > 50 % of all supercomputer cpu time in the world. By comparison, weather prediction and nuke simulations is small potatoes (or shall we say, the simulations as such are big, but the number of people engaged in weather prediction or nuke simulation is really small compared to all the supercomputing materials scientists).
CFD, the automobile and aerospace sectors are big users.
Electronic design.
Seismic surveys, the oil industry uses lots and lots of supercomputers to find oil deposits.
Biology. Gene sequencing, moldyn simulations of lipid layers and whatever.
Climate prediction, somewhat related to weather prediction. Official purpose of the Earth Simulator.
All of the examples above could easily use almost any amount of cpu power you can throw at them. The only thing that stands between a lot of scientists and improved understanding of the world is computing power.
What a value!!
That is, until you throw a tightly coupled problem at it and the Cray is 10 times faster because it has much better internode bandwidth and lower latency.
And, you forgot to count the cost of the InfiniBand interconnect that the VT cluster used? That's a couple grand per node.
Bottom line, apples and oranges. If your application is easily parallelizable (i.e. doesn't require much communication between the nodes) you'd be stupid to piss away your money on a "real" supercomputer instead of a cluster. And vice versa.
...Sadly I think that beats my Volkswagen on all three
So come on, ante up. How many remember being awed at the mere sight of old Crays back in the day? Like the Cray-3? I remember the first time I saw a Cray .... thing was in an anti-static environment. To access it, one had to pass through an airlock and be "decharged" or "depolarized" etc. Basically they somehow charged the air to get rid of static electricity. Then you had this system that was running *in* liquid! Take that, "Oh I'm so cool cause I have a l337 haX0r water cooled CPU" overclockers.
They (Cray) were so proud of this accomplishment that the upper portion of the cabinet was some kind of plexiglass so you could see the fluid as it moved, and moved wiring and what not with it. Very surreal feeling, almost like the thing was breathing.
And what about the Cray-1? Wasn't that a true testament to 70's *art* and sculpture? The thing looks like some kind of freaky bus station bench with its odd red and white panels and black base. Though, I don't know if they all looked like that, maybe you could get them in other colors?
Ahh .... those were the days.
"Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
Actually, there is no reason to cluster a few of these. If you have a 2000 node xt3 (or t3e, paragon, blue-gene, cm5, insert mesh-structured mpp here) and a 4000 node xt3, you stick them together and make a 6000 node xt3. But that's just picking nits.
Curiously the xt3 IS about shaving dollars off the price. If you go read the original whitepapers on the system, they go through EXTENSIVE cost-return analysis. They studied their (then-) current generation of cluster systems, as well as future linux/solaris/aix clusters, and rejected them as (interestingly) FAR TOO EXPENSIVE, once the administrative costs are factored in. They then looked at, and rejected, cray's vector solution, the X1. They then decided that the (amazingly) most cost effective solution was to underwrite cray's product development cycle on a wholly new product. Basically they asked for an update to the system they already had (asci-red, i.e. intel paragon++). Nobody was building such a thing. Since cray had a really strong similar product in the 90s (T3D, T3E), the department of energy asked them to create an update. Some designs never die.
What I'm most interested in is the reliability. One of the biggest difficulties in the T3D engineering cycle was dealing with memory failure. red-storm is going to have 10,000 processors. Let's assume each has 2 banks times 3 dimms (chip-kill) of memory. That means there are 10,000 x 6 x 18 = 1 million+ memory chips in the system. If 1/100th of a percent of these fail, that's still a lot of memory failures. How well are faults isolated? That's the big question for systems this big.
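To put a number on "a lot", here is the same back-of-the-envelope estimate as a short Python sketch. All inputs are the assumptions from the paragraph above (including the 18 chips per DIMM implied by the multiplication), not published Red Storm figures, and the failure fraction is purely illustrative.

```python
processors = 10_000
banks_per_processor = 2
dimms_per_bank = 3
chips_per_dimm = 18  # assumed, as in the multiplication above

total_chips = (processors * banks_per_processor
               * dimms_per_bank * chips_per_dimm)

# 1/100th of a percent is 1 in 10,000; integer math avoids float rounding.
failed_chips = total_chips // 10_000

print(f"memory chips in the system: {total_chips:,}")
print(f"chips failing at 0.01%:     {failed_chips}")
```

So even at that tiny failure fraction, the estimate comes to roughly a hundred failed chips spread across the machine.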
I'm also a little wary of cray's use of lustre. I've used lustre before, as well as other cluster-FSes. While I'm not aware of other filesystems that will scale to 700+ i/o nodes, I'm not confident in lustre. It's an immature product at best. (I don't mean to disparage the people working on it, it's a neat architecture, but it's a hard problem, and I'm not sure it's ready for prime-time.)