Cray XT-3 Ships
anzha writes "Cray's XT-3 has shipped. Using AMD's Opteron processor, it scales to a total of 30,580 CPUs. The starting price is $2 million for a 200 processor system. One of its strongest advantages over the standard Linux cluster is that it has an excellent interconnect built by Cray. Sandia National Labs and Oak Ridge National Labs are among the very first customers. Read more here."
Dimensions (cabinet): H 80.50 in. (2045 mm) x W 22.50 in. (572 mm) x D 56.75 in. (1441 mm)
Weight (maximum): 1529 lbs per cabinet (694 kg)
http://www.cray.com/products/xt3/specifications
Opterons beat the pants off the Pentium 4s in x87 (i.e. old) FPU operations. If you want good performance, you need SSE/SSE2, on both AMD and Intel. For pure SSE, the Pentium 4s beat the Opterons, mainly because of clock speed, but for multi-processor systems, HyperTransport and the rest of the architecture more than make up for that.
Opus: the Swiss army knife of audio codec
What kind of operating system runs on this beast?
UNICOS is usually a safe bet. In this case the specs say UNICOS/lc, which is made up of "SUSE(TM) Linux(TM), Cray Catamount Microkernel, CRMS and SMW software"
I'm not entirely clear how to interpret that, but I think it works as follows: it runs the Catamount Microkernel as the kernel, and uses SUSE for everything else (so we have SUSE Linux, without the Linux - all of a sudden that GNU/Linux stuff starts to make sense). The CRMS is their interconnect management and monitoring software, and SMW is the System Management Workstation - which I'm guessing is their administration frontend.
It's worth noting that this is pretty serious software (Cray has a lot of experience dealing with large systems) - you can bet that the management and monitoring tools are very capable.
This thing is to a beowulf cluster what a dual G5 PowerMac is to a homebuilt PC system running Linux From Scratch. It's going to work flawlessly "out of the box" with a smooth and polished interface that lets you get everything done simply and easily. You can of course make your homebuilt PC with LFS work just as well, it's just going to take you an awful lot of effort.
Jedidiah.
Craft Beer Programming T-shirts
So, how does this compare to running Apple's Xserve? Bang per buck? Heat? Space? Etc etc....
There's not a lot to compare. We're talking apples and oranges. It's like asking to compare a PowerMac G5 with a bunch of PC parts scattered on the floor as desktop machines. Sure, you can put the PC together, load it with Linux, tinker with it to get everything working, etc. but that's a fair amount of work compared to taking the PowerMac out of the box, plugging it in, turning it on, and having everything work perfectly.
Read the specs, particularly with regard to the interconnect, system administration, and hardware and software reliability features. This thing is seriously engineered to be a massively parallel system, with top of the line hardware and software to support and maintain that, as well as extremely impressive reliability features.
Jedidiah.
Disclaimer: IANACEBIATAPEC (I Am Not A Cray Engineer But I Am Taking A Power Engineering Course)
It's fairly common for kVA != kW.
Overall power used by a load is expressed as S=P+jQ, where P is the "real" power and Q is the reactive power (capacitive/inductive from motors, fluorescent lamp ballasts, etc).
While the "units" of S, P, and Q are all power = voltage*current, S is generally expressed in VA, P in W, and Q in VAR (volt-ampere reactive) to differentiate the variables. Because the magnitude |S| = sqrt(P^2 + Q^2), |S| will always be greater than or equal to P (in this case, 14.8 kVA = sqrt((14.5 kW)^2 + (+-2.965 kVAR)^2)).
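For anyone who wants to check that arithmetic, here is a minimal Python sketch. It only assumes the 14.8 kVA and 14.5 kW figures quoted in this comment; the function names are made up for illustration, and the sign of Q (inductive vs. capacitive) can't be recovered from the magnitudes alone.

```python
import math

def reactive_power(apparent_kva, real_kw):
    """Return |Q| in kVAR from |S| (kVA) and P (kW), using |S|^2 = P^2 + Q^2."""
    if real_kw > apparent_kva:
        raise ValueError("real power cannot exceed apparent power")
    return math.sqrt(apparent_kva**2 - real_kw**2)

def power_factor(apparent_kva, real_kw):
    """Power factor is the ratio of real power to apparent power."""
    return real_kw / apparent_kva

# Figures quoted in the comment above (per cabinet).
s_kva, p_kw = 14.8, 14.5

q_kvar = reactive_power(s_kva, p_kw)
pf = power_factor(s_kva, p_kw)

# Rebuild S = P + jQ as a complex number and confirm its magnitude.
s_complex = complex(p_kw, q_kvar)

print(f"|Q| = {q_kvar:.3f} kVAR")
print(f"power factor = {pf:.3f}")
print(f"|S| = {abs(s_complex):.1f} kVA")
```

The reactive part comes out at roughly 2.965 kVAR, which matches the number above, and the power factor is close to 1, so nearly all of the apparent power is real power.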
--- You shall know the truth, and the truth shall make you mad- Neal (not Cowboy) Boortz
There are two prominent applications for these machines. The first is nuclear weapons simulation. Personally, I don't see the point to that. The other application is in weather prediction.
Oh, please. Buy a clue, will ya? There are lots and lots and lots of applications that use supercomputers, or could use them if they were more affordable. A few examples off the top of my head:
Materials science, that is ab initio simulations, moldyn, you name it. This alone probably uses > 50% of all supercomputer cpu time in the world. By comparison, weather prediction and nuke simulations are small potatoes (or shall we say, the simulations as such are big, but the number of people engaged in weather prediction or nuke simulation is really small compared to all the supercomputing materials scientists).
CFD, the automobile and aerospace sectors are big users.
Electronic design.
Seismic surveys, the oil industry uses lots and lots of supercomputers to find oil deposits.
Biology. Gene sequencing, moldyn simulations of lipid layers and whatever.
Climate prediction, somewhat related to weather prediction. Official purpose of the Earth Simulator.
All of the examples above could easily use almost any amount of cpu power you can throw at them. The only thing that stands between a lot of scientists and improved understanding of the world is computing power.
It's not just hardware: the amount of non-parallelizable code in a parallel application has a tremendous impact on scalability.
The upper bound on speedup is generally given by Amdahl's law. Plainly, for a fixed problem size with any serial fraction, the efficiency (speedup divided by the number of processes) approaches zero as the number of processes is increased. Generally we consider the major sources of overhead to be communication, idle time, and extra computation. Interprocess communication is considered negligible for serial programs in this context (we consider message passing). Idle time ends up contributing to overhead, because processes idle awaiting information from others. Extra computation is virtually unavoidable at some point; for instance in MPI's Single Program Multiple Data model, each process in tree-structured communication other than the root is eventually idled prior to the completion of computation, and each process determines IPC at some point based on rank.
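To put rough numbers on that, here is a small Python sketch of Amdahl's law, speedup = 1 / (s + (1 - s)/N). The 1% serial fraction is purely an assumption for illustration, not a measurement of any real code, and the function names are hypothetical. The process counts include the 200 and 30,580 CPU figures from the story summary.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: speedup on n_procs for a fixed-size problem."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

def efficiency(serial_fraction, n_procs):
    """Parallel efficiency: speedup divided by the number of processes."""
    return amdahl_speedup(serial_fraction, n_procs) / n_procs

# Assumed serial fraction (illustrative only).
SERIAL = 0.01

for n in (1, 16, 200, 1024, 30580):
    s = amdahl_speedup(SERIAL, n)
    e = efficiency(SERIAL, n)
    print(f"N={n:>6}  speedup={s:8.2f}  efficiency={e:7.4f}")
```

With any nonzero serial fraction the speedup saturates below 1/s (here, below 100), so the efficiency keeps falling as N grows. And this model ignores communication and idle time entirely, so real codes do worse.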
There are notable exceptions to Amdahl's law, however; Gustafson, Montry and Benner wrote about this in "Development of parallel methods for a 1024-processor hypercube", SIAM Journal on Scientific and Statistical Computing 9(4):609-638, 1988.
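The usual way to express that exception is the scaled-speedup formula commonly attributed to Gustafson, S(N) = N - s(N - 1), where the problem size grows with the machine and s is the serial fraction of the run time on the parallel system. The sketch below is the textbook form, not necessarily the exact formulation used in the cited paper, and the 1% serial fraction is again an assumption for illustration.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Fixed-size speedup (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

def gustafson_speedup(serial_fraction, n_procs):
    """Scaled speedup: the parallel part of the work grows with n_procs.

    serial_fraction is measured on the parallel run, not the serial one.
    """
    return n_procs - serial_fraction * (n_procs - 1)

# Assumed serial fraction (illustrative only).
SERIAL = 0.01

for n in (16, 1024, 30580):
    fixed = amdahl_speedup(SERIAL, n)
    scaled = gustafson_speedup(SERIAL, n)
    print(f"N={n:>6}  fixed-size={fixed:8.2f}  scaled={scaled:10.2f}")
```

The two formulas answer different questions: Amdahl holds the problem fixed and asks how much faster it runs, while the scaled form holds the run time roughly fixed and asks how much more work gets done. Which is why big machines are mostly used to run bigger problems rather than the same problem faster.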