Cray XD1 Now Available
cyngus writes "Cray announced the availability of their XD1 systems. Each XD1 chassis has up to 12 AMD Opteron processors. Up to 12 chassis can be clustered together in a rack. The XD1 uses Cray RapidArray Interconnect technology, based on HyperTransport, for high-bandwidth, low-latency communication between processors and chassis. The XD1 also has a handful of other technologies aimed at the HPC market, including Xilinx FPGAs, communications accelerators, etc."
It would take about sixty racks of these to best the Earth Simulator's theoretical peak, using more than 60% more processors.
Still, if they need someone to, uh, test one...
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
Since they had been bought by SGI, I've actually been wondering whether they would make me dream again.
Trolling using another account since 2005.
A Cray is not a true Cray unless it can be used as a stylish sofa :p
frotz grue
Dilbert: I can compute many values of pi. Some people discuss areas of circles, but I'm doing something about it!
"A witty saying proves nothing." ~Voltaire
"d'Oh!" ~Homer
From the linked page:
Highly modular, the Cray XD1 base unit is a chassis. Up to 12 chassis can be installed in a rack. Multirack configurations integrate hundreds of processors into a single system.
Farther down the same page:
The Cray XD1 compute subsystem is composed of 12 AMD Opteron(TM) 64-bit processors that run Linux and are organized as six 2-way SMPs to deliver 58 GFLOPs* per chassis. Finely tuned memory and I/O performance removes bottlenecks and maximizes processor performance.
Wow - do the math: 12 chassis at 58 GFLOPs each comes to 696 GFLOPs per rack. That's rather impressive.
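For anyone who wants to check that arithmetic, here is a minimal sketch. The 58 GFLOPs per chassis, 12 processors per chassis and 12 chassis per rack come from the quoted Cray page. The Earth Simulator peak of 40.96 TFLOPs is an outside figure assumed here, not something from the linked page, and the variable names are made up for illustration.

```python
import math

# Figures quoted from the Cray page above.
GFLOPS_PER_CHASSIS = 58
CHASSIS_PER_RACK = 12
CPUS_PER_CHASSIS = 12

# Assumption (not from the Cray page): Earth Simulator theoretical peak.
EARTH_SIM_PEAK_GFLOPS = 40_960

# Peak numbers for one fully populated rack.
gflops_per_rack = GFLOPS_PER_CHASSIS * CHASSIS_PER_RACK
cpus_per_rack = CPUS_PER_CHASSIS * CHASSIS_PER_RACK

# Whole racks needed to reach the assumed Earth Simulator peak on paper.
racks_to_match = math.ceil(EARTH_SIM_PEAK_GFLOPS / gflops_per_rack)

print(f"{gflops_per_rack} GFLOPs per rack, {cpus_per_rack} processors per rack")
print(f"{racks_to_match} racks to reach {EARTH_SIM_PEAK_GFLOPS} GFLOPs peak")
```

These are theoretical peaks only; sustained performance on real codes would depend heavily on the interconnect and the application.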
However, part of me is a bit saddened by seeing the Cray name attached to X86s. Yes, I felt the same thing with SGI, DEC, and Sun. Yes, I need to get over it and move on.
I want to drag this out as long as possible. Bring me my protractor.
the nec SX architecture uses these ridiculously huge custom vector processors to get performance (similar to the Cray 1, 2, XMP, YMP, etc design)
this Cray is more like building MPPs off of scalar units (opterons) and doing some real innovation around the MPP interconnect. It's sort of off the shelf, yet not at the same time.
The big thing here that kicks ass is the 6 FPGAs per chassis. If you can write a highly tuned software algorithm, there's a chance you can write a highly tuned piece of hardware, deploy that to the FPGA, and you've got an application-specific hardware accelerator. 6 per chassis, in fact. That's pretty cool, and it's in some ways a HUGE innovation over having a dedicated vector unit (as was the cray1 design).
the really interesting thing here is that these are essentially opterons running linux, with custom interconnect goo. The interconnect bypasses the PCI bus - it's closer to the PEs than that. their claim is that it attaches to the AMD hypertransport bus (the Proc -> Proc -> Mem bus for SMP AMD machines)
My opinions are my own, and do not necessarily represent those of my employer.
Crayola!
For my apps, I do iterative matrix calculations. However, one of the required data tables scales as n^2.3 (ish) of the system size. These can be precalculated, or calculated on demand. Typical size for a small run is 4-6 GB. I've filled a 40 GB array with data tables before.
Thus, the part that impacts runtimes the most is the on-disc lookup, which is still faster than direct calculation, which we've also had to do.
I looked into FPGAs a while back. Some back-of-envelope calculations show that a single FPGA should be able to calculate the data table on demand, and it'll be faster than reading from disc.
(Turns out that to actually get a usable solution for a basic PC, I would need to hack up the whole tool chain. FPGA cards for a PC are all designed for DSP, rather than numerics.)
So, with an FPGA and a CPU, I could eliminate the slowest part of the job, and scale up to, what, a 1GB working matrix, which is about 8 times larger than the biggest job I've ever run, which hogged a T3E1200 for 6 hours.
So, in short, gimme an FPGA and some reasonable tool chain, and I will be able to about halve runtimes, and, more importantly, scale up to 10 times larger calculations. 5 times larger calculations is the most I've ever been asked about.
Time to brush up on my VHDL, I think.
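To put rough numbers on the n^2.3 scaling mentioned above, here is a small sketch. It assumes the table size depends only on the system size n with an exponent of exactly 2.3, and it picks a 5 GB baseline from the middle of the "4-6 GB" range; the function names are hypothetical and not from any real code.

```python
# Exponent taken from the "n^2.3 (ish)" figure in the comment above.
TABLE_EXPONENT = 2.3


def table_growth(size_factor: float, exponent: float = TABLE_EXPONENT) -> float:
    """Factor by which the data table grows when system size n grows by size_factor."""
    return size_factor ** exponent


def scaled_table_gb(base_gb: float, size_factor: float) -> float:
    """Estimated table size in GB after scaling the system size by size_factor."""
    return base_gb * table_growth(size_factor)


# Assumed baseline: a 5 GB table for a small run (illustrative only).
BASE_TABLE_GB = 5.0

for factor in (2, 5, 10):
    print(
        f"n x{factor}: table x{table_growth(factor):.1f}, "
        f"~{scaled_table_gb(BASE_TABLE_GB, factor):.0f} GB"
    )
```

Under those assumptions the precalculated table outgrows any sensible disc array long before the working matrix does, which is why computing entries on demand looks attractive.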
Id releases Doom 3 for Linux, Cray announces availability of new supercomputer.
Dare we say, we've finally actually found the hardware that can run this game?
You can accomplish anything you set your mind to. The impossible just takes a little longer.
This machine is really not much different to SGI's Altix, except running the AMD processors rather than Intel. This means that although each processor likely runs faster than the ones SGI uses, Cray can't bundle as many together, as AMD hasn't progressed nearly as far on SMP-aware chipsets as Intel.
This is some of the stupidest drivel I have read on Slashdot. SGI and Cray both do ALL of the glue logic chips themselves; that's the whole point of buying from them. They don't use off-the-shelf chipsets; they design their own with the design goal of large scalable systems. Besides, Intel uses a shared bus where AMD uses the point-to-point bus they bought from Compaq, which was originally designed for the Alpha. So if anyone has a scalability lead it's AMD.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Cray now has three product lines to address 3 different market segments.
They have the X1, which is a massively parallel vector system for the very high-end. (For those who need 30+Gbytes/second of memory bandwidth for EACH cpu) These things are huge, expensive, and used by a limited number of users, mostly governments.
They are getting ready to productize Red Storm, which is also a bunch of Opterons, but strung together in a shared-memory system like the T3E. Also a high-end solution.
This system, the XD1, is a low-end system designed to be a half-step better than a cluster of off-the-shelf Opterons. It's a multi-kernel cluster using MPI for all the data sharing. However, the interconnect basically sits where the south bridge sits on most Opteron boxes.
So Cray still has the absolute cutting-edge systems, but has now expanded down-market. (Rather, they acquired OctigaBay, who did the early design work.)
This is also not the first time this has happened. In the early 90s, Cray purchased a small start-up that was developing a NUMA-style mini-super based on SPARC processors. They turned it into a product and sold a few, though not as many as they would have liked. During the SGI acquisition they sold the product to Sun, who branded it the E10000 and made about a billion dollars off of it. It's now the foundation for all of Sun's high-end Unix servers.
Cray also bought a small company (I forget the name) that made a CMOS implementation of the YMP. This became the YMP-EL and later the J90, which pioneered technology for the SV1.
Cray has often built mid-range systems. Nothing new.