Cray XD1 Now Available
cyngus writes "Cray announced the availability of their XD1 systems. Each XD1 chassis has up to 12 AMD Opteron processors. Up to 12 chassis can be clustered together in a rack. The XD1 uses Cray RapidArray Interconnect technology, based on HyperTransport, for high-bandwidth, low-latency communication between processors and chassis. The XD1 also has a handful of other technologies aimed at the HPC market, including Xilinx FPGAs, communications accelerators, etc."
SGI does not own Cray. They did buy Cray back in 1996, but SGI sold its Cray unit to Tera Computer in 2000.
I've heard conflicting reports on this - reading Cray's own literature, you see them say:
"Tightly coupled to the AMD Opterons and switching fabric, [the RapidArray Communications Processors] handle memory to memory copies, global memory management, and system wide process synchronization, freeing..."
(Emphasis mine)
Does this mean the HT links give the OS the view of a single system for each chassis? (Or rack, even?) I.e., can I utilize a single processor out of those 12 in a chassis, and access 96GB of RAM with that one process WITHOUT using MPI or RDMA?
I thought Cray was trying to convince the world that clusters were not as good as true supercomputers, but this looks like a glorified cluster. Looking under the hood, it appears to be just a collection of 2-way SMP Opterons with a superfast proprietary network backbone.
And it's running Linux, if that matters to you.
For my apps, I do iterative matrix calculations. However, one of the required data tables scales as n^2.3 (ish) of the system size. These can be precalculated, or calculated on demand. Typical size for a small run is 4-6 GB. I've filled a 40 GB array with data tables before.
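To put rough numbers on that scaling, here is a minimal sketch, assuming the table size grows as a pure power law with the quoted exponent of about 2.3. The 5 GB baseline is just the midpoint of the "4-6 GB" small-run figure, and the function name and scale factors are made up for illustration; none of this comes from the actual application.

```python
def table_size_gb(base_gb: float, scale: float, exponent: float = 2.3) -> float:
    """Estimate the data table size after the system size grows by `scale`.

    Assumes a pure power law: size is proportional to n**exponent.
    The default exponent of 2.3 is the approximate figure quoted above.
    """
    return base_gb * scale ** exponent


if __name__ == "__main__":
    base_gb = 5.0  # assumed baseline: midpoint of a "4-6 GB" small run
    for scale in (1, 2, 5, 10):  # hypothetical growth factors
        size = table_size_gb(base_gb, scale)
        print(f"{scale:>2}x system size -> ~{size:.1f} GB of data tables")
```

Under these assumptions, doubling the system size grows the table roughly fivefold, which is why precalculated tables outgrow RAM and end up on disc so quickly.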
Thus, the part that impacts runtimes the most is either the on-disc lookup or the direct calculation, which we've also had to do. The on-disc lookup is still the faster of the two.
I looked into FPGAs a while back. Some back-of-the-envelope calculations show that a single FPGA should be able to calculate the data table on demand, and it'll be faster than reading from disc.
(Turns out that to actually get a usable solution for a basic PC, I would need to hack up the whole tool chain. FPGA cards for a PC are all designed for DSP rather than numerics.)
So, with an FPGA and a CPU, I could eliminate the slowest part of the job and scale up to, what, a 1GB working matrix, which is about 8 times larger than the biggest job I've ever run, which hogged a T3E-1200 for 6 hours.
So, in short, gimme an FPGA and some reasonable tool chain, and I will be able to about halve runtimes and, more importantly, scale up to 10 times larger calculations. 5 times larger is the most I've ever been asked about.
Time to brush up on my VHDL, I think.
After searching everywhere for the legendary "Wang Computer" t-shirt, I decided to fall back on the second geekiest computer company to get a shirt from, Cray. I couldn't find a shirt through the normal outlets (eBay/ThinkGeek), so I called them directly. The woman who answered was glad to help and shipped out, not a t-shirt, but a very nice collared shirt that makes it look like I work for Cray! I wear it to all the conventions and I become cool(er).
*cue calls to Cray*