Cray XD1 Now Available
cyngus writes "Cray announced the availability of their XD1 systems. Each XD1 chassis has up to 12 AMD Operton processors. Up to 12 chassis can be clustered together in a rack. The XD1 uses Cray RapidArray Interconnect technology, based on HyperTransport, for high-bandwidth, low-latency communication between processors and chassis. The XD1 also has a handful of other technologies aimed at the HPC market, including Xilinx FPGAs, communications accelerators, etc."
It would take sixty racks of these to best the Earth Simulator's theoretical peak; more than 60% more processors.
Still, if they need someone to, uh, test one...
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
Since they had been bought by SGI, I've actually been wondering whether they would make me dream again.
Trolling using another account since 2005.
" this is a chance to hunt them down ". Go get 'em, tiger!
--
make install -not war
Being based on Opteron, it will run any x86 software, though maybe without all the bells and whistles. But openMosix can solve most of those problems.
A Cray is not a true Cray unless it can be used as a stylish sofa :p
frotz grue
So, is the Operton more or less powerful than the Opteron?
Also, mandatory: imagine a Beowulf cluster of these.
Dr Cray was killed in 1995 (I think that's the year) when his SUV was T-boned by a short car and rolled over.
I only know this from a Sci-Am article on using supercomputers to predict crash situations.
Bacardi + slashdot = negative karma.
He's dead.
http://en.wikipedia.org/wiki/Seymour_Cray
Seymour Roger Cray (September 28, 1925 - October 5, 1996) was a supercomputer architect who founded the company Cray Research. For about 30 years, the short answer to the question "What company makes the fastest computer?" was "Wherever Seymour Cray is working now."
I've heard conflicting reports on this - reading Cray's own literature, you see them say:
"Tightly coupled to the AMD Opterons and switching fabric, [the RapidArray Communications Processors] handle memory to memory copies, global memory management, and system wide process synchronization, freeing..."
(Emphasis mine)
Does this mean the HT links give the OS the view of a single system for each chassis? (Or rack, even?) I.e., can I use a single processor out of those 12 in a chassis and access 96GB of RAM with that one process WITHOUT using MPI or RDMA?
I thought Cray was trying to convince the world that Clusters were not as good as true supercomputers, but this looks like a glorified cluster. In looking under the hood it appears to be just a collection of 2-way SMP Opterons with a superfast proprietary network backbone.
And it's running Linux, if that matters to you
Hey jack nuts, I posted this. I like Cray, because I think companies that put a lot of thought into their product and make great ones deserve a cheering section. Of course you're a BC kid, so I'll forgive you; we (BU) spank you enough in hockey to let you have a shot here and there.
Dilbert: I can compute many values of pi. Some people discuss areas of circles, but I'm doing something about it!
"A witty saying proves nothing." ~Voltaire
"d'Oh!" ~Homer
From the linked page:
Highly modular, the Cray XD1 base unit is a chassis. Up to 12 chassis can be installed in a rack. Multirack configurations integrate hundreds of processors into a single system.
Farther down the same page:
The Cray XD1 compute subsystem is composed of 12 AMD Opteron(TM) 64-bit processors that run Linux and are organized as six 2-way SMPs to deliver 58 GFLOPs* per chassis. Finely tuned memory and I/O performance removes bottlenecks and maximizes processor performance.
Wow - do the math: 58 GFLOPs per chassis times 12 chassis is 696 GFLOPs per rack. That's rather impressive.
However, part of me is a bit saddened by seeing the Cray name attached to X86s. Yes, I felt the same thing with SGI, DEC, and Sun. Yes, I need to get over it and move on.
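For anyone who wants to check the rack arithmetic in this thread, here is a minimal sketch. The 58 GFLOPs per chassis, 12 chassis per rack, and 12 processors per chassis are the figures quoted from the Cray page. The Earth Simulator numbers (40.96 TFLOPs theoretical peak, 5,120 processors) are its published specs, not something stated in this thread, so treat them as an outside assumption.

```python
import math

# Figures quoted from the Cray page in this thread.
GFLOPS_PER_CHASSIS = 58   # peak GFLOPs per chassis
CHASSIS_PER_RACK = 12
CPUS_PER_CHASSIS = 12

# Assumption (not from the thread): Earth Simulator published specs.
EARTH_SIM_PEAK_GFLOPS = 40_960
EARTH_SIM_CPUS = 5_120


def rack_peak_gflops():
    """Peak GFLOPs of one fully populated rack."""
    return GFLOPS_PER_CHASSIS * CHASSIS_PER_RACK


def racks_to_exceed(target_gflops):
    """Smallest whole number of racks whose combined peak is above the target."""
    return math.floor(target_gflops / rack_peak_gflops()) + 1


def extra_cpu_fraction(racks, reference_cpus):
    """How many more CPUs the racks hold than the reference machine, as a fraction."""
    cpus = racks * CHASSIS_PER_RACK * CPUS_PER_CHASSIS
    return cpus / reference_cpus - 1


if __name__ == "__main__":
    print("GFLOPs per rack:", rack_peak_gflops())
    print("racks to beat Earth Simulator peak:", racks_to_exceed(EARTH_SIM_PEAK_GFLOPS))
    print("extra CPUs at 60 racks:", extra_cpu_fraction(60, EARTH_SIM_CPUS))
```

With the rounded 58 GFLOPs figure the rack count comes out just under sixty, so "sixty racks" is a fair round number, and either way the processor count is well over 60% higher than the Earth Simulator's.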
I want to drag this out as long as possible. Bring me my protractor.
Yes, I believe it is 12-way SMP. The memory is connected to the CPUs in a crossbar switch.
Cray has announced a lot of different sales of the XD1 the past couple of weeks. We have all the details here.
ignorance is bliss. googlefiberatx.com
the NEC SX architecture uses these ridiculously huge custom vector processors to get performance (similar to the Cray 1, 2, XMP, YMP, etc design)
this Cray is more like building MPPs off of scalar units (opterons) and doing some real innovation around the MPP interconnect. It's sort of off the shelf, yet not at the same time.
The big thing here that kicks ass is the 6 FPGAs per chassis. If you can write a highly tuned software algorithm, there's a chance you can write a highly tuned piece of hardware, deploy that to the FPGA, and you've got an application-specific hardware accelerator. 6 per chassis, in fact. That's pretty cool, and it's in some ways a HUGE innovation over having a dedicated vector unit (as was the Cray-1 design).
the really interesting thing here is that these are essentially Opterons running Linux, with custom interconnect goo. The interconnect bypasses the PCI bus - it's closer to the PEs than that. Their claim is that it attaches to the AMD HyperTransport bus (the Proc -> Proc -> Mem bus for SMP AMD machines)
My opinions are my own, and do not necessarily represent those of my employer.
> Cray HPC-enhanced Linux, Kernel version 2.4.21
I wonder what that means - Red Hat EL 3.0 with enhancements, or their own thing..
Interconnect - I wonder how their proprietary interconnect compares to IB..
File system - ext3? No cluster file system?
Crayola!
For my apps, I do iterative matrix calculations. However, one of the required data tables scales as n^2.3 (ish) of the system size. These can be precalculated, or calculated on demand. Typical size for a small run is 4-6 GB. I've filled a 40 GB array with data tables before.
Thus, the part that impacts runtimes the most is either the on-disc lookup or the direct calculation, which we've also had to do. The lookup is still the faster of the two.
I looked into FPGAs a while back. Some back-of-the-envelope calculations show that a single FPGA should be able to calculate the data table on demand, and it'll be faster than reading from disc.
(Turns out that actually getting a usable solution for a basic PC would mean hacking up the whole tool chain. FPGA cards for a PC are all designed for DSP rather than numerics.)
So, with an FPGA and a CPU, I could eliminate the slowest part of the job and scale up to, what, a 1GB working matrix, which is about 8 times larger than the biggest job I've ever run, which hogged a T3E-1200 for 6 hours.
So, in short, gimme an FPGA and some reasonable tool chain, and I will be able to about halve runtimes and, more importantly, scale up to 10 times larger calculations. 5 times larger is the most I've ever been asked about.
Time to brush up on my VHDL, I think.
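To make the n^2.3 scaling concrete, here is a rough sketch of how fast such a table grows. The exponent is the "n^2.3 (ish)" from the comment; the function names and the 5 GB starting size (picked from the quoted 4-6 GB range) are purely illustrative, and "system size" is whatever n means in the commenter's application.

```python
EXPONENT = 2.3  # "n^2.3 (ish)" from the comment; approximate by the poster's own account


def table_growth(size_factor, exponent=EXPONENT):
    """Factor by which the data table grows when the system size grows by size_factor."""
    return size_factor ** exponent


def scaled_table_gb(base_gb, size_factor, exponent=EXPONENT):
    """Estimated table size in GB after scaling the system size by size_factor."""
    return base_gb * table_growth(size_factor, exponent)


if __name__ == "__main__":
    base_gb = 5.0  # hypothetical starting point inside the quoted 4-6 GB range
    for factor in (2, 5, 10):
        print(f"{factor}x system -> table grows {table_growth(factor):.1f}x, "
              f"about {scaled_table_gb(base_gb, factor):.0f} GB")
```

Doubling the system size already multiplies the table by nearly five, which is why computing entries on demand in an FPGA looks more attractive than storing and reading them from disc as the problem grows.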
I was looking at cray.com and there's no mention of the Tera MTA. The Tera MTA was the innovative idea they had to have 128 logical threads on a single CPU.. think of hyperthreading but with 128 logical threads instead of.. 2.. and also it was working at least 8 years ago.
:/
If you look at cray.com today it's pretty sad. 3 product lines - the XD1 Opteron+magic, the X1, which is traditional Cray vector (SMP vector nodes, and MPPs of those nodes), and their 3rd product line is the NEC SX-6... they're reselling it in the States for NEC.
If you hit tera.com, you get a 404
My opinions are my own, and do not necessarily represent those of my employer.
Id releases Doom 3 for Linux, Cray announces availability of new supercomputer.
Dare we say, we've finally actually found the hardware that can run this game?
You can accomplish anything you set your mind to. The impossible just takes a little longer.
... the Gentoo Chief Marketing Officer made the following statement :
"We welcome the Cray XD1 as the first platform on which Gentoo installs in less than 12 hours. Looking forward to renaming Gentoo to 'One-Click-Linux'. Stay tuned !"
War doesn't prove who's right, just who's left.
Having Linux or any other OS (or even CPU type functions) on the FPGA would be a waste of gates. The gates would be better spent for specialized vector operations, such as an FFT or crypto engine.
(S(SKK)(SKK))(S(SKK)(SKK))
I guess the chickens win after all.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
Loads of Opterons? Who cares if GFX card is teh sux? Cray are a bunch of noobs. I bet it doesn't even have neon fans! You'll never get the chix showing them your 1337 skillz in CS with that heap of junk.
Chernobyl 'not a wildlife haven' - BBC News
This machine is really not much different to SGI's Altix, except running the AMD processors rather than Intel. This means that although each processor likely runs faster than the ones SGI uses, Cray can't bundle as many together, as AMD hasn't progressed nearly as far on SMP-aware chipsets as Intel.
This is one of the stupidest piles of drivel I have read on slashdot. SGI and Cray both do ALL of the glue logic chips themselves; that's the whole point of buying from them. They don't use the off-the-shelf chipset, they design their own with the design goal of large scalable systems. Besides, Intel uses a shared bus where AMD uses the point-to-point bus they bought from Compaq, which was originally designed for the Alpha. So if anyone has a scalability lead it's AMD.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
If you can't afford this Cray, you can at least buy the parts to start putting together your own multi-processor Opteron system:
:-\
http://www.monarchcomputer.com/
A friend of mine and I were talking the other night about local Atlanta, GA computer stores, and he mentioned that Monarch Computer is one of the only vendors from whom you can purchase the 4-way Opteron 800 series processors ($1200 apiece -- damn!).
He's been in grad school out of state for a few years and was surprised to learn that Monarch Computer is, in fact, in his hometown backyard. Kind of kewl to walk in a store in your own town and walk out with a $1200 4-way processor.
Until the wife finds out and sends you back to said store with the receipt in hand for a refund.
IronChefMorimoto
P.S. - I don't work for these guys or advocate their store. I just thought it was cool to have such a vendor nearby. Too bad they don't sell Shuttle XPCs.
Cray systems may not always be the fastest thing around, but they are solid. It would be nice to see more producers paying careful attention to clean design and reliability over having the latest speed-booster.
It's nice to see our old friend Cray continue to keep a foot in the market -- if nothing else, it makes everyone else stay on their toes.
We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
Not when you want to run existing Linux apps on those gates, without rewriting them (which would be a waste of programmer time, much more valuable). And running Linux apps on those gates is a first step in porting them to native FPGA netlists. This is the way to leverage existing apps to increase the utility of FPGAs, by running useful apps on them, and optimizing to native parallel execution. That way we don't waste either gates OR programmer time, not to mention every other resource in the chain.
--
make install -not war
The MTA idea is neat, but nobody's ever been able to find a problem that runs all that well on them. The original MTA didn't have enough memory bandwidth to make it competitive with a vector machine, and the small number of them in the field (less than 10 IIRC) are notoriously cantankerous. When Tera bought Cray, one of the main things they were buying, aside from name recognition, was Cray's CMOS design experience; they were hoping Cray's designers could help with the problems they'd run into with MTA.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Cray now has three product lines to address 3 different market segments.
They have the X1, which is a massively parallel vector system for the very high end. (For those who need 30+ Gbytes/second of memory bandwidth for EACH CPU.) These things are huge, expensive, and used by a limited number of users, mostly governments.
They are getting ready to productize Red Storm, which is also a bunch of Opterons, but strung together in a shared-memory system like the T3E. Also a high-end solution.
This system, the XD1, is a low-end system designed to be a half-step better than a cluster of off-the-shelf Opterons. It's a multi-kernel cluster using MPI for all the data sharing. However, the interconnect basically sits where the south bridge sits on most Opteron boxes.
So Cray still has the absolute cutting-edge systems, but has now expanded down-market. (Rather, they acquired OctigaBay, who did the early design work.)
This is also not the first time this has happened. In the early 90s, Cray purchased a small start-up that was developing a NUMA-style mini-super based on SPARC processors. They turned it into a product and sold a few, though not as many as they would have liked. During the SGI acquisition they sold the product to Sun, who branded it the E10000 and made about a billion dollars off of it. It's now the foundation for all of Sun's high-end Unix servers.
Cray also bought a small company (I forget the name) that made a CMOS implementation of the Y-MP. This became the Y-MP EL and then the J90, which pioneered technology for the SV1.
Cray has often built mid-range systems. Nothing new.
12 Opterons deliver 58* GFlops (where * = peak). The Army's recent G5 cluster (1566*2 G5 processors running at 2GHz) delivers 25* TFlops. 58 divided by 12 yields 4.8* GFlops per chip for an Opteron, and 25000 divided by 3132 yields 8* GFlops per chip for the G5. What's wrong with this math? I didn't think the G5 had numbers THAT much better than an Opteron. And with G5s hitting 2.5GHz today the numbers would be much worse (or better, depending on your point of view).
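One way to look at these numbers: a peak figure is just chips times clock times floating-point operations per cycle, so dividing the per-chip peak by the clock shows how many operations per cycle each side is counting. A minimal sketch follows. The 2.0 GHz G5 clock and the totals come from the comment above; the 2.4 GHz Opteron clock is an assumption, since the thread does not state the XD1's clock speed.

```python
def per_chip_gflops(total_gflops, chips):
    """Peak GFLOPs attributed to each processor."""
    return total_gflops / chips


def flops_per_cycle(chip_gflops, clock_ghz):
    """Floating-point operations per clock cycle implied by a peak figure."""
    return chip_gflops / clock_ghz


# Totals and chip counts as given in the comment.
opteron_peak = per_chip_gflops(58, 12)
g5_peak = per_chip_gflops(25_000, 1566 * 2)

G5_CLOCK_GHZ = 2.0        # from the comment
OPTERON_CLOCK_GHZ = 2.4   # assumption: not stated anywhere in the thread

if __name__ == "__main__":
    print(f"Opteron: {opteron_peak:.2f} GFLOPs/chip, "
          f"{flops_per_cycle(opteron_peak, OPTERON_CLOCK_GHZ):.2f} FLOPs/cycle")
    print(f"G5:      {g5_peak:.2f} GFLOPs/chip, "
          f"{flops_per_cycle(g5_peak, G5_CLOCK_GHZ):.2f} FLOPs/cycle")
```

Under those assumptions the G5 works out to about 4 operations per cycle and the Opteron to about 2. That would be consistent with the G5's peak counting two fused multiply-add units while the Opteron's counts one add and one multiply per cycle, so the division is probably fine and the gap is in how peak is defined rather than in real-world throughput.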
If I didn't have absolutely NOTHING to do, I wouldn't be here.
Cray not too long ago had major announcements about the Red Storm project. I believe that system is supposed to be a single-image, 10,000-CPU AMD-based rig. There are some oddities friends have pointed out, like the OS being based on IRIX, I believe...
Yeah, check this out:
Cray UNICOS/mp
Actually that references the X1, which is not based on PeeCee stuff, but is actually an 8 core MPM.
Sad thing is, even with Red Storm I think IBM will remain on top, as their contract calls for 130,000 of their PowerPCs in one system?
It would be nice to see Cray on top with something other than commodity processors. I realize the T3D and T3E were both Alpha-based systems.
PS, I still have a J932se 32-proc vector Cray (for sale) if anyone wants a Cray for home. $4500, real-deal 3-cabinet Cray from '97, most likely used for gov't nuclear energy something-or-other. Located in Southeastern Virginia.
Southeastern Virginia REPRESENT!
After searching everywhere for the legendary "Wang Computer" t-shirt, I decided to fall back on the second geekiest computer company to get a shirt from, Cray. I couldn't find a shirt through the normal outlets (eBay/ThinkGeek), so I called them directly. The woman that answered was glad to help and shipped out, not a t-shirt, but a very nice collared shirt that makes it look like I work for Cray! I wear it to all the conventions and I become cool(er).
*cue calls to Cray*
Sort of sad they abandoned their custom CPUs for these commodity CPUs. Their liquid cooling was pretty nihilistic. You'd think there would be a lot to be gained from the old techniques of restricting everything to 64-bit operations, liquid evaporation cooling, and quad-core parts.