Cray Supercomputers to be Based on AMD Opterons

Posted by ScuttleMonkey on Monday November 14, 2005 @07:26AM from the all-eggs-in-one-basket dept.

PsychicX writes "AMD and Cray have announced an agreement to base Cray supercomputers on AMD's Opteron line until the end of the decade, and to collaborate on Cray's 2006 proposal for Phase 3 of the federal government's DARPA HPCS (High Productivity Computing Systems) program. Cray already offers the XT3 and XD1 supercomputers based on Opteron."

Re:excellent by fgodfrey · 2005-11-14 07:41 · Score: 5, Informative

Tera bought far more than a name when they bought us. They also bought a bunch of software and hardware people, many of whom (myself not included) have been with Cray Research (the original Cray) for many years. So, while it's certainly not the Cray of the mid-1980s, the tradition still goes back there, especially with the vector machines like the Cray X1/X1E and its impending follow-on.

--
Go Badgers! -- #include "std/disclaimer.h"
Opteron is not NexGen's tech by tomcres · 2005-11-14 07:47 · Score: 3, Informative

K6 technology was acquired and modified by AMD. The K7 and K8 were designed by AMD. True, many of the engineers on the K7 and K8 teams were probably ex-NexGen since AMD acquired that company, but so what? They are truly AMD innovations. At least they didn't sink all of their research into the Itanic!
Re:It only makes sense by fgodfrey · 2005-11-14 07:55 · Score: 4, Informative

I've never completely understood this argument (yes, I admit, I'm heavily biased). If I want to build a skyscraper, I'm not going to use the "mass market" crane that puts up the roof of a residential house. I'm going to use a specialized crane that's meant for building skyscrapers.

That doesn't mean that there isn't a place for commodity hardware in supercomputing, but to say that there's no room for custom hardware misses the point. The only thing "off the shelf" about an AMD-based Cray is the AMD. The logic board and, most importantly, the network that interconnects the processors are entirely custom. Not to mention the fact that Cray will still build some entirely custom processors...

By the way - this is hardly the first Cray based on a commodity processor. The T3E and T3D were both built on Alpha processors, yet nobody calls those machines "commodity".

--
Go Badgers! -- #include "std/disclaimer.h"
Not to be a jerk...but this is old news.... by jamesgomez · 2005-11-14 07:56 · Score: 3, Informative

dated from June 16, 2005

Check out the article here...
http://www.hypertransport.org/consortium/cons_pressrelease.cfm?RecordID=79/
Re:excellent by flaming-opus · 2005-11-14 10:13 · Score: 5, Informative

Close.
Craylink was designed at SGI, and renamed to craylink after they bought Cray. They introduced craylink in the origin2000, which they started selling half a year after buying cray, so I'm sure they couldn't have integrated any cray-designs into their product in that span.

After they sold Cray to Tera, SGI started calling the technology Numalink, and currently use it in their origin3, altix3, and altix4 product lines. They are on the 4th generation of the technology, which is 3.2GB/s per direction. The cray that was sold to Tera included the half-finished X1 system, which also uses numalink. It uses the older 1.6GBps/dir links, but uses 32 networks in parallel for a total of ~50GB/s/dir per node.
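As a quick sanity check on the X1 figure above, the per-node total is just the link count times the per-link bandwidth. The helper name below is made up for illustration; the 32 links and 1.6 GB/s per direction are the numbers quoted in this comment.

```python
def aggregate_bandwidth_gb_s(links, per_link_gb_s):
    """Total bandwidth per direction when `links` networks run in parallel."""
    return links * per_link_gb_s

# Figures from the comment: 32 parallel networks at 1.6 GB/s per direction.
x1_total = aggregate_bandwidth_gb_s(32, 1.6)
print(f"X1 per node: {x1_total:.1f} GB/s per direction")
```

That comes out just over the "~50GB/s/dir" quoted above.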

The Cray XT3 uses a newer network interconnect called seastar, which offers 3.8GBps/direction. This is probably what will be used in the X1's successor.

The Cray XD1, which your colleague bought, is a product cray acquired when they bought OctigaBay. They use an interconnect called the RapidArray switch, which provides 4GBps/direction of interconnect.

All of these interconnects are high-bandwidth and low latency. The XD1 is also very inexpensive for a cray, which is always nice.
Re:excellent by flaming-opus · 2005-11-14 10:32 · Score: 4, Informative

No they won't! They have no reason to. The vector units that a cray uses aren't like altivec, sse, or other "bolt-on" vector units. The vector unit on a cray (or NEC) is a latency hiding mechanism. It's a method for forcing the programmer/compiler to structure the code such that the data loaded from memory is used a significant period of time after the load is initiated. This works pretty well on the HPC code that is used on crays, but not at all for the everyday server/workstation code that opterons run. Furthermore, to support that sort of vector unit, you need to have about eight times as much memory bandwidth as an opteron, which means many more pins on the socket, which are very expensive.
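A rough sketch of what that code structure looks like: the loop is strip-mined into vector-length chunks, and the loads for the next chunk are issued before the current chunk is consumed. This is plain Python standing in for vector hardware; the function name and the 64-element vector length are assumptions for illustration, not a description of what any Cray compiler actually emits.

```python
def strip_mined_axpy(a, x, y, vl=64):
    """Compute a*x[i] + y[i] in chunks of at most vl elements.

    vl is an assumed vector register length, chosen for illustration.
    """
    n = len(x)
    out = [0.0] * n
    if n == 0:
        return out
    # "Issue" the loads for the first chunk.
    nxt = (x[0:vl], y[0:vl])
    for start in range(0, n, vl):
        xv, yv = nxt
        # Issue the loads for the following chunk before using the current
        # one, so that on real hardware the memory latency would overlap
        # with the arithmetic below.
        nxt = (x[start + vl:start + 2 * vl], y[start + vl:start + 2 * vl])
        out[start:start + len(xv)] = [a * xi + yi for xi, yi in zip(xv, yv)]
    return out

result = strip_mined_axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], vl=2)
print(result)
```

The point is only the shape of the loop: each load is separated from its use by a whole chunk of work, which is the property the vector unit enforces.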

I think you're much more likely to see the cray vector processor retooled with lots of hypertransport connections, so it can use an opteron as its scalar unit, and use the same seastar routers that the xt3 uses. On the X1, the scalar unit already runs ahead of the vector unit, so I bet it's not all that important for the scalar unit to be on-die.
Re:excellent by joib · 2005-11-15 00:19 · Score: 3, Informative

> No they won't! They have no reason to.

Yes, you're probably right that it doesn't make sense for AMD economically. But I want to run numerical codes at more than 5 % peak performance on my cheap Opterons, so I want to believe. ;-)

> The vector units that a cray uses aren't like altivec, sse, or other "bolt-on" vector units. The vector unit on a cray (or NEC) is a latency hiding mechanism. It's a method for forcing the programmer/compiler to structure the code such that the data loaded from memory is used a significant period of time after the load is initiated.

Yes, I know. And that's precisely the reason why I'd like to see real vectors instead of the sse/altivec toy ones. Main memory latency is hundreds of cycles, and it's getting worse all the time.
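To put a rough number on that: by Little's law, the amount of data that has to be in flight to keep the memory system busy is latency times throughput. The sketch below uses made-up figures (300 cycles of latency, one element per cycle, 64-element vectors); none of them come from this thread or from any specific processor.

```python
import math

def elements_in_flight(latency_cycles, elements_per_cycle):
    """Little's law: concurrency = latency * throughput."""
    return latency_cycles * elements_per_cycle

def loads_in_flight(latency_cycles, elements_per_cycle, elements_per_load):
    """Outstanding load instructions needed to cover the latency."""
    needed = elements_in_flight(latency_cycles, elements_per_cycle)
    return math.ceil(needed / elements_per_load)

# Assumed, illustrative numbers only.
LATENCY = 300        # cycles to main memory
RATE = 1             # elements consumed per cycle
VECTOR_LENGTH = 64   # elements per vector load

scalar_loads = loads_in_flight(LATENCY, RATE, 1)
vector_loads = loads_in_flight(LATENCY, RATE, VECTOR_LENGTH)
print(f"scalar loads in flight: {scalar_loads}")
print(f"vector loads in flight: {vector_loads}")
```

Under these assumptions a handful of long vector loads covers the same latency as hundreds of independent scalar loads, which is the appeal of long vectors over short SIMD registers.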

Additionally, vectors have quite a few advantages from a microarchitecture perspective too.

> This works pretty well on the HPC code that is used on crays, but not at all for the everyday server/workstation code that opterons run.

I'm not sure about that. I guess technical apps vectorize just as well as HPC codes (well perhaps not the UI, but the code that runs the actual simulation or whatever). Heck, even some database code vectorizes nicely (sorting and hash joins).

> Furthermore, to support that sort of vector unit, you need to have about eight times as much memory bandwidth as an opteron, which means many more pins on the socket, which are very expensive.

Yes, as I said, some Alpha Tarantula-like design is probably overkill for the vast majority of the market. My point was that a vector ISA extension with modest execution resources wouldn't need that much die area, and could help make better use of the available bandwidth, whatever that bandwidth is. As you said yourself, the expensive thing is IO. Transistors are cheap by comparison. So not having instructions that allow one to effectively use the available IO resources is a real shame.

> I think you're much more likely to see the cray vector processor retooled with lots of hypertransport connections, so it can use an opteron as its scalar unit, and use the same seastar routers that the xt3 uses. On the X1, the scalar unit already runs ahead of the vector unit, so I bet it's not all that important for the scalar unit to be on-die.

Yes, that sounds feasible. IIRC it is something like this that Cray has cooked up for the Cascade project; i.e. a node consists of 8 (or was it 4) scalar processors connected to memory (I guess these could be Opterons or, further in the future, some kind of processor-in-memory (PIM) stuff), and a vector unit with its own cache and fast access to the main memory via the scalar CPUs.

As for the seastar thing, I think you're right that that's what they'll use for inter-node communication. Currently the X1(E) uses Numalink licensed from SGI, so they're certainly looking at replacing that with existing in-house tech. BTW, 2H2006 will see the XT4, with the new Opteron sockets with DDR2 memory and the Seastar2 router that provides twice the bandwidth of the existing Seastar.