Cray CTO Says Cray Computers Are Great

Posted by michael on Friday August 20, 2004 @03:06AM from the couldn't-be-any-other-way dept.

Jan Stafford writes "Linux clusters cannot offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."

5 of 338 comments

Re:The issues are progress and long-term usefulnes by Marx_Mrvelous · 2004-08-20 03:15 · Score: 4, Interesting

There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I strongly doubt), you would still have a protocol layer on top of it (TCP/IP, UDP, etc.), and that slows it down.

For some applications, a cluster of slow PCs is OK. But if you want to do truly time-intensive computation, you really can't beat a good internal bus.
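To make the protocol-overhead point concrete, here is a rough Python sketch (not from the comment) of effective bandwidth when every message pays a fixed per-message cost before its bytes cross the link. The link speed and both overhead figures are hypothetical placeholders chosen only to show the shape of the effect, not measurements of any Cray or cluster interconnect.

```python
def effective_bandwidth(msg_bytes, link_bytes_per_s, per_msg_overhead_s):
    """Delivered bytes/s when each message pays a fixed software/protocol
    cost before its bytes cross a link of the given raw speed."""
    transfer_s = msg_bytes / link_bytes_per_s
    return msg_bytes / (per_msg_overhead_s + transfer_s)


# Hypothetical figures for illustration only, not measurements.
LINK_BYTES_PER_S = 2e9    # identical raw link speed in both cases
THIN_OVERHEAD_S = 1e-6    # assumed cost of a lean, hardware-level interconnect
STACK_OVERHEAD_S = 20e-6  # assumed cost of going through a TCP/IP stack

for size in (1_000, 100_000, 10_000_000):
    thin = effective_bandwidth(size, LINK_BYTES_PER_S, THIN_OVERHEAD_S)
    stack = effective_bandwidth(size, LINK_BYTES_PER_S, STACK_OVERHEAD_S)
    print(f"{size:>10} bytes: {thin / 1e9:.3f} GB/s lean vs {stack / 1e9:.3f} GB/s via stack")
```

With these assumed numbers the two cases converge for large messages and diverge sharply for small ones, which is where a heavy protocol stack hurts most.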

--

Moderation: Put your hand inside the puppet head!
Re:Maybe "APPLE" will buy another Cray! by Thagg · 2004-08-20 04:00 · Score: 4, Interesting

As usual, there is more to the story. Apple brought my company in on a project back in the mid-'80s, when they bought the Cray. While we had to sign an NDA in blood, I doubt anybody will mind me talking about it now, almost 20 years later.

Apple was trying to design a new CPU. It would have had vector processing capabilities not all that different from the Cray's, so they bought the Cray both to do circuit simulations of the chip and as a model for their own design.

The chip was going to run at 100 MHz (an astonishing speed for the time) and have a four-pipeline vector processing unit.

They considered hiring us (but eventually declined) to develop some kind of 3D desktop for the Mac. The idea was that this would further distinguish the Mac from other computing systems, which wouldn't be able to emulate the interface because they didn't have the horsepower.

Anyway, that's the Apple-Cray story as I understand it. I'm sure that there is a lot more to the story than I know, of course.

Thad Beier

--
I love Mondays. On a Monday, anything is possible.
Re:The issues are progress and long-term usefulnes by einhverfr · 2004-08-20 04:24 · Score: 3, Interesting

You might want to read Cray's latest 10-K filing:

http://www.sec.gov/Archives/edgar/data/949158/000089102004000325/v96761e10vk.htm

Here they discuss the limitations of clusters and vector-based supercomputing.

Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. I'm not really sure what multithreaded means in this context (a microkernel capable of threading itself across many processors, i.e. UNICOS/mk?), but they do a decent job of explaining the whole thing:

Cray Research pioneered the use of vector systems, from the Cray-1 to the Cray C90 and T90 systems. These systems typically use a moderate number (one to 32) of very fast custom processors in connection with a shared memory. Vector processing has proven to be highly effective for many scientific and engineering application programs which over the years have been written to maximize the number of long vectors. Traditional vector systems do not scale effectively (that is, increase performance by increasing the number of processors) past a limited number of processors. We currently market one classic vector supercomputer, the Cray SX-6 system.

Massively parallel processing architectures typically link tens, hundreds or thousands of standard or commodity processors to act either on multiple tasks at the same time or together in concert on a single computationally-intensive task. Type T systems connect each processor directly to its own private memory and the programmer must manage the movement of data among memory units and processors. Consequently these systems can be difficult to program. Type C massively parallel systems, unlike low bandwidth clusters, have high bandwidth and low latency interconnect systems and are said to be "tightly coupled" -- the Cray T3E, Red Storm and the OctigaBay product are examples of balanced high bandwidth purpose built systems that employ standard microprocessors.

The Cray X1 system is revolutionary in that it is the first supercomputer that combines the attributes of both vector and high bandwidth massively parallel systems. The Cray X1 system has up to 64 processors per cabinet and a shared memory. The Cray X1 system can run small problems as a vector processor would or, by focusing many processors on a task, the Cray X1 system operates as a massively parallel system with a system-wide shared memory and a single-system image. The Cray X1 system is designed to provide efficient scalability and high bandwidth to run complex applications at high sustained speeds. The Cray X1E system furthers this architectural design with increased processor speed and capability.

Our MTA-2 project for NRL is designed to have sustainable high speed, be broadly applicable and easy to program, provide scalability as systems increase in size and have balanced I/O capability. The multithreading processors make the MTA-2 system latency tolerant and, with the system's flat shared memory, able to address data anywhere in the system.

--

LedgerSMB: Open source Accounting/ERP
Let's do some bandwidth math... by JBMcB · 2004-08-20 04:26 · Score: 3, Interesting

From Cray's XD1 page:
"A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."

-So each dual-Opteron XD1 processor unit has 8 GB/s of total bandwidth available (four 2 GB/s links).

Total aggregate PCI bandwidths (accepted standards):

PCI 32-bit 33MHz = 133MB/s
PCI 32-bit 66MHz = 266MB/s
PCI 64-bit 33MHz = 266MB/s
PCI 64-bit 66MHz = 533MB/s
PCI-X 64-bit 133MHz = 1066MB/s
PCI Express x1 = 200MB/s (usable, per slot)
PCI Express x16 = 3000MB/s (usable)

-So a PCI Express x16 slot gives us about 3 GB/s.

An SMP Opteron box with two PCI Express x16 slots can do 6 GB/s of aggregate bandwidth, and a couple of InfiniBand links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
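The arithmetic above can be checked with a short Python sketch. The parallel-PCI figures follow from bus width times clock (the nominal clocks are 33.33, 66.66 and 133.33 MHz, so the results truncate to the table's values). The PCI Express x16 number is the commenter's usable-bandwidth estimate and is simply plugged in, as are the XD1's four 2 GB/s links per SMP from Cray's page. Function and constant names are made up for illustration.

```python
# Parallel PCI: peak MB/s = (bus width in bytes) x (clock in MHz).
def parallel_bus_mb_per_s(width_bits, clock_mhz):
    return width_bits / 8 * clock_mhz


# Nominal clocks are 33.33, 66.66 and 133.33 MHz, hence the thirds.
PCI_BUSES = {
    "PCI 32-bit 33MHz": (32, 100 / 3),
    "PCI 32-bit 66MHz": (32, 200 / 3),
    "PCI 64-bit 33MHz": (64, 100 / 3),
    "PCI 64-bit 66MHz": (64, 200 / 3),
    "PCI-X 64-bit 133MHz": (64, 400 / 3),
}

# Taken from the comment and Cray's XD1 page, not derived here.
PCIE_X16_USABLE_GB_S = 3.0
XD1_LINKS_PER_SMP = 4
XD1_GB_S_PER_LINK = 2.0


def commodity_node_gb_s(x16_slots=2):
    """Aggregate I/O for an Opteron SMP with `x16_slots` PCI Express x16 slots."""
    return x16_slots * PCIE_X16_USABLE_GB_S


def xd1_node_gb_s():
    """Per-SMP fabric bandwidth on the XD1: four 2 GB/s links."""
    return XD1_LINKS_PER_SMP * XD1_GB_S_PER_LINK


for name, (width, clock) in PCI_BUSES.items():
    # Truncate to whole MB/s, matching the figures in the table.
    print(f"{name}: {int(parallel_bus_mb_per_s(width, clock))} MB/s")
print(f"Two PCIe x16 slots: {commodity_node_gb_s()} GB/s")
print(f"XD1 dual-Opteron unit: {xd1_node_gb_s()} GB/s")
```

On the commenter's numbers, the two-slot commodity box lands at 6 GB/s against the XD1's 8 GB/s per SMP.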

--
My Other Computer Is A Data General Nova III.
Re:Two words: by iPaul · 2004-08-20 04:52 · Score: 4, Interesting

Not quite true. First, you get much higher bandwidth between processors using proprietary NUMA interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second, you can exploit things like cache coherency between processors (even if they're in different "nodes") and therefore get true shared memory. So a 1024-processor SGI Altix or a 256-processor Cray is one computer as far as the OS and user-land software are concerned.

Cray has another advantage on the SV and X series: a vector unit on the processor. That lets you operate on whole arrays of numbers at once instead of cycling through them in a loop. For example, the dot product of two small arrays might take one or two instructions instead of a loop. Apple's AltiVec is also a vector unit.
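A rough Python model of the difference, assuming a 64-element vector register as on the Cray-1 (other Cray machines differ): the scalar version does one multiply-add per loop trip, while the strip-mined version handles up to 64 elements per vector multiply and vector add, then reduces the partial sums at the end. Python is only simulating what the hardware does in a single instruction, and the op counts are an idealized tally, not what a real compiler would emit.

```python
# Assumption: 64-element vector registers, as on the Cray-1.
VECTOR_LENGTH = 64


def dot_scalar(a, b):
    """Scalar loop: one multiply-add per iteration. Returns (result, loop trips)."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    total = 0.0
    trips = 0
    for x, y in zip(a, b):
        total += x * y
        trips += 1
    return total, trips


def dot_vector(a, b, vl=VECTOR_LENGTH):
    """Strip-mined dot product. Each strip of up to `vl` elements counts as one
    vector multiply plus one vector add into a register of partial sums; a final
    reduction adds the partial sums. Returns (result, idealized vector op count)."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    partial = [0.0] * vl  # models a vector register holding running partial sums
    ops = 0
    for start in range(0, len(a), vl):
        strip_a = a[start:start + vl]
        strip_b = b[start:start + vl]
        products = [x * y for x, y in zip(strip_a, strip_b)]  # one vector multiply
        for i, p in enumerate(products):  # one vector add into the partial sums
            partial[i] += p
        ops += 2
    return sum(partial), ops + 1  # +1 for the final reduction


print(dot_scalar([1.0] * 100, [2.0] * 100))  # (200.0, 100)
print(dot_vector([1.0] * 100, [2.0] * 100))  # (200.0, 5)
```

Same answer either way; the point is 100 loop trips versus two strips of vector work plus one reduction.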

If you took money out of the picture, it would be easier to deal with a big-honkin' supercomputer like an SGI or Cray than with a cluster. One computer is easier to manage, and you can always use threads and plain old heap memory (which is much faster than message passing over a network).
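The two programming models can be contrasted in a toy Python sketch. Threads share one address space and just write results into a shared list; the "message passing" version has to copy each slice and send it through a queue, which here stands in for a network link between separate machines. This illustrates only the programming style, not performance (Python threads won't show the speed difference), and all names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(data, n_workers=4):
    """Threads read `data` in place and write partial sums into a shared list."""
    chunk = (len(data) + n_workers - 1) // n_workers
    partials = [0] * n_workers  # one slot per thread, so no locking is needed

    def worker(i):
        partials[i] = sum(data[i * chunk:(i + 1) * chunk])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


def message_passing_sum(data, n_workers=4):
    """Each 'node' receives a copy of its slice and sends back one result.
    The queues stand in for network links between separate machines."""
    chunk = (len(data) + n_workers - 1) // n_workers
    to_nodes = [queue.Queue() for _ in range(n_workers)]
    from_nodes = queue.Queue()

    def node(i):
        my_slice = to_nodes[i].get()   # receive the data message
        from_nodes.put(sum(my_slice))  # send the result message back

    threads = [threading.Thread(target=node, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_workers):
        # Explicit copy: on a cluster this data would have to cross the network.
        to_nodes[i].put(list(data[i * chunk:(i + 1) * chunk]))
    total = sum(from_nodes.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return total


print(shared_memory_sum(list(range(1000))), message_passing_sum(list(range(1000))))
```

Both return the same sum; the message-passing version simply has more moving parts, which is the management burden the comment is describing.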

Add money back in, and $500,000 goes a lot farther in raw compute power when you're buying racks of Dells and InfiniBand interconnects. However, depending on the application, the cluster may be faster, slower, or even dog-slow compared to the Cray. If you need the answer today and money is no object, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.

Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
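The trade-off can be written as a crude cost model: runtime ≈ compute / nodes + messages × (latency + message size / bandwidth). The Python sketch below assumes perfectly divisible work and fully serialized messages, and the latency and bandwidth figures are hypothetical stand-ins for a tightly coupled interconnect versus a commodity network, not measured numbers for Cray or any cluster.

```python
def run_time_s(compute_s, nodes, messages, latency_s, msg_bytes, bandwidth_bytes_per_s):
    """Toy model: perfectly divisible compute plus fully serialized messaging."""
    per_message_s = latency_s + msg_bytes / bandwidth_bytes_per_s
    return compute_s / nodes + messages * per_message_s


# Hypothetical interconnect figures, for illustration only.
TIGHT = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 2e9}          # tightly coupled
COMMODITY = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 250e6}   # off-the-shelf network

for messages in (100, 1_000_000):
    tight = run_time_s(100.0, 64, messages, msg_bytes=8192, **TIGHT)
    commodity = run_time_s(100.0, 64, messages, msg_bytes=8192, **COMMODITY)
    print(f"{messages:>9} messages: {tight:.2f} s tightly coupled vs {commodity:.2f} s commodity")
```

With these made-up numbers, a hundred messages barely matter, but a million messages make the commodity network dominate the run time.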

--
Leave the gun, take the cannoli -- Clemenza, The Godfather