Time For A Cray Comeback?


Posted by simoniker on Monday August 4, 2003 @09:41AM from the cray-cray-come-again-another-day dept.

Boone^ writes "The New York Times has an article (free reg. req.) talking about Cray Inc.'s recent resurgence in the realm of supercomputing. It discusses a bit of Cray's decline when the Cold War ended, "the occupation" under SGI, and the rebirth of the company after the Tera (now Cray Inc.) purchase. Recently Cray Inc. has been shipping their vector-based Cray X1 machine, designing ASCI Red Storm, and recently was one of 3 (also Sun, IBM) to win a large DARPA contract (PDF link) to design and develop a PetaFlops machine by 2010. Could Cray Inc. be poised for a comeback? Wall Street seems to think so."

15 of 266 comments

Registration not required by Anonymous Coward · 2003-08-04 09:44 · Score: 5, Informative

Partner Link

Posting as Anonymous Coward, please award my Karma to starving children in the world.
Petaflops by 2010? by Pope Raymond Lama · 2003-08-04 09:47 · Score: 5, Funny

Of course I expect that...in my Playstation IV,
equipped with an opto-quantic Emotion Engine VI
and a couple petabytes of holographic storage.

--
-><- no .sig is good sig.
2010? by stratjakt · 2003-08-04 09:53 · Score: 5, Funny

There's a whole bunch of PETAFlops outside of McDonalds right now having a sit in and screaming about how fur is murder.

I had to literally step on their faces to get a Big Mac.

--
I don't need no instructions to know how to rock!!!!
Correct me if I'm wrong ... by SuperDuG · 2003-08-04 09:54 · Score: 5, Insightful

... but the market for supercomputers isn't exactly that large, is it? I mean, you've got governmental contracts (research, educational, who knows what) that have to take up 95% of all the purchases made, and then a small private market. I mean, how many companies are striving for a petaflop machine to run their database server?
If you look at the list of top 100 supercomputers, there are systems that are almost 15 years old or even older (not sure on a few). I know these take years to build and are multibillion dollar projects, but the time in between has got to be a killer.
Then there's the question of ... what do you need a supercomputer for? The applications for a petaflop computer are pretty limited, unless you're doing mass storage, cryptography (cracking), or simulations.
Don't get me wrong, I'm all about nuclear testing being done in 1's and 0's instead of in the ocean or in the desert, but how big of a bomb do you really need when it's estimated there are enough nukes to blast the entire land surface of the earth 3 times over?

--
Ignore the "p2p is theft" trolls, they're just uninformed
1. Re:Correct me if I'm wrong ... by Doesn't_Comment_Code · 2003-08-04 10:11 · Score: 5, Funny
  
  Then there's the question of ... what do you need a supercomputer for? The applications for a petaflop computer are pretty limited, unless you're doing mass storage, cryptography (cracking), or simulations.
  
  You're missing the big picture...
  
  Massive multiplayer Quake on a 614,400 x 819,200 screen.
  
  Thank you Cray.
  
  --
  
  Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
2. Re:Correct me if I'm wrong ... by morcheeba · 2003-08-04 10:28 · Score: 5, Informative
  
  Yep, you are a bit wrong... (you didn't think a challenge to the slashdot community would go unnoticed?!)
  
  From this site, you can see the breakdown by organization:
  Usage        Count   Share     Rmax    Rpeak    Procs
  Industry       202  40.4 %    82398   182964    62869
  Research       131  26.2 %   187689   278030   120046
  Academic       115    23 %    77143   133564    45216
  Classified      27   5.4 %    14167    20691    12892
  Vendor          22   4.4 %    11033    15545     5230
  Government       3   0.6 %     1317     2256      528
  Total          500   100 %   373749   633052   246781
  There are a lot of companies that use supercomputers, although maybe not the type you're thinking of. Of course, there are the number-crunchers: oil companies are big users (to crunch data & find new oil), as are car companies (BMW). But there are also the transaction-processors, like Sprint PCS and eBay (which used to be in the top 500), that make the list just by the sheer number of connected processors.
  
  Here's the latest list
  
  --
  HIV Crosses Species Barrier... into Muppets
Re:explain by Doesn't_Comment_Code · 2003-08-04 09:54 · Score: 5, Informative

Well, a well-engineered supercomputer has much less overhead than a cluster. One superfast processor doesn't have to deal with interprocessor communications like a cluster does.

And if your supercomputer has multiple processors, they are generally made to cooperate nicely to improve efficiency, whereas a cluster has to go through Ethernet and hardware layers to communicate between nodes. Granted, that is fast, but on-board communication is faster.

It seems strange, but a multiple-processor computer can actually perform a task slower than just one processor working on the problem if the program and OS aren't designed well. So a lot of the value of a supercomputer comes from its design and the reputation of the manufacturer. And Cray is pretty reliable in my book.
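
As a rough illustration of how more processors can end up slower than one, here is a toy timing model in Python. It is only a sketch: the serial time, parallel time, and per-processor communication cost are made-up parameters, not measurements of any Cray or cluster.

```python
def run_time(procs, serial=1.0, parallel=9.0, comm_per_proc=0.0):
    """Toy model of wall-clock time on `procs` processors.

    serial        -- work that cannot be split (hypothetical units)
    parallel      -- work that divides evenly across processors
    comm_per_proc -- coordination cost added for each extra processor
    """
    return serial + parallel / procs + comm_per_proc * (procs - 1)


def speedup(procs, **model):
    """Speedup relative to one processor under the same model parameters."""
    return run_time(1, **model) / run_time(procs, **model)


# Cheap communication: eight processors help a lot.
fast_interconnect = speedup(8, comm_per_proc=0.0)

# Expensive communication: eight processors are slower than one.
slow_interconnect = speedup(8, comm_per_proc=2.0)

print(f"speedup with cheap communication:     {fast_interconnect:.2f}x")
print(f"speedup with expensive communication: {slow_interconnect:.2f}x")
```

A speedup below 1.0 in the second case means the coordination cost has swamped the gain from splitting the work.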

But the REAL key to the potential comeback of the Cray computer will be whether or not it still has cool bubbles! Wow!!! Cray computing... the inventor of case mods.

--

Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Re:explain by Anonymous Coward · 2003-08-04 09:55 · Score: 5, Funny

can someone explain to me what the benefit of a moving van is compared to buying a fleet of pintos?
Re:explain by anzha · 2003-08-04 09:58 · Score: 5, Informative

Memory to processor feeding: standard off-the-shelf (OTS) processors are often idle because the memory subsystem cannot feed the processor fast enough. This is bad now. It will be getting a lot worse.
Interconnections between processors: this goes beyond merely processors on a board, to connections between boxes. The bus architectures out there for standard OTS hardware get saturated very quickly. This gets worse between boxes. In addition, the latency on Myrinet and Quadrics (compared to what Cray et al. do) is horrible, even if it is excellent compared to Ethernet.
Problem set vs architecture: Not all problems map out well to clusters, or even SMP boxen. Some map best to vector machines. Some map best to tightly integrated MPPs. Some map out to moderately tight clusters. Some are just plain 'embarrassingly parallel'. Others are highly threaded and don't work well on vector or scalar machines. etc, etc. The architecture ought to match the problem set.
MTBF: Mean time between failures. Commodity hardware goes kaputt much more often. A cluster capable of the teraflop performance of custom hardware tends to need constant and evil levels of care and feeding: i.e., you'd better have a grad student on roller blades.
Those are just off the top of my head. I am sure that others will tell you more before I can post again. ;)
Summarized: bandwidth, latency, problem set, and failure rate.
HTH.

--
Do you know why the road less traveled by is littered with the bones of the unwary?
Re:explain by fgodfrey · 2003-08-04 10:09 · Score: 5, Informative

As other replies have posted, bandwidth is the big issue. And by bandwidth, we are talking bandwidth of the processor to memory. Cache is great and all, but if you are stepping through gigabytes of data (or in some cases terabytes of data), your problem isn't going to fit in cache. The speed of your processor will then be dominated by the speed at which it can get to main memory. On a PC, that's slow. What's even slower is when you have to exchange data to a remote node in the cluster. Current massively parallel supercomputers (which is pretty much all of them) have phenomenal bandwidth between processors and memory and between nodes.
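
The bandwidth argument above can be put as a simple upper bound: a kernel that streams its data from main memory cannot sustain more than the lower of the processor's peak rate and what the memory system can deliver. A minimal Python sketch follows, assuming entirely hypothetical machine numbers that are not measurements of any PC or Cray system.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Upper bound on sustained GFLOPS for a kernel streaming from main memory.

    The kernel can run no faster than the processor's peak, and no faster
    than memory delivers operands (bandwidth times flops done per byte).
    """
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Assumed kernel: one multiply-add (2 flops) per 8-byte double loaded.
INTENSITY = 2 / 8  # flops per byte

# Hypothetical machines, chosen only to show the effect.
commodity = attainable_gflops(peak_gflops=4.0, mem_bandwidth_gbs=2.0,
                              flops_per_byte=INTENSITY)
high_bw = attainable_gflops(peak_gflops=12.0, mem_bandwidth_gbs=32.0,
                            flops_per_byte=INTENSITY)

print(f"commodity node:      {commodity:.2f} GFLOPS sustained of 4.0 peak")
print(f"high-bandwidth node: {high_bw:.2f} GFLOPS sustained of 12.0 peak")
```

Under these assumed numbers the commodity node is limited by memory long before it reaches its peak, which is the effect the comment describes.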

Second, (yes, I work for Cray so now I'm going to put in a sales pitch :) our processors are vector processors. As such, you can hide a lot of the latency of getting to memory by queueing up 64 loads at once. Short length vectors are what is used by MMX and Altivec to accelerate graphics. With sufficient vector operation chains, you can keep the processor busy all the time. You can't do that on a PC. I've heard (no, I don't have actual links to articles) that 10% of peak performance on a cluster is considered really good. Our customers wouldn't consider that anywhere near "really good".
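
Why queueing many loads hides latency can be sketched with Little's law: sustained throughput equals the amount of work in flight divided by the latency of each item. The Python below is an illustration only. The 100 ns memory latency and 8-byte word are assumed values, and the figure of 64 is the vector length mentioned in the comment.

```python
def sustained_bandwidth(loads_in_flight, bytes_per_load, latency_seconds):
    """Little's law: bytes per second = bytes in flight / latency."""
    return loads_in_flight * bytes_per_load / latency_seconds


LATENCY = 100e-9  # assumed memory latency: 100 ns
WORD = 8          # assumed load size: one 8-byte word

# One load at a time: the processor waits out the full latency per word.
one_at_a_time = sustained_bandwidth(1, WORD, LATENCY)

# 64 loads queued at once, as with a 64-element vector load.
vector_load = sustained_bandwidth(64, WORD, LATENCY)

print(f"1 load in flight:   {one_at_a_time / 1e6:.0f} MB/s")
print(f"64 loads in flight: {vector_load / 1e9:.2f} GB/s")
```

The latency of each individual load is unchanged; only the number of loads overlapping in time differs.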

Third, there's memory. Lots of it. A single system image supercomputer can have terabytes of memory in one kernel image. You're simply not going to get that in a single PC cabinet.

Finally, in case anyone doubts that vectors, big memory, and large bandwidth can make a good system, the fastest machine in the world right now is the Japanese "Earth Simulator" machine which is an NEC SX machine. That is somewhat similar in architecture to a Cray in that it has large bandwidth and vectors.

--
Go Badgers! -- #include "std/disclaimer.h"
Re:explain by virtual_mps · 2003-08-04 10:12 · Score: 5, Interesting

MTBF: Mean time between failures. Commodity hardware goes kaputt much more often. A cluster capable of the teraflop performance of custom hardware tends to need constant and evil levels of care and feeding: i.e., you'd better have a grad student on roller blades.

Hahahaha. Have you ever actually run a supercomputer? They tend to have much higher failure rates than normal servers. Couple of reasons: first, they push the envelope of a given technology. The sweet spot for stability is not the leading edge. Second, they're not nearly as well tested as mainstream hardware. On a platform with thousands of installations you're much less likely to run into a problem nobody has seen before than you are on a platform with only dozens of installations.
Comeback? by virtual_mps · 2003-08-04 10:27 · Score: 5, Insightful

Probably not. Cray made some money back when a supercomputer was something that an ordinary company might need. The capabilities of "normal" computers were much more limited than today, so a much higher percentage of the buying public was likely to want something more. These days the vast majority of users are happy with something mainstream.

But, you ask, isn't there a lunatic fringe who wants more power at any price? Well, the lunatic fringe ain't what it used to be. During the heyday of Cray you got a damn fine box and nothing else. Cray didn't want to worry about your software--or even an OS. A person who needed the speed would plunk down the money for the box and then pay a couple of guys to code everything from scratch. Those days are gone--software is the driving factor these days, and people are far less willing to buy something that's going to force a total code rewrite. Especially if that thing is only going to buy them a couple of years of edge before they need to recode for the next best thing.

Then there's the question of whether Cray can afford to be bigger. The answer is "probably not". If you sell to a lot of customers you need a huge support infrastructure. Cray doesn't have much of one anymore, so they'd need to buy one. (Most of the old support guys left one way or another when SGI came in, or stayed with SGI.) If you have a lot of customers you can spread the costs around, but in the case of a company like Cray a support infrastructure means having people sitting around most of the time in every region where you sell a machine. Maybe two to four guys per system (24x7, right?) plus some sorta warehouse facility if you enter a new geographical market. That's expensive. You can bill a lot of that cost back to the customers, but that just makes your systems less competitive.

I think the long-term answer is that Cray will be a very small niche player, selling to a very select group of (U.S.) government agencies, with the occasional pro forma business customer thrown in so the company can issue press releases. Even most government facilities aren't in a position to buy a Cray anymore. (Research money is fairly tight, recoding costs are prohibitive, MTBFs are more of an issue than they used to be, etc.)
The trick is keeping ahead of the commodity guys by putaro · 2003-08-04 10:47 · Score: 5, Interesting

Supercomputing per se died because Intel, DEC, and IBM/Motorola had a lot more money to throw at speeding things up than the supercomputing community did.

In the 70's up until the early 90's it was possible to build a custom CPU out of discrete logic that ran significantly faster than the available microprocessors. Cray was able to push their clock cycle down into the nanosecond range through clever design. However, a 1ns clock rate == 1GHz. You can go buy that multi-million dollar CPU for a couple of hundred bucks in today's market.
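
The conversion between clock period and clock rate used here is just a reciprocal. A small Python sketch, using only the 1 ns figure from this paragraph and the roughly 80 MHz CRAY-1 clock quoted elsewhere in this comment:

```python
def clock_hz(period_seconds):
    """Clock frequency in hertz for a given cycle time in seconds."""
    return 1.0 / period_seconds


def period_ns(freq_hz):
    """Cycle time in nanoseconds for a given clock frequency in hertz."""
    return 1e9 / freq_hz


# A 1 ns cycle corresponds to a 1 GHz clock.
print(f"1 ns period -> {clock_hz(1e-9) / 1e9:.1f} GHz")

# An 80 MHz clock corresponds to a 12.5 ns cycle.
print(f"80 MHz      -> {period_ns(80e6):.1f} ns per cycle")
```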

In order for supercomputing to be viable you have to be able to provide quantum leap performance above the commodity hardware AND keep your cost/performance ratio in line as well.

The CRAY-1 came out with a clock speed of about 80 MHz and vector processing and high memory bandwidth at a time when mainstream systems like the PDP-11/70 were running at about 7 MHz with a 1 MB/s memory bus. Microprocessors weren't even a joke compared with the Cray.

The new Japanese NEC supercomputer came with a price tag of about $160 million if I remember correctly (some estimates say that it took $1G in research funding) and hits 35 TFlops (sustained). #3 on the Top 500 supercomputers list is a Beowulf cluster with 2304 processors coming in at 7.6 TFlops (sustained). Even figuring $2000/processor + interconnect, that puts the Beowulf cluster at around $5 million or 1/32 of the cost for 1/5th of the performance (roughly speaking).
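
The back-of-the-envelope comparison above can be reproduced in a few lines of Python. All inputs are the figures as recalled in this comment (including the rounded-up $5 million cluster cost), not verified prices.

```python
def cost_per_tflop(cost_musd, sustained_tflops):
    """Cost in millions of dollars per sustained TFlop."""
    return cost_musd / sustained_tflops


# Figures as quoted in the comment (dollar amounts in $ millions).
nec_cost, nec_tflops = 160.0, 35.0

cluster_procs = 2304
cluster_cost_raw = cluster_procs * 2000 / 1e6  # $2000 per processor + interconnect
cluster_cost = 5.0                             # the comment's rounded figure
cluster_tflops = 7.6

cost_ratio = nec_cost / cluster_cost           # how many times cheaper the cluster is
perf_ratio = cluster_tflops / nec_tflops       # fraction of the NEC's performance

print(f"cluster hardware estimate: ${cluster_cost_raw:.3f}M (rounded to ${cluster_cost:.0f}M)")
print(f"cluster costs 1/{cost_ratio:.0f} as much for {perf_ratio:.0%} of the performance")
print(f"NEC:     ${cost_per_tflop(nec_cost, nec_tflops):.2f}M per sustained TFlop")
print(f"cluster: ${cost_per_tflop(cluster_cost, cluster_tflops):.2f}M per sustained TFlop")
```

On these numbers the cluster's cost per sustained TFlop is roughly a seventh of the NEC machine's, which is the gap the comment is pointing at.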

There are other factors, of course, but the key is that for the supercomputer to stay ahead of the microprocessor a boatload of funding is needed for the supercomputer and the payoff just isn't really there. If it were, a lot more supercomputer companies would still be in business.
looks like Cray is going with the Opteron by Kargan · 2003-08-04 10:57 · Score: 5, Informative

The Sandia National Labs supercomputer (code name: Red Storm), currently being built by Cray, is going to be powered by 10,000 Opteron processors. A 40 Teraflop theoretical peak will put it at the top of the supercomputer list, being approximately 4 Teraflops faster than the NEC Earth Simulator, the current champ.

--
Palaces, barricades, threats, meet promises
Re:Icon is back by CausticWindow · 2003-08-04 12:09 · Score: 5, Interesting

I remember a story from an NSA contract worker.

In the early days of Cray, he and many others were wondering how Cray could keep things running, considering that its official budgets only showed ten or so sales per year.

Until he got the tour of the NSA computer plant, where they had a hall the size of two football fields, filled with Crays.

--
How small a thought it takes to fill a whole life