Time For A Cray Comeback?
Boone^ writes "The New York Times has an article (free reg. req.) talking about Cray Inc.'s recent resurgence in the realm of supercomputing. It discusses a bit of Cray's decline when the Cold War ended, "the occupation" under SGI, and the rebirth of the company after the Tera (now Cray Inc.) purchase. Cray Inc. has recently been shipping its vector-based Cray X1 machine and designing ASCI Red Storm, and it was one of three companies (along with Sun and IBM) to win a large DARPA contract (PDF link) to design and develop a PetaFlops machine by 2010. Could Cray Inc. be poised for a comeback? Wall Street seems to think so."
SCO vs. Cray
There are other uses too. Consider: the weather guys that are working on the global warming and other climate modeling want a 500 petaflop sustained speed, massive memory machine to get the granularity that they want.
BTW, what's the 15 YO machine? I can't think of any...certainly not ones that are still in the Top 500. Hell, the ones I worked on 10 years ago, you can nearly buy the floppage on the desktop now...
As an interesting aside, the DARPA contract is out in part because they think the traditional drivers in computing speed are going to peter out around 2010...the implications of that are definitely interesting, no?
Do you know why the road less traveled by is littered with the bones of the unwary?
Nuclear simulations are used to see if the warheads are still effective after not being used for long times, not to see if they'll wipe out a city right after they are produced.
Hahahaha. Have you ever actually run a supercomputer? They tend to have much higher failure rates than normal servers. Couple of reasons: first, they push the envelope of a given technology. The sweet spot for stability is not the leading edge. Second, they're not nearly as well tested as mainstream hardware. On a platform with thousands of installations you're much less likely to run into a problem nobody has seen before than you are on a platform with only dozens of installations.
Don't get me wrong, I'm all about nuclear testing being done in 1's and 0's instead of in the ocean or in the desert, but how big of a bomb do you really need when it's estimated there's enough nukes to blast the entire land surface of the earth 3 times over?
:D
Well, the earth is over 2/3rds covered with water, and now we have the technology to reach the moon, Mars, Venus and beyond. Remember the spectacle when a comet hit Jupiter? Just imagine a Beowulf of those, but really big nukes instead.
On a more serious and less morbid note, I bet some other uses exist in physics, medicine and even cosmology. I even hear they compare 'potential' cures for diseases using computer modeling to design drugs that we don't yet know how to make, good old biotech. You are correct that yes, this IS a very very limited market, but when you sell them for a billion bucks each, you don't need to match Dell's volume to make a profit. I wouldn't be surprised if the technology leads to some advancements in our pitiful micro world as well.
Tequila: It's not just for breakfast anymore!
...isn't 'Cray' today about as 'Cray' as the company that now owns 'Atari'? What's left besides the name of the original company?
Does it hurt to hear them lying? Was this the only world you had?
Have you ever actually run a supercomputer?
You know, that's kinda funny, since it's my current job. ;) I'm a NERSC employee. :P
You're right, until the system hits maturity. Our T3E, before being retired, had far fewer hardware problems than our Linux cluster does. Or the SP3 we have, for that matter.
BTW, since it's rather hard to find a job these days for some people in the computing realm, we're hiring.
Do you know why the road less traveled by is littered with the bones of the unwary?
OK this is about as much a kiddy thing as how many VWs fit inside a football stadium or something, but... ...anyone know of a site with info on how current and past supercomputers compare to current desktops? Where are we at now with 2GHz G5s and 3.3GHz P4s, relatively?
One of the comparisons made when I was at university was of a 30-something MHz 386 with a supercomputer from 1973, showing how they do about the same amount of processing/data transfer but in completely different ways. I found that fascinating.
Well, almost. Let's say I have a plane that can accommodate 100 people and does NY->London in 6 hours.
My problem is that I have to move 1000 people from NY to London.
Now I can either:
1. I can buy a plane that is 20 times faster and 20 times more expensive. That's the supercomputer.
2. I can buy 9 other planes (same as mine) and achieve the same result as in 1 for less than half the price (I'll let you do the math). That's the cluster.
3. I can buy a plane that has a capacity of 1000 people. That's the parallel supercomputer. But while that one can do the deal for my specific problem, it proves to be not that flexible if my problem changes (i.e. 500 people NY->London and 500 people NY->LA).
That's the power of the Beowulf cluster!!!
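For anyone who wants to do the math, here is a rough sketch of the three options. It assumes a single plane has to fly back empty between loads, that "20 times faster" means a one-way leg takes 6/20 hours, and that cost is counted in units of one ordinary plane; the function names are made up for illustration.

```python
# Toy model of the plane analogy. Names and the cost unit (1 = price of one
# ordinary 100-seat plane) are illustrative assumptions, not from the post.

PEOPLE = 1000
BASE_SEATS = 100
BASE_HOURS = 6.0  # one-way NY -> London for an ordinary plane


def shuttle_arrivals(seats, leg_hours, people):
    """Landing time (hours) of each load when ONE plane shuttles back and forth."""
    loads = -(-people // seats)  # ceiling division
    # Load k lands after k full round trips plus one outbound leg.
    return [(2 * k + 1) * leg_hours for k in range(loads)]


def delivered_by(arrivals, seats_per_arrival, people, hour):
    """People on the ground in London at a given hour."""
    landed = sum(1 for t in arrivals if t <= hour)
    return min(people, landed * seats_per_arrival)


# Option 1: one plane, 20x faster, 100 seats (the supercomputer).
fast = shuttle_arrivals(BASE_SEATS, BASE_HOURS / 20, PEOPLE)
# Option 2: ten ordinary planes leaving together (the cluster);
# modeled as a single arrival of 10 * 100 seats at hour 6.
fleet = [BASE_HOURS]
# Option 3: one ordinary-speed plane with 1000 seats.
jumbo = [BASE_HOURS]

# Purchase cost on top of the plane already owned, per the post's figures.
extra_cost = {"fast": 20, "fleet": 9}

for hour in (3.0, 6.0):
    print(hour,
          delivered_by(fast, BASE_SEATS, PEOPLE, hour),
          delivered_by(fleet, 10 * BASE_SEATS, PEOPLE, hour),
          delivered_by(jumbo, 1000, PEOPLE, hour))
```

Under these assumptions all three options finish within 6 hours, the fleet costs less than half of the fast plane, and only the fast plane has anyone in London before the 6-hour mark.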
Write boring code, not shiny code!
Our T3E was having problems well past the point where it was getting long in the tooth. Cray started adding functionality to make it more supportable a few years back, but when it was actually a cutting edge system it was pretty unstable. They probably couldn't widely sell a system today that had the problems of the earlier T3Es (one hardware problem and you need to reboot the whole thing), but that just increases the development costs and time to market in a market where delay means that the peasants will be nipping at your heels. Remember, by the time a super hits maturity, it's obsolete.
Supercomputing per se died because Intel, DEC, IBM/Motorola had a lot more money to throw at speeding things up than the supercomputing community.
In the 70's up until the early 90's it was possible to build a custom CPU out of discrete logic that ran significantly faster than the available microprocessors. Cray was able to push their clock cycle down into the nanosecond range through clever design. However, a 1ns clock rate == 1GHz. You can go buy that multi-million dollar CPU for a couple of hundred bucks in today's market.
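The cycle-time arithmetic is just f = 1/T. A quick sketch (the helper names are made up; the 80 MHz figure is the CRAY-1's approximate clock):

```python
def clock_hz(period_seconds):
    """Clock frequency in Hz for a given cycle time in seconds (f = 1 / T)."""
    return 1.0 / period_seconds


def period_ns(freq_hz):
    """Cycle time in nanoseconds for a given clock frequency in Hz."""
    return 1e9 / freq_hz


# A 1 ns cycle time corresponds to a 1 GHz clock.
print(clock_hz(1e-9) / 1e9, "GHz")
# An 80 MHz clock corresponds to a 12.5 ns cycle time.
print(period_ns(80e6), "ns")
```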
In order for supercomputing to be viable you have to be able to provide quantum leap performance above the commodity hardware AND keep your cost/performance ratio in line as well.
The CRAY-1 came out with a clock speed of about 80 MHz and vector processing and high memory bandwidth at a time when mainstream systems like the PDP 11/70 were running at about 7MHz with a 1MB/s memory bus. Microprocessors weren't even a joke compared with the Cray.
The new Japanese NEC supercomputer came with a price tag of about $160 million if I remember correctly (some estimates say that it took $1G in research funding) and hits 35 TFlops (sustained). #3 on the Top 500 supercomputers list is a Beowulf cluster with 2304 processors coming in at 7.6 TFlops (sustained). Even figuring $2000/processor + interconnect, that puts the Beowulf cluster at around $5 million or 1/32 of the cost for 1/5th of the performance (roughly speaking).
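Spelling out that comparison, using only the rough figures quoted above (the $5 million cluster price is the estimate from this post, not a real invoice, and the dictionary layout is just for illustration):

```python
# Rough figures from the post: cost in millions of USD, sustained TFlops.
systems = {
    "NEC vector machine": {"cost_musd": 160.0, "tflops": 35.0},
    "Beowulf cluster": {"cost_musd": 5.0, "tflops": 7.6},
}


def musd_per_tflop(system):
    """Millions of dollars paid per sustained TFlop."""
    return system["cost_musd"] / system["tflops"]


# 2304 processors at $2000 each, before adding the interconnect.
processors_musd = 2304 * 2000 / 1e6

nec = systems["NEC vector machine"]
cluster = systems["Beowulf cluster"]

cost_ratio = nec["cost_musd"] / cluster["cost_musd"]   # how much more the NEC costs
perf_ratio = cluster["tflops"] / nec["tflops"]         # cluster speed as a fraction of the NEC
advantage = musd_per_tflop(nec) / musd_per_tflop(cluster)

print(f"processors alone: ${processors_musd:.3f}M")
print(f"NEC costs {cost_ratio:.0f}x as much; cluster delivers {perf_ratio:.2f} of the performance")
print(f"cluster is about {advantage:.1f}x cheaper per sustained TFlop")
```

On these numbers the cluster comes out roughly 7 times cheaper per sustained TFlop, which ignores the other factors (interconnect quality, memory bandwidth, what the codes actually need).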
There are other factors, of course, but the key is that for the supercomputer to stay ahead of the microprocessor a boatload of funding is needed for the supercomputer and the payoff just isn't really there. If it were, a lot more supercomputer companies would still be in business.
They bought part of CRAY, the one that made the CS6400 server, which was a really neat SMP system based on SuperSPARCs.
The rest of the company went to SGI.
So basically the server/SPARC division went to Sun and then they got the technology for their Enterprise systems.
The rest of the supercomputer units (the Alpha-based and the vector-based ones) went to SGI, which did.... nothing with them. Oh, yeah, they named some interconnects CRAYlink or something, but they had 0 CRAY technology in them; they just wanted the name.
Same with TERA, they wanted the name and a way of ditching their crappy TM technology.
Maybe, maybe not. I don't really think even the NSA is _that_ far ahead of commercial process technology. It's more likely that they do custom designs for whatever applications they need, which allows them to process their data much faster than any general-purpose setup.
Hmm... I'm not entirely convinced by your arguments. However, I do agree with you that "during the heyday of cray, you got a damn fine box and nothing else."
;-) Furthermore, I also suspect that if Cray Inc. built a zettaflop or yottaflop abacus and provided instructions on how to simulate the weather, people around the world would abandon their computers and begin taking abacus lessons... Remember, it's all about the hardware and algorithms in supercomputing...
My thinking, however, is that the same is true today and for all of the top 100 supercomputers in the world. That is to say, each one of those machines is a custom hardware installation, and my educated guess is that software still isn't the driving force in the supercomputing market. Rather, algorithms are the driving force. The supercomputer market is geared towards people who want to do very specific tasks, very accurately, and very fast. Example applications might be calculating Fourier transforms (spectroscopic analysis), Mandelbrot sets (weather simulations), prime numbers (cryptography), and statistical derivatives (markets). Any of these types of applications could feasibly require only a few thousand lines of code... At the same time, however, any of these applications is fully capable of utilizing as much hardware as you have available...
The problem is the magnitude at which these few lines of code need to be repeated. Furthermore, each of these types of algorithms can give qualitatively different and more robust results at each order of magnitude increase in speed... thereby creating a driving market force for upgrades.... We have a computer that can predict the weather 48 hours from now? Well, give us a computer that's 10 times as powerful, and we'll predict it 56 hours from now... Give us one 100 times more powerful, and we'll predict the weather 62 hours from now, and so on, and so on... The point I'm trying to make is that the software isn't the driving force behind these supercomputers... the algorithms are... and the optimized hardware is what the organizations are paying hard cash for, in order to calculate those algorithms fastest.
Remember, we're talking about supercomputers here... we're certainly not talking about super-electronic-typewriters, super-spreadsheet-applications, super-databases, super-webservers, super-videoeditors, etc. etc. Nor are we necessarily talking about super-von-Neumann machines, super-Turing-machines, or super-mainframes. We're talking about supercomputing and the Cray corporation... the company historically responsible for building the machines which simulated the weather and nuclear explosions for many years... I suspect that there are not many end users of such machines and that user interface software is kept to a minimum...
But, I'm not a physics or computer science major, so what do I know... That, and I'm beginning to ramble... just my $0.02 worth...
I remember a story from an NSA contract worker.
In the early days of Cray, he and many others were wondering how they could keep things running, considering that their official budgets only showed ten or so sales per year.
Until he got the tour of the NSA computer plant, where they had a hall the size of two football fields, filled with Crays.
How small a thought it takes to fill a whole life
Every solution has to be chosen corresponding to any specific need. My point was just to show that in most cases the cluster makes sense. Of course some special cases might be better suited by option 1 or 3.
;-)
you couldn't surgically separate them
How do you stuff them in the plane then?
A good constraint for option 1 would be that you need to have them ASAP and the overall transfer could be interrupted anytime (before the 6th hour), and at that time you still want as many people as possible there. Let's say 3 hours. Option 1 will have brought half the people there, while option 2 leaves all the planes above Iceland at hour 3 with no one in England.
Write boring code, not shiny code!
SRC Computers is his legacy, not Cray Computer Corp.
He co-founded this company (with several other
ex-Cray employees) and died while still an employee/owner.
Interestingly, SRC is still around without any evidence on their website
of shipping a product. My guess is that their customers and/or investors
prefer to stay out of the limelight.
Disclaimer: I love SGI computers.
Well, from my experience the memory model is what matters the most. In a supercomputer you can allocate memory the way you need it, with fewer restrictions. Let me explain: I have a data set that is several gigs. In a cluster I have to do lots of magic to get it to work; in a supercomputer I just say load mydata.dat.
Your trade-off here is: pay a scientist to develop a complicated distributed algorithm, or pay SGI to give you a shared-memory machine. Whichever is cheaper should win, so for fairly parallel problems with small data sets clusters are probably better. For problems with unpredictable data patterns and large data sets, you have to pay big bucks to SGI.
Second comes the reliability. Examples: in a 256-node cluster, at any given time there are 2 or 3 nodes that are not working. My home directory is not mounted on node 34 and then my program fails; I lose one night of running my code. Node 49, which is supposed to have 2 processors, only detects one. My code doesn't do dynamic load balancing because that would add another 4 months to the development time, so my code runs for twice as long just because one processor was lost.
On an SGI, I get e-mails like: the kernel panicked twice, so we will have to shut the machine down and apply a patch. Next Monday between 12:00 and 12:30 the machine will be down. Remember, you can checkpoint your job and it will start right where it left off, or you can just not schedule anything for that time. Even if you are dumb and schedule stuff, we have told it that we will be shutting it down, and it is not going to start running anything unless it knows it can finish. I don't ever recall our machine failing, even though it has 94% usage.
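To put rough numbers on those two examples, here is a back-of-the-envelope sketch. It assumes nodes fail independently (an assumption that real clusters often violate) and takes "2 or 3 of 256 down" as an average of 2.5; the function names are made up.

```python
def expected_down(nodes, p_down):
    """Average number of nodes out of service at any moment."""
    return nodes * p_down


def p_all_up(nodes, p_down):
    """Chance that every node is healthy, assuming independent failures."""
    return (1.0 - p_down) ** nodes


def static_runtime(cpus_per_node, cpu_hours_per_node=2.0):
    """Wall-clock hours for a statically partitioned job.

    Every node gets the same share of work, so the job finishes only when
    the slowest node does (no dynamic load balancing).
    """
    return max(cpu_hours_per_node / cpus for cpus in cpus_per_node)


NODES = 256
P_DOWN = 2.5 / NODES  # assumed per-node downtime, from "2 or 3 of 256"

print(expected_down(NODES, P_DOWN))
print(p_all_up(NODES, P_DOWN))

healthy = [2] * NODES
one_cpu_missing = [2] * (NODES - 1) + [1]
print(static_runtime(one_cpu_missing) / static_runtime(healthy))
```

Under these assumptions a job that needs every node finds them all healthy less than one time in ten, and a single node running on one processor doubles the wall-clock time of the whole statically partitioned job.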
And last, the interconnects are so fast that you can get much more out of each processor. The compilers are very well optimized and all the libraries are there for you to use, already compiled with the proper optimization and ready to run.
Side note: why would anyone choose Opteron for heavy numerical code when there isn't a good compiler for it? I can't understand it. AMD, are you listening? I need to buy a cluster and I would like to choose your chip, but no compiler, no purchase.
Well, it doesn't have to be. We could say that a company wants to send 250 people to London and want to use the 6 hours flight to have a corporate meeting in the plane... You're kind of screwed with 10 planes containing 100 people...
In this case option 3 makes sense.
You could say that the 6 hours is a reasonable limit but sometimes (not predictable) you need as many people as you can in England before then (amount of time not predictable either). In this case, option 1 makes sense because options 2 and 3 don't deliver anything before the 6-hour mark.
Write boring code, not shiny code!
You're dramatically overestimating the size of the market. Cray's own website puts it at about $1.1bn worldwide, and it's not like Cray will get 100% market share. The ongoing R&D costs are a staggering percentage of their revenue, to the point that if the NSA wasn't subsidizing them it's unlikely they'd be alive today. The same goes for other pure supercomputing ventures--without huge amounts of government largess they're sunk.
Well, exaggerating to make a point perhaps. I checked out their website as well, looking for more technical info on their servers, with no luck. Personally, I can see more of a market in the future than even the past. Their systems do certain things faster than beowulfs, and frankly, for some govt. agencies/companies faster is more important than cheaper. It would not shock me to see the pendulum swing in the other direction, to at least a degree. Obviously Tera thinks so, too, since they purchased them in 2000, and profits/sales ARE up...
Tequila: It's not just for breakfast anymore!
John Markoff, the same jerkoff that wrote the less than factual articles and book about Kevin Mitnick, and happens to belong to one of the less reputable media outlets (aka the plagiarized and false stories coming from the NY Times).
Lawyers, MBA's, RIAA? A jedi fears not these things!
However it happens, it is unlikely Cray was wrong about Gallium Arsenide -- he was not stupid. The question is when will a bureaucratic organization be able to throw marching morons at the problem and make it happen -- since that appears to be the only way technology is funded anymore.
It's unfortunate Seymour allowed Cray, Inc. to keep his name after he left to found CCC. Even though Cray himself was capitulating to massively parallel silicon in his final days -- he did die almost immediately thereafter.
PS: It seems creepy he died in a "jeeping accident" -- because that's exactly the way I had portrayed him dying in an April fools joke faxed to all members of congress a few years before -- an "accident" following shortly on the heels of CCC being taken over by Craig Fields of DARPA. I was sending out the joke because of the horrifying way DARPA had spent money on silly favorites within the academic community while guys who were really pushing the envelope like Seymour were going begging for customers -- having acquired private investments.
Seastead this.
I think there is one single reason that the market is poised for a Cray comeback... HEAT!
Commodity PCs managed to push the speed envelope by pushing the heat envelope... That's the main reason AMD took the speed advantage, because they were willing to operate their processors at higher temperatures than Intel would at the time.
Now, I would say it's quite a different story. First off, processors are getting closer and closer to the end of the line for heat increases. Pretty soon, no known metal will be able to conduct heat away fast enough to allow computers to operate at room temperature. Even now, dumb little personal computers need serious cooling solutions... Either that, or they need to be some place that has serious air conditioning.
So, what are companies going to do, even with the current line of processors? Should they invest loads of money in dispersing waste heat, powerful air conditioners, system cooling fans, and software and/or hardware to closely monitor temperatures? OR Should they invest in a higher-end system that doesn't put off so much heat, doesn't use up so much electricity, etc?
In fact, I think we are even nearing the point where home users are going to get seriously pissed off and start demanding lower-power systems... It's interesting that C3 processors have become so popular despite their lousy performance... (Maybe AMD/Intel will learn something from that)
So, I do think that either commodity processors will hit the heat ceiling and stagnate like the rotational speeds of current IDE hard drives, OR the electrical and major cooling requirements of commodity processors will become too much to justify the small price savings. Either way, that will leave the market wide open for serious computing companies once again. The only question really is how much longer it will be until one of those two things happens. Well, in the Southern California desert, electricity prices are still very high, and the temperatures are so high that running a modern computer 24 hours a day requires your home cooling to also be running 24 hours a day, just to operate within the heat tolerances. I don't think it will be much longer before more of the country, and the world, reaches the same point.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant