Update From Cray World
rchatterjee writes "Cray, the only mainstream recognizable name in supercomputing, has been busy lately. Their totally new MTA-2 supercomputer design will use an UltraSPARC-III powered Sun Fire 6800 server just to feed the data to the MTA-2's processor. They're also refocusing on vector supercomputers and are going to release their first new vector supercomputer since Tera Computing bought them, the SV-2, in 2002. And if that wasn't enough, they have a deal with API Networks to develop Alpha-based Beowulf clusters of Linux machines that, as a cluster, will run the same operating system as Cray's T3E supercomputers. Seymour Cray would be proud. You can get a quick overview of all the latest Cray developments from this article on Cnet."
....A Beowulf Cluster Of Those.
Oh wait, nevermind
"These people look deep within my soul and assign me a number based on the order in which I joined" --Homer re:
When and where can I get me one of these?
Seriously though, it is great to see that they are finally gearing up some new designs, and I cannot wait to see some of the performance specs on these.
A computer is not a computer unless it takes up at least 40 cubic feet
You say you want a revolution....
tcd004
The Cray team finally take their RC5 attempt seriously...
grab your ankles bitch
I'd hardly say that Seymour Cray would be proud. My feeling is that Mr. Cray was about innovations and pushing the limits of what computers can do. Today's Cray is doing little more, in my view, than simply rehashing the same old technologies with higher clock speeds and more CPUs thrown at the problem.
If Cray were alive today, I like to think that he'd be directing research into quantum computers, and maybe technologies like the ones Starbridge Systems is working on.
The first person to mention Quake loses a testicle.
Got Rhinos?
Don't get me wrong here, Cray's putting out some remarkable new hardware, but there's no point in spending millions of dollars on a machine that will be as powerful as an average desktop in five years' time.
Not only that, but the IDEA itself of supercomputing has become obsolete within the past few years. With recent advances in distributed processing and Beowulf clustering, anyone with a bunch of old 486s lying around can combine their power to process more data than a Cray could ever dream of.
SETI@home is a perfect example of MASSIVE amounts of processing being done by many small, inexpensive computers working in unison.
Google also uses clustered computers to provide the horsepower behind their search engine. In fact, it is believed that Google operates the largest Linux cluster in the world. Many of their computers are literally junkers. They've got dozens of 286s, 386s, and old SPARCs working together to provide an EXTREMELY powerful search engine.
Slashdot: Open Source, Closed Minds.
This must be a troll....
We? What are you talking about? Who here has even had any input into the purchasing of a super-computer? If it wasn't for your relatively tame posting history, I'd say you were a troll...
People buy a super-computer for one purpose - raw computing power. Not optimized for interconnectivity, not to conform to standards, not to run Linux or other system of choice, not to play Quake (although I'd like to see the benchmarks).
You buy it not because it's a good deal, but because you have research bucks to burn, and want the best money can buy. If you are even asking about cost or a down-the-road upgrade to a different platform, you are probably not looking for a super-computer.
It's like saying, maybe the Air Force should give up on specially designed fighter planes, and see if it would be a better idea to convert 747s or an Airbus model. Imagine the cost savings in spare parts!
nude chix
360 degrees of Karma
A good one, though - only a couple of red flags - he could have left out the Microsoft stuff and still got me.
It might even be the dreaded double-irony troll - what is the point of a bunch of Slashdotters commenting on developments in Cray super-computers? Inside knowledge? "Maybe it will run Linux?" "They ran Win2000, and now it is as fast as a 386! LOL!"
As I said, truly an effective troll - no one has much to comment on, but he throws in a catalyst to make people comment, anyway.
I believe that this quote:
not to play Quake
Taken in conjunction with this comment entitles you to some elective surgery.
You've still got one, though.
--
--
E_NOSIG
The original Crays (and CDC-6x00 machines) were crafted to take advantage of the efficiencies of the speed of light in their busses and memory configurations (i.e., cable lengths cut to sub-millimeter tolerances). These new "crays" are crays in name only... they lack the creative zeal that made "real" crays the exciting machines that they were and that launched the supercomputing industry...
These are just glorified SMP machines...
Don't get me wrong, clusters are great (hey, I even wrote a book on them!), the CPU speeds we have now are beyond my wildest dreams, but there will never be another Cray...
Imagine a beo... Oh, screw it.
Sorry, but I disagree strongly with you that Cray is just 'rehashing old technologies'. As a primarily Cray shop (we have a T3E-900 and, after this weekend's upgrade, have the first SV1e) we are quite involved with the Cray User's Group (CUG) meetings and such where Cray's plans for the future (SV2 and beyond) are discussed.
There are _many_ things going on behind the scenes at Cray that show that Cray is once again trying to push the supercomputing envelope as far as they can. One way to look at the SV2 is as a T3E with large vector units in each CPU (no e-registers) and a nearly flat (shared) memory space across all processors. Thus, no need for mixed mode (MPI and OpenMP) programming like on IBM SP-like architectures.
The distributed/clustered model is great for some problems - lots of completely independent data processing (like SETI, or Google) works great. Once you start moving into a realm where you're doing scientific simulations where all the calculations are interdependent, a Beowulf cluster (even with some good interconnect) can't hold a scalability candle to, oh, say, a Cray T3E. There's a good reason that the Cray T3E and SV1 won the "Co-Supercomputer product of the year award" this year, as handed out by the people who use them.
But there's No Such Agency...
Exactly how many overviews per second can I get?
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
I can see the programmers and designers drooling now.
Check out the Vinny the Vampire comic strip
"It is a greater offense to steal men's labor, than their clothes"
hey clueboy. The Sun is just a front end. It could have been almost anything, but Sun no doubt cut them a better deal than any other vendors, to get excellent marketing.
Either way you cut the cake, Sun still got in. What would you rather have had, AMD or Intel making chips for a Cray? Give me a break. They could have gotten a better deal on chips from SGI, who sold them Cray from the get-go, and that's a given, so what makes you think they didn't do specs on it as well?
"See? UltraIIIs are fast! You can stop making fun of how slow our chips are now! UltraIIIs are fast enough to be associated with (lots of hand waving) CRAY'S NEW SUPERCOMPUTER!" -- Sun marketing.
Stick to whatever your day job is. When it comes to purchasing equipment (which I have done many times) I would go for what I felt was best, based on benchmarks, industry usage, etc., not some marketing bullshit, so give it a rest.
You sound upset that Sun has captured this segment of business from Cray. I'm happy for them, as I am for most companies that do well. As for Cray, it's a bit outrageous unless you're a government or a major Fortune 500, but I do use, own, and plan on purchasing more Sun hardware, including a SunBlade workstation for home use, regardless of what people think. Sure they're expensive, but well worth it, and Alphas just couldn't cut it all the time, so what other options are there? Wait... Maybe I should sit back and wait for you to make a chip, how's that?
360 degrees of Karma
There is a big difference between a vector supercomputer and some random collection of microcomputers. It's bandwidth and the ability to efficiently handle large datasets. See the Stream Benchmark.
Management: Which would you rather manage - 1024 separate PCs, each with their own boot disk, hostname, power supply, etc., or a Cray T3E with a single system image and one boot disk? Think about the time it would take to do an OS upgrade on the cluster.
Bandwidth: Stuff like Myrinet and Quadrics is quite good but it still doesn't come near the bandwidth that you can get on a traditional supercomputer. Google and SETI@home are *horrible* examples of real scientific code because they do almost no internode communication. We can get 1.6 gigabytes/second full duplex between nodes on our Origin 3000 product. T3E gets even more than that.
Latency: The time it takes to get from node A to node B matters *a lot* with real code. Again, SETI and Google don't care if it takes 100 microseconds instead of 4 to exchange data. When you are exchanging lots of data and synchronizing with many other nodes, this matters. Many massively parallel jobs spend large percentages (like 25%) of their time doing communication. A lot of this is very small messages.
Quality: Usually, you get better components when you buy a supercomputer than a PC. Does this matter for you? Probably not. If you are trying to predict where a tornado is going to touch down, you're going to be a lot more interested in whether the machine is running.
Ease of coding: It is a lot easier to use a model of coding called OpenMP, which relies heavily on shared memory between threads, than MPI, in which you have to explicitly call for communication between threads to happen. OpenMP runs best on large SSI supercomputers.
Now don't get me wrong - there are many applications for which a cluster is sufficient. This doesn't mean there is no room for supercomputers. Besides, if you look at the direction Quadrics, Myrinet, and the new InfiniBand stuff are going, they are going to end up looking a lot like a shared memory supercomputer....
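As a rough illustration of the latency point, a common back-of-the-envelope model puts the time for one message at latency plus size divided by bandwidth. The sketch below plugs in the 4 and 100 microsecond figures and the 1.6 gigabytes/second figure quoted above. Using the same bandwidth for both cases is an assumption made only to isolate the effect of latency, and the function name is hypothetical.

```python
def message_time_us(size_bytes: float, latency_us: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Estimated time in microseconds to deliver one message.

    Simple model: fixed per-message latency plus transfer time.
    It ignores contention, protocol overhead, and synchronization.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer_us = size_bytes / bandwidth_bytes_per_s * 1e6
    return latency_us + transfer_us


if __name__ == "__main__":
    BANDWIDTH = 1.6e9  # bytes/second, the figure quoted in the post
    for latency in (4.0, 100.0):  # microseconds, the two figures in the post
        t = message_time_us(1024, latency, BANDWIDTH)
        print(f"1 KB message at {latency:.0f} us latency: {t:.2f} us")
```

Under these assumptions a 1 KB message takes roughly 4.6 microseconds on the low-latency link and roughly 100.6 on the other, so for the small messages mentioned above latency dominates almost completely.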
Go Badgers! -- #include "std/disclaimer.h"
Well, if you call making a next-gen vector machine "rehashing old technologies", then the guy you replied to is probably right. But the MTA-2 is unlike anything out there, and the SV2 looks very little like an SV1/e/ex. SV2 is designed to be a follow-on to the T3E with its massively parallel setup, and a follow-on to the T90/SV1 series with its vector processing. Also, SV2 has a single system memory image, unlike the T3E. Cray is taking vectors to a whole new performance/organization level.
Or you could look at their employment page here: http://www.cray.com/company/employment/openings/eagan/index.html
So, you mean to say that gigabit ethernet isn't as fast as custom-designed router interconnects? *cough*. T3Es have extremely high bandwidth with low latency. Ethernet has high bandwidth with high latency. Which would your massively parallel codes (the ones that can't be broken up into smaller chunks SETI-style) prefer?
Also, would it shock you to let you know that Cray machines have TCP/IP stacks, and ethernet ports, and all that? They don't have video cards, so you have to connect to them somehow...
I'm not sure what exactly they meant by "the cluster will run the T3E operating system even though each node will run Linux". That is a contradiction. The T3E ran something called Unicos/mk which was a full blown Unix-type OS. In any case, what they probably really meant was that they will port the load balancing, gang scheduling (scheduling a job onto a group of processors) and some sysadmin tools. They will probably also port some of the process accounting tools. I suppose they could port Unicos/mk's kernel servers to Linux instead of the Chorus microkernel which is what it runs on T3E, but that would take a *lot* of work and may violate some of the agreements signed with SGI before the spin-off. I also doubt they have the resources to do this and continue to develop the OS's for the MTA and the SV2 (two different OS's) plus maintain Unicos (for the SV1 and previous vector systems) and Unicos/mk (for the T3E).
Go Badgers! -- #include "std/disclaimer.h"
On the supercomputer front the game has changed radically since the original Crays. The Cray 1 was a SIMD machine, one instruction stream controlling multiple processors. That architecture works well for a limited number of problems; the trouble is that most problems turn out to have bits of vector code interspersed with decision code. If 10% of your code cannot be parallelized, then even an infinite number of processors can only go 10 times faster than a single processor.
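To put numbers on the 10% example, here is a minimal sketch of the usual speedup formula (Amdahl's law). The function names are made up for illustration, and the model assumes the parallel part scales perfectly, which real machines do not.

```python
def amdahl_speedup(serial_fraction: float, processors: float) -> float:
    """Speedup over one processor when serial_fraction of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time; the parallel part is divided among processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the processor count goes to infinity."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    for n in (2, 8, 16, 1024):
        print(f"{n:5d} processors: {amdahl_speedup(0.10, n):.2f}x")
    print(f"limit: {amdahl_limit(0.10):.1f}x")
```

With a serial fraction of 0.1, 16 processors give about a 6.4x speedup and the ceiling is 10x, no matter how many processors are added.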
The attraction of vector boxes was that there was no need to recode the FORTRAN application; the compiler could detect the parts of the code that could be parallelized and optimize the code. The problem is that there are limits to what the automatic parallelization can do.
The upshot is that there tends to be little advantage in more than 8 or 16 processors in a vector box. Meanwhile a standard Pentium IV has multiple independent processing pipelines - I forget the number (4 maybe). So the gap between the Cray box and the mainstream may not be amazing.
At this stage most of the problems in science can be attacked using MIMD architectures. These range from the SETI style very loose coupling over the internet to closer coupling such as the SMP machines.
The actual speed of the cluster is pretty much irrelevant; I can build a SETI-style parallel computer using off-the-shelf hardware for less than $1000 per processor. But that only allows me to handle problems that can be broken down into lots of independent sub-problems, i.e. trivial parallelism.
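For anyone unsure what "trivial parallelism" looks like in practice, here is a minimal sketch: every work unit is processed without reference to any other, so the units can be handed to any number of workers in any order. The function names and the toy workload are hypothetical, and a thread pool on one machine stands in for what would really be separate boxes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_work_unit(unit_id: int) -> int:
    """Hypothetical stand-in for one independent sub-problem.

    The toy workload (a sum of squares) needs no data from any other unit,
    which is what makes the overall job trivially parallel.
    """
    return sum(i * i for i in range(unit_id))


def run_independent(units, workers: int = 4):
    """Farm the units out to a pool of workers; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_work_unit, units))


if __name__ == "__main__":
    print(run_independent([0, 1, 2, 3, 4]))
```

Nothing here ever passes data between workers. Once each unit needs results from its neighbours on every step, this pattern stops applying and the interconnect starts to matter.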
What CRAY appear to be doing is building a machine that has closer coupling between the processors. There are certainly problems for which this approach is the solution. I doubt that the number of such problems is commercially viable, however. The problem is that many of the traditional supercomputer problems are now dealt with using loosely coupled clusters. 100 Mbit Ethernet is probably adequate for many problems. Other traditional 'supercomputer' problems are now attacked with desktop servers. I remember doing work with astrophysicists who used to wait for time on expensive mainframes; these days a cheap Linux box meets their needs.
Even 'defense' (read corporate welfare dept) applications are no longer automatically supercomputer class. Sure they may do a lot of processing, but these days it is likely to be optimization-type work, which in turn tends to break down into a series of independent simulation runs.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Okay, I'm a NEC/HNSX Supercomputers employee, on the verge of becoming a Cray employee (because of the agreement they signed), but I'm not speaking for anyone else but me here, of course. :-)
I don't know why people bother with such news. Sun's gonna provide the I/O processor for a not-so-high-end supercomputer. And?
A few weeks ago, there was a real bombshell: Cray would drop the anti-dumping legal action, re-opening the US market to Japanese supercomputers. Cray will even become the sole reseller of the NEC SX Series in North America!
If you go take a look at www.cray.com, you'll see that this agreement with Sun occupies a single line in their news listing, while the NEC agreement is a big framed box that occupies about half of my screen here.
For some time now, American supercomputer customers were petitioning to get Japanese machines, because it had been a long time since the American machines had been up to any good. Instead, we hear about the SV2, which will barely surpass the processing power of the few-years-old SX-5, with less memory throughput than the SX-5.
I won't deal with the "no need for big clunky vector supercomputers, we have clusters" argument. I believe a whole lot in clusters, but they're freakin' hard to program, and some things just won't be as fast (hey, the SX-5 CPU has a 256-byte-wide memory path! that's not bits, that's bytes! what can you do with your puny gigabit ethernet cluster interconnections?).
Look at these bandwidth benchmark scores. The closest thing to a cluster, the Origin machines, are literally crushed to bits by the SX-5. And they're doing twice as well as the SV1.
As for using old big iron machines for stuff like fridges and so on, there was a cool thing at one of our customer sites, at the University of Stuttgart: a Cray coffee table. :-)
Nothing beats talking about supercomputer technology while drinking some orange juice on top of a Cray machine. NOTHING.
--
"Welcome to Supercomputer users anonymous. I'd like to welcome our first speaker tonight, Stuart."
"Hi, my name's Stuart, and I'm a s-supercomputer user.
"I first started using them quite recently, just six months ago. I got offered a small amount of computer time, for free. It's always the way of these things, let you use these pointless things, get you hooked.
"Anyway, I thought I might just try it. You know, the first time can't hurt. Besides, I thought I w-would be able to give it up, any time I wanted.
"All I wanted to do was to get the program to run that little bit faster. It was calculating the ground state energy of an ordered perovskite. A big job, to be sure - it took nearly 200 MB to hold the wavefunction.
"I packaged up all the code I'd got to that date. Burnt it off to a CD, just in case, you know.
"The Supercomputer sprinted through the code in thirty minutes. I just wasn't prepared for that. It was a feeling I hadn't felt before. Normally, on the RS/6000s we have ourselves, it took a few days, so I was literally gobsmacked.
"Well, one thing led to another, and we were offered money if we could calculate a disordered perovskite, with lithium interstitials. I didn't even think of using our own computers, my thoughts turned straight to the supercomputer.
"On reflection, I can see now that I wasn't thinking clearly. After all, just because it took nearly a gigabyte of disk space to hold the wave function there was no reason why an ordinary computer couldn't have done that. With some swapping, as it needs to hold three of them in memory at once.
"And, of course, everything the code did was a vector operation. That confused me, because I should have seen that a scalar processor was more efficient at doing vector calculations than a processor designed to do them, but that's the supercomputer addiction kicking in.
"It took a few days on the supercomputer. That's when I realised what had happened, and took my chance to return to normality.
"Fortunately, with support, we managed to leave the supercomputer behind, and get a 24-node Beowulf of Pentium-IIIs.
"Just to show how unnecessary these so-called 'super' computers are, the Beowulf is now running the code. We're a little concerned that it hasn't produced any results after three days, due to constantly swapping, but that's just after-effects of the supercomputer. After all, the natural state of a computer is disk thrashing.
"The way the processors now take thirty times longer to do a single vector operation is a lot of comfort to me. I can see the light now."
--
Its processing speeds, of around 150 million floating point operations per second, were far above anything else at the time of its announcement in 1976. Those speeds are now matched by inexpensive workstations that fit on a person's desk.
Best Slashdot Co
The refocus on vector supercomputing is interesting. I wonder if it might have a side-effect of helping scientists take advantage of the Altivec units on PowerPC G4s. Yes, I know the Altivec can't do double-precision floats, but you don't ALWAYS need that, and companies like GCG in the biotech industry are excited about taking advantage of OS X on G4 hardware for bioinformatics. For tasks that don't need the full power of a Cray, but are nonetheless vectorizable, I hope the cross-pollination of vectorization in algorithm design will benefit everyone.
-- "Those who cast the votes decide nothing. Those who count the votes decide everything." -Joseph Stalin
As for the scheduling, at least on Irix, and I assume Unicos/mk, the scheduling/memory management is good enough that you can run multiple jobs. The way it works on Irix is that you can dedicate a certain set of CPUs and memory to a job and other sets to other jobs. That gives the job dedicated access to only the amount of hardware that it "needs" (or the programmer thinks it needs, anyway). I know similar stuff exists for the Cray T3E, but I'm not familiar with the details.
Go Badgers! -- #include "std/disclaimer.h"
And yeah, I forgot to include the MPI/shmem library and compiler in my list of what they would probably port.
As for Unicos/mk, the real difference there was the ability to have no difference at the user level between it and other Unices. It is able to present 1800 different kernels to the user as if a single OS were running. No small feat in OS development :)
Go Badgers! -- #include "std/disclaimer.h"
Also, I could be reading you wrong, but you seem to be implying that supercomputer == vector supercomputer. I am including NUMA-style machines like the Cray T3E and large SGI Origin machines as supercomputers. Since your example app wasn't vector, I'll assume it was something along the lines of MPI, which would run quite well on those architectures.
Go Badgers! -- #include "std/disclaimer.h"
Also, I don't know how many large T3E customers you've talked to, but I've talked to several (7 or 8) and almost all of them cited the single image as one of the reasons they love the machine, specifically as opposed to the IBM SP2. Besides, even if what you say is true, I suspect that the chunks are larger than 4 PEs per partition, which is what you'll get (effectively) in a cluster.
Finally, while you may not miss being able to do a "ps" and see all the processes on the machine, you will probably notice the longer latency, lower bandwidth, and lack of shared memory in a cluster, well designed or not....
Go Badgers! -- #include "std/disclaimer.h"
Is there a history of Seymour Cray somewhere?
"sweet dreams are made of this..."
I think one of the most interesting aspects of their MTA-2 super comp is the fact they're using USPARC-IIIs, which use commodity SDRAM (though atypical since it runs on a 150 MHz bus). It's nice to see the chip given room to stretch its legs since it is basically languishing inside of Sun. Their server products aren't shipping with the processors yet, so there's little in the way of real-world benchmarking. Hopefully Cray changes that around a bit. I'm also glad to see Cray making better news than sitting as an unused subsidiary of SGI.
I'm a loner Dottie, a Rebel.
He died in a car accident in 1996 shortly after founding SRC Computers (the "similarly named company" you mention; SRC is his initials). The company's page has a brief history of his work, though I'm sure there are plenty of more-complete such histories out there.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
The only "monolithic" supercomputer I ever had any respect for (as opposed to networks of individual machines like beowulf clusters or DECNets) was the KSR/1, and it only existed for about a year before the company's financial woes killed it.
:-(
They had some incredibly cool features like the fact that the system ran a UNIX variant (OSF/1), not on a front-end like the Crays, but right on the box itself.
They also had a wild memory architecture where there was no main memory. Instead, each processor had a 32MB cache, and the system virtualized the caches into a giant virtual memory space (giant for the time, now 2GB main memory is average to low for serious computing).
Process migration and the scheduler were a thing of beauty. When a process needed to be moved to a new processor, only the stack pointer and registers needed to be moved. When the process was ready to run again, a simple (seeming) page-fault would take care of everything else, moving its stack and any other memory pages that it needed locally.
They even solved the performance problems of a ring-based bus, and got better performance than most flat busses. One of these suckers with 1024 nodes was a marvel to behold, and alas, there will never be another.
ASCI Red was on top because Intel threw so many processors at it. LINPACK is not really all that representative of customer code. If you can tell me (and have data to back it up) that Sandia's code ran as well on ASCI Red as it would on ASCI Blue Mountain or a Cray T3E, I would be very surprised.
Go Badgers! -- #include "std/disclaimer.h"
As for the physics stuff being obsolete, I'd say that it's not. Now this isn't because I don't think there's something better we could be doing, but because there isn't yet....
Go Badgers! -- #include "std/disclaimer.h"
I still don't see your point. So you had the number one spot on the top 500. That doesn't mean that anyone even *bought* your system (yes, I know Sandia bought Red). All that means is that you built something and ran LINPACK on it. I hardly see how that supports your statement that people would rather manage an independent cluster of 9000 systems than a T3E.
Go Badgers! -- #include "std/disclaimer.h"
Also, you keep trotting out examples of things that are in the category of "embarrassingly parallel". Rendering video and chip design are two examples of that. Weather forecasting, oil exploration, particle physics (read nuclear bomb simulation), and protein folding (among many other things) are *not*. They require communication. This is why Pixar doesn't have a supercomputer and why Los Alamos National Labs does. If you don't understand why latency affects *dramatically* the speed of an MPI program, go read up on the subject and get back to me.
Finally, you don't seem to have much knowledge of networking. You don't just "add 100x the processors" and expect that the network is going to scale. That takes careful planning and hardware that you don't buy at CompUSA. I think you will find that building a really good cluster, while perhaps cheaper than a supercomputer, is a lot closer to that price range than what you would expect.
As for why Intel got out of the "supercomputer" business, I suspect they got out because they had no product for which there was a compelling reason to buy one. The Paragon was, for all practical purposes, a cluster (and from what I hear, not all that fantastic of one, but I may have a biased view on that). Plenty of people sell clusters.
Go Badgers! -- #include "std/disclaimer.h"
Oh well...are there any other mavericks out there in supercomputer design?
"sweet dreams are made of this..."