Cray Introduces Adaptive Supercomputing

Good Motto by ackthpt · 2006-03-24 07:04 · Score: 5, Insightful

Cray CTO Steve Scott says, 'The Cray motto is: adapt the system to the application - not the application to the system.'

That's a good motto, but how often do you bend the will of your application, needs or business to the limitations of the application? I've been sitting on something for a couple of weeks after telling someone, "You really should have accepted the information the other way, because this new way you want it is highly problematic (meaning: rather than rip it off with a simple SQL query, I'll have to write an app)."

IMHO adapting to the needs of the user == customisation, which also == money. Maybe it's not a bad idea at that! :-)

In certain cases, at run-time, the system will determine the most appropriate processor for running a piece of code, and direct the execution accordingly.

This assumes, of course, that you have X number of processors to choose from. If you can't do it, the answer is still 'throw more money at it, buy more hardware.'
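The run-time selection quoted above can be pictured with a toy dispatcher. This is a minimal sketch, not Cray's mechanism: the processor kinds, the hypothetical `CodeSegment` profile fields, and the 0.8 threshold are all illustrative assumptions. Note how everything falls back to the scalar processor when no better option is installed, which is exactly the "X number of processors to choose from" caveat.

```python
from dataclasses import dataclass
from enum import Enum


class Processor(Enum):
    SCALAR = "scalar"        # general-purpose core (e.g. an Opteron)
    VECTOR = "vector"        # vector unit
    MULTITHREADED = "mta"    # massively multithreaded processor
    FPGA = "fpga"            # reconfigurable accelerator


@dataclass
class CodeSegment:
    # Hypothetical profile of a piece of code; the fields are illustrative only.
    name: str
    vectorizable_fraction: float  # share of work in long, regular loops (0..1)
    irregular_memory: bool        # pointer chasing / random access dominates
    has_fpga_kernel: bool         # a hardware implementation is available


def choose_processor(seg: CodeSegment, available: set[Processor]) -> Processor:
    """Pick a processor for a segment; fall back to scalar if nothing better is present."""
    if seg.has_fpga_kernel and Processor.FPGA in available:
        return Processor.FPGA
    if seg.vectorizable_fraction >= 0.8 and Processor.VECTOR in available:
        return Processor.VECTOR
    if seg.irregular_memory and Processor.MULTITHREADED in available:
        return Processor.MULTITHREADED
    # No specialised hardware fits (or none is installed): run it on the scalar cores.
    return Processor.SCALAR
```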

My head is still spinning from all the new buzzwords I overheard at SD West 2006.

--

A feeling of having made the same mistake before: Deja Foobar

Re:Good Motto by Kitsune818 · 2006-03-24 07:10 · Score: 2, Insightful

They just left out the ending. It's really: 'The Cray motto is: adapt the system to the application - not the application to the system. Why? Because hardware costs more to change!'
Re:Good Motto by dildo · 2006-03-24 08:19 · Score: 5, Interesting

It is possible to build computers that are optimized for certain kinds of calculations.

For example, Gerald Sussman of MIT (a computer scientist) and Jack Wisdom (a physicist) decided they wanted to do long-term modelling of the solar system's evolution. Long-term modelling of a multi-body system requires a fantastic amount of calculation. What is the best way to do it?

Sussman and Wisdom came up with a crafty idea: build a computer that is specially configured at the hardware level to do the modelling. Sussman and his colleagues decided that with off-the-shelf parts they could build a computer as capable as, or more capable than, a supercomputer at modelling this system. The result was the Digital Orrery, a relatively cheap computer that gave great results. (It is now featured in the Smithsonian.)

Think of it: if your computer is going to be doing the Fast Fourier Transform 6.02x10^23 times per day, why not build a superfast chip that does nothing but the FFT rather than express it in software? It's a pretty cool idea. I think this is the sort of thing Cray claims to want to do with its motto.
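For a sense of what such a chip would hard-wire, here is the textbook radix-2 Cooley-Tukey FFT in software. The recursive butterfly step is the part a dedicated FFT pipeline would implement in silicon. This is a generic illustration, not the design of any particular FFT chip, and it assumes the input length is a power of two.

```python
import cmath


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the halves using the twiddle factor e^(-2*pi*i*k/n)
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```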
Re:Good Motto by imgod2u · 2006-03-24 09:13 · Score: 2, Interesting

Look on Xilinx's website. The Virtex-4s (although currently having supply problems) go up to 500MHz (though you probably don't want to run anything at that speed, considering that's probably the reg-to-reg limit). These things are literally better system-on-a-chip solutions than any ASIC could be, considering what they offer: integrated microprocessor, bus architecture, peripheral interfaces, and non-volatile and volatile memory, with enough pins (BGA package) to expand with off-chip components. Actel even offers mixed-signal FPGAs where you can have your analog and digital circuitry all programmed onto one chip. These things are the future.
Re:Good Motto by drinkypoo · 2006-03-24 09:14 · Score: 2, Informative

There are FPGAs over 250MHz now. There are times when such a beastie is useful. There are times when they aren't. Not sure why the hell you'd want to put this in a commodity PC though. It couldn't possibly help more than a second processor, which would be cheaper - or a second core, which would be cheaper still.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Good Motto by imgod2u · 2006-03-24 09:16 · Score: 2, Interesting

FPGAs aren't the golden answer currently. Most if not all FPGAs have issues with being used this way. They are programmable, but they're not made or intended to be reprogrammed in the field (despite their name). The majority have a programmable life of maybe 1000 flashes, with flash-based FPGAs (ProASIC from Actel, for instance) having a life of maybe 100 flashes. They're more a poor man's ASIC than anything else. The technology would have to improve significantly, in a very different direction from what the main FPGA market is targeted at, before they can be used as adaptive circuit components while live.

Cray? by Eightyford · 2006-03-24 07:10 · Score: 2, Funny

I didn't even know Cray still existed. Maybe it was Sony's "emotion engine" that almost killed them. ;)

--
Religion for nerds. Stuff that really matters

Coolest Looking Supercomputers by Eightyford · 2006-03-24 07:13 · Score: 4, Interesting

Cray always made the coolest looking supercomputers. Here's an interesting bit of trivia:

The Cray T3D MC cabinet had an Apple Macintosh PowerBook laptop built into its front. Its only purpose was to display animated Cray Research and T3D logos on its color LCD screen.

--
Religion for nerds. Stuff that really matters

Re:Coolest Looking Supercomputers by morgan_greywolf · 2006-03-24 07:34 · Score: 2, Interesting

Another interesting bit of trivia. Apple Macintoshes have been designed using a Cray. What's even more ironic is that according to that same link, Seymour Cray used a Mac to design the next Cray.

--
My blog
Re:Coolest Looking Supercomputers by Anonymous Coward · 2006-03-24 07:50 · Score: 2, Funny

Ah, that's nuthin'. Mr. Scott used a Mac to repair the Enterprise, and Jeff Goldblum used one to repel an alien invasion!
Re:Coolest Looking Supercomputers by flaming-opus · 2006-03-24 08:19 · Score: 2, Interesting

They used to, and the X1 still holds true to that. If you take the skins off, it is a marvel of stainless steel, plumbing, and just plain fantastic mechanical engineering. The XT3 and MTA, however, are just more rectangular racks. The XD1 is just a dull 3U rackmount.

Go AMD by dotfucked.org · 2006-03-24 07:19 · Score: 2, Interesting

"All of these platforms will use AMD Opterons for their scalar processor base.'

I'm just loving the vendors picking up on AMD.

Their idea seems very interesting in theory. It sounds like HPC's version of the math co-processor->crypto accelerator idea.

And at least they are not basing the userland on Unicos :)

--
-- DotFucked.ORG

Complexity, current machines by gordyf · 2006-03-24 07:20 · Score: 3, Interesting

It seems like the idea of combining multiple architectures into a single machine is already being done -- we have fast general purpose CPUs (single and dual core x86 offerings from AMD and Intel), paired with very fast streaming vector chips on video cards, which can be used for other non-graphical operations like a coprocessor.

The only difference I see is that they're relying on an intelligent compiler to decide which bits to send to which processing unit, but I'm not sure how much faith can be placed there. Cray certainly has a lot of supercomputing experience, but relying on compiler improvements to make or break an architecture doesn't have a good track record. I'm curious to see how they fare.

Re:Complexity, current machines by TubeSteak · 2006-03-24 07:38 · Score: 2, Informative

The only difference I see is that they're relying on an intelligent compiler to decide which bits to send to which processing unit, but I'm not sure how much faith can be placed there.
If you read further into the article, you would have noticed TFA talks about a new programming language called "Chapel".
Chapel was designed as a language for rapid development of new codes. It supports abstractions for data, task parallelism, arrays (sparse, hierarchical, etc.), graphs, hash tables and so on.
So, they aren't relying on just a compiler, even though they are going to support "legacy programming models."

--
[Fuck Beta]
o0t!
Re:Complexity, current machines by flaming-opus · 2006-03-24 08:16 · Score: 3, Interesting

They really aren't relying on compiler improvements so much as passing the code through their vectorizing compiler and a tool for generating their FPGA codes. If these two steps fail to optimize the code very much, you bail out and send it to the general-purpose (Opteron) processors.
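The bail-out strategy described here can be sketched as a compile-time decision: try the specialised back ends in order, keep the first one that reports a worthwhile speedup, and otherwise emit ordinary scalar code. The back-end names, the `estimate_speedup` callables and the 1.5x threshold are hypothetical; real vectorizing compilers use their own cost models.

```python
from typing import Callable, Optional

# Each hypothetical back end returns an estimated speedup over scalar code,
# or None if it cannot compile the kernel at all.
Backend = Callable[[str], Optional[float]]


def select_target(kernel: str, backends: dict[str, Backend],
                  min_speedup: float = 1.5) -> str:
    """Return the first back end whose estimated speedup clears the bar, else 'scalar'."""
    for name, estimate_speedup in backends.items():  # dicts keep insertion order
        speedup = estimate_speedup(kernel)
        if speedup is not None and speedup >= min_speedup:
            return name
    # Nothing optimised well enough: run it on the general-purpose processors.
    return "scalar"


# Toy cost models: the vectorizer loves dense loops, the FPGA tool declines everything.
backends: dict[str, Backend] = {
    "vector": lambda k: 8.0 if "daxpy" in k else 1.1,
    "fpga": lambda k: None,
}
print(select_target("daxpy_loop", backends))        # vector
print(select_target("linked_list_walk", backends))  # scalar
```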

You're being fairly pedantic about the computer architecture anyway. Yes, pairing multiple processor types together is not new, but most MPP supercomputers use identical node types.

The gist of this story is simpler than it sounds. Cray has 4 product lines with 4 CPU types, 4 interconnect routers, 4 cabinet types, and 4 operating systems. They would like to condense this down. The first step is to reuse components from one machine to the next. There are distinct advantages to keeping the 4 CPU types for various problem sets, but most everything else could be multi-purpose. From the sounds of things, it's using the next generation of the SeaStar router in all of the machines. Thus you use the same router chips, cabling, backplane, and frame for all the products. This reduces the number of unique components Cray has to worry about. If they go to DDR2 memory on the X1 and MTA, that further simplifies things, though I suspect they won't.

Well, once you share parts, why not make a frame with a bunch of general-purpose CPUs for unoptimized codes, and a few FPGA or vector CPUs for the highly optimized codes? It allows customers more flexibility, and introduces Cray's mid-range customers to the possibility of using the really high-end vector processors currently reserved for the high-end X1 systems. It's also a win for the current high-end customers. On the current X1 systems, you have these very elaborate processors running the user's optimized application, but the vector CPUs also end up running scalar codes like utilities and the operating system. These are tasks the vector CPUs aren't terribly good at, and you're using a $40,000 processor to run tasks a $1000 Opteron will do better. Even if the customer isn't interested in mix-and-match codes on the system (and I'm skeptical any Cray customer really is), you probably want to throw a few dozen Opteron nodes into the X1's successor just to handle the OS, filesystems, networking, and the batch scheduler.
Re:Complexity, current machines by flaming-opus · 2006-03-24 09:51 · Score: 2, Interesting

The X1 processor is already a coprocessor. Not in the sense that it's on a different piece of silicon from the scalar unit, but in that the vector CPU's instruction stream is distinct from the scalar unit's. In past Cray systems, some CPUs used the same functional units for the scalar and vector units (T90), while some (J90) used distinct scalar units. The X1 is a vector unit bolted onto the side of a MIPS scalar core, with synchronization logic and multi-ported register files to support multi-streaming. I don't know what latency there is for the scalar unit reading/writing a vector register, but I can definitely imagine a vector coprocessor linked to an Opteron with coherent HyperTransport. Maybe in Black Widow, rather than Cascade. Cray has been touting how much faster Black Widow will be at scalar codes than the X1.

The trick, of course, is how do you get the Opteron and the vector processor to share access to memory? No way does HyperTransport have enough bandwidth to feed the vector unit. You don't want the scalar unit to have to read memory over HyperTransport through the vector unit. Do you give them distinct memories that are connected in some form of NUMA?
The current X1 uses 32-channel RDRAM for 4 CPUs. Assuming the Black Widow processor is twice as fast, with only 1 vector CPU per node, you need at least 12-channel XDR memory, or 4-channel XDR2, to provide the same bandwidth per flop. The Opteron keeps going with dual-channel DDR2, and has one HyperTransport channel connecting the register files, one for NUMA memory, and one to talk to the SeaStar(s). Also, do the vector units and the scalar processors share the same interconnect controllers? You would want more than 1 SeaStar for each vector node, maybe 4.
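The channel estimate above is a bandwidth-per-flop calculation, and its structure can be redone explicitly: X1 RDRAM channels per CPU, scaled by the assumed 2x speedup, then divided by how much more bandwidth one new memory channel delivers than one X1 RDRAM channel. That ratio is a free, hypothetical parameter here, not a vendor figure. Ratios of roughly 1.33 and 4 reproduce the 12-channel XDR and 4-channel XDR2 numbers; whether real parts match those ratios is not established by this sketch.

```python
import math


def channels_needed(x1_channels: int = 32, x1_cpus: int = 4,
                    speedup: float = 2.0, bandwidth_ratio: float = 1.0) -> int:
    """Memory channels per new vector CPU needed to keep the X1's bytes per flop.

    bandwidth_ratio = (bandwidth of one new channel) / (bandwidth of one X1 RDRAM channel).
    It is an assumed input, not a published specification.
    """
    per_cpu = x1_channels / x1_cpus   # 32 / 4 = 8 RDRAM channels per X1 CPU
    required = per_cpu * speedup      # twice the flops needs twice the bandwidth
    return math.ceil(required / bandwidth_ratio)


print(channels_needed())                     # 16 RDRAM-equivalent channels
print(channels_needed(bandwidth_ratio=4.0))  # 4
```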

Hmm. I'm sure there are technical hurdles a-plenty, but it sounds good on paper.

Co-processors anyone? by TubeSteak · 2006-03-24 07:26 · Score: 3, Insightful

After exhaustive analysis Cray Inc. concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.

So they're saying that instead of faster/more generalized processors, they want several specialized processors.

Old ideas are new again.

--
[Fuck Beta]
o0t!

Re:Co-processors anyone? by sketerpot · 2006-03-24 08:20 · Score: 4, Interesting

There are actually processors out there with compilers which can compile a few bottleneck C/C++ functions into hardware on an integrated FPGA. This expands the CPU instruction set in application-specific ways and can, in some cases, give absolutely enormous speedups.
In other words, they're working on processors which are programmed in general-purpose languages, but which adapt their hardware to the specific program.

This begs the question... by __aaclcg7560 · 2006-03-24 07:27 · Score: 4, Funny

So if I want to run Mine Sweeper, Cray will adapt one of their supercomputers to the requirements of this game? Sweet!

I don't know about that... by aussersterne · 2006-03-24 07:29 · Score: 2, Interesting

I always thought that Thinking Machines deserved the award for most "I feel like I live in the future" cool in their computers with the CM5.

--
STOP . AMERICA . NOW

Good Linux Journal Article On This by dapantzman · 2006-03-24 07:31 · Score: 2, Informative

LJ had a good article on this a few months back.

http://www.linuxjournal.com/node/8368/print

Re:Supercomputing v Distributed Computing by drinkypoo · 2006-03-24 07:33 · Score: 2, Informative

Unless 'computing power' is different to 'combined processor speed', I don't understand what Cray are up to here..

Well yes, they are very different. Processor speed is clock rate and tells you precisely jack shit about how much work can actually be done. Computing power is better measured in operations per second. Typically we measure integer and floating point performance separately. Even those benchmark numbers are usually pretty useless; hence we have the SPECint and SPECfp benchmarks which supposedly exercise the CPU in a way more similar to real-world use.
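A toy calculation makes the clock-rate point concrete: sustained throughput is roughly clock rate times useful operations completed per cycle, so a higher-clocked chip can still do less work. The two processors and their numbers below are invented for illustration; the ops-per-cycle figure depends heavily on the workload, which is why benchmarks like SPECint and SPECfp exist at all.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    clock_ghz: float       # clock rate
    ops_per_cycle: float   # sustained useful operations per cycle (hypothetical)

    def gops(self) -> float:
        """Sustained billions of operations per second = clock x ops per cycle."""
        return self.clock_ghz * self.ops_per_cycle


fast_clock = Cpu("high-clock", clock_ghz=3.8, ops_per_cycle=1.0)
wide_core = Cpu("wide-core", clock_ghz=2.4, ops_per_cycle=2.0)
for cpu in (fast_clock, wide_core):
    print(f"{cpu.name}: {cpu.clock_ghz} GHz -> {cpu.gops():.1f} Gops/s")
```

Here the 2.4 GHz part does 4.8 billion operations per second against 3.8 for the 3.8 GHz part.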

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

building machines around problems by deadline · 2006-03-24 07:42 · Score: 3, Interesting

Cray finally figured it out. I have been saying for years:

HPC/Beowulf clusters are about building machines around problems

That is why clusters are such a powerful paradigm. If your problem needs more processors/memory/bandwidth/data access, you can design a cluster to fit your problem and only buy what you need. In the past you had to buy a large supercomputer with lots of engineering you did not need. Designing clusters is an art, but the payoff is very good price-to-performance. I even wrote an article on Cluster Urban Legends that explains many of these issues.

--
HPC for Primates. Read Cluster Monkey

And What If... by Nom+du+Keyboard · 2006-03-24 07:46 · Score: 3, Insightful

The new system will combine multiple processor architectures

And what if I don't want multiple processor architectures, but instead just lots and lots of the single architecture my code is compiled for?

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."

Re:And What If... by flaming-opus · 2006-03-24 08:24 · Score: 3, Insightful

The idea is that all the CPU types will be blades that use the same router and plug into a common backplane, and that the cabinets all cable together the same way. In all cases, I imagine there will be Opterons around the periphery, serving as I/O nodes and running the operating system. Then you plug in compute nodes in the middle, where the compute nodes can be a bunch more Opterons, or vector CPUs, or FPGAs, or multithreaded CPUs. There will certainly be plenty of customers only interested in lotsa Opterons on Cray's fast interconnect, and they just won't buy any of the custom CPUs.

Re:Cray as a company in general by SillyNickName4me · 2006-03-24 07:47 · Score: 4, Informative

The story is interesting, but also full of almost going under, being bought, being sold, parent companies going bankrupt and whatnot.

The Cray we know now shares a name with the Cray that produced the famous Cray supercomputers of old, and it also has some nice technology around, but there the similarities stop.

Buzz word. by mtenhagen · 2006-03-24 08:50 · Score: 2, Insightful

While I must admit "Adaptive Supercomputing" sounds like a really cool buzzword, in practice the programmer will still need to adapt the application to the physical distribution of the systems. Or are they going to dynamically rewire the switches?

There have been several attempts (High Performance Fortran, Orca, etc.) to automate parallelism, but most of them failed because a skilled programmer could create a much faster application within a few days. And remember that a 10% performance boost in these applications means thousands of dollars saved.

So I suspect this is just a buzz word.

--
200GB/2TB $7.95 Coupon: SAVE90DOLLAR

Re:Buzz word. by scoobrs · 2006-03-24 09:32 · Score: 2, Informative

Did you RTFA at all?! The article is NOT about automatic parallelization by some special language. Most supercomputer customers are fully aware that writing applications which perform well on their supercomputer requires writing some form of parallel code. The issue at hand is that some specialized problems perform MUCH faster on one platform than another, whether it's primarily scalar, vector, threading (hundreds of threads, not two), or even FPGA. The goal is an intelligent compiler that can recognize code segments that perform much better on another architecture and use that architecture within a single application on a hybrid system. That's no small task!

--
-Those who would give up essential liberty to purchase temporary safety deserve neither. -Ben Franklin
Re:Buzz word. by flaming-opus · 2006-03-24 10:10 · Score: 2, Insightful

2 years behind in announcements, let's see who brings it to market first.

Sadly, the answer is that it's not even a race. SGI has already brought forward their first step, but won't get past that. You can now buy an FPGA blade to put in your Altix. While Cray is just now announcing a unified vision for this, they've already had their FPGA-based solution since they bought OctigaBay 2 years ago.

As much as Cray is suffering financially, SGI is in much worse shape, and they have about $350 million in debt around their neck, which makes them an unlikely target for a buy-out, at least until they go through bankruptcy for a while. I doubt that SGI has any money to spend on long-term engineering efforts like a vector CPU. They hopped on the FPGA bandwagon because they could buy them from Xilinx, slap a NUMAlink on them, and stuff them into an Altix with relatively little investment. Thus far Cray has had a great deal of luck porting bioinformatics codes to the FPGA in the XD1 (Smith-Waterman alignment, if anyone cares). This is a market much more in line with SGI's market strengths and somewhat new for Cray, which is used to selling machines with an entry-level price of $2 million.
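For anyone who does care, here is the Smith-Waterman local-alignment score in plain software form. Each cell of its dynamic-programming table depends only on its left, upper and upper-left neighbours, which is what makes it a good fit for a pipelined FPGA implementation. This sketch uses a linear gap penalty and arbitrary match/mismatch/gap scores chosen for illustration; it is not the XD1 implementation.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap    # b[j-1] aligned against a gap in a
            # Local alignment: a cell never drops below zero, so alignments can restart
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```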

In any case, it's the logical path forward for Cray's 4 product lines, even if no one combines vector, FPGA, and multithreaded processors. They all benefit from being paired with Opteron nodes, and from reducing the number of parts Cray has to maintain. SGI is coming from the other direction, which is to add processor types to their interconnect foundation. It's still a good idea, but it's probably more capital-intensive than what SGI is capable of these days.

Re:Old Story by LookoutforChris · 2006-03-24 11:31 · Score: 2, Informative

Just what I was going to say! Project Ultra-Violet is what they're calling it.

SGI has a 2nd-generation product based on this: RASC, a node board with 2 FPGA chips that integrates (with the same access to shared memory) with the rest of the machine's Itanium node boards.

Sounds like SGI, sadly by hpcanswers · 2006-03-24 13:31 · Score: 2, Insightful

Cray and SGI have both been losing money recently as more users flock to clusters, which tend to be cheaper and more flexible. Now both of them are offering this "adaptability" position. SGI is moving in the direction of blades so customers can choose their level of computing power; Cray will soon have a core machine that customers can build out from. What's interesting to note is that both of them are ultimately selling Linux on commodity processors (Itanium for SGI and Opteron for Cray) plus a proprietary network and a few other bells and whistles. It seems unlikely they'll be able to compete with Linux Networx or even *gasp* IBM.

Re:Adaptive = Adapting for Survive by some+damn+guy · 2006-03-24 16:42 · Score: 3, Informative

Cray already makes systems based on many thousands of Opteron processors. You can't beat them for scalar processing power. But what they also make, and still excel at, is specialized vector machines that can work with them. They're two good but different tools for different jobs. The improvement is to make the two even more integrated and more flexible.
