500 Billion Very Specialized FLOPs

Posted by timothy on Saturday June 3, 2000 @08:23PM from the oh-the-NSA-would-never-use-anything-like-this dept.

sheckard writes: "ABC News is reporting about the world's fastest 'supercomputer,' but the catch is that it doesn't do much by itself. The GRAPE 6 supercomputer computes gravitational force, but needs to be hooked up to a normal PC. The PC does the accounting work, while the GRAPE 6 does the crunching." The giant pendulum of full-steam-ahead specialization vs. all-purpose flexibility knocks down another one of those tiny red pins ...

Gravity simulation algorithms by WolfWithoutAClause · 2000-06-03 17:19 · Score: 5

Actually gravity simulation is pretty cool algorithmically as well as hardware-wise. Originally the gravity simulators had to work out the attraction between every pair of particles. This meant that if you simulate 1000 particles they had to do 1,000,000 calculations. Slow.
So along come some doods who said why don't we recursively stick the particles into boxes and then calculate the attraction between the boxes instead and it should be a lot faster. So they tried it and it seemed to work great- it only takes more like 10,000 calculations to do 1000 particles.
Anyway along came some other guys and they were a bit suspicious. They showed that some galaxies fell apart under some conditions with the recursive boxes method, when like they shouldn't. Back to the drawing board.
There are some fixes for this now- they run more slowly, but still a lot faster than the boring way. Still, it's better than the end of the universe. Even if it is only a toy universe.
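A minimal sketch of the "boring way" described above, for comparison with the box method. The function names, the unit gravitational constant and the softening term are illustrative assumptions, not anything from GRAPE; the N log N figure is the scaling usually quoted for tree codes, used here only to reproduce the rough "10,000 instead of 1,000,000" comparison.

```python
import math

def direct_sum_accelerations(positions, masses, G=1.0, softening=0.0):
    """Acceleration on each particle from every other particle: O(N^2) work.

    positions: list of [x, y, z]; masses: list of floats.
    G and softening are illustrative defaults (assumptions).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle does not attract itself
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

def interaction_counts(n):
    """Rough operation counts: all pairs (N^2) vs. a tree code (~N log2 N)."""
    return n * n, n * math.log2(n)

direct, tree = interaction_counts(1000)
print(direct, round(tree))  # roughly a million vs. roughly ten thousand
```

The point of the comparison is only the growth rate: doubling the particle count quadruples the direct-sum work but only slightly more than doubles the tree estimate.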
For descriptions of loadsa algorithms, including 'symplectics' which are able to predict the future of the solar system to 1 part in 10^13 ten million years in the future check out this link:

--
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
It's really sad seeing all these 'funny' posts by A nonymous Coward · 2000-06-03 22:29 · Score: 3

No one seems to understand the gravity of the situation.

--
Infuriate left and right
"Specialised"? by JoeyLemur · 2000-06-03 15:35 · Score: 3

Uh... running a supercomputer from a less-powerful computer is nothing new, and certainly doesn't make it 'specialised'. Historically, the Cray T3D used a Cray Y-MP as a front-end, and the Thinking Machines CM-5 (and CM-200, I think) used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
Reread the article. 500 billion? Pah! 100 Trillion! by IvyMike · 2000-06-03 15:38 · Score: 4

If you re-read the article, you'll see that 500 billion is just ONE OF THE BOARDS in the GRAPE. There are going to be 200 boards in this puppy, making for a machine that's getting 100 teraflops.

Damn fast!
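The arithmetic behind the "100 Trillion" in the subject line can be checked in a couple of lines. This is only a sketch of the peak figure, using the per-board rate and board count quoted in the comment; the variable names are made up for illustration.

```python
# Figures quoted in the comment above; names are illustrative.
flops_per_board = 500 * 10**9   # 500 billion operations per second per board
boards = 200

total_flops = flops_per_board * boards
teraflops = total_flops // 10**12

print(total_flops)  # 100000000000000, i.e. 100 trillion
print(teraflops)    # 100
```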
Haiku by 575 · 2000-06-03 19:12 · Score: 5

Installing Grape 6
Processor of gravity
Quake sure feels real now
There is a lot of work in Processor-in-memory by slothbait · 2000-06-03 23:34 · Score: 4

Processors with embedded RAM have been under research for some time. Check out the IRAM project at Berkeley and the PIM project at University of Michigan and elsewhere. Despite all of the research, though, processor-in-memory hasn't made it into general use yet.

There are many problems with implementing a system like this in practice. The fabrication process used for DRAMs is completely different from that used for logic. In general, for DRAM you want a *high* capacitance process so that the wells holding your bits don't discharge very quickly -- that way you can refresh less often. In logic you want *low* capacitance so that your gates can switch quickly (high capacitance -> high RC time constant -> slow rise/fall time on gates -> slow clock speed).
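To make the capacitance chain above concrete, here is a small sketch using the textbook first-order RC model, where the 10%-to-90% rise time is ln(9) times the RC time constant (about 2.2 RC). The resistance and capacitance values are hypothetical round numbers picked for illustration, not measurements from any real process.

```python
import math

def rc_rise_time(resistance_ohms, capacitance_farads):
    """10%-90% rise time of a first-order RC node: ln(9) * R * C."""
    tau = resistance_ohms * capacitance_farads
    return math.log(9) * tau

# Hypothetical values: same driver resistance, two different load capacitances.
r = 1_000.0                      # ohms (assumed)
low_c, high_c = 10e-15, 30e-15   # farads (assumed)

print(rc_rise_time(r, low_c))    # shorter rise time, faster gate
print(rc_rise_time(r, high_c))   # three times longer for three times the capacitance
```

In this model the rise time scales linearly with capacitance, which is why a process tuned for charge retention works against fast logic.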

Fabricating both with the same set of masks doesn't work particularly well, so you really have to compromise -- you'll basically be making a processor with a RAM process, or vice-versa. Alternately, you could use SRAM, which is nice and fast and is built with a logic process, but is 1/6th the storage density of DRAM. This is why SRAM is used for caches and DRAM is used for main memory.

Having the memory on the same die as the processor definitely gives a bandwidth and latency advantage. For instance, when you are on the same die, you can essentially lay as many data lines as you like so that you can make your memory interface as wide as you like.

But another large advantage is the power savings. Processors consume a great deal of their power in the buffers driving external signals. Basically, driving signals to external devices going through etch is power-expensive, and introduces capacitances that kill some of your speed. Keeping things on die, no such buffers are needed, and a great deal of power is saved.

The first commercial application of the processor-in-memory concept that I am aware of is Neomagic's video cards. They went with PIM not for bandwidth, but for power-conservation, and chip reduction. These characteristics are extremely appealing to portable computing, and thus Neomagic now pretty much owns the laptop market.

In a limited application, such as a 2D graphics card, this is feasible because the card only needs perhaps 4 MB of memory. Placing an entire workstation's main memory (say, 128 MB) on a single die *with* a processor would lead to a ridiculously massive die. Big dies are expensive, lead to low yield and increase design problems with clock skew. Thus, having 128 MB of DRAM slapped onto the same die as your 21264 isn't going to happen in the near future.

Placing a small (4-8 MB) amount of memory on-die, and leaving the rest external is possible, but leads to non-uniform memory access (NUMA), which complicates software optimization and general performance tuning greatly. It is generally considered undesirable.

Another approach is to build systems around interconnected collections of little processors, each with modest computing power and a small amount (say 8 MB) of memory. Thus, you are essentially building a mini-cluster, where each node is a single chip. This, too, leads to a NUMA situation, but it is more interesting, and many people are pushing it.

PIMs are going to be used more and more, and the massive hunger for bandwidth in 3D-gaming cards may very well drive it to market acceptance. The power consumption advantages will continue to appeal to portable and embedded markets as well. However, general purpose processors based on this design are unlikely in the near future. This style of design doesn't mesh well with current workstation-type architectures.

A bit of a tangent, but I hope it was informative...
--Lenny
I need this NOW!!!!!! by BiggestPOS · 2000-06-03 15:56 · Score: 4

My study of gravity has long been hindered by not enough computer power. I have a room full of UltraSPARCs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists. I get laughed out of all the conferences, people don't return my calls, I must simply have this machine to prove my theories. I'll take 4, wait, make it 3, I'll just overclock them. Hmm, now, about porting Linux to it.....

--
What, me worry?
GRAPE-5 by Detritus · 2000-06-03 16:06 · Score: 3

A paper (PDF format) on its predecessor, GRAPE-5, can be found here. It has more technical detail but it doesn't describe the architecture of the specialized processors. It won the 1999 Gordon Bell price/performance prize.

--
Mea navis aericumbens anguillis abundat
Special Problems. by szyzyg · 2000-06-03 20:21 · Score: 3

As an astronomer who does these kinds of calculations I should point out that this system is not just specialised to solve one type of problem - the N-body problems where N is very big - e.g. our galaxy has about 100 billion stars in it - fully specifying their position and velocity would require 4.8 terabytes of memory. We're still a long way away from that... but getting closer. Oh and that's neglecting things like molecular clouds and suchlike which have appreciable mass but aren't stars.
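The 4.8 terabyte figure above works out if each star is stored as six double-precision numbers (three for position, three for velocity). That storage layout is an assumption made here to reproduce the number; the comment does not say how the estimate was made, and the function name is made up for illustration.

```python
def phase_space_bytes(n_bodies, values_per_body=6, bytes_per_value=8):
    """Bytes needed to hold position (x, y, z) and velocity (vx, vy, vz)
    for each body, assuming 8-byte double-precision values."""
    return n_bodies * values_per_body * bytes_per_value

stars = 100 * 10**9              # about 100 billion stars, as quoted above
total = phase_space_bytes(stars)

print(total)                     # 4800000000000 bytes
print(total / 10**12, "TB")      # 4.8 TB (decimal terabytes)
```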

I have a cluster of alphas crunching away at solar system models - Grape6 couldn't actually do this very well since it's designed for a certain N-body algorithm which doesn't suit small N... Instead I use a symplectic integrator which takes advantage of a number of known factors in the problem.

So - we still need bigger and faster machines, but we also need more general machines...

Anyway... I want one of these to model EKO formation in the solar system
Re:What's the latest definition of "supercomputer" by Detritus · 2000-06-03 16:41 · Score: 4

I would split them into two types, classic supercomputers like Cray vector systems, and massively parallel collections of microprocessor modules with high-speed interconnects.
The problem with anything based on a microprocessor is the pathetic main memory bandwidth. If your program blows out the cache, the performance goes to hell.
A vector supercomputer is designed to have massive memory bandwidth, enough to keep the vector processing units operating at high efficiency. No cache or VM to slow things down. An engineer once told me that a Cray was a multimillion dollar memory system with a CPU bolted on the side.
See the STREAM benchmark web page for some measurements of sustained memory bandwidth. This separates the real computers from the toys.

--
Mea navis aericumbens anguillis abundat
FIR Filters and Neural Networks by Baldrson · 2000-06-03 16:52 · Score: 3

Back in 1989, I cut a deal with Datacube whereby, in exchange for testing their new image flow software, I was allowed to hang a bunch of their Finite Impulse Response filter boards together and achieve several billion operations per second doing neural image processing. FIR filters do sum of weighted product calculations on sequences of data (in this case, a rectangular region of interest of video data) and do them all in hardware -- at a constant rate. So peak rate is the same as average rate. This allowed one to train the system to recognize features that could not be extracted via analytic algorithms at a blazingly high speed. Unfortunately, even though the system would only cost around $200,000 at that time, the only market interest was from government shops who had some serious Not Invented Here cultures.
I haven't followed the progress in the field since then, but I suspect present day hardware could handle a good fraction of the satellite image feeds affordably -- and dwarf the realized performance figures of this gravitation board.
Of course, if you want to get really picky about it, there are lots of specialized circuits out there doing work all the time all over the place that could be viewed as "computation" at enormous rates -- it all depends on where you draw the line.
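For anyone unfamiliar with the "sum of weighted products" mentioned above, here is a minimal one-dimensional software sketch of what an FIR filter computes. It is not Datacube's design or API; the function name and the two-tap averaging weights are assumptions chosen to keep the example small.

```python
def fir_filter(samples, taps):
    """Apply an FIR filter: each output is a weighted sum of the most
    recent len(taps) input samples. Only fully overlapped outputs are kept."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0.0
        for k, weight in enumerate(taps):
            acc += weight * samples[n - k]   # taps[0] weights the newest sample
        out.append(acc)
    return out

# Two equal taps give a simple moving average of adjacent samples.
print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))  # [1.5, 2.5, 3.5]
```

Every output costs the same fixed number of multiply-adds, which is why a hardware version can run at a constant rate, with peak equal to average as the comment notes.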

--
Seastead this.