500 Billion Very Specialized FLOPs
sheckard writes: "ABC News is reporting about the world's fastest 'supercomputer,' but the catch is that it doesn't do much by itself. The GRAPE 6 supercomputer computes gravitational force, but needs to be hooked up to a normal PC. The PC does the accounting work, while the GRAPE 6 does the crunching." The giant pendulum of full-steam-ahead specialization vs. all-purpose flexibility knocks down another one of those tiny red pins ...
So along come some doods who said why don't we recursively stick the particles into boxes and then calculate the attraction between the boxes instead and it should be a lot faster. So they tried it and it seemed to work great- it only takes more like 10,000 calculations to do 1000 particles.
Anyway along came some other guys and they were a bit suspicious. They showed that some galaxies fell apart under some conditions with the recursive boxes method, when like they shouldn't. Back to the drawing board.
There are some fixes for this now- they run more slowly, but still a lot faster than the boring way. Still, it's better than the end of the universe. Even if it is only a toy universe.
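For a rough feel of the numbers above, here is a minimal sketch of the operation counts. It assumes the boring way costs one evaluation per ordered pair of particles and that the recursive-boxes method scales like N*log2(N); both function names are made up for illustration, and real tree codes have constant factors this ignores.

```python
import math

def direct_evaluations(n):
    """Brute force: every particle feels every other particle."""
    return n * (n - 1)

def tree_evaluations_estimate(n):
    """Rough N*log2(N) scaling assumed for a recursive-boxes method."""
    return round(n * math.log2(n))

n = 1000
print(direct_evaluations(n))         # 999000
print(tree_evaluations_estimate(n))  # 9966
```

So 1000 particles come out at roughly a million evaluations the boring way versus about 10,000 with boxes, which matches the figure quoted above.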
For descriptions of loadsa algorithms, including 'symplectics' which are able to predict the future of the solar system to 1 part in 10^13 ten million years in the future check out this link:
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
A machine that massive is likely to have its own gravitational field and throw off all the calculations!
tee hee
We know 10^12 is tera. But did you know 10^15 is peta and 10^18 is exa?
Will I retire or break 10K?
Make a "dust particle" the size of the Moon, stick it in deep space, and you have a lot of mass for your visible cross-section.
:-)
Nobody really knows how much of that stuff is out there. We know something is there that we don't see from the gravity it puts out, but that doesn't mean it has to be something truly exotic.
Cheers,
Ben
My usual seat in the cluetrain is at http://pub4.ezboard.com/biwethey.ht
The Summer 2000 issue of American Heritage of Invention & Technology has a fascinating article on the specialized code breaking machines that were built and used during World War II.
Mea navis aericumbens anguillis abundat
The truth about gravity is very interesting. However, my knowledge cannot be passed on to you because my life holds greater value than the dissemination of this info (from my point of view). I apologize for my selfishness, but must point out that this is what society has taught me.
Search here.
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
It doesn't really say in the article, but it sounded like they didn't use relativity and only used Newtonian forces. Any comments, like how accurate the results will be and whether definitive statements are possible (for example, "this galaxy will never collide with this one, even with relativistic effects")?
EFF's Deep Crack crypto supercomputer supplied 1/3 of the computing power in the latest distributed.net DES challenge. Now, if it could be rebuilt for RC5-64...
Will I retire or break 10K?
Simple: various tasks need different amounts of bandwidth between the nodes to perform the calculation. For distributed.net and SETI@home, every data block is completely independent - the nodes don't need to communicate at all, so you just pipe the work units over the Internet.
Most problems don't break up this well, though - individual parts of the problem can interact with their neighbours, meaning individual nodes need to communicate with each other fairly quickly - a Beowulf cluster, for example. Lots of normal PCs on a fairly fast LAN.
Then, you have a handful of BIG number-crunching problems - like this one - where every part of the problem interacts with every other one. Think of it like a Rubik's cube: you can't just work one block at a time, you need to look at the whole object at once. This takes serious bandwidth: the top-end SGI Origin 2800s run at something like 160 Gbyte/sec between nodes (in total).
Here in Cambridge, the Department of Applied Mathematics and Theoretical Physics has an SGI Origin 2000 series box with 64 CPUs - homepage here. (There's a photo of Stephen Hawking next to it somewhere on that site - this is his department.)
Basically, there are jobs clusters of PCs just can't handle. If the choice is between a $100k Beowulf cluster that can't do the job, and a $10m supercomputer which can, the latter is much better value.
Sure if you have the money to burn, go custom. But most of the computing projects out there do not require that kind of "big iron" and couldn't even afford it if they did. Besides, most of the time (unless you are in the DoD or NSA or such-like) you only end up with a small slice of that "big iron" which may or may not be roughly equivalent to being able to run your proggies on a computer that is all yours 24/7.
You're right - most projects don't need this kind of hardware. Some projects - including this one - do need it - either they cough up the big $$$, or the job doesn't get done.
Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but that has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and ethernet cabling for a fraction of the cost.
You can't build yourself a supercomputer out of PCs and Ethernet. You can build a cluster which will do almost all the jobs a supercomputer can - but not all of them. Some jobs need a supercomputer. A few very specialised jobs need even more muscle - like this one. It uses custom silicon, because that's the only way to get enough CPU horsepower.
No one seems to understand the gravity of the situation.
--
Infuriate left and right
-------
CAIMLAS
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
It's the GRAPE boards which are specialized. All they can do is calculate gravitational potentials between particles, nothing else.
The only problem with previous versions of GRAPE (that I know of) is that their precision is a little lower than you'd really like or need for some applications, but otherwise they are very nice for doing large n-body sims.
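To make "all they can do" concrete, here is a minimal software sketch of the kind of pairwise sum such a board hard-wires. It is plain direct summation of accelerations, not the actual GRAPE pipeline; the function name, the unit choice G = 1 and the softening parameter are assumptions made for illustration.

```python
import math

G = 1.0  # gravitational constant in simulation units (assumption)

def accelerations(positions, masses, softening=0.0):
    """Direct-summation gravitational acceleration on every particle."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle does not attract itself
            # Vector from particle i to particle j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            # Squared distance, optionally softened to avoid blow-ups.
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

# Two unit masses 2 length units apart pull on each other with |a| = 1/4.
print(accelerations([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 1.0]))
```

In a host-plus-board setup the PC would do everything around this loop (time stepping, bookkeeping, I/O), which is the accounting work mentioned in the story.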
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
... the Thinking Machines CM-5 ... used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
Yup. The Sun Enterprise 10000 (AKA "Starfire") uses a dedicated Ultra 5 as the console/management station. It connects via dedicated ethernet to the Starfire.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop.
I always liked the definition, "Any computer that is worth more than you are."
;-)
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Uh... running a supercomputer from a less-powerful computer is nothing new, and certainly doesn't make it 'specialised'. Historically, the Cray T3D used a Cray Y-MP as a front-end, and the Thinking Machines CM-5 (and CM-200, I think) used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
If you re-read the article, you'll see that 500 billion is just ONE OF THE BOARDS in the GRAPE. There are going to be 200 boards in this puppy, making for a machine that's getting 100 teraflops (200 x 500 billion = 10^14 FLOPS).
Damn fast!
Installing Grape 6
Processor of gravity
Quake sure feels real now
That will help a lot...umm...while landing at Neptune some day.
tera == 2^40; peta == 2^50; exa == 2^60; address space of a 64-bit machine == 16 exabytes
Will I retire or break 10K?
Processors with embedded RAM have been under research for some time. Check out the IRAM project at Berkeley and the PIM project at University of Michigan and elsewhere. Despite all of the research, though, processor-in-memory hasn't made it into general use yet.
There are many problems with implementing a system like this in practice. The fabrication process used for DRAMs is completely different from that used for logic. In general, for DRAM you want a *high* capacitance process so that the wells holding your bits don't discharge very quickly -- that way you can refresh less often. In logic you want *low* capacitance so that your gates can switch quickly (high capacitance -> high RC time constant -> slow rise/fall time on gates -> slow clock speed).
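The capacitance chain in that last parenthesis can be put into numbers with the textbook first-order RC model, where the 10%-90% rise time is R*C*ln(9), roughly 2.2 time constants. This is only a sketch: the function name is made up, and the resistance and capacitance values are invented for illustration, not taken from any real process.

```python
import math

def rise_time_10_90(resistance_ohms, capacitance_farads):
    """10%-90% rise time of a first-order RC node: tau * ln(9)."""
    tau = resistance_ohms * capacitance_farads  # RC time constant
    return tau * math.log(9)

# Hypothetical gate: same drive resistance, two different load capacitances.
low_c = rise_time_10_90(1_000.0, 10e-15)   # 10 fF load
high_c = rise_time_10_90(1_000.0, 40e-15)  # 40 fF load
print(high_c / low_c)  # four times the capacitance, about four times slower
```

The same large capacitance that makes a DRAM cell hold its charge longer is what makes a logic gate sluggish, which is why one process struggles to serve both.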
Fabricating both with the same set of masks doesn't work particularly well, so you really have to compromise -- you'll basically be making a processor with a RAM process, or vice-versa. Alternately, you could use SRAM, which is nice and fast and is built with a logic process, but is 1/6th the storage density of DRAM. This is why SRAM is used for caches and DRAM is used for main memory.
Having the memory on the same die as the processor definitely gives a bandwidth and latency advantage. For instance, when you are on the same die, you can essentially lay as many data lines as you like so that you can make your memory interface as wide as you like.
But another large advantage is the power-savings. Processors consume a great deal of their power in the buffers driving external signals. Basically, driving signals to external devices going through etch is power-expensive, and introduces capacitances that kill some of your speed. Keeping things on die, no such buffers are needed, and a great deal of power is saved.
The first commercial application of the processor-in-memory concept that I am aware of is Neomagic's video cards. They went with PIM not for bandwidth, but for power-conservation, and chip reduction. These characteristics are extremely appealing to portable computing, and thus Neomagic now pretty much owns the laptop market.
In a limited application, such as a 2D graphics card, this is feasible because the card only needs perhaps 4 MB of memory. Placing an entire workstation's main memory (say, 128 MB) on a single die *with* a processor would lead to a ridiculously massive die. Big dies are expensive, lead to low yield and increase design problems with clock skew. Thus, having 128 MB of DRAM slapped onto the same die as your 21264 isn't going to happen in the near future.
Placing a small (4-8 MB) amount of memory on-die, and leaving the rest external is possible, but leads to non-uniform access memory, which complicates software optimization and general performance tuning greatly. It is generally considered undesirable.
Another approach is to build systems around interconnected collections of little processors, each with modest computing power and a small amount (say 8 MB) of memory. Thus, you are essentially building a mini-cluster, where each node is a single chip. This, too, leads to a NUMA situation, but it is more interesting, and many people are pushing it.
PIMs are going to be used more and more, and the massive hunger for bandwidth in 3D-gaming cards very well may drive it to market acceptance. The power consumption advantages will continue to appeal to portable and embedded markets as well. However, general purpose processors based on this design are unlikely in the near future. This style of design doesn't mesh well with current workstation-type architectures.
A bit of a tangent, but I hope it was informative...
--Lenny
"...I have a room full of ultra-sparcs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists..."
The solution is trivial.
1. Carry Ultra-Sparc to building rooftop.
2. Drop Ultra-Sparc off building rooftop.
3. If results are disputed, request that critic stand at base of building. Repeat steps 1 & 2.
Being an IBM employee, I feel the need to stand up for the good Mr. Ayd :).
Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too.
Seriously, I think David misclassified GRAPE 6 quite a bit. I don't think it's quite David's fault, because the article writers don't know the difference between 'supercomputer' and 'attached processor'. ABC News didn't really apply the term 'supercomputer' correctly either.
The term 'supercomputer' is more of a marketing term than anything else. Technical people only use it when they want to describe a general capability. AFAIK there are no concrete definitions of 'supercomputer', and if there were they would likely change daily. GRAPE 6, from the information I can see, is really an attached processor.
Attached processors can be anything from an ARM chip on your network card to a GRAPE 6. Internally, GRAPE 6 is a full custom, superscalar, massively pipelined, systolic array (say that 5 times fast). That basically means that data comes in one side of the board, and after n clock cycles the answer comes out the other side. There is no code other than a program running on the host computer which generates and consumes data, and every piece of the algorithm is done in hardware.
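The "in one side, out the other after n clocks" behaviour can be mimicked with a toy software model. This is a sketch of a plain linear pipeline, not GRAPE 6's actual design; run_pipeline and the three stage functions are made up for illustration, and None is used to mark an empty pipeline slot.

```python
def run_pipeline(stages, inputs):
    """Clock values through fixed hardware-like stages, one stage per tick."""
    n = len(stages)
    registers = [None] * n  # one register after each stage, empty at start
    outputs = []
    # Feed the inputs, then n empty ticks to flush the pipeline.
    for value in list(inputs) + [None] * n:
        # Whatever sits in the last register leaves the board this tick.
        if registers[-1] is not None:
            outputs.append(registers[-1])
        # Shift from the back so each value advances exactly one stage.
        for i in range(n - 1, 0, -1):
            prev = registers[i - 1]
            registers[i] = None if prev is None else stages[i](prev)
        registers[0] = None if value is None else stages[0](value)
    return outputs

# Three hypothetical fixed-function stages: add 1, double, subtract 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2, 3]))  # [1, 3, 5]
```

Once the pipeline is full, one finished result falls out on every tick, which is where the throughput comes from. The stage functions are the "hardware": changing them means building a new board.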
"What happens when the algorithm changes?" you might ask. Well, then you're screwed. You have to do a whole new board. Many boards use programmable chips as their processing elements, and can reprogram them when bugs or features get added, but these guys appear to be using ASICs. Great for speed, bad for flexibility.
Even though David Ayd was mistaken about the architecture, this idea has been around for quite a while also. The SPLASH 2 project was one of the first successes with this idea. There is also a commercial company selling boards using that idea but with completely up to date components (compared to SPLASH).
Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.
Well, we really can't argue with that, can we, Mr. Ayd?
This architecture lends itself to extremely high throughput. It's no surprise that these perform so well. NSA uses architectures just like this to do its crypto crunching. Brute forcing doesn't look so bad after trying one of these.
The opinions I post here have nothing to do with my employer.
Secondly, that's "theoretical peak performance", otherwise known as the "guaranteed not to exceed" performance. On their highly specialized code it'll probably do ok, but on other calculations I'd be surprised if it got 10% of that speed, especially if a lot of cross-node communication is occurring. Don't forget, this is not a general purpose computer, it's like a really really big math co-processor that is optimized to run a very very specific type of program fairly well.
What, me worry?
A paper (PDF format) on its predecessor, GRAPE-5, can be found here. It has more technical detail but it doesn't describe the architecture of the specialized processors. It won the 1999 Gordon Bell price/performance prize.
Mea navis aericumbens anguillis abundat
Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop. Is that accurate? If not, what *is* the definition of a supercomputer these days?
c k --interesting, if practically useless, scores...
Which reminds me, if anyone is interested in the "flopsability," to coin a silly-sounding word, of common x86 processors, visit http://www.jc-news.com/parse.cgi?pc/temp/TW/linpa
"The more corrupt the state, the more numerous the laws."--Tacitus, *The Annals*
As an astronomer who does these kinds of calculations I should point out that this system is not just specialised to solve one type of problem - the N-body problem where N is very big - e.g. our galaxy has about 100 billion stars in it - fully specifying their position and velocity would require 4.8 terabytes of memory. We're still a long way away from that... but getting closer. Oh, and that's neglecting things like molecular clouds and suchlike, which have appreciable mass but aren't stars.
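The 4.8 terabyte figure can be reproduced with a few lines of arithmetic. The post doesn't say how it was derived; this sketch assumes six numbers per star (three for position, three for velocity) stored as 8-byte double-precision values, which happens to give exactly that total.

```python
stars = 100 * 10**9   # about 100 billion stars, as stated above
values_per_star = 6   # x, y, z position plus x, y, z velocity
bytes_per_value = 8   # double precision (assumption)

total_bytes = stars * values_per_star * bytes_per_value
print(total_bytes)            # 4800000000000
print(total_bytes / 10**12)   # terabytes (decimal), i.e. 4.8
```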
I have a cluster of alphas crunching away solar system models - Grape6 couldn't actually do this very well since it's designed for a certain N body algorithm which doesn't suit small N... Instead I use a symplectic integrator which takes advantage of a number of known factors in the problem.
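For anyone wondering what a symplectic integrator looks like, here is a minimal sketch using leapfrog (kick-drift-kick), the simplest member of that family. It is not the integrator used for the solar system models above, which the post doesn't specify; it just shows the characteristic property that the energy error stays bounded over many orbits. Units with G*M = 1 and the function name are assumptions.

```python
import math

def leapfrog_orbit(steps, dt):
    """Integrate a test planet around a fixed sun (G*M = 1) with leapfrog."""
    x, y = 1.0, 0.0    # start at radius 1
    vx, vy = 0.0, 1.0  # circular-orbit speed for G*M = 1

    def accel(px, py):
        r3 = (px * px + py * py) ** 1.5
        return -px / r3, -py / r3

    ax, ay = accel(x, y)
    for _ in range(steps):
        # Half kick: update velocity with the current acceleration.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # Drift: move with the half-updated velocity.
        x += dt * vx
        y += dt * vy
        # Second half kick with the acceleration at the new position.
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    energy = 0.5 * (vx * vx + vy * vy) - 1.0 / math.sqrt(x * x + y * y)
    return x, y, energy

# About 16 orbits; the exact energy of this orbit is -0.5.
x, y, energy = leapfrog_orbit(steps=10_000, dt=0.01)
print(energy)
```

A non-symplectic scheme like plain forward Euler would show the energy drifting steadily instead, which is why long solar system runs avoid it.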
So - we still need bigger and faster machines, but we also need more general machines...
Anyway... I want one of these to model EKO formation in the solar system
General purpose computers get their butt kicked in price and performance by custom silicon, assuming the task is well-defined and not too complicated. These get used a lot in signal processing and decoders for error correction codes.
Mea navis aericumbens anguillis abundat
The problem with anything based on a microprocessor is the pathetic main memory bandwidth. If your program blows out the cache, the performance goes to hell.
A vector supercomputer is designed to have massive memory bandwidth, enough to keep the vector processing units operating at high efficiency. No cache or VM to slow things down. An engineer once told me that a Cray was a multimillion dollar memory system with a CPU bolted on the side.
See the STREAM benchmark web page for some measurements of sustained memory bandwidth. This separates the real computers from the toys.
Mea navis aericumbens anguillis abundat
David Ayd, a supercomputing manager at IBM, says "the GRAPE 6 computer appears to be based on a very old model. In the 1970s and '80s these vector models were developed in Japan for problems like simulating weather and plane mechanics, he said. The difference today is that the computers can do the jobs at 100 times the speed or faster."
... "Be a beacon?"
Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too. But:
Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.
Well, we really can't argue with that, can we, Mr. Ayd?
--
"Give him head?"
"One World, one Web, one Program" - Microsoft Ad
I haven't followed the progress in the field since then, but I suspect present day hardware could handle a good fraction of the satellite image feeds affordably -- and dwarf the realized performance figures of this gravitation board.
Of course, if you want to get really picky about it, there are lots of specialized circuits out there doing work all the time all over the place that could be viewed as "computation" at enormous rates -- it all depends on where you draw the line.
Seastead this.