Slashdot Mirror


500 Billion Very Specialized FLOPs

sheckard writes: "ABC News is reporting about the world's fastest 'supercomputer,' but the catch is that it doesn't do much by itself. The GRAPE 6 supercomputer computes gravitational force, but needs to be hooked up to a normal PC. The PC does the accounting work, while the GRAPE 6 does the crunching." The giant pendulum of full-steam-ahead specialization vs. all-purpose flexibility knocks down another one of those tiny red pins ...

32 of 89 comments

  1. Gravity simulation algorithms by WolfWithoutAClause · · Score: 5
    Actually, gravity simulation is pretty cool algorithmically as well as hardware-wise. Originally, gravity simulators had to work out the attraction between every pair of particles. This meant that if you simulated 1000 particles, they had to do about 1,000,000 calculations. Slow.

    So along came some doods who said: why don't we recursively stick the particles into boxes and then calculate the attraction between the boxes instead? It should be a lot faster. So they tried it, and it seemed to work great: it only takes more like 10,000 calculations to do 1000 particles.
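    The difference between the two approaches is easy to see in a small experiment. This is a minimal 2D sketch, not GRAPE code or any production simulator: it uses the Barnes-Hut quadtree (the best-known "recursive boxes" method) and assumes G = 1, a hypothetical softening length EPS, an opening angle THETA = 0.5, and distinct particle positions in the unit square. It counts force evaluations for the tree and compares them with the N(N-1) of the pairwise sum; the exact ratio depends on THETA and N.

```python
import math
import random

THETA = 0.5   # assumed opening angle: treat a box as one mass if width / distance < THETA
EPS = 1e-3    # assumed softening length, avoids the 1/r^2 blow-up at tiny separations


class Box:
    """Square quadtree cell; assumes all particle positions are distinct."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # running sums of m*x and m*y, for the centre of mass
        self.body = None          # (x, y, m) when this is a leaf holding one particle
        self.children = None      # four sub-boxes once the cell has been split

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)             # empty leaf: just store the particle
        else:
            if self.children is None:         # occupied leaf: split and push old particle down
                h = self.half / 2
                self.children = [Box(self.cx + dx * h, self.cy + dy * h, h)
                                 for dx in (-1, 1) for dy in (-1, 1)]
                bx, by, bm = self.body
                self.body = None
                self._child(bx, by).insert(bx, by, bm)
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def _child(self, x, y):
        return self.children[(2 if x >= self.cx else 0) + (1 if y >= self.cy else 0)]


def tree_accel(box, x, y, count):
    """Approximate acceleration at (x, y); count[0] tallies force evaluations."""
    if box.mass == 0.0:
        return 0.0, 0.0
    dx, dy = box.mx / box.mass - x, box.my / box.mass - y
    d = math.hypot(dx, dy)
    if box.children is None or 2 * box.half < THETA * d:
        if d == 0.0:                          # this leaf is the particle itself
            return 0.0, 0.0
        count[0] += 1
        r3 = (d * d + EPS * EPS) ** 1.5
        return box.mass * dx / r3, box.mass * dy / r3
    ax = ay = 0.0
    for child in box.children:                # box too close or too big: open it
        cax, cay = tree_accel(child, x, y, count)
        ax, ay = ax + cax, ay + cay
    return ax, ay


def direct_accel(bodies, i):
    """Exact pairwise sum over every other particle: N - 1 evaluations."""
    xi, yi, _ = bodies[i]
    ax = ay = 0.0
    for j, (xj, yj, mj) in enumerate(bodies):
        if j != i:
            dx, dy = xj - xi, yj - yi
            r3 = (dx * dx + dy * dy + EPS * EPS) ** 1.5
            ax, ay = ax + mj * dx / r3, ay + mj * dy / r3
    return ax, ay


random.seed(1)
N = 1000
bodies = [(random.random(), random.random(), 1.0 / N) for _ in range(N)]
root = Box(0.5, 0.5, 0.5)
for x, y, m in bodies:
    root.insert(x, y, m)

count = [0]
tree = [tree_accel(root, x, y, count) for x, y, _ in bodies]
print("pairwise evaluations:", N * (N - 1), "tree evaluations:", count[0])
```

    Raising THETA trades accuracy for fewer evaluations; lowering it toward zero turns the tree back into the pairwise sum.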

    Anyway, along came some other guys, and they were a bit suspicious. They showed that under some conditions, galaxies fell apart with the recursive boxes method when they shouldn't have. Back to the drawing board.

    There are some fixes for this now: they run more slowly, but still a lot faster than the boring way. Still, it's better than the end of the universe, even if it is only a toy universe.

    For descriptions of loads of algorithms, including 'symplectics', which can predict the state of the solar system ten million years into the future to 1 part in 10^13, check out this link:

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  2. Inaccurate! by Tom7 · · Score: 2

    A machine that massive is likely to have its own gravitational field and throw off all the calculations!

    tee hee

  3. The next two powers of 1000 are by yerricde · · Score: 2

    We know 10^12 is tera. But did you know 10^15 is peta and 10^18 is exa?

    --
    Will I retire or break 10K?
  4. A lot of dark matter could be big dust :-) by tilly · · Score: 2

    Make a "dust particle" the size of the Moon, stick it in deep space, and you have a lot of mass for your visible cross-section.

    Nobody really knows how much of that stuff is out there. We know from the gravity it puts out that something is there that we don't see, but that doesn't mean it has to be something truly exotic. :-)

    Cheers,
    Ben

    --
    My usual seat in the cluetrain is at http://pub4.ezboard.com/biwethey.ht
  5. Re:Crypto apps by Detritus · · Score: 2
    The NSA has their own chip fabrication facility at Fort Meade.

    The Summer 2000 issue of American Heritage of Invention & Technology has a fascinating article on the specialized code breaking machines that were built and used during World War II.

    --
    Mea navis aericumbens anguillis abundat
  6. I would tell, but government will kill me. by Yardley · · Score: 2

    The truth about gravity is very interesting. However, my knowledge cannot be passed on to you because my life holds greater value than the dissemination of this info (from my point of view). I apologize for my selfishness, but must point out that this is what society has taught me.

    Search here.

    --
    He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
  7. Do they use the theory of relativity? by Osram · · Score: 2

    It doesn't really say in the article, but it sounded like they didn't use relativity and only used Newtonian forces. Any comments on how accurate the results will be, and whether definitive statements are possible (for example, "this galaxy will never collide with this one, even with relativistic effects")?

  8. Two words: Deep Crack. by yerricde · · Score: 2

    EFF's Deep Crack crypto supercomputer supplied 1/3 of the computing power in the latest distributed.net DES challenge. Now, if it could be rebuilt for RC5-64...

    --
    Will I retire or break 10K?
  9. Re:bah! by Cyberdyne · · Score: 2
    Ummm, how exactly does one supercomputer that costs over a million dollars (US) and performs at the same level as a collection of computers that costs a few tens of thousands of dollars metamorphose into "better price"?

    Simple: various tasks need different amounts of bandwidth between the nodes to perform the calculation. For distributed.net and SETI@home, every data block is completely independent - the nodes don't need to communicate at all, so you just pipe the work units over the Internet.

    Most problems don't break up this well, though: individual parts of the problem interact with their neighbours, so individual nodes need to communicate with each other fairly quickly. That is where a Beowulf cluster fits: lots of normal PCs on a fairly fast LAN.

    Then you have a handful of BIG number-crunching problems, like this one, where every part of the problem interacts with every other part. Think of it like a Rubik's cube: you can't just work one block at a time; you need to look at the whole object at once. This takes serious bandwidth: the top-end SGI Origin 2800s run at something like 160 Gbyte/sec between nodes (in total).

    Here in Cambridge, the Department of Applied Mathematics and Theoretical Physics has an SGI Origin 2000 series box with 64 CPUs - homepage here. (There's a photo of Stephen Hawking next to it somewhere on that site - this is his department.)

    Basically, there are jobs clusters of PCs just can't handle. If the choice is between a $100k Beowulf cluster that can't do the job, and a $10m supercomputer which can, the latter is much better value.

    Sure if you have the money to burn, go custom. But most of the computing projects out there do not require that kind of "big iron" and couldn't even afford it if they did. Besides, most of the time (unless you are in the DoD or NSA or such-like) you only end up with a small slice of that "big iron" which may or may not be roughly equivalent to being able to run your proggies on a computer that is all yours 24/7.

    You're right - most projects don't need this kind of hardware. Some projects - including this one - do need it - either they cough up the big $$$, or the job doesn't get done.

    Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but that has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and Ethernet cabling for a fraction of the cost.

    You can't build yourself a supercomputer out of PCs and Ethernet. You can build a cluster which will do almost all the jobs a supercomputer can - but not all of them. Some jobs need a supercomputer. A few very specialised jobs need even more muscle - like this one. It uses custom silicon, because that's the only way to get enough CPU horsepower.

  10. It's really sad seeing all these 'funny' posts by A+nonymous+Coward · · Score: 3

    No one seems to understand the gravity of the situation.

    --

  11. Grapes of Wrath by CAIMLAS · · Score: 2
    Grapes of Wrath, eh? Hrm, I wonder where GRAPE 1, 2, 3, 4, and 5 went to? Someone probably ate them as they became obsolete.

    -------
    CAIMLAS

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  12. Re:"Specialised"? by drudd · · Score: 2

    It's the GRAPE boards that are specialized. All they can do is calculate gravitational potentials between particles, nothing else.

    The only problem with previous versions of GRAPE (that I know of) is that their precision is a little lower than you'd really like or need for some applications, but otherwise they are very nice for doing large N-body sims.
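    To get a feel for why precision matters, here is a small plain-Python experiment. It is only an illustration under stated assumptions, not a description of GRAPE's real number formats: each pairwise 1/r term is rounded to 32-bit single precision (emulated with the standard `struct` module), and the potential at one point from 1000 random unit-mass particles is summed either in single precision throughout or with a 64-bit accumulator. G and the minus sign are dropped, so the sum is just the sum of 1/r.

```python
import math
import random
import struct


def f32(v):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", v))[0]


random.seed(2)
pts = [(random.random(), random.random(), random.random()) for _ in range(1000)]
px, py, pz = 2.0, 2.0, 2.0    # evaluation point, outside the particle cloud

exact = 0.0    # everything in double precision
single = 0.0   # 32-bit pairwise terms and a 32-bit running sum
mixed = 0.0    # 32-bit pairwise terms, 64-bit accumulator
for x, y, z in pts:
    r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2)
    exact += 1.0 / r
    term = f32(1.0 / f32(r))
    single = f32(single + term)
    mixed += term

for name, val in (("single", single), ("mixed", mixed)):
    print(f"{name:6s} relative error: {abs(val - exact) / exact:.1e}")
```

    In runs like this, most of the single-precision error tends to come from the running sum, which is why keeping a wider accumulator helps so much.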

    Doug

    --
    Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
  13. Yeah.... other Suns, for example by DragonHawk · · Score: 2

    ... the Thinking Machines CM-5 ... used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.

    Yup. The Sun Enterprise 10000 (AKA "Starfire") uses a dedicated Ultra 5 as the console/management station. It connects via dedicated ethernet to the Starfire.

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  14. The long-term definition of "supercomputer" by DragonHawk · · Score: 2

    Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop.

    I always liked the definition, "Any computer that is worth more than you are."

    ;-)

    --

    dragonhawk@iname.microsoft.com
    I do not like Microsoft. Remove them from my email address.
  15. "Specialised"? by JoeyLemur · · Score: 3

    Uh... running a supercomputer from a less-powerful computer is nothing new, and certainly doesn't make it 'specialised'. Historically, the Cray T3D used a Cray Y-MP as a front-end, and the Thinking Machines CM-5 (and CM-200, I think) used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.

    1. Re:"Specialised"? by MustardMan · · Score: 2

      I happen to be fortunate enough to work on a machine modeled on the GRAPE 4 architecture, so let me clarify. GRAPE performs ONE calculation. Period. It can't multiply, or divide, or do flow control (if/then). All it can do is calculate the gravitational force between two objects. FAST. We have a 64-node Beowulf cluster, and a single GRAPE machine the size of a mid-tower PC. For the work it was designed for, our GRAPE machine is nearly a hundred times faster than the Beowulf cluster. And the machine cost us less than 10,000 dollars (compared to quite a bit more for the Beowulf cluster).


      Tell a man that there are 400 Billion stars and he'll believe you

  16. Reread the article. 500 billion? Pah! 100 Trillion! by IvyMike · · Score: 4

    If you re-read the article, you'll see that 500 billion is just ONE OF THE BOARDS in the GRAPE. There are going to be 200 boards in this puppy, making for a machine that's getting 100 petaflops.

    Damn fast!

  17. Haiku by 575 · · Score: 5

    Installing Grape 6
    Processor of gravity
    Quake sure feels real now

  18. Wonder where else can we use this? by da_King · · Score: 2
    Any plans for equipping new space-ships with this computer?

    That will help a lot...umm...while landing at Neptune some day.

  19. Almost. by yerricde · · Score: 2

    tera == 2^40; peta == 2^50; exa == 2^60; address space of a 64-bit machine == 16 exabytes
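    A quick check of both readings of the prefixes, in Python. The decimal values are the SI definitions; the power-of-two values are the memory-size convention used above (the IEC names these kibi, mebi, gibi, tebi, pebi and exbi).

```python
# Decimal (SI) prefixes versus the power-of-two meanings used for memory sizes.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]

for i, name in enumerate(PREFIXES, start=1):
    print(f"{name:5s} 10^{3 * i:<2d} = {10 ** (3 * i):>25,d}   "
          f"2^{10 * i:<2d} = {2 ** (10 * i):>25,d}")

# A 64-bit address covers 2^64 bytes, i.e. 16 binary exabytes of 2^60 bytes each.
print(2 ** 64 // 2 ** 60)  # 16
```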

    --
    Will I retire or break 10K?
  20. There is a lot of work in Processor-in-memory by slothbait · · Score: 4

    Processors with embedded RAMs have been under research for some time. Check out the IRAM project at Berkeley and the PIM project at the University of Michigan and elsewhere. Despite all of the research, though, processor-in-memory hasn't made it into general use yet.

    There are many problems with implementing a system like this in practice. The fabrication process used for DRAMs is completely different from that used for logic. In general, for DRAM you want a *high* capacitance process so that the wells holding your bits don't discharge very quickly; that way you can refresh less often. In logic you want *low* capacitance so that your gates can switch quickly (high capacitance -> high RC time constant -> slow rise/fall time on gates -> slow clock speed).
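    The "high RC time constant, slow rise time" chain can be made concrete with the standard first-order result: a node driven through resistance R into capacitance C takes RC * ln(9), about 2.2 RC, to go from 10% to 90% of its final voltage. A minimal sketch; the resistance and capacitances are hypothetical round numbers, not values for any real process.

```python
import math


def rise_time_10_90(r_ohms, c_farads):
    """10%-90% rise time of a first-order RC node driven by a step: RC * ln(9)."""
    return r_ohms * c_farads * math.log(9)


R = 1e3  # hypothetical driver resistance, ohms
for c in (10e-15, 20e-15):  # hypothetical gate loads, farads
    print(f"C = {c * 1e15:.0f} fF -> rise time {rise_time_10_90(R, c) * 1e12:.1f} ps")
```

    Doubling the capacitance doubles the rise time, and with it the minimum clock period the gate can support.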

    Fabricating both with the same set of masks doesn't work particularly well, so you really have to compromise -- you'll basically be making a processor with a RAM process, or vice-versa. Alternately, you could use SRAM, which is nice and fast and is built with a logic process, but is 1/6th the storage density of DRAM. This is why SRAM is used for caches and DRAM is used for main memory.

    Having the memory on the same die as the processor definitely gives a bandwidth and latency advantage. For instance, on the same die you can lay essentially as many data lines as you like, so you can make your memory interface as wide as you like.
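    The width argument is simple arithmetic: peak bandwidth is bytes per transfer times transfers per second. A sketch with hypothetical numbers (a 64-bit external bus versus a 1024-bit on-die interface, both at 100 MHz with one transfer per clock), not figures for any particular chip:

```python
def bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    return bus_width_bits // 8 * transfers_per_s


CLOCK = 100e6  # hypothetical 100 MHz, one transfer per clock
print(bandwidth_bytes_per_s(64, CLOCK) / 1e9, "GB/s off-chip (64-bit bus)")
print(bandwidth_bytes_per_s(1024, CLOCK) / 1e9, "GB/s on-die (1024-bit bus)")
```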

    But another large advantage is the power savings. Processors consume a great deal of their power in the buffers driving external signals. Basically, driving signals to external devices through board etch is power-expensive, and it introduces capacitances that kill some of your speed. Keep things on die and no such buffers are needed, so a great deal of power is saved.
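    The power argument follows from the usual dynamic-power estimate for a switching line, P = alpha * C * V^2 * f, where alpha is the activity factor. Off-chip lines see far more capacitance (pad, package, board etch) than on-die wires, so they cost proportionally more power. All numbers below are hypothetical, chosen only to show the scaling.

```python
def switching_power(c_farads, v_volts, f_hz, activity=0.5):
    """Dynamic power of one signal line: activity factor * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hz


V, F = 3.3, 100e6                          # hypothetical I/O voltage and clock
off_chip = switching_power(10e-12, V, F)   # hypothetical pad + package + trace load
on_die = switching_power(0.1e-12, V, F)    # hypothetical on-die wire load
print(f"off-chip: {off_chip * 1e3:.2f} mW per line, on-die: {on_die * 1e3:.3f} mW per line")
```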

    The first commercial application of the processor-in-memory concept that I am aware of is NeoMagic's video cards. They went with PIM not for bandwidth, but for power conservation and a reduced chip count. These characteristics are extremely appealing for portable computing, and thus NeoMagic now pretty much owns the laptop market.

    In a limited application, such as a 2D graphics card, this is feasible because the card only needs perhaps 4 MB of memory. Placing an entire workstation's main memory (say, 128 MB) on a single die *with* a processor would lead to a ridiculously massive die. Big dies are expensive, lead to low yield and increase design problems with clock skew. Thus, having 128 MB of DRAM slapped onto the same die as your 21264 isn't going to happen in the near future.

    Placing a small (4-8 MB) amount of memory on-die and leaving the rest external is possible, but leads to non-uniform memory access (NUMA), which greatly complicates software optimization and general performance tuning. It is generally considered undesirable.
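    Why NUMA complicates tuning: the same loop can run at very different speeds depending on how many of its accesses land in the fast on-die memory. A sketch of the expected access time, with hypothetical latencies:

```python
def average_latency(local_fraction, local_ns, remote_ns):
    """Expected access time when a fraction of accesses hit fast on-die memory."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns


LOCAL_NS, REMOTE_NS = 10, 100   # hypothetical on-die vs external DRAM latencies
for f in (0.9, 0.5, 0.1):
    print(f"{f:.0%} local -> {average_latency(f, LOCAL_NS, REMOTE_NS):.0f} ns average")
```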

    Another approach is to build systems around interconnected collections of little processors, each with modest computing power and a small amount (say 8 MB) of memory. Thus, you are essentially building a mini-cluster, where each node is a single chip. This, too, leads to a NUMA situation, but it is more interesting, and many people are pushing it.

    PIMs are going to be used more and more, and the massive hunger for bandwidth in 3D gaming cards may very well drive them to market acceptance. The power-consumption advantages will continue to appeal to portable and embedded markets as well. However, general-purpose processors based on this design are unlikely in the near future. This style of design doesn't mesh well with current workstation-type architectures.

    A bit of a tangent, but I hope it was informative...
    --Lenny

  21. The solution is trivial by Guppy · · Score: 2

    "...I have a room full of ulttra-sparcs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists..."

    The solution is trivial.

    1. Carry Ultra-Sparc to building rooftop.
    2. Drop Ultra-Sparc off building rooftop.
    3. If results are disputed, request that critic stand at base of building. Repeat steps 1 & 2.

  22. What these boards really are... by Silverpike · · Score: 2

    Being an IBM employee, I feel the need to stand up for the good Mr. Ayd :).

    Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too.

    Seriously, I think David misclassified GRAPE 6 quite a bit. I don't think it's quite David's fault, because the article writers don't know the difference between 'supercomputer' and 'attached processor'. ABC News didn't really apply the term 'supercomputer' correctly either.

    The term 'supercomputer' is more of a marketing term than anything else. Technical people only use it when they want to describe a general capability. AFAIK there is no concrete definition of 'supercomputer', and if there were, it would likely change daily. GRAPE 6, from the information I can see, is really an attached processor.

    Attached processors range from an ARM chip on your network card to a GRAPE 6. Internally, GRAPE 6 is a full-custom, superscalar, massively pipelined systolic array (say that 5 times fast). That basically means that data comes in one side of the board, and after n clock cycles the answer comes out the other side. There is no code other than a program running on the host computer which generates and consumes data, and every piece of the algorithm is done in hardware.
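    The "data in one side, answer out the other after n cycles" behaviour can be simulated in a few lines. This is a hypothetical three-stage linear pipeline (the simplest case of a systolic arrangement) computing a softened pairwise force; the stage split and EPS2 are assumptions for illustration, not GRAPE 6's real design. One packet enters per clock, so after a latency equal to the pipeline depth, one result comes out every cycle.

```python
EPS2 = 1e-6  # hypothetical softening length squared

# Each stage maps one data packet to the next. This three-stage split is a
# hypothetical illustration, not GRAPE 6's real stage list.
STAGES = [
    lambda p: {**p, "r2": p["dx"] ** 2 + p["dy"] ** 2 + p["dz"] ** 2 + EPS2},
    lambda p: {**p, "inv_r3": p["r2"] ** -1.5},
    lambda p: {"fx": p["m"] * p["dx"] * p["inv_r3"],
               "fy": p["m"] * p["dy"] * p["inv_r3"],
               "fz": p["m"] * p["dz"] * p["inv_r3"]},
]


def run_pipeline(inputs):
    """Clock the pipeline until it drains; returns (cycle, result) pairs."""
    regs = [None] * len(STAGES)   # one pipeline register after each stage
    feed = list(inputs)
    outputs = []
    cycle = 0
    while feed or any(r is not None for r in regs[:-1]):
        cycle += 1
        # Update from the back so each stage uses its input from the previous cycle.
        for i in range(len(STAGES) - 1, 0, -1):
            regs[i] = STAGES[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = STAGES[0](feed.pop(0)) if feed else None   # one new packet per clock
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


packets = [{"dx": float(k), "dy": 0.0, "dz": 0.0, "m": 1.0} for k in range(1, 6)]
for cycle, result in run_pipeline(packets):
    print(cycle, result["fx"])
```

    With 5 inputs and 3 stages, results appear on cycles 3 through 7: N inputs take N + depth - 1 cycles, so for long input streams the latency hardly matters.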

    "What happens when the algorithm changes?" you might ask. Well, then you're screwed. You have to do a whole new board. Many boards use programmable chips as their processing elements, and can reprogram them when bugs or features get added, but these guys appear to be using ASICs. Great for speed, bad for flexibility.

    Even though David Ayd was mistaken about the architecture, this idea has been around for quite a while also. The SPLASH 2 project was one of the first successes with this idea. There is also a commercial company selling boards using that idea but with completely up to date components (compared to SPLASH).

    Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.

    Well, we really can't argue with that, can we, Mr. Ayd?


    This architecture lends itself to extremely high throughput. It's no surprise that these perform so well. NSA uses architectures just like this to do its crypto crunching. Brute forcing doesn't look so bad after trying one of these :).

    --
    The opinions I post here have nothing to do with my employer.
  23. Re:Reread the article. 500 billion? Pah! 100 Trillio by ZanshinWedge · · Score: 2
    First off, that's 100 teraflops, not petaflops.

    Secondly, that's "theoretical peak performance", otherwise known as the "guaranteed not to exceed" performance. On their highly specialized code it'll probably do OK, but on other calculations I'd be surprised if it got 10% of that speed, especially if a lot of cross-node communication is occurring. Don't forget, this is not a general-purpose computer; it's like a really, really big math co-processor that is optimized to run a very, very specific type of program fairly well.
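    The arithmetic behind the correction, using the 200 boards and 500 billion flops per board quoted in this thread; the efficiency factors are hypothetical, only there to show how far below peak a sustained figure can land:

```python
BOARDS = 200
PEAK_PER_BOARD = 500e9   # flops, the figure quoted for one board

peak = BOARDS * PEAK_PER_BOARD
print(f"peak: {peak / 1e12:.0f} teraflops = {peak / 1e15:.1f} petaflops")

# Sustained speed is peak times an efficiency factor; these values are hypothetical.
for efficiency in (0.5, 0.1):
    print(f"{efficiency:.0%} efficient -> {peak * efficiency / 1e12:.0f} teraflops sustained")
```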

  24. I need this NOW!!!!!! by BiggestPOS · · Score: 4
    My study of gravity has long been hindered by a lack of computer power. I have a room full of UltraSPARCs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists. I get laughed out of all the conferences, people don't return my calls, I must simply have this machine to prove my theories. I'll take 4, wait, make it 3, I'll just overclock them. Hmm, now, about porting Linux to it.....

    --
    What, me worry?
  25. GRAPE-5 by Detritus · · Score: 3

    A paper (PDF format) on its predecessor, GRAPE-5, can be found here. It has more technical detail but it doesn't describe the architecture of the specialized processors. It won the 1999 Gordon Bell price/performance prize.

    --
    Mea navis aericumbens anguillis abundat
  26. What's the latest definition of "supercomputer"??? by Sir_Winston · · Score: 2

    Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop. Is that accurate? If not, what *is* the definition of a supercomputer these days?

    Which reminds me: if anyone is interested in the "flopsability" (to coin a silly-sounding word) of common x86 processors, visit http://www.jc-news.com/parse.cgi?pc/temp/TW/linpack for interesting, if practically useless, scores...

    --


    "The more corrupt the state, the more numerous the laws."--Tacitus, *The Annals*
  27. Special Problems. by szyzyg · · Score: 3

    As an astronomer who does these kinds of calculations, I should point out that this system isn't just specialised to one type of problem (N-body problems); it's specialised to N-body problems where N is very big. For example, our galaxy has about 100 billion stars in it; fully specifying their positions and velocities would require 4.8 terabytes of memory. We're still a long way away from that... but getting closer. Oh, and that's neglecting things like molecular clouds and suchlike, which have appreciable mass but aren't stars.
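    The 4.8 terabyte figure checks out, assuming six 64-bit (8-byte) numbers per star, three for position and three for velocity:

```python
STARS = 100e9            # roughly 10^11 stars in the galaxy
NUMBERS_PER_STAR = 6     # x, y, z position and vx, vy, vz velocity
BYTES_PER_NUMBER = 8     # assumed 64-bit double precision

total_bytes = STARS * NUMBERS_PER_STAR * BYTES_PER_NUMBER
print(f"{total_bytes / 1e12:.1f} TB")  # 4.8 TB
```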

    I have a cluster of Alphas crunching away at solar system models. GRAPE 6 couldn't actually do this very well, since it's designed for a certain N-body algorithm which doesn't suit small N... Instead I use a symplectic integrator, which takes advantage of a number of known factors in the problem.
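    A real solar-system symplectic integrator is far more sophisticated, but the core property shows up even in the simplest symplectic scheme. A minimal sketch, not the poster's method: kick-drift-kick leapfrog versus explicit Euler for one body orbiting a fixed mass, assuming GM = 1 and a circular orbit of radius 1. The leapfrog's energy error stays small and bounded; Euler's drifts steadily.

```python
import math


def accel(x, y):
    """Acceleration toward a fixed central mass at the origin (GM = 1 assumed)."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3


def energy(x, y, vx, vy):
    """Specific orbital energy: kinetic plus potential."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)


def leapfrog(x, y, vx, vy, dt, steps):
    """Kick-drift-kick leapfrog, a second-order symplectic integrator."""
    ax, ay = accel(x, y)
    for _ in range(steps):
        vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
        x, y = x + dt * vx, y + dt * vy                   # full drift
        ax, ay = accel(x, y)
        vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
    return x, y, vx, vy


def euler(x, y, vx, vy, dt, steps):
    """Explicit Euler: first order and not symplectic."""
    for _ in range(steps):
        ax, ay = accel(x, y)
        x, y, vx, vy = x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay
    return x, y, vx, vy


start = (1.0, 0.0, 0.0, 1.0)         # circular orbit, period 2*pi
dt = 0.01
steps = int(10 * 2 * math.pi / dt)   # about ten orbits
e0 = energy(*start)
for name, method in (("leapfrog", leapfrog), ("euler", euler)):
    e1 = energy(*method(*start, dt, steps))
    print(f"{name:8s} relative energy error: {abs((e1 - e0) / e0):.2e}")
```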

    So - we still need bigger and faster machines, but we also need more general machines...

    Anyway... I want one of these to model EKO (Edgeworth-Kuiper belt object) formation in the solar system.

  28. Re:bah! by Detritus · · Score: 2

    General purpose computers get their butt kicked in price and performance by custom silicon, assuming the task is well-defined and not too complicated. These get used a lot in signal processing and decoders for error correction codes.

    --
    Mea navis aericumbens anguillis abundat
  29. Re:What's the latest definition of "supercomputer" by Detritus · · Score: 4
    I would split them into two types, classic supercomputers like Cray vector systems, and massively parallel collections of microprocessor modules with high-speed interconnects.

    The problem with anything based on a microprocessor is the pathetic main memory bandwidth. If your program blows out the cache, the performance goes to hell.

    A vector supercomputer is designed to have massive memory bandwidth, enough to keep the vector processing units operating at high efficiency. No cache or VM to slow things down. An engineer once told me that a Cray was a multimillion dollar memory system with a CPU bolted on the side.

    See the STREAM benchmark web page for some measurements of sustained memory bandwidth. This separates the real computers from the toys.
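    A roofline-style estimate shows why memory bandwidth, not peak flops, decides performance on loops like the STREAM triad (a[i] = b[i] + q*c[i]), which does 2 flops for every 24 bytes moved (as STREAM counts it, with 8-byte doubles). The two machines below are hypothetical, not measurements:

```python
# STREAM "triad": a[i] = b[i] + q * c[i]
FLOPS_PER_ELEMENT = 2        # one multiply, one add
BYTES_PER_ELEMENT = 3 * 8    # read b and c, write a (64-bit doubles), as STREAM counts it


def attainable_gflops(peak_gflops, bandwidth_gb_s):
    """Roofline-style bound: the lower of compute peak and what memory can feed."""
    memory_bound = bandwidth_gb_s * FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT
    return min(peak_gflops, memory_bound)


# Hypothetical machines: (peak GFLOPS, sustained memory bandwidth in GB/s).
machines = {"cache-based micro": (1.0, 0.5), "vector machine": (2.0, 40.0)}
for name, (peak, bw) in machines.items():
    g = attainable_gflops(peak, bw)
    print(f"{name}: {g:.3f} of {peak} GFLOPS peak ({g / peak:.0%})")
```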

    --
    Mea navis aericumbens anguillis abundat
  30. Quick-to-diss IBM by alexburke · · Score: 2

    David Ayd, a supercomputing manager at IBM, says "the GRAPE 6 computer appears to be based on a very old model. In the 1970s and '80s these vector models were developed in Japan for problems like simulating weather and plane mechanics, he said. The difference today is that the computers can do the jobs at 100 times the speed or faster."

    Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too. But:

    Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.

    Well, we really can't argue with that, can we, Mr. Ayd?

    --
    "Give him head?" ... "Be a beacon?"

    "One World, one Web, one Program" - Microsoft Ad

  31. FIR Filters and Neural Networks by Baldrson · · Score: 3
    Back in 1989, I cut a deal with Datacube whereby, in exchange for testing their new image flow software, I was allowed to hang a bunch of their Finite Impulse Response filter boards together and achieve several billion operations per second doing neural image processing. FIR filters do sum-of-weighted-products calculations on sequences of data (in this case, a rectangular region of interest of video data) and do them all in hardware, at a constant rate. So the peak rate is the same as the average rate. This allowed one to train the system to recognize, at a blazingly high speed, features that could not be extracted via analytic algorithms. Unfortunately, even though the system would only have cost around $200,000 at that time, the only market interest was from government shops with serious Not Invented Here cultures.
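    A sum-of-weighted-products filter over a rectangular region of interest looks like this in plain Python. It is a hypothetical sketch, not Datacube's hardware or API: the point is that every output costs exactly the same number of multiply-adds regardless of the data, which is why a hardware FIR runs at a constant rate.

```python
def fir_2d(image, kernel):
    """Valid-region 2D FIR filter: each output is a sum of weighted products.

    Like most image-processing code, this slides the kernel without flipping it
    (correlation); for symmetric kernels that is the same as convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            acc = 0
            for i in range(kh):          # always kh * kw multiply-adds per output
                for j in range(kw):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


roi = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
box = [[1, 1], [1, 1]]   # hypothetical 2x2 kernel
print(fir_2d(roi, box))  # [[9, 18, 9], [18, 36, 18], [9, 18, 9]]
```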

    I haven't followed the progress in the field since then, but I suspect present day hardware could handle a good fraction of the satellite image feeds affordably -- and dwarf the realized performance figures of this gravitation board.

    Of course, if you want to get really picky about it, there are lots of specialized circuits out there doing work all the time all over the place that could be viewed as "computation" at enormous rates -- it all depends on where you draw the line.