GRAPE6, Now With GNU/Linux Frontend, At 32 TFlops
You can also get a baby-grape, see pictures on http://www.astro.umd.edu/~teuben/pics/japan/09/p70 90014.html which runs a good fraction of a TFlop, and will cost somewhere around 10k$.
I have some more pictures on http://www.astro.umd.edu/~teuben/pics/japan/08/ which show the 1/4 size Grape6 running 32 Tflop. The final full version would cost about 1M$. Compare that to the ASCI White at 12 Tflop for 100M$. Drawback of course is that the Grape only computes things similar to the gravitational N-body problem (also useful for pharmaceutical industries).
Btw, also spent some time in Akihabara on Sunday. I guess we're deprived on the US east coast; the number of DVD writers you can get here is amazing. Also very popular here seem to be all kinds of embedded units, e.g. the GPS in your car to not get lost in Tokyo!
There was an ABC news story earlier in the year on the GRAPE, but at the time it was running Alphas with their Unix. They have now fully switched to Linux, and this system has been running since July 5.
The interesting thing to me here is how well some simple special-purpose hardware can do at certain classes of problems. This sort of flies in the face of the trend towards general COTS hardware and general languages for computation.
The last time I saw cool and useful specialized hardware was the EFF's cracking machine that won the distributed contest.
We talk about, for example, Java being fast enough to compete with compiled languages, but the fact of the matter is that when you really just need raw computational speed, a general system on standard PC clusters could not achieve anywhere near 32 TFlops peak performance. I think some other people mentioned that SIMD will get you in the GFlops range, but that is 3 orders of magnitude below the GRAPE machines.
Before Seymour Cray was killed, one of the last things he was working on was a project aiming for Petaflops performance. You can see just what a high goal that still is. (A Petaflop is 1000 Teraflops!)
I remember when transputers used to be advertised a lot in Byte and other computer magazines. I wonder if we'll ever see a return of something similar. The GRAPE seems pretty specialized, but something a little more general like FPGA add-on boards might be a good way to get good price/performance on a PC base (i.e. using a PC cluster instead of an expensive supercomputer). The applications would be limited to computationally intensive things. But, for example, 3D rendering for movie animation might be better done on more specialized hardware.
-Kevin
The write-up for this article is just a tad bit misleading. The 32 TFLOPS figure is the "theoretical peak". This is a favorite number for hardware manufacturers to quote, since the theoretical peak far, far exceeds what anyone will see in practice, even when solving the most amenable of problems. To suppose that this hardware will get anywhere near 32 TFLOPS during actual use is just nuts.
:)
(Yes, I know that it's limited hardware. It's just sorta expected.)
It looks like the GRAPE boards should be re-configurable. Why'd they do it with custom chips?
A whack of FPGAs should be pretty decent, and you could configure them for more than just the N-body gravitational problem.
BTW, Akihabara is over-rated. There's a WHACK of stuff there that we don't get in North America. But wander for a few hours, and you soon realize that the same store exists on every block, repeated over and over and over...
Besides, Akihabara doesn't usually have the best prices. I loafed around all over Tokyo, and it usually has the highest prices. Just pop over on the train to Ikebukuro or something; they'll have the same mega chain stores (just not repeated every block) and usually lower prices.
I found digital cameras weren't cheaper or better. The MD players kick ass! 320min playtime per disk now in about the size of 3 3.5" floppies stacked up.
And then there's the colour, digital cell phones about 1/2 the size of ours for about $10-50US. Woohoo!
Checked out the latest Alteras?
http://www.altera.com/products/devices/apex2/ap2-index.html
One thing you get with an FPGA is parallelism: you can have as many execution units as you have gates to implement them, and if you need more, add chips. You do have to pay the IO penalty, but it can work out to single-cycle operations without any of the pipeline stalls you get in a general-purpose processor.
The other nice thing is that most of these parts are reprogrammable, so algorithm tweaks are possible.
Starman97@Gmail.com (bring it on spammers)
Maybe now we can compute the emotional state of your girlfriend's mind, and know when to just hit the doghouse before we hit the door.
make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
That is, in six years we will hit that, and in 7 years beat that limit. Now to see who's right.
make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
That's how they do work. A 100Tflop grape6 machine would have 3072 individual units powering it.
make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
Grape6 is 32Tflops, not 32Gflops. You are off by three orders of magnitude. COTS cannot achieve anywhere near this level of speed currently.
As a matter of fact, my advisor (who's at that conference - leaving me back here to play with my new laptop while my workstation analyzes simulations for me) has one. Just 4 nodes with one board each for now to get the code working, but will be scaled up when we're confident in the code. :-)=
[TMB]
If you read the paper at http://astrogrape.org about the prototype GRAPE6 system, then you would have noticed that, according to the paper, when they actually did a simulation of the evolution of a galactic nucleus containing triple massive black holes, they only got about half the theoretical peak performance of the prototype.
We're running some tests later this month, shipping data from the Tokyo GRAPE farm across TransPac to the Indianapolis HPSS silo, testing differentiated QoS [another whole thread, involving Napster and GriPhyN]. The idea is to eventually send slices of the data on to the American Museum of Natural History Planetarium in New York, linking three "specialized instruments."
Yow! These "*BSD is Dying" posts are getting weirder and weirder...
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
I should have waited for more of the grape website to actually load, and installed Chinese text support. I am dumb and apologize for wasting your time. Below is some useless crap I concocted to try to save my ass from looking like an idiot. I failed, just like I failed out of college and at just about every other aspect of my miserable fucking life. It is funny how after a while you get used to it, and sleep a lot to pass the time--because if you don't, you end up wanting to impale your head on a pointy wrought-iron fence and just be done with it.
I guess the specialized boards are doing far more floating point operations per set of data sent to them than I thought. I didn't see how they could do this without saturating the bus to the number crunching hardware -- or especially saturating the PC's CPU itself, because it would have to send the information to the boards, retrieve the results, and do I/O.
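A rough way to see why the bus need not saturate: in a direct-summation force calculation, each particle's data crosses the bus once, while the board evaluates a force for every pair. The sketch below is only an illustration under assumptions that are not from this thread: 38 floating-point operations per pairwise interaction (one common counting convention), a hypothetical 64 bytes moved per particle, and a simplified picture where all particles are updated together.

```python
# Assumed values for illustration only; neither comes from the thread.
FLOPS_PER_INTERACTION = 38   # assumed counting convention
BYTES_PER_PARTICLE = 64      # hypothetical: data sent plus results returned


def flops_per_byte(n_particles):
    """Floating-point work done per byte moved across the bus."""
    interactions = n_particles * (n_particles - 1)  # every ordered pair i != j
    flops = interactions * FLOPS_PER_INTERACTION
    bytes_moved = n_particles * BYTES_PER_PARTICLE
    return flops / bytes_moved


for n in (1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: {flops_per_byte(n):,.0f} flops per byte transferred")
```

The work grows as N squared while the traffic grows as N, so the ratio improves linearly with the particle count.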
man tunefs | grep fish
I find it hard to believe that any 1.7 GHz PC can pull off 8 Teraflops, as the article states (32/4). A single Gigaflop is attainable, but depends on what FLOPS weighting they are using, e.g. flops1, flops2, etc., which have different amounts of floating point divides, additions, and the like.
-Mike (on a 24 MegaFlop Indigo2)
man tunefs | grep fish
the kind of RC5 rate one of these would get!
I think there's a few corrections necessary.
Ha! I kill me!
Sounds like a catapult.
Drawback of course is that the Grape only computes things similar to the gravitational N-body problem (also useful for pharmaceutical industries).
So? If it's Turing-complete I can read slashdot on it -- or any other app.[1] Just a question of how long...
after all, linux itself was a hack to get unix onto x86...
[1] of course Slashdot will run equally slowly. But imagine your {FPS title} frame rate!
~
Something is wrong, I could actually read the article, and... see the pictures! wow.
room101 -- how much can you stand before they break you?
(they always break you eventually)
It's not being served by the GRAPE6.
The Technonaut
Aherm...it's not always best, humor-wise, to go for the low-hanging fruit...
Although it breaks my heart to see my precious Karma go, I do believe that the 'Redundant' moderation of my earlier post is most appropriate!
I hate to correct someone, but a dual-G4 set-up running at 733 MHz will get you 7 gigaflops. Not too shabby, now is it?
do they have seeds?
Having a vineyard would be quite the cluster of GRAPEs.
...through the GRAPEvine?
Worldcom - Generation Duh!
Reason is the Path to God - Anon
Fast?
Nope... the GRAPES are each one sweet system!
Ever need an online dictionary?
Enough people did answer that, but in that vein I should comment that a colleague of mine added some assembler code (incidentally for the same N-body code). He's been using the 3DNow SIMD instruction set on the AMD directly, and was able to get about 2 billion PP interactions in 45 seconds, which translates to about 2 Gflop on a 1.2 GHz AMD (his math). With 8-10 such Athlons they could compete with the Grape5 in speed. Of course that's still far from the Grape6 speed. But depending on your problem and budget, you can still get pretty far with COTS.
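The conversion from particle-particle interactions to Gflop depends on how many floating-point operations you count per interaction, which the post doesn't state. As a hedged check of the arithmetic, the sketch below tries two values: 38 (a commonly used convention, assumed here) and 45 (simply the value that makes the quoted numbers come out at 2 Gflop).

```python
def gflops(interactions, seconds, flops_per_interaction):
    """Sustained Gflop rate for a given operation count per interaction."""
    return interactions * flops_per_interaction / seconds / 1e9


interactions = 2e9   # "about 2 billion PP interactions"
seconds = 45.0       # "in 45 seconds"

print(f"{interactions / seconds / 1e6:.1f} million interactions per second")
for fpi in (38, 45):  # assumed operation counts, not stated in the post
    rate = gflops(interactions, seconds, fpi)
    print(f"{fpi} flops/interaction: {rate:.2f} Gflop")
```

Either way the result lands in the 1.7 to 2 Gflop range, so "about 2 Gflop" is a fair summary of the measurement.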
Actually, I think it would be called a "bunch" of GRAPEs.
--
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
FPGA's are, unfortunately, slow.
They're made with a worse process than custom chips. For inner loops, you want as fast as you can get. You pay for programmability, and if it's always the same task, special-purpose is best.
It's like the difference between hand-assembled code and a compiler. You get it easier with the compiler, but hand-assembling can be better when you know the specifics.
The n-body gravitational problem is going to be around for a while, so it makes sense to customize to it.
What??? A machine like that would cost one Microsoft? Either I have been sleeping thru all this time while inflation is running rampant, or M$ is not worth that much anymore.
Give up?
A: Nothing, it just made a little Wine.
Keeping
Does this mean that other OSes (cough, Windows, cough) should have sour grapes?
"It's comin' back around again..." -RATM
How slashdot slows scientific progress in the world:
1. Oh look, an interesting story on academic research on slashdot.
2. Oh look, a lovely link to those poor academics' website. Surely they have the $40k necessary to make a server that can handle the load from slashdot?
3. Oh look, the reeking Sun Ultra 5 that they were using for web duties has burst into flame, destroying the lab and scaring a small puppy that lives in the lab next door.
To hell with you slashdot for burning puppies.
A: GRAPEs and chess-playing computers, such as the one that tackled Kasparov (Deep Blue?), both accomplish their opening-up-of-cans-of-mathematical-whoopass via the same approach: functions in the innermost loops are done via calls to special-purpose hardware cards. The rest is done with software.
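To make that split concrete, here is a minimal pure-software sketch, not the GRAPE group's actual code. `accelerations` is the O(N^2) innermost loop that a special-purpose card would take over, and `leapfrog_step` is the cheap O(N) host-side work with the force routine pluggable. The function names, the leapfrog integrator, the softening `eps`, and units with G = 1 are all assumptions chosen for brevity.

```python
import math


def accelerations(pos, mass, eps=1e-3):
    """Innermost O(N^2) loop: the part a special-purpose card would replace."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps * eps  # softened
            m_over_r3 = mass[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += m_over_r3 * d[k]
    return acc


def leapfrog_step(pos, vel, mass, dt, force=accelerations):
    """Host-side O(N) work: kick-drift-kick, updating pos and vel in place."""
    acc = force(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]   # half kick
            pos[i][k] += dt * vel[i][k]         # drift
    acc = force(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]   # second half kick


# Two equal masses, initially at rest, one unit apart.
pos = [[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
leapfrog_step(pos, vel, [1.0, 1.0], dt=0.01)
print(pos, vel)
```

Swapping in different hardware would, in this picture, mean passing a different `force` callable while the rest of the program stays the same.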
So, say I take a GRAPE, and replace its special N-body gravitational daughtercard with one containing a few FPGAs programmed for, say, RC5; now I have a cracking machine. And then reprogram the FPGA to do image manipulation instead; now I have a renderer to make my own Toy Story. And then reprogram the FPGA to do, etc, etc.
Of course, I'm still lacking the software. So actually this post is mostly babbling. :-)
You cannot apply a technological solution to a sociological problem. (Edwards' Law)