500 Billion Very Specialized FLOPs
sheckard writes: "ABC News is reporting about the world's fastest 'supercomputer,' but the catch is that it doesn't do much by itself. The GRAPE 6 supercomputer computes gravitational force, but needs to be hooked up to a normal PC. The PC does the accounting work, while the GRAPE 6 does the crunching." The giant pendulum of full-steam-ahead specialization vs. all-purpose flexibility knocks down another one of those tiny red pins ...
Then you could say bye-bye to rc5-64. Perhaps before long you could eat rc5-64s like popcorn and go on to the other challenges at RSA.
I guess this one is a little faster though...
Every expression is true, for a given value of 'true'
So along come some doods who said why don't we recursively stick the particles into boxes and then calculate the attraction between the boxes instead and it should be a lot faster. So they tried it and it seemed to work great- it only takes more like 10,000 calculations to do 1000 particles.
Anyway along came some other guys and they were a bit suspicious. They showed that some galaxies fell apart under some conditions with the recursive boxes method, when like they shouldn't. Back to the drawing board.
There are some fixes for this now. They run more slowly, but still a lot faster than the boring way. Still, it's better than the end of the universe. Even if it is only a toy universe.
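To put rough numbers on the two approaches, here is a minimal Python sketch. It assumes the "boring way" means summing the force over every pair of particles, and that the recursive-boxes method costs on the order of N log2 N interactions (the usual textbook estimate for tree codes, not a figure taken from this post). The function names are made up for illustration.

```python
import math

G = 6.674e-11  # Newton's gravitational constant, SI units


def direct_accelerations(pos, mass):
    """The 'boring way': every particle against every other particle."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from particle i to particle j
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc


def direct_cost(n):
    """Number of pairwise force evaluations in the loop above."""
    return n * (n - 1)


def tree_cost(n):
    """Assumed N log2 N estimate for the recursive-boxes method."""
    return round(n * math.log2(n))


print(direct_cost(1000))  # 999000
print(tree_cost(1000))    # 9966
```

The N log2 N estimate lands close to the "more like 10,000 calculations" quoted above for 1000 particles, against roughly a million for the pairwise loop.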
For descriptions of loadsa algorithms, including 'symplectics' which are able to predict the future of the solar system to 1 part in 10^13 ten million years in the future check out this link:
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"(sorry, i had to.)
Hmph... while some people worry that it is single purpose, they miss the fun... these people made a really fast computer. That's cool, by itself. It was created at the University of Tokyo, so it is obviously research, and not done as a cost-effective solution. I'm sure people can take lessons learned from this machine and eventually apply them to a broader market.
And having it controlled by a PC is no stranger than having your accelerated video card controlled by your computer while it just does the 3D video calculations. =^)
-legolas
i've looked at love from both sides now. from win and lose, and still somehow...
Perhaps anything that requires liquid cooling
and comes bundled with two onsite engineers
should be called a "supercomputer"
(the Jobs 'reality distortion field' G4 ads notwithstanding)
or perhaps anything that can crunch thru a
SETI data block in 10 minutes!
MAB
A machine that massive is likely to have its own gravitational field and throw off all the calculations!
tee hee
Seymour Cray's early supercomputers used DEC computers as front ends. The i/o for a Cray was a single connector. The i/o and housekeeping for the Cray, a vector computer, were done by the connected DEC. Seymour Cray was a pioneer in the field of making computers that do one thing, do it well, and do it very quickly.
We know 10^12 is tera. But did you know 10^15 is peta and 10^18 is exa?
Will I retire or break 10K?
What they meant by specialised is not that it is special because it uses a slower machine to feed it. What they mean is that the hardware is special in that it can only perform certain instructions. Normal computers can do general equations, but this one has special hardware that makes it do certain operations faster. Think of it this way: if you were to study 10 different languages, you probably wouldn't become very fluent in any of them. If you were to learn only 1 language, on the other hand, you would get really proficient in that language. This machine knows only one "language", which makes it faster, and that is why it is special.
Make a "dust particle" the size of the Moon, stick it in deep space, and you have a lot of mass for your visible cross-section.
:-)
Nobody really knows how much of that stuff is out there. We know something is there that we don't see from the gravity it puts out, but that doesn't mean it has to be something truly exotic.
Cheers,
Ben
Printers? :)
With all this talk of special gravity-computing pipelines, does anyone know if the hardware design is a systolic array? If not, what is it?
Gee I'm dumb. As several people pointed out, it's 100 Teraflops. Well, so much for my theory, "I don't get any dumber after a few beers."
As a consolation, here's a link to IBM's Blue Gene supercomputer. It's still about 5 years off, but it will likely be the first Petaflop computer. It's being built specifically to solve a single problem--modelling protein folding. The best bit is that even at a petaflop, it will take about a YEAR to simulate a single protein.
The truth about gravity is very interesting. However, my knowledge cannot be passed on to you because my life holds greater value than the dissemination of this info (from my point of view). I apologize for my selfishness, but must point out that this what society has taught me.
Search here.
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
Is this a new thing? NewTek had something called a Screamer a few years back that did the same thing for rendering in LightWave.
There is also another product, whose name I can't remember, that acts like a rendering farm for 3D Studio. It has some custom rendering chips and an Alpha for controlling it all. It actually runs Linux...
Hey, if we want to go on: the older multi-processor Macs had the second processor acting as a slave to the first one.
I'm sure there are lots more examples; the story just made me think back on some cool rendering farm solutions that I have come across.
Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but it has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and ethernet cabling for a fraction of the cost.
- One apple
- One planet
The calculations are all but immediate, and the results are impeccable. The planet is actually pretty expensive, but you can borrow it free of charge.
--
Sheesh, evil *and* a jerk. -- Jade
It doesn't really say in the article, but it sounded like they didn't use relativity and only used Newtonian forces. Any comments on how accurate the results will be, and whether definitive statements are possible (for example, "this galaxy will never collide with this one, even with relativistic effects")?
EFF's Deep Crack crypto supercomputer supplied 1/3 of the computing power in the latest distributed.net DES challenge. Now, if it could be rebuilt for RC5-64...
I'm sort of from the graphics department, and I see the same problems. Right now, the biggest problem for all the graphics hardware people is the bandwidth to the graphics cards, and basically there are 2 answers to that: we are going to see faster memory types (duh!) and embedded RAM. This means that the memory is inside the graphics chip.
PlayStation 2 has this, and that is why it has a massive bandwidth of 48 gigs per second. Bitboys has the same technology for the PC, so let's hope they can actually release something.
I would like to know if anyone is working on a processor with embedded RAM?
Another thing is the AGP bus, which is just getting way too slow, and I guess that's up to Intel to do something about.
Simple: various tasks need different amounts of bandwidth between the nodes to perform the calculation. For distributed.net and SETI@home, every data block is completely independent - the nodes don't need to communicate at all, so you just pipe the work units over the Internet.
Most problems don't break up this well, though - individual parts of the problem can interact with their neighbours, meaning individual nodes need to communicate with each other fairly quickly - a Beowulf cluster, for example. Lots of normal PCs on a fairly fast LAN.
Then, you have a handful of BIG number-crunching problems - like this one - where every part of the problem interacts with every other one. Think of it like a Rubik's cube: you can't just work one block at a time, you need to look at the whole object at once. This takes serious bandwidth: the top-end SGI Origin 2800s run at something like 160 Gbyte/sec between nodes (in total).
Here in Cambridge, the Department of Applied Mathematics and Theoretical Physics has an SGI Origin 2000 series box with 64 CPUs - homepage here. (There's a photo of Stephen Hawking next to it somewhere on that site - this is his department.)
Basically, there are jobs clusters of PCs just can't handle. If the choice is between a $100k Beowulf cluster that can't do the job, and a $10m supercomputer which can, the latter is much better value.
Sure if you have the money to burn, go custom. But most of the computing projects out there do not require that kind of "big iron" and couldn't even afford it if they did. Besides, most of the time (unless you are in the DoD or NSA or such-like) you only end up with a small slice of that "big iron" which may or may not be roughly equivalent to being able to run your proggies on a computer that is all yours 24/7.
You're right - most projects don't need this kind of hardware. Some projects - including this one - do need it - either they cough up the big $$$, or the job doesn't get done.
Also, it sounds like you're arguing about ASICs vs. CPUs, which is not what this is about at all. ASICs obviously are enormously useful (witness their vast dominance in the market), but it has nothing to do with whether or not you buy some custom supercomputer from SGI or build one yourself out of PCs and ethernet cabling for a fraction of the cost.
You can't build yourself a supercomputer out of PCs and Ethernet. You can build a cluster which will do almost all the jobs a supercomputer can - but not all of them. Some jobs need a supercomputer. A few very specialised jobs need even more muscle - like this one. It uses custom silicon, because that's the only way to get enough CPU horsepower.
Do you not get it? This object only does one thing, can only do one thing, and is unable to do anything else. "Other calculations" are not possible because the algorithms are coded in silicone.
No one seems to understand the gravity of the situation.
--
Infuriate left and right
Anyway, my point in all that was that the Crays are designed for general purpose computation, even if they aren't designed to be as general as, say, database servers.
Go Badgers! -- #include "std/disclaimer.h"
-------
CAIMLAS
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
From following that link, it's amazing how cheap this thing is (GRAPE 5).
$40K including an Alpha host and software. Only $10K for the actual superCruncher. Plus it's small, so it shouldn't suck up that much power. This is much more powerful than a cluster of 5-7 Linux PCs.
It's the GRAPE boards which are specialized. All they can do is calculate gravitational potentials between particles, nothing else.
The only problem with previous versions of GRAPE (that I know of) is that their precision is a little lower than you'd really like or need for some applications, but otherwise they are very nice for doing large N-body sims.
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
... the Thinking Machines CM-5 ... used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
Yup. The Sun Enterprise 10000 (AKA "Starfire") uses a dedicated Ultra 5 as the console/management station. It connects via dedicated ethernet to the Starfire.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop.
I always liked the definition, "Any computer that is worth more than you are."
;-)
roughly eight years ago, i was taking plasma physics, and the professor, who i had a long relationship with (after grading homework for his pascal classes) revealed that galactic simulations (work done at boston university/harvard) had finally achieved something remarkable...
entropy. though most simulations suffer from reversibility (i.e. the system dynamics can be reversed: the simulated system evolves from state_x to state_y, but state_x can be determined exactly from state_y), researchers finally designed simulations that were not reversible (and the entropy correlated so well with theory you could derive boltzmann's constant).
anyway, that's how i remember it, a passing comment from a class i really dug, but somewhere after debye shielding, i got lost--tensors can be rather difficult if you've spent most of your time writing code and designing circuits (hohlfeld, bless you, wherever you are ^_^;)
The CM-1 and CM-2 Connection Machines had the same basic idea. The CM-5 was a bit different -- it still had a front end, but the individual processors could be booted to run UNIX (SunOS), and in general were a bit more independent. The CM-1 and CM-2 were pure SIMD. This was actually quite a popular approach in the 1980's; there were lots of startups trying to do much the same thing, ultimately with even less success than Thinking Machines.
A lot of us who had been at TMC in the 1980's liked the CM-2 much more than the CM-5. Architecturally it was very clean. The CM-5 was a much more complicated machine.
The term is "symplectic integrator." You can check out the book "Dynamical Systems and Numerical Analysis" by A.M. Stuart and A.R. Humphries for an introduction and some references. The term refers to an ordinary differential equation solver that preserves the symplectic structure of the evolution semigroup of a Hamiltonian system. (Compare with Hamiltonian conserving methods). Such methods can be more accurate than general ODE solvers applied to a Hamiltonian system.
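For a feel of what "preserves the symplectic structure" buys you in practice, here is a minimal Python sketch. It assumes the simplest possible Hamiltonian system, a unit harmonic oscillator with H = p^2/2 + q^2/2, and compares plain explicit Euler (a general ODE method) against leapfrog, which is one well-known symplectic integrator. Neither the test problem nor the choice of leapfrog comes from the book or the earlier post; the function names are invented for illustration.

```python
def energy(q, p):
    """Hamiltonian of a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * p * p + 0.5 * q * q


def euler_step(q, p, dt):
    """Explicit Euler: a general-purpose step, not symplectic."""
    return q + dt * p, p - dt * q


def leapfrog_step(q, p, dt):
    """Leapfrog (kick-drift-kick): a second-order symplectic step."""
    p_half = p - 0.5 * dt * q           # half kick with the force -q
    q_new = q + dt * p_half             # full drift
    p_new = p_half - 0.5 * dt * q_new   # second half kick
    return q_new, p_new


def energy_drift(step, steps=10000, dt=0.01):
    """Relative energy error after integrating from q=1, p=0."""
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    for _ in range(steps):
        q, p = step(q, p, dt)
    return abs(energy(q, p) - e0) / e0


print("Euler drift:   ", energy_drift(euler_step))
print("Leapfrog drift:", energy_drift(leapfrog_step))
```

With these settings Euler's energy grows by a factor of (1 + dt^2) every step, so it drifts without bound, while the leapfrog error stays small and bounded. That long-term behaviour is what matters when you integrate orbits over millions of years.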
So, as far as I can tell, the poster made a typo but he isn't bullshitting. But you are probably a troll, so I'm not sure why I'm bothering.
They're coded in silicon, unless the makers of the machine have allied themselves with Dow Corning or something... :)
sorry for the formatting. Disregard.
500 billion is a lot of FLOPS. I wonder how it would handle overclocking?
The only question is: "Did they join the /. team on d.net?" ;o)
StarTrek.org Free Webmail
And another item... A FLOP is a FLOP is a FLOP. If it can do a floating-point operation for one thing, it can do it for another.
Uh... running a supercomputer from a less-powerful computer is nothing new, and certainly doesn't make it 'specialised'. Historically, the Cray T3D used a Cray Y-MP as a front-end, and the Thinking Machines CM-5 (and CM-200, I think) used Sun servers. I'm sure there are others that used less-powerful systems to run mathematical behemoths.
If you re-read the article, you'll see that 500 billion is just ONE OF THE BOARDS in the GRAPE. There are going to be 200 boards in this puppy, making for a machine that's getting 100 petaflops.
Damn fast!
Custom-designed, ultra-high-performance machines on the order of a teraflop will still have their place at the top of the pile crunching stuff like quantum chromodynamics, simulated nuke blasts, and whatnot, but the land of the middle-of-the-line custom-built number crunchers (from SGI, Sun, IBM, etc.) is quickly eroding.
It's 100 teraflops
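The arithmetic is quick to check. A small Python sketch, using the figures quoted in this thread (500 billion FLOPS per board, 200 boards) and the decimal SI prefixes; the variable names are just for illustration:

```python
GFLOPS = 10**9
TFLOPS = 10**12
PFLOPS = 10**15

per_board = 500 * GFLOPS   # "500 billion" per board, as quoted above
boards = 200               # planned board count, as quoted above

total = per_board * boards
print(total / TFLOPS)  # 100.0
print(total / PFLOPS)  # 0.1
```

So the full machine comes to 100 teraflops, or a tenth of a petaflop.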
___
Installing Grape 6
Processor of gravity
Quake sure feels real now
As has been said, this is a very old concept: if you want specific tasks done, you build a specialized processor. Now all we need to do is build a GRAPE 7 for SETI or Distributed.net.
Welcome to the Entropy Bar, may I take your order?
That will help a lot...umm...while landing at Neptune some day.
At Drexel University, I had an opportunity to work with a machine based on the GRAPE 4 architecture, and let me tell you, this thing is amazing. Granted, it can only do one thing: take in initial conditions and spit out forces (no if/thens or even add/multiplies here), but it does it FAST! We have two supercomputers in one server room: a 64-node Beowulf cluster and a GRAPE machine. For the type of calculations GRAPE is designed for, it is about a hundred times faster than the Beowulf cluster, all in the size of a mid-tower PC case. Abso-frickin-lutely amazing! Not to mention the fact that our GRAPE system cost us about US$10,000, compared to MUCH MORE for the Beowulf cluster (I don't have the number on hand). THAT'S what I call price/performance.
Tell a man that there are 400 Billion stars and he'll believe you
tera == 2^40; peta == 2^50; exa == 2^60; address space of a 64-bit machine == 16 exabytes
Processors with embedded RAM's have been under research for some time. Check out the IRAM project at Berkeley and the PIM project at University of Michigan and elsewhere. Despite all of the research, though, Processor-in-memory hasn't made it into general use yet.
There are many problems with implementing a system like this in practice. The fabrication process used for DRAM's is completely different from that used for logic. In general, for DRAM you want a *high* capacitance process so that the wells holding your bits don't discharge very quickly -- that way you can refresh less often. In logic you want *low* capacitance so that your gates can switch quickly (high capacitance -> high RC time constant -> slow rise/fall time on gates -> slow clock speed).
Fabricating both with the same set of masks doesn't work particularly well, so you really have to compromise -- you'll basically be making a processor with a RAM process, or vice-versa. Alternately, you could use SRAM, which is nice and fast and is built with a logic process, but is 1/6th the storage density of DRAM. This is why SRAM is used for caches and DRAM is used for main memory.
Having the memory on the same die as the processor definitely gives a bandwidth and latency advantage. For instance, when you are on the same die, you can essentially lay as many data lines as you like, so you can make your memory interface as wide as you like.
But another large advantage is the power-savings. Processors consume a great deal of their power in the buffers driving external signals. Basically, driving signals to external devices going through etch is power-expensive, and introduces capacitances that kill some of your speed. Keeping things on die, no such buffers are needed, and a great deal of power is saved.
The first commercial application of the processor-in-memory concept that I am aware of is Neomagic's video cards. They went with PIM not for bandwidth, but for power-conservation, and chip reduction. These characteristics are extremely appealing to portable computing, and thus Neomagic now pretty much owns the laptop market.
In a limited application, such as a 2D graphics card, this is feasible because the card only needs perhaps 4 MB of memory. Placing an entire workstation's main memory (say, 128 MB) on a single die *with* a processor would lead to a ridiculously massive die. Big dies are expensive, lead to low yield and increase design problems with clock skew. Thus, having 128 MB of DRAM slapped onto the same die as your 21264 isn't going to happen in the near future.
Placing a small (4-8 MB) amount of memory on-die, and leaving the rest external is possible, but leads to non-uniform access memory, which complicates software optimization and general performance tuning greatly. It is generally considered undesirable.
Another approach is to build systems around interconnected collections of little processors, each with modest computing power and a small amount (say 8 MB) of memory. Thus, you are essentially building a mini-cluster, where each node is a single chip. This, too, leads to a NUMA situation, but it is more interesting, and many people are pushing it.
PIMs are going to be used more and more, and the massive hunger for bandwidth in 3D-gaming cards may very well drive it to market acceptance. The power consumption advantages will continue to appeal to portable and embedded markets as well. However, general purpose processors based on this design are unlikely in the near future. This style of design doesn't mesh well with current workstation-type architectures.
A bit of a tangent, but I hope it was informative...
--Lenny
"...I have a room full of ulttra-sparcs crunching away day and night and I can't get anywhere. I can't even prove for sure that gravity exists..."
The solution is trivial.
1. Carry Ultra-Sparc to building rooftop.
2. Drop Ultra-Sparc off building rooftop.
3. If results are disputed, request that critic stand at base of building. Repeat steps 1 & 2.
What, me worry?
Perhaps I'm naive, but when they say that this computer is exclusively for calculating gravitational interactions, why could you not make some data substitutions and use it for different calculations?
Step 1) Acquire data on the purchasing behavior and demographic info of a couple of million consumers from some unscrupulous web retail site.
Step 2) Get a few scaling variables on the front- and back-end, replace stellar mass with income, replace stellar velocity with purchasing habits, replace stellar cluster density with population density (or proximity to retail outlets), etc., etc.
Step 3) Run the system to model consumer purchasing decisions for a product you're planning to introduce into the marketplace.
Surveys measure economic activity on a large scale and make broad predictions. Could this be used to more accurately model and predict economic behavior on a more precise scale? The data would be constantly updated, and the models would be constantly rerun to get the most accurate picture possible of how you and I will spend our $$$. Just make sure that the observed isn't aware of the observation, or your models lose their viability.
The man who does not read good books has no advantage over the man who cannot read them. - Mark Twain
Being an IBM employee, I feel the need to stand up for the good Mr. Ayd :).
Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too.
Seriously, I think David misclassified GRAPE 6 quite a bit. I don't think it's quite David's fault, because the article writers don't know the difference between 'supercomputer' and 'attached processor'. ABC News didn't really apply the term 'supercomputer' correctly either.
The term 'supercomputer' is more of a marketing term than anything else. Technical people only use it when they want to describe a general capability. AFAIK there is no concrete definition of 'supercomputer', and if there were, it would likely change daily. GRAPE 6, from the information I can see, is really an attached processor.
Attached processors can be anything from an ARM chip on your network card to a GRAPE 6. Internally, GRAPE 6 is a full custom, superscalar, massively pipelined, systolic array (say that 5 times fast). That basically means that data comes in one side of the board, and after n clock cycles the answer comes out the other side. There is no code other than a program running on the host computer which generates and consumes data, and every piece of the algorithm is done in hardware.
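The "data in one side, answer out the other side n clocks later" behaviour can be mimicked in a few lines of Python. This is only a toy model of a linear pipeline, with one register per stage and one new input accepted per clock. It says nothing about how the real GRAPE boards are wired, and `run_pipeline` and the three example stages are invented for illustration.

```python
def run_pipeline(stages, inputs):
    """Clock a linear pipeline and return (cycle, result) pairs.

    stages is a list of one-argument functions, one per pipeline stage.
    An empty register is represented by None.
    """
    regs = [None] * len(stages)   # output register of each stage
    feed = list(inputs)
    outputs = []
    cycle = 0
    fed = 0
    while fed < len(feed) or any(r is not None for r in regs):
        cycle += 1
        # Update from the last stage backwards so that every value
        # advances exactly one stage per clock.
        for s in range(len(stages) - 1, 0, -1):
            prev = regs[s - 1]
            regs[s] = None if prev is None else stages[s](prev)
        if fed < len(feed):
            regs[0] = stages[0](feed[fed])
            fed += 1
        else:
            regs[0] = None
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


# Three stages, so the first answer appears after 3 clocks,
# then one more answer on every clock after that.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2, 3]))  # [(3, 1), (4, 3), (5, 5)]
```

The latency is fixed by the number of stages, but the throughput is one result per clock, which is where this kind of hardware gets its speed.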
"What happens when the algorithm changes?" you might ask. Well, then you're screwed. You have to do a whole new board. Many boards use programmable chips as their processing elements, and can reprogram them when bugs or features get added, but these guys appear to be using ASICs. Great for speed, bad for flexibility.
Even though David Ayd was mistaken about the architecture, this idea has been around for quite a while also. The SPLASH 2 project was one of the first successes with this idea. There is also a commercial company selling boards using that idea but with completely up to date components (compared to SPLASH).
Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.
Well, we really can't argue with that, can we, Mr. Ayd?
This architecture lends itself to extremely high throughput. It's no surprise that these perform so well. NSA uses architectures just like this to do its crypto crunching. Brute forcing doesn't look so bad after trying one of these.
The opinions I post here have nothing to do with my employer.
Around 100 GFLOPs, $5 million, these days.
...
Considering a Mac G4 chip peaks at 4 GFLOPS
Supercomputer sometimes means "limited computer".
In exchange for increased performance in some
respect, you lose something in general purpose
computing, such as software tools, programming
generality, adequate peripherals etc.
I wonder how well this would underclock?
Secondly, that's "theoretical peak performance", otherwise known as the "guaranteed not to exceed" performance. On their highly specialized code it'll probably do ok, but on other calculations I'd be surprised if it got 10% of that speed, especially if a lot of cross-node communication is occurring. Don't forget, this is not a general purpose computer, it's like a really really big math co-processor that is optimized to run a very very specific type of program fairly well.
Okay, back in World War 2, they had this problem of having to compute trajectories of artillery. They ended up creating the first electronic computers. Now, years later, we have electronic computers doing almost anything imaginable, and the cutting edge:
Computing trajectories.
(Disclaimer: Yes, I know it's only one of the cutting edges, and yes, I know gravitational interactions aren't strictly the same as trajectories, but the irony remains, okay?)
100 petaflops??? I think you mean 100 teraflops.
Break out the Alpha coolers and overclock this to kingdom come.
--------
Oscarfish.com: tropical fish with attitude. Way t
A paper (PDF format) on its predecessor, GRAPE-5, can be found here. It has more technical detail but it doesn't describe the architecture of the specialized processors. It won the 1999 Gordon Bell price/performance prize.
Mea navis aericumbens anguillis abundat
Last time I heard a discussion about supercomputers, someone said that a supercomputer had to have a sustainable throughput of at least 1 Gigaflop. Is that accurate? If not, what *is* the definition of a supercomputer these days?
c k --interesting, if practically useless, scores...
Which reminds me, if anyone is interested in the "flopsability," to coin a silly-sounding word, of common x86 processors, visit http://www.jc-news.com/parse.cgi?pc/temp/TW/linpa
"The more corrupt the state, the more numerous the laws."--Tacitus, *The Annals*
that means when i buy one of these suckers i can run seti@home and watch a dvd at the same time.
Be you Admins? nay, we are but lusers!
Stop trolling! The Steve Woston is terribly annoyed at being impersonated by trolls on /. Read what the Real Steve Woston has to say about it here.
no sig
As an astronomer who does these kinds of calculations, I should point out that this system is not just specialised to solve one type of problem, the N-body problem: it is specialised for N-body problems where N is very big. E.g. our galaxy has about 100 billion stars in it; fully specifying their positions and velocities would require 4.8 terabytes of memory. We're still a long way away from that... but getting closer. Oh, and that's neglecting things like molecular clouds and suchlike, which have appreciable mass but aren't stars.
I have a cluster of Alphas crunching away on solar system models. GRAPE 6 couldn't actually do this very well, since it's designed for a certain N-body algorithm which doesn't suit small N... Instead I use a symplectic integrator, which takes advantage of a number of known factors in the problem.
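The 4.8 terabyte figure can be reproduced with a few lines of Python, assuming each star needs three position and three velocity components stored as 8-byte double-precision values. The 8-byte assumption is mine; the post does not say which precision it has in mind.

```python
stars = 100 * 10**9    # about 100 billion stars, as stated above
components = 6         # x, y, z position plus x, y, z velocity
bytes_per_value = 8    # assumed: double-precision floats

total_bytes = stars * components * bytes_per_value
print(total_bytes / 10**12)  # 4.8
```

That is 4.8 terabytes in decimal units, before counting masses or anything that isn't a star.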
So - we still need bigger and faster machines, but we also need more general machines...
Anyway... I want one of these to model EKO formation in the solar system
-jhp
/. -- the Free Republic of technology.
The problem with anything based on a microprocessor is the pathetic main memory bandwidth. If your program blows out the cache, the performance goes to hell.
A vector supercomputer is designed to have massive memory bandwidth, enough to keep the vector processing units operating at high efficiency. No cache or VM to slow things down. An engineer once told me that a Cray was a multimillion dollar memory system with a CPU bolted on the side.
See the STREAM benchmark web page for some measurements of sustained memory bandwidth. This separates the real computers from the toys.
David Ayd, a supercomputing manager at IBM, says "the GRAPE 6 computer appears to be based on a very old model. In the 1970s and '80s these vector models were developed in Japan for problems like simulating weather and plane mechanics, he said. The difference today is that the computers can do the jobs at 100 times the speed or faster."
... "Be a beacon?"
Aww, talk about sour grapes! They've hurt IBM's feelings, because IBM sells really smokin' computers too. But:
Still, in July of 1995, the GRAPE 4 became the world's fastest computer, breaking the 1 teraflop barrier with a peak speed of 1.08 TFLOPS.
Well, we really can't argue with that, can we, Mr. Ayd?
--
"Give him head?"
"One World, one Web, one Program" - Microsoft Ad
I haven't followed the progress in the field since then, but I suspect present day hardware could handle a good fraction of the satellite image feeds affordably -- and dwarf the realized performance figures of this gravitation board.
Of course, if you want to get really picky about it, there are lots of specialized circuits out there doing work all the time all over the place that could be viewed as "computation" at enormous rates -- it all depends on where you draw the line.
Seastead this.