Blue Gene/L Tops Its Own Supercomputer Record
DIY News writes "Lawrence Livermore National Laboratory and IBM unveiled the Blue Gene/L supercomputer Thursday and announced it's broken its own record again for the world's fastest supercomputer. The 65,536-processor machine can sustain 280.6 teraflops. That's the top end of the range IBM forecast and more than twice the previous Blue Gene/L record of 136.8 teraflops, set when only half the machine was installed."
Imagine a Beowulf cluster of...oh, never mind.
Click here or here.
Is it still in vogue to request we all imagine we had a "beowulf cluster of blue gene suprtcomputers?"
I wish I was smart enough to have an actual use for this beast. If I were, I'd be able to figure out how the heck to pay the huge electric bill this would generate.
Yep, I never spell check.
More incorrect spellings can be found he
slashdot on THAT! :)
No sig for now.
Let's put folding@home (http://folding.stanford.edu/) on that mother!
As always.
They say it can launch Adobe Acrobat Reader in ELEVEN SECONDS!!!
So we can crack RC-72 faster, yippee.
What useful science has the "Earth Simulator" produced? What useful science will this monstrosity produce?
Seems like it's just a wang dangling contest between the large corporations / governments of the industrialized world. About as useful as Decibel Drag Racing.
I wonder how much could be gained via compiler improvements. Anyone know what compiler they use?
.. for an OpenOffice to beat Office
..figure out what the hell we are going to be doing for energy in 15 years??
"Look to the future and the present will be safe"
An IBM engineer was caught remarking "And boy can it hold a lot of porn."
The damn thing's smarter than I am. Well, that's taking an estimate of 100 teraflops for the human brain, which seems to be popular.
Real_men_don't_need_spacebars.
Is it bad if the first question that came to mind was "I wonder how quickly I could install Gentoo on that?"
It is odd that this is being reported now, since these machines have existed for quite a while. Don't let the processor count of BG/L influence your interpretation of how big the machine is. The processors are quite a bit slower than those in traditional supercomputers (600 MHz vs. 2 GHz). This actually makes scalability much easier on BG/L, because serial portions of the code run much slower, giving you more flexibility in network latency. Too bad they are going to classify this machine after a couple of months; I hope I can get a chance to play on it before then!
and the answer had better not be 42.
OS-X? Sorry I'm lame...
Computers allow humans to make mistakes at the fastest speeds known, with the possible exception of tequila and handguns
I was on the main computer floor last week looking at the cluster.
Until I was dragged away.
Old news again.
WhoWhatWhereAmI
The legitimate thing that I can imagine is if it was a cost based contract that was given out before the cost of the hardware was known.
Was it?
Back when it was only half installed I got to take a tour of it while it was in Rochester, MN... Got to walk through it and touch it. Turns out the computer that controls Blue Gene takes up about half as much space as Blue Gene itself.
That's all nice and that, but what are the frame rates in HL2?
I have some very limited experience with this kind of computing, and I don't think the compiler is anywhere near the limiting factor.
I strongly suspect the limiting factor is algorithms. That is, the problem is designing code that can efficiently use a massively parallel machine. It's enormously difficult to even imagine how a problem could be solved by breaking it up into 65,000 mini-problems that can be solved simultaneously, and therefore mostly but not entirely independently. People just don't think that way. (Or rather, they do, but only at such a basic level close to the neurons that they are utterly unaware of how it's done.)
This is one reason "parallel computing" has been the Wave Of The Future(TM) for decades, and exhibits the same kind of "promise" as fusion power -- namely, we are told that ten years from now it will change everything -- and we hear it again every ten years.
Easy - you'd run a huge federal deficit, and let future generations sort it out.
.. since Quake 4 just hit the shelves.
When it was half done it was less than half the speed? Impressive. Was there a software/OS upgrade along the way, as well?
Only listed a few minutes ago, and it's already been slashdotted...
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
Here's a picture of the momma: http://en.wikipedia.org/wiki/Image:BlueGeneL-600x450.jpg
Way off topic, but your link says Glenn Martyna moved to IBM/Watson. Christ, better buy some IBM stock.
By the way, a trivial point with respect to this: Isn't it relativity, not QM, that forbids superluminal communication? I seem to recall non-relativistic QM with instantaneous action at a distance (e.g. Coulomb's Law) being alive and well in the realm of quantum chemistry, or perhaps really anywhere pair creation is not an issue.
McDonalds, in a bold move to also top its own record, has announced a Race to Myocardial Infarction (tm) sweepstakes which will feature an improved Big Mac that packs an additional 10,000 calories.
but can it run.. windows vista?
What's so hard about releasing these things under an open source license?
I have nothing against donating CPU cycles, but I have yet to find a group that doesn't require me to sign a restrictive software license. And for this particular project, it's a university running it no less. Aren't universities supposed to encourage the spread of information?
(Then again, I'd have to bury my head in the sand and forget about all the patents that universities have amassed, often using tax dollars to fund the research that led to them).
With 1024 processors per rack, does that mean you only have access to 128 processors? Or only 1/8 of all of the Blue Gene racks? Not very much of a savings, considering 128 million for all the racks, excluding costs for other equipment and not to mention tax.
WOULD YOU LIKE TO PLAY A GAME? aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
If you're going to be an 80s geek, don't half-ass it like most people. The correct line from WarGames is "SHALL WE PLAY A GAME?"
I just pray to God they don't put her on it! Then she would never shut up.
Just in time for... Windows Vista(tm)
Why this relentless quest for speed? Surely if computer scientists spent a bit more time thinking about the algorithms they used, instead of playing Quake, they would come up with some software designs that didn't require teraflops of CPU time.
Bresenham's algorithm for line drawing springs to mind. Before that was discovered, drawing a line on a display was a heavily compute-intensive task...
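For anyone who hasn't met it, here is a rough Python sketch of the integer-only form of Bresenham's line algorithm that is commonly taught. The function name `bresenham_line` is made up for this example; the point is that the inner loop uses only integer additions and comparisons, with no multiplies, divides, or floating point.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the integer grid points on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx + dy  # running error term, always an integer

    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * error
        if doubled >= dy:   # error says we should advance in x
            error += dy
            x0 += step_x
        if doubled <= dx:   # error says we should advance in y
            error += dx
            y0 += step_y
    return points


if __name__ == "__main__":
    print(bresenham_line(0, 0, 3, 1))
```

The same loop handles all octants because the step directions and the error term are set up from the signs of the deltas.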
Notice that the performance has actually increased PER processor as you add more processors... This is very remarkable in computer technology.
Normally when you add CPUs to a computer you get an increase in performance, but it doesn't increase linearly with each CPU. With one CPU you have 100% performance; add one more and you may have 180% of the performance, add 2 more and you may have 300% of the performance, etc.
Notice that with half the machine there it got 138 TFlops.
So if you doubled the size of the machine you'd expect to get something like 260 TFlops.
But you have 280 TFlops.
This pretty much means that as you add CPUs the performance of each CPU actually increases slightly. That's an exponential growth rate, at the beginning of the curve.
Of course there has to be a technical limit to the system and the amount of space, heat, and electricity it can handle... but technically if you double the size of the cluster again I wouldn't be surprised if you'd get close to 750 TFlops performance.
This is some seriously hardcore stuff, the future of computing hardware. Today's supercomputer, tomorrow's desktop.. I can't wait.
""Earth Simulator" supercomputer performs 36 Terra flOps / second. ...Earth Simulators required to model 1 brain = 3.0 x 10^17 / 3.6 x 10^13 = 8333. ...
1 Brain = 8333 State-of-the-art Supercomputers"
So, unlike what kyle90 posted, you'd actually need 1,443 (rounded up from 1,442.25) of these Blue Gene/L to accurately model a single human brain.
To exceed that would require more!
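The division in the quoted estimate is easy to redo for other machines. A small sketch, taking the 3.0 x 10^17 operations per second brain figure from the quote at face value (the names `BRAIN_OPS_PER_SECOND` and `machines_per_brain` are invented here):

```python
# Brain estimate from the quoted article; treated as an assumption, not a fact.
BRAIN_OPS_PER_SECOND = 3.0e17


def machines_per_brain(machine_ops_per_second):
    """How many machines of a given sustained rate match the brain estimate."""
    return BRAIN_OPS_PER_SECOND / machine_ops_per_second


def implied_rate(machine_count):
    """Sustained rate per machine implied by a given machine count."""
    return BRAIN_OPS_PER_SECOND / machine_count


if __name__ == "__main__":
    print(f"Earth Simulator at 36 TFlops:  {machines_per_brain(3.6e13):.0f}")
    print(f"Blue Gene/L at 280.6 TFlops:   {machines_per_brain(280.6e12):.0f}")
    print(f"Rate implied by 1,442.25:      {implied_rate(1442.25) / 1e12:.1f} TFlops")
```

With the 280.6 teraflops from the summary the same division gives roughly 1,069 machines. The 1,443 figure corresponds to about 208 teraflops per machine, so it presumably rests on a different rate than the one in the summary.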
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Time for Earth Simulator to make a Walmart run and get some more Athlons to regain the top of the "supercomputer" chart.
The faster they make these things, the slower they sing that damn Daisy song.
65536 processors = 64K processors.
damn that IBM, they take geekiness to just a whole different place.
Funtage Factor: Purple
It still takes 15 seconds to start up OpenOffice.org
"The Chinese use two brush strokes to write the word 'crisis.' One stands for danger; the other for opportunity."
The main drawback to forecasting models is that it takes so long to run all the data that we have to cut back on the data in order to actually see what's forecast before it happens. With this thing running an expanded version of the GFS with 10 km resolution, we might be able to actually get it right for once. ;)
Someone save me from this sanity.
That probably only means that they have optimised the architecture over time as would be expected. Things like improved resource management, a slimmer kernel for each CPU, a better compiler, etc. can easily make up for that small performance gain.
Hmm, and yet I thought the finite speed of light was primarily an empirical fact, and perhaps secondarily a way to prevent silly violations of causality, id est to prevent everything from happening at once. And, that the Lorentz transformation was less a postulate to be applied so much as a consequence to be derived from the more fundamental notion that c should be constant in all reference frames.
Alas, I sure hope I've not been laboring under a misapprehension. I would be forced to mod myself down to -1, Doofus. Although if someone has already modded me down to -1, Offtopic or -1, Blithering, I suppose I would be modding myself more across than down -- the topology of /. modspace being kind of unclear.
I do know of people who fuss about relativistic corrections to core electron energies, but they seem a clannish, chthonic lot of Stoors, much given to muttering darkly under their caffeinated breath. I avoid 'em. Now, to me the most interesting bold-as-brass entry of relativity into ordinary (e.g. valence-shell) atomic physics is through magnetic fields. Add a pinch of vector potential to your kinetic energy operator, expand, stir, simmer, season lightly -- and, presto, fine-structure constants everywhere. Like toadstools after a good rain.
At which point the sober theorist sits back and looks quite thoughtfully at the trailing Coulomb energy term, with its implicit infinite value of c...
Uh, sorry -- what was it you were saying?
That probably only means that they have optimised the architecture over time
The cynic in me thinks they probably optimised the benchmark.
A pizza of radius z and thickness a has a volume of pi z z a
No wireless. Less space than a nomad. Lame.
The current record for Blue Gene wasn't running at a very high efficiency (iirc T_peak 0.7 T_max).
Plus you mustn't forget that the machine has 64K _dual core_ CPUs, with one core dedicated to communication, thus the classification as 64K CPUs.
There could be plenty of room for improvement by utilising this core better.
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Just because the summary says the machine was halfway done at 138 TFlops, that definitely does not mean the other half of the work was just putting in the other half of the processors. The optimizations were probably not finished either. Pure linear growth is a dream enough already. If I saw exponential growth I'd crap my pants.
HAL, "You can't do that Dave!"
JH
Nice to see that Cnet continues to get their 'facts' wrong. If this is the finished system (64 compute racks) then the processor count is 131,072. The BlueGene/L system has 2,048 processors per rack; 1024 dual processor nodes per rack.
Cnet reporting is atrocious.
When I was coming up, we had to use MIPS to tell how fast a computer was.
The Admin and the Engineer
at least, right?
The Admin and the Engineer
How is it that the number of GFlops the machine can do exceeds the sum of the GFlops of the individual processors? At 65,536 processors each capable of 2.8 GFlops, there's 183.5 TFlops available in theory, no? But this claims 280.6 TFlops. Am I missing something, or are there twice as many processors? They said the machine that was half installed also had 65,536 processors.
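The arithmetic is quick to check for both processor counts that show up in this thread: the 65,536 from the summary and the 131,072 another poster gives for a full 64-rack system. A small sketch, taking the 2.8 GFlops per processor figure from the comment above as given (`peak_tflops` and `sustained_fraction` are names made up for the example):

```python
GFLOPS_PER_PROCESSOR = 2.8   # per-processor peak quoted in the comment above
SUSTAINED_TFLOPS = 280.6     # sustained figure from the story summary


def peak_tflops(processors):
    """Theoretical peak in TFlops for a given processor count."""
    return processors * GFLOPS_PER_PROCESSOR / 1000.0


def sustained_fraction(processors):
    """Sustained rate as a fraction of theoretical peak."""
    return SUSTAINED_TFLOPS / peak_tflops(processors)


if __name__ == "__main__":
    for count in (65_536, 131_072):
        print(f"{count:>7} processors: peak {peak_tflops(count):6.1f} TFlops, "
              f"sustained/peak {sustained_fraction(count):.0%}")
```

With 65,536 processors the sustained number comes out above the peak, which cannot happen. With 131,072 it lands at about three quarters of peak, which is at least a possible ratio.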
OK, so this machine is going to simulate nuclear explosions. Based on other posts, it seems it will be able to do the same work as its predecessor in 1% of the time. Rather than run 100X the number of nuclear simulations in a given timeframe, maybe (just maybe?) the government could use our taxpayer funded supercomputer to do medical research? It seems there is now plenty of horsepower to go around unless they've been stockpiling unused simulation data for decades. Cheers,
Linux is substantially more scalable now than it was even just 6 months ago (not the vanilla, but quite well tested scalability patches). This could account for the improvement. I suspect if they ran just half of it now, they'd get a little bit over half the performance (but not much over half - that is how good Linux is these days).
Linux is substantially more scalable now than it was even just 6 months ago (not the vanilla, but quite well tested scalability patches).
Perhaps it is, but it has nothing to do with BG, since a) BG doesn't have shared memory, and each 2 cpu node (1 dual core processor) runs its own kernel and b) Linux is only used on the service nodes (the nodes handling disk IO, interactive logins, compiling etc.), not the compute nodes (where the actual action takes place).
I'm quite sure that the improvements are due to tweaking the LINPACK benchmark itself (yes, this is allowed), ESSL libraries (IBM's version of BLAS), and improving the XL Fortran compiler.
The step from 274 to 280 (really, how did you get 260 when doubling 138?) is much, much smaller than the step from 560 to 750, though. Sorry, but you're essentially talking out of your ass here. :)
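To put a number on how small that step is, here is a sketch using the exact figures from the summary (136.8 and 280.6 teraflops) rather than the rounded 138 and 280. The function name `doubling_ratio` is made up for the example.

```python
HALF_MACHINE_TFLOPS = 136.8   # previous record, half the machine installed
FULL_MACHINE_TFLOPS = 280.6   # new record, full machine


def doubling_ratio(half_rate, full_rate):
    """Measured full-machine rate divided by twice the half-machine rate."""
    return full_rate / (2.0 * half_rate)


if __name__ == "__main__":
    doubled = 2.0 * HALF_MACHINE_TFLOPS
    ratio = doubling_ratio(HALF_MACHINE_TFLOPS, FULL_MACHINE_TFLOPS)
    print(f"twice the half-machine rate: {doubled:.1f} TFlops")
    print(f"measured / doubled:          {ratio:.3f}")
```

The ratio is about 1.026, so the full machine beat a naive doubling by under 3 percent. That is well within what software tuning between the two runs could explain.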
quidquid latine dictum sit altum videtur.
We're talking about a computer that can run Doom 3 on full settings.
Most of the codes on the Blue Gene/L at LLNL are coming from earlier ASCI systems and are most likely MPI+Fortran/C codes, possibly with OpenMP around the inner loops in some cases.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
I understand this machine to be about 2^16 times as fast as my local workstation. Assuming CPU power doubles every 3 years, would this mean my grandchildren (or great-grandchildren) get such a machine as their first computer at some Christmas or birthday occasion?
Today, we have the supercomputing CPU power of the 1940s-70s available to just about every school kid. However, this has not changed society much, at least not for the better. Or has it? Will it change society when a Blue Gene is available at Wal Mart in 2050? My point is not the availability of email and networks in general, but computing power that is so much more than normal office applications require.
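The waiting time follows directly from the two assumptions in the comment, a 2^16 speed gap and one doubling every 3 years. A minimal sketch (`years_to_catch_up` is a name invented here):

```python
import math


def years_to_catch_up(speed_ratio, doubling_period_years):
    """Years until a machine improves by speed_ratio at a fixed doubling period."""
    doublings = math.log2(speed_ratio)
    return doublings * doubling_period_years


if __name__ == "__main__":
    print(years_to_catch_up(2 ** 16, 3))
```

Sixteen doublings at 3 years each is 48 years. A 2-year doubling period would cut that to 32, so the answer is quite sensitive to the assumed rate.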
Sorry, but I have to disagree with your conclusion that this represents exponential growth.
The effect you speak of (doubling the number of processors giving less than double the final "power") is due to additional overhead - various processors coordinating their work with each other, deciding things like "Should I split this 2 ways or 4?" and so on - and that sort of stuff inevitably increases with the number of processors.
You can use improved algorithms, special-purpose hardware, etc, etc, to minimize this "friction", but it will always exist, and the percentage of processing that is "overhead" will inevitably climb as you increase the number of processors.
It's far more likely that either the earlier number resulted from some inefficiencies that existed then (due to it not being built as designed yet, perhaps), or there have been improvements in the algorithms or infrastructure which give greater efficiencies.
If it's the latter case, if you unplugged the 2nd half of the CPUs and made the measurement again, you'd probably get 150 TFlops or so.
Basically, you could write the equation for total power something like:
X - O - i**X, where X is the number of processors, O is the basic overhead (for doing things like I/O, for example), and i is the incremental cost of adding each processor.
To have what you describe would require that i**X be a negative number, which is like saying that you can have 10 individual conversations in less time than you can have five. Ain't gonna happen.
Check out the power usage for the system and cooling.... 10MW of power (out of the 45MW the place has dedicated...)
meme overuses You!
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
"That's the top end of the range IBM forecast and more than twice the previous Blue Gene/L record of 136.8 teraflops, set when only half the machine was installed." Correct me if I'm wrong, but isn't this statement a little obvious?
Did you know that you can be apathetic to apathy? Not that I give a shit...
Let's get down to the brass tacks.
What's this baby's Quake III FPS stats?
I for one bow down to our... what's this article about?
here's a closer view of a single cabinet, apparently almost completely assembled.
This one shows the overall design concept for the installation. Here again in a much sexier view
And here is a bluish picture of Gene Simmons which popped up also.
useless sig advice - Read Nabokov.
32 bits of processors should be enough for anyone!
Clones are people two.
Nobody is EVER going to need more than 64K of RAM. Errr, I mean, processors.
...to Cowboy Neal for a stroke to preserve the language. In the headline he properly spells the possessive of it without an apostrophe. In the story DIY also gets it right with "announced it's broken its own record again." English is safe for a while longer. My English degree sated, I eagerly await the mods to off-topic.
I looked here and it is saying that properly encrypted 128bit-key files are uncrackable.
ps. I think i saw that cracking time chart for cracking encoded zip files and it was at but the site is blocked by websense so I can't find it.
Fair enough. I didn't say it was easy to even get basic stuff like your math libraries working well. I can believe that they've been working hard at the basic computing environment, you bet, and doing better all the time. More power to them.
But solving linear algebra efficiently is a well-studied, pretty thoroughly understood programming task. I appreciate the hard work people are doing to get it running, but I don't feel doing so is much evidence that the difficulty of good parallel algorithms for high-level tasks (e.g. docking or protein folding) is at all going away.
Also, in my experience the rate-limiting step of a high-level computation like a big molecular dynamics simulation is never the efficiency of your low-level math libraries. So, again, progress in speeding up benchmarks is not so well correlated with progress in solving actual problems.
That said, I have the feeling that this machine is going to be used largely for brute-force calculations using established algorithms and well-known existing code. I would be surprised if more than a small percentage of its cycles were dedicated to breaking ground with new and improved algorithms. But that is the work that will really pay off down the line. Frankly, I think there are few really break-through advances that have come from brute force computation. But then my training is in paper-and-pencil theory, so I'm probably prejudiced.
its true!
That's what those bipedal carbon units are for. So they can decide how to manage resources.
At least you don't have to be alone! I also noticed the beautifully correct apostrophe usage, and it's made my day just a little bit better. Data geek that I am, language is very important to me.
Lets see it beat my 500 terafaps record set when the videos of that chick with the perfect ass came out.
You're nothing; like me.
There are certain tasks that are easy to split into individual threads. Just about anything graphical deals with single pixels at a time, and complex 3D simulations deal with voxels. A simple 100x100x100 simulation would use 1,000,000 voxels. Each processor would handle a handful of voxels, and communicate with the processors simulating adjoining voxel groups.
Just some more stats:
for a 1024^3 voxel cube, with 64K processors, each processor deals with 16K voxels.
16K^3 voxel cube: 64M voxels per processor. (A few Gigs of ram each should be enough)
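Those per-processor counts are easy to reproduce. A small sketch assuming an even split of a cubic grid across 64K processors (the helper name `voxels_per_processor` is made up):

```python
K = 1024
M = K * K
PROCESSORS = 64 * K


def voxels_per_processor(edge_length, processors=PROCESSORS):
    """Voxels each processor owns when a cube of voxels is split evenly."""
    return edge_length ** 3 // processors


if __name__ == "__main__":
    print(voxels_per_processor(1024) // K, "K voxels each for a 1024^3 cube")
    print(voxels_per_processor(16 * K) // M, "M voxels each for a (16K)^3 cube")
```

Both figures in the stats check out: 2^30 / 2^16 = 2^14 (16K) and 2^42 / 2^16 = 2^26 (64M). What this ignores is the surface of each block, which is where the communication with neighbouring processors happens.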
"That's so plausible, I can't believe it!" - Leela
The article says that the brain fires at 1000 times a second, but practice shows that we can only process about 30 pieces of data per second. Plus they forgot that only 10% of our brain is in use at any one time. Taking that into account, you can multiply their estimate by .003.
120,000,000 cpus * .003 = 360,000 cpus. (only 5 times the size of the one in the original article, so it could simulate a human brain at 20% speed)
But all of that is irrelevant anyways. CPU power isn't the problem. It's knowledge of how the brain works. A honey bee's brain only has 960,000 neurons. Yet they search for food, remember where their home is, they fly, walk, eat, reproduce, and perhaps most importantly, they communicate. (through dance)
960,000 neurons, at 30 "frames per second" and 100% brain use is only 28 million values per second, and less than 4 Megs of RAM. Even if you go with their number of 1000 "frames per second", it's still only 960 million values per second. Well within the processing power of a single computer. That's assuming a neuron can be represented by a single 32 bit integer.
Has anyone simulated a bee's brain yet? No? Well why not? They have complex behavior, they're even able to communicate about how much food they found, and directions on how to get there. (including distances!) No, the real problem is that no one knows how the brain actually works. This IBM cluster can simulate an entire hive of honey bees, but they just don't know how. Until then, they'll never simulate a human brain.
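The bee numbers above can be checked in a few lines, under the same simplifying assumption that one neuron is one 32-bit value updated once per "frame" (names like `updates_per_second` are invented for the sketch):

```python
BEE_NEURONS = 960_000
BYTES_PER_NEURON = 4  # one 32-bit integer per neuron, as assumed in the comment


def updates_per_second(neurons, frames_per_second):
    """Neuron values to recompute each second at a given frame rate."""
    return neurons * frames_per_second


def state_megabytes(neurons, bytes_per_neuron=BYTES_PER_NEURON):
    """Memory needed to hold one value per neuron, in megabytes (2**20 bytes)."""
    return neurons * bytes_per_neuron / 2 ** 20


if __name__ == "__main__":
    print(updates_per_second(BEE_NEURONS, 30))
    print(updates_per_second(BEE_NEURONS, 1000))
    print(round(state_megabytes(BEE_NEURONS), 2))
```

That gives 28.8 million updates per second at 30 frames, 960 million at 1000 frames, and about 3.7 MB of state, matching the figures in the comment. It counts one value per neuron only and leaves out the connections between them.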
"That's so plausible, I can't believe it!" - Leela
I still have my MS Word 4.0 for my Mac Plus. It was on a single sided disk with enough room to store documents. On my Plus, at about 8 MHz, Word opened about as fast as the 2000 suite opens on my work computer, which is about 1 GHz. Trust me, office applications will bloat in direct proportion to the speed of your computer.
In 2050, they'll have all these new features though! Like spell and grammar checking all the documents on your HD every time you hit a key. (all loaded and held in RAM, of course) Plus, lots and lots of fancy, animated 3D graphics and Dolby 5.1 surround sound for the sound effects. Every time you hit a key, you'll hear a sonar ping whiz past you in your office. (plus a floating, 3D, ray-traced version of the letter you hit rotating at the center of your screen) Even when you turn off the options, they'll still be generated by the CPU, just in case you turn them back on in the middle of an animation or sound playing.
Don't say it won't happen.
"That's so plausible, I can't believe it!" - Leela
Can I get my J2EE webapp to run on it? Does it automatically overcome my bad code? Does it have enough memory to tolerate the IBM xsl memory leak?
The real story is that Blue Gene was supposed to run Windows Vista code. At the time, MS supplied an alpha set of code to trial on the machine. A massive concurrent multiprocessor buffer overrun caused a black hole to develop many miles above the earth. Unfortunately this was the exact same time that the Space Shuttle Columbia was passing through the same space. It in turn caused the shielding in the wing to be ripped away and.. you know the rest.
See Volume 49, Number 2/3, 2005.
Redundancy is good; triple redundancy is twice as good! - Me.
By optimizing for the benchmark they have optimized for a specific class of problem that the cluster may need to solve in its "real" job; ergo, they have made it better at doing what it is supposed to do.
LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems.
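For a feel of the kind of problem involved, here is a plain Python sketch that solves a small dense system A x = b by Gaussian elimination with partial pivoting. This is not LINPACK's code or its API; the function name `solve_linear_system` is made up, and a real library would use blocked, heavily tuned routines rather than nested Python loops.

```python
def solve_linear_system(a, b):
    """Solve a x = b for a square matrix a (list of rows) and vector b."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


if __name__ == "__main__":
    print(solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The elimination step costs on the order of n^3 operations for an n by n matrix, which is why very large dense systems make a convenient stress test for floating-point throughput.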
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.