IBM Sets Supercomputer Speed Record
T.Hobbes writes "IBM's BlueGene/L has set a new speed record at 36.01 TFlops, beating the Earth Simulator's 35.86 TFlops, according to internal IBM testing. 'This is notable because of the fixation everyone has had on the Earth Simulator,' said Dave Turek, I.B.M.'s vice president for the high-performance computing division. The AP story is here; the NY Times' story is here."
I wish I knew what a Tecord was...
/. should be using automatic text-box spell checking found in KDE...
Maybe
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Is that how they measure Records in Teraflops?
Hmmmm..... TFlop + Record = Tecord
IBM Sets Supercomputer Speed Tecord
What exactly is a "Tecord"
Never start vast projects with half-vast ideas.
Someone call Huinness' Nook pf Eorld Tecords!
Hokey statistics and ancient misconceptions are no match for a good thought in your head, kid!
I'm sorry, but those Supercomputers have nothing on my machine running Windows. It has a record of AlwaysFlops.
I'd say rypo.
Just imagine the gaming possibilities on it. Doom 3 would be sheer child's play for it.
:: Cyco(k) out
It might be fast, but could it keep up with monitoring all the errors and dupes on /.? ;P
A new tecord?!? That's timpossible! But more seriously, does anyone know if there's an impartial 3rd party that ever confirms these measurements? I'm all for improving technology, but how do they verify their "tecords"?
http://liquidben.com - Aspiring to an 'under construction' gif
Hete's the full text in case of a massive slashdotting of theit setvets:
IBM says Blue Gene bteaks speed tecotd
9/29/2004, 7:27 a.m. ET
By ELLEN SIMON
The Associated Ptess
NEW YOtK (AP) - IBM Cotp. claimed unofficial btagging tights Tuesday as ownet of the wotld's fastest supetcomputet.
Fot thtee yeats tunning, the fastest supetcomputet has been NEC's Eatth Simulatot in Japan.
"The fact that non-U.S. vendot like NEC had the fastest computet was seen as a big challenge fot U.S. computet industty," said Hotst Simon, ditectot of the supetcomputing centet at Lawtence Betkeley National Lab in Califotnia.
"That an Ametican vendot and an Ametican application has won back the No. 1 spot -- that's the main significance of this."
Eatth Simulatot can sustain speeds of 35.86 tetaflops.
IBM said its still-unfinished BlueGene/L System, named fot its ability to model the folding of human ptoteins, can sustain speeds of 36 tetaflops. A tetaflop is 1 ttillion calculations pet second.
Lawtence Livetmote National Labotatoty plans to install the Blue Gene/L system next yeat with 130,000 ptocessots and 64 tacks, half a tennis coutt in size. The labs will use it fot modeling the behaviot and aging of high explosives, asttophysics, cosmology and basic science, lab spokesman Bob Hitschfeld said.
The ptototype fot which IBM claimed the speed tecotd is located in tochestet, Minn., has 16,250 ptocessots and takes up eight tacks of space.
While IBM's speed sets a new benchmatk, the official list of the wotld's fastest supetcomputets will not be teleased until Novembet. A handful of scientists who audit the computets' tepotted speeds publish them on Top500.otg.
Supetcomputing is significant because of its implications fot national secutity as well as such fields as global climate modeling, asttophysics and genetic teseatch.
Supetcomputing technology IBM inttoduced a decade ago has evolved into a $3 billion to $4 billion business fot the company, said Simon.
Unlike the mote specialized atchitectute of the Japanese supetcomputet, IBM's BlueGene/L uses a detivative of commetcially available off-the-shelf ptocessots. It also uses an unusually latge numbet of them.
The tesulting computet is smallet and coolet than othet supetcomputets, teducing its tunning costs, said Hitschfeld. He did not have a dollat figute fot how much lowet Blue Gene's costs will be than othet supetcomputets.
Howevet, othet supetcomputets can do things Blue Gene cannot, such as ptoduce 3-D simulations of nucleat explosions, Hitschfeld said.
Small potatoes make the steak look bigger.
Place your bets, people!
What percentage of posts in the first 15 minutes will be about the spelling of the last word in the title, and what percentage about the content?
G
"I want to play chess against that one" - Kasparov
Would it not be easier in that case for the government to dissolve the people and elect another? - Bertold Brecht
...what operating system it uses. Anybody know?
I wonder if that is sustained ?? ...
I know that when the Mac G5 Cluster was developed they claimed tremendous speed, but when the sustained rate was calculated, it turned out to be much lower
98% of posts will be 0.4 standard deviations away from one of the following:
0. "fist pr0st!!!!!111~"
1. "Imagine a beowulf cluster of these!"
2. "But does it run Linux?"
3. "In Soviet Russia, SPEED RECORD SETS YUO!"
4. "1. Earth Simulator: 38.56 TFlops. 2. BlueGene/L: 36.01 TFlops. 3. ... 4. PROFIT!!!"
5. "I for one, welcome our supercomputer overlords."
6. "Do either of the supercomputers run BSD? BSD is dying."
7. "I didn't have enough time to read the article, but..."
From TFA:
the Blue Gene/L system next year with 130,000 processors and 64 racks, half a tennis court in size.
The prototype for which IBM claimed the speed record is located in Rochester, Minn., has 16,250 processors and takes up eight racks of space.
So does this mean the finished product, with almost 10x as many procs will be much faster still? Or am I reading this wrong?
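As a rough back-of-envelope answer to the question above, here is a minimal sketch that extrapolates the prototype's figure to the planned machine. It assumes Linpack throughput scales linearly with processor count, which is an optimistic assumption and not something the article states; the function name is made up for illustration. Note that 130,000 / 16,250 is exactly 8x, not quite 10x.

```python
# Back-of-envelope only: assumes perfectly linear scaling with processor
# count, which real machines rarely achieve.
def linear_extrapolation(measured_tflops, measured_procs, target_procs):
    """Scale a measured TFlops figure by the ratio of processor counts."""
    return measured_tflops * target_procs / measured_procs

PROTOTYPE_TFLOPS = 36.01   # figure from the story summary
PROTOTYPE_PROCS = 16_250   # prototype size quoted in the AP article
PLANNED_PROCS = 130_000    # planned Livermore installation

scale = PLANNED_PROCS / PROTOTYPE_PROCS
estimate = linear_extrapolation(PROTOTYPE_TFLOPS, PROTOTYPE_PROCS, PLANNED_PROCS)
print(f"scale factor: {scale:.0f}x, linear estimate: {estimate:.2f} TFlops")
```

Under that assumption the full system would land somewhere near 288 TFlops; any loss of efficiency at scale would pull the real number down.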
how does it compare to Shalmaneser???
Unlike the more specialized architecture of the Japanese supercomputer, IBM's BlueGene/L uses a derivative of commercially available off-the-shelf processors. It also uses an unusually large number of them. The resulting computer is smaller and cooler than other supercomputers, reducing its running costs, said Hirschfeld. He did not have a dollar figure for how much lower Blue Gene's costs will be than other supercomputers.
This is the most interesting part of the article to me. Makers of supercomputers are going to go back and forth for the speed record. However, holding the speed record with off-the-shelf components seems like a separate achievement in and of itself. The article did mention, however, that the IBM system is not as capable as other supercomputers.
http://www.busyweather.com/
More like Iuinness' Dook qf Yorld Tecords.
The beginning letter of each word is two letters after the intentional letter.
"Two wrongs don't make a right. But three rights make a left!" - Cosmo, "The Fairly Odd Parents"
From the NY Times article:
"The new system is notable because it packs its computing power much more densely than other large-scale computing systems. BlueGene/L is one-hundredth the physical size of the Earth Simulator and consumes one twenty-eighth the power per computation, the company said."
1/100th the size and 1/28th the power. Now if that isn't a beautiful thing, I don't know what is.
From AP article:
"However, other supercomputers can do things Blue Gene cannot, such as produce 3-D simulations of nuclear explosions, Hirschfeld said."
They state that Blue Gene L has 16,000 processors, but it's a prototype for the real deal which is going to have 130,000 processors. So, how in god's name could a computer with that much power not be able to simulate a nuclear explosion? Is it just that it would do it too slowly to be useful?
They fixed the typo, it's not funny now.
Get your Unix fortune now!
But how does it stack up to Google's 100.000 CPU cluster?
Here comes the best chess player you've ever seen!
Steal This Sig
For a great deal of detail about this system surf over to this pdf
http://www.busyweather.com/
Yup, nothing better to do this morning but take screen shots of typos.
Get your screen shots here!
Pretty Pictures!
However, other supercomputers can do things Blue Gene cannot, such as produce 3-D simulations of nuclear explosions, Hirschfeld said.
Then what fun is it? Let me guess, it can simulate the growth of a 3000 year old Grand Sequoia though...hippy scientists... Fire up the glx mod and let's get fraggin!
You are about to give someone a piece of your mind, something which you can ill afford...
they are not using websphere on it...
Is beawulf the sister of beowulf?
The Tao of math: The numbers you can count are not the real numbers.
From the Washington Post article:
"IBM's new system nudges past a nearly three-year-old computer speed record of 35.86 "teraflops," or trillions of calculations per second, with a working speed of 36.01 teraflops....The current record-holder, known as the Earth Simulator, is a supercomputer in Yokohama, Japan, designed to simulate earthquakes."
Won't it be great when IBM announces that they built Blue Gene to simulate Japanese earthquakes? Neener neener.
Always a godfather; never a god. -Gore Vidal
The typo is fixed, it's just a matter of time before the Offtopic and Redundant bombs hit..
Hokey statistics and ancient misconceptions are no match for a good thought in your head, kid!
I've heard that the neural network of the human brain has a calculation speed of 4.4 TFLOPS. How soon will these machines start to THINK? Seems like what we need now is just more storage capacity and some well-written "thinking" software...
It'd be able to deduce the existence of rice pudding and income tax before anyone could get to a switch to turn it off!
Ask 8 slackers a question, get 10 answers (a citation, but I can't remember from whom)
I find it genuinely amusing that there is a disclaimer on the link you include that says the Earth Simulator is not to be confused with SimEarth. I wonder if there was a board meeting at Maxis to discuss the possible impact of the release of the Earth Simulator...
As far as the machine being sexy: It's got red spots, that is usually a sign to leave it alone...
I think that they should take this computer just for one day and set it to run Seti@home...that'd be a hell of a lot of searching for aliens right there. I bet they'd run out of packets to send you!
Will it run Longhorn?
Well no, the real question is, how many people beat me to that joke?
A few seconds ago I was having fun reading the tecords comments. Now it's corrected.
Killjoy editors.
Does anyone have a top-ten list of supercomputers ranked by TFlops per square foot?
-- Hasbullah bin Pit (sebol)
I'll be very interested in seeing how well this thing performs on benchmarks other than linpack.
Blue Gene is a very interesting design in that it uses IBM's 32-bit PowerPC cores, normally used for embedded applications. They put 2 cores on a die, and integrated a memory controller, as well as the 4 different interconnect networks. The cores are only clocked at about 800 MHz, and are thus pretty wimpy individually. However, that can be good. Since the processor cores are quite modest, the ratio of memory bandwidth to CPU flops is quite high. Similarly, the ratio of interconnect bandwidth to CPU flops is also very high. Thus the CPUs should run very efficiently on problems that will parallelize to thousands of CPUs. Some problems, on the other hand, will perform terribly. I expect a lot of this system's performance depends on the scalability of the system software, and the compilers / libraries.
That said, the Earth Simulator is also really good at some applications, and not so good at others. Instead of 16,000 small CPUs, it uses 5000 massive vector CPUs. Each is clocked at only 500 MHz, but has 8 parallel execution pipes, and about 50 GBytes/sec of memory bandwidth. Problems that don't vectorize run through the very modest 500 MHz scalar unit.
Earth Simulator has realized a large percentage of its theoretical peak performance on real-world simulations (often up to 50%), while most large systems approach 10%. I'm looking forward to seeing how well utilized Blue Gene is. Earth Simulator was a direct descendant of NEC's SX-series supercomputers, which have a 20-year lineage. Blue Gene is a radical departure from IBM's regular HPC product offerings, and uses a new microkernel OS rather than clustered AIX nodes. I imagine there will be some stutter-steps in the early days of this new product, which will undoubtedly work themselves out over time.
Great work IBM.
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=teraflop
Cruise TT
When Longhorn ships....
From excellent karma to terrible karma with a single +5 funny post...
Jesus Christ, that's a lot of "tecord" jokes. I mean, wow.
Indeed. It must surely be some kind of tecord.
Google has *at least* 50000 computers, each one having *at least 1GHz* processor. BlueGene right now has 16250 processors. I'd say that Google can toast BlueGene right now. But read the following:
"...Lawrence Livermore National Laboratory plans to install the Blue Gene/L system next year with 130,000 processors and 64 racks, half a tennis court in size."
130000 processors is something Google can't deal with right now! I'd say any larger search company can get its hands on something like BlueGene. I can't see how Google is going to deal with supercomputers becoming almost a commodity... I guess I'll put my money on processor companies (IBM, Intel, AMD) and stay away from search "giants".
With ~16,000 processors now, and over 130,000 when it goes into production, getting all those CPUs to talk to one another is quite a challenge. Did they use InfiniBand? Or a proprietary interconnect, perhaps?
Chip H.
That looks like a bunch of HAL's.
I can almost hear them chanting "I'm sorry Dave... I'm sorry Dave..."
The Virginia Tech Supercomputer (take 2) is due to be clocked soon, and it's also a huge off-the-shelf system. I'd like to see how they compare.
Also, I'll bet big money it's already been used for gaming. What college student could resist?
"Risc is good..."
Wouldn't help. If they had used a KDE spell checker, it would have been changed to 'rekord'.
I've been down in the basement of the building and seen a few of the towers 1/2 loaded (at that time), along with the massive cooling system that was added to the building to keep those racks working. Lift up a raised floor panel and the 95 lbs of me will get lifted off the ground (or so it feels).
Sadly, all 64 racks will never be in Roch, just not enough space.
Actually, StarTribune has one (crappy) pic of some towers.
When in danger or in doubt, run in circles, scream and shout.
How long does it take it to run an infinite loop?
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Ah, but would that be measured from the top, side, or bottom? There's about an inch difference depending on which side you measure. And with such small differences, you'd have to account for differences in excitement level... {shakes head} Metrologists in locker rooms are really scary.
This box, distcc and a cross compiler sound like a Gentoo wet dream.
Interestingly, why aren't they using Linux as an OS? I didn't see anything about exactly what they are planning on using.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
But does it run Doom3? (or, for that matter, Longhorn..?) Imagine a Beowulf cluster of these!
blakespot
-- Heisenberg may have slept here.
iPod Hacks.com
That's spelled "thruster"
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
I would have to say 100% of them. I still haven't seen an actual post about the content.
Lost at C:>. Found at C.
"However, other supercomputers can do things Blue Gene cannot, such as produce 3-D simulations of nuclear explosions, Hirschfeld said."
Sounds like they need a better video card!
_____________________
Huh?
...scandles.
Linux would have too much overhead on the compute nodes. Since they have to write all the drivers for the proprietary stuff anyway, it makes more sense to write a lightweight operating system for the compute nodes that don't need extra stuff like disks, consoles, or complicated schedulers.
Also, Linux doesn't run on non-cache-coherent SMP's, to the best of my knowledge. These machines have nodes that are coherent in L2 and L3, but not coherent in L1. Making that work with Linux could be a very big technical challenge.
The machine does actually use Linux, but only for the I/O nodes, which have disks and stuff attached, and to which users can connect. It's really a case of using the right tool in the right place, and for most of the machine, Linux just isn't that.
People are fixated on the Earth Simulator because of what it does, not how fast it does it. Geeks care about the specs, but the normals care that it's fast enough to model the weather, which is increasingly destructive and scary. The rest of these supercomputers are used for finding oil and "perfecting" weapons, not nearly as inspiring.
--
make install -not war
In Win 2000 this might well be true. When I ran a test on the speed of my latest program on a 2 GHz machine, the result was in a ratio of 2.2 to 2.7 with the task manager running in the former case. The task manager consumed about 20% CPU when the system was busy.
I realize a spell checker ought to stop as soon as it finishes scanning while task manager never stops.
A supercomputer-class spell checker would undoubtedly be fully blown, in the industrial-strength class. Such software wouldn't just do the garden-variety dictionary compare. It wouldn't be satisfied until it determined whether you used the correct word relative to context, whether you used the correct phrasing, and should not stop until it determined you've elucidated the precise connotation and denotation within the realm of your intentions.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
But how many FPS will it get in Doom 3??
When you have a machine this fast the sampler cannot keep up. As a result what you see is a total distortion of reality. You may have seen this phenomenon with wagon wheel spokes rotating backward when the cart is really rolling. Thus when you read "speed record at 36.01 TFLOPS" your eyes view the letters going backwards and forwards.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
Some google references mention 10-15 watts per node, giving about 250 kilowatts for the 16K node test machine. They were trying to stay under two megawatts for the full blown 130K, 360TF machine planned in a couple years. That is the site power capacity.
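The arithmetic behind those power figures can be checked with a small sketch. The per-node wattage range and the 130K node count come from the comment above; reading "16K" as 16 * 1024 = 16,384 nodes is an assumption, and the helper name is purely illustrative.

```python
# Rough power budget: nodes * watts per node. Ignores cooling, storage and
# network overhead, so treat the results as lower bounds.
def power_kw(nodes, watts_per_node):
    """Total draw in kilowatts for a given node count and per-node wattage."""
    return nodes * watts_per_node / 1000.0

TEST_NODES = 16 * 1024   # assumption: "16K" read as 16,384
FULL_NODES = 130_000     # planned full machine
LOW_W, HIGH_W = 10, 15   # watts per node, range quoted in the comment

for label, nodes in (("test machine", TEST_NODES), ("full machine", FULL_NODES)):
    low, high = power_kw(nodes, LOW_W), power_kw(nodes, HIGH_W)
    print(f"{label}: {low:.0f} to {high:.0f} kW")
```

At 15 W per node the test machine comes out near 246 kW and the full machine near 1,950 kW, which is consistent with both "about 250 kilowatts" and the two-megawatt site limit.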
How DOOM3 would look on this Piece
"Insert Sig Here"
>The VT cluster has been upgraded to the new and yet unreleased dual 2.3 GHz Xserves with ECC memory. Last month was their first live month and they were testing it out by running stuff for the military.
"upgraded", "live", "testing it" - which one is it? What the fuck does that mean - is it ready or not? It's live but it's still under testing? How can it be? It's the tests first, then going live.
And if they're done with it, why don't they publish another (higher, if the upgrade worked) benchmark?
If they haven't finished upgrading it or if it's running slower than the benchmark they published last year, the last result means nothing.
Not to mention that MPI-style clusters can be upgraded rack-by-rack or even node-by-node - if they're ripping it all apart, they haven't set it up properly in the first place!
The same goes for this IBM announcement - the god damned thing is in a fucking lab - it will be a year or so before it's actually implemented. By the time they actually go online, there may well be some faster cluster online.
Just another piece of PR bullshit.
And it still can't run Doom 3 at 60 fps.
Except that it's not on the most recent Top 500 list anywhere.
Remember how Va. Tech replaced all 1100 G5 nodes with G5 XServes a few months ago? Well, when you do something like that, you have to rerun and resubmit the benchmark. Va. Tech were not able to get the machine back together soon enough to rerun the benchmark in time to make the last list; there's even a big caveat about it on the Top 500 home page.
(It's also not clear that the original version of the Va. Tech machine ever did anything other than run that benchmark, but that's another matter.)
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
Being interested in protein folding, I saw an article on Blue Gene being invented to be used for simulating the folding of a protein with the full use of quantum mechanical calculations. The computer was to be so fast it would take just one year of execution time to simulate a full fold.
A few years ago the fastest supercomputers were being built to simulate atomic explosions, including the first computer to break the teraflops barrier.
The Earth Simulator was built for peaceful purposes. Blue Gene is in name motivated by genetics.
I know atomic bombs explode and kill a lot of people. Those things work. I want to know how proteins fold. Are we to understand that funding for supercomputer research must be driven by the arms race?
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
1. This is an item about IBM, so someone has to mention SCO. 2. "All your Supercomputer are belong to us"
comes from building hardware for a specific task. Unfortunately most of you can't access this little bit of nerd heaven but some incredibly cool hardware architectures are being described at the High Performance Embedded Computing conference. Sky and Mercury have some of their hottest new designs here. How about a machine that can do a 256 mega-sample FFT in real time, or a self-configuring supercomputer on a chip? Of course most of these tricks will never escape the lab except for the speed-ups for rendering engines... one place where gamers and the DOD are driving technology in a dead-heat race with lots of winners. Besides, in a few months, something will come along that will go even faster than Blue Gene.
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
If I remember correctly, the Blue Gene setup has comparatively little RAM. 3-D numerical simulations, like I do, almost always need a lot of RAM in order to store the values of the field at all the grid points. O(N^3) indeed...
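To put a number on the O(N^3) point, here is a minimal sketch of how storage grows for a cubic grid. It assumes double-precision values (8 bytes each) and one stored value per field per grid point; both are illustrative assumptions, not details of any particular simulation code.

```python
# Memory needed to hold a 3-D field on an N x N x N grid.
def grid_memory_gib(n, fields=1, bytes_per_value=8):
    """GiB required for `fields` values per point on an n^3 grid."""
    return n ** 3 * fields * bytes_per_value / 2 ** 30

for n in (256, 512, 1024, 2048):
    print(f"N={n:5d}: {grid_memory_gib(n):8.3f} GiB per field")
```

Doubling the resolution multiplies the memory by eight, so a grid that fits comfortably at N=512 (1 GiB per field) needs 64 GiB per field at N=2048, before counting extra fields or scratch arrays.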
All Abstract Structures of Objects and their Relationships exist.
So how long will it take before a Mac rumor site predicts that this CPU will be in the next PowerBook?
-ch
If I had mod points I would let you have one, but I don't, so instead I must write this note. Thank you for specifying that your link was to a PDF. It's terrible having to wait ages for Acrobat to load because you clicked a PDF in your web browser instead of just loading it in Acrobat.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
There simply cannot be a perfectly correct method for doing it. You [not exactly you] measure power by comparing human neurons and computer nodes, huh? What is the proof that it is correct?
And to top it off, though all humans have essentially the same brain structure, a lot could vary.
YOU NEVER KNOW WHAT THE POWER OF HUMAN BRAIN IS
Probably Infinite.
So what happens when some of the 130000 processors fail? Will it continue to work properly? Can they be replaced individually? What could happen to the full system?
And what about the rare case in which a processor burns out despite the cooling provided?
Your comment is asinine at best, abso-fucking-lutely stupid at worst.
Yes, I'm sure they're all not playing Doom3.
I feel it's worth mentioning that if you look at the Earth Simulator's event calendar, you can see the Earth Simulator was tested and working in April of 2002. It's amazing it has taken so long for someone to beat their Linpack score... Two years is an eternity in the computing world. Good job NEC, you had a great run!
... how many FPS?
> No.
> Fine.
Beginning Simulation...
Actually I heard it gets 1Quadrillion PetaFONTops, in that they are able to choose fonts 1 Quadrillion times faster than the average art major using a single Apple.
I wonder how this compares to the one NASA is building in collaboration with Intel and SGI. Since you can't base performance simply on the number of processors, it should be interesting.
A machine actually capable of playing Doom3.
"Anyone who is capable of getting themselves made President should on no account be allowed to do the job."-THG
The bluegene may be faster, but the Earth Simulator sure looks cooler. Obviously, this proves that the Earth Simulator is the superior (superer?) computer.
I worked on it and I didn't even know this got published!
Thanks for pointing this out, now I can add this publication to my cv:)
(I'm M Tubbs, see page 1. I worked on the SIMD FPU)
IBM only used 8 racks to accomplish that speed, while the Earth Simulator has 320 processor node cabinets and 64 interconnect cabinets. The ES's grid design looks neat though.
SproutWorks Software Design
Not all projects work well in a highly distributed kind of setting.
Tasks that can be worked on pretty much independently, such as finding primes or SETI@HOME, will work fine on whatever system can be cobbled together. (To see if n is a prime you don't care if n-2 or n+2 is also a prime. Every integer can be tested for primeness regardless of other numbers.)
Other kinds of computing need to share a great deal more information between the processors. Weather simulation, geological studies, etc. (To understand what data you have at point x,y it is useful to check the areas around that for something similar.)
For these kinds of tasks, the best system we have come up with so far is a supercomputer with the processors jammed as close to each other as we can put them.
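The contrast between the two kinds of work can be sketched in a few lines. This is only an illustration under simple assumptions: trial-division primality testing stands in for the independent case, and a 1-D three-point averaging step stands in for the neighbor-coupled case. All function names are made up for the example, and nothing here runs in parallel; it just shows which data each unit of work needs.

```python
def is_prime(n):
    """Trial division: the answer for n depends on n alone."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def split(values, parts):
    """Cut a list into at most `parts` contiguous chunks."""
    size = -(-len(values) // parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def primes_in_chunk(chunk):
    """Each chunk could go to a different machine with no communication."""
    return [n for n in chunk if is_prime(n)]

def stencil_step(grid):
    """One averaging step: every interior cell needs both of its neighbors,
    so chunks living on different processors must exchange edge values."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new

numbers = list(range(1, 50))
chunks = split(numbers, 4)
# Chunk order does not matter for the independent task.
found = sorted(p for chunk in reversed(chunks) for p in primes_in_chunk(chunk))
print(found)
print(stencil_step([0.0, 3.0, 0.0]))
```

The prime search gives the same answer however the chunks are distributed or ordered, while every step of the stencil forces neighboring chunks to talk to each other, which is where a fast interconnect earns its keep.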
IBM said its still-unfinished BlueGene/L System, named fot its ability to model the folding of human ptoteins, can sustain speeds of 36 tetaflops.
;)
Hopefully we will actually be celebrating 36 tetaflops in 10*18 months = 15 years. Maybe then they'll have the computing capacity to figure out why I can't wake up at 7 for work but I can wake up at 6 for golf
0- Eamonman Proud member of DNRC
This will surely be one of the main highlights at Supercomputing 2004 (Nov 6-12). Expect IBM to present a lot more data at the show. Many people will be interested in seeing their utilization numbers. I've heard a rumor that it may be amazingly high. There's also a rumor that IBM has intentionally under-reported the TFlop number by a significant margin. Why they would do that is unclear, unless they were worried about someone else stealing their thunder before the show. We should learn a lot more in a bit over a month.
* Not actual girls
In regards to the Virgin Galactic thread a couple days ago -- To all those who said they'd bet on Japan because they had the "fastest" computer -- looks like many should foot = mouth. Bah -- this has nothing to do with anything. Globalistic (global nationalism? fascism?) mindset of everyone v. America is starting to get frighteningly trite. valder.
Yeah, this is all old news now (from yesterday, omg), but you have to love this headline:
IBM creates fastest super model in the world
So pay up!
-Darl
Finally someone beat the Earth Simulator! I'm surprised IBM's stock didn't go sky high.
Marge: You know Homey, The "E" doesn't work on that typewriter. ... no. ... no. ... no ... Earl! ... no ... Bill Simpson.
Homer: We don't need no stinkin' "E".
Restaurant Review?
Eatery Evaluation?
Food Box: Go or No Go by Homer
-- "Guss Who's Coming To Criticiz Dinnr?"
Get your Unix fortune now!
HPCWire
On Wednesday, IBM claimed title to the world's fastest supercomputer by reporting that a Blue Gene/L system sited at the IBM lab in Rochester, Minnesota had achieved a Linpack-benchmark performance of 36.01 TFs/s, narrowly edging out the Linpack-benchmark performance of the Earth Simulator, which is only 35.86 TFs/s. Since the Earth Simulator has a peak performance of 40.96 TFs/s and this particular Blue Gene/L system has a peak performance of just over 45 TFs/s, the Earth Simulator is sustaining a somewhat higher fraction of its peak on Linpack. IBM has accomplished something. Still, before breaking out the champagne, we are advised to step back and try to gain some perspective about what these numbers mean. They certainly mean something but may not bear all the weight that is being put on them by the media and marketing people.
In truth, the Linpack race is becoming a private party for Linux clusters. The winner is just the biggest cluster at the time that doesn't break and has a few robust locality mechanisms that exploit the abundance of local and global spatial and temporal locality in the Linpack benchmark. If the machine being developed at NASA achieves a peak of 50 TFs/s and does not sink beneath 80% efficiency on Linpack, it will beat both the current Earth Simulator and this particular configuration of Blue Gene/L.
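The efficiency comparison in the two paragraphs above can be reproduced with a short sketch. The sustained and peak figures are the ones quoted in the article; treating "just over 45" as exactly 45.0 is an assumption (so the true Blue Gene/L fraction is slightly lower than computed here), and the NASA figures are the article's hypothetical, not a measurement.

```python
# Linpack efficiency = sustained performance / theoretical peak.
def efficiency(sustained_tflops, peak_tflops):
    """Fraction of peak actually sustained on the benchmark."""
    return sustained_tflops / peak_tflops

earth_simulator = efficiency(35.86, 40.96)
blue_gene_l = efficiency(36.01, 45.0)   # assumption: peak taken as 45.0

# The article's hypothetical: a 50 TFs/s peak machine at 80% efficiency.
nasa_sustained = 50.0 * 0.80

print(f"Earth Simulator: {earth_simulator:.1%} of peak")
print(f"Blue Gene/L:     {blue_gene_l:.1%} of peak")
print(f"Hypothetical NASA machine: {nasa_sustained:.2f} TFs/s sustained")
```

On these numbers the Earth Simulator sustains roughly 87.5% of its peak against roughly 80% for the Blue Gene/L prototype, and a 50 TFs/s machine holding 80% efficiency would sustain 40 TFs/s, ahead of both.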
But just how valuable is it to win the Linpack race? How much, if anything, does this have to do with developing a general-purpose parallel computer that can tackle the full range of problems this nation needs to solve? What is a _general-purpose_ parallel computer anyway?
The original working title of the most recent High-End Crusader article in HPCwire ("High-End Computing Needs Radical Programming Change" [108384]) was "High-Productivity General-Purpose Parallel Computing". The plan for that article was to combine two themes. First, as has been often argued here, there is a natural division between high-bandwidth applications/algorithms, which---in the _most_ demanding case---engage in frequent fine-grained long-range communication and thus require strongly parallel, high-bandwidth systems in order to be computed efficiently, and low-bandwidth applications/algorithms, which---in the _least_ demanding case---engage in infrequent coarse-grained short-range communication and thus may be computed efficiently on almost any parallel architecture, including weakly parallel, low-bandwidth systems such as Linux clusters.
Second, as has often been suggested here, conventional parallel machines, i.e., clusters of scalar SMP nodes that communicate among themselves using MPI, have become increasingly burdensome to program. In particular, severe nonuniformity of memory access has led to tight coupling of control and data decomposition. This has produced an unfortunate tradeoff between locality and parallelism for high performance on a given architecture. The solution to this problem is a synergistic mix of architectural improvements and improvements in the programming-language system, in particular the design of new programming abstractions and language constructs for general-purpose parallel programming.
Why does language enter here? Well, do computer architects need reminding that architecture and language are inextricably linked and that, to improve either, we need to improve both? Programming languages obviously need architectural support but they themselves lead to such things as 1) relieving the burden of parallel programming to enhance programmer productivity, 2) allowing fine-grained anonymous communication, 3) exploiting diverse forms of parallelism and locality, and 4) driving computer architecture in the right direction.
Also, the current design thrust in high-bandwidth systems is to combine a broad range of parallelism mechanisms with a broad range of locality mechanisms---all compatible with each other---so that no form of parallelism and no form of locality need be left on the "compiler-room" floor. But given our fail