Supercomputer Advancement Slows?

Posted by Soulskill on Friday January 28, 2011 @04:33AM from the moore-flops-moore-problems dept.

kgeiger writes "In the Feb. 2011 issue of IEEE Spectrum online, Peter Kogge, an IEEE Fellow and professor of computer science and engineering at the University of Notre Dame, outlines why we won't see exaflops computers soon. To start with, consuming 67 MW (an optimistic estimate) is going to make a lot of heat. He concludes, 'So don't expect to see a supercomputer capable of a quintillion operations per second appear anytime soon. But don't give up hope, either. [...] As long as the problem at hand can be split up into separate parts that can be solved independently, a colossal amount of computing power could be assembled similar to how cloud computing works now. Such a strategy could allow a virtual exaflops supercomputer to emerge. It wouldn't be what DARPA asked for in 2007, but for some tasks, it could serve just fine.'"

Less of a matter of can't, but won't by mlts · 2011-01-28 04:44 · Score: 2

In the past, there were a lot of applications that a true supercomputer was needed to be built for to solve, be it basic modeling of weather, rendering stuff for ray-tracing, etc.
Now, most applications are able to be done by COTS hardware. Because of this, there isn't much of a push to keep building faster and faster computers.
So, other than the guys who need the top of the line CPU cycles for very detailed models, such as the modelling used to simulate nuclear testing, there isn't really as big a push for supercomputing as there was in the past.
1. Re:Less of a matter of can't, but won't by vbraga · 2011-01-28 04:51 · Score: 4, Interesting
  
  I don't know if this is true.
  Weather modeling is still done on supercomputers.
  Engineering applications need high-performance computing on a regular basis: geophysics (offshore oil, 4D seismic, ...), materials science (MD, ...), and others. There are also academic problems.
  I've seen a lot of new HPC centers being built or getting new equipment in the last few years (Rio de Janeiro, Brazil). From small CUDA clusters to heavy duty Cray systems (not in Rio, but nearby).
  
  --
  English is not my first language. Corrections and suggestions are welcome.
2. Re:Less of a matter of can't, but won't by dr2chase · 2011-01-28 05:52 · Score: 2
  
  The problem, not immediately obvious, is that if you shrink the grid size in a finite-elements simulation (which describes very very many of them), you must also shrink the time step, because you are modeling changes in the physical world, and it takes less time for change to propagate across a smaller element. And at each time step, everyone must chat with their "neighbors" about the new state of the world. The chatting is what supercomputers do well, compared to a city full of gaming rigs with GPUs.
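
A rough sketch of that scaling argument, assuming an explicit time-stepping scheme whose stable step is proportional to the cell size (a CFL-style limit). The function names, the `courant` parameter, and the example numbers are illustrative assumptions, not taken from any particular simulation code.

```python
def max_stable_dt(dx: float, wave_speed: float, courant: float = 1.0) -> float:
    """Largest time step allowed when information may cross at most
    `courant` cells per step (assumed CFL-style limit: dt <= C * dx / v)."""
    return courant * dx / wave_speed


def refinement_cost_factor(refine: float, dims: int = 3) -> float:
    """Relative work after shrinking the cell size by `refine`x per dimension.

    Cells grow as refine**dims, and the number of time steps needed to cover
    the same simulated interval grows as `refine`, because dt shrinks with dx.
    """
    cells = refine ** dims
    steps = refine
    return cells * steps


if __name__ == "__main__":
    # Hypothetical numbers: 100 m cells, signals moving at 340 m/s.
    print(max_stable_dt(100.0, 340.0))   # time step for the coarse grid
    print(max_stable_dt(50.0, 340.0))    # half the cell size, half the step
    print(refinement_cost_factor(2))     # 2x finer in 3D costs 16x the work
```

Under these assumptions, halving the cell size in three dimensions costs sixteen times the work, and every one of the extra steps involves another round of neighbor exchange.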
  
  The constraints of faster chatting also drive up power density.
  
  Another issue with COTS hardware (or any other) is that if you use enough of it, failure of some part becomes a certainty. This means you need redundancy and/or checkpointing, and changes your tolerance for hardware failure (COTS may not be tuned to that design point). Full redundancy uses twice the resources, but means you (almost) never wait; otherwise, checkpoint and recover eats into your wall time.
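
A back-of-the-envelope sketch of why failure becomes a certainty at scale, assuming independent node failures with a constant failure rate. The checkpoint interval uses Young's classic first-order approximation, which is a swapped-in rule of thumb rather than anything stated in the comment; all names and numbers are hypothetical.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Mean time between failures of the whole machine, assuming
    independent nodes with exponentially distributed failures."""
    return node_mtbf_hours / node_count


def prob_any_failure(node_mtbf_hours: float, node_count: int, run_hours: float) -> float:
    """Probability that at least one node fails during a run."""
    return 1.0 - math.exp(-run_hours * node_count / node_mtbf_hours)


def young_checkpoint_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's approximation for the checkpoint interval: sqrt(2 * cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical machine: 100,000 nodes, each with a 50,000-hour MTBF.
    mtbf = system_mtbf_hours(50_000, 100_000)
    print(f"system MTBF: {mtbf} h")
    print(f"P(failure in a 24 h run): {prob_any_failure(50_000, 100_000, 24):.6f}")
    # Assume writing one checkpoint takes 0.02 h (72 s).
    print(f"checkpoint every {young_checkpoint_interval_hours(0.02, mtbf):.3f} h")
```

With these made-up figures the machine fails about every half hour, so a meaningful share of wall time goes to writing checkpoints and replaying lost work.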
3. Re:Less of a matter of can't, but won't by Darinbob · 2011-01-28 08:05 · Score: 2
  
  I think the problem here is in calling these "applications". Most super computers are used to run "experiments". Scientists are always going to want to push to the limits of what they can compute. They're unlikely to think that, because a modern desktop is as fast as a super computer from a couple of decades ago, they are fine just running the same numbers they ran a couple of decades ago.
Re:Breathtakingly dumb. by c0d3g33k · 2011-01-28 05:15 · Score: 2

So wait. Your answer to "very expensive general purpose machine" is "design many slightly less expensive single purpose machines"? Your "factor of hundred" performance improvement will likely be overshadowed by the "factor of thousand" increase in economic cost.
Provide believable numbers or your argument is bullshit. You may be right, but your style of discourse requires concrete evidence to be at all convincing.
Re:Rent Out My Machine by SuricouRaven · 2011-01-28 05:15 · Score: 2

Two problems:
1. The value of the work your CPU can do is probably less than the extra power it'll consume. Maybe the GPU could do it, but then:
2. You are not a supercomputer. Computing power is cheap - unless you're running a cluster of GPUs, it could take a very long time for you to earn even enough to be worth the cost of the payment transaction.

What you are talking about is selling CPU time. It's only had one real application since the mainframe days, and that's in cloud computing, as it offers the ability to buy instantly if the customer has a sudden need for more (e.g., Slashdot just linked to their site). It just isn't economically viable right now, because anyone who needs so much processing power they might need to buy it can probably just go and buy their own cluster.
Re:Rent Out My Machine by ceoyoyo · 2011-01-28 05:17 · Score: 3, Insightful

Because nobody uses a real supercomputer for that kind of work. It's much cheaper to buy some processing from Amazon or use a loosely coupled cluster, or write an @Home style app.
Supercomputers are used for tasks where fast communication between processors is important, and distributed systems don't work for these tasks.
So the answer to your question is that tasks that are appropriate for distributed computing are already done that way (and when lots of people are willing to volunteer, why would they pay you?).
More and more applications all the time by mangu · 2011-01-28 05:20 · Score: 2

"In the past, there were a lot of applications that a true supercomputer was needed to be built for to solve, be it basic modeling of weather, rendering stuff for ray-tracing, etc.
Now, most applications are able to be done by COTS hardware."
It's true, many applications that needed supercomputers in the past can be done by COTS hardware today. But this does not mean there are no applications for bigger computers. As each generation of computers assumes the tasks done by the former supercomputers, new applications appear for the next supercomputer.
Take weather modeling, for instance. Today we still can't predict rain accurately. That's not because the modeling itself is not accurate, but because the spatial resolution needed to predict rainfall is beyond our computers. Engineers still use wind tunnels and still have tanks to test ship models; there are many situations where the most powerful computers today cannot perform calculations at the same level of precision one gets from scale models.
And then there are entirely new applications that are way beyond the capacity of our current computers. Drug design is one example: a computer capable of accurately calculating the shape a protein molecule will have given its sequence of amino acids is still a dream.
LA TE N C Y I S F O R E V by tarpitcod · 2011-01-28 05:28 · Score: 3, Insightful

These modern machines, which consist of zillions of cores attached over very low bandwidth, high latency links, are really not supercomputers for a huge class of applications, unless your application exhibits extreme memory locality, needs hardly any interconnect bandwidth, and can tolerate long latencies.
The current crop of machines is driven mostly by marketing folks and not by people who really want to improve the core physics like Cray used to.
BANDWIDTH COSTS MONEY, LATENCY IS FOREVER
Take any of these zillion dollar piles of CPUs and just try doing this:
for ( x = 0; x < bounds; ++x )
{
    // two of the three reads land on effectively random addresses, so caches and prefetchers can't help
    humungousMemoryStructure [ x ] = humungousMemoryStructure1 [ x ] * humungousMemoryStructure2 [ randomAddress ] + humungousMemoryStructure3 [ anotherMostlyRandomAddress ] ;
}
It'll suck eggs. You'd be better off with a single liquid nitrogen cooled GaAs/ECL processor surrounded by the fastest memory you can get your hands on all packed into the smallest place you can and cooled with LN or LHe.
Half the problem is that everyone measures performance for publicity with LINPACK MFLOPS. It's a horrible metric.
If you really want to build a great new supercomputer, get a (smallish) bunch of smart people together like Cray did, and focus on improving the core issues. Instead of spending all your efforts on hiding latency, tackle it head on. Figure out how to build a fast processor and cool it. Figure out how to surround it with memory.
Yes,
Customers will still use commodity MPP machines for the stuff that parallelizes.
Customers will still hire mathematicians, and have them look at ways to map things that seem inherently non-local into spaces that are local.
Customers who have money, and whom the mathematicians couldn't help, will need your company and your GaAs/ECL or LHe cooled fastest SCALAR / Short Vector box in the world.
It's Von Neumann's fault by ka9dgx · 2011-01-28 05:39 · Score: 2

I read what I thought were the relevant sections of the big PDF file that went along with the article. They know that the actual RAM cell power use would only be 200 kW for an exabyte, but the killer comes when you address it in rows, columns, etc... then it goes to 800 kW, and then when you start moving it off chip, etc... it gets to the point where it just can't scale without running a generating station just to supply power.
What if instead of trying to address everything that way, they break up the computing and move it to the data... so that RAM is tied directly to the logic that would use it... it would waste some logic gates, but the power savings would be more than worth it.
Instead of having 8 kbit rows... just a 16x4 bit look up table would be the basic unit of computation. Globally read/writable at setup time, but otherwise only accessed via single bit connections to neighboring cells. Each cell would be capable of computing 4 single bit operations simultaneously on the 4 bits of input, and passing the results to its neighbors.
This bit processor grid (bitgrid) is Turing complete, and should be scalable to the exaflop scale, unless I've really missed something. I'm guessing somewhere around 20 megawatts for first generation silicon, then more like 1 megawatt after a few generations.
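
A minimal sketch of one such cell as described above: a 16-entry table of 4-bit words, indexed by the 4 input bits, so each output bit is its own function of the inputs. The class name, the helper, and the half-adder example are hypothetical illustrations, not the poster's actual design.

```python
from typing import Callable, List, Sequence

BitFn = Callable[[int, int, int, int], int]


def make_lut(outputs: Sequence[BitFn]) -> List[int]:
    """Build a 16x4-bit table from four 1-bit functions of the 4 input bits."""
    if len(outputs) != 4:
        raise ValueError("a cell has exactly four output bits")
    table = []
    for inputs in range(16):
        bits = [(inputs >> i) & 1 for i in range(4)]
        word = 0
        for out_index, fn in enumerate(outputs):
            word |= (fn(*bits) & 1) << out_index
        table.append(word)
    return table


class BitGridCell:
    """One cell: programmed once at setup time, then a pure table lookup."""

    def __init__(self, lut: Sequence[int]) -> None:
        if len(lut) != 16 or any(not 0 <= word <= 0xF for word in lut):
            raise ValueError("LUT must hold 16 four-bit words")
        self.lut = list(lut)

    def evaluate(self, inputs: int) -> int:
        """Map 4 input bits (one per neighbor) to 4 output bits."""
        return self.lut[inputs & 0xF]


if __name__ == "__main__":
    # Example program: outputs 0 and 1 form a half adder of inputs a and b,
    # outputs 2 and 3 pass inputs c and d straight through.
    cell = BitGridCell(make_lut([
        lambda a, b, c, d: a ^ b,   # sum
        lambda a, b, c, d: a & b,   # carry
        lambda a, b, c, d: c,
        lambda a, b, c, d: d,
    ]))
    print(format(cell.evaluate(0b0011), "04b"))
```

A full grid would wire each output bit to one neighbor's input; that routing is left out of this sketch.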
1. Re:It's Von Neumann's fault by Animats · 2011-01-28 06:07 · Score: 3, Interesting
  
  "What if instead of trying to address everything that way, they break up the computing and move it to the data... so that RAM is tied directly to the logic that would use it."
  It's been tried. See Thinking Machines Corporation. Not many problems will decompose that way, and all the ones that will can be decomposed onto clusters.
  The history of supercomputers is full of weird architectures intended to get around the "von Neumann bottleneck". Hypercubes, SIMD machines, dataflow machines, associative memory machines, perfect shuffle machines, partially-shared-memory machines, non-coherent cache machines - all were tried, and all went to the graveyard of bad supercomputing ideas.
  The two extremes in large-scale computing are clusters of machines interconnected by networks, like server farms and cloud computing, and shared-memory multiprocessors with hardware cache consistency, like almost all current desktops and servers. Everything else, with the notable exception of GPUs, has been a failure. Even the Cell, the most widely deployed non-standard architecture ever, was only used in the PS3, and was more trouble than it was worth.
Re:How about those limited edition Gallium chips?? by mangu · 2011-01-28 05:41 · Score: 3, Informative

"A little bird informs the world that the US has a supercomputer already running on them, somewhere between 100GHz-1THz per processor"
Unlikely. If you do the calculations, you'll find that the current 3GHz limit is about as fast as you can get data from other chips on a circuit board. A 3GHz clock has a period of 0.33 nanoseconds, the time it takes for light to travel ten centimeters in a vacuum. A faster CPU will stay idle most of the time, waiting for the data it requested from other chips to arrive at the speed of light.
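
The arithmetic behind that claim can be checked with a short sketch. The function names are illustrative, and the figures are for light in a vacuum; signals on a real circuit board travel slower, so actual reach per cycle is shorter still.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def clock_period_ns(freq_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def light_distance_cm_per_cycle(freq_hz: float) -> float:
    """How far light travels in a vacuum during one clock cycle, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S / freq_hz * 100


if __name__ == "__main__":
    for freq_hz in (3e9, 100e9, 1e12):
        print(f"{freq_hz / 1e9:7.0f} GHz: "
              f"{clock_period_ns(freq_hz):.4f} ns, "
              f"{light_distance_cm_per_cycle(freq_hz):.3f} cm per cycle")
```

At 3 GHz this gives roughly 10 cm per cycle; at 100 GHz it drops to about 3 mm, less than the width of a typical chip package.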
67 Megawatts? by Waffle+Iron · 2011-01-28 06:53 · Score: 2

That doesn't seem like a show stopper. In the 1950s, the US Air Force built over 50 vacuum tube SAGE computers for air defense. Each one used up to 3 MW of power and probably wasn't much faster than an 80286. They didn't unplug the last one until the 1980s.
If they get their electricity wholesale at 5 cents/kWh, 67 MW would cost about $30,000,000 per year. That's steep, but probably less than the cost to build and staff the installation.
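
The cost estimate works out as follows, assuming the machine draws the full 67 MW around the clock for a 365-day year. The function name is an illustrative assumption; the power and price are the figures from the comment.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost_usd(power_mw: float, price_usd_per_kwh: float) -> float:
    """Yearly electricity bill for a constant load, in US dollars."""
    kwh_per_year = power_mw * 1000 * HOURS_PER_YEAR  # MW -> kW, then kWh
    return kwh_per_year * price_usd_per_kwh


if __name__ == "__main__":
    cost = annual_energy_cost_usd(67, 0.05)
    print(f"${cost:,.0f} per year")
```

That comes to a little over $29 million a year, consistent with the "about $30,000,000" figure.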
Yes, a forecast with CURRENT technology by Luke_2010 · 2011-01-28 08:12 · Score: 2

I've read the article (the WHOLE article) and the exaflop issue is generally posed in terms of power requirements in reference to current silicon technology and its most closely related future advancements. The caveat of that is that not even IBM thinks exaflop computing can be achieved with current technology; that's why they are deeply involved with photonic CMOS, of which they have already made the first working prototype. Research into exaflop computing in IBM is largely based on that. You can't meet the necessary power requirements without moving (at least in part) from electronic to photonic. This will decrease power requirements (and cooling requirements) by a large factor.
A virtual cloud based super computer? by Yaos · 2011-01-28 10:02 · Score: 3, Funny

Why has nobody tried this before? They could easily plow through the data from SETI, fold proteins, or even have a platform for creating and distributing cloud based computing turnkey computing solutions! It's too bad that the cloud was not invented until a year or two ago, this stuff could have probably started out in 1999 if the cloud existed back then.