Supercomputer Advancement Slows?
kgeiger writes "In the Feb. 2011 issue of IEEE Spectrum online, Peter Kogge, an IEEE Fellow and professor of computer science and engineering at the University of Notre Dame, outlines why we won't see exaflops computers soon. To start with, consuming 67 MW (an optimistic estimate) is going to make a lot of heat. He concludes, 'So don't expect to see a supercomputer capable of a quintillion operations per second appear anytime soon. But don't give up hope, either. [...] As long as the problem at hand can be split up into separate parts that can be solved independently, a colossal amount of computing power could be assembled similar to how cloud computing works now. Such a strategy could allow a virtual exaflops supercomputer to emerge. It wouldn't be what DARPA asked for in 2007, but for some tasks, it could serve just fine.'"
In the past, a lot of applications needed a true supercomputer built to solve them, be it basic weather modeling, rendering for ray tracing, etc.
Now, most applications can be run on COTS hardware. Because of this, there isn't much of a push to keep building faster and faster computers.
So, other than the guys who need top-of-the-line CPU cycles for very detailed models, such as those used to simulate nuclear testing, there isn't really as big a push for supercomputing as there was in the past.
So wait. Your answer to "very expensive general purpose machine" is "design many slightly less expensive single purpose machines"? Your "factor of hundred" performance improvement will likely be overshadowed by the "factor of thousand" increase in economic cost.
Provide believable numbers or your argument is bullshit. You may be right, but your style of discourse requires concrete evidence to be at all convincing.
Two problems:
1. The value of the work your CPU can do is probably less than the cost of the extra power it'll consume. Maybe the GPU could do it, but then:
2. You are not a supercomputer. Computing power is cheap - unless you're running a cluster of GPUs, it could take a very long time for you to earn even enough to be worth the cost of the payment transaction.
What you are talking about is selling CPU time. It's only had one real application since the mainframe days, and that's in cloud computing, as it offers the ability to buy capacity instantly if the customer has a sudden need for more (e.g., Slashdot just linked to their site). It just isn't economically viable right now, because anyone who needs so much processing power that they might have to buy it can probably just go and buy their own cluster.
Because nobody uses a real supercomputer for that kind of work. It's much cheaper to buy some processing from Amazon or use a loosely coupled cluster, or write an @Home style app.
Supercomputers are used for tasks where fast communication between processors is important, and distributed systems don't work for these tasks.
So the answer to your question is that tasks that are appropriate for distributed computing are already done that way (and when lots of people are willing to volunteer, why would they pay you?).
In the past, a lot of applications needed a true supercomputer built to solve them, be it basic weather modeling, rendering for ray tracing, etc.
Now, most applications can be run on COTS hardware.
It's true, many applications that needed supercomputers in the past can be done by COTS hardware today. But this does not mean there are no applications for bigger computers. As each generation of computers assumes the tasks done by the former supercomputers, new applications appear for the next supercomputer.
Take weather modeling, for instance. Today we still can't predict rain accurately. That's not because the modeling itself is not accurate, but because the spatial resolution needed to predict rainfall is beyond our computers. Engineers still use wind tunnels and still have tanks to test ship models; there are many situations where the most powerful computers today cannot perform calculations at the same level of precision one gets from scale models.
And then there are entirely new applications that are way beyond the capacity of our current computers. Drug design is one example: a computer capable of accurately calculating the shape a protein molecule will take, given its sequence of amino acids, is still a dream.
These modern machines, which consist of zillions of cores attached over very low bandwidth, high latency links, are really not supercomputers for a huge class of applications. They only work if your application exhibits extreme memory locality, needs hardly any interconnect bandwidth, and can tolerate long latencies.
The current crop of machines is driven mostly by marketing folks and not by people who really want to improve the core physics like Cray used to.
BANDWIDTH COSTS MONEY, LATENCY IS FOREVER
Take any of these zillion dollar piles of CPUs and just try doing this:
for ( x = 0; x < bounds; ++x )
{
    humungousMemoryStructure [ x ] = humungousMemoryStructure1 [ x ] * humungousMemoryStructure2 [ randomAddress ] + humungousMemoryStructure3 [ anotherMostlyRandomAddress ] ;
}
It'll suck eggs. You'd be better off with a single liquid nitrogen cooled GaAs/ECL processor surrounded by the fastest memory you can get your hands on all packed into the smallest place you can and cooled with LN or LHe.
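To put a rough number on why that loop hurts, here is a back-of-the-envelope sketch in Python. It assumes every random read misses cache and pays a full memory round trip, and that the machine can keep only a fixed number of such reads in flight at once. The function name, the 100 ns latency, and the in-flight counts are illustrative assumptions, not measurements of any real machine.

```python
def latency_bound_seconds(iterations, random_reads_per_iter, latency_ns, in_flight):
    """Lower bound on run time when random reads dominate.

    Assumes each random read costs a full memory round trip (latency_ns)
    and that at most `in_flight` such reads overlap at any moment.
    """
    total_reads = iterations * random_reads_per_iter
    return total_reads * latency_ns * 1e-9 / in_flight


if __name__ == "__main__":
    # The loop above does two effectively random reads per iteration.
    # Every figure below is assumed for illustration only.
    n = 10**9
    for in_flight in (1, 10, 100):
        t = latency_bound_seconds(n, 2, latency_ns=100, in_flight=in_flight)
        print(f"{in_flight:>3} reads in flight: about {t:.0f} s for {n} iterations")
```

Under these assumptions, arithmetic throughput does not appear in the bound at all. Only lower latency or more overlapping reads shrink it, which is the point about LINPACK-style numbers missing the real bottleneck.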
Half the problem is that everyone measures performance for publicity with LINPACK MFLOPS. It's a horrible metric.
If you really want to build a great new supercomputer, get a (smallish) bunch of smart people together like Cray did, and focus on improving the core issues. Instead of spending all your efforts on hiding latency, tackle it head on. Figure out how to build a fast processor and cool it. Figure out how to surround it with memory.
Yes,
Customers will still use commodity MPP machines for the stuff that parallelizes.
Customers will still hire mathematicians, and have them look at ways to map things that seem inherently non-local into spaces that are local.
Customers who have money and the mathematicians couldn't help will need your company and your GaAs/ECL or LHe cooled fastest SCALAR / Short Vector box in the world.
I read what I thought were the relevant sections of the big PDF file that went along with the article. They know that the actual RAM cell power use would only be 200 kW for an exabyte, but the killer comes when you address it in rows, columns, etc. Then it goes to 800 kW, and when you start moving data off chip, it gets to the point where it just can't scale without running a generating station just to supply power.
What if instead of trying to address everything that way, they break up the computing and move it to the data... so that RAM is tied directly to the logic that would use it... it would waste some logic gates, but the power savings would be more than worth it.
Instead of having 8 kbit rows, just a 16x4 bit lookup table would be the basic unit of computation. It would be globally readable and writable at setup time, but otherwise only accessed via single bit connections to neighboring cells. Each cell would be capable of computing 4 single bit operations simultaneously on the 4 bits of input, and passing the results to its neighbors.
This bit processor grid (bitgrid) is Turing complete, and should be scalable to the exaflop scale, unless I've really missed something. I'm guessing somewhere around 20 megawatts for first generation silicon, then more like 1 megawatt after a few generations.
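For anyone trying to picture a single cell, here is a minimal Python sketch of the idea as described: four input bits form a 4-bit index into a 16-entry table of 4-bit values, and each bit of the looked-up value is one output. The function names and the choice of which bit goes to which neighbor are my own assumptions for illustration; nothing here models power or timing.

```python
def make_table(out_funcs):
    """Build a 16-entry table of 4-bit values from four 1-bit functions.

    out_funcs: four callables, each taking the four input bits
    (a, b, c, d) and returning 0 or 1. This stands in for the
    setup-time programming of the cell.
    """
    table = []
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]  # bits[0] is input 0
        value = 0
        for k, f in enumerate(out_funcs):
            value |= (f(*bits) & 1) << k
        table.append(value)
    return table


def cell_step(table, inputs):
    """Look up the 4 output bits for 4 input bits (one per neighbor)."""
    index = sum((bit & 1) << k for k, bit in enumerate(inputs))
    value = table[index]
    return [(value >> k) & 1 for k in range(4)]


if __name__ == "__main__":
    table = make_table([
        lambda a, b, c, d: a & b,          # output 0: AND of inputs 0 and 1
        lambda a, b, c, d: a ^ b ^ c ^ d,  # output 1: parity of all four
        lambda a, b, c, d: c | d,          # output 2: OR of inputs 2 and 3
        lambda a, b, c, d: 1 - a,          # output 3: NOT of input 0
    ])
    print(cell_step(table, [1, 1, 0, 0]))
```

A full grid would wire each output of one cell to an input of a neighbor and step all cells together; that wiring is left out here to keep the sketch to the single-cell lookup.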
A little bird informs the world that the US has a supercomputer already running on them, somewhere between 100 GHz and 1 THz per processor
Unlikely. If you do the calculations, you'll find that the current 3 GHz limit is about as fast as you can get data from other chips on a circuit board. A 3 GHz clock has a period of 0.33 nanoseconds, the time it takes light to travel ten centimeters in a vacuum. A faster CPU will stay idle most of the time, waiting for the data it requested from other chips to arrive at the speed of light.
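The arithmetic is easy to check. The short Python sketch below computes how far light travels in a vacuum during one clock period; the function name is made up for illustration, and the three clock rates are simply the ones mentioned in this thread.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def light_distance_per_cycle_cm(clock_hz):
    """Distance light covers in a vacuum during one clock period, in cm."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


if __name__ == "__main__":
    for label, hz in (("3 GHz", 3e9), ("100 GHz", 100e9), ("1 THz", 1e12)):
        print(f"{label}: {light_distance_per_cycle_cm(hz):.3f} cm per cycle")
```

Signals in real board traces travel slower than light in a vacuum, so these distances are upper bounds on how far away a chip can be and still answer within one cycle.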
That doesn't seem like a show stopper. In the 1950s, the US Air Force built over 50 vacuum tube SAGE computers for air defense. Each one used up to 3 MW of power and probably wasn't much faster than an 80286. They didn't unplug the last one until the 1980s.
If they get their electricity wholesale at 5 cents/kWh, 67 MW would cost about $30,000,000 per year. That's steep, but probably less than the cost to build and staff the installation.
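For anyone who wants to redo that estimate with their own rate, here is the calculation as a small Python sketch. It assumes the machine draws its full rated power around the clock for a 365-day year; the function name is hypothetical, and the 5 cents/kWh figure is the one from the comment above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h, ignoring leap years


def annual_energy_cost_usd(power_mw, price_usd_per_kwh):
    """Cost of drawing power_mw continuously for one year."""
    kwh_per_year = power_mw * 1000 * HOURS_PER_YEAR
    return kwh_per_year * price_usd_per_kwh


if __name__ == "__main__":
    print(f"${annual_energy_cost_usd(67, 0.05):,.0f} per year")
```

The same function with 3 MW gives the yearly bill for one SAGE-sized machine at the same assumed rate.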
I've read the article (the WHOLE article) and the exaflop issue is generally posed in terms of power requirements in reference to current silicon technology and its most closely related future advancements. The caveat is that not even IBM thinks exaflop computing can be achieved with current technology; that's why they are deeply involved with photonic CMOS, of which they have already made the first working prototype. Research into exaflop computing at IBM is largely based on that. You can't meet the power requirements without moving (at least in part) from electronic to photonic. This will decrease power requirements (and cooling requirements) by a large factor.
Why has nobody tried this before? They could easily plow through the data from SETI, fold proteins, or even have a platform for creating and distributing cloud based computing turnkey computing solutions! It's too bad that the cloud was not invented until a year or two ago, this stuff could have probably started out in 1999 if the cloud existed back then.