Wintel, Universities Team On Parallel Programming
kamlapati writes in with a followup from the news last month that Microsoft and Intel are funding a laboratory for research into parallel computing at UC Berkeley. The new development is the imminent delivery of the FPGA-based Berkeley Emulation Engine version 3 (BEE3) that will allow researchers to emulate systems with up to 1,000 cores in order to explore approaches to parallel programming. A Microsoft researcher called BEE3 "a Swiss Army knife of computer research tools."
It's a little disingenuous to claim that programmers are "stuck" with a serial programming model. The fact of the matter is that multi-threaded programming is a common paradigm which takes advantage of multiple cores just fine. Additionally, many algorithms cannot be parallelized.
Even languages like Erlang which bring parallelization right to the front of the language are still stuck running serial operations serially. There is sometimes no way around doing something sequentially.
Now, can we blow a few cycles on a few cores trying to predict which operations will get executed next? Yeah, sure, but that's not a programming problem, it's a hardware design problem.
1) Quantum computing != parallel computing.
2) A significant number of applications can and do run on 1000+ cores. Sure, most are scientific apps rather than consumer apps, but there is a market for it nevertheless. Go tell a high performance computing guy that there's no need for 1k cores on a single chip and watch him collapse laughing at you.
The laws of probability forbid it!
Interestingly enough, Dave Patterson http://www.eecs.berkeley.edu/Faculty/Homepages/patterson.html, once president of ACM http://membernet.acm.org/public/membernet/storypage_2.cfm?ci=June_2006&announcement=1&CFID=1668767&CFTOKEN=37941036 was once on a project to do that http://iram.cs.berkeley.edu/. Now he's working on ParLab http://parlab.eecs.berkeley.edu/. I don't always agree with him (and vice versa) but he's nobody's fool.
Faith, young grasshopper...
If you want a more technical reason DRAM and CPU's don't go together, spend an informative hour looking up the IC fab process for CMOS logic (CPUs) and DRAM. They're VERY VERY different. DRAM needs capacitory density to get the price-per-bit down so they use their own custom fabs optimized for that. This makes it really hard to fit lots of logic and DRAM on to one chip.
I'm about to start a graduate degree in this area so I'm a little biased. However, I think a lot of problems can be solved in parallel. For example, maybe, LZW compression as it's implemented in the "zip" format might not be easily parallelizable but that doesn't prevent us from developing a compression algorithm with parallelism in mind. I did some undergraduate research in parallel search algorithms and I know for a fact that there are many, many ways you can parallelize search. Frankly, saying that you can't parallelize algorithms is a bit closed minded. Many problems don't inherently require serial solutions, it's just current algorithms handle them that way. Rather than trying to implement existing algorithms on a massively parallel processor, you want to re-tackle the problem under a new model, a model of an arbitrary number of processors. You build around the idea of data-parallelism rather than task-parallelism. Many, many things are possible under this model and I think it's naive to think otherwise. You don't need to think, how do I juggle 1000 threads around, you think, how do I take a problem, break it up into arbitrarily many chunks and distribute those chunks to an arbitrary number of processors and how do I do all that scheduling efficiently? This model doesn't work for interactive tasks mind you (where you're waiting for user input), but I'm very confident a model can be developed that can.
Instead, you describe, using the toolset, the problem in a way which is decomposable, and the tools spread the work over the 1000+ cores.
One day soon, the computer industry will realize that, 150 years after Charles Babbage invented his idea of a general purpose sequential computer, it is time to move on and change to a new computing model. The industry will be dragged kicking and screaming into the 21st century. Threads were not originally intended to be the basis of a parallel software model but only a mechanism for running multiple sequential (not parallel) programs concurrently. The multithreading approach to parallel computing embraced by Intel, AMD and Microsoft is a disaster in the making because the future of computing is not multithreaded. See Nightmare on Core Street for more.
I don't think anyone except you said anything about threads. You may have just described exactly what the GP was describing -- point is, why should you have to break them down into individual programs yourself?
This is precisely what is wrong with the current approach. The idea that the problem should be addressed from the top down has been tried for decades and has failed miserably. The idea that we should continue to write implicitly sequential programs and have some tool extract the parallelism by dividing the program into concurrent threads is completely ass-backwards, IMO. We should start with parallel elements and build implicitly parallel modules from those elements. Nothing needs to be broken down. Hardware engineers have been doing it for years and it works.
Personally, I like Erlang, but the point is the same -- come up with a toolset and/or programming paradigm which makes scaling to thousands of cores easy and natural.
Erlang is not the answer for many reasons, otherwise the entire computer world (especially the high performance parallel research community) would have jumped on it in earnest since it's been around for quite a while. One reason is that it uses a coarse-grain approach to parallelism; you can't even parallelize a quicksort routine (an ideal candidate for parallel processing) in Erlang. Consider also that Erlang is not deterministic and has no mechanism for automatic load balancing. The same goes for all the other functional programming language approaches to concurrency.
The only problem I have yet to see addressed is how to properly test a threaded app, as it's non-deterministic.
The solution is not to use threads at all. They are not needed for parallelism. The difficulty of programming with threads is the primary reason that the multicore industry is in a panic right now. Multithreaded programs are a nightmare to maintain, especially if you did not write the original code. Ask the folks at Intel, Microsoft and AMD. And it's not for the lack of trying. Billions of dollars have been spent on making multithreaded parallel programming easy in the last two decades. They're still at it. What's wrong with this picture?