An Overview of Parallelism

← Back to Stories (view on slashdot.org)

Posted by kdawson on Saturday February 10, 2007 @11:55AM from the cores-galore dept.

Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley: "Generally they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less how it has done in previous years (though languages like Erlang can probably help to change this)." Read on for Mortimer.CA's summary from the paper of some "conventional wisdoms" and their replacements.
Old and new conventional wisdoms:

Old CW: Power is free, but transistors are expensive.
New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.
Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.
Old CW: Multiply is slow, but load and store is fast.
New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.
Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
New CW: It will be a very long wait for a faster sequential computer (see above).

8 of 197 comments (clear)

Min score:

Reason:

Sort:

would that be by macadamia_harold · 2007-02-10 11:57 · Score: 5, Funny

Mortimer.CA writes with a recently released report from Berkley entitled "The Landscape of Parallel Computing Research: A View from Berkeley

Would that be a Parallelograph?

--
Push Button, Receive Bacon
Erlang by Anonymous Coward · 2007-02-10 12:09 · Score: 5, Insightful

Erlang only provides a way of proving parallel correctness, a la CSP. This means avoiding deadlocks and such. The primary difficulty of crafting algorithms to run efficently over multiple CPUs still remains. Erlang does not do any automatic parallelization, and expects the programmer to write the code with multiple CPUs in mind.

I'm wating for a language which would parallelize stuff for you. This is most likely to be a functinal language, or an extension to an existing functional language. Maybe even Erlang.
1. Re:Erlang by cpm80 · 2007-02-10 16:19 · Score: 5, Informative
  
  I think using a *specific* language for automatic parallelization is the wrong way. Some GNU folks are working on language independent autoparallelization for GCC 4.3. Their implementation is an extension to the OpenMP implementation in GCC. Read OpenMP and automatic parallelization in GCC, D. Novillo, GCC Developers' Summit, Ottawa, Canada, June 2006 http://people.redhat.com/dnovillo/Papers/ for details.
Hmmm... by ardor · 2007-02-10 12:13 · Score: 5, Interesting

"but is likely to face diminishing returns as 16 and 32 processor systems are realized"

Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

--
This sig does not contain any SCO code.
From TFA by obender · 2007-02-10 12:17 · Score: 5, Funny

The target should be 1000s of cores per chip
640 cores should be enough for anyone.
It's a supercomputer perspective by Animats · 2007-02-10 12:59 · Score: 5, Insightful

I just heard that talk; he gave it at EE380 at Stanford a few weeks ago.
First, this is a supercomputer guy talking. He's talking about number-crunching. His "13 dwarfs" are mostly number-crunching inner loops. Second, what he's really pushing is getting everybody in academia to do research his way - on FPGA-based rackmount emulators.
Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer before you find the first real commercial customer. It's BMW, and the system is a cluster of 1024 Intel x86 1U servers, running Red Hat Linux. Nothing exotic; just a big server farm set up for computation.
More CPUs will help in server farms, but there we're I/O bound to the outside world, not talking much to neighboring CPUs. If you have hundreds of CPUs on a chip, how do you get data in and out? But we know the answer to that - put 100Gb/s Ethernet controllers on the chip. No major software changes needed.
This brings up one of the other major architectural truths: shared memory multiprocessors are useful, and clusters are useful. Everything in between is a huge pain. Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble. The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?
What we do get from the latest rounds of shrinkage are better mobile devices. The big wins commercially are in phones, not desktops or laptops. Desktops have been mostly empty space inside for years now. In fact, that's true of most non-mobile consumer electronics. We're getting lower cost and smaller size, rather than more power.
Consider cars. For the first half of the 20th century, the big thing was making engines more powerful. By the 1960s, engine power was a solved problem, (the 1967 turbine-powered Indy car finally settled that issue) and cars really haven't become significantly more powerful since then. (Brakes and suspensions, though, are far better.)
It will be very interesting to see what happens with the Cell. That's the first non-shared memory multiprocessor to be produced in volume. If it turns out to be a dead end, like the Itanium, it may kill off interest in that sort of thing for years.
There are some interesting potential applications for massive parallelism for vision and robotics applications. I expect to see interesting work in that direction. The more successful vision algorithms do much computation, most of which is discarded. That's a proper application for many-CPU machines, though not the Cell, unless it gets more memory per CPU. Tomorrow's robots may have a thousand CPUs. Tomorrow's laptops, probably not.
1. Re:It's a supercomputer perspective by RecessionCone · 2007-02-10 14:05 · Score: 5, Informative
  
  I don't think you were listening very carefully to the talk (or know much about Computer Architecture) if you think Dave Patterson is a supercomputer guy. Perhaps you've heard of the Hennessy & Patterson Quantitative Approach to Computer Architecture book (you know, the one used at basically every university to teach about computer architecture). Patterson has been involved in a lot of different things within computer architecture over the years, including being one of the main people behind RISC and RAID (as well as being the president of the ACM). I saw his talk when it was given at Berkeley, and you really missed the point if you thought it was about supercomputing. The talk was about the future of computing in general, which is increasingly parallel, in case you're unaware of that fact. GPUs are already at 128 cores, Network processors are up to 200 cores. Intel is going to present an 80 core x86 test chip tomorrow at ISSCC. Physics won't support faster single core processors at the rate we're accustomed to, so the whole industry is going parallel, which is a sea change in the industry. Patterson's talk is aimed at the research community, since we don't have good answers as to how these very parallel systems should be architected and programmed. FPGA emulation is a great way to play around with massive multiprocessor configurations and programming strategies, which is why Patterson is advocating it (his RAMP project has researchers from MIT, Berkeley, Stanford, Texas, Washington involved (among others)). You also need to have a little more imagination about what we could do with more computing power. Try looking at Intel's presentations on RMS http://www.intel.com/technology/itj/2005/volume09i ssue02/foreword.htm.
Shared-state concurrency is harder than you think. by zestyping · 2007-02-10 13:59 · Score: 5, Insightful
Reliably achieving even simple goals using concurrent threads that share state is extremely difficult. For example, try this task:

Implement the Observer (aka Listener) pattern (specifically the thing called "Subject" on the Wikipedia page). Your object should provide two methods, publish and subscribe. Clients can call subscribe to indicate their interest in being notified. When a client calls publish with a value, your object should pass on that value by calling the notify method on everyone who has previously subscribed for updates.

Sounds simple, right? But wait:
- What if one of your subscribers throws an exception? That should not prevent other subscribers from being notified.
- What if notifying a subscriber triggers another value to be published? All the subscribers must be kept up to date on the latest published value.
- What if notifying a subscriber triggers another subscription? Whether or not the newly added subscriber receives this in-progress notification is up to you, but it must be well defined and predictable.
- Oh, and by the way, don't deadlock.
Can you achieve all these things in a multithreaded programming model (e.g. Java)? Try it. Don't feel bad if you can't; it's fiendishly complicated to get right, and i doubt i could do it.

Or, download this paper and start reading from section 3, "The Sequential StatusHolder."

Once you see how hard it is to do something this simple, now think about the complexity of what people regularly try to achieve in multithreaded systems, and that pretty much explains why computer programs freeze up so often.