Is Parallel Programming Just Too Hard?

← Back to Stories (view on slashdot.org)

Is Parallel Programming Just Too Hard?

Posted by kdawson on Monday May 28, 2007 @04:24PM from the Moore's-Law-for-software dept.

pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"

10 of 680 comments (clear)

Min score:

Reason:

Sort:

It's not trivial, and often not necessary by Opportunist · 2007-05-28 16:34 · Score: 5, Interesting

Aside from my usual lament that people already call themselves programmers when they can fire up Visual Studio, parallelizing your tasks opens quite a few cans of worms. Many things can't be done simultanously, many side effects can occur if you don't take care and generally, programmers don't really enjoy multithreaded applications, for exactly those reasons.

And often enough, it's far from necessary. Unless you're actually dealing with an application that does a lot of "work", calculate or display, preferable simultanously (games would be one of the few applications that come to my mind), most of the time, your application is waiting. Either for input from the user or for data from a slow source, like a network or even the internet. The average text processor or database client is usually not in the situation that it needs more than the processing power of one core. Modern machines are by magnitudes faster than anything you usually need.

Generally, we'll have to deal with this issue sooner or later, especially if our systems become more and more overburdened with "features" while the advance of processing speed will not keep up with it. I don't see the overwhelming need for parallel processing within a single application for most programs, though.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Are Serial Programmers Just Too Dumb? by ArmorFiend · 2007-05-28 16:40 · Score: 4, Interesting

For this generation of "average" programmers, yes its too hard. Its the programming language, stupid. The average programming language has come a remarkably short distance in the last 30 years. Java and Fortran really aren't very different, and neither is well suited to paralellizing programs.

Why isn't there a mass stampede to Erlang or Haskell, languages that address this problem in a serious way? My conclusion is that most programmers are just too dumb to do major mind-bending once they've burned their first couple languages into their ROMs.

Wait for the next generation, or make yourself above average.
Yes and No by synx · 2007-05-28 16:47 · Score: 5, Interesting

The problem with parallel programming is we don't have the right set of primitives. Right now the primitives are threads, mutexes, semaphores, shared memory and queues. This is the machine language of concurrency - it's too primitive to effective write lots of code by anyone who isn't a genius.

What we need is more advanced primitives. Here are my 2 or 3 top likely suspects:

- Concurrent Sequential Programs - CSP. This is the programming model behind Erlang - one of the most successful concurrent programming languages available. Writing large, concurrent, robust apps is as simple as 'hello world' in Erlang. There is a whole new way of thinking that is pretty much mind bending. However, it is that new methodology that is key to the concurrency and robustness of the end applications. Be warned, it's functional!
- Highly optimizing functional languages (HOFL) - These are in the proto-phase, and there isn't much available, but I think this will be the key to extremely high performance parallel apps. Erlang is nice, but not high performance computing, but HOFLs won't be as safe as Erlang. You get one or the other. The basic concept is most computation in high performance systems is bound up in various loops. A loop is a 'noop' from a semantic point of view. To get efficient highly parallel systems Cray uses loop annotations and special compilers to get more information about loops. In a functional language (such as Haskel) you would use map/fold functions or list comprehensions. Both of which convey more semantic meaning to the compiler. The compiler can auto-parallelize a functional-map where each individual map-computation is not dependent on any other.
- Map-reduce - the paper is elegant and really cool. It seems like this is a half way model between C++ and HOFLs that might tide people over.

In the end, the problem is the abstractions. People will consider threads and mutexes as dangerous and unnecessary as we consider manual memory allocation today.
Re:Nope. by lmpeters · 2007-05-28 16:49 · Score: 5, Interesting

It is not difficult to justify parallel programming. Ten years ago, it was difficult to justify because most computers had a single processor. Today, dual-core systems are increasingly common, and 8-core PC's are not unheard of. And software developers are already complaining because it's "too hard" to write parallel programs.

Since Intel is already developing processors with around 80 cores, I think that multi-core (i.e. multi-processor) processors are only going to become more common. If software developers intend to write software that can take advantage of current and future processors, they're going to have to deal with parallel programming.

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represent a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.
Re:Nope. by poopdeville · 2007-05-28 17:11 · Score: 5, Interesting

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represent a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.

This is more-or-less how functional programming works. You write your program using an XML-like tree syntax. The compiler utilizes the tree to figure out dependencies. See http://mitpress.mit.edu/sicp/full-text/book/book-Z -H-10.html#%25_sec_1.1.5. More parallelism can be drawn out if the interpreter "compiles" as yet unused functions while evaluating others. See the following section.

--
After all, I am strangely colored.
Re:Nope. by Lost+Engineer · 2007-05-28 17:34 · Score: 5, Interesting

It is still difficult to justify if you can more easily write more efficient single-threaded apps. What consumer-level apps out there really need more processing power than a single core of a modern CPU can provide? I already understand the enterprise need. In fact, multi-threaded solutions for enterprise and scientific apps are already prevalent, that market having had SMP for a long time.
We don't think in recursion either by TheMCP · 2007-05-28 18:20 · Score: 4, Interesting

Most programmers have difficulty thinking about recursive processes as well, but there are still some who don't and we still have use for them. I should say "us", as I make many other programmers batty by using recursion frequently. Programmers tell me all the time that they find recursion difficult - difficult to write, difficult to trace, difficult to understand, difficult to debug. Conversely, I find it easier - all I have to do is reduce the problem to its simplest form and determine the end case, and a tiny snip of code will do where a huge mess of iterative code would otherwise have been required. So, I don't understand why anyone would want to write iterative code when recursion can solve the problem.

I suspect that parallel programming may be similar - some programmers will "get it", others won't. Those who "get it" will find it fun and easy and be unable to understand why everyone else finds it hard.

Also, most developement tools were created with a single processor system in mind: IDEs for parallel programming are a new-ish concept and there are few. As more are developed we'll learn about how the computer can best help the programmer to create code for a parallel system, and the whole process can become more efficient. Or maybe automated entirely; at least in some cases, if the code can be effectively profiled the computer may be able to determine how to parallelize it and the programmer may not have to worry about it. So, I think it's premature to argue about whether parallel programming is hard or not - it's different, but until we have taken the time to further develop the relevant tools, we won't know if it's really hard or not.

And of course, for a lot of tasks it simply won't *matter* - anything with a live user sitting there, for example, only has to be fast enough that the person perceives it as being instantaneous. Any faster than that is essentially useless. So, for anything that has states requiring user input, there is a "fast enough" beyond which we need not bother optimizing unless we're just trying to speed up the system as a whole, and that sort of optimization is usually done at the compiler level. It is only for software requiring unusually large amounts of computation or for systems which have been abstracted to the point of being massively inefficient beneath the surface that the fastest possible computing speed is really required, and those are the sorts of systems to which specialist programmers could be applied.
"Dragged Kicking and Screaming" by netfunk · 2007-05-28 19:03 · Score: 4, Interesting

Tom Leonard, a programmer from Valve, gave a fascinating talk about this at GDC this year, about retrofitting multicore support into Half-Life 2 (specifically, into the Source Engine, which powers Half-Life 2). Not surprisingly, this talk was named "Dragged Kicking and Screaming" ...

There was a lot of really good wisdom in there, whether you are writing a game or something else that needs to get every possible performance boost.

I'm sure they probably drew from 20+ years worth of whitepapers (and some newer ones about "lock-free" mutexes, see chapter 1.1 of "Game Programming Gems 6"), but what I walked away from the talk with was the question: "why the hell didn't _i_ think of that?"

There were several techniques they used that, once you built a framework to support it, made parallelizing tasks dirt simple. A lot of it involves putting specific jobs onto queues and letting worker threads pick them up when they are idle, and being able to assign specific jobs to specific cores to protect your investment in CPU cache.

Most of the rest of the work is building things that don't need a result immediately, and trying to build things that can be processed without having to compete for various pieces of state...sometimes easier said than done, sure. But after hearing his talk, I was of the opinion that while parallelism is always more complex than single-threaded code, doing this well is something most developers aren't even _thinking_ about yet. In most cases, we're not even at the point where we can talk about _languages_ and _tools_, since we aren't even using the ones we have well.

--ryan.

--
Don't say, "don't quote me," because if no one quotes you, you probably haven't said a thing worth saying.
Re:our brains aren't wired to think in parallel by EvanED · 2007-05-28 19:40 · Score: 5, Interesting

I can eat, talk and think at the same time, all are pretty conscious actions

True, but can you talk (perhaps reciting something from memory) at the same time you are listening to something? Even if it's not a volume issue; you're wearing headphones say.

Feynman has a chapter in "What do you care what other people think" where he talks about some informal experimentation he did where he tried to figure out what he could do at the same time as accurately timing out a minute. Essentially, the same time as counting. He found that (1) he could be very consistent about timing out a given time, and that (2) he could do most things while counting. But what he couldn't do is talk. On discussion with other people in his dorm/frat/house/whatever, there was another person who could talk, but couldn't read while timing things out. Turns out that the reason it differed was because they counted differently; Feynman was hearing "one, two, three, ..." while this other guy was watching the numbers pass in front of his eyes.

Activities are localized in the brain; it seems that these areas are largely independent, but try two tasks that use the same area and you're SOL.
Re:I blame the tools by DaChesserCat · 2007-05-29 03:12 · Score: 5, Interesting

I was using a potential answer to this in 1990. I was working for a small company in the Provo/Orem area, called Computer System Architects, which was selling Transputer hardware. For those who haven't heard of Transputers, they were small, 16- or 32-bit processors, with a small amount of built-in RAM (not a cache; this was actually in the memory map and you could do small tasks on a Transputer without any external RAM), 2-4 high-speed serial channels (easily implemented with 4 wires) and a stack-based architecture. Adding megabytes of external RAM was easy, and it was embarrassingly easy to connect up networks of these things, even on one board (in a single ISA slot), and build cluster. An external card cage, in those days, could hold 20 slots, which would hold up to 80 Transputers, using our products.

I did some Assembly and some C, but the kicker language for this chip was called Occam II. Among other things, it used the indentation in the code to determine block structure. A quick example:
PAR step A step B step C SEQ step D step E
In this example, steps A, B and C would all be executed in parallel with another task which ran step D then step E. If you had one Transputer in your machine, it would multi-task. If you had multiple CPU's available, it would spread the task across the CPU's.

It also has a basic construct called a Channel. These were very easy to set up and use. These were how the different tasks communicated with each other.

It was not difficult to spawn thousands of tasks, each one doing a relatively small part of an overall task, with full communication and synchronization. Again, if you had multiple CPU's available, it would spread the tasks across them. A board with multiple Transputers was usually doing ray-tracing or rendering Mandelbrot fractals as a demo anytime we went to a trade or tech show. They could knock it down to one processor, and things got done relatively quickly. Then, they'd kick in 4 or 16 CPU's and blow people's minds.

This was in 1990. A 386DX-33 was high-end, back then. The Transputer didn't run DOS or Windows, so it didn't survive in the market of the time. That was a shame; I benchmarked a variety of them, then ran identical benchmarks on various other machines as technology marched on. A T805 running 30 MHz (the top end Transputer I ever got to play with) blasted through mixed integer/floating-point calculations about as fast a 486DX2-66 (which didn't come on the market for another couple years). There was an occasion where I had 16 of those T805's sitting my machine. You'd need a Pentium II to be able to match that occasion. It was well over a decade later that the P-II became available.

Cool tech, but the programming tools were what allowed you to really use the parallelization. It was typical to achieve over 95% linear speedup (i.e. 20 CPU's gave real-world 19x performance); sometimes we went over 99%. Most Intel SMP machines are lucky if they give 80% linear speedup (4 CPU's = 3.2x total performance).

--
... by the Dew of Mountains the thoughts acquire speed, the hands acquire shakes, the shakes become a warning