Is Parallel Programming Just Too Hard?

our brains aren't wired to think in parallel by rritterson · 2007-05-28 16:28 · Score: 5, Insightful

I can't speak for the rest of the world, or even the programming community. That disclaimer spoken, however, I can say that parallel programming is indeed hard. The trivial examples, like simply running many processes in parallel that are doing the same thing (as in, for example, Monte Carlo sampling) are easy, but the more difficult examples of parallelized mathematical algorithms I've seen, such as those in linear algebra are difficult to conceptualize, let alone program. Trying to manage multiple threads and process communication in an efficient way when actually implementing it adds an additional level of complexity.

I think the biggest reason why it is difficult is that people tend to process information in a linear fashion. I break large projects into a series of chronologically ordered steps and complete one at a time. Sometimes if I am working on multiple projects, I will multitask and do them in parallel, but that is really an example of trivial parallelization.

Ironically, the best parallel programmers may be those good managers, who have to break exceptionally large projects into parallel units for their employees to simultaneously complete. Unfortunately, trying to explain any sort of technical algorithm to my managers usually exacts a look of panic and confusion.

--
-Ryan
AUWYHSTOT (Acronyms are Useless When You Have to Spell Them Out Too)

Re:our brains aren't wired to think in parallel by Anonymous Coward · 2007-05-28 16:36 · Score: 5, Informative

I do a lot of multithreaded programming, this is my bread and butter really. It is not easy - it takes a specific mindset, though I would disagree that it has much to do with management. I am not a manager, never was one and never will be. It requires discipline and careful planning.

That said, parallel processing is hardly a holy grail. On one hand, everything is parallel processing (you are reading this message in parallel with others, aren't you?). On the other, when we are talking about a single computer running a specific program, parallel usually means "serialized but switched really fast". At most there is a handful of processing units. That means, that whatever it is you are splitting among these units has to give itself well to splitting this number of ways. Do more - and you are overloading one of them, do less - and you are underutilizing resources. In the end, it would be easier often to do processing serially. Potential performance advantage is not always very high (I know, I get paid to squeeze this last bit), and is usually more than offset by difficulty in maintenance.
Re:our brains aren't wired to think in parallel by jd · 2007-05-28 17:00 · Score: 5, Insightful

Parallel programming is indeed hard. The standard approach these days is to decompose the parallel problem into a definite set of serial problems. The serial problems (applets) can then be coded more-or-less as normal, usually using message-passing rather than direct calls to communicate with other applets. Making sure that everything is scheduled correctly does indeed take managerial skills. The same techniques used to schedule projects (critical path analysis) can be used to minimize time overheads. The same techniques used to minimize the use of physical resources (SIMPLEX aka Operational Research) works just as well on software for physical resources there.
The first problem is that people who make good managers make lousy coders. The second problem is that people who make good coders make lousy managers. The third problem is that plenty of upper management types (unfortunately, I regard the names I could mention as genuinely dangerous and unpredictable) simply have no understanding of programming in general, never mind the intricacies of parallelism.
However, resource management and coding is not enough. These sorts of problems are typically either CPU-bound and not heavy on the networks, or light on the CPU but are network-killers. (Look at any HPC paper on cascading network errors for an example.) Typically, you get hardware which isn't ideally suited to either extreme, so the problem must be transformed into one that is functionally identical but within the limits of what equipment there is. (There is no such thing as a generic parallel app for a generic parallel architecture. There are headaches and there are high-velocity exploding neurons, but that's the range of choices.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:our brains aren't wired to think in parallel by poopdeville · 2007-05-28 17:35 · Score: 5, Insightful

That's debatable. You don't look at text one letter at a time to try to decipher the meaning, do you? You probably look at several letters, or even words, at a time. Understanding how the letters or words relate (spatially, syntactically, semantically) is a parallel process. Unless you're a very slow reader, your eyes have probably moved on from the words you're interpreting before you've understood their meaning. This is normal. This is how you establish a context for a particular word or phrase -- by looking at the surrounding words. Another parallel process.

Every process is serial from a broad enough perspective. Eight hypothetical modems can send 8 bits per second. Or are they actually sending a single byte?

--
After all, I am strangely colored.
Re:our brains aren't wired to think in parallel by Anonymous Coward · 2007-05-28 18:04 · Score: 5, Funny

Our brains are massively parallel, but we do not consciously attend to more than a couple of things at a time.
Speak for yourself, I forget to breathe whenever I'm doing someth...
Re:our brains aren't wired to think in parallel by EvanED · 2007-05-28 19:40 · Score: 5, Interesting

I can eat, talk and think at the same time, all are pretty conscious actions

True, but can you talk (perhaps reciting something from memory) at the same time you are listening to something? Even if it's not a volume issue; you're wearing headphones say.

Feynman has a chapter in "What do you care what other people think" where he talks about some informal experimentation he did where he tried to figure out what he could do at the same time as accurately timing out a minute. Essentially, the same time as counting. He found that (1) he could be very consistent about timing out a given time, and that (2) he could do most things while counting. But what he couldn't do is talk. On discussion with other people in his dorm/frat/house/whatever, there was another person who could talk, but couldn't read while timing things out. Turns out that the reason it differed was because they counted differently; Feynman was hearing "one, two, three, ..." while this other guy was watching the numbers pass in front of his eyes.

Activities are localized in the brain; it seems that these areas are largely independent, but try two tasks that use the same area and you're SOL.

It's not trivial, and often not necessary by Opportunist · 2007-05-28 16:34 · Score: 5, Interesting

Aside from my usual lament that people already call themselves programmers when they can fire up Visual Studio, parallelizing your tasks opens quite a few cans of worms. Many things can't be done simultanously, many side effects can occur if you don't take care and generally, programmers don't really enjoy multithreaded applications, for exactly those reasons.

And often enough, it's far from necessary. Unless you're actually dealing with an application that does a lot of "work", calculate or display, preferable simultanously (games would be one of the few applications that come to my mind), most of the time, your application is waiting. Either for input from the user or for data from a slow source, like a network or even the internet. The average text processor or database client is usually not in the situation that it needs more than the processing power of one core. Modern machines are by magnitudes faster than anything you usually need.

Generally, we'll have to deal with this issue sooner or later, especially if our systems become more and more overburdened with "features" while the advance of processing speed will not keep up with it. I don't see the overwhelming need for parallel processing within a single application for most programs, though.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

Yes and No by synx · 2007-05-28 16:47 · Score: 5, Interesting

The problem with parallel programming is we don't have the right set of primitives. Right now the primitives are threads, mutexes, semaphores, shared memory and queues. This is the machine language of concurrency - it's too primitive to effective write lots of code by anyone who isn't a genius.

What we need is more advanced primitives. Here are my 2 or 3 top likely suspects:

- Concurrent Sequential Programs - CSP. This is the programming model behind Erlang - one of the most successful concurrent programming languages available. Writing large, concurrent, robust apps is as simple as 'hello world' in Erlang. There is a whole new way of thinking that is pretty much mind bending. However, it is that new methodology that is key to the concurrency and robustness of the end applications. Be warned, it's functional!
- Highly optimizing functional languages (HOFL) - These are in the proto-phase, and there isn't much available, but I think this will be the key to extremely high performance parallel apps. Erlang is nice, but not high performance computing, but HOFLs won't be as safe as Erlang. You get one or the other. The basic concept is most computation in high performance systems is bound up in various loops. A loop is a 'noop' from a semantic point of view. To get efficient highly parallel systems Cray uses loop annotations and special compilers to get more information about loops. In a functional language (such as Haskel) you would use map/fold functions or list comprehensions. Both of which convey more semantic meaning to the compiler. The compiler can auto-parallelize a functional-map where each individual map-computation is not dependent on any other.
- Map-reduce - the paper is elegant and really cool. It seems like this is a half way model between C++ and HOFLs that might tide people over.

In the end, the problem is the abstractions. People will consider threads and mutexes as dangerous and unnecessary as we consider manual memory allocation today.

Re:Nope. by lmpeters · 2007-05-28 16:49 · Score: 5, Interesting

It is not difficult to justify parallel programming. Ten years ago, it was difficult to justify because most computers had a single processor. Today, dual-core systems are increasingly common, and 8-core PC's are not unheard of. And software developers are already complaining because it's "too hard" to write parallel programs.

Since Intel is already developing processors with around 80 cores, I think that multi-core (i.e. multi-processor) processors are only going to become more common. If software developers intend to write software that can take advantage of current and future processors, they're going to have to deal with parallel programming.

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represent a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.

Yes, because programmers are too conservative by Coryoth · 2007-05-28 16:51 · Score: 5, Insightful

Parallel programming doesn't have to be quite as painful as it currently is. The catch is that you have to face the fact that you can't go on thinking with a sequential paradigm and have some tool, library, or methodology magically make everything work. And now, I'm not talking about functional programming. Functional programming is great, and has a lot going for it, but solving concurrent programming issues is not one of those things. Functional programming deals with concurrency issues by simply avoiding them. For problems that have no state and can be coded purely functionally this is fine, but for a large number of problems you end up either tainting the purity of your functions, or wrapping things up in monads which end up having the same concurrency issues all over again. It does have the benefit that you can isolate the state, and code that doesn't need it is fine, but it doesn't solve the issue of concurrent programming.

No, the different sorts of paradigms I'm talking about no shared state, message passing concurrency models ala CSP and pi Calculus and the Actor Model. That sort of approach in terms of how to think about the problem shows up in languages like Erlang, and Oz which handle concurrency well. The aim here is to make message passing and threads lightweight and integrated right into the language. You think in terms actors passing data, and the language supports you in thinking this way. Personally I'm rather fond of SCOOP for Eiffel which elegantly integrates this idea into OO paradigms (an object making a method call is, ostensibly, passing a message after all). That's still research work though (only available as a preprocessor and library, with promises of eventually integrating it into the compiler). At least it makes thinking about concurrency easier, while still staying somewhat close more traditional paradigms (it's well worth having a look at if you've never heard of it).

The reality, however, is that these new languages which provide the newer and better paradigms for thinking and reasoning about concurrent code, just aren't going to get developer uptake. Programmers are too conservative and too wedded to their C, C++, and Java to step off and think as differently as the solution really requires. No, what I expect we'll get is kluginess retrofitted on to existing languages in a slipshod way that sort of work, in as much as it is an improvement over previous concurrent programming in that language, but doesn't really make the leap required to make the problem truly significantly easier.

--
Craft Beer Programming T-shirts

Amdahl's law by apsmith · 2007-05-28 16:51 · Score: 5, Insightful

I've worked with parallel software for years - there are lots of ways to do it, lots of good programming tools around even a couple of decades back (my stuff ranged from custom message passing in C to using "Connection-Machine Fortran"; now it's java threads) but the fundamental problem was stated long ago by Gene Amdahl - if half the things you need to do are simply not parallelizable, then it doesn't matter how much you parallelize everything else, you'll never go more than twice as fast as using a single thread.

Now there's been lots of work on eliminating those single-threaded bits in our algorithms, but every new software problem needs to be analyzed anew. It's just another example of the no-silver-bullet problem of software engineering...

--

Energy: time to change the picture.

Re:Nope. by poopdeville · 2007-05-28 17:11 · Score: 5, Interesting

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represent a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.

This is more-or-less how functional programming works. You write your program using an XML-like tree syntax. The compiler utilizes the tree to figure out dependencies. See http://mitpress.mit.edu/sicp/full-text/book/book-Z -H-10.html#%25_sec_1.1.5. More parallelism can be drawn out if the interpreter "compiles" as yet unused functions while evaluating others. See the following section.

--
After all, I am strangely colored.

Re:Two words: map-reduce by allenw · 2007-05-28 17:15 · Score: 5, Informative

Implementing MapReduce is much easier these days: just install and contribute to the Hadoop project. This is an open source, Java-based MapReduce implementation, including a distrbuted filesystem called HDFS.

Even though it is implemented in Java, you can use just about anything with it, using the Hadoop streaming functionality.

Re:Nope. by Lost+Engineer · 2007-05-28 17:34 · Score: 5, Interesting

It is still difficult to justify if you can more easily write more efficient single-threaded apps. What consumer-level apps out there really need more processing power than a single core of a modern CPU can provide? I already understand the enterprise need. In fact, multi-threaded solutions for enterprise and scientific apps are already prevalent, that market having had SMP for a long time.

Re:Nope. by EvanED · 2007-05-28 19:30 · Score: 5, Insightful

Yes, but that's an easy sort of parallelism. Heck, I wrote a fractal generator that did the generation in a separate thread in 11th grade after writing my first Win32 program 4 or 5 months previous. It was also my first experience with threads. I'm not even sure I really knew what they were before that. This isn't *really* paralleling the application in the sense TFA means.

Closer is this: After some more work and a rewrite (for other reasons), I had "Fracked" running n threads, each rendering 1/n of the display. Data parallelism == easy parallelism.

But a lot of problems don't fit these models, and need a LOT of thought put into how to parallelize them. It's likely that some problems in P are not efficiently parallelizable.

That's irrelevant. by Estanislao+Mart�nez · 2007-05-28 19:54 · Score: 5, Insightful

Our cognitive system does many things at the same time, yes. That doesn't answer the question that's being posed here: whether explicit, conscious reasoning about parallel processing is hard for people.

--

Are you adequate?

Actually it's just pipelined by Moraelin · 2007-05-28 20:54 · Score: 5, Insightful

Actually, it's more like pipelined. The fact that your eyes already moved to the next letter, just says that the old one is still going through the pipeline. Yeah, there'll be some bayesian prediction and pre-fetching involved, but it's nowhere near consciously doing things in parallel.

Try reading two different texts side by side, at the same time, and it won't work that neatly parallel any more.

Heck, there were some recent articles about why most Powerpoint presentations are a disaster: in a nutshell, because your brain isn't that parallel, or doesn't have the bandwidth for it. If you try to read _and_ hear someone saying something (slightly) different at the same time, you just get overloaded and do neither well. The result is those time-wasting meetings where everyone goes fuzzy-brained and forgets everything as soon as the presentation flipped to the next chart.

To get back to the pipeline idea, the brain seems to be quite the pipelined design. Starting from say, the eyes, you just don't have the bandwidth to consciously process the raw stream of pixels. There are several stages of buffering, filtering out the irrelevant bits (e.g., if you focus on the blonde in the car, you won't even notice the pink gorilla jumping up and down in the background), "tokenizing" it, matching and cross-referencing it, etc, and your conscious levels work on the pre-processed executive summary.

We already know, for example, that the shortest term buffer can store about 8 seconds worth of raw data in transit. And that after about 8 seconds it will discard that data, whether it's been used or not. (Try closing your eyes while moving around a room, and for about 8 seconds you're still good. After that, you no longer know where you are and what the room looks like.)

There's a lot of stuff done in parallel at each stage, yes, but the overall process is really just a serial pipeline.

At any rate, yeah, your eyes may already be up to 8 seconds ahead of what your brain currently processes. It doesn't mean you're that much of a lean, mean, parallel-processing machine, it just means that some data is buffered in transit.

Even time-slicing won't really work that well, because of that (potential) latency and the finite buffers. If you want to suddenly focus on another bit of the picture, or switch context to think of something else, you'll basically lose some data in the process. Your pipeline still has the old data in it, and it's going right to the bit bucket. That or both streams get thrashed because there's simply not enough processing power and bandwidth for both to go through the pipeline at the same time.

Again, you only need to look at the fuzzy-brain effect of bad Powerpoint presentations to see just that in practice. Forced to try to process two streams at the same time (speech and text), people just make a hash of both.

--
A polar bear is a cartesian bear after a coordinate transform.

Lack of skill in the field by node159 · 2007-05-28 22:02 · Score: 5, Informative

I've seen it over and over in the industry, there is a distinct lack of parallel programming skill and knowledge, and even senior developers with years under their belt struggle with the fundamentals.

Parallel programming does add a layer of complexity and its inherent lack of general solutions does make abstracting its complexity away difficult, but I suspect that the biggest issue is the lack of training of the work force. It isn't something you can pick up easily without a steep learning curve with many hard lessons in it, definitely not something that can be incorporated as a new thing to be learnt on the fly with deadlines looming.
Another aspect is that its fundamental to the design, parallelism can and often will dictate the design and if the software architects are not actively designing for it or are not experienced enough to ensure that it remains a viable future option, future attempts to parallelise can be difficult at best.

Ultimately the key issues are:
* Lack of skill and training in the work force (including the understanding that there are no general solutions)
* Lack of mature development tool to easy the development process (debugging tools especially)

--
GPLv2: I want my rights, I want my phone call! DRM: What use is a phone call, if you are unable to speak?

Re:I blame the tools by DaChesserCat · 2007-05-29 03:12 · Score: 5, Interesting

I was using a potential answer to this in 1990. I was working for a small company in the Provo/Orem area, called Computer System Architects, which was selling Transputer hardware. For those who haven't heard of Transputers, they were small, 16- or 32-bit processors, with a small amount of built-in RAM (not a cache; this was actually in the memory map and you could do small tasks on a Transputer without any external RAM), 2-4 high-speed serial channels (easily implemented with 4 wires) and a stack-based architecture. Adding megabytes of external RAM was easy, and it was embarrassingly easy to connect up networks of these things, even on one board (in a single ISA slot), and build cluster. An external card cage, in those days, could hold 20 slots, which would hold up to 80 Transputers, using our products.

I did some Assembly and some C, but the kicker language for this chip was called Occam II. Among other things, it used the indentation in the code to determine block structure. A quick example:
PAR step A step B step C SEQ step D step E
In this example, steps A, B and C would all be executed in parallel with another task which ran step D then step E. If you had one Transputer in your machine, it would multi-task. If you had multiple CPU's available, it would spread the task across the CPU's.

It also has a basic construct called a Channel. These were very easy to set up and use. These were how the different tasks communicated with each other.

It was not difficult to spawn thousands of tasks, each one doing a relatively small part of an overall task, with full communication and synchronization. Again, if you had multiple CPU's available, it would spread the tasks across them. A board with multiple Transputers was usually doing ray-tracing or rendering Mandelbrot fractals as a demo anytime we went to a trade or tech show. They could knock it down to one processor, and things got done relatively quickly. Then, they'd kick in 4 or 16 CPU's and blow people's minds.

This was in 1990. A 386DX-33 was high-end, back then. The Transputer didn't run DOS or Windows, so it didn't survive in the market of the time. That was a shame; I benchmarked a variety of them, then ran identical benchmarks on various other machines as technology marched on. A T805 running 30 MHz (the top end Transputer I ever got to play with) blasted through mixed integer/floating-point calculations about as fast a 486DX2-66 (which didn't come on the market for another couple years). There was an occasion where I had 16 of those T805's sitting my machine. You'd need a Pentium II to be able to match that occasion. It was well over a decade later that the P-II became available.

Cool tech, but the programming tools were what allowed you to really use the parallelization. It was typical to achieve over 95% linear speedup (i.e. 20 CPU's gave real-world 19x performance); sometimes we went over 99%. Most Intel SMP machines are lucky if they give 80% linear speedup (4 CPU's = 3.2x total performance).

--
... by the Dew of Mountains the thoughts acquire speed, the hands acquire shakes, the shakes become a warning

Slashdot Mirror

Is Parallel Programming Just Too Hard?

19 of 680 comments (clear)