Is Parallel Programming Just Too Hard?

Posted by kdawson on Monday May 28, 2007 @04:24PM from the Moore's-Law-for-software dept.

pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"

19 of 680 comments

Two words: map-reduce by Anonymous Coward · 2007-05-28 16:29 · Score: 3, Interesting

Implement it, add CPUs, earn billion$. Just Google it.
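For readers who haven't seen the model, here is a minimal sketch of the map-reduce idea, assuming a word-count job and using a standard-library thread pool in place of a cluster. The function names are hypothetical and this is not Google's implementation or API. It also skips the shuffle-by-key step of the real model and simply merges partial counts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_phase(document: str) -> Counter:
    """Map step: count the words in one document, independently of all others."""
    return Counter(document.lower().split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial counts (associative, so order doesn't matter)."""
    return a + b

def word_count(documents: list[str], workers: int = 4) -> Counter:
    # Each document is mapped in parallel; "adding CPUs" means raising `workers`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, documents))
    # Fold all partial results into a single total.
    return reduce(reduce_phase, partials, Counter())
```

In CPython, threads won't speed up a CPU-bound map step because of the global interpreter lock. Swapping in `ProcessPoolExecutor` is the usual way to get real parallelism, and the structure stays the same.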
It's not trivial, and often not necessary by Opportunist · 2007-05-28 16:34 · Score: 5, Interesting

Aside from my usual lament that people already call themselves programmers when they can fire up Visual Studio, parallelizing your tasks opens quite a few cans of worms. Many things can't be done simultaneously, many side effects can occur if you don't take care, and generally programmers don't really enjoy multithreaded applications, for exactly those reasons.

And often enough, it's far from necessary. Unless you're actually dealing with an application that does a lot of "work", calculating or displaying, preferably simultaneously (games would be one of the few applications that come to my mind), most of the time your application is waiting, either for input from the user or for data from a slow source like a network or even the internet. The average text processor or database client is usually not in a situation where it needs more than the processing power of one core. Modern machines are orders of magnitude faster than anything you usually need.

Generally, we'll have to deal with this issue sooner or later, especially if our systems become more and more overburdened with "features" while processing speed fails to keep up. I don't see the overwhelming need for parallel processing within a single application for most programs, though.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Not justifiable by dj245 · 2007-05-28 16:40 · Score: 3, Interesting

I can see this going down in cubicles all through the gaming industry. The game is mostly coming together, the models have been tuned, textures drawn, code is coming together, and the coder goes to the pointy haired boss.

Coder: We need more time to make this game multithreaded!
PHB: Why? Can it run on one core of a X?
Coder: Well I suppose it can but...
PHB: Shove it out the door then.

If Flight Simulator X is any indication (a game that should have been easy to parallelize), this conversation happens all the time, and games are launched taking advantage of only one core.

--
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
Are Serial Programmers Just Too Dumb? by ArmorFiend · 2007-05-28 16:40 · Score: 4, Interesting

For this generation of "average" programmers, yes, it's too hard. It's the programming language, stupid. The average programming language has come a remarkably short distance in the last 30 years. Java and Fortran really aren't very different, and neither is well suited to parallelizing programs.

Why isn't there a mass stampede to Erlang or Haskell, languages that address this problem in a serious way? My conclusion is that most programmers are just too dumb to do major mind-bending once they've burned their first couple languages into their ROMs.

Wait for the next generation, or make yourself above average.
Yes and No by synx · 2007-05-28 16:47 · Score: 5, Interesting

The problem with parallel programming is that we don't have the right set of primitives. Right now the primitives are threads, mutexes, semaphores, shared memory and queues. This is the machine language of concurrency - it's too primitive for anyone who isn't a genius to write lots of code in it effectively.

What we need is more advanced primitives. Here are my 2 or 3 top likely suspects:

- Communicating Sequential Processes - CSP. This is the programming model behind Erlang - one of the most successful concurrent programming languages available. Writing large, concurrent, robust apps is as simple as 'hello world' in Erlang. There is a whole new way of thinking that is pretty much mind-bending. However, it is that new methodology that is key to the concurrency and robustness of the end applications. Be warned, it's functional!
- Highly optimizing functional languages (HOFL) - These are in the proto-phase, and there isn't much available, but I think this will be the key to extremely high-performance parallel apps. Erlang is nice, but it is not built for high-performance computing; on the other hand, HOFLs won't be as safe as Erlang. You get one or the other. The basic concept is that most computation in high-performance systems is bound up in various loops. A loop is a 'noop' from a semantic point of view: it tells the compiler little about what the computation means. To get efficient, highly parallel systems, Cray uses loop annotations and special compilers to get more information about loops. In a functional language (such as Haskell) you would use map/fold functions or list comprehensions, both of which convey more semantic meaning to the compiler. The compiler can auto-parallelize a functional map where each individual map computation is not dependent on any other.
- Map-reduce - the paper is elegant and really cool. It seems like this is a halfway model between C++ and HOFLs that might tide people over.
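The functional-map point can be sketched in Python, with one caveat: Python is not a HOFL and will not auto-parallelize anything. The parallelization below is written out by hand to show *why* a side-effect-free map is safe to split up: the parallel result must equal the serial one. `collatz_steps`, `serial_map` and `parallel_map` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def collatz_steps(n: int) -> int:
    """A pure function: the result depends only on n, and no shared state is touched."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def serial_map(f, xs):
    return [f(x) for x in xs]

def parallel_map(f, xs, workers=4):
    # Safe only because f has no side effects: each element is independent,
    # so the pool may evaluate them in any order. Executor.map still
    # returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs))
```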

In the end, the problem is the abstractions. People will consider threads and mutexes as dangerous and unnecessary as we consider manual memory allocation today.
Re:Nope. by lmpeters · 2007-05-28 16:49 · Score: 5, Interesting

It is not difficult to justify parallel programming. Ten years ago, it was difficult to justify because most computers had a single processor. Today, dual-core systems are increasingly common, and 8-core PCs are not unheard of. And software developers are already complaining because it's "too hard" to write parallel programs.

Since Intel is already developing processors with around 80 cores, I think that multi-core (i.e. multi-processor) processors are only going to become more common. If software developers intend to write software that can take advantage of current and future processors, they're going to have to deal with parallel programming.

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represents a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.
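Nothing here needs a new compiler to try out the dependency-graph idea. A minimal sketch, assuming each step is a Python function that receives its dependencies' results as a dict: `run_dag` (a hypothetical name) repeatedly launches, in parallel, every step whose dependencies have all finished. This "wave" approach is a simplification. A real scheduler would start each step the moment its own inputs are ready instead of waiting for the whole wave.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps, workers=4):
    """Run a dependency graph of steps, parallelizing independent ones.

    tasks: dict name -> callable taking {dep_name: dep_result}
    deps:  dict name -> set of names that must finish first
    Returns dict name -> result.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            # Every step whose dependencies are all done can run now.
            ready = [n for n, d in remaining.items() if d.issubset(results)]
            if not ready:
                raise ValueError("cycle or missing dependency in graph")
            futures = {
                n: pool.submit(tasks[n], {d: results[d] for d in remaining[n]})
                for n in ready
            }
            for n, fut in futures.items():
                results[n] = fut.result()
                del remaining[n]
    return results
```

For example, with `c` depending on `a` and `b`, the first wave runs `a` and `b` together and the second runs `c`.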
Re:our brains aren't wired to think in parallel by bloosqr · 2007-05-28 16:52 · Score: 3, Interesting

Back when I was in graduate school we used to joke: in the future, everything will be Monte Carlo :)

While perhaps not everything can be solved using Monte Carlo-type integration tricks, there is more that can be done with 'variations on the theme' than is perhaps obvious (or perhaps you can rephrase the problem and ask the same question a different way), especially if you start dreaming: what happens if I have 100,000 processors at my disposal?
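The reason Monte Carlo scales so well is that the samples are independent. Each worker needs only its own random number generator and returns a single count, so there is nothing to lock. A minimal sketch that estimates pi this way (the function names and the fixed per-worker seeds are assumptions chosen so runs are reproducible):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(seed: int, samples: int) -> int:
    """Throw `samples` random darts at the unit square; count those inside the quarter circle."""
    rng = random.Random(seed)  # private RNG per worker: no shared state at all
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(workers: int = 8, samples_per_worker: int = 50_000) -> float:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One seed per worker; the only "communication" is summing the counts.
        total = sum(pool.map(count_hits, range(workers), [samples_per_worker] * workers))
    # Quarter-circle area / square area = pi / 4.
    return 4.0 * total / (workers * samples_per_worker)
```

With 100,000 processors the shape stays the same: more seeds, and one sum at the end. (As above, CPython threads illustrate the structure. A process pool or a cluster would provide the actual speedup.)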
Too much emphasis on instruction flow by putaro · 2007-05-28 17:01 · Score: 3, Interesting

I've been doing true parallel programming for the better part of 20 years now. I started off writing kernel code on multi-processors and have moved on to writing distributed systems.

Multi-threaded code is hard. Keeping track of locks, race conditions and possible deadlocks is a bitch. Working on projects with multiple programmers passing data across threads is hard (I remember one problem that took days to track down where a programmer passed a pointer to something on his stack across threads. Every now and then by the time the other thread went to read the data it was not what was expected. But most of the time it worked).
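The stack-pointer bug has no direct Python equivalent, but the more common failure in the same family, a lost update on shared data, is easy to show. A minimal sketch (hypothetical class and function names): `value += 1` is a read, an add and a write, and another thread can run between those steps. Whether updates are actually lost on a given run depends on the interpreter and the scheduler, so only the locked version has a guaranteed result.

```python
import threading

class SharedCounter:
    """A counter shared between threads; `lock` guards every update of `value`."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write with no lock: concurrent callers may lose updates.
        self.value += 1

    def safe_increment(self):
        # Holding the lock makes the read-modify-write appear atomic to other threads.
        with self.lock:
            self.value += 1

def hammer(counter, method_name, threads=8, per_thread=10_000):
    """Call counter.<method_name>() from several threads at once; return the final value."""
    def work():
        increment = getattr(counter, method_name)
        for _ in range(per_thread):
            increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value
```

`hammer(SharedCounter(), "safe_increment")` always returns 80,000. The unsafe variant is *allowed* to return less, which is exactly the "most of the time it worked" kind of bug described above.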

At the same time we are passing comments back and forth here on Slashdot between thousands of different processors using a system written in Perl. Why does this work when parallel programming is so hard?

Traditional multi-threaded code places way too much emphasis on synchronization of INSTRUCTION streams, rather than synchronization of data flow. It's like having a bunch of blind cooks in a kitchen and trying to work it so that you can give them instructions so that if they follow the instructions each cook will be in exactly the right place at the right time. They're passing knives and pots of boiling hot soup between them. One misstep and, ouch, that was a carving knife in the ribs.

In contrast, distributed programming typically puts each blind cook in his own area with well defined spots to use his knives that no one else enters and well defined places to put that pot of boiling soup. Often there are queues between cooks so that one cook can work a little faster for a while without messing everything up.

As we move into this era of cheap, ubiquitous parallel chips we're going to have to give up synchronizing instruction streams and start moving to programming models based on data flow. It may be a bit less efficient but it's much easier to code for and much more forgiving of errors.
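The "cooks with queues between them" picture maps onto a simple pipeline. A minimal sketch, assuming each stage is a plain function applied to one item at a time (`start_stage`, `pipeline` and the `STOP` sentinel are hypothetical names): every stage thread touches only its own inbox and outbox, and the bounded queues let a fast stage run a little ahead without the buffers growing without limit. Error handling is left out; an exception inside a stage would stall the pipeline.

```python
import queue
import threading

STOP = object()  # sentinel that marks the end of the stream

def start_stage(func, inbox, outbox):
    """One 'cook': reads only from its inbox and writes only to its outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # pass the shutdown signal downstream
                return
            outbox.put(func(item))
    thread = threading.Thread(target=run)
    thread.start()
    return thread

def pipeline(items, *funcs, buffer=4):
    # One bounded queue between each pair of neighbouring stages.
    queues = [queue.Queue(maxsize=buffer) for _ in range(len(funcs) + 1)]
    stages = [start_stage(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]

    def feed():
        # A separate feeder thread, so the main thread can drain output
        # at the same time (otherwise full buffers could deadlock).
        for item in items:
            queues[0].put(item)
        queues[0].put(STOP)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    feeder.join()
    for thread in stages:
        thread.join()
    return results
```

No stage ever holds a lock on another stage's data. All synchronization lives in the queues, which is the data-flow style the comment argues for.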
Re:Too much emphasis on instruction flow by underflowx · 2007-05-28 17:26 · Score: 3, Interesting

My experience with data flow is LabVIEW. As a language designed to handle simultaneous slow hardware communication and fast dataset processing, it's a natural for multi-threading. Parallelization is automated within the compiler based on program structure. The compiler's not all that great at it (limited to 5 explicit threads plus whatever internal tweaking is done), but... the actual writing of the code is just damn easy. Not to excuse the LabVIEW compiler: closed architecture, tight binding to the IDE, strong typing that's really painful, memory copies everywhere. But the overall model for dataflow is just superior for parallel applications. It's unfortunate that there seems to be little alternative out there with similar support for data flow, but more overall utility.
Re:Nope. by poopdeville · 2007-05-28 17:11 · Score: 5, Interesting

I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represents a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.

This is more or less how functional programming works. You write your program using an XML-like tree syntax. The compiler uses the tree to figure out dependencies. See http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-10.html#%25_sec_1.1.5. More parallelism can be drawn out if the interpreter "compiles" as-yet-unused functions while evaluating others. See the following section.
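The tree idea can be sketched concretely, in the spirit of the linked SICP section. The operands of a combination are independent subtrees, so each can be evaluated separately and the operator applied once all of them are done. The sketch assumes expressions are written as nested Python tuples instead of Scheme lists, and `evaluate` and `OPS` are hypothetical names. Starting one thread per subexpression is purely for illustration and is far too costly for real use.

```python
import operator
import threading

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a nested tuple such as ("+", ("*", 3, 5), ("-", 10, 6)).

    Operand subtrees don't depend on each other, so each is evaluated on
    its own thread; the operator is applied after all of them finish.
    """
    if not isinstance(expr, tuple):
        return expr  # a number evaluates to itself
    op, *operands = expr
    results = [None] * len(operands)

    def eval_into(i, sub):
        results[i] = evaluate(sub)

    threads = [threading.Thread(target=eval_into, args=(i, sub))
               for i, sub in enumerate(operands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the dependency edge: the operator needs every operand value
    # Fold the operator left to right over the operand values.
    value = results[0]
    for r in results[1:]:
        value = OPS[op](value, r)
    return value
```

For example, `evaluate(("*", ("+", 2, ("*", 4, 6)), ("+", 3, 5, 7)))` evaluates both top-level operands concurrently (26 and 15) before multiplying them.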

--
After all, I am strangely colored.
Re:Nope. by Lost+Engineer · 2007-05-28 17:34 · Score: 5, Interesting

It is still difficult to justify if you can more easily write more efficient single-threaded apps. What consumer-level apps out there really need more processing power than a single core of a modern CPU can provide? I already understand the enterprise need. In fact, multi-threaded solutions for enterprise and scientific apps are already prevalent, that market having had SMP for a long time.
We don't think in recursion either by TheMCP · 2007-05-28 18:20 · Score: 4, Interesting

Most programmers have difficulty thinking about recursive processes as well, but there are still some who don't and we still have use for them. I should say "us", as I make many other programmers batty by using recursion frequently. Programmers tell me all the time that they find recursion difficult - difficult to write, difficult to trace, difficult to understand, difficult to debug. Conversely, I find it easier - all I have to do is reduce the problem to its simplest form and determine the end case, and a tiny snip of code will do where a huge mess of iterative code would otherwise have been required. So, I don't understand why anyone would want to write iterative code when recursion can solve the problem.
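To make the "end case plus reduction" point concrete, here is a small sketch (hypothetical function names) that flattens arbitrarily nested lists in two ways. The recursive version states the end case and the reduction and is done. The iterative version has to manage an explicit stack of iterators by hand to do the same job.

```python
def flatten(nested):
    """Recursive: a non-list is the end case; a list reduces to flattening its items."""
    if not isinstance(nested, list):
        return [nested]  # end case: a leaf value
    out = []
    for item in nested:
        out.extend(flatten(item))  # same problem, smaller input
    return out

def flatten_iter(nested):
    """Iterative: the call stack is replaced by an explicit stack of iterators."""
    out, stack = [], [iter([nested])]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()  # finished this level; resume the enclosing one
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend into a sub-list
        else:
            out.append(item)
    return out
```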

I suspect that parallel programming may be similar - some programmers will "get it", others won't. Those who "get it" will find it fun and easy and be unable to understand why everyone else finds it hard.

Also, most development tools were created with a single-processor system in mind: IDEs for parallel programming are a new-ish concept and there are few. As more are developed we'll learn how the computer can best help the programmer create code for a parallel system, and the whole process can become more efficient. Or maybe it can be automated entirely; at least in some cases, if the code can be effectively profiled, the computer may be able to determine how to parallelize it and the programmer may not have to worry about it. So, I think it's premature to argue about whether parallel programming is hard or not - it's different, but until we have taken the time to further develop the relevant tools, we won't know if it's really hard or not.

And of course, for a lot of tasks it simply won't *matter* - anything with a live user sitting there, for example, only has to be fast enough that the person perceives it as being instantaneous. Any faster than that is essentially useless. So, for anything that has states requiring user input, there is a "fast enough" beyond which we need not bother optimizing unless we're just trying to speed up the system as a whole, and that sort of optimization is usually done at the compiler level. It is only for software requiring unusually large amounts of computation or for systems which have been abstracted to the point of being massively inefficient beneath the surface that the fastest possible computing speed is really required, and those are the sorts of systems to which specialist programmers could be applied.
Not too hard at all by JustNiz · 2007-05-28 18:21 · Score: 3, Interesting

Parallel programming is NOT too hard. Yes, it's sometimes harder than a single-threaded approach, but in my experience the problem usually maps more naturally into a multi-threaded paradigm.
The real problem is this: I've seen time and again that most companies do not require, recognise, or reward high technical skill, experience and ability; instead they favour minimal cost and speed of product delivery over quality.
Also, it seems most Human Resources staff/agents don't have the necessary skills to tell skilled Software Developers apart from useless Software Developers who have a few matching buzzwords on their resume, because they themselves don't understand enough to ask the right questions, and so they resort to resume-keyword matching.
The consequence is that the whole notion of Software Development being a skilled profession is being undermined and devalued. This is allowing a vast number of people who don't have the natural ability and/or proper training to do the job to be employed as Software Developers.
To those people, parallel programming IS hard. To anyone with some natural ability, a proper understanding of the issues (which you get from, say, a BS in Computer Science) and a naturally rigorous approach, no, it really isn't.
"Dragged Kicking and Screaming" by netfunk · 2007-05-28 19:03 · Score: 4, Interesting

Tom Leonard, a programmer from Valve, gave a fascinating talk about this at GDC this year, about retrofitting multicore support into Half-Life 2 (specifically, into the Source Engine, which powers Half-Life 2). Not surprisingly, this talk was named "Dragged Kicking and Screaming" ...

There was a lot of really good wisdom in there, whether you are writing a game or something else that needs to get every possible performance boost.

I'm sure they probably drew from 20+ years worth of whitepapers (and some newer ones about "lock-free" mutexes, see chapter 1.1 of "Game Programming Gems 6"), but what I walked away from the talk with was the question: "why the hell didn't _i_ think of that?"

There were several techniques they used that, once you built a framework to support it, made parallelizing tasks dirt simple. A lot of it involves putting specific jobs onto queues and letting worker threads pick them up when they are idle, and being able to assign specific jobs to specific cores to protect your investment in CPU cache.
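This is not Valve's code. It is a minimal sketch of the general job-queue pattern described above, written with Python's standard `queue` and `threading` modules. `run_jobs` is a hypothetical name, the sketch assumes all jobs are known up front, and it leaves out the core-affinity part of the talk.

```python
import queue
import threading

def run_jobs(jobs, num_workers=4):
    """Put jobs on a shared queue; idle worker threads pull the next one.

    `jobs` is a list of zero-argument callables. Returns their results
    in submission order.
    """
    work = queue.Queue()
    results = [None] * len(jobs)
    for index, job in enumerate(jobs):
        work.put((index, job))

    def worker():
        while True:
            try:
                index, job = work.get_nowait()
            except queue.Empty:
                return  # all jobs were enqueued up front, so empty means done
            results[index] = job()  # each job writes only its own slot

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Once this framework exists, parallelizing another task only means wrapping it as a job, which is roughly the "dirt simple" point made above.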

Most of the rest of the work is building things that don't need a result immediately, and trying to build things that can be processed without having to compete for various pieces of state...sometimes easier said than done, sure. But after hearing his talk, I was of the opinion that while parallelism is always more complex than single-threaded code, doing this well is something most developers aren't even _thinking_ about yet. In most cases, we're not even at the point where we can talk about _languages_ and _tools_, since we aren't even using the ones we have well.

--ryan.

--
Don't say, "don't quote me," because if no one quotes you, you probably haven't said a thing worth saying.
No. No. No. by Duncan3 · 2007-05-28 19:06 · Score: 3, Interesting

We've had a steady stream of multi-core guests here at Stanford lecturing on the horror and panic in the industry about multi-core, and how the world is ending. I've seen Intel guys practically shit themselves on stage, they are so terrified we can't find them a new killer multi-core app in time. That's BS. OK, so it's an hour lecture, maybe two if you're not that fast. Parallel/SMP is not that hard; you just have to be careful and follow one rule, and it's not even a hard rule. There are even software tools to check you followed the rule!

But that's not the problem...

The problem is, a multi-year-old desktop PC is still doing IM, email, web surfing, Excel, facebook/myspace, and even video playing fast enough that a new one won't "feel" any faster once you load up all the software, not one bit. For everyone but the hardcore momma's-basement-dwelling gamers, the PC on your home/work desk is already fast enough. All the killer apps are now considered low-CPU; bandwidth is the problem.

Now sure, I use 8-core systems at the lab, and sit on top of the 250k-node Folding@home so it sounds like I've lost my mind, but you know what, us super/distributed computing science geeks are still 0% of the computer market if you round. (and we'll always need every last TFLOP of computation on earth, and still need more)

That's it. Simple. Sweet. And a REAL crisis - but only to the bottom line. The market has nowhere to go but lower wattage lower cost systems which means lower profits. Ouch.

--
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Re:our brains aren't wired to think in parallel by EvanED · 2007-05-28 19:40 · Score: 5, Interesting

I can eat, talk and think at the same time, all are pretty conscious actions

True, but can you talk (perhaps reciting something from memory) at the same time you are listening to something? Even if it's not a volume issue; you're wearing headphones say.

Feynman has a chapter in "What Do You Care What Other People Think?" where he talks about some informal experiments he did to figure out what he could do while accurately timing out a minute, which essentially means while counting. He found that (1) he could be very consistent about timing out a given interval, and (2) he could do most things while counting. But what he couldn't do was talk. After discussing it with other people in his dorm/frat/house/whatever, he found another person who could talk, but couldn't read, while timing things out. It turned out the difference was that they counted differently: Feynman was hearing "one, two, three, ..." while this other guy was watching the numbers pass in front of his eyes.

Activities are localized in the brain; it seems that these areas are largely independent, but try two tasks that use the same area and you're SOL.
Re:A different approach to parallel programming by try_anything · 2007-05-28 19:43 · Score: 3, Interesting

How would a string of Chinese characters like beads on a string be any better than a string of alphanumeric characters like beads on a string? Many (most?) Chinese words are written using multiple characters, so a language that had a special use for lone characters would end up looking a lot like Fortran 77 anyway. (What does the FSSCRD function do?)

I think you're reasoning from some notion about the way "Oriental" people think differently from "Western" people. I tend to doubt that idea based on the kinds of people who have enthusiastically pushed it: originally imperialist racists of various groups, each bent on proving the superiority of their group, and more recently western PC racists who compulsively idealize everything non-Western. Despite the taint of racism, the idea may have some basis in fact as well -- there is ongoing research that occasionally manages to produce some evidence for it -- but the sad fact is that we have only been able to create programming languages that express a very tiny subset of the way "Western" people supposedly think anyway. The problem is not a lack of nonlinear, context-sensitive ways of thinking; the problem is that before we can use a given way of thinking to communicate with a computer, we must essentially enable the computer to think the same way. If you buy into the western PC version of the dualism, "Oriental = nonlinear, inclusive, sensitive, flexible, context-sensitive; Western = linear, exclusive, autistic, rigid, blinkered," then digital computers are quintessentially Western beings that cannot be made to appreciate Eastern ways of thinking, at least not without a few more decades of AI research and performance improvements.

The Chinese government might very well be hard at work creating a quintessentially Chinese programming language, but it's a bad idea to pin your hopes on political science. It tends to suck. On top of that, many excellent programming languages have been doomed by much smaller barriers to entry than learning an entirely new system of writing. On top of that, your 2D array of characters is doomed by the multitudes of multicharacter words in Chinese. To add yet more on top of that, another poster just pointed out that the idea that it expresses has already been expressed in other languages using ASCII.

I wouldn't have bothered piling on you like this if your post didn't strike me as racist. The commonly accepted story about the differences between Eastern and Western ways of thinking is propagated by uninformed repetition. Chinese, Americans, left-wingers, right-wingers, everybody has learned to love it and interpret it to flatter their side, so they all repeat it in unison. It pollutes the discourse. Wouldn't it be nice if everyone who didn't have firsthand experience just shut the hell up? Then we might hear something different from the standard story that gets passed around like a centuries-old fruitcake. Or we might hear the same thing, but then at least it would mean something.
Re:I blame the tools by DaChesserCat · 2007-05-29 03:12 · Score: 5, Interesting

I was using a potential answer to this in 1990. I was working for a small company in the Provo/Orem area, called Computer System Architects, which was selling Transputer hardware. For those who haven't heard of Transputers, they were small 16- or 32-bit processors, with a small amount of built-in RAM (not a cache; this was actually in the memory map, and you could do small tasks on a Transputer without any external RAM), 2-4 high-speed serial channels (easily implemented with 4 wires) and a stack-based architecture. Adding megabytes of external RAM was easy, and it was embarrassingly easy to connect up networks of these things, even on one board (in a single ISA slot), and build clusters. An external card cage, in those days, could hold 20 slots, which would hold up to 80 Transputers, using our products.

I did some Assembly and some C, but the kicker language for this chip was called Occam II. Among other things, it used the indentation in the code to determine block structure. A quick example:
PAR
  step A
  step B
  step C
  SEQ
    step D
    step E
In this example, steps A, B and C would all be executed in parallel with another task, which ran step D and then step E. If you had one Transputer in your machine, it would multitask. If you had multiple CPUs available, it would spread the tasks across the CPUs.

It also had a basic construct called a Channel. Channels were very easy to set up and use, and they were how the different tasks communicated with each other.
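For readers without a Transputer handy, here is a rough Python analogue of the PAR/SEQ example and of a channel. It is only a sketch: `par`, `seq` and `occam_demo` are hypothetical helpers, not part of any Occam toolchain. The channel is approximated with a one-slot `queue.Queue`, whereas real Occam channels are unbuffered, so an Occam sender waits until the receiver actually takes the value.

```python
import queue
import threading

def par(*procs):
    """Run every process concurrently and wait for all of them (like PAR)."""
    threads = [threading.Thread(target=p) for p in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def seq(*procs):
    """Bundle processes to run one after another (like SEQ)."""
    def run():
        for p in procs:
            p()
    return run

def occam_demo():
    log, log_lock = [], threading.Lock()

    def step(name):
        def run():
            with log_lock:
                log.append(name)
        return run

    channel = queue.Queue(maxsize=1)  # approximation of an Occam channel
    received = []

    def producer():
        for i in range(5):
            channel.put(i)
        channel.put(None)  # end-of-stream marker

    def consumer():
        while (value := channel.get()) is not None:
            received.append(value * value)

    # PAR: A, B, C, SEQ(D, E) and both ends of the channel all run concurrently.
    par(step("A"), step("B"), step("C"), seq(step("D"), step("E")), producer, consumer)
    return log, received
```

A, B and C may be logged in any order, but D always comes before E, just as in the Occam fragment.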

It was not difficult to spawn thousands of tasks, each one doing a relatively small part of an overall task, with full communication and synchronization. Again, if you had multiple CPUs available, it would spread the tasks across them. A board with multiple Transputers was usually doing ray-tracing or rendering Mandelbrot fractals as a demo any time we went to a trade or tech show. They could knock it down to one processor, and things got done relatively quickly. Then they'd kick in 4 or 16 CPUs and blow people's minds.

This was in 1990. A 386DX-33 was high-end back then. The Transputer didn't run DOS or Windows, so it didn't survive in the market of the time. That was a shame; I benchmarked a variety of them, then ran identical benchmarks on various other machines as technology marched on. A T805 running at 30 MHz (the top-end Transputer I ever got to play with) blasted through mixed integer/floating-point calculations about as fast as a 486DX2-66 (which didn't come on the market for another couple of years). On one occasion I had 16 of those T805s sitting in my machine. You'd need a Pentium II to match that setup, and the P-II didn't become available until 1997, several years later.

Cool tech, but the programming tools were what allowed you to really use the parallelization. It was typical to achieve over 95% linear speedup (i.e. 20 CPUs gave a real-world 19x performance); sometimes we went over 99%. Most Intel SMP machines are lucky if they give 80% linear speedup (4 CPUs = 3.2x total performance).
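The percentages in this comment are what is usually called parallel efficiency. With $T_1$ the run time on one CPU and $T_p$ the run time on $p$ CPUs, speedup $S$ and efficiency $E$ are defined as follows, and the poster's two figures are consistent with those definitions:

```latex
S = \frac{T_1}{T_p}, \qquad E = \frac{S}{p}

\text{Transputer: } p = 20,\; S = 19 \;\Rightarrow\; E = \tfrac{19}{20} = 0.95

\text{Intel SMP: } p = 4,\; E = 0.80 \;\Rightarrow\; S = 0.80 \times 4 = 3.2
```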

--
... by the Dew of Mountains the thoughts acquire speed, the hands acquire shakes, the shakes become a warning
Re:That's irrelevant. by LionMage · 2007-05-29 10:45 · Score: 3, Interesting

Much of the work of programming is taking a parallel process (i.e. pretty much anything you want a program to do) and translating it into the sequential model used by most programming languages.
Well, that statement makes a gross assumption. Every software developer (programmer/engineer) thinks that their particular domain is representative of what computer programmers do in general. However, if you in fact write software that automates common business tasks, the software is directly analogous to the process it seeks to model and/or replace. Most business tasks are sequential, so procedural single-threaded programming is a perfectly fine model to use.

Yes, for the things that most casual users and computer scientists think are "interesting," there is some inherent level of parallelism that can get pretty high. For a few choice types of tasks (ray tracing, rendering certain fractals), the problem itself is embarrassingly parallel because there is no direct coupling between the solutions of sub-problems. However, at some level you can't decompose your problem any further, and you can extract no more parallelism.

The vast majority of computer software these days is written for businesses. Most of it is not the stuff you see on the shelf at the local computer store, but is written custom to solve a specific business need. Most business processes are inherently serial operations: you perform one step, then you perform another step that depends on the previous step. Occasionally, you get lucky and you discover multiple steps that can be done simultaneously. However, making such processes explicitly parallel might not be the most advantageous move; after all, most modern CPU architectures are adept at out-of-order execution, and can analyze instruction streams to figure out dependencies dynamically. Heck, most modern CPU architectures support multiple in-flight instructions, and allow multiple instructions to complete simultaneously, which extracts parallelism at a very low level.

When I first learnt to program, I remember learning a little about functions and then later discovering that they completed synchronously. I was surprised by this, because it just didn't seem to make any sense; a function has a deadline by which it needs to complete (when I use the result), but it can run in the background until then and join with the calling thread.
You can still explicitly use this technique in modern programming languages; Java, for example, has Thread.join() which is used for precisely such situations. However, just because you can do something doesn't mean that you should; there is overhead associated with spawning threads of execution and synchronizing threads when some result is needed. If the computation being performed by a function call is long-lived, then it makes sense to spawn a thread to perform that computation -- assuming that there are sufficient computational resources to truly run that thread concurrently (e.g., another CPU core able to perform that computation). Otherwise, you're burning more computational resources and probably making your code actually slower in the process (due to management overhead). And if you're multitasking on a single CPU core, spawning another thread will almost certainly result in a slower-running program (because you still have all the overhead of managing another thread, but none of the benefit of true hardware-level concurrency).
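The "run it in the background, join when the result is needed" idea looks like this with Python's `concurrent.futures`, which plays roughly the role that `Thread.join()` (or `Future.get()`) plays in Java. It is a minimal sketch: `slow_square` and `compute_with_overlap` are hypothetical names, and the `time.sleep` call stands in for a long-lived computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    time.sleep(0.1)  # stand-in for a long-lived computation
    return x * x

def compute_with_overlap(x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_square, x)  # start the work in the background
        other = sum(range(1000))              # the caller keeps working meanwhile
        # The "deadline": block only at the point where the result is needed.
        return future.result() + other
```

This reinforces the caution above. If `slow_square` were trivial, creating the pool and synchronizing on the result would cost more than the overlap saves, and on a single core there is no real overlap to gain.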

Most programming tasks are inherently parallel. I can't think of a single piece of code I've written recently that hasn't been a solution to a parallel problem.
The first sentence is really an unsupported conjecture. The second sentence is an attempt to provide anecdotal evidence, drawn from your own experiences, to support the conjecture in the first. My own life experience is vastly different from yours, but then again, that too is anecdotal evidence; neither your personal experiences nor mine are really "proof" o