Is Parallel Programming Just Too Hard?
pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"
Parallel programming isn't all that hard; what is difficult is justifying it.
:-)
What's hard is trying to write multi-threaded Java applications that work on my VT-100 terminal.
I can't speak for the rest of the world, or even the programming community. That disclaimer spoken, however, I can say that parallel programming is indeed hard. The trivial examples, like simply running many processes in parallel that are doing the same thing (as in, for example, Monte Carlo sampling), are easy, but the more difficult examples of parallelized mathematical algorithms I've seen, such as those in linear algebra, are difficult to conceptualize, let alone program. Managing multiple threads and inter-process communication efficiently in an actual implementation adds another level of complexity.
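For anyone who wants to see the "trivial" case in code, here is a minimal Python sketch of independent Monte Carlo workers estimating pi. The names `count_hits` and `estimate_pi` are made up for this example, and it uses threads only to keep the sketch short; on CPython you would probably swap in `ProcessPoolExecutor` to get real CPU parallelism.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(samples: int, seed: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state shared between workers
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int, workers: int = 4) -> float:
    """Split the sampling into independent chunks and combine the counts at the end."""
    per_worker = total_samples // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task gets its own seed; the only coordination is the final sum.
        hits = sum(pool.map(count_hits, [per_worker] * workers, range(workers)))
    return 4.0 * hits / (per_worker * workers)
```

The workers never talk to each other until the final sum, which is what makes this kind of problem easy. The linear algebra cases are hard precisely because the pieces do need to exchange data along the way.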
I think the biggest reason why it is difficult is that people tend to process information in a linear fashion. I break large projects into a series of chronologically ordered steps and complete one at a time. Sometimes if I am working on multiple projects, I will multitask and do them in parallel, but that is really an example of trivial parallelization.
Ironically, the best parallel programmers may be those good managers who have to break exceptionally large projects into parallel units for their employees to complete simultaneously. Unfortunately, trying to explain any sort of technical algorithm to my managers usually elicits a look of panic and confusion.
-Ryan
AUWYHSTOT (Acronyms are Useless When You Have to Spell Them Out Too)
Oh noes! Software doesn't get churned out immediately upon the suggestion of parallel programming! Programmers might actually be debugging their own code!
There's nothing new here: just somebody being impatient. Parallel code is getting written. It is not difficult, nor are the tools inadequate. What we have is non-programmers not understanding that it takes a while to write new code.
If anything, the fact that the world hasn't exploded with massive amounts of parallel code is a good thing: it means that proper engineering practice is being used to develop sound programs, and the Johnny-come-lately programmers aren't able to fake their way into the marketplace with crappy code, like they did 10 years ago.
Well, the way they 'teach' programming nowadays, programmers are just typists.
No need to worry about memory management, Java will do it for you.
No need to worry about data location, let the Java technology of the day do it for you.
No need to worry about how/which algorithm you use, just let Java do it for you, no need to optimize your code.
Problem X => Java cookbook solution Y
Though there are many very good parallel programmers who make excellent use of the Message Passing Interface, we are entering a new era of parallel computing where MPI will soon be unusable. Consider when the switch was made from assembly language to a programming language: it happened when the "processor" contained too many components to be effectively programmed with machine language. That same threshold has long since been passed with parallel computers. Now that we have computers with more than 100 thousand processors and are working to build computers with more than a million processors, MPI has become the assembly language of parallel programming. It hence needs to be replaced with a new parallel language that can control great numbers of processors.
In one word, the answer is yes. It's difficult for people who are used to programming for a single CPU.
Programmers that are accustomed to non-parallel programming environments forget to think about the synchronization issues that come up in parallel programming. Several conventional programs do not take into account synchronization of the shared memory or message passing requirements that come up for these programs to work correctly in a parallel environment.
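To make the shared-memory synchronization issue concrete, here is a small Python sketch. The class `SharedCounter` and the helper `run_workers` are hypothetical names used only for illustration. The point is that `value += 1` is a read-modify-write, so concurrent threads can lose updates unless the step is guarded.

```python
import threading


class SharedCounter:
    """A counter that several threads can increment safely."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below is not atomic, so it is guarded by a lock.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SharedCounter, threads: int, increments: int) -> int:
    """Start `threads` workers that each call increment() `increments` times."""
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

A sequential program never needs the lock, which is exactly why code written for one CPU tends to omit it.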
This is not to say that there will not be any progress in this field. There will be and there has been. The design techniques and best practices for parallel programming differ from those for conventional programming. Also, IDE support for debugging is currently limited. There are already several books on this topic and classes in the universities. As the topic becomes more and more important, computer science students will be required to take such classes (as opposed to it being optional) and more and more programmers who are experts in parallel programming will be churned out. It's just not as popular because the universities don't currently seem to make it a required subject. But that will change because of the advancement in hardware and more market demand for expert parallel programmers.
Our brains might be limited about other things, but this is just a matter of better education. 'Nuff said.
Since this discussion has been going on for over three decades with little progress in terms of widespread change
Funny, I've seen an explosion in the number of compute clusters in the past decade. Those employ parallelism, of differing types and degrees. I guess I'm not focused as much on the games scene - is this somebody from the Cell group writing in?
I mean, when there's an ancient Slashdot joke about something there has to be some entrenchment.
The costs are just getting to the point where lots of big companies and academic departments can afford compute clusters. Just last year the price of multi-core CPUs brought them into mainstream desktops (ironically, more in laptops so far). Don't be so quick to write off a technology that's just out of its first year of being on the desktop.
Now, that doesn't mean that all programmers are going to be good at it - generally programmers have a specialty. I'm told the guys who write microcode are very special, are well fed, and generally left undisturbed in their dark rooms, for fear that they might go look for a better employer, leaving the current one to sift through a stack of 40,000 resumes to find another. I probably wouldn't stand a chance at it, and they might not do well in my field, internet applications - yet we both need to understand parallelism - they in their special languages and me, perhaps with Java this week, doing a multithreaded network server.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
It's very easy to just say "Parallel programming is too hard", and try to ignore it. But consider this: the languages we use today are designed to work in the same way as the CPUs of old: one step at a time, in sequence, from start to finish.
A special case of this is seen in the vector units found in today's CPUs: MMX, SSE, Altivec, and so forth. You can't write a C compiler that takes advantage of these units (not easily, anyway), because the design of C means that a programmer takes the mathematics and splits it up into a bunch of sequential instructions. Fortran, on the other hand, is readily adapted, because the language is designed for mathematics; you tell it what you need to do, but not how to do it.
In the same way, trying to cram parallelism into a program written in C is a nightmare. Semaphores, exclusive zones, shared variables, locking, deadlocks ... there's a hell of a lot to think about, and you only have to get it wrong once to introduce a very hard to reproduce and debug problem. Other languages are more abstract, and have much greater opportunities for compilers to extract the inherent parallelism; it's just that they have had more use to date in academia to illustrate principles than in the real world to solve problems.
As time marches on, and the reality of the situation becomes increasingly obvious, I would expect that performance-intensive apps will start to be written in languages better suited to the domain of parallel programming. Single threaded apps will remain - eg, Word doesn't really need any more processing power (MS' best efforts to the contrary notwithstanding) - and C-like languages will still be used in that domain, but I don't think inherently sequential languages, like C, C++, and others of that nature, will be as common in five or ten years as they are today, simply because of the rise of the parallel programming domain necessitating the rise of languages that mimic that domain better than C does.
Parallel programming doesn't have to be quite as painful as it currently is. The catch is that you have to face the fact that you can't go on thinking with a sequential paradigm and have some tool, library, or methodology magically make everything work. And no, I'm not talking about functional programming. Functional programming is great, and has a lot going for it, but solving concurrent programming issues is not one of those things. Functional programming deals with concurrency issues by simply avoiding them. For problems that have no state and can be coded purely functionally this is fine, but for a large number of problems you end up either tainting the purity of your functions, or wrapping things up in monads which end up having the same concurrency issues all over again. It does have the benefit that you can isolate the state, and code that doesn't need it is fine, but it doesn't solve the issue of concurrent programming.
No, the different sorts of paradigms I'm talking about are no-shared-state, message-passing concurrency models à la CSP, the pi calculus, and the Actor Model. That sort of approach in terms of how to think about the problem shows up in languages like Erlang and Oz, which handle concurrency well. The aim here is to make message passing and threads lightweight and integrated right into the language. You think in terms of actors passing data, and the language supports you in thinking this way. Personally I'm rather fond of SCOOP for Eiffel, which elegantly integrates this idea into OO paradigms (an object making a method call is, ostensibly, passing a message after all). That's still research work though (only available as a preprocessor and library, with promises of eventually integrating it into the compiler). At least it makes thinking about concurrency easier, while still staying somewhat close to more traditional paradigms (it's well worth having a look at if you've never heard of it).
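As a rough illustration of the no-shared-state idea, here is a tiny actor emulated in Python with a thread and a mailbox queue. This is only a sketch of the style, not how Erlang, Oz, or SCOOP implement it; the class `CounterActor` and its message format are invented for the example.

```python
import queue
import threading

_STOP = object()  # sentinel message asking the actor to shut down


class CounterActor:
    """Owns its state privately; other threads interact only by sending messages."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._total = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self) -> None:
        # Messages are handled one at a time, so no lock on _total is needed.
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            kind, payload = message
            if kind == "add":
                self._total += payload
            elif kind == "get":
                payload.put(self._total)  # payload is a reply queue

    def add(self, amount: int) -> None:
        self._mailbox.put(("add", amount))

    def get(self) -> int:
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put(("get", reply))
        return reply.get()

    def stop(self) -> None:
        self._mailbox.put(_STOP)
        self._thread.join()
```

Callers never see `_total` directly, so the only place concurrency has to be reasoned about is the mailbox. Languages built around this model make the mailbox and the lightweight process part of the language instead of a library pattern.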
The reality, however, is that these new languages, which provide the newer and better paradigms for thinking and reasoning about concurrent code, just aren't going to get developer uptake. Programmers are too conservative and too wedded to their C, C++, and Java to step off and think as differently as the solution really requires. No, what I expect we'll get is kludginess retrofitted onto existing languages in a slipshod way that sort of works, inasmuch as it is an improvement over previous concurrent programming in that language, but doesn't really make the leap required to make the problem truly significantly easier.
Craft Beer Programming T-shirts
I've worked with parallel software for years - there are lots of ways to do it, lots of good programming tools around even a couple of decades back (my stuff ranged from custom message passing in C to using "Connection-Machine Fortran"; now it's Java threads) but the fundamental problem was stated long ago by Gene Amdahl - if half the things you need to do are simply not parallelizable, then it doesn't matter how much you parallelize everything else, you'll never go more than twice as fast as using a single thread.
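The "never more than twice as fast" figure follows from the usual statement of Amdahl's law: with a fraction p of the work parallelizable over N processors, speedup(N) = 1 / ((1 - p) + p / N). A minimal Python sketch (the function name `amdahl_speedup` is just for illustration):

```python
def amdahl_speedup(parallel_fraction: float, processors: float) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of the processor count.
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

With `parallel_fraction=0.5` the denominator can never drop below 0.5, so the result approaches 2 as `processors` grows but never reaches it.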
Now there's been lots of work on eliminating those single-threaded bits in our algorithms, but every new software problem needs to be analyzed anew. It's just another example of the no-silver-bullet problem of software engineering...
Energy: time to change the picture.
No, parallel programming isn't "too hard", it's just that programmers never learn how to do it because they spend all their time on mostly useless crap: enormous and bloated APIs, enormous IDEs, gimmicky tools, and fancy development methodologies and management speak. Most of them, however, don't understand even the fundamentals of non-procedural programming, parallel programming, elementary algorithms, or even how a CPU works.
These same programmers often think that ideas like "garbage collection", "extreme programming", "visual GUI design", "object relational mappings", "unit testing", "backwards stepping debuggers", and "refactoring IDEs" (to name just a few) are innovations of the last few years, when in reality, many of them have been around for a quarter of a century or more. And, to add insult to injury, those programmers are often the ones that are the most vocal opponents of the kinds of technologies that make parallel programming easier: declarative programming and functional programming (not that they could actually define those terms, they just reject any language that offers such features).
If you learn the basics of programming, then parallel programming isn't "too hard". But if all you have ever known is how to throw together some application in Eclipse or Visual Studio, then it's not surprising that you find it too hard.
I've encountered two problems with parallel programming:
1. For applications which are constantly being changed under tight deadlines, parallel programming becomes an obstacle. Parallelism adds complexity which hinders quick changes to applications. This is a very broad generalization, but often the case.
2. Risk. Parallelism introduces a lot of risk: things that aren't easy to debug. Some problems I faced happened only once every couple of weeks, and involved underlying black-box libraries. For financial applications, this was absolutely too much risk to bear. I would never do parallel programming for financial applications unless management was behind it fully (as they are for HUGE efforts such as Monte Carlo simulations and VaR apps).
I seem to recall comments from Tim Sweeney and John Carmack that parallelism needed to start from the beginning of the code - i.e., if you weren't thinking about it and implementing it when you started the engine, it was too late. You can't just tack it on as a feature. Unreal Engine 3 is a prime example of an engine that is properly parallelized. It was designed from the ground up to take full advantage of multiple processing cores.
If your programmers are telling you they need more time to turn a single-threaded game into a multi-threaded one, then the correct solution IS to push the game out the door, because it won't benefit performance to try to do it at the end of a project. It's a fundamental design choice that has to be made early on.
I would argue that most user tasks cannot be fundamentally parallel. Quite simply, if a user highlights some text, and hits the bold button, there really isn't anything to split across multiple cores. No matter how much processing is necessary to make the text bold (or to build the table, or to check the spelling of a word, or to format a report, or to calculate a number, or to make a decision -- as in A.I.) it's a serial concept, and a serial algorithm, and cannot be anything more. Serial is cool too, remember.
So we're looking at multiple tasks. Obviously, games and operating systems get to split by engine (graphics, network, interface, each type of sound, A.I., et cetera). I'd guess that there is a limit of about 25 such engines that anyone can dream up. Obviously raytracing gets to have something like 20 cores per pixel, which is really really cool. But that's clearly the exception.
So really, in my semi-expert and wholly professional opinion, I think priming is the way to go. That is, it takes the user up to a second to click a mouse button. So if the mouse is over a word, start guessing. Get ready to bold it, underline it, turn it into a table, look up related information, pronounce it, stretch it, whatever.
Think of it as: "we have up to a full second to do something. We don't know what it's going to be, but we have all of these cores just sitting here. So we'll just start doing stuff." It's the off-screen buffering of the user task world.
The result is that just about any task, no matter how complicated, can be presented instantly, having already been calculated.
Doesn't exactly save power, but hey, the whole point is to utilize power. And power requires power. There must be someone's law -- the conservation of power, or power conversion, or something that discusses converting power between various abstract or unrelated forms -- muscle to crank to generator to battery to floating point to computing to management to food et cetera; whatever.
Right, so priming / off-screen buffering / preloading. Might as well parse every document in the directory when you open the first one. Might as well load every application on start-up. Can't have idle cores lying around just for fun.
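A very rough Python sketch of the priming idea, assuming the set of likely actions is known when the mouse hovers over a word. Everything here (`prime`, `choose`, the `ACTIONS` table) is hypothetical and stands in for real editor commands.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


def prime(pool: ThreadPoolExecutor, word: str,
          actions: Dict[str, Callable[[str], str]]) -> Dict[str, Future]:
    """Start every candidate action on `word` before the user has chosen one."""
    return {name: pool.submit(action, word) for name, action in actions.items()}


def choose(primed: Dict[str, Future], name: str) -> str:
    """Return the chosen result and cancel the guesses that have not started yet."""
    for other, future in primed.items():
        if other != name:
            future.cancel()  # no effect if that guess already ran; its work is wasted
    return primed[name].result()


# Hypothetical stand-ins for real editor commands.
ACTIONS = {
    "bold": lambda w: f"<b>{w}</b>",
    "underline": lambda w: f"<u>{w}</u>",
    "shout": lambda w: w.upper(),
}
```

On hover you would call `prime(pool, word, ACTIONS)`, and on click `choose(primed, "bold")`. The discarded guesses are the power cost mentioned above: idle cores get used, but most of what they compute is thrown away.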
Has anyone ever thought that maybe we won't run out of copper after all? I'd bet that at some point in the next twenty years, we go back to clock speed improvement. I'd guess that it's when core busing becomes ridiculous.
I still don't understand how we went from shared RAM as the greatest thing in the world to GPUs with on-board RAM, to CPUs with three levels of on-chip RAM, to cores each with their own on-core RAM.
Hail to the bus driver; bus driver man.
After years of driving the programming profession to its least common denominator, and eliminating anything that was considered non-essential, somebody is surprised that current professionals are not elastic enough to quickly adapt to a changing hardware environment. Whoda thunk it? The ones you may have left with some skills are nearing retirement.
"To those who are overly cautious, everything is impossible. "
Many of the responses here, frankly, demonstrate how poorly people understand both the problem and the potential solutions.
Are we talking about situations that lend themselves to data flow techniques? Or Single Instruction/Multiple Data (e.g. vector) techniques? Or problems that can be broken down and distributed, requiring explicit synchronization or avoidance?
I agree with other posters (and I've said it elsewhere on /.) that for parallel/distributed processing, Language Does Matter. (And UML is clearly part of the problem...)
Then there's the related issue of long-lived computation. Many of the interesting problems in the real-world take more than a couple of seconds to run, even on today's machines. So you have to worry about faults, failures, timeouts, etc.
One place to start for distributed processing kinds of problems, including fault tolerance, is the work of Nancy Lynch of MIT. She has 2 books out (unfortunately not cheap), "Distributed Algorithms" and "Atomic Transactions". (No personal/financial connection, but I've read & used results from her papers for many years...)
I get the sense that parallelism/distributed computation is not taught at the undergrad level (it's been a long time since I was an undergrad, and mine is not a computer science/software engineering/computer engineering degree.) And that's a big part of the problem...
dave
Two words: map-reduce
(Score:4, Interesting)
by Anonymous Coward on 07-05-29 6:29 (#19305183)
Implement it, add CPUs, earn billion$. Just Google it.
Wow, fantastic. Turns out parallel programming was really really easy after all. I guess we can make all future computer science courses one week then, since every problem can be solved by map-reduce! The AC has finally found the silver bullet. Bravo, mods.
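For readers who haven't met the pattern being argued about, here is a minimal word-count sketch in Python. It only illustrates the map and reduce steps on one machine; the function names are invented for the example, and it says nothing about how any production system distributes the work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    """Map step: count words in one chunk, independently of every other chunk."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial results; Counter addition is associative."""
    return left + right


def word_count(lines: List[str], chunks: int = 4) -> Counter:
    size = max(1, -(-len(lines) // chunks))  # ceiling division
    parts = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(map_chunk, parts))
    return reduce(merge, partials, Counter())
```

It works because the chunks are independent and the merge is associative. Problems without that shape are the ones the sarcasm above is pointing at.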
Being bitter is drinking poison and hoping someone else will die
Basically like a compiled function?
I've been programming using parallel techniques for more than 25 years using APL.
I was so impressed with APL that I implemented an APL-like derivative language
called Simmunity, which is a hybrid compiler for an APL-like syntax. Simmunity
can be targeted at a parallel processor implemented in FPGAs that provides
multiple simultaneously executing vector/matrix operations in parallel...
(stay tuned for more info)
The quintessential parallel database programming language available is clearly
KxSystems Q and KDB+
See the faq at http://kx.com/faq/ and tutorial at http://kx.com/q/d/primer.htm
Ken Iverson invented APL and then the J language which anyone interested in
really discussing parallel programming should look at closely. The Connection
Machine LISP had many of the APL operators/functions and certainly Map/Reduce
that it added to provide a convenient parallel expression language.
Most programmers think in terms of sequential code execution with threads and
processes. APL encourages the programmer to conceive of programming using
vector and matrix operations to process strings and numbers and manipulate
data like a spreadsheet. An application that people might be familiar with
is Lotus Improv which provided naturally parallel expressions based on a subset
of APL operations.
Cheers, Simbuddha
PHB: Why? Can it run on one core of a X?
Coder: Well I suppose it can but...
PHB: Shove it out the door then.
I think the PHB may have a point. Games are expensive to make, gamers are fickle, and if reviews say a game "looks dated" it is a death sentence for sales. Rewriting an engine to be multi-threaded after it is done will take a lot of time and money, and most PCs out there have more dated hardware than you think, so most current players won't see any benefits whatsoever from it anyway.
Being bitter is drinking poison and hoping someone else will die
2. Parallel code is a monster to write. I'm not talking simple scatter-gather data spreaders. Imagine Adobe Photoshop running across a 400-machine cluster, handling hundreds of users at a time. The data concurrency issues, data residence, locking, message handling, message reordering... Total bloody nightmare. If you've parallelized a Markov model, it doesn't really compare.
3. The tools aren't adequate. Tracing a data race or deadlock in a cluster is a beast. MPI and PVM are nice but are really narrow in the scope of problems they handle.
4. It isn't just non-programmers. Parallel is a whole different scale of complexity. Almost everything I see is a "parallelize the brute zones of a specialty engine once it works in serial" job. It's an important baby step and it really throws non-programmers for a loop. But we are a far sight from having an implicitly parallel version of MS Word.
5. Parallel isn't new; dual-CPU boxes have been in userspace since the late '90s, and parallelism has been mostly ignored by applications. The use of network resources is horribly behind the times. The ability to aggregate resources on the fly is a total joke compared to where it should be.
Storm
Brought up programming in C and Fortran, I struggled for a long time with recursion. When I moved on to functional languages (Erlang, to be specific), where recursion is integral (there is no loop construct), I eventually became quite comfortable with it.
It goes to show that to become a better programmer, you should investigate as many programming paradigms as possible.
Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
The language isn't called LabVIEW, it's called "G". LabVIEW is the whole package (G interpreter/compiler, drivers, libraries, etc).
And, yeah, G is pretty cool. National Instruments offers a slightly dated, but otherwise completely functional version of LabVIEW for free for noncommercial use.
Our cognitive system does many things at the same time, yes. That doesn't answer the question that's being posed here: whether explicit, conscious reasoning about parallel processing is hard for people.
Are you adequate?
Actually, it's more like pipelined. The fact that your eyes have already moved to the next letter just says that the old one is still going through the pipeline. Yeah, there'll be some Bayesian prediction and pre-fetching involved, but it's nowhere near consciously doing things in parallel.
Try reading two different texts side by side, at the same time, and it won't work that neatly parallel any more.
Heck, there were some recent articles about why most Powerpoint presentations are a disaster: in a nutshell, because your brain isn't that parallel, or doesn't have the bandwidth for it. If you try to read _and_ hear someone saying something (slightly) different at the same time, you just get overloaded and do neither well. The result is those time-wasting meetings where everyone goes fuzzy-brained and forgets everything as soon as the presentation flips to the next chart.
To get back to the pipeline idea, the brain seems to be quite the pipelined design. Starting from say, the eyes, you just don't have the bandwidth to consciously process the raw stream of pixels. There are several stages of buffering, filtering out the irrelevant bits (e.g., if you focus on the blonde in the car, you won't even notice the pink gorilla jumping up and down in the background), "tokenizing" it, matching and cross-referencing it, etc, and your conscious levels work on the pre-processed executive summary.
We already know, for example, that the shortest term buffer can store about 8 seconds worth of raw data in transit. And that after about 8 seconds it will discard that data, whether it's been used or not. (Try closing your eyes while moving around a room, and for about 8 seconds you're still good. After that, you no longer know where you are and what the room looks like.)
There's a lot of stuff done in parallel at each stage, yes, but the overall process is really just a serial pipeline.
At any rate, yeah, your eyes may already be up to 8 seconds ahead of what your brain currently processes. It doesn't mean you're that much of a lean, mean, parallel-processing machine, it just means that some data is buffered in transit.
Even time-slicing won't really work that well, because of that (potential) latency and the finite buffers. If you want to suddenly focus on another bit of the picture, or switch context to think of something else, you'll basically lose some data in the process. Your pipeline still has the old data in it, and it's going right to the bit bucket. That or both streams get thrashed because there's simply not enough processing power and bandwidth for both to go through the pipeline at the same time.
Again, you only need to look at the fuzzy-brain effect of bad Powerpoint presentations to see just that in practice. Forced to try to process two streams at the same time (speech and text), people just make a hash of both.
A polar bear is a cartesian bear after a coordinate transform.
Games have been driving PC performance lately. You shouldn't be opposed to such things.
You can get a 4 core chip for under $600 now because of it. If you are into high performance computing then you should beg the game developers for something that can use as many cores as you can throw at it. Because as you said, you are 0% of the market, and gamers are a huge chunk of it.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Looking at the proposals for switching languages, they just shift the problem around (as another poster has already mentioned). The only thing new that might be useful is the idea of "Guards" from Haskell (for why, see below). But you don't need a new programming language to implement that; with a little imagination you can build it into a C++ library.
In my experience, 99% of the problems are due to two causes: overlooked shared resources and deadlocks. An example of the first might be when two threads are pushing/popping from the same queue, causing duplicates or skips. An example of the second would be thread A holding resource X and waiting for resource Y, while thread B holds resource Y and is waiting for resource X. Guards could possibly make it easier to avoid some forms of deadlock by making it easier to acquire multiple resources simultaneously with less risk of deadlock. I think I have a new library to go code now... ;-)
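One library-level way to get the "acquire multiple resources at once" behaviour is plain lock ordering. To be clear, this is a swapped-in technique and not the Haskell feature mentioned above. The helper `acquire_all` below is a hypothetical Python sketch that always takes locks in the same global order, so the A-holds-X-wants-Y / B-holds-Y-wants-X cycle cannot form.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_all(*locks):
    """Hold several locks at once, always taking them in one global order."""
    # Any fixed total order works; id() is convenient. set() drops duplicates
    # so passing the same lock twice does not self-deadlock.
    ordered = sorted(set(locks), key=id)
    taken = []
    try:
        for lock in ordered:
            lock.acquire()
            taken.append(lock)
        yield
    finally:
        for lock in reversed(taken):
            lock.release()
```

Two threads calling `acquire_all(x, y)` and `acquire_all(y, x)` end up requesting the locks in the same order, so one simply waits for the other. It does nothing for the first cause, overlooked shared resources, which still has to be found by inspection.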
The problem with programming is that we are forced to use programming languages. Parallel programming is what programming should have been in the first place. We find it difficult because the tools are inherently sequential. Why? Because they are based on languages (usually English) and languages are inherently sequential. Just the fact that most programmers have to learn a computer language based on English (why not Chinese or French?) should be a clue that the linguistic approach is fundamentally flawed. The way a program works has nothing to do with what language you understand.
What is needed is a new software paradigm that uses parallelism to start with. It should be a system based on elementary communicating objects. And the best way to depict parallel objects and signal pathways is to do it graphically. I say that the entire computer industry has been doing it wrong from the beginning (since Lady Ada Lovelace). It is time to move away from the algorithmic model and adopt a non-algorithmic, synchronous software model. Even our processors should be redesigned for the new model. Only then will we find that parallel programming is not only easy but the only way to do it.
Look up Semaphores?
SIG: HUP
Most "programmers" seem to have a difficult time churning out normal, non-parallel code that runs, much less code that is bug free. Do you seriously think your average code monkey reusing snippets from the language-of-the-week cookbook is going to be able to wrap his head around something that actually requires a little thought?
Multi-core designs do not change the fact that processors will always execute the vast majority of their instructions sequentially. Even a 1000 core machine would still be executing more instructions in serial than in parallel. If each core ran at 1Hz they would still be executing 1000 instructions per second in parallel but each of those is also part of a sequence in each core so it is executing >1000 in serial depending on how deeply pipelined the cores are! The extreme of parallelism is a neural network but neurons do not execute instructions: if you want each time-step of the processor to have a well defined meaning (so that it could justifiably be called an instruction execution) then it needs a context - it needs to be part of a sequence.
On a more general note I don't know why people have taken to evangelising functional programming for this discussion. Functional languages are particularly not the solution for parallel programming. People are too dumb or lazy to write explicitly parallel code and think if they write everything in a functional language the compiler will handle that messy business for them. Wrong! As soon as you get to something moderately complex like matrix multiplication, functional languages will fail to find a good strategy. There is no way in hell a compiler can figure out an optimal way to parallelise matrix multiplication for an arbitrary architecture, because the optimal matrix blocking and memory access patterns depend on the processor's cache design. The ATLAS project achieves this to some extent but it is a highly specialised 'compiler' if you can call it that, and needs to know the CPU type and cache sizes/organisation to work optimally.
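To show what "matrix blocking" refers to, here is a minimal pure-Python sketch of a tiled multiply. It is not how ATLAS does it, and in Python the interpreter overhead swamps any cache effect; the function name `blocked_matmul` and the default `block=32` are assumptions chosen only to show the loop structure.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, block: int = 32) -> Matrix:
    """Multiply a (n x m) by b (m x p), walking the operands tile by tile."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                # Work on one block x block tile so its data can stay in cache.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + block, m)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

The result is the same for every `block` value; only the memory access pattern changes. Which value is fastest depends on the cache sizes of the machine, which is the information a general-purpose compiler does not have.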