Slashdot Mirror


Is Parallel Programming Just Too Hard?

pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"

21 of 680 comments (clear)

  1. Re:our brains aren't wired to think in parallel by Anonymous Coward · · Score: 5, Informative

    I do a lot of multithreaded programming, this is my bread and butter really. It is not easy - it takes a specific mindset, though I would disagree that it has much to do with management. I am not a manager, never was one and never will be. It requires discipline and careful planning.

    That said, parallel processing is hardly a holy grail. On one hand, everything is parallel processing (you are reading this message in parallel with others, aren't you?). On the other, when we are talking about a single computer running a specific program, parallel usually means "serialized but switched really fast". At most there is a handful of processing units. That means, that whatever it is you are splitting among these units has to give itself well to splitting this number of ways. Do more - and you are overloading one of them, do less - and you are underutilizing resources. In the end, it would be easier often to do processing serially. Potential performance advantage is not always very high (I know, I get paid to squeeze this last bit), and is usually more than offset by difficulty in maintenance.

  2. Errors are difficult to find out in such systems by himanshuarora · · Score: 2, Informative

    Well, I was a Teaching Assistant of 'Concurrent Algorithms'. There used to be 30 students for that class. For a given assignment there used to be 30 different solutions and many used to be wrong.

    It is difficult to find out errors in these kind of algorithms. There has been some papers published in big conferences and it was found that some mathematical proofs were wrong because they could not think of some scenarios where it can fail.

    --
    Spam: Any activity on internet to gain popularity without paying to advertising companies like Google.
  3. Re:Two words: map-reduce by allenw · · Score: 5, Informative
    Implementing MapReduce is much easier these days: just install and contribute to the Hadoop project. This is an open source, Java-based MapReduce implementation, including a distrbuted filesystem called HDFS.

    Even though it is implemented in Java, you can use just about anything with it, using the Hadoop streaming functionality.

  4. A Case Study by iamdrscience · · Score: 2, Informative

    My brother just recently started doing IT stuff for the research psych department at a respected university I won't name. They do a lot of mathematical simulations with Matlab and in order to speed these up they decided to buy several Mac Pro towers with dual quad core Xeons at $15,000*. The problem is, their simulations aren't multithreaded (I don't know if this is a limitation of Matlab or of their programming abilities -- maybe both) so while one core is cranking on their simulations to the max, the other 7 are sitting there idle! So while a big part of this ridiculous situation is just uninformed people not understanding their computing needs, it also shows that there are plenty of programmers stuck playing catch-up since computers with multiple cores (i.e. Core 2 Duo, Athlon X2, etc.) have made their way onto the desktops of normal users.

    I think this is a temporary situation though, and something that has happened before, there have been many cases where new powerful hardware seeps into the mainstream before programmers are prepared to use it.



    *I know what you're thinking: "How the hell do you spend $15,000 on a Mac?". I wouldn't have thought it was possible either, but basically all you have to do is buy a Mac with every single option that no sane person would buy: max out the overpriced RAM, buy the four 750GB hard drives at 100% markup, throw in a $1700 Nvidia Quadro FX 4500, get the $1K quad fibre channel card, etc.

  5. Just look around by const · · Score: 2, Informative

    A lot of efforts are being done to simplify the parallel programming, both on micro scale and component scale. Just look through lambda-the-ultimate archives. Micro scale is mostly invisible to application programmers, and it is mostly done by compilers when they use SSE and friends, one of notable efforts of making it more explicit is OpenMP. On component scale, most of them are based on the message passing concurrency model (after you grok it, it is really really simple). The best effort that I have seen from point of view of usability is E programming language. I tried to clone its core ideas in my pet project AsyncObjects Framework, but usability is less than E's one because of the framework clutter.

  6. Re:Nope. by Anonymous Coward · · Score: 4, Informative

    Functional programming is no harder than procedural/OO programming. A good functional interpreter can already draw this kind of parallelism out of a program. And yes, there are compiled functional languages with parallelizing compilers. (Erlang and OCaml come to mind)

  7. Re:Nope. by Durandal64 · · Score: 3, Informative

    Multi-threaded programming is very difficult. But some things just shouldn't be done on multiple threads either. Multi-threading is a trade-off to get (generally) better efficiency and performance in exchange for vastly more complex control logic in many cases. This greater complexity means that the program is much more difficult to debug and maintain. Sometimes multi-threading a program is just a matter of replacing a function call with a call to pthread_create(...). But sometimes a program just can't be multi-threaded without introducing unacceptable complexity. A lot of the complaints of difficulty in multi-threading comes from people trying to multi-thread programs that can't be easily changed.

  8. Re:A different approach to parallel programming by jlarocco · · Score: 4, Informative

    I've often wondered if parallel programming would be easier if it were done in Chinese characters instead of English/European alphabetical characters.

    Perhaps having a Chinese character represent a simple block of pre-compiled code that does one simple thing. Then the characters could be placed in two-dimensional order to form parallel threads. This would require a completely different approach to compiler development. But that would be OK because compilers are stuck in the 1970s anyway.

    Maybe I'm a "bit thick", but that doesn't make any sense to me. It's an interesting idea, but I just don't see how it'd help parrallelize things. At the very least, it seems to be solving the wrong problem.

    The biggest problem right now is that it's really hard to split most tasks into parts that can be performed at the same time. Once a parrallel algorithm is devised, it's relatively easy to write a program that performs the task in parrallel.

    Also, I don't know what you mean about compilers being stuck in the 70s. There have been massive improvements to compilers in the last 40 years.

    Just getting away from the idea of having code based on a very limited set of alphanumeric characters strung together like beads on a string might help unlock a whole new era of innovative approaches to parallel program development strategies.

    But programming doesn't work like that. Individual characters in a programming language are almost irrelevant.

  9. Re:our brains aren't wired to think in parallel by mwvdlee · · Score: 2, Informative

    You can eat and talk at the same time?
    Didn't your mother ever teach you not to?

    On a serious note; how conscious are you when you talk? Do you consciously trigger all required muscle movement or does you brain queue a "say 'word'" command which is then processed unconsciously? Probably neither, but the latter is likely closer to reality. There is actually VERY little that we do consciously.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  10. Re:Nope. by Novus · · Score: 2, Informative

    Asynchronous message passing removes the deadlock problem
    Could you elaborate on your reasoning behind that? You can just as easily have processes deadlocked waiting for messages from each other as for semaphores or locks to be released (and I've seen hundreds of examples as a TA on a concurrent programming course).
  11. High performance multiprocessing by Cafe+Alpha · · Score: 2, Informative

    I've worked on a high performance multiprocessor system.

    There are a few principles and tools that make much of it manageable, and then there's some parts that are inherently nearly impossible.

    A few principles:

    1. Task switches are very very time consuming, at least under preemptive multitasking, and especially when that preemption (as it does in Windows) involves process switches. Since mutexes rely on task switches, they're very expensive, although exact implementation matters too.

    2. It's possible to come up with non-blocking algorithms that require no mutexes (though you do need low level access to interlocked instructions, or barriers). Unfortunately that's where it gets nearly impossible because proving that a non-blocking algorithm is correct requires a combinatorial analysis. I could go into the some of the principles of non-blocking algorithms, but they're still very hard even when you know what you're doing. Still it's useful to understand some basic interlocked instructions because people often use mutexes when a simple interlocked instruction would do the same task infinitely quicker.

    3. Having a few basic non-blocking queue algorithms (LIFO and FIFO) up your sleeve can suffice to allow rather separate tasks to communicate cheaply. There are even nonblocking queues that allow an unlimited number of processors to compete for each end of the queue simultaneously, and so you can have common banks of work to be balanced between processors, for instance.

    4. It's often a win to give very separate tasks to each processor, and have a single process per processor. Put all IO on one processor, for instance and have it communicate with the other processors through non-blocking queues.

    5. Even though fair preemptive multitasking is expensive (because of the task switches), in special cases you can use cooperative multitasking (fibers, green threading) and save the time. One advantage to cooperative multitasking is that you can control exactly when a fiber wakes up. In some benchmarks I've seen (producer-consumer), green threading outraces preemptive multitasking by over an order of magnitude.

    6. If you can all tie all interrupts to a single processor (which is possible in Windows) and lock specific processes to each processor, then you can (almost) guarantee that certain processors won't interrupted, and thus those processors are the ones that are cheapest to use for any task that the other processors will have to wait for.

    7. Aggregate. There are some tasks that can not be accomplished in parallel with the usual work. It's useful to save up a bunch of work like that so that you don't have to shut down the other tasks too often.

    8. State machines with atomic arcs. This gets closer to the harder sort of algorithms, but its useful to talk about what's easy. If you need a bunch of processors to cooperate, it's useful to have a state machine description of the global state. If the state can be stored in a single double word, then transitions between states can be accomplished atomically with a CAS instruction (compare and swap), and then it doesn't matter which processor changes the state. They may all try, and one will succeed. If one of the states is too complicated to fit in a double word, then you can either have a spin-lock intermediate state or in some cases, the state double word can include a pointer (but this gets complicated).

    9. Cache delays. Multiprocessor systems have to keep the caches coherent. Because of this situations like a CPU reading a value that's dirty in another processor's cache can bring all of the processors to a halt while they transfer data. I'm a few years out of date on what the newest processors do, but it's probably still true, and still true that these pauses are much longer on systems like the Sparc that allow hundreds of processors than it is in x86 machines that are more limited in the number of processors. In any case it's a win to keep data memory use separate between processors when possible.

  12. Re:Nope. by Anonymous Coward · · Score: 1, Informative

    Sure, it's not "parallel programming" but this discussion has clearly degenerated into a anti-threading discussion which is so common on Slashdot. The GP simply stated that it is hard to justify anything other than a single-thread for consumer software, and I'm pointing out that clearly isn't the case.

  13. Re:Not justifyable by smallfries · · Score: 2, Informative

    Given that it is an example of an embarrasingly parallel algorithm, like ray-tracing. No, it is not the best example. The difficulty in parallel programming is trying to divide up the work when your sub-problems are not entirely independent of each other.

    --
    Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  14. Lack of skill in the field by node159 · · Score: 5, Informative

    I've seen it over and over in the industry, there is a distinct lack of parallel programming skill and knowledge, and even senior developers with years under their belt struggle with the fundamentals.

    Parallel programming does add a layer of complexity and its inherent lack of general solutions does make abstracting its complexity away difficult, but I suspect that the biggest issue is the lack of training of the work force. It isn't something you can pick up easily without a steep learning curve with many hard lessons in it, definitely not something that can be incorporated as a new thing to be learnt on the fly with deadlines looming.
    Another aspect is that its fundamental to the design, parallelism can and often will dictate the design and if the software architects are not actively designing for it or are not experienced enough to ensure that it remains a viable future option, future attempts to parallelise can be difficult at best.

    Ultimately the key issues are:
    * Lack of skill and training in the work force (including the understanding that there are no general solutions)
    * Lack of mature development tool to easy the development process (debugging tools especially)

    --
    GPLv2: I want my rights, I want my phone call! DRM: What use is a phone call, if you are unable to speak?
  15. Re:Nope. by stony3k · · Score: 3, Informative

    The truth is languages such as C/C++ and Java(to lesser extent) are not good languages to write parallel code in.
    To a certain extent Java is trying to correct that with the java.util.concurrent library. Unfortunately, not many people seem to have started using it yet.
    --
    Freedom is not worth having if it does not include the freedom to make mistakes. - Mahatma Gandhi
  16. Re: Nope. by Dolda2000 · · Score: 3, Informative
    It is definitely hard to justify parallel programming, even though many computers are gaining SMP capabilities. The thing isn't necessarily that it is particularly hard to write multi-threaded applications -- the thing is that it is a lot harder to write a multi-threaded program than to write a single-threaded program. Suddenly, you have to introduce locks in all shared data structures and ensure proper locking in all parts of the program. That kind of thing just adds a significant part to the complexity of a program, and it requires a lot more testing as well. Therefore, justification is definitely needed.

    The real question then, is: Is it justified? To be honest, for most programs, the answer is no. Most interactive programs have a CPU-time/real-time ratio of a lot less than 1% during their lifetime (and very likely far less than 10% during normal, active use), so any difference brought by parallelizing them won't even be noticed. Other programs, like compilers, don't need to be parallelized, since you can just run "make -j8" to use all of your 8 cores at once. I would also believe that there are indeed certain programs that are rather hard to parallelize, like games. I haven't written a game in a quite a long time now, and I don't know the advances that the industry has made as of late, but a game engine's step cycle usually involves a lot of small steps, where the execution of the next depends on the result of the previous one. You can't even coherently draw a scene before you know that the state of all game objects has been calculated in full. Not that I'm saying that it isn't parallelizable, but I would think it is, indeed, rather hard.

    So where does that leave us? I, for one, don't really see a great segment of programs that really need parallelizing. There may be a few interactive programs, like movie editors, where the program logic is heavy enough for it to warrant a separate UI thread to maintain interactive responsiveness, but I'd argue that segment is rather small. A single CPU core is often fast enough not to warrant parellelizing even many CPU-heavy programs. There definitely is a category of programs that do benefit from parellelization (e.g. database engines which serve multiple clients), but they are often parellelized already. For everyone else, there just isn't incentive enough.

  17. Re:I blame the tools by ensignyu · · Score: 3, Informative

    FYI, that programming model is called futures.

  18. Multics by Mybrid · · Score: 3, Informative
    Multics was the precursor to Unix. Multics was a massively parallel operating system. It was designed to be an OS for a public utility. The thinking back then was all Americans would have computer terminals like a vt100 and plug into a municipal computer over a network.

    Multics was designed for long lived processes. Short lived processes are something we take for granted today but wasn't assumed back then. Today we assume that the sequence is we open a program, perform a task, close the program. Microsoft Outlook, for example, relies on Outlook being closed for its queue when to purge email that's been deleted. Programs are not designed to be up years-on-end. In fact, Microsoft reboots their OS every month with Windows Update. I've often speculated that the reboot is often requested not because the patch requires it but because Microsoft knows that its OS needs rebooted, often.

    Why? Why wouldn't one just leave every application you've ever opened, opened?

    The reason is that programmers cannot reliably write long running process code. Programs crashed all the time in Multics. Something Multics wasn't very good at handling back in the 1960s. There was some research done and it was observed that programmers could write code for short lived processes fairly well but not long lived.

    So, one lessoned learn from from the failure Multics is that programmers do not write reliable, long running code.

    Parallel processing is a processing better suited to long running processes. Since humans are not good at writing long running processes it makes sense then that parallel processing is rare. The innovation to deal with this sticky dilemma was the client-server model. Only the server needs to be up for long periods of time. The clients can and should perform short lived tasks and only the server needs to be reliably programmed to run 24/7. Consequently you see servers have clusters, RAID storage, SAN storage and other parallel engineering and clients do not. In some sense, Windows is the terminal they were thinking of back in the Multics days. The twist is that given humans are not very good at writing long running processes then the client OS, Windows, is designed around short lived processes. Open, perform task, close. I leave Firefox open for more than a couple of days and it is using 300MB of memory and slowing my machine down. I close and reopen Firefox daily.

    Threads didn't exist in the computing word until Virtual Memory with it's program isolation came to be. What happened before threads? What happened before threads is that programmers were in a shared, non-isolated environment. Where Multics gave isolation to every program, Unix just recognizes two users: root and everyone else. Before Virtual Memory, this meant that all user programs could easily step on each other and programs could bring each down. Which happened a lot.

    Virtual Memory is one of the greatest computing advances because it specifically deals with a short coming in human nature. Humans are not very good at programming memory directly, i.e. pointers.

    It wasn't very long after VM came out that threads were invented to allow intra-application memory sharing. Here's the irony though. There still as no advancement in getting humans to perform reliable programming. Humans today are still not very good at programming memory directly, even with monitors, semaphores and other OS helpers.

    When I was in my graduate OS class the question was raised then of "when do you invoke a thread?" given you probably shouldn't to avoid instability.

    The answer was to think about threads then as "light weight processes". The teaching was that given this a thread was appropriate for:

    1. IO blocked requests.

      Have one thread per IO device like keyboard, mouse, monitor, etc. There should be one thread dedicated to CPU only and the CPU thread controls all the IO threads. The IO threads should be given the simple task of servicing requests on behalf of the CPU thread.

    2. Performance

      Onl

  19. Re:Nope. by poopdeville · · Score: 3, Informative

    Well, Lisp doesn't look like XML. But its s-expressions are able to deal with tree like structures just as easily. If you have some free time, I suggest reading http://www.defmacro.org/ramblings/lisp.html.

    --
    After all, I am strangely colored.
  20. Time-to-market issue by anars · · Score: 2, Informative

    Pardon me if this point has already been made.

    What's the most important thing in high-performance computing nowadays; is it super-optimized low-level parallel programming or is it just plain parallel programming? I'm talking about any high-level framework which allows for easy multi-threading in applications which need it. A framework that will not only heighten developer productivity, but also utilize the additional cores in a computer system with a minimal effort. Wouldn't that be better than not doing it at all? The C-omega language http://research.microsoft.com/Comega/ by Microsoft Research Labs does exactly this.

    I have to agree, though, that the problem here is us - the programmers. The implementation of the language can only as smart as it is, and potential side-effects in terms of data dependencies etc. will definately occour. I'm not aware of any languages which implement a degree of Data Flow Analysis http://en.wikipedia.org/wiki/Data_flow_analysis, which allows the compiler to alert of such "stupidities" propagated by us--or perhaps even optimize them for the better.

    As several of you pointed out, matrix calculations can easily be distributed onto n number of cores, computers, or whatnot without any problems. I can't mention any real-life application where fast matrix calculations could save lives, but hey..