Slashdot Mirror


What Makes Parallel Programming Difficult?

An anonymous reader writes "Intel's Aater Suleman writes about why parallel programming is difficult. ... I was unaware ... that a major challenge in multi-threaded programming lies in optimizing parallel programs not just getting them to run. His analysis is insightful and the case study is very enlightening if you are unfamiliar with parallel code debugging. "

196 comments

  1. Easy! by Anonymous Coward · · Score: 4, Funny

    Motherboards rarely have parallel ports these days!

    1. Re:Easy! by sgbett · · Score: 5, Funny

      is that things sometimes happen in the wrong order.

      --
      Invaders must die
    2. Re:Easy! by sgbett · · Score: 5, Funny

      The problem with parallel programming

      --
      Invaders must die
    3. Re:Easy! by sydneyfong · · Score: 0

      Q)Why did the multithreaded chicken cross the road?
      A)to To other side. get the
      Q)Why did the multithreaded chicken cross the road?
      A)other to side. To the get

      --
      Don't quote me on this.
    4. Re:Easy! by mybecq · · Score: 1

      Difficult Parallel Makes Programming What?

      (Prior message was optimized for concurrent throughput).

    5. Re:Easy! by Anonymous Coward · · Score: 0

      is that things sometimes happen in the wrong order.

      Not really. The problem is knowing what is and what isn't worth spreading over multiple cores.

    6. Re:Easy! by BadPirate · · Score: 2

      Yes it is happen when that terrible's.

      --
      - Holy crap, I've got MOD points! Who thought that was a good idea.
    7. Re:Easy! by Acius · · Score: 2

      - The nazi grammar parallel

      No combination above gramatically words possible of the is correct. You 's drop the should.

      --
      Acius the unfamous
    8. Re:Easy! by blair1q · · Score: 1

      That's a different problem, related to the fact that people buy "parallel" but forget to read the part where it says "synchronize". It's kind of like not validating input, or ignoring return values. Not so much a pitfall as a failure to have basic skills.

      The problem these folks are having is that they bought parallel but decided it just wasn't optimizing their stuff enough. It'd be really funny if they found they could optimize-away the parallelization, like, by precomputing certain results and using a lookup, or something. That's how the oldbies did it.

    9. Re:Easy! by oztiks · · Score: 1

      Because its just as difficult as parallel parking?

    10. Re:Easy! by TheCount22 · · Score: 1

      Vector clocks can prevent causality violations in concurrent systems.

    11. Re:Easy! by Anonymous Coward · · Score: 0

      The problem with parallel programming wrong order.

      I agree, also thread safety is occasionally overlooked while using libraries.

    12. Re:Easy! by Jane+Q.+Public · · Score: 2

      I have to disagree on this. Synching multi-threaded programs is often VERY far from trivial, and takes much more than "basic skills". TFA made that very point; often there are hidden dependencies that may not be obvious even to the developer who originally wrote the program. If it only took "basic" skills then we would have a lot more multi-threaded programs these days.

      I ran into this problem myself recently: I wrote a program that interfaced with websites via http, and the problem was that while it was actively engaged in its main loop, it ignored any input from the UI (so the STOP button would not work, for example). I actually had to put the main processing in a separate thread so that a user could click "STOP" and abort the process (which could take a long time).

      Funny thing was: in order for the whole thing to work, the second thread had to alter a variable that was available by the first thread... so it had to break the 'rules' to work.

      It would be nice if Cocoa had the equivalent of the old VB "GetEvents" method, which polled the queue for any pending interface events. That would have solved that problem without involving multiple threads.

    13. Re:Easy! by Anonymous Coward · · Score: 0

      That's because Javascript doesn't have proper semaphores.

    14. Re:Easy! by antdude · · Score: 1

      Same with serial, PS/2 (mine only had one and using PS/2+USB adapters doesn't work well -- too many pauses and sometimes pause), PCI (not Express), etc. :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    15. Re:Easy! by Anonymous Coward · · Score: 0

      Javascript doesn't need proper semaphores because all functions in Javascript are guaranteed to be atomic. No actual concurrency (within the scripts on a single page) exists. If you set a timeout (m, 0), it doesn't start executing m immediately - m cannot begin execution until the script engine has exited the block that it is currently executing and it is once more "idle" (waiting for a key, click, or other event).

  2. I think they made a movie about this... by thelenm · · Score: 1

    "You're just not thinking fourth-dimensionally!"

    "Right, right, I have a real problem with that."

    --
    Use Ctrl-C instead of ESC in Vim!
    1. Re:I think they made a movie about this... by dch24 · · Score: 1

      McFly! But really, the first and second movies were way better.

      If you've read the article, you'll feel a little smarter today. Even if it was just a good review.

      If you haven't heard of Amdahl's Law, pay attention. (Since simple.wikipedia.org doesn't have a nice, short explanation, how about a car analogy?)

      Amdahl's Law
      You're street racing. But there's a school zone in the middle of the course, and you always slow down for the school.

      No matter how fast you go on the rest of the course, that school zone in the middle will just end up taking a larger and larger percent of your lap time, until it alone is what is keeping up your lap time.

      Applying the analogy to parallel computing: after a certain number of cores, more cores doesn't make the problem go faster. Something else (probably many things) will eat up so much of the total time that it won't improve things to double, quadruple, or add a hundred more cores.

    2. Re:I think they made a movie about this... by Anonymous Coward · · Score: 0

      Shouldn't we be looking at how manufacturing industries have solved those problems? Aren't they doing "parallel manufacturing" all the time?

      Let's pick a "random" item like, say, the iPhone. It doesn't matter if one part is built-in inside another, it's just yet another parallel process somewhere.
      - glass back
      - glass front
      - LCD panel
      - capacitive touch overlay
      - CPU
      - RAM
      - wireless ICs
      - various resistors and capacitors
      - connectors
      - buttons
      - PCB
      - all the software (iOS, drivers, APIs, Mail, Safari, iTunes, Maps, etc)

      If a single part is missing, the whole process gets delayed but some parts are not dependent on others. It's not important if the LCD isn't ready at the PCB assembly stage, but it is important at the final assembly stage.

      Looking at the problem like a race with a single car is not going to help you understand it.

    3. Re:I think they made a movie about this... by ildon · · Score: 1

      Your understanding of Amdahl's Law isn't quite correct. I guess if you want to use the street race analogy, it's more like you each have your own lane for part of the race, but for another part of the race you have to go single file into a single lane, and if you're parallel to someone as you approach the choke point you have to arbitrarily let them ahead of you or pass them based on some set of rules that may not necessarily be fair. And the goal isn't to win the race but you're all delivering something and the goal is to move the maximum amount of the goods as fast as possible.

      You know what, a street racing analogy doesn't work at all for Amdahl's Law.

    4. Re:I think they made a movie about this... by adonoman · · Score: 1

      Amdahl's law is saying that no matter how much paralization you get, you are still going to be limited by the longest non-paralizable path. This is no different than your real life example - your longest non-paralizable path (say CPU->PCB assembly->ROM flashing->final assembly->packaging) is going to limit how fast you can manufacture your iPhones. It doesn't matter how fast you can manufacture glass backs, or RAM, if it's not on that critical path, you won't be speeding up your result at all.

    5. Re:I think they made a movie about this... by jd · · Score: 1

      Well of course you'd have a problem if using reals. For four dimensions, you need to typecast to quaternions.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    6. Re:I think they made a movie about this... by Opportunist · · Score: 1

      Not quite, Amdahl's Law says that no matter how well you optimize your parallel tasks, the one that you cannot parallelize will gobble up your time. I don't think there's a good car analogy for that.

      Ask your project manager. He's dealing with that problem on a daily base.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    7. Re:I think they made a movie about this... by dch24 · · Score: 1

      Good point, but with my post (GGP) I really was hoping to spark this kind of discussion. Maybe manufacturers will find solutions to programming problems, you never know...

    8. Re:I think they made a movie about this... by dch24 · · Score: 1

      Amdahl's law is just about speedups. It doesn't imply parallel operations at all. I know the wikipedia page says it's about parallel processes. It's not.

    9. Re:I think they made a movie about this... by dch24 · · Score: 1

      I am my project manager, you insensitive clod. :-)

      People are like cars. The faster you drive them, the more hot air you get out their rear end.

    10. Re:I think they made a movie about this... by readin · · Score: 1

      A couple of problems with the analogy:
      1. In manufacturing, the idea is to do the exact same thing a jillion times with the exact same result. Interchangeable parts make different rates of a production easier to deal with. In computing this isn't the case. The assembly line make be identical each of the jillion times, but the data going through it is not.
      2. Manufacturing plants are expensive to build. It makes sense for an engineer to spend weeks or even months optimizing the process. We don't have that luxury for most of the code we write.

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    11. Re:I think they made a movie about this... by Anonymous Coward · · Score: 0

      > I don't think there's a good car analogy for that.
      Sure there is, compare amount of gas && # of cylandars in the car to the amount of force generated.
      At some point just adding more cylandars isn't going to give you more power relative to the amount of gas consumed (2cc vs 16cc engine).

    12. Re:I think they made a movie about this... by Opportunist · · Score: 1

      While true this isn't due to the problem of not being able to parallelize more tasks. If you want to use the "add more doesn't mean more gets created" metric, use the myth of adding more people to a project to speed it up. Even aside of training and other overhead, you cannot simply parallelize everything in software creation. Let's do a simple example.

      There's two tasks that can be ran parallel. An administrative task (like meetings and organizing crap) and the programming task of creating the actual program. They're dependent on each other, i.e. nothing's complete until both are done, but they can be ran parallel to each other. The admin task takes 2 time units, the programming task takes 10. Now, all overhead aside, let's assume that the programming task is actually 10 tasks each taking one time unit. In other words, you could use 5 programmers, each of them taking 2 time units to complete two of the 10 tasks, which would parallelize nicely with the 2 time unit admin task. Adding more programmers and parallelizing more programming tasks does not allow you to complete the overall job faster since you would have to wait for the admin task to complete.

      Good PMs know where to put their resources to get the best results and parallelize what they can. Bad ones will try to parallelize everything and waste a lot of resources, leading to dead hours while waiting for the ones that cannot be parallelized to complete.

      It's amazing how many bad PMs exist...

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    13. Re:I think they made a movie about this... by ildon · · Score: 1

      I didn't look at the wikipedia page. This is from memory from the parallel programming class I took. Either way the main point is that your analogy was poor.

    14. Re:I think they made a movie about this... by Anthony+Mouse · · Score: 2

      Importantly, comparisons to factories are generally unwise because manufacturing is 'embarrassingly parallel' -- it doesn't matter if you have to make the whole iPhone in one single step that takes two straight hours and you can't divide it into any constituent part. You can still manufacture as many iPhones a day as you like because you just set up however many manufacturing units which each make one iPhone every two hours, until you have the desired output capacity.

      The 'hard' problems to parallelize are the ones that work like building a house: You can't lay the foundation until you grade the terrain. You can't put in the walls until you lay the foundation. You can't install the electrical until there are walls. And so on. There is no known way to convert an unimproved lot and a truck full of concrete and drywall into into a house in less than 5 minutes, no matter how many construction workers you have available.

    15. Re:I think they made a movie about this... by Jane+Q.+Public · · Score: 1

      "I don't think there's a good car analogy for that."

      Sure there is.

      You have yourself a passel of old Dodge Chargers (painted orange for some weird reason), on a REALLY wide road, all tied together through the windows with some good ol' Southern one-inch hemp.

      Whichever car has the wrong mix of gasoline to moonshine will drag the whole pack down. To carry the analogy further: if it has real problems it could crash the whole lot.

      :o) There is ALWAYS a car analogy.

    16. Re:I think they made a movie about this... by rufty_tufty · · Score: 1

      "Manufacturing plants are expensive to build. It makes sense for an engineer to spend weeks or even months optimizing the process. We don't have that luxury for most of the code we write."
      Really, code should be reusable. Code should be modular. Code should be written to be maintained and with the assumption that it will be in use for decades. At least that's good code.
      I admit that there's always glue code, shall we call it mortar code that may be written quickly and thrown away, but for the guts/bricks of the code why not write it well?
      What kind of code do you write that you expect it all to be temporary?

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    17. Re:I think they made a movie about this... by readin · · Score: 1

      How much brick code do you think gets written compared to how much glue code gets written? By definition, if the brick code is written well and can be re-used, it only gets written once. But specialized code for a particular client doesn't see much re-use. If it weren't for the need for the specialized code, frameworks like Spring and EJB wouldn't get much use. But each time those frameworks are used for specialized code represents an instance of code that isn't likely to see much re-use. And given that the user-base for such systems aren't always large, the client usually isn't willing to pay for a small number of features in code that is re-usable (but quickly obsolete - still using client server instead of a web app? Got web services in there?) compared to a large number of features in code that is pretty good.

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    18. Re:I think they made a movie about this... by rufty_tufty · · Score: 1

      I would hope hardly any glue code is written, ideally anyway, but your comment that "We don't have that luxury for most of the code we write." made me think that you were saying that most of what you write is glue code.
      I guess I just don't get it because I am a hardware engineer and the code I write 99% of the time is inherently parallel because it is hardware, you have to go out of your way to serialise it. To me at the low level almost all algorithms are parallelizeable because thar's my job to implement algorithms in hardware. At a high level most things seem to be parallelizable because why can't you draw your window at the same time as downloading the jpegs at the same time as working out how you will render your table at the same time as playing your background music etc.
      Now I understand that it is quicker to write serial C code and most languages and frameworks are built around the C methodology so serial is the quick way to write things and that's what you do with the quick bits of glue code but I don't understand why you'd do that as the main part of the software architecture. Or am I wrong in assuming most code is architected before it is implemented?

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    19. Re:I think they made a movie about this... by readin · · Score: 1

      How did you get this job where most of your code is parallel? That sounds more interesting than what I'm doing. Where do I find some jobs like it?

      --
      I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
    20. Re:I think they made a movie about this... by rufty_tufty · · Score: 1

      I'm an ASIC engineer, rather not say the name of the company, however here is a name of companies that employ lots of people who do this:
      Intel, Marvel*, Qualcomm*, Broadcom*, Imagination Technologies*, Nortel, Synopsys, Cadence, Cisco, Xilinx. Look up the languages of Verilog and VHDL. Take a look at some of the projects on opencores.org. Some good starter stuff on http://www.fpga4fun.com/ too.
      Any company doing ASIC design will need lots of software engineers who can work with this parallel hardware too. I'd love to say more but I'd be breaking NDAs. Have a look at any of the companies I've marked with a * above and look at some of their chip architectures, many of them have dozens of different processors/hardware blocks all needing to run at the same time - and that's just the control software, the underlying hardware on every company mentioned above is written in wither vhdl or verilog and each chip is millions of lines of code producing parallel operation.

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
  3. use the right tool by Anonymous Coward · · Score: 0

    Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming. Sure, you can do parallel programming in C or Ruby or Python, but you need to be very careful about side effects. And the compiler doesn't know enough, so you need to manage all that shit yourself. Analogy time: If you were on the prowl for some vagina, you could try a woman, a ladyboy, or a man. Which one would you choose?

    1. Re:use the right tool by emt377 · · Score: 1

      Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming.

      In that case you can restate the original question as "how do I implement an Erlang compiler and runtime?" If, indeed, Erlang embodies the most efficient way to solve all parallell programming problems. Which of course is absurd. Erlang is useless for implementing servers, kernels, runtimes, networking stacks, file systems, device drivers, VM systems, database servers, or anything else that actually makes a computer tick. It's simply an interface to an already-implemented computing system. It's good to excellent for some problem dmains, but not all domains, and certainly not the computing (system) domain itself. It's one solution to a particular computing domain problem. How well it performs depends on how it's implemented, which brings us back to the original question: how to learn to design and write MT-hot code on SMP hardware.

    2. Re:use the right tool by gnalre · · Score: 1

      I Which of course is absurd. Erlang is useless for implementing servers, kernels, runtimes, networking stacks, file systems, device drivers, VM systems, database servers, or anything else that actually makes a computer tick.

      When I first read this I assumed the poster was being sarcastic, but reading again they actually believe it. Erlang useless for implementing servers? Erlang is used to implement loads of servers. You look at the back end of the a lot the the top 100 company web services and you will find them using Erlang to implement their server functionality. Erlang is used to provide database systems too.

      The idea that Erlang is some sort of toy academic language that is not used for anything practical is a joke. Erlang came out of a real industrial requirement to allow telecom switches to support millions of processes while remaining up 24/7. From that it has grown to be used in all sort of real world applications. Some could even suggest that its purity has been compromised in order to meet the real world requirements.

      The problem with parallel programming is not that its difficult but the tools that are used to implement it. I have used two languages that have made parallel programming a doddle. The first is Occam and the second is Erlang. They both implement parrallel programming the same way, and that is avoid shared memory and use message passing. Once you do that things get far simpler.

      --
      Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
    3. Re:use the right tool by St.Creed · · Score: 1

      Occam + transputer boards was a brilliant idea. Didn't work out so well in practice but that wasn't due to Occam. It's about the most elegant language I've ever seen.

      However, writing an Occam compiler would run into the same sort of issues as the article describes, I fear. Unless CPU-cores take some of the transputer lessons to heart.

      --
      Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
    4. Re:use the right tool by rufty_tufty · · Score: 1

      I have often wondered why we can't treat CPU cores like we treat memory. Always assume there will be enough* and if there isn't then have things like the time sharing algorithms to make up the slack like we have virtual memory algorithms. As opposed to the moment where the entire mindset is one or at best a few processes. It's as if we're back in the says of a few KB of memory but we've somehow managed to implement GUIs and abstracted many layed software because we've got a really good at running the virtual memory...

      *ok I'd be worried at a programmer who assumed there would always be enough memory to solve any problem and implement any algorithm no matter how sloppy, but you get my drift.

      --
      "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    5. Re:use the right tool by badkarmadayaccount · · Score: 1

      Context switch overhead.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
  4. Multithreading some problems is (probably) hard by betterunixthanunix · · Score: 1

    There is a class of problems, the P complete problems, which are (probably) inherently hard to multithread in any advantageous way, and this class includes some pretty important real-world problems:

    http://en.wikipedia.org/wiki/P-complete

    --
    Palm trees and 8
    1. Re:Multithreading some problems is (probably) hard by TaoPhoenix · · Score: 1

      I'll reply to you.

      You're a bit smarter than me but I think you're saying there are lots of easy tasks that are easy to run on a single linear code thread but are hard to split and recombine with any less loss of time / resources.

      --
      My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
    2. Re:Multithreading some problems is (probably) hard by blair1q · · Score: 1

      he's saying there are lots of hard problems that are easy to parallelize, and also lots of hard problems that are hard to parallelize

      there are probably some easy problems that are hard to parallelize but since they're easy nobody even thinks to bother

      parallelization is itself an optimization, so when you start talking about optimizing after you've parallelized you've just gone back to the original problem

      maybe what's needed is an architecture that is structured to handle P-complete problems, rather than single- or parallel-threaded problems

    3. Re:Multithreading some problems is (probably) hard by Cryacin · · Score: 1

      Yes, but, how fast do you really need to run bejewelled deluxe?!?

      (ducks)

      --
      Science advances one funeral at a time- Max Planck
  5. PLC programmers have been doing this for years... by superkurty · · Score: 1

    PLC code for automated mechanical systems runs into all of the same timing issues etc. Obviously the PLC code is much smaller but this is mostly due to the instructions acting on smaller data sets, but can still be just as complicated,.

  6. unaware? WTF? by Nadaka · · Score: 1

    I am kinda curious how anyone even tangentially involved in programming could not be aware that the problem with writing parallel programming was doing it for a gain in efficiency. Making a thread or process is generally just a couple lines of code, synchronization with data separation, mutex's and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism.

    1. Re:unaware? WTF? by Mongoose+Disciple · · Score: 5, Insightful

      synchronization with data separation, mutex's and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism

      And yet people constantly get these details wrong in practice.

      It's an extra layer of complexity and it introduces extra chances to make mistakes, even around areas where a programmer could know better. There's not much way around that. If people only made coding mistakes around difficult problems software would be dramatically more bulletproof than it actually is.

    2. Re:unaware? WTF? by Hooya · · Score: 1

      I tend to think of it as an extra dimension in code. With non-parallel code, the code you have (it's sequence) is the same as what it's sequence would be when run. With parallel code, the run-time sequence is different than the code as it's laid out in source.

      I see people have trouble with just async stuff (eg. AJAX) and have a hard time wrapping their mind around the fact that even though the callback function is in-sequence with the rest of the code, it's not actually called in that sequence - hence the 'callback'. Now, go full tilt with parallel code, and the heads start to spin since the code you're looking at can be run completely out of sequence of where it appears in the call graph.

      The easiest time i've had with parallel programming was with erlang.

    3. Re:unaware? WTF? by Anonymous Coward · · Score: 0

      I am kinda curious how anyone even tangentially involved in programming could not be aware that the problem with writing parallel programming was doing it for a gain in efficiency. Making a thread or process is generally just a couple lines of code, synchronization with data separation, mutex's and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism.

      These problems have definitely not been solved in the general case - they have been solved for a subset of parallel programming problems.

    4. Re:unaware? WTF? by Nadaka · · Score: 1

      I blame that mostly on the languages not being explicit about what operations and methods are or are not thread safe. And for capable programmers those errors generally only occur when you try to make things more efficient by avoiding excess mutex and data duplication in the pursuit of efficiency.

    5. Re:unaware? WTF? by Anonymous Coward · · Score: 1

      synchronization with data separation, mutex's and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism

      And yet people constantly get these details wrong in practice.

      It's "Simon Says" programing. You can't just say "raise your right hand", you constantly have to remember to say "Simon says raise your right hand".

      As anyone who has played the game can attest, in the midst of keeping track of all the outlandish directions it's easy to forget the distinction between "raise your right hand" and "Simon says raise your right hand". The human brain can only handle so much at once. Tacking on an additional things to keep track of just makes things harder.

    6. Re:unaware? WTF? by Anonymous Coward · · Score: 1

      I think you misunderstand what was said.

      A bad programmer (hint: most programmers are bad programmers) can just as easily write unreliable software with only 1 thread. In this sense, it's understandable that many programmers can't write stable multi-threaded programs. On the other hand, a disciplined and experienced software engineer who understands synchronization can churn out lots of code that runs safely in multiple threads. Sure, they will be scratching their heads on the occasional race condition, but that's just analogous to forgetting to zero out a dangling pointer, or whatever; it may be trickier to reproduce and therefore diagnose than some bugs, but it's otherwise just like any other bug.

      The real problem with parallel programming that I see (and I think this is what the commenter your're replying to is getting at with the first sentence or so) is that it can be hard to make the parallelism give you an actual speedup. You can look at a problem and say, "Ah ha! I am CPU bound! More cores will alleviate this!" Then it turns out after implementation to not be much gain, or your work queues and synchronization become a bottleneck, or whatever. Doing it safely is not a huge problem. Doing it well is.

    7. Re:unaware? WTF? by johanatan · · Score: 1

      That's the same way I like to think of 'higher-order programming' (with or without threads)-- an extra degree of freedom.

    8. Re:unaware? WTF? by Anonymous Coward · · Score: 0

      I blame that mostly on the languages not being explicit about what operations and methods are or are not thread safe.

      And that is why we are still doing parallel programming in Fortran 95. Any good F95 compiler would have caught the bug mentioned in the program. The language standard only allows constructs and procedures/subroutines inside parallel constructs that are explicitly declared "thread save" (elemental/pure). And the criteria for that are well defined and get checked at compile time.
      Saved my ass countless times.

    9. Re:unaware? WTF? by dgatwood · · Score: 1

      I see people have trouble with just async stuff (e.g. AJAX) and have a hard time wrapping their mind around the fact that even though the callback function is in-sequence with the rest of the code, it's not actually called in that sequence - hence the 'callback'.

      In my experience, the reason people have so much trouble with async stuff is that every single JavaScript API I've seen that does things asynchronously is rather fundamentally designed *wrong*. They don't provide any clean mechanism for passing data structures with variables from the calling function into the callback short of constructing a helper function on the fly. Since most people have a hard time wrapping their head around the concept of creating anonymous functions, many programmers end up using global variables. This works until things happen out of order or they need to use that code in parallel. Then, things break miserably, and the entire mechanism falls flat.

      Also, JavaScript overuses callbacks. If you can't guarantee that other things won't happen in the DOM tree before the callback fires anyway, it would be a much saner design to simply declare that "this synchronous call may block, and while this call is blocked, other activity may occur". The only thing callbacks do in JavaScript is force you to do a crapton of extra work to pass local variables into the callback. This callback-heavy design causes a lot of extra work for developers, while providing exactly zero benefit. It's not like JavaScript is raw code running in a native thread or anything—the main program loop isn't JavaScript code; it's part of the interpreter—so it's almost exactly as easy for the interpreter to save off the function state and restore it transparently as it is for that interpreter to handle a return statement and trigger a callback at a particular point.

      Compared with JavaScript, parallel programming using sane languages and sane APIs is trivial. Contrast the JavaScript custom function kludge with the much cleaner block syntax used by Apple, or even plain old C callbacks with refCon/userData parameters, and it's easy to see why people hate writing asynchronous code in JavaScript. It's not that writing asynchronous code is inherently hard, but rather that writing asynchronous code in a language that goes out of its way to make it hard, and using an API set that similarly goes out of its way to make it hard is inherently hard.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    10. Re:unaware? WTF? by jd · · Score: 1

      Yes they have. It's called Critical Path Analysis.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    11. Re:unaware? WTF? by Anonymous Coward · · Score: 0

      Javascript 1.7 has generator functions, which can "yield" (return) in the middle of a function and resume at the same spot. This can be used to build coroutines, although it's still not entirely natural since the caller has to know how to deal with generators. Also JS 1.7 is currently only supported by Firefox.

    12. Re:unaware? WTF? by blair1q · · Score: 1

      And yet people constantly get these details wrong in practice.

      If they can't get those right innately, then they never even started out to do parallel programming, they just stuck a fork in their execution flow.

    13. Re:unaware? WTF? by IICV · · Score: 1

      I am kinda curious how anyone even tangentially involved in programming could not be aware that the problem with writing parallel programming was doing it for a gain in efficiency. Making a thread or process is generally just a couple lines of code, synchronization with data separation, mutex's and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism.

      Actually, interestingly enough, parallelizing code not only results in computational efficiency gains, but it frequently results in energy efficiency gains as well.

      Why? Because if instead of running one CPU at 2 GHz you can run two CPUs at 1 GHz for the same amount of time, you can cut the voltage used per CPU roughly in half (though it depends on the CPU, of course) - and if you ignore losses due to leakage, power consumption scales quadratically with respect to voltage. This means that if you cut the voltage used in the CPU in half you cut your total energy consumption by 1/4th.

      This means that things like multicore smartphones aren't necessarily that bad of an idea - if you can parallelize your tasks well enough, you can keep both cores clocked low enough that you actually see energy efficiency gains.

    14. Re:unaware? WTF? by CastrTroy · · Score: 2

      I took a course in parallel programming. We worked with MPI in C if anyone is curious or interested. The hardest part was completely changing the way you thought about programming. It was like the first time you looked at prolog. Or the first time you tried a functional language like scheme. If you've only ever done procedural, it can be quite a big deal to switch. Also, it's quite a bit harder to debug multithreaded code, as the exact order of the instructions is different everytime you run it. Tracking down a deadlock and other problems that only exist in parallel programs can be quite difficult. Anybody who thinks thread syncronization is a solved problem has never written a complex multithreaded application, or is a programming genius. Most likely the former.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    15. Re:unaware? WTF? by steelfood · · Score: 1

      The key to parallel programming is compartmentalization. With a good enough foundation that compartmentalizes properly, parallel programming would only be a matter of coding the compartments and synchronizing their results.

      That having been said, parallel programming these days solves a fairly niche problem. The speed of modern processors make faking parallelism with interrupts viable. The only area where parallelism truly makes sense is when working with extremely large amounts of large chunks of data.

      For example, in gaming, there is a need for the rendering, texturing, lighting, and physics engine to work at 100% all at once in addition to the UI and AI (the latter two could possibly be satisfied with interrupts, but only up to a certain point). In weather forecasting, there's a need to independently calculate the output of a large amount of locations in order to generate the input for the next group of results.

      What's difficult is trying to parallelize every problem regardless of the nature of the problem, especially just for the marketing department to throw the buzzword into the promotional materials. And what makes things difficult are generic schedulers who would do it for you even if you didn't want it. But a good programmer should be able to compartmentalize the problem, and then determine based on that whether parallelization is appropriate.

      --
      "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
    16. Re:unaware? WTF? by AustinAlert · · Score: 1

      This article deals with this exact point. Multicore can save energy if program can be parallelized. http://www.futurechips.org/chip-design-for-all/a-multicore-save-energy.html

    17. Re:unaware? WTF? by Anonymous Coward · · Score: 0

      Now, go full tilt with parallel code, and the heads start to spin since the code you're looking at can be run completely out of sequence of where it appears in the call graph.

      Out of sequence related to point of observation that is data. By freely associating it could be said that the world of (parallel) programming needs its theories of relativity popularized.

    18. Re:unaware? WTF? by Anonymous Coward · · Score: 0

      Actually, interestingly enough, parallelizing code not only results in computational efficiency gains, but it frequently results in energy efficiency gains as well.

      Why? Because if instead of running one CPU at 2 GHz you can run two CPUs at 1 GHz for the same amount of time, you can cut the voltage used per CPU roughly in half (though it depends on the CPU, of course) - and if you ignore losses due to leakage, power consumption scales quadratically with respect to voltage. This means that if you cut the voltage used in the CPU in half you cut your total energy consumption by 1/4th.

      This means that things like multicore smartphones aren't necessarily that bad of an idea - if you can parallelize your tasks well enough, you can keep both cores clocked low enough that you actually see energy efficiency gains.

      BATTERY LIFE.

      There, I said it. Sheesh, I don't know how you managed to dance around it for so long without actually saying it.

  7. Moral of the story... by kvvbassboy · · Score: 2
    learn2efficientparallelalgorithms.

    Good tutorial for someone who wants to jump into some parallel programming, but it's mostly Operating Systems 101 (or 601).

    Honestly though, if you have not optimized your algorithm or code to for parallelism and you want to do it now, you might probably be better off writing the whole thing from scratch, and the tutorial explains why very nicely/.

    1. Re:Moral of the story... by betterunixthanunix · · Score: 1

      Assuming that there is a good way to take advantage of multiple processors for the particular problem your code solves. For a course project once, we found ourselves trying to solve a linear program, and someone suggested using the research cluster to speed things up. As it turns out, linear programming problems are P-complete, and there is no known way to make meaningful use of multiple cores (it is very likely that no such method even exists, and that the problem is inherently sequential).

      --
      Palm trees and 8
    2. Re:Moral of the story... by Anonymous Coward · · Score: 0

      Even with inherently sequential problems, you can use parallelization to pipeline the solution and increase throughput ( as opposed to latency.) Very rarely are computers used to perform a task just once, so this works nicely.

  8. Depends on the Problem... by Anonymous Coward · · Score: 1

    Not all tasks are suited for parallel programming. Some are and can be highly optimized, but if you have something that simply can't be reduced further then parallel programming will not solve your problems.

    1. Re:Depends on the Problem... by Anonymous Coward · · Score: 0

      Yep. For example, a ladyboy has a mouth and an ass, so "she" can please two cocks at once. A real woman also a vagina, so she can please three cocks at once. But a man only has one cock, so he can only please one orifice at a time.

    2. Re:Depends on the Problem... by Anonymous Coward · · Score: 0

      It would have been a much nicer example if he had gained any performance from his multi-threading. Getting a negligible speedup for all that effort and 4 cores doesn't a compelling case make. The missing analysis, of course, is that the task is memory bandwidth limited, and so the 4 cores are sitting there waiting for their requested cache lines most of the time. Doing that analysis *before* investing in multi-threading is pretty important.

    3. Re:Depends on the Problem... by Bengie · · Score: 1

      I've seen one example of a threaded serial task. Intel has HyperThread optimization PDF some where showing some interesting tricks using HT. The example they had was on an i7, so fairly recent. They had a serial task that iterated through an array. They loaded an extra thread that synced with the primary thread, ran it on the other virtual thread, and all it did was call the prefetch instruction on the array.

      Even though they had a modern architecture with advanced prefetching and linear memory access, using the prefetch command on a separate thread on the same core increased the performance by 30% over the working thread doing the prefetching. The biggest issue with this approach is making sure the second thread doesn't get too far ahead of the working thread as too much prefteching will start to cause cache evictions and make it slower.

      Corner case, I know, but still interesting.

      I would assume something simple like this would do the above. quick napkin code, mind any errors

      WorkerThread:
      volatile int i;
      for(i = 0; i y) max = y;

      for(int i = x; i max; i++)
      prefetch(array[i]);

      sleep(1);
      }

      no locking involved, and the preftech thread never goes more than 1024 elements ahead.

      This is kind of a bad example in that there is little computational work being done and 'i' would have to be volatile because if the value of i was stored in only a register during the loop, the other thread would never see the increments. But if gives the idea, being only pseudo-code.

    4. Re:Depends on the Problem... by Bengie · · Score: 1

      uhhggs.. ignore the code.. something happened and some of the code that previewed didn't show up.

  9. Obvious department of obviousness by johnwbyrd · · Score: 0

    Jebus in a sidecar, Slashdot; do you think we've all never heard of parallel programming before? OpenMP has been around for ten years now. Yeah, you can't write a for loop in C and expect magically parallelization. Seriously, is there ANYONE here who doesn't know this already?

    If you're trying to dumb down your article content to the point where it annoys any vaguely experienced programmers... well, you're succeeding.

    1. Re:Obvious department of obviousness by David+Greene · · Score: 1

      Yeah, you can't write a for loop in C and expect magically parallelization.

      Sure you can, if you're not using pointers or are using restrict-qualified pointers and you have a good enough compiler.

      --

  10. What? by YodasEvilTwin · · Score: 1

    This hasn't been news for 30+ years. Parallel programming is hard for a multitude of well-known technical reasons, this is nothing new. Another major reason is that the human brain is no good at parallel programming; again, nothing new. I say let's either try to solve the problems or move on.

  11. A broader problem that also lacks formalization by amn108 · · Score: 1

    In my opinion, many problems with software development, are just as applicable in other domains of our life, and parallel programming is definitely one of them. We equally well have problems managing large teams of people working in parallel. These are problems of logistics, management and also (and no I am not joking) - cooking. And we're as bad handling these as we now handle software development. It may be right, however, to start solving this with the computers - no need to throw away rotten food and/or burned-out employees under delusional management.

    What could help is a trusted set of formal theories. That would be a start. Whether it will be a mathematician, a computer programmer, or a kitchen chef that writes this set, is not important. Stripped bare of the bathwater - abstracted and proven - the baby will be what we need.

    1. Re:A broader problem that also lacks formalization by betterunixthanunix · · Score: 1

      There has been quite a bit of work on formalizing parallel computing. NP problems are exactly that: problems that can be solved efficiently on a computer that can explore an unbounded number of solution paths in parallel. There is also the NC hierarchy, which can be thought of as problems that can be solved efficiently on a sequential computer and "much more efficiently" on a parallel computer (that is, polynomial time on a sequential computer, and polylogarithmic time on a parallel computer with a polynomial bound on the number of processors).

      --
      Palm trees and 8
    2. Re:A broader problem that also lacks formalization by Anonymous Coward · · Score: 0

      I've often thought that a good approach to parallel programming would be a language that represented something like a PERT chart. You spell out the resource dependencies, the language enforces them (statically checks that your resources are not used incompatibly in two places that you have declared parallelizable), and then the compiler spits out code for n-processors. You would also need a way to spell out a pipelined operation. Bonus points for being able to hint at (or, better, infer) critical path.

      Obviously, you can do all these things with thread primitives, but somehow the syntax doesn't match up with the PERT graph in my head. Also, there is nothing forcing you to write things in a thread safe way. Really, that static checking is key for reliability.

      If I had the ability to finish any project I ever started, I would try to create such a language, but I'm too much of a slacker. Probably easier to get my head around a functional language.

    3. Re:A broader problem that also lacks formalization by amn108 · · Score: 1

      That's a monumental and exceedingly involving task - to create such a novel concept of a language, partly because few have been written, and even fewer are usable to mere mortals, who happen to be using imperative languages just fine, mind you. Articles upon articles full of either terms-within-terms or lack there-of, have been written. Some people once set upon defining this new computing upon which you seem to touch and created www.tunes.org, which has since stagnated.

      I would advise you to write down your ideas and always fragment the problems that you identify and/or abstract them to a point where you can again spot the beginning of a formal theory that you can publush. That way you will be nearing the solution you often cannot see with predictable, verifiable and not least visually available and memorable steps. Also, that way, even if you don't come within a reach of the final ultimate goal which is a working language, you will have developed a whole set of usable laws and theories that can help others in their quest for the same goal. If you know you can't reach the goal, at least be a gentleman and pave a good road for those that find your path later.

    4. Re:A broader problem that also lacks formalization by Anonymous Coward · · Score: 0

      Thanks for the encouragement. I tend to get started on lots of projects, figure out the big ideas of how they should work, and then lose interest when the details start. Not so much the "devil in the details" kinds of things, those are fun, just the plain old boring ones. I suspect a lot of people fall into that category. Anyway, I guess putting my thoughts down on paper as I go keeps it from being wasted effort.

      I'm not even sure that my concept is so novel - I haven't worked as a programmer for several years and I don't really have any idea what else is out there. I guess there are really two interrelated parts of the idea - the "scheduling" aspect of it, and the "enforcement" aspect. Maybe one or the other is somewhat resolved.

      The prevalence of heisenbugs in parallel programming does indicate that something to check for potential conflicts in your code is lacking in use. I generally detest interpreted languages because so many bugs crop up because of things like misspelled variable and function names that simple declarations would prevent. It's easy enough to find and fix these, but I always have a nagging suspicion that I missed one. Likewise, a compiler should be able to say "hey moron, you're using that same global over there in that other routine you think can be run in parallel."

      As far as the scheduling aspect, my life since being a programmer has at least partially involved managing network cable installations. I tended to make a lot of charts in the vein of PERT and GANTT charts, but of my own design. If you can keep a large group of people all simultaneously working toward the same goal, why wouldn't the same approach work for a bunch of CPU's? Each task on the PERT chart would require detailed instructions in a traditional (probably imperative) form, so I don't think it would be a big brain-bender in the way a lot of languages are. The trick is keeping a clear and concise expression of the graph structure while not getting lost in a forest of semaphores and whatnot.

      Ok...enough rambling ...

  12. "What Makes Parallel Programming Difficult?" by Anonymous Coward · · Score: 2, Insightful

    That you have to do everything all at once. How would you tell 50 kids to sort 50 toy cars? How would you tell 50 footballers to line up by height all at once? How would you have 50 editors edit the 50 pages of a screenplay all at once so that it makes sense from a continuity perspective? All these problems are very easy, but become very hard when you have to do it all at once...

    1. Re:"What Makes Parallel Programming Difficult?" by Short+Circuit · · Score: 1

      The ultimate solution to most of those appears to be, "have to do it fifty times, and assign one job to a person".

      That said, not all problems are so easily dealt with as bulk operations. Perhaps you have some real need to take one item and spit the work up among multiple people. The real driving force for all the scenarios I can think of is a desire or need for an upper bound on job latency.

      I do this kind of work as my day job. I've also got some experience in managing groups of coders (at work, not particularly via the site linked to in my sig). The problems are remarkably related. You know that job you had where there were meetings held once a day to synchronize between multiple groups? You know how you hated that those meetings waste your time? That's the kind of overhead that parallel programmers weigh and try to avoid when writing their code.

      Who knows? Perhaps "lockless" algorithms will find some particular use in real-world management problems.

    2. Re:"What Makes Parallel Programming Difficult?" by Anonymous Coward · · Score: 0

      >Perhaps "lockless" algorithms will find some particular use in real-world management problems.

      No, the secretary is the lock, the thing that makes sure that the hotel you're fucking your mistress in is not the same one you're spending your wedding anniversary with your wife at. There will never be a "lockless" time when you can fuck all three (mistress, wife, secretary) at once.

    3. Re:"What Makes Parallel Programming Difficult?" by Nadaka · · Score: 1

      You can if they swing that way.

    4. Re:"What Makes Parallel Programming Difficult?" by jbolden · · Score: 1

      How would you tell 50 footballers to line up by height all at once?

      Easy break into n subgroups have them line up by height and then perform pairwise merges.

    5. Re:"What Makes Parallel Programming Difficult?" by Anonymous Coward · · Score: 0

      Good luck explaining that to footballers! Reiterating my point of what makes parallel programming difficult...

  13. Re:PLC programmers have been doing this for years. by Anonymous Coward · · Score: 0

    Are you kidding?

    A PLC runs the same code over and over again in an endless loop. This is a language that does not even have _ANY_ sort of mutexes, semaphores, or threads. Hell, most PLCs don't even have local variables, or even function arguments!

    Many PLCs let you configure interrupt handlers, but there is no way to synchronize them with the main loop.

  14. Lame by oldhack · · Score: 3, Interesting

    I bet there are hundreds of simple tutorials on concurrent processing on the web, and these two pieces bring nothing new/interesting.

    --
    Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    1. Re:Lame by Anonymous Coward · · Score: 0

      Even more, I didn't see a single mention of FPGA programming, a naturally concurrent environment. Perhaps they should look a little closer to the metal to find tried solutions to bring back to high level.

  15. a fix by gargeug · · Score: 1

    Well, we just learned about this in a graduate comp arch course and yeah, it can get hairy. Especially if using a processor consistency model as opposed to the sequential consistency model. The easiest fix is to throw up some fence instructions around the interdependent code sections to force sequential consistency, and then lay some flags as a means of time signaling between the processors. This eliminates the randomness that the author discussed in one of the examples.

  16. Re:PLC programmers have been doing this for years. by _0xd0ad · · Score: 0

    Mutexes are just bits that are turned on or off. Semaphores are integers. Local variables are memory locations that aren't used by the rest of the program (or the rest of the program may use them freely and their value need not be retained from scan to scan). Function arguments are just memory locations that are written by the rest of the program before calling the subroutine, which reads them and uses their values.

    The only of those that I actually haven't used in a PLC program are semaphores.

  17. Chapel by Warlord88 · · Score: 2

    I think the language Chapel being developed by Cray is taking concrete steps to make Parallel Programming easier. It is very similar in syntax to C++ and even supports some higher level constructs.

    This looks like an advertisement for Chapel but I have no relation to Cray. Having taken a graduate parallel programming course, I cannot agree more with the statement that "Parallel Programming is difficult". I struggled a lot with pthreads and MPI before doing the final assignment in Chapel which was a pleasant surprise. The difference between serial and parallel code was FIVE characters only - and it gave a near linear speedup on three different machines.

  18. Re:PLC programmers have been doing this for years. by _0xd0ad · · Score: 0

    No wait - I take that back; I've used semaphores too.

  19. Parallelisation of serial code is the problem by GreatDrok · · Score: 1

    20 years ago when I was working with transputers we use Occam. It was a very pure parallel programming language and it wasn't too difficult. However, writing parallel code meant starting from scratch (getting rid of the dusty decks of old algorithms as my professor described it). However, this never really happened and we've ended up with primitive parallelisation nailed on to sequential code. The are many parallel architectures, SIMD, MIMD, distributed memory, shared memory and combinations of them all and none of the languages currently available really suit. You have basic multithreading and MPI bolted on to old sequential languages and developers trying to take algorithms which absolutely must proceed in order and trying to find sections they can perform in parallel. This isn't going to get you good performance or scalability. In Occam, we wrote procedures which were independent of each other and communicated over channels and all would be running at once. You wired them together and poured you data in at one end and results came out at the other. Due to the very fine grained parallelism it was extremely scalable - I hade code running on 80+ transputers at over 90% efficiency in 1990 and it would run happily on many more if I had them, and yet it also ran perfectly on one since serialisation of parallel programs is easy whereas parallelising serial code is very hard and inefficient. I find it sad that the current state of parallel programming is still so far behind where we were 20 years ago due to all this legacy stuff. Even when people start out with new algorithms, they still start with serial processes and then try and parallelise rather than considering the parallelism from the outset and designing the algorithm so it doesn't require everything in memory and doesn't produce different results when number of CPUs changes.

    --
    "I have the attention span of a strobe lit goldfish, please get to the point quickly!"
    1. Re:Parallelisation of serial code is the problem by jd · · Score: 1

      Occam still exists. KROC supports the latest version of Occam (Occam-Pi), which supports mobile processes, generics and other concepts acquired from the developments in serial programming and computer clusters.

      I consider it to be one of the finest languages out there for learning parallel programming and consider that most of the modern failures in the field are a result of people not knowing it and therefore not knowing the fundamentals. You can't run if you can't walk.

      The transputer was the ultimate in computer cluster technology - a serial line is all it took. No switches, no routers, no expensive hardware, just raw mesh topology. It was a very sad day when it was sold to ST Electronics. It still exists, but is used only in multimedia appliances these days. A bit of a comedown for a chip that threatened to crush Cray.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Parallelisation of serial code is the problem by lucian1900 · · Score: 1

      Erlang has most of the observable properties of Occam, so it's not all lost.

      The problem is that people still insist on using mutable shared state in concurrent programs. If only they stopped doing silly things like that...

    3. Re:Parallelisation of serial code is the problem by gnalre · · Score: 1

      Erlang has most of the observable properties of Occam, so it's not all lost.

      The problem is that people still insist on using mutable shared state in concurrent programs. If only they stopped doing silly things like that...

      Erlang has most of the observable properties of Occam, so it's not all lost.

      The problem is that people still insist on using mutable shared state in concurrent programs. If only they stopped doing silly things like that...

      Right on, brother

      --
      Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
  20. Re:multiple processors by TaoPhoenix · · Score: 1

    I'll take a stab and reply to you that "8 processors was enough for anyone", in the sense that multiplexing 8 programs is just insane. Better to just run 8 prorams each on their own core, and use some progs that can use 4 cores at a time. That leaves 4 free.

    (Overly simpistic) I agree, but 1028 cores is not the answer. We need the next generation in raw core power to move computing forward. 8 killer cores will beat 1024 mediocre cores.

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  21. Re:30 years by TaoPhoenix · · Score: 1

    "The news of the problems of parallelization is still news after 30 years"
    - paraphrase of Christian saying

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  22. C.A.R. Hoare by Anonymous Coward · · Score: 0

    Isn't this what Hoare formally described in "Communicating Sequential Processes"?

  23. As a parallel computing programmer by Anonymous Coward · · Score: 0

    ...I have found that the strategy chosen for coordinating activity among and communicating between running bits of software on a parallel system usually causes most of the problems. Some blessed lead guy or (god forbid) manager will declare, "We are going to use ____" [threads | json over web services | CORBA | EJB | enterprise service bus | ...] and the project is stuck with that paradigm come hell or high water. Believe it or not, all parallel computing problems are not created equal. And it is often not until an engineering team has spent time puzzling out the data and process interactions before the best way to set up the solution becomes clear. Parallel systems magnify the liabilities of poor and/or rigid technology choices.

    Also, the herd mentality of layering stacks of frameworks atop other stacks of virtual machines is a recipe for disappointment. For truly performance-critical applications you want to physicalize, not virtualize. You want to stop treating a cluster of boxes as a general-purpose computing resources and start treating it as a custom-crafted parallel appliance tailored for your problem domain. You want to get your knuckles dirty with hardware characteristics and then tailor your logic and data structures such that they can be pinned to NUMA nodes (for example) and utilize the caches and locally-attached RAM for the highest benefit.

    Less dependence upon stacks of frameworks, more understanding your problem and tailoring and hardware and software solution to fit it more efficiently.

  24. Efficiency is hard by tlhIngan · · Score: 4, Informative

    Using locks and the like make it very easy to do multithreaded and parallel programs.

    The big problem comes when you need multiple locks because you find your program is waiting more on locks than anything else which is gumming up the whole works, and that can easily lead to deadlocks and other fun stuff.

    Another way is to consider lockless algorithms, which don't have such blocking mechanisms. However, then you get into issues where atomicity isn't quite so atomic thanks to memory queues and re-ordering done in the modern CPU, and thus have to start adding memory barriers before doing your atomic exchanges.

    Raymond Chen (of Microsoft) did a nice write up of the lockfree ways to do things and what Windows provides to accomplish them.

    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/05/10149783.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150261.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150262.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/07/10150728.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151159.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151258.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/12/10152296.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/13/10152929.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/14/10153633.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/15/10154245.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/19/10155452.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/20/10156014.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/21/10156539.aspx
    http://blogs.msdn.com/b/oldnewthing/archive/2011/04/22/10156894.aspx

    1. Re:Efficiency is hard by lucian1900 · · Score: 1

      The problem is using mutable shared state. If you change one of those properties (mutable private state and immutable shared state), writing correct and efficient programs suddenly becomes much easier.

    2. Re:Efficiency is hard by Prune · · Score: 2

      Worse, typical synchronization primitives such as in pthreads and Windows are optimized not for speed but for error handling etc. It is easy to beat their speed with custom implementations, sometimes with dramatic speed increases (see for example numerous articles at http://locklessinc.com/articles/ [locklessinc.com] and I've implemented some of the mutexes and ticket locks and my Windows/Linux software has become faster as a result). In addition, using lock-free algorithms whenever possible can provide a further performance enhancement (various algorithms and optimizations at www.1024cores.net), but along with implementing those goes an assumption of understanding of memory consistency semantics (acquire, release, consume, weak ordering, strong consistency, etc.) in order to minimize implied or explicit memory barriers, as well as thread local storage tricks in order to minimize data dependencies, the latter becoming all the more important as the number of cores increases and memory necessarily becomes less uniform in access times. Add to this the need to use cache-aware and/or cache-oblivious algorithms, and it is very clear that optimization in a modern hardware environment is anything but simple, and moreover cannot be anything but simple!

      --
      "Politicians and diapers must be changed often, and for the same reason."
    3. Re:Efficiency is hard by blair1q · · Score: 1

      is anything but simple, and moreover cannot be anything but simple!

      parse error

    4. Re:Efficiency is hard by ShakaUVM · · Score: 1

      >>Using locks and the like make it very easy to do multithreaded and parallel programs.

      Eh, the Dining Philosophers would like to ask you out to eat. Locks can introduce timing issues which can result in a program locking up at random.

      The real difficulty of parallel programming comes from two things (speaking as someone who has a Master's in the subject):
      1) The development environment isn't as well supported as single-threaded coding.
      2) It requires a different mindset to write code solidly. Remember how it blew your mind the first time you learned how to write recursive functions? Parallel programming is like that.

      There's a lot of clever hacks to exploit parallelism in code, but in general, learning how to decompose a program into parallel bits isn't really the hard part.

    5. Re:Efficiency is hard by Tacvek · · Score: 1

      It also does not help that threads are very easy to mess up. Look at all the traditional programs that used multiple threads for nonparallel concurrency, and how much trouble the developers have with deadlocks, or forgetting to use locks on a shared variable access, or calling code intended only to be used by the other thread, etc.

      So even if you have the ideal parallel version of the algorithm all planned out, actually implementing it correctly can still be problematic.

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    6. Re:Efficiency is hard by Prune · · Score: 1

      Talk about convicted on a technicality *rolleyes*

      --
      "Politicians and diapers must be changed often, and for the same reason."
  25. Now what? How to actually monitor and debug? by aharth · · Score: 1

    Parallel programming is hard. Bummer. I'd rather be interested in an article that talks about *monitoring and debugging* parallel programs (I currently struggle to monitor parallel algorithms implemented in Java). Anybody?

  26. The answer is... by drb226 · · Score: 1

    that most of today's popular programming languages do not accommodate higher-level forms of expression required for easy parallelism. Declarative languages have a slight edge at being able to express where sequential dependencies are.

  27. Also... by drb226 · · Score: 1

    keep dreaming http://www.flowlang.net/

  28. Functional programming by Logic+and+Reason · · Score: 1

    One more reason why functional programming matters. Many programs become trivial to parallelize when you avoid mutation and side-effects outside of limited, carefully-controlled contexts.

    It's truly a joy when you can parallelize your code by changing a single character (from "map" to "pmap"). There's usually a little more to it than that, but you almost never have to deal with explicit locks or synchronization.

    1. Re:Functional programming by Anonymous Coward · · Score: 0

      Agree completely, functional programming is a key enabling paradigm for the parallel world. I think thats why so many .NET developers are stoked with the PLINQ extensions... it just makes life so easy.

    2. Re:Functional programming by jbolden · · Score: 1

      Agreed. Learning to isolate side effects is one of the best things a programmer will get from FP.

    3. Re:Functional programming by Aater+Suleman · · Score: 1

      I agree with that but they should only be used after you know whats underneath the hood. If we "teach" parallel programming that way then the programmers will never understand the underlying challenges. It will become all magic to them like JAVA is. See my post on this very topic: http://www.futurechips.org/thoughts-for-researchers/csee-professors-graduates-understand-computers.html

    4. Re:Functional programming by jbolden · · Score: 1

      JAVA doesn't produce great programmers IMHO. But Intel is going to need people with lower level skills and C, assembly, machine, forth is just too much to ask for a general curriculum. Assume they are going to have to pick that up on the job, IMHO.

    5. Re:Functional programming by emt377 · · Score: 1

      One more reason why functional programming matters. Many programs become trivial to parallelize when you avoid mutation and side-effects outside of limited, carefully-controlled contexts.

      More specifically, the problem of parallel programming is the problem of structuring state for concurrent access. Everything else (the mechanics of locks etc) is trivial and mainly a typing exercise.

      The reason this is difficult is that it's an optimization; normally when we optimize code we write the naive version first (or near-naive), then optimize it. We might change a hash table to a vector, memcpy() to a few lines of inline assembler for a particular target, hand code CRC/checksumming/RC4 in assembler, use a freelist instead of malloc for some particular data structure, etc. At some point the gains no longer outweigh the effort and we're done. With MT code however, you can't do this, because the very fact that it's going to run concurrently is going to affect the most high-level design. It needs to be made concurrent from the outset; there is no going back later and making just some critical path MT-hot. Since concurrency requires changes at the highest level, existing software is essentially going to need to be completely taken apart and put back together again. More or less a rewrite. And my experience is this frequently fails (programmers who can do this to large, complex bodies of software don't grow on trees). This is why it's hard - it's an optimization that needs to be done before there's anything to optimize. It affects pretty much every aspect of factoring. Not more complex necessarily, sometimes it's just different from what one might do at first glance.

      In fact, writing MT-friendly software is a lot like writing portable software. It's really tough to do a good job on existing code (try porting a Windows program to OS X for instance, even though 90% of the code base might have nothing to do with Windows per se, it just uses Windows APIs to get work done). But factor out the platform-specific parts, like the presentation from the outset with an intent to one day port it, and it's perfectly doable. In a first iteration it might only have a UI for a single platform (a "sparse implementation") much like the first round of MT-hot code might run with a concurrency of 1 (because parts are stubbed or just skeletons, or implement the trivial case of a concurrency of one).

  29. Side-effects by Anonymous Coward · · Score: 0

    Side-effects are the pantomime villain of parallel programming.

    This has been known since the late fifties.

    The fact nobody has given a shit until recently is not surprising, but I don't have much sympathy either.

  30. Poor choice of language by jbolden · · Score: 1

    What makes parallel programming hard is poor languages. Languages that allow state changes and don't keep them isolated. Isolate changes of state, all changes of state and be careful about what kinds of combinators to use. Google map-reduce works whenever

    a) You can organize your data into an array
    b) You can do computations on array element in isolation
    c) You can do computations on the entire array via. associate operations pairwise

    And most programs do meet those properties but they slip in all sorts of subtle state changes so out of order execution isn't possible. The key to parallelism is better language selection.

    1. Re:Poor choice of language by Aater+Suleman · · Score: 1

      Thanks for reading my article. Awesome point! actually I really wanted to point out in my article that Google Map-Reduce is just what you need for the histogram kernels (not sure if you that post of mine: http://www.futurechips.org/tips-for-power-coders/writing-optimizing-parallel-programs-complete.html). There are many problems that fit, but I would say that there many that don't. Travelling sales man problem, scheduling problems, and branch-and-bound in general don't fall in that category. Databases is another one.

    2. Re:Poor choice of language by russotto · · Score: 1

      There's a reason Mapreduce is said to operate on "embarrassingly parallel" problems. There are a lot of them. But there are also a lot of problems which are not embarrassingly parallel; for instance, they have nonlocal data dependencies.

    3. Re:Poor choice of language by jbolden · · Score: 1

      Nonlocal data dependencies are not a problem. Map reduce is fantastic for distributing to clients. Each client performs a mini map reduce and then the results are collected.

    4. Re:Poor choice of language by jbolden · · Score: 1

      No I hadn't read your futurechips article. Though I agree with what you wrote. But frankly the parallelism is obvious its only by using C you are making things complex:

      a) construct a function that takes an array of char and returns a count hash
      b) construct a function that takes two count hashes and adds them to produce a count hash
      c) construct a function that splits an array of char evenly into n pieces

      array input -> (c) -> each piece goes to (a) -> (b) invoked to combine

      Why make it any more complex than that?

    5. Re:Poor choice of language by Paradigma11 · · Score: 1

      here is a nice lecture on this topic: http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Dr-Ralf-Lmmel-Going-Bananas

      Just don't nest folds because you are to lazy to make higher order functions of them. The most efficient obfuscation mechanism i have ever seen ;)

    6. Re:Poor choice of language by jbolden · · Score: 1
  31. Parallel programming *NOT* widely needed by ljw1004 · · Score: 3, Interesting

    The dirty secret of parallel programming is that it's *NOT* so widely needed. I think a lot of academics got funding to study automatic parallelization or other parallel techniques, and they latch on to multicore as a justification for it, but it's not.

    There is only one GOOD reasons to use multithreading -- because your work is compute-bound. This typically happens on large-data applications like audio/video processing (for which you just call out to libraries that someone else has written), or else on your own large-data problems that have embarrassingly trivial parallelization: e.g.

    var results = from c in Customers.AsParallel() where c.OrderStatus="unfilfilled" select new {Name=c.Name, Cost=c.Cost};

    Here, using ParallelLINQ, it's as simple as just sticking in "AsParallel()". The commonest sort of large-data problems don't have any complicated scheduling.

    There are also BAD reasons why people have used multithreading, particularly to deal with long-latency operations like network requests. But this is a BAD reason, and you shouldn't use multithreading for it. There are better alternatives, as shown by the Async feature in F#/VB/C# which I worked on, which was also copied into Javascript with Google's traceur compiler). e.g.

    Task task1 = (new WebClient()).DownloadStringTaskAsync("http://a.com");
    Task task2 = (new WebClient()).DownloadStringTaskASync("http://b.com");
    Task winner = await Task.WhenAny(task1,task2);
    string result = await winner;

    Here it kicks off two tasks in parallel. But they are cooperatively multitasked on the same main thread at the "await" points. Therefore there is *NO* issue about race conditions; *NO* need to use semaphores/mutexes/condition-variables. The potential for unwanted interleaving is dramatically reduced.

    So in the end, who are the people who still need to develop multithreaded algorithms? There are very few. I think they're just the people who write high-performance multithreaded libraries.

    1. Re:Parallel programming *NOT* widely needed by blair1q · · Score: 1

      The dirty secret of parallel programming is that it's *NOT* so widely needed.

      That's kind of begging the question, there.

      Those who need it know they need it. Those who think it's neato-keen and want to play with it try to come up with ways to use that that are maybe not obvious, and for which it is maybe not even necessary. I've known about this at least since the first Thinking Machines came out and our school got one of the bigger ones, solved the problem they used in the grant proposal that paid for it in about a week, then realized they had a multi-million-dollar computer with nothing to do. Our best misuses of it involved animating the activity lights (TM's had some of the best blinkenlights ever put on a real-world box) and abusing the built-in inter-core routing algorithms to solve our problems, not so much pegging the performance meter on the actual mass of parallel cores.

      I'm sure this still happens. I'm also sure that any computer you can buy at Best Buy this year will have 2 or 4 cores, and 99% of the people using it will have 0% of a clue how to leverage that, though the OS now has a better idea, and is running a metric assload of independent processes that need to spread around, so such a machine isn't a total waste.

    2. Re:Parallel programming *NOT* widely needed by Aater+Suleman · · Score: 1

      I kind of agree and disagree with you guys at the same time. Parallel programming is becoming inevitable if you want your code to run faster because the "free lunch" of Intel making it faster --by burning insane power-- is power. Code that is at acceptable performance will stay single threaded, but code needing performance will have to multi-threaded. Adobe multi-thredaded there code, even image processing stuff like ImageMagick did it. Graphics has been doing it for a while. Its just that motivation is very clear, and I feel that competition will take programmers to this unfriendly land even if they dont like it.

    3. Re:Parallel programming *NOT* widely needed by Xyrus · · Score: 2

      The dirty secret of parallel programming is that it's *NOT* so widely needed. I think a lot of academics got funding to study automatic parallelization or other parallel techniques, and they latch on to multicore as a justification for it, but it's not.

      It is widely needed, but perhaps not by you.

      There is only one GOOD reasons to use multithreading -- because your work is compute-bound.

      Ok, now you're confusing the issue. Are you talking about parallel programming or multi-threaded programming? Parallel programming is larger in scope the simple multi-threaded programming.

      So in the end, who are the people who still need to develop multithreaded algorithms? There are very few. I think they're just the people who write high-performance multithreaded libraries.

      No, there are quite a few and many of them make quite a lot of money to do so. Do you think the programmers at ILM, 3DS, Pixar, and NASA are just sitting around doing nothing? Does your MT algorithm library also know how to optimize for the GPU as well? Are you sure that a one size fits all approach will work for every application out there?

      To be clear, parallel programming also encompasses programming platforms like super-computers, which I assure you does not have a simple MT algorithm library to magically optimize applications. A whole new level of complexity is added with the introduction of GPUs. It gets even more complicated if you deal with non-symmetric distributions or dynamic regridding.

      The world of parallel programming is quite large with a lot of big companies and brainpower invested to maximize efficiency and utilization. I think you're greatly underestimating how important and vast the field is.

      --
      ~X~
    4. Re:Parallel programming *NOT* widely needed by dbIII · · Score: 1
      even image processing stuff like ImageMagick did it. Graphics has been doing it for a while

      Graphics is some low hanging fruit that can get a fair bit of benefit from a lot of threads. A lot of operations with graphics involve doing the same thing to a large amount of data that can easily be carved into individual chunks and dealt with in parallel. It's the same with video where the same transformation or filter gets applied independantly to thousands of frames and all you care about is getting it done quickly and put in the correct order at the end.

    5. Re:Parallel programming *NOT* widely needed by Aater+Suleman · · Score: 1

      I believe there are programs that need performance and are not graphics and movies. Databases is one. Excel functions like goal seek is another. How about web browsers? There is work going on to parallelize HTML rendering. On a side note, I do want to point out that graphics seems so regular because it was made this way by design, which is one way of finding parallelism. Gfx was constrained in a way that it became parallel, e.g., it was mandated that triangles was the only building block and only certain types of blending modes will allowed. Not surprisingly, the trend is now changing. Ten years ago, a GPU was a fixed function hardware but today a major chunk of it is programmable and new standards like DX11 are pushing that even further. It is clear that if you want better graphics, the code becomes irregular, which isn't as easy to parallelize but still needs performance. Animating movies fall in that category of graphics where parallelism isn't that easy to find ...

    6. Re:Parallel programming *NOT* widely needed by Aater+Suleman · · Score: 1

      I completely agree with Xyrus. Parallel programming is a big think and I feel that every programmer is gonna get exposed to it eventually.

    7. Re:Parallel programming *NOT* widely needed by ihavnoid · · Score: 1

      There is only one GOOD reasons to use multithreading -- because your work is compute-bound.

      ...and therefore, most of the people in the world won't need anything beyond an Intel Atom because their tasks aren't compute-bound.

      Seriously, I don't know what kind of code you write for a living, but the code I write is almost always has some portion of compute-bound submodules, even if what I do has nothing to do with video codecs or 3d or whatever field that there are convenient libraries.

    8. Re:Parallel programming *NOT* widely needed by Anonymous Coward · · Score: 0

      I think you are drastically underestimating the need for computational resources.

      Current programs have grown up in a world where the currently available processing power on the desktop is pathetic - approximately 0.0001 HBs (where 1HB = human brain ~= 10^14 operations/second approx). And many of them are very old, so of course they run within one CPU. For example Microsoft Word is decades old.

      Here are some very common tasks that need parallelism:

      1. Search. Have a look at google's white papers on the topic - you will see they use parallelism extensively. This is one of the most widely used applications.

      2. Graphics. Just about every system has a GPU that processes graphics commands in a massively parallel environment. This is tricky to program. Have you written a graphics driver lately?

      3. System builds - make -k massively speeds up compile/build. Same for regression testing.

      4. Operating systems. One CPU for MSFT, one for the virus scanner one for all the other preloaded crud and one for me. We have been doing this for decades and usually get this right now, except when we don't.

      The claim is made that the vast majority of problems are easily parallelized. Some are, some aren't. Obviously the easy things will be done first (process level, or where the data partitioning is simple). Where possible the parallelism is done at the middleware level eg the database of application server so that the application programmer can still have a largely simple serial model - programming is hard enough without adding parallel programming.

      But there are lots of things where parallelism is hard and would deliver big benefits if achieved. I am currently working on some financial analysis software that is very compute bound where it would be great to be able to run multiple simulations on the same data at one time, but it is hard to get the synchronization right. OK I can do it, but it takes time and effort. Timing problems are hard to debug.

      Physical simulations, where you have to carefully synchronize the boundaries between the data partitions. AI, where given the processing power deficit between CPUs and humans you need all the CPU power you can get - and that means parallelism and feeding results up and down the semantic hierarchies. Voice/speech recognition, real time control tasks involving sophisticated understanding of visual and other inputs etc etc.

      Note also this problem is going to get worse. The number of CPUs is growing in a Moore's law-like fashion. We will soon be in a situation where a single threaded program will only have access to 1/1000 of the system's capacity.

    9. Re:Parallel programming *NOT* widely needed by Anonymous Coward · · Score: 0

      What are you talking about? Animating movies is infinitely parallel. Stupidly parallel. Embarassing. Look at pixar or dreamworks render farms. As a 'dumb-parallel' solution, you can do each frame independently. Per-pixel-lighting is independent as the pixar algorithm shows.

    10. Re:Parallel programming *NOT* widely needed by Anonymous Coward · · Score: 0

      the Async feature in F#/VB/C# which I worked on, which was also copied into Javascript with Google's traceur compiler

      Sweet! Creating explicit continuations in node.js is a tedious chore. Node.js on traceur will be goodness. Thanks for the design work!

    11. Re:Parallel programming *NOT* widely needed by StripedCow · · Score: 1

      Cooperative multitasking is like going back to Windows 3.1. That operating system totally relied on the trick you mentioned, and as we all know, it sucked.

      Why did it suck? Obviously, for many reasons but most of all because once such a "fake thread" of execution starts taking a long time, it blocks all other threads.

      Why do you think a modern webserver uses many different threads?

      Sigh... I guess you need to gain some more experience in the parallel programming world.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    12. Re:Parallel programming *NOT* widely needed by Anonymous Coward · · Score: 0

      You're confusing a lot of things here dude. According to your logic we'd all by fine running old Celeron or whatnots. Most modern CPUs are multi-threaded and, believe me, it's quite pathetic to see an app being dog slow on an Octocore Mac because the app's designer had the same mindset as yours.

      Parallel programming is important because we're reaching physical limits while building chips and that the trend is to add more and more cores to CPUs.

      People stuck in the past like you are creating programs only using, today, 1/8th of today's octocore CPUs. Tomorrow, people like you will be creating program only using 1/64th or 1/256th of the CPU's capability.

      By then methink your little shortsighted app will be CPU bound because you'll only be using 1/256th of the CPU.

       

    13. Re:Parallel programming *NOT* widely needed by Anonymous Coward · · Score: 0

      I think they're just the people who write high-performance multithreaded libraries.

      And this is still a very large number since the number of programs and our dependence on programs is just too much.

    14. Re:Parallel programming *NOT* widely needed by maraist · · Score: 1

      The problem is that people tend to focus on single-threaded designs for 3rd party libraries.. Then when those libraries get linked to larger libraries (or main apps) which are MT, then the whole world comes crashing down. Now you have to treat every function call of the ST-library as a critical region.

      Thus while YOU may not care about MT, you should strive to make all your code reentrant at the very least. To whatever degree this allows contention-free memory access (e.g. ZERO global variables). This future proofs your code.

      Use modern inversion-of-control programming models - don't own your dependencies, but instead require factories for the objects you actually need. You read-only value-objects where-possible, factory-IOC singletons where possible, stack-based allocation, minimal co-mingled data-structures. If you have the ability to leverage Garbage collection, do so for co-mingled structures (or maybe explicit reference counting).

      This may actually slow your library down and grow it's size slightly.. But you'll have less coupled / more modular code, which can't be a bad thing. And you'll be more likely to be MT-safe out of the box.

      Then if tomorrow you find you can parallelize some aspect (possibly by using message-passing worker threads), or that some 3rd party can do so TO your work, you'll get performance for free.

      --
      -Michael
  32. Seriously? by Anonymous Coward · · Score: 0

    What kind of imbecile posted this? Did he/she think that optimising should be simpler then just getting them to run? Besides, if you are unfamiliar with parallel code debugging, perhaps you are not the best person to judge what is enlightening and what is not. Try selling used cars instead.

  33. Honestly... by emanem · · Score: 1

    ...very simple article...was expecting something more technical!
    Anyway, cheers for the effort to write it!

    Ps. Looks like he discovered an open secret :-P

    1. Re:Honestly... by Aater+Suleman · · Score: 1

      ...very simple article...was expecting something more technical! Anyway, cheers for the effort to write it! Ps. Looks like he discovered an open secret :-P

      In my defense, I wasn't targeting parallel programming experts:-) The theme of my post was to familiarize people who don't know parallel programming with the challenges.

    2. Re:Honestly... by emanem · · Score: 1

      Fair play mate!

      I guess you won't be explaining the cache line size issue and memory alignment too, right? :-)
      As I said, thumbs up at least for the effort!

      Cheers!

      Ps. Questions for Java experts, how does the JVM (the best implementation btw) deal with cache line size and memory alignment issues? Is there any optimization in this area?

    3. Re:Honestly... by Aater+Suleman · · Score: 1

      I touched on the cache line alignment and memory a bit. Can/will talk about that if its a topic of interest. JVM's in my experience cache align almost all data structures. I have profiled some JAVA code using PIN. Perhaps that information can help you ...

  34. The problem: too much data sharing by Animats · · Score: 1

    The basic problem with parallel programming is that, in most widely used languages, all data is by default shared by all threads. C, C++, and Python all work that way. The usual bug is race conditions.

    There have been many languages for parallel programming which don't have default sharing, but they've never taken over outside some narrow niches. Partly because most of them weren't that useful outside their niche.

    The other classic problem is that in most shared-data languages with locks, the language doesn't know what the lock is protecting. So you can still code race conditions by accident.

    1. Re:The problem: too much data sharing by Aater+Suleman · · Score: 1

      The basic problem with parallel programming is that, in most widely used languages, all data is by default shared by all threads. C, C++, and Python all work that way. The usual bug is race conditions.

      There have been many languages for parallel programming which don't have default sharing, but they've never taken over outside some narrow niches. Partly because most of them weren't that useful outside their niche.

      The other classic problem is that in most shared-data languages with locks, the language doesn't know what the lock is protecting. So you can still code race conditions by accident.

      Hey, my concern is that there is lots of code where I do want to assume sharing (since its easier to think that way for some problems). MPI is kind of like what you are saying and programming in MPI is not much fine either.

    2. Re:The problem: too much data sharing by lucian1900 · · Score: 1

      The problem is the combination of mutable and shared state. If state is either immutable shared or mutable private, everything is much easier.

      Things get even more interesting with persistent immutable data structures, like Clojure's.

  35. Re:PLC programmers have been doing this for years. by superkurty · · Score: 1

    This is probably why many programmers experience such difficulty programming (efficiently) PLCs.... Lots of very large, very complicated programs are just "the same code running endlessly". The devil is in the inputs.

  36. Use a practical tool by mangu · · Score: 1

    Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming.

    Okay, then the only problem is getting something useful out of Erlang.

    Back in 1985 the Japanese government announced a "fifth generation" computing project, with software to be developed in Prolog. So I went and learned Prolog, an intriguing and amusing language. Only problem is, it was totally useless for any actual application, as the Japanese found out.

    Sorry, but in order to believe any of the promises of one of those non-vonNeumann languages, I have to see a practical working application first.

    1. Re:Use a practical tool by deapbluesea · · Score: 1

      von Neuman is an architecture, the word you are looking for is Imperative languages. Those "other" languages could be functional or logical. If you really want to get down to brass tacks, it comes down to whether you want stateful programming (imperative requires state), or stateless programming (functional and logical programming).

      --
      Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
  37. Scala has added parallel collections and has Akka by MCRocker · · Score: 1

    The Scala community has tried to move the problem into a more practical realm by adding things like parallel collections, DSL's to abstract out the problem for specific applications and the Akka Project for simpler concurrency.

    Most of the parallel programming discussion I've seen is very complicated and not likely to appeal to those who have to do practical day-to-day business projects. By pushing the abstractions up a level, I think the Scala folks have made parallel programming more accessible for the average developer.

    Ref: Parallel Collections video.

    --
    Signatures are a waste of bandwi (buffering...)
  38. It's not. by Anonymous Coward · · Score: 0

    Seriously. Parallel programming IS NOT hard. I'm tired of people complaining about this; in 90% of cases, it's trivially easy. Like most things in programming, there are some hick ups in a small fraction of what you do, but that's just part of the game. Parallel computing isn't somehow magically more difficult than programming in general. Dear God.

    Unless of course we're talking about dealing with terribly written bolt on libraries for circa 1980s programming languages that should not be in use in the modern era. That can be hard -- but that's the fault of old technology being expected to handle new problems and a lack of investment in creating new tools in the form of system-level languages...it has little or nothing to do with the underlying challenges of parallel programming itself.

  39. Re:PLC programmers have been doing this for years. by Aater+Suleman · · Score: 1

    All hardware guys (myself included) claim that they have been doing concurrent work since long. It takes writing a parallel program to understand the challenges. The communication latency (a huge issue), non-determinism, caches, need to deal with legacy code, and the need to make it robust that makes it a much much harder problem. My goal is to make hardware guys see these challenges so I can say you hit a pet-peeve. (I am a hardware guy who has learned software over the years).

  40. GOTO by Anonymous Coward · · Score: 0

    'nuff said.

  41. Parallel programming is garbage by Billly+Gates · · Score: 1

    The problem is moving optimizations to sofware which is the most retarded thing I ever heard. Most problems are not parrallizeable (if such a word exists) and is not needed if Intel made decent processors rather than the Itanium disaster. VLIW is heavily dependable on this. Parallel Programming as Intel wants it is to have the developers make up for the shortcummings by having programmers invent tricks while they put more cache and other things on their chips. We have the corei series of processors now at least but still. Why should we learn how to do parallel programming because they lost billions in R&D and have these silly api's and a market to sell these tools.

    Advancements have been made to make that argument obsolete as putting more cache while making compilers harder to write is not making huge improvements in performance. It is still old fashioned branch predictions. So yes it is usefull in some circumstances, but in business all I care about is how fast I can get SQL out to a client/server app or to an intranet app

    I admit I have not wrote code in 5 years so I am not the best source before the professional programmers nail me here. This is just from what I see

    1. Re:Parallel programming is garbage by Aater+Suleman · · Score: 1

      Billy, I understand that parallel programming is difficult. However, it was inevitable for Intel, AMD, or IBM to do it. Why? Because as you make a single core bigger, it provides less performance for power, i.e., a 2x faster core generally burns 4x higher power. That rate is not sustainable. Proof: look at the size of the heat sink on your processors. It couldn't get any bigger. Thus, Intel had to stop making single core faster and there were two options: increase cache (which is of no use beyond a certain point) or do nothing or go multicore. They chose the last one as the other two were not possible. This makes software's life harder but this is the world we live in. My personal position is that hardware should understand software challenges and help them .. which is the point of my blog Future Chips.

    2. Re:Parallel programming is garbage by jbolden · · Score: 1

      You are absolutely right on this one. Obviously we've hit limits of CPU performance and parallel is the way to go.

      As an aside though let me point one other option which Intel was exploring in the late 80s-early 90s: break the CPU up into a series of processors each with different complementary instruction sets. Intel played around with 486/i960 combo where the 486 offered great task switching, high instruction speed, built in floating point and the i960 offered rapid vector calculation. IBM RS/6000 line was based on the same theory of design originally.

    3. Re:Parallel programming is garbage by Aater+Suleman · · Score: 1

      Agreed. In fact, that style of functional partitioning is the most power efficient approach. Btw, Intel's latest chip SandyBridge is an example of that too. There is GPU which is good at one type of work and a CPU which is good at another type of code (AMD fusion is the same way). I do feel that functional partitioning makes programming even worse though. Now not only do you need to find parallelism but also decide which task best runs where. Automating this process is long ways so until then its a tough one. You do see that happening for some obvious cases though, i.e., GPU.

    4. Re:Parallel programming is garbage by grumbel · · Score: 1

      Most problems are not parrallizeable

      That is only true in a very theoretical sense and completely wrong in practice. Almost everything that burns CPU on your computer today is easily parallelizable, your video encoder doesn't really care if the other CPU is crunching away on the next scene in the movie, most image filters work just fine when applied to only a section of an image, a game could easily do AI for each unit in parallel and your webbrowser shouldn't really care if each webpage is rendered on a separate thread either.

      The part of the code that isn't parallelizable certainly exist, but it is almost never that part of the code that is actually CPU intensive, especially not on your Average Joe's computer. The only problem with parallelization is that neither programs nor programming languages are prepared for it, so it is a lot more troublesome to add then it should be.

  42. Project Schedules is paramount by dogmasponge · · Score: 0

    I never would have time to optimize code for as long as this would take. If a tool or process works you move on. How fast the code runs is never as important as how fast the schedule moves.

  43. Re:PLC programmers have been doing this for years. by Cryacin · · Score: 1

    No, the devil's in the steering comittee.

    --
    Science advances one funeral at a time- Max Planck
  44. What makes it difficult -- in one word by Anonymous Coward · · Score: 0

    Concurrency

  45. Re:PLC programmers have been doing this for years. by deapbluesea · · Score: 1

    Mutexes and semaphores are a bit more than bits and integers. You have to be able to modify them atomically. If you don't design your PLC correctly, you end up with two paths attempting to modify the same bit at the same time. Without atomicity, you can't know which won.

    --
    Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
  46. Parallelism is very easy,provided you don't do it. by master_p · · Score: 1

    Parallelism is very easy, provided that you don't do it yourself.

    Use a pure functional programming language like Haskell that can be automatically parallelized.

    Or use a programming language that uses the Active Object Pattern (or the Actor model).

    Or do as I do: use C++ to implement message passing, then have one thread per object. Objects don't have public members, they simply communicate by exchanging messages.

    In all the above cases, the trick is to avoid state. In Haskell, state is avoided by design; in the Actor Model/Active Object pattern, state is avoided due to isolation: effectively, each thread is like a separate process.

  47. Re:Parallelism is very easy,provided you don't do by Aater+Suleman · · Score: 1

    I have a small issue with the message passing. Doesn't it make the barrier of entry higher for the average programmers? Many claim that IBM Cell was harder to code for than the XBOX because of this.

  48. Re:multiple processors by maraist · · Score: 4, Interesting

    While you're correct from a temporarily practical measure, I disagree in theory. OS theory 20 or more years ago was about one very simple concept.. Keeping all resources utilized. Instead of buying 20 cheap, slow full systems (at a meager $6k each), you can buy 1 $50k machine and time-share it. All your disk-IO will be maximized, all your CPUs will be maximized, network etc. Any given person is running slower, but you're saving money overall.

    If I have a single 8 core machine but it's attached to a netapp disk-array of 100 platters over a network, then the latency means that the round trip of a single-threaded program is almost guaranteed to leave platters idle. If, instead I split a problem up into multiple threads / processes (or use async-IO concepts), then each thread can schedule IO and immediately react to IO-completion, thereby turning around and requesting the next random disk block. While async-IO removes the advantage of multiple CPUs, it's MASSIVELY error-prone programming compared to blocking parallel threads/processes.

    A given configuration will have it's own practical maximum and over-saturation point. And for most disk/network sub-systems, 8 cores TODAY is sufficient. But with appropriate NUMA supported motherboards and cache coherence isolation, it's possible that a thousand-thread application-suite could leverage more than 8 cores efficiently. But I've regularly over-committed 8 core machine farms with 3 to 5 thousand threads and never had responsiveness issues (each thread group (client application) were predominantly IO bound). Here, higher numbers of CPUs allows fewer CPU transfers during rare periods of competing hot CPU sections. If I have 6 hot threads on 4 cores, the CPU context switches leach a measureable amount of user-time. But by going hyper-threading (e.g. doubling the number of context registers), we can reduce the overhead slightly.

    Now for HPC, where you have a single problem you're trying to solve quickly/cheaply - I'll admit it's hard to scale up. Cache contention KILLS performance - bringing critical region execution to near DRAM speeds. And unless you have MOESI, even non-contentious shared memory regions run at BUS speeds. You really need copy-on-write and message passing. Of course, not every problem is efficient with copy-on-write algorithms (i.e. sorting), so YMMV. But this, too was an advocation for over-committing.. Meaning while YOUR problem doesn't divide. You can take the hardware farm and run two separate problems on it. It'll run somewhat slower, but you get nearly double your money's worth in the hardware - lowering costs, and thus reducing the barrier to entry to TRY and solve hard problems with compute farms.
    amazon EC anyone?

    --
    -Michael
  49. Computer Languages! by TheCount22 · · Score: 1

    What makes parallel programming hard is computer languages.

    Most languages today are actually not designed for parallelism or concurrency simply because most computers for a very long time had only one core. This is why we have threading and locks everywhere. Threads have huge overhead from hundreds of kilobytes to megabytes. That may seem like nothing but ideally for parallelism and concurrency to work you need to be able to create thousands of processes at nearly no cost (hundreds of bytes each). And locks, don't even get me started with that!

    Shared mutable state is also a major problem it makes parallelism very hard, again current languages make heavy use of it (Singletons).

    Anyway, this problem has been solved ages ago just look into the Actor Model and Erlang to get started that should pretty much cover it.

    1. Re:Computer Languages! by Anonymous Coward · · Score: 0

      I hear a lot that functional programming languages solved parallel programming years ago. I ask, why haven't they become common then?

    2. Re:Computer Languages! by Savantissimo · · Score: 1

      Because they require several different kinds of completely alien thinking; knowing imperative programming is almost a disadvantage to learning functional programming. Also, most functional languages were too purist to allow actually getting much real work done - the libraries were usually very weak so every problem involved reinventing dozens of wheels. Clojure may have solved this problem - it isn't too purist and it can use all the Java libraries. It still requires a lot from the programmer, though perhaps less than the gain in power it provides.

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
  50. Re:PLC programmers have been doing this for years. by AuMatar · · Score: 1

    A bit is a mutex if your processor has an atomic test and set instruction. Which these days they all do. A semaphore is an integer protected by a mutex. He oversimplified a bit, but if you know what you're doing it is that simple.

    --
    I still have more fans than freaks. WTF is wrong with you people?
  51. Re:PLC programmers have been doing this for years. by _0xd0ad · · Score: 0

    If you don't design your PLC correctly, you end up with two paths attempting to modify the same bit at the same time.

    A PLC does in fact run the same code over and over in an endless loop. There are no threads. No other processes. No other code can modify your mutex. This line executes, and the line following this line will follow this line in execution, nothing in-between, no question about it. There is simply no way that anything else can attempt to modify the same bit at the same time.

    The only things that can modify that bit are other parts of the program, and since the program executes in order, you know exactly when that will happen.

  52. Re:PLC programmers have been doing this for years. by marcosdumay · · Score: 1

    In the end, what makes it harder is that the things you do on software are usually way more complex than the things you do on hardware. If not for that, that argument would make perfect sense as hardware also have issues with communication latency, non-determinism, and localized data (not just caches), to an even higher degree than software. And you also must make it robust.

  53. Re:Parallelism is very easy,provided you don't do by jbolden · · Score: 1

    The barrier for learning to program purely functionally is high. But its a 1x barrier vs. complexity for ever. It makes the system simple.

    I agree with him, avoid state.

  54. Linda by michael_cain · · Score: 1

    For certain types of problems, the Linda coordination primitives and shared tuple-space make parallel programming much easier. I used the original C-Linda many, many years ago, and IBM's TSpaces for Java more recently. If you're trying to do little bitty actions on lots of data with tight coordination, the overhead is pretty bad. Looking into PyLinda is on my list of things to do...

  55. child's play! by ewertz · · Score: 0

    I figured this all out when I was a kid... share nothing!

  56. Parallel Programming by Anonymous Coward · · Score: 0

    Indeed it is is extremely difficult to do parallel programming. Back in the 70's we had several groups of programmers working on programs to do so.
    Add in the operating system technicalities and the hardware issues (storage serialization must be taken into effect).

    The issue which is wicked becomes more difficult when you have multiple cpu's running as well.

    When we first fired up out multiprocessor back in the 1970's we openned a can of worms as we found the programs that "worked" before stop working. Even IBM in one of their products did not like multiprocessing doing parallel programming. IBM did supply a fix quickly but we had a heck of a time documenting it. One vendor had had a LONG time bug in their code and refused to believe that a storage location could change withing a single cpu instruction until we proved it (simply I might add). It really can get dirty if things are not working the way they should (OS wise).

  57. make? by mevets · · Score: 1

    Makefiles specify dependencies and recipes quite tidily. A simple implementation of make is provided in the "AWK book" as an example program- http://www.cs.bell-labs.com/cm/cs/awkbook/

  58. Re:PLC programmers have been doing this for years. by deapbluesea · · Score: 1

    If that's the case, there is no need for mutex or semaphore constructs because those are only needed for parallelism. Parent was stating that he's used both in PLCs. If that's the case, why? Either he has no clue, or you're wrong. I don't really care either way, just addressing the fact that you can't just implement a mutex as a flag or a semaphore as an integer as stated by parent.

    --
    Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
  59. Re:PLC programmers have been doing this for years. by deapbluesea · · Score: 1

    Funny, just went through three different manufacturer's PLC instruction sets and none of them listed test and set or any other kind of discernible atomic instruction. Do you have any references you could point me to? I'd be interested to know for sure.

    --
    Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
  60. here's a crazy (and better) idea by slashmydots · · Score: 1

    I've got a crazy question to throw out there. Which do you think is easier? Design a motherboard capable of disguisesing a 4 core chip as a 1 core chip as far as the software sees it and then splits off the workload automatically across all cores

    or

    make every programmer in the entire world working on any multithreaded application jump through a ridiculous amount of hoops and headcahes to get their software to run properly.

    Now they're both incredibly difficult but I still think it's sort of one sided. Seriously, why has nobody attempted to develop something like this yet? Making life difficult for thousands of programmers compared to making some weird CPU virtualization technique that manages to properly manage threads and not blue screen the OS due to memory sharing violations and then making a ton of money on that exact technology...well I think that sort of spells it out.

    1. Re:here's a crazy (and better) idea by Anonymous Coward · · Score: 0

      Done a long time ago. I think it was called the Pentium. It had (I think) the U and V pipeline, and some programmer effort was required for maximum results. These days, it would be called having multiple execution units, and no effort is really needed, although being able to optimize assembly never hurts..

  61. Intel by happinessme · · Score: 1

    Intel, the world's largest semiconductor chip maker, which was established in 1968, has 41 years of product innovation and market leadership in history. In 1971, Intel introduced the world's first microprocessor. Brought about by the microprocessor computer and the Internet revolution has changed the world. http://www.cheaptoryburchshop.com/ http://www.sexytoyslove.com/ http://www.edhardygo.com/

  62. Bad debuggers by Anonymous Coward · · Score: 0

    Humans don't think well in parallel. The way you normally do parallel programming in a language like verilog-FPGA is using a parallel simulator like modelsim. It visualizes operations and computations that are normally hard to visualize. As far as I know, there is no good analog to modelsim for traditionally procedural languages like C. Until there is, it is going to be hard.

    I also think that there is little effort spent trying to teach students to program in parallel. Curriculums might benefit by emphasizing hardware description languages more than they currently do.

  63. Nothing! by HBSorensen · · Score: 1

    Parallel Programming is one of the old Arts which the new "developers" don't know jack about.

    --
    Never buy Sony CDs - they will open up your computer to anyone..
  64. Informative but still quite naive by Anonymous Coward · · Score: 0

    Nothing wrong with what's described in the article, but the analysis is still quite naive. If you really want to know why parallel programming is hard, i.e., when you start from sequential programs (which all algorithms are) and want to parallelize them efficiently, you need to go and ask a compiler developer or someone who works on compiler parallelization. There are a large number of complex issues in play here that can best be summarized at a high level with: "Parallelization, for a given algorithm, is hard because dependence analysis is hard" and it has a lot to do with language design. The example in the article illustrates a rather easy-to-address difficulty: the solution to it is privatization which removes false dependences. On another note, an important point missed by the writer on why parallel programs may not provide better performance in many cases is contention for the same amount of memory bandwidth. Unless memory bandwidth catches up with compute power, parallelization will primarily only help compute bound problems.

  65. Re:PLC programmers have been doing this for years. by _0xd0ad · · Score: 0

    All operations are atomic. For instance, if you have a line of code that tests a mutex bit, and, if it's unset, it sets it and branches into a subroutine, you don't have to worry about the bit being modified by external code between when you test it and when you set it.

    However, since the real processes that are occurring outside the PLC take much, much longer than the few milliseconds that it takes to execute a single scan of the PLC program, most of the time the PLC is actually just running multiple subroutines which are all waiting for inputs to change on the PLC (maybe, a level reaches a setpoint, or a valve reaches its open or closed limit) so that they can move to the next stage of their execution. As a result you have most of the same issues that are caused by multi-threading with respects to concurrency, preventing deadlock, etc.

  66. Re:Parallelism is very easy,provided you don't do by master_p · · Score: 1

    Not really, at least in languages where messages are indistinguishable from normal function calls.

  67. Re:Parallelism is very easy,provided you don't do by Aater+Suleman · · Score: 1

    Not sure what you mean. Are you thinking RPC?