Slashdot Mirror


Multi-Threaded Programming Without the Pain

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

24 of 327 comments

  1. Huh? by dreamchaser · · Score: 4, Insightful

    I didn't know the PS3 had thousands of cores ;)

    I think what he meant was 'each tracked in a separate thread'... obviously each core is still handling many threads. I haven't watched the presentation and don't plan to until later today; I have too much to do and I'd rather read something about it. It just sounds like it provides an efficient high-level way to write a multi-threaded app. Evolutionary but not revolutionary?

    1. Re:Huh? by Gr8Apes · · Score: 4, Insightful

      Even so, this is a "bad" implementation. There's absolutely no reason for there to be one thread per chicken; that's inherently not scalable. What you really want is a pool with the optimal number of threads for the number of cores, handling work units (chickens). This will scale much higher than the one-thread-per-object model discussed in this topic.
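      A minimal sketch of the pool-per-core approach, in Python for brevity. `Chicken`, `update_chunk` and `step_all` are hypothetical names, not RapidMind code. The pool is sized to `os.cpu_count()`, and each worker gets one contiguous slice of chickens, so no two workers ever touch the same object. Note that CPython's global interpreter lock keeps pure-Python threads from executing bytecode in parallel; for a real CPU speedup you would swap in `ProcessPoolExecutor` or native code, but the work-unit structure stays the same.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Chicken:
    # Hypothetical work unit: one chicken's 1-D position and velocity.
    x: float
    vx: float


def update_chunk(chunk, dt):
    # A worker advances every chicken in its slice; slices never overlap,
    # so no locking is required.
    for c in chunk:
        c.x += c.vx * dt


def step_all(chickens, dt, workers=None):
    # One pool sized to the core count, not one thread per chicken.
    workers = workers or os.cpu_count() or 1
    size = max(1, -(-len(chickens) // workers))  # ceil(len / workers)
    chunks = [chickens[i:i + size] for i in range(0, len(chickens), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk and re-raises any worker exception.
        list(pool.map(update_chunk, chunks, [dt] * len(chunks)))
```

      With this shape, ten thousand chickens on an eight-core machine mean eight threads, not ten thousand.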

      Oh, and there's no such thing as "easy" multi-threading. Hell, the average programmer can't even grasp OO, so what makes them think they can grasp threading, which has many, many more aspects to it?

      --
      The cesspool just got a check and balance.
    2. Re:Huh? by Procyon101 · · Score: 4, Insightful

      You are using JVM threads. Most massively scalable threaded languages, like Erlang, use green threads. A green thread acts like a thread from the standpoint of the programmer, but carries little or no context switch cost (because it's not really a thread). The underlying platform then load balances these green threads across the actual hardware in an optimal pool of true threads.

      What makes massive concurrency easy to grasp in these languages is one of two things:

      1) In Erlang and Termite (a Scheme dialect) there is no mutable state and there are no globals. Every function is in essence a "service" that simply gets messages and then responds with replies. There is no need to think about locking in such a system, and there are very easy message-passing idioms for what you would normally do with mutable object orientation.

      2) In languages like Haskell, there is no concept of a "thread" at all... not even a single thread. There is no concept of "ordering". Things are defined as they are in mathematics, as relationships between functions and variables. There is no mutable state allowed. This strictness allows the compiler to draw very deep conclusions about what can be parallelized. The compiler can then load balance under the covers across any number of processors without exposing any issues of concurrency to the user at all.

      So yes, in Java (and OO in general), concurrency is very, very difficult. In other paradigms though it can be trivial, or even transparent.

    3. Re:Huh? by shutdown+-p+now · · Score: 2, Insightful

      A C++ library could simply implement Erlang semantics on top of C++.
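      As a rough illustration of what "Erlang semantics" look like in a library, here is a minimal actor sketch in Python rather than C++ (`Actor` and `call` are hypothetical names, not any real Erlang binding). The counter lives privately inside the actor's own thread, and other threads interact with it only by putting messages in its mailbox, so it needs no lock.

```python
import queue
import threading


class Actor:
    """Minimal Erlang-style process: private state, a mailbox, message passing."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state: only the actor's own thread touches it
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, msg):
        # Asynchronous send, like Erlang's `Pid ! Msg`.
        self._mailbox.put(msg)

    def _loop(self):
        # Handle one message at a time, in arrival order.
        while True:
            kind, payload, reply_to = self._mailbox.get()
            if kind == "add":
                self._count += payload
            elif kind == "get":
                reply_to.put(self._count)
            elif kind == "stop":
                reply_to.put("stopped")
                return


def call(actor, kind, payload=None):
    # Synchronous request/reply, similar in spirit to gen_server:call.
    reply = queue.Queue()
    actor.send((kind, payload, reply))
    return reply.get(timeout=5)
```

      Any number of threads can call `counter.send(("add", 1, None))` concurrently; the updates are serialized by the mailbox rather than by a mutex.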

  2. Bah humbug by kahei · · Score: 2, Insightful


    Multithreaded development is commonplace in applications that need it. The places it's not common in are:

    -- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.

    -- places that have absolutely no need for it, which certainly includes the chicken demo. One core per chicken?? Seems more like the guy just discovered threads but hasn't quite grasped what they're for.

    --
    Whence? Hence. Whither? Thither.
    1. Re:Bah humbug by ari_j · · Score: 3, Insightful

      You are, of course, correct. The other thing that people need to keep in mind is that there is rarely only a single process running on a given machine. For applications where it makes sense, such as video rendering on a machine doing nothing else, multithreading can increase overall performance. For applications where it doesn't or where there are other things running on the same machine, you normally end up with worse overall performance by trying to get your naturally single-threaded program to run on multiple cores at once when the extra cores would be better dedicated to running things other than your program.

      Multithreading is a tool. Just like more traditional tools, like the hammer, this one is useful for certain applications. But multithreading is not the only tool at your disposal - people need to stop looking at everything as if it were a nail.

    2. Re:Bah humbug by aldheorte · · Score: 3, Insightful

      "old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'."

      I'm not sure I follow you there. Lightweight process models are perfect for multi-cores. The more the merrier. Given the abundance of high-quality networking and commodity machines, heavyweight programs that use internal threads are, outside of very niche areas, less suitable for distributed computing than lightweight process models that can call across the network or the OS to other lightweight processes. A heavyweight process can only scale to the number of cores available on the machine it is running on, whereas a flock of lightweight processes can scale to the locally available cores and onto other machines in a distributed fashion without a major bump in the road between local and remote. Any machine that has multiple cores today could easily run, say, one Ruby process per core with negligible overhead.

    3. Re:Bah humbug by kcbrown · · Score: 4, Insightful

      -- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.

      And it's silly for it to be "on the way out".

      Anyone remember the Amiga? It had a preemptive multitasking OS that lacked hardware memory protection because the hardware it was running on couldn't support it. And while the OS itself was very fast and efficient, the overall system was relatively crash-prone, because any memory-related programming error in any running application had a decent chance of taking down the system.

      Fast forward to today. Every computer sold has hardware memory protection built-in. Anyone who doesn't know why that's a good thing needs to spend time on an Amiga.

      And yet, despite that, threads are all the rage. Why? Because people have this idiotic belief that they're somehow "more efficient" than processes. Such people probably program about as well as they think, which is to say not very well. Threads are indeed more efficient at context switching than processes, but the real question is: does that really matter? In the vast majority of cases, it doesn't, because in the vast majority of cases multiple threads are being used to make the user interface responsive. There's no way a human being can tell the difference between a millisecond-level context switch time and a microsecond-level one.

      On top of that, processes bring one critical advantage to the table that threads don't: memory protection. And for the same reason memory protection is important at the OS and hardware level, so too is it important at the process and thread level: it allows clean, protected separation of concerns and greater overall application stability.

      The vast, vast majority of applications that are multithreaded don't actually need the slight additional context switch performance advantage that threads bring to the table, but they very much need the memory protection facilities that processes bring to the table. Which is another way of saying that if your application needs concurrency, you're a fool if you blindly use threads instead of processes.

      Even Windows supports fork() these days, with the POSIX subsystem (available, as far as I know, on any Windows 2000 and later system), so creating a clone of your current process is dirt simple even under Windows. End result: application authors have no good reason to use threads over processes unless they've actually done the math and can prove that their application really needs the slight performance advantage of threads more than the significant reliability advantage of processes.
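      A minimal POSIX-only sketch of the fork-based approach (`fork_and_compute` is a hypothetical helper). The child gets its own copy-on-write image of the parent's memory, so its assignment to `counter` cannot affect the parent; the result comes back explicitly through a pipe. Note that Python's `os.fork` is not available on native Windows builds of Python.

```python
import os


def fork_and_compute(n):
    """Sum range(n) in a forked child; return (parent's counter, child's result)."""
    counter = 0
    r, w = os.pipe()
    pid = os.fork()  # POSIX-only
    if pid == 0:
        try:
            os.close(r)
            # The child has a private copy-on-write image of the parent's memory,
            # so this assignment changes only the child's copy of `counter`.
            counter = sum(range(n))
            os.write(w, str(counter).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code path
    os.close(w)
    data = b""
    while chunk := os.read(r, 4096):
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return counter, int(data)
```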

      As to the other reason for using threads, the sharing of memory, there's this really cool new technology out these days. Maybe you've heard of it. It's called "shared memory". It's only been available for 20 years or so. No wonder most people haven't heard of it. Being forced to explicitly declare what's shared and what isn't is a good thing, because it makes your program easier to maintain, easier to debug, and more reliable -- all at the same time.
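      A sketch of explicitly declared sharing, assuming Python 3.8+ for `multiprocessing.shared_memory` and a POSIX system for `os.fork` (`child_writes_shared` is a hypothetical helper). Only the named `SharedMemory` block is visible to both processes; everything else the child touches remains private to it.

```python
import os
from multiprocessing import shared_memory


def child_writes_shared(values):
    # Explicitly create the one region both processes may touch.
    shm = shared_memory.SharedMemory(create=True, size=len(values))
    try:
        pid = os.fork()  # POSIX-only
        if pid == 0:
            try:
                # Child: writes into the shared block; its other memory stays private.
                shm.buf[:len(values)] = bytes(values)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        result = bytes(shm.buf[:len(values)])  # copy out before closing
        return result
    finally:
        shm.close()
        shm.unlink()
```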

      The bottom line is this: if you need concurrency in your application, you should be using processes, not threads. If you insist on using threads, you'd better have a damned good reason for it, because the reliability implications of threads are hugely negative while the performance implications are modest at best.

      --
      Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
    4. Re:Bah humbug by odourpreventer · · Score: 2, Insightful

      Please correct me if I'm wrong, but it seems to me this discussion has gone into apples and oranges mode. Threads, as far as I'm aware, are supposed to be used for single, explicit tasks and always under supervision by a parent thread. I've used multi-threading with excellent results, but then I've taken pains to ensure that the threads don't have any privileges whatsoever. Processes, on the other hand, are more like stand-alone programs working in the same context.

    5. Re:Bah humbug by dkf · · Score: 3, Insightful

      On a single core without hyperthreading, your best bet (if you can) is to write very efficient single-threaded code, using non-blocking I/O as much as possible. Some language runtimes require the use of lots of threads even on single-core systems, but that's horrible.
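      A minimal sketch of a single-threaded, non-blocking event loop using Python's `selectors` module. Local `socketpair()` connections stand in for real network clients (an assumption for the sketch), and `gather` is a hypothetical name. One thread services every connection and only reads from sockets the selector reports as ready.

```python
import selectors
import socket


def gather(messages):
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in messages]
    received = [b""] * len(messages)
    for i, ((server, client), msg) in enumerate(zip(pairs, messages)):
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, data=i)
        client.sendall(msg)
        client.shutdown(socket.SHUT_WR)  # signal EOF to the server end
    while sel.get_map():  # one thread services every connection
        for key, _ in sel.select(timeout=1):
            chunk = key.fileobj.recv(4096)  # ready, so this does not block
            if chunk:
                received[key.data] += chunk
            else:  # EOF: this connection is finished
                sel.unregister(key.fileobj)
    for server, client in pairs:
        server.close()
        client.close()
    sel.close()
    return received
```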

      Once you've got multiple cores, getting multiple threads of execution (either in multiple processes or in multiple threads) makes a lot of sense. I believe hyperthreading particularly benefits code that has multiple threads executing in the same bit of code, since the parallelism there is within a single memory management domain; so OpenMP is better there than pthreads, and pthreads is (probably) better than processes. On the other hand, if you're potentially working across a cluster (cue the Beowulf jokes!) your code had better be written with processes (and probably MPI) in mind. Of course, if you're going that way, you also ought to spend money on a good interconnect network...

      All in all, getting proper high performance is tricky. The best guide to making things go faster is to try to reduce the amount of shared state between threads-of-execution. Reducing shared state also helps to make the code easier to debug. (Alas, dealing with the bits of state that must be shared is what makes life hard.)
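      To make the shared-state point concrete, here is a sketch contrasting two ways of summing with threads (function names are hypothetical). The first takes a shared lock for every item; the second keeps a private running total per thread and writes shared state exactly once per thread, at the end. Both return the same answer, but the second has far less lock contention and far fewer interleavings to reason about when debugging.

```python
import threading


def sum_with_shared_lock(items, workers=4):
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # every item contends for the same lock
                total += x

    _run(work, [(items[i::workers],) for i in range(workers)])
    return total


def sum_with_local_totals(items, workers=4):
    partials = [0] * workers  # one slot per thread; no slot is shared

    def work(idx, chunk):
        s = 0
        for x in chunk:
            s += x  # thread-private running total
        partials[idx] = s  # the only write to shared state

    _run(work, [(i, items[i::workers]) for i in range(workers)])
    return sum(partials)


def _run(target, arg_tuples):
    # Start one thread per argument tuple and wait for all of them.
    threads = [threading.Thread(target=target, args=a) for a in arg_tuples]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```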

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
  3. RapidMind = vendor lock-in by Anonymous Coward · · Score: 5, Insightful

    Both RapidMind and PeakStream are proprietary commercial solutions, and those companies are trying to lock users into their particular frameworks. What we really need is an equivalent true open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?

  4. what a joke by acidrain · · Score: 5, Insightful

    From the site:

    • 1. Replace types: The developer replaces numerical types representing floating point numbers and integers with the equivalent RapidMind platform types.
    • 2. Capture computations: While the user's application is running, sequences of numerical operations invoked by the user's application can be captured, recorded, and dynamically compiled to a program object by the RapidMind platform.
    • 3. Stream execution: The RapidMind platform runtime is used for managed parallel execution of program objects on the target hardware platform, which can be a GPU, the Cell processor, or a multicore CPU.
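    The "capture computations" step can be illustrated with a toy tracing type. This is a hypothetical Python sketch of the general technique, not RapidMind's API: `Value` records arithmetic instead of performing it, `capture` runs a user function once on a symbolic input to build an expression graph, and `run_stream` applies that graph to every element of the input. A real platform would compile the graph to native code for the GPU, Cell, or CPU cores rather than interpret it as this sketch does.

```python
class Value:
    """Hypothetical 'platform type': records an operation instead of computing it."""

    def __init__(self, op, args=()):
        self.op = op      # "input", "const", "add" or "mul"
        self.args = args  # child nodes, or the constant for "const"

    def __add__(self, other):
        return Value("add", (self, _lift(other)))

    def __radd__(self, other):
        return Value("add", (_lift(other), self))

    def __mul__(self, other):
        return Value("mul", (self, _lift(other)))

    def __rmul__(self, other):
        return Value("mul", (_lift(other), self))


def _lift(x):
    # Wrap plain numbers as constant nodes in the graph.
    return x if isinstance(x, Value) else Value("const", (x,))


def capture(fn):
    # Run the user's function once on a symbolic input to record its operations.
    return fn(Value("input"))


def evaluate(node, x):
    # Interpret the recorded graph for one input element.
    if node.op == "input":
        return x
    if node.op == "const":
        return node.args[0]
    a, b = (evaluate(arg, x) for arg in node.args)
    return a + b if node.op == "add" else a * b


def run_stream(program, data):
    # Every element is processed independently, so a runtime could split
    # `data` across any number of cores without changing the result.
    return [evaluate(program, x) for x in data]
```

    For example, `capture(lambda v: 2 * v + 1)` records a two-node program once, which `run_stream` then applies to each element of a stream.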

    Man, that's some funny stuff. Wow, that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.

    Look, guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that *don't need to communicate very often*. Then this is the same kind of problem that scalable websites like Google, MySpace, Hotmail and so on already deal with, just without having to factor in the reliability issues. Finer-grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom-coded solution. But you don't, so let it go. The main loop of your program will be just fine as a single-threaded implementation on one processor; farm the 10% of the code that does 90% of the heavy lifting out in big, clean chunks to other processors. If you find yourself writing some bizarre multi-threaded message-passing system so that you can have 100s of threads all modifying the same live object model at the same time, you are fucked; just forget about it, 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.
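    A rough sketch of the coarse-grained structure described above: a single-threaded main loop hands a few large, independent jobs to a worker pool each frame and synchronizes exactly once per frame. `heavy_job` and `main_loop` are hypothetical placeholders. Under CPython's global interpreter lock, CPU-bound jobs would need `ProcessPoolExecutor` for real parallelism; the shape of the loop is the point here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def heavy_job(frame, size):
    # Hypothetical stand-in for a big, self-contained chunk of work
    # (physics, pathfinding, ...) that shares no state with the main loop.
    return sum(i * i for i in range(size)) + frame


def main_loop(frames, size=10_000, jobs_per_frame=4):
    results = []
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as farm:
        for frame in range(frames):
            # Farm out a few big, independent chunks...
            jobs = [farm.submit(heavy_job, frame, size) for _ in range(jobs_per_frame)]
            # ...the single-threaded main-loop logic for this frame would run here...
            # ...then synchronize exactly once per frame.
            results.append(sum(j.result() for j in jobs))
    return results
```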

    --
    -- http://thegirlorthecar.com funny dating game for guys
  5. Thumbs up, thumbs down... by Anonymous Coward · · Score: 1, Insightful

    Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++.
    Thumbs up, since this is great for PCs, where you never know the hardware configuration you're running on. At the very least your applications will have the ability to "do better" with more hardware if it exists, yet still run on older stuff if it doesn't.

    For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core.
    Thumbs down, since you're on fixed hardware, and for the best benefit you're going to want to dedicate specific processors to specific tasks at specific times rather than have an algorithm just "figure it out for you."

    It may not seem like a bad idea now, but if we're to go down this path eventually we'll be praising researchers for getting a flavor of BASIC that can take advantage of all the PS3's processors seamlessly, but still has the same issues BASIC has.

    I think in general this raises the question, "Are we better off in the long run empowering ignorant programmers, or informed programmers?"

    For instance, would it have been better for the researcher to perhaps focus on how to teach programming for multiple processors rather than coming up with a library that "abstracts it away"?

    This is not to take away from the fact that he got it running on the PS3 though. Kudos for that.
  6. Life is Pain by gillbates · · Score: 2, Insightful

    First of all, I, and many others before me, have been writing multithreaded applications for years in the likes of Linux and UNIX. I have had to maintain multithreaded applications created by others. My collective experience tells me:

    It is not trivial.

    Let me repeat: It is not a trivial task. Even if you have libraries and an API which abstracts out the ugly stuff, you still have the problem of concurrency, proper locking, deadlocks, etc...

    The majority of problems with using multithreaded programming come not from "ugly" parts of the OS/API layer, but from a misunderstanding of the problem. A few problems in computer science - particularly in the physical sciences - do benefit from multithreading. And it is easier to use threads when writing a game than just to execute all of the IO in one big loop (Hello DOS!). But for most applications, using threads is not only unnecessary, but overkill, and introduces the possibility of yet another class of bugs for which the application must be tested. Furthermore, as deadlock and race conditions are often timing related, they are the most difficult type of bug to find and fix. Finding and fixing this class of bugs is still somewhat of a black art in the industry, and is highly dependent on the skill and experience of the programmer.
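    One standard defense against the classic two-lock deadlock is to acquire locks in a single global order. A minimal Python sketch (the `Account` class and `transfer` function are hypothetical): two threads transferring in opposite directions could deadlock if each locked its own source account first, but always locking the lower `id` first rules that out.

```python
import itertools
import threading


class Account:
    _ids = itertools.count()  # gives every account a unique, totally ordered id

    def __init__(self, balance):
        self.id = next(Account._ids)
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # a non-reentrant Lock would deadlock on itself
    # Always lock the account with the lower id first. With one global order,
    # two opposite transfers can never each hold one lock while waiting for the other.
    first, second = (src, dst) if src.id < dst.id else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```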

    In short, unless your system/application design cannot do without multithreaded programming, it is best not to use it. Even with a glossy API, you still cannot escape the fact that debugging a multithreaded application is an order of magnitude more difficult than a single threaded one. In any case, you shouldn't be using threads just because you can.

    --
    The society for a thought-free internet welcomes you.
  7. Re:What?! by zx75 · · Score: 4, Insightful

    I think you've missed the mark a little.

    I believe what he is saying is that if you're an application developer who is pushing the limits of what a single core is capable of in terms of performance, then you are going to see a decreasing rate of improvement and then stagnation, because the focus of hardware development is shifting away from more power in a single core toward more power through more cores.
    At some point you will hit a wall, and for single-threaded applications you're going to reach a point where there isn't any more power to be had.

    Therefore if you want to tap that extra power that a multi-core processor has, you will by definition *need* to start multi-threaded programming. This isn't about you people who are happy with the speed and power that you already have, research is pointless if you already have everything you could possibly need. This is for the people who push the edge, at some point if you need more you will need to learn to multi-thread correctly.

    And a simpler way to do it, is gold in my books.

    *From a former university classmate of Stefanus*

    --
    This is not a sig.
  8. Re:Toy Supercomputer by Doc+Ruby · · Score: 2, Insightful

    Because it's a toy supercomputer. If I find a way to expand its IO, I'll have a $600 supercomputer, scalable into a supercomputer cluster.

    If I listen to you, Anonymous defeatist Coward, and just cry "waaaahhh, I'm too dumb to hack a toy into a tool", then I'll just have a really cool toy.

    Allow me to introduce you to the term hack, which is what Slashdotters used to do before we were mostly posers.

    --
    make install -not war

  9. Re:Functional programming by kcbrown · · Score: 2, Insightful

    Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.

    What?

    Sorry, that's bullshit. If you want to take advantage of multiple processor cores, use multiple processes! Even Windows has fork() these days, thanks to its POSIX subsystem, so creating a clone of your process is very easy.

    You should use threads over processes only if you can prove that the context switch savings really does cause a big performance improvement for your application. If you really think about it, you'll find it's very rare indeed that the context switch overhead difference really matters, even on an OS like Windows where it's relatively high.

    --
    Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
  10. Comments from the presenter by sdt · · Score: 5, Insightful

    Good morning slashdot!

    As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" was a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.

    For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.
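    A minimal sketch of the data-parallel idea, not RapidMind's actual API (`kernel` and `data_parallel_map` are hypothetical names): the programmer writes a per-element kernel, and the runtime decides how to split the index range across however many workers exist. Because each element writes only its own output slot, there is nothing to lock, and the result is the same for any worker count.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def kernel(x):
    # Per-element computation: reads only its own input, writes only its own output.
    return x * x + 1


def data_parallel_map(kernel, data, workers=None):
    workers = workers or os.cpu_count() or 1
    n = len(data)
    out = [None] * n
    step = max(1, -(-n // workers))  # ceil(n / workers)

    def run(lo):
        for i in range(lo, min(lo + step, n)):
            out[i] = kernel(data[i])  # disjoint output slots: no races, no locks

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, range(0, n, step)))
    return out
```

    The caller never mentions a core count; running on 6 SPUs or 16 only changes how the index range is split.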

    And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).

    If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.

  11. Or you could just use Ada by Black+Parrot · · Score: 2, Insightful

    which has had easy-to-use multithreading constructs built right into the language for the past 25 years or so.

    --
    Sheesh, evil *and* a jerk. -- Jade
    1. Re:Or you could just use Ada by Coryoth · · Score: 2, Insightful

      Unfortunately I think many programmers that read Slashdot are scared off by the clear, readable, maintainable syntax. Typing end is clearly too much work, or something, and as we all know IDEs can't possibly help with that... I would like to see Ada get more use, but unfortunately I doubt it is going to happen.

  12. Re:How many cores? by Anonymous Coward · · Score: 1, Insightful

    Respectfully, one core per chicken does not equate to one chicken per core, sir.

  13. Re:Functional programming by Lazerf4rt · · Score: 2, Insightful

    I recently shipped an Xbox 360 game and am about to ship a PS3 game, and having done a lot of system-level programming and optimization for both, I can tell you that you don't know what you're talking about. You're probably a smart guy and a good programmer, but you're obviously speaking from academic experience without having much real-world experience.

    The key to performance and stability does not lie in the discovery of high-level tools that abstract away all the hardware details for you. And it definitely doesn't lie in a functional language. The key is knowing your hardware and designing your software for it, right down to the low level. You have to create and manage your tasks/jobs/threads/fibers, each to do a specific thing, and you manage their lifetimes and the flow of data between them. If you need more performance, you need clever ways of pipelining your data.
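    A minimal sketch of pipelining data between dedicated tasks, using Python threads and bounded queues rather than console-specific jobs or fibers (`stage` and `run_pipeline` are hypothetical names). Each stage owns one step of the work and communicates only through its input and output queues, so stages overlap in time while items stay in order.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(fn, inbox, outbox):
    # One pipeline stage: take an item, transform it, pass it on.
    while (item := inbox.get()) is not _DONE:
        outbox.put(fn(item))
    outbox.put(_DONE)  # propagate end-of-stream to the next stage


def run_pipeline(items, *fns):
    # Bounded queues give back-pressure: a fast stage waits for a slow one.
    queues = [queue.Queue(maxsize=8) for _ in range(len(fns) + 1)]
    workers = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(fns)]

    def feed():
        for x in items:
            queues[0].put(x)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in [feeder, *workers]:
        t.start()
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in [feeder, *workers]:
        t.join()
    return results
```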

    Anyway, just thought I'd share that. If you make a career in programming, you'll eventually learn that having a low-level understanding of each platform, and just using existing tools, is far more productive than trying to research and develop new programming tools. I'm downloading TFA's video right now and look forward to hearing whatever it is they have to say.

  14. Dining Chickens Problem by MS-06FZ · · Score: 3, Insightful

    I'm sure the demonstration would've been a lot more difficult if he'd used philosophers instead of chickens. Thing is, chickens can't even hold chopsticks. A chicken just goes straight for the feed, so there's just one resource being acquired. It's still possible for a chicken to starve, but as chickens don't eat that much it's more likely that any shut-out chickens would simply go hungry for a while, and then get to eat before starving.

    --
    ---GEC
    I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
  15. Re:Functional programming by The_Wilschon · · Score: 2, Insightful

    That approach works just fine if you know exactly what hardware your code is going to be running on, and you know that it will never have to run on any other hardware, and you know that you won't have to ever work on it again once it is released.

    In the Real World (i.e., not game consoles), programs must be portable. They must be maintainable. They must be writable in a short time. Your approach completely ignores these requirements, which enormously outweigh the tiny performance gains that you can get by tweaking for the hardware.

    High level languages remove the portability problem entirely, by shifting it to the shoulders of the language implementors. If I implement a language on one platform, and you implement the same language on a completely different platform, any program written in that language will now run just fine on both platforms. Sure, unless your compiler or interpreter is incredibly smart, programs written in that language won't run as fast as endlessly hand-tweaked assembly programs written for that platform. But if your compiler or interpreter is even just passingly not bloody stupid, then programs in that language will run pretty close to as fast.

    Higher level languages allow the programmer to express more directly exactly what he means. Suppose I have a very, very low level language that manipulates bits (i.e., logic gates). I want an addition routine for arbitrary-length sequences of bits. Implementing this in logic gates is not only very difficult and very, very time consuming, but the end result doesn't look like addition at a glance. Now suppose I use a high level language. I can just write (a + b). It is clear to anyone who made it through elementary school that this code adds two things, a and b. Of course, I could abuse this, and actually make the language so that (a + b) deletes all files on the hard disk, but a realistic high level language is essentially self-documenting. This means that 5 years down the road, when I've forgotten why I even wrote this bit of code, I can pull it up in a text editor and easily remember that it adds two numbers. Suppose I then need it to add three numbers. Well, now that's a very easy change to make. The code is immensely more maintainable, not only by the original author but by anyone, than code in a low level language.
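    The contrast can be made concrete. Below is a hypothetical Python sketch of arbitrary-length addition built only from gate-level operations (XOR, AND, OR) as a ripple-carry adder; the high-level equivalent is just `a + b`. Bits are stored least significant first, and `to_bits`/`from_bits` are helper names introduced for the sketch.

```python
def full_adder(a, b, carry):
    # One bit of addition built from XOR, AND and OR gates.
    s = a ^ b ^ carry
    carry_out = (a & b) | (carry & (a ^ b))
    return s, carry_out


def ripple_add(xs, ys):
    """Add two little-endian bit lists of arbitrary length."""
    n = max(len(xs), len(ys))
    xs = xs + [0] * (n - len(xs))  # pad the shorter operand with zeros
    ys = ys + [0] * (n - len(ys))
    out, carry = [], 0
    for a, b in zip(xs, ys):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    if carry:
        out.append(carry)
    return out


def to_bits(n):
    # Non-negative int -> little-endian bit list.
    return [int(b) for b in reversed(bin(n)[2:])] if n else [0]


def from_bits(bits):
    # Little-endian bit list -> int.
    return sum(bit << i for i, bit in enumerate(bits))
```

    Even this small version needs padding, a carry chain and conversion helpers, and nothing about it reads as "addition" at a glance.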

    Many of the extremely high level languages that are disparagingly referred to as "research languages" have an extremely high information density. That is, I can express in 1 line of code what it might take you perhaps as many as hundreds of lines of code to express in a low level, close-to-the-metal language. This also contributes to the maintainability, because there is a lot less room for mistakes in one line of code than in 20 lines of code. More importantly, this very high expressiveness means that I can write code in a high level language much more quickly. When you're trying to compete with other companies, or trying to finish a PhD thesis in less than 10 years, or pretty much anything at all, being able to code 20 times faster is a very important thing.

    If you make a career outside of the very sheltered world of Xbox and PS3 programming, you'll see that endless performance tweaking in very low level languages is not only useless, but it is wastefully stupid.

    --
    SIGSEGV caught, terminating

    wait... not that kind of sig.