Slashdot Mirror


Multi-Threaded Programming Without the Pain

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

21 of 327 comments

  1. Re:Don't Bother by Anonymous Coward · · Score: 1, Informative

    99.99% of the time, multi-threaded programming is not needed, and it can actually *slow* things down as mutexes block each other, killing performance. It takes a certain amount of time to establish a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block and unblock the other.

    Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner? Aren't you referring to poor design/algorithms rather than anything specifically to do with multi-threading? Saying mutexes are slow and should be avoided is like saying disk I/O is slow and should be avoided. While true, if my app needs to deal with storage, then that's what it needs to do.
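    The contention point can be sketched in C++11 (the function names are made up for illustration). Both functions sum 0..n-1 across several threads. The first takes one shared mutex for every item, so the threads mostly queue on the lock; the second lets each thread accumulate privately and locks only once to merge. The results are identical, and only the amount of lock traffic differs, which is often the practical answer to "mutexes are slow": redesign so they are rarely contended rather than avoiding them altogether.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Heavily contended: every single increment takes the shared mutex.
std::uint64_t sum_locked_per_item(std::uint64_t n, unsigned threads) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::uint64_t i = t; i < n; i += threads) {
                std::lock_guard<std::mutex> lock(m);  // lock/unlock per item
                total += i;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Low contention: each thread sums into a local variable and locks once to merge.
std::uint64_t sum_locked_per_thread(std::uint64_t n, unsigned threads) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            std::uint64_t local = 0;
            for (std::uint64_t i = t; i < n; i += threads) local += i;
            std::lock_guard<std::mutex> lock(m);  // one lock per thread
            total += local;
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

    Timing the two on a multi-core machine typically shows the per-item version falling well behind, though the exact gap depends on the hardware and the mutex implementation.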

    Also, multi-core CPUs have made using Windows bearable under load, because current single-threaded processes leave one spare core available for the OS when any app decides to eat all the CPU!

    Actually, multi-core has made it cheaper. Multi-processor boxen have been around forever, and those who've used them have been enjoying the benefits for a gazillion years. Plus your statement is misleading: very few major apps are single-threaded, the OS itself has a ton of stuff going on in the background, and there are daemons/services running all the time. There is no such thing as "a single-threaded app leaves the other processor/core free for OS use"; all cores are constantly being used by all sorts of crap unless the OS is configured to force that to happen (processor affinity). In that case even a multi-threaded app can be forced onto a single processor/core.
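    On Linux, the processor affinity mentioned above can be set from code with `sched_setaffinity`. A minimal sketch, assuming Linux with glibc (the function name is hypothetical): it reads the calling thread's allowed CPU set and then restricts the thread to the first CPU in that set. Windows exposes the same idea through `SetThreadAffinityMask`.

```cpp
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, sched_getcpu, CPU_* macros (glibc)

// Pin the calling thread to the first CPU it is currently allowed to run on.
// Returns that CPU's index, or -1 if the affinity calls fail.
int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;  // 0 = calling thread

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t only;
        CPU_ZERO(&only);
        CPU_SET(cpu, &only);  // a set containing just this one CPU
        if (sched_setaffinity(0, sizeof(only), &only) != 0) return -1;
        return cpu;  // the kernel migrates the thread if it was running elsewhere
    }
    return -1;
}
```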

  2. Square peg, round hole. by Ihlosi · · Score: 2, Informative
    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers.

    No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.

    On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.
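    The distinction can be illustrated with the logistic map, chosen here only as an example of an inherently ordered computation. Iterating one chain is strictly sequential, because every step consumes the previous result; running many chains from different starting points, by contrast, is trivially parallel. A C++11 sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <future>
#include <vector>

// One chain of the logistic map x_{k+1} = r * x_k * (1 - x_k).
// Each step needs the previous x, so this loop cannot be split across cores.
double logistic_chain(double x0, double r, int steps) {
    double x = x0;
    for (int k = 0; k < steps; ++k) x = r * x * (1.0 - x);
    return x;
}

// Chains with different starting points don't depend on each other,
// so each one can run as its own asynchronous task.
std::vector<double> logistic_many(const std::vector<double>& starts, double r, int steps) {
    std::vector<std::future<double>> tasks;
    for (double x0 : starts)
        tasks.push_back(std::async(std::launch::async, logistic_chain, x0, r, steps));
    std::vector<double> results;
    for (auto& f : tasks) results.push_back(f.get());  // waits for each task in turn
    return results;
}
```

    `std::launch::async` starts one thread per task here, which is fine for a handful of chains; for thousands you would batch them over a fixed number of workers instead.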

  3. Re:RapidMind = vendor lock-in by Anonymous Coward · · Score: 4, Informative

    OpenMP is implemented in GCC 4.2 (I think; I've never used it in GCC).

  4. How many cores? by Aladrin · · Score: 2, Informative

    "For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "

    Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to process the threads,' but that isn't nearly as impressive sounding.
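    What such a demo most likely does (an assumption; the talk's actual code isn't shown here) is map thousands of logical entities onto a handful of hardware threads. A C++11 sketch with a hypothetical `Chicken` type: the array is split into one contiguous chunk per worker, and each worker updates only its own chunk, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Chicken { double x, vx; };  // hypothetical per-chicken state: position and velocity

// Advance every chicken by dt using `workers` threads, one contiguous chunk each.
// Thousands of chickens, but only as many threads as there are workers.
void step_all(std::vector<Chicken>& flock, double dt, unsigned workers) {
    if (workers == 0) workers = 1;
    const std::size_t n = flock.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer chunks than workers
        pool.emplace_back([&flock, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) flock[i].x += flock[i].vx * dt;
        });
    }
    for (auto& t : pool) t.join();
}
```

    `workers` would typically come from `std::thread::hardware_concurrency()`, which may return 0 when the count is unknown; the function falls back to a single worker in that case.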

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
  5. Re:RapidMind = vendor lock-in by acidrain · · Score: 3, Informative

    Does anyone know if there is progress being made on this?

    The GPUs will ship with C compilers soon enough; they already support limited forms of C. In time we will actually see hybrid CPUs (the Cell being a first example), with units capable of massive amounts of parallel math stacked in alongside some of your CPU cores. As the number of cores grows, room is made for specialized processors where that makes sense in the market.

    --
    -- http://thegirlorthecar.com funny dating game for guys
  6. Re:C++ can't be made safe by ratboy666 · · Score: 2, Informative

    This stuff is an outgrowth of the Sh work done at the University of Waterloo. Anyway, the idea is that the declarations in your code are replaced: the new types redefine standard operators to generate code for a parallel machine (say, an nVidia card, or the PS3's Cell).

    The code so generated can be run immediately, or deferred (note: it's been a while since I've looked at Sh, so I am being vague).

    I didn't think that this was a GENERAL multi-threading solution; it's more a way to easily generate code for the parallel machines that are becoming available.
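    The operator-overloading idea can be sketched in a few lines of C++. This is a hypothetical illustration of the technique, not Sh's or RapidMind's actual API: the `Expr` type's `+` and `*` record an expression tree instead of computing a value, and `run` evaluates the recorded program later, once per element. A real system would compile that tree for a GPU or the Cell's SPEs rather than interpret it on the CPU.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical illustration only: operators build a tree, nothing is computed
// until run() is called on some data.
struct Node;
struct Expr { std::shared_ptr<const Node> node; };

struct Node {
    enum class Kind { Input, Constant, Add, Mul } kind;
    double value;   // used only by Constant
    Expr lhs, rhs;  // used only by Add and Mul
};

Expr make(Node::Kind k, double v, Expr a, Expr b) {
    return Expr{std::make_shared<Node>(Node{k, v, a, b})};
}
Expr input() { return make(Node::Kind::Input, 0.0, {}, {}); }
Expr constant(double v) { return make(Node::Kind::Constant, v, {}, {}); }
Expr operator+(Expr a, Expr b) { return make(Node::Kind::Add, 0.0, a, b); }
Expr operator*(Expr a, Expr b) { return make(Node::Kind::Mul, 0.0, a, b); }

// Interpret the recorded program for a single input value.
double eval(const Expr& e, double x) {
    const Node& n = *e.node;
    switch (n.kind) {
        case Node::Kind::Input:    return x;
        case Node::Kind::Constant: return n.value;
        case Node::Kind::Add:      return eval(n.lhs, x) + eval(n.rhs, x);
        case Node::Kind::Mul:      return eval(n.lhs, x) * eval(n.rhs, x);
    }
    return 0.0;  // not reached
}

// Apply the program to every element. The elements are independent, which is
// what would let a real system spread this work over many cores or a GPU.
std::vector<double> run(const Expr& program, const std::vector<double>& data) {
    std::vector<double> out;
    out.reserve(data.size());
    for (double x : data) out.push_back(eval(program, x));
    return out;
}
```

    For example, `Expr x = input(); Expr p = x * x + constant(1.0);` only builds the program; `run(p, {0.0, 2.0, 3.0})` is where the values 1, 5 and 10 are actually computed.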

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  7. Pthreads? by gillbates · · Score: 3, Informative

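    Pthreads (POSIX threads) has been around for a while. It is a standard API with open-source implementations, and it runs on Linux, Mac OS X, and Windows (via the open-source pthreads-win32 port).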

    Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.
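    For comparison with the higher-level approaches elsewhere in this thread, here is what the explicit pthreads style looks like, as a minimal C++ sketch (the struct and function names are hypothetical). Each thread receives its work through a `void*` argument, and results are read only after `pthread_join`. On Linux, build with `-pthread`.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Work description handed to each POSIX thread through its void* argument.
struct SliceArgs {
    const int* data;
    long begin;
    long end;
    long sum;  // written by the worker, read by the creator after pthread_join
};

// Thread entry point: pthread_create expects a void* (*)(void*) function.
static void* sum_slice(void* p) {
    SliceArgs* a = static_cast<SliceArgs*>(p);
    long s = 0;
    for (long i = a->begin; i < a->end; ++i) s += a->data[i];
    a->sum = s;
    return nullptr;
}

// Sums v using nthreads POSIX threads; returns -1 if any thread can't be created.
long pthread_sum(const std::vector<int>& v, int nthreads) {
    if (nthreads < 1) nthreads = 1;
    const long n = static_cast<long>(v.size());
    std::vector<pthread_t> ids(nthreads);
    std::vector<SliceArgs> args(nthreads);

    int created = 0;
    for (; created < nthreads; ++created) {
        const long t = created;
        args[created] = SliceArgs{v.data(), n * t / nthreads, n * (t + 1) / nthreads, 0};
        if (pthread_create(&ids[created], nullptr, sum_slice, &args[created]) != 0) break;
    }

    long total = 0;
    for (int t = 0; t < created; ++t) {
        pthread_join(ids[t], nullptr);  // after this, args[t].sum is safe to read
        total += args[t].sum;
    }
    return created == nthreads ? total : -1;
}
```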

    --
    The society for a thought-free internet welcomes you.
  8. Re:Relativity by Anonymous Coward · · Score: 2, Informative

    The real bitch is when you have a bug, because it is not as easily reproducible as with any other programming method.
    So no, it's not only 'another way of thinking'.
    And good luck trying to 'extend' multithreaded stuff.
    Multithreading should only be used on very special occasions where it is really needed.
    That is hardly ever the case in most end-user applications.

  9. Re:Don't Bother by Dog-Cow · · Score: 3, Informative

    "Plus your statement is misleading: very few major apps are single-threaded, the OS itself has a ton of stuff going on in the background, and there are daemons/services running all the time."

    Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.

    What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.

  10. This isn't multithreading in the traditional sense by igomaniac · · Score: 3, Informative

    There are a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind provides is something else, something more like a SIMD model or vector computation. It handles things like elementwise operations on large arrays efficiently, using whatever parallel computing resources are available. It's a language whose semantics don't require complicated synchronisation, because you're basically telling the compiler which operations are independent, and then it can go off and compute them in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model, where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.
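    That shader-style model can be sketched with plain C++11 threads. This is a generic illustration, not RapidMind's API; `shade` and `Kernel` are hypothetical names. The kernel sees only its own pixel coordinates and writes only its own output element, so the workers never synchronize with each other except at the final join.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// A kernel computes one output pixel from its coordinates alone.
using Kernel = std::function<std::uint8_t(int x, int y)>;

std::vector<std::uint8_t> shade(int width, int height, const Kernel& kernel, unsigned workers) {
    std::vector<std::uint8_t> image(static_cast<std::size_t>(width) * height);
    if (workers == 0) workers = 1;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Interleaved rows: worker w handles rows w, w + workers, w + 2 * workers, ...
            for (int y = static_cast<int>(w); y < height; y += static_cast<int>(workers))
                for (int x = 0; x < width; ++x)
                    image[static_cast<std::size_t>(y) * width + x] = kernel(x, y);  // no two workers share a pixel
        });
    }
    for (auto& t : pool) t.join();
    return image;
}
```

    For example, `shade(8, 4, [](int x, int y) { return std::uint8_t(x + 10 * y); }, 3)` fills a 32-byte image in which pixel (5, 3) holds 35.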

    --

    The interactive way to Go -- http://www.playgo.to/iwtg/en/
  11. use gcc4.2 by drerwk · · Score: 2, Informative

    Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial, but OpenMP is very nice. I have a write-up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/. A serious advantage is that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++. Current OpenMP implementations favor SMP machines, but one can go even further with Intel's OpenMP-for-clusters solution. I have not tried it myself yet, but I understand that it makes the lack of shared memory across the cluster machines transparent. As in all cases, YMMV. But if your code is amenable to parallelization, OpenMP is a pretty straightforward way to go.
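    As an illustration of the retrofitting point, here is an ordinary C++ loop with a single OpenMP directive added (the `dot` function is a hypothetical example). With GCC 4.2 or later it is built with `-fopenmp`; without that flag the pragma is ignored and the same loop runs serially.

```cpp
#include <cassert>
#include <vector>

// Dot product of two equal-length vectors. The pragma splits the iterations
// among threads, and reduction(+:sum) gives each thread a private partial sum
// that is added into `sum` when the loop ends.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    const long n = static_cast<long>(a.size());  // assumes b.size() == a.size()
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

    One caveat: with the reduction, floating-point additions happen in a different order than in the serial loop, so results can differ in the last few bits.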

  12. Re:Huh? by Gr8Apes · · Score: 5, Informative

    First, the last time I ran the ball test, just to see how much processors had improved in their ability to run code, I got to over 2K threads in a single JVM before significant degradation set in, and then it degraded rapidly.

    Using a thread pool, however, you can tune the pool to its optimum size using performance metrics from the pool's threads, after which you can place as many work objects on the pool as you'd like. Generally, the right size depends on the work each thread has to do. If there is no I/O blocking, I've found that 2-3 threads per CPU with moderate CPU-time work units will load it to 100% (read "moderate CPU-time work units" as work units that take on the order of 100-1000 ms to complete). If you start adding in any type of I/O blocking, including large amounts of memory access, then that number goes up. A DB retriever system wound up running 64 threads for my particular workload, due primarily to the lag involved in the synchronous calls made to the DB. I could have tuned that further using future tasks and fewer threads (a Doug Lea addition to JDK 1.5, also available in his earlier concurrency library), but in my particular case running 64 threads had no negative effects, so we left it at that. This particular DB access module ran across 64 systems (64*64 threads), serving roughly 35K concurrent customers.
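    The thread-pool pattern described above (in the JDK, `java.util.concurrent.ThreadPoolExecutor` plus `Future`) can be sketched in C++11. This is a minimal illustration, not a production pool: a fixed number of workers pull tasks from one queue, and `submit` returns a `std::future` for each result. Going by the experience above, the worker count is the knob to tune: roughly the core count for CPU-bound work, more when tasks block on I/O.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: N worker threads pull tasks from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        if (n == 0) n = 1;
        for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { work(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers finish queued tasks, then exit
    }

    // Queue any no-argument callable; the future yields its result (or exception).
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // only reachable once stopping_ is set
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run outside the lock so other workers can dequeue meanwhile
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```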

    I haven't run Erlang, so can't comment. I have heard nice things about it though, and I'm curious about it. One day I'll have enough time to play with it.

    --
    The cesspool just got a check and balance.
  13. Re:Functional programming by Communomancer · · Score: 2, Informative

    The main problem I see is that there is lack of focus in the functional arena.

    Whoa whoa whoa! You may not like Erlang's implementation, but you can hardly attribute it to a lack of focus. The whole language was built with concurrency in mind. Heck, the concurrency even has built-in network awareness. And Erlang's been multi-core since last May.

    Erlang goes multi-core

    Yeah, that doesn't say anything about your VM worries. I don't have those, though. Seamless multi-threading and language paradigms designed for concurrency more than make up for the VM performance hit, IMO. When I have to write non-trivial concurrent systems, I reach for the language that already has the plumbing excellently implemented. I'm sure it's done better than anything I could implement myself, and since the system is concurrent, cheap hardware is easily added to improve performance.

    Man, this is the second time this week I've had to stick up for Erlang around here.

    --
    "UNIX" is never having to say you're sorry.
  14. Re:"hundreds of cores"? by Lazerf4rt · · Score: 3, Informative

    Yeah, they're looking ahead too eagerly. That's what academics do.

    Let's not forget that Intel and IBM both recently found a manufacturing process to keep Moore's law going for the next several years. Most people in 2006 thought we had hit a wall and that the multicore revolution was inevitably under way, but that just might not be true anymore. That said, it is always nice to have at least a few cores available in your system.

    At the same time, AMD's Fusion strategy looks pretty interesting. I really wonder what's going to become of that.

  15. Re:C++ can't be made safe by Coryoth · · Score: 2, Informative

    Well, if you're going to remove 99+% of the common trouble spots of multi-threaded coding by moving to a messaging paradigm, then yes, it probably is conceptually easier than OO. It can also be significantly slower depending upon the application's design and function, and greatly increase its memory footprint. E.g., I don't think a game like Quake would work all that well under this paradigm.

    I think it is nowhere near as bad as you seem to think; it all depends on how message passing is handled. If you're doing it via some slow, complex scheme then, sure, it will be slow. But the trick is to think conceptually in terms of message passing. That doesn't mean it actually has to be handled with a big clunky message-passing interface internally; it's just how you think about it. Take SCOOP, for instance. The "message passing" mechanism there is feature calls, the message being the parameters passed to the feature/method. The preprocessor and compiler handle all the messy details of locking etc., and in practice it runs about as fast as hand-written threading. The difference is that you think in the simpler terms of actors and messages, while the computer (in this case the compiler) handles the grunt work of converting that into efficient code. This is no different from OO or garbage collection, of course: you simplify what you need to write by working in a higher-level paradigm and leave the hard work of turning that into efficient machine code to the compiler.

    You'll also still have the potential for concurrent modifications in this scenario, but at least you won't be working on the same memory storage locations, potentially reading indeterminate/incoherent values. Instead you'll have inconsistent values displayed, depending upon which thread's data you're displaying.

    Read the page on SCOOP, or this draft paper, to see what it actually does; it is well worth it. It's the best mix of OO and concurrent programming I've ever seen. You won't end up with inconsistent values, because everything will block accordingly, with the compiler handling all the necessary locking/blocking/waiting and letting you get on with just writing code.
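    The "think in actors and messages" idea can be illustrated with a hand-written channel in C++17. This is not SCOOP, which is an Eiffel mechanism where the compiler generates the locking; here the locking lives once inside a hypothetical `Channel` class, and the actor thread owns its state and learns about work only through messages.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal blocking channel: the only shared state is the queue inside it.
template <class T>
class Channel {
public:
    void send(T msg) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until a message arrives; returns std::nullopt once closed and drained.
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// An "actor" that owns a running total and only sees work through its inbox.
long sum_via_actor(int count) {
    Channel<int> inbox;
    long total = 0;  // touched only by the actor thread until join()
    std::thread actor([&] {
        while (auto msg = inbox.receive()) total += *msg;
    });
    for (int i = 1; i <= count; ++i) inbox.send(i);
    inbox.close();
    actor.join();
    return total;
}
```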
  16. Re:Bah humbug by Bluesman · · Score: 3, Informative

    Nope, still the same. The OS has to flush the TLB, the cache for virtual memory address lookups, when it switches processes.

    This and the reduced startup time are the most compelling reasons to use threads instead of processes on a single core.

    However, on a large number of cores, things aren't so clear-cut, since if you have as many cores as active processes, you're not doing the context switching as much, and the benefit of using threading to reduce cache flushes isn't so clear. You'd still benefit from the quick startup of threads, so for things like a highly concurrent web server that creates a thread per user, threads may still be a better solution.

    Interestingly, the much maligned cooperative threads (user-space) are the fastest of all since the programmer can control when the context switch happens. However, if there's blocking or an infinite loop, the whole application will hang. You have to use asynchronous I/O and make sure no thread runs for too long.

    Like most things, it's a trade off between protection from various mistakes and errors vs. speed and control. Processes give you the most protection with the greatest amount of overhead, while user level threads give you the best performance, but only if you design everything correctly.
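    Cooperative user-space threads can be sketched with the POSIX `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`), which POSIX has marked obsolescent but glibc still provides. In this minimal C++ sketch (the names are hypothetical), context switches happen only at the explicit `swapcontext` calls, which is why they are cheap and also why a task that never yielded would stall the other one.

```cpp
#include <cassert>
#include <string>
#include <ucontext.h>
#include <vector>

// Two cooperative "threads" that switch only when they explicitly yield.
namespace coop {
ucontext_t main_ctx, ctx_a, ctx_b;
char stack_a[64 * 1024], stack_b[64 * 1024];
std::vector<std::string> trace;

void task_a() {
    trace.push_back("A1");
    swapcontext(&ctx_a, &ctx_b);  // yield to B
    trace.push_back("A2");
}  // returning resumes main_ctx via uc_link

void task_b() {
    trace.push_back("B1");
    swapcontext(&ctx_b, &ctx_a);  // yield back to A
    trace.push_back("B2");
}

std::vector<std::string> run() {
    trace.clear();
    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link = &main_ctx;
    makecontext(&ctx_a, task_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &main_ctx;
    makecontext(&ctx_b, task_b, 0);

    swapcontext(&main_ctx, &ctx_a);  // A1, B1, A2; A then finishes and returns here
    swapcontext(&main_ctx, &ctx_b);  // B resumes after its yield: B2
    return trace;
}
}  // namespace coop
```

    Running `coop::run()` records A1, B1, A2, B2: each switch happens exactly where the code asks for it, and nothing preempts either task.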

    --
    If moderation could change anything, it would be illegal.
  17. Multithreaded programming the easy way by pbrooks100 · · Score: 2, Informative
  18. Re:Bah humbug by Foolhardy · · Score: 2, Informative

    A critical section makes use of a synchronization mechanism like a semaphore at entry and exit time to ensure that only one thread of execution (whether that be a process or a thread) is running in the critical section at any one time.

    On Windows at least, a critical section requires no kernel object, and takes only a few instructions with no syscall to acquire and release as long as there is no contention on it. Only when a thread tries to enter a section that is already owned is a kernel event created for waiters to sleep on. A kernel mutex, OTOH, always requires a kernel object and a syscall for both acquire and release. Syscalls are quite expensive, making critical sections much faster in most cases. A design involving a large number of small lockable objects with rare contention would particularly benefit from them. I know that Solaris also has lightweight mutexes that can't be shared between processes, and I assume they avoid syscalls in most cases as well.
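    The fast-path idea can be sketched in portable C++11. This is a simplified stand-in, not how `CRITICAL_SECTION` is actually implemented: acquiring an unowned lock costs one atomic compare-and-swap and no system call, but where a real critical section (or a Linux futex-based mutex) would put waiters to sleep in the kernel under contention, this sketch merely yields and retries. `LightLock` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class LightLock {
public:
    void lock() {
        bool expected = false;
        // Fast path: an unowned lock is taken with a single compare-and-swap, no syscall.
        while (!locked_.compare_exchange_strong(expected, true, std::memory_order_acquire)) {
            expected = false;           // the failed CAS stored the current value; reset it
            std::this_thread::yield();  // simplified slow path: give up the time slice
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Several threads bumping one counter, each increment guarded by the lock.
int count_with_light_lock(int threads, int per_thread) {
    LightLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<LightLock> guard(lock);  // LightLock provides lock()/unlock()
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```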
  19. Ada 2005 by krischik · · Score: 2, Informative

    You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:

    http://en.wikibooks.org/wiki/Ada_Programming/Tasking

    Ada's multithreading is not only painless but great fun!

    Martin

  20. Ada is thread ready since 1983... by krischik · · Score: 2, Informative

    ...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.

    Find a short intro here:

    http://en.wikibooks.org/wiki/Ada_Programming/Tasking

    Martin

  21. Re:Huh? by shutdown+-p+now · · Score: 2, Informative
    Have a read please. As it goes, when it comes to multithreading, the model used by C++, Java and similar languages is rapidly becoming outdated.