Slashdot Mirror


Multi-Threaded Programming Without the Pain

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

20 of 327 comments

  1. Re:"hundreds of cores"? by dreamchaser · · Score: 2, Interesting

    If by 'on the horizon' they mean 'possibly in the next ten years', then sure. I can see that happening. Quad cores are already here. If they double the number of cores every 18 months, that's five doublings in 7.5 years, which takes us from 4 cores to 128. I'm just throwing that out as an example, but it's certainly possible even if all the cores are not on the same package. Take 8 physical CPUs with 16 cores each, for example.

    Just rampant speculation, but it is certainly possible.

  2. Functional programming by Cthefuture · · Score: 3, Interesting

    Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.

    The main problem I see is that there is a lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang, for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems, like horrible syntax (O'Caml, Lisp) or just a general lack of refinement. Arguably Haskell comes the closest, but like Java it suffers from a large and complicated backend support requirement.

    Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.

    --
    The ratio of people to cake is too big
    1. Re:Functional programming by shutdown+-p+now · · Score: 2, Interesting

      Well, speaking of game development, I hope you'll excuse me if I hold the opinion of that guy Tim Sweeney, you know, the one behind Unreal, higher than yours? 'Cause he seems to disagree with you pretty strongly on many things, threading issues among them. Tools (which is what languages are) are key to solving this problem, and a lot of it does come from academia, just as all the things heavily used in the industry today (like OOP) did.

  3. What?! by eldavojohn · · Score: 4, Interesting

    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers.
    I'm a developer. I may not be the greatest one but I enjoy it. This declaration baffles me.

    You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?

    My point is that there have been a few applications I've written that required a multi-threaded solution. Perhaps this API would have made my life easier, but I doubt it, as I pretty much had to structure each thread by hand. There are also frameworks and graphical libraries that use multi-threading, which the scheduler has taken care of in the past. Hurray for multi-core if you use those.

    A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is an unneeded layer of abstraction, or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm leaving the user the other cores to do whatever he wants with. Let the scheduler work with the futuristic hardware and sort that crap out.

    Also, not everyone is multi-core already. Take us into consideration please!
    --
    My work here is dung.
  4. Yep, concurrency is a problem, not a solution! by argent · · Score: 2, Interesting

    100% agree. Concurrency is a problem, not a solution, and it needs to be abstracted out early if you need it at all.

  5. Re:Huh? by dreamchaser · · Score: 3, Interesting

    Having written more than my share of threaded apps, I agree 100%. I still haven't looked into this more, but it's probably a C++ class library that abstracts the creation and management of threads. Too many threads thrash the processor nicely in many cases, so unless they have some magic behind the scenes managing the number of threads vs. cores, this is just a hyped-up multi-threading library.

    Fsck the chickens...show me what this does with a real game or a real world app that lends itself to highly parallel operations, then demo it on a quad quad core Xenon.

  6. Toy Supercomputer by Doc+Ruby · · Score: 3, Interesting

    The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is basically a 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% of the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in those 2K instructions, which instead need more machines for more IO.

    The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use the power it's got, a huge amount of complex, heterogeneous IO management is necessary.

    It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?

    --
    make install -not war

  7. Re:Huh? by Gr8Apes · · Score: 3, Interesting

    The chicken scenario described removed any curiosity I had about looking into the library further. Why? Because it's very similar to the Java 101 bouncing ball thread demo (one thread per ball), which is used to show first-time, would-be multi-threaded programmers why one thread per ball doesn't scale.

    --
    The cesspool just got a check and balance.
  8. Relativity by stratjakt · · Score: 2, Interesting

    However, multi-threaded development has been notoriously hard to do

    Only at first. Once you wrap your head around it, it becomes second nature.

    To a newbie, recursion is hard to do. To somebody who's been writing functional FORTRAN for 25 years, object oriented is hard to do.

    It's just another way of thinking about problems. The real bitch is having the toolkits and thread safe libraries at your disposal.

    --
    I don't need no instructions to know how to rock!!!!
  9. Active Objects by lefticus · · Score: 2, Interesting

    I'm not sure what techniques the developer is using, as the um, "article" is a little light on details (unless I missed something). But the concept of Active Objects (a trivialized way of using threads) has been around for a while, with generic implementations of them rapidly becoming more mainstream. In the past week there has been much discussion about active objects and "futures" on the boost mailing list, and it is likely that both will become part of boost shortly. To put it simply, an active object is an object which has its own threaded message queue, so it is asynchronous from the rest of the system, and a future is a return value from an asynchronous method call, a "future" value. These techniques are quite reasonable today because of concepts like fibers and the NPTL.

    And of course, a shameless plug for my active objects implementation (bsd style license). Actually, that page also does a decent job of demonstrating the concepts.
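
    As a rough illustration of the two concepts described above, here is a minimal sketch in standard C++ (C++11 or later). It is not the poster's implementation and not the Boost proposal: the class name ActiveObject and its send() method are hypothetical, chosen only for this example. The object owns one worker thread that drains a private message queue, and every call returns a std::future for the eventual result.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical ActiveObject: one private worker thread drains a message queue,
// so callers never block on the work itself, only on the returned future.
class ActiveObject {
public:
    ActiveObject() : worker_([this] { run(); }) {}

    ~ActiveObject() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker finishes any queued messages first
    }

    // Enqueue a call; the future becomes ready once the worker has run it.
    template <typename F>
    auto send(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down and nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run the message outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

    A caller would write something like `auto f = obj.send([] { return 6 * 7; });` and later call `f.get()`, which blocks only if the worker has not reached that message yet.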

  10. Re:Don't Bother by Retric · · Score: 2, Interesting

    You must not work with network apps all that much.

    Think of the most basic email app possible. Now, when a user presses "send mail", would you create a new fork(), try to micro-manage the remote connection in the thread that handles the GUI, or force the user to wait around?

    Next think about video where you have a resource intensive task AND you still want a highly responsive GUI.

    Granted, if all you ever work with is simple biz apps with one user, you have a point, but I think your 99.99% estimate says more about the work you do than about programming in general, because threads can often simplify demanding applications.

  11. Re:Huh? by grumbel · · Score: 4, Interesting

    ### That's inherently not scalable.

    Not scalable? I beg to differ. Thousands of threads for sure scale a lot better than just two or four or whatever, since with thousands you don't really have an upper limit on how many CPUs you can throw at the problem. The real issue with threads is that OS threads are extremely slow, so you can't have thousands of threads or your machine would slow to a crawl. Threads are also painful to work with, since the languages just aren't up to the task.

    However, for both these issues there exist solutions, namely Erlang. Using user-level threads, there are no upper limits, and you really can have each chicken have its own thread without a problem; the language is also built from the ground up to work nicely with threads.

    Now I haven't yet seen the talk, bittorrent still busy downloading it, but I seriously doubt that it will just be yet-another-simple-wrapper class.

  12. CSP Occam and Transputers by Anonymous Coward · · Score: 3, Interesting

    The Communicating Sequential Processes style of programming allows for many simple, lightweight threads that communicate over channels rather than through monitor-based thread synchronization.

    The OCCAM language implemented this style of processing, and the Transputer chip implemented the fast context-switching hardware that OCCAM could run on.

    This was all done back in the 1980s.

    I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.
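
    To make the channel idea concrete, here is a minimal sketch in standard C++ rather than OCCAM or the Java CSP API. The Channel class and the sum_over_channel() function are hypothetical names for this example. One assumption to flag loudly: OCCAM channels are unbuffered rendezvous points, while this sketch uses a simple unbounded queue, so it shows only the "communicate over a channel instead of sharing state" style, not the exact semantics.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical buffered channel: the only thing the two threads share.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signal that no more values will be sent.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a value arrives; returns false once closed and drained.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// A producer process sends 1..n; a consumer process sums what it receives.
inline int sum_over_channel(int n) {
    Channel<int> ch;
    int total = 0;  // touched only by the consumer until both threads are joined
    std::thread producer([&ch, n] {
        for (int i = 1; i <= n; ++i) ch.send(i);
        ch.close();
    });
    std::thread consumer([&ch, &total] {
        int v = 0;
        while (ch.receive(v)) total += v;
    });
    producer.join();
    consumer.join();
    return total;
}
```

    Neither thread body takes a lock or touches the other's variables; all synchronization is hidden inside the channel, which is the point of the style.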

  13. Transactional Memory by omnirealm · · Score: 3, Interesting

    For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.
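
    Standard C++ has no transactional memory, so the following is only a sketch of the optimistic pattern that transactions rely on: read a snapshot, compute privately, commit only if nothing changed in the meantime, otherwise retry. It uses a compare-and-swap on a single std::atomic, whereas real transactional memory covers many memory locations at once. The names atomically() and run_deposits() are hypothetical, made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: apply `update` to `cell` as one all-or-nothing step.
// No lock is ever held, so there is nothing to deadlock on.
template <typename F>
int atomically(std::atomic<int>& cell, F update) {
    int seen = cell.load();
    for (;;) {
        int wanted = update(seen);  // compute on a private snapshot
        // Commit only if nobody changed the cell since we read it.
        // On failure, `seen` is refreshed with the current value and we retry.
        if (cell.compare_exchange_weak(seen, wanted)) return wanted;
    }
}

// Several threads deposit concurrently; no deposit is lost.
inline int run_deposits(int threads, int deposits_each) {
    std::atomic<int> balance{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&balance, deposits_each] {
            for (int i = 0; i < deposits_each; ++i)
                atomically(balance, [](int b) { return b + 1; });
        });
    }
    for (auto& th : pool) th.join();
    return balance.load();
}
```

    The update function may run more than once if a commit fails, so, as with real transactions, it should have no side effects of its own.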

    --
    An unjust law is no law at all. - St. Augustine
  14. Re:Bah humbug by tuskentower · · Score: 2, Interesting

    Huh? Threads and processes are two different animals.
    Since you need a reason, here's one: it's called concurrency. With processes I have to consume finite system resources to handle concurrency issues, or roll my own, which is called reinventing the wheel (aka a waste of time). Thread libraries will do this for me.

  15. Re:Bah humbug by gbjbaanb · · Score: 4, Interesting

    Unfortunately you say processes have their own memory protection which is better than threads that have to do their own synchronisation when accessing shared memory, but then go on about process-based shared memory needing its own additional protection.

    If you need concurrency in your apps, there isn't that much between threads and processes. However, if you need interprocess-communication then you are far better off with threads, they are significantly faster wrt locking than processes as all process-based locks must be done at the OS level, using shared (and finite) system resources. Threads can just use a critical section and have done with it, almost no overhead.

    Threads are not more efficient at context switching than processes, the same procedure happens whether a thread is switched, or a process is (in fact, a process is really an app with 1 thread). However, as threads can share memory more efficiently, locking is often not needed as much so they appear to be more efficient.

    The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.

  16. Re:Huh? by joto · · Score: 2, Interesting

    Thousands of threads for sure scale a lot better than just two or four or whatever, since with thousands you don't really have an upper limit on how many CPUs you can throw at the problem.

    Yes, the upper limit is thousand(s)! Go directly to jail. Do not pass Go. Do not collect $200.

    Seriously, with companies already offering 4 cores per CPU, and promising to offer 16 cores in the near future, and Moore's law being as it is, you don't exactly have to be a visionary to predict that the future might bring a shitload(TM) of processor cores to somewhere in your vicinity. Note that a shitload(TM) is more than a few measly thousands. Oh, and before you start telling me that nobody needs a shitload(TM) of processor cores, remember that nobody needed more than 640K RAM either.

    The real issue with threads is that OS threads are extremely slow, so you can't have thousands of threads or your machine would slow to a crawl.

    OS threads aren't necessarily slow (I assume you mean switching between them). If this is at all true, it is an artificial limitation of current hardware/software combos that can be easily fixed (at least the fix is much easier than the work involved in creating shitload-core CPUs). Note that the cost of OS-threads, user-space threads, and OS processes vary wildly among different systems already. But shared memory really needs to go. It just doesn't scale.

    However for both these issues there exist solutions, namely Erlang, using user-level threads

    Last time I checked, Erlang used only user-space threads, meaning that even if you had a shitload(TM) of cores, a given Erlang program would only use one of them. Erlang focuses on modelling, not performance. I suspect there to be good ideas in Erlang, but it's not going to be the system programming language of the next century.

  17. Download and play with QtConcurrent today by IceFox · · Score: 2, Interesting

    A project that you can download and play with today is Trolltech's QtConcurrent. Given a task it will automatically manage creating threads and distributing the task among your cores.

    From the project page:

    The classes and functions available in the Qt Concurrent package allow you to write multi-threaded applications without having to use the basic threading synchronization primitives such as mutexes and wait conditions. This makes it easier to reason about and test parallel programs to make sure that they are correct.
    The Qt Concurrent components manage the threads they use automatically. Each application has a global thread counter, which limits the maximum number of threads used at the same time. The maximum is scaled according to the number of CPU cores on the system at runtime. This means that programs written with Qt Concurrent today will continue to scale when deployed on many-core systems in the future.

    Very cool.
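
    The idea in the quoted text, a thread count capped by the number of cores rather than by the number of work items, can be sketched in plain standard C++. This is not the QtConcurrent API: parallel_for_each() is a hypothetical helper written for this example, and it assumes the callback does not throw. A fixed set of workers repeatedly claims the next unprocessed item from a shared atomic index, so a thousand chickens still use only as many threads as there are cores.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: apply `fn` to every item using at most one thread per core.
// Assumes `fn` does not throw and that calls on different items are independent.
template <typename T, typename F>
void parallel_for_each(std::vector<T>& items, F fn) {
    unsigned cores = std::thread::hardware_concurrency();  // may be 0 if unknown
    std::size_t workers = std::min<std::size_t>(cores ? cores : 1, items.size());

    std::atomic<std::size_t> next{0};  // index of the next unclaimed item
    auto work = [&] {
        for (std::size_t i = next.fetch_add(1); i < items.size(); i = next.fetch_add(1))
            fn(items[i]);
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 1; w < workers; ++w) pool.emplace_back(work);
    work();  // the calling thread takes part instead of idling
    for (auto& t : pool) t.join();
}
```

    Because workers pull items on demand, a slow item delays only the worker that claimed it, and the same code runs unchanged on a single-core machine, where it degrades to a plain loop.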

    --
    Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
  18. Inmos had the elegant solution by Tjp($)pjT · · Score: 2, Interesting

    Inmos Transputers C language development had an elegant solution. It should be migrated to mainstream C and C++ in my opinion:

    parallel
    {/* execute these statements in parallel if possible */
    statement1;
    statement2;
    ...
    statementn;
    }

    sequential
    {/* execute these statements in order as written */
    statement_1;
    statement_2;
    ...
    statement_n;
    }
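
    Mainstream C++ has no parallel block, but a rough library-level approximation of the two-statement case can be sketched with std::async. The helper name parallel() is hypothetical, and unlike the Inmos construct it always starts a second thread rather than running in parallel only "if possible".

```cpp
#include <future>
#include <utility>

// Hypothetical helper: run two independent callables concurrently and
// return only when both have finished, like a two-statement parallel block.
template <typename A, typename B>
void parallel(A a, B b) {
    // First statement runs on another thread.
    auto other = std::async(std::launch::async, std::move(a));
    b();          // second statement runs on the calling thread
    other.get();  // wait for the first; rethrows any exception it raised
}
```

    A call such as `parallel([&] { x = f(); }, [&] { y = g(); });` then stands in for the block above. As with the original construct, it is up to the programmer to make sure the two statements do not touch the same data.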
    --
    - Tjp

    I am in wallow with my inner money grubbing capitalistic pig. ... Oink!

  19. Re:Huh? by Anonymous Coward · · Score: 1, Interesting

    Actually, Haskell has both threads and mutable state. However unlike Erlang, it also has a strong static type system which can isolate stateful bits from purely functional bits. I.e. you still get the benefits of purity (a function is always safe to evaluate on any thread in any order), while having the option to use e.g. shared state concurrency where it makes sense.

    Haskell threads are lightweight as well (and get distributed across the number of cores you specify at startup), and they can also use software transactional memory (which eliminates the need for locks and other non-composable concurrency abstractions, with a very simple programming model). Plus, Haskell is compiled, unlike Erlang, and will typically perform better as a result.

    Haskell is the best language I currently know of for concurrency and parallelism. There's room for improvement though. Perhaps I should say "least broken" rather than "best" to emphasize how pathetically late to the game we on the software side are on this issue. There's tons more to do, but I'd recommend everyone interested in these issues to take a gander at Haskell (specifically STM).