Slashdot Mirror


Multi-Threaded Programming Without the Pain

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

9 of 327 comments (clear)

  1. Functional programming by Cthefuture · · Score: 3, Interesting

    Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.

    The main problem I see is that there is lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.

    Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.

    --
    The ratio of people to cake is too big
  2. What?! by eldavojohn · · Score: 4, Interesting

    Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers.
    I'm a developer. I may not be the greatest one but I enjoy it. This declaration baffles me.

    You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?

    My point is that there have been a few applications I've written that require a multi-threaded solution. Perhaps this API would have made my life easier but I doubt it as I had to pretty much structure by hand each thread. There are frameworks, graphical libraries and that also use multi-threading that the scheduler has taken care of in the past. Hurray for multi-core if you use those.

    A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.

    Also, not everyone is multi-core already. Take use into consideration please!
    --
    My work here is dung.
  3. Re:Huh? by dreamchaser · · Score: 3, Interesting

    Having written more than my share of threaded apps I agree 100%. I still haven't looked into this more, but it's probably a C++ class library that abstracts the creation and management of threads. Too many threads thrashes the processor nicely in many cases, so unless they have some magic behind the scenes managing the number of threads vs. cores then this is just a hyped up multi threading library.

    Fsck the chickens...show me what this does with a real game or a real world app that lends itself to highly parallel operations, then demo it on a quad quad core Xenon.

  4. Toy Supercomputer by Doc+Ruby · · Score: 3, Interesting

    The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is a basically 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in that 2K instructions, which instead need more machines for more IO.

    The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use what it's got, a huge amount of complex, heterogeneous IO management is necessary to use its power.

    It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?

    --

    --
    make install -not war

  5. Re:Huh? by Gr8Apes · · Score: 3, Interesting

    The chicken scenario described removed any curiosity I had about looking into the library further. Why? Because it's very similar to the Java 101 bouncing ball thread demo (one thread per ball) which is used to show why 1 thread per ball doesn't scale to first time would be multi-threaded programmers.

    --
    The cesspool just got a check and balance.
  6. Re:Huh? by grumbel · · Score: 4, Interesting

    ### That's inherently not scalable.

    Not scalable? I beg to differ. Thousands threads for sure scale are a lot better then when you just have two or four or whatever, since with thousands you don't really have an upper limit of how many CPU you want to throw at the problem. The real issue with threads is that OS threads are extremely slow, so you can't have thousands threads or your machine would go to a crawl. Threads also are painful to work with since the languages just aren't up to the task.

    However for both these issues there exist solutions, namely Erlang, using user-level threads there is no upper limits and you really can have each chicken have its own thread without a problem and the language is also build from the base up to work nicely with threads.

    Now I haven't yet seen the talk, bittorrent still busy downloading it, but I seriously doubt that it will just be yet-another-simple-wrapper class.

  7. CSP Occam and Transputers by Anonymous Coward · · Score: 3, Interesting

    The communicating Sequential Processes style of programming allows for many lightweight simple threads that communicate over channels rather that the monitor based thread synchronization.

    The OCCAM language implemented this style of processing and the Transputer chip implemented a fast context switching hardware that OCCAM could run on.

    This was all done back in the 1980s.

    I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.

  8. Transactional Memory by omnirealm · · Score: 3, Interesting

    For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.

    --
    An unjust law is no law at all. - St. Augustine
  9. Re:Bah humbug by gbjbaanb · · Score: 4, Interesting

    Unfortunately you say processes have their own memory protection which is better than threads that have to do their own synchronisation when accessing shared memory, but then go on about process-based shared memory needing its own additional protection.

    If you need concurrency in your apps, there isn't that much between threads and processes. However, if you need interprocess-communication then you are far better off with threads, they are significantly faster wrt locking than processes as all process-based locks must be done at the OS level, using shared (and finite) system resources. Threads can just use a critical section and have done with it, almost no overhead.

    Threads are not more efficient at context switching than processes, the same procedure happens whether a thread is switched, or a process is (in fact, a process is really an app with 1 thread). However, as threads can share memory more efficiently, locking is often not needed as much so they appear to be more efficient.

    The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.