Multi-Threaded Programming Without the Pain

Posted by kdawson on Thursday March 22, 2007 @12:32AM from the chicken-run dept.

holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."

17 of 327 comments

Which Comes First? by jforest1 · 2007-03-22 00:35 · Score: 5, Funny

The multi-threaded chicken or the multi-threaded egg?

--josh
1. Re:Which Comes First? by Kjella · 2007-03-22 00:41 · Score: 4, Funny
  
  Since the egg thread is always converted to a chicken thread while the chicken thread spawns new subthreads, the egg came first. QED.
  
  --
  Live today, because you never know what tomorrow brings
2. Re:Which Comes First? by cronot · 2007-03-22 00:44 · Score: 5, Funny
  
  the fork()
Huh? by dreamchaser · 2007-03-22 00:44 · Score: 4, Insightful

I didn't know the PS3 had thousands of cores ;)

I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?
1. Re:Huh? by Gr8Apes · 2007-03-22 01:03 · Score: 4, Insightful
  
  Even so, this is a "bad" implementation. There's absolutely no reason for there to be 1 thread per chicken. That's inherently not scalable. What you really want is a pool with an optimum number of threads for the number of cores, handling work units (chickens). This will scale much higher than the 1 thread per object model discussed in this topic.
  
  Oh, and there's no such thing as "easy" multi-threading. Hell, the average programmer can't even grasp OO, so what makes them think they can grasp threading which has many many more aspects to it?
  
  --
  The cesspool just got a check and balance.
2. Re:Huh? by grumbel · 2007-03-22 02:22 · Score: 4, Interesting
  
  > That's inherently not scalable.
  
  Not scalable? I beg to differ. Thousands of threads for sure scale a lot better than just two or four or whatever, since with thousands you don't really have an upper limit on how many CPUs you can throw at the problem. The real issue with threads is that OS threads are extremely slow, so you can't have thousands of threads or your machine would slow to a crawl. Threads are also painful to work with, since the languages just aren't up to the task.
  
  However, solutions exist for both of these issues, namely Erlang. With user-level threads there is no upper limit, so you really can give each chicken its own thread without a problem, and the language is also built from the ground up to work nicely with threads.
  
  Now I haven't yet seen the talk, bittorrent still busy downloading it, but I seriously doubt that it will just be yet-another-simple-wrapper class.
3. Re:Huh? by Gr8Apes · 2007-03-22 02:47 · Score: 5, Informative
  
  First, last time I ran the ball test just to see how processors had improved in their capabilities to run code, I got to over 2K threads in a single JVM before significant degradation occurred and then it occurred rapidly.
  
  Using the threadpool concept, however, you can tune the size of the threadpool via performance metrics from the threads in the threadpool for the optimum size of threadpool, after which you can place however many objects on the pool you'd like. Generally, this is based on the work the thread has to do. If there is no I/O blocking, I've found that 2-3 threads per CPU with moderate CPU time work units will load it to 100% (read moderate CPU time work units as work units that take on the order of 100-1000 ms to complete). If you start adding in any type of I/O blocking, including large amounts of memory access, then that number goes up. A DB retriever system wound up running 64 threads for my particular work load due primarily to the lag involved in the synchronous calls made to the DB. I could have tuned that further using future tasks and reducing the number of threads (a Doug Lea addition to the JDK 1.5 and also available in his previous concurrency library) but my particular case didn't have any negative effects by running 64 threads, so we left it at that. This particular DB access module ran across 64 systems (64*64 threads) serving roughly 35K concurrent customers.
  
  I haven't run Erlang, so can't comment. I have heard nice things about it though, and I'm curious about it. One day I'll have enough time to play with it.
  
  --
  The cesspool just got a check and balance.
4. Re:Huh? by Procyon101 · 2007-03-22 07:08 · Score: 4, Insightful
  
  You are using JVM threads. Most massively scalable threaded languages, like Erlang, use green threads. A green thread acts like a thread from the standpoint of the programmer, but carries little or no context switch cost (because it's not really a thread). The underlying platform then load balances these green threads across the actual hardware in an optimal pool of true threads.
  
  What makes massive concurrency easy to grasp in these programming languages is one of 2 things:
  
  1) In Erlang and Termite (a Scheme dialect) there is no mutable state, and no globals. Every function is in essence a "service" that simply gets messages and then responds with replies. There is no need to think about locking in such a system, and there are very easy message-passing idioms for doing what you would normally do with mutable object orientation.
  
  2) In languages like Haskell, there is no concept of a "thread" at all... not even a single thread. There is no concept of "ordering". Things are defined as they are in mathematics.. as relationships between functions and variables. There is no mutable state allowed. This strictness allows the compiler to make very deep conclusions as to what can be parallelized. The compiler can then load balance under the covers across any number of procs without exposing any issues of concurrency to the user at all.
  
  So yes, in Java (and OO in general), concurrency is very, very difficult. In other paradigms though it can be trivial, or even transparent.
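
The green-thread, message-passing style described in this thread can be pictured with a rough Python sketch. This is not Erlang, and the names `chicken` and `run_flock` and the message format are made up for the example. Each chicken is an `asyncio` task that owns its own state and is reachable only through its mailbox, so no locks are needed. Note that `asyncio` schedules all tasks cooperatively on one OS thread, so this shows the programming model only, not the load balancing across cores that a runtime like Erlang's performs.

```python
import asyncio

async def chicken(mailbox, replies):
    """One lightweight task per chicken; its position is private state."""
    position = 0
    while True:
        message = await mailbox.get()
        if message == "stop":
            break
        kind, value = message
        if kind == "move":
            position += value
        elif kind == "where":
            # value carries the chicken id so the reply can be matched up
            await replies.put((value, position))

async def run_flock(count, steps):
    """Start `count` chicken tasks, move each one `steps` times, collect positions."""
    replies = asyncio.Queue()
    mailboxes = [asyncio.Queue() for _ in range(count)]
    tasks = [asyncio.create_task(chicken(box, replies)) for box in mailboxes]

    # Talk to the chickens only by posting messages to their mailboxes.
    for chicken_id, box in enumerate(mailboxes):
        for _ in range(steps):
            box.put_nowait(("move", chicken_id))
        box.put_nowait(("where", chicken_id))
        box.put_nowait("stop")

    await asyncio.gather(*tasks)

    positions = {}
    while not replies.empty():
        chicken_id, position = replies.get_nowait()
        positions[chicken_id] = position
    return positions
```

Calling `asyncio.run(run_flock(count=5000, steps=3))` starts five thousand such tasks, which is cheap precisely because none of them is an OS thread.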
RapidMind = vendor lock-in by Anonymous Coward · 2007-03-22 00:54 · Score: 5, Insightful

Both RapidMind and PeakStream are proprietary commercial solutions, and those companies are trying to lock users into their particular frameworks. What we really need is an equivalent, truly open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?
1. Re:RapidMind = vendor lock-in by Anonymous Coward · 2007-03-22 01:15 · Score: 4, Informative
  
  OpenMP is implemented in GCC 4.2 (I think, I've never used it in GCC).
What?! by eldavojohn · 2007-03-22 00:59 · Score: 4, Interesting

"Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers."

I'm a developer. I may not be the greatest one but I enjoy it. This declaration baffles me.

You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?

My point is that there have been a few applications I've written that required a multi-threaded solution. Perhaps this API would have made my life easier, but I doubt it, as I pretty much had to structure each thread by hand. There are frameworks and graphical libraries that also use multi-threading, which the scheduler has taken care of in the past. Hurray for multi-core if you use those.

A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is an unneeded layer of abstraction, or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do whatever he wants with the other cores. Let the scheduler work with the futuristic hardware and sort that crap out.

Also, not everyone is multi-core already. Take us into consideration please!

--
My work here is dung.
1. Re:What?! by zx75 · 2007-03-22 01:34 · Score: 4, Insightful
  
  I think you've missed the mark a little.
  
  I believe what he is saying is that if you're an application developer who is pushing the limits of what a single core is capable of in terms of performance, then you are going to see a decreasing rate of improvement and then stagnation, because the focus of hardware development is shifting away from more power in a single core to more power through more cores.
  At some point you will hit a wall, and for single-threaded applications you're going to reach a point where there isn't any more power to be had.
  
  Therefore if you want to tap that extra power that a multi-core processor has, you will by definition *need* to start multi-threaded programming. This isn't about people who are happy with the speed and power they already have; research is pointless if you already have everything you could possibly need. This is for the people who push the edge: at some point, if you need more, you will need to learn to multi-thread correctly.
  
  And a simpler way to do it, is gold in my books.
  
  *From a former University classmate of Stefanus*
  
  --
  This is not a sig.
what a joke by acidrain · 2007-03-22 01:00 · Score: 5, Insightful
From the site:
- 1. Replace types: The developer replaces numerical types representing floating point numbers and integers with the equivalent RapidMind platform types.
- 2. Capture computations: While the user's application is running, sequences of numerical operations invoked by the user's application can be captured, recorded, and dynamically compiled to a program object by the RapidMind platform.
- 3. Stream execution: The RapidMind platform runtime is used for managed parallel execution of program objects on the target hardware platform, which can be a GPU, the Cell processor, or a multicore CPU.
Man, that's some funny stuff. Wow, that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.
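
For readers unfamiliar with the pattern, the three steps quoted from the site can be pictured with a toy Python sketch. This is not RapidMind's API: `Sym`, `capture` and `execute` are hypothetical names, and the "runtime" here is a plain serial loop where a real system would compile the recorded program and dispatch it to the target hardware.

```python
import operator

class Sym:
    """Step 1: a stand-in numeric type. Arithmetic on it records operations."""
    def __init__(self, op, args=()):
        self.op = op
        self.args = args

    def __add__(self, other):
        return Sym(operator.add, (self, lift(other)))

    def __radd__(self, other):
        return Sym(operator.add, (lift(other), self))

    def __sub__(self, other):
        return Sym(operator.sub, (self, lift(other)))

    def __rsub__(self, other):
        return Sym(operator.sub, (lift(other), self))

    def __mul__(self, other):
        return Sym(operator.mul, (self, lift(other)))

    def __rmul__(self, other):
        return Sym(operator.mul, (lift(other), self))

def lift(value):
    """Wrap plain numbers as constants so they can appear in a recording."""
    return value if isinstance(value, Sym) else Sym("const", (value,))

def capture(fn, arity):
    """Step 2: run fn once on symbolic inputs and keep the recorded expression."""
    inputs = [Sym("input", (index,)) for index in range(arity)]
    return fn(*inputs)

def evaluate(node, row):
    """Replay a recorded expression for one element of the input streams."""
    if node.op == "input":
        return row[node.args[0]]
    if node.op == "const":
        return node.args[0]
    left, right = node.args
    return node.op(evaluate(left, row), evaluate(right, row))

def execute(program, *streams):
    """Step 3: apply the captured program to every element (serially here)."""
    return [evaluate(program, row) for row in zip(*streams)]
```

For example, `step = capture(lambda pos, vel: pos + vel * 0.5, arity=2)` records the arithmetic once, and `execute(step, [0.0, 1.0], [2.0, 4.0])` then replays it over both input streams.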

Look guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that *don't need to communicate very often.* Then this is the same kind of problem that scalable websites like Google, MySpace, Hotmail and so on already have, just without having to factor in the reliability issues. Finer grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom coded solution. But you don't, so let it go. The main loop of your program will be just fine as a single threaded implementation, 1 processor will do, and farm the 10% code / 90% heavy lifting out in big clean chunks to other processors. If you find yourself writing some bizarre multi-threaded message passing system so that you can have 100s of threads all modifying the same live object model at the same time -- you are fucked, just forget about it 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.
--
-- http://thegirlorthecar.com funny dating game for guys
Re:Bah humbug by kcbrown · 2007-03-22 01:43 · Score: 4, Insightful

-- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.

And it's silly for it to be "on the way out".

Anyone remember the Amiga? It had a preemptive multitasking OS that lacked hardware memory protection because the hardware it was running on couldn't support it. And while the OS itself was very fast and efficient, the overall system was relatively crash-prone, because any memory-related programming error in any running application had a decent chance of taking down the system.

Fast forward to today. Every computer sold has hardware memory protection built-in. Anyone who doesn't know why that's a good thing needs to spend time on an Amiga.

And yet, despite that, threads are all the rage. Why? Because people have this idiotic belief that they're somehow "more efficient" than processes. Such people probably program about as well as they think, which is to say not very well. Threads are indeed more efficient at context switching than processes, but the real question is: does that really matter? In the vast majority of cases, it doesn't, because in the vast majority of cases multiple threads are being used to make the user interface responsive. There's no way a human being can tell the difference between a millisecond-level context switch time and a microsecond-level one.

On top of that, processes bring one critical advantage to the table that threads don't: memory protection. And for the same reason memory protection is important at the OS and hardware level, so too is it important at the process and thread level: it allows clean, protected separation of concern and greater overall application stability.

The vast, vast majority of applications that are multithreaded don't actually need the slight additional context switch performance advantage that threads bring to the table, but they very much need the memory protection facilities that processes bring to the table. Which is another way of saying that if your application needs concurrency, you're a fool if you blindly use threads instead of processes.

Even Windows supports fork() these days, with the POSIX subsystem (available, as far as I know, on any Windows 2000 and later system), so creating a clone of your current process is dirt simple even under Windows. End result: application authors have no good reason to use threads over processes unless they've actually done the math and can prove that their application really needs the slight performance advantage of threads more than the significant reliability advantage of processes.

As to the other reason for using threads, the sharing of memory, there's this really cool new technology out these days. Maybe you've heard of it. It's called "shared memory". It's only been available for 20 years or so. No wonder most people haven't heard of it. Being forced to explicitly declare what's shared and what isn't is a good thing, because it makes your program easier to maintain, easier to debug, and more reliable -- all at the same time.
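
As a minimal illustration of the "explicitly declare what's shared" point, here is a Unix-only Python sketch using `os.fork()` and an anonymous `mmap` region. The function name `sum_in_child` is made up for the example. Only the mapped bytes are shared between parent and child; everything else, including the `private_total` list, is copied at fork time and stays private to each process.

```python
import mmap
import os
import struct

def sum_in_child(values):
    """Fork a child that writes sum(values) into explicitly shared memory."""
    # Anonymous mapping; on Unix the default flag is MAP_SHARED, so these
    # 8 bytes remain visible to both processes after fork().
    shared = mmap.mmap(-1, 8)
    private_total = [0]  # ordinary memory: each process gets its own copy

    pid = os.fork()
    if pid == 0:
        # Child process: changes here are invisible to the parent,
        # except for what is written into the shared mapping.
        status = 1
        try:
            total = sum(values)
            private_total[0] = total
            shared[:8] = struct.pack("q", total)
            status = 0
        finally:
            os._exit(status)

    # Parent process: wait for the child, then read the shared bytes.
    os.waitpid(pid, 0)
    (result,) = struct.unpack("q", shared[:8])
    shared.close()
    return result, private_total[0]
```

In the parent, the first returned value is the sum computed by the child, while the second is still 0, because the child's write to `private_total` happened in its own copy of memory.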

The bottom line is this: if you need concurrency in your application, you should be using processes, not threads. If you insist on using threads, you'd better have a damned good reason for it, because the reliability implications of threads are hugely negative while the performance implications are modest at best.

--
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
Re:400MB download by aadvancedGIR · 2007-03-22 01:43 · Score: 5, Funny

Me, I was expecting 100 4MB movie files that you would have to play concurrently.
Comments from the presenter by sdt · 2007-03-22 02:01 · Score: 5, Insightful

Good morning slashdot!

As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" were a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.

For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.

And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).
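
A rough way to picture the data parallel model in plain Python (this is not the RapidMind platform, and `step_chicken` and `step_flock` are hypothetical names): the programmer writes a pure per-element function, and a helper splits the data across however many workers the machine offers. Because no element depends on another, the result is the same for any worker count. One caveat: CPython's global interpreter lock keeps threads from speeding up pure-Python arithmetic, so treat this as a sketch of the structure rather than a performance demo.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def step_chicken(chicken):
    """Pure per-element kernel: reads one chicken, returns a new one."""
    x, velocity = chicken
    return (x + velocity, velocity)

def step_part(part):
    """Apply the kernel to one contiguous slice of the flock."""
    return [step_chicken(chicken) for chicken in part]

def step_flock(flock, workers=None):
    """Advance every chicken one step using a pool sized to the machine."""
    workers = workers or os.cpu_count() or 1
    chunk = max(1, -(-len(flock) // workers))  # ceiling division
    parts = [flock[i:i + chunk] for i in range(0, len(flock), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stepped = list(pool.map(step_part, parts))  # map preserves order
    return [chicken for part in stepped for chicken in part]
```

The calling code never mentions a core count: `step_flock(flock)` picks one from `os.cpu_count()`, and passing `workers=1` or `workers=8` changes only how the work is divided.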

If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.
Re:Bah humbug by gbjbaanb · 2007-03-22 03:38 · Score: 4, Interesting

Unfortunately you say processes have their own memory protection which is better than threads that have to do their own synchronisation when accessing shared memory, but then go on about process-based shared memory needing its own additional protection.

If you need concurrency in your apps, there isn't that much between threads and processes. However, if you need interprocess-communication then you are far better off with threads, they are significantly faster wrt locking than processes as all process-based locks must be done at the OS level, using shared (and finite) system resources. Threads can just use a critical section and have done with it, almost no overhead.
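
To make the in-process locking concrete, here is a small Python sketch in which `threading.Lock` stands in for a critical section (it is not the Win32 `CRITICAL_SECTION` API, and `count_with_lock` is a made-up name). All threads update one shared counter, and the lock guarantees that only one of them is inside the update at a time.

```python
import threading

def count_with_lock(threads=8, increments=10_000):
    """Have several threads bump one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # critical section: one thread at a time
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter
```

With the lock in place the function always returns `threads * increments`. The shared counter needs no setup beyond being an ordinary variable, which is the convenience being argued for here.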

Threads are not more efficient at context switching than processes, the same procedure happens whether a thread is switched, or a process is (in fact, a process is really an app with 1 thread). However, as threads can share memory more efficiently, locking is often not needed as much so they appear to be more efficient.

The best argument for threads v processes is Apache. Personally, I agree with the Apache group that Apache 2 with its thread-based model is better. They should know.