Multi-Threaded Programming Without the Pain
holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."
99.99% of the time, multi-threaded programming is not needed, and can actually *slow* things down as threads block each other on mutexes, killing performance. It takes a certain amount of time to acquire a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block and unblock the other.
Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner? Aren't you referring to poor design/algorithms rather than anything specifically to do with multi-threading? Saying mutexes are slow and should be avoided is like saying disk I/O is slow and should be avoided. While true, if my app needs to deal with storage, then that's what it needs to do.
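One common answer to the question above is to coordinate less often rather than to make each lock faster. Here is a minimal Python sketch of that idea; the function names `count_contended` and `count_batched` are made up for illustration, and the example only shows the shape of the two designs, not a benchmark.

```python
import threading


def _run_threads(worker, n_threads):
    """Start n_threads copies of worker and wait for all of them."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_contended(n_threads, n_items):
    """Every increment takes the shared lock: correct, but heavily contended."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_items):
            with lock:          # lock acquired n_items times per thread
                total += 1

    _run_threads(worker, n_threads)
    return total


def count_batched(n_threads, n_items):
    """Each thread works on private state and takes the lock once to merge."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        local = 0
        for _ in range(n_items):
            local += 1          # no shared state touched in the hot loop
        with lock:              # lock acquired once per thread
            total += local

    _run_threads(worker, n_threads)
    return total
```

Both versions return the same total; the second just touches the mutex once per thread instead of once per item, which is the kind of design change the reply is hinting at.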
Also, multi-core CPUs have made using Windows bearable under load, because current single-threaded processes leave one spare core available for the OS to use when any app decides to eat all the CPU!
Actually multi-core has made it cheaper. Multi-processor boxen have been around forever and those who've used them have been enjoying the benefits for a gazillion years. Plus your statement is misleading: very few major apps are single-threaded, the OS itself has a ton of stuff going on in the background, and there are daemons/services running all the time. There is no such thing as a simple "a single-threaded app leaves the other processor/core free for OS use"; all cores are constantly being used by all sorts of crap unless the OS is configured to actually force the above to happen (processor affinity). In that case even a multi-threaded app can be forced onto a single processor/core.
No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.
On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.
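To make the distinction concrete, here is a small Python sketch under my own assumptions (the names `hash_chain` and `hash_each` are hypothetical). The first function is ordered by construction, because each round consumes the previous round's output; the second has no dependencies between items, so the work can be handed to a pool in any order.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_chain(seed: bytes, rounds: int) -> bytes:
    """Inherently sequential: round i needs the output of round i-1."""
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def hash_each(blocks, workers=4):
    """Independent work: no block depends on any other block."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order they ran in
        return list(pool.map(lambda b: hashlib.sha256(b).hexdigest(), blocks))
```

No amount of threading helps `hash_chain`, while `hash_each` splits cleanly; that is the property of the processing, not of the programmer.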
OpenMP is implemented in GCC 4.2 (I think; I've never used it in GCC).
"For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "
Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to process the threads,' but that isn't nearly as impressive sounding.
Does anyone know if there is progress being made on this?
The GPUs will ship with C compilers soon enough. They already support limited forms of C. Actually, in time we will see hybrid CPUs (the Cell being a first example) in which units capable of massive amounts of parallel math operations are stacked in alongside some of your CPU cores. As the number of cores grows, room is made for specialized processors where that makes sense in the market.
This stuff is an outgrowth of the Sh work done at Waterloo U. Anyway, the idea is that the declarations in your code are replaced. The new types redefine standard operators to generate code for a parallel machine (say, an nVidia card, or a PS/3 cell).
The code so generated can be run immediately, or deferred (note: it's been a while since I've looked at Sh, so I am being vague).
I didn't think that this was a GENERAL multi-threading solution; more a way to easily generate code for the parallel machines that are becoming available.
Pthreads has been out for a while. It is a POSIX standard with open-source implementations, and runs on Linux, Windows, and Mac(?).
Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.
The real bitch is when you have a bug, because it is not as easily reproducible as with any other programming method.
So no, it's not only 'another way of thinking'.
And good luck trying to 'extend' multithreaded stuff.
Multithreading should only be used on very special occasions where it is really needed.
That is hardly ever the case in most end-user applications.
"Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are daemons/services running all the time."
Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.
What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.
There are a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind is providing is something else, something more like a SIMD model or vector computations. It solves things like elementwise operations on large arrays in an efficient manner using whatever parallel computing resources are available. It's a language with semantics that don't require complicated synchronisation, because you're basically telling the compiler which operations are independent and then it can go off and compute them in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.
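A rough Python sketch of that model, for anyone who wants to see why no synchronisation is needed. This is not RapidMind's API; `elementwise` and its parameters are names I made up, and in CPython the global interpreter lock means threads will not actually speed up a CPU-bound kernel, so treat it purely as an illustration of the structure.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def elementwise(kernel, a, b, workers=None):
    """Compute out[i] = kernel(a[i], b[i]); every output slot has one writer."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [None] * n
    workers = workers or os.cpu_count() or 1
    chunk = max(1, -(-n // workers))  # ceiling division, at least 1

    def run(start):
        # This worker owns out[start:start + chunk]. No other worker reads
        # or writes that range, so no lock is required.
        for i in range(start, min(start + chunk, n)):
            out[i] = kernel(a[i], b[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any worker exception
        list(pool.map(run, range(0, n, chunk)))
    return out
```

The caller only states the per-element operation, for example `elementwise(lambda x, y: x * y + 1, xs, ys)`; how the index range is split across workers is left to the runtime, which is the part a system like the one described can map to a GPU or other parallel hardware.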
Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial. But OpenMP is very nice. I have a write up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/.
Serious advantages are that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++.
Current OpenMP implementations favor SMP machines, but one can go even further with the Intel OpenMP for clusters solution. I have not tried it myself yet, but I understand that it makes the issue of non-shared memory across the cluster machines transparent.
As in all cases, YMMV. But if your code is amenable to parallelizing, OpenMP is a pretty straightforward way to go.
First, last time I ran the ball test just to see how processors had improved in their capabilities to run code, I got to over 2K threads in a single JVM before significant degradation occurred and then it occurred rapidly.
Using the threadpool concept, however, you can tune the size of the threadpool using performance metrics from its threads until you find the optimum size, after which you can place however many objects on the pool you'd like. Generally, the right size depends on the work the threads have to do. If there is no I/O blocking, I've found that 2-3 threads per CPU with moderate CPU time work units will load it to 100% (read "moderate CPU time work units" as work units that take on the order of 100-1000 ms to complete). If you start adding in any type of I/O blocking, including large amounts of memory access, then that number goes up. A DB retriever system wound up running 64 threads for my particular workload, due primarily to the lag involved in the synchronous calls made to the DB. I could have tuned that further by using future tasks (a Doug Lea addition to JDK 1.5, also available in his previous concurrency library) and reducing the number of threads, but my particular case didn't suffer any negative effects from running 64 threads, so we left it at that. This particular DB access module ran across 64 systems (64*64 threads) serving roughly 35K concurrent customers.
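The tuning loop described above can be sketched roughly like this. The original system was Java; this is a Python stand-in, `fake_db_call` is a hypothetical placeholder that just sleeps to imitate a blocking query, and the candidate sizes are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_db_call(key, latency=0.01):
    """Stand-in for a synchronous DB query: the thread just waits."""
    time.sleep(latency)
    return key * 2


def run_batch(keys, pool_size):
    """Run all keys through a fixed-size pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(fake_db_call, keys))
    return results, time.perf_counter() - start


def pick_pool_size(keys, candidates):
    """Time each candidate size and keep the fastest (crude metric-driven tuning)."""
    timings = {size: run_batch(keys, size)[1] for size in candidates}
    return min(timings, key=timings.get), timings
```

Because the work units spend their time blocked rather than computing, the measured optimum tends to sit well above the core count, which matches the experience of ending up at 64 threads for DB-bound work.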
I haven't run Erlang, so can't comment. I have heard nice things about it though, and I'm curious about it. One day I'll have enough time to play with it.
The main problem I see is that there is lack of focus in the functional arena.
Whoa whoa whoa! You may not like Erlang's implementation, but you can hardly attribute it to a lack of focus. The whole language was built with concurrency in mind. Heck, the concurrency even has built-in network awareness. And Erlang's been multi-core since last May.
Erlang goes multi-core
Yeah, that doesn't say anything about your VM worries. I don't have those, though. Seamless multi-threading and language paradigms designed for concurrency more than make up for the VM performance hit, imo. When I have to write non-trivial concurrent systems, I reach for the language that already has the plumbing excellently implemented. I'm sure it's better done than anything I could implement myself, and since the system is concurrent, cheap hardware is easily added to improve performance.
Man, this is the second time this week I've had to stick up for Erlang around here.
Yeah, they're looking ahead too eagerly. That's what academics do.
Let's not forget that Intel and IBM both recently found a manufacturing process to keep Moore's law going for the next several years. Most people in 2006 thought we had hit a wall, and that the multicore revolution was inevitably under way, but that just might not be true anymore. That said, it is always nice to have at least a few cores available in your system.
At the same time, AMD's Fusion strategy looks pretty interesting. I really wonder what's going to become of that.
Nope, still the same. When the OS switches processes, it has to flush the TLB, which is the cache for virtual memory address lookups.
This and the reduced startup time are the most compelling reasons to use threads instead of processes on a single core.
However, on a large number of cores, things aren't so clear-cut, since if you have as many cores as active processes, you're not doing the context switching as much, and the benefit of using threading to reduce cache flushes isn't so clear. You'd still benefit from the quick startup of threads, so for things like a highly concurrent web server that creates a thread per user, threads may still be a better solution.
Interestingly, the much maligned cooperative threads (user-space) are the fastest of all since the programmer can control when the context switch happens. However, if there's blocking or an infinite loop, the whole application will hang. You have to use asynchronous I/O and make sure no thread runs for too long.
Like most things, it's a trade-off between protection from various mistakes and errors vs. speed and control. Processes give you the most protection with the greatest amount of overhead, while user-level threads give you the best performance, but only if you design everything correctly.
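For anyone who hasn't seen cooperative (user-space) threads, here is a toy Python version using generators. The names `task` and `run_round_robin` are mine, and this is only a sketch of the scheduling idea, not how any particular threading library implements it.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then gives up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                    # the only point where a switch can happen


def run_round_robin(tasks):
    """Minimal user-space scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # run until the task's next yield
        except StopIteration:
            continue             # task finished; drop it
        ready.append(current)    # otherwise requeue it at the back


if __name__ == "__main__":
    log = []
    run_round_robin([task("a", 2, log), task("b", 3, log)])
    print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The switch is just a function return, which is why it is cheap, and the weakness is visible too: a task that loops without reaching `yield` never hands control back, so everything else stalls.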
LabVIEW. http://www.ni.com/labview.
You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:
http://en.wikibooks.org/wiki/Ada_Programming/Task
Ada's multithreading is not only without the pain but great fun!
Martin
...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.
Find a short intro here:
http://en.wikibooks.org/wiki/Ada_Programming/Task
Martin
- Software transactional memory
- Erlang
- Fortress
As it goes, when it comes to multithreading, the model used by C++, Java and similar languages is rapidly becoming outdated.
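As a rough illustration of the share-nothing, message-passing style that Erlang is built around, here is a Python imitation. It is not Erlang and not a real actor library; `counter_actor` and `run_actor_demo` are hypothetical names, and the queue still synchronizes internally. The point is only that the application code holds no shared mutable state and no explicit locks.

```python
import queue
import threading


def counter_actor(mailbox, replies):
    """Owns its state privately; the only way to affect it is to send a message."""
    count = 0
    while True:
        msg = mailbox.get()
        if msg == "stop":
            replies.put(count)   # report the final state and exit
            return
        count += msg             # no lock: only this thread ever sees count


def run_actor_demo(n_senders, n_messages):
    """Have several sender threads message one actor; return its final count."""
    mailbox, replies = queue.Queue(), queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox, replies))
    actor.start()

    def sender():
        for _ in range(n_messages):
            mailbox.put(1)

    senders = [threading.Thread(target=sender) for _ in range(n_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    mailbox.put("stop")          # queued after every sender has finished
    actor.join()
    return replies.get()
```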