Multi-Threaded Programming Without the Pain
holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."
The multi-threaded chicken or the multi-threaded egg?
--josh
Deadlock detected!
I didn't know the PS3 had thousands of cores ;)
I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?
Multithreaded development is commonplace in applications that need it. The places it's not common in are:
-- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.
-- places that have absolutely no need for it, which certainly includes the chicken demo. One core per chicken?? Seems more like the guy just discovered threads but hasn't quite grasped what they're for.
Whence? Hence. Whither? Thither.
Both, RapidMind and Peakstream are proprietary commercial solutions and those companies are trying to lock users into their particular framework. What we really need is the equivalent as true open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?
If by 'on the horizon' they mean 'possibly in the next ten years', then sure. I can see that happening. Quad cores are already here. If they double the number of cores every 18 months that means in 7.5 years we'll have 128 cores. I'm just throwing that out as an example, but it's certainly possible even if all the cores are not on the same package. Take 8 physical CPU's with 16 cores each for example.
Just rampant speculation, but it is certainly possible.
Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.
The main problem I see is that there is lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.
Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.
The ratio of people to cake is too big
You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?
My point is that there have been a few applications I've written that require a multi-threaded solution. Perhaps this API would have made my life easier but I doubt it as I had to pretty much structure by hand each thread. There are frameworks, graphical libraries and that also use multi-threading that the scheduler has taken care of in the past. Hurray for multi-core if you use those.
A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.
Also, not everyone is multi-core already. Take use into consideration please!
My work here is dung.
From the site:
Man thats some funny stuff. Wow that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.
Look guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that, *don't need to communicate very often.* Then this is the same kind of problem scalable websites like Google, MySpace, Hotmail and so on, have already, just without having to factor in the reliability issues. Finer grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom coded solution. But you don't so let it go. The main loop of your program will be just fine as a single threaded implementation, 1 processor will do, and farm the 10% code / 90 % heavy lifting out in big clean chunks to other processors. If you find yourself writing some bizzare multi-threaded message passing system so that you can have 100s of threads all modifying the same live object model at the same time -- you are fucked, just forget about it 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.
-- http://thegirlorthecar.com funny dating game for guys
Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
100% agree. Concurrency is a problem, not a solution, and it needs to be abstracted out early if you need it at all.
No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.
On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.
In Java, this is allowed because of performance issues. You can make it almost 100% thread safe (note I said almost) by synchronizing every method, but there's still some gotchas in the JDK.
Multi-threaded programming is a skill that comes with a level of understanding, much like students of mathematics must reach a level of understanding to comprehend Algebra, Calculus, Differential Equations, and Partial Differential Equations (yeah, that last one's a bear, especially when you apply it to various physical models) respectively. It's why you can be a wiz at adding or subtracting, but utterly fail at algebra, or a wiz at algebra, but never be able to differentiate or integrate a function, and so on.
Writing code is easy. Any moron can do it, as the Java and PHP hordes have shown. Writing good code is harder, designing good OO code is even harder, and designing and writing good multi-threaded code is yet a step beyond that. It's why there are so few well-written multi-threaded apps, and most of those are in server land.
The cesspool just got a check and balance.
The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is a basically 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in that 2K instructions, which instead need more machines for more IO.
The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use what it's got, a huge amount of complex, heterogeneous IO management is necessary to use its power.
It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?
--
make install -not war
"For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "
Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to processing the threads,' but that isn't nearly as impressive sounding.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
I browsed through the 411 MB ogg file, but could not find any chicks. Where are they?
However, multi-threaded development has been notoriously hard to do
Only at first, once you wrap your head around it it becomes second nature.
To a newbie, recursion is hard to do. To somebody who's been writing functional FORTRAN for 25 years, object oriented is hard to do.
It's just another way of thinking about problems. The real bitch is having the toolkits and thread safe libraries at your disposal.
I don't need no instructions to know how to rock!!!!
I'm not sure what techniques the developer is using as the um, "article" is a little light on details (unless I missed something) But the concept of Active Objects (a trivialized way of using threads) has been around for a while with generic implementations of them becoming more mainstream rapidly. In the past week there as been much discussion about active objects and "futures" on the boost mailing list and it is likely that both will become part of boost shortly. To put it simply, an active object is an object which has its own threaded message queue, so it is asynchronous from the rest of the system and a future is a return value from an asynchronous method call, a "future" value. These techniques are quite reasonable today because of concepts like fibers and the NPTL.
And of course, a shameless plug for my active objects implementation (bsd style license). Actually, that page also does a decent job of demonstrating the concepts.
I Do C++
This stuff is an outgrowth of the Sh work done at Waterloo U. Anyway, the idea is that the declarations in your code are replaced. The new types redefine standard operators to generate code for a parallel machine (say, an nVidia card, or a PS/3 cell).
The code so generated can be run immediately, or deferred (note -- its been a while since I've looked at Sh, so I am being vague).
I didn't think that this was a GENERAL multi-threading solution; more a way to easily generate code for the parallel machines that are coming available.
Just another "Cubible(sic) Joe" 2 17 3061
Hail Eris, full of mischief...
E pluribus sanguinem
First of all, I, and many others before me, have been writing multithreaded applications for years in the likes of Linux and UNIX. I have had to maintain multithreaded applications created by others. My collective experience tells me:
It is not trivial.
Let me repeat: It is not a trivial task. Even if you have libraries and an API which abstracts out the ugly stuff, you still have the problem of concurrency, proper locking, deadlocks, etc...
The majority of problems with using multithreaded programming come not from "ugly" parts of the OS/API layer, but from a misunderstanding of the problem. A few problems in computer science - particularly in the physical sciences - do benefit from multithreading. And it is easier to use threads when writing a game than just to execute all of the IO in one big loop (Hello DOS!). But for most applications, using threads is not only unnecessary, but overkill, and introduces the possibility of yet another class of bugs for which the application must be tested. Furthermore, as deadlock and race conditions are often timing related, they are the most difficult type of bug to find and fix. Finding and fixing this class of bugs is still somewhat of a black art in the industry, and is highly dependent on the skill and experience of the programmer.
In short, unless your system/application design cannot do without multithreaded programming, it is best not to use it. Even with a glossy API, you still cannot escape the fact that debugging a multithreaded application is an order of magnitude more difficult than a single threaded one. In any case, you shouldn't be using threads just because you can.
The society for a thought-free internet welcomes you.
Me, I was expecting 100 4MB movies files that you would have to play concurrently.
Pthreads has been out for a while. It is open source, and runs on Linux, Windows, and Mac(?).
Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.
The society for a thought-free internet welcomes you.
Good morning slashdot!
As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" was a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.
For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.
And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).
If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.
"Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are demons/services running all the time."
Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.
What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.
which has had easy-to-use multithreading constructs built right into the language for the past 25 years or so.
Sheesh, evil *and* a jerk. -- Jade
You must not work with network apps all that much.
Think of the most basic email app possible. Now when a user presses "send mail" would you create a new fork (), try and micro manage the remote connation in a thread that handles the GUI, or force the user to wait around?
Next think about video where you have a resource intensive task AND you still want a highly responsive GUI.
Granted if all you ever work with is simple biz apps with one user you have a point but I think your 99.99% estimate says more about the work you do than programming in general. Because threads can often simply demanding applications.
The communicating Sequential Processes style of programming allows for many lightweight simple threads that communicate over channels rather that the monitor based thread synchronization.
The OCCAM language implemented this style of processing and the Transputer chip implemented a fast context switching hardware that OCCAM could run on.
This was all done back in the 1980s.
I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.
For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.
An unjust law is no law at all. - St. Augustine
There's a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind is providing is something else, something more like a SIMD model or vector computations. It solves things like elementwise operations on large arrays in an efficient manner using whatever parallel computing resources are available. It's a language with a semantics that don't require complicated synchronisation because you're bascially telling the compiler which operations are independent and then it can go off and compute it in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.
The interactive way to Go -- http://www.playgo.to/iwtg/en/
Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial. But OpenMP is very nice. I have a write up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/.
Serious advantages are that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++.
Current OpenMP implementations favor SMP machines, but one can go even further with the Intel OpenMP for clusters solution. I have not tried it myself yet, but I understand that it makes the issue of non-shared memory across the cluster machines transparent.
As in all cases YMMV. But, if your code is amenable to parallelizing, OpenMP is a pretty straight forward way to go.
"Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers."
The marketplace wants and needs new technologies for more powerful processors. Multicore serves the needs of chip makers, not their customers. Making all software multi-threaded is trying to solve the wrong problem. It's going to result in lower-quality software without a significant increase in performance.
Yeah, they're looking ahead too eagerly. That's what academics do.
Let's not forget that Intel and IBM both recently found a manufacturing process to keep Moore's law going for the next several years. Most people in 2006 thought we hit a wall, and that the multicore revolution was inevitably under way, but that just might not be true anymore. That said, it is always nice to have at least a few cores in available in your system.
At the same time, AMD's Fusion strategy looks pretty interesting. I really wonder what's going to become of that.
Craft Beer Programming T-shirts
I'm sure the demonstration would've been a lot more difficult if he'd used philosophers instead of chickens. Thing is, chickens can't even hold chopsticks. A chicken just goes straight for the feed, so there's just one resource being acquired. It's still possible for a chicken to starve, but as chickens don't eat that much it's more likely that any shut-out chickens would simply go hungry for a while, and then get to eat before starving.
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
LabVIEW. http://www.ni.com/labview.
A project that you can download and play with today is Trolltech's QtConcurrent. Given a task it will automatically manage creating threads and distributing the task among your cores.
From the project page:
The classes and functions available in the Qt Concurrent package allows you to write multi-threaded applications without having to use the basic threading synchronization primitives such as mutexes and wait conditions. This makes it easier to reason about and test parallel programs to make sure that they are correct.
The Qt Concurrent components manage the threads they use automatically. Each application has a global thread counter, which limits the maximum number threads used at the same time. The maximum is scaled according to the number of CPU cores on the system at runtime. This means that programs written with Qt Concurrent today will continue to scale when deployed on many-core systems in the future.
Very cool.
Do you changes clothes while making the "chee-chee-cha-cha-choh" transformation sound?
parallel
{/* execute these statements in parallel if possible */
statement1;
statement2;
statementn;
}
sequential
{/* execute these statements in order as written */
statement_1;
statement_2;
statement_n;
}
- Tjp
I am in wallow with my inner money grubbing capitalistic pig. ... Oink!
You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:
i ng
http://en.wikibooks.org/wiki/Ada_Programming/Task
Ada's multi-threadeding is not only without the pain but great fun!
Martin
...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.
i ng
Find a short intro here:
http://en.wikibooks.org/wiki/Ada_Programming/Task
Martin