Multi-Threaded Programming Without the Pain
holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."
The multi-threaded chicken or the multi-threaded egg?
--josh
Deadlock detected!
I didn't know the PS3 had thousands of cores ;)
I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?
Multithreaded development is commonplace in applications that need it. The places it's not common in are:
-- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.
-- places that have absolutely no need for it, which certainly includes the chicken demo. One core per chicken?? Seems more like the guy just discovered threads but hasn't quite grasped what they're for.
Whence? Hence. Whither? Thither.
Where is the abstract getting "hundreds of cores in desktops on the horizon" from? Is this actually expected soon, or are they just looking ahead a bit too eagerly?
Both RapidMind and Peakstream are proprietary commercial solutions, and those companies are trying to lock users into their particular framework. What we really need is an equivalent as a true open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?
The programming language must provide thread safety as part of its own paradigm, not as an add-on to the language in the form of functions, classes, templates or whatnot. Even java doesn't do it; you can make definitions of public accessors in java classes that are a mix of synchronized and not synchronized. That is provably unsafe (provided that these public accessors access the same private data), yet the language and the compiler allow it. Elegant degradation during runtime in case of deadlock detection or race conditions is nonexistent in the world of programming languages, yet you can easily think up schemes for it (binding some sort of preference to a thread and letting the most preferential thread break the lock; experimenting with sleeping a thread for a number of seconds if it seems to race; notifying an 'exception' thread in case of any of this). I don't think that _any_ language that I know (and that isn't saying much, I'm not boasting) is really thread-ready.
Religion is what happens when nature strikes and groupthink goes wrong.
At 411MB this'd better be some demo because chicken little it ain't.
Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.
The main problem I see is that there is lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.
Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.
The ratio of people to cake is too big
You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?
My point is that there have been a few applications I've written that require a multi-threaded solution. Perhaps this API would have made my life easier, but I doubt it, as I had to pretty much structure each thread by hand. There are frameworks, graphical libraries and the like that also use multi-threading, and the scheduler has taken care of those in the past. Hurray for multi-core if you use those.
A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is an unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.
Also, not everyone is multi-core already. Take us into consideration please!
My work here is dung.
From the site:
Man, that's some funny stuff. Wow, that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.
Look guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that *don't need to communicate very often.* Then this is the same kind of problem scalable websites like Google, MySpace, Hotmail and so on have already solved, just without having to factor in the reliability issues. Finer grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom coded solution. But you don't, so let it go. The main loop of your program will be just fine as a single threaded implementation, 1 processor will do, and farm the 10% code / 90% heavy lifting out in big clean chunks to other processors. If you find yourself writing some bizarre multi-threaded message passing system so that you can have 100s of threads all modifying the same live object model at the same time -- you are fucked, just forget about it 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.
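For what it's worth, the "single threaded main loop, heavy lifting farmed out in big clean chunks" shape can be sketched in a few lines. This is only an illustration in Python, not anything from RapidMind; `heavy_chunk`, `split` and `run` are made-up names, and the sum of squares is a stand-in for real work. A thread pool keeps the sketch self-contained; for CPU-bound work in Python, `ProcessPoolExecutor` offers the same interface.

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_chunk(chunk):
    """Stand-in for the heavy lifting: sum of squares of one chunk."""
    return sum(x * x for x in chunk)

def split(data, n_chunks):
    """Cut data into at most n_chunks contiguous, independent slices."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def run(data, workers=4):
    # The main logic stays single-threaded; only the big independent
    # chunks go to the pool, and they never talk to each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(heavy_chunk, split(data, workers)))
    return sum(partials)
```

The only communication is one hand-off out and one result back per chunk, which is what keeps it debuggable.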
-- http://thegirlorthecar.com funny dating game for guys
Is there a version that isn't a 400+MB movie file? I was expecting an article.
99.99% of the time, multi-threaded programming is not needed, and can actually *slow* things down as mutexes block each other, killing performance. It takes a certain amount of time to establish a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block and unblock the other.
Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner? Aren't you referring to poor design/algos rather than anything specifically having to do with multi-threading? Saying mutexes are slow and should be avoided is like saying disk IO is slow and should be avoided. While true, if my app needs to deal with storage, then that's what it needs to do.
Also, multi-core CPUs have made using Windows bearable under load, because current single-threaded processes leave one spare core available for the OS to use when any app decides to eat all the CPU!
Actually multi-core has made it cheaper. Multi-processor boxen have been around forever and those who've used them have been enjoying the benefits for a gazillion years. Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are demons/services running all the time. There is no such thing as a simple "a single threaded app leaves the other processor/core free for OS use", all cores are constantly being used by all sorts of crap unless the OS is configured to actually force the above to happen (processor affinity). In that case even a multi-threaded app can be forced onto a single processor/core.
100% agree. Concurrency is a problem, not a solution, and it needs to be abstracted out early if you need it at all.
No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.
On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.
It may not seem like a bad idea now, but if we're to go down this path eventually we'll be praising researchers for getting a flavor of BASIC that can take advantage of all the PS3's processors seamlessly, but still has the same issues BASIC has.
I think in general this begs the question, "Are we better off in the long run empowering ignorant programmers, or informed programmers?"
For instance, would it have been better for the researcher to perhaps focus on how to teach programming for multiple processors rather than coming up with a library that "abstracts it away"?
This is not to take away from the fact that he got it running on the PS3 though. Kudos for that.
Some researchers actually do bother: http://manticore.cs.uchicago.edu/
The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is basically a 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% of the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in that 2K instructions, which instead need more machines for more IO.
The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use what it's got, a huge amount of complex, heterogeneous IO management is necessary to use its power.
It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?
--
make install -not war
"For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "
Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to process the threads,' but that isn't nearly as impressive sounding.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
"Transactional memory"
--jeffk++
ipv6 is my vpn
I browsed through the 411 MB ogg file, but could not find any chicks. Where are they?
However, multi-threaded development has been notoriously hard to do
Only at first, once you wrap your head around it it becomes second nature.
To a newbie, recursion is hard to do. To somebody who's been writing functional FORTRAN for 25 years, object oriented is hard to do.
It's just another way of thinking about problems. The real bitch is having the toolkits and thread safe libraries at your disposal.
I don't need no instructions to know how to rock!!!!
I'm not sure what techniques the developer is using, as the um, "article" is a little light on details (unless I missed something). But the concept of Active Objects (a trivialized way of using threads) has been around for a while, with generic implementations of them rapidly becoming more mainstream. In the past week there has been much discussion about active objects and "futures" on the boost mailing list, and it is likely that both will become part of boost shortly. To put it simply, an active object is an object which has its own threaded message queue, so it is asynchronous from the rest of the system, and a future is a return value from an asynchronous method call: a "future" value. These techniques are quite reasonable today because of concepts like fibers and the NPTL.
And of course, a shameless plug for my active objects implementation (bsd style license). Actually, that page also does a decent job of demonstrating the concepts.
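To make the two terms concrete, here is a minimal sketch of an active object that hands back futures. It is written in Python for brevity and is neither the implementation linked above nor the Boost proposal; `ActiveObject`, `add` and `stop` are hypothetical names, and the running total is just a stand-in for whatever state the object owns.

```python
import queue
import threading
from concurrent.futures import Future

class ActiveObject:
    """Owns one worker thread and a message queue; only that thread touches the state."""

    def __init__(self):
        self._total = 0
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Drain the message queue one request at a time, in arrival order.
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def _add(self, n):
        self._total += n
        return self._total

    def add(self, n):
        """Asynchronous method call: returns immediately with a future."""
        future = Future()
        self._inbox.put((self._add, (n,), future))
        return future

    def stop(self):
        self._inbox.put(None)
        self._thread.join()
```

A caller does `f = obj.add(2)`, carries on with other work, and later calls `f.result()` to collect the value. Since only the worker thread ever touches `_total`, the object's own state needs no lock.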
I Do C++
Hail Eris, full of mischief...
E pluribus sanguinem
Mod parent up, PLEASE
First of all, I, and many others before me, have been writing multithreaded applications for years in the likes of Linux and UNIX. I have had to maintain multithreaded applications created by others. My collective experience tells me:
It is not trivial.
Let me repeat: It is not a trivial task. Even if you have libraries and an API which abstracts out the ugly stuff, you still have the problem of concurrency, proper locking, deadlocks, etc...
The majority of problems with using multithreaded programming come not from "ugly" parts of the OS/API layer, but from a misunderstanding of the problem. A few problems in computer science - particularly in the physical sciences - do benefit from multithreading. And it is easier to use threads when writing a game than just to execute all of the IO in one big loop (Hello DOS!). But for most applications, using threads is not only unnecessary, but overkill, and introduces the possibility of yet another class of bugs for which the application must be tested. Furthermore, as deadlock and race conditions are often timing related, they are the most difficult type of bug to find and fix. Finding and fixing this class of bugs is still somewhat of a black art in the industry, and is highly dependent on the skill and experience of the programmer.
In short, unless your system/application design cannot do without multithreaded programming, it is best not to use it. Even with a glossy API, you still cannot escape the fact that debugging a multithreaded application is an order of magnitude more difficult than a single threaded one. In any case, you shouldn't be using threads just because you can.
The society for a thought-free internet welcomes you.
I program in VB.NET and use multithreading for work on pictures: JPEGs and MPEGs and such. I know I'm not a great programmer, but multithreading has always been a pain in the keester for me. I still use vb.net 2000, but hopefully they fixed the issue with starting a stopped thread, or maybe with reading a thread's state accurately without having to pause the system for a few ns. I need to learn C, I guess. Maybe some classes would help. Maybe I should just buy the newest rendition of Visual Studio.
Ad eundum quo nemo ante iit!
I take your point about multi-core being around for years, however... What I should have been more clear about is that if 1 thread ends up in an infinite loop, then, on a single core Windows PC, the whole OS starts to crawl, sometimes making it difficult to even open 'Windows Task Manager' in order to kill the rogue process. Multiple processors/cores get around this problem - there will always be the idle CPU for Windows to make up for its dodgy task scheduler. "Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner?" - Nope, I can't. But, if you create some multithreaded code, in which you create 2 threads that compete with each other for the same resource, then this is shite design, and the performance of your code will almost certainly be singly threaded code doing the same thing.
Pthreads has been out for a while. It is open source, and runs on Linux, Windows, and Mac(?).
Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.
The society for a thought-free internet welcomes you.
Geez - at 411MB, it better be a complete operating system, plus development toolchain.
Many complete distros fit in under 411MB.
Nuts --
Just give us a little animated GIF of the chickens, we dont believe the rest of your claims anyway.
Agreed. The question is "will this be free?" If the answer is "no" then everyone leaves and Dr. Stephan finishes giving his speech to the guy picking up the rubbish.
Despite what the USPTO* clerks tell you, programming ideas are a dime a dozen. He's got as much chance of getting you to pay for this as I have of convincing all you C++ programmers to switch to my new proprietary (*D)++++(R)(TM) language. Only $1,250 a seat! What are you waiting for boys?
* = At least the guy who picks up garbage knows trash when he sees it.
"Transactional memory"
How is that more performant? If I have thread A and thread B trying to modify object C, if I use the traditional lock method I lock C while I'm writing to it. In the transactional memory model, some mechanism has to eventually do the same thing. You might have taken the load off of the higher level developer, but you've simply pushed the performance problems onto someone else.
As a side note, back in the late 80's, I worked on a 4GL that utilized a VM that used the transactional memory model. It was quite nice from an application developer's standpoint, and ANY modifications made by an application (memory, file, db, etc) were handled by the transaction manager, so when a logical operation failed, EVERYTHING would get rolled back. So this concept is not new (and I'm sure others can come up with examples from even earlier).
My comment is valid, true, and anyone who has spent months dealing with code developed by cowboy coders who thought nothing of dropping in a new thread to perform almost every operation, without thought, will agree with me.
I seem to recall pretty much every app I used under OS/2 took advantage of threading. The workplace shell, of course, being the prime example. This was in 1992.
The problem, I think, is that the majority of programmers out there today who were just hobbyists back then were learning on a very single-threaded platform. Because the model was never there, it's 'hard'. With OS/2 3+, it was always there, and anybody who dabbled on that platform was immediately exposed to how to implement threads, as they were such a core piece of the OS.
Good morning slashdot!
As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" was a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.
For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.
And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).
If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.
"Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are demons/services running all the time."
Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.
What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.
which has had easy-to-use multithreading constructs built right into the language for the past 25 years or so.
Sheesh, evil *and* a jerk. -- Jade
This is ridiculous tripe. Multi-threaded programming is hard not because the libraries are hard to use but because it requires a lot of planning and thought to decide if you can actually gain a benefit by going multi-threaded.
The main benefit of multiple cores will not happen in userland. It will be in the kernels and the libcs. Once userland processes can effectively get memory from the heap with minimal locking we will see a performance boost system-wide (I'm talking 100 processes can all request memory from the heap with 0 locking). This is why cores are important: because we will be running more and more applications on our computers, we won't need the performance in a specific app, we will need all of our apps to be responsive all the time.
But, if you create some multithreaded code, in which you create 2 threads that compete with each other for the same resource, then this is shite design, and the performance of your code will almost certainly be singly threaded code doing the same thing.
I assume you meant to say that the singly threaded code will certainly be better/faster? I would wholeheartedly disagree with that generalization. It is very common to solve problems by multithreading where some common resources do need to be shared and where the design is quite reasonable. As an extreme case, take something like SETI. You are processing large datasets where you break the data into chunks and kick off threads to churn on each chunk. Let's say it takes several minutes for each thread to process its chunk and then it updates an internal results object. So you serialize access to the results object by using locks. How is this crappy design?
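The SETI-style pattern described here looks roughly like the sketch below, in Python for brevity. Everything in it is an assumption made for illustration: `Results`, `process_chunk` and `process_all` are made-up names, and `max(chunk)` stands in for the minutes of real processing.

```python
import threading

class Results:
    """Shared results object; the lock serializes the brief updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self.totals = {}

    def update(self, chunk_id, value):
        with self._lock:
            self.totals[chunk_id] = value

def process_chunk(chunk_id, chunk, results):
    value = max(chunk)               # long-running work, done without holding any lock
    results.update(chunk_id, value)  # short critical section at the very end

def process_all(chunks):
    results = Results()
    threads = [
        threading.Thread(target=process_chunk, args=(i, chunk, results))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.totals
```

The lock is held only for the final dictionary write, so contention is negligible next to the per-chunk work.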
Now I agree, it is certainly possible to create some crappy code because you are more intent on multi-threading the thing vs coming up with a good solution that isn't necessarily multi-threaded. But there are tons of problems that do fit quite nicely into MT/MP patterns that don't scale well or at all being serialized. My opposition to your statement is that I believe you are over generalizing.
I have some practical experience on this one.
I inherited a massive data collection routine which I identified as a good candidate for threading. When I mentioned the idea to the original duhveloper his eyes just kinda glazed over. The objection was that the routine has always "worked" so why introduce risk with increased complexity? After he left I jumped in and multi-threaded it. I was able to thread 20 already busy servers all working for me at the same time and each server was threading stuff without any performance degradation. Normal 24/7 operations with many concurrent production sessions were not impacted in the least. The end result was that the former 22 hour long process now completes in 22 to 41 minutes and it's much more stable and reliable. And with a good thread pool class it really wasn't that complicated. Actually, the hardest thing was getting beyond the glazed eyed reservations of a clueless duhveloper who was too timorous to try something "new."
Why did the multi threaded chicken cross the road? to the other side To get the other side To get to the
http://www.rense.com/general79/wdx1.htm
A programming language designed for multithreads.
You must not work with network apps all that much.
Think of the most basic email app possible. Now when a user presses "send mail" would you create a new fork(), try to micromanage the remote connection in the thread that handles the GUI, or force the user to wait around?
Next think about video where you have a resource intensive task AND you still want a highly responsive GUI.
Granted, if all you ever work with is simple biz apps with one user you have a point, but I think your 99.99% estimate says more about the work you do than programming in general. Because threads can often simplify demanding applications.
SH, from which RapidMind's core tech came, is FOSS and you can do many of the things their stuff does with SH.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
And it is called boost::futures. The theory behind it, though, is not new: the Actor model is quite old, and it has been used in Erlang for quite some time.
The Communicating Sequential Processes style of programming allows for many lightweight, simple threads that communicate over channels rather than through monitor-based thread synchronization.
The OCCAM language implemented this style of processing and the Transputer chip implemented a fast context switching hardware that OCCAM could run on.
This was all done back in the 1980s.
I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.
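A rough sketch of the channel style, for anyone who hasn't seen it. This is Python with `queue.Queue` standing in for a channel, not OCCAM and not the JCSP API; true CSP channels are unbuffered rendezvous points, which a queue of size 1 only approximates. The names `producer`, `squarer`, `run_pipeline` and `DONE` are made up for the example.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the channels

def producer(out_ch, values):
    for v in values:
        out_ch.put(v)
    out_ch.put(DONE)

def squarer(in_ch, out_ch):
    # A process that only reads one channel and writes another; no shared state.
    while True:
        v = in_ch.get()
        if v is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(v * v)

def run_pipeline(values):
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    out = []
    while True:
        v = b.get()
        if v is DONE:
            break
        out.append(v)
    for t in threads:
        t.join()
    return out
```

Each stage owns its own data and synchronizes only by sending and receiving, so there are no monitors or explicit locks in user code.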
Most apps (who cares about major?) are single-threaded.
Oh really. Hmm let's see on my box right now:
firefox - 20 threads
thunderbird - 6 threads
vlc - 7 threads
acrobat - 3 threads
foobar2k - 7 threads
And of course all the Office apps which are all multi-threaded.
The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one.
Actually I agree with the statement. However the original poster might have meant that, but his words taken explicitly painted a different, and quite wrong, picture of multi-threading.
What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.
I don't multithread because I'm worried about "the other resources going to waste", I use it because MY app's resources might be going to waste. If I'm blocked waiting for some IO, there might be other things I could be doing in the mean time. I don't care that the OS is happily using those spare cycles for other stuff, it's my app that _could_ benefit from those cycles. I've lived in the "server" world for a very long time, and the issues of time blocked on IO and of taking advantage of multiple cores/CPUs come up over and over. In my world saying that "99.99%" of multi-threading (and multi-processes) is a waste is simply wrong. Heck, look at this conversation we're having. It is utilizing server code that spawns multiple code paths to share common resources (web servers, most likely not multi-threaded, but multi-processed). Our visualization app utilizes multiple threads such that it can do IO, render contents, and accept user inputs concurrently to provide a better user experience. How can anyone say that the overwhelming majority of multi-threading is not necessary to the point of being stupid?
For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.
An unjust law is no law at all. - St. Augustine
There are a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind is providing is something else, something more like a SIMD model or vector computations. It solves things like elementwise operations on large arrays in an efficient manner using whatever parallel computing resources are available. It's a language with semantics that don't require complicated synchronisation, because you're basically telling the compiler which operations are independent and then it can go off and compute them in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.
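To illustrate the model (and only the model: this is plain Python, not RapidMind's API, and it says nothing about how their backend schedules work), here is an elementwise kernel applied over an array. `shade` and `parallel_map` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(pixel):
    """Pure per-element kernel: depends only on its own input, like a pixel shader."""
    r, g, b = pixel
    return (r + g + b) // 3

def parallel_map(kernel, elements, workers=4):
    # No locks needed: each call reads one element and produces one result,
    # and the runtime is free to split the elements across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, elements))
```

Because the kernel has no side effects, the result is the same however many workers there are, which is the property that lets this kind of code scale without the programmer counting cores.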
The interactive way to Go -- http://www.playgo.to/iwtg/en/
Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial. But OpenMP is very nice. I have a write up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/.
Serious advantages are that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++.
Current OpenMP implementations favor SMP machines, but one can go even further with the Intel OpenMP for clusters solution. I have not tried it myself yet, but I understand that it makes the issue of non-shared memory across the cluster machines transparent.
As in all cases, YMMV. But if your code is amenable to parallelizing, OpenMP is a pretty straightforward way to go.
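To make the "retrofit" point concrete, here is a minimal sketch of what adding OpenMP to an existing loop can look like. The function names (scaled_add, dot) are made up for illustration and are not from any project mentioned above. With GCC the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the code simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Elementwise y[i] = a * x[i] + y[i]. Iterations are independent, so the
// loop can be split across threads without any locking.
void scaled_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Dot product. The reduction clause gives each thread a private partial sum
// and combines the partial sums at the end, avoiding a data race on `sum`.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The loop bodies are unchanged from what a serial version would contain; only the pragma lines are new, which is why this approach suits a large existing code base.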
I work with network apps on a daily basis, thanks! Instead of using a new thread to handle each network transaction, the comms layer is fully asynchronous and never blocks. So the GUI can submit requests to the server without fear of becoming unresponsive. Responses are placed back onto the GUI thread when received.
As the articles say, the lock pressure is moved from the reader to the writer. Transactional Memory scales amazingly better when you have multiple threads which are reading common data. Please note that in today's system architectures even READING data on different cores at the same time may not be thread safe without memory barriers put into place to synchronize the caches.
There have been many papers written about the efficiency gains.
And as a bonus, writing multithreaded software with a "Transactional Memory" scheme is easier to "prove correct".
--jeffk++
ipv6 is my vpn
So maybe it wasn't on the PS3, but Valve seems to take a more practical approach.
This article has some good info on what Valve is doing to bring threads into gaming.
Personally I'd rather use their model because it's more utilitarian and less 'Because I can'.
Money is the root of all evil?
"Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers."
The marketplace wants and needs new technologies for more powerful processors. Multicore serves the needs of chip makers, not their customers. Making all software multi-threaded is trying to solve the wrong problem. It's going to result in lower-quality software without a significant increase in performance.
The real need for threading over multiple processors ("distributed computing") is in scientific applications, where tons of data needs to be tediously crunched.
The company "Interactive Supercomputing" offers a product that takes your code, and "parallelizes (multi-threads) it" with almost no additional work, allowing you to run it on multi-processor systems, or even on networked grids and clusters of numerous computers.
They started with a module for MATLAB, since that's a language commonly used to perform data-intensive scientific tasks, but are now working on a Python implementation!
Back in April of 1989, the Communications of the ACM published an article describing a language for just this sort of purpose, called Linda. It provided elegance and simplicity in multithreaded programming at the expense of more overhead for coordination (always a tradeoff). Communication was done by each thread putting the results of its processing into a shared pool, from which downstream threads would periodically take messages and perform further processing. No synchronization really, just producers and consumers operating on this shared pool of data. Obviously this would not be a silver bullet for every multithreaded need, but strong in the "simple" department.
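For anyone who hasn't seen the Linda style, here is a rough sketch of the idea in C++. It is not the real Linda API: the TupleSpace class, the (tag, value) tuple shape and the sum_of_squares pipeline are all assumptions made up for this example, and a real Linda matches tuples by pattern rather than by a single tag. Also note that the consumer here blocks until a tuple arrives instead of checking periodically. The locking still exists, but it lives inside the pool, so the producer and consumer code contains none.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// A toy tuple space: tuples are (tag, value) pairs matched by tag only.
class TupleSpace {
public:
    // Linda's "out": deposit a tuple into the shared pool.
    void out(const std::string& tag, int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tuples_.emplace_back(tag, value);
        }
        changed_.notify_all();
    }

    // Linda's "in": block until a tuple with this tag exists, then remove it.
    int in(const std::string& tag) {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            for (auto it = tuples_.begin(); it != tuples_.end(); ++it) {
                if (it->first == tag) {
                    int value = it->second;
                    tuples_.erase(it);
                    return value;
                }
            }
            changed_.wait(lock);  // re-scan after every wakeup
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::list<std::pair<std::string, int>> tuples_;
};

// Hypothetical two-stage pipeline: a producer emits 1..n, a second thread
// squares each number and puts the result back into the same pool, and the
// caller collects the squared values.
int sum_of_squares(int n) {
    assert(n >= 0);
    TupleSpace space;
    std::thread producer([&space, n] {
        for (int i = 1; i <= n; ++i) space.out("raw", i);
    });
    std::thread squarer([&space, n] {
        for (int i = 0; i < n; ++i) {
            int v = space.in("raw");
            space.out("squared", v * v);
        }
    });
    int total = 0;
    for (int i = 0; i < n; ++i) total += space.in("squared");
    producer.join();
    squarer.join();
    return total;
}
```

The stages never refer to each other, only to the pool, which is where the simplicity comes from. The coordination overhead mentioned above shows up as the scan and lock inside in().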
I don't think you're getting the kind of advantages you might assume with this approach. First off, with non-blocking IO you're wasting a lot of CPU time checking to see if you have a response vs. blocking. Second, you're manually balancing performance issues, which is something modern operating systems are very good at. Finally, you're limiting the kinds of third-party tools you can use.
Now you could easily have a GUI thread that uses message queues to talk with the networking layer(s) without changing much at all. Add in a simple thread pool to handle performance issues and you're going to have a more responsive and scalable app which is also more flexible. (Granted, if you have already spent a lot of time tuning things it's not such an issue.)
And, granted, in some ways it's easier to debug your style of app, but as soon as you have random developers mucking around in the code you're heading for trouble.
Actually, the rise of web apps backed by a multithreaded web server and a multithreaded DB in the 1990s made this pretty easy for millions of people.
Each one of those 80 cores only contains two FPUs and a simple message router. Such a core is *nowhere* near the complexity of an Opteron or a Core 2. The chip resembles a DSP or a GPU more than a real CPU. Sure I could fit more than a thousand 286 cores onto a chip, but I wouldn't want to run Linux on that (because even the simple x86 ops would take several cycles apiece -- transistors weren't thrown at pipelining or superscalarity). Not that Intel's baby isn't laudable as an engineering achievement, but they are grossly mis-marketing and over-hyping it -- almost lying about the nature of that beast.
I think it is pretty clear this represents a *paradigm shift* in the programming of thousands of animated chickens. I'm throwing away all the old software tools I used to *leverage* for this purpose. This will vastly increase my *efficiency*. I have a feeling it will also bleed over into different, but similar tasks -- like programming thousands of animated pigs or possibly ducks. It is not just an *evolution* in our ability to program thousands of chickens, but rather a *revolution* in the sense that it is *disruptive technology*.
Separate processes allow related tasks, such as pipelines, to run on *part* of the task. So, Pipes, socketpairs, and FIFOs require a master slave organization. Shared Memory (like threads) still requires locks to prevent two processes changing data at once. (I believe) Job controlled processes can run apart, with un-related processes in-between, slowing the calculation.
Threads support better co-operative or competitive models. They do not need to build IPC sockets to communicate. Threads allow "Thread Specific Data" blocks where it's vital to not share variables. Also, it is a matter of *documentation* to mark "shared variable" sections, rather than use Shared Memory to tell sections apart. Since threads run *only* when their common process is active, it is easier to synchronize them for a common goal.
Matrix algebra is one such co-operative activity. Each dot-product calculation can occur concurrently. The true number of dot-product threads depends on the hardware's CPU count. Modern Fortran compilers will create this in "parallel mode" without any code change.
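As a hand-written illustration of the co-operative case, here is a sketch of a matrix-vector product where each row's dot product is independent and the number of worker threads follows the CPU count. The Matrix alias and the multiply function are names invented for this example; a parallelizing compiler would generate something like this for you.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Computes result = m * v. Each output element is one dot product, and the
// rows are split into contiguous blocks, one block per worker thread.
std::vector<double> multiply(const Matrix& m, const std::vector<double>& v) {
    const std::size_t rows = m.size();
    std::vector<double> result(rows, 0.0);

    // hardware_concurrency() may return 0 when the count is unknown.
    std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    workers = std::min(workers, std::max<std::size_t>(1, rows));
    const std::size_t block = (rows + workers - 1) / workers;

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * block;
        const std::size_t end = std::min(rows, begin + block);
        if (begin >= end) break;
        threads.emplace_back([&m, &v, &result, begin, end] {
            for (std::size_t r = begin; r < end; ++r) {
                assert(m[r].size() == v.size());
                double sum = 0.0;
                for (std::size_t c = 0; c < v.size(); ++c) {
                    sum += m[r][c] * v[c];
                }
                result[r] = sum;  // each thread writes only its own rows
            }
        });
    }
    for (auto& t : threads) t.join();
    return result;
}
```

No locks are needed because no two threads ever write the same element; the only synchronization point is the join at the end.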
Games are competitive models. Each thread represents a contender, such as a race car, or a Prolog search tree. Once a contender reaches its goal, it tries to lock a mutex for itself. The race car found an open space on the track. Prolog found the best (first) path to a goal.
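The competitive case can be sketched like this. One deliberate substitution: instead of locking a mutex and never releasing it, the sketch claims the goal with an atomic compare-and-swap on a winner variable, which does the same "first one there takes it" job without leaving a mutex locked. RaceResult, run_race and the lap counting are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct RaceResult {
    int winner;  // id of the contender that claimed the goal
    int claims;  // how many contenders succeeded in claiming it
};

RaceResult run_race(int contenders, int laps) {
    assert(contenders > 0);
    std::atomic<int> winner{-1};  // -1 means the goal is still unclaimed
    std::atomic<int> claims{0};

    std::vector<std::thread> threads;
    for (int id = 0; id < contenders; ++id) {
        threads.emplace_back([&winner, &claims, id, laps] {
            volatile long work = 0;
            for (int lap = 0; lap < laps; ++lap) {
                if (winner.load() != -1) return;  // someone else already won
                work = work + lap;                // stand-in for real work
            }
            int expected = -1;
            // Succeeds for exactly one thread: the first to swap -1 for its id.
            if (winner.compare_exchange_strong(expected, id)) {
                claims.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) t.join();
    return RaceResult{winner.load(), claims.load()};
}
```

Which contender wins varies from run to run, but exactly one claim ever succeeds, and the losers stop as soon as they notice.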
First off with non blocking IO you're wasting a lot of CPU time checking to see if you have a response vs. blocking.
Not true if the lower layer is using callbacks. You're assuming that you're using some type of polling, which isn't necessarily the case. Of course if you're using a callback, then you have an implicitly multi-threaded app, but that's another story. But generally yes, it's been my experience that people use some manner of polling, which can be OK, but I've seen more poorly written polling code than multi-threaded code, so you haven't really bought much from a pragmatic POV.
"However, multi-threaded development has been notoriously hard to do."
if ( statement.agree() )
leave_slashdot_now();
No one I've ever met thinks this is even a tiny bit difficult. Maybe the problem is that there are not yet compilers AT ALL for these new chips? Nah, blame multithreading!
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Stanford's Sequoia compiler is supposed to be open source eventually. Plus, it looks a little easier to use than RapidMind, PeakStream, or CUDA.
I'm sure the demonstration would've been a lot more difficult if he'd used philosophers instead of chickens. Thing is, chickens can't even hold chopsticks. A chicken just goes straight for the feed, so there's just one resource being acquired. It's still possible for a chicken to starve, but as chickens don't eat that much it's more likely that any shut-out chickens would simply go hungry for a while, and then get to eat before starving.
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
I've probably used half a dozen different parallel extensions of C and C++ over the years. This one doesn't look revolutionary, it merely looks painful.
As the articles say, the lock pressure is moved from the reader to the writer. Transactional Memory scales amazingly better when you have multiple threads which are reading common data.
OK, maybe I'm being dense here, but even if readers don't have to lock explicitly, at some point they need the new view of the data (i.e., a writer has come along and modified the data). What is the mechanism to perform this "switch" in a way that doesn't require some type of lock, even if it's simply to switch some pointers around? I guess one question I should ask is: are we talking about userland code here? I've been assuming yes. If you have support at the OS/hardware level for these transactional memory mechanisms then sure, I can easily see how you can manipulate memory at a low enough level to maintain performance. Please enlighten me and/or point me to some articles.
Proper analysis of the problem to be solved and then the creation of a functional specification, with lots of diagrams of process and data flow, that is then vetted by your peers.
Undetectable Steganography? Yep, there's an app fo
Most apps? I think you are living in a PC-centric world (and even then I do not believe it's true). Ever write a real-time embedded system? It's pretty much impossible to prove you can meet several hard deadlines with a single thread. And most systems in this realm do not use time-slicing (which I assume you mean when you wrote "multi-tasking"). Not all multitasking OSes are time-sliced. In the real-time world we usually use OSes that do cooperative multitasking and are priority preemptive. Unless a higher priority task wakes, or the owning task blocks, a task has the processor until it gives it up (kind of how Windows 3 worked). It's wrong to assume that all programs and programmers use a hosted system (Windows, Unix, etc.). Actually, there are more embedded systems in the world than PCs, Macs, and such. People who rail against multi-threading have no concept there are systems out there unlike their home computers.
Anonymous Cowards suck.
Those of us who've worked on client/server types of apps have grown accustomed to managing multiple processes (users) via databases. The database becomes the central "coordination engine", and A.C.I.D.-compliant databases ensure there are no stuck locks or race conditions. There may be lessons from this client/server world for a smaller scale. (Actually, I used to do it with desktop databases also, such as FoxPro; however, it was not ACID-compliant at the time, but close enough to be practical. As long as a user knew who had locked something, they could coordinate it among themselves.)
Table-ized A.I.
Did any of the chickens cross the road? If so, did the chicken say why?
Isn't one of C++0x's main features MultiThreading support?
STM has its own problems: http://patricklogan.blogspot.com/2007/02/misguided-road-not-to-be-travelled.html
cpeterso
LabVIEW. http://www.ni.com/labview.
A project that you can download and play with today is Trolltech's QtConcurrent. Given a task it will automatically manage creating threads and distributing the task among your cores.
From the project page:
The classes and functions available in the Qt Concurrent package allows you to write multi-threaded applications without having to use the basic threading synchronization primitives such as mutexes and wait conditions. This makes it easier to reason about and test parallel programs to make sure that they are correct.
The Qt Concurrent components manage the threads they use automatically. Each application has a global thread counter, which limits the maximum number threads used at the same time. The maximum is scaled according to the number of CPU cores on the system at runtime. This means that programs written with Qt Concurrent today will continue to scale when deployed on many-core systems in the future.
Very cool.
Do you change clothes while making the "chee-chee-cha-cha-choh" transformation sound?
parallel
{/* execute these statements in parallel if possible */
statement1;
statement2;
statementn;
}
sequential
{/* execute these statements in order as written */
statement_1;
statement_2;
statement_n;
}
- Tjp
I am in wallow with my inner money grubbing capitalistic pig. ... Oink!
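Standard C++ has no parallel { } block, but the pseudocode above can be approximated with std::async: launch each independent statement as a task, then wait for all of them where the block would end. This is only a sketch; count_vowels, count_spaces and the Counts struct are hypothetical stand-ins for statement1, statement2 and their results.

```cpp
#include <cassert>
#include <future>
#include <string>

// Stand-ins for the independent statements in the pseudocode.
int count_vowels(const std::string& s) {
    const std::string vowels = "aeiouAEIOU";
    int n = 0;
    for (char c : s) {
        if (vowels.find(c) != std::string::npos) ++n;
    }
    return n;
}

int count_spaces(const std::string& s) {
    int n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

struct Counts {
    int vowels;
    int spaces;
};

// "parallel { ... }": start each statement as its own task, then wait for
// both results before leaving the block.
Counts analyze_parallel(const std::string& text) {
    auto vowels = std::async(std::launch::async,
                             [&text] { return count_vowels(text); });
    auto spaces = std::async(std::launch::async,
                             [&text] { return count_spaces(text); });
    return Counts{vowels.get(), spaces.get()};
}

// "sequential { ... }": run the statements in the order written.
Counts analyze_sequential(const std::string& text) {
    int v = count_vowels(text);
    int s = count_spaces(text);
    return Counts{v, s};
}

// The two blocks are interchangeable only because the statements neither
// depend on each other nor modify shared data.
void check_same_result(const std::string& text) {
    Counts p = analyze_parallel(text);
    Counts q = analyze_sequential(text);
    assert(p.vowels == q.vowels && p.spaces == q.spaces);
}
```

The catch is the same as with the proposed syntax: the programmer, not the compiler, is asserting that the statements are independent.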
Ideally, there would be hardware support for the data to be updated in a single transaction.
Another way to implement this is via pointers, where the pointer is updated once the new values are put into place. This pointer switch must be atomic. When a writer decides to write, it gets a copy of the current pointer. It starts the calculation and allocates a new result. This new result is put into place via changing the pointer with an atomic operation only if the original pointer has not changed.
If it did, then it must THROW OUT the calculations that it did and start them over again.
The reason why this is more efficient is that in typical programs more threads read the same data, and very little data is actually needing to be written to by multiple threads.
--jeffk++
ipv6 is my vpn
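The pointer-swap scheme described above can be sketched in userland C++ with std::atomic. On mainstream hardware the compare-exchange on a pointer typically compiles down to a single compare-and-swap instruction, so no OS lock is involved. Everything named here (Snapshot, VersionedTotal, hammer) is made up for the example, and the sketch sidesteps the hard part, safe reclamation of old versions, by keeping every old snapshot alive until the object is destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Snapshot {
    long total;                // immutable once published
    const Snapshot* previous;  // kept only so old versions can be freed later
};

class VersionedTotal {
public:
    VersionedTotal() : current_(new Snapshot{0, nullptr}) {}
    VersionedTotal(const VersionedTotal&) = delete;
    VersionedTotal& operator=(const VersionedTotal&) = delete;

    ~VersionedTotal() {
        const Snapshot* s = current_.load();
        while (s != nullptr) {
            const Snapshot* older = s->previous;
            delete s;
            s = older;
        }
    }

    // Readers take no lock: one atomic load yields a consistent snapshot.
    long read() const { return current_.load()->total; }

    // Writers build a new version off to the side and publish it only if
    // nobody else published in the meantime; otherwise they redo the work.
    void add(long amount) {
        const Snapshot* seen = current_.load();
        for (;;) {
            Snapshot* next = new Snapshot{seen->total + amount, seen};
            if (current_.compare_exchange_weak(seen, next)) {
                return;  // published atomically
            }
            // Lost the race: throw out the result and retry. The failed
            // compare-exchange has already refreshed `seen`.
            delete next;
        }
    }

private:
    std::atomic<const Snapshot*> current_;
};

// Hypothetical stress helper: several writers each add 1 repeatedly.
long hammer(int writers, int adds_per_writer) {
    assert(writers >= 0 && adds_per_writer >= 0);
    VersionedTotal total;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&total, adds_per_writer] {
            for (int i = 0; i < adds_per_writer; ++i) total.add(1);
        });
    }
    for (auto& t : threads) t.join();
    return total.read();
}
```

Calling hammer(4, 1000) should return 4000 however the writers interleave, because a writer that loses the race discards its result and recomputes from the newer snapshot. Memory grows with every write in this sketch; a real implementation would need some reclamation scheme for old versions.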
OK chaps, care to enlighten me as to whether this method relies on specific language constructs, or whether it is implicitly managed after the fact?
I.e. do you need thread objects and synchronisation primitives (such as critical sections), or can one design a program as though it were serial, and then let the compiler judge how to manage threading and concurrency?
How dare you be so modest!! You conceited bastard!!
If you're downloading the video, do their servers a favor and use the torrent. HTTP downloads are incredibly slow off of their server right now ....
When technology hits a dead end on one track, the industry goes looking for another track where progress can be made. A few brilliant people stay at the dead end beating their brains against the wall. You can stand there with your arms folded, tapping your foot, waiting for the brilliant people to knock down the wall, but I'm going to pursue the low-hanging fruit elsewhere. When they do knock it down, I'll be right behind you, and what will I have lost?
OpenMP is an open standard for multi-platform shared-memory parallel programming in C/C++ and Fortran. It is supported by GCC 4.2 and greater.
The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement.
Lisp is famous for its syntax. (Some of the really hardcore people say it has no syntax.) In any event, it does allow unrivaled support for defining your own syntax. That's basically how the language is used: you use Lisp to build the language you need.
So if you think Lisp has "horrible syntax", I question how you're using it. Part of the beauty of Lisp is that anything you don't like, you can change. If you really like Erlang or Haskell, you can add their syntax to Lisp yourself. (The opposite, as you noted, is not true.)
I admit it's a hard feature to wrap your brain around, if you're used to other languages. People coming from C++ tend to have the mindset:
- learn the language's data types and raw syntax (how to define a function, how big ints are, etc.)
- write your program using those features
and so the Lisp strategy of "imagine the perfect language for your application" doesn't occur to them, because in C++-land it wouldn't do any good. Your language doesn't support syntactic abstraction.
For example, if you were to think "gee, Aspects would really make this program simpler", in Lisp you might say "OK, so I'll spend an afternoon writing a macro to do that" (it's been done). In most any other language you'd have to say "Gee, I hope Gregor Kiczales and a few of his buddies from Xerox PARC decide to spend a few years founding a new company to write a compiler to add support for them to my language" (also been done).
One of the Lisp web application servers famously doesn't use the Common Lisp "if" -- the author thought its implicit "else" hurt readability, so he wrote his own ("if*"). A C++ programmer would call this "syntax", and assume that you can never write your own new control structures (because in C++ you can't).
I don't know OCaml, but I suspect there's a good chance you're dismissing it out of hand, too, simply because it doesn't look much like C++. Go on and use C++ if you like it; if it works for you, that's great. But stop badmouthing languages you've never really used.
And its syntax does suck. I don't think that's a good enough reason to dismiss it, however, especially since it's so damn good in every other way:
* Fast bytecode compilation that can be used in #! scripts
* optimizing native code compilation that performs as well as or better than every other language outside of C/C++/D
* debugger with backstepping
* profiler
* modules with separate compilation
* strong static typing
* variant types and pattern matching
* imperative and OO features for when you want them
* a full lex and yacc
* camlp4 for creating syntax extensions or redefining syntax
How anyone could pass on all that just because the syntax is crappy is beyond me.
Win2K is the last one to support POSIX: http://support.microsoft.com/default.aspx?scid=kb;en-us;308259
You probably mean CreateProcess(), a Win32 API.
I think what you're really arguing is for work/task loops built up by message passing. These work queues can be either different threads or processes, and spend most of their time looping through the queue, instead of being created & destroyed per "user".
Apache 2 is for windows. The threaded model doesn't perform any better than the prefork model on unix. For unix users, apache 2 is just a way to get a less reliable version of apache that has had many new security holes introduced.
"When technology hits a dead end on one track, the industry goes looking for another track where progress can be made."
There are two different industries involved here: the hardware processor industry and the software industry. It looks like the former is looking for the latter to bail it out. Multicore isn't really a new track, it's more of a repackaging effort.
"I'm going to pursue the low-hanging fruit elsewhere."
I think all the discussion over this issue suggests that it isn't low-hanging fruit.
"When they do knock it down, I'll be right behind you, and what will I have lost?"
As always, it depends. If you and your co-workers don't make any additional mistakes due to the added complexity, probably nothing bad will happen. Obviously if multicore doesn't actually add a lot of multithreading to your project because you're already using it extensively, it won't have any effect at all.
Seems to be what?
You gotta love the American educational system...or the lack of Web spellchecking...one or the other...
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage.
This is Slashdot. People argue against something so they'll never have to try it :-) I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations.
What I've learned is that there's basically a one-time cost when you redesign and refactor a single-threaded app to a multi-threaded app. Multi-threaded design is harder, but the resulting designs are often simpler, since single-threading forces you to specify many arbitrary choices about sequencing. Then when you consider resource utilization, it turns out those arbitrary choices aren't arbitrary after all, so you end up interleaving work in weird ways to keep resource utilization up. With multi-threaded apps, I just leave it up to the thread scheduler, which does a pretty damn good job.
On the programs I've worked on, the advantages of cleaner design (no need to specify serialization where it doesn't exist, no spaghetti flow control) have cancelled out the disadvantages of more complicated implementation (ensuring protection for every shared resource.) The cost of ensuring protection is mitigated by the fact that most of the resource-protection code in my programs is contained in a few shared data structure implementations that I reuse in all my projects.
Of course I write lots of single-threaded programs, too. Most of my programs start out single-threaded, with an eye toward future multithreading. Single threading has a decided advantage for simple programs; I just believe that advantage disappears when programs get large and complicated.
Game console programming is just as sheltered as any other narrowly focused programming field. Higher level languages aren't about sheltering, they are about abstraction. I understand the low level implementation details, but I can choose to leave those details out of my mind, and concentrate on the high level problem. You use this same technique all the time, but you have the unfortunately common misconception that abstracting certain things is good, and abstracting other things is bad.
You don't have to implement functions/procedures yourself, because you are using a high enough level language that it provides an abstraction for that. This makes you more productive, and your code easier to modularize and maintain. Keep moving up the high level chain and more and more things get abstracted, making you more productive and your code better. Especially since most low level programmers fail to implement many abstractions that are very powerful, because they have never bothered to try existing high level languages and discover the power those abstractions provide.
Well, every time someone posts a multi-threading example it has to do with something that is not done by 55% of the programmers out there: programming a business application accessing a database. I can understand an OS or a DB engine running multiple threads, but how about a report or an OLTP transaction-oriented example?
bah!
Maybe you should rethink your position about synchronizing every method to make Java 'thread safe', because what that will really do is cause all your code to freeze when it is actually run in a multithreaded environment, unless it is very simple.
You cannot just make code thread safe. You have to define where you are going to share memory between threads, and then figure out all interactions between the threads and define where the data can be shared. Otherwise you are asking for trouble. Using a process model makes it HARDER (but not impossible) to hang yourself, by forcing you to use shared memory, message passing, or other techniques to force you to think about the shared state of your code.
Writing code is easy, but the tools we choose to use make it hard to write good multithreaded code.
"Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage."
I didn't see the "leaning over backwards" you refer to. What hardware improvements have hardware designers avoided because it would have caused the current programming model to change? You don't have to look any further than the x86 architecture with its segmented memory to conclude that ease of programming has never been a key goal at Intel.
"I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations."
One advantage of server apps is that they inherently lend themselves to a multithreading approach since each request can be seen as a separate unit of work. As you probably know, the challenge comes when no such natural break-down is apparent.
I only claimed synchronizing would almost make it thread-safe. I did not ever state this was a good approach. You should read the second and third paragraphs of my GP.
The cesspool just got a check and balance.
Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware."
Thanks to the insulation provided by the instruction set architecture, programmers have been able to mostly ignore the growing complexity of microprocessors and continue thinking in terms of the same single-threaded in-order execution model and a relatively slowly evolving set of ISAs. The cost of this mismatch between hardware and programming model becomes evident when you look at how much more performance you can get when you shift some of the burden from the hardware to the programmer, as with the Cell processor. (You can also save lots of power by forcing software tools to manage instruction scheduling. Microprocessors in cell phones and other battery-powered devices typically can't afford to spend power on analyzing, reordering, and creatively dispatching instruction streams. They leave that to compiler writers and programmers.)
When you look at the alternatives (and I'm sure Intel, AMD, and IBM have worked with more alternatives than I'll ever even hear of), symmetric multiprocessing is the smallest, least disruptive step that software developers could be asked to take. We still get a nice simple machine model that thankfully reflects very little of the complexity of the underlying hardware. The only thing that has gotten worse is that instead of getting unconditional, dramatic performance improvements for no effort at all, we get dramatic performance improvements conditional on our ability to use cores efficiently. And heck, single-threaded programs might keep getting faster anyway.
P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software. Requiring users to move from straightforward 16-bit addressing to straightforward 32-bit addressing would have kept both hardware and software simpler than using Intel's segmentation idea, but segmentation made it easier to port old software, so it was a success. It sounds more like a present vs. future tradeoff than a software vs. hardware tradeoff.
"Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware.""
This sort of "fiction" is a fundamental requirement for these devices to be classified as microprocessors; otherwise you'd have to perform "programming" by doing digital logic design or microcoding. It's like saying that car manufacturers accommodate drivers' legacy expectations by providing wheels for the car.
In addition, a number of these added features don't really preserve legacy expectations anyway. Hard real-time software can't really be written for most modern processors because the execution time of a particular section of code is not deterministic.
"P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software."
Given the fact that the 8086/8088 couldn't natively run 8085 programs, I don't think backward compatibility was a major goal. If you have to translate instructions anyway, changing from an 8 bit address to 16 bit address isn't that much harder.
This pointer switch must be atomic.
Ahh, but isn't this the crux of the matter? The only way to make the pointer switch atomic without assist from the OS is to use a lock, which means that all readers must test the lock to make sure the pointer isn't in the middle of being updated. Voila, you have a protected section of code that even readers have to do some type of test. Forgive me if I'm still not understanding something but I just can't see how this can be done without hardware/OS help, then at that point you could also devise a much cheaper lock while you're at it.
If it did, then it must THROW OUT the calculations that it did and start them over again.
I guess this is where knowing your app comes in. If your app happens to have many writers (say an OLTP app), then the cost for those collisions gets extremely high, so it may be worth your while to block writers. Then again, while many OLTP apps have many writers, they tend not to have high level data collisions, so this may only be applicable if you're dealing with some type of index cache or tree. Anyway, interesting topic (OK, my true nerd self has been exposed).
It isn't necessary for the ISA to be a fiction. In simple processors, the ISA *can* reflect the real, concrete structure of the processor. Assembly statements like "mov r2, r1" might tell you quite literally what is happening inside the processor. In fact, the programmer doesn't even need to know whether the ISA reflects the real structure of the processor or not. What matters to the programmer is that the behavior of the processor conforms to the model specified by the ISA.
Hardware designers went from processors almost as simple as a simple ISA to processors thousands of times more complex than the most complex ISA. Meanwhile, software programmers didn't have to come along for the ride. ISAs grew, but not at the same rate as hardware complexity. This is part of what I mean by hardware designers bending over backwards for software programmers. When it becomes possible to put more and more transistors on a microchip, yet the programming interface doesn't scale with the amount of processing power, it becomes hard to use the power efficiently. A Xeon can do a huge amount of work per unit time, but it's like five hundred people getting together to build a small house. When you throw all those transistors at a single instruction stream, most of them get applied to fairly small improvements like improving branch prediction by 2%. The instruction stream is simply inadequate to take advantage of that much hardware. Hardware efficiency suffers so programmers can keep using a simple model.
That's the logic behind multicore chips, and it's also why the Cell processor can put up such ludicrous performance numbers without any kind of technology breakthrough. Instead of wasting vast numbers of transistors on making small tweaks to single-process execution, why not make them available for real work? Let the programmer decide what to do with them. The programming model gets more complicated, but the programmer gets to use a bunch of processing power that has until now been inefficiently applied because the programming interface has been too narrow.
I didn't mention backward compatibility; I mentioned porting. I'm pretty sure that changing the pointer size is in fact a big deal, especially for languages like C where types aren't exactly respected. If I understand what I'm reading, Intel's design allowed old programs that used 16-bit addressing to be recompiled to use 16-bit addressing inside a single segment. That meant that nobody had to dig through old code to find and correct all the places where a pointer was stored in a 16-bit datatype. Meanwhile, new programs could use up to 20 bits of address space.
You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:
http://en.wikibooks.org/wiki/Ada_Programming/Task
Ada's multi-threading is not only without the pain but great fun!
Martin
...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.
Find a short intro here:
http://en.wikibooks.org/wiki/Ada_Programming/Task
Martin