Multi-Threaded Programming Without the Pain
holden karau writes "Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers. However, multi-threaded development has been notoriously hard to do. Researcher Stefanus Du Toit discusses and demonstrates RapidMind, a software system he co-authored, that takes the pain out of multi-threaded programming in C++. For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. The talk itself is interesting but the demo is golden."
The multi-threaded chicken or the multi-threaded egg?
--josh
Deadlock detected!
I didn't know the PS3 had thousands of cores ;)
I think what he meant was 'each tracked in a separate thread'...obviously each core is still handling many threads. I haven't watched the presentation and don't plan on it until later today, too much to do and I'd rather read something about it. It just sounds like it provides an efficient high level way to write a multi threaded app. Evolutionary but not revolutionary?
Multithreaded development is commonplace in applications that need it. The places it's not common in are:
-- old-style Unix development, because of the 'lightweight process model'. It's a unix-ism that's on the way out but until it disappears we will have some things like Ruby that don't 'get it'.
-- places that have absolutely no need for it, which certainly includes the chicken demo. One core per chicken?? Seems more like the guy just discovered threads but hasn't quite grasped what they're for.
Whence? Hence. Whither? Thither.
Where is the abstract getting "hundreds of cores in desktops on the horizon" from? Is this actually expected soon, or are they just looking ahead a bit too eagerly?
Both RapidMind and Peakstream are proprietary commercial solutions, and those companies are trying to lock users into their particular framework. What we really need is the equivalent as a true open-source solution, perhaps as a gcc extension. Does anyone know if there is progress being made on this?
Also note that certain programming languages can make multithreaded programming a lot easier. Nothing against C++ (one of my favorite languages) but no matter what you do it's relatively hard to use in multithreaded applications compared to a functional language. We are already seeing more functional features put into existing languages.
The main problem I see is that there is lack of focus in the functional arena. Many current functional languages are designed to use a VM with bytecode (Erlang for example) and don't support native threads easily (often requiring multiple VM instances and slow[er] message passing). The languages that do support native compiling almost always have other problems like horrible syntax (O'Caml, Lisp) or just general lack of refinement. Arguably Haskell comes the closest but suffers from a complicated and large backend support requirement like Java.
Without native thread support it's hard to take advantage of multiple processor cores. Too bad we don't see more mature native compiled functional languages out there.
The ratio of people to cake is too big
You choose to go with a multi-threaded application when it is necessary. Anyone who just starts adding threads because they feel they need to utilize the number of cores is a complete idiot in my book. Hell, why don't we just put spin locks in there so your CPU usage shoots up and it looks like I'm using it to its full potential?
My point is that there have been a few applications I've written that require a multi-threaded solution. Perhaps this API would have made my life easier, but I doubt it, as I pretty much had to structure each thread by hand. There are frameworks and graphical libraries that also use multi-threading, and the scheduler has taken care of those in the past. Hurray for multi-core if you use those.
A good programmer keeps things as simple as possible. They will be easier to maintain in the future. I'm afraid that this is an unneeded layer of abstraction or some nut case trying to "utilize cores" for the sake of it. No one has only one application running at one time. The OS is usually running, you have a network process, etc. If I write my application to use one core, I'm giving the user more options to do with the other cores whatever he wants. Let the scheduler work with the futuristic hardware and sort that crap out.
Also, not everyone is multi-core already. Take us into consideration please!
My work here is dung.
From the site:
Man, that's some funny stuff. Wow, that cracked me up. A *games* company using a tool that has this level of indirection?!? I sure hope these guys got a lot of money from their sucker VC to roll in.
Look guys. There is no multi-processing silver bullet. It isn't even such a hard problem, *if you stop trying to solve it at such a low level*. Break your application into separate pieces that *don't need to communicate very often*. Then this is the same kind of problem that scalable websites like Google, MySpace, Hotmail and so on already have, just without having to factor in the reliability issues. Finer grained multi-threading just leads to deadlocks and is really hard to debug. If you *really must* render the same sphere on 100 processors at the same time, then you need the speed of a custom coded solution. But you don't, so let it go. The main loop of your program will be just fine as a single threaded implementation, 1 processor will do, and farm the 10% code / 90% heavy lifting out in big clean chunks to other processors. If you find yourself writing some bizarre multi-threaded message passing system so that you can have 100s of threads all modifying the same live object model at the same time -- you are fucked, just forget about it 'cause you will never be able to debug that one killer bug that you know is going to get you right as you go to ship.
-- http://thegirlorthecar.com funny dating game for guys
Is there a version that isn't a 400+MB movie file? I was expecting an article.
99.99% of the time, multi-threaded programming is not needed, and can actually *slow* things down as mutexes block each other, killing performance. It takes a certain amount of time to acquire a mutex, so two threads working on a single bit of memory can perform like a dog as each tries to block and unblock the other.
Wait, can you tell me how you can get two parallel tasks that need to access the same resource to coordinate in a more performant manner? Aren't you referring to poor design/algos rather than anything specifically having to do with multi-threading? Saying mutexes are slow and should be avoided is like saying disk IO is slow and should be avoided. While true, if my app needs to deal with storage, then that's what it needs to do.
Also, multi-core CPUs have made using Windows bearable under load, because current single-threaded processes leave one spare core available for the OS to use when any app decides to eat all the CPU!
Actually multi-core has made it cheaper. Multi-processor boxen have been around forever and those who've used them have been enjoying the benefits for a gazillion years. Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are daemons/services running all the time. There is no such thing as a simple "a single threaded app leaves the other processor/core free for OS use"; all cores are constantly being used by all sorts of crap unless the OS is configured to actually force the above to happen (processor affinity). In that case even a multi-threaded app can be forced onto a single processor/core.
100% agree. Concurrency is a problem, not a solution, and it needs to be abstracted out early if you need it at all.
No. Whether something can be done effectively on multiple cores doesn't depend on the programmer, but on the type of processing. Some things have to be done in a certain order, and there's nothing even the best programmer in the world can change about that, period. If you try hacking something together that uses multiple threads for this type of processing, you'll just end up making things slower and messier.
On the other hand, there are other types of processing that just lend themselves fantastically to being done multithreaded.
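For what it's worth, the usual way to put a number on this point is Amdahl's law: if only a fraction p of the work can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n). A minimal Python sketch follows; the function name is my own and nothing here comes from RapidMind.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Strictly ordered work (p = 0) gains nothing, no matter how many cores you add.
for p in (0.0, 0.5, 0.95):
    print(p, [round(amdahl_speedup(p, n), 2) for n in (1, 2, 8, 64)])
```

This is only the theoretical ceiling. Locking and scheduling overhead push real results lower, which is how a forced multithreaded version can end up slower than the serial one.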
It may not seem like a bad idea now, but if we're to go down this path eventually we'll be praising researchers for getting a flavor of BASIC that can take advantage of all the PS3's processors seamlessly, but still has the same issues BASIC has.
I think in general this begs the question, "Are we better off in the long run empowering ignorant programmers, or informed programmers?"
For instance, would it have been better for the researcher to perhaps focus on how to teach programming for multiple processors rather than coming up with a library that "abstracts it away"?
This is not to take away from the fact that he got it running on the PS3 though. Kudos for that.
In Java, this is allowed because of performance issues. You can make it almost 100% thread safe (note I said almost) by synchronizing every method, but there's still some gotchas in the JDK.
Multi-threaded programming is a skill that comes with a level of understanding, much like students of mathematics must reach a level of understanding to comprehend Algebra, Calculus, Differential Equations, and Partial Differential Equations (yeah, that last one's a bear, especially when you apply it to various physical models) respectively. It's why you can be a wiz at adding or subtracting, but utterly fail at algebra, or a wiz at algebra, but never be able to differentiate or integrate a function, and so on.
Writing code is easy. Any moron can do it, as the Java and PHP hordes have shown. Writing good code is harder, designing good OO code is even harder, and designing and writing good multi-threaded code is yet a step beyond that. It's why there are so few well-written multi-threaded apps, and most of those are in server land.
The cesspool just got a check and balance.
Some researchers actually do bother: http://manticore.cs.uchicago.edu/
The problem with programming the PS3 is that once the complexity of its parallel processors is handled, the CPU is so fast that it consumes and produces data much faster than the IO available. The Cell is basically a 204GFLOPS/32bit machine (plus the Power RISC, basically a Mac G5), with an internal 1.6Tbps bus. But even its builtin gigabit ethernet is puny compared to that kind of dataflow. It's not clear whether the USB slots are 1, 2 or 4 buses at 480Mbps each, but even 2Gbps more isn't so much. Maybe another gig-e can plug into its CompactFlash slot, bringing the total up to 4Gbps, but that's still only 0.25% of the chip bus. In desperation, perhaps the SATA bus could also be used for another 1.3Gbps. Adding the HDMI output with some fancy codecing (especially on the receiving host) gives 10.2Gbps out, so the other 5.3Gbps can be used for input, but that's still only 5.3Gbps throughput, probably a lot less at under 100% efficiency per channel. The Cell can spin its wheels with 2000 instructions on the data it's got before it gets more. There are lots of "multimedia mixing" and transformation applications that could run multiple cycles in that 2K instructions, which instead need more machines for more IO.
The PS3 doesn't seem to have the PCI-Express bus that would solve all these problems. For some reason Sony left out its old pet, FireWire, which could have added buses at 800Mbps each. There doesn't seem to be any expansion whatsoever, except changing the HD on the single SATA connector. To use what it's got, a huge amount of complex, heterogeneous IO management is necessary to use its power.
It's strange to think that a $600 machine with around 5Gbps throughput and 7Tbps processing is a "toy", but the cropped IO makes the PS3 look that way, relative to its full power. Maybe a HW mod, even at $500 or possibly up to $2000, that adds PCIe for a half-dozen 2x10Gig-E cards, or even InfiniBand, will make this crazy little toy into more than just a development platform for games or prototypes for really expensive Cell machines. Who's got the way out?
--
make install -not war
"For his demo he created a program on the PlayStation 3 representing thousands of chickens, each independently tracked by a single processing core. "
Wait wait wait... How many cores does a PS3 have? Thousands? I suspect someone has their facts sadly mistaken. I think they meant 'each with its own thread and using multiple cores to process the threads,' but that isn't nearly as impressive sounding.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Craft Beer Programming T-shirts
"Transactional memory"
--jeffk++
ipv6 is my vpn
I browsed through the 411 MB ogg file, but could not find any chicks. Where are they?
However, multi-threaded development has been notoriously hard to do
Only at first, once you wrap your head around it it becomes second nature.
To a newbie, recursion is hard to do. To somebody who's been writing functional FORTRAN for 25 years, object oriented is hard to do.
It's just another way of thinking about problems. The real bitch is having the toolkits and thread safe libraries at your disposal.
I don't need no instructions to know how to rock!!!!
I'm not sure what techniques the developer is using, as the um, "article" is a little light on details (unless I missed something), but the concept of Active Objects (a trivialized way of using threads) has been around for a while, with generic implementations of them becoming more mainstream rapidly. In the past week there has been much discussion about active objects and "futures" on the boost mailing list and it is likely that both will become part of boost shortly. To put it simply, an active object is an object which has its own threaded message queue, so it is asynchronous from the rest of the system, and a future is a return value from an asynchronous method call, a "future" value. These techniques are quite reasonable today because of concepts like fibers and the NPTL.
And of course, a shameless plug for my active objects implementation (bsd style license). Actually, that page also does a decent job of demonstrating the concepts.
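To make the two ideas concrete, here is a rough Python sketch, not the boost proposal and not my C++ library: `ActiveCounter` is a made-up class whose state is only ever touched by its own worker thread, and each call returns a `concurrent.futures.Future` right away. Creating `Future` objects by hand is normally left to executors, so treat that part as a shortcut for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveCounter:
    """Hypothetical active object: one private thread drains a message queue."""

    def __init__(self):
        self._total = 0
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Only this thread ever reads or writes self._total, so no lock is needed.
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown marker
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def _add(self, amount):
        self._total += amount
        return self._total

    def add(self, amount) -> Future:
        """Asynchronous call: enqueue the request and hand back a future at once."""
        future = Future()
        self._inbox.put((self._add, (amount,), future))
        return future

    def stop(self):
        self._inbox.put(None)
        self._worker.join()


counter = ActiveCounter()
futures = [counter.add(n) for n in (1, 2, 3)]  # returns immediately
results = [f.result() for f in futures]        # blocks until each value exists
counter.stop()
print(results)  # [1, 3, 6]
```

The caller never sees a mutex. Requests are handled one at a time in queue order, which is why the running totals come back as 1, 3, 6.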
I Do C++
This stuff is an outgrowth of the Sh work done at Waterloo U. Anyway, the idea is that the declarations in your code are replaced. The new types redefine standard operators to generate code for a parallel machine (say, an nVidia card, or a PS3 Cell).
The code so generated can be run immediately, or deferred (note -- it's been a while since I've looked at Sh, so I am being vague).
I didn't think that this was a GENERAL multi-threading solution; more a way to easily generate code for the parallel machines that are becoming available.
Just another "Cubible(sic) Joe" 2 17 3061
Hail Eris, full of mischief...
E pluribus sanguinem
Mod parent up, PLEASE
First of all, I, and many others before me, have been writing multithreaded applications for years in the likes of Linux and UNIX. I have had to maintain multithreaded applications created by others. My collective experience tells me:
It is not trivial.
Let me repeat: It is not a trivial task. Even if you have libraries and an API which abstracts out the ugly stuff, you still have the problem of concurrency, proper locking, deadlocks, etc...
The majority of problems with using multithreaded programming come not from "ugly" parts of the OS/API layer, but from a misunderstanding of the problem. A few problems in computer science - particularly in the physical sciences - do benefit from multithreading. And it is easier to use threads when writing a game than just to execute all of the IO in one big loop (Hello DOS!). But for most applications, using threads is not only unnecessary, but overkill, and introduces the possibility of yet another class of bugs for which the application must be tested. Furthermore, as deadlock and race conditions are often timing related, they are the most difficult type of bug to find and fix. Finding and fixing this class of bugs is still somewhat of a black art in the industry, and is highly dependent on the skill and experience of the programmer.
In short, unless your system/application design cannot do without multithreaded programming, it is best not to use it. Even with a glossy API, you still cannot escape the fact that debugging a multithreaded application is an order of magnitude more difficult than a single threaded one. In any case, you shouldn't be using threads just because you can.
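As a small illustration of why "proper locking" stays your problem even behind a nice API, here is a Python sketch of the classic two-lock deadlock and the usual cure, a fixed lock order. The `Account` class and the amounts are invented for the example.

```python
import threading


class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # taking the same non-reentrant lock twice would hang
    # Always take the lock of the lower ident first. If each thread instead
    # locked "its" source first, two opposite transfers could each hold one
    # lock and wait forever for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def many_transfers(src, dst, count):
    for _ in range(count):
        transfer(src, dst, 1)


alice = Account(1, 1000)
bob = Account(2, 1000)
workers = [
    threading.Thread(target=many_transfers, args=(alice, bob, 500)),
    threading.Thread(target=many_transfers, args=(bob, alice, 500)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(alice.balance, bob.balance)  # 1000 1000
```

Nothing in the language forces that ordering rule. One call site that forgets it brings back the timing-dependent hang, which is exactly the class of bug described above.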
The society for a thought-free internet welcomes you.
I program in VB.NET and use multithreading for work on pictures: JPEGs and MPEGs and such. I know I'm not a great programmer, but multithreading has always been a pain in the keester for me. I still use vb.net 2000, but hopefully they fixed the issue with starting a stopped thread, or maybe with reading a thread's state accurately without having to pause the system for a few ns. I need to learn C, I guess. Maybe some classes would help. Maybe I should just buy the newest rendition of Visual Studio.
Ad eundum quo nemo ante iit!
I take your point about multi-core having been around for years, however... What I should have been more clear about is that if 1 thread ends up in an infinite loop, then, on a single core Windows PC, the whole OS starts to crawl, sometimes making it difficult to even open 'Windows Task Manager' in order to kill the rogue process. Multiple processors/cores get around this problem - there will always be the idle CPU for Windows to make up for its dodgy task scheduler. "Wait, can you tell me how you can get to parallel tasks that need to access the same resource to coordinate in a more performant manner?" - Nope, I can't. But if you create some multithreaded code in which 2 threads compete with each other for the same resource, then this is shite design, and the performance of your code will almost certainly be worse than single-threaded code doing the same thing.
Pthreads has been out for a while. It is open source, and runs on Linux, Windows, and Mac(?).
Whether or not you believe concurrency should be an explicit library or a matter of compiler extension is a bit of a religious argument. But pthreads does offer the functionality, and works fairly well.
Geez - at 411MB, it better be a complete operating system, plus development toolchain.
Many complete distros fit in under 411MB.
Nuts --
Just give us a little animated GIF of the chickens, we don't believe the rest of your claims anyway.
Agreed. The question is "will this be free?" If the answer is "no" then everyone leaves and Dr. Stephan finishes giving his speech to the guy picking up the rubbish.
Despite what the USPTO* clerks tell you, programming ideas are a dime a dozen. He's got as much chance of getting you to pay for this as I have of convincing all you C++ programmers to switch to my new proprietary (*D)++++(R)(TM) language. Only $1,250 a seat! What are you waiting for boys?
* = At least the guy who picks up garbage knows trash when he sees it.
I seem to recall pretty much every app I used under OS/2 took advantage of threading. The workplace shell, of course, being the prime example. This was in 1992.
The problem, I think, is that the majority of programmers out there today who were just hobbyists back then were learning on a very single-threaded platform. Because the model was never there, it's 'hard'. With OS/2 3+, it was always there, and anybody who dabbled on that platform was immediately exposed to how to implement threads, as they were such a core piece of the OS.
Good morning slashdot!
As the (slightly terrified to find himself mentioned on slashdot) presenter in the video linked to above I thought I'd respond to a couple of comments in bulk. First off, I'm part of a much bigger team at RapidMind that builds this software to make targeting multicore and stream processors easier -- the system and the "chicken demo" was a group effort, and you can read more about it and the company in general in the article linked to from here, which unfortunately is PDF-only.
For those crying out about multi-threading not being the solution: you're absolutely right! Our platform's approach to programming multi-core processors is to expose a data parallel model. In this model, the programmer explicitly deals with parallel programming (writing algorithms to work well on arbitrarily many cores) but all of the standard multi-threading issues such as deadlocks and race conditions are avoided, and the developer doesn't worry about how many cores there actually are.
And no, the chicken demo didn't run each chicken on an individual core ;). But it did automatically scale to however many cores were available -- 6 SPUs and a PPU on the PS3, and 16 SPUs and 2 PPUs on a Cell Blade (on which we originally showed the simulation at GDC 2006).
If you want to learn more, drop by our website at http://www.rapidmind.net. You can sign up for a free no-strings-attached evaluation version if you want to try it yourself.
Well, if you're going to remove 99+% of the common trouble spots of multi-threaded coding by moving to a messaging paradigm, then yes, it probably is conceptually easier than OO. It can also be significantly slower depending upon the application's design and function and greatly increase its memory footprint. e.g., I don't think a game like Quake would work all that well under this paradigm. BTW, webservers generally work under your paradigm.
You'll also still have the potential of concurrent modifications in this scenario, but at least you won't be working on the same memory storage locations, potentially reading indeterminate/incoherent values. Instead you'll have inconsistent values displayed, depending upon which thread's data you're displaying.
I'll still make the argument that good multi-threaded coding is harder than good OO coding.
"Plus your statement is misleading, very few major apps are single threaded, the OS itself has a ton of stuff going on in the background, there are daemons/services running all the time."
Plus, you completely and utterly missed the point of the poster you replied to. Most apps (who cares about major?) are single-threaded. The poster's point is that writing a multi-threaded app JUST BECAUSE THERE ARE MORE CPUs/CORES to handle them is pointless and stupid. If the app only requires a single thread, use just one. The other resources will get used by the OS or by other apps (that may, God forbid, *also* be single-threaded). He wasn't talking about dedicating a computing resource to an app. He was saying that an app should only use what it needs, with the understanding that the OS will make good use of any remaining resources for other tasks.
What a lot of multi-thread-happy people seem to miss is that as long as the OS is multi-tasking, the other resources will not go to waste just because the app in the foreground isn't using them.
which has had easy-to-use multithreading constructs built right into the language for the past 25 years or so.
Sheesh, evil *and* a jerk. -- Jade
This is ridiculous tripe. Multi-threaded programming is hard not because the libraries are hard to use but because it requires a lot of planning and thought to decide if you can actually gain a benefit by going multi-threaded.
The main benefit of multiple cores will not happen in userland. It will be in the kernels and the libcs. Once userland processes can effectively get memory from the heap with minimal locking we will see a performance boost system wide (I'm talking 100 processes can all request memory from the heap with 0 locking). This is why cores are important: because we will be running more and more applications on our computers, we won't need the performance in a specific app, we will need all of our apps to be responsive all the time.
I have some practical experience on this one.
I inherited a massive data collection routine which I identified as a good candidate for threading. When I mentioned the idea to the original duhveloper his eyes just kinda glazed over. The objection was that the routine has always "worked" so why introduce risk with increased complexity? After he left I jumped in and multi-threaded it. I was able to thread 20 already busy servers all working for me at the same time, and each server was threading stuff without any performance degradation. Normal 24/7 operations with many concurrent production sessions were not impacted in the least. The end result was that the former 22 hour long process now completes in 22 to 41 minutes and it's much more stable and reliable. And with a good thread pool class it really wasn't that complicated. Actually, the hardest thing was getting beyond the glazed eyed reservations of a clueless duhveloper who was too timorous to try something "new."
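I can't post the real routine, but the shape of it is roughly the following Python sketch. The server names and `collect_from` are placeholders I made up, and the `time.sleep` call stands in for the slow remote query. Because the threads spend their time waiting on the network, a plain thread pool is enough.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_from(server):
    """Placeholder for the real per-server collection (assumed to be I/O bound)."""
    time.sleep(0.05)    # pretend we are waiting on the remote machine
    return len(server)  # stand-in for the data that comes back


servers = [f"server{n:02d}" for n in range(20)]
collected = {}

with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    # One task per server; the pool overlaps the waiting instead of
    # visiting the servers one after another.
    pending = {pool.submit(collect_from, name): name for name in servers}
    for future in as_completed(pending):
        # result() re-raises any exception from that server's task, so a
        # failure can be handled per server rather than killing the whole run.
        collected[pending[future]] = future.result()

print(len(collected), "servers collected")
```

The tasks share nothing while they run and only the main thread writes to `collected`, which is a big part of why this kind of job is a good candidate for threading.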
Why did the multi threaded chicken cross the road? to the other side To get the other side To get to the
http://www.rense.com/general79/wdx1.htm
You must not work with network apps all that much.
Think of the most basic email app possible. Now when a user presses "send mail" would you create a new fork (), try and micro manage the remote connection in a thread that handles the GUI, or force the user to wait around?
Next think about video where you have a resource intensive task AND you still want a highly responsive GUI.
Granted if all you ever work with is simple biz apps with one user you have a point, but I think your 99.99% estimate says more about the work you do than programming in general. Because threads can often simplify demanding applications.
SH, from which RapidMind's core tech came, is FOSS and you can do many of the things their stuff does with SH.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
And it is called boost::futures.
The theory behind it, though, is not new: the Actor model is quite old, and it has been used in Erlang for quite some time.
The Communicating Sequential Processes style of programming allows for many lightweight, simple threads that communicate over channels rather than through monitor-based thread synchronization.
The OCCAM language implemented this style of processing, and the Transputer chip implemented fast context-switching hardware that OCCAM could run on.
This was all done back in the 1980s.
I even implemented the original version of the Java Communicating Sequential Processes API which brought CSP style programming to the Java world, although it is based on Java's underlying Thread mechanism so context switching isn't as fast as it could be.
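For anyone who hasn't seen the style, here is a loose Python approximation. It is not the JCSP API. It uses `queue.Queue(maxsize=1)` as a stand-in for a channel, which is a one-slot buffer rather than the unbuffered rendezvous channel of occam, and the process names are my own.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the pipeline


def producer(out_ch, count):
    for i in range(count):
        out_ch.put(i)
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    # This process owns no shared state; it only reads one channel and writes another.
    while True:
        value = in_ch.get()
        if value is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(value * value)


def run_pipeline(count):
    numbers = queue.Queue(maxsize=1)
    squares = queue.Queue(maxsize=1)
    processes = [
        threading.Thread(target=producer, args=(numbers, count)),
        threading.Thread(target=squarer, args=(numbers, squares)),
    ]
    for p in processes:
        p.start()
    results = []
    while True:
        value = squares.get()
        if value is DONE:
            break
        results.append(value)
    for p in processes:
        p.join()
    return results


pipeline_output = run_pipeline(5)
print(pipeline_output)  # [0, 1, 4, 9, 16]
```

All coordination happens through the channels, so the application code has no locks or monitors of its own to get wrong.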
For those who have not caught wind of this yet, transactional memory is currently the most promising solution to this problem and perhaps the most-covered subject in research conferences on parallel computing today. There have been several proposals for both hardware-based (at the cache level) and software-based architectures. Transactional memory greatly simplifies concurrent programming. When using transactions instead of locks, deadlocks go away completely and there is increased concurrency.
An unjust law is no law at all. - St. Augustine
There's a lot of posts saying that multithreading is really hard, which is completely true... But what RapidMind is providing is something else, something more like a SIMD model or vector computations. It solves things like elementwise operations on large arrays in an efficient manner using whatever parallel computing resources are available. It's a language with semantics that don't require complicated synchronisation, because you're basically telling the compiler which operations are independent and then it can go off and compute them in the most efficient way possible. RapidMind was designed to make GPGPU programming easy, so it's a generalisation of the pixel shader model where you have a lot of 'threads' computing the color of each pixel on the display in parallel. This is an easy problem, because there is basically no communication between threads.
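To show the shape of that model (and only the shape, this is not RapidMind's API), here is a Python sketch where the programmer writes a per-element kernel and a made-up `parallel_map` helper decides how many workers to use. In standard CPython builds the threads won't actually run the kernel on several cores at once, so read it as a picture of the programming model rather than a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """Kernel: depends only on its own element, so every call is independent."""
    return min(255, pixel + 40)


def parallel_map(kernel, data, workers=None):
    """Hypothetical helper: apply kernel elementwise on however many workers exist."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the result is the same for any worker count.
        return list(pool.map(kernel, data))


pixels = [0, 100, 250]
print(parallel_map(brighten, pixels, workers=1))  # [40, 140, 255]
print(parallel_map(brighten, pixels, workers=4))  # [40, 140, 255]
```

The code that calls `parallel_map` never mentions the number of cores and has no locks, because the kernel cannot talk to its neighbours.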
The interactive way to Go -- http://www.playgo.to/iwtg/en/
Yes, gcc 4.2 supports OpenMP. As others note, parallel programming is still not trivial. But OpenMP is very nice. I have a write up on building and testing gcc 4.2 on OS X here: http://alphakilo.com/openmp-on-os-x-using-gcc-42/.
Serious advantages are that OpenMP can be retrofitted to existing C/C++ and Fortran code. I know that everyone prefers to start from scratch and use Erlang or some other solution, but in a project I am working on, we already have about a million lines of C++.
Current OpenMP implementations favor SMP machines, but one can go even further with the Intel OpenMP for clusters solution. I have not tried it myself yet, but I understand that it makes the issue of non-shared memory across the cluster machines transparent.
As in all cases YMMV. But, if your code is amenable to parallelizing, OpenMP is a pretty straight forward way to go.
I work with network apps on a daily basis, thanks! Instead of using a new thread to handle each network transaction, the comms layer is fully asynchronous and never blocks. And so the GUI can submit requests to the server without fear of becoming unresponsive. Responses are placed back onto the GUI thread when received.
But isn't that the same old excuse ? I mean, if a language prides itself on 'there's just one way to do it' (and don't tell me java doesn't do that - it does), then why would it allow such a glaring hole ? I tell you why - it was never thought about. I know: I've inspected every JDK source tree since it was 1.0.2 - the people then, brilliant as they were, never gave a second thought about making classes re-entrant, and when they started doing that, the language wasn't really fit to apply it on. So you get ugly things like re-entrant and non-re-entrant varieties of the same class. Yuk.
So you come up with the oldest excuse in the book, and you shift responsibility from the tool to the people who use it. I'll tell you something: that only goes when the tool as such is finished, which it obviously isn't. The tool must go back to the toolmaker, but that's something we've heard about java so many times now, it isn't even funny.
Religion is what happens when nature strikes and groupthink goes wrong.
As the articles say, the lock pressure is moved from the reader to the writer. Transactional Memory scales amazingly better when you have multiple threads which are reading common data. Please note that in today's system architectures even READING data on different cores at the same time may not be thread safe without memory barriers put into place to synchronize the caches.
There have been many papers written about the efficiency gains.
And as a bonus, writing multithreaded software with a "Transactional Memory" scheme is easier to "prove correct".
So maybe it wasn't on the PS3, but Valve seems to take a more practical approach.
This article has some good info on what Valve is doing to bring threads into gaming.
Personally I'd rather use their model because it's more utilitarian and less 'Because I can'.
Money is the root of all evil?
"Gigahertz are out and cores are in. Programmers must begin to develop applications that take full advantage of the increasing number of cores present in modern computers."
The marketplace wants and needs new technologies for more powerful processors. Multicore serves the needs of chip makers, not their customers. Making all software multi-threaded is trying to solve the wrong problem. It's going to result in lower-quality software without a significant increase in performance.
Back in April of 1989, the Communications of the ACM published an article describing a language for just this sort of purpose, called Linda. It provided elegance and simplicity in multithreaded programming at the expense of more overhead for coordination (always a tradeoff). Communication was done by each thread putting the results of its processing into a shared pool, from which downstream threads would periodically take messages and perform further processing. No synchronization really, just producers and consumers operating on this shared pool of data. Obviously this would not be a silver bullet for every multithreaded need, but strong in the "simple" department.
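Here is a toy Python version of that shared pool, just to show the flavour. Real Linda matches tuples against templates with its `out`, `in` and `rd` operations. This sketch only matches on the first field, and it calls the removing read `take` because `in` is a reserved word in Python. The class and tags are made up for the example.

```python
import threading


class TupleSpace:
    """Toy shared pool: producers put tuples in, consumers take matching ones out."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def take(self, tag):
        """Block until a tuple whose first field equals tag exists, then remove it."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if tup[0] == tag:
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()


def worker(space):
    while True:
        _, n = space.take("task")
        if n is None:  # shutdown marker
            return
        space.out("result", n, n * n)


space = TupleSpace()
workers = [threading.Thread(target=worker, args=(space,)) for _ in range(3)]
for w in workers:
    w.start()
for n in range(5):
    space.out("task", n)
linda_results = sorted(space.take("result")[1:] for _ in range(5))
for _ in workers:
    space.out("task", None)
for w in workers:
    w.join()
print(linda_results)  # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

The synchronization has not disappeared. It lives inside the pool, so the producers and consumers never have to name each other or share a lock.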
You choose to focus purely on the Java statement. That's fine. It was thought about. The problem is that there's only so many ways to skin the cat, and Java's approach was to focus on performance. You'd argue that they should have focused on thread safety instead and made it slower?
The architectures of Java, C, C++, and C# are all such that there will always be issues with re-entrant code. I'm sure you'll have some suggestion for a language that doesn't suffer from this issue.
As for Java, the core is finished. Refinements are being done to accommodate new features, like the concurrency library and changes in the memory model to address issues brought up by that addition. It's not that the features couldn't have been accommodated with the original JVM, but it would not have been as clean or easy to implement, and certainly not as performant as what resulted from the modifications.
The cesspool just got a check and balance.
I don't think you're getting the kind of advantages you might assume with this approach. First off, with non-blocking IO you're wasting a lot of CPU time checking to see if you have a response vs. blocking. Second, you're manually balancing performance issues, which is something modern operating systems are very good at. Finally, you're limiting the kinds of third party tools you can use.
Now you could easily have a GUI thread that uses message queues to talk with the networking layer(s) without changing much at all. Add in a simple thread pool to handle performance issues and you're going to have a more responsive and scalable app which is also more flexible. (Granted, if you have already spent a lot of time tuning things it's not such an issue.)
And, granted, in some ways it's easier to debug your style of app, but as soon as you have random developers mucking around in the code you're heading for trouble.
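A rough sketch of that arrangement, assuming Python's standard `queue` and `concurrent.futures` modules. The `fake_network_request` function and the pool size of 4 are placeholders invented for illustration; a real GUI toolkit would hook the queue into its own event loop rather than block on it.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

def fake_network_request(url):
    # Hypothetical stand-in for a blocking network call.
    return f"response from {url}"

def run_event_loop(urls):
    inbox = queue.Queue()   # messages from worker threads to the "GUI" thread
    handled = []

    def worker(url):
        # Runs on a pool thread; blocking here does not stall the GUI thread.
        inbox.put((url, fake_network_request(url)))

    with ThreadPoolExecutor(max_workers=4) as pool:
        for url in urls:
            pool.submit(worker, url)
        # The "GUI" thread waits on the queue instead of polling sockets.
        for _ in urls:
            handled.append(inbox.get())
    return sorted(handled)
```

The only shared object is the queue, so the networking code and the GUI code can be changed independently of each other.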
Craft Beer Programming T-shirts
Actually, the rise of web apps backed by a multithreaded web server and a multithreaded DB in the 1990s made this pretty easy for millions of people.
Hm. I focussed on java, because you seemed to do so - it was the only specific thing you reacted to. With regards to performance; if locking was an aspect of a class, rather than its accessors, then java might not have suffered so much in performance. The architectures are all the same (within C, C++ and java) because they were chosen to be the same (something existing programmers knew). Kernighan and Ritchie had an excuse - their predecessors are long gone and threads didn't exist when they invented C. C++, C# and java have no such excuse - they were invented long after that, when computer power was many times that of a PDP-11.
I agree with you that MT programming is something that you have to understand the (almost mathematical) basics of. I don't agree with you that modern languages have an excuse for ignoring threads as an intrinsic aspect of their being. Java's core may be finished, but if we're really to write in languages that allow for proper usage by multiple CPUs, then C++ or java aren't the right choices.
FWIW, I don't personally think that this 'multiple core programming revolution' is around the corner in any way. If only because what they do at the moment (services and batch-processes) can already be programmed successfully in many other ways. The revolution that the topic at hand seems to imply, requires a radical shift in what people expect programs to do in the first place.
Religion is what happens when nature strikes and groupthink goes wrong.
"However, multi-threaded development has been notoriously hard to do."
if ( statement.agree() )
leave_slashdot_now();
No one I've ever met thinks this is even a tiny bit difficult. Maybe the problem is that there are not yet compilers AT ALL for these new chips? Nah, blame multithreading!
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Stanford's Sequoia compiler is supposed to be open source eventually. Plus, it looks a little easier to use than RapidMind, PeakStream, or CUDA.
I'm sure the demonstration would've been a lot more difficult if he'd used philosophers instead of chickens. Thing is, chickens can't even hold chopsticks. A chicken just goes straight for the feed, so there's just one resource being acquired. It's still possible for a chicken to starve, but as chickens don't eat that much it's more likely that any shut-out chickens would simply go hungry for a while, and then get to eat before starving.
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
Well, that's probably because most of my recent multi-threaded work has been in Java, and hence is at the top of the heap, so to speak.
:) It certainly allowed for much improved AI, although it's still somewhat limited. At least it was the first turn-based game I played in which the AI didn't outright cheat.
Yes, Java largely copied the C/C++ architecture regarding threads, with minor tweaks. C# is of course mostly a reimplementation of Java with a strong C++ flavor. Regarding C++, it was merely an OO wrapper on top of C initially, or at least that's how it was implemented with the (g)cc compilers, as they translated C++ to C, then compiled and linked that C code IIRC. (it's been so long I wouldn't bet a beer on it, and I'm too lazy to look it up;)
So far, the only efficient, production-usable MT code I have seen was written in C/C++/Java. I have yet to see MT C# production code, although I'm sure some exists somewhere, even if it's only in unsafe sections. I've already been directed to Erlang and SCOOP and will look at them soon.
I agree with you that an MC programming revolution is not around the corner. The only real consumer applications that already exist are photo and movie editing packages. Some of these effectively run across multiple CPUs. They're pretty much all written in C/C++ with some assembly from what I gather from various reviews, discussions, articles, etc. After all, does your mail program really benefit from MC? (MT yes, but MC?) I doubt it, unless you're getting thousands of emails a day and have some form of AI agent processing it. How about your word processor? Nope, can't type faster than my fingers will go. Speech recognition? Now there's a concept, but I personally believe that will be handled more effectively via an add-on DSP specialty chip. Multi-media processing including audio? Sure, I can see it there too, and I thought some already were MT. The audio compressors I've used to date are single threaded though. I've countered this by running my ripping software in MT with the ability to compress multiple tracks simultaneously in separate processes (EAC is truly a wonderful product, can't promote it enough). EAC itself only benefits from a single processor though - it's so CPU light while ripping I can barely tell it's running in the CPU monitor.
Games. Now games are an area that could have some serious MC/MT goodness. It requires a whole new paradigm, however, to run a game effectively in MT/MC. The only game I know that did this was Galactic Civilizations for OS/2. While a Windows version came out, I do not know whether it was MT as well. I suppose I could run it and see...
The cesspool just got a check and balance.
I've probably used half a dozen different parallel extensions of C and C++ over the years. This one doesn't look revolutionary, it merely looks painful.
Proper analysis of the problem to be solved and then the creation of a functional specification, with lots of diagrams of process and data flow, that is then vetted by your peers.
Undetectable Steganography? Yep, there's an app fo
Most apps? I think you are living in a PC-centric world (and even then I do not believe it's true). Ever write a real-time embedded system? It's pretty much impossible to prove you can meet several hard deadlines with a single thread. And most systems in this realm do not use time-slicing (which I assume you meant when you wrote "multi-tasking"). Not all multitasking OSes are time-sliced. In the real-time world we usually use OSes that do cooperative multitasking that are priority preemptive. Unless a higher priority task wakes, or the owning task blocks, a task has the processor until it gives it up (kind of how Windows 3 worked). It's wrong to assume that all programs and programmers use a hosted system (Windows, Unix, etc.). Actually, there are more embedded systems in the world than PCs, Macs, and such. People who rail against multi-threading have no concept there are systems out there unlike their home computers.
Anonymous Cowards suck.
Those of us who've worked on client/server types of apps have grown accustomed to managing multiple processes (users) via databases. The database becomes the central "coordination engine", and A.C.I.D.-compliant databases ensure there are no stuck locks or race conditions. There may be lessons from this client/server world for a smaller scale. (Actually, I used to do it with desktop databases also, such as FoxPro; however, it was not ACID-compliant at the time, but close enough to be practical. As long as a user knew who had locked something, they could coordinate it among themselves.)
Table-ized A.I.
Which theory?
OO is just an aesthetic quality of code, so basically single-threaded OO code is to single-threaded code, as proper paragraph breaks are to a block of text. On the other hand, multi-threaded code opens up a can of worms. There are whole new classes of bugs to be avoided, and most of them can NOT be found by inspecting code locally.
Yeah, yeah, lots of paradigms are supposed to make concurrency easier. And message-passing is nice. It's still not easier than single-threaded code, though...
SCOOP, like all concurrent programming models, has all kinds of unexpected non-intuitive quirks. It's not easier than single-threaded programming.
Did any of the chickens cross the road? If so, did the chicken say why?
There was no such thing as "(g)cc compilers". If that expression is to have any meaning, it would be any vendor supplied C/C++ compiler on a unix system and/or the gcc compiler. In that case it's untrue. Some compilers, such as CFront, originally compiled C++ to C. Today almost all compilers, such as gcc, compile C++ to native code.
I figured when I wrote that that someone would take me to task about the (g)cc compilers. I didn't recall which particular flavors I used, but I ran them on several different OSes and those were the compiler executables we called. (Of course, they could have been linked/aliased to anything, but that's not the point:)
I did state that they originally operated in this fashion. This was purely in support of the argument that C++ was a syntactical overlay of C, nothing more. That changed later as did the C++ spec.
The cesspool just got a check and balance.
STM has its own problems: http://patricklogan.blogspot.com/2007/02/misguided-road-not-to-be-travelled.html
cpeterso
LabVIEW. http://www.ni.com/labview.
A project that you can download and play with today is Trolltech's QtConcurrent. Given a task it will automatically manage creating threads and distributing the task among your cores.
From the project page:
The classes and functions available in the Qt Concurrent package allows you to write multi-threaded applications without having to use the basic threading synchronization primitives such as mutexes and wait conditions. This makes it easier to reason about and test parallel programs to make sure that they are correct.
The Qt Concurrent components manage the threads they use automatically. Each application has a global thread counter, which limits the maximum number threads used at the same time. The maximum is scaled according to the number of CPU cores on the system at runtime. This means that programs written with Qt Concurrent today will continue to scale when deployed on many-core systems in the future.
Very cool.
Do you change clothes while making the "chee-chee-cha-cha-choh" transformation sound?
parallel
{/* execute these statements in parallel if possible */
statement1;
statement2;
statementn;
}
sequential
{/* execute these statements in order as written */
statement_1;
statement_2;
statement_n;
}
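No mainstream language has exactly these block keywords, but the intent can be approximated with library calls. The following is only a sketch in Python, assuming each "statement" is passed as a zero-argument callable; the helper names `parallel` and `sequential` are made up to mirror the pseudocode above.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel(*statements):
    # Run every callable, possibly at the same time, and wait for all of them.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(s) for s in statements]
        return [f.result() for f in futures]

def sequential(*statements):
    # Run the callables strictly in the order written.
    return [s() for s in statements]
```

Results come back in the order the statements were written in both cases; only the execution order differs. The hard part the keywords would hide is still there: the statements inside a parallel block must not depend on each other.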
- Tjp
I am in wallow with my inner money grubbing capitalistic pig. ... Oink!
Check out Erlang and Haskell. Haskell does it by eliminating the concept of state and instruction ordering (so there is no concept of a thread of execution). The compiler can then do provably safe things on any number of threads it feels like. Erlang takes a different approach by making EVERYTHING its own little thread (functions, variables, everything). Both of these are production languages.
Ideally, there would be hardware support for the data to be updated in a single transaction.
Another way to implement this is via pointers, where the pointer is updated once the new values are put into place. This pointer switch must be atomic. When a writer decides to write, it gets a copy of the current pointer. It starts the calculation and allocates a new result. This new result is put into place via changing the pointer with an atomic operation only if the original pointer has not changed.
If it did, then it must THROW OUT the calculations that it did and start them over again.
The reason why this is more efficient is that in typical programs more threads read the same data, and very little data is actually needing to be written to by multiple threads.
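The pointer-swap scheme described above can be sketched as follows. Python has no hardware compare-and-swap, so the `AtomicRef` class below simulates one with a small lock; that class, the `update` helper, and the tuple used as the shared data are all assumptions for illustration. On real hardware the `compare_and_set` step would be a single atomic instruction.

```python
import threading

class AtomicRef:
    """Stand-in for an atomically swappable pointer (CAS simulated with a lock)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        # Readers just read the current reference; they take no lock.
        return self._value

    def compare_and_set(self, expected, new):
        # Swap only if nobody replaced the reference since `expected` was read.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def update(ref, fn):
    # Optimistic write: compute on a private copy, publish with one atomic swap,
    # and throw the work away and start over if another writer got there first.
    while True:
        old = ref.get()
        new = fn(old)   # must build a new object, never mutate `old`
        if ref.compare_and_set(old, new):
            return new


def run_demo(writers=4, increments=1000):
    ref = AtomicRef((0,))   # immutable tuple as the shared data

    def work():
        for _ in range(increments):
            update(ref, lambda t: (t[0] + 1,))

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ref.get()[0]
```

No increment is lost even though writers never block each other for the duration of the calculation; a writer that loses the race simply repeats its work, which is cheap as long as writes are rare compared to reads.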
--jeffk++
ipv6 is my vpn
OK chaps, care to enlighten me as to whether this method relies on specific language constructs, or whether it is implicitly managed after the fact?
I.e. do you need thread objects and synchronisation primitives (such as critical sections), or can one design a program as though it were serial, and then let the compiler judge how to manage threading and concurrency?
How dare you be so modest!! You conceited bastard!!
If you're downloading the video, do their servers a favor and use the torrent. HTTP downloads are incredibly slow off of their server right now ....
When technology hits a dead end on one track, the industry goes looking for another track where progress can be made. A few brilliant people stay at the dead end beating their brains against the wall. You can stand there with your arms folded, tapping your foot, waiting for the brilliant people to knock down the wall, but I'm going to pursue the low-hanging fruit elsewhere. When they do knock it down, I'll be right behind you, and what will I have lost?
OpenMP is an open standard for multi-platform shared-memory parallel programming in C/C++ and Fortran. It is supported by GCC 4.2 and greater.
And its syntax does suck. I don't think that's a good enough reason to dismiss it however, especially since it's so damn good in every other way:
* Fast bytecode compilation that can be used in #! scripts
* optimizing native code compilation that performs as well as or better than every other language outside of C/C++/D
* debugger with backstepping
* profiler
* modules with separate compilation
* strong static typing
* variant types and pattern matching
* imperative and OO features for when you want them
* a full lex and yacc
* camlp4 for creating syntax extensions or redefining syntax
How anyone could pass on all that just because the syntax is crappy is beyond me.
Win2K is the last one to support POSIX: http://support.microsoft.com/default.aspx?scid=kb;en-us;308259
You probably mean CreateProcess(), a Win32 API.
I think what you're really arguing for is work/task loops built up by message passing. These work queues can be either different threads or processes, and spend most of their time looping through the queue, instead of being created and destroyed per "user".
Apache 2 is for windows. The threaded model doesn't perform any better than the prefork model on unix. For unix users, apache 2 is just a way to get a less reliable version of apache that has had many new security holes introduced.
"When technology hits a dead end on one track, the industry goes looking for another track where progress can be made."
There are two different industries involved here: the hardware processor industry and the software industry. It looks like the former is looking for the latter to bail it out. Multicore isn't really a new track, it's more of a repackaging effort.
"I'm going to pursue the low-hanging fruit elsewhere."
I think all the discussion over this issue suggests that it isn't low-hanging fruit.
"When they do knock it down, I'll be right behind you, and what will I have lost?"
As always, it depends. If you and your co-workers don't make any additional mistakes due to the added complexity, probably nothing bad will happen. Obviously if multicore doesn't actually add a lot of multithreading to your project because you're already using it extensively, it won't have any effect at all.
Seems to be what?
You gotta love the American educational system...or the lack of Web spellchecking...one or the other...
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage.
This is Slashdot. People argue against something so they'll never have to try it :-) I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations.
What I've learned is that there's basically a one-time cost when you redesign and refactor a single-threaded app to a multi-threaded app. Multi-threaded design is harder, but the resulting designs are often simpler, since single-threading forces you to specify many arbitrary choices about sequencing. Then when you consider resource utilization, it turns out those arbitrary choices aren't arbitrary after all, so you end up interleaving work in weird ways to keep resource utilization up. With multi-threaded apps, I just leave it up to the thread scheduler, which does a pretty damn good job.
On the programs I've worked on, the advantages of cleaner design (no need to specify serialization where it doesn't exist, no spaghetti flow control) have cancelled out the disadvantages of more complicated implementation (ensuring protection for every shared resource.) The cost of ensuring protection is mitigated by the fact that most of the resource-protection code in my programs is contained in a few shared data structure implementations that I reuse in all my projects.
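As an illustration of keeping the resource-protection code inside a reusable data structure, here is a minimal sketch in Python. The `SharedCounterMap` class is hypothetical, not taken from the poster's code; the point is only that callers never see the lock.

```python
import threading

class SharedCounterMap:
    """All locking lives here; callers never touch the lock directly."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key, amount=1):
        # The read-modify-write is done under the lock so no update is lost.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount

    def snapshot(self):
        # Return a copy so callers cannot mutate shared state behind the lock.
        with self._lock:
            return dict(self._counts)


def run_demo(threads=8, per_thread=500):
    counts = SharedCounterMap()

    def work(name):
        for _ in range(per_thread):
            counts.increment("total")
            counts.increment(name)

    workers = [threading.Thread(target=work, args=(f"t{i}",)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counts.snapshot()
```

The worker code reads like single-threaded code; the one place that has to be reviewed for thread safety is the class itself.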
Of course I write lots of single-threaded programs, too. Most of my programs start out single-threaded, with an eye toward future multithreading. Single threading has a decided advantage for simple programs; I just believe that advantage disappears when programs get large and complicated.
Game console programming is just as sheltered as any other narrowly focused programming field. Higher level languages aren't about sheltering, they are about abstraction. I understand the low level implementation details, but I can choose to leave those details out of my mind, and concentrate on the high level problem. You use this same technique all the time, but you have the unfortunately common misconception that abstracting certain things is good, and abstracting other things is bad.
You don't have to implement functions/procedures yourself, because you are using a high enough level language that it provides an abstraction for that. This makes you more productive, and your code easier to modularize and maintain. Keep moving up the high level chain and more and more things get abstracted, making you more productive and your code better. Especially since most low level programmers fail to implement many abstractions that are very powerful, because they have never bothered to try existing high level languages and discover the power those abstractions provide.
Maybe you should rethink your position about synchronizing every method to make Java 'thread safe', because what that will really do is cause all your code to freeze when it is actually run in a multithreaded environment, unless it is very simple.
You cannot just make code thread safe. You have to define where you are going to share memory between threads, and then figure out all interactions between the threads and define where the data can be shared. Otherwise you are asking for trouble. Using a process model makes it HARDER (but not impossible) to hang yourself, by forcing you to use shared memory, message passing, or other techniques to force you to think about the shared state of your code.
Writing code is easy, but the tools we choose to use make it hard to write good multithreaded code.
"Considering that hardware designers have been leaning over backwards for years to help software programmers stick with one fairly easy programming model, I wouldn't be too quick to blame the hardware industry. They know which side their bread is buttered on. The hardware company that requires the least adaptation from software developers has the advantage, just like the software company that requires the least change and adaptation from users always has the advantage."
I didn't see the "leaning over backwards" you refer to. What hardware improvements have hardware designers avoided because it would have caused the current programming model to change? You don't have to look any further than the x86 architecture with its segmented memory to conclude that ease of programming has never been a key goal at Intel.
"I've never had a choice; my first professional job was at a Java shop that used multithreaded servers, and at my current gig one of my ongoing tasks is to add multithreading to single-threaded C++ apps so that our customers get as much power as possible out of their multi-CPU workstations."
One advantage of server apps is that they inherently lend themselves to a multithreading approach since each request can be seen as a separate unit of work. As you probably know, the challenge comes when no such natural break-down is apparent.
I only claimed synchronizing would almost make it thread-safe. I did not ever state this was a good approach. You should read the second and third paragraphs of my GP.
The cesspool just got a check and balance.
Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware."
Thanks to the insulation provided by the instruction set architecture, programmers have been able to mostly ignore the growing complexity of microprocessors and continue thinking in terms of the same single-threaded in-order execution model and a relatively slowly evolving set of ISAs. The cost of this mismatch between hardware and programming model becomes evident when you look at how much more performance you can get when you shift some of the burden from the hardware to the programmer, as with the Cell processor. (You can also save lots of power by forcing software tools to manage instruction scheduling. Microprocessors in cell phones and other battery-powered devices typically can't afford to spend power on analyzing, reordering, and creatively dispatching instruction streams. They leave that to compiler writers and programmers.)
When you look at the alternatives (and I'm sure Intel, AMD, and IBM have worked with more alternatives than I'll ever even hear of), symmetric multiprocessing is the smallest, least disruptive step that software developers could be asked to take. We still get a nice simple machine model that thankfully reflects very little of the complexity of the underlying hardware. The only thing that has gotten worse is that instead of getting unconditional, dramatic performance improvements for no effort at all, we get dramatic performance improvements conditional on our ability to use cores efficiently. And heck, single-threaded programs might keep getting faster anyway.
P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software. Requiring users to move from straightforward 16-bit addressing to straightforward 32-bit addressing would have kept both hardware and software simpler than using Intel's segmentation idea, but segmentation made it easier to port old software, so it was a success. It sounds more like a present vs. future tradeoff than a software vs. hardware tradeoff.
"Intel desktop and workstation microprocessors, or any superscalar microprocessors for that matter, are a great example. The programmer (or compiler) writes a single-threaded assembly language program for an extremely simple machine that is basically a fiction. The assembly programmer gets to think in terms of ISA abstractions like "the EAX register" when in the real microprocessor, with its superscalar pipelines, out-of-order execution, branch prediction, and so forth, there really isn't anything you could reasonably point to and say, "This is the EAX register." The microprocessor is a dizzyingly complex hardware simulation of the simple machine that software programmers use as their mental model of "the hardware.""
This sort of "fiction" is a fundamental requirement of these devices to be classified as microprocessors; otherwise you'd have to perform "programming" by doing digital logic design or microcoding. It's like saying that car manufacturers accommodate drivers' legacy expectations by providing wheels for the car.
In addition, a number of these added features don't really preserve legacy expectations anyway. Hard real-time software can't really be written for most modern processors because the execution time of a particular section of code is not deterministic.
"P.S. I admit I don't know anything about Intel's introduction of segmentation, but reading what Wikipedia says, it seems like the key selling point of segmentation was the ease of porting old software."
Given the fact that the 8086/8088 couldn't natively run 8085 programs, I don't think backward compatibility was a major goal. If you have to translate instructions anyway, changing from an 8 bit address to 16 bit address isn't that much harder.
It isn't necessary for the ISA to be a fiction. In simple processors, the ISA *can* reflect the real, concrete structure of the processor. Assembly statements like "mov r2, r1" might tell you quite literally what is happening inside the processor. In fact, the programmer doesn't even need to know whether the ISA reflects the real structure of the processor or not. What matters to the programmer is that the behavior of the processor conforms to the model specified by the ISA.
Hardware designers went from processors almost as simple as a simple ISA to processors thousands of times more complex than the most complex ISA. Meanwhile, software programmers didn't have to come along for the ride. ISAs grew, but not at the same rate as hardware complexity. This is part of what I mean by hardware designers bending over backwards for software programmers. When it becomes possible to put more and more transistors on a microchip, yet the programming interface doesn't scale with the amount of processing power, it becomes hard to use the power efficiently. A Xeon can do a huge amount of work per unit time, but it's like five hundred people getting together to build a small house. When you throw all those transistors at a single instruction stream, most of them get applied to fairly small improvements like improving branch prediction by 2%. The instruction stream is simply inadequate to take advantage of that much hardware. Hardware efficiency suffers so programmers can keep using a simple model.
That's the logic behind multicore chips, and it's also why the Cell processor can put up such ludicrous performance numbers without any kind of technology breakthrough. Instead of wasting vast numbers of transistors on making small tweaks to single-process execution, why not make them available for real work? Let the programmer decide what to do with them. The programming model gets more complicated, but the programmer gets to use a bunch of processing power that has until now been inefficiently applied because the programming interface has been too narrow.
I didn't mention backward compatibility; I mentioned porting. I'm pretty sure that changing the pointer size is in fact a big deal, especially for languages like C where types aren't exactly respected. If I understand what I'm reading, Intel's design allowed old programs that used 16-bit addressing to be recompiled to use 16-bit addressing inside a single segment. That meant that nobody had to dig through old code to find and correct all the places where a pointer was stored in a 16-bit datatype. Meanwhile, new programs could use up to 20 bits of address space.
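The 20-bit figure comes from the 8086's real-mode address calculation: a 16-bit segment value is shifted left four bits and added to a 16-bit offset. A small worked sketch in Python (the function name is made up for illustration):

```python
def physical_address(segment, offset):
    # 8086 real mode: 16-bit segment * 16 + 16-bit offset, wrapped to 20 bits.
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, segment 0x1234 with offset 0x0010 gives 0x12350, and so does segment 0x1235 with offset 0x0000. A program that never changes its segment registers only ever sees 16-bit offsets, which is why old 16-bit code could keep its pointer size.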
You forgot to mention that Ada 2005 now adds Interfaces to both protected and task objects. See:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
Ada's multi-threading is not only without the pain but great fun!
Martin
...and Ada 2005 even supports Real-Time programming. It is possible - just not with C++.
Find a short intro here:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
Martin