More Interest In Parallel Programming Outside the US?
simoniker writes "In a new weblog post on Dobbs Code Talk, Intel's James Reinders discusses the growth of concurrency in programming, suggesting that '...programming for multi-core is catching the imagination of programmers more in Japan, China, Russia, and India than in Europe and the United States.' He also comments: 'We see a significantly HIGHER interest in jumping on a parallelism from programmers with under 15 years experience, verses programmers with more than 15 years.' Any anecdotal evidence for or against from this community?"
Old programmers don't want to learn new things -- trust the tried and true.
Young bucks want to be on the cutting edge to get the jobs that the old people already have.
----
Oh, and the people see the benefit in the other countries more than those in the U.S.? Probably not, we're just lazy Americans though.
Instead of addressing the root problems with concurrency, we are just going to see super-high-level languages that have no bearing or relationship to the actual hardware of the underlying machine.
Which is more efficacious: to take up as much simultaneous processor time as possible in order to finish faster, or to leave extra cores open for other processes to run simultaneously?
Given that the management of threads isn't exactly the easiest thing in the world (not the hardest either, mind you), perhaps it would be more beneficial to let the OS determine which threads to run rather than trying to optimize a single app.
For example, many webservers use a semaphore rather than multiple threads to handle request dispatch. The result is that there is less overhead creating and cleaning up threads.
I cut my teeth on programming microcontrollers and embedded devices, and high level languages are a chore for me...having to program for some api/interface/whatever and barely seeing what goes on at the hardware level is strange and confusing for me. That being said, isn't the very reason I'm not fond of high level languages something that would make it an easy transition to multicore development? Are the techniques not an extension of what a programmer already knows and does versus something completely new?
Also, it would seem from a low level standpoint that working with long instruction paths on a superscalar architecture would have been an excellent stepping stone for multicore development...am I wrong in seeing some parallels (no pun intended) here?
What the heck is a 'sig'?
Q1) Why did the multithreaded chicken cross the road?
A1) to To other side. get the
Q2) Why did the multithreaded chicken cross the road?
A4) other to side. To the get
It is funnier in the original Russian.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
One reason could be that software engineers with more experience simply already know about these things, and have faced off against the many problems with concurrency. Threads can be hell to deal with, for instance. So because of these things they don't show any interest.
That being said I think that if you want to actually make use of many cores you really do have to switch to a language that can give you usage of many threads for free. Writing it manually usually ends up with complications. I find Erlang to be pretty nifty when it comes to these things for instance.
Most of my past projects I had adapted or rewritten to use threads. Keeping data coherent when one thread or process can interrupt another is hard -- maybe that's why it's not done more?
And what about seti@home, folding@home, and all the other massively parallel projects out there? Surely you're not saying that doesn't apply to multi-core either. I think that if you stop and look around you'll see it. But if you're only basing your opinion on your book sales, then maybe there's another problem.
In an unrelated story, young pups display surprising agility with said trick. Scientists baffled.
Parallel programming is going to go mainstream, not because people find it interesting, but because that's the way hardware is forcing us to go. First mainframes, then workstations, and now desktops and laptops have moved away from single CPU cores. In every case, it has been a necessary evil due to the inability to pack enough power into a single monolithic processor. Does anyone actually think Intel, if they could, wouldn't happily go back to building single-core CPUs with the same total power as the multi-core CPUs they're making now?
Right now, parallel development techniques, education, and tools are all lagging behind the hardware reality. Relatively few applications currently make even remotely efficient use of multiple cores, and that includes embarrassingly parallel code that would require only minor code changes to run in parallel and no changes to the base algorithm at all.
However, if you look around, the tools are materializing. Parallel programming skills will be in hot demand shortly. It's just a matter of time before the multi-core install base is large enough that software companies can't ignore it.
One day soon, the computer industry will realize that, 150 years after Charles Babbage came up with his idea of a general purpose sequential computer, it is time to move on and change to a new computing model. The industry will be dragged kicking and screaming into the 21st century. For over 20 years, researchers in parallel and high-performance computing have tried to come up with an easy way to use threads for parallel programming. They have failed and they have failed miserably. Amazingly, they are still continuing to pursue the multithreading approach. None other than Dan Reed, Director of scalable and multicore computing at Microsoft Research, believes that multithreading over time will become part of the skill set of every professional software developer (source: cio.com). What is wrong with this picture? Threads are a major disaster: They are coarse-grained, they are a pain in the ass to write and hard to debug and maintain. Reinders knows this. He's pushing threads, not because he wants your code to run faster but because Intel's multicore CPUs are useless for non-threaded apps.
Reinders is not an evangelist for nothing. He's more concerned about future-proofing Intel's processors than anything else. You listen to him at your own risk because the industry's current multicore strategy will fail and it will fail miserably.
Threads were never meant to be the basis of a parallel computing model but a mechanism to execute sequential code concurrently. To find out why multithreading is not part of the future of parallel programming, read Nightmare on Core Street. There is a better way to achieve fine-grain, deterministic parallelism without threads.
On the surface, it seems that Japanese engineers have a history of this. Fujitsu had their dual-6809 home computer, arcade games commonly had two and sometimes three Z80s, 680x0, or whatever. Sega, in their mad rush to beat the specs of the upcoming Playstation, stuffed four off-the-shelf processors plus a few custom chips into the Saturn.
Whether it's a good idea or not, you will have a VERY hard time convincing an old dog programmer that he should jump on the train. Think back to the time when relational models entered the database world.
Ok, few here are old enough to be able to. But take object oriented programming. I'm fairly sure a few will remember the pre-OO days. "What is that good for?" was the most neutral question you might have heard. "Bunch o' bloated bollocks for kids that can't code cleanly" is maybe more like the average comment from an old programmer.
Well, just like in those two cases, what I see is times when it's useful and times when you're better off with the old ways. If you NEED all the processing power a machine can offer you, in time critical applications that need as much horsepower as you can muster (i.e. games), you will pretty much have to parallelize as much as you can. Although until we have rock solid compilers that can make use of multiple cores without causing unwanted side effects (and we're far from that), you might want to stay with serial computing for tasks that needn't be finished DAMN RIGHT NOW, but have to be DAMN RIGHT, no matter what.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Maybe it's just because I have under 15 years of experience, hell I have under five years of real world experience. But I develop nearly everything that's large enough to benefit from concurrency so it uses it. To be perfectly honest, I even develop web applications so they use concurrency extensively. It's an awesome concept and I cannot imagine how anyone could ever live without it.
Maybe it's just because I have...under five years of real world experience. But I develop nearly everything...(with) concurrency.
When your only tool is a hammer...
There is no substitute for good old optimisation. Efficient memory usage, looping techniques and a little thought towards branching can go a long way. That, and batch processing on multiple cores takes a fair bit of beating when the clock is ticking. I realise that there are still some *big* gains from finer grained parallelisation but I see a lot of coding behaviors that would never have been used ten years ago.
I'm writing a scientific library that is supposed to scale for many many cores. Using a mutex lock is not an option. Unfortunately right now I am spending all my time trying to figure out how to get compare and swap working on all the different platforms. I am saddened to see the lack of support since this is such a fundamental operation. Also, the whole 32 vs 64-bit thing adds more pain because of pointer size.
----
Go canucks, habs, and sens!
So to discover that other nations have just walked by and left the EU and US in the dust is a little annoying, but let's face it. Their resources were kept minimal by the west, so it's no surprise they learned the golden rule of boolean logic. waste not, want nots. C'mon, don't pretend we didn't expect this. OpenMP wasn't developed for the fun of it. Parallel has been on the way in for 25+ years. 25 years of Moore's Law applying to their work. You can catch up if you like. It will mean less visible glory, but it would mean doing something real.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
One of the reasons more seasoned programmers are not particularly interested is that in most cases someone else has already done the hard work.
Want to serve multiple users on multiple CPUs with your web pages? Then write a single threaded program and let Apache handle the parallelism. Same goes for JavaEE, database triggers, etc. etc. going all the way back to good old CICS and COBOL.
It is very rare that you actually need to do parallel programming yourself. Either you are doing some TP monitor like program which schedules tasks other people have written, in which case you should use C and POSIX threads (anything else will land you in deep sewage), or you are doing serious science stuff, in which case there are several "standard" Fortran libraries to do multithreaded matrix math -- but if the workload is "serious" you should be looking at clustering anyway.
Old COBOL programmers never die. They just code in C.
Slightly offtopic: Hans Boehm had his say in 2006, but the author, who promotes his own book/project using it as a base for his "research", is obviously one of the opponents of these claims.
The adoption problem, IMnsHO, arises because there is real friction between the usual C* approaches and threading. Experienced programmers (as other posters claim) are used to their ways and easily discouraged by these frictions. Java's threading approach is a bit of a pain in the ass too, so this leaves the big majority of programmers out in the cold when it comes to threading.
Threading goes against the usual development style in the US, where problems are attacked with workforce, not with any kind of optimized approach. Or original. "Why experiment, when we make tons of money old-style all these years?"
http://opencm3.net, http://www.nongnu.org/gm2/
Been there, done that. Good from far, but far from good.
As an engineer straight out of college, I was very interested in parallel programming. In fact, we were doing a project on parallel databases. My take is that it sounds very appealing, but once you dig deeper, you realise that there are too many gotchas.
Considering the typical work-time problem, let's say a piece of work takes n seconds to complete by 1 processor. If there are m processors, the work gets completed in n/m seconds. Unless the parallel system can somehow do better than this, it is usually not worth the effort. If the work is perfectly divisible between m processors, then why have a parallel system? Why not a distributed system (like beowulf, etc.)?
If it is not perfectly distributable, the code can get really complicated. Although it might be very interesting to solve mathematically, it is not worth the effort, if the benefit is only 'm'. This is because, as per Moore's law, the speed of the processor will catch up in k*log m years. So, in k*log m years, you will be left with an unmaintainable piece of code which will be running as fast as a serial program running on more modern hardware.
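The catch-up estimate above can be written out as a short derivation. This is only a sketch under assumptions the post does not state: that single-processor speed doubles every $k$ years, and that the logarithm is base 2.

```latex
\begin{align*}
T_{\text{parallel}} &= \frac{n}{m}
  && \text{($m$ processors today, perfectly divisible work)} \\
T_{\text{serial}}(t) &= \frac{n}{2^{t/k}}
  && \text{(one processor, $t$ years from now)} \\
T_{\text{serial}}(t) = T_{\text{parallel}}
  &\iff 2^{t/k} = m \iff t = k \log_2 m
\end{align*}
```

With the hypothetical values $m = 8$ and $k = 2$, this gives $t = 2 \cdot 3 = 6$ years before a serial program on newer hardware matches the eight-way parallel one.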
If the parallel system increases the speed factor better than 'm', such as by k^m, the solution is viable. However, there aren't many problems that have such a dramatic improvement.
What may be interesting are algorithms that take serial algorithms and parallelise them. However, most thread scheduling implementations already do this (old shell scripts can also utilise parallel processing using these techniques). Today's emphasis is on writing simple code that will require less maintenance, rather than on linear performance increase.
The only other economic benefit I can think of is economy of volumes. If you can get 4GHz of processing power for 1.5 times the cost of a 2GHz processor, it might be worth thinking about it.
And I am very actively considering whether we need to redesign core parts of our systems in Erlang, largely as a result of looking at the ejabberd XMPP server. My suspicion is that what we are seeing here may be more to do with the kind of work being done in different regions and by people at different experience levels. Work in the US and Europe is perhaps more likely to be user-facing, older programmers more likely to be developing end-user applications or working at the architectural level. Concurrency is more likely to be needed for back office, technical or distributed systems which are increasingly being designed outside the traditional areas.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
It's only been twelve years since I entered the workforce in the US, but I have been studying parallel programming for almost 15 years (three years in university).
The future isn't "multi-threaded" unless you count SPMD, because architecturally the notion of coherent shared memory is always going to be an expensive crutch. Real high-performance stuff will continue to work with distributed, local memory pools and fast inter-node communication... whether the nodes are all on chip, on bus, in the box, in the datacenter, etc.
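A rough sketch of the SPMD style described above, where every node runs the same program on its own local data and only messages cross between nodes. Threads and `queue.Queue` stand in for real nodes and an interconnect here, which is purely an assumption made to keep the example self-contained; the names `node` and `run_nodes` are hypothetical.

```python
import queue
import threading


def node(rank, local_data, outbox):
    """Same program on every node; it only ever touches its own local_data."""
    partial = sum(local_data)
    outbox.put((rank, partial))  # the only communication is this message


def run_nodes(data, node_count):
    """Deal the data out to the nodes, run them, and gather their messages."""
    outbox = queue.Queue()
    # Each node gets a private slice; nothing is shared between nodes.
    slices = [data[rank::node_count] for rank in range(node_count)]
    threads = [
        threading.Thread(target=node, args=(rank, slices[rank], outbox))
        for rank in range(node_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    partials = dict(outbox.get() for _ in range(node_count))
    return sum(partials.values()), partials


total, partials = run_nodes(list(range(100)), 4)
print(total)  # 4950
```

In a real deployment the queue would be replaced by whatever inter-node transport is available; the point of the sketch is only that no node reads another node's memory.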
As they have been since the 80s at least, many CS researchers will be trying to find the holy grail of programming models and tools to automatically parallelize larger classes of algorithms for naive programmers. And in the meantime, just as we have since the 80s at least, programmers will often go to bare metal and hand-optimize important libraries and applications that are too important to leave to the vagaries of the immature tools.
If there are fewer programmers in the US or Europe who worry about parallelism, it is only because the economy is such that they can still satisfy their customers without it. There are many parallel programmers here too, and maybe there isn't a pressing need for even more because their work is being reused. I'm not sure what sort of statistical analysis you should use to determine prevalence of parallel systems usage, but I am pretty sure it is not by counting programmer interest. How many deployed systems are there? How many CPU-hours of parallel work are being done? What fraction of IT budgets are supporting production use of parallel systems?...
To be frank, I think the vast majority of computer cycles are spent executing code written by a small minority of the programmers in the world. These are the ones that matter, as far as optimizing for parallel environments. Sorry if that sounds elitist, but it's a bit like trying to analyze the distribution of race car drivers by surveying the interests of all licensed motorists.
Seasoned programmers understand what a pain in the ass it is, and hope to make their career exit before it takes over?
Aren't Verses Programmers generally referred to as Composers?
What's left in the USA?
The great 1960's and 70's engineering generations?
The DoD Ada people?
You now have generation wintel.
Dumbed-down, corporatized C coders.
One useless OS on a one or 4 core chip.
When game makers and gpu makers want to keep you in the past due to the lack of developer skill you know you have problems.
The rest of the world may not have the best hardware, but they do try and educate the next generation
Domestic spying is now "Benign Information Gathering"
Parallel programming here seems to be SMP redux, except the cores are now tightly coupled. Anybody who has worked on back-end servers since the mid-90's with threading should be able to handle this with little difficulty (those guys should have 15+ years or close to it). It's the apps programmers who are going to have to learn a new way to do things. The interesting part is how the large number of cores is going to deal with a much smaller share of the cache; optimization, cache line sharing etc. are going to be much more difficult.
dmurphy_58@yahoo.com
I agree with the article.
I agree with the article too.
I agree with the article.
I don't
I agree with the article.
agree with the article.
As you grow older, you are more likely to focus on capitalizing on your current skills ("get things done") than on learning new skills. It makes perfect sense from an economic point of view: there are fewer productive years left to get a return on any investment made in increasing your productivity.
I don't believe it usually is a conscious economic estimation; it is simply built into our brains (with the usual "bell curve" caveat).
If I was 20 again, I would definitely think multi-threading, rather than the client-server approach to multiprocessing which was in vogue at the time.
As it is now, I learn just about enough multi-threading as needed to get my work done. That is, I had to make the GUI a separate thread in order to make the user experience tolerable, but making the actual number crunching multi-threaded will have to wait until there is funding.
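The "GUI in one thread, number crunching in another" arrangement mentioned above might look roughly like this. It is a minimal sketch with no real GUI toolkit: `crunch_numbers` is a hypothetical stand-in for the slow calculation, and the polling loop stands in for an event loop.

```python
import queue
import threading


def crunch_numbers(values, results):
    """Hypothetical slow calculation; runs off the UI thread."""
    total = sum(v * v for v in values)
    results.put(total)  # hand the answer back through a thread-safe queue


def run_with_responsive_ui(values):
    """Start the calculation in a worker and keep servicing the 'UI' meanwhile."""
    results = queue.Queue()
    worker = threading.Thread(target=crunch_numbers, args=(values, results))
    worker.start()

    ui_ticks = 0
    while True:
        try:
            # Wait briefly for a result, then go back to UI work if none yet.
            answer = results.get(timeout=0.01)
            break
        except queue.Empty:
            ui_ticks += 1  # a real GUI would process events and redraw here
    worker.join()
    return answer, ui_ticks


answer, ui_ticks = run_with_responsive_ui(range(1000))
print(answer)  # 332833500
```

Note that the calculation itself is still serial; only the waiting has been moved off the UI thread, which matches the halfway state the post describes.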
"Young bucks want to be on the cutting edge to get the jobs that the old people already have."
I have seen cases where older programmers can be infuriatingly reluctant to learn new things, to the point where they are up in arms about things like adopting CVS/SVN, bug tracking systems, Web Services, or using XML. People like that can actually obstruct a company's efforts to modernize. The only solution is to fire them, but that all too often simply isn't an option, because they have worked themselves into key positions with the systems they have built and cannot be removed from the equation without severe costs to the company.
I have also seen older people explaining to young bucks why a certified distro like Red Hat ES, rather than the young buck's own favorite distro like, say, Gentoo, is the best pick for an Oracle server's OS (let's not even go into the debate about why to use Oracle and not MySQL). Another good example is the older/younger debate about why simple, less optimal solutions are sometimes better than more complex and elegant ones, because building a complex solution just isn't cost effective in that particular situation. There sometimes tends to be a total disconnect between younger and older developers.
I don't think all older programmers are reluctant to learn new things; they just tend to be conservative when it comes to using things like threading, which can lead to difficult bugs, to solve tasks where simpler solutions will do without much additional overhead. I have seen a number of young and inexperienced programmers 'discover' threading and go to town with it. They build their application bit by bit in the usual ad-hoc and unplanned manner that is the custom of most inexperienced programmers (who needs bullshit like UML, right?). It is only when their application reaches a certain complexity level that they truly start to run into problems with things like shared data, thread synchronization, race conditions etc. The first time it happens it's kind of like walking straight into a newly installed plate glass door... rather painful.
Not that there is anything fundamentally wrong with threading, but you have to be aware of the problems you can run into, and spending time on proper application design is essential if you want to avoid the use of threading becoming a nightmare. The wisdom of planning things is something that most young programmers seem to have to learn the hard way.
You're all so damn frenzied that multithreaded programming is amazing and it's this and it's that and it's the next major paradigm that everyone has to jump on board with or be left behind with a million ageing COBOL programmers and blah blah blah. Oh and you need Erlang 'cos it's so fantabulous and the future of compilation will be dynamic runtime genetic algorithms and that threads are so complex and yet so many programmers can't handle them and... blah and...... yada and.... zip!
Really all extra cores do is give a PC more staying power. You already have hundreds or thousands of threads running on your OS at once within tens or hundreds of processes. They're already getting run across all cores and most of your computer is relaxing and sleeping while each core is 1-5% utilized. The OS is already making use of the extra hardware even though plenty of the processes it runs have only one thread.
Even with just one core your computer was doing several things at once effectively, or several hundred things at once if you count all the threads.
The only reason a programmer would want to try and make use of extra cores is if the problem domain contained processes that could be run simultaneously (like queuing network data while another thread renders a GUI), or there was a large process that could be split into several mini-processes (like analyzing video), or when writing a server which is handling several independent requests (like a typical web application). But these problems have been around for ages, and many existing solutions already employ threads or concurrent processes to take advantage of extra hardware. So where's the big amazing paradigm shift? It's nothing new, it's nothing particularly foreign, CS theory has had tens of years to produce plenty of best practices and guides for using threads while avoiding memory corruption or deadlocks. And so what if many programming problems are inherently serial - it doesn't really matter! How often do you wait for a CPU bound process these days? Much less than 15 years ago.
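The "large process split into several mini-processes" case above can be sketched as chunking the input, handing each chunk to a worker, and combining the partial results. Everything here is illustrative: `analyze_chunk` is a hypothetical stand-in for real per-frame work, and a thread pool is used only to keep the sketch simple.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_chunk(chunk):
    """Hypothetical per-chunk analysis: here, just count the even values."""
    return sum(1 for value in chunk if value % 2 == 0)


def split_into_chunks(items, chunk_count):
    """Cut a list into chunk_count contiguous slices of near-equal size."""
    size, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def analyze_in_parallel(items, workers=4):
    """Analyze each chunk in its own worker, then combine the partial results."""
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(analyze_chunk, chunks))
    return sum(partial_counts)


frames = list(range(1000))
print(analyze_in_parallel(frames))  # 500
```

In standard CPython, threads will not speed up CPU-bound pure-Python work because of the global interpreter lock; a process pool would be the usual substitute there. The structure (split, map, combine) stays the same either way.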
Ha.
Been using multithread and/or multiprocess programming my whole career (about 8 years now). I don't know if it generalises to the whole of the US but I'm always astounded at the attitude of people on slashdot to parallel programming.
We hear over and over again "it's not going to go mainstream", "it's hard to get right", "threads are a recipe for disaster", "synchronisation is just too much work to debug", "it's a niche area" etc etc.
Even occasionally "It'll never take off".
Well, guys, it has. A lot of people, software houses from several employees to several thousand employees, have been using parallelism in commercial settings for a long time now. And we're already making use of the extra resources available in multicore platforms without having to do a single thing to our codebases.
Slashdot's usually pretty cutting edge in its tech (and especially language) evangelism, why has it slacked on this one?
Those of us doing server side development for any medium to large company will have already been doing multi-threaded and/or multi-process applications for ages now:
- When Intel was still a barely known brand, other companies were already selling heavy-iron machines with multiple CPUs for use in heavy server-side environments (they didn't run Windows though). Multi-cores are just a variant of the multiple-CPU concept.
- The spread of Web applications just made highly multi-threaded server-side apps even more widespread: they tend naturally to have multiple concurrent users (<rant>though some web-app developers seem to be sadly unaware of the innate multi-threadedness of their applications ... until "strange" problems start randomly popping up</rant>).
- As for thick client apps, for anything but the simplest programs one always needs at least 2 threads, one for GUI painting and another one for processing (either that or some fancy juggling a la Windows 3.1).
So, does this article mean that Japan, China, India and Russia had no multi-CPU machines until now ... or is this just part of a PR campaign to sell an architecture which, in desktops, is fast growing beyond its usefulness (a bit like razors with 4 and 5 blades)?
...we were recently introduced to our office's new cluster of 36 dual quadcores, primarily slated for use in bioinformatics. The kinds of applications we do in our lab are highly suited to breaking up over many machines and wouldn't particularly benefit (or suffer) from explicitly coding them for multicore processors. We have Torque installed. However, at the training session, more time was spent discussing how to configure job submissions for parallel programs (including even a "Hello world" intro program demo for the truly clueless about parallel programming, i.e. me and those like me) than more useful things, like how to delete all jobs at once.
Those containers do hide some of the complexity of parallel programming, but the programmer still needs a basic understanding to produce a usable result.
This whole article smells of hype...
Blar.
... with C. Then you can't avoid knowing what's going on under the covers.
It's not that tough, really.
If you are working on a single user system such as a word processor, parallelism has little significance. But if you are a Google, wanting to deliver similar but isolated web services to many people, or if you are building a switching exchange such as a scalable email server, VOIP exchange, XMPP server, print server...you are very interested in parallelism. And any system which causes the number of parallel processes to expand in some way coordinated with the number of available cores or core threads is very interesting indeed.
What especially pleases me about this article is that I put my current sig up before it appeared. It now seems slightly prophetic.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
"One of the reasons more seasoned programmers are not particularly interested is that in most cases someone else has already done the hard work."
Who is this mythical "someone else"? I'd like to meet them. Incidentally, since when have database triggers been parallel systems? The LAST thing you want in a database is this sort of thing running in parallel; that's why you have locking!
"It is very rare that you actually need to do parallel programming yourself."
Err, if you count threads as parallel programming then I do it in virtually every project I work on. You talk about POSIX threads as if they're some scary API that only people who require real power use. On the contrary, they're an API that probably almost every C/C++ Unix coder uses all the time these days.
Brain in neutral, the strange artefact there was an attempt to write 12<N<24.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
We old farts are not all that scared of doing multi-core development with all the normal tools like basic threads and sync objects (events, mutexes and semaphores). Making code thread-safe is really not that difficult once you understand the basics (I am using C++ here since that is where my multi-core/multi-thread experience is) of keeping your data safe by either putting it on the stack or using the sync objects to keep global data safe.
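The two habits named above (keep working data local to the thread, and guard global data with a sync object) can be sketched as follows. The poster works in C++; this is Python only for brevity, and the names `shared_total` and `add_up` are hypothetical.

```python
import threading

shared_total = 0                      # global data: needs a sync object
shared_total_lock = threading.Lock()


def add_up(values):
    """Accumulate in a local variable, then merge under the lock once."""
    global shared_total
    local_total = 0                   # private to this call; no lock needed
    for value in values:
        local_total += value
    with shared_total_lock:           # the only place shared data is touched
        shared_total += local_total


threads = [
    threading.Thread(target=add_up, args=(range(start, start + 250),))
    for start in range(0, 1000, 250)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_total)  # 499500
```

Doing the bulk of the work on thread-local data also keeps the lock from becoming a bottleneck, since each thread takes it exactly once.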
Maybe yes, Maybe not, this is all speculation at this point. Just take a look at Google and what they are capable of doing.
Most comments so far seem to concern the type of problems we've dealt with in the past, but there is another type of problem that is emerging. This new type tends to deal with short computations on vast amounts of data rather than long computations on small amounts of data, as we're used to. A few examples:
SDR, software defined radio. One of these can easily saturate a gigabit ethernet link with data, and we'll want the computer to automatically sort signal from noise, and determine if we're interested in the signal. Perhaps we'll even want to do voice recognition on the signal and look for keywords? - A parallel approach is suitable.
Robotics. They'll need sensors, and it would seem that in the biological equivalent this problem got solved by rendering the right hemisphere of the brain as a parallel computer that sorts through this data for the benefit of serving the left hemisphere with quality data to process in a serial fashion.
Data mining. It's nice if it's Us doing it rather than Them. I might want to trawl news sites, web forums and repositories of research papers looking for things I'm interested in without having to offer up my personal preferences to advertising agencies.
All in all it appears that long computations on small amounts of data are actually a lot less common than short computations on vast amounts of data. The configuration of the brain makes a lot of sense in this case.
All rites reversed 2010
If you thought your answer had merit, you could have linked us directly to the COSA project, instead of to a 5-page pointless review of all life as we know it.
And then we could have replied with a quick link to Erlang which has handled 10^5-process fine-grain concurrency in industrial PTT exchanges for a decade very easily, and much time would have been saved.
That's an oversimplified view.
It's more like when you've got enough experience, you already know what can go wrong, and why doing something might be... well, not necessarily a bad idea, but cost more and be less efficient anyway. You start having some clue, for example, what happens when your 1000 thread program has to access a shared piece of data.
E.g., let's say we write a massively multi-threaded shooter game. Each player is a separate thread, and we'll throw in a few extra threads for other stuff. What happens when I shoot at you? If your thread was updating your coordinates just as mine was calculating if I hit, very funny effects can happen. If the rendering is a separate thread too, and reads such mangled coordinates, you'll have enemies blinking into strange places on your screen. If the physics or collision detection does the same, that-a-way lies falling under the map and even more annoying stuff.
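The mangled-coordinates problem described above comes from reading a position while another thread is halfway through writing it. One conventional fix, sketched here with a hypothetical `Player` class, is to update and read both coordinates under the same lock so a reader always sees a consistent pair.

```python
import threading


class Player:
    """Hypothetical player whose position is shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._x = 0
        self._y = 0

    def move_to(self, x, y):
        # Update both coordinates as one unit so readers never see a mix
        # of the new x and the old y.
        with self._lock:
            self._x = x
            self._y = y

    def position(self):
        # Take the same lock to read a consistent (x, y) snapshot.
        with self._lock:
            return self._x, self._y


def mover(player, steps):
    for i in range(steps):
        player.move_to(i, i)          # the player always sits on the diagonal


def hit_checker(player, samples, torn_reads):
    for _ in range(samples):
        x, y = player.position()
        if x != y:                    # would mean a half-updated position
            torn_reads.append((x, y))


player = Player()
torn_reads = []
threads = [
    threading.Thread(target=mover, args=(player, 20000)),
    threading.Thread(target=hit_checker, args=(player, 20000, torn_reads)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(torn_reads))  # 0
```

Without the lock in `position` and `move_to`, the checker could occasionally observe `x != y`, and, as the post goes on to say, how often that happens would depend heavily on the machine and timing.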
Debugging it gets even funnier, since some race conditions can happen once a year on one computer configuration, but every 5 minutes on some hapless user's. Most will not even happen while you're single-stepping through the program.
Now I'm not saying any of that is unsolvable. Just that when you have a given time and budget for that project, it's quite easy to see how the cool, hip and bleeding-edge solution would overrun that.
By comparison, well, I can't speak for all young 'uns, but I can say that _I_ was a lot more irresponsible as the stereotypical precocious kid. I did dumb things just because I didn't know any better, and/or wasted time reinventing the wheel with another framework just because it was fun. All this on the background of thinking that I'm such a genius that obviously _my_ version of the wheel will be better than that built by a company with 20 years of experience in the field. And that if I don't feel like using some best practice, 'cause it's boring, then I know better than those boring old farts, and they're probably doing it just to be paid for more hours.
Of course, that didn't stop my programs from crashing or doing other funny things, but no need to get hung up on that, right?
And I see the same in a lot of hotshots nowadays. They do dumb stuff just because it's more fun to play with new stuff, than just do their job. I can't be too mad at them, because I used to do the same. But make no mistake, it _is_ a form of computer gaming, not being t3h 1337 uber-h4xx0r.
At any rate, rest assured that some of us old guys still know how to spawn a thread, because that's what it boils down to. I even get into disputes with some of my colleagues because they think I use threads too often. And there are plenty of frameworks which do that for you, so you don't have to get your own hands dirty. E.g., everyone who's ever written a web application, guess what? It's a parallel application, only it's the server which spawns your threads.
A polar bear is a cartesian bear after a coordinate transform.
The parallel programming being discussed mostly solves the same problems as serial programs, only faster. So if you have code that runs fast enough as a serial program, you are better off solving a different problem than exploring parallel programming. And if you have a program that is running too slowly then you need to work out why. Most often your program is not CPU bound, and moving to the type of parallel program being discussed won't help. And if you have a program that is too slow and CPU bound there are a number of optimizations you can choose, most of which are more localized and simpler than moving to parallel programming. And if all else fails, then maybe you should look at parallel programming for your problem. If you do, frequently the problem you are solving is not the general one, and a simple solution exists that is much less complex than the more general parallel programming being discussed. So maybe older and better developers are looking at more promising solutions, to more important problems, rather than focusing on one type of optimization being pushed by unimaginative hardware vendors. Of course Erlang looks like fun, so I'll check it out, but I don't think it is the most important development around.
While most replies seem to go around just "young kids tend to jump on new things" "parallelism has been here for years in inherently parallelizable tasks such as server or graphics", I managed to read TFA, and voila...
>my Threading Building Blocks (TBB) book, sold more copies in Japan in Japanese
>in a few weeks than in the first few months worldwide in the English version
>(in fact - it sold out in Japan - surprising the publisher!)
>contributors and users of the TBB open source project are worldwide, but with
>some particularly outstanding users and contributors in Russia
So basically the author didn't know there are already a vast number of programmers outside of the US. This is not surprising: China already has 5x more people than the US. It seems he thought all software is developed in the US when there are counterexamples such as the Linux kernel and Ruby.
Nothing to see here, move along.
They'll allow you to use threads most happily and not take you too far away from the hardware.
The fact that you actually have to think a bit more about who's accessing what data at what time, and avoid trivial problems like deadlock, does not make it "too hard". And the fact that C is not ideal for parallelising mathematical operations doesn't make it useless either. Threads can be doing totally different things, or be pooled for great joy.
The other day I realized that I don't really know why threads are supposed to be better than processes, other than because "multithreading" sounds cooler and it's not old and boring and well-understood like "multiprocessing". I'm asking this sincerely: why do people only talk about multithreaded processing whenever parallel programming comes up lately? It seems like IPC is marginally easier with threads but that design is much trickier, so what's the big win? Is it a CPU optimization thing?
Dewey, what part of this looks like authorities should be involved?
I've been writing multithreaded and parallel (MPP/MIMD) code for 20 years or so and there were plenty of people writing those kinds of codes long before I started. It's not exactly a new thing.
- There are often other ways to do it, e.g. multiple processes communicating over sockets, or multiple processes that share memory.
- Threads are hard to get right. Really, really hard.
When your library of mutexes, semaphores, etc. doesn't have exactly the construct you need, and you go to write your own on top of them, it's really, really hard not to introduce serious bugs that only show up very rarely. As one random example, consider the Linux kernel team's attempts to write a mutex, as described in Ulrich Drepper's paper "Futexes are Tricky." If these people take years to get it right, what makes you think *you* can get it right in a reasonable time?
The irony is that threads are only practical (from a correctness/debugging point of view) when there isn't much interaction between the threads.
By the way, I got that link from Drepper's excellent "What Every Programmer Should Know about Memory." It also talks about how threading can slow things down.
Though you say "they're not using a parallel programming language appropriate for their target."
I think too much emphasis is put, by some, on using a high level language that is specifically designed for parallelism. Personally I've always found C and POSIX threads more than adequate.
Name a single real world problem that doesn't parallelize.
Handling event propagation through a hierarchy of user interface elements.
Like all pain, suffering is a signal that something isn't right
Functional languages. It seems to me one of the most promising angles on this problem is the resurgence of functional languages such as Haskell, Lisp and F#, and even the adoption of concepts from that world seen in languages such as Python and so on. As for US and European interest, Microsoft Research, for example, has some excellent papers on possible solutions, e.g. Software Transactional Memory http://research.microsoft.com/~simonpj/papers/stm/ and STM for C# http://research.microsoft.com/research/downloads/Details/6cfc842d-1c16-4739-afaf-edb35f544384/Details.aspx I personally suspect finely grained parallelism is unlikely for the foreseeable future for reasons such as existing knowledge of employees and legacy code. But hybrid solutions look plausible, such as shifting heavy computation to languages suited to easily writing concurrent code (e.g. F#) and tying into imperative languages (e.g. C#) for the event driven side. Who needs a massively parallel GUI anyway? Very few applications right now.
i wish i could stop
You just haven't noticed.
Multi process apps have been common in the business and server app space for almost two decades.
Multi thread apps have been common in the business and server world for a few years now too.
To all having the will it/won't it go mainstream argument: You missed the boat. It's mainstream.
Parallel programming is simply harder than typical sequential programming. Not only does the design take more time and effort, but the debugging is VERY much harder. Tools for parallel programming are poor, but debugging tools are basically pathetic. Worse, today's project and development methodologies focus on getting something up and hacking, not on the careful upfront design that is needed to really parallelize things. We get most of our parallelism from the web server being multi-threaded and the database handling concurrency.
As many have said, large scale parallel systems are not new. Just because we need a solution to the problem doesn't mean that it will appear any time soon. Some problems are very difficult and involve not only new technologies and programming models but major re-educational efforts. There are many topics in physics and mathematics that only a small number of people have the intellectual skill and predilection to handle. Of all the college graduates, what percent complete calculus, let alone take advanced calculus? Pretty small number.
My prediction is that the broad base of programmers will have the tools and be able to do some basic parallelism. A small number of programmers will do the heavy duty parallel programming and this will focus on very high value problems.
BTW, this Intel guy, while addressing a real issue, seemed to be doing marketing for his toolkit/approach. Sounds like a guy trying to secure his budget and grow it for next year.
There are two problems: thread synchronization, and race conditions.
Race conditions
A modern CPU architecture uses multiple levels of cache, which aggravate the race condition scenario. For a programmer to write multi-threaded code and "not worry", the architecture must always read and write every value to memory. This worst case scenario is only needed in a tiny fraction of cases. So the compiler can do much better by working with the memory model of the architecture, instead of assuming that no memory model is in place.
Any significant improvement in speed will require solving the race condition problem. I believe research into transactional memory may be the way to go - but it is still half the speed of normal memory access.
The synchronization problem
This requires that the programmer reason about multiple threads (at least two). It doesn't really matter what building blocks you're using, you simply must be aware that two pieces of code are effectively executing at the same time - or at least their cpu slices are interwoven. This type of reasoning significantly raises the bar for writing bug-free code, because a whole new class of subtle problems arises. The mind must wrap itself around something that is simply more complex, and we have trouble enough with single threaded programs.
I like the flex/javascript solution. Just one thread - with asynchronous-like behaviour through events. It means there are no race conditions, and you can get enough async behaviour to get the job done. I hope, in the future, that both these technologies allow you to create a separate "process" (thread but with different memory context) that executes asynchronously and returns the results on the main event queue. It's a bit clunky, but at the same time it's much more idiot proof. And I'm speaking as someone who's done a reasonable amount of parallel programming.
Like all pain, suffering is a signal that something isn't right
I've made a good chunk of my living from writing high-performance software using parallel algorithms, in C, C++, Fortran, and Java.
My clients? Britain, Brazil, Taiwan, and (yes) Omaha, Nebraska, USA.
Over the last quarter century of coding for a living, the greatest interest in advanced algorithms has been shown by my overseas customers. American companies tend to be conservative and bottom-line oriented. "Foreign" nations emphasize a broad education and creative thinking, thus making them more amenable to complex and new ideas, whereas the United States is focused on producing more MBAs -- and that difference influences everything in society, including software design.
All about me
* It's a difficult problem with no easy solutions.
* people with experience have jobs.
* You keep your job by delivering working solutions, not research.
Connect the dots.
-- Programming with boost is like building a house with lego. It's cool but I wouldn't want to live in it
It's hard to appreciate, but we are in the middle of a revolution. High-performance, multi-core computing has made significant progress in the last few years and it's causing a lot of headache for software developers who try to exploit the potential power available. Multi-threaded programming does not bring a solution to the masses - understanding and preventing the issues arising from concurrency requires a lot of exposure to it.
The young generation of programmers have the confidence to try out multi-core programming, but only because they yet lack the painful experience of doing it. Whilst hardware is running away towards 100 cores in 5 years time and still keeping up with Moore's "Law", software is sadly lacking such advances. There's a very good white paper over on the 2 Cubed website about the problem, The Multicore Revolution, and a new framework just out called infoQuanta which looks interesting.
Meanwhile, Sun is tackling the problem from within the language. They've got Fortress, which has a certain Java/C syntax about it. I haven't used it but I would say that it doesn't look like the 21st Century solution that's going to advance software development significantly.
If you run Windows, go to task manager and enable the threads column in the process window. On my system, there are very few processes that only use one thread. I've done a lot of work in embedded applications, and even there most products I worked on were heavily multi-threaded. Even in the simplest embedded applications without an operating system, there is usually some parallelization via interrupts. This raises the question, if so much software both application and embedded is multi-threaded, why do so many people on Slashdot feel that multi-threaded programming isn't useful / too difficult / etc? What kind of software are you guys writing?
> Name a single real world problem that doesn't parallelize.
Childbirth. Regardless of how many women you assign to the task, it still takes nine months.
(feel free to reply with snark, but that's a quote from Fred Brooks, so if your snarky reply makes it look like you haven't heard the quote before you will seem foolish)
In the UK we produced a Cell-like CPU long before Sony's much hyped Cell processor. We even developed a parallel programming language for it.
This was back in the days of 8 and 16-bit processors and the fear was that 16-bit and 32-bit processors weren't going to produce the processing power required in the future.
Obviously these fears were unfounded as the MHz increased dramatically as did caching and other tricks to increase performance.
But those looking for some inspiration should look up the Inmos Transputer and Occam.
But not all. And there is a beautiful (IMHO) compromise.
"First, they give the developer an illusion simplicity in the beginning of development. Each thread minds its own business and inevitably follows its state machine that is plainly visible as a sequential program: receive this, compute that, save the result, send the response, repeat from beginning."
Yes, they do. They also make sense in many ways as a natural actor-driven state of affairs. Client A opens comms with the middle tier, middle tier spawns a thread that deals with client A and its requests to underlying data sources. Client B connects, spawn another thread etc etc. It encapsulates things quite well. Not that this is a good way of doing things, but more on that later.
Windows does support the whole async io thing now, as far as I'm aware, but that was an issue on some of the earlier windows platforms.
"I've had my share of trying to sew together the multithreaded solutions of less timid developers."
I can certainly appreciate that. Finding meaning in large chunks of other people's code is tricky at the best of times. Add in a good measure of incompetence and a shake of threading, you have a recipe for incomprehensibility. Solution - hire decent staff.
"Locking is missing and cannot be added without deadlocks. It is also virtually impossible to find all places in the code where locking is missing and where it needs to be added."
Design, intelligence and good practice. As with your deadlock example, there are good ways around these situations and they tend to involve a bit of forethought. A needs to lock B and B needs to lock A? Why? Something is probably wrong with your design.
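One bit of forethought that often removes the "A locks B, B locks A" deadlock is a fixed global lock order. A minimal Python sketch follows; the `Account` class, its `ident` field and the `transfer` function are invented for illustration, and this is one common convention rather than the only way to design around the problem.

```python
import threading


class Account:
    """Hypothetical shared resource; `ident` defines the global lock order."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do, and taking the same lock twice would hang
    # Always take the lock with the lower ident first, whichever way the
    # transfer goes, so two threads can never hold one lock each and wait
    # for the other.
    first, second = (src, dst) if src.ident < dst.ident else (dst, src)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo(rounds=5000):
    a = Account(1, 1000)
    b = Account(2, 1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If each thread instead locked its own `src` first, the two loops in `run_demo` could eventually stop forever with each thread holding one lock. With the ordering rule both finish and the balances come back unchanged.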
"The second big problem with multithreading is that the threading paradigm requires only one input at a time (after all, multiplexing was not allowed)."
This is not necessarily true. I'm a C programmer and have found that io multiplexing and threads can go hand in hand very nicely. The multiplexing is just a necessary part of writing a capable server or middle tier, and threading provides (when done well) effortless scalability.
"At a bare minimum if a thread is waiting on a socket, it must be receptive to another thread telling it to stop waiting because of a change in circumstances"
There are various interrupt mechanisms available, or one can have a message pipe that a thread polls along with the socket. This is a bit messy but doesn't require periodic polling or anything else so inefficient.
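The message-pipe idea can be sketched roughly like this in Python, assuming a POSIX-like system where a pipe descriptor can be handed to the selector. The function names and the echo-back step are made up for the demo; the point is only that the thread sleeps on the socket and the pipe together, so a single byte written to the pipe wakes it.

```python
import os
import selectors
import socket
import threading


def wait_on_socket(sock, stop_fd, log):
    """Serve `sock` until a byte shows up on the `stop_fd` pipe."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ, "data")
    sel.register(stop_fd, selectors.EVENT_READ, "stop")
    try:
        while True:
            # Sleeps in the kernel until either descriptor is readable,
            # so there is no periodic polling.
            for key, _events in sel.select():
                if key.data == "stop":
                    os.read(stop_fd, 1)
                    log.append("stopped")
                    return
                data = sock.recv(4096)
                if not data:
                    return  # peer closed the connection
                log.append(data)
                sock.sendall(data)  # echo back so the demo can synchronise
    finally:
        sel.close()


def run_demo():
    peer, sock = socket.socketpair()
    stop_r, stop_w = os.pipe()
    log = []
    worker = threading.Thread(target=wait_on_socket, args=(sock, stop_r, log))
    worker.start()
    peer.sendall(b"hello")
    echoed = peer.recv(4096)   # returns once the worker has handled the data
    os.write(stop_w, b"x")     # "change in circumstances": tell it to stop
    worker.join()
    for s in (peer, sock):
        s.close()
    for fd in (stop_r, stop_w):
        os.close(fd)
    return echoed, log
```

On platforms where pipes cannot be selected on, a second `socket.socketpair()` could stand in for the pipe with the same structure.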
"It follows that experienced developers generally prefer the single-threaded, event-driven paradigm"
Remove the word "single-threaded" from that and I agree. One of the best solutions (IMHO) for a scalable and yet robust system is the thread pool. You have a manager thread that is responsible for polling various file descriptors and timers, you have a variety of job queues, and you have a pool of threads that take jobs off the queue and run them. You then implement the rest of the program as a series of discrete jobs that (ideally) never block. Then you can tune the size of your pool to get the best out of the machine you're running on.
Multicore, supercomputers, same endless crap. Vendors claim they have great hardware, but ship crappy software tools.
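A stripped-down Python sketch of that thread pool shape is below: one job queue, a tunable number of workers, and jobs written as short callables. `run_pool` is a made-up name, and the calling thread plays the manager by simply enqueueing everything up front rather than polling descriptors and timers, so treat it as an outline of the pattern only.

```python
import queue
import threading


def run_pool(jobs, pool_size=4):
    """Run the callables in `jobs` on `pool_size` workers; results keep job order."""
    job_queue = queue.Queue()
    results = [None] * len(jobs)

    def worker():
        while True:
            item = job_queue.get()
            if item is None:        # sentinel: no more work for this worker
                break
            index, job = item
            results[index] = job()  # jobs are discrete and should not block

    workers = [threading.Thread(target=worker) for _ in range(pool_size)]
    for w in workers:
        w.start()
    for item in enumerate(jobs):    # the "manager" side: hand out the jobs
        job_queue.put(item)
    for _ in workers:
        job_queue.put(None)         # one sentinel per worker
    for w in workers:
        w.join()
    return results
```

For example, `run_pool([lambda i=i: i * i for i in range(10)], pool_size=3)` gives the squares of 0 to 9 in order. `pool_size` is the tuning knob mentioned above; Python's standard library also ships `concurrent.futures.ThreadPoolExecutor`, which packages the same idea. The sketch does no error handling, so a job that raises would take its worker down with it.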
But when I hear:
uttered by a rep from one of the companies that lobbies hard for more and more visas to bring in more foreign IT workers I have to wonder if this isn't just more BS in support of that effort. It sure doesn't help when he follows it up with:
So not only are you domestic programmers not showing sufficient interest in parallel programming, you're also too old (i.e., more costly) so plan on being replaced by younger, cheaper, foreign programmers.
I fully expect that the next round of Congressional hearings on increasing the cap on visas will include this line of reasoning.
CUR ALLOC 20195.....5804M
For your examples you threw out a bunch of problems that are currently being parallelized just fine by today's software, which would indicate we're not having problems with parallel programming. Name a search engine, database query engine, weather simulation software, etc, that isn't already multithreaded. Where's the issue?
Problems that can and should be parallelized in software already are for the most part. There is no issue here.
Business processes are often serial (step B depends on output of step A). That's what a lot of corporate programmers work on. And even these have steps that are done in parallel or can run multiple instances of a process in parallel. Anyone working on a web application or j2ee infrastructure is probably running lots of their small, serialized problems, in parallel. Again, there is no issue here.
You simply build scalability into the app from the word go. If you're really clever you can build in some tunable parameters, like the size of a thread pool, and have your app make good use of whatever's available.
We hear oh, threads are good, but then I read this propagated by SQLite and written by someone at Berkeley.
Then there is my own observation. If I have an amount of work (w) to do n times, then the product (p = w*n) is the total work. If I queue it, the total wall time approaches w*n as well. If I tackle it as multi-threaded, then I start n threads which each do w work. However, because the threads compete for scheduling, and the OS has at least n more context switches, we actually reduce the amount of work being done in any amount of time. In addition, the caches effectively become smaller because you have to share your cache space with other threads. And we've also just added a degree of complexity to the processing because you will likely need critical sections and mutexes.
Which leads me to my general conclusion (in terms of most software written) that for general programs, the threading should be accomplished by a thread for each unique (in terms of algorithm) processing task. A GUI thread, a database thread, a static service thread. By separating these, you'd enforce a degree of abstraction and concurrency. And all that is needed are callbacks (to asynchronously process the value from the DB and update the GUI). The exceptions are the non-general cases, like executing a PHP script as part of a web server (where the algorithm depends on the script file itself), or a highly scalable large problem (e.g. sorting, dynamic programming).
Just my thoughts from an alleged pragmatic programmer.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Most of the ones you listed: search, neural nets, compilation, and database are ALL memory bandwidth bound. It doesn't matter if they parallelize well because your processor will just be sitting there waiting for the cache misses. FEA and weather simulation (similar) may not be if the update steps are sufficiently computationally intensive and/or the reuse pattern is good.
Parallelization is about *data locality* not parallel processing. So few people understand this that it's shocking. This is why the Cell beats the pants off much faster CPUs -- it's harder to program, but once you've done it you've figured out the data locality.
So it doesn't really matter if you can run them across 100 cores/processors, you'll still only get 1% utilization, as is typically seen on most supercomputers.
Anybody else remember the great clock cycle stall of the 1980's? During that period, Moore's Law operated in a manner closer to its original statement - the big news was the drop in cost per transistor, not raw CPU speed. The general wisdom at the time was that parallelism was going to be the way to get performance.
And then, we entered the die shrink and clock speed up era, clock speeds doubled every 14 months or so for ten years, and we went smoothly from 60 MHz to 2 GHz. Much of the enthusiasm for parallel programming died away - why sweat blood making a parallel computer when you can wait a few years and get most of the same performance?
Clock speeds hit a wall again about five years ago. If the rate of increase stays small for another five years, the current cycle drought will have outlasted the 1980's slowdown. I have a great deal of sympathy for parallel enthusiasm (I hacked on a cluster of 256 Z80's in the early 80's), but I think it won't really take off until we really have no other choice, because parallelism is hard.
To a Lisp hacker, XML is S-expressions in drag.
Sure, there is a class of problems that are embarrassingly parallelisable. The ones you've mentioned above are mostly in this category.
However, there ARE problems that don't scale easily or appropriately. CPU simulation is one...you can break it down into functional areas, but at some point your inter-processor communication costs more than it buys you.
How about decryption of existing encoded data where the key changes based on data content? You could decrypt more than one data stream at a time, but that doesn't help you handle a single stream any faster. Changing algorithms could help with this, of course.
Some of the existing compression algorithms behave similarly, with the same current problem and the same future solution.
I think that until recently we've gotten used to being able to do the same things faster than before. Now we're having to change the way we look at things, and try and convert those same problems into doing more things at once.
Never has a truer phrase been uttered on /.
Objects have been touted by industry as a panacea for the better part of the last two decades....all it has done is give us bloat and bugs.
The problem isn't "old dog" programmers, the problem is the vast number of mediocre programmers, old-beards and fresh-faced alike. Objects became the favored method because mediocre programmers could understand them.
(And before I get tagged as flame-bait, I am not trying to insult all programmers, I'm merely pointing out that not everyone is a Rob Pike or a Miguel de Icaza)
When McCarthy gave us LISP he had given us the first step on the road to improved programming techniques, unfortunately the keepers of that legacy are the high-priests of Haskell and the Monad. Only programmers with a background in Pure Math can really get to grips with this stuff.
Thankfully there are some descendants of LISP that the mediocre among us can make sense of, Erlang is a good example of this and there is enough capability in the compiler and the toolkits to take advantage of multiple cores right now.
The problem is that the mediocre masses (and their pointy-haired project managers) are stuck in the tar-pit reciting object-babble to each other because the IT industry is no longer about technology; it has reached the point in its evolution where only (business) methods and (management) procedures matter.
What really needs to happen is for the industry to realise one important thing about the preceding decade: distributed object systems were still-born in the 90's, industry needs to put the paddles away and stop trying to revive the corpse.
It might seem ironic, but think about it. If threads interact with each other all the time, that means most of them will stay blocked just trying to synchronize access to some data. That's using threads for concurrency (I/O bound tasks), not using threads for parallelism (computation bound tasks).
I once had a signature.
All it takes is a good design.
The hard part is debugging after losing the ability to reproduce bugs with the same input.
As for 2 to many cores it really isn't that hard for many applications. And for many others it doesn't really matter because single threaded/process is fast enough.
One thing that could use it is GAMES provided you actually are GUARANTEED to have that many processors available.
The synchronisations are easy if you design it properly. If your coding style is "think little, try something, debug until works" then you fail; the agile method for multithreaded code is just asking for trouble.
Design the application for parallelism instead of trying to parallelise each small block, and you get a far better result. Extreme waterfall for extreme numbers of processors.
And using number of processors for scaling up such application is far easier than trying to speed up said application.
Of course there are things that are hard to parallelise; luckily most of the time you don't need to parallelise them.
Of course I want PCs with the capability of executing 100's of threads at the same time, and enough memory bandwidth for that.
Why?
Because the easily parallelised problems include game AI, optimizing compilers, graphics...
The hard problem is "Here's the code base, make it go faster by using more processors.", and that's lots of real-world situations, but it doesn't mean that parallelism is hard. Fitting parallelism to a non-parallel design is hard.
It's like trying to fit a square through a circle of equal area.
For 2 to many cores, it's hard for those whose entire code base is designed for a single processor or two.
©God
First of all, it was just one example, of one problem. Not even the hardest, but something that everyone would understand easily.
There's a reason why you see "these types of simple threading problems." Not because you're teh uber-genius and everyone else is teh drooling retard, but because they're supposed to be just that: extremely simple examples. We're not doing a Ph.D. level research paper into parallelization, we're shooting wind on a board, for the benefit of some people who may be nerdier than the average, but include a ton of non-programmers anyway. Trust me, it's not that it's the kind of stuff that stumps everyone but you, but the kind of stuff that you could use to teach your mother about it.
Second, snap out of it. You're essentially doing the same "I'm a genius, everyone else obviously is an idiot" crap act, which is actually most common the less someone understands the domain.
Believe it or not, yes, some of us do know what a "critical section" is. Java even has "synchronized" as a keyword in the language itself, so it's hard to not run into the concept.
But some of us also know what the performance penalties are. It doesn't come for free, you know. Sometimes it _is_ more efficient to just update each player's coordinates in a single loop, than to have a thousand threads which try to lock each other out every dozen lines of code.
More importantly, it's easy to design something that doesn't scale. A very common performance problem, for example, is when all processes wait on one resource that only one can use at a time. E.g., if most processing happens through a cache, and access to the cache is guarded by a mutex, congrats, 1000 threads on 1000 cores won't run much faster than 1 thread on 1 core. I've actually seen that exact problem in more than one web application.
Just having a concept of "critical sections" won't do jack squat for you there. You end up needing a bit more advanced stuff so you can go as deep as you can into the innards of that cache, before you synchronize access. The Java Concurrent package is there for a reason, and to solve just that kind of a problem, for example.
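One way to "go deeper into the innards of the cache before you synchronize" is lock striping: split the cache into several independently locked parts so that only threads touching the same part contend with each other. The sketch below is a made-up `StripedCache` in Python that shows the structure only. It is not how the Java concurrent package is implemented, and under CPython's global interpreter lock pure-Python threads would not show the real speedup anyway.

```python
import threading


class StripedCache:
    """Hypothetical cache split into independently locked stripes."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Keys are spread over the stripes by hash.
        return hash(key) % len(self._locks)

    def get_or_compute(self, key, compute):
        i = self._stripe(key)
        # Only this stripe is locked; lookups that land in other stripes
        # proceed without waiting on this one.
        with self._locks[i]:
            if key not in self._maps[i]:
                self._maps[i][key] = compute(key)
            return self._maps[i][key]
```

Usage would look like `cache.get_or_compute("user:42", load_user)`, where `load_user` is whatever expensive lookup is being cached. Note that `compute` runs while the stripe lock is held, so a slow computation still blocks every other key in that stripe; moving it outside the lock is the next refinement and the kind of detail that makes this more expensive to get right.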
More importantly, it still needs more clued up (read: more expensive) people to get it right, and even those do occasionally get it wrong. But even skipping past the last part, essentially it means you do need a bigger budget for the cool multithreaded solution. So in the cases where it's not that much of a disadvantage to avoid that (e.g., because the CPU will be idle and waiting for the GPU 90% of the time anyway), there's actually a good economic reason not to.
Again, I'm not saying any of _that_ is unsolvable either, but I _am_ saying that it tends to be slightly more complicated than when looked through the goggles of "I'm such a genius and I've heard about critical sections in college!" _If_ you see the world as that simple and devoid of any other considerations, that's really just your clue that you still have much to learn.
A polar bear is a cartesian bear after a coordinate transform.
Perhaps embedded and hardware-centric work is moving away from the US because the labor costs are too high. US labor tends to lean more toward integration work, and our software effort is for custom "glueware" of sorts. For example, an SQL query is a sequential request, but fulfilling it may make use of parallel processing by the guts of the database engine. (In fact, database servers tend to split the files/tables on multiple disks now so that sequential scanning is split among multiple disks to scan in parallel.)
Table-ized A.I.
It would take a lot longer than nine months if it was just that one (cell) processor doing all the work of dividing.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
I can throw you the very most classic example there is. You cannot parallelize an order matching algorithm for instance. Every order has to be processed in a fixed sequence. If I have a book of bids and asks and you come in with an order, it is a REGULATORY REQUIREMENT that I have to process it in the order it came into the system. This is an absolutely serialized type of operation. I've got to check for a match on your order and either book it or execute. No ifs ands or buts. An entire order matching SYSTEM can handle multiple orders for different instruments in tandem, but any good middleware framework will abstract that level away from where you even need to know about it outside of setting some configuration on your message queues and transaction manager.
Sure, a lot of things can benefit from parallel execution, but MANY tasks inherently contain mandatorily serial sequences of operations. You can try to break them down into steps and execute different steps on different cores/threads at the same time (one can fetch a new message while another performs matching) but what you will find is that in a lot of cases the overhead of the locking, queueing, and handoff of data from one stage to the next, combined with loss of data locality and cache coherency in current systems will actually DEGRADE performance. Many times it is a matter of a trade-off between throughput and latency.
At least in some application domains architecture of the data processing system at a higher level is a lot more important than cute optimizations at the code level. I think one of the reasons "younger programmers are more interested in parallel programming" is the ILLUSION created by the fact that they're the ones doing low level coding, whereas I'm up here at a higher level of design where the issue is important, but not from a coding perspective. In other words I really don't care about queues, semaphores, and spinlocks, etc. I care about data flows, scalability, throughput, reliability, and latency, which are largely NOT a coder level issue. So maybe you'd find that senior developers/system architects/analysts ARE concerned about the same issues as 'younger coders', but they see them in a different way, and may not even consider it the same thing at all.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
n/t
tomorrow who's gonna fuss
It seems to me people are looking at this wrong. People keep saying "It's too hard, don't do it, or you'll mess up." However, the fact is threading + multiple cores does speed things up. Also, not doing something simply because it's hard, or because you may mess up, is foolish at best. If it's hard and you may mess it up, the best thing to do is to "DO IT A LOT": the more you do it, the better you will get. As you mess up you'll see where you messed up, and be better equipped to NOT mess up like that in the future. People learn from doing, and exploring. The #1 thing that has held back parallel processing today IS that people have been told to avoid it. This retarded the development of the field, and needs to be amended.
It's very simple, and nobody can avoid it. If you have ANY shared resource, then every process/thread/whatever that uses it (and it does not matter how you 'hide it under the hood' of your code) will be subject to Amdahl's Law. Not to say we cannot or don't need something better than threads, just that as you said, there are problems that just CANNOT achieve much extra performance from parallel execution.
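For what it's worth, the limit described here is usually written down as Amdahl's Law: if a fraction s of the work is serial (say, time spent holding the shared resource), the speedup on n workers is at most 1 / (s + (1 - s) / n). A rough sketch, where the function name and the 10% figure are just assumptions for illustration:

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# With 10% serial work (an assumed figure), adding cores helps less and less.
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.10, n), 2))
```

However many cores you throw at it, the bound never exceeds 1 / s, so 10% serial work caps you below 10x.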
So we come to the question of ARCHITECTURE and design. When you have a problem like that, the only viable approach is to attempt to recast it into another problem, and that is a language/toolkit independent issue because the problem itself is not a language/toolkit/framework level issue.
The conclusion being that it is not possible to just 'abstract away' all parallelism considerations from software engineering. At best maybe some day we'll make tools so smart and so high level that THEY will deal with the problem for us. That will be nice and all, but it won't make it go away.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
When OS/2 version 2.0 was first released about 15 years ago it provided real multitasking for applications. Windows 3.1 at the time was moribund as a thin shell over DOS, which itself was little more than a glorified interrupt handler. Within a couple of years Warp was released and by then 486s were the rage running at 33MHz; 66 for a DX2 CPU. OS/2, however, had started providing multitasking on 80286 processors several years prior to Warp.
Microsoft and the industry trade press (which Microsoft owned through advertising support if not in name) started a campaign that marginalized multitasking as no big thing. That mindset has persisted to this day. I was lucky (or unlucky depending on how you look at it) that I started designing and programming using threads on a pre-emptive multitasking OS almost as soon as OS/2 2.0 was released. I guess I can count myself in the camp that has been programming for less than 15 years, even though the actual number is higher than that.
Nice post. I look at these other posts and wonder how you can see the world as anything other than parallel.
If you've got 1 or 2 thousand processors you could have a language that picks up whatever current objects are created and "run them" parallel to each other.
I don't know much about programming but it seems to me that although C++ is object oriented it must ultimately flow in one direction through the CPU. (bad analogy I know, please don't call me the next internet tubes guy) If you've got multiple processors you could be loading each object with its own entire program into each processor. Just like memory, each processor is used and reused for all the various objects or programs. I imagine it's best for real simulations. Like if you REALLY want to simulate a world. Not just a car that can car.moveForward or car.turnLeft or car.turnRight but all the physics of the car, all the possible variables, etc.
I think we've already got the solution with object-orientation but I think most programmers just need to change the way they THINK about it.
I suppose though that this might be overpowered for a simple accounting application but that's what we have Microsoft for, no one else is better at making bloated apps which suck up every resource you can give it.
Haskell has parMap for that sort of situation. I've been using parMap to speed up a raytracer I'm writing. I'm getting less than linear speedups for some unknown reason, but it does help and it's easy to use. (I switched from ocaml to haskell partly because I wanted the parallelism, and partly because I wanted to learn haskell.)
N/T
On a google screening question I offered a solution that distributed a task across multiple processors using threads - and they said it was wrong. They then changed the question to limit me to one processor. Why? Because google requires uniform box-like thinking.
(Trollin'... Trollin'... Trollin' on -a- Slashdot do do do do do do)
It is easy to demonstrate the statement false for non-symmetric distributions. For example, let's say that a class of 20 students took a test and 19 people got one point out of ten and only one person got the full score of ten points. In that case, the average is 1.45 and 95 percent of the students scored below average.
Please do not confuse the terms. The statement would be much more true if you replace the term average with median.
I know people will scoff, but plain old widely-despised Ada has tasking built right into the language, not as an add-on library. No, it doesn't prevent common parallel-programming mistakes, but it does try to help you avoid them, and in my experience makes them easier to find when you do make them.
In a lot of other languages, the decision to use threads is a pretty big one, prefaced by a lot of chin wagging, hand wringing, and soul searching. When I'm writing in Ada, tasks are just another tool in the box, to use or not depending on the demands of the app.
First of all, the quote is "Nine women can't have a baby in one month." Parallelism shortens time (to one month) by increasing processors (women). There is only one product (a baby).
That said, pregnancy is a good real world parallelism example. The baby's brain, skeleton, liver, & everything else are growing at the same time. Sure, some body parts must be grown before others. Humans are multi-tasking chemical machines.
OMG! 256 Z80s ! What kind of hardware was that?
Sigh. Off topic, but: that's the median you are talking about. 1, 1, 1, 2, 15 gives an average of 4 with 80% below average.
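A quick check of those numbers with Python's standard `statistics` module (`share_below` is just a helper name made up for this sketch):

```python
from statistics import mean, median


def share_below(scores, threshold):
    """Fraction of scores strictly below `threshold`."""
    return sum(1 for s in scores if s < threshold) / len(scores)


scores = [1, 1, 1, 2, 15]
print("mean:", mean(scores), "share below mean:", share_below(scores, mean(scores)))
print("median:", median(scores), "share below median:", share_below(scores, median(scores)))
```

The share strictly below the median can never exceed one half, while the share below the mean can get arbitrarily close to 100% on a skewed distribution.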
Mods been smoking crack again?
Yeah.
I remember a research project published about ten years ago, SF-Express, that was a large scale simulator for tank/infantry wars which ran in realtime to allow it to be hooked up to a live player's vehicle simulator. They simulated an entire Gulf War scale battlefield across thousands of processors, where each processor handled a few dozen agents (tanks, soldiers, aircraft).
It's amusing (but not surprising) to hear the slashdot crowd continue to pronounce parallelization impossible at every turn. There seems to be a new NIHS: "not imagined here syndrome"...
I must concur with the opinion about erlang (or more generally, functional languages) and parallel programming.
IMO the most important feature which functional languages have and procedural ones don't, and which makes them appropriate for parallel programming, is single assignment. That is, once 'X' is assigned a value it can never change. This means that sharing 'X' with other threads is a no-brainer, because nothing can go wrong.
You are forced by the languages to specially type (in Haskell) or access (dictionaries in Erlang) those variables which are going to hold shared state. Shared state is not the default like it is in C, Java, etc.
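A loose illustration of the idea in Python, which does NOT enforce single assignment the way Erlang or Haskell do. A frozen dataclass is used here as an approximation, and `Config`, `transform`, `run` and `try_mutate` are names invented for the sketch:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Config:
    """Immutable value: fields cannot be reassigned after construction."""
    scale: int
    offset: int


def transform(cfg, x):
    # Only reads cfg, so any number of threads can share it without a lock.
    return cfg.scale * x + cfg.offset


def run(cfg, xs, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: transform(cfg, x), xs))


def try_mutate(cfg):
    """Return True if mutation succeeded, False if the object refused it."""
    try:
        cfg.scale = 99
    except FrozenInstanceError:
        return False
    return True
```

Because nothing can write to `cfg` after it is built, no thread can ever observe it half-updated, which is the "nothing can go wrong" property in miniature.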
This means many fewer mistakes, especially with multiple developers reusing code, because you can easily see in the source code which variables are shared and need special access.
The down side of single-assignment is memory usage. If you are indeed looping (recursing) and doing it in such a way that the compiler cannot optimize away the old X, you rapidly use up heap space storing every new X. This uses O(N) space, rather than O(1) like the change-X-in-place C code would, and that puts pressure on the caches and the memory bus. Hopefully the memory bandwidth will continue to scale with the # of cores.
The other features needed for parallel programming are green threads (userspace threads) and lightweight message passing. But you can achieve those just as well in C with a library or a bit of code.
A and B negotiate policy as a thread distinct from A and B. BARRIER(until A and B complete). A and B continue in isolated threads.
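One possible reading of that scheme, sketched with Python's `threading.Barrier`. The two-phase split and the `run_two_phase` name are assumptions, since the description above is terse:

```python
import threading


def run_two_phase():
    """Two workers finish a 'negotiate' phase, meet at a barrier, then continue independently."""
    barrier = threading.Barrier(2)
    log = []
    log_lock = threading.Lock()

    def worker(name):
        with log_lock:
            log.append((name, "negotiate"))
        barrier.wait()  # neither worker proceeds until both have arrived
        with log_lock:
            log.append((name, "continue"))

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Which of A or B logs first within a phase is up to the scheduler, but both "negotiate" entries always come before either "continue" entry.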
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
You don't have to go super-high-level languages to get proper tasking support in a programming language:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
Multitasking programming has been part of a large collection of programming languages for quite some time now - Only the generation "worse is better" [1] with their {}-languages [2] made us forget all the cool stuff computer scientists conceived in the 70s and early 80s.
Martin
PS: "worse is better" has the advantage in the short and medium term - but in the long term the low priority of Consistency and Completeness will come back to haunt you. Example? Well, how many full-featured C99 compilers do you know?
[1] http://en.wikipedia.org/wiki/Worse_is_better
[2] http://en.wikipedia.org/wiki/Category:Curly_bracket_programming_languages
"mutexes, semaphores" tricky indeed but well there is a far easier alternative called "Rendezvous":
http://en.wikibooks.org/wiki/Ada_Programming/Tasking#Rendezvous
And the funny thing is: it's not a new concept - it was conceived in the late 70s.
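Python has no rendezvous construct, so the following is only a rough imitation of the idea built from standard-library queues, not Ada semantics. The `Rendezvous` class and its `call`/`accept` methods are invented for this sketch:

```python
import queue
import threading


class Rendezvous:
    """Toy entry point: the caller blocks until a server accepts the call and replies."""

    def __init__(self):
        self._calls = queue.Queue()

    def call(self, arg):
        reply = queue.Queue(maxsize=1)
        self._calls.put((arg, reply))
        return reply.get()  # wait here until the server has handled the call

    def accept(self, handler):
        arg, reply = self._calls.get()  # wait here until some caller arrives
        reply.put(handler(arg))


def demo():
    entry = Rendezvous()
    state = {"total": 0}  # touched only by the server thread

    def add(x):
        state["total"] += x
        return state["total"]

    def server():
        for _ in range(3):
            entry.accept(add)

    t = threading.Thread(target=server)
    t.start()
    results = [entry.call(n) for n in (1, 2, 3)]
    t.join()
    return results
```

The state lives entirely inside the server thread and callers only ever exchange messages with it, so there is no mutex or semaphore in the user's code.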
Martin
I had no idea it would be findable on the web - the Digital Library of India did a real nice job of scanning a paper on it into a nice, searchable OCRed PDF:
http://dli.iiit.ac.in/ijcai/IJCAI-81-VOL-2/PDF/071.pdf
ZMOB consisted of 256 home-grown Z80 boards and 256 home-grown communication boards. The comm hardware controlled a 48-bit wide bus that was essentially a shift register running in a loop around all 256 processor/comm pairs.
If we had started the project a few years later, we would have probably used 68000 processors, or maybe 8085/8087.
ZMOB was Maryland's first big hardware project, and we learned a lot about how not to do projects of this nature, like the worst ways to maneuver money through state channels, and that when you do large scale machines, careful signal engineering matters.
ZMOB per se was a failure - we never really got all 256 machines running at once. The comm hardware was eventually broken up into smaller 16 and 32 machine loops, and the Z80s were replaced by 68000s.
To a Lisp hacker, XML is S-expressions in drag.
And now you came to the nut of the problem. Now your operations have to be reversible, and at some point the sequence dependency has to be resolved. I'd look at it this way. Your solution is virtually guaranteed to be significantly more complex, and in fact in a practical system there is already a good deal of (to the application programmer) parallel activity going on, assuming you're using good tools.
For example the search for potential matches is going to be a database table SELECT, which any good RDBMS will execute as a (at least potentially) parallel scan. However, you can't with any existing modern tools do something like have several threads submitting different orders into the algorithm in parallel because THERE IS NO WAY FOR ONE MESSAGE TO KNOW THAT OTHERS EXIST. Parse the problem. Messages arrive continuously, each one permutes the existing state of the book.
The real problem with any attempt to do all this asynchronously is practical. The incoming message rate can be almost arbitrarily high. Suppose you dispatch a thread to do the match for each incoming message. There is no knowable guarantee as to the order of execution within the set of all currently running threads. You may have to backtrack an arbitrary number of steps, N. N is simply the number of received but not yet retired messages at any given time. Also, backtracking IS very expensive in real terms: at the very least a 'delta' has to exist between before and after state for each operation so it can be reversed. Plus you have no real way of knowing when you can throw away one of those deltas because you don't know what the value of N is. It is possible 1000 messages poured in in the last second and the whims of thread scheduling could mean message number 1000 hit the book first, but you don't know the value of N, so all you can essentially do is keep the data required for reversal basically forever.
Now I can sketch out in my head how in theory you could do it, but frankly the overhead of the whole thing is greater than just accepting that there is a bottleneck, reducing the critical section to the smallest extent possible, and optimizing it highly for serial execution.
I think this is in essence the nature of the problem. Certain types of things are just not naturally amenable to parallel execution BY THEIR VERY NATURE, and won't ever be. And no tool will tell you that. And any tool that lets you deal with it will still have to implicitly or explicitly deal with all of the same factors and be just as complex. Plus there is just the problem of generality. One could ask why APIs don't simply become so high level and abstract that programmers never have to deal with nuts and bolts? Nothing about the structure of existing software libraries disallows that, but the higher the level you operate at, the more problem domain specific your code has to be, or else it has to become SO generalized that you just move the problem to how to specify exactly which of the 9000 supported ways things work you actually wanted.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
In fact there's probably some theorem which covers that, lol. Amdahl's Law however does apply to any exclusive shared resource, so at SOME level there is a potential bottleneck, even if the critical section is one line of code. That would be the best case and then chances are the load would need to be huge.
Sure, any system can only have some maximum throughput. I think in the case I'm talking about the real observation is that updating the book data structure (RDBMS tables or whatever the concrete implementation is) is an operation which is critical and its performance dominates the performance of the overall application. I think you find a significant amount of these cases in OLTP type applications. Luckily processors are fast, so small critical sections are annoying, but you can live with them...
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson