Is Parallel Programming Just Too Hard?
pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"
Parallel programming isn't all that hard, what is difficult is justifying it.
:-)
What's hard, is trying to write multi-threaded java applications that work on my VT-100 terminal.
I can't speak for the rest of the world, or even the programming community. That disclaimer spoken, however, I can say that parallel programming is indeed hard. The trivial examples, like simply running many processes in parallel that are doing the same thing (as in, for example, Monte Carlo sampling) are easy, but the more difficult examples of parallelized mathematical algorithms I've seen, such as those in linear algebra are difficult to conceptualize, let alone program. Trying to manage multiple threads and process communication in an efficient way when actually implementing it adds an additional level of complexity.
I think the biggest reason why it is difficult is that people tend to process information in a linear fashion. I break large projects into a series of chronologically ordered steps and complete one at a time. Sometimes if I am working on multiple projects, I will multitask and do them in parallel, but that is really an example of trivial parallelization.
Ironically, the best parallel programmers may be those good managers, who have to break exceptionally large projects into parallel units for their employees to simultaneously complete. Unfortunately, trying to explain any sort of technical algorithm to my managers usually exacts a look of panic and confusion.
-Ryan
AUWYHSTOT (Acronyms are Useless When You Have to Spell Them Out Too)
Implement it, add CPUs, earn billion$. Just Google it.
Oh noes! Software doesn't get churned out immediately upon the suggestion of parallel programming! Programmers might actually be debugging their own code!
There's nothing new here: just somebody being impatient. Parallel code is getting written. It is not difficult, nor are the tools inadequate. What we have is non-programmers not understanding that it takes a while to write new code.
If anything, that the world hasn't exploded with massive amounts of parallel code is a good thing: it means that proper engineering practice is being used to develop sound programs, and the jonny-come-lately programmers aren't able to fake their way into the marketplace with crappy code, like they did 10 years ago.
Aside from my usual lament that people already call themselves programmers when they can fire up Visual Studio, parallelizing your tasks opens quite a few cans of worms. Many things can't be done simultanously, many side effects can occur if you don't take care and generally, programmers don't really enjoy multithreaded applications, for exactly those reasons.
And often enough, it's far from necessary. Unless you're actually dealing with an application that does a lot of "work", calculate or display, preferable simultanously (games would be one of the few applications that come to my mind), most of the time, your application is waiting. Either for input from the user or for data from a slow source, like a network or even the internet. The average text processor or database client is usually not in the situation that it needs more than the processing power of one core. Modern machines are by magnitudes faster than anything you usually need.
Generally, we'll have to deal with this issue sooner or later, especially if our systems become more and more overburdened with "features" while the advance of processing speed will not keep up with it. I don't see the overwhelming need for parallel processing within a single application for most programs, though.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I can see this going down in cubicles all through the gaming industry. The game is mostly coming together, the models have been tuned, textures drawn, code is coming together, and the coder goes to the pointy haired boss.
Coder: We need more time to make this game multithreaded!
PHB: Why? Can it run on one core of a X?
Coder: Well I suppose it can but...
PHB: Shove it out the door then.
If flight simulator X is any indication (a game that should have been easy to parallize) this conversation happens all the time and games are launched taking advantage of only one core.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
For this generation of "average" programmers, yes its too hard. Its the programming language, stupid. The average programming language has come a remarkably short distance in the last 30 years. Java and Fortran really aren't very different, and neither is well suited to paralellizing programs.
Why isn't there a mass stampede to Erlang or Haskell, languages that address this problem in a serious way? My conclusion is that most programmers are just too dumb to do major mind-bending once they've burned their first couple languages into their ROMs.
Wait for the next generation, or make yourself above average.
Well, the way they 'teach' programming nowadays, programmers are simply just typists.
No need to worry about memory management, java will do it for you.
No need to worry about data location, let the java technology of the day do it for you.
No need to worry about how/which algorithm you use, just let java do it for you, no need to optimize your code.
Problem X => Java cookbook solution Y
It may come in useful for the numerous beowolf clusters I have imagined on slashdot.
Though there are many very good parallel programmers who make excellent use of the Message Passing Interface, we are entering a new era of parallel computing where MPI will soon be unusable. Consider when the switch was made from assembly language to a programming language - when the "processor" contained too many components to be effectively programmed with machine language. That same threshold has long since passed with parallel computers. Now that we have computers with more than 100 thousand processors and are working to build computers with more than a million processors, MPI has become the assembly language of parallel programming. It hence, needs to be replaced with a new parallel language that can controll great numbers of processors.
In one word, the answer is yes. It's difficult for people who are used to programming for a single CPU.
Programmers that are accustomed to non-parallel programming environments forget to think about the synchronization issues that come up in parallel programming. Several conventional programs do not take into account synchronization of the shared memory or message passing requirements that come up for these programs to work correctly in a parallel environment.
This is not to say that there will not be any progress in this field. There will be and there has been. The design techniques and best practices differ for parallel programming than for the conventional programming. Also currently there is limited IDE support for debugging purposes. There are already several books on this topic and classes in the universities. As the topic becomes more and more important, computer science students will be required to take such classes (as opposed to it being optional) and more and more programmers that know and are experts in parallel programming will be churned out. It's just not as popular because the universities don't currently seem to make it a required subject. But that will change because of the advancement in hardware and more market demand for expert parallel programmers.
Our brains might be limited about other things, but this is just a matter of better education. 'Nuff said.
Since this discussion has been going on for over three decades with little progress in terms of widespread change
Funny, I've seen an explosion in the number of compute clusters in the past decade. Those employ parallelism, of differing types and degrees. I guess I'm not focused as much on the games scene - is this somebody from the Cell group writing in?
I mean, when there's an ancient Slashdot joke about something there has to be some entrenchment.
The costs are just getting to the point where lots of big companies and academic departments can afford compute clusters. Just last year the price of multi-core CPU's made it into mainstream desktops (ironically, more in laptops so far). Don't be so quick to write off a technology that's just out of its first year of being on the desktop.
Now, that doesn't mean that all programmers are going to be good at it - generally programmers have a specialty. I'm told the guys who write microcode are very special, are well fed, and generally left undisturbed in their dark rooms, for fear that they might go look for a better employer, leaving the current one to sift through a stack of 40,000 resumes to find another. I probably wouldn't stand a chance at it, and they might not do well in my field, internet applications - yet we both need to understand parallelism - they in their special languages and me, perhaps with Java this week, doing a multithreaded network server.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
It's very easy to just say "Parallel programming is too hard", and try to ignore it. But consider this: the languages we use today are designed to work in the same way as the CPUs of old: one step at a time, in sequence, from start to finish.
... there's a hell of a lot to think about, and you only have to get it wrong once to introduce a very hard to reproduce and debug problem. Other languages are more abstract, and have much greater opportunities for compilers to extract the inherent parallelism; it's just that they have had more use to date in academia to illustrate principles than in the real world to solve problems.
A special case of this is seen in the vector units found in today's CPUs: MMX, SSE, Altivec, and so forth. You can't write a C compiler that takes advantage of these units (not easily, anyway), because the design of C means that a programmer takes the mathematics and splits it up into a bunch of sequential instructions. Fortran, on the other hand, is readily adapted, because the language is designed for mathematics; you tell it what you need to do, but not how to do it.
In the same way, trying to cram parallelism into a program written in C is a nightmare. Semaphores, exclusive zones, shared variables, locking, deadlocks
As time marches on, and the reality of the situation becomes increasingly obvious, I would expect that performance-intensive apps will start to be written in languages better suited to the domain of parallel programming. Single threaded apps will remain - eg, Word doesn't really need any more processing power (MS' best efforts to the contrary notwithstanding) - and C-like languages will still be used in that domain, but I don't think inherently sequential languages, like C, C++, and others of that nature, will be as common in five or ten years as they are today, simply because of the rise of the parallel programming domain necessitating the rise of languages that mimic that domain better than C does.
please someone tag yes.
wtf happened anyway? the tags got all boring.
The problem with parallel programming is we don't have the right set of primitives. Right now the primitives are threads, mutexes, semaphores, shared memory and queues. This is the machine language of concurrency - it's too primitive to effective write lots of code by anyone who isn't a genius.
What we need is more advanced primitives. Here are my 2 or 3 top likely suspects:
- Concurrent Sequential Programs - CSP. This is the programming model behind Erlang - one of the most successful concurrent programming languages available. Writing large, concurrent, robust apps is as simple as 'hello world' in Erlang. There is a whole new way of thinking that is pretty much mind bending. However, it is that new methodology that is key to the concurrency and robustness of the end applications. Be warned, it's functional!
- Highly optimizing functional languages (HOFL) - These are in the proto-phase, and there isn't much available, but I think this will be the key to extremely high performance parallel apps. Erlang is nice, but not high performance computing, but HOFLs won't be as safe as Erlang. You get one or the other. The basic concept is most computation in high performance systems is bound up in various loops. A loop is a 'noop' from a semantic point of view. To get efficient highly parallel systems Cray uses loop annotations and special compilers to get more information about loops. In a functional language (such as Haskel) you would use map/fold functions or list comprehensions. Both of which convey more semantic meaning to the compiler. The compiler can auto-parallelize a functional-map where each individual map-computation is not dependent on any other.
- Map-reduce - the paper is elegant and really cool. It seems like this is a half way model between C++ and HOFLs that might tide people over.
In the end, the problem is the abstractions. People will consider threads and mutexes as dangerous and unnecessary as we consider manual memory allocation today.
Parallel programming and construction share one crucial fundamental requirement: Proper communication. But building a house is much easier to grasp. Programming is abstract.
I think part of the problem is, that many programmers tend to be lone wolves, and having to take other people (and their code, processes, and threads) into consideration is a huge psychological hurdle.
Just think about traffic: If everyone were to cooperate and people wouldn't cut lanes and fuck around in general we'd all be better off. But traffic laws are still needed.
I figure what we really need is to develop proper guidelines and "laws" for increased parallelism.
Disclaimer: This is all totally unscientific coming from the top of my head...
.: Max Romantschuk
I'd like to think compilers should be smart enough to assess data dependencies and space those instructions out- we've always had to do this anyway, at least since pipelined processors hit the market, but loops still aren't cascaded properly. An example is a loop that calculates a sum of products- the add instruction must wait for the multiplication instruction to finish, when in fact the processor could be doing a heap of multiplications, and using associativity to cut down dataflow problems in the add stage. Spreading the dataflow graph out as much as possible at compile time also helps with cache coherency between many processors.
I think a program compiled that way would need hardware that will understand that the data dependencies are spread out so that it can distribute instructions among the processors, although the distribution could be very simple if the dependencies could be spread out significantly- instructions could almost be distributed like dealing cards. It's a much finer granularity than threading, but I think more applications suit this sort of parallelism.
Another barrier to parallel programming is accessing [for read] data that should be global to all threads. You can do it by passing pointers to state [messy], using globals [dangerous, obese] or by copying all the data onto the stack of the thread [slow]. Threads need to really share address space- use the same stack and everything, IMHO.
We at slashdot are scientists, specialists and kernel hackers. Your FUD will be found out.
Splitting up a single task can be a lot of trouble, but modern programs, games and applications with lots of creeping featurism, can benefit enormously by sticking those tasks in separate threads and letting them run on different processors. Time effects on 3D scenes, AI for NPCs, real-time changes in embedded data, all can be offloaded, which improves the response time for the main thread to the user.
Finding errors can be hard enough, but it becomes much harder when they are non-repeatable. That's exactly the sort of bug you get with multi-threaded programming. Run a function 99 times, fine. Run it the 100th time, bang!. On the 100th run the threads were juggled by the OS in such a way that your flawed programming suddenly shows up and the threads clash. I've already used some graphics software that had this exact problem. Thinking in a parallel/multi-threaded way may be a challenge, but I think debugging is the real issue.
is that our brains do NOT need to think in parallel to write solid parallel code. Just certain design principles need to be used and it is happening all the time.
And someone said languages are a limitation, and that's probably partially true. The most popular programming languages don't make it easy for a programmer to write parallel code. The ones that do are not so popular yet. So there's a little gap there, but as hardware technology grows more powerful, that will change.
Parallel programming doesn't have to be quite as painful as it currently is. The catch is that you have to face the fact that you can't go on thinking with a sequential paradigm and have some tool, library, or methodology magically make everything work. And now, I'm not talking about functional programming. Functional programming is great, and has a lot going for it, but solving concurrent programming issues is not one of those things. Functional programming deals with concurrency issues by simply avoiding them. For problems that have no state and can be coded purely functionally this is fine, but for a large number of problems you end up either tainting the purity of your functions, or wrapping things up in monads which end up having the same concurrency issues all over again. It does have the benefit that you can isolate the state, and code that doesn't need it is fine, but it doesn't solve the issue of concurrent programming.
No, the different sorts of paradigms I'm talking about no shared state, message passing concurrency models ala CSP and pi Calculus and the Actor Model. That sort of approach in terms of how to think about the problem shows up in languages like Erlang, and Oz which handle concurrency well. The aim here is to make message passing and threads lightweight and integrated right into the language. You think in terms actors passing data, and the language supports you in thinking this way. Personally I'm rather fond of SCOOP for Eiffel which elegantly integrates this idea into OO paradigms (an object making a method call is, ostensibly, passing a message after all). That's still research work though (only available as a preprocessor and library, with promises of eventually integrating it into the compiler). At least it makes thinking about concurrency easier, while still staying somewhat close more traditional paradigms (it's well worth having a look at if you've never heard of it).
The reality, however, is that these new languages which provide the newer and better paradigms for thinking and reasoning about concurrent code, just aren't going to get developer uptake. Programmers are too conservative and too wedded to their C, C++, and Java to step off and think as differently as the solution really requires. No, what I expect we'll get is kluginess retrofitted on to existing languages in a slipshod way that sort of work, in as much as it is an improvement over previous concurrent programming in that language, but doesn't really make the leap required to make the problem truly significantly easier.
Craft Beer Programming T-shirts
I've worked with parallel software for years - there are lots of ways to do it, lots of good programming tools around even a couple of decades back (my stuff ranged from custom message passing in C to using "Connection-Machine Fortran"; now it's java threads) but the fundamental problem was stated long ago by Gene Amdahl - if half the things you need to do are simply not parallelizable, then it doesn't matter how much you parallelize everything else, you'll never go more than twice as fast as using a single thread.
Now there's been lots of work on eliminating those single-threaded bits in our algorithms, but every new software problem needs to be analyzed anew. It's just another example of the no-silver-bullet problem of software engineering...
Energy: time to change the picture.
Along those lines, Sun's work on threading was largely in support of adapting applications to run on multiprocessor systems. This work has been going on for more than a decade and has received additional impetus with the 'Niagara' series processors. Seems to me that Intel has finally seen the light.
A Shadeless room is a brighter room.
Instead of continuing to write software in imperative languages like C and Java and dealing with the added complexity of multiple threads, race conditions, etc. it seems to this AC a bit simpler to just start pushing functional programming as a solution: let the compiler do the work.
Historical precedent: as processors and their instruction sets became more complex and varied, there was no "crisis" of programmers failing to adapt to more complicated assembly language; instead the solution was to start progamming in "higher level" languages (ya know like Cobol and Fortan!).
if you want to solve the problem, think BeOS. Join Haiku or code for it www.haiku-OS.com. If you want to learn about how parallelism work in that OS, just take a look at the BeBook (the API reference easy to find on the web) or one of the now free Oreily book on the subject.
I've been doing true parallel programming for the better part of 20 years now. I started off writing kernel code on multi-processors and have moved on to writing distributed systems.
Multi-threaded code is hard. Keeping track of locks, race conditions and possible deadlocks is a bitch. Working on projects with multiple programmers passing data across threads is hard (I remember one problem that took days to track down where a programmer passed a pointer to something on his stack across threads. Every now and then by the time the other thread went to read the data it was not what was expected. But most of the time it worked).
At the same time we are passing comments back and forth here on Slashdot between thousands of different processors using a system written in Perl. Why does this work when parallel programming is so hard?
Traditional multi-threaded code places way too much emphasis on synchronization of INSTRUCTION streams, rather than synchronization of data flow. It's like having a bunch of blind cooks in a kitchen and trying to work it so that you can give them instructions so that if they follow the instructions each cook will be in exactly the right place at the right time. They're passing knives and pots of boiling hot soup between them. One misstep and, ouch, that was a carving knife in the ribs.
In contrast, distributed programming typically puts each blind cook in his own area with well defined spots to use his knives that no one else enters and well defined places to put that pot of boiling soup. Often there are queues between cooks so that one cook can work a little faster for a while without messing everything up.
As we move into this era of cheap, ubiquitous parallel chips we're going to have to give up synchronizing instruction streams and start moving to programming models based on data flow. It may be a bit less efficient but it's much easier to code for and much more forgiving of errors.
I went to the Game Developers Conference and went to a session where a gentleman from either IBM or Intel gave a talk on how to utilize extra cores and yet not penalize those that did not have multicore. As an aside, Intel also offers something called the "Building Blocks" library which parallelizes primitive things such as for loops. Anywho, back to something game specific, a few of the suggestions below:
Particle physics for client-side visual effects (omit on non multi-core machines)
Animate faces more, smoother more natural animations when walking up stairs/inclines (again more static and rigid on single core machines)
Animate cloth/hair
And my personal favorite
Dynamic texture/model tessellation. (models far away are less complex than models close up, make the transition smoother)
So, whats my opinion? It's the libraries and programming languages and compilers that will change, the programmer just needs to have an idea of synchronization issues and whatnot.
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
Well, I was a Teaching Assistant of 'Concurrent Algorithms'. There used to be 30 students for that class. For a given assignment there used to be 30 different solutions and many used to be wrong.
It is difficult to find out errors in these kind of algorithms. There has been some papers published in big conferences and it was found that some mathematical proofs were wrong because they could not think of some scenarios where it can fail.
Spam: Any activity on internet to gain popularity without paying to advertising companies like Google.
One reason that parallel programming is hard is simple: lots of things can't be parallelized effectively. The computer is really doing one thing, namely, waiting for input (ie: the app is user-bound). If your app isn't user bound, then parallelism is possible.
If your app isn't user bound, it should be pretty easy to parallelize, assuming that your processing stream isn't serialized (ie: the does a result depends on the previous result?). The problem then becomes contention. For some reason, lots of developers can't wrap their heads around threading and contention issues. Maybe it's been presented incorrectly in textbooks, or something like that. People have problems programming asynchronously as well. Maybe that's the problem?
Anyhow, once you figure out the contention issue, then there are all kinds of other things you have to deal with, like asynchronous status notification. Timeouts. Data that never comes back. Unexpected termination. Data coherency. Read and write locking. Synchronization.
In the end, sometimes it's easier to break your data set up into chunks and spawn a couple of processes in the background, then use cat to aggregate the data sets. It's definitely easier to do it that way (ie: not a lot of mental lifting involved).
Most developers don't care about scalability. The ones that do parallelize. The ones that don't don't.
No, parallel programming isn't "too hard", it's just that programmers never learn how to do it because they spend all their time on mostly useless crap: enormous and bloated APIs, enormous IDEs, gimmicky tools, and fancy development methodologies and management speak. Most of them, however, don't understand even the fundamentals of non-procedural programming, parallel programming, elementary algorithms, or even how a CPU works.
These same programmers often think that ideas like "garbage collection", "extreme programming", "visual GUI design", "object relational mappings", "unit testing", "backwards stepping debuggers", and "refactoring IDEs" (to name just a few) are innovations of the last few years, when in reality, many of them have been around for a quarter of a century or more. And, to add insult to injury, those programmers are often the ones that are the most vocal opponents of the kinds of technologies that make parallel programming easier: declarative programming and functional programming (not that they could actually define those terms, they just reject any language that offers such features).
If you learn the basics of programming, then parallel programming isn't "too hard". But if all you have ever known is how to throw together some application in Eclipse or Visual Studio, then it's not surprising that you find it too hard.
I've encountered two problems with parallel programming 1. For applications which are constantly being changed under tight deadlines, parallel programming becomes an obstacle. Parallelism adds complexity which hinders quick changes to applications. This is a very broad generalization, but often the case. 2. Risk. Parallelism introduces a lot of risk, things that arent easy to debug. Some problems I faced happened only once every couple of weeks, and involved underlying black-box libraries. For financial applications, this was absolutely too much risk to bear. I would never do parallel programming for financial applications unless management was behind it fully (as they are for HUGE efforts such as monte carlo simulations and VaR apps.)
First I'll put a bit over here
Then I'll put a bit over there
Watchin' all those bits go in
Then watch 'em do that I/O again
I'm sittin' at my desk all day
Watching the bits toil away
Ooo, I'm just sittin' at my desk all day
Wastin' time
What?
erlang is interesting because it presents a parallel platform that scales to the number of cores available. ultimately one of the strongest features of modern code is the ability to run across numerous hardware architectures and on diverse group of processors. as a result, it's usually best to follow the old mantra in CS to program for the 'worst case scenario' which manifests itself in single core processors today.
parallel programming isn't impossible, it's just not that worthwhile right now for 99% of coders
No, the different sorts of paradigms I'm talking about no shared state, message passing concurrency models ala CSP and pi Calculus
You need to be careful with these paradigms. They all use an interleaved concurrency model as the basis for their semantics. This is OK as a way of defining semantics, but not for execution in a truly parallel environment. In essence, the semantic model says that running computation A in parallel with computation B is the same as executing "A" then "B" or "B" then "A". Notice that the parallelism has been removed to make the semantics easier to reason about.
In a truly parallel application, you need to be very sure that executing "A" in parallel with "B" is not actually going to break anything. Often this means some implicit synchronisation, which is what we're trying to remove. There are better models around. Petri nets are truly parallel, but also introduce some synchronisation requirements. Models based on Winskel's event structures are somewhat better, but not that widely known or understood.
So, there is still some way to go here.
Up thread someone wondered if we'd make the same rates of progress in parallel programming as chip design (Moore's law) .... guess what hardware design these days is largely programming, and highly programming at that .... and it's done by mere mortals. I have a decade in logic design and maybe 15 more years as a programmer - largely as a kernel hack and doing embedded systems - they're really all the same stuff - spending a lot of time doing logic helps you hone those parallel programming skills - after a while the deadlocks and places that need locks just start to become obvious ....
Note to peanut gallery: favorite gdb debugging command "t a a bt"
The basic problem is one of optimization. THERE WOULD BE ABSOLUTELY NO REASON TO EVER PROGRAM IN PARALLEL IF COMPUTERS WERE FAST ENOUGH. This means that there needs to be a separation of concerns between the algorithm expressed serially and it's parallel optimization. So far, automatic parallelization has had only limited sucess and there is no reason to believe this will change in the near future. This suggests a three-fold architecture.
1) A parallelizable (but not parallel) serial language for "every" programmer to write in.
2) A specialized "mapping" meta-language for optimization experts, that would parallelize expressions in the serial language in an optimal way for the target hardware.
3) The resulting machine code.
NOBODY SEEMS TO BE WORKING ON THIS APPROACH. In fact object-oriented programming with it's state (object) based approach works dead against parallel programming. The parallizable/serial language would probably have to be functional.
There's long been the comment that multi-threaded programming was going to be the next quantum shift, much like OOP was. And the difficulty was going to be the same -- namely, getting the programmer's brain to wrap around the concept. The shift from OOP to threaded programming is likely to be at least as difficult as the shift from linear to OOP.
What I'm seeing here (caveat: not a programmer by trade, only a lowly QA ... but I do have a rudimentary awareness of programming) is that the tools aren't fully ready. This is something I could fully believe -- I've been witness to the development cycle without tools, and the development cycle with tools. It's equivalent to editing a photo with MS Paint vs. Photoshop.
As for those accusing programmers of being lazy, I'll pass on one question posed to me by a friend:
"Why doesn't everyone just code in assembly language?"
I taught parallel processing (as a TA) at UC San Diego, and worked at the supercomputer center for years doing parallel code.
No, parallel processing is not "too hard". Phah.
It's simply hard for people that are being forced to write parallel code without taking a class in the subject, or really understanding what they're doing. Parallel code is as mind bending as recursive code the first time you saw it, only more so. It takes a lot of work to wrap your mind around the weirdness of having the same code being executed in different places with different data, making sure you pass the data back and forth correctly, synchronize correctly, and doing it all without making mistakes that destroy your performance.
To anyone interested in the field, sit down, learn MPI, run through some tutorials, and get to the point where you can run something like a radial blur on an array in parallel without breaking a sweat, and you know you're mentally ready to write parallel code. In class, getting to this state usually takes 2-3 weeks, but if you're working on your own, you should be able to do it faster.
Math Barbie says "Parallel Programming is hard!"
Don't piss off The Angry Economist
Yes, you too can _learn_ how to program in parallel. You're right, in that we fundamentally do things in a linear fashion. But, we can learn to think differently, and learn different methods that will work in a parallel way.
Taking your mathematical example, here's a parallel example of a change in thinking. At one stage, the concept of 'zero' was unfathomable. As little as over 2000 years ago, there was no zero. You could say that their brains weren't wired to think in zeroes. That's certainly changed these days. Once the concept was discovered, and taught, it eventually became instinctive for everyone.
We obviously haven't reached that stage with parallel programming, but all it takes is familiarity with the concepts and methods, and a bit of practice. I rate it up there with object oriented programming. In fact, they're both very suited for each other.
Recently my University bought a supercomputer listed between the top500 computer systems in the world. During a class of Computational Physics, my Professor was commenting this issue and noted that the system was intended to serve about 80 research groups with tasks that demand parallel processing. The reality were much more modest though: only two groups were actually using the system (one of them was my Professor's research group), and of this two groups, only his group were taking advantage of parallel programming techniques to use the system the way it should be used.
He said that a new policy would be implanted, this was about 3 or 4 months ago, so I don't know how it is right now. But this reflects the lack of preparation to use such a system, they've wasted tons of cash with something they don't really know how to use properly.
My brother just recently started doing IT stuff for the research psych department at a respected university I won't name. They do a lot of mathematical simulations with Matlab and in order to speed these up they decided to buy several Mac Pro towers with dual quad core Xeons at $15,000*. The problem is, their simulations aren't multithreaded (I don't know if this is a limitation of Matlab or of their programming abilities -- maybe both) so while one core is cranking on their simulations to the max, the other 7 are sitting there idle! So while a big part of this ridiculous situation is just uninformed people not understanding their computing needs, it also shows that there are plenty of programmers stuck playing catch-up since computers with multiple cores (i.e. Core 2 Duo, Athlon X2, etc.) have made their way onto the desktops of normal users.
I think this is a temporary situation though, and something that has happened before, there have been many cases where new powerful hardware seeps into the mainstream before programmers are prepared to use it.
*I know what you're thinking: "How the hell do you spend $15,000 on a Mac?". I wouldn't have thought it was possible either, but basically all you have to do is buy a Mac with every single option that no sane person would buy: max out the overpriced RAM, buy the four 750GB hard drives at 100% markup, throw in a $1700 Nvidia Quadro FX 4500, get the $1K quad fibre channel card, etc.
I would argue that most user tasks cannot be fundamentally parallel. Quite simply, if a user highlights some text, and hits the bold button, there really isn't anything to split across multiple cores. No matter how much processing is necessary to make the text bold (or to build the table, or to check the spelling of a word, or to format a report, or to calculate a number, or to make a decision -- as in A.I.) it's a serial concept, and a serial algorithm, and cannot be anything more. Serial is cool too remember.
So we're looking at multiple tasks. Obvious gaming and operating systems get to split by engine (graphics, network, interface, each type of sound, A.I., et cetera). I'd guess that there is a limit of about 25 such engines that anyone can dream up. Obviously raytracing gets to have something like 20 cores per pixel, which is really really cool. But that's clearly the exception.
So really, in my semi-expert and wholy professional opinion, I think priming is the way to go. That is, it takes the user up to a second to click a mouse button. So if the mouse is over a word, start guessing. Get ready to bold it, underline it, turn it into a table, look up related information, pronounce it, stretch it, whatever.
Think of it as: "we have up to a full second to do something. We don't know what it's going to be, but we have all of these cores just sitting here. So we'll just start doing stuff." It's the off-screen buffering of the user task world.
Which results in just about any task, no matter how complicated, can be instantly presented -- having already been calculated.
Doesn't exactly save power, but hey, the whole point is to utilize power. And power requires power. There must be someone's law -- the conservation of power, or power conversion, or something that discusses converting power between various abstract or unrelated forms -- muscle to crank to generator to battery to floatig point to computing to management to food et cetera; whatever.
Right, so priming / off-screen buffering / preloading. Might as well parse every document in the directory when you open the first one. Might as well load every application on start-up. Can't have idle cores lying around just for fun.
Has anyone ever thought that maybe we won't run out of copper after all? I'd bet that at some point in the next twenty years, we go back to clock speed improvement. I'd guess that it's when core busing becomes rediculous.
I still don't understand how we went from shared RAM as the greatest thing in the world to GPU's with on-board RAM, to CPU's with three levels of on-chip RAM, to cores each with their own on-core RAM.
Hail to the bus driver; bus driver man.
After years of driving the programming profession to its least common denominator, and eliminating anything that was considered non-essential, somebody is surprised that current professionals are not elastic enough to quickly adapt to a changing environment in hardware. Whoda thunk it? The ones, you may have left, with some skills are nearing retirement.
"To those who are overly cautious, everything is impossible. "
As for computer scientists... well... I really don't know that there is much going on of any value outside of Kolmogorov Complexity. Maybe something will come out of that that will resolve the issue while the physicists are trepanning themselves into theoretic epilepsy.
Seastead this.
I've often wondered if parallel programming would be easier if it were done in Chinese characters instead of English/European alphabetical characters.
Perhaps having a Chinese character represent a simple block of pre-compiled code that does one simple thing. Then the characters could be placed in two-dimensional order to form parallel threads. This would require a completely different approach to compiler development. But that would be OK because compilers are stuck in the 1970s anyway.
Just getting away from the idea of having code based on a very limited set of alphanumeric characters strung together like beads on a string might help unlock a whole new era of innovative approaches to parallel program development strategies.
It does go without saying that Chinese programmers would have an incredible advantage in any new system of programming that is based on Chinese characters. It may be possible that this subject is already a project under development in China. I'm a bit thick, so whenever I think of something independently that seems to be a completely new idea, inevitably someone smarter than me is already developing it.
A lot of efforts are being done to simplify the parallel programming, both on micro scale and component scale. Just look through lambda-the-ultimate archives. Micro scale is mostly invisible to application programmers, and it is mostly done by compilers when they use SSE and friends, one of notable efforts of making it more explicit is OpenMP. On component scale, most of them are based on the message passing concurrency model (after you grok it, it is really really simple). The best effort that I have seen from point of view of usability is E programming language. I tried to clone its core ideas in my pet project AsyncObjects Framework, but usability is less than E's one because of the framework clutter.
OpenMP provide almost transparent implementation of automatic threads parallelization in many practical cases. It easy to learn and use. It work perfectly for embarrassingly parallel task, especially loop parallelization. The rule of the thumb for multithreading - if task couldn't be paralellized with OpenMP it's often is not worth the effort to paralellize at all.
Well as stated before, you can't just parallelize any task, if you could then compilers would have done it for the programmers long ago.
It's also pointless to parallelize any task depending directly on a limited shared resource, for example copying files from A to B isn't going to get any faster when you copy 3 files at once, it will in fact be slower (since the hard drive has to jump back in forth, NCQ might help, but it's still less then 100%).
What i hear people bitch most about are Games. Games are not as easy to parallelize as some might think because some of the 'tasks' involved cannot be parallelized and those that can are usually tied by a together by a shared resources (i.e. graphics card, geometry, textures). For example, you can try to do do the geometry transformation parallelized and separate from a rendering thread, but when writing the transformed geometry back to, for example, the vertex buffer you have to lock that buffer, meaning any thread trying to read/write from it has to wait for the operation to complete. If each thread tried just tried to access the shared resource whenever they have finished, they may end up waiting a long time for another threads (such as the the rendering thread for example), or block other threads trying to access the same resource. If you have threads waiting on other threads most of the time then you already have removed most of the the parallel execution benefits, in fact, the threading overhead may be worse then benefit gained from parallel execution in the first place.
Another thing is that even if most of the 'tasks' in a game have been parallelized, you still only see the 'weakest' link. For example, if you have a Geforce 2 GTS running the latest Unreal 3 Engine (if that where possible) then it won't matter that you have a 16 Core CPU and the game is parallelized every possible task. You still get crappy FPS. An usually the weakest link (task) is one that cannot be parallelized.
Taking a look at the credits list for one of the major MMOs I was involved with, there were a ten programmers, two additional programming credits, two lead engineers and a tech director listed. Call that 15-20 people, a few more if you add in the DB teams etc.
Compare that to over 30 designers, design assistants and additional game design credits, 4 writers, 7 world/architecture personel, 9 characters/creatures folks, 5 animators, 19 motion capture personel (before the three actors) and one George Lucas.
Companies go bust over late games. Sigil's gone in large part because they kept overrunning Microsoft's willingness to add another six months and another million or two in salaries, leading to Microsoft pulling out (and a lot more politics beyond). If the real cause of games running late was coding, an area that takes maybe 10% of the production budget, don't you think they'd throw double, even triple the resources at it (even if that means only a few more guys but hugely paid parallel coding geniuses) to avoid the massive costs associated with a delay?
Think about it: A typical MMO lifecycle is around five years with a geared up team for maybe the last three. You run late by a year, you're paying 20-30% extra in salaries. If it was really a coding issue, wouldn't you triple the resources and pay less overall?
The biggest challenge for most games is getting the game balance tweaked so it's atually fun, scrapping all the ideas you ran with for a year or two that seemed great but didn't work once lots of people were playing together, and populating enough content to stand up against games that have been out for half a decade and ship with ten expansion packs in one box set for less than the cost of your new release.
Getting code to run in parallel isn't to be diminished... But the scale of where resources are put shows far more is invested in to art, design and content than ever goes near programming. If programming were really a bottleneck, it's an easy one to fix (even accepting that double the money doesn't equal half the time due to scaling issues).
Windows Vista
Me failed English...
FreeBSD over Linux. If my comments seem odd, this may explain...
Most programming languages today are text-based and follow the flow of instructions. Parallel programming is inherently hard and unnatural on these (basically all) languages. Programmer needs to think of atomicity, mutexes, semaphores, deadlocks, race conditions... Add to that the fact that compilers may reorder some instructions to optimize for this or that CPU and things become trickier. There is a language called LabVIEW by National Instruments. It's a Turing-complete data flow languange (not control-flow) and it happens to be completely graphical (sometimes it is refered to as G) - no typing required, just drawing wires that represent data to blocks that represent operations or functions. In this language, parallel programming is very, very natural. You just write your wires in parallel, they merge sometimes, they don't, etc. The scheduler in the runtime takes care of spawning threads and using available CPUs. While powerful, it is unfortunately very tailored for the test and measurement markets and there is no free version. If you are interested to try this completely different programming paradigm, however, I recommend you download a trial version and play with it. It's available for Windows, Mac, and Linux.
K here is my two cents.
Computers have been single processor systems for a long time. We have developed around that concept for a long time. A normal binary tree, is not any faster on a multi processor system than a single.
I see 2 problems.
1. Most CS students are not taught to utilize multi-threading(parallel) efficiently(when it is useful, and when it is not), or even worse they just get a crash course in fork.
2. There are not many operations or algorithms as of yet that map easily to a parallel enviornment.
Parallel programming is incredibly hard to do right, only novice programmers in the field will claim otherwise. Consider race/starve conditions, the fact that parallel programs are very hard to debug, hard to test. On top of that the can of worm called thread/process synchronization, which in modern architectures with processor caches etc. have not become any easier. You cannot change the property of an object and rely on another thread picking it up, even after a protected section. The other cpu may just have that propertys memory content in its own cache. Declaring every property of our program volatile is clearly not the way, as it would defeit the benefits of running in parallel in the first case.
On the other hand many of us are already programming parallel threads/processes when we program for an application server. The typical web application server will handle a number of concurrent requests, but will serialize access to the relevant common object, namely the session abstraction.
High end database systems have been doing parallel execution of queries for years. Clearly this is a very good abstraction as we don't even have to know that its taking place.
A very common pattern seen in programs is looping through an array or list to do something, like e.g. a transform or performing specific actions on select items. Most of the time we just want the work done and don't really care whether it is done in parallel or serialized, only our programming language forces us to do the latter.
Closures is a (small) step on the road to taking advantage of multiple CPUs and -cores. Microsoft LINQ (to objects) actually brings the parallel traversal of in-memory collections within reach for common developers.
We need to know the theory behind parallel programming to understand concepts such as "volatile" variables etc, but we shouldn't need to be concerned with the complexities in every single line of code.
I have a feeling that the statically typed high level languages will have an advantage here. Consider Java/.NET which uses a JIT compiler on the target architecture. Doing static analysis on a program with respect to the target architecture will be easier if the optimizer/compiler actually knows the types during JITing.
Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
Many of the responses here, frankly, demonstrate how poorly people understand both the problem and the potential solutions.
/.) that for parallel/distributed processing, Language Does Matter. (And UML is clearly part of the problem...)
Are we talking about situations that lend themselves to data flow techniques? Or Single Instruction/Multiple Data (e.g. vector) techniques? Or problems that can be broken down and distributed, requiring explicit synchronization or avoidance?
I agree with other posters (and I've said it elsewhere on
Then there's the related issue of long-lived computation. Many of the interesting problems in the real-world take more than a couple of seconds to run, even on today's machines. So you have to worry about faults, failures, timeouts, etc.
One place to start for distributed processing kinds of problems, including fault tolerance, is the work of Nancy Lynch of MIT. She has 2 books out (unfortunately not cheap), "Distributed Algorithms" and "Atomic Transactions". (No personal/financial connection, but I've read & used results from her papers for many years...)
I get the sense that parallelism/distributed computation is not taught at the undergrad level (it's been a long time since I was an undergrad, and mine is not a computer science/software engineering/computer engineering degree.) And that's a big part of the problem...
dave
There are three basic "problems" and to some extent I think they are unsolvable.
First is debugging tools. People are used to having a nice neat step through debugger - press some key to execute the next statement. Because a large part of parallel processing does not conform to that process you have to add in your own debug code (even if using one of the parallel debuggers out there). I see no way around this one - you can not have a deterministic debugger on a non-deterministic system and I see no way for parallel systems to generally be deterministic in the sense of "press F8 to execute next line". This increases cost as you can not simply hire a cheap "code monkey" to do the job.
Second is just general complexity, things like file locks, critical sections, race conditions, etc. Even if you deal with them every single day and have been doing it for the last 20 years these are still complex issue when something fails. I also see no way around this one, though there have been some really nice advances in recent times and there is still a long way to go before I think we have mostly maxed out the tools. Somebody has to deal with this and a good deal will eventually be shoved off to the tools, but when that happens figuring out where the error is will be a real bitch (ask anyone who eventually found an esoteric bug in a standard library how much work *that* took). While quite solvable, this also increases cost.
Lastly you have Amdahl's law - basically you can not parallelize everything and performance degrades quickly the more serialization you have. I do not think that the real performance of SMP's or multi-core systems for the average user is going to come in single application performance - you just can not parallelize the typical applications that much. Where the performance comes in is allowing several things to go on at a time, basically that really large system tray many people run will be going across multiple processors or cores and not degrade performance that much. Though this really helps - I've ran two processor systems on my home systems for years and it is interesting how little processor power I need for games compared to the minimum specs even when I do not bother with paying attention to what is currently running.
So, in short, not really too hard, it's a level of complexity that many have done well with in the past and will continue to do so, but that the problem space doesn't map well to parallelization. There is some level of "too hard" when you are asking if VB only people who think that HTML should be listed as a programming language and can not survive without Microsoft's wizard (note that using VB isn't the bad part - lots and lots of really good programmers use it at work because that is what will sale), but competent programmers/software engineers are quite capable of using it. I do not think that the benefits within most single applications are worth the cost, though as an overall system it will help quite a bit.
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
Most programmers have difficulty thinking about recursive processes as well, but there are still some who don't and we still have use for them. I should say "us", as I make many other programmers batty by using recursion frequently. Programmers tell me all the time that they find recursion difficult - difficult to write, difficult to trace, difficult to understand, difficult to debug. Conversely, I find it easier - all I have to do is reduce the problem to its simplest form and determine the end case, and a tiny snip of code will do where a huge mess of iterative code would otherwise have been required. So, I don't understand why anyone would want to write iterative code when recursion can solve the problem.
I suspect that parallel programming may be similar - some programmers will "get it", others won't. Those who "get it" will find it fun and easy and be unable to understand why everyone else finds it hard.
Also, most developement tools were created with a single processor system in mind: IDEs for parallel programming are a new-ish concept and there are few. As more are developed we'll learn about how the computer can best help the programmer to create code for a parallel system, and the whole process can become more efficient. Or maybe automated entirely; at least in some cases, if the code can be effectively profiled the computer may be able to determine how to parallelize it and the programmer may not have to worry about it. So, I think it's premature to argue about whether parallel programming is hard or not - it's different, but until we have taken the time to further develop the relevant tools, we won't know if it's really hard or not.
And of course, for a lot of tasks it simply won't *matter* - anything with a live user sitting there, for example, only has to be fast enough that the person perceives it as being instantaneous. Any faster than that is essentially useless. So, for anything that has states requiring user input, there is a "fast enough" beyond which we need not bother optimizing unless we're just trying to speed up the system as a whole, and that sort of optimization is usually done at the compiler level. It is only for software requiring unusually large amounts of computation or for systems which have been abstracted to the point of being massively inefficient beneath the surface that the fastest possible computing speed is really required, and those are the sorts of systems to which specialist programmers could be applied.
Parallel programming is NOT too hard. Yes its harder than a single-threaded approach sometimes, but in my experience usually the problem maps more naturally into a multi-threaded paradigm.
The real probelm is this: I've seen time and again that the real problem is that most companies do not require, recognise, or reward high technical skill, experience and ability instead they favour minimal cost and speed of product delivery times over quality.
Also it seems most Human Resources staff/agents don't have the necessary skills to actually identify skilled Software Developers compared to useless Software Developers that have a few matching buzzwords on their resume, because they themselves don't understand enough to ask the right questions so resort to resume-keyword matching.
The consequence is that the whole notion of Software Development being a skilled profession is being undermined and devalued. This is allowing a vast amount of people to be employed as Software Developers that don't have the natural ability and/or proper training to do the job.
To those people, parallel programming IS hard. To anyone with some natural ability, the proper understanding of the issues (you get from say a BS Computer Science degree) and a naturally rigorus approach, no it really isn't.
This is OK as a way of defining semantics, but not for execution in a truly parallel environment. In essence, the semantic model says that running computation A in parallel with computation B is the same as executing "A" then "B" or "B" then "A". Notice that the parallelism has been removed to make the semantics easier to reason about.
The thing to consider also is that the number of permutations is N! where N is the number of tasks you want to run in parallel. Thus even as few as 8 tasks means you have more than 40 thousand cases you need to be sure are actually independent.
To a certain extent, that's a matter of interpretation and/or abstraction. Simultaneity of events is a tricky concept: it's always possible that if you look closely enough, you'll find that two things which appear "simultaneous" really aren't, but are actually interleaved. That's effectively the stance that calculi like CSP and the pi-calculus take: as you rightly point out, they make a fundamental assumption that events are "instantaneous" and indivisible, and simply cannot occur "simultaneously". However, all is not lost, since if you need to represent something that has any real duration, you can model it as a pair of events ("start" and "finish", or something similar). As for whether the interleaving model makes an acceptable basis for execution in a "truly" parallel environment, I guess it depends on what you mean by "truly" parallel. The success of the occam programming language running on networks of transputers would seem to provide a proof-by-example that the interleaving model can work in practice. Erlang provides a similar, more modern proof-by-example. But I suppose those may not count as "truly" parallel, in that they do enforce some serialization of events at the interfaces of individual processes, even if most of the system is operating "in parallel".
Having said all of that, there's been some work done on providing a "true concurrency" semantics for CSP (not sure about the pi-calculus). See, for example, recent work in that area by Adrian Lawrence or Marc Smith. There has also been a bunch of work on other "true concurrency" process algebras. So to claim that "They all use an interleaved concurrency model..." is incorrect. rather, the interleaved concurrency model has so far proved adequate for the design problems to which the calculi in question have been applied (see my comments above about occam and the transputer).
I can only really see that being an issue if A and B can both affect the same piece of state simultaneously (willing to be corrected on that count though - I just can't envision any other situation where it'd be an issue). Languages built on models like CSP enforce the interleaving of A and B if they are affecting the same state, as a byproduct of the share-nothing approach to state encapsulation. Note that that doesn't prevent A and B from occurring at the "same time" if they're not affecting the same piece of state.Both Petri nets and Winskel's event structures certainly have much to recommend them. However, in both cases, my impression is that they are better suited to use as tools for providing a semantics (much as traces provide a semantics to CSP, or labelled transition systems to the pi-calculus), rather than as something that one would actually want to use as the basis for developing a design or a programming language. Perhaps an event-structure-based semantics of something like CSP would be useful (I gather Winskel did some work on that kind of thing back in the early 80's, but I'm not sure if he's done much with it since - my quick skim of his work on HOPLA wasn't enough to determine if it provides "true" concurrency or not).I've been programming using parallel techniques for more than 25 years using APL.
I was so imnpressed with APL that I implemented an APL like derivative language
called Simmunity which is a hybrid compiler for an APL like syntax. Simmunity
can be targeted at a parallel processor implemented in FPGA's that provides
multiple simultaneously executing vector/matrix operations in parallel...
(stay tuned for more info)
The quintessential parallel database programming language available is clearly
KxSystems Q and KDB+
See the faq at http://kx.com/faq/ and tutorial at http://kx.com/q/d/primer.htm
Ken Iverson invented APL and then the J language which anyone interested in
really discussing parallel programming should look at closely. The Connection
Machine LISP had many of the APL operators/functions and certainly Map/Reduce
that it added to provide a convienent parallel expression langauge
Most programmers think in terms of sequential code execution with threads and
processes. APL incourages the programmer to conceive of programming using
vector and matrix operations to process strings and numbers and manipulate
data like a spreadsheet. An application that people might be familiar with
is Lotus Improv which provided naturally parallel expressions based on a subset
of APL operations.
Cheers, Simbuddha
Multiple smallish tasks or events that communicate mostly via a RDBMS is a common technique (AKA "client/server"). Transactions and A.C.I.D. take care of a lot of the nitty gritty of multiple communicating processes. Even different prog languages. Other domains can learn from this.
Table-ized A.I.
And yet have no concept of what an address bus is (or any other fundamental uP concept)?? Honestly, how do you expect to wrap your head around Parallel Programming if you don't have even the slightest idea of what's going on at the hardware level? Sorry for the troll tone, but the illusions need to be shattered.
It needs project management skill rather than programming skill for parallel programming!
But after all, who need parallel programming? Why we need to take the trouble adding non-productive communication complexity? Unless for number crunching or something like that, what we need are multi-threading though it is not all programmers can handle well yet.
I've been doing it since the early OS/2 days. The victory of Windows over OS/2 cemented a lot of bad practices firmly into place IMO. New "technology" was whatever the MS buzzword of the day was. None of it ever really examined the benefits of parallel programming and multi-threading. Another reason it's not particularly widespread is because it doesn't lend itself well to seat-of-the-pants on-the-fly coding. The old adage that "the documentation is the code" doesn't cut it when you absolutely have to buckle down and do some design work at the outset to partition the problem and decide how the different threads and/or processes are going to interact with each other. Things like avoiding deadlocks through synchronization mechanisms and such simply have to be documented, and nobody likes to write documentation.
2. Parallel code is a monster to write. I'm not talking simple scatter-gather data spreaders. Imaging Adobe photoshop running across a 400 machine cluster, handling hundreds of users at a time. The data concurency issues, data residence, locking, message handling, message reordering Total bloody nightmare...., If youve parallelized a markov model it doesn't really compare.
3. The tools arent adequate. tracing a data race, or deadlock in a cluster is a beast. MPI and PVM are nice but are really narrow in the scope that they handle problems.
4. It isnt just non-programmers.. Parallel is a whole different scale of complexity... Almost everything I see is a "parallelize the brute zones of a specialty engine once it works in serial"..... Its an important baby step and it really blows non-programmers for a loop. But we are a far sight from having an implicitly parallel version of MS-word.
5. Parallel isnt new, dual cpu boxes have been in userspace since the late 90's, it has been mostly ignored by applications. The use of network resources is horribly behind the times. The ability to aggregate resources on the fly is a total joke compared to where it should be.
Storm
Look writing multi-threaded/multi-processor applications isn't hard to do.
The tools to test and debug them are easy and available in any version of My Eclipse.
However functionally speaking parallel programming isn't completely unnecessary. The thing is its pointless to write a multi-threaded application if the bottleneck is IO, or memory. This is because multi threaded applications take the load off the cpu nothing else. And unless your doing some pretty heavy math the cpu is very rarely the bottle neck now days.
If you are doing cpu intensive functions then that code needs to be divided into independent chunks which can run separately without blocking (because if they block waiting for each other you might as well just write them in a single thread). This is less likely than you'd think because using this code is for a process, that is something will need to use this.
Unless your talking about games or server side applications this is very rarely needed.
More importantly software developers should only ever use hardware like this if it has a benefit, not so intel can justify their latest wow factor toy.
This seems a case of the hardware providing things that are not needed and then complaining that they are not used more than anything else.
Multi-threading has an overhead cost to it. If you use it too much the application can actually run slower than if every thing is in just one thread. Therefore you need a good reason to use multi-threaded code, like performance benefits for example.
Believe me if and when it solves a real performance bottleneck or problem there will be no problem getting developers to write code that fully uses multi-core CPUs until that point they wont. Developers love the latest flashy toy as much as any other geek, but they like stable, secure reliable and efficient applications more.
I know many physicists, not professional programmers, who write code (using MPI) that scales well to dozens -- sometimes hundreds -- of processors. They are intelligent people, but programming is not their main job or what they were trained in. Also, their code is not user-friendly, but that's a cosmetic matter that can easily be fixed if needed. If they can do it, why can't professionals?
Tom Leonard, a programmer from Valve, gave a fascinating talk about this at GDC this year, about retrofitting multicore support into Half-Life 2 (specifically, into the Source Engine, which powers Half-Life 2). Not surprisingly, this talk was named "Dragged Kicking and Screaming" ...
There was a lot of really good wisdom in there, whether you are writing a game or something else that needs to get every possible performance boost.
I'm sure they probably drew from 20+ years worth of whitepapers (and some newer ones about "lock-free" mutexes, see chapter 1.1 of "Game Programming Gems 6"), but what I walked away from the talk with was the question: "why the hell didn't _i_ think of that?"
There were several techniques they used that, once you built a framework to support it, made parallelizing tasks dirt simple. A lot of it involves putting specific jobs onto queues and letting worker threads pick them up when they are idle, and being able to assign specific jobs to specific cores to protect your investment in CPU cache.
Most of the rest of the work is building things that don't need a result immediately, and trying to build things that can be processed without having to compete for various pieces of state...sometimes easier said than done, sure. But after hearing his talk, I was of the opinion that while parallelism is always more complex than single-threaded code, doing this well is something most developers aren't even _thinking_ about yet. In most cases, we're not even at the point where we can talk about _languages_ and _tools_, since we aren't even using the ones we have well.
--ryan.
Don't say, "don't quote me," because if no one quotes you, you probably haven't said a thing worth saying.
We've had a steady stream of multi-core guests here at Stanford lecturing on the horror and panic in the industry about multi-core, and how the world is ending. I've seen Intel guys practically shit themselves on stage, they are so terrified we can't find them a new killer multi-core app in time. That's BS. OK so it's a hour lecture, maybe two if you're not that fast. Parallel/SMP is not that hard, you just have to be careful and follow one rule, and it's not even a hard rule. There are even software tools to check you followed the rule!
But that's not the problem...
The problem is, a multi-year old desktop PC is still doing IM, email, web surfing, Excel, facebook/myspace, and even video playing fast enough a new one won't "feel" any faster once you load up all the software, not one bit. For everyone but the hardcore momma's basement dwelling gamers, the PC on your home/work desk is already fast enough. All the killer apps are now considered low-CPU, bandwidth is the problem.
Now sure, I use 8-core systems at the lab, and sit on top of the 250k-node Folding@home so it sounds like I've lost my mind, but you know what, us super/distributed computing science geeks are still 0% of the computer market if you round. (and we'll always need every last TFLOP of computation on earth, and still need more)
That's it. Simple. Sweet. And a REAL crisis - but only to the bottom line. The market has nowhere to go but lower wattage lower cost systems which means lower profits. Ouch.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
It's not that hard to program shared-memory multiprocessors. Most of the problems we have doing that stem from the fact that C and C++ provide essentially no support for doing so; they shove the problem off to the operating system. The fundamental problem there is that the language has no idea which locks lock what. Compare Ada, Java, etc.
Non-shared memory machines, though, are tough. If each CPU has plenty of memory, as in a (I hate to say it) Beowulf cluster, it's not too bad. But collections of small memory CPUs are a pain to program. Until recently, this was a headache only for people trying to use one of the weirder supercomputers like an NCube or a Connection Machine. Then came the Cell, and the PS3. With 256K per processor, you have to organize your program like an assembly line, where work units are pumped through the processor in sequence. For some applications this fits. For some it doesn't. And for some it adds a year to development time as the application logic is redesigned to fit this painful hardware. (Meanwhile, the XBox 360 programmers, who have a conventional 3-CPU shared memory multiprocessor, ship first.)
Actually, the real progress in massive parallelism is in graphic cards. The operations needed are well understood, so the hardware can be matched to the job.
he'll use his magnet to escape
It's hard because very few things we do are inherently parallelizable.
Yes, you can use functional programming and friends, and that often gives you a solution which is inherently 10 times slower than a straightforward algorithmic solution which you can then distribute across several processors.
In a few cases it's a big gain, but in most it's simply wasting resources trying to come up with parallel solutions to non-parallel problems.
It's often hard because it's stupid, not because it's actually difficult to do.
Peter
then the application(s).
As I ejected from my cube at Intel R+D in '04 there was already early talk about 4 cores - 8 cores - and something like 16 - 32 - or 64 mini-cores by '08 - '09.
And really - do we need to have MS Office or Word parallel-threaded - or email ? Certainly media-streams should be threaded - games - communication / network streams. But I think a big part of this is the OS facilitating both threaded and non-threaded apps INTELLIGENTLY and EFFICIENTLY into those cores. And from what I took out of discussions - some of those mini-cores will be locked down either running or doing security checks etc. - others performing other low bandwidth or asynchronous tasks.
And not to forget that 2 or more simultaneous OS sessions may be running on those cores --- the classic situation being a locked down OS running the in house IT package of productivity tools so users can't hose up the installed image or let loose a threat ---- the second OS runs the users personal space which they can corrupt and infect at will without affecting the network.
Its not the years, its the mileage
could be to add a especifically parallel iterator keyword to programming languages, ie:
for-every a in list; do something(a); done;
The compiler then assumes that something(a) can be safely executed in parallel for every element in the list.Is not rocket science, it could lead to parallel spagheti, but is a straightforward method to promote parallel programming.
What's in a sig?
Martin
Most tasks, other than system-wide tasks, are inherently serial in nature. It's the way we think. How many parallel flow charts have you ever seen? Probably very few, if any. Think of all the major computer applications. How many of them deal with anything in other than a serial manner? Probably very few, if any. Distributed tasks are generally the same algorithm being run on multiple processing units, each being basically a serial process. Add to this the fact that nearly all of us have been programming in serial mode all of our programming lives, then hell yes, parallel programming is just too difficult. It's basically unnatural. We only have one brain.
Heard any good sigs lately?
Using MPI or PVM can be hard, but this is not the case if you use better tools like Star-P. ( www.interactivesupercomputing.com )
The language isn't called LabView, it's called "G". LabView is the whole package (G interpreter/compiler, drivers, libraries, etc).
And, yeah, G is pretty cool. National Instruments offers a slightly dated, but otherwise completely functional version of LabView for free for noncommercial use.
Our cognitive system does many things at the same time, yes. That doesn't answer the question that's being posed here: whether explicit, conscious reasoning about parallel processing is hard for people.
Are you adequate?
I don't understand whats so hard about parallel programming?
.. They _are_ in C, C++ or Java.. Never in Erlang or Lisp.
I cant see that anyone calling themself a real programmer would have any problems with parallel programming.
It's about the same in most languages, you need to have mutex and threads. Not alot more too it.
I have reed here that people bashes all over, saying that programmers are lazy not to learn Erlang
and that every language out there are bad... my question to you: What language would you use, the perfect one or the one that get the job done?
Today theres basicly two languanges that you can use, Java and C derived (C, C++, D.. etc).
Why you might ask yourself? well.. do you know in what languages thirdparty api's comes in?
Learn what a deadlock, mutex and threads are.. thats all you need to know, then pick your language of your choice that gets your job done.
One more tip, dont multithread applications that dont benefit from it.
I've worked on a high performance multiprocessor system.
There are a few principles and tools that make much of it manageable, and then there's some parts that are inherently nearly impossible.
A few principles:
1. Task switches are very very time consuming, at least under preemptive multitasking, and especially when that preemption (as it does in Windows) involves process switches. Since mutexes rely on task switches, they're very expensive, although exact implementation matters too.
2. It's possible to come up with non-blocking algorithms that require no mutexes (though you do need low level access to interlocked instructions, or barriers). Unfortunately that's where it gets nearly impossible because proving that a non-blocking algorithm is correct requires a combinatorial analysis. I could go into the some of the principles of non-blocking algorithms, but they're still very hard even when you know what you're doing. Still it's useful to understand some basic interlocked instructions because people often use mutexes when a simple interlocked instruction would do the same task infinitely quicker.
3. Having a few basic non-blocking queue algorithms (LIFO and FIFO) up your sleeve can suffice to allow rather separate tasks to communicate cheaply. There are even nonblocking queues that allow an unlimited number of processors to compete for each end of the queue simultaneously, and so you can have common banks of work to be balanced between processors, for instance.
4. It's often a win to give very separate tasks to each processor, and have a single process per processor. Put all IO on one processor, for instance and have it communicate with the other processors through non-blocking queues.
5. Even though fair preemptive multitasking is expensive (because of the task switches), in special cases you can use cooperative multitasking (fibers, green threading) and save the time. One advantage to cooperative multitasking is that you can control exactly when a fiber wakes up. In some benchmarks I've seen (producer-consumer), green threading outraces preemptive multitasking by over an order of magnitude.
6. If you can all tie all interrupts to a single processor (which is possible in Windows) and lock specific processes to each processor, then you can (almost) guarantee that certain processors won't interrupted, and thus those processors are the ones that are cheapest to use for any task that the other processors will have to wait for.
7. Aggregate. There are some tasks that can not be accomplished in parallel with the usual work. It's useful to save up a bunch of work like that so that you don't have to shut down the other tasks too often.
8. State machines with atomic arcs. This gets closer to the harder sort of algorithms, but its useful to talk about what's easy. If you need a bunch of processors to cooperate, it's useful to have a state machine description of the global state. If the state can be stored in a single double word, then transitions between states can be accomplished atomically with a CAS instruction (compare and swap), and then it doesn't matter which processor changes the state. They may all try, and one will succeed. If one of the states is too complicated to fit in a double word, then you can either have a spin-lock intermediate state or in some cases, the state double word can include a pointer (but this gets complicated).
9. Cache delays. Multiprocessor systems have to keep the caches coherent. Because of this situations like a CPU reading a value that's dirty in another processor's cache can bring all of the processors to a halt while they transfer data. I'm a few years out of date on what the newest processors do, but it's probably still true, and still true that these pauses are much longer on systems like the Sparc that allow hundreds of processors than it is in x86 machines that are more limited in the number of processors. In any case it's a win to keep data memory use separate between processors when possible.
In the "experimental view" slashdot cuts off the last two entries:
10. Databases (locking, rolling back etc). For those cases where you don't need speed, database technology solves most of the problems of multiple entities keeping the data constant more flexibly and easily than anything other than stopping the world. It could be useful to build some database technology into your product, even if only for inter-process communication.
11. The usual semaphores and mutexes. By the way there's a "many readers but only one writer" mutex pattern that's so useful that I wonder why it isn't more common. In any case you should all know to avoid mutual blocking etc etc...
I'm afraid that I may not be following you here. Just to clarify: are you referring to the fact that CSP includes a nondeterministic choice operator, which occam omits (and further restricts its equivalent of the CSP external choice operator such that it's not possible to ALT across output events)? Or is there some other aspect of the "implicit sychronization" that I'm missing?
Please note that I'm not disagreeing with your point. You are quite correct that languages which implement concurrency models based on process calculi (such as occam or Nomadic Pict) do make judicious choices about which constructs make sense froma practical perspective, and which are better suited to theoretical use. However, your original complaint was about interleaving semantics in general as a basis for execution models, which is a somewhat broader issue than the choice of which constructs within an interleaving model to use.
We are in agreement there. I think that asynchronous messaging is a better default choice for distributed programming. Of course, there are process calculi based on asynchronous messaging...Real engineers designing hardware use languages like verilog /VHDL which allow description of concurrent and independent processes. Programmers can take a lok at "SystemC" which provides the same functionality in C++. Using one of the above mentioned languages takes care of the programming part. what is lacking is a cheap compiler which will translate it to code to run on a multiprocessor system. The existing simulators which are way too costly.
When every of us have 1, 2 or 4 core/CPUs doing all the tasks that need to be done in the computer from the kernel to the desktop environment to the virus scanner to the servers to the BT clients to the video player to the real application, who need parallel programming? Give every of us 1024 core/CPUs, and every CPU-bound program will be parallel. But that's not a day that everybody see as about to come.
Most of the people in this discussion are talking about multithreading programs as a means of parallelization. While multithreading is one way, it's applicability in parallel processing is restricted to SMP (Shared Memory) systems only and thus is not very scalable by problem size. Far more general and powerful is Message Passing (http://en.wikipedia.org/wiki/Message_passing) which can be implemented in distributed grids and is very scalable. Multithreaded programs can be scaled upwards to distributed grids only by hybridizing it with MPI codes.
l'Homme n'est Rien l'Oeuvre Tout: Gustave Flaubert to George Sand
You're thinking of how to speed up programs by using parallel execution units. This is closely related, but not exactly the same thing, as the use of concurrent programming techniques to write better organized programs capable of responding flexibly to events from multiple sources while also performing their own "background" computations. Think of a UI program that must simultaneously react to user interaction events, while also performing some computation that's mostly independent of user input. A word processor that does grammar checking in the background while the user interacts with the UI in other ways is a prime example.
The thing is that with mainstream languages, writing programs like this is probably much harder than it needs to be, for several reasons:
Here's one way of solving this problem: (a) lightweight threading mechanisms for concurrency (in addition to the heavyweight ones for parallelism), (b) no shared mutable memory between threads (any shared memory must be immutable), and (c) simpler thread synchronization primitives that do not rely on shared mutable memory, like communication channels between threads.
Are you adequate?
Actually, it's more like pipelined. The fact that your eyes already moved to the next letter, just says that the old one is still going through the pipeline. Yeah, there'll be some bayesian prediction and pre-fetching involved, but it's nowhere near consciously doing things in parallel.
Try reading two different texts side by side, at the same time, and it won't work that neatly parallel any more.
Heck, there were some recent articles about why most Powerpoint presentations are a disaster: in a nutshell, because your brain isn't that parallel, or doesn't have the bandwidth for it. If you try to read _and_ hear someone saying something (slightly) different at the same time, you just get overloaded and do neither well. The result is those time-wasting meetings where everyone goes fuzzy-brained and forgets everything as soon as the presentation flipped to the next chart.
To get back to the pipeline idea, the brain seems to be quite the pipelined design. Starting from say, the eyes, you just don't have the bandwidth to consciously process the raw stream of pixels. There are several stages of buffering, filtering out the irrelevant bits (e.g., if you focus on the blonde in the car, you won't even notice the pink gorilla jumping up and down in the background), "tokenizing" it, matching and cross-referencing it, etc, and your conscious levels work on the pre-processed executive summary.
We already know, for example, that the shortest term buffer can store about 8 seconds worth of raw data in transit. And that after about 8 seconds it will discard that data, whether it's been used or not. (Try closing your eyes while moving around a room, and for about 8 seconds you're still good. After that, you no longer know where you are and what the room looks like.)
There's a lot of stuff done in parallel at each stage, yes, but the overall process is really just a serial pipeline.
At any rate, yeah, your eyes may already be up to 8 seconds ahead of what your brain currently processes. It doesn't mean you're that much of a lean, mean, parallel-processing machine, it just means that some data is buffered in transit.
Even time-slicing won't really work that well, because of that (potential) latency and the finite buffers. If you want to suddenly focus on another bit of the picture, or switch context to think of something else, you'll basically lose some data in the process. Your pipeline still has the old data in it, and it's going right to the bit bucket. That or both streams get thrashed because there's simply not enough processing power and bandwidth for both to go through the pipeline at the same time.
Again, you only need to look at the fuzzy-brain effect of bad Powerpoint presentations to see just that in practice. Forced to try to process two streams at the same time (speech and text), people just make a hash of both.
A polar bear is a cartesian bear after a coordinate transform.
Just start with that -- a easy-to-use (whatever that means in the parallel programming case) and full-featured debugging tool. I think a lot will follow from that, especially if one can get to the bottom of what 'easy-to-use' and 'full-featured' means in the parallel programming case. And frankly, I believe it's Intel's and Microsoft's responsibility to step up on this.
Crap f**king weak arse damn coders (probably bloody java weenies) need to shut the hell up and remember that SMP systems have been around for OVER FORTY YEARS!! If you have not worked out SMP, sockets, or pointers yet, then either learn or GIVE UP. Go into marketing or something and stop bothering real programmers.
If you think threads are too hard, or you can't debug parallel code, then go to work on the tools and actually help the issue.
damn..
Try replacing $FOO with various words: "parallel," "large-scale," "object-oriented," "cross-platform." No matter what you plug in there, the spiel sounds familiar. Now just take $FOO out altogether:
Yep, we hear that all the time too. These are meaningless words. I won't say they're pointless, since dissatisfaction drives progress, but they don't mean a damn thing at all.
I think that would have been pretty much the case no matter the subject.
"a bunch of bullshit like peter piper parallely picked a pied of peppers"
:-)
so that we can avoid reading random crap that may not interest us
I've seen it over and over in the industry, there is a distinct lack of parallel programming skill and knowledge, and even senior developers with years under their belt struggle with the fundamentals.
Parallel programming does add a layer of complexity and its inherent lack of general solutions does make abstracting its complexity away difficult, but I suspect that the biggest issue is the lack of training of the work force. It isn't something you can pick up easily without a steep learning curve with many hard lessons in it, definitely not something that can be incorporated as a new thing to be learnt on the fly with deadlines looming.
Another aspect is that its fundamental to the design, parallelism can and often will dictate the design and if the software architects are not actively designing for it or are not experienced enough to ensure that it remains a viable future option, future attempts to parallelise can be difficult at best.
Ultimately the key issues are:
* Lack of skill and training in the work force (including the understanding that there are no general solutions)
* Lack of mature development tool to easy the development process (debugging tools especially)
GPLv2: I want my rights, I want my phone call! DRM: What use is a phone call, if you are unable to speak?
Always these dumb headlines with "Is XXX too hard?" where XXX is a moderately difficult, but well-explored topic.
Can't stand it, please stop it!
Looking at the proposals for switching languages, they just shift the problem around (as another poster has already mentioned). The only thing new that might be useful is the idea of "Guards" from Haskell (for why, see below). But you don't need a new programming language to implement that, with a little imagination you can build it into a C++ library.
In my experience, 99% of the problems are due to two problems: overlooked shared resources and deadlocks. An example of the first problem might be when two threads are pushing/popping from the same queue, causing duplicates or skips. An example of the second would be thread A has resource X and is waiting for resource Y, while thread B has resource Y and is waiting for resource X. Guards could possibly make it easier to avoid some forms of deadlock by making it easier to acquire multiple resources simultaneously with less risk of deadlock. I think I have a new library to go code now... ;-)
I, myself am a programmer and I really thing the current tools are subpar.
// "standard" procedure // runs procedure1() in a new thread that is destroyed when over. immediately returns
For example, I have never understood why a - let's say - "parallel" semantic hadn't been introduced.
e.g:
void parallel procedure1() { do somethin; }
void procedure2()
{
procedure1();
do something else...
}
My friend has a paper in PLDI 2007 that is starting to address the issues that have come up in this thread, but for 'backwards' languages like C and Java. :) The abstractions are definitely part of the problem ... reasoning in terms of mutexes and barriers and so on is a big pain. Search for "Optimistic Parallelism Requires Abstractions" ... the idea is to define commutativity conditions on the methods of shared data structures, and to execute code speculatively until the commutativity conditions indicate conflicts, which trigger aborts in the system. Ideally, all a programmer would have to do is identify the blocks of code he wants to execute in parallel (with constructs like for-every) and specify some conditions on all the shared data structures he is using, and the system would take care of everything else.
The problem with programming is that we are forced to use programming languages. Parallel programming is what programming should have been in the first place. We find it difficult because the tools are inherently sequential. Why? Because they are based on languages (usually English) and languages are inherently sequential. Just the fact that most programmers have to learn a computer language based on English (why not Chinese or French?) should be a clue that the linguistic approach is fundamentally flawed. The way a program works has nothing to do with what language you understand.
What is needed is a new software paradigm that uses parallelism to start with. It should be a system based on elementary communicating objects. And the best way to depict parallel objects and signal pathways is to do it graphically. I say that the entrire computer industry have been doing it wrong from the beginning (since Lady Ada Lovelace). It is time to move away from the algorithmic model and adopt a non-algorithmic, synchronous software model. Even our processors should be redesigned for the new model. Only then will find that parallel programming is not onl y easy but the only way to do it.
because it can make you jump to conclusions regarding how to conceptualize computation. You can end up all too easily with objects and interactions between these objects that are hard to parallelize. With Antiobjects we try to move from the computational foreground into the background. This way one can implement, for instance, game AI, by mapping computation onto dozens, hundreds or thousands of CPU/cores with very little overhead.
I am currently working on a game engine for an unreleased MMORPG. The engine is parallel on a massive scale with the ability to use up to an infinite number of threads. Parallel programming is especially effective when dealing with multiple NPCs and their schedules/AI routines, as well as zone management for PCs. The server spawns threads as needed to deal with loads based on number of PCs/Mobs in X area. The client uses threading for graphics (DirectX), Sound, Input, music, and resource management (loading data from disk). It's really hard to further break anything down, however in a single player game things could be broken down much more. Imagine each NPC having his/her own thread to perform advanced AI functions, etc. taking full advantage of the multiple CPUs. Out MMO server does that to a lesser extent (though when dealing with thousands of Mobs, the AI can't be too advanced.) It is not hard to write parallel code. The hard part is finding programmers experienced enough to even know what threading is. You'll usually end up paying 6 figures a year for an experienced C++ dev.
But multithreading already helps... It is really pathetic to launch, say, ffmpeg (video encoding) or oggenc on a modern Linux system (like my Core 2 Duo from yesteryear) and see the system poorly using only one core. This is where poorly written programs show their limits. Last time I checked some individual already had 8-core Macs. A poorly written program will then use 1/8th of the CPU's power. Neat no!?
Now enter the Java world, where all programs are multi-threaded, wether you like it or not. Even the simplest Java programs will have several threads. Like, say, the Garbage Collector thread running in the background. So even the dumbest Java program benefits from multi-threading on multi-core systems. Last time I checked my wonderful IDE (IntelliJ IDEA, arguably the best IDE ever written in any language) was using the two cores perfectly fine. That's one benefit of using a language where multi-threading was not an afterthought.
Then it's not because you're adding threads that suddenly you're using the most effective kind of parallel processing... Java's threads can't match erlang, for example. While Tomcat 6 (written in Java) and Apache give mostly (I wrote "mostly") the same perfs, an http server written in erlang will smoke these web servers... By an order of magnitude (Google is your friend).
I'm writing correctly multi-threaded Java programs since years and, unsurprisingly, the multiple cores of recent CPUs are correctly used when running my programs. And I certainly won't say that it's easy to write correctly multi-threaded programs in Java: it is not.
The problem is that most VB 'coders' don't know what the word semaphore means.
This comes up as a topic several times a year, and the reason people keep asking is because there have been few titles that are multi-threaded being released. It all comes down to the question asked.
So, is it really all that difficult to write a multi-threaded application? It depends on if you are the type of person who just sits down and hacks the code together, or if you design first, and then code. In order to write a multi-threaded application, you need to decide at the start to make the program multi-threaded. Can you split the program up into pieces that can run independently, while still communicating between the pieces?
Since this is a design decision that needs to be made VERY early in development, it seems that most of the big development houses out there just can't do it. Too many programs take too much code from previous applications, and that makes it overly difficult to fit into a multi-threaded design. These companies just don't seem to be willing to write anything from scratch at this point. They use engines from previous titles, or that they purchase that were never multi-threaded, and so, you don't see much in the way of change.
Look at Electronic Arts, they have sports games they release year after year that really don't change, EVER. They add new data, but the game fundamentally doesn't get updated from year to year. The fact that they can get away with releasing the same thing year after year and make tons of money has made them lazy, and they just don't seem to be willing to devote serious effort into new code.
It takes a developer around 5 years to develop a game fully from scratch if they need to come up with the engine themselves. That involves engine development, storyline(for a game that has a storyline), voice acting, etc. So, if a title takes 3 years, then the engine is probably taken from another project, and is probably single-threaded.
Now, I have my hopes that Dragon Age(Bioware has it under development right now) will be multi-threaded by design, and it will be very well designed with multi-threading in mind. Being able to take advantage of more than 4 or more cores would be a good thing.
Between 1987 and 1990 I led a team in Digital which built a global currency management system running across 40 machines worldwide acting as a single system, and all operating in parallel, without stopping for synchronisation. We built a global 'synch wave' which allowed the business to be 'closed' without interruption. The near infinitely scaleable techniques we developed were unfortunately never picked up on and exploited, however, it was not particualarly difficult, once we had built our underlying (I only care about my neighbouring process) architecture rather than the central control processor architecture. As for the 'Moores Law for Programming' unfortunately there is not the industry incentive to seek such humungous gains, as the revenues of the business software businesses are predicated upon man hours, effectively 'cost plus' economic models. all the best Jerry
Imagine taking your CPU and breaking it into sub-units and then having the user reconnect them with other hardware. Sounds like a programming nightmare. That is exactly the state of parallel computing with clusters and with multi-core (Intel, AMD, and IBM, Sun multi-core are different enough to make a single programming approach "hard")
The problem "depends" on what you are trying to do. Parallel rendering -- simple. Proteins in water - hard (but much of this work has been done fortunately).
Parallel programming is faced with the problem that there is no guarantee that a parallel program will be BOTH portable and efficient for a given architecture. (largely due to what I mention above)
Over my 15+ years of working with parallel programming I have had some ideas. I think the expectation levels need to be lowered i.e. 3 weeks with a functional language equals 2.7 speedup, 2 months with threads or OpenMP or MPI equals 3.5 speedup on 4 cores. Of course the functional language is not quite ready yet!
HPC for Primates. Read Cluster Monkey
Many of the major issues concerning parallel programming can be reduced in complexity and impact by using the correct programming methodologies to map out which tasks can be 'parallelized' (hate that word!) and what dependencies they do or do not reply upon, but the best way forward is to encourage a mind-set change in programming by leaving behind the rigid adherence to a synchronous processing platform that derives all timing from a single clocking source and to embrace an asychronous working environment. Only then can the constraints that come with thinking about linear time dependencies be elminated. Asynchronous biological/optical processors, anyone!
AT&ROFLMAO
No. They are some of the brightest students of my country. Moreover, assignments used to be very very tough.
Spam: Any activity on internet to gain popularity without paying to advertising companies like Google.
The problem is that our existing parallel computational models and languages are derived from the sequential ones. This imposes extra complexity in the expression of parallel algorithms. e.g. Procedure, a very fundamental concept in the sequential world, turns out to be hazardous in the parallel world because it multiplexes code. When code is multiplexed, you have all the locking problems and the headache that comes with it. Using a pattern matching mechanism in place of procedural abstraction for expression of parallel algorithm has moderate success (e.g. see Tuple Space) in the past. However, once it is being used with a sequential language, the procedural barrier kicks in and multiplexing returns.
In summary, when we start to think parallel algorithms are scaled up version of the sequential ones, we already build a lot of artificial barriers in our way to think about parallel algorithms. It is probably not right neither, to think sequential algorithms are special cases of the parallel ones. The reason is that the overall complexity involved in sequential algorithms are much less than the complexity involved in the parallel ones (sequential algorithms have less things to get right than the parallel ones) and there is little reason to tweak the current sequential models at this point in time.
My modest proposal to the problem of parallel algorithms is to begin in the middle: initially we simply treat the two separate beasts (parallel and sequential) as individuals and find a minimal interface that would handling the complexity in both worlds reasonably well. Then expanding on this interface to cover as many ground in both direction simultaneously. Repeat and rinse until we find a good enough solution that could be implemented and optimized on existing hardware architecture (you want performance from parallel algorithms that compute correctly, right?).
My hypothesis is that we may never find a good enough interface that covers all of parallel and sequential worlds. However, the proposed process would let us adapt computational models as our needs have changed.
Actually, that's probably untrue, in the way you seem to mean it. Of course there's always a limit, but I think that things like wordprocessing and spreadsheet design could be HIGHLY parallelized.
The problem at the moment is that only one person does the job, and they do it sequentially. It could be done, however, so that many users each write a paragraph, or proof-read a paragraph, or do the layout, etc. This is what's been done for a long time, in more writing-focused commercial ventures, such as publishing: a different workflow allowing parallelization, and many specialised workers, performing the different tasks.
This could be applied to parrallel programming, too -- especially as we're moving more and more towards automated spell-checking, grammar checking, stylesheet application, index generation, typography rendering, etc.
Equally, in spreadsheets, there's no need for one cell to be designed with another cell or spreadsheet already in place. If you know the formula you're entering, someone else could easily complete the other cells later. Workflow is a bottleneck here, but that's arguably because the tools we've used until now haven't provided the opportunity for a different, better workflow.
Beyond that, there are still PLENTY of places where the UI itself could be parallelised: updating thumbnails, loading images, re-rendering at a new DPI, or rendering a different part of the page, drawing widgets on the screen (already partly parallelized by GPUs).
A move to a more event-oriented programming methodology would probably help a LOT of this stuff.
Programs that benefit from parallel programming more than likely are already parallelized. (Photoshop, Maya, Chess programs, Games, etc) What's the point in parallelizing Notepad? My point not everything needs to be developed like this, but anything CPU intensive can benefit and most of the time does.
The problem faced when developing multi-threaded applications has two fundamental parts: safety and liveness.
Safety is ensuring one thread does not clobber some other thread, while liveness is ensuring the threads do not cause each other to lock.
The classic example is Dijksta's dining philosophers problem. In this problem you have a set of five philosophers seated around a circular table. Between each pair of philosophers is a single fork, but each philosopher needs both the fork to his right and the fork to his left to eat. Being philosophers, each can either eat or think. To facilitate safety, one must ensure that when a philosopher wants to eat, he cannot use a fork that is already in use. To facilitate liveness, one must ensure that if a philosopher wants to eat, and it is safe, then he/she can eat.
Clearly, one cannot compromise safety, as this will cause fundamental problems with the software. A programmer must deal with liveness in at least a minimal manner to prevent deadlock, that is, the case wherein every philosopher, in order around the table, picks up his right fork and sits waiting for the left to become available - which will never happen, and execution stops. One solution prevents this by numbering the forks, ensuring that each philosopher picks up the lower numbered fork first, and as such, in this case, the final philosopher would attempt to pick up the lower number fork, already held by the first, and not do so as it is locked, thus allowing the fourth philosopher a chance to eat, though in this case only one philosopher can eat at a time. A better solution involves having each philosopher pick up the even numbered fork, ensuring the maximum number of philosophers that want to eat, can eat.
As you can see, there are two major issues to deal with, and only one offers any flexibility. In choosing how to implement liveness, that is, how to dealing conflicts in locking, we find added complication that must be overcome to maximize efficiency. In reality, this is a question of finding your critical sections of code and using fine locking, wherein you hold only the mutex of the data structure you are modifying. This is significantly more difficult that holding the mutex on the entire thing. To draw a correlation, the first solution to liveness in the dining philosophers problem can be achieved with a single mutex on the table. The second solution requires one mutex per fork (or, for another way to think about it, one mutex per philosopher.)
Why should it be hard?
I mean, with UNIX, it's easy to write complex applications thanks to the "each program is a filter" paradigm and with pipes.
Each program sends it's output to the next, and so on.
So why not do the same with processes?
A program would define serveral separate processes (functions), which are then parallelized through pipes; then, the various processes (functions) would simply be executed through the OS what would take care of the scheduling.
I don't think that OOP is the problem, rather that most OOP languages are jut syntactic sugar on top of the good old 'sequence, selection and iteration' models. An Actor model however suits ||ism quite nicely though. An OOP language (such as Smalltalk or even Java) could be used for quite nice || code in a Actors with Consumers and Producers IF a good memory manager and Garbage Collector was developed. I base my opinion on playing with Visual Works on IBM's SP2 architecture about 12 years ago.
I work for a company that makes microprocessors. Obviously, I can't give you specifics, but based on our future plans, I can tell you that parallel programming is only going to get harder, not easier, with silicon advancements. Our customers want more features and more performance, and that means exposing the programmer to more of what the hardware is doing, not less.
On a side note, let me just add that IMHO 90% of programmers I've seen really aren't very good, so complaining about such-and-such being "too hard" is cliché. I'm sorry that some companies out there are having a hard time finding programmers who know what they're doing, but that's just the way it is.
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram [wikipedia.org], where each point represent a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.
You just described LabVIEW. It is a dataflow graphical programming language, meaning that each component executes when all of its inputs are ready. Multiple components that are ready simultaneously can be processed simultaneously. Programs can even consist of completely separate functional dataflows, and both will execute in parallel.
Yes, I believe it already supports multiple processors.
It doesn't hurt to be nice.
Its interesting how technology has evolved over the last 20 years and this is a prime example... go back 15 years and high-end was truely high-end, desktop was very low-end and there was varying shades of grey all throughout the spectrum.
Those 2 ends of the spectrum still remain to a large extent, but the distance between them really has narrowed. Sure you can buy a huge multi-rack (possibly even custom designed) box if your a company and the raw grunt of it will slaughter your over pc. But, the architectual differences are nowhere near as huge as they were.
But parallel programming is by no means new, and in fact it was really big news 15 years ago on MPAR boxes. It was a highly specialised area and while MPAR boxes still do exist, the demand for the skill set has certainly all but disappeared... Well now its back again, possibly with a vengence. But when you look around, theres alot of parallel type paradigm's already in existence that can probably help the industry grow. Take (as an example out of thing air) stackless pythons model of (sort-of) blowing away the threading model and going with something they like to call tasklets and micro-threads. To me, this whole concept sounds rather antiquated much like the pre-smp-desktop era with its co-operative multi-tasking, but its a concept that is worth building on.
Personally i think perhaps they'll find a way of overcoming the whole "we cant make cpu's any faster anymore" thing with some new technological discovery. If they don't however, it really could spell a new era of computing in some ways. People might possibly start looking at CPU architecture with a new eye. Its been a while since we even had (real) arguments about RISC v CISC for example.
Imagine a completely new cpu design thats like nothing we've seen, but scales in some horizontal way thats transparent to programmers. Necessity is the mother of invention after all;)
What I really am interested in seeing is how something like that would develop? Microsoft have been traditionally very bad at dealing with architecture changes on those kinds of scales (think smp, 64bit, etc) and thats not to say linux, apple, Sun, BSD (or anyone else) would be able to adapt. But if there were a fundamental shift in the basis of computer design there stands a chance for someone to win a pretty big victory. Although I personally hope its not apple, apple seem great at farming a new technology and making it just a little expensive and forcing someone to come out with a cheaper (not just in cost) alternative (a good example of that was firewire vs USB, firewire should have won that battle if only apple just hadn't made it way too expensive to license).
But, i think by "necessity is the mother of invention" the end result is going to be something more like a shift in the way people code - perhaps a new language, new design principles, etc that lend them self to parallel programming. Perhaps even a method of virtualisation that solves the problem?
Whatever does happen, it'll be exciting to be along for the ride! (i hope).
Why paralelize when we have quantum computers?
The main reason that 'parallel' programming is hard is that current tools support it only as an afterthought. All mainstream tools (languages, compilers, libraries, debuggers, etc.) are built, primarily, to support single-threaded code. It is up to the individual programmer to write code that is multi-thread safe by enclosing dangerous sections of code in mutexes, erecting semephores to coordinate multiple threads, and so on. Even languages with explicit support for multi-threading (e.g. Java) only support it as the exception: all code is presumed to be inteded for a single-threaded environment unless the programmer explicitly identifies it as multi-threaded (e.g. 'synchronized').
A better solution would be to assume that all code is run in a multi-threaded environment and have the tools build the code for mutual exclusion, unless explicitly told otherwise by the programmer. The language and libraries could also provide basic utilities and data structures in thread-safe implementations by default, requiring the programmer to specifically select a non-thread-safe version when desired. Monitors (thread-safe objects) were invented decades ago, but are still not in common use in modern languages.
The fundamental problem is that people are really, really bad at reasoning about thread-safety and figuring out where mutually exclusive sections of code and race conditions are. Relying on the programmer to get such things right is a guarantee of failure. The tools (languaes, compilers, etc.) should be able to determine when methods/functions access the same common state and automatically insert mutexes/semephores to enforce mutual exclusion, or at least assume that, unless told otherwise, all methods/functions must have mutexes/semephores. You'd still have instances where a programmer thought, incorrectly, that some code could be run with a mutex, but it would be the exception rather than the rule.
just a ghost in the machine.
Most apps that real people use don't need to be parallel processing. Word processing, e-mail, web browsing, even 95% of spreadsheets. These apps do some stuff in multiple threads, but they're not truly parallel processing so much as performing some operations in background threads. So I think part of it is that the problem needs to be better defined as to what you consider parallel processing. If you mean simply running a thread in the background, that's not a big deal and I do it all the time in my own code.
True parallel processing only needs to come into play with very compute intensive problems. In these cases, parallel processing needs to be taken into consideration in the initial design. Proper design can usually avoid serious problems, but it's also a learned skill, like any other aspect of programming. The mistake is to think that someone who can program should inherently have this skill. People specialize in it. Just like some people specialize in 3D apps or database apps. If you do it enough, it's not that hard. I mean, no harder than any other specialized field, like 3D. It's simply learning a new set of issues to deal with, how to avoid problems related to parallel processing and how to debug those problems when they do appear.
Generally, the biggest problem is coming up with a suitable way to represent a computation in a parallel way. Many compute intensive problems are simply hard or impossible to handle in a parallel way because of dependencies that require some computations to take place before other computations. Some problems, on the other hand, lend themselves quite well to parallel processing.
But the question, "Is Parallel Programming Just Too Hard?" No. It's not. It's hard, like anything else that requires time and skills to develop, but it's not "too hard" as a field. But as I said, certain problems simply don't lend themselves to parallel processing. In those cases, it's not that parallel processing itself is "too hard," it's simply that the problem isn't one that's necessarily solvable in a parallel way.
No, writing programs that do parallel processing (whether using threads or processes to do it) is really quite easy. You do need to have a good background in programming before you take up the challenge, though, or you will really mess it up - but that is true for any more advanced technique in a field - you need to know the basics to do the advanced stuff.
Single threaded apps are really the basics of programming, and today's environments really, really require multi-threaded/multi-processed apps. For example, GUI environments usually require having a thread dedicated to handling the GUI interface, and running lots of worker threads to perform the various tasks. (If you get a GUI program whose window hangs on you, it is because either (a) it was not written correctly and does not use a dedicated thread to handle its GUI, and/or (b) it got stuck waiting on something. 'a' is the more likely cause.)
Now, what can be a bit difficult is determine what parts of the program to offload to another thread. And the simple answer there is anything that could block the program if it does not return fast enough. (Fast enough being
The problem simply becomes that we do not want to do it because we think it is hard. And it is not a matter of language issues either. Certainly some languages may make it easier, but it is easy to do multi-threaded and multi-processed programming in C and C++ - OO makes it even easier. What programmers need to learn is how to use locking mechanisms correctly to protect the data, and how to write code that will keep locks for minimal amounts of time in order to minimize possible locking conflicts. (OO's Get/Set methodology helps this tremendously.)
So, no - it is not hard. Most programmers are just to lazy to learn how and/or attempt it. Reality is, if you want to maintain a responsive interface you have to do it.
FYI - I've been doing this very thing for a quite while now, and in C/C++ in a Win32 environment none-the-less. I've also done similar work under Linux. When you architect your program correctly, it becomes really easy to do. I would offer that the real cause is that software programmers don't do enough software engineering for their programs; so everything happens in chaos, which does make it difficult to do. (Thus the perception of being hard, and the laziness of programmers in doing it.)
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
Erlang succeeds because it has message passage between lightweight processes, not true OS threads. You can easily get 100,000 processes cooking along and talking to each other in Erlang, but such numbers would kill most operating systems. And even if you could get those numbers with Linux or Windows, Erlang would still walk all over them performance-wise by a huge factor.
So what we really need is for OSes to be designed around having massive numbers of threads that communicate via message passing (i.e. shared nothing). Then the need for custom languages will go away.
While I don't disagree that many consumer apps work OK on single thread hardware, it certainly isn't the case with every app and certainly not emerging apps.
A current example are things like video encoding or in some cases play back. Other things like word processors are enhanced by threaded hardware due the number of threads they spawn. Often the consumer benefit is snappy performance on low power machine (power as in wattage).
Given the above I don't think that the big benefit of MSP hardware at the consumer level is the acceleration of any one program. Rather I see the development as enabling the broader use of a computing system due to the OS being able to keep the system responsive during multitasking. It would not be impossible to have a video conference take place while encoding a movie and maybe a word processing or Internet operation going on. Of course somebody is going to ask who would want to do something like that - the easy reply is grand parents. Or more exactly children of grand parents responding to requests to see the latest pics or movies.
It is the ability of the coming multi core processor to adapt performance to user needs that will make these PC's high in demand. Adaptability is a clear goal of AMD's coming quad core with the independent control they now have of the individual cores on the chip. Basically it allows the operating system to supply the computational power required on demand. Frankly the coming technology has put me off buying a new machine. The reality is that I may be able to buy a quad core machine that uses far less power on average than the single core machine I have now. That is cool.
The other thing that people should recognize is that many so called professional application of computers is so simple because of the hardware required to run the software. Give consumer hardware to approximate that professional hardware and you will see consumers running "professional software". And if not consumers a good number of professionals that don't have the bank roll of a Fortune 500.
So is SMP hardware difficult to exploit in consumer applications. I'd say not at all. You just don't need to consider that all programs don't need to be parallel to justify the SMP hardware. As to the whining about single thread performance, I don't think people will have much to complain about with the advent of some of the new processor technology about to hit the marketplace, it will get better and you will have multiple cores to boot.
Dave
Maybe there just isn't a big enough need to use parallel programming. Sure there are some programs which just need it but for the most part the general consensus is that we just don't needed. I know you think that we should advance in loo of the demand but in the long run programs are made for the user and maybe he/she just doesn't see the need.
"Health nuts are going to feel stupid someday, lying in hospitals dying of nothing." - Redd Foxx
Through strife, we become stronger, and less bored. And right now, it seems everyone is bored. How many web apps can you code before you shoot yourself to see what's on the other side? I'll never give up Perl and its swiss army knife approach, but I think challenges make me grow and keep from becoming a stagnant, boring curmudgeon (or more of one).
technical writing / development
Not if you're are doing it with Fortran95 and MPICH-2....its really a piece of cake...
Threads (multiple threads) can actually simplify the software rather than making it more complex. Or, rather, simplify writing the software.
Ok, so maybe I am an old hand at multi-threaded programming (over 20 years of doing this stuff) but it just makes sense for so many things. Past examples that the general public may have seen are things I wrote on the Amiga, machines with interrupts, Apache, etc. (Imaging writing a high-volume web server without multiple threads!)
There was a Hockey game on the Amiga that had each player on the ice having its own thread and own IA behavior. The code became so much easier than the traditional multiple interlocking state machines that such simulation games had to run that in order to port it to the PC they ended up writing their own multi-threading environment that ran on top of DOS.
Personally, I think that too many people are scared away from multi-threaded programming because of the "horror stories" and a few "old guard" that don't feel comfortable in such an environment. For me, many problems are so much easier in the multi-threaded solution space that I rarely think of non-multi-threading. (Well, co-routines are sometimes required when the platform does not support true multi-threading)
If past is prologue then developers will spend time optimizing the wrong part of their program for parallelism...
Orignator of the Miserable Failure Googlebomb
Most "programmers" seem to have a difficult time churning out normal, non-parallel code that runs, much less code that is bug free. Do you seriously think your average code monkey reusing snippets from the language-of-the-week cookbook is going to be able to wrap his head around something that actually requires a little thought?
You're right, and have convinced me: parallel programming is too hard. I'm going to disable my 2nd core, delete all my threaded applications and libraries in anticipation of the new age.
Finally things make sense again!
-judging another only defines yourself
I've transformed several processes in this fashion and I'm no super brain. In many cases, the issue is not processors but saturating the I/O channel. There isn't much point in adding more parallel processing if your disks have 10 second queues for I/O.
Also, some problems do not break into pieces.
My understanding of the hard part currently is being able to write a straightforward loop and the compiler will know when to break it into "X" pieces.
The fundamental problem is probably lack of pay in parallel programming. If the pay is there, people who are talented will enter the field.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
This is just a bad parallel algorithm that doesn't lend itself to synchronization. Two procedures (p1 and p2 here) should not both be responsible for setting the same result variable (C here).
If two procedures contribute to a result C, then they should each place their results in intermediate variables (A and B). The final result is C = A[+]B (where [+] is some operation). This is how implicitly parallel compilers work, such as Fortran and ADA. Consider a Fortran dot-product operation.
If two procedures each set a result C, then only one procedure is really needed. In the sample, procedure p1's effort is clobbered by p2. Some parallel compilers might be able to tell which procedure (p1 or p2) is useless.
Parallel programming's actually pretty easy.
StoneCypher is Full of BS
Multics was designed for long lived processes. Short lived processes are something we take for granted today but wasn't assumed back then. Today we assume that the sequence is we open a program, perform a task, close the program. Microsoft Outlook, for example, relies on Outlook being closed for its queue when to purge email that's been deleted. Programs are not designed to be up years-on-end. In fact, Microsoft reboots their OS every month with Windows Update. I've often speculated that the reboot is often requested not because the patch requires it but because Microsoft knows that its OS needs rebooted, often.
Why? Why wouldn't one just leave every application you've ever opened, opened?
The reason is that programmers cannot reliably write long running process code. Programs crashed all the time in Multics. Something Multics wasn't very good at handling back in the 1960s. There was some research done and it was observed that programmers could write code for short lived processes fairly well but not long lived.
So, one lessoned learn from from the failure Multics is that programmers do not write reliable, long running code.
Parallel processing is a processing better suited to long running processes. Since humans are not good at writing long running processes it makes sense then that parallel processing is rare. The innovation to deal with this sticky dilemma was the client-server model. Only the server needs to be up for long periods of time. The clients can and should perform short lived tasks and only the server needs to be reliably programmed to run 24/7. Consequently you see servers have clusters, RAID storage, SAN storage and other parallel engineering and clients do not. In some sense, Windows is the terminal they were thinking of back in the Multics days. The twist is that given humans are not very good at writing long running processes then the client OS, Windows, is designed around short lived processes. Open, perform task, close. I leave Firefox open for more than a couple of days and it is using 300MB of memory and slowing my machine down. I close and reopen Firefox daily.
Threads didn't exist in the computing word until Virtual Memory with it's program isolation came to be. What happened before threads? What happened before threads is that programmers were in a shared, non-isolated environment. Where Multics gave isolation to every program, Unix just recognizes two users: root and everyone else. Before Virtual Memory, this meant that all user programs could easily step on each other and programs could bring each down. Which happened a lot.
Virtual Memory is one of the greatest computing advances because it specifically deals with a short coming in human nature. Humans are not very good at programming memory directly, i.e. pointers.
It wasn't very long after VM came out that threads were invented to allow intra-application memory sharing. Here's the irony though. There still as no advancement in getting humans to perform reliable programming. Humans today are still not very good at programming memory directly, even with monitors, semaphores and other OS helpers.
When I was in my graduate OS class the question was raised then of "when do you invoke a thread?" given you probably shouldn't to avoid instability.
The answer was to think about threads then as "light weight processes". The teaching was that given this a thread was appropriate for:
Have one thread per IO device like keyboard, mouse, monitor, etc. There should be one thread dedicated to CPU only and the CPU thread controls all the IO threads. The IO threads should be given the simple task of servicing requests on behalf of the CPU thread.
Onl
"Since this discussion has been going on for over three decades with little progress in terms of widespread change"
Actually, there has been huge progress in the last 5-10 years. Before Java, there were no mass-market programming languages that included explicit support for threading primitives. Now that Java is dominant, or at least widespread, in all areas of software except the desktop and game consoles, multithreading has become much more ubiquitous. And game consoles have their own brand of multithreading-- look at the Cell in the PS3.
There are a lot of ideas for what will be the next generation of tools, which will lead to the next generation of capabilities. It's really all about the tools that make these things easier, more than the developers who just do whatever's the easiest way to fulfill their contract.
E pluribus unum
A good OS and compilers should do most of the work.
Parallel programming is hard, but I'm not sure that's really such a bad thing. I don't mean that I like inadequate tools, but there *is* a fundamental human limit, and overcoming it requires discipline and training or experience. If the rise of multi-core machines results in greater stratification to distinguish Software Engineers from Computer Programmers, I won't shed a tear. There's a lot of bad software in this world, and simple economics dictates that this will always be true. If we can push that niche towards less important, marginal tasks that are happy being single-threaded, and away from the primary workloads of our increasingly SMP machines, maybe software on the whole will suck a tiny bit less.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
The difficulty of a particular parallel programming task depends greatly on the complexity of the task at hand. Serial programming has the same issues.
I've had lots of serial programming tasks that are actually very difficult to break down, modularize, and make extensible. I've also had parallel programming tasks that have been pretty simple -- and some that have been pretty difficult. Let's face it... a task which involves a simple fire-and-forget is parallel... but incredibly easy. It is when notification and state management are involved that things get more complicated. But that's the same with serial programming -- although serial is a little more straightforward.
Any programmer worth their salt will become more and more proficient in parallel programming over time. It just takes practice, knowing what to look for, knowing what pitfalls there are, et cetera.
It just takes practice.
Pre-made solutions that are supposed to 'scale' your applications, things like app-servers, that's what is really not letting the developers of projects to truly parallellize applications.
In reality many of the back-end applications are parallel just by virtue of sharing common data resources and running on multiple cores/systems. A common database is a synchronized data store, which forces sequential (transactional) processing, but beyond that, the multiple threads running for multiple users are basically parallellilized programs.
Currently I am on a project where the company decided they wanted a cheap solution, so no Weblogic licenses for this project (Java stuff.) Instead of relying on a cluster of weblogic nodes, I decided to make the application distributable/scalable/parallel just by virtue of the design itself. The design breaks application down into multiple independent services. Each service has a thread pool, each request gets its own thread. All calls between services are asynchronous, a well-defined state machine is forcing a specific order of execution depending on the current state and output conditions from the current service being executed. Each service can be distributed on its own and multithreaded. The communication protocol is done through yet another service, something like a lightweight-persistant message queue. This message service is also distributed, but synchronized on a database table (where the messages are stored.) The message service is aware of each other running service and it decides which instance of a particular service to send a message to (round robin and business/health checks are also included into the decision.)
So a client makes a call to the system through a web-service, which puts the client to wait and starts a state machine handler, which decides what is the next process going to be. All communications are asynchronous, once the client's thread wakes up, the data is either there or not. The client can be either released (timed-out) or put to sleep again to wait some more time. Even if the client is released, the state machine continues processing and gets the data and stores it in local cache. The next time this data is requested it is already retrieved and preprocessed and can be retrieved immediately from cache. Cache expires/clears based on predefined rules for this data type.
This is an application that is distributed and parallel by design and not by the virtue of running in a so called 'distributed parallel container'. If more parallellization becomes necessary at any step, it can be outsorced to yet another service, and the state-machine will be modified to include this service into account.
I find that this is the simplest approach to parallellization of very large systems but I am pretty certain this can be used even for simple algorythms.
You can't handle the truth.
Yes No Yes No Yes Yes No No No Yes No
Is it, Linus?
System of Systems is a philosophy that should look a helluva lot like the Unix Philosophy because... it is the same philosophy... and people do write code for these environments. If you want Parallel Programming to succeed make it work as systems of systems interacting to create a complex whole. Create environments where independent "objects" or code units can interact with each other asynchronously.
Otherwise, as discussed in TFA there are certain problems that just don't parallelize. However, there are whole classes of algorithms that aren't developed in modern Computer Science because such stuff is too radical a departure from classical mathematics. Consider HTA as a computing model. It is just too far away from traditional programming.
Parallel programing is just alien to traditional procedural declarative programming models. One solution is to abandon traditional programming languages and/or models as inadequate to represent more complex parallel problems. Another is to isolate procedural code to message each other asynchronously. Another is to create a system of systems... or combinations of both.
If there is a limitation it is in the declarative sequential paradigm and that limitation is partially addressed by all the latest work in asynchronous 'net technologies. These distributed paradigms still haven't filtered through the whole industry.
[signature]
My experience is that parallel programming is not all that difficult. However, there is a generation of x86-architecture developers that have developed "bad" single-processor/single-thread habits and are not inclined to change. Many of these folks are the team leads/architects/supervisors/managers of today. Not only that, because they are not inclined to learn threading, they cast doubt on the skills of those who can exploit some of its benefits.
As an example:
I worked as a developer for a large payroll processing company several years ago. At one point, due to numerous headaches compiling eight different executables, I broke out their state income tax methods into "plug-in" libraries. I wrote a threaded loader so that the application could continue on its merry way while the plug-ins were loading (adding a some logic to control access until all plug-ins were loaded). The code was solid and ran flawlessly for about a year. Forward a year and an access violation starts showing up. Immediately the manager and "expert" developer (throwback developer) pointed the finger at the threading code without doing any homework and demanded it be changed to non-threaded. Lo and behold the access violation did not go away. Ended up being a memory leak introduced by the so-called "expert". However, did the threading get added back in? Of course not. So to this day the application hangs for about a minute while the libraries load.
Old habits die hard.
Back around Mac OS 7.5 there was a language called Prograf which (logically speaking) could handle this problem easily. It required thinking slightly differently, but that wasn't it's real problem. It's real problem was that it didn't have any compact textual representation. Also that is only ran on the Mac. (I believe that it died during an attempt to transition to MSWind, as so many other programs did.)
...).
They called it a dataflow architecture. (Well, with the Mac being a single processor system the "architecture was entirely software, but
However it could handle common parallelizations quite easily. A node of the program was defined in terms of what it required to operate, and what it did. The compiler did a kind of reverse sort, starting with the answer it built the required availability of data sources, and chained backwards. There were a few explicitly sequential steps, but they weren't common. E.g., addition was defined as (approx.) to get the answer you need two numeric inputs...THESE, add them together to get the answer. Then the compiler figures out how you are telling it to get THESE.
I understand that there are still a couple or three dataflow languages around, but the last time I checked two were special purpose and one was obscure. (Well, it's been a few months. There were reasons in each case why I didn't check further into using them.)
Erlang can also handle the stated problem, but it's SLOW. I don't know if this is inherent or can be fixed. (It may not be fixed even if it can be, becasue it's fast enough for what it was designed to do.) Still I keep coming back to Erlang, because it appears to be the best thing currently active that handles THAT part of the problem space. Then I look at how it handles the rest of the problem space and look for something better.
Currently I'm quite unsatisfied with all of the parallelization tools that I know of. This isn't something intrinsic in the problem, but because it hasn't been of serious concern among large numbers of people for very long. Good answers exist (in principle), they just don't appear to be currently available.
I think we've pushed this "anyone can grow up to be president" thing too far.
I've been waiting for access to multiprocessor/multicore systems for YEARS. Right now I'm writing this on a dual core laptop that I've had for two months - and is the first such system I've had in my life. (I've been programming since 1984 - however outside of some circa 1980-ish "big iron systems" in the early 90s I've had no access to that world.
:)
now I do and I'm starting to recode all the work I did in prep for this day.
The cost has dropped. It can finally be tested by those of us operating on a shoestring budget with hand-me-down equipment.
About time
On just about everything I work on, I often come to a point where I see that something could be broken into multiple processes -- I wouldn't even have to dirty my hands with all the trickiness of multithreading, shared memory, etc.
But the easiness (usually) only goes up to a point. I think programming for today's multi-core chips isn't all that hard. But when I hear about 80-core chips, I realize: no, I just don't see that many processes in the problem that I'm looking at. To really use that kind of hardware, most (though not all) problems are going to take some pretty serious work (at best) or maybe even a totally alien way of looking at them, and I probably will have to deal with nontrivial concurrency issues.
There's a little bit of good news, though: many application programmers don't have to see the whole picture. Remember that your app is just one of the things that a user's desktop machine is running. If you can only use 20 of a machine's 80 cores, that's ok. The user would probably bitch anyway, if your 80-way game caused his PVR to transcode more slowly, his VoIP to stutter because it can't encrypt fast enough, and his Gentoo package compile to take longer. I think it's ok for the hardware to be a bit overkill compared to the individual applications. Yeah, maybe that doesn't apply to some of you guys, who are working in situations where a machine is dedicated to a single problem, but with personal computers, the programmer doesn't have to "use up" the whole box -- in fact, he probably shouldn't.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Even with multi-threaded apps, I find older single core (but fast) processors to be ~3x faster than my dual core AMD.
I am writing n-threaded app now, the trade off appears to be stability - particularly the stability of 3rd-party libraries (.net).
But the question is with image loading and processing. Some functions can be split, but the slowest functions are spatial filters(ie Gaussian blur), and these have overlapping datasets (though only in read mode). So it seems that the applications will become multithreaded when the libraries are smart enough to break down long functions into sub threads.
The other solution - which is to divide threads at the higher level would seem to quickly hit the serialization horizon.
AIK
Simple fact: Any recursion that works in O(1) space can be reimplemented as a loop. That's actually how O(1) recursion works: the compiler emits an unconditional jump to the start of the "loop". Yes, I will grant that there is some benefit to being able to chain together a sequence of mutually tail-recursive functions: foo -> bar -> baz -> foo ->
The other reason people don't like recursion is simple: stack overflows. The C language, for example, does not guarantee O(1) tail recursion even though good compilers like gcc will recognize it and implement it as such. You'll be wishing you had used iteration when you find out 6 months after the fact that the compiler failed to recognize space-bounded recursion and your app crashed on a customer system.
So either the parent is writing proper tail recursion recursion and assuming the compiler will actually emit an O(1) jump, or he is writing non-tail-recursion and just assuming that it won't smash the stack. In either case, I submit that the parent probably doesn't understand recursion as well as he thinks he does.
There's a time for iteration and a time for recursion. Understanding and embracing both will help you write better code in each paradigm.
p.s. Quicksort is a recursive algorithm with linear fan-out (logarithmic depth), right? Well go look at the source for your library's implementation of quicksort in C. Now try to understand why the iterative library code outperforms anything you could possibly write using recursion.
- p1 reads C
- p2 reads C
- p1 writes to C
- p2 writes to C - p1's change just got clobbered
Solution: Nothing is ever written to more than once. Here, p1 and p2 run in parallel, reading C1 and writing to C2, and another process p3 that has been blocking on both p1 and p2 reads C1 and C2 and writes C3. See also Static single assignment form.Parallel programming is very easy to do, when you know the trick to it, do the parallel programming in a parallel universe.
Carbon based humanoid in training.
I don't think this is too hard for humans-- humans manage parallel processing all the time-- albeit in other guises commonly referred to as "multitasking". Managers, for example, process in parallel frequently when they sit down and assign tasks to various staff members who then go away and get them done and come back for more assignments. Sometimes the assignments are standalone, and sometimes they involve interdependencies with tasks assigned to other staff members, and sometimes tasks are handed-out without truly understanding what sort of dependencies or interactions will be required and just assume that either the individual staff members can get together to figure it out, or else they can do as much as possible and then return for a full staff meeting/briefing to assess the status and determine what tasks are still left to perform.
In a similar fashion, people parallel process all the time. They have their email open in one window, an IM client in another, have Slashdot open in yet another window, and are busy typing in a word processor / spreadsheet / vi / whatever in yet another. While it is true that most folks must multi-task in the more traditional manner (ie. task switch, so perhaps not true parallel in the strictest form), some people have secretaries or assistants they can off-load task functions to (reading/responding to emails for example) which would fit the definition.
Yet other people perform parallel processing on the weekends. There are things to do at home-- chores for example, so the occupants get together and divy up the tasks and one person mows the lawn, another one vaccums the carpet, another goes to the store and gets groceries and another takes out the trash. Collectively, concurrently, all of the tasks get accomplished. Whenever interaction is required, either due to interdependencies-- ie. needing to go to the gas station to buy gas for the mower-- and hitching a ride with the person going to the grocery store.
Or when manufacturing a product. Thing A happens while thing B happens but thing C has to wait until A and B are ready.
There are countless examples of concurrent and parallel processing in the real world, and even in IT on a regular basis. Most loosely (or unrelated) tasks even on a uni-processing, multi-tasking system are for most intents and purposes parallel. The real problems creep in when considering how to implement low-level details such as interlocking processes, interdependent processes, process notification, task granularity and dispatching, and other similar concepts. I think much of the problem comes about in the fact that people "naturally" parallel process things from a higher-level abstract point-of-view, but rarely have to deal with the detailed subtleties that arise when having to figure out the whys and hows. However, in the real world (TM) there are Project Managers and Planners who work out these details. They are the "programmers" of these processes. They determine how collisions or other interactions will be handled if/when they arise. Sometimes (in fact, probably most of the time) in the real world these details just sort of "happen" without much overt planning. Instead little "scripts" that people know take care of these details. Such as-- "Oops, I'm out of gas. Since you're going to the store, can you drive me by the gas station on the way?" or "I'm stuck on my project and can't continue until George gets his part done." And other such examples.
In my view the real way to win at parallel processing is to make processors plentiful and ubiquitous. Instead of having a system with just one or two processors, have a system with 10,000 or 100,000 or even a million processors. Just hanging around waiting to be tasked. Each a little tiny "computer system" unto itself, and yet attached to a larger system with common peripherals, communication pathways, and the like. Then to have one (or perhaps a group of) executive processors who interact with the user (or the user's executive "processor agent") to get its marching orders and t
Pick any one...
Some of the things being done in C# I think will perhaps partially address this problem.
Anders Hejlsberg often talked about parallelism, specifically with constructs such as the new LinQ add-ons. He's talked about multi-threading and parallelism being handled more by the compiler.
He rightly describes the problem as simply being an abstract one: specificaly, when you might write a simple "for" loop - you're essentially giving the compiler very detailed instractions on what you wish to have your program do and how you want it to accomplish it, however, if you use a "foreach" loop, all of a sudden, you're telling the compiler WHAT you want to do, as opposed to HOW you want to do it.
This kind of thinking gives the compiler more freedome to make parallel desicions.
I think this is a more appropriate question.
The need for parallel applications is overblown, particularly when modern OSs can manage single threaded applications across multiple CPUs and you can use interprocess communication to communicate between parent and child processes. For the uninitiated, a parent application forks (creates a copy of itself) child processes (which themselves can fork child processes - ad infinitum) all of which can run on seperate CPUs and communication with each other. Modern OSes manage the distribution of processes across multiple CPUs.
This is a model that has been around a long time, is easy to program, and just works. Whatever efficiency is lost in interprocess communication between single threaded applications is made up for in dealing with mutex issues in parallel programs. (From Wikipedia) "The programmer needs to be careful to avoid race conditions, and other non-intuitive behaviors. In order for data to be correctly manipulated, threads will often need to rendezvous in time in order to process the data in the correct order. Threads may also require atomic operations (often implemented using semaphores) in order to prevent common data from being simultaneously modified, or read while in the process of being modified. Careless use of such primitives can lead to deadlocks."
I don't see the benefit of doing concurrent programming in all cases, and see many problems that increase the complexity beyond what is reasonable to manage from a QA standpoint in most (in most applications data must be atomically modified - obviating the need to multithread, since all other threads will be busy-waiting for the data to be freed up anyway before they can work on it).
For computational research, I could see the usefulness of multithreading inside of an application. For practical applications it is too complicated, and the cost of time spent building and debugging a multithreaded system does not balance out against the return on the investment. Computer systems are complex enough as it is; lets work smart and not latch onto the latest buzz word/tool at the expense of all others. It might be cheaper to wait 6 months, and the percieved bottleneck may have been cleared up by the application of new hardware anyway (Beowulf cluster, or faster/more capable processors etc).
I view parallel programming much as I do assembly language: it is a tool that is useful for a specific problem domain - but I wouldn't want to do all of my programming in it.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
Unlike you poor people out there, I have two brains, so parallel programming comes quite naturally to me -- at least on dual-cores....
It's a nice little advantage, but deadlocks can be quite painful.
VHDL and Verilog are perhaps the most widespread
.. ;)
parallel programming languages in use today.
Admittedly, hardware design is, by nature, more
static than most software designs. Nonetheless,
hardware designers seem to have relatively little
difficulty programming "in parallel".
Or perhaps they are just smarter
The gigantic paradigm shift to parallel programming ain't going to happen for the programming "public at large". Most programmers are high level script programming dudes who don't even understand pointers, the stack, the heap, or the concept of "type". You think they're all of a sudden going to become code-ninjas who understand massively parallel programming? How is some guy who doesn't even understand what a memory address is going to deal with this sort of thing?
The only way this problem is going to be solved by software (if at all) will be at the compiler, OS, and tool level. Joe-script-dude isn't going to get any better.
...check out these tools
Take a look at the XB0360 and the PS3. Both of these combine several multipurpose cores with a huge number of special purpose SIMD cores used just for graphics. (A typical modern GPU is a large collection of SIMD cores.) So there you have a huge market for multithreaded consumer-level applications.
In business there are people who would be very happy to see a multithreaded spread sheet program. Anyone who works with database extracts in a spread sheet would love that. So, there is at least one business app.
Stonewolf
How can our brains not be wired to think about parallel processes? Everything around us in running in parallel. Physical objects don't sit in stasis taking turns moving. Surely our brains can handle this.
What we need is a programming language that works the way the world does. For starters, objects don't modify each others' internal states. But they might send messages to each other.
So if you want a parallel langugage that works the way your brain works, try Erlang. It works just like that, and Erlang programmers are universally of the opinion that multicore/distributed processing isn't hard at all.
The context shifts between languages can get really confusing, though. Especially if you speak more than two languages and for everything you hear, you have to first identify the language (by which point, you're struggling to catch up with what was just said).
This is a problem for such a small niche of coder... to pretend that it will affect everyone is just scare-mongering. It's difficult and it should remain that way. This isn't some defect in the language(s), but in the very problems themselves.
... will never ever need to worry about parallelism. Most coders will never need to worry about it, so I suggest everyone stop worrying about it. Everyone that needs to go parallel already knows it.
Parallel coding requires discipline. If you don't want to spend the time to do it properly, then don't write parallel code. This isn't a very big deal. The fat, bloated, and lazy scripters (I write some PHP & Ruby myself) will find parallelism in concurrent execution of the same sequential script(s). The guy who writes the webserver? He'll be writing actual parallel code. This is the way it is now.
Video game code? Ever more complex APIs are being introduced to shield coders from actual coding. I anticipate next gen consoles will provide their own complete game engine implementations. No problems there. Database (form) applications? No need to go parallel unless you're insane. Spreadsheets?... maybe if you're the one writing the actual spreadsheet software (Excel, OpenOffice, Lotus). Parallelizing formulas is just stupid.
The majority of applications run just fine sequentially. I once worked for a place that threaded everything. It was a mess. We had simple U.I.'s under serious lock contention, and sometimes deadlock, for tasks that would normally complete in milliseconds. Their love of threading creating a soiled heap of horrid code that ran so erratically, I quit. They applied it incorrectly and without discipline.
Only serious low-level or networking coders need to worry about such things. If your computation is so starved that it needs to run on multiple cores, then you could probably benefit from vectorization and assembler tweaks. Once you're in that realm, complexity is nothing new. If you're parallelizing fat code, then you're just being wasteful, perhaps unnecessarily.
Your average corporate drone writing accounting software, or form-based UI front ends to databases, or reporting,
Lots of comments have been about the validity and/or the ease, or not, of parallel programming - but I think everyone can agree that there just isn't enough education on the topic - for or against it.
Does anyone have any good books, sites, tutorials, or classes to recommend on parallel programming (or writing parallel algorithms I should say, since most people seem to agree that programming them is the easy part. Or yes, I would even take good material on threading.).
"There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"
:)
BULLSHIT!
Anyone remembers the bullshit bingo game?
Having written MPI and Linda work-alike libraries, developed parallel and distributed systems, and being a heavy user of threads and remote procedure calls (via xml-rpc), I will say that the general issue that all of these different systems try to handle is one of API. How does one have data X handled by function Y in thread/process Z. In the realm of threading, we use queues with locks. MPI/PVM/Linda/XML-RPC all use sockets to pass data between processes, and all but Linda requires that you specify the destination process explicitly.
One interesting recent development is the processing package for Python: http://www.python.org/pypi/processing/ . The idea is that you create processes like you would threads, then use shared queues, key-value mappings, etc., to pass data between the processes like you would threads. By sticking with generalizations that are seen to be 'easy' to understand and use (primarily queues), they bypass the majority of the difficulty people have when using threads and processes. Of course it doesn't hurt that the data transfer is pretty fast.
The only limiting factors with the particular implementation is that it relies on fork in *nix, and the transfer of code as data in Windows (which isn't generally possible for all languages, especially with the NX processor extension). If Windows had a usable fork, and if there was a 'fork across machines' (think mosix), many of these discussions would result in "just program it like you had a thread and use shared queues and the processing package for your language".
Multi-core designs do not change the fact that processors will always execute the vast majority of their instructions sequentially. Even a 1000 core machine would still be executing more instructions in serial than in parallel. If each core ran at 1Hz they would still be executing 1000 instructions per second in parallel but each of those is also part of a sequence in each core so it is executing >1000 in serial depending on how deeply pipelined the cores are! The extreme of parallelism is a neural network but neurons do not execute instructions: if you want each time-step of the processor to have a well defined meaning (so that it could justifiably be called an instruction execution) then it needs a context - it needs to be part of a sequence.
On a more general note I don't know why people have taken to evangelising functional programming for this discussion. Functional languages are particularly not the solution for parallel programming. People are too dumb or lazy to write explicitly parallel code and think if they write everything in a functional language the compiler will handle that messy business for them. Wrong! As soon as you get to something moderately complex like matrix multiplication functional languages will fail to find a good strategy. There is no way in hell a compiler can figure out an optimal way to parallelise matrix multiplication for an arbitrary architecture because the optimal matrix blocking and memory access patterns depend on the processors cache design. The ATLAS project achieves this to some extent but it is a highly specialised 'compiler' if you can call it that, and needs to know the cpu type and cache sizes/organisation to work optimally.
They don't just set it, they read, modify, THEN set it. E.g. two threads, one counts mouseclicks and the other counts keypresses and both add their numbers to the same counter. You could add one counter for each thread but now say we have threads that are spawned at a specific interrupt, calculate for a while and then write their result somewhere. How do you fit that into a formula?
Justice is the sheep getting arrested while an impartial judge declares the vote void.
// processing - Where is the Cluster Joke????
I think this problem is a good example of why companies should take a Computer Science degree seriously instead of assuming a three-hour programming class will make an employee capable of doing what's best for the company.
As many comments suggest, there are those for whom parallel programming isn't hard, but that is a small minority of the programmers in the world. Most of the code is use today was not written by those elite few. Unless we want to limit the continuation of Moore's Law to only applications written by those few programmers we need a better way. RapidMind has created a development platform that simplifies development of parallel applications for vector and multicore processors. Using RapidMind, the programmer masses can develop parallel applications using their existing skills, compilers and IDE's. The RapidMind runtime platform takes care of the parallelization and load balancing across any number of available cores. In addition, code written using the RapidMind platform will run unchanged on NVIDIA and ATI GPGPUs, IBM Cell BE and Multicore Intel and AMD processors. Even the elite few that can parallel program would have to port and refactor to support multiple or future processor architectures. Just as higher level programming languages like Visual Basic and now Ruby have enabled laymen to be productive programmers, a similar level of abstraction will be needed in the parallel world just to keep existing programmers productive. Check it out, it is free to download and try @ www.rapidmind.net
To really provide parallel programming support, you want to have more asynchronous services. Unfortunately, none of the operating systems provide a very good interface for asynchronous services, they're incomplete and littered with limitations.
Some of these problems can be mitigated by libraries but you're still stuck much of the time and there are very few cross platform asynchronous libraries.
There are some very simple multi threaded programming models that I have used. The issue is that there is an inevitable learning curve. Most developers have no idea what a state machine is. Inevitably, you end up making state machines to handle much of the complexity of things happening in different order.
Then, what for ? Most of the written can't benefit (even if they tried) from parallelization. High performance code, like servers, graphics intensive applications, video editing applications to name a few, might have some benefits, however, that's a small chunk of the code written today.
I took a parallel programming class recently. We took an algorithm and made three performance improvements to it. Two involved parallelization (one on a shared-memory machine (OpenMP), the other on a cluster (MPI)), and the third involved reordering some stuff to take advantage of the CPU's L2 cache. The non-parallel solution provided by far the best performance improvement, even though our parallel machine had 32 CPUs. This is, of course, very specific to the algorithm, but the point is that there are often better AND less error-prone ways of improving performance than parallelization. These techniques are also often viable even for algorithms that aren't parallelizable.
It seems that someone needs to do some sort of massive code review study to figure out what our computers do most and what types of performance improvement techniques would probably be most beneficial before making any broad statements on the subject. That said it seems that increasing the number of cores in a system, especially a desktop system, is limited by cost, technology, and physical space on the chip. Even if we had 32-core desktops, that's still only a 32 fold improvement. Since we have to pay to retrain programmers either way, that cost cancels out. Doesn't it seem like training in other algorithmic approaches may have more potential for quicker improvements than buying our way out of the performance problem with...
Wait a minute, what performance problem? My desktop does all kinds of fancy things I don't need it to do and its single-core CPU is still usually idle. And as for server apps, isn't your database, web server, etc. already multi-threaded? What, exactly, is all this CPU-bound code that's supposedly running out there? I would have though that network and disk speed is by far a greater bottleneck to the majority of things computers do these days. Why not just scrap the CPU crap, write sloppy code, and retrain a bunch of programmers as EE's to do I/O research.
Tom Leonard, a programmer from Valve, gave a fascinating talk about this
Game developers are getting excited about technology like Erlang. A link to Extreme Concurrency and Erlang, an entertaining presentation by André Pang, was posted to the erlang-questions list by Miguel Rodríguez Rubinos on 12 April.
you had me at #!
Nonsense. Verilog is a very popular programming language for small projects, and within a Verilog program, EVERYTHING happens all in parallel unless you specifically structure your Verilog program to execute groups of instructions serially with a state machine of some kind (and doing this explicitly with a CASE statement or other construct is usually necessary, except for purely combinatorial functions). Verilog programming is on par with assembly language as far as difficulty of learning the language, and many of the operators, etc., use C language syntax. As FPGA's become larger, more popular, and easily reconfigurable "on-the-fly", expect more employers to start looking for people who know a Hardware Description Language (HDL) like Verilog, VHDL, etc. Verilog can also be used to design VLSI ASIC's.
9/11 Eyewitnesses to Explosive WTC Demolition 1 of 2
Most "Programmers" don't really understand their system. The typical level of understanding goes from memorizing single lines of code that fit in certain places and cutting/pasting groups of lines to a more advanced ability to understand the piece you are working on, refactor, significant design of clearly defined parts of a system and some level of planning and debugging skills.
...) in your head. You need complete understanding of the entire model, then you need to twirl it around and reshape it, add to it and refactor it and you need to notice voids and inconsistencies while doing so.
nearly all programmers fall into that range somewhere, and most teams consist exclusively of these programmers.
The things that are outside that range are accurate up-front design, accurate up-front planning, good fully-factored object-model design and comprehension across a large system and more advanced topics like threading. It's my opinion that you are no more likely to hit this level simply because you program than you are to become a rockstar simply because you have a garage band. In fact, I'm fairly sure it's an innate talent that you can't improve much. If I set out to learn to paint and become an artist, I can't assume that I can challenge the great artists simply because I want it and I study a lot.
For difficult programming tasks you absolutely need the ability to manipulate the entire job (Object model, gui,
In 20 years I've met very few people with this ability--one or two, and a quite a few who more are pretty close.
Yeah, threading is hard, so is writing a #1 hit. Hard stuff is the most rewarding and fun. Who wants to play the Aramada Room at the Holiday Inn for the rest of their lives IF they have the ability to do more?
The really funny part I've noticed is that until you have passed a certain stage as a programmer you have minimal ability to understand the levels above you--so nearly all programmers think they are at or near the top. As they learn more they start to recognize the levels they have passed through, but rarely recognize those they have not seen in action.
Try this: http://www.cs.wlu.edu/~levy/software/pecon/ I wrote it and have run it on Linux and Mac OS X. It's a free, bare-bones alternative to something that Mathworks sells for a lot more money: http://www.mathworks.com/products/distriben/ Of course, if you want *real* high-performance, you won't run Matlab, but lots of people have tons of Matlab code and many idle processors, like the original poster.
Pardon me if this point has already been made.
What's the most important thing in high-performance computing nowadays; is it super-optimized low-level parallel programming or is it just plain parallel programming? I'm talking about any high-level framework which allows for easy multi-threading in applications which need it. A framework that will not only heighten developer productivity, but also utilize the additional cores in a computer system with a minimal effort. Wouldn't that be better than not doing it at all? The C-omega language http://research.microsoft.com/Comega/ by Microsoft Research Labs does exactly this.
I have to agree, though, that the problem here is us - the programmers. The implementation of the language can only as smart as it is, and potential side-effects in terms of data dependencies etc. will definately occour. I'm not aware of any languages which implement a degree of Data Flow Analysis http://en.wikipedia.org/wiki/Data_flow_analysis, which allows the compiler to alert of such "stupidities" propagated by us--or perhaps even optimize them for the better.
As several of you pointed out, matrix calculations can easily be distributed onto n number of cores, computers, or whatnot without any problems. I can't mention any real-life application where fast matrix calculations could save lives, but hey..
I'd say the problem is that premature optimization leads people to overly complex, and therefore hard, solutions. If you're willing to live with a pure message passing system (i.e. something equivalent to Unix processes, not threads) then most of the problems associated with parallel programming a vastsly diminished (spaghetti coding may still lead to deadlocks and livelocks; but a little disciple will avoid that, most of the time -- it's not "hard").
If this pure message passing system can take advantage of an extra core or two, then maybe you can live with its inefficiency. To assume otherwise is a premature optimization. Once you identify that optimization is actually necessary then you have two options: optimize it into a single threaded program (yes, sequentialization is an optimization) or else start sharing memory and adding locks, etc.
Most problems are moderately parallelization; so you should train yourself to code them in that way. Then you don't need to rearchitect to take advantage of the extra hardware resources.
Opinions my own, statements of fact may contain errors
"Vista, he said, is designed to handle multiple threads, but not the 16 or more that chips will soon be able to handle. And the applications world is even further behind."
I thought this quote from the article was rather funny. Vista is so advanced it can almost handle 16 threads!
I think parallel programming as it pertains to multicore CPU architectures and multi-CPU architectures will always remain roughly similar to what we see today. Some problems can be highly parallelized and others will be harder. It will remain up to the brilliance of the engineers/scientists to do this work. There's no great language or tool breakthrough that will save us the work of understanding each case-by-case problem of how to make our programs optimal on parallel architectures. The future of parallelism isn't about how we can make what we currently do faster. It's about what new things become possible.
Send me email to an address at the domain above. It should get to me. The email address on the thesis has died and gone to heaven, unfortunately.
the simple answer is: YeNso
"I think this line is mostly filler"
LabVIEW as a programming "language" is certainly interesting, but I find it oversold -- too poorly implemented to actually be useful.
Numerous fatal bugs (that only seem to show up in complex projects involving >100 VI's) have existed since version 8.0 onwards. These bugs can cause the loss/corruption of entire VI files and result in LabVIEW crashing.
I also find its run-time performance leaves much to be desired. Try creating a simple "while" loop that increments a 32-bit integer until it reaches zero (due to overflow, resulting in ~4 billion iterations). In a C, this would be a simple 2-3 line program that would take 1-2 seconds to execute (depending on processor). Try the same thing in LabVIEW. It will take significantly longer (~10X). It doesn't matter whether or not you make this VI reentrant, execute as a "subroutine", or do any other LabVIEW-specific trick. It just runs slowly.
Now this toy example involves no arrays, no memory allocation, etc etc..
My theory right now is that *EVERY* single loop iteration and *EVERY* single damn VI is dynamically scheduled -- even if it is just a simple add in a single VI program that does nothing else. In a complex application, I have seen this result in a C implementation of a simple routine to be 10-20X faster than my best LabVIEW implementation (and yes, I tried many ways).
Based upon what I've seen in NI's Developer Zone, I can only concluded that the Test & Measurement (T&M)market doesn't care about most of the functionality that NI offers. It would seem that the T&M market doesn't care about generating dynamic (non-static) analog waveforms or real-time processing. It would seem the only goal is to transmit and receive a short analog waveform (few seconds at most) and then perhaps spend ~1 minute doing *offline* processing. In practice, (despite NI's advertised capabilities) doing anything else is almost infeasible in LabVIEW.
The parent makes many execellent points.
At my university, I attended once of these rare "distinguished" lectures that usually involve really famous people. Some bigshot from Intel was describing how in the future everyone would have a 1 TeraFLOP computer in their homes... It would use advanced modeling/prediction algorithms to help the user. (Something similar to weather prediction, but for other tasks specific to the individual such as buying groceries, etc). Well, the lecture seemed like crap. (Doesn't he realize that most of the users are still struggling with basic day-to-day tasks and aren't going to spend the time or effort required to *configure* such a complex program, much less actually use it?)
I believe that once the majority of the public gets used to having 2 or 4 cores in their desktop/laptop, Intel will be screwed. At that point, once core can handle the user's tasks and the other core can handle anti-virus and other background programs. The second nail in the coffin will probably be FLASH drives replacing hard-drives. I find that most of the perceived sluggishness of a computer is due to the hard-drive and not due to the processing speed. (e.g. try network-booting a Linux workstation and see what makes it appear slow). I've tried using laptop hard drives in desktops for a while now (to have a quieter computer). I'm starting to wonder if any "new" computer only seems fast because its harddrive is faster/performs better than the one in the "old" computer.
As it is, I'm typing this on my 1-Ghz Pentium III laptop. It's not the fastest around, but it does what I need (web, email, writing papers, software development, occasional games) very well. Several years ago, I bought it used for $400 and can't *justify* to myself the need for a faster computer right now.
Terracotta (an open-source solution to clustering at the JVM level) approaches this problem by recognizing that multi-threaded programs can become multi-JVM programs. Here's a great article just posted on Dr. Dobbs -> http://www.ddj.com/dept/java/199703478 with more details and links about Terracotta.
Disclaimer: I work for Terracotta.
They had a great programming model, really nice scalability (the notion of each core being both a computation center for its own stuff plus a "pass it along" communications center for jobs/results that weren't for that particular core was not only elegant, but allowed additional cores to be added as needed.
I was sad to see it go...
Add LabView to that list (although it's probably a bit more application specific).
Any two chunks of code which do not depend on each other in some way can, and most likely will, execute in parallel. Most non-trivial programs I've seen tend to be designed so that multiple loops, each handling different aspects of the program, operate in parallel. For example, a program controlling a durability test stand might have a motion control loop which is designed to execute as quickly as possible, a data acquisition loop which reads data from a buffer every 100ms, and a user interface loop which is waiting for the user to hit a button, select a menu, etc. all running in parallel.
Come test your mettle in the world of Alter Aeon!
Amdahl's law still applies - look at Gustafson's graphs. The assumption is that the serial part stays constant while the parallel part grows with the number of processors (because you are handling a larger number of problems). Therefore the serial fraction shrinks, and the speedup available is larger. If you tried to run the large problem on a single CPU it really would take that much longer. But that's only in this sort of case where the serial fraction is independent of problem size. Seems like a pretty special case to me, but in any case, Amdahl's law is as valid as ever here.
Energy: time to change the picture.
Don't think threads. The thread model is inherently non-deterministic, and when you program with them you spend most of your time fighting to make the result deterministic.
There are other concurrency models without this problem. For a nice summary of some alternatives, see the "Alternatives to Threads" section of the article "The Problem with Threads" in the May 2006 issue of Computer.