Is Parallel Programming Just Too Hard?
pcause writes "There has been a lot of talk recently about the need for programmers to shift paradigms and begin building more parallel applications and systems. The need to do this and the hardware and systems to support it have been around for a while, but we haven't seen a lot of progress. The article says that gaming systems have made progress, but MMOGs are typically years late and I'll bet part of the problem is trying to be more parallel/distributed. Since this discussion has been going on for over three decades with little progress in terms of widespread change, one has to ask: is parallel programming just too difficult for most programmers? Are the tools inadequate or perhaps is it that it is very difficult to think about parallel systems? Maybe it is a fundamental human limit. Will we really see progress in the next 10 years that matches the progress of the silicon?"
Implement it, add CPUs, earn billion$. Just Google it.
Aside from my usual lament that people already call themselves programmers when they can fire up Visual Studio, parallelizing your tasks opens quite a few cans of worms. Many things can't be done simultaneously, many side effects can occur if you don't take care, and generally programmers don't really enjoy multithreaded applications, for exactly those reasons.
And often enough, it's far from necessary. Unless you're actually dealing with an application that does a lot of "work", calculating or displaying, preferably simultaneously (games would be one of the few applications that come to my mind), most of the time your application is waiting, either for input from the user or for data from a slow source, like a network or even the internet. The average text processor or database client rarely needs more than the processing power of one core. Modern machines are orders of magnitude faster than anything you usually need.
Generally, we'll have to deal with this issue sooner or later, especially if our systems become more and more overburdened with "features" while the advance of processing speed will not keep up with it. I don't see the overwhelming need for parallel processing within a single application for most programs, though.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I can see this going down in cubicles all through the gaming industry. The game is mostly coming together, the models have been tuned, textures drawn, code is coming together, and the coder goes to the pointy haired boss.
Coder: We need more time to make this game multithreaded!
PHB: Why? Can it run on one core of a X?
Coder: Well I suppose it can but...
PHB: Shove it out the door then.
If Flight Simulator X is any indication (a game that should have been easy to parallelize), this conversation happens all the time and games are launched taking advantage of only one core.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
For this generation of "average" programmers, yes, it's too hard. It's the programming language, stupid. The average programming language has come a remarkably short distance in the last 30 years. Java and Fortran really aren't very different, and neither is well suited to parallelizing programs.
Why isn't there a mass stampede to Erlang or Haskell, languages that address this problem in a serious way? My conclusion is that most programmers are just too dumb to do major mind-bending once they've burned their first couple languages into their ROMs.
Wait for the next generation, or make yourself above average.
The problem with parallel programming is we don't have the right set of primitives. Right now the primitives are threads, mutexes, semaphores, shared memory and queues. This is the machine language of concurrency - it's too primitive for anyone who isn't a genius to write lots of code effectively.
What we need is more advanced primitives. Here are my 2 or 3 top likely suspects:
- Communicating Sequential Processes - CSP. This is close to the programming model behind Erlang - one of the most successful concurrent programming languages available. Writing large, concurrent, robust apps is as simple as 'hello world' in Erlang. There is a whole new way of thinking that is pretty much mind bending. However, it is that new methodology that is key to the concurrency and robustness of the end applications. Be warned, it's functional!
- Highly optimizing functional languages (HOFL) - These are in the proto-phase, and there isn't much available, but I think this will be the key to extremely high performance parallel apps. Erlang is nice but isn't high performance computing, and HOFLs won't be as safe as Erlang. You get one or the other. The basic concept is that most computation in high performance systems is bound up in various loops. A loop is a 'noop' from a semantic point of view. To get efficient highly parallel systems Cray uses loop annotations and special compilers to get more information about loops. In a functional language (such as Haskell) you would use map/fold functions or list comprehensions, both of which convey more semantic meaning to the compiler. The compiler can auto-parallelize a functional map where each individual map computation is not dependent on any other.
- Map-reduce - the paper is elegant and really cool. It seems like this is a half way model between C++ and HOFLs that might tide people over.
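To make the map/fold point above concrete, here is a minimal sketch in Python. The names square and parallel_map_reduce are made up for illustration, and the thread pool is only an assumption standing in for whatever a compiler or runtime would really use. In CPython, threads will not speed up CPU-bound work like this; the point is only that a map over independent elements can be handed to workers without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def parallel_map_reduce(func, combine, items, initial, workers=4):
    """Apply func to every item on a pool of workers, then fold the results.

    Because func has no side effects, the map step may run in any order;
    pool.map still hands the results back in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(func, items))
    # The fold is done sequentially here to keep the sketch short.
    return reduce(combine, mapped, initial)


sum_of_squares = parallel_map_reduce(square, operator.add, range(10), 0)
```

If square touched shared state, the same call would no longer be safe, which is exactly the information a plain loop hides from the compiler.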
In the end, the problem is the abstractions. People will consider threads and mutexes as dangerous and unnecessary as we consider manual memory allocation today.
Parallel programming and construction share one crucial fundamental requirement: Proper communication. But building a house is much easier to grasp. Programming is abstract.
I think part of the problem is, that many programmers tend to be lone wolves, and having to take other people (and their code, processes, and threads) into consideration is a huge psychological hurdle.
Just think about traffic: If everyone were to cooperate and people wouldn't cut lanes and fuck around in general we'd all be better off. But traffic laws are still needed.
I figure what we really need is to develop proper guidelines and "laws" for increased parallelism.
Disclaimer: This is all totally unscientific coming from the top of my head...
.: Max Romantschuk
It is not difficult to justify parallel programming. Ten years ago, it was difficult to justify because most computers had a single processor. Today, dual-core systems are increasingly common, and 8-core PCs are not unheard of. And software developers are already complaining because it's "too hard" to write parallel programs.
Since Intel is already developing processors with around 80 cores, I think that multi-core (i.e. multi-processor) processors are only going to become more common. If software developers intend to write software that can take advantage of current and future processors, they're going to have to deal with parallel programming.
I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represents a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.
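A rough sketch of that dependency-diagram idea, using Python's standard graphlib module (Python 3.9+) in place of a compiler. The function run_in_waves and the recipe steps are hypothetical, and running one "wave" at a time is a simplification: a real scheduler could start a step the moment its own prerequisites finish.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_in_waves(dependencies, action, workers=4):
    """Run every step once all of its prerequisites have finished.

    dependencies maps each step to the set of steps it depends on.
    Returns the list of waves; steps in the same wave were free to
    run concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            list(pool.map(action, ready))  # wait for the whole wave
            sorter.done(*ready)
            waves.append(ready)
    return waves


# Hypothetical algorithm written as "step: prerequisites".
steps = {
    "boil water": set(),
    "chop vegetables": set(),
    "cook pasta": {"boil water"},
    "make sauce": {"chop vegetables"},
    "serve": {"cook pasta", "make sauce"},
}
finished = []
waves = run_in_waves(steps, finished.append)
```

Here the programmer only states the edges; which steps overlap falls out of the graph.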
I'd like to think compilers should be smart enough to assess data dependencies and space those instructions out - we've always had to do this anyway, at least since pipelined processors hit the market - but loops still aren't cascaded properly. An example is a loop that calculates a sum of products: the add instruction must wait for the multiplication instruction to finish, when in fact the processor could be doing a heap of multiplications and using associativity to cut down dataflow problems in the add stage. Spreading the dataflow graph out as much as possible at compile time also helps with cache coherency between many processors.
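The sum-of-products example can be sketched as follows. This Python code runs sequentially and only illustrates the shape of the dependencies; the function names are made up, and integers are used because floating-point addition is not strictly associative, so a compiler cannot always reorder it freely.

```python
def dot_sequential(xs, ys):
    """Straightforward loop: every add depends on the previous add."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_tree(xs, ys):
    """Same result, regrouped using associativity.

    Returns (result, levels), where levels is the number of dependent
    add stages: about log2(n) instead of n - 1.
    """
    terms = [x * y for x, y in zip(xs, ys)]  # all multiplies are independent
    levels = 0
    while len(terms) > 1:
        # Every add within one level is independent of the others.
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])  # odd element carries over unchanged
        terms = paired
        levels += 1
    return (terms[0] if terms else 0), levels


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [8, 7, 6, 5, 4, 3, 2, 1]
result, levels = dot_tree(xs, ys)  # 8 products combined in 3 add levels
```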
I think a program compiled that way would need hardware that will understand that the data dependencies are spread out so that it can distribute instructions among the processors, although the distribution could be very simple if the dependencies could be spread out significantly- instructions could almost be distributed like dealing cards. It's a much finer granularity than threading, but I think more applications suit this sort of parallelism.
Another barrier to parallel programming is accessing [for read] data that should be global to all threads. You can do it by passing pointers to state [messy], using globals [dangerous, obese] or by copying all the data onto the stack of the thread [slow]. Threads need to really share address space- use the same stack and everything, IMHO.
We at slashdot are scientists, specialists and kernel hackers. Your FUD will be found out.
People may or may not process information in a linear fashion, but human brains are, apparently, massively parallel computational devices.
Addressing architecture for Brain-like Massively Parallel Computers
or from a brain-science perspective
Natural and Artificial Parallel Computation
The common tools (Java, C#, C++, Visual Basic) are still primitive for parallel programming. Not much more than semaphores and some basic multi-threading code (start/stop/pause/communicate from one thread to another via common variables). I've made programs, specifically, spiders, that can run 200-1,000 simultaneous threads usefully on a PC. They work ok, as long as the inter-thread coupling is minimized. Until we get enough exposure to parallel systems that we develop new languages to express the solutions, parallel programming will remain accessible to the few, the proud, the geeks. But I don't think it's because of our brain's architecture.
In my experience it's not really that hard, and it really doesn't come down to thinking in parallel except in cases where you are going for really fine-grained parallelism. Coarse-grained parallelism is really easy and can give HUGE benefits to users... a simple example of this is creating a background thread to handle incremental saves... suddenly the app stays responsive as its GUI thread is no longer being stalled by factors beyond its control (i.e. network load).
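A minimal sketch of that background-save idea in Python. BackgroundSaver is a hypothetical helper, not part of any GUI toolkit, and it assumes each snapshot is an immutable copy of the document; handing the worker a live, mutable object would bring back the shared-state problems this pattern is meant to avoid.

```python
import queue
import threading


class BackgroundSaver:
    """Hands snapshots to a worker thread so the caller never blocks on I/O."""

    def __init__(self, write):
        self._write = write  # slow function, e.g. writes to disk or network
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def save(self, snapshot):
        """Called from the GUI thread; returns immediately."""
        self._pending.put(snapshot)

    def close(self):
        """Flush outstanding saves and stop the worker."""
        self._pending.put(None)  # sentinel: no more work
        self._worker.join()

    def _run(self):
        while True:
            snapshot = self._pending.get()
            if snapshot is None:
                break
            self._write(snapshot)


saved = []
saver = BackgroundSaver(saved.append)  # list.append stands in for real I/O
saver.save("draft 1")
saver.save("draft 2")
saver.close()
```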
Under C++ I use the boost threading libraries and they are excellent, allowing me to write once and run on all required platforms and when my Java hat is on it's a snap too because of the superb libraries available.... moral, it is hard if you don't use the right tools or try to split the task along poorly chosen boundaries.
Granted that tasks like large matrix solves and the like are a brute to multi-thread... but fortunately people that are way smarter than me have already done it and I can handle plugging in a library.
Back when i was in graduate school we used to joke .. in the future everything will be monte carlo :)
.. there is more that can be done w/ 'variations of the theme' than is perhaps obvious .. (or perhaps you can rephrase the problem and ask the same question a different way) .. perhaps if you are dreaming like .. what happens if i have a 100,000 processors at my disposal etc
While everything perhaps can't be solved using monte carlo type integration tricks
I've been doing true parallel programming for the better part of 20 years now. I started off writing kernel code on multi-processors and have moved on to writing distributed systems.
Multi-threaded code is hard. Keeping track of locks, race conditions and possible deadlocks is a bitch. Working on projects with multiple programmers passing data across threads is hard (I remember one problem that took days to track down where a programmer passed a pointer to something on his stack across threads. Every now and then by the time the other thread went to read the data it was not what was expected. But most of the time it worked).
At the same time we are passing comments back and forth here on Slashdot between thousands of different processors using a system written in Perl. Why does this work when parallel programming is so hard?
Traditional multi-threaded code places way too much emphasis on synchronization of INSTRUCTION streams, rather than synchronization of data flow. It's like having a bunch of blind cooks in a kitchen and trying to work it so that you can give them instructions so that if they follow the instructions each cook will be in exactly the right place at the right time. They're passing knives and pots of boiling hot soup between them. One misstep and, ouch, that was a carving knife in the ribs.
In contrast, distributed programming typically puts each blind cook in his own area with well defined spots to use his knives that no one else enters and well defined places to put that pot of boiling soup. Often there are queues between cooks so that one cook can work a little faster for a while without messing everything up.
As we move into this era of cheap, ubiquitous parallel chips we're going to have to give up synchronizing instruction streams and start moving to programming models based on data flow. It may be a bit less efficient but it's much easier to code for and much more forgiving of errors.
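The "cooks with queues between them" picture could look something like this in Python. It is a sketch under assumptions: stage and run_pipeline are illustrative names, each stage owns its own work, and the only thing shared between threads is the queue they hand items through.

```python
import queue
import threading

DONE = object()  # sentinel passed down the line when the input is exhausted


def stage(work, inbox, outbox):
    """One 'cook': take an item from inbox, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(work(item))


def run_pipeline(items, *works, capacity=4):
    """Connect the stages with bounded queues and collect the final output."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(works) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


plated = run_pipeline(range(20), lambda x: x + 1, lambda x: x * 2)
```

The bounded queues let one stage run ahead a little without running away, and no stage ever needs a lock of its own.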
I think that what's most likely to happen is we'll see the emergence of a new programming model, which allows us to specify an algorithm in a form resembling a Hasse diagram, where each point represents a step and each edge represents a dependency, so that a compiler can recognize what can and cannot be done in parallel and set up multiple threads of execution (or some similar construct) according to that.
This is more-or-less how functional programming works. You write your program using an XML-like tree syntax. The compiler utilizes the tree to figure out dependencies. See http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-10.html#%25_sec_1.1.5. More parallelism can be drawn out if the interpreter "compiles" as yet unused functions while evaluating others. See the following section.
After all, I am strangely colored.
It is still difficult to justify if you can more easily write more efficient single-threaded apps. What consumer-level apps out there really need more processing power than a single core of a modern CPU can provide? I already understand the enterprise need. In fact, multi-threaded solutions for enterprise and scientific apps are already prevalent, that market having had SMP for a long time.
There are three basic "problems" and to some extent I think they are unsolvable.
First is debugging tools. People are used to having a nice neat step through debugger - press some key to execute the next statement. Because a large part of parallel processing does not conform to that process you have to add in your own debug code (even if using one of the parallel debuggers out there). I see no way around this one - you can not have a deterministic debugger on a non-deterministic system and I see no way for parallel systems to generally be deterministic in the sense of "press F8 to execute next line". This increases cost as you can not simply hire a cheap "code monkey" to do the job.
Second is just general complexity, things like file locks, critical sections, race conditions, etc. Even if you deal with them every single day and have been doing it for the last 20 years, these are still complex issues when something fails. I also see no way around this one, though there have been some really nice advances in recent times and there is still a long way to go before I think we have mostly maxed out the tools. Somebody has to deal with this, and a good deal will eventually be shoved off to the tools, but when that happens figuring out where the error is will be a real bitch (ask anyone who eventually found an esoteric bug in a standard library how much work *that* took). While quite solvable, this also increases cost.
Lastly you have Amdahl's law - basically you can not parallelize everything, and the achievable speedup drops off quickly the more serialization you have. I do not think that the real performance of SMPs or multi-core systems for the average user is going to come in single application performance - you just can not parallelize the typical applications that much. Where the performance comes in is allowing several things to go on at a time: basically that really large system tray many people run will be spread across multiple processors or cores and not degrade performance that much. This really does help, though - I've run two-processor systems at home for years and it is interesting how little processor power I need for games compared to the minimum specs, even when I do not bother paying attention to what is currently running.
So, in short, it's not really too hard; it's a level of complexity that many have done well with in the past and will continue to do so. The issue is that the problem space doesn't map well to parallelization. There is some level of "too hard" if you are asking whether VB-only people, who think that HTML should be listed as a programming language and can not survive without Microsoft's wizards, can do it (note that using VB isn't the bad part - lots and lots of really good programmers use it at work because that is what will sell), but competent programmers/software engineers are quite capable of it. I do not think that the benefits within most single applications are worth the cost, though as an overall system it will help quite a bit.
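For reference, Amdahl's law in its usual form says that if a fraction p of the work can be parallelized across n processors, the overall speedup is 1 / ((1 - p) + p / n). A small Python sketch (the function name and the 90% figure are only illustrative):

```python
def amdahl_speedup(parallel_fraction, processors):
    """Overall speedup when only part of a program can run in parallel.

    parallel_fraction: share of the single-core run time that parallelizes
                       (0.0 to 1.0).
    processors:        number of cores the parallel part is spread over.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Illustrative numbers: a program that is 90% parallelizable.
for cores in (2, 8, 80):
    print(cores, "cores:", round(amdahl_speedup(0.9, cores), 2))
```

Even with unlimited cores, the 10% serial part caps this example at a 10x speedup, which is the "you just can not parallelize the typical applications that much" point.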
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
Most programmers have difficulty thinking about recursive processes as well, but there are still some who don't and we still have use for them. I should say "us", as I make many other programmers batty by using recursion frequently. Programmers tell me all the time that they find recursion difficult - difficult to write, difficult to trace, difficult to understand, difficult to debug. Conversely, I find it easier - all I have to do is reduce the problem to its simplest form and determine the end case, and a tiny snip of code will do where a huge mess of iterative code would otherwise have been required. So, I don't understand why anyone would want to write iterative code when recursion can solve the problem.
I suspect that parallel programming may be similar - some programmers will "get it", others won't. Those who "get it" will find it fun and easy and be unable to understand why everyone else finds it hard.
Also, most development tools were created with a single processor system in mind: IDEs for parallel programming are a new-ish concept and there are few. As more are developed we'll learn about how the computer can best help the programmer to create code for a parallel system, and the whole process can become more efficient. Or maybe automated entirely; at least in some cases, if the code can be effectively profiled the computer may be able to determine how to parallelize it and the programmer may not have to worry about it. So, I think it's premature to argue about whether parallel programming is hard or not - it's different, but until we have taken the time to further develop the relevant tools, we won't know if it's really hard or not.
And of course, for a lot of tasks it simply won't *matter* - anything with a live user sitting there, for example, only has to be fast enough that the person perceives it as being instantaneous. Any faster than that is essentially useless. So, for anything that has states requiring user input, there is a "fast enough" beyond which we need not bother optimizing unless we're just trying to speed up the system as a whole, and that sort of optimization is usually done at the compiler level. It is only for software requiring unusually large amounts of computation or for systems which have been abstracted to the point of being massively inefficient beneath the surface that the fastest possible computing speed is really required, and those are the sorts of systems to which specialist programmers could be applied.
Parallel programming is NOT too hard. Yes, it's harder than a single-threaded approach sometimes, but in my experience the problem usually maps more naturally into a multi-threaded paradigm.
The real problem is this: I've seen time and again that most companies do not require, recognise, or reward high technical skill, experience and ability; instead they favour minimal cost and fast product delivery over quality.
Also it seems most Human Resources staff/agents don't have the necessary skills to actually identify skilled Software Developers compared to useless Software Developers that have a few matching buzzwords on their resume, because they themselves don't understand enough to ask the right questions so resort to resume-keyword matching.
The consequence is that the whole notion of Software Development being a skilled profession is being undermined and devalued. This is allowing a vast amount of people to be employed as Software Developers that don't have the natural ability and/or proper training to do the job.
To those people, parallel programming IS hard. To anyone with some natural ability, the proper understanding of the issues (which you get from, say, a BS Computer Science degree) and a naturally rigorous approach, no, it really isn't.
Tom Leonard, a programmer from Valve, gave a fascinating talk about this at GDC this year, about retrofitting multicore support into Half-Life 2 (specifically, into the Source Engine, which powers Half-Life 2). Not surprisingly, this talk was named "Dragged Kicking and Screaming" ...
There was a lot of really good wisdom in there, whether you are writing a game or something else that needs to get every possible performance boost.
I'm sure they probably drew from 20+ years worth of whitepapers (and some newer ones about "lock-free" mutexes, see chapter 1.1 of "Game Programming Gems 6"), but what I walked away from the talk with was the question: "why the hell didn't _i_ think of that?"
There were several techniques they used that, once you built a framework to support it, made parallelizing tasks dirt simple. A lot of it involves putting specific jobs onto queues and letting worker threads pick them up when they are idle, and being able to assign specific jobs to specific cores to protect your investment in CPU cache.
Most of the rest of the work is building things that don't need a result immediately, and trying to build things that can be processed without having to compete for various pieces of state...sometimes easier said than done, sure. But after hearing his talk, I was of the opinion that while parallelism is always more complex than single-threaded code, doing this well is something most developers aren't even _thinking_ about yet. In most cases, we're not even at the point where we can talk about _languages_ and _tools_, since we aren't even using the ones we have well.
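The "jobs on queues, picked up by idle workers" pattern described above can be sketched with Python's standard thread pool. This is not Valve's code or anything from the talk; update_particles and run_frame are hypothetical stand-ins, and pinning jobs to specific cores is left out because it is platform-specific.

```python
from concurrent.futures import ThreadPoolExecutor


def update_particles(count):
    """Hypothetical job: stands in for any self-contained chunk of frame work."""
    return sum(i * i for i in range(count))


def run_frame(pool, job_sizes):
    # Queue every job up front; idle workers pick them up as they become free.
    futures = [pool.submit(update_particles, size) for size in job_sizes]
    # ... the caller is free to do other work here ...
    # Results are only collected at the point they are actually needed.
    return [future.result() for future in futures]


with ThreadPoolExecutor(max_workers=4) as pool:
    frame_results = run_frame(pool, [10, 100, 1000])
```

The jobs share no state and nobody asks for a result until it is needed, which are the two properties the comment says most of the real work goes into.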
--ryan.
Don't say, "don't quote me," because if no one quotes you, you probably haven't said a thing worth saying.
We've had a steady stream of multi-core guests here at Stanford lecturing on the horror and panic in the industry about multi-core, and how the world is ending. I've seen Intel guys practically shit themselves on stage, they are so terrified we can't find them a new killer multi-core app in time. That's BS. OK so it's an hour lecture, maybe two if you're not that fast. Parallel/SMP is not that hard, you just have to be careful and follow one rule, and it's not even a hard rule. There are even software tools to check you followed the rule!
But that's not the problem...
The problem is, a multi-year old desktop PC is still doing IM, email, web surfing, Excel, facebook/myspace, and even video playing fast enough a new one won't "feel" any faster once you load up all the software, not one bit. For everyone but the hardcore momma's basement dwelling gamers, the PC on your home/work desk is already fast enough. All the killer apps are now considered low-CPU, bandwidth is the problem.
Now sure, I use 8-core systems at the lab, and sit on top of the 250k-node Folding@home so it sounds like I've lost my mind, but you know what, us super/distributed computing science geeks are still 0% of the computer market if you round. (and we'll always need every last TFLOP of computation on earth, and still need more)
That's it. Simple. Sweet. And a REAL crisis - but only to the bottom line. The market has nowhere to go but lower wattage lower cost systems which means lower profits. Ouch.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
could be to add a specifically parallel iterator keyword to programming languages, i.e.:
for-every a in list; do something(a); done;
The compiler then assumes that something(a) can be safely executed in parallel for every element in the list. It's not rocket science; it could lead to parallel spaghetti, but it is a straightforward method to promote parallel programming.
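Something close to the proposed for-every can already be approximated with a library call. The Python sketch below is only an approximation: for_every and something are made-up names, and, like the proposed keyword, the helper simply trusts the caller that the iterations are independent.

```python
from concurrent.futures import ThreadPoolExecutor


def for_every(items, body, workers=4):
    """Run body(item) for each item, possibly concurrently, and wait for all.

    Nothing here checks that the calls are independent of each other;
    that promise is the caller's responsibility.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from body.
        list(pool.map(body, items))


lengths = {}


def something(word):
    lengths[word] = len(word)  # each call writes a different key


for_every(["alpha", "beta", "gamma"], something)
```

If something() updated a shared counter instead of its own key, this would be the "parallel spaghetti" case.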
What's in a sig?
I can eat, talk and think at the same time, all are pretty conscious actions
True, but can you talk (perhaps reciting something from memory) at the same time you are listening to something? Even if it's not a volume issue; you're wearing headphones say.
Feynman has a chapter in "What Do You Care What Other People Think?" where he talks about some informal experimentation he did where he tried to figure out what he could do at the same time as accurately timing out a minute. Essentially, the same time as counting. He found that (1) he could be very consistent about timing out a given time, and that (2) he could do most things while counting. But what he couldn't do is talk. On discussion with other people in his dorm/frat/house/whatever, there was another person who could talk, but couldn't read while timing things out. Turns out that the reason it differed was because they counted differently; Feynman was hearing "one, two, three, ..." while this other guy was watching the numbers pass in front of his eyes.
Activities are localized in the brain; it seems that these areas are largely independent, but try two tasks that use the same area and you're SOL.
How would a string of Chinese characters like beads on a string be any better than a string of alphanumeric characters like beads on a string? Many (most?) Chinese words are written using multiple characters, so a language that had a special use for lone characters would end up looking a lot like Fortran 77 anyway. (What does the FSSCRD function do?)
I think you're reasoning from some notion about the way "Oriental" people think differently from "Western" people. I tend to doubt that idea based on the kinds of people who have enthusiastically pushed it: originally imperialist racists of various groups, each bent on proving the superiority of their group, and more recently western PC racists who compulsively idealize everything non-Western. Despite the taint of racism, the idea may have some basis in fact as well -- there is ongoing research that occasionally manages to produce some evidence for it -- but the sad fact is that we have only been able to create programming languages that express a very tiny subset of the way "Western" people supposedly think anyway. The problem is not a lack of nonlinear, context-sensitive ways of thinking; the problem is that before we can use a given way of thinking to communicate with a computer, we must essentially enable the computer to think the same way. If you buy into the western PC version of the dualism, "Oriental = nonlinear, inclusive, sensitive, flexible, context-sensitive; Western = linear, exclusive, autistic, rigid, blinkered," then digital computers are quintessentially Western beings that cannot be made to appreciate Eastern ways of thinking, at least not without a few more decades of AI research and performance improvements.
The Chinese government might very well be hard at work creating a quintessentially Chinese programming language, but it's a bad idea to pin your hopes on political science. It tends to suck. On top of that, many excellent programming languages have been doomed by much smaller barriers to entry than learning an entirely new system of writing. On top of that, your 2D array of characters is doomed by the multitudes of multicharacter words in Chinese. To add yet more on top of that, another poster just pointed out that the idea that it expresses has already been expressed in other languages using ASCII.
I wouldn't have bothered piling on you like this if your post didn't strike me as racist. The commonly accepted story about the differences between Eastern and Western ways of thinking is propagated by uninformed repetition. Chinese, Americans, left-wingers, right-wingers, everybody has learned to love it and interpret it to flatter their side, so they all repeat it in unison. It pollutes the discourse. Wouldn't it be nice if everyone who didn't have firsthand experience just shut the hell up? Then we might hear something different from the standard story that gets passed around like a centuries-old fruitcake. Or we might hear the same thing, but then at least it would mean something.
Don't forget that the makers of applications like that will probably see a multi-core version as something they can sell at a premium.
So they first sell versions for a single CPU, then when the market for multi-core versions is sufficiently large they start selling new versions and upgrades.
It is good to keep the clients waiting for some time, because they will feel that their special needs have been catered for. If they put out two different boxes at a time when almost nobody is interested yet, the negative feelings around "why does the multi-core version cost more" will hurt them too much.
Perhaps having a Chinese character represent a simple block of pre-compiled code that does one simple thing. Then the characters could be placed in two-dimensional order to form parallel threads. This would require a completely different approach to compiler development. But that would be OK because compilers are stuck in the 1970s anyway.
I see neither a connection between the choice of character set and parallel programming, nor anything in your post that is beyond trivial to do in ASCII:
- For vertical side-by-side alignment, check your editor prefs in your IDE. If it is important that the source code itself be laid out horizontally like drapes, two dimensional languages in ASCII have already been tried and you can look into what happened with those.
- Typically many of these threads are running the same code, so a separate column for each thread is of dubious value.
- Even single threaded code forks around way too much to be visually representable by a column of glyphs.
- Why not just use English words? They exist in comparable numbers to Chinese characters.
- Existing compilers already convert methods into compressed representations of code that are basically equivalent to anything you're suggesting.
- What's this about the 1970s? Compilers have a lot of optimizations now including the ability to recognize code written by dumb or lazy coders. If you're complaining about language design that's not the fault of the compiler which is just doing its job for a given language.
Just getting away from the idea of having code based on a very limited set of alphanumeric characters strung together like beads on a string might help unlock a whole new era of innovative approaches to parallel program development strategies.
It reminds me of the time I was on here arguing with a guy who said files would be easier to compress if they were converted to Chinese characters first, because then the file would be shorter. Exploding the set of characters, and having individual characters represent chunks of pre-compiled code as you're suggesting, would basically be like having thousands of reserved words and global functions. It sounds like hell the more I think about it. The actual character encoding system with left-right top-bottom semantics doesn't really get in the way of anything at a fundamental level.
It does go without saying that Chinese programmers would have an incredible advantage in any new system of programming that is based on Chinese characters.
Aha- here we see the motive behind the evil subversive plan for global compilation dominance...
It may be possible that this subject is already a project under development in China.
Maybe someone is working on such a thing- who knows? Why even bother? There are more English speakers in China than in the U.S. And you don't have to know much of a natural language to just code in a programming language that draws on it for its symbols. It's not like asking for a taxi or interviewing for a job in that natural language. Ruby was invented in Japan and is ASCII based, and the Chinese have an easy time picking it up because they're already used to ASCII languages and there are no Hiragana characters in Ruby itself AFAIK, not even in variable names.
Sure. Short-term we could learn to do a lot of simple tasks better in parallel. Drawing the circle and square at the same time is hard, but it gets a lot easier even with just a few hours of practice.
Longer term, we'd *evolve* better handling of parallelism if it gave us significant survival benefits (well, really reproductive benefits, but you get the idea).
When we *do* manage many things at a time, it is mostly by practicing them to the point where as much as possible about them becomes automatic, "muscle memory" (which it isn't really, but at least it's subconscious).
A trained driver can:
- Change gears (manual)
- Operate blinkers
- Turn wheel
- Adjust speed (gas or brake)
- Check mirrors
- Judge intentions, speed, curve of other traffic
- Observe yield-signs
More or less all simultaneously. But that is only possible because so much of it is automatic. The newbie driver thinks "I'm in second, third is *there*, revs are high, better shift, clutch in, gas out, shift, touch of gas, declutch"; he may even be able to do it smoothly without the car jumping after a few tries. But he *won't* be able to have his full attention elsewhere while doing it.
The trained driver thinks at most "shift", oftentimes not even that. He is rarely even consciously aware of what gear he is in; the entire sequence of steps required to shift smoothly has been internalised as a single action, and even the invoking of that action is semi-automatic. (I realize many Americans drive an automatic; I'm assuming a driver used to manual here.)
What I'm saying is that, drawing a (rough!) circle is very easy for an adult. Drawing a rough rectangle is very easy too. So you'd expect to be able to do both *without* having to train it. Certainly any one of them taxes much less than half your mental capacity. Only that ain't so.
If you continue to depend on C and C++ for doing this kind of work, you can continue to expect these kinds of results.
Ada95 is a -much better- choice for this kind of programming, with a rich set of concurrency primitives for both synchronization (rendezvous) and avoidance (protected objects), integrated into both a strong typing system and an object-oriented approach. And there's a good compiler in the GNU compiler family.
But programming language research seems to have been abandoned over the last 20 years, as if C++ and Java represented the final solution.
C/C++ and even Java thread models are truly appalling, and they were known to be bug-ridden back in the 90s when they were being standardized. I remember during a POSIX meeting, the people working on the Ada binding detected a race condition anomaly in the C/Pthreads specification. Ted Baker sketched the problem out in terms of Ada tasks and we went back, looked at the C & text, and verified Ted's observation. We called in some representatives from the C/Pthreads group, and it took them about 30 minutes just to work out a 'language'/notation to reason about their own standard. After that 30 minute discussion, they came back and looked at our work, and said "Geez, you're right!".
Another area where we need better tools is the debugging of concurrent, parallel and distributed programs, preferably starting with analytical tools that prevent, or at least detect before runtime, these kinds of problems. Debugging should be an embarrassment to a professional programmer, as it's basically an admission that you don't know what your code is actually doing.
I'm no fan of "model-driven architecture" in general, but model-driven approaches, coupled with the right kind of source code analysis tools, have real promise here. (Look at how Microsoft, a company I rarely praise, has used this for checking out device drivers...)
dave
because it can make you jump to conclusions regarding how to conceptualize computation. You can end up all too easily with objects and interactions between these objects that are hard to parallelize. With Antiobjects we try to move from the computational foreground into the background. This way one can implement, for instance, game AI, by mapping computation onto dozens, hundreds or thousands of CPU/cores with very little overhead.
Check out the programming models offered by Terracotta (www.terracottatech.com) and Hiperware (www.hiperware.com). Both showed up at JavaOne. They're offering Java-based programming models that allow developers to do this. Terracotta's model transports Java messages across CPUs/computers by transparently making changes to bytecode. Hiperware provides a programming interface that allows object information to flow transparently between objects assigned to different CPUs/computers.
I remember when I was around 10 discovering that this wasn't how procedure calls worked in normal languages, and being disappointed. While I was playing with the Objective-C runtime, I realised there was no reason this had to be the case, and added it. It works on both Apple and GNU runtimes.
Of course, if you use a language like Erlang, then you get this kind of semantic natively, and it's trivial to write hugely-parallel code.
[1] That's 'method you call' for people more familiar with cargo-cult OO languages.
I am TheRaven on Soylent News
Threads (multiple threads) can actually simplify the software rather than making it more complex. Or, rather, simplify writing the software.
Ok, so maybe I am an old hand at multi-threaded programming (over 20 years of doing this stuff) but it just makes sense for so many things. Past examples that the general public may have seen are things I wrote on the Amiga, machines with interrupts, Apache, etc. (Imagine writing a high-volume web server without multiple threads!)
There was a hockey game on the Amiga that had each player on the ice having its own thread and its own AI behavior. The code became so much easier than the traditional multiple interlocking state machines that such simulation games had to run that, in order to port it to the PC, they ended up writing their own multi-threading environment that ran on top of DOS.
Personally, I think that too many people are scared away from multi-threaded programming because of the "horror stories" and a few "old guard" that don't feel comfortable in such an environment. For me, many problems are so much easier in the multi-threaded solution space that I rarely think of non-multi-threading. (Well, co-routines are sometimes required when the platform does not support true multi-threading)
I was using a potential answer to this in 1990. I was working for a small company in the Provo/Orem area, called Computer System Architects, which was selling Transputer hardware. For those who haven't heard of Transputers, they were small, 16- or 32-bit processors, with a small amount of built-in RAM (not a cache; this was actually in the memory map and you could do small tasks on a Transputer without any external RAM), 2-4 high-speed serial channels (easily implemented with 4 wires) and a stack-based architecture. Adding megabytes of external RAM was easy, and it was embarrassingly easy to connect up networks of these things, even on one board (in a single ISA slot), and build clusters. An external card cage, in those days, could hold 20 slots, which would hold up to 80 Transputers, using our products.
I did some Assembly and some C, but the kicker language for this chip was called Occam II. Among other things, it used the indentation in the code to determine block structure. A quick example:
PAR
  step A
  step B
  step C
  SEQ
    step D
    step E
In this example, steps A, B and C would all be executed in parallel with another task which ran step D then step E. If you had one Transputer in your machine, it would multi-task. If you had multiple CPU's available, it would spread the task across the CPU's.
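For readers who have never seen Occam, the PAR/SEQ structure above can be roughly imitated in Python. This is only a sketch: the helper names (step, seq, par, run_example) are made up for illustration, and CPython threads interleave rather than spread across CPUs the way Occam processes did on a Transputer network.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

log = []                      # order in which steps actually ran
log_lock = threading.Lock()   # protects the shared log


def step(name):
    """Hypothetical stand-in for one unit of work; it just records its name."""
    with log_lock:
        log.append(name)


def seq(*names):
    """SEQ: run the named steps one after another, in the order given."""
    for name in names:
        step(name)


def par(*tasks):
    """PAR: start every task, then wait until all of them have finished."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        for future in futures:
            future.result()   # re-raises any exception from the task


def run_example():
    """A, B, C and the (D then E) sequence all run as parallel branches."""
    log.clear()
    par(
        lambda: step("A"),
        lambda: step("B"),
        lambda: step("C"),
        lambda: seq("D", "E"),
    )
    return list(log)
```

The only ordering this guarantees is that D finishes before E starts; A, B and C may land anywhere relative to them, which is the same promise the Occam block makes.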
It also has a basic construct called a Channel. These were very easy to set up and use. These were how the different tasks communicated with each other.
It was not difficult to spawn thousands of tasks, each one doing a relatively small part of an overall task, with full communication and synchronization. Again, if you had multiple CPU's available, it would spread the tasks across them. A board with multiple Transputers was usually doing ray-tracing or rendering Mandelbrot fractals as a demo anytime we went to a trade or tech show. They could knock it down to one processor, and things got done relatively quickly. Then, they'd kick in 4 or 16 CPU's and blow people's minds.
This was in 1990. A 386DX-33 was high-end, back then. The Transputer didn't run DOS or Windows, so it didn't survive in the market of the time. That was a shame; I benchmarked a variety of them, then ran identical benchmarks on various other machines as technology marched on. A T805 running at 30 MHz (the top-end Transputer I ever got to play with) blasted through mixed integer/floating-point calculations about as fast as a 486DX2-66 (which didn't come on the market for another couple of years). There was an occasion where I had 16 of those T805's sitting in my machine. You'd need a Pentium II to be able to match that occasion. It was well over a decade later that the P-II became available.
Cool tech, but the programming tools were what allowed you to really use the parallelization. It was typical to achieve over 95% linear speedup (i.e. 20 CPU's gave real-world 19x performance); sometimes we went over 99%. Most Intel SMP machines are lucky if they give 80% linear speedup (4 CPU's = 3.2x total performance).
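The percentages quoted here are just measured speedup divided by CPU count. A tiny sketch of that arithmetic, with a function name (parallel_efficiency) that is my own and not from any library:

```python
def parallel_efficiency(speedup, cpus):
    """Fraction of ideal linear speedup achieved: measured speedup / CPU count."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return speedup / cpus


# The two figures quoted in the post.
transputer = parallel_efficiency(19.0, 20)   # 20 CPUs giving 19x
intel_smp = parallel_efficiency(3.2, 4)      # 4 CPUs giving 3.2x
print(f"Transputer: {transputer:.0%}, Intel SMP: {intel_smp:.0%}")
```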
... by the Dew of Mountains the thoughts acquire speed, the hands acquire shakes, the shakes become a warning
System of Systems is a philosophy that should look a helluva lot like the Unix Philosophy because... it is the same philosophy... and people do write code for these environments. If you want Parallel Programming to succeed make it work as systems of systems interacting to create a complex whole. Create environments where independent "objects" or code units can interact with each other asynchronously.
Otherwise, as discussed in TFA there are certain problems that just don't parallelize. However, there are whole classes of algorithms that aren't developed in modern Computer Science because such stuff is too radical a departure from classical mathematics. Consider HTA as a computing model. It is just too far away from traditional programming.
Parallel programming is just alien to traditional procedural declarative programming models. One solution is to abandon traditional programming languages and/or models as inadequate to represent more complex parallel problems. Another is to isolate procedural code into units that message each other asynchronously. Another is to create a system of systems... or a combination of these.
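To make the "isolated code units messaging each other asynchronously" option concrete, here is a minimal Python sketch using asyncio. The names (producer, doubler, mailbox, run) are illustrative assumptions, and asyncio on its own gives concurrency on one core, not true parallelism; the point is only the shape of the design, where neither unit calls the other directly.

```python
import asyncio


async def producer(outbox, items):
    """One independent unit: sends messages and never calls the consumer."""
    for item in items:
        await outbox.put(item)
    await outbox.put(None)    # sentinel: no more messages


async def doubler(inbox, results):
    """Another independent unit: reacts to whatever arrives in its mailbox."""
    while True:
        message = await inbox.get()
        if message is None:
            break
        results.append(message * 2)


async def main(items):
    mailbox = asyncio.Queue()
    results = []
    # Both units run concurrently and interact only through the mailbox.
    await asyncio.gather(producer(mailbox, items), doubler(mailbox, results))
    return results


def run(items):
    return asyncio.run(main(items))
```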
If there is a limitation it is in the declarative sequential paradigm and that limitation is partially addressed by all the latest work in asynchronous 'net technologies. These distributed paradigms still haven't filtered through the whole industry.
[signature]
Lack of skill and training in the work force (including the understanding that there are no general solutions)
You may be shocked then to discover that a substantial percentage of the lecturers and professors in computer science programs at American (and probably foreign as well) universities have little or no *practical* experience in programming large multithreaded applications as well (they know what it is of course and they wave their hands while describing it but the programming details are often left as an exercise for the student to figure out on his own). Is it any wonder that their students, while familiar with the concept, also lack that experience or even practical training? The exceptions are professors whose area of interest or research involves operating systems, real time systems, or linear programming, whereas many other disciplines in computer science, being more theoretical in nature, do not rely on practical multithreaded programming abilities to advance the state of the art (or if it does occasionally come up then there are always a couple of grad students around who can do that grunt work for you in exchange for saying nice things about their research assistance on their PhD evaluations).
Lack of mature development tools to ease the development process (debugging tools especially)
Modern professional IDEs, such as Visual Studio, have really improved in this area (especially in the last five years with the introduction of the .NET Framework), although the demand (the explicit demand anyway), relative to the many other types of development project that occur in the business world, remains quite low. If a particular programmer is able and willing, and if the project allows for this type of solution, then he may choose a multithreaded approach to solve the problem, and if he succeeds then the management never knows that the application is multithreaded (most of them do not understand the concept in any more detail than some vague notion of multitasking anyway). It is rare for multithreading to be an explicit project requirement, especially in the more common types of business applications. Of course, if the project fails and it is found that multithreading is involved then the programmer might take some heat, so why improve efficiency at the risk of project failure when you can do something that you know will work even if it doesn't hit the theoretical maximum efficiency? That last 10% of performance is frequently too expensive for all but the most critical applications (i.e. those that specify it as a project requirement).
The *real* question is this: if you need someone with a high level of skill in parallel programming, are you willing to pay a substantial premium for that person over a more average programmer, or do you just expect every programmer who works professionally to have those skills? In the same way that not every lawyer is a constitutional scholar, not every programmer is a multithreading, operating system kernel hacking, real time Jedi programming master. If you want (and need) master Yoda, then you have to be willing to pay him what he is worth.
By this logic, it seems like maybe we are not designing our multi-core processors effectively. It may be more effective if we designed different cores for different tasks, much like our brains. If you could describe to a programmer 4 cores on a processor and that each core was really only good at ONE type of processing, then that programmer would logically break up his design into pieces for each core to do individually. Even if they had to be done in serial, each core may process its piece faster than a general CPU and thus make the whole process faster. If you were running other programs at the same time that were also serial, they may not be asking for the same cores all the time either.
I realize this is over simplified and would probably have lots of problems in practice. However I know that there are plans to combine CPU and GPU cores for very much the same reasons, they are good in some areas and bad in others.
Having written MPI and Linda work-alike libraries, developed parallel and distributed systems, and being a heavy user of threads and remote procedure calls (via xml-rpc), I will say that the general issue that all of these different systems try to handle is one of API: how does one have data X handled by function Y in thread/process Z? In the realm of threading, we use queues with locks. MPI/PVM/Linda/XML-RPC all use sockets to pass data between processes, and all but Linda require that you specify the destination process explicitly.
One interesting recent development is the processing package for Python: http://www.python.org/pypi/processing/ . The idea is that you create processes like you would threads, then use shared queues, key-value mappings, etc., to pass data between the processes like you would threads. By sticking with generalizations that are seen to be 'easy' to understand and use (primarily queues), they bypass the majority of the difficulty people have when using threads and processes. Of course it doesn't hurt that the data transfer is pretty fast.
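As a rough sketch of the "processes that look like threads, talking through shared queues" idea, here is a version using the standard-library multiprocessing module, which grew out of the processing package. The worker and parallel_squares names are made up for illustration, and the explicit "fork" start method is an assumption that limits this sketch to Unix-like systems.

```python
import multiprocessing

# Assumption: a Unix-like system. The "fork" start method does not exist on Windows.
ctx = multiprocessing.get_context("fork")


def worker(tasks, results):
    """Pull numbers off the shared task queue until a None sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put((n, n * n))


def parallel_squares(numbers, workers=2):
    """Square each distinct number in a pool of processes fed by a shared queue."""
    numbers = list(numbers)
    tasks = ctx.Queue()
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(None)       # one sentinel per worker
    # Drain the results before joining so no worker blocks on a full pipe.
    squares = dict(results.get() for _ in numbers)
    for p in procs:
        p.join()
    return squares
```

Apart from the Process and Queue constructors, this reads almost exactly like the threaded version would, which is the appeal the post is describing.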
The only limiting factors with this particular implementation are that it relies on fork in *nix, and on the transfer of code as data in Windows (which isn't generally possible for all languages, especially with the NX processor extension). If Windows had a usable fork, and if there was a 'fork across machines' (think mosix), many of these discussions would result in "just program it like you had a thread and use shared queues and the processing package for your language".
Yeah, I've been doing tutorials and bought the beta PDF edition of Joe Armstrong's book, but at one point I stopped to go back and do The Little Schemer to refresh my ability to frame everything in terms of recursion on lists. Haven't gotten back to Erlang yet, but I've been keeping an eye out for something useful to do with it as a first non-trivial application. You'd think it'd be easy for a web-dev geek to come up with problems that are well-suited to the pseudo-functional concurrent style, but it's harder than it looks :)
I've also been playing off and on with Scala, which is a pretty neat little language; it runs on the JVM (and has seamless importing of Java classes, which is quite nice), but has first-class and higher-order functions, anonymous functions, pattern matching (including pattern matching on classes, which is wicked cool), list comprehensions and the ability to use either Java's Thread API or its own Actor-based concurrency model. It's definitely worth looking at if you've been enjoying Erlang.
Not off the top of my head, but I'm also somewhat sidetracked by an ongoing effort to pull the news industry off Java and onto Python :)
How much time have you spent writing Lisp? I ask because I was initially uncomfortable with s-expressions (as are many people) but after a certain amount of time something in my brain flipped, and now I'm much *more* comfortable with them. Part of it had to do with the fact that I'm drawn to writing code in the functional style (imperative code in Lisp *is* uncomfortable) and part was just getting my brain to see the structure in the parens.
From what we know about the brain it takes advantage of many different techniques but I wouldn't necessarily agree that it is pipelined. Pipelining is a parallelization technique for making a serial set of operations execute faster through a processor by changing how the processor executes each instruction. When the brain processes information, it is basically a giant circuit that does all the processing for vision only, nothing else--break anything in the chain and the "vision" capability is either hindered or stopped.
The brain has two separate pathways that "vision" data takes: one path determines things like dimensional properties (as in how far away a cup is from you) while the other path determines identification (is what you're looking at a cup?). There have been documented cases where one pathway was lesioned and the patient could identify the object but could not physically reach out and grab it, while if the other pathway was broken the patient could physically catch a ball thrown at them but could not recognize what object was thrown until they used their other senses (like touch) to determine it.
Each processing unit in the brain does some sort of processing on the data rather than contributing to make the final product. When you "see" an image, you are actually interpreting many different senses from your vision pathways. The only processing that might potentially be shared (as far as I know) is the processing done by the retina, like edge detection. All of the other areas are very specialized at what they do and are only designed to do that one task. That's way different from pipelining, where each unit is reused for different instructions, giving the CPU its general-purpose traits.
I'd say multiple processors/cores is more how the brain operates but does not necessarily model how "we" think. What I mean by "we" is that each person has a prefrontal cortex in their brain, which is where their personality lives and where their actual "thinking" as a person goes on. The rest of the brain is specialized to do other tasks. Some of those tasks are controlling emotions, labeling and highlighting memories, vision processing, motor skills, and so on. For example, when you first learned to write, you were probably intensely utilizing your prefrontal cortex to write letters. That's because the motor section of your brain specializing in hand-finger motions for writing the letter had not been trained yet. However, as time goes on, you no longer have to think in order to write the letter; in fact you could probably close your eyes and start writing. No prefrontal cortex processing is needed because the motor section now has it programmed in.
So when you're thinking, you're really just utilizing your prefrontal cortex, but if the operation or task you're performing is repetitive and can be memorized, at some point that task will be programmed into a specialized unit in your brain, leading you to require less or no thinking at all. I'd say the brain itself is designed as a massively parallel unit with multiple areas that can be trained to perform specialized operations.
Now, you may be thinking: if each section of the brain can be trained to do something, can it be retrained to do other things? The answer is yes. There have also been cases where, for some reason, people lost the need to utilize one portion of their brain but started utilizing that portion to do other things. For example, say a person lost their arms; well, now the motor section for their arms is completely useless in their brain. But, because they are now forced to use their feet and legs in ways they hadn't before, eventually the parts of the brain trained to do arm and hand motions will gradually be utilized to learn and control feet and leg movements. In fact, if you were to map out the amount of brain space used for each portion of the body, you would get a huge proportion devoted to things like hands and mouth where other portions of the body would utilize an insanely small portion of th
Well, that statement makes a gross assumption. Every software developer (programmer/engineer) thinks that their particular domain is representative of what computer programmers do in general. However, if you in fact write software that automates common business tasks, the software is directly analogous to the process it seeks to model and/or replace. Most business tasks are sequential, so procedural single-threaded programming is a perfectly fine model to use.
Yes, for the things that most casual users and computer scientists think are "interesting," there is some inherent level of parallelism that can get pretty high. For a few choice types of tasks (ray tracing, rendering certain fractals), the problem itself is embarrassingly parallel because there is no direct coupling between the solutions of sub-problems. However, at some level you can't decompose your problem any further, and you can extract no more parallelism.
The vast majority of computer software these days is written for businesses. Most of it is not the stuff you see on the shelf at the local computer store, but is written custom to solve a specific business need. Most business processes are inherently serial operations: you perform one step, then you perform another step that depends on the previous step. Occasionally, you get lucky and you discover multiple steps that can be done simultaneously. However, making such processes explicitly parallel might not be the most advantageous move; after all, most modern CPU architectures are adept at out-of-order execution, and can analyze instruction streams to figure out dependencies dynamically. Heck, most modern CPU architectures support multiple in-flight instructions, and allow multiple instructions to complete simultaneously, which extracts parallelism at a very low level.
You can still explicitly use this technique in modern programming languages; Java, for example, has Thread.join() which is used for precisely such situations. However, just because you can do something doesn't mean that you should; there is overhead associated with spawning threads of execution and synchronizing threads when some result is needed. If the computation being performed by a function call is long-lived, then it makes sense to spawn a thread to perform that computation -- assuming that there are sufficient computational resources to truly run that thread concurrently (e.g., another CPU core able to perform that computation). Otherwise, you're burning more computational resources and probably making your code actually slower in the process (due to management overhead). And if you're multitasking on a single CPU core, spawning another thread will almost certainly result in a slower-running program (because you still have all the overhead of managing another thread, but none of the benefit of true hardware-level concurrency).
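A minimal sketch of the spawn-then-join pattern described here, written in Python rather than Java (threading.Thread.join plays the role of Java's Thread.join()). The function names slow_sum and fork_join and the workload are invented for illustration; for work this small the thread overhead outweighs any gain, which is exactly the caveat above.

```python
import threading


def slow_sum(n, out):
    """Stand-in for a long-lived computation; stores its result in a shared dict."""
    out["total"] = sum(range(n))


def fork_join(n):
    """Spawn the computation, do other work meanwhile, then join to collect it."""
    out = {}
    worker = threading.Thread(target=slow_sum, args=(n, out))
    worker.start()                # the spawned thread starts running here
    local = sum(range(10))        # the caller does unrelated work in the meantime
    worker.join()                 # block until the spawned thread has finished
    return out["total"] + local   # safe to read: join() guarantees completion
```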
The first sentence is really an unsupported conjecture. The second sentence is an attempt to provide anecdotal evidence drawn from your own experiences to support the conjecture in the first. My own life experience is vastly different from yours, but then again, that too is anecdotal evidence; neither your nor my personal experiences are really "proof" o