What Makes Parallel Programming Difficult?
An anonymous reader writes "Intel's Aater Suleman writes about why parallel programming is difficult. ... I was unaware ... that a major challenge in multi-threaded programming lies in optimizing parallel programs not just getting them to run. His analysis is insightful and the case study is very enlightening if you are unfamiliar with parallel code debugging."
Motherboards rarely have parallel ports these days!
"You're just not thinking fourth-dimensionally!"
"Right, right, I have a real problem with that."
Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming. Sure, you can do parallel programming in C or Ruby or Python, but you need to be very careful about side effects. And the compiler doesn't know enough, so you need to manage all that shit yourself. Analogy time: If you were on the prowl for some vagina, you could try a woman, a ladyboy, or a man. Which one would you choose?
There is a class of problems, the P complete problems, which are (probably) inherently hard to multithread in any advantageous way, and this class includes some pretty important real-world problems:
http://en.wikipedia.org/wiki/P-complete
PLC code for automated mechanical systems runs into all of the same timing issues. Obviously the PLC code is much smaller, mostly because the instructions act on smaller data sets, but it can still be just as complicated.
I am kinda curious how anyone even tangentially involved in programming could not be aware that the hard part of parallel programming is doing it for a gain in efficiency. Making a thread or process is generally just a couple of lines of code, and synchronization with data separation, mutexes, and avoiding deadlocks and race conditions has been solved since almost the beginning of parallelism.
Good tutorial for someone who wants to jump into some parallel programming, but it's mostly Operating Systems 101 (or 601).
Honestly though, if you have not optimized your algorithm or code for parallelism and you want to do it now, you would probably be better off writing the whole thing from scratch, and the tutorial explains why very nicely.
Not all tasks are suited for parallel programming. Some are and can be highly optimized, but if you have something that simply can't be reduced further then parallel programming will not solve your problems.
Jebus in a sidecar, Slashdot; do you think we've all never heard of parallel programming before? OpenMP has been around for ten years now. Yeah, you can't write a for loop in C and expect magical parallelization. Seriously, is there ANYONE here who doesn't know this already?
If you're trying to dumb down your article content to the point where it annoys any vaguely experienced programmers... well, you're succeeding.
This hasn't been news for 30+ years. Parallel programming is hard for a multitude of well-known technical reasons; this is nothing new. Another major reason is that the human brain is no good at parallel programming; again, nothing new. I say let's either try to solve the problems or move on.
In my opinion, many problems with software development are just as applicable in other domains of our life, and parallel programming is definitely one of them. We equally well have problems managing large teams of people working in parallel. These are problems of logistics, management and also (and no, I am not joking) cooking. And we're as bad at handling these as we are at handling software development. It may be right, however, to start solving this with the computers - no need to throw away rotten food and/or burned-out employees under delusional management.
What could help is a trusted set of formal theories. That would be a start. Whether it will be a mathematician, a computer programmer, or a kitchen chef that writes this set, is not important. Stripped bare of the bathwater - abstracted and proven - the baby will be what we need.
That you have to do everything all at once. How would you tell 50 kids to sort 50 toy cars? How would you tell 50 footballers to line up by height all at once? How would you have 50 editors edit the 50 pages of a screenplay all at once so that it makes sense from a continuity perspective? All these problems are very easy, but become very hard when you have to do it all at once...
Are you kidding?
A PLC runs the same code over and over again in an endless loop. This is a language that does not even have _ANY_ sort of mutexes, semaphores, or threads. Hell, most PLCs don't even have local variables, or even function arguments!
Many PLCs let you configure interrupt handlers, but there is no way to synchronize them with the main loop.
I bet there are hundreds of simple tutorials on concurrent processing on the web, and these two pieces bring nothing new/interesting.
Well, we just learned about this in a graduate comp arch course and yeah, it can get hairy. Especially if using a processor consistency model as opposed to the sequential consistency model. The easiest fix is to throw up some fence instructions around the interdependent code sections to force sequential consistency, and then lay some flags as a means of time signaling between the processors. This eliminates the randomness that the author discussed in one of the examples.
Mutexes are just bits that are turned on or off. Semaphores are integers. Local variables are memory locations that aren't used by the rest of the program (or the rest of the program may use them freely and their value need not be retained from scan to scan). Function arguments are just memory locations that are written by the rest of the program before calling the subroutine, which reads them and uses their values.
The only of those that I actually haven't used in a PLC program are semaphores.
I think the language Chapel being developed by Cray is taking concrete steps to make Parallel Programming easier. It is very similar in syntax to C++ and even supports some higher level constructs.
This looks like an advertisement for Chapel but I have no relation to Cray. Having taken a graduate parallel programming course, I cannot agree more with the statement that "Parallel Programming is difficult". I struggled a lot with pthreads and MPI before doing the final assignment in Chapel which was a pleasant surprise. The difference between serial and parallel code was FIVE characters only - and it gave a near linear speedup on three different machines.
No wait - I take that back; I've used semaphores too.
20 years ago when I was working with transputers we used Occam. It was a very pure parallel programming language and it wasn't too difficult. However, writing parallel code meant starting from scratch (getting rid of the dusty decks of old algorithms, as my professor described it). That never really happened, and we've ended up with primitive parallelisation nailed on to sequential code. There are many parallel architectures (SIMD, MIMD, distributed memory, shared memory, and combinations of them all) and none of the languages currently available really suits them. You have basic multithreading and MPI bolted on to old sequential languages, and developers taking algorithms which absolutely must proceed in order and trying to find sections they can perform in parallel. This isn't going to get you good performance or scalability. In Occam, we wrote procedures which were independent of each other and communicated over channels, and all would be running at once. You wired them together, poured your data in at one end, and results came out at the other. Due to the very fine-grained parallelism it was extremely scalable: I had code running on 80+ transputers at over 90% efficiency in 1990, and it would have run happily on many more if I had had them. Yet it also ran perfectly on one, since serialisation of parallel programs is easy whereas parallelising serial code is very hard and inefficient. I find it sad that the current state of parallel programming is still so far behind where we were 20 years ago due to all this legacy stuff. Even when people start out with new algorithms, they still start with serial processes and then try to parallelise, rather than considering the parallelism from the outset and designing the algorithm so it doesn't require everything in memory and doesn't produce different results when the number of CPUs changes.
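The "independent procedures wired together by channels" style can be approximated outside Occam. What follows is only a rough Python sketch, not Occam semantics: `queue.Queue` stands in for a channel, a thread stands in for a process, and the names `stage`, `run_pipeline` and the `DONE` sentinel are made up for the illustration. CPython threads will not give real CPU parallelism here; the point is the shape of the program.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker passed down the channels


def stage(func, inbox, outbox):
    """One independent process: read from one channel, write to the next."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(funcs, data):
    """Wire the stages together, pour data in one end, collect the other end."""
    channels = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, channels[i], channels[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in data:
        channels[0].put(item)
    channels[0].put(DONE)

    results = []
    while True:
        item = channels[-1].get()
        if item is DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


pipeline_output = run_pipeline([lambda x: x + 1, lambda x: x * 2], range(5))
print(pipeline_output)
```

No stage touches another stage's data; everything they share travels over a channel, which is why the same wiring works with one worker or many.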
"I have the attention span of a strobe lit goldfish, please get to the point quickly!"
I'll take a stab and reply to you that "8 processors was enough for anyone", in the sense that multiplexing 8 programs is just insane. Better to just run 8 programs each on their own core, and use some progs that can use 4 cores at a time. That leaves 4 free.
(Overly simplistic) I agree, but 1024 cores is not the answer. We need the next generation in raw core power to move computing forward. 8 killer cores will beat 1024 mediocre cores.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
"The news of the problems of parallelization is still news after 30 years"
- paraphrase of Christian saying
Isn't this what Hoare formally described in "Communicating Sequential Processes"?
...I have found that the strategy chosen for coordinating activity among and communicating between running bits of software on a parallel system usually causes most of the problems. Some blessed lead guy or (god forbid) manager will declare, "We are going to use ____" [threads | json over web services | CORBA | EJB | enterprise service bus | ...] and the project is stuck with that paradigm come hell or high water. Believe it or not, all parallel computing problems are not created equal. And it is often not until an engineering team has spent time puzzling out the data and process interactions before the best way to set up the solution becomes clear. Parallel systems magnify the liabilities of poor and/or rigid technology choices.
Also, the herd mentality of layering stacks of frameworks atop other stacks of virtual machines is a recipe for disappointment. For truly performance-critical applications you want to physicalize, not virtualize. You want to stop treating a cluster of boxes as a general-purpose computing resource and start treating it as a custom-crafted parallel appliance tailored for your problem domain. You want to get your knuckles dirty with hardware characteristics and then tailor your logic and data structures such that they can be pinned to NUMA nodes (for example) and utilize the caches and locally-attached RAM for the highest benefit.
Less dependence upon stacks of frameworks, more understanding your problem and tailoring a hardware and software solution to fit it more efficiently.
Using locks and the like makes it very easy to write multithreaded and parallel programs.
The big problem comes when you need multiple locks because you find your program is waiting more on locks than anything else which is gumming up the whole works, and that can easily lead to deadlocks and other fun stuff.
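One common way to keep multiple locks from deadlocking is to always take them in the same global order. A minimal Python sketch, assuming a made-up `Account` class with a `rank` field used purely for ordering (none of these names come from the thread):

```python
import threading


class Account:
    """Hypothetical resource with its own lock and a fixed rank for ordering."""
    _next_rank = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.rank = Account._next_rank
        Account._next_rank += 1


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would hang
    # Always lock the lower-ranked account first, whichever direction the
    # money moves, so two opposite transfers cannot each hold one lock
    # while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.rank)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


a, b = Account(100), Account(100)
movers = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
movers += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)]
for t in movers:
    t.start()
for t in movers:
    t.join()
print(a.balance, b.balance)
```

This only removes the deadlock; it does nothing about the other complaint above, that the program may still spend most of its time waiting on locks.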
Another way is to consider lockless algorithms, which don't have such blocking mechanisms. However, then you get into issues where atomicity isn't quite so atomic thanks to memory queues and re-ordering done in the modern CPU, and thus have to start adding memory barriers before doing your atomic exchanges.
Raymond Chen (of Microsoft) did a nice write up of the lockfree ways to do things and what Windows provides to accomplish them.
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/05/10149783.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150261.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/06/10150262.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/07/10150728.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151159.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/08/10151258.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/12/10152296.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/13/10152929.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/14/10153633.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/15/10154245.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/19/10155452.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/20/10156014.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/21/10156539.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2011/04/22/10156894.aspx
Parallel programming is hard. Bummer. I'd rather be interested in an article that talks about *monitoring and debugging* parallel programs (I currently struggle to monitor parallel algorithms implemented in Java). Anybody?
that most of today's popular programming languages do not accommodate higher-level forms of expression required for easy parallelism. Declarative languages have a slight edge at being able to express where sequential dependencies are.
keep dreaming http://www.flowlang.net/
One more reason why functional programming matters. Many programs become trivial to parallelize when you avoid mutation and side-effects outside of limited, carefully-controlled contexts.
It's truly a joy when you can parallelize your code by changing a single character (from "map" to "pmap"). There's usually a little more to it than that, but you almost never have to deal with explicit locks or synchronization.
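A rough Python analogue of the "map" to "pmap" switch, offered as a sketch rather than as the language the comment has in mind. It only works this cleanly because `square` (a made-up example function) has no side effects:

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Pure function: no shared state, so calls can run in any order."""
    return n * n


numbers = list(range(10))

# Sequential version.
serial = list(map(square, numbers))

# Parallel version: same call shape, and results come back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, numbers))

print(serial == parallel)
```

In CPython a thread pool will not speed up CPU-bound work; `ProcessPoolExecutor` exposes the same `map` interface for that case, at the cost of pickling the arguments.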
Side-effects are the pantomime villain of parallel programming.
This has been known since the late fifties.
The fact nobody has given a shit until recently is not surprising, but I don't have much sympathy either.
What makes parallel programming hard is poor languages. Languages that allow state changes and don't keep them isolated. Isolate changes of state, all changes of state and be careful about what kinds of combinators to use. Google map-reduce works whenever
a) You can organize your data into an array
b) You can do computations on array elements in isolation
c) You can do computations on the entire array via associative operations applied pairwise
And most programs do meet those properties but they slip in all sorts of subtle state changes so out of order execution isn't possible. The key to parallelism is better language selection.
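The three conditions can be shown in a few lines. This is a toy sketch of the idea, not Google's MapReduce: `chunked` and `map_reduce` are invented helper names, and the input is assumed to be non-empty. Because the combining operation is associative, the answer does not depend on how the array is split.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_reduce(items, mapper, combine, n_chunks):
    """Map each element in isolation, then combine pairwise.

    `combine` must be associative, otherwise the result would change with
    the chunking. `items` must be non-empty.
    """
    def work(chunk):
        return reduce(combine, map(mapper, chunk))

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(work, chunked(items, n_chunks)))
    return reduce(combine, partials)


data = list(range(1, 101))
total_1 = map_reduce(data, lambda x: x * x, operator.add, 1)
total_8 = map_reduce(data, lambda x: x * x, operator.add, 8)
print(total_1, total_8)
```

Swap `operator.add` for a non-associative operation such as subtraction and the two totals no longer agree, which is exactly the kind of hidden ordering dependency the comment is complaining about.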
The dirty secret of parallel programming is that it's *NOT* so widely needed. I think a lot of academics got funding to study automatic parallelization or other parallel techniques, and they latch on to multicore as a justification for it, but it's not.
There is only one GOOD reason to use multithreading -- because your work is compute-bound. This typically happens on large-data applications like audio/video processing (for which you just call out to libraries that someone else has written), or else on your own large-data problems that have embarrassingly trivial parallelization: e.g.
var results = from c in Customers.AsParallel() where c.OrderStatus == "unfulfilled" select new {Name=c.Name, Cost=c.Cost};
Here, using ParallelLINQ, it's as simple as just sticking in "AsParallel()". The commonest sort of large-data problems don't have any complicated scheduling.
There are also BAD reasons why people have used multithreading, particularly to deal with long-latency operations like network requests. But this is a BAD reason, and you shouldn't use multithreading for it. There are better alternatives, as shown by the Async feature in F#/VB/C# which I worked on (and which was also copied into Javascript with Google's traceur compiler), e.g.
Task<string> task1 = (new WebClient()).DownloadStringTaskAsync("http://a.com");
Task<string> task2 = (new WebClient()).DownloadStringTaskAsync("http://b.com");
Task<string> winner = await Task.WhenAny(task1, task2);
string result = await winner;
Here it kicks off two tasks in parallel. But they are cooperatively multitasked on the same main thread at the "await" points. Therefore there is *NO* issue about race conditions; *NO* need to use semaphores/mutexes/condition-variables. The potential for unwanted interleaving is dramatically reduced.
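The same pattern exists in other languages. A hedged Python sketch using `asyncio`, where `fake_download` is a stand-in that just sleeps (no real network call, and the delays are arbitrary): both tasks run on one thread and only switch at `await`.

```python
import asyncio


async def fake_download(name, delay):
    """Hypothetical stand-in for a network request: it only sleeps."""
    await asyncio.sleep(delay)
    return f"contents of {name}"


async def first_result():
    # Kick off both tasks; they are interleaved cooperatively on one thread,
    # switching only at await points.
    task1 = asyncio.create_task(fake_download("a", 0.2))
    task2 = asyncio.create_task(fake_download("b", 0.01))
    done, pending = await asyncio.wait(
        {task1, task2}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser is no longer needed
    return done.pop().result()


result = asyncio.run(first_result())
print(result)
```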
So in the end, who are the people who still need to develop multithreaded algorithms? There are very few. I think they're just the people who write high-performance multithreaded libraries.
What kind of imbecile posted this? Did he/she think that optimising should be simpler than just getting them to run? Besides, if you are unfamiliar with parallel code debugging, perhaps you are not the best person to judge what is enlightening and what is not. Try selling used cars instead.
...very simple article...was expecting something more technical!
:-P
Anyway, cheers for the effort to write it!
Ps. Looks like he discovered an open secret
The basic problem with parallel programming is that, in most widely used languages, all data is by default shared by all threads. C, C++, and Python all work that way. The usual bug is race conditions.
There have been many languages for parallel programming which don't have default sharing, but they've never taken over outside some narrow niches. Partly because most of them weren't that useful outside their niche.
The other classic problem is that in most shared-data languages with locks, the language doesn't know what the lock is protecting. So you can still code race conditions by accident.
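One partial workaround in a shared-everything language is to bundle the data with its lock, so the only access path goes through the lock. A sketch in Python, where `Guarded` is a made-up class and the protection is by convention only, since Python cannot stop code from keeping a reference after the block ends:

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Hypothetical wrapper: the value is only reachable while the lock is held."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def locked(self):
        with self._lock:
            box = [self._value]   # hand out a one-slot box to read and write
            yield box
            self._value = box[0]  # store whatever the caller left in the box


counter = Guarded(0)


def bump(times):
    for _ in range(times):
        with counter.locked() as box:
            box[0] += 1


bumpers = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in bumpers:
    t.start()
for t in bumpers:
    t.join()

with counter.locked() as box:
    final_count = box[0]
print(final_count)
```

Here the association between lock and data lives in the class rather than in the programmer's head, which is the gap the comment describes; languages that enforce it at compile time go further than this sketch can.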
This is probably why many programmers experience such difficulty programming (efficiently) PLCs.... Lots of very large, very complicated programs are just "the same code running endlessly". The devil is in the inputs.
Erlang (and other functional, single-static-assignment languages) are perfect for parallel programming.
Okay, then the only problem is getting something useful out of Erlang.
Back in 1985 the Japanese government announced a "fifth generation" computing project, with software to be developed in Prolog. So I went and learned Prolog, an intriguing and amusing language. Only problem is, it was totally useless for any actual application, as the Japanese found out.
Sorry, but in order to believe any of the promises of one of those non-vonNeumann languages, I have to see a practical working application first.
The Scala community has tried to move the problem into a more practical realm by adding things like parallel collections, DSL's to abstract out the problem for specific applications and the Akka Project for simpler concurrency.
Most of the parallel programming discussion I've seen is very complicated and not likely to appeal to those who have to do practical day-to-day business projects. By pushing the abstractions up a level, I think the Scala folks have made parallel programming more accessible for the average developer.
Ref: Parallel Collections video.
Seriously. Parallel programming IS NOT hard. I'm tired of people complaining about this; in 90% of cases, it's trivially easy. Like most things in programming, there are some hiccups in a small fraction of what you do, but that's just part of the game. Parallel computing isn't somehow magically more difficult than programming in general. Dear God.
Unless of course we're talking about dealing with terribly written bolt on libraries for circa 1980s programming languages that should not be in use in the modern era. That can be hard -- but that's the fault of old technology being expected to handle new problems and a lack of investment in creating new tools in the form of system-level languages...it has little or nothing to do with the underlying challenges of parallel programming itself.
All hardware guys (myself included) claim that they have been doing concurrent work for a long time. It takes writing a parallel program to understand the challenges. It is the communication latency (a huge issue), non-determinism, caches, the need to deal with legacy code, and the need to make it robust that make it a much, much harder problem. My goal is to make hardware guys see these challenges, so I can say you hit a pet peeve. (I am a hardware guy who has learned software over the years.)
'nuff said.
The problem is moving optimizations to software, which is the dumbest thing I ever heard. Most problems are not parallelizable, and parallelizing would not be needed if Intel made decent processors rather than the Itanium disaster. VLIW is heavily dependent on this. Parallel programming as Intel wants it means having developers make up for the shortcomings by inventing tricks while Intel puts more cache and other things on its chips. We have the Core i series of processors now at least, but still. Why should we learn how to do parallel programming because they lost billions in R&D and have these silly APIs and a market to sell these tools?
Advancements have been made that make that argument obsolete, as putting in more cache while making compilers harder to write is not making huge improvements in performance. It is still old-fashioned branch prediction. So yes, it is useful in some circumstances, but in business all I care about is how fast I can get SQL out to a client/server app or to an intranet app.
I admit I have not written code in 5 years, so I am not the best source, before the professional programmers nail me here. This is just from what I see.
I never would have time to optimize code for as long as this would take. If a tool or process works you move on. How fast the code runs is never as important as how fast the schedule moves.
No, the devil's in the steering committee.
Concurrency
Mutexes and semaphores are a bit more than bits and integers. You have to be able to modify them atomically. If you don't design your PLC correctly, you end up with two paths attempting to modify the same bit at the same time. Without atomicity, you can't know which won.
Government is not reason; it is not eloquent; it is force. Like fire, it is a dangerous servant and a fearful master.
Parallelism is very easy, provided that you don't do it yourself.
Use a pure functional programming language like Haskell that can be automatically parallelized.
Or use a programming language that uses the Active Object Pattern (or the Actor model).
Or do as I do: use C++ to implement message passing, then have one thread per object. Objects don't have public members, they simply communicate by exchanging messages.
In all the above cases, the trick is to avoid state. In Haskell, state is avoided by design; in the Actor Model/Active Object pattern, state is avoided due to isolation: effectively, each thread is like a separate process.
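The "one thread per object, talk only through messages" approach is described for C++ above; the same shape in Python looks roughly like this. `CounterActor` and its message names are invented for the illustration:

```python
import queue
import threading


class CounterActor:
    """Hypothetical active object: private state, one thread, one mailbox."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message, reply_to=None):
        self._inbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._inbox.get()
            if message == "stop":
                return
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)

    def join(self):
        self._thread.join()


def send_many(actor, times):
    for _ in range(times):
        actor.send("increment")


actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 100)) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()

reply = queue.Queue()
actor.send("get", reply_to=reply)
observed = reply.get()
actor.send("stop")
actor.join()
print(observed)
```

Four threads hammer the counter and no lock appears in user code, because the mailbox serializes every change to the actor's state.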
I have a small issue with the message passing. Doesn't it make the barrier of entry higher for the average programmers? Many claim that IBM Cell was harder to code for than the XBOX because of this.
While you're correct from a temporarily practical measure, I disagree in theory. OS theory 20 or more years ago was about one very simple concept.. Keeping all resources utilized. Instead of buying 20 cheap, slow full systems (at a meager $6k each), you can buy 1 $50k machine and time-share it. All your disk-IO will be maximized, all your CPUs will be maximized, network etc. Any given person is running slower, but you're saving money overall.
If I have a single 8 core machine but it's attached to a netapp disk-array of 100 platters over a network, then the latency means that the round trip of a single-threaded program is almost guaranteed to leave platters idle. If, instead I split a problem up into multiple threads / processes (or use async-IO concepts), then each thread can schedule IO and immediately react to IO-completion, thereby turning around and requesting the next random disk block. While async-IO removes the advantage of multiple CPUs, it's MASSIVELY error-prone programming compared to blocking parallel threads/processes.
A given configuration will have its own practical maximum and over-saturation point. And for most disk/network sub-systems, 8 cores TODAY is sufficient. But with appropriate NUMA supported motherboards and cache coherence isolation, it's possible that a thousand-thread application-suite could leverage more than 8 cores efficiently. But I've regularly over-committed 8 core machine farms with 3 to 5 thousand threads and never had responsiveness issues (each thread group (client application) was predominantly IO bound). Here, higher numbers of CPUs allow fewer CPU transfers during rare periods of competing hot CPU sections. If I have 6 hot threads on 4 cores, the CPU context switches leach a measurable amount of user-time. But by going hyper-threading (e.g. doubling the number of context registers), we can reduce the overhead slightly.
Now for HPC, where you have a single problem you're trying to solve quickly/cheaply - I'll admit it's hard to scale up. Cache contention KILLS performance - bringing critical region execution to near DRAM speeds. And unless you have MOESI, even non-contentious shared memory regions run at BUS speeds. You really need copy-on-write and message passing. Of course, not every problem is efficient with copy-on-write algorithms (e.g. sorting), so YMMV. But this, too, was an argument for over-committing: while YOUR problem doesn't divide, you can take the hardware farm and run two separate problems on it. It'll run somewhat slower, but you get nearly double your money's worth in the hardware - lowering costs, and thus reducing the barrier to entry to TRY and solve hard problems with compute farms.
amazon EC anyone?
-Michael
What makes parallel programming hard is computer languages.
Most languages today are actually not designed for parallelism or concurrency simply because most computers for a very long time had only one core. This is why we have threading and locks everywhere. Threads have huge overhead from hundreds of kilobytes to megabytes. That may seem like nothing but ideally for parallelism and concurrency to work you need to be able to create thousands of processes at nearly no cost (hundreds of bytes each). And locks, don't even get me started with that!
Shared mutable state is also a major problem: it makes parallelism very hard, and again current languages make heavy use of it (Singletons).
Anyway, this problem was solved ages ago. Just look into the Actor Model and Erlang to get started; that should pretty much cover it.
A bit is a mutex if your processor has an atomic test and set instruction. Which these days they all do. A semaphore is an integer protected by a mutex. He oversimplified a bit, but if you know what you're doing it is that simple.
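To make "a semaphore is an integer protected by a mutex" concrete, here is a small Python sketch. It assumes the mutex itself (`threading.Lock`) is already atomic, which is exactly the test-and-set guarantee being argued about; `CountingSemaphore` is a made-up class, and a condition variable is added so waiters sleep instead of spinning.

```python
import threading
import time


class CountingSemaphore:
    """Hypothetical semaphore: an integer, a mutex, and a wait queue."""

    def __init__(self, permits):
        self._permits = permits  # the integer
        self._cond = threading.Condition(threading.Lock())  # the mutex

    def acquire(self):
        with self._cond:
            while self._permits == 0:
                self._cond.wait()  # releases the mutex while sleeping
            self._permits -= 1

    def release(self):
        with self._cond:
            self._permits += 1
            self._cond.notify()


sem = CountingSemaphore(2)
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def worker():
    sem.acquire()
    try:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # pretend to use the limited resource
        with stats_lock:
            stats["active"] -= 1
    finally:
        sem.release()


workers = [threading.Thread(target=worker) for _ in range(6)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(stats)
```

Take away the atomicity of the underlying lock and the decrement in `acquire` becomes a race, which is the objection raised earlier in the thread.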
If you don't design your PLC correctly, you end up with two paths attempting to modify the same bit at the same time.
A PLC does in fact run the same code over and over in an endless loop. There are no threads. No other processes. No other code can modify your mutex. This line executes, and the line following this line will follow this line in execution, nothing in-between, no question about it. There is simply no way that anything else can attempt to modify the same bit at the same time.
The only things that can modify that bit are other parts of the program, and since the program executes in order, you know exactly when that will happen.
In the end, what makes it harder is that the things you do in software are usually way more complex than the things you do in hardware. If not for that, that argument would make perfect sense, as hardware also has issues with communication latency, non-determinism, and localized data (not just caches), to an even higher degree than software. And you also must make it robust.
The barrier for learning to program purely functionally is high. But it's a one-time barrier vs. complexity forever. It makes the system simple.
I agree with him, avoid state.
For certain types of problems, the Linda coordination primitives and shared tuple-space make parallel programming much easier. I used the original C-Linda many, many years ago, and IBM's TSpaces for Java more recently. If you're trying to do little bitty actions on lots of data with tight coordination, the overhead is pretty bad. Looking into PyLinda is on my list of things to do...
I figured this all out when I was a kid... share nothing!
Indeed it is extremely difficult to do parallel programming. Back in the 70's we had several groups of programmers working on programs to do so.
Add in the operating system technicalities and the hardware issues (storage serialization must be taken into effect).
The issue which is wicked becomes more difficult when you have multiple cpu's running as well.
When we first fired up our multiprocessor back in the 1970's we opened a can of worms, as we found the programs that "worked" before stopped working. Even one of IBM's products did not cope with multiprocessing when doing parallel programming. IBM did supply a fix quickly, but we had a heck of a time documenting it. One vendor had a long-standing bug in their code and refused to believe that a storage location could change within a single CPU instruction until we proved it (simply, I might add). It really can get dirty if things are not working the way they should (OS wise).
Makefiles specify dependencies and recipes quite tidily. A simple implementation of make is provided in the "AWK book" as an example program- http://www.cs.bell-labs.com/cm/cs/awkbook/
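The reason a makefile-style dependency list helps with parallelism is that it tells you which steps are independent. A short Python sketch using the standard-library `graphlib` module (Python 3.9+); the targets are hypothetical and nothing is actually built, the code only groups targets into batches whose members could run at the same time:

```python
from graphlib import TopologicalSorter

# target -> prerequisites, as a makefile rule would list them (made-up targets)
rules = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c"},
    "util.o": {"util.c"},
    "main.c": set(),
    "util.c": set(),
}


def parallel_batches(graph):
    """Group targets into batches; everything in one batch is independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites already done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(rules)
for batch in batches:
    print(batch)
```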
If that's the case, there is no need for mutex or semaphore constructs because those are only needed for parallelism. Parent was stating that he's used both in PLCs. If that's the case, why? Either he has no clue, or you're wrong. I don't really care either way, just addressing the fact that you can't just implement a mutex as a flag or a semaphore as an integer as stated by parent.
Funny, just went through three different manufacturers' PLC instruction sets and none of them listed test and set or any other kind of discernible atomic instruction. Do you have any references you could point me to? I'd be interested to know for sure.
I've got a crazy question to throw out there. Which do you think is easier? Design a motherboard capable of disguising a 4 core chip as a 1 core chip as far as the software sees it, and then splitting off the workload automatically across all cores
or
make every programmer in the entire world working on any multithreaded application jump through a ridiculous number of hoops and headaches to get their software to run properly.
Now they're both incredibly difficult but I still think it's sort of one sided. Seriously, why has nobody attempted to develop something like this yet? Making life difficult for thousands of programmers compared to making some weird CPU virtualization technique that manages to properly manage threads and not blue screen the OS due to memory sharing violations and then making a ton of money on that exact technology...well I think that sort of spells it out.
Humans don't think well in parallel. The way you normally do parallel programming in a language like verilog-FPGA is using a parallel simulator like modelsim. It visualizes operations and computations that are normally hard to visualize. As far as I know, there is no good analog to modelsim for traditionally procedural languages like C. Until there is, it is going to be hard.
I also think that there is little effort spent trying to teach students to program in parallel. Curriculums might benefit by emphasizing hardware description languages more than they currently do.
Parallel Programming is one of the old Arts which the new "developers" don't know jack about.
Nothing wrong with what's described in the article, but the analysis is still quite naive. If you really want to know why parallel programming is hard, i.e., when you start from sequential programs (which all algorithms are) and want to parallelize them efficiently, you need to go and ask a compiler developer or someone who works on compiler parallelization. There are a large number of complex issues in play here that can best be summarized at a high level with: "Parallelization, for a given algorithm, is hard because dependence analysis is hard" and it has a lot to do with language design. The example in the article illustrates a rather easy-to-address difficulty: the solution to it is privatization which removes false dependences. On another note, an important point missed by the writer on why parallel programs may not provide better performance in many cases is contention for the same amount of memory bandwidth. Unless memory bandwidth catches up with compute power, parallelization will primarily only help compute bound problems.
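A small illustration of privatization, written as a sketch rather than as the article's own example. In the first function every iteration reuses one scratch buffer, so iterations look dependent on each other even though no value flows between them (a false dependence). Giving each iteration its own temporaries removes that and lets the loop be handed to a pool. The function names and data are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def row_ranges_shared(rows):
    """Sequential loop with one scratch buffer reused by every iteration."""
    scratch = [0, 0]  # shared storage: the only thing tying iterations together
    out = []
    for row in rows:
        scratch[0] = min(row)
        scratch[1] = max(row)
        out.append(scratch[1] - scratch[0])
    return out


def row_range_private(row):
    """Same computation with private temporaries: iterations are independent."""
    lo, hi = min(row), max(row)
    return hi - lo


rows = [[3, 9, 4], [10, 2, 7], [5, 5, 5]]
sequential = row_ranges_shared(rows)
with ThreadPoolExecutor(max_workers=3) as pool:
    privatized = list(pool.map(row_range_private, rows))
print(sequential, privatized)
```

Running the first version's loop body concurrently as written would let one iteration overwrite `scratch` while another is still reading it; the second version has nothing to overwrite.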
All operations are atomic. For instance, if you have a line of code that tests a mutex bit, and, if it's unset, it sets it and branches into a subroutine, you don't have to worry about the bit being modified by external code between when you test it and when you set it.
However, since the real processes that are occurring outside the PLC take much, much longer than the few milliseconds that it takes to execute a single scan of the PLC program, most of the time the PLC is actually just running multiple subroutines which are all waiting for inputs to change on the PLC (maybe a level reaches a setpoint, or a valve reaches its open or closed limit) so that they can move to the next stage of their execution. As a result you have most of the same issues that are caused by multi-threading with respect to concurrency, preventing deadlock, etc.
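A toy simulation of that scan-loop model, in Python rather than any real PLC language, may make the argument easier to follow. Everything here is invented for illustration: two step-sequences ("fill" and "drain") compete for one pump, a plain boolean acts as the mutex bit, and "waiting for an input" is modeled as counting scans.

```python
def make_sequence(name, hold_scans):
    """One step-sequence: wants the pump, holds it for some scans, then is done."""
    return {"name": name, "step": "WANT", "remaining": hold_scans}


def run_sequence(seq, memory, log, scan):
    if seq["step"] == "WANT":
        # Test and set: nothing else can run between these two lines,
        # because the scan executes the whole program strictly in order.
        if not memory["pump_busy"]:
            memory["pump_busy"] = True
            seq["step"] = "HOLD"
            log.append((scan, seq["name"], "acquired"))
    elif seq["step"] == "HOLD":
        seq["remaining"] -= 1  # stand-in for waiting on a real input
        if seq["remaining"] == 0:
            memory["pump_busy"] = False
            seq["step"] = "DONE"
            log.append((scan, seq["name"], "released"))


def run_plc(sequences, max_scans=100):
    memory = {"pump_busy": False}  # the mutex bit lives in ordinary memory
    log = []
    for scan in range(max_scans):  # the endless loop, capped for the demo
        for seq in sequences:
            run_sequence(seq, memory, log, scan)
        if all(s["step"] == "DONE" for s in sequences):
            break
    return log


log = run_plc([make_sequence("fill", 3), make_sequence("drain", 2)])
for entry in log:
    print(entry)
```

The bit never needs an atomic instruction because only one line runs at a time, yet "drain" still has to wait several scans for "fill" to let go, so ordering and deadlock concerns remain even without threads.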
Not really, at least in languages where messages are indistinguishable from normal function calls.
Not sure what you mean. Are you thinking RPC?