Threads Considered Harmful
LBR9 writes "James Reinders compares native threads with the goto statement so famously denounced 40 years ago by Edsger Dijkstra. Paraphrasing Dijkstra, he says they both 'make a mess of a program,' and then argues in favor of a higher level of abstraction. A couple of people commenting on the post question whether or not we should even be treading into the 'swamp of parallelism,' echoing the view recently espoused by Donald Knuth."
Alright, then all responses to this article need to fall under this one post.
Because really, multithreading doesn't have to be hard
Cretin - a powerful and flexible CD reencoder
I use threads a fair amount, because they are there. But I kinda wish path expressions would catch on. Let the compiler sort out the scheduling given the constraints - that's the kind of scut work computers are good for anyway.
PHEM - party like it's 1997-2003!
I'm all for getting rid of threads, but what are you going to replace them with? Traditional functional languages may be the most obvious solution, but they're also among the most impractical of solutions. Is there anything else out there that can replace threading needs, without throwing out the book on programming? It seems like what we need hasn't been invented yet.
Gotos and global variables are not inherently wrong or evil. They are tools. Granted, they are tools that, if misused, will wreak havoc on your code's stability and maintainability. The same could be said, however, for pointers. Threads are dangerous, and require special care. This is not a reason to avoid them; it is only a reason to be incredibly careful with them.
Use the best tool for the job, regardless of whether your CS professors demonized it or not.
Threads have been considered a "bad idea" by the CompSci profession for a little while now. So there is definitely nothing new about the author's statements. That being said, there is a fundamental difference between Dijkstra's paper 40 years ago and this summary: Dijkstra started his paper by holding up examples of better practices. Only after establishing their existence did he go on to suggest that the GOTO keyword was "too primitive" to be of practical use in software development.
The author of this "article" (and I use the term loosely) doesn't really present such options. He hand waves a few work-in-progress solutions at the end, compares threads to GOTO statements, then asks the readers to fill in the (rather sizable) blanks.
Long story short, it's a good topic of discussion, but the comparison to Dijkstra's famous paper is just an advertising point. Nothing more, nothing less.
Javascript + Nintendo DSi = DSiCade
The problem is not threads per se, but the way they are generally used in programming languages like C and C++. Although const correctness is understood by some C++ programmers, they appear to be a minority, judging by the code I regularly review. There is also memory management, which is a much bigger issue in threaded C/C++ applications than in applications written in Java. The Java library provides good examples of immutable classes, most prominently the String class, that remove a number of problems often encountered with their mutable cousins like std::string. Unlike std::string, I don't have to remember to make it immutable by constifying it or wrapping it. The presence of immutable classes, and the more adequate coverage given to them along with threading in Java textbooks, means that I disagree with the article's author, who lumps Java threads in with pthreads as a bad thing. What we need is more coverage of threading issues and how to alleviate them in intermediate-level C/C++ textbooks, because despite the fact that threading is not built into those languages or their standard libraries, concurrency has become too important to ignore once you go beyond the basics.
You know the big difference between TFA and Edsger Dijkstra's paper?
The second one made an argument, showed alternatives that were at least summarily demonstrated to be better, and used reasoning.
The first one just says "Edsger Dijkstra's paper said goto was harmful and he ended up being right, thus if I say threads are harmful, I'm also right. Oh and here are some threading libraries I've found in a quick google search, they might be better."
Well, for starters, there's processes, which were invented in the 1960s. These may not handle every case, but in my experience they'd cover 95+%...
"Not an actor, but he plays one on TV."
The article's author explicitly mentions Erlang as a potential solution to threading issues in other languages. In fact he's mainly concerned about POSIX pthreads, Boost threads, Java threads (and presumably Windows low-level thread libraries). As I point out in another post below, I disagree with him lumping Java threads in with those used in most C/C++ libraries, as threading support is integrated into the language along with increasingly sophisticated locking support in the library which can be used if the simple object lock is insufficient. In my experience, most data shared across threads is immutable (read only), and the Java libraries encourage use of immutable types such as String. Once you appreciate the value of immutable types, then they can be used just as easily in C++ (with C it's a little harder). Writable shared data can be cleanly hidden behind a decent interface, with the locking within the getters and setters, but again, this approach is applicable in C++ as well as Java.
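For anyone who wants to see the shape of that last point, here is a minimal sketch in Python rather than Java or C++. The names `Quote`, `QuoteBoard` and `publish` are made up for illustration and come from nowhere in the article. The immutable value is passed around freely, and the only mutable state sits behind accessors that take the lock themselves.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical immutable value: safe to hand to any number of threads."""
    symbol: str
    price: float


class QuoteBoard:
    """Hypothetical writable shared state; all locking lives in the accessors."""

    def __init__(self):
        self._lock = threading.Lock()
        self._quotes = {}

    def set_quote(self, quote):
        # Callers never see the lock; they just call the setter.
        with self._lock:
            self._quotes[quote.symbol] = quote

    def get_quote(self, symbol):
        with self._lock:
            return self._quotes.get(symbol)


def publish(board, symbol, prices):
    # Each writer thread publishes a series of prices for one symbol.
    for price in prices:
        board.set_quote(Quote(symbol, price))


board = QuoteBoard()
writers = [
    threading.Thread(target=publish, args=(board, "AAA", [1.0, 2.0, 3.0])),
    threading.Thread(target=publish, args=(board, "BBB", [10.0, 20.0])),
]
for t in writers:
    t.start()
for t in writers:
    t.join()
```

Because each symbol has a single writer here, the last price published for it is what readers see after the joins. Trying to assign to a field of a `Quote` raises an error instead of silently mutating shared data.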
"The Problem with Threads", by Edward Lee of the EECS department at Berkeley: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html. Worth reading if you work with threads.
Those people who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)
The problem is that programmers are generally untrained in them or trained very poorly.
Writing a safe threaded application is not a difficult task, but it is a different task than writing a single-threaded app. And unfortunately CS programs, books, tutorials, etc., still train people in the single-thread mindset, and yes, the programs they produce end up being buggy.
And I'm not sure these 'high abstraction' languages are really the 'answer'. I have found that often in higher level solutions the results become even less predictable and tracing what is actually happening when becomes either extremely difficult, extremely inefficient, or just back to the single-thread mentality.
I think the OP talking about how one might be next writing a parallel app shows the real flaw here... the author is going from one mentality, entering another without really thinking it through, and then complaining when old methods don't work well. Take a programmer who STARTED in parallel space and you don't run into these problems.
Complete crap. Threads solve a number of programming problems much more elegantly than forked processes and sharing data through some IPC mechanisms. Anecdote time: a stock price system I worked on. The first generation used separate processes for a single writer and a large number of readers, with shared memory for interprocess communication. This was switched to a threaded implementation for the second generation, which was faster, even though it was using the old LinuxThreads implementation, and more easily maintained as the pthreads API is much richer than IPC ones.
The problem with Java concurrency and threading is, all the locks are advisory. The synchronize statement is a nice bit of syntax, and making it apply to whole blocks of code was the Right Thing to do.
:)
The problem simply comes in that a program is not obligated to *use* synchronize, or any locking, when it accesses objects. Which means the code is totally unsuitable for integrating into a multithreaded program. And trying to backport thread-safety in is (currently) too difficult, as there are no tools to tell you when you've got it right.
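The "advisory" point is easy to demonstrate. The sketch below is Python, not Java, but `threading.Lock` is advisory in the same sense: holding it stops nobody who does not bother to ask for it. The names `balance`, `balance_lock` and `rogue_writer` are invented for the example.

```python
import threading

balance = {"value": 100}
balance_lock = threading.Lock()  # protects `balance`, but only by convention


def rogue_writer():
    # Never acquires balance_lock, and nothing in the runtime objects.
    balance["value"] = -1


with balance_lock:
    # We hold the lock for this whole block, yet the write below still lands.
    t = threading.Thread(target=rogue_writer)
    t.start()
    t.join()
    seen_while_locked = balance["value"]
```

After the join, `seen_while_locked` holds the rogue thread's value even though the lock was held the entire time, which is exactly why code that skips the locking convention cannot be dropped safely into a multithreaded program.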
I haven't studied Erlang yet, but threads (or more generally concurrency) done securely would require mandatory locking of all data, except when you know two threads must share data and sync between each other. Ada95 seems to understand this, although it probably needs refinement.
Which makes threads functionally equivalent to UNIX processes & shared memory
Practice Kind Randomness and Beautiful Acts of Nonsense.
The problem is not threads per se, but the way they are generally used in programming languages like C and C++
Right. C and C++ provide zero help in dealing with the isolation issues of threading. The languages have no concept of parallelism (there's "volatile", but that's about it.) There were 1980s languages that did offer some help, such as Modula I/II/III, Ada, and Occam. Java has some minimal concurrency support, although it's not well thought out.
There's nothing wrong with multithreaded programming, but some help from the language would be nice. Major issues with threads are "which thread owns what", "which locks lock what data", and "what can I safely call concurrently". C and C++ do not help here. They push the problem off to the operating system, which has no idea what to do about thread level data ownership.
At the operating system level, we're not doing too well either. One good way to write concurrent programs is with multiple intercommunicating processes. Unfortunately, the Unix/POSIX/Linux mechanisms for interprocess communication are awful. You have byte-oriented pipes, sockets, and the seldom used "System V IPC" mechanism. None of these let you do something like an inter-process subroutine call. Subroutine call mechanisms built on top of these stream-like channels tend to be slow, clunky things like CORBA and SOAP. Windows does a bit better, but their fast approach is a legacy from OLE, which was designed for Windows 3.1 on DOS, and their slower approaches are more like SOAP. QNX has usable interprocess messaging, but few non-real-time systems are designed for QNX.
I've been writing heavily concurrent programs with threads since the 1970s. It's possible to do it well, but the tools have not improved much.
I may have misunderstood (I'm not exactly an expert in threading), but I believe that Erlang handles this in a scarily elegant manner... once assigned, a variable cannot be changed.
The = operator in Erlang should be looked at in the mathematical sense, so the following would fail (Erlang variable names start with an uppercase letter):
A = 2.
A = 1 + 3.
Because 1+3 != 2, the second line is a failed match, not a reassignment.
(Disclaimer: I've briefly dabbled in Erlang, but anything I say about it should be taken with several rocks of salt)
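For those who don't have Erlang handy, the bind-once behaviour described in this comment can be imitated in Python. This is only a sketch of the idea: `SingleAssignment` and its `match` method are hypothetical names, and real Erlang does this in the language itself, not through a helper class.

```python
class SingleAssignment:
    """Hypothetical helper mimicking an Erlang-style bind-once variable."""

    _UNBOUND = object()

    def __init__(self):
        self._value = self._UNBOUND

    def match(self, value):
        # First use binds; later uses succeed only if the value is equal.
        if self._value is self._UNBOUND:
            self._value = value
        elif self._value != value:
            raise ValueError(
                f"no match: bound to {self._value!r}, got {value!r}"
            )
        return self._value


a = SingleAssignment()
a.match(2)          # binds a to 2
a.match(1 + 1)      # fine: 1 + 1 equals the bound value
try:
    a.match(1 + 3)  # fails: 4 does not equal the bound value
    outcome = "matched"
except ValueError as err:
    outcome = str(err)
```

Since a bound value can never change, a reader in another thread never needs a lock to look at it, which is the property the comment is pointing at.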
Unix's select/poll mechanism avoids all that. See, e.g., here.
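A rough idea of what that looks like, using Python's `select` module as a thin wrapper over the system call. The socket pairs below are stand-ins for real client connections and are assumptions made for the example. One thread asks the OS which descriptors are ready and services only those, so there is no shared state to lock.

```python
import select
import socket

# Two connected pairs stand in for two client connections (illustrative only).
left, right = socket.socketpair()
idle_a, idle_b = socket.socketpair()

# Only one of the two "clients" has sent anything.
right.sendall(b"tick")

# A single thread waits (up to one second) for any watched socket to be readable.
readable, _, _ = select.select([left, idle_a], [], [], 1.0)
messages = [sock.recv(1024) for sock in readable]

for sock in (left, right, idle_a, idle_b):
    sock.close()
```

Only the socket with pending data comes back in `readable`. The idle one is never touched, so the loop never blocks on it.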
I use them routinely on MS platforms. Background threads for write-behind mechanisms, for self-tuning caches, for animation. The sharing between threads is the more precise problem, not threads themselves. If one knows how to examine the context of a thread, one can see all shared points and code accordingly. This is no different than knowing what pieces of data are eventually exposed as public data of a component.
That said, there is a clumsy set of constructs around threading still. Most modern languages do not have the atomic test-and-flip operation around an object as you wish. For example, in the C# realm, I see this routinely: ... I'd much more appreciate the OS supporting a thread-level operation that allowed for ... where the above clause was skipped if (sharedMemInitialized==true), and if not, it waited for the "sem" semaphore concept to be unlatched.
In Erlang, variables aren't variable -- they're single assignment. (There is a process dictionary that is mutable, but it isn't usually used and other threads don't have access to it). Inter-thread communication is done via message passing (which may be local or over tcp/ip).
Do you even lift?
These aren't the 'roids you're looking for.
I know nobody born in the last 30 years has bothered to read his memo, but he doesn't pretend gotos are "evil". He just argues that people should adopt structured control flow constructs instead. Meaning, design and use languages with such advanced features as "if/else" statements, and "while" loops, and "functions". "Go To Statement Considered Harmful" was written in a time when most people were not using the fancy new languages that offered these features, and he was suggesting that they do so, in order to improve the quality of their code.
Unless you seriously think people should use gotos instead of loops and if/else statements, then you don't disagree with Dijkstra.
I'm an aerospace engineer, but write code 40-60% of my workday. Essentially between a FORTRAN/C++ compiler and Excel, those are my design tools. In the last 4 years (I'm relatively fresh out of college), I've used GOTO once. And you are right, it's a useful tool in very specific situations, in FORTRAN. However in C++ with object-oriented programming, I have yet to see a case, in the work I do anyways, where it would make something more concise.
The Actor model, where each object is a separate thread, is the way to the future. When an actor sends a message to another actor, the message is stored in the target actor's message queue and the thread that represents the target actor is woken up to process the message. Results are delivered with future values.
With the Actor model, whatever data parallelization is there in a program is automatically exposed.
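A bare-bones sketch of the mechanism described above, in Python: one thread per actor, a mailbox queue, and replies delivered through futures. `Actor`, `send` and `stop` are hypothetical names for this example. This is not any particular actor library, and a real one would add supervision, addressing and so on.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Hypothetical minimal actor: one thread draining one mailbox."""

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Enqueue the message and hand back a future for the eventual reply.
        reply = Future()
        self._mailbox.put((message, reply))
        return reply

    def stop(self):
        # A None sentinel tells the actor thread to finish up.
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        # The blocking get() is what wakes the actor when mail arrives.
        while True:
            item = self._mailbox.get()
            if item is None:
                break
            message, reply = item
            try:
                reply.set_result(self._handler(message))
            except Exception as exc:
                reply.set_exception(exc)


squarer = Actor(lambda n: n * n)
futures = [squarer.send(n) for n in range(5)]      # all sends return at once
results = [f.result(timeout=5) for f in futures]   # block only when we need values
squarer.stop()
```

The sender never touches the actor's state directly; everything goes through the mailbox, so the handler itself needs no locks.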
Functional programming is hard, non-intuitive and even plain distasteful to me. Now I know I'm an idiot, but the problem is most programmers are idiots. The language has to make parallelism easy for us, and if it starts out being functional it's already lost that battle.
Q) Why did the multithreaded chicken cross the road?
A) to To other the side. get the
Education is the silver bullet.
I'm tired of reading replies to this article that evangelize some fancy-schmancy high-level solution. I wonder if these advocates have ever tried writing production code in such an environment.
Let me give you a wonderful example of when theory simply doesn't meet reality.
Recently, I wrote a bunch of multi-threaded code for a next-generation asymmetric-multiprocessing game console that shall remain nameless. Its operating system has a wonderful complement of synchronization features. There's the usual mutex lock/unlock, and the usual condition signal/wait, but there are also event queues (queues of generic events that can be passed between threads running on different types of processors), lightweight mutexes/conditions, spinlocks, semaphores, reader/writer locks, and so on and so on. Truly a rich palette from which one can paint a wonderfully synchronized multi-threaded application! I then proceeded to try to rewrite a key section of our code in a very multi-threaded way.
The problem was, the first version of this code added NINETY milliseconds per frame to our main thread. A profile showed that nearly all of the extra time was spent in the operating system's synchronization features.
After much rewriting and much pain, I stopped using all of the operating system's synchronization features, and used processor-level atomic operations instead, and finally, the extra code accounted for only FOUR milliseconds per frame in our main thread (with the rest of the time successfully farmed out to separate threads).
I challenge anyone with a fancy-schmancy automatic concurrency solution to demonstrate that it doesn't have this problem.
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters