Fundamentals Of Multithreading
Bob Moore writes "SystemLogic has got
a very thorough article on multithreading. Deals with Amdalh's Law, Latencies and Bandwidth, On-Chip Multiprocessing, Course-Grained Multithreading, Fine-Grained Multithreading, Simultaneous Multithreading, and Applications Of Multithreading. This is definately a good one."
Well, the article's been /.ed.
Having done a moderate amount of multi-threaded programming, may main complaint with the designs I've seen is that they often equate a conceptual task with a thread. This results in apps that bloat into fifty thread when all they need is two. Indeed, lots of apps don't multiple threading at all. Threads are costly in terms of the time spent switch tasks and in terms of the mental energy required to consider the different paths of execution.
Joe Solbrig
WON'T SOMEBODY PLEASE THINK OF THE CHILD PROCESSES!!!!
I am the owner of SystemLogic.net, just to let you know, we officially got /.'ed. I'm trying to log in to the site now to take off a couple scripts in order to lighten the load but I can't get in.
Just letting you guys know I'm working on it.
Strictly speaking, you are right - multiple threads doesn't increase the maximum capabilities of your computer. However, I hope you realize that in practice having multiple threads is absolutely essential to get the best performance. Without multiple threads, the only way the second sentence above holds true is if you have complete control of the hardware at the lowest level - your application is the OS - or your application NEVER has any processing it could easily do while blocked waiting for I/O operations to complete, or is completely CPU intensive and never blocks in the first place. For typical applications, though, especially anything interactive, it is simply IMPOSSIBLE to achieve the same kind of throughput with a single threaded architecture as what you would get with an appropriately designed multithreaded one.
I do agree with you that a lot of developers often use threads as if they are each going to be executed on their own CPU. Of course this is wrong. The previous poster's example of using synchronized threads to implement a finite automaton with implicit state management was particularly apalling. All that is needed for that is single threaded coroutines. Introducing the non-determinism of multiple threads is a recipe for disaster if you don't know what you're doing. One synchronization bug that never shows up on your single CPU development box is all it takes.
Anyway, I see what you're getting at, but your comment that you never need more than single thread for a single CPU is misleading, as it isn't true in many cases. In your own example, you are using multiple threads (with good reason) on what is presumably a single CPU.
ftp://plg.uwaterloo.ca/pub/uSystem/uC++book.ps.gz
It approaches the subject from a more theoretical rather than applied point of view, but if you understand all of the concepts in this book you will have a better working knowledge of concurent programming than 99% of the programmers in the industry!
And, it looks like you did a good job sorting it out.
Great site, and a great service. Thank you!
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
I can say that as a programmer, my value is significantly increased by being proficient in multithreaded programming (beyond Java, FWIW). If this article sparks any interest in people, do read further and practice, practice, practice! The people that I've interviewed in the past who have a strong working knowledge of multithreading get a lot of points in my book, and I'm sure other "aware" employers do the same.
Keep in mind, however, that just knowing how to launch a thread isn't enough. If your code isn't reentrant and thread safe, launching a thread isn't worth a damn.
Good article... now if I could only get to it ;-)
--
Never hit your grandmother with a shovel, for it leaves a bad impression on her mind...
It's not an article on Slashdot. It's just a link to somebody else's article.
-- Ed Avis ed@membled.com
Some interesting papers on the design of the Cray (nee Tera) MTA (multi-threaded architecture) machine are here
Just junk food for thought...
Is this, like, during midterms, when you're trying to study for all your classes at once?
Just junk food for thought...
While it is extremely well written and very informative and interesting, I have a feeling that a lot of developers will read it through expecting information that it is not providing. This article is, at least from my interpretation, largely analyzing the designs of various processors/hardware platforms. It is not (again IMHO) discussing software development multithreading techniques, so if you're looking to it for information on how to pThread your application, or the pitfalls of multithreading, or whether a state machine is more efficient than synchronous threads, you won't find what you're looking for. Do read the intropage about RC5 & S@H though as that is fascinating, though it applies primarily to the constraints of the hardware system.
Still very interesting though.
Based on the description of the article, I looked up some things. What can I say? Somebody modded me down, so I'm at 49, and I'm incomplete without that karma point.
Amdahl's law
Amdahl's law
On chip multiprocessing
Simultaneous multithreading
If tits were wings it'd be flying around.
It's like a multi-cpu machine, except instead of only sharing the bus, and possibly the L2 cache, it shares the L1 cache, the TLB, and a few other things. Also, the second CPU is on the same fleck of silicon as the first.
Need a Python, C++, Unix, Linux develop
I can't agree with this more!
A better analogy is multithreading and recursion. Just because you CAN use recursion to solve a problem, doesn't mean you should. An iterative solution is (almost?) always faster and simpler, and often more elegant.
Just because you can make your application multi-threaded, doesn't mean you necessarily should. Ask yourself what problem are you solving with threads that you can't do otherwise (besides quenching your thirst for pain and suffering)?
I've talked to programmers who think that making an application multi-threaded will, in and of itself, make the application "better." When pressed for reasons they usually end up shaking their heads muttering something about multiple CPUs or modularized code.
I feel that being able to multithread code effectively in Java would make a programmer advanced in that topic.
Here's the best book on the subject:
Doug Lea, Concurrent Programming in Java. Second Edition: Design Principles and Patterns
book home page
author home page (pointer to online supplement for the book)
at Fatbrain
at Amazon
Stupid job ads, weird spam, occasional insight at
Silly rabbit. That's a mov. Now the tricky part is if you in 16bit mode or 32bit mode. You're loading 0x21cd4c in to a register. I don't know what's so special about that number though.
I can pretty much tell what kernel you're running by looking at the first 200 bytes or so. (They change the boot every major revision)
Here's one for you: fc fa
This is my signature. There are many signatures like it but this one is mine..
It's not a "Patch to Photoshop" which allows multiprocessing on the Mac OS, but rather usage of the Multiprocessor Services API. Any application can spawn a thread to be scheduled preemptively among however many processors are on the system. While it's true that the main system event loop (think message pump) is restricted to the main processor, these MP threads can execute network and disk IO. It's not as nice as SMP, but it's not the hack the ill-informed author of the article seems to think it is.
The CPU has a separate set of registers, flags, segment tables (on x86) and so on for each thread that it is running simultaneously. When the instructions enter the pipelines they are tagged with bits to say which thread they are actually operating on and the execution units of the CPU use that set of registers and flags for the data required to run the instruction.
If you think of a four-way SMT CPU as having four sets of everything then you are starting to get the idea. For example on an x86 you would have four different CS:IP pairs, each one providing instructions for a different thread. Once those instructions are loaded and decoded the instructions in the pipelines are tagged with which CS:IP pair provided them. On execution if a register is referenced, then the register is loaded from the register set that corresponds to that tag (mov EAX,0 would affect the EAX #1 if it came from CS:IP #1, affect EAX #2 if it came from CS:IP #2 etc).
As there are implicitly no register dependancies between these instructions, any stalled instruction which is waiting for results from another execution unit does not hold up execution of instructions from other threads.
This model of execution speeds up performance considerably without the requirement of having multiple CPU cores on the one piece of silicon as you are getting far more efficiency from the one set of execution units.
For more information, Paul DeMone had an artice over at realworldtech a while back on the EV8 and how it worked. Take a look - it was quite interesting.
Fear: When you see B8 00 4C CD 21 and know what it means
If you are memory bound, increasing the number of threads a CPU can run will help a little, but not significantly as sooner or later all instructions are going to be stalled waiting for memory. If you have a rich register set then you have the ability to run more instructions before you have to stall on memory loads.
Unless you have a well designed memory interface that can support multiple outstanding transactions, and sufficient registers to allow other threads to continue without accessing memory while one is blocked on memory access you aren't going to be gaining the full performance from SMT.
Of course, I've been know to be wrong.
Fear: When you see B8 00 4C CD 21 and know what it means
Interesting paper. It's certainly a novel way of using SMT to combat the memory/internal clock disparity. Possibly even more effective on x86 where you really want a lot of stuff in L1 because you hit memory all the time!
Fear: When you see B8 00 4C CD 21 and know what it means
Dinkumware, of course (well, actually the one that ships with MSVC). The painful thing is even though we have the source it is so damn hard to read that any changes we make couldn't accurately be ported through upgrades.
;-)
We considered going to SGI, but ended up just working around the bugs when we saw the SGI implementation of std::string wasn't ref counted at all (performance loss bigtime). Hopefully some of them will be figured out by the time VC.NET ships.
Is there any other way to program an MVS/390 if you are pathologically allergic to COBOL?
Fear: When you see B8 00 4C CD 21 and know what it means
If there are more threads than register sets, you have to do normal context switching. I think 4 threads is about the limit at the moment.
To my knowledge there are no CPUs available at the moment that do SMT. I'm not even sure if there are operating systems that support it (you need OS support to load the thread specific context of each CPU register set). The Alpha EV8 will probably be the first mainstream CPU to support it, though there were plenty of rumors that an upcoming revision of the P4 Xeon will support it as well.
It should be noted that SMT does nothing for you if the CPU is tied down in memory stalls, thus the x86 architecture is probably going to gain the least from this as it is very register starved and hence dumps things to memory all the time. Running more threads just increases the required memory bandwidth and so you need a very fast memory system (which the EV8 has) to keep up with everything.
Fear: When you see B8 00 4C CD 21 and know what it means
SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.
Naturally, it requires OS support for it to work, but most CPU manufacturers are looking to go this way in the near future.
Fear: When you see B8 00 4C CD 21 and know what it means
Any programming construct can be harmful. Pointers, explicit memory allocation, anything! However, if interfaces are clearly defined, and and the code is kept simple (ie. lack of feature creep, something that the "new" (GNOME, KDE, etc) UNIX guys don't seem to understand) threading is just as harmless as pointers. I'm not going to get theoretical here, but I'll give you an actual example: BeOS. Say what you will about it being dead or the company being stupid or whatever, it has a kick-ass threading implementation. The app_server regularly runs with 60+ threads and the damn thing only crashes on me when I'm playing with a kernel driver. The apps, too, are stable, even though they are forced into using multi-threading due to the GUI architecture. If you want to see why this is the case, take a look at the BeBook (the API). Every time there is a possible thread interaction, they warn you about it. Just as you have to keep memory ownership clear, you have to do the same thing for threads. Theoretical rules aside, an entire platform begs to differ with you.
A deep unwavering belief is a sure sign you're missing something...
SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.
Interesting. I don't understand how the processor can switch from one thread to another without doing a context switch. Would you mind expanding on that a little bit?
The processor doesn't switch at all; it actually runs multiple threads at the same time (hence "simultaneous") because it has a separate set of registers for each thread.
Thanks for this explanation and to the others who responded. I guess I did not properly understand the original explanation by throx. I take it that if there are suddenly more threads than there are separate register sets, the processor is then will be forced to do context switching. What off-the-shelf processors support SMT, do you know?
Its such a pity transputers didnt do better, good C compilers with CSP support would have helped... communicating serial processes are just so much more elegant than threads communicating through shared memory without any other protection against the various possible contentions than the one the programmers try (and inevitably fail) to ensure themselves :(
Are you saying that threads should communicate via message passing? Or are you saying each thread should have its own processor and memory? Please clarify.
An API which only allowed message passing to me would seem sufficient for most people's uses. That doesnt mean you cant pass references, for efficiency, you'd just make it so passing it implicitly put whats referenced out of scope. Shared memory where multiple processes read the memory and one or more processes perform unsynchronized writes are the exception, why make it the common case in the language and put all the burden for safety on the developer?
In my opinion, there should be no threads at all or rather just one thread, the OS thread. Software should be super-parallel. It should be a collections of primitive objects or cells that just sit there waiting for a signal to do something and send another signal to allert other cells that it's done. The operating system and the CPU should support this paradigm at the fundamental level. I envision that the OS would maintain two lists of cells: input cells and output cells. It would process one list while inserting cells into the other. This is kind of like the way some neural networks are implemented. Programming would then consist of dragging cells into a work space and connecting them together to form higher level objects. Just one man's opinion.
I can't get on the site. In the meantime, can someone tell mean what they mean by "simultaneous multithreading?" It sounds somewhat redundant.
The server appears to be slashdotted. Time to make up conversation while it recovers. ^_^
Reguardless of whether your system/OS has multiple processors or can handle multiple process execution, writing your code to be multithreaded might be a good design choice. If nothing else it forces and enforces abstraction and "compartementalize" of the design and code.
The one huge draw back to writting code that is multithreaded is the syntax baggage you must carry around and use to keep the system sane. Even in thread friendly languages Java where the language semantics try to help users write clean multithreaded code its still a non-trival thing to support. Bug can be very obscure and extremely non trival to solve in multithreaded code not to mention tools can become cumbersome(which stack am I asking for the value of "counter" on?).
Its too bad that writing threaded code is still considered to be an "advanced coding skill".
ZooLib supports Linux, BeOS for x86, Mac OS PowerPC and 68k, and Windows out of the box. It can be bound to other platforms in a straightforward way.
I believe the Mozilla framework is multithreaded as well.
Mike
-- Could you use my software consulting serv
In MacOS versions after 7.5.5 but before X, the OS DOES have MP support (much improved in Mac OS 9, btw) but it's asymmetric MP rather than symmetric. Furthermore, the OS itself does not take any advantage of it, and ALL processes are spawned on CPU 0. However, applications can execute threads on either CPU, which means that applications (quicktime, photoshop, itunes, Quake 3...) can take advantage of multiple CPUs.
This is completely different from the SMP model in Mac OS X, Linux, Solaris, Windoze NT/2000, etc.
-- "Those who cast the votes decide nothing. Those who count the votes decide everything." -Joseph Stalin
Got Rhinos?
http://cide1.dhs.org/thread.html
-- the computer doesn't want any beer, no matter how much you think it does. NEVER, EVER feed your computer beer.
From the article (in reference to SMP OS support):
There are exceptions to this rule, such as the Mac, where Photoshop has a patch which has multithreaded support for G4's even though no Mac OS below Mac OS X supports multiprocessing. Generally speaking, the computer needs OS support as well as application support to take advantage of multiprocessing.
Maybe I am not reading this correctly, but is it saying that a patch to an APPLICATION gave the OS SMP support? I don't see how this could work ... SMP support needs to be from the OS. Not only would the OS need to be designed from the ground up to support multiple CPUs (ie, the scheduler, interrrupts, etc) but it would need to be thread-safe itself (proper locking, etc.).
So, either tell me "yah, of course, the article is wrong" or can anyone explain how a patch to PhotoShop could give it SMP support on a non-SMP-aware OS?
Thanks,
RobertThreads are popular (hype aside) because they are a simple abstraction. Simple is usually good--but it's not when it introduces as many pitfalls as threads do. If you don't believe me, I'm probably not going to convince you--but reading a balanced treatment (eg, a systems textbook, not pro-Pthreads, -Win32, -Java hype) might. Most programmers shouldn't write threaded production code, period. Almost every experienced programmer I've talked to agrees.
There are demonstrably better abstractions for almost all problems that threads can solve. Co-routines, continuations, event models, message queues, sockets, shared memory. "Demonstrably" means they get the job done, but clearly introduce fewer possibilities for error and are easier to debug. They have higher conceptual overhead than threads, but they usually pay off. If you think you absolutely need threads for performance--prove it, with hard numbers.
If you use threads, be sure to understand exactly why you're using them and spec your model precisely. Review threaded code and perform load tests early and often.
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
Perhaps if their server was multi-threaded, I'd be able to access the page...
:-/
This is a great article to use for education. It would have saved me some headache when I first tried learning about threads.
...but I'm reading Slashdot right now, and I can't do two things at once.
Its such a pity transputers didnt do better, good C compilers with CSP support would have helped... communicating serial processes are just so much more elegant than threads communicating through shared memory without any other protection against the various possible contentions than the one the programmers try (and inevitably fail) to ensure themselves :(
To be pendantic there is an article on /., pointing to an article on System Logic about multithreading.
I want to see more articles on /. about technical topics, not editorials, which I feel are often poorly constructed. Whether the /. article is a "pointer" or not is secondary to me.
more technical content, please.
Hey guys, just to let you know, I'm the owner of SystemLogic.net. Yeah, we got /.ed, I'm trying my best to get things up and running once again, but it's hard when you can't login to FTP, web, or telnet! I'm talking to my hosting company now to see if there's anything we can do. In the meantime, if anybody has access to a good server with PHP installed I could technically get a mirror up...dave@systemlogic.net