Linux 2.6 Multithreading Advances
chromatic writes "Jerry Cooperstein has just written an excellent article explaining the competing threading implementations in the upcoming 2.6 kernel for the O'Reilly Network."
It's great that Linux has excellent multithreading support, but it's a shame that many programmers do not take advantage of multithreading in their programs.
The worst example of this was the Quake I source code, which was used for many games, including Half-Life. The code was not multi-threaded, and the network code sat idle while everything else drew -- adding about 20ms of lag, unless you cut the frame rate down to about 15 or so.
The problem wasn't fixed in Half-Life -- the most popular multiplayer game of all time -- until sometime in 2000. We can only imagine how many other programs are not taking full advantage of multithreading.
From what I understand, NGPT is mainly a user space thing. Why not go with the 1:1 one in the kernel (NPTL or whatever), and just have a libpthread.so (NPTL runtime) and a libpthread-mn.so (NGPT)? From a programmer's standpoint, when I say pthread_create() I want to know exactly what that does: with NPTL I know what happens. With NGPT I don't. Also, the old rule of "Don't pay for what you don't use" applies. If I'm going to have just, say, four threads, those four threads are going to run better as four kernel threads as opposed to 2 LWPs dynamically mapped between 4 CPU contexts.
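To make the "four threads as four kernel threads" case concrete, here is a minimal sketch in Python rather than C, purely for brevity. It assumes CPython 3.8 or later on a platform where each threading.Thread is backed by one native thread; run_four_threads is a made-up name for illustration, not part of NPTL or NGPT.

```python
import threading

def run_four_threads(n=4):
    """Start n threads and return the kernel thread ID each one observed."""
    native_ids = [None] * n
    # The barrier keeps all n threads alive at the same time, so the
    # kernel cannot recycle a thread ID while we are still collecting them.
    barrier = threading.Barrier(n)

    def worker(slot):
        # get_native_id() reports the ID the kernel assigned to this thread.
        native_ids[slot] = threading.get_native_id()
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return native_ids

if __name__ == "__main__":
    print(run_four_threads())
```

With a 1:1 library underneath, the list should hold four distinct IDs, one per thread, which is the "I know what happens" property the comment is asking for.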
But, again, I might want to write a server of some sort which handles hundreds of thousands of connections at once, but 99% are idle at any given time and the other 1% require some nontrivial processing sometimes and/or a long stream of data to be sent without prejudicing the other 99%. Now, for ANY 1:1 threading system, I can't just create x * 10^5 threads because the overhead would be colossal. But equally so, implementing this with poll() is going to be horrid, and if the amount of processing done on a connection is nontrivial and/or DoS'able, there's going to be tons of hairy context management code in there, until lo and behold you end up with a 1:N or M:N scheduling implementation yourself. NGPT could be very useful as a portable userspace library here, as these people have implemented an efficient M:N scheduler under GPL, something that hasn't existed before and could be very useful. I think these libraries might be much more complementary than the article makes out.
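For anyone who hasn't written the poll()-style version, here is a rough sketch of what the "context management code" looks like, in Python using the standard selectors module (which picks epoll or poll on Linux). The names serve_ready and demo, the "bytes_seen" state dict, and the use of socket pairs in place of real network connections are all assumptions made for illustration.

```python
import selectors
import socket

def serve_ready(sel, timeout=0.1):
    """Run one pass of the event loop; return how many connections were serviced."""
    serviced = 0
    for key, _events in sel.select(timeout):
        conn, state = key.fileobj, key.data
        data = conn.recv(4096)
        if data:
            # Per-connection context lives in a dict we manage by hand,
            # not on a thread stack.
            state["bytes_seen"] += len(data)
            conn.sendall(data.upper())  # stand-in for the nontrivial processing
        else:
            sel.unregister(conn)
            conn.close()
        serviced += 1
    return serviced

def demo(n_connections=100, n_active=1):
    """Open many mostly idle connections, wake a few, and service only those."""
    sel = selectors.DefaultSelector()
    clients = []
    for _ in range(n_connections):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data={"bytes_seen": 0})
        clients.append(client_side)
    for c in clients[:n_active]:
        c.sendall(b"hello")
    serviced = serve_ready(sel)
    replies = [c.recv(4096) for c in clients[:n_active]]
    # Tidy up every descriptor we opened.
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    for c in clients:
        c.close()
    return serviced, replies

if __name__ == "__main__":
    print(demo())
```

One thread handles all the connections and the idle ones cost nothing per pass, but anything slow inside the loop stalls everybody, which is exactly where the hand-rolled scheduling starts to creep in.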
So, someone who knows... Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?
They both implement the POSIX threading API (a good thing IMHO). NPTL is more radical; the IBM team made a conscious decision to keep the impact of their changes to the minimum. For that reason, I expect that NGPT will be accepted; it has a shorter path to deployment in production systems, even though NPTL is a more "correct" solution (i.e. it uses purely kernel threads). But NPTL changes userspace, libc and the kernel, so it will be much harder to verify.
Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?
Developers shouldn't care, or more accurately it doesn't matter for them. Both implement POSIX threads, so it simply depends what is installed on the system on which their code ends up running - the same application code will work the same on both, altho' each will have its "quirks". Sysadmins will prefer the NGPT because it is easier to deploy and test. Linux purists will prefer NPTL because a) it's the "right" way to do it, and b) it was written by Red Hat.
They could both come with the kernel source and you could choose one when you compiled it. I don't see how they could coexist on a single system.
I am not well-versed in the world of Linux (I have my own allegiances) but am being drawn to it more and more. Reading the article, it felt very clear to me that Linux will prevail (with a nod to William Faulkner's Nobel speech).
Consider a few quotes from the article:
Perhaps others have already pointed this out, but I am newly impressed with the universal nature of Linux. The power of an operating system that *everyone* is interested in improving, and has the opportunity to improve, is awesome. Yes, Microsoft has tremendous resources, and very earnest, good-willed, brilliant people. But to improve Microsoft's kernels, you have to work for Microsoft. That means switching the kids' schools, moving to Redmond, etc. etc. On the other hand, everyone from IBM to HP to some kid in, say, Finland, can add a good idea to Linux. When the kernel's threads implementation is a topic for conversation at conferences, with multiple independent teams coming up with their best ideas, Linux is sure to win in the long run.
I'm struck by the parallels to my own field of scientific research: Yes, the large multinational companies have made tremendous contributions in materials science, semiconductors, and biotech. They work on the "closed-source", or perhaps "BSD" model of development. But it is the "GPL"-like process of peer-reviewed, openly shared, and collaborative academic science that has truly prevailed.
Correctly programming threads is hard, so they should only be used when necessary. Many of the things that can be done with threads can be done more safely with fork() and/or select(). Since Windows lacks the former and has a broken version of the latter, Windows programmers tend to use threads when Unix programmers would use an alternative.
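As a small illustration of why fork() is the safer tool, here is a sketch in Python, whose os.fork and os.pipe are thin wrappers over the system calls of the same name. It assumes a Unix-like system; run_in_child is a hypothetical helper name, and passing the result back as text over a pipe is just one simple choice.

```python
import os

def run_in_child(func):
    """Run func() in a forked child and return str(result) as text."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on a private copy of the parent's memory,
        # so nothing it touches needs a lock.
        os.close(read_fd)
        try:
            os.write(write_fd, str(func()).encode())
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup handlers
    # Parent: read until the child closes its end of the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks).decode()

if __name__ == "__main__":
    counter = [0]

    def bump():
        counter[0] += 1
        return counter[0]

    print(run_in_child(bump), counter[0])
```

The child increments its own copy of counter and the parent's copy is untouched, so the only shared state is whatever you explicitly send through the pipe. With threads, the same bump() would be a data race waiting to happen.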
You poor Unix guys are struggling through something we all went through years ago -- learning how to think correctly about more than a single thread of control.
What the heck does altering the structure of a thread *library* have to do with application-level thread programming? What are you talking about?
May we never see th
Because 1:1 implementations are well known to scale poorly, owing to context switch overhead and synchronization overhead.
For systems that don't require true high-end scalability, 1:1 works fairly well. It's because of the scaling problem at the high end that M:N has some proponents.
I guess I agree that we shouldn't do a context switch just for executing a single xchg instruction. But if the resource is busy, a user-level scheduler cannot make a good decision. For one thing, it can only switch to threads in the same process, whereas the kernel can make a global decision, such as switching to a process holding the resource we are waiting for. Also, a user scheduler doesn't have execution statistics (working set, % of CPU slice used, I/O behaviour, etc.), even for its own threads. It can only do round-robin scheduling rather than optimizing potential throughput based on each thread's history.
The change in thinking for this is argued in this Sun Whitepaper, and this FAQ.
If one believes the Sun guys have a clue, you can take this as a vote in favor of 1:1.
IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.
Once you run a reasonable number of threads, you can be quickly driven to internal queueing of work from thread to thread; and by the time you have done that, you may already have reached a point of state abstraction that lets you run event driven in a very small number of threads, approaching NCPUs as the lower useful limit. Putting all your state in per-thread storage or on the thread stack is a sign of weak state abstraction.
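A minimal sketch of that queueing arrangement, in Python: a pool sized to the CPU count pulls self-contained work items off a queue, so the number of items can grow without the number of threads growing with it. The name process_all, the None sentinel, and the index-keyed results dict are choices made for this illustration only.

```python
import os
import queue
import threading

def process_all(items, handler, n_workers=None):
    """Apply handler to every item using a small, fixed pool of threads."""
    n_workers = n_workers or os.cpu_count() or 1
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this thread
                return
            # All state travels inside the work item, not on the thread stack.
            index, payload = item
            value = handler(payload)
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for index, payload in enumerate(items):
        work.put((index, payload))
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]

if __name__ == "__main__":
    print(process_all(range(10), lambda x: x * x))
```

Ten thousand items still run on a handful of threads here, which is the point being made about 10^5-thread benchmarks.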
-dB
"If it was easy to do, we'd find someone cheaper than you to do it."