Slashdot Mirror


Linux 2.6 Multithreading Advances

chromatic writes "Jerry Cooperstein has just written an excellent article explaining the competing threading implementations in the upcoming 2.6 kernel for the O'Reilly Network."

16 of 194 comments

  1. Non-threaded programs by SexyKellyOsbourne · · Score: 2, Insightful

    While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.

    The worst example of this was the Quake I source code, which was used for many games, including Half-Life. The code was not multi-threaded, and the network code sat idle while everything else drew -- adding about 20ms of lag, unless you cut the frame rate down to about 15 or so.

    The problem wasn't fixed in Half-Life -- the most popular multiplayer game of all time -- until sometime in 2000. We can only imagine how many other programs are not taking full advantage of multithreading.

    1. Re:Non-threaded programs by awol · · Score: 4, Insightful

      Many coders are disinclined to use threads, because they don't necessarily improve code speed.



      Further there are a number of examples where writing a single threaded application has definitive benefits. For example applications where deadlocks or race conditions would be an integral problem in a multithreaded implementation whilst a single thread has none of these problems.
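To make the race-condition point concrete, here is a minimal Python sketch (not from the thread; `bump_shared_counter` and `counter` are made-up names). Several threads increment one shared counter. The lock is what keeps the read-modify-write step safe, and it is exactly the kind of bookkeeping a single-threaded design never needs.

```python
import threading


def bump_shared_counter(n_threads, increments):
    """Have n_threads each add 1 to a shared counter `increments` times."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # The lock serializes the read-modify-write below; without it,
            # two threads could read the same old value and lose an update.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print(bump_shared_counter(4, 10_000))
```

A single-threaded version is just a loop with no lock at all, which is the "definitive benefit" being argued for.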


      --
      "The first thing to do when you find yourself in a hole is stop digging."
    2. Re:Non-threaded programs by 0x0d0a · · Score: 4, Insightful

      While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.

      Multi-threading is an easy way to cut down response latency in programs and produce a responsive UI. Unfortunately, it also has many drawbacks -- it can actually be slower (due to having to maintain a bunch of locks...you're usually only better off with threads if you have a very few), and it's one of the very best ways to introduce very hard to debug bugs.

      I do think that a lot of GTK programmers, at least, block the UI when they'd be better off creating a single thread to handle UI issues and hand this data off to the core program. Also, when doing I/O that doesn't affect the rest of the program heavily, it can be more straightforward to use threads -- if you have multiple TCP connections running, it can be worthwhile to use a thread for each.

      There are a not insignificant number of libraries that are non-reentrant, and have issues with threads. Xlib, gtk+1 (dunno about 2), etc.

      Threading is just a paradigm. Just about anything you can manage to pull off with threading you can pull off without threading. The question is just which is cleaner in your case -- worrying about the interactions of multiple threads, or having more complex event handlers in a single-threaded program.

      The other problem is that UNIX has a good fork() multi-process model, so a lot of times when a Windows programmer would have to use threads, a UNIX programmer can get away with fork().
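A rough sketch of the fork() style, in Python for brevity and assuming a Unix-like system (`os.fork` does not exist on Windows). The helper name `sum_in_child` is invented for illustration. The child gets its own copy of the address space, so nothing needs locking; the only shared thing is the pipe.

```python
import os


def sum_in_child(numbers):
    """Fork a child, let it compute sum(numbers), and read the result via a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of memory, no locks required.
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(numbers)).encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # exit right here; never fall through into parent code
    # Parent process: close the write end so read() sees EOF when the child is done.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)  # reap the child
    return int(data)


print(sum_in_child([1, 2, 3]))
```

The price is that results have to be serialized through the pipe, which is why this model gets less attractive as the amount of shared data grows.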

      So you only really want to use threads when:
      * you have a number of tasks, each of which operates mostly independently
      * when these tasks *do* need to affect each other, they do so with *large* amounts of data (so the traditional process model doesn't get as good performance).
      * You have more CPU-bound "tasks" than CPUs, so you derive a benefit from avoiding context switching that characterizes the fork() model.
      * you are using reentrant libraries in everything that the threads must use.

    3. Re:Non-threaded programs by Anonymous Coward · · Score: 1, Insightful

      Not to mention that sometimes the threading libraries are broken and just plain don't work.

      I was working on a game for the PS2 last year. The initial prototype had been developed under Windows before we knew the target platform. The design was great, we used multiple threads, and the performance was great. We had a lot of flexibility with the graphics because our responsiveness didn't depend on our framerate.

      Once we started the port to the PS2, we ran into major, unfixable problems with the Sony libraries' ability to create and manage threads. I'd bet that Sony has fixed these problems by now, but at the time, we had no way to solve our problem.

      So the end result is that we had a butt-simple infinite loop that polls each of our subsystems once per frame (60Hz). Simple code, good performance, and very, very portable to other platforms (PC, XBox, etc.). As a result, we figured that if we were running at 60 fps (pretty much a given for console games these days), the latency we were seeing in the other parts of the code would be negligible.
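The loop described above might look roughly like this. It is a sketch only: the poster's actual code is not shown, and `run_frames`, the subsystem callables, and the `frame_seconds` parameter are all assumptions made for illustration.

```python
import time


def run_frames(subsystems, n_frames, frame_seconds=1 / 60):
    """Poll every subsystem once per frame, then sleep off any leftover time."""
    for frame in range(n_frames):
        start = time.monotonic()
        for poll in subsystems:
            poll(frame)  # each subsystem does a small, bounded amount of work
        leftover = frame_seconds - (time.monotonic() - start)
        if leftover > 0:
            time.sleep(leftover)


log = []
run_frames(
    [
        lambda f: log.append(("input", f)),
        lambda f: log.append(("network", f)),
        lambda f: log.append(("render", f)),
    ],
    n_frames=2,
    frame_seconds=0,  # no pacing for the demo
)
print(log)
```

Worst-case latency for any subsystem is one frame, which is the trade-off the poster accepted.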

      I guess the moral is that _sometimes_ simple code that's slightly less efficient is preferable to more sophisticated code, especially on platforms which are a bitch to debug.

    4. Re:Non-threaded programs by Minna+Kirai · · Score: 3, Insightful

      has built-in language support and features for threads, it is becoming almost second nature to think in terms of threads. What a wonderful language!

      This is partly a matter of taste, but I dislike languages that are excessively large. That is, when given the choice between implementing a feature in the language itself or in the standard libraries (which are built in, or at least interfaced via, the same language) you should try to use the language you already have.

      Academics prefer this because it follows principles like Occam's Razor and MDL (minimum description length, an artificial intelligence related term for program quality).

      This simplifies your language definition, but transfers some complexity to your library documentation- which is optional reading for learning the language. And it makes the language more extensible in the future. The classic example that C++ advocates pick on is Java's String class. Two Java Strings support the "+" operator to concatenate them as a special language feature. But 3rd party library developers cannot support "+" with their own Objects, like complex numbers or string-like series of non-character data.
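For contrast, here is what library-defined "+" looks like in a language that allows it. The sketch uses Python rather than C++ purely for brevity, and the `Series` class is a made-up stand-in for a "string-like series of non-character data".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    """A string-like sequence of arbitrary (non-character) items."""

    items: tuple

    def __add__(self, other):
        # Library code, not the language definition, decides what "+" means.
        if not isinstance(other, Series):
            return NotImplemented
        return Series(self.items + other.items)


joined = Series((1, 2)) + Series((3,))
print(joined)
```

The language only has to provide the hook (`__add__` here, `operator+` in C++); everything else lives in the library.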

      The same argument can be applied (with much more complexity and opportunity for disagreement or plain old error) to the question of including threading support native in the language, rather than as an external library. Language-supporters may say "The language natively provides the CPU's logical, arithmetic, and memory management operations. Threads are just as fundamental, and should go there too". The Library guys respond "No useful program lacks logic, arith, and memory. But we've gotten by fine for decades without threads. They're OPTIONAL. And not all OSes support threads- you want to make them incompatible with your language then?"

      It goes back and forth, but winds up with a pro-Library argument backed up by programming language theory- language support for threads offers no more expressive power than library support, so they should be kept in the standard libraries. So C/C++ adopted this approach (or rather, C++ kept the C approach as it had been justified).

      It sounds great to theorists, who think that even C++'s 4 styles of parentheses are redundant and excessive compared to what's used in Lisp. But outside of conceptual language design, there's a large practical problem which has retarded the performance of C++ programs to this day: backwards compatibility. Specifically, compatibility of new source code with old linkers. (This problem applied somewhat to the acceptance of other compiled languages besides C++)

      To get any acceptance, new versions of C++ needed to be compatible with users' existing C libraries. And to reduce the workload of C++ compiler developers, they made C++ compilers fit into the C workflow (compile, compile... & link) as directly as possible.

      But that undermines one big assumption of the "provide important features IN the language, not AS the language" crowd- the assumption that the compiler is very, very good. Their quality metric ignored the ease of the compiler making good binary code from your source- as long as the language has the ability to express their intent compactly and unambiguously they're happy- but that intent may not be clear if the computer isn't looking at the whole program.

      A compiler can make global optimizations if it considers the whole program at once, avoiding the function-call overhead of using external functions for core features. But the C development process- only giving the compiler small sections of code at once, and then depending on a separate program to link them together- means that the compiler simply can't make the best choices, burdened with incomplete information. (Today, we sometimes have smarter linkers which support more function inlining and const propagation, but they're a poorer solution than using compilers all the way through).

      So, this lack of super-good compilers is why pulling more features into a language definition has been helpful, even though plenty of CS graduate theses say it shouldn't be so.

      I don't think C/C++ is a good language either- except to implement other languages in! (Where "other languages" may include all the graphics, networking, compression, and other low-level code that Ada95 programs access via C bindings). And to make the academics most happy, that language should be Lisp or ML, which can then be used to write any other compiler/interpreter you might wish.

      There are other valid reasons why C++ is still heavily used, but they're mostly shortsighted and based in legacy compatibility ("We've always written desktop applications that way!")

    5. Re:Non-threaded programs by Salamander · · Score: 3, Insightful
      There are definitely cases where using multiple threads on a single-processor system can degrade performance (switching, locking, etc.).

      This is only a factor with a poor multithreaded design. By contrast, single-threaded programs always fail to take advantage of multiple processors, no matter how well they're designed otherwise.

      --
      Slashdot - News for Herds. Stuff that Splatters.
  2. I don't see why the two are mutually exclusive. by Second_Derivative · · Score: 5, Insightful

    From what I understand NGPT is mainly a user space thing. Why not go with the 1:1 one in the kernel (NPTL or whatever), just have a libpthread.so (NPTL runtime) and libpthread-mn.so (NGPT). From a programmer's standpoint, when I say pthread_create() I want to know exactly what that does: with NPTL I know what happens. With NGPT I don't. Also, the old rule of "Don't pay for what you don't use" applies. If I'm going to have just, say, four threads, those four threads are going to run better as four kernel threads as opposed to 2 LWP's dynamically mapped between 4 CPU contexts.
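As a small illustration of what 1:1 means in practice: CPython's `threading.Thread` is backed by one OS thread per Python thread, so each of four threads reports its own kernel-level thread ID. This is a sketch with a made-up helper name (`native_ids_of`), not anything from NPTL or NGPT, and it needs Python 3.8+ for `threading.get_native_id`.

```python
import threading


def native_ids_of(n_threads):
    """Start n_threads threads and return the kernel thread ID each one saw."""
    ids = []
    ids_lock = threading.Lock()
    # The barrier keeps every thread alive until all have started, so the
    # kernel cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(n_threads)

    def worker():
        with ids_lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


print(native_ids_of(4))
```

Under an M:N library the same call to create a thread would not tell you how many kernel-level contexts ended up behind it, which is the predictability concern raised above.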

    But, again, I might want to write a server of some sort which handles hundreds of thousands of connections at once, but 99% are idle at any given time and the other 1% require some nontrivial processing sometimes and/or a long stream of data to be sent without prejudicing the other 99%. Now, for ANY 1:1 threading system, I can't just create x * 10^5 threads because the overhead would be colossal. But equally so, implementing this with poll() is going to be horrid, and if the amount of processing done on a connection is nontrivial and/or DoS'able, there's going to be tons of hairy context management code in there, until lo and behold you end up with a 1:N or M:N scheduling implementation yourself. NGPT could be very useful as a portable userspace library here, as these people have implemented an efficient M:N scheduler under GPL, something that hasn't existed before and could be very useful. I think these libraries might be much more complementary than the article makes out.
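The "mostly idle connections" case can be sketched with a readiness loop. This uses Python's `selectors` module (which wraps poll/epoll-style calls) and local socket pairs standing in for real client connections; `ready_connections` is a hypothetical helper, and a real server would obviously be far larger.

```python
import selectors
import socket


def ready_connections(n_connections, active):
    """Open n_connections socket pairs, send on the `active` ones, and
    return the indices that a single selector call reports as readable."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_connections)]
    try:
        for i, (server_side, _client_side) in enumerate(pairs):
            sel.register(server_side, selectors.EVENT_READ, data=i)
        for i in active:
            pairs[i][1].sendall(b"ping")  # only these "clients" speak
        events = sel.select(timeout=1.0)
        return sorted(key.data for key, _mask in events)
    finally:
        sel.close()
        for a, b in pairs:
            a.close()
            b.close()


print(ready_connections(50, [3, 17]))
```

One thread watches all fifty connections and only hears about the two that have data. The hairy part the poster warns about starts after this point, when handling one of those events takes long enough that per-connection state has to be saved and resumed by hand.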

  3. Re:So are they both useful? by sql*kitten · · Score: 5, Insightful

    So, someone who knows... Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?

    They both implement the POSIX threading API (a good thing IMHO). NPTL is more radical; the IBM team made a conscious decision to keep the impact of their changes to the minimum. For that reason, I expect that NGPT will be accepted; it has a shorter path to deployment in production systems, even though NPTL is a more "correct" solution (i.e. it uses purely kernel threads). But it changes userspace, libc and the kernel - it will be much harder to verify.

    Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?

    Developers shouldn't care, or more accurately it doesn't matter for them. Both implement POSIX threads, so it simply depends what is installed on the system on which their code ends up running - the same application code will work the same on both, altho' each will have its "quirks". Sysadmins will prefer the NGPT because it is easier to deploy and test. Linux purists will prefer NPTL because a) it's the "right" way to do it, and b) it was written by Red Hat.

    They could both come with the kernel source and you could choose one when you compiled it. I don't see how they could coexist on a single system.

  4. Linux will prevail by mithras+the+prophet · · Score: 3, Insightful

    I am not well-versed in the world of Linux (I have my own allegiances) but am being drawn to it more and more. Reading the article, it felt very clear to me that Linux will prevail (with a nod to William Faulkner's Nobel speech).

    Consider a few quotes from the article:

    The LinuxThreads implementation of the POSIX threads standard (pthreads), originally written by Xavier Leroy
    A group at IBM and Intel, led by Bill Abt at IBM, released the first version of the New Generation POSIX Threads (NGPT) library in May 2001
    On March 26-27, 2002, Compaq hosted a meeting to discuss the future replacement for the LinuxThreads library. In attendance were members of the NGPT team, some employees of (then distinct) Compaq and Hewlett-Packard, and representatives of the glibc team
    On September 19, 2002, Ulrich Drepper and Ingo Molnar (also of Red Hat) released an alternative to NGPT called the Native POSIX Thread Library (NPTL)

    Perhaps others have already pointed this out, but I am newly impressed with the universal nature of Linux. The power of an operating system that *everyone* is interested in improving, and has the opportunity to improve, is awesome. Yes, Microsoft has tremendous resources, and very earnest, good-willed, brilliant people. But to improve Microsoft's kernels, you have to work for Microsoft. That means switching the kid's schools, moving to Redmond, etc. etc. On the other hand, everyone from IBM to HP to some kid in, say, Finland, can add a good idea to Linux. When the kernel's threads implementation is a topic for conversation at conferences, with multiple independent teams coming up with their best ideas, Linux is sure to win in the long run.

    I'm struck by the parallels to my own field of scientific research: Yes, the large multinational companies have made tremendous contributions in materials science, semiconductors, and biotech. They work on the "closed-source", or perhaps "BSD" model of development. But it is the "GPL"-like process of peer-reviewed, openly shared, and collaborative academic science that has truly prevailed.

    --
    four nine eighteen twenty-7 thirty-nine forty-7 fiftyeight sixty-nine seventy-9 eighty-8 one-hundred-and-nine one-twenty
  5. Re:Oh crap, I wish I didn't have to say this... by Waffle+Iron · · Score: 3, Insightful
    How come y'all are switching to a thread-based model now? Was the other way running out of steam?

    Correctly programming threads is hard, so they should only be used when necessary. Many of the things that can be done with threads can be done more safely with fork() and/or select(). Since Windows lacks the former and has a broken version of the latter, Windows programmers tend to use threads when Unix programmers would use an alternative.

  6. Re:Oh crap, I wish I didn't have to say this... by 0x0d0a · · Score: 3, Insightful

    You poor Unix guys are struggling through something we all went through years ago -- learning how to think more sophisticated than a single thread of control correctly.

    What the heck does altering the structure of a thread *library* have to do with application-level thread programming? What are you talking about?

  7. Re:Why poll? or why M:N? by GooberToo · · Score: 2, Insightful

    Because 1:1 implementations are well known to not scale well because of context switch overhead and synchronization overhead.

    For systems that don't require true high-end scalability, 1:1 works fairly well. It's because of this that M:N has some proponents.

  8. Re:Mode switching times. by iamacat · · Score: 2, Insightful

    I guess I agree that we shouldn't do a context switch just for executing a single xchg instruction. But if the resource is busy, a user-level scheduler cannot make a good decision. For one thing, it can only switch to threads in the same process, whereas the kernel can make a global decision, such as switching to a process holding the resource we are waiting for. Also, a user scheduler doesn't have execution statistics - working set, % of CPU slice used, I/O behaviour, etc. - even for its own threads. It can only do round-robin scheduling rather than optimizing potential throughput based on each thread's history.
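A user-level scheduler with no statistics to go on boils down to something like the following sketch. Python generators stand in for user-level threads here; that substitution, and the names `task` and `round_robin`, are assumptions for illustration and not how NGPT is implemented.

```python
from collections import deque


def task(name, steps):
    """A cooperative 'thread': yields control back after every step."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(tasks):
    """Run each task one step at a time, in strict rotation, until all finish."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))
        except StopIteration:
            continue  # task finished; drop it from the rotation
        ready.append(current)  # otherwise it goes to the back of the line
    return trace


print(round_robin([task("A", 2), task("B", 3)]))
# ['A0', 'B0', 'A1', 'B1', 'B2']
```

Note what the scheduler does not know: which task is waiting on which resource, how much of its slice each one used, or anything about other processes. It just rotates.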

  9. Compare to Solaris evolution by dbrower · · Score: 5, Insightful
    For a long time, Sun used M:N threading, and many people thought this was a good idea. They have recently changed their minds, and been moving towards 1:1.

    The change in thinking for this is argued in this Sun Whitepaper, and this FAQ.

    If one believes the Sun guys have a clue, you can take this as a vote in favor of 1:1.

    IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.

    Once you run a reasonable number of threads, you can be quickly driven to internal queueing of work from thread to thread; and by the time you have done that, you may already have reached a point of state abstraction that lets you run event driven in a very small number of threads, approaching NCPUs as the lower useful limit. Putting all your state in per-thread storage or on the thread stack is a sign of weak state abstraction.
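The "internal queueing of work" design can be sketched as a small fixed pool fed from one queue. This is an illustrative Python sketch with a made-up helper (`run_on_pool`); it shows the structure only, since in CPython the global interpreter lock keeps CPU-bound jobs from actually running in parallel.

```python
import os
import queue
import threading


def run_on_pool(jobs, n_workers=None):
    """Feed jobs (zero-argument callables) through a queue to a few workers."""
    n_workers = n_workers or os.cpu_count() or 1  # default: about NCPUs
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            if job is None:  # sentinel: no more work for this worker
                return
            value = job()
            with results_lock:
                results.append(value)

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        work.put(job)
    for _ in workers:
        work.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


squares = run_on_pool([lambda n=n: n * n for n in range(10)])
print(sorted(squares))
```

Each job carries its own state (here, the captured `n`) instead of parking it on a dedicated thread's stack, which is the state abstraction being argued for.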

    -dB

    --
    "It if was easy to do, we'd find someone cheaper than you to do it."
    1. Re:Compare to Solaris evolution by be-fan · · Score: 3, Insightful

      IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.
      >>>>>>>>>
      Typical *NIX developer. Threads are useful for two things:

      1) Keeping CPUs busy. This is where the whole NCPU business comes from.
      2) Keeping the program responsive. *NIX developers, with their fear of user-interactive applications, seem to ignore this point. If an external event (be it a mouse click or network connection) needs the attention of the program, the program should respond *immediately* to that request. Now, you can achieve this either by breaking up your compute thread into pieces, checking for pending requests after a specific amount of time, or you can just let the OS handle it. The OS is going to be interrupting your program every 1-10 ms anyway (timer interrupt) and with a good scheduler, it's trivial for it to check to see if another thread has become ready to run. The second model is far cleaner than the first. A thread becomes a single process that does a single specific task. No internal queueing of work is necessary, and threads split up according to logical abstractions (different tasks that need to be done) instead of physical ones (different CPUs that need to be kept busy).
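The second model, one thread per logical task, might be sketched like this. It is a toy: the events are pre-queued rather than coming from a real UI or socket, and `serve_events_while_computing` is an invented name.

```python
import queue
import threading


def serve_events_while_computing(events, n):
    """Run a long computation in one thread while this thread answers events."""
    result = {}

    def compute():
        # The long-running task, with no event checks sprinkled through it.
        result["total"] = sum(i * i for i in range(n))

    worker = threading.Thread(target=compute)
    worker.start()

    inbox = queue.Queue()
    for event in events:
        inbox.put(event)

    handled = []
    while not inbox.empty():
        # The event thread answers right away; the OS scheduler decides
        # when the compute thread gets the CPU back.
        handled.append(f"handled {inbox.get()}")

    worker.join()
    return handled, result["total"]


print(serve_events_while_computing(["click", "connect"], 100_000))
```

In the first model, `compute` would instead have to be chopped into chunks with an explicit "anything pending?" check between them.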

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:Compare to Solaris evolution by dbrower · · Score: 3, Insightful
      I'm perfectly happy devoting a whole thread to UI events to get responsiveness. I shouldn't need 100 of them behind the scenes doing the real work if I only have 1 or 4 cpus.

      If your application design calls for 100 concurrently operating threads, there is something broken about the decomposition.

      -dB

      --
      "It if was easy to do, we'd find someone cheaper than you to do it."