Linux 2.6 Multithreading Advances

Posted by michael on Friday November 8, 2002 @08:27PM from the code-warriors dept.

chromatic writes "Jerry Cooperstein has just written an excellent article explaining the competing threading implementations in the upcoming 2.6 kernel for the O'Reilly Network."

194 comments

Call me stupid but by j14ast · 2002-11-08 20:32 · Score: 0, Redundant

Weren't we going to go straight to 3.0? I seem to remember that being hotly debated, but now I can't find a thing on that debate. (Quick googling revealed nothing.)

--
Damn the man!
1. Re:Call me stupid but by Minna+Kirai · 2002-11-08 20:43 · Score: 3, Informative
  
  Sigh, I'll do your web searching for you.
  
  Basically, while Linus was incommunicado sailing across the ocean, someone got jumpy and suggested 3.0 should be the next step.
  
  It might be more likely that it proceeds through 2.10 and higher before going to 3, though. Just to confuse the people who think version numbers are floating-point.
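
A minimal sketch of the point about version numbers not being floating-point, assuming dotted versions with purely numeric components. `parse_version` is a hypothetical helper written for this illustration, not something from the kernel tree.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))

# Read as floating-point numbers, 2.10 looks older than 2.6.
float_says_older = float("2.10") < float("2.6")

# Compared component by component, 2.10 is newer than 2.6.
tuple_says_newer = parse_version("2.10") > parse_version("2.6")

print(float_says_older, tuple_says_newer)
```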
2. Re:Call me stupid but by kasperd · 2002-11-08 21:39 · Score: 4, Interesting
  
  Weren't we going to go straight to 3.0?
  
  I don't think the 2.6 vs. 3.0 debate is over yet, but it seems to be quiet right now. I think the discussion will liven up when the release date starts getting close, about a year from now. And I even think there will be discussion after the release, because the version number will come as a complete surprise to some people. And I will not try to guess how much doubt will be in Linus's mind once he actually wants to release the thing.
  
  But if you want to be unambiguous when talking about it, you should call it 2.6. If it turns out not to be the case, everybody will know that it was indeed 3.0 you were talking about. But if you talk about 3.0 already now, and it turns out to be called 2.6, then 3.0 might be something else released in the future.
  
  --
  
  Do you care about the security of your wireless mouse?
3. Re:Call me stupid but by Luke-Jr · 2002-11-09 05:01 · Score: 1
  
  stupid
  
  --
  Luke-Jr
4. Re:Call me stupid but by Phantasmo · 2002-11-09 06:12 · Score: 1
  
  We should only go to 3 if the new kernel breaks binary compatibility with 2.4/2.5.
  If we change now, we'll end up downloading binaries like this:
  compiled for kernel versions 2.0.x - 3.3.x or
  compiled for kernel versions 3.4.x >
  or whenever they break compatibility.
  
  --
  
  The US Army: promoting democracy through unquestioned obedience
5. Re:Call me stupid but by kasperd · 2002-11-09 20:54 · Score: 2
  
  We should only go to 3 if the new kernel breaks binary compatibility with 2.4/2.5.
  
  First of all, make that "if it breaks binary compatibility with 2.4", because if binary compatibility is not preserved between 2.4 and the latest 2.5 kernel, there will be a lot of incompatibilities within the 2.5 series as well. And what happened during development is not what we want to talk about anyway.
  
  But what do you have in mind when talking about binary compatibility? AFAIK Linus does consider binary compatibility on the kernel/user mode interface to be important. Deprecated interfaces might be removed at some point, but before that they may well have produced warnings for some time whenever a program used them.
  
  When I upgraded from 2.2 to 2.4, binary compatibility was preserved (mostly); I did not replace any libraries or executables because of the kernel change. But I did find a few problems: rpc.rstatd would core dump on the first request, because of a change in the format of a /proc pseudofile. And ktop suddenly thought every process on the system was owned by root. Does this count as binary incompatibility?
  
  I don't know how many interfaces changed between 1.2 and 2.0; I never used 1.2, my first kernel was 2.2. But I know that Linus's reasons for choosing 2.0 as the version number were primarily the addition of SMP and portability to different platforms. Yet it might have remained binary compatible with 1.2. I suspect the executable format used by 1.2 was a.out, while today all executables are ELF. But kernels still have optional support for a.out binaries. Actually, I think even ELF support is optional, so I could build two 2.4 kernels, one supporting a.out and the other supporting ELF; they would have the same version number but obviously be binary incompatible.
  
  Don't expect any Linux kernel in the future to introduce major binary incompatibilities with your previous kernel. There will be changes, but they will be slow, so most executables will be usable across three or more kernel releases. It shouldn't be hard to find an executable working on 2.0, 2.2, 2.4, and the latest 2.5.
  
  --
  
  Do you care about the security of your wireless mouse?
Site down or not found by kareejb · 2002-11-08 20:39 · Score: 0, Informative

I couldn't get to the article linked in the story. I found this one, though, which looks like the same thing.
1. Re:Site down or not found by Sean+Trembath · 2002-11-08 22:32 · Score: 1
  
  I couldn't get to the one you linked to. The Slashdot cure has been slashdotted. We have entered a whole new era of slashdotism. Oh dear. What is next? Too many slashdotisms.
2. Re:Site down or not found by GooberToo · 2002-11-09 02:57 · Score: 2
  
  LOL! Why did this get modded up? He provided a link to the exact same link and he's modded up? It's not as if it's another unique link...it's the exact same link.
  
  Come on people..mod him back down.
  
  LOL.
3. Re:Site down or not found by Ari+Rahikkala · 2002-11-09 04:12 · Score: 1
  
  Gives a whole new meaning to karma whoring, though. Perhaps this should be called karma masturbation ;) ?
Non-threaded programs by SexyKellyOsbourne · 2002-11-08 20:39 · Score: 2, Insightful

While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.

The worst example of this was the Quake I source code, which was used for many games, including Half-Life. The code was not multi-threaded, and the network code sat idle while everything else drew -- adding about 20ms of lag, unless you cut the frame rate down to about 15 or so.

The problem wasn't fixed in Half-Life -- the most popular multiplayer game of all time -- until sometime in 2000. We can only imagine how many other programs are not taking full advantage of multithreading.
1. Re:Non-threaded programs by Anonymous Coward · 2002-11-08 20:52 · Score: 0
  
  That wasn't a problem with the code not being multi-threaded. That was a problem with the code. Period. There's no reason the rendering code couldn't check for network traffic every N times through its main loop.
  
  Also, you claim this Half-Life (never heard of it) was more popular than Quake? I doubt that. You'd see it in stores or see it mentioned on Usenet if it was.
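
A rough sketch of the polling idea in this reply (check for network traffic every N passes through the main loop without a second thread), assuming a socket that is already connected. `run_frames` and the `poll_every` parameter are hypothetical names for illustration; no game engine's real structure is implied.

```python
import select
import socket

def run_frames(sock, frame_count, poll_every):
    """Single-threaded loop: draw each frame, poll the socket every few frames."""
    received = []
    for frame in range(frame_count):
        # ... rendering work for this frame would go here ...
        if frame % poll_every == 0:
            # A zero timeout makes select() return immediately.
            readable, _, _ = select.select([sock], [], [], 0)
            if readable:
                received.append(sock.recv(4096))
    return received

# A connected socket pair stands in for the game and its remote peer.
game_side, peer_side = socket.socketpair()
peer_side.sendall(b"player moved")
packets = run_frames(game_side, frame_count=10, poll_every=2)
game_side.close()
peer_side.close()
print(packets)
```

The worst-case delay before a packet is noticed is `poll_every` frames, so the polling interval trades latency against the cost of the extra `select()` calls.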
2. Re:Non-threaded programs by Minna+Kirai · 2002-11-08 20:53 · Score: 5, Interesting
  
  Many coders are disinclined to use threads, because they don't necessarily improve code speed.
  
  Whether or not multithreading will accelerate any particular program has to be determined case-by-case. And for most software, the deciding factor should be whether threads will simplify development and correctness (theoretically they can, but lots of developers don't understand threads and use them wrong).
  
  My company has a realtime networked game for which threading was an impediment. Both the rate and duration of screen refreshes and network transmissions were low enough that they didn't usually interfere with each other in the same thread. But using thread-safe versions of standard library functions was degrading every other part of the program with constant locking/unlocking.
  
  So nonthreaded was faster. (Maybe cleverer people could've made special thread-unsafe alternative functions to use in contexts where we know inter-thread race conditions won't occur. But munging around with 2 standard libraries in one program is riskier than we'd like to deal with)
3. Re:Non-threaded programs by Silh · 2002-11-08 20:55 · Score: 5, Informative
  
  While Quake 1 was developed on NeXT, the target platform at that time would have been DOS, so multithreading would be a bit of a problem...
  
  As for further licensees of the engine, revamping the engine to use multithreading was probably not a very high priority in making a game.
  
  On the other hand, for someone writing an engine from scratch it is a different matter.
  
  --
  -- Silhouette
4. Re:Non-threaded programs by krappie · 2002-11-08 21:08 · Score: 1
  
  While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.
  The problem wasn't fixed in Half-Life...
  Heh.. I just wanted to point out that they probably need to port it to linux before they can take advantage of the elite linux multithreading.
5. Re:Non-threaded programs by DarkHelmet · 2002-11-08 21:42 · Score: 3, Informative
  
  Yeah yeah yeah... When life isn't perfect, blame Abrash...
  Troll! ;)
  ---
  (And yes, Mike Abrash did WinQuake, not Carmack)
  
  --
  /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
6. Re:Non-threaded programs by awol · 2002-11-08 22:33 · Score: 4, Insightful
  
  Many coders are disinclined to use threads, because they don't necessarily improve code speed.
  
  Further, there are a number of examples where writing a single-threaded application has definite benefits. For example, applications where deadlocks or race conditions would be an integral problem in a multithreaded implementation whilst a single thread has none of these problems.
  
  --
  "The first thing to do when you find yourself in a hole is stop digging."
7. Re:Non-threaded programs by Toraz+Chryx · 2002-11-09 01:04 · Score: 1
  
  "Also, you claim this Half-Life (never heard of it) was more popular than Quake? I doubt that. You'd see it in stores or see it mentioned on Usenet if it was."
  
  That's a troll/joke, right?.. right?... RIGHT?
8. Re:Non-threaded programs by Hirofyre · 2002-11-09 01:59 · Score: 1
  
  Embedded developers can use their own thread libraries. Multiprocessor systems are becoming more and more common. It seems that the most scalable implementation (NGPT) should be used in order to minimize future changes. It would seem that the published benchmarks used NGPT on a slower context-switching kernel. NPTL is probably still faster for small numbers of threads, but I would guess that "big-iron" performance is better with NGPT. NGPT is also a more flexible implementation. A well-maintained and widely tested user-space threading library makes it easier to patch in customized intra-process thread schedulers with reasonable assurances of stability.
9. Re:Non-threaded programs by captnjameskirk · 2002-11-09 03:31 · Score: 1
  
  Another reason many programmers don't use threads is that their language of choice (usually C or C++) doesn't natively support threads as a concept; they have to create a mental abstraction of threads on top of the language in order to understand the concept in the first place.
  I had been a "hobby" programmer with C (then C++) for 15 years, and not once did I truly understand how to implement threading in an efficient manner in any of my projects which might theoretically have benefited from threads. This was a limitation of the language, not the OS. However, now that I have discovered Ada95 (not "ADA": it's not an acronym), which has built-in language support and features for threads, it is becoming almost second nature to think in terms of threads. What a wonderful language!
  The bad press this language has received (being "designed by committee" is the common complaint) is entirely undeserved, and is usually spouted off by people who haven't actually used it. I find it far easier to maintain the code of even relatively small projects than it ever was with C++. I now shudder to think that I ever thought C++ was a good language. I'll never go back!
10. Re:Non-threaded programs by Salamander · 2002-11-09 03:31 · Score: 5, Informative
  applications where deadlocks or race conditions would be an integral problem in a multithreaded implementation whilst a single thread has none of these problems.
  
  That's a common myth. In fact, there are some kinds of deadlock that do go away, but there are also some kinds that merely change their shape. For example, the need to lock a data structure to guarantee consistent updates goes away, and so do deadlocks related to locking multiple data structures. OTOH, resource-contention deadlocks don't go away. You might still have two "tasks" contending for resources A and B, except that in the non-threaded model the tasks might be chained event handlers for some sort of state machine instead of threads. If task1 tries to get A then B, and task2 tries to get B then A, then task1's "B_READY" and task2's "A_READY" events will never fire and you're still deadlocked. Sure, you can solve it by requiring that resources be taken in order, but you can do that with threads too; the problem's solvable, but isn't solved by some kind of single-threading magic.
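
A minimal single-threaded sketch of the scenario above, assuming two cooperative tasks that each request their resources one step at a time. `run_tasks` is a hypothetical simulator written for this illustration: a task that can never obtain its next resource simply never finishes, which is the deadlock case.

```python
def run_tasks(wants):
    """Simulate cooperative tasks; `wants` maps task name -> request order.

    Returns the set of tasks that finished. No threads are involved.
    """
    owner = {}                                  # resource -> holding task
    progress = {task: 0 for task in wants}      # how many resources held
    finished = set()
    made_progress = True
    while made_progress:
        made_progress = False
        for task, order in wants.items():
            if task in finished:
                continue
            resource = order[progress[task]]
            if owner.get(resource) is None:
                owner[resource] = task
                progress[task] += 1
                made_progress = True
                if progress[task] == len(order):
                    # Task holds everything: do the work, release all.
                    for held in order:
                        owner[held] = None
                    finished.add(task)
    return finished

# task1 takes A then B, task2 takes B then A: neither can ever finish.
opposite = run_tasks({"task1": ["A", "B"], "task2": ["B", "A"]})
# Both tasks take A then B: both finish.
ordered = run_tasks({"task1": ["A", "B"], "task2": ["A", "B"]})
print(opposite, ordered)
```

The fix is the ordering rule, which works the same way whether the tasks are threads or event handlers.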
  
  I've written several articles on this topic for my website in the past. In case anyone's interested...
  
  multi-threading vs. event-based scorecard
  
  symmetric vs. asymmetric multithreading
  
  some micro-benchmarks
  
  the "threading's too hard" canard
  
  switching models
  
  minimizing context switches
  --
  Slashdot - News for Herds. Stuff that Splatters.
11. Re:Non-threaded programs by evocate · 2002-11-09 03:36 · Score: 2
  
  There are definitely cases where using multiple threads on a single-processor system can degrade performance (switching, locking, etc.). Now that dual- and quad-proc systems have become common, and with hyperthreading right around the corner, multithreading will become a better-performing and therefore more frequently used approach. I, for one, am thrilled to see these improvements arrive.
12. Re:Non-threaded programs by 0x0d0a · 2002-11-09 04:12 · Score: 2
  
  Also, you claim this Half-Life (never heard of it) was more popular than Quake? I doubt that. You'd see it in stores or see it mentioned on Usenet if it was.
  
  Where do you *live*?
  
  Half-Life has been the foundation for the most popular FPS (unless UT took it away for a bit) for years. The original Half-Life was perhaps the first popular FPS to have a decent story (ignoring, of course, the elderly and much-revered Marathon (I challenge *anyone* to find a game that has developed a fan base as fanatical as the one that has developed the site at the other end of this link -- look at the "Facts and Puzzling things about..." sections)). Half-Life vastly changed the face of the FPS industry, and brought in much more scripting and plot work to FPSes, leading to impressive newer games like Max Payne.
  
  Half-Life spawned two short sequels from Valve (Opposing Force and Blue Shift). It is the game that the phenomenally popular Counter-Strike mod was made for.
  
  Half-Life is also notable for its very quick software renderer, its introduction of the newer "multiple weapons per slot" weapon inventory system, and popularizing manually-triggerable reloading.
  
  --
  May we never see th
13. Re:Non-threaded programs by 0x0d0a · 2002-11-09 04:21 · Score: 4, Insightful
  
  While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.
  
  Multi-threading is an easy way to cut down response latency in programs and produce a responsive UI. Unfortunately, it also has many drawbacks -- it can actually be slower (due to having to maintain a bunch of locks...you're usually only better off with threads if you have a very few), and it's one of the very best ways to introduce very hard to debug bugs.
  
  I do think that a lot of GTK programmers, at least, block the UI when they'd be better off creating a single thread to handle UI issues and hand this data off to the core program. Also, when doing I/O that doesn't affect the rest of the program heavily, it can be more straightforward to use threads -- if you have multiple TCP connections running, it can be worthwhile to use a thread for each.
  
  There are a not insignificant number of libraries that are non-reentrant, and have issues with threads. Xlib, gtk+1 (dunno about 2), etc.
  
  Threading is just a paradigm. Just about anything you can manage to pull off with threading you can pull off without threading. The question is just which is cleaner in your case -- worrying about the interactions of multiple threads, or having more complex event handlers in a single-threaded program.
  
  The other problem is that UNIX has a good fork() multi-process model, so a lot of times when a Windows programmer would have to use threads, a UNIX programmer can get away with fork().
  
  So you only really want to use threads when:
  * you have a number of tasks, each of which operates mostly independently
  * when these tasks *do* need to affect each other, they do so with *large* amounts of data (so the traditional process model doesn't get as good performance).
  * You have more CPU-bound "tasks" than CPUs, so you derive a benefit from avoiding context switching that characterizes the fork() model.
  * you are using reentrant libraries in everything that the threads must use.
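
One way to picture the "mostly independent task" case from the list above is a worker thread that shares nothing with the main loop except two queues. This is a sketch under stated assumptions: `slow_io` is a hypothetical stand-in for blocking I/O such as a TCP read, and no GUI toolkit is involved.

```python
import queue
import threading

def slow_io(job):
    """Stand-in for blocking I/O such as reading from a TCP connection."""
    return job * 2

def worker(jobs, results):
    # Runs in its own thread; the queues are its only shared state.
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work
            break
        results.put(slow_io(job))

jobs, results = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
for job in (1, 2, 3):
    jobs.put(job)
jobs.put(None)
thread.join()

collected = []
while not results.empty():
    collected.append(results.get())
print(collected)
```

Because `queue.Queue` does its own locking, the rest of the program needs no explicit locks, which keeps the hard-to-debug interactions confined to one place.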
  
  --
  May we never see th
14. Re:Non-threaded programs by 0x0d0a · 2002-11-09 04:24 · Score: 2
  
  While Java is not the end-all be-all, it has multithreading support that is far better than C/C++. This is quite convenient for use in the lightweight networking tasks that Java excels at.
  
  --
  May we never see th
15. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 04:26 · Score: 0
  
  What the fuck does this comment have to do with anything? Now we're getting ADA trolls.
16. Re:Non-threaded programs by TheSunborn · 2002-11-09 04:35 · Score: 1
  
  Considering that neither C nor C++ has any thread support, it doesn't take much for Java to be better there.
17. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 04:44 · Score: 1, Insightful
  
  Not to mention that sometimes the threading libraries are broken and just plain don't work.
  
  I was working on a game for the PS2 last year. The initial prototype had been developed under Windows before we knew the target platform. The design was great, we used multiple threads, and the performance was great. We had a lot of flexibility with the graphics because our responsiveness didn't depend on our framerate.
  
  Once we started the port to the PS2, we ran into major, unfixable problems with the Sony libraries' ability to create and manage threads. I'd bet that Sony has fixed these problems by now, but at the time, we had no way to solve our problem.
  
  So the end result is that we had a butt-simple infinite loop that polls each of our subsystems once per frame (60Hz). Simple code, good performance, and very, very portable to other platforms (PC, XBox, etc.). As a result, we figured that if we were running at 60 fps (pretty much a given for console games these days), the latency we were seeing in the other parts of the code would be negligible.
  
  I guess the moral is that _sometimes_ simple code that's slightly less efficient is preferable to more sophisticated code, especially on platforms which are a bitch to debug.
18. Re:Non-threaded programs by Minna+Kirai · 2002-11-09 05:02 · Score: 3, Insightful
  
  has built-in language support and features for threads, it is becoming almost second nature to think in terms of threads. What a wonderful language!
  
  This is partly a matter of taste, but I dislike languages that are excessively large. That is, when given the choice between implementing a feature in the language itself or in the standard libraries (which are built in, or at least interfaced via, the same language) you should try to use the language you already have.
  
  Academics prefer this because it follows principles like Occam's Razor and MDL (minimum description length, an artificial intelligence related term for program quality).
  
  This simplifies your language definition, but transfers some complexity to your library documentation- which is optional reading for learning the language. And it makes the language more extensible in the future. The classic example that C++ advocates pick on is Java's String class. Two Java Strings support the "+" operator to concatenate them as a special language feature. But 3rd party library developers cannot support "+" with their own Objects, like complex numbers or string-like series of non-character data.
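
For contrast with the Java String case, here is a small sketch of what library-defined "+" looks like in a language that lets ordinary classes opt in to the operator. Python is used purely for illustration, and `Vec2` is a hypothetical type, not part of any standard library.

```python
class Vec2:
    """A library-defined type that opts in to the "+" operator."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Called for `self + other`; no change to the language is needed.
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

total = Vec2(1, 2) + Vec2(3, 4)
print(total.x, total.y)
```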
  
  The same argument can be applied (with much more complexity and opportunity for disagreement or plain old error) to the question of including threading support native in the language, rather than as an external library. Language-supporters may say "The language natively provides the CPU's logical, arithmetic, and memory management operations. Threads are just as fundamental, and should go there too". The Library guys respond "No useful program lacks logic,arith,and memory. But we've gotten by fine for decades without threads. They're OPTIONAL. And not all OSes support threads- you want to make them incompatible with your language then?"
  
  It goes back and forth, but winds up with a pro-Library argument backed up by programming language theory- language support for threads offers no more expressive power than library support, so they should be kept in the standard libraries. So C/C++ adopted this approach (or rather, C++ kept the C approach as it had been justified).
  
  It sounds great to theorists, who think that even C++'s 4 styles of parentheses are redundant and excessive compared to what's used in Lisp. But outside of conceptual language design, there's a large practical problem which has retarded the performance of C++ programs to this day: backwards compatibility. Specifically, compatibility of new source code with old linkers. (This problem applied somewhat to the acceptance of other compiled languages besides C++)
  
  To get any acceptance, new versions of C++ needed to be compatible with users' existing C libraries. And to reduce the workload of C++ compiler developers, they made C++ compilers fit into the C workflow (compile, compile... & link) as directly as possible.
  
  But that undermines one big assumption of the "provide important features IN the language, not AS the language" crowd- the assumption that the compiler is very, very good. Their quality metric ignored the ease of the compiler making good binary code from your source- as long as the language has the ability to express their intent compactly and unambiguously they're happy- but that intent may not be clear if the computer isn't looking at the whole program.
  
  A compiler can make global optimizations if it considers the whole program at once, avoiding the function-call overhead of using external functions for core features. But the C development process- only giving the compiler small sections of code at once, and then depending on a separate program to link them together- means that the compiler simply can't make the best choices, burdened with incomplete information. (Today, we sometimes have smarter linkers which support more function inlining and const propagation, but they're a poorer solution than using compilers all the way through).
  
  So, this lack of super-good compilers is why pulling more features into a language definition has been helpful, even though plenty of CS graduate theses say it shouldn't be so.
  
  I don't think C/C++ is a good language either- except to implement other languages in! (Where "other languages" may include all the graphics, networking, compression, and other low-level code that Ada95 programs access via C bindings). And to make the academics most happy, that language should be Lisp or ML, which can then be used to write any other compiler/interpreter you might wish.
  
  There are other valid reasons why C++ is still heavily used, but they're mostly shortsighted and based in legacy compatibility ("We've always written desktop applications that way!")
19. Re:Non-threaded programs by glwtta · 2002-11-09 05:15 · Score: 2
  
  While it's great that Linux has excellent multithreading support
  Actually, I believe the point is that Linux will have excellent thread support. In other words when 2.6/3.0 is stable enough for production use (2008? :) )
  
  --
  sic transit gloria mundi
20. Re:Non-threaded programs by Kynde · 2002-11-09 05:16 · Score: 3, Interesting
  
  While it's great that Linux has excellent multithreading support, it's a shame, however, that many programmers do not take advantage of multi-threading in their programs.
  
  What a load of crap. There are plenty of threaded applications for Linux. The problem is that all these inexperienced "threads-every-fucking-where" programmers that Java spawned fail to understand that threading is NOT the solution for everything.
  Besides, in Unix-style coding few things are as common as forking about, which in many cases is what people also do with Java all the time. Real single-memory-space cloned processes (i.e. threads) have fewer uses than people actually think.
  
  The worst example of this was the Quake I source code, which was used for many games, including Half-Life. The code was not multi-threaded, and the network code sat idle while everything else drew -- adding about 20ms of lag, unless you cut the frame rate down to about 15 or so.
  
  If you'd EVER actually used threads in Linux, you'd know that if there are busy threads you would still get to run at most once in 20ms, and quite likely far less often.
  
  It's even easy to try out. Write some code that performs usleep(1) or sched_yield() every now and then and checks how long that takes. In particular, try it with a few totally separate processes in the background doing while(1); loops. There's your 20ms and way more...
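
A rough sketch of that experiment, with Python's `time.sleep` swapped in for the C `usleep(1)` call, so the absolute numbers will differ from a C program's. `measure_sleep_overshoot` is a hypothetical helper, and the busy background processes are left out: start them separately to see the effect being described.

```python
import time

def measure_sleep_overshoot(requested_seconds, samples):
    """Return how long each requested sleep really took, in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_seconds)
        durations.append(time.perf_counter() - start)
    return durations

durations = measure_sleep_overshoot(requested_seconds=1e-6, samples=20)
print("requested 1 us, worst case %.3f ms" % (max(durations) * 1000))
```

The result depends entirely on the kernel, its timer tick and the system load, so no expected output is given.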
  
  When Quake 1 was written, the 20 ms lag was considered NOTHING. At that time online gaming was limited mainly to xpilot and MUDs. It started the boom, and naturally the demands changed, too. THUS id wrote the QuakeWorld client, which was quite different.
  
  Besides a brutal fact always is that single thread process can be made faster than _ANY_ multithreaded approach, although it's often quite difficult. Moreover, threading is never chosen as an approach due to performance, but rather because it simplifies the structure in some cases.
  
  Given the amount of optimization already present in Quake 1, I feel quite safe saying that lack of threads in DOS had jack to do with Quake 1 being single threaded.
  
  --
  1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
21. Re:Non-threaded programs by Nicolay77 · 2002-11-09 05:27 · Score: 1
  
  From your journal:
  
  "My entire post archive is nothing but trolls, flamebait, and erroneous-information-filled karma whoring, and yet I now have enough karma to post at +2. I even did an early goatse link that didn't get modded down for at least an entire hour."
  
  The funny thing is people believing the shit about Half-Life being based on Quake I. Debugging a multithreaded application is one of the hardest things to do, actually. You should try that, at least once.
  
  --
  We are Turing O-Machines. The Oracle is out there.
22. Re:Non-threaded programs by Minna+Kirai · 2002-11-09 05:31 · Score: 1
  
  Also, some implementations of "Ada95" feature corruptly unworkable thread support. The IBM AIX one, for instance. It's different from other Adas, or just plain wrong. Maybe they shouldn't have been allowed to call it Ada95 at all, but there ya go.
  
  (I've had the pain of watching Ada programs fork into 5 different processes and talk over strictly timed RPC to work around these shortcomings).
  
  Regarding the "Language vs Library" theories- if thread support was NOT part of Ada, but was provided by external libraries, there'd be less barrier to entry for some thread-expert wanting to fix these problems without re-implementing the entire enormous Ada language himself.
  
  (It would also help him if the compiler was Open Source. I suppose you like to use GNAT, I'm not sure if it works well on PowerPC hardware and non-linux-like OSes.)
23. Re:Non-threaded programs by Jeremi · 2002-11-09 06:03 · Score: 2
  
  Besides a brutal fact always is that single thread process can be made faster than _ANY_ multithreaded
  approach, although it's often quite difficult. Moreover, threading is never chosen as an approach due to
  performance, but rather because it simplifies the structure in some cases.
  
  Apparently you never use multi-CPU computers?
  
  In any case, for some tasks, raw speed isn't as important as low latency. By using multiple threads with a good scheduler and a well-thought-out priority system, you can end up with a very responsive program, something which would be much harder to do with a single thread. See BeOS's GUI for a good example.
  
  --
  
  I don't care if it's 90,000 hectares. That lake was not my doing.
24. Re:Non-threaded programs by 0x0d0a · 2002-11-09 06:05 · Score: 2
  
  Yup. :-)
  
  But even aside from that, a lot of widely-used C APIs are not reentrant, and can't be made so because of the use of static data, which makes it hard to add support via libraries. gethostbyname() is particularly annoying (for some reason, up until recently, gethostbyname_r() wasn't even in the man pages on my Linux box, and the parameters to the damn thing vary between Solaris and Linux...). errno doesn't lend itself well to multithreading. strtok() isn't either...
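
The static-data problem can be sketched without any threads at all: two interleaved callers are enough. The two functions below are hypothetical Python analogues of `strtok()` and `strtok_r()`, written for illustration only and much simpler than the real C functions.

```python
_saved = []   # hidden module-level state, like strtok()'s static pointer

def next_token_static(text=None):
    """Non-reentrant: pass text once, then call with None for more tokens."""
    global _saved
    if text is not None:
        _saved = text.split()
    return _saved.pop(0) if _saved else None

def next_token_reentrant(state):
    """Reentrant: the caller owns the state, like strtok_r()'s saveptr."""
    return state.pop(0) if state else None

# Two interleaved users of the static version trample each other.
first = next_token_static("a b c")      # user 1 starts
next_token_static("x y z")              # user 2 starts, overwrites the state
clobbered = next_token_static()         # user 1 continues, gets user 2's data

# With caller-owned state the two users stay independent.
state1, state2 = "a b c".split(), "x y z".split()
next_token_reentrant(state1)
next_token_reentrant(state2)
independent = next_token_reentrant(state1)
print(first, clobbered, independent)
```

User 1 expects "b" as its second token and gets it only from the caller-owned variant. A library cannot retrofit this without changing its function signatures, which is why the `_r` variants exist.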
  
  And while I'm talking about C/C++ issues with threads, I'm not much of an SML fan, but the speed of spawning and destroying threads in CML (the concurrent extensions) makes the C-based pthreads look pathetic. :-(
  
  --
  May we never see th
25. Re:Non-threaded programs by Salamander · 2002-11-09 06:51 · Score: 3, Insightful
  
  There are definitely cases where using multiple threads on a single-processor system can degrade performance (switching, locking, etc.).
  
  This is only a factor with a poor multithreaded design. By contrast, single-threaded programs always fail to take advantage of multiple processors, no matter how well they're designed otherwise.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
26. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 06:56 · Score: 0
  
  I play games a couple of hours a week (used to play Quake, now I'm playing the Age of Empires games mostly), and I think I've only seen it mentioned once or twice. I doubt it was ever that popular.
27. Re:Non-threaded programs by be-fan · 2002-11-09 07:45 · Score: 2
  
  GUI programs particularly suck for this (though it's getting better). The lack of threading in programs such as Galeon and Konqueror is blatant. While the rendering engine is doing something complex, the rest of the program stops responding to events. Compare this to the behavior of a highly threaded program like Pan, where you can send it any number of requests, and the UI will still respond to the user.
  
  --
  A deep unwavering belief is a sure sign you're missing something...
28. Re:Non-threaded programs by be-fan · 2002-11-09 07:55 · Score: 2
  
  Um, if you're getting a minimum of 20ms of latency, then your kernel is borked. You do realize that if the priority of the network thread is higher than that of the compute thread, then it'll preempt the compute thread? So it'll get to run whenever it's ready, not just at time-slice boundaries. Granted, it's 10ms on current Linux kernels, but nothing's stopping you from jacking HZ to 1000.
  
  --
  A deep unwavering belief is a sure sign you're missing something...
29. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 11:15 · Score: 0
  
  The lack of threading in programs such as Galeon and Konqueror is blatant.
  
  That is a pretty funny troll. Konqi separates the IO from its main code as separate processes based on KIO; Java applets are in separate processes, JavaScript is in a separate process, etc. Konqi itself does the drawing, manages the separate processes, and processes input that comes from X/kwin (the window manager). You will find that Konqi in and of itself is very responsive. Where the problems are is the latency issue inside of Linux/BSD.
30. Re:Non-threaded programs by sigwinch · 2002-11-09 14:24 · Score: 2
  
  This is only a factor with a poor multithreaded design.
  Wrong. Every context switch burns hundreds, if not tens of thousands, of clock cycles. As the parent comment said, changing from single threading to multithreading on a uniprocessor system necessarily reduces performance. How bad it is does depend on the program design, but there is always a slowdown.
  By contrast, single-threaded programs always fail to take advantage of multiple processors, no matter how well they're designed otherwise.
  Wrong again. It's trivial to run multiple copies of a single-threaded program on the different CPUs, and let them interact over IPC. One benefit is that this approach scales trivially to large numbers of networked processors. (How bad the latency hurts depends on the network and the workload, of course.) Another benefit is that catastrophic failure of one process does not necessarily corrupt the state of another process. (While one thread crashing is almost certain to bring down an entire multi-threaded program.)
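  
  A minimal sketch of that arrangement, assuming a hypothetical worker that reads integers on stdin and prints their sum of squares. The worker source, the function name, and the pipe-based protocol are all made up for illustration; real IPC could just as well be sockets or shared files.

```python
import subprocess
import sys

# Source of a hypothetical single-threaded worker program: read one line of
# integers from stdin, print their sum of squares on stdout.
WORKER_SRC = (
    "import sys\n"
    "nums = [int(tok) for tok in sys.stdin.readline().split()]\n"
    "print(sum(n * n for n in nums))\n"
)


def sum_squares_in_processes(values, copies=2):
    """Split `values` across `copies` worker processes and add the partial sums."""
    chunks = [values[i::copies] for i in range(copies)]

    # Start every copy before feeding any of them, so the OS is free to
    # schedule the copies on different CPUs at the same time.
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in chunks
    ]

    # The only interaction between processes is over pipes (the IPC).
    for proc, chunk in zip(workers, chunks):
        proc.stdin.write(" ".join(map(str, chunk)) + "\n")
        proc.stdin.close()

    total = 0
    for proc in workers:
        total += int(proc.stdout.read())
        proc.stdout.close()
        proc.wait()
    return total
```

  Each copy has its own address space, so one copy dying cannot scribble on the others' state; the price is that every exchange crosses a pipe.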
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
31. Re:Non-threaded programs by Tomble · 2002-11-09 14:42 · Score: 1
  
  Wrong. Every context switch burns hundreds, if not tens of thousands, of clock cycles.
  Er, I freely admit that you probably know the subject far better than I, but I thought that at least some types of threading implementations meant that switching between the threads of a single process was far less expensive than full process-to-process context switches?
  
  --
  Be careful! New moon tonight.
32. Re:Non-threaded programs by Entropy_ah · 2002-11-09 14:50 · Score: 2
  
  your comment makes even more sense if you take it to its logical extreme. you could implement a virtual machine as a single thread on a real machine. this virtual machine can have multi-threading that can experience deadlock, even though it is a single process on the real machine.
  
  --
  my other penis is a vagina
33. Re:Non-threaded programs by Toraz+Chryx · 2002-11-09 14:55 · Score: 1
  
  go look at the online gaming stats at gamespy...
  
  It'll be something like
  
  "Halflife : 60,000 online"
  "Quake 3 : 2500 online"
  "Quakeworld : 250 online"
34. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 15:28 · Score: 0
  
  > There are a not insignificant number of libraries that are non-reentrant
  
  How about...
  
  There are a significant number of libraries that are non-reentrant
  
  I had to read that sentence a coupla times :)
35. Re:Non-threaded programs by Anonymous Coward · 2002-11-09 15:33 · Score: 1, Funny
  
  Wrong. Every context switch burns hundreds, if not tens of thousands, of clock cycles. As the parent comment said, changing from single threading to multithreading on a uniprocessor system necessarily reduces performance. How bad it is does depend on the program design, but there is always a slowdown.
  Jesus fucking christ, with a modern multitasking OS, the OS will be swapping the process anyhow, and that's much more expensive than threading context switches. Or are you advocating the use of a cooperative multitasking OS (like MacOS 7) or a single-threaded OS (like MS DOS)?
  Wrong again. It's trivial to run multiple copies of a single-threaded program on the different CPUs, and let them interact over IPC.
  At which point it is no longer (collectively) single-threaded. Of course, this does seem like a slashdot solution (a complex and over-engineered solution to a nonexistent problem).
36. Re:Non-threaded programs by sigwinch · 2002-11-09 15:49 · Score: 2
  
  Well, you can make the overhead pretty small for a userland threading system. For lots of applications the slowdown will be lost in the noise.
  It isn't zero, though. If you have a few million tasks to do, pretty much any threading system is going to suck.
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
37. Re:Non-threaded programs by Salamander · 2002-11-09 16:20 · Score: 3, Informative
  
  Every context switch burns hundreds, if not tens of thousands, of clock cycles.
  
  A well-designed multi-threaded implementation will organize its thread usage in such a way that under light load and/or on a single processor it will not have significantly more context switches than a single-threaded equivalent. Under such conditions it will exhibit the same performance characteristics as that single-threaded version, and yet it will also be able to take advantage of inherent parallelism and multiple processors when they exist. Been there.
  
  Bad multithreaded implementations schedule so many computationally active threads that TSE switches are inevitable. Bad multithreaded implementations force two context switches per request as work is handed off between "listener" and "worker" threads. Bad multithreaded implementations do lots of stupid things, but not all multithreaded implementations are bad. The main overhead involved in running a well-designed multithreaded program on a uniprocessor is not context switches but locking, and that will be buried in the noise. Done that.
  
  A handful of extra context switches per second and a fraction of a percent of extra locking overhead are a small price to pay for multiprocessor scalability.
  
  It's trivial to run multiple copies of a single-threaded program on the different CPUs, and let them interact over IPC.
  
  Trivial, but stupid. You really will context-switch yourself to death that way, as every occasion where you need to coordinate between processes generates at least one IPC and every IPC generates at least one context switch (usually two or more)...and those are complete process/address-space switches, not relatively lightweight thread switches. That's how to build a really slow application.
  
  this approach scales trivially to large numbers of networked processors.
  
  No, it doesn't. There's simply no comparison between the speed of using the same memory - often the same cache on the same processor, if you do things right - and shipping stuff across the network...any network, and I was working on 1.6Gb/s full-duplex 2us app-to-app round-trip interconnects five years ago. Writing software to run efficiently on even the best-provisioned loosely-coupled system is even more difficult than writing a good multithreaded program. That's why people only bother for the most regularly decomposable problems with very high compute-to-communicate ratios.
  
  catastrophic failure of one process does not necessarily corrupt the state of another process. (While one thread crashing is almost certain to bring down an entire multi-threaded program.)
  
  Using separate processes instead of threads on a single machine might allow your other processes to stay alive if one dies, but your application will almost certainly be just as dead. The causal dependencies don't go away just because you're using processes instead of threads. In many ways having the entire application go down is better, because at least then it can be restarted. When I used to work in high availability, a hung node was considered much worse than a crash, and the same applies to indefinite waits when part of a complex application craps out.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
38. Re:Non-threaded programs by Salamander · 2002-11-09 18:40 · Score: 2
  
  For lots of applications the slowdown will be lost in the noise. It isn't zero, though.
  
  No, it's not zero but (as I said; *sigh*) it's a small price to pay to get multiprocessor scalability.
  
  If you have a few million tasks to do, pretty much any threading system is going to suck.
  
  On the contrary, a threading system is highly likely to be the only way you'll meet your cost/performance goals, because a well-written multithreaded server running on a 4-way server will get you nearly 4 CPUs worth of performance. One written your way will require a rack full of machines and fast switches plus some custom state-management code, and might still thrash itself to death no matter how much hardware you throw at it. Please, educate yourself. Do some reading, run some tests, get some experience...then come back and we can talk about which approaches can handle millions of requests per minute.
  
  BTW, thanks for visiting my site. Where's yours?
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
39. Re:Non-threaded programs by sigwinch · 2002-11-09 23:29 · Score: 2
  
  Let me preface this by saying that everything depends on the workload. Pretty much every approach is optimal for somebody's real-world workload.
  You really will context-switch yourself to death that way, as every occasion where you need to coordinate between processes generates at least one IPC and every IPC generates at least one context switch (usually two or more)...and those are complete process/address-space switches, not relatively lightweight thread switches.
  On a reasonable OS on a single-CPU machine, it isn't significantly worse than multi-threading. (Especially if the multi-threading system is using high-performance kernel threads.) On a multi-CPU machine, which I was referring to, the difference relative to multithreading is minuscule: a couple of system calls.
  Writing software to run efficiently on even the best-provisioned loosely-coupled system is even more difficult than writing a good multithreaded program.
  I must admit that I'm looking toward the future. I'm interested in program architectures that will saturate the machines of ten or twenty years in the future. A one-CPU future-generation system will have worse latency problems than a current-generation networked system. The future of high-performance computing is message passing.
  Look at it this way: the best current transistors can switch at a rate of 200 GHz. That's a clock period of 5 picoseconds. These transistors will be in mass production within 20 years. (Possibly within a few years, but let's be pessimistic.) An electrical signal can travel only about 2 millimeters in that period of time. That means your arithmetic-logic unit cannot even fetch an operand from the L1 cache in a single clock cycle. These processors will use asynchronous message-passing logic at the very core, and farther out the system will be entirely based on messages. HyperTransport is the writing on the wall.
  Also consider current-generation transactional systems. The request-producer doesn't have to block waiting for the first response: it can keep queueing requests. If the scheduler is good, this amounts to batching up a bunch of requests, processing them in one fell swoop, then sending the responses in one fell swoop. Of course whether this actually happens depends on the workload and the OS.
  Incidentally, I'm typing this on a Unix machine that runs the graphics subsystem in a separate process. Every screen update requires at least one context switch. Does it suck? Not at all. The X11 protocol allows requests to be batched up and handled all at once. Whether the application draws a single character or renders a 3-D scene doesn't have much influence on the context switch overhead. Again, the appropriate solution depends on the workload, and the proper messaging abstraction can make separate processes quite practical. And if your compute jobs cannot possibly fit in a single machine, you have no choice but to use multiple processes.
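  A rough sketch of the batching idea, using a thread and an in-process queue as a stand-in for the separate server process. The function names, the `None` end-of-stream sentinel, and the batch size of 64 are assumptions chosen for the example, not anything taken from X11.

```python
import queue
import threading


def serve_in_batches(requests, handle_batch, max_batch=64):
    """Consume `requests` until a None sentinel, handing work over in batches."""
    while True:
        first = requests.get()            # sleep until at least one request exists
        if first is None:
            return
        batch = [first]
        stop = False
        # Grab whatever else is already queued, without sleeping again.
        while len(batch) < max_batch:
            try:
                item = requests.get_nowait()
            except queue.Empty:
                break
            if item is None:
                stop = True
                break
            batch.append(item)
        handle_batch(batch)               # one wake-up pays for the whole batch
        if stop:
            return


def demo(n_requests=1000, max_batch=64):
    """Queue all requests up front, then let one consumer drain them."""
    requests = queue.Queue()
    for i in range(n_requests):
        requests.put(i)
    requests.put(None)                    # hypothetical end-of-stream marker
    batches = []
    worker = threading.Thread(
        target=serve_in_batches, args=(requests, batches.append, max_batch)
    )
    worker.start()
    worker.join()
    return batches
```

  With everything queued ahead of time, `demo()` services 1000 requests in 16 consumer wake-ups instead of 1000, which is the amortization being described.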
  Using separate processes instead of threads on a single machine might allow your other processes to stay alive if one dies, but your application will almost certainly be just as dead.
  Not at all. What about Monte Carlo simulations? Losing the occasional random process is irrelevant. What about artificial intelligences running on really big machines? Getting erroneous answers from subsystems, or not getting timely answers at all, will be a fact of a-life. Being able to terminate processes at-will will be critical to building reliable AIs. What about graphics systems? Losing a frame or a part thereof is nothing compared to a full crash. What about speech recognition systems? A momentary interruption for one user is nothing compared to a total disruption of all users.
  Even in the present day, there are plenty of practical workloads that can withstand a subsystem dying, but would rather not see the whole system die hard. If the system is built on a foundation of multithreading, the only failure mode is a total crash.
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
40. Re:Non-threaded programs by Salamander · 2002-11-10 06:05 · Score: 2
  
  I must admit that I'm looking toward the future.
  
  No, you're looking toward the past. In the future, multi-CPU machines will become more common, not less, and learning to use them efficiently will also become more important. Within the box, multithreading will perform better than alternatives, even if there's message passing going on between boxes.
  
  These processors will use asynchronous message-passing logic at the very core, and farther out the system will be entirely based on messages. HyperTransport is the writing on the wall.
  
  Yes, I'm somewhat familiar with transitions between message passing and shared memory. Remember that fast interconnect I mentioned, from five years ago? It was at Dolphin. On the wire, it was message passing. Above that, it presented a shared-memory interface. Above that, I personally implemented DLPI/NDIS message-passing drivers because that's what some customers wanted.
  
  The fact is that whatever's happening down below, at the programming-model level it's still more efficient to have multiple tasks coordinate by running in the same address space than by having them spend all their time packing and unpacking messages. The lower-level message-passing works precisely because it's very task-specific and carefully optimized to do particular things, but that all falls down when the messages have to be manipulated by the very same processors you were hoping to use for your real work.
  
  The future of high-performance computing is message passing
  
  ...between nodes that are internally multi-processor.
  
  The request-producer doesn't have to block waiting for the first response: it can keep queueing requests.
  
  Yes, yes, using parallelism to mask latency. Yawn. Irrelevant.
  
  Every screen update requires at least one context switch. Does it suck? Not at all.
  
  If context switches aren't all that bad, why were you bitching about context switching in multithreaded applications? Hm. The fact is, a context switch is less expensive than a context switch plus packing/unpacking plus address manipulation plus (often) a data copy. Your proposal is to use multiple processes instead of threads, even within one box. When are you going to start explaining how that will perform better, or even as well? When it won't "suck", to use your own charming phrase.
  
  What about Monte Carlo simulations? Losing the occasional random process is irrelevant. What about artificial intelligences running on really big machines?
  
  Please try to pay attention. I already referred to regularly decomposable applications with high compute-to-communicate ratios, and that's exactly what you're talking about. Yes, what you say is true for some applications, but does it work in general? No. As I said, I've worked in high availability. I've seen database app after database app, all based on message passing between nodes, lock up because one node froze but didn't crash. Everyone's familiar with applications hanging when the file server goes out, and that's not shared memory either. Message passing doesn't make causal dependencies go away.
  
  If the system is built on a foundation of multithreading, the only failure mode is a total crash.
  
  Simply untrue. I've seen (and written) plenty of multithreaded applications that could survive an abnormal thread exit better than most IPC-based apps could survive an abnormal process exit.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
41. Re:Non-threaded programs by spinlocked · 2002-11-10 10:03 · Score: 1
  
  Not strictly true. It ran under a DOS32 extender (I seem to remember they used Watcom's DOS4GW). I'm pretty sure this provided a thread library (it's been a while, etc. etc.)
  
  --
  # init 5 Connection closed.
  
  Oh... ...bugger.
42. Re:Non-threaded programs by sigwinch · 2002-11-10 15:58 · Score: 2
  
  In the future, multi-CPU machines will become more common, not less, and learning to use them efficiently will also become more important. Within the box, multithreading will perform better than alternatives, even if there's message passing going on between boxes.
  That is identically the point I was trying to make. However, as I pointed out, core clock speeds are getting faster and faster. A 200 GHz clock speed will be practical within perhaps 5-10 years. That's an instruction cycle time of 5 picoseconds.
  As I also pointed out, the minimum time to send a round-trip signal from one CPU to another is determined by the speed of light. Suppose you have two processors that are 6 centimeters apart. The round-trip time for light is 200 picoseconds. Electrical signals are about half that fast, or 400 picoseconds. Therefore the act of merely acquiring an inter-thread lock will waste 80 clock cycles waiting for the atomic lock instruction to execute, assuming there is no contention. After that the thread will perform memory reads on shared data. Each read from a cold cache line will have an additional 80 wait states while the data is snooped from the hot cache. Complex activity can easily touch 50 cache lines, which is 4,000 clock cycles.
  And that's the best case scenario where the programmer has flawlessly laid out the variables in memory to minimize cache transfers. In the real world it is appallingly easy to cause cache ping-pong, where two processes try to use the same cache line, and it keeps "bouncing" back and forth between multiple CPUs.
  Oh, and that's assuming a zero-overhead protocol for coherency. Realistically you should expect a few hundred picoseconds or so of additional round-trip latency.
  Finally, these numbers are for one node that contains a few CPUs. Going between nodes will be vastly worse. A four nanosecond cable (i.e., a one foot cable) between nodes means about 1000 wait states to acquire a lock. A two microsecond RTT (e.g., Myrinet) means 400,000 wait states.
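  The wait-state figures above can be reproduced with a few lines of arithmetic. This sketch takes the latencies quoted in this post as given inputs rather than re-deriving them from distances; the function name is made up.

```python
def wait_states(round_trip_ps, clock_ghz=200.0):
    """Clock cycles that elapse during one round trip of `round_trip_ps` picoseconds."""
    period_ps = 1000.0 / clock_ghz        # 200 GHz -> 5 ps per cycle
    return round_trip_ps / period_ps


# Latencies are the figures stated in the post, expressed in picoseconds.
lock_between_cpus = wait_states(400)          # 400 ps electrical round trip
fifty_cache_lines = 50 * lock_between_cpus    # 50 cold cache lines
one_foot_cable = wait_states(4_000)           # 4 ns cable round trip
cluster_rtt = wait_states(2_000_000)          # 2 us network round trip

print(lock_between_cpus, fifty_cache_lines, one_foot_cable, cluster_rtt)
```

  That gives 80, 4,000, 800 and 400,000 cycles; the 800 is what gets rounded to "about 1000" for the one-foot cable.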
  The fact is that whatever's happening down below, at the programming-model level it's still more efficient to have multiple tasks coordinate by running in the same address space than by having them spend all their time packing and unpacking messages.
  A shared memory space implies a coherency mechanism. A coherency mechanism implies round-trip messages. ("I need this, can I have it?" ...wait... "Yes, you can have it, here it is." ...wait...) Dependence on round-trip latency implies that the program is I/O bound.
  The lower-level message-passing works precisely because it's very task-specific and carefully optimized to do particular things, but that all falls down when the messages have to be manipulated by the very same processors you were hoping to use for your real work.
  Obviously the communication/networking hardware will have to be built directly into the CPU core. That's why I said HyperTransport is the writing on the wall. These CPUs will also have special instructions for message passing. Incoming messages will be stored in priority FIFOs.
  Yes, yes, using parallelism to mask latency. Yawn. Irrelevant.
  The I/O will be (at least) a thousand times slower than the core. Code that isn't batching up data 1,000 clock cycles in advance is pissing away CPU capability. Code that uses round-trip synchronization is benchmarking the I/O latency.
  The thing is, I/O latency isn't improvable. You can only put chips so close together, and the speed of light is sort of non-negotiable. CPU speed, however, is improvable. So the ratio of I/O latency to clock period is going to keep increasing. Code that doesn't batch up data will not run any faster on the machines of tomorrow.
  If context switches aren't all that bad, why were you bitching about context switching in multithreaded applications?
  I wasn't bitching about it. I was correcting the misstatement that its influence on performance was zero.
  I already referred to regularly decomposable applications with high compute-to-communicate ratios, and that's exactly what you're talking about. Yes, what you say is true for some applications, but does it work in general?
  I was pointing out that all applications that want maximum performance will be designed that way. They will do whatever it takes to deserialize the algorithms. If several CPUs need to know the results of a simple calculation, they will each calculate it themselves, because calculating it in one place and distributing the results would take a thousand times longer.
  Take a look at what the Linux kernel is doing sometime. Everything is moving to per-processor implementations. Each processor gets its own memory allocator, thread/process queue, interrupt handlers, and so forth. Inter-CPU locks and shared memory are avoided like a plague. They know that the I/O-to-core clock ratio is bad, and it's going to get much, much worse.
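  The shape of the per-processor idea, sketched in userland with a plain counter. This is only a structural illustration under loud assumptions: both function names are invented, and Python threads share one interpreter lock, so this will not show the cache-line effects, only the difference between locking on every update and merging private results once.

```python
import threading


def count_shared(n_threads, n_increments):
    """Every increment takes one lock that all workers contend for."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(n_increments):
            with lock:                    # shared state touched on the hot path
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_per_worker(n_threads, n_increments):
    """Each worker owns a private slot; results are merged once at the end."""
    slots = [0] * n_threads

    def work(idx):
        local = 0
        for _ in range(n_increments):
            local += 1                    # no lock, no shared data on the hot path
        slots[idx] = local                # single write into this worker's own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(slots)
```

  Both versions return the same total; the second one simply never asks another worker for anything until the work is done.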
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
43. Re:Non-threaded programs by psamuels · 2002-11-10 16:36 · Score: 1
  
  The NPTL is probably still faster for small numbers of threads, but I would guess that "big-iron" performance is better with NGPT.
  
  Well, is your guess based on any fact, or just a guess? The NPTL people benched them and claimed to be quite surprised at how much better NPTL did than NGPT - they figured the two would be approximately equal. And this benchmarking wasn't on the order of 50 threads - it was more like 100_000 threads. Perhaps that's still small iron to you, but it's massive overkill for any workload I can think of.
  
  --
  "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
44. Re:Non-threaded programs by Salamander · 2002-11-11 02:19 · Score: 2
  
  core clock speeds are getting faster and faster...Suppose you have two processors that are 6 centimeters apart...[lots of other irrelevant drivel deleted]
  
  Gee, maybe all that's why people are going to things like multiple cores on one die, hyperthreading, etc. Programming-wise, these present the same interface as multiple physical CPUs, but they also ameliorate many of the problems you mention...speaking of which, everything you're presenting as a "killer" for multithreading is even worse for your multi-process model.
  
  A two microsecond RTT (e.g., Myrinet)
  
  As a former Dolphin employee, I have to point out that Myrinet was never that fast.
  
  Dependence on round-trip latency implies that the program is I/O bound.
  
  You remember that little thing about using parallelism to mask latency? Most serious programs outside of the scientific community are I/O bound anyway and the whole point of multithreading is to increase parallelism.
  
  Obviously the communication/networking hardware will have to be built directly into the CPU core.
  
  Getting rather far afield here, aren't you?
  
  I wasn't bitching about [context switching]. I was correcting the misstatement that its influence on performance was zero
  
  You're forgetting that an operation's effect on performance is a product of its cost and frequency. Go read H&P; it'll spell this out for you much better than I can.
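  
  To put rough numbers on cost times frequency, here is a small sketch. The 10,000 cycles per switch is the high end quoted earlier in the thread; the switch rates and the 2 GHz clock are purely illustrative assumptions, and the function name is made up.

```python
def overhead_fraction(events_per_second, cycles_per_event, clock_hz):
    """Share of CPU time spent on an operation: frequency times cost, over capacity."""
    return events_per_second * cycles_per_event / clock_hz


CLOCK_HZ = 2e9                 # assumed 2 GHz CPU
CYCLES_PER_SWITCH = 10_000     # pessimistic cost per context switch

# A design that adds a handful of extra switches per second...
light = overhead_fraction(100, CYCLES_PER_SWITCH, CLOCK_HZ)
# ...versus one that switches on every request at a high request rate.
heavy = overhead_fraction(50_000, CYCLES_PER_SWITCH, CLOCK_HZ)

print(f"{light:.2%} vs {heavy:.2%}")
```

  Under these assumptions the same per-switch cost is about 0.05% of the CPU in one design and 25% in the other, so the cost figure alone does not settle anything.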
  
  I was pointing out that all applications that want maximum performance will be designed that way.
  
  What about those that can't? That's a lot of applications, including important ones like databases which must preserve operation ordering across however many threads/processes/nodes are in use. You can't just point to some exceptional cases that are amenable to a particular approach, and then wave your hands about the others. Well, you can - you just did - but it doesn't convince anyone.
  
  Take a look at what the Linux kernel is doing sometime.
  
  Since you're practically an AC I can't be sure, but odds are pretty good that I know more about what the Linux kernel is doing and excellent that I know more about what kernels in general are doing. Your appeal to (anonymous) authority won't get you anywhere.
  
  The real point that we started with is your claim that any multithreaded application will "suck". That statement only has meaning relative to other approaches that accomplish the same goals. Are you ever going to get around to backing that up, or will you just keep going around and around the issue in ever-widening circles hoping I'll get nauseous and quit?
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
45. Re:Non-threaded programs by mccormick · 2002-11-11 18:27 · Score: 0
  
  Nope, they didn't use DOS4GW, but rather the subsystem that comes with DJ Delorie's DJGPP. And IIRC, it didn't come with any kind of threading library.
  
  --
  Pete
46. Re:Non-threaded programs by sigwinch · 2002-11-11 21:37 · Score: 2
  
  Gee, maybe all that's why people are going to things like multiple cores on one die, hyperthreading, etc. Programming-wise, these present the same interface as multiple physical CPUs, but they also ameliorate many of the problems you mention...
  Hyperthreading doesn't gain a lot. (Where "a lot" == capable of improving performance by factor of 10.) Multi-core dice run out of steam at 2-4 cores/die. (Maybe it's 8-16. The point is it'll never get to 256.) And multi-core dice will still have substantial latency problems. Multi-chip modules are better than pins-and-traces, but not by a lot.
  ...everything you're presenting as a "killer" for multithreading is even worse for your multi-process model.
  I was describing a multi-process system that ran in batch mode. To the extent that you can fit your problem into that framework, it makes comm latency much less important.
  Getting rather far afield here, aren't you?
  Huh? Describe the problem -> describe a solution.
  Most serious programs outside of the scientific community are I/O bound anyway and the whole point of multithreading is to increase parallelism.
  By "I/O bound" I mean "I/O bound at the ping rate". Which means "terribly slow".
  You're forgetting that an operation's effect on performance is a product of its cost and frequency.
  To repeat, I did not say it was important or unimportant, I said it was not identically zero.
  What about those that can't?
  They'll see no benefits. However very few problem spaces are truly sequential, especially if you are willing to trade latency for throughput.
  That's a lot of applications, including important ones like databases which must preserve operation ordering across however many threads/processes/nodes are in use.
  Indeed. Pipelining the algorithms will take considerable cleverness.
  I wonder if databases will be that hard, though. Good ones already provide on-line replication and failover, multi-version concurrency, and transactions that automatically roll-back if a collision is detected.
  You can't just point to some exceptional cases that are amenable to a particular approach, and then wave your hands about the others. Well, you can - you just did - but it doesn't convince anyone.
  Did I say that all programs will be easily adaptable to that approach? No. I said that those programs that do not adapt will tend to have poor performance.
  Since you're practically an AC I can't be sure, but odds are pretty good that I know more about what the Linux kernel is doing and excellent that I know more about what kernels in general are doing.
  Ah, I state an inarguable and easily-verified fact, you talk about how much more you know. Smooth move, Ex-Lax.
  Your appeal to (anonymous) authority won't get you anywhere.
  Look, Mr. Smarty Pants, I don't have the time to write a frickin tutorial on things that are common knowledge, where the reader can find enlightenment at their nearest search engine nearly as fast as I can write an essay on the topic.
  But since you can't be arsed to do it, here's a link to search Google for "Linux per-cpu". See? Tens of thousands of hits. If you add "allocator" to the search criteria, this is the first hit. It assumes background knowledge, but should make sense.
  The real point that we started with is your claim that any multithreaded application will "suck". That statement only has meaning relative to other approaches that accomplish the same goals.
  Ladies and gentlemen, we have just lost cabin pressure.
  The statement was "If you have a few million tasks to do, pretty much any threading system is going to suck."
  The context makes it clear that "tasks" means "things that make the threads talk to each other". And it will, indeed, truly and royally suck.
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
47. Re:Non-threaded programs by Salamander · 2002-11-12 00:59 · Score: 2
  
  The statement was "If you have a few million tasks to do, pretty much any threading system is going to suck."
  
  The context makes it clear that "tasks" means "things that make the threads talk to each other". And it will, indeed, truly and royally suck.
  
  ...and anything else - most especially the approach you espouse - will truly and royally suck more. Life just sucks sometimes; too bad, you lose.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
48. Re:Non-threaded programs by awol · 2002-11-14 04:11 · Score: 1
  
  Totally agree. Please forgive my lack of clarity in the original post. I did not mean to suggest that a single thread removes all deadlocks and race conditions, rather that where multiple threads have these problems, a single thread often will not (as you pointed out, the locked data structure for example).
  
  The classical example in my experience is an application that did not even have to consider a whole bunch of contention issues because we chose the single threaded (for the transaction part) implementation. This also brought benefits by allowing zero "legging" risk when performing transactions across the instances of objects that might ordinarily be partitioned along thread lines. No IPC, no locking, all good. The trade offs in such a choice are straightforward and often quite cost (in all senses) effective.
  
  --
  "The first thing to do when you find yourself in a hole is stop digging."
And so it begins ! by forged · 2002-11-08 20:42 · Score: 0, Redundant

..It seems that any battle between the two implementations will now be played out in public, in good open source fashion.
* forged reaches for the compiler...:)
1. Re:And so it begins ! by riiv · 2002-11-08 20:50 · Score: 1
  
  .. There is a hole in your mind. (sorry)
  
  --
  Unix is a standard, DOS is a standard, windows XX is not.
2. Re:And so it begins ! by Anonymous Coward · 2002-11-09 04:36 · Score: 0
  
  Well, which compiler you choose will now
  complicate your task even more.
  
  Why not spend less time on IRC, and more
  time reading up on benches.
  
  Can you name popular benchmarks? Which linux
  distro uses spec95, and why? Which one uses
  spec2000 and why?
  
  *AC reaches for a cluestick .... :P
Competing multithreading by Anonymous Coward · 2002-11-08 20:42 · Score: 0

It seems the implementation is being multi-threaded ;)
Actually, Quake II by bwoodring · 2002-11-08 20:49 · Score: 1

Actually, Half-Life was based on Quake II, but for all I know they may both derive heavily from the original Quake.
1. Re:Actually, Quake II by Minna+Kirai · 2002-11-08 20:57 · Score: 3, Informative
  
  In terms of software archeology, there is an important intermediate ancestor.
  
  Quake's original networking was meant for LANs only; the fact that it was even barely playable over the internet surprised the authors.
  
  id Software soon released QuakeWorld free to Quake owners. It used the same interface and most of the graphics resources as Quake, so it's arguably not a different program. But it came as a separate executable, with many Quake features removed (like monsters). And most importantly, the networking code was entirely re-written.
  
  It is that code that QuakeII and successors derived from.
Just hope they think about users first... by CheeseCow · 2002-11-08 21:02 · Score: 1

...so we don't end up with a lesser version because they don't like the other implementation.

I thought that was what OSS was about, getting the best of all worlds? ;)

--
Regards,
Helmers
I don't see why the two are mutually exclusive. by Second_Derivative · 2002-11-08 21:04 · Score: 5, Insightful

From what I understand NGPT is mainly a user space thing. Why not go with the 1:1 one in the kernel (NPTL or whatever), and just have a libpthread.so (NPTL runtime) and a libpthread-mn.so (NGPT)? From a programmer's standpoint, when I say pthread_create() I want to know exactly what that does: with NPTL I know what happens. With NGPT I don't. Also, the old rule of "Don't pay for what you don't use" applies. If I'm going to have just, say, four threads, those four threads are going to run better as four kernel threads as opposed to 2 LWPs dynamically mapped between 4 CPU contexts.

But, again, I might want to write a server of some sort which handles hundreds of thousands of connections at once, but 99% are idle at any given time and the other 1% require some nontrivial processing sometimes and/or a long stream of data to be sent without prejudicing the other 99%. Now, for ANY 1:1 threading system, I can't just create x * 10^5 threads because the overhead would be colossal. But equally so, implementing this with poll() is going to be horrid, and if the amount of processing done on a connection is nontrivial and/or DoS'able, there's going to be tons of hairy context management code in there, until lo and behold you end up with a 1:N or M:N scheduling implementation yourself. NGPT could be very useful as a portable userspace library here, as these people have implemented an efficient M:N scheduler under GPL, something that hasn't existed before and could be very useful. I think these libraries might be much more complementary than the article makes out.
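
For anyone wondering what the readiness-driven alternative to "one thread per connection" looks like, here is a minimal sketch, not a real server. It assumes Python's standard selectors module (which picks epoll on Linux), fakes the connections with socketpair(), and uses made-up names (serve_ready, demo). The uppercase echo is only a stand-in for the "nontrivial processing".

```python
import selectors
import socket


def serve_ready(sel, timeout=0.1):
    """Handle every connection that is readable right now.

    Idle connections are never touched; only ready ones are returned
    by the selector.
    """
    handled = []
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # stand-in for real per-request work
            handled.append(key.data)
        else:
            # Peer closed the connection: stop watching it.
            sel.unregister(conn)
            conn.close()
    return handled


def demo(total=100, active=(3, 57, 99)):
    """Register `total` fake connections, make a few active, serve them once."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(total):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=i)
        clients.append(client_side)

    for i in active:
        clients[i].sendall(b"ping")

    handled = sorted(serve_ready(sel))
    replies = [clients[i].recv(4096) for i in active]

    # Tidy up every descriptor we opened.
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    for client in clients:
        client.close()
    sel.close()
    return handled, replies
```

Note what is missing: any per-connection state between calls. Once a request needs several round trips, that state has to live somewhere, which is the "hairy context management" complaint above.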
1. Re:I don't see why the two are mutually exclusive. by jpc · 2002-11-08 21:29 · Score: 2, Informative
  
  If you have hundreds of thousands of connections you should be using aio, which is the new scalable replacement for lots of polls...
2. Re:I don't see why the two are mutually exclusive. by Second_Derivative · 2002-11-08 22:06 · Score: 1
  
  AIO is a way of beginning a large series of IO operations and leaving the kernel to complete them while you get on with something else (or that's the best definition I can find so far). That still doesn't solve the problem of how to efficiently serve a small number of active connections without ignoring the inactive connections for any extended period of time.
3. Re:I don't see why the two are mutually exclusive. by jpc · 2002-11-08 22:35 · Score: 1
  
  I think it does (but I need to write some programs using it). You just say "I want to start async reads on all these connections, tell me what has finished" (which will be the active ones). People on the list seem to be using it for these types of apps (that's why they want aio network io).
4. Re:I don't see why the two are mutually exclusive. by rweir · 2002-11-09 00:21 · Score: 5, Informative
  
  Now, for ANY 1:1 threading system, I can't just create x * 10^5 threads because the overhead would be colossal.
  
  Actually, it's kind of famous for that.
5. Re:I don't see why the two are mutually exclusive. by Second_Derivative · 2002-11-09 00:57 · Score: 1
  
  Had a look at the API, it's ... messy to say the least. There's a function that lets you touch off a list of async io operations (ie your "read on every single connection") with one system call, however there's no corresponding function that says "Here's a list of pending AIO ops, give me, now, a list of all the completed or errored-out ones". You can opt to have a realtime signal delivered with the corresponding AIO struct (or fd or something...) but from what I've seen of signals they're pretty evil.
  
  Dunno, I can see how this library would be useful but it stinks a bit too much of "design by committee" for my liking, and it isn't terribly well supported anyway (Linux, for instance, only has an AIO that is emulated by creating a thread for each AIO op. Which is precisely what we're trying to avoid in the first place)
6. Re:I don't see why the two are mutually exclusive. by Quixote · 2002-11-09 01:46 · Score: 3, Informative
  
  Now, for ANY 1:1 threading system, I can't just create x * 10^5 threads because the overhead would be colossal
  If you read the article, it shows benchmarks done by the NPTL folks which show a 2x improvement in thread start/stop timings over NGPT (which itself is a 2x improvement over POLT (plain old Linux threads)).
  Read more about NPTL here (PDF file).
7. Re:I don't see why the two are mutually exclusive. by Anonymous Coward · 2002-11-09 02:03 · Score: 0
  
  You mean /dev/poll or one of the upcoming replacement APIs which are better at handling large poll lists. Of course, your kernel has to do something reasonable with the implementation, and your program has to do something reasonable with the results.
8. Re:I don't see why the two are mutually exclusive. by GooberToo · 2002-11-09 04:52 · Score: 3, Interesting
  
  "Here's a list of pending AIO ops, give me, now, a list of all the completed or errored-out ones".
  
  Because there's no need. Since AIO functions on the concept of callbacks, your callback will be called when the operation completes. Completion may be "errored-out" or it may be "completed". Adding the house-keeping for these is a no brainer should you really need to have them. After all, you already have to track AIO context to some degree (buffers and perhaps state). Keeping track of your desired information is trivial at this point.
9. Re:I don't see why the two are mutually exclusive. by psamuels · 2002-11-10 16:47 · Score: 1
  
  If you have hundreds of thousands of connections you should be using aio, which is the new scalable replacement for lots of polls...
  
  Or epoll, which is the other new scalable replacement for lots of polls. epoll was merged into a 2.5.4x kernel recently. Its interface is somewhat similar to select(), so it should be relatively straightforward to convert select()-using code. Not that anyone should have been using select() for hundreds of thousands of connections in the first place!
  
  --
  "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
10. Re:I don't see why the two are mutually exclusive. by Omnifarious · 2002-11-10 23:50 · Score: 2
  
  Why is the context management code going to be so hairy that you end up with something as complex as a 1:N or M:N scheduler?
  
  --
  Need a Python, C++, Unix, Linux develop
Re:Yeah, yeah, yeah by Anonymous Coward · 2002-11-08 21:05 · Score: 0, Funny

I hear they just added the keyboard driver. It's the most advanced keyboard driver known to man.

Now all they need is some kind of output device.
So are they both useful? by drinkypoo · 2002-11-08 21:05 · Score: 5, Interesting

And is there any chance of getting both maintained and in the kernel (As options) if they are?
I can easily imagine that one of them might be more efficient for gigantic numbers of threads that don't individually do much, or maybe one might be more efficient for very large numbers of processors, but I don't know jack about the issues involved, so I'm just talking out my ass. (Hello! I'd like to "ass" you a few questions!)
So, someone who knows... Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
1. Re:So are they both useful? by sql*kitten · 2002-11-08 21:20 · Score: 5, Insightful
  
  So, someone who knows... Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?
  
  They both implement the POSIX threading API (a good thing IMHO). NPTL is more radical; the IBM team made a conscious decision to keep the impact of their changes to the minimum. For that reason, I expect that NGPT will be accepted; it has a shorter path to deployment in production systems, even though NPTL is a more "correct" solution (i.e. it uses purely kernel threads). But it changes userspace, libc and the kernel - it will be much harder to verify.
  
  Are these threading systems good for different things? And would it really be that hard to make them both come with the kernel?
  
  Developers shouldn't care, or more accurately it doesn't matter for them. Both implement POSIX threads, so it simply depends what is installed on the system on which their code ends up running - the same application code will work the same on both, altho' each will have its "quirks". Sysadmins will prefer the NGPT because it is easier to deploy and test. Linux purists will prefer NPTL because a) it's the "right" way to do it, and b) it was written by Red Hat.
  
  They could both come with the kernel source and you could choose one when you compiled it. I don't see how they could coexist on a single system.
2. Re:So are they both useful? by bitMonster · 2002-11-08 22:54 · Score: 3, Interesting
  
  This is totally wrong. Read the white paper. "Yes and No" below the parent post gets it right.
  
  It seems to me that the NPTL will smoke the NGPT. The author of the article is just being diplomatic. Keep in mind that Ulrich is/was a key developer on both. Usually when a good engineer changes his approach to solving a problem, it's because he has found a better solution. :)
3. Re:So are they both useful? by fredrik70 · 2002-11-08 23:54 · Score: 1
  
  Wouldn't NGPT be better for apps with a very high number of threads? Do most of the thread scheduling in userspace, thus saving the kernel from having to deal with too much.
  
  --
  if (!signature) { throw std::runtime_error("No sig!"); }
4. Re:So are they both useful? by barneyfoo · 2002-11-09 01:32 · Score: 2, Informative
  
  With the new O(1) scheduler and other improvements made by Ingo Molnar, scheduling kernel threads is no longer a major bottleneck. Besides, NGPT goes against the Linux philosophy of minimally invasive changes to the kernel API, and it's doubtful Linus will accept it into the kernel.
5. Re:So are they both useful? by ProtonMotiveForce · 2002-11-09 08:31 · Score: 1
  
  There's nothing more "correct" about NPTL. Real Unix operating systems have used M:N thread semantics for a long, long time.
  
  It was hubris for the developers to even release LinuxThreads ("Oh, our context switches are so fast we don't need user threads!") and the result has been a painful last few years. Oh what joy, every thread is a process! How, umm, clever! Thank God for the new choices (especially NGPT).
6. Re:So are they both useful? by psamuels · 2002-11-10 16:54 · Score: 1
  
  And is there any chance of getting both maintained and in the kernel (As options)
  
  Neither NGPT nor NPTL is in the kernel. They are both userspace libraries using the same kernel interface. You probably associate them with the kernel because NPTL was developed in parallel (no pun intended) with Ingo Molnar's improvements to the kernel threading interface.
  
  Specifically, they implement libpthread, which is packaged with libc (so libc can use it to be thread-safe). Ulrich Drepper seems quite committed to packaging NPTL with the GNU libc, but it should be possible to rework NGPT to make it possible to swap it in at compile time, if you want to recompile your libc, or possibly even at runtime. (I say "possibly", because I don't know if the two have exactly the same ABI, with regards to things like mutexes.)
  
  --
  "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
WRONG!!! Half-Life was based, mostly, off of Quake by Ndr_Amigo · 2002-11-08 21:12 · Score: 5, Interesting

I really don't understand where people get that ridiculous idea.

Half-Life was mostly based off Quake1. The network protocol and prediction code was taken from QuakeWorld. Some small Quake2 functionality was merged later on.

The initial release of Half-Life was approximately 65% Quake1, 15% QuakeWorld, 5% Quake2 and 15% Original (not including the bastardisation of the code into MFC/C++).

And yes, people from Valve have confirmed the base was Quake1, not (as some people continue to claim, and I really wish I knew where the rumor started) Quake2.

Also, the percentages are based off some reverse engineering work I did a while ago when I was playing with making a compatible Linux clone of Half-Life.

(FYI, I took the Quake1 engine.. added Half-Life map loading and rendering within about three hours... Half-Life model support took about four days, and adding a mini-WINE dll loader for the gameplay code took about a week. I gave up on the project when it came down to having to keep it up-to-date with Valve's patches)

- Ender
Founder, http://www.quakesrc.org/
Project Leader, http://www.scummvm.org/
Tenebrae! by SexyKellyOsbourne · 2002-11-08 21:19 · Score: 1

Send the code for that Half-Life project to the creators of the Tenebrae engine, and you will be highly revered.

Half-Life (or better yet, Counter-Strike) with Doom III graphics would just own.

http://tenebrae.sourceforge.net
1. Re:Tenebrae! by Ndr_Amigo · 2002-11-08 21:36 · Score: 2
  
  I know most of the Tenebrae developers, and trust me, none of them have ANY time to do this.
  
  Besides which, Tenebrae's code is just a proof of concept, and not remotely suitable for a real workable port.
  
  That, and I'm too busy with ScummVM to help anyone implement this stuff, it's very disjointed and hacky. And I'm already revered, thanks :o)
  
  (If anyone does want to play with this type of thing themselves, check the Forums section of my QuakeSrc site. Both the half-life map rendering code, and an older version of my Half-Life MDL renderer, are floating around in several engines. I'm sure someone on the forums would be happy to help you.
  
  You're on your own on network protocol and merging the WINE stuff tho. I did this port some years ago, and I don't have much of the source left)
  
  - Ender
  Founder, http://www.quakesrc.org/
  Project Leader, http://www.scummvm.org/
2. Re:Tenebrae! by dolo666 · 2002-11-12 08:21 · Score: 2
  
  Counter Strike could be amazing with high-end graphics, but perhaps there is a reason that DoomIII isn't going to have really advanced multiplayer? Carmack said, some multiplayer, but not that much multiplayer. Could it be that DoomIII is too cumbersome to play over the net? Maybe too cumbersome to play over a LAN? (*ahem*Blood2*ahem*)
  
  I really think that a Counter Strike realism factor could rock, but what about the new Nvidia graphics language everyone keeps talking about? Maybe it won't be hard for us to make leaps toward a fast net CS game with super cool graphics. Maybe the new code would enable the CS team to split off from the Half Life engine and do their own thing?
  
  If you look at the stats on Gamespy, Counterstrike takes the biggest share of online use. Take that and think about Valve and how much money they made on the backs of that project!
  
  *shudder*
  
  To clear up my point, I think that a glossy Doom3 version of CS might be immersive, but it might also lack the networking capability that the current version has. CS today doesn't take up much resources on higher end systems, or even lower ones. Today it's pretty cheap to get CS enabled.
  
  Maybe that plays a factor.
Full Text by Anonymous Coward · 2002-11-08 21:32 · Score: 0, Informative

O'Reilly Network

Published on The O'Reilly Network (http://www.oreillynet.com/)
http://www.oreillynet.com/pub/a/onlamp/2002/11/07/linux_threads.html
Linux Multithreading Advances
by Jerry Cooperstein
11/07/2002

Recent advances in Linux's threading implementation are expected to continue to ease migration from other Unix-like operating systems. These advancements have arrived with intense activity on two fronts. First, thread-handling improvements have greatly enhanced the kernel's scalability even to thousands of threads. Second, there are now two fresh, competing implementations of the POSIX pthreads standard (NGPT and NPTL) set to replace the aging LinuxThreads library.

In typical open source fashion, only time will tell exactly who wins out in which arena. However, both new library implementations should be API-compatible with the standard, so the choice will depend on performance and stability. The required changes will appear in the upcoming Linux 2.6 (or 3.0) kernel and can already be tested in late development versions.

Multithreading on Linux

Threading implementations typically have components in both user and kernel space. It is possible to do everything from the one side or the other, but each approach has problems. With everything on the user side, all related threads are part of one single process (which can only run on one CPU at a time), and multi-processor systems are underutilized. With everything on the kernel side, the kernel scheduler must bear a heavy load.

Approaches have ranged from the 1:1 pure kernel thread model in which each user thread has its own kernel thread, to the M:1 model in which the kernel sees only one normal process, with an arbitrary number of threads with which to schedule in user space. The M:N model falls in between, associating M user threads with each of N kernel threads.

The Linux kernel uses the clone() function to create new processes. Flags control parent/child resource sharing, where resources range from everything (memory, signal handlers, file descriptors, etc.) to nothing. While the usual fork() inherits resources from the parent, it may share nothing. Copy-on-write techniques ensure each process gets its own copy as soon as either one tries to modify a shared resource.
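
The difference between the two ends of that sharing range can be observed from user space. The following is a small sketch, not kernel code: Python does not expose clone() directly, so it assumes os.fork() as the share-nothing case and an ordinary thread as the share-everything case. The function names are invented for illustration, and copy-on-write itself stays invisible; only its effect (a private copy after a write) shows up.

```python
import os
import threading


def child_write_is_private():
    """fork(): a write in the child does not reach the parent's memory."""
    counter = [0]
    pid = os.fork()
    if pid == 0:
        # Child process: this modifies only the child's copy of the list.
        counter[0] = 42
        os._exit(0)
    os.waitpid(pid, 0)  # reap the child before looking at our own copy
    return counter[0]


def thread_write_is_shared():
    """A thread shares the address space, so its write is visible."""
    counter = [0]

    def worker():
        counter[0] = 42

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return counter[0]
```

Under these assumptions the forked child leaves the parent's value untouched, while the thread's write is seen by its creator.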

Programs can call the clone() function as a system call, using it directly to produce multithreaded programs. However, it is completely Linux-specific and non-portable. Since there is no external standard, there is no guarantee that its interface will be stable. Threading library implementations do in fact use the clone() system call, and it is the job of library maintainers to keep up with kernel changes.

The LinuxThreads implementation of the POSIX threads standard (pthreads), originally written by Xavier Leroy, has been the dominant one for years and is now incorporated and maintained in glibc. It has two problems on Linux:

* Compliance issues as compared to the POSIX standard
* Performance issues, especially when dealing with many threads; i.e., a lack of scalability

Compliance Issues

Virtually all compliance problems can be traced to the decision to use lightweight processes (LWPs, or the 1:1 model described above) as the basis of the implementation. New processes are created by clone(), with a shared-everything approach. While the new process is lighter due to the sharing, fundamentally it is still a process in its own right, with its own process identifier (pid), process descriptor, etc.

This led to the following standards compatibility problems:

* Signal handling is incorrect.
* An extra management thread is created by the pthreads library.
* ps shows all threads in a process.
* Core dumps don't contain the stack and machine register information for all threads.
* getpid() returns a different result for each thread.
* A thread cannot wait for a thread created by another thread.
* Threads have parent-child, not peer, relationships.
* Threads don't share user and group IDs.

If a pthreads application were written for Linux, one could expect easy portability. However, the inverse process, porting to Linux, was more difficult and slowed Linux deployment since important applications were now broken.

Some problems were resolvable by relatively minor kernel adjustments. For example, by modifying the basic data structure describing each process (struct task_struct) to store a thread group identifier and some other bookkeeping, and then modifying the getpid() system call to return this identifier rather than the process identifier, one problem could be solved.
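
The behavior that fix aims for can be checked from user space. The sketch below is an illustration under assumptions, not part of any threading library: it relies on Python 3.8+ for threading.get_native_id(), and observe_ids is a made-up helper. On a system where getpid() returns the thread group identifier, every thread reports the same pid while each keeps its own kernel-level thread id.

```python
import os
import threading


def observe_ids(n=4):
    """Collect (pid, native thread id) pairs from n concurrently live threads."""
    barrier = threading.Barrier(n)  # keeps all threads alive at once,
    lock = threading.Lock()         # so native ids cannot be reused
    seen = []

    def worker():
        ident = (os.getpid(), threading.get_native_id())
        with lock:
            seen.append(ident)
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

On the LinuxThreads setup described above, the same experiment written in C would be expected to show a different getpid() result per thread.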

However, many key kernel developers resisted attempts to modify the kernel for compliance's sake. On one hand, their taste runs to technically superior solutions rather than to "cut the toes to fit the shoes" to comply with standards. On the other hand, there was an aversion to creating many threads. Sentiments like "there is no need to create more threads than there are processors" were common. Thread-prolific languages such as Java were looked at with contempt for many reasons.

Performance Issues

The main performance problem has been scalability with growing numbers of threads. These difficulties are not unique to threads, but apply in all cases where the number of processes grows large.

Consider the process of obtaining a new pid. In the 2.4 kernel, Linux has to loop through all processes to ensure a candidate pid is not already assigned. With an outer loop on possible candidates, the time spent may scale quadratically; if there are thousands of processes, the system can slow down to a crawl. Since each thread has its own pid, creating zillions of threads is poisonous.

New Generation POSIX Threads

A group at IBM and Intel, led by Bill Abt at IBM, released the first version of the New Generation POSIX Threads (NGPT) library in May 2001. This consisted of a drop-in replacement for LinuxThreads, together with patches for kernels beginning with 2.4.0.

To ease acceptance, the group made a conscious effort to keep the impact on the kernel small. They worked to get the kernel modifications they needed through patient, piece-by-piece promotion and expected to have NGPT eventually replace LinuxThreads in the glibc system.

NGPT is a derivative of the GNU Pth (GNU Portable Threads) package, which up to now is based on an M:1 model. A user space priority and event-based, non-preemptive scheduler manages the M user threads. This was seen as an improvement over the 1:1 pure kernel thread model used by LinuxThreads where the kernel has to do a lot of scheduling work.

NGPT adopted the M:N hybrid model. Many developers saw this as the best path to good performance: keep all CPUs humming, minimize context switching between kernel threads, and switch mostly between user space threads. However, the M:N model is complex. It requires two cooperating schedulers, one each in user and kernel space. Signal handling is difficult and much work has to be done in user space. It takes fancy footwork to prevent one blocked thread from blocking other threads running in the same process.
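
The shape of the M:N idea can be sketched in a few lines. This is a toy under stated assumptions and has nothing to do with NGPT's actual code: generators stand in for the M cooperative user threads, ordinary Python threads stand in for the N kernel threads, and user_thread and run_m_on_n are invented names. It also ignores the hard parts named above (signals, blocking calls), and Python's interpreter lock means the workers do not truly run in parallel.

```python
import queue
import threading


def user_thread(name, steps):
    """A cooperative 'user thread': each yield is a voluntary switch point."""
    for i in range(steps):
        yield (name, i)


def run_m_on_n(tasks, n_workers):
    """Multiplex M generator tasks over n_workers OS threads."""
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)
    trace = []
    trace_lock = threading.Lock()

    def worker():
        while True:
            try:
                task = ready.get_nowait()
            except queue.Empty:
                # Nothing runnable right now. A task held by another worker
                # is requeued and picked up again by that worker.
                return
            try:
                step = next(task)  # run the task up to its next yield
            except StopIteration:
                continue           # task finished; do not requeue it
            with trace_lock:
                trace.append(step)
            ready.put(task)        # user-space "context switch"

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return trace
```

Each task makes progress in order, but which worker runs which step is decided in user space, which is exactly the second scheduler the paragraph above warns about.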

Nonetheless, the NGPT team succeeded in implementing the full pthreads standard, and the kernel changes they needed were accepted in the mainline kernel early in the 2.5 development process (at kernel 2.5.4). They were also back-ported into the 2.4.19 kernel. Depending on the metric used, performance gains were claimed of up to 100 percent, and work continues on improvements.

On March 26-27, 2002, Compaq hosted a meeting to discuss the future replacement for the LinuxThreads library. In attendance were members of the NGPT team, some employees of (then distinct) Compaq and Hewlett-Packard, and representatives of the glibc team, including the head maintainer, Ulrich Drepper (a Red Hat employee), who wrote a summary of the meeting.

Pursuing the M:N approach, the report said:
"This is one of the reasons why it is absolutely necessary to think about two-level scheduling for the threads. I.e., the actual user threads are different from the kernel threads (or light-weighted process, or what ever one wants to call them) and scheduled separately. This is generally called the M-on-N model for a thread implementation. The 1-on-1 model dedicates a unique kernel thread for each user-level thread; this is the model used by the current, inadequate thread library implementation."

The report contains detailed analysis of how to get kernel and user-space schedulers to cooperate using the scheduler activations method.

It seemed the replacement for LinuxThreads would be based on NGPT.

Native POSIX Thread Library

On September 19, 2002, Ulrich Drepper and Ingo Molnar (also of Red Hat) released an alternative to NGPT called the Native POSIX Thread Library (NPTL). The project included a new user space library, changes to glibc, and kernel modifications. The initial announcement said in part:
"Unless major flaws in the design are found this code is intended to become the standard POSIX thread library on Linux system, and it will be included in the GNU C library distribution."

NPTL is based squarely on the 1:1 pure kernel thread model. A white paper explains why in detail.

Recent changes to kernel thread handling (mostly due to Ingo Molnar) had vastly improved thread performance. With these changes in place, the relative simplicity of the 1:1 model became very attractive.

There is only one scheduler. Signal handling remains in the kernel's hands. Blocking problems are handled naturally because each kernel thread schedules independently. In addition, the user space implementation becomes fundamentally simpler.

In some sense, one has come full circle; developers who wanted to ensure full POSIX compliance were frustrated by the kernel maintainers' unwillingness to adapt the Linux kernel to fit their needs, and thus NGPT was developed in part as a polite end run requiring minimal kernel changes. Then a programming tour de force, mostly by one key kernel programmer, is now claimed to enable reversion to a much simpler approach.

Linux Kernel Improvements

What changes have been made in the Linux kernel to make threads perform and scale better?

Consider the previous example of obtaining a new pid. The potentially quadratic search is gone. Instead, the kernel sets aside a small but dynamic number of memory pages as bitmaps for process identifiers. Obtaining a new pid means finding a page with free entries and then finding and clearing the first set bit. No locking is required, and the search time is very short and almost independent of the number of running processes.
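
A rough sketch of such a bitmap allocator follows. It is an illustration only and makes several assumptions: it follows the convention stated above (a set bit marks a free pid), it always hands out the lowest free pid, it uses Python integers where a kernel would use machine words and atomic bit operations, and the names PidMap and BITS_PER_PAGE are invented. A real implementation may well use the opposite bit convention.

```python
BITS_PER_PAGE = 4096 * 8  # assumption: one 4 KiB page of bits per bitmap


class PidMap:
    """Toy pid allocator: one integer bitmap per page, set bit = free pid."""

    def __init__(self, pages=1):
        self.pages = [(1 << BITS_PER_PAGE) - 1 for _ in range(pages)]
        self.pages[0] &= ~1  # pid 0 is never handed out

    def alloc(self):
        for index, bits in enumerate(self.pages):
            if bits:  # this page still has free entries
                lowest = bits & -bits            # isolate the first set bit
                self.pages[index] = bits ^ lowest  # clear it: pid is now taken
                return index * BITS_PER_PAGE + lowest.bit_length() - 1
        raise RuntimeError("out of pids")

    def free(self, pid):
        index, offset = divmod(pid, BITS_PER_PAGE)
        self.pages[index] |= 1 << offset  # set the bit again: pid is free
```

The point of the structure is that allocation never walks the list of running processes; it looks only at the bitmap.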

Another key improvement is the O(1) scheduler, which no longer has to cycle through all processes to find the most deserving one. Each CPU has its own queue, a simple priority-sorted bitmask. Once again finding a new process is very fast and scales fantastically.
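
The bitmask idea can be sketched as follows. This is a simplified model, not the kernel's scheduler: it assumes 140 priority levels with lower numbers being more urgent, keeps one FIFO queue per level, and uses the invented names RunQueue, enqueue, and pick_next. Timeslices, the per-CPU queues, and everything else a real scheduler needs are left out.

```python
from collections import deque

NUM_PRIOS = 140  # assumption for this sketch; lower number = higher priority


class RunQueue:
    """Toy run queue: a bitmask records which priority levels are non-empty."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if not self.bitmap:
            return None  # nothing runnable
        # Lowest set bit = most urgent non-empty level; no scan over tasks.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        level = self.queues[prio]
        task = level.popleft()
        if not level:
            self.bitmap &= ~(1 << prio)
        return task
```

Picking the next task costs one find-first-bit and one dequeue, regardless of how many tasks are queued.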

Where Do We Go From Here?

The NPTL team posted some benchmarks, such as this display of the minimum time needed to create a number of top-level threads.

In general, while NGPT beat the old methods by a factor of two, NPTL could do better by another factor of two.

It remains to be seen exactly how the two implementations will rank against each other. NGPT may not yet be tuned to take advantage of recent kernel improvements the way NPTL has. Furthermore, benchmarks are often used to misrepresent. It will take further development by both teams, independent benchmarks, and real life comparisons to see who really beats whom.

You can test drive NGPT by simply downloading the library and installing it, as long as you have kernel 2.4.19 or later. For NPTL, you can download the library, but you will need a very recent development kernel as well as bleeding edge glibc and gcc. The announcement contains detailed instructions.

While there may be some hard feelings on the socio-political side about how NPTL seemed to come out of the blue, the maintainers of NGPT have not griped in public. It seems that any battle between the two implementations will now be played out in public, in good open source fashion. Either one library will win out over the other, or each will become the preferred tool in some universe for some load. At any rate it will be fun to see what happens. Linux will benefit by having a standards-compliant and well-performing threads implementation (or implementations).

Jerry Cooperstein is a senior consultant and Linux training specialist at Axian Inc., in Beaverton, Oregon, and lives in Corvallis, Oregon.

oreillynet.com Copyright © 2000 O'Reilly & Associates, Inc.
Yes and No by krmt · 2002-11-08 21:33 · Score: 5, Informative

I don't understand this all that well myself, but I did just read the whitepaper linked to in the article written by Ingo Molnar and Ulrich Drepper. From the looks of things, NGPT's M:N model will cause a lot more problems because of the difficulty of getting the two schedulers (userspace and kernelspace) to dance well together.

By sticking with the 1:1 solution that's currently used in the kernel and the NPTL model, there's really only the kernel scheduler to worry about, making things run a lot more smoothly generally. I'd imagine latency being a big issue with M:N (I'm pretty sure that it was mentioned in the whitepaper). I haven't read the other side of the issue, but I think that pretty graph in the O'Reilly article says it all performance-wise.

There are other issues though, like getting full POSIX compliance with signal handling. The 1:1 model apparently makes signal handling much more difficult (I don't know anything about the POSIX signaling model, but there's a paper about it on Drepper's homepage that could probably shed some light on the subject if you were so inclined). There are other issues in the current thread model that have to be dealt with in a new 1:1 model (and are) such as a messy /proc directory when a process has tons of threads.

From the whitepaper, it seems that the development of the O(1) scheduler was meant to facilitate the new thread model they've developed, which I hadn't thought about before even though it makes sense. There are still some issues to work through, but both models look promising. If the signal handling issues can be resolved, it looks from the article like NPTL's model will win on sheer performance.

As for making them both come with the kernel, that's really really difficult, since this stuff touches on some major pieces of the kernel like signal handling. The same way you're only going to get one scheduler and VM subsystem, you're only going to get one threading model. You're able to patch your own tree to your heart's content, but as per a default install, there can be only one.

--

"I may not have morals, but I have standards."
1. Re:Yes and No by karlm · 2002-11-09 02:34 · Score: 2
  
  It would seem to me that NGPT could be modified easily to run on an NPTL kernel. In any case, I don't see why the O(1) scheduler and O(1) kernel thread creation code wouldn't be worked into the 2.6 kernels. As far as which Linus likes better, my guess is NPTL, as it drastically improves kernel performance. Linus has shown a willingness to make drastic changes even in a production kernel if he feels the performance gains are substantial. I would guess that (most if not all of) the NPTL kernel mods will make it into Linus's tree. In the end people will go with Linus's decision, so it really comes down to who convinces Linus.
  I'd personally like to see the NPTL kernel mods with the NGPT libraries. This would seem to provide the most forward-looking approach, as it offers lots of scalability and flexibility.
  I personally can't wait for the 2.6 kernel, whichever model wins. Java apps are nice, but they tend to use way too many threads. I really don't know why a select() wasn't present in the beginning for a language designed to be used in a networking environment. Oh well.
  
  --
  Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.
2. Re:Yes and No by ProtonMotiveForce · 2002-11-09 08:33 · Score: 1
  
  Again, real Unix operating systems have had working, stable M:N thread models for a long time. Why would it be any problem for Linux?
  
  Does this whitepaper by a (relative) newcomer somehow invalidate 10 years of real experience by the rest of the world and by large commercial companies which make real money selling Unix operating systems? Hardly.
3. Re:Yes and No by taverngeek · 2002-11-09 11:11 · Score: 1
  
  Handling blocking system calls should be more efficiently handled by the kernel scheduler and have improved performance in the NPTL model.
  
  Handling interthread scheduling such as pthread_cond_wait() and pthread_cond_signal() should be more efficiently handled by a user space scheduler and benefit from the NGPT model.
  
  Thus, ideally the NGPT model would run on top of the NPTL model. The NGPT model could cherry-pick whatever it can handle faster than the kernel scheduler to provide the best of both worlds.
4. Re:Yes and No by sigwinch · 2002-11-09 14:50 · Score: 2
  
  Handling interthread scheduling such as pthread_cond_wait() and pthread_cond_signal() should be more efficiently handled by a user space scheduler and benefit from the NGPT model.
  Don't the new futexes solve that? The common case of no-contention is handled entirely in userland via atomic operations in a shared memory page. If there is contention, the waiter calls the kernel to yield the processor, and the kernel wakes it up when the lock is released.
  So you get the best of both worlds: the common case is handled with no overhead--not even a single system call--while scheduling is handled by an omniscient 1:1 kernel scheduler.
  Meta comment: Why the fuck does Slash strip out HTML non-breaking space character entities? More proof that the /. operators don't care about content, they just want to milk the site for advertising money.
  
  --
  --
  Kuro5hin.org: where the good times never end. ;-)
LWN by KidSock · 2002-11-08 21:46 · Score: 5, Informative

has a nice article about the state of threading on Linux. See the Sept. 27th Weekly Edition.
Wow! by jaavaaguru · 2002-11-08 21:51 · Score: 0

There are threading implementations in 2.6 specifically for the O'Reilly Network?

--
Follow me
Someone should start a site.... by jericho4.0 · 2002-11-08 22:00 · Score: 1, Redundant

Golly!!! That was an informative article.
I was aware of the debate about the linux threading issue, but the kernel mailing list was too noisy to pick out this kind of detail.
Someone should start a site that covers long term issues, rather than the week by week stuff I've found on the web... or maybe someone has, and I'm just too out of the loop....

--
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
1. Re:Someone should start a site.... by taviso · 2002-11-08 23:07 · Score: 4, Informative
  
  Someone should start a site that covers long term issues, rather than the week by week stuff I've found on the web... or maybe someone has, and I'm just too out of the loop....
  
  KernelTrap.
  
  --
  ex$$
2. Re:Someone should start a site.... by Salsaman · 2002-11-09 00:40 · Score: 4, Informative
  
  Try lwn.net. They have a weekly overview of the kernel status. Since they moved to a subscription model, you have to pay to see the latest news, but previous weeks can be viewed for free.
3. Re:Someone should start a site.... by rdennis · 2002-11-09 05:33 · Score: 1
  
  Kernel Traffic
  http://kt.zork.net/kernel-traffic/latest.html
  
  seems to keep track of the linux kernel mailing list every week or two....nice updates...summaries.
  
  --Rick
4. Re:Someone should start a site.... by Anonymous Coward · 2002-11-09 08:21 · Score: 0
  
  That's an interesting model. The New York Times lets you see the latest issue free and you have to pay for back issues.
Oh crap, I wish I didn't have to say this... by Saint+Stephen · 2002-11-08 22:28 · Score: 1, Flamebait

But as a Windows programmer you have to know how hopelessly amateurish this makes you all sound?

I guess since Windows sucks so bad at shitloads of processes, programming on 4 or more CPUs, you really quickly have to learn how to write multithreaded code that works, and correctly. You poor Unix guys are struggling through something we all went through years ago -- learning how to think more sophisticated than a single thread of control correctly.

CPU-bound tasks (spinning in a loop calculating PI) are easy to saturate all the resources using any model. How come y'all are switching to a thread-based model now? Was the other way running out of steam?

Honestly curious...
1. Re:Oh crap, I wish I didn't have to say this... by Anonymous Coward · 2002-11-08 23:11 · Score: 0
  
  Why do you derive your sense of self-worth from an operating system? Do you know how hopelessly amateurish you sound?
2. Re:Oh crap, I wish I didn't have to say this... by Fnord · 2002-11-08 23:48 · Score: 5, Interesting
  
  Mostly because the way Unix VMs are designed is much more efficient for multi-process programs than Windows is. Windows started doing threads long before SMP was all that common. They did it because multi-process was slow as hell. But for 90% of tasks it worked just fine in Linux. And it's not like Linux is just now moving to a thread model. It's just making the existing one (which worked well until you scaled to many, many threads) a bit better. And by better I don't mean similar to Windows performance, I mean similar to Solaris (which has threading from the gods).
3. Re:Oh crap, I wish I didn't have to say this... by khuber · 2002-11-09 00:13 · Score: 2, Informative
  
  IBM's System 360 had multithreading in 1964.
  Multics had multithreading in the early 1970s.
  Windows was still launched from DOS in 1992.
  Please go back to your "innovating" with Windows.
  -Kevin
4. Re:Oh crap, I wish I didn't have to say this... by rweir · 2002-11-09 00:29 · Score: 2
  
  Seriously, what do you use threads for? Unix traditionally uses separate processes to handle most things Windows folks use threads for. Since Linux has things like copy-on-write and shared memory, it's not that much less efficient. Plus you get the advantage of complete address separation, aka they can't crash each other.
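  
  A minimal sketch of the address-separation point, assuming a Unix-like system where os.fork() is available. The function name is made up for illustration, and the "crash" is simulated by having the child kill itself with SIGKILL:

```python
import os
import signal


def child_crash_is_isolated():
    """Fork a child that scribbles on its copy of the data and then dies."""
    data = [1, 2, 3]
    pid = os.fork()
    if pid == 0:
        # Child: this change lands in the child's private copy of the page
        # (copy-on-write), so the parent never sees it.
        data.append(99)
        # Simulated crash: the child is killed by a signal.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached
    # Parent: collect the child's status and carry on.
    _, status = os.waitpid(pid, 0)
    return data, os.WIFSIGNALED(status)
```

  The parent gets back its original list and a flag saying the child died from a signal. With threads, the same stray write would have hit the one shared list.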
5. Re:Oh crap, I wish I didn't have to say this... by peterb · 2002-11-09 00:56 · Score: 2
  
  The "complete advantage" of running 'would-be threads' in separate process spaces also has the "complete disadvantage" of an inability to share data without using some heavyweight mechanism. Not to mention a heavyweight context switch.
  
  Yes, threads can be misused and abused, but for some problems they're the right tool. All the developers in this article are trying to do is develop a hammer that won't break the first time they hit a nail with it (IMHO, Linux's inability to deal with massive numbers of threads without severe performance degradation makes the threading implementation approach uselessness. I'm glad someone is fixing it).
6. Re:Oh crap, I wish I didn't have to say this... by smittyoneeach · 2002-11-09 01:38 · Score: 2
  
  Hear, hear: acknowledgement of technical excellence where present, disparagement of ethical vacuum where noticed.
  It's far more instructive for those of us a little lower on the learning curve to observe these architectural questions in public, than to have them made for us by Those Who Know Best.
  Prediction: both implementations have their appropriate place. In the ideal world, you'd install the proper one based on the application.
  The one that becomes the usual suspect in commercial distributions is likely the one that is best suited for business tasks, which aren't that thread-happy in the first place, if my guess isn't too far off...
  
  --
  Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
7. Re:Oh crap, I wish I didn't have to say this... by Anonymous Coward · 2002-11-09 02:19 · Score: 0
  
  What do we use threads for? Performance in server applications, mainly. Consider IIS v an early Apache 1:
  
  * async IO v synchronous IO
  * IOcompletion port v select (with thundering herd)
  * thread-pooled v process-pooled (or worse, spawning)
  * only context-switches when worker thread blocks (eg disk or database IO) v context-switches per request (multiple switches in naive implementations)
  
  The threaded one wins on performance, big time. But the process-based one wins on stability (and yes, that's more important).
  
  The two systems have different design philosophies and this influences the particular kernel features. For example, the UNIX simple programs-connected-by-pipes model means Linux MUST have fast process creation. Equally, the Windows reuse-existing-resources model leads to thread pooling, async IO and so on.
  
  But the two models are learning from each other: Apache 2 is multi-threaded, while IIS6 has process isolation/pooling/recycling (plus managed code needs process isolation less).
  
  And since .NET server's got more process isolation/recycling, the rumors that it's faster at starting processes/threads are reasonably plausible. Linux should get NPTL into the official kernel ASAP.
8. Re:Oh crap, I wish I didn't have to say this... by Waffle+Iron · 2002-11-09 02:39 · Score: 3, Insightful
  
  How come y'all are switching to a thread-based model now? Was the other way running out of steam?
  Correctly programming threads is hard, so they should only be used when necessary. Many of the things that can be done with threads can be done more safely with fork() and/or select(). Since Windows lacks the former and has a broken version of the latter, Windows programmers tend to use threads when Unix programmers would use an alternative.
9. Re:Oh crap, I wish I didn't have to say this... by mindstrm · 2002-11-09 03:05 · Score: 1
  
  False.
  
  Many unix apps use threads.
  
  Linux has had threads for ages; this is a change to the model behind the threading, not "the addition of threads" to linux.
10. Re:Oh crap, I wish I didn't have to say this... by mindstrm · 2002-11-09 03:07 · Score: 1
  
  We do? What kinds of things does windows use threads for that we use separate processes for?
  
  Windows has shared memory.
  Windows has copy-on-write, I'm sure.
11. Re:Oh crap, I wish I didn't have to say this... by 0x0d0a · 2002-11-09 03:46 · Score: 3, Insightful
  
  You poor Unix guys are struggling through something we all went through years ago -- learning how to think more sophisticated than a single thread of control correctly.
  
  What the heck does altering the structure of a thread *library* have to do with application-level thread programming? What are you talking about?
  
  --
  May we never see th
12. Re:Oh crap, I wish I didn't have to say this... by Anonymous Coward · 2002-11-09 07:01 · Score: 0
  
  Solaris has threading from the gods ?
  
  Aha, that's a good one. Sure it has threading, something that Linux couldn't really claim until NPTL/NGPT. But as far as performance goes, HP-UX (a latecomer to the pthread party) beats Solaris hands down.
  
  I just wish HP would reconsider their move to MxN. While their MxN implementation is faster than the Solaris 1:1 and MxN versions, it adds too much complexity to the runtime. It's an ill-advised move, at a time when programmers finally realize that creating billions of threads is a bad idea (cf. Java nio, various attempts to fix/avoid poll/select (kqueues, event ports, ...)).
  
  1:1 relative simplicity is the major reason I hope NPTL will completely eclipse NGPT.
13. Re:Oh crap, I wish I didn't have to say this... by Oggust · 2002-11-09 07:19 · Score: 1
  Actually, processes are almost always more efficient than threads.
  
  If you need threading for MP performance, multiple processes are probably better for you (unless the "threads" are very short lived, so that the small startup cost actually matters).
  
  If you're on a small box (2-4 CPUs), processes have the advantage that since you're only sharing what you really need (since you explicitly need to share it, with shared memory or whatever) instead of everything but the stack (more or less), the CPUs get to stay out of each other's way a lot more. With threads they keep invalidating each other's caches all the time, which is a kind of nonobvious and hard-to-profile performance problem.
  
  If you're on a big box, chances are that it's actually a NUMA machine, and the problems with threading get even worse on those (since you share much more than you need, many of the CPUs will use non-local memory a lot!)
  
  If you're on a single CPU box, threading is almost guaranteed to be a loss versus just running single threaded. The one thing you can gain from it is not having to sleep on IO. (Then again, multi-process will do this for you too.)
  
  multi-process code is a lot more debuggable. You can run the separate processes by themselves, and verify their behavior, they will actually behave in mostly the same way in and out of the debugger, etc.
  
  You can use non-threadsafe libraries, or different threading systems within the processes (which is a major PITA otherwise.).
  
  It's harder to move a threaded program to a cluster than a multi-process one. (Since a cluster is mostly a NUMA box with really long access time to nonlocal memory.)
  
  /August
  --
  "An object declared as type _Bool is large enough to store the values 0 and 1." -- 6.1.2.5, C99 standard.
14. Re:Oh crap, I wish I didn't have to say this... by Anonymous Coward · 2002-11-09 07:57 · Score: 0
  
  OK Mr. Windows, how did you avoid deadlock between consumer/producer threads without an atomic 'SetEventAndWait()' method?
  Or did you just hope for the best with code like the following?
  // signal to the consumer (or producer) that the resource is available
  SetEvent(...);
  // wait for the producer (or consumer) to tell us there's something to do
  // N.B. -- I sure hope our thread doesn't get interrupted between
  //         the above line and this one, possibly causing us to
  //         miss the event from the other side (that would suck!)
  WaitForMultipleObjects(...);
15. Re:Oh crap, I wish I didn't have to say this... by len_harms · 2002-11-09 08:13 · Score: 1
  
  You're using the wrong method... Events should ONLY be used if you do not mind missing them. It's just the way Windows works. The idea behind events is that everyone who is 'waiting' can go now, if they are not running already. But not necessarily the other way around ("I'll wait till someone else is done"). That's what mutexes and semaphores are for.
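  
  To make the "missed signal" distinction concrete, here is a small sketch using Python's threading module as a stand-in. It is an analogy, not the Win32 API: a condition-variable notify has no memory, so it is dropped when nobody is waiting yet, while a counting semaphore remembers a release that arrives early. Both function names are made up for illustration.

```python
import threading


def notify_before_wait_is_lost(timeout=0.2):
    """A notify with no waiter is simply dropped; the later wait times out."""
    cond = threading.Condition()
    with cond:
        cond.notify()               # nobody is waiting yet, so nothing is woken
    with cond:
        return cond.wait(timeout)   # False means the wait timed out


def release_before_acquire_is_kept(timeout=0.2):
    """A semaphore counts releases, so an early signal is not lost."""
    sem = threading.Semaphore(0)
    sem.release()                        # signal arrives before anyone waits
    return sem.acquire(timeout=timeout)  # True means the signal was still there
```

  The first function returns False and the second returns True, which is the whole argument for reaching for a semaphore (or a mutex plus a predicate) when a wakeup must not be missed.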
16. Re:Oh crap, I wish I didn't have to say this... by len_harms · 2002-11-09 08:36 · Score: 1
  
  Windows has its own flavor of threads. I've always found it to be 'in the way', as it were, of making separate in-process type things.
  
  Windows also has copy-on-write. That is not a Linux/Unix exclusive. It's part of the 386 CPU and its paging method. Did you know that if you load a DLL and someone else loads it, they share the code memory? This was how people were able to do shared-memory type things fairly simply in Win 3.1. 9x and NT did not allow such things, so a lot of those apps broke when they came along.
  
  Win32 also has shared memory. Again, mapping memory into different process spaces is part of the CPU. An 'easy' thing, as it were, for both teams to make work.
  
  You can also create separate processes in Win32, with the function called CreateProcess. Most of the time you do not need the extra overhead of a separate process. You just need 'go do this and come back when done'. The load time in Windows for a process is not a trivial thing. That is why CreateProcess (out of proc) is not used as much as the thread model (in proc). In proc gives you a simpler way to share memory and do inter-thread communication. Out of proc usually involves mutexes and shared memory. Setting up a shared memory space is NOT a simple task in Win32. That is why you will see more threads in Windows programs. Because people are lazy and tend to dislike complex things.
  
  What drives me up the wall is CreateThread versus _beginthread versus AfxBeginThread, each one with its own rules. You have to watch EVERYTHING with one and not the others. You have to watch out for things like shared heaps and the like or end up with crummy performance. There are LOTS of things I would like to see changed in the way Windows does all this. It's far from perfection. Do one wrong thing with the wrong C runtime at the wrong time and you're HOSED in performance. There are things you can do to make it better but you usually end up having to do it all by hand...
  
  The one other thing that bugs me about Win32 threads is that if you are not careful you can actually swamp the CPU. Take two threads of equal priority, one stuck in a loop and the other not. The one in the tight loop will swamp out any other threads on the system of equal priority. I usually end up having to yield the CPU with Sleep(1). Also Sleep(0) does NOTHING. It does not yield. But I'm just bitter and ranting :)
17. Re:Oh crap, I wish I didn't have to say this... by khuber · 2002-11-09 15:51 · Score: 1
  
  Which thread library (Solaris has two)? The new thread library which is now standard in 9 is very good.
  -Kevin
How about scheduling & thread-specific storage by parabyte · 2002-11-08 22:33 · Score: 5, Interesting

Among the issues with threads being half a process, half a thread (getpid() bug, signal handling etc.) that are mentioned in the article, I found issues in two other areas:
* the scheduler does not immediately respond to priority changes
* thread-specific storage access is slow
There is a well known effect in multi-threaded programming called priority inversion that can cause deadlocks when a low-priority thread has acquired a resource that a high-priority thread is waiting for, but a medium-priority thread keeps the low-priority thread from being executed, and so the medium-priority thread effectively gets more cycles than the high-priority thread.
One way to overcome this problem is to use priority ceiling locks where the priority of a thread is boosted to a ceiling value when it acquires a lock. Unfortunately I found that changing the priority of a thread for a short interval does not have any effect at all with the current 2.4.x standard pthreads implementation.
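
The effect and the ceiling fix can be sketched with a toy tick-based simulation of a single-CPU, fixed-priority scheduler. Everything here is an assumption made for illustration (the task names, arrival times, work amounts and the ceiling value of 4); it models the idea only and says nothing about how LinuxThreads, NGPT or NPTL behave.

```python
def simulate(use_ceiling):
    """Return {task name: tick at which it finished} for three tasks."""
    # "low" and "high" do all their work while holding the same lock.
    tasks = {
        "low":  {"prio": 1, "arrive": 0, "work": 3, "lock": True},
        "med":  {"prio": 2, "arrive": 2, "work": 5, "lock": False},
        "high": {"prio": 3, "arrive": 1, "work": 1, "lock": True},
    }
    ceiling = 4          # assumed to be above every task's base priority
    holder = None        # name of the task currently holding the lock
    finish = {}
    tick = 0
    while len(finish) < len(tasks):
        def effective(name):
            # With a ceiling lock, the holder runs at the ceiling priority.
            if use_ceiling and holder == name:
                return ceiling
            return tasks[name]["prio"]

        # A task is runnable if it has arrived, is unfinished, and is not
        # blocked on a lock held by somebody else.
        runnable = [
            n for n, t in tasks.items()
            if n not in finish and t["arrive"] <= tick
            and not (t["lock"] and holder not in (None, n))
        ]
        if runnable:
            name = max(runnable, key=effective)
            task = tasks[name]
            if task["lock"]:
                holder = name
            task["work"] -= 1
            if task["work"] == 0:
                finish[name] = tick + 1
                if holder == name:
                    holder = None
        tick += 1
    return finish
```

Calling simulate(False) has "med" finishing at tick 7 and "high" only at tick 9, which is the inversion; simulate(True) lets "low" finish its critical section at tick 3 so that "high" completes at tick 4.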
The second problem I encountered is that accessing thread-specific storage with pthread_getspecific() takes 100-200 processor cycles on a 1 GHz PIII, which makes this common approach to overcoming hotspots almost as slow as locking.
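
For readers who have not used the pattern, this is roughly what "thread-specific storage to avoid a hotspot" looks like, sketched with Python's threading.local as an analogue of pthread_setspecific()/pthread_getspecific(). The worker and run names are hypothetical, and the sketch shows the structure only; it makes no claim about access cost.

```python
import threading

# Each thread sees its own independent attributes on this object.
per_thread = threading.local()


def worker(name, n, totals):
    per_thread.count = 0            # private to this thread, so no lock needed
    for _ in range(n):
        per_thread.count += 1       # hot path touches only thread-local state
    totals[name] = per_thread.count  # publish once, at the end


def run():
    totals = {}
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", 1000 * (i + 1), totals))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

The point of the complaint above is that if every access in the hot loop costs about as much as taking a lock, this restructuring buys very little.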
Does anyone know if any of these issues are addressed by the new implementations?
p.

--
Without order, nothing can exist. Without chaos, nothing can be created.
I'm getting worried by Anonymous Coward · 2002-11-08 23:00 · Score: 0

I'm getting worried.
Slashdot used to be a place for us, the advocates of Free Software. Now I'm getting Micro$oft .NET ads in almost every article and windoze users like the parent poster post Ballmer's propaganda all over the place.
When did Taco sell out his ideals?
1. Re:I'm getting worried by Anonymous Coward · 2002-11-09 04:33 · Score: 0
  
  What ideals?
Kernel vs user doesn't make sense by iamacat · 2002-11-08 23:00 · Score: 4, Interesting

I don't see how someone can say that "kernel thread scheduling" is slower than "user thread scheduling". Whatever algorithms the pthreads library is using could also be used by a kernel process scheduler and offer the same benefits for daemons that fork() a lot of processes. Indeed, most of the time threads are not used to take advantage of multiple processors. Instead they are used in place of multiple processes with some shared memory that handle multiple requests at once. If they could be rewritten to really be multiple processes with some shared memory, the resulting application would be simpler and possibly more stable/secure because only some portions would need to worry about concurrent access. Conceptually, there is no reason why kernel code shouldn't use virtual memory, start system-use processes/threads, load shared libraries and so on. Or why "user" code shouldn't handle IRQs, call internal kernel functions or run in CPU supervisor mode. Some tasks demand a certain programming model. For example, one would hope that a disk IRQ handler doesn't use virtual memory. But there is no need to place artificial restrictions to the point that multi-level schedulers and duplicated code are needed to run a nice Java web server.
1. Re:Kernel vs user doesn't make sense by khuber · 2002-11-09 00:19 · Score: 1
  
  Since the other poster answered your question, I just wanted to make the comment that the GNU Hurd blurs this user/kernel space distinction by minimizing what is in the kernel. A user can run and test their own kernel components independently from other users.
  -Kevin
2. Re:Kernel vs user doesn't make sense by inquis · 2002-11-09 02:28 · Score: 3, Informative
  
  Two words: context switches.
  
  Whenever execution switches between user mode and kernel mode, a context switch is required. Context switches are expensive.
  
  Incidentally, this is one of the advantages of the microkernel approach: by severely limiting the code that must be run in kernel space, you can minimize context switches between kernel and user mode and save a lot of time.
3. Re:Kernel vs user doesn't make sense by mOdQuArK! · 2002-11-09 04:07 · Score: 1
  
  Incidentally, this is one of the advantages of the microkernel approach: by severely limiting the code that must be run in kernel space, you can minimize context switches between kernel and user mode and save a lot of time.
  
  That's odd - I heard that many of the early, naive implementations of microkernels caused performance degradation because of the massive context switching that was required between all the various user-space processes created to handle all of the functionality which was normally handled in the monolithic kernel.
4. Re:Kernel vs user doesn't make sense by mesocyclone · 2002-11-09 05:16 · Score: 3, Informative
  
  Typically, the microkernel approach INCREASES the number of context switches. However, a microkernel also normally has very fast context switches.
  
  The context switches are increased because a single operation (say, an I/O read) requires switching into the kernel from the user process, and then out into a device driver. A non-microkernel would have the device driver in the kernel. This is just an example - it may be that the switch is to the file system manager instead, or some other helper process. The point is that the nature of a microkernel is to have lots of helper processes that perform what are normally macro-kernel functions.
  
  Context switches typically are expensive because they involve more than just a switch into kernel mode. They are likely to involve some effort to see if there is other work to do (such as preempting this thread). They may involve some privilege checks, and some statistics gathering.
  
  A microkernel just does less of this stuff.
  
  BTW... the first elegant running micro-kernel I ran into was the original Tandem operating system. The kernel was primarily a messaging system and scheduler (I think scheduling *policy* may have been handled by a task, btw). I/O, file system activity, etc was handled by privileged tasks. It was very elegant, and conveniently fit into their "Non-Stop (TM)" operation.
  
  --
  The only good weather is bad weather.
5. Re:Kernel vs user doesn't make sense by Anonymous Coward · 2002-11-09 08:26 · Score: 0
  
  That's ridiculous. The most difficult part of the implementation of a microkernel is making drivers perform well given that there are so many context switches. User-space interprocess communication is vital to avoid invoking the scheduler excessively. Slashdot moderators suck :/
Threads usage by zak · 2002-11-08 23:58 · Score: 1

People have been writing multithreaded Unix apps for a very very long time (DCE and UI threads have been around for more than a decade, not to count "non-standardised" thread packages). The fact that many applications have been written with the multi-PROCESS model in mind (which is natural for Unix), has nothing to do with this.
Mode switching times. by zak · 2002-11-09 00:04 · Score: 2, Informative

Switching between user and kernel mode takes time. If all your primitive operations are implemented in user mode, synchronisation (for instance) takes several cycles in the best case (resource is free, lock it), and a bit over a hundred in the worst (resource is busy, context switch). When you also add the user/kernel mode transition (which may be a couple dozen cycles on some RISCs but takes more than a hundred on some x86 architectures), you can see how performance may degrade.
1. Re:Mode switching times. by iamacat · 2002-11-09 05:59 · Score: 2, Insightful
  
  I guess I agree that we shouldn't do a context switch just for executing a single xchg instruction. But if the resource is busy, a user-level scheduler cannot make a good decision. For one thing, it can only switch to threads in the same process, whereas the kernel can make a global decision, such as switching to a process holding the resource we are waiting for. Also, a user scheduler doesn't have execution statistics - working set, % of CPU slice used, I/O behaviour etc. - even for its own threads. It can only do round-robin scheduling rather than optimizing potential throughput based on each thread's history.
Re:GNU/HURD by Anonymous Coward · 2002-11-09 00:47 · Score: 3, Funny

"Just like Communism, GNU/HURD will never take off."

More realistically, like communism GNU/HURD is a great idea in theory, but the only available example right now is run by fascist lunatics and doesn't work...

AC because I'm too cruel
and not a moment too soon by truth_revealed · 2002-11-09 01:12 · Score: 4, Interesting

Debugging multithreaded programs in Linux is a complete bitch. As the article mentioned, the core dump only has the stack of the thread that caused the fault. Yes, I know any competent multithreaded programmer uses log files extensively in debugging such code but any additional tool helps. Either of these LinuxThreads replacements would be a major improvement. I just hope the major distros roll in either package in their next release.
I bet the 1:1 package would have finer-grained context switching, though. M:N models tend to switch thread contexts only during I/O or blocking system calls. With finer-grained thread switching you tend to expose more bugs in multithreaded code, which is a very good thing. But I suppose even in an M:N model you could always set M=N to achieve similar results.
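
The "switches only at blocking calls" behaviour can be sketched with a toy user-space scheduler built on Python generators. This is an illustration of cooperative scheduling in general, not of how NGPT works; the names task, run_round_robin and demo are made up, and a bare yield stands in for a blocking call.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # stand-in for a blocking call: the only place a switch can occur


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield point
        except StopIteration:
            continue               # task finished; do not requeue it
        ready.append(current)


def demo():
    log = []
    run_round_robin([task("a", 2, log), task("b", 3, log)])
    return log
```

Because nothing can interleave between two yield points, code that is racy under preemption can appear to work here, which is the bug-hiding effect described above.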
Re:How about scheduling & thread-specific stor by Juergen+Kreileder · 2002-11-09 01:35 · Score: 3, Informative

In the current version priorities only work with SCHED_RR and SCHED_FIFO (both require superuser privileges); SCHED_OTHER (the default policy) doesn't support changing priorities.
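
One way to see the policy difference from user space is to ask the kernel for each policy's priority range. The sketch below uses Python's os.sched_* wrappers, which exist only on Linux and some other Unix systems; the function names are hypothetical, and the privilege rules on a current kernel may differ from the 2.4-era behaviour described above.

```python
import os


def priority_ranges():
    """Return {policy name: (min priority, max priority)} as reported by the OS."""
    policies = {
        "SCHED_OTHER": os.SCHED_OTHER,
        "SCHED_FIFO": os.SCHED_FIFO,
        "SCHED_RR": os.SCHED_RR,
    }
    return {
        name: (os.sched_get_priority_min(p), os.sched_get_priority_max(p))
        for name, p in policies.items()
    }


def try_realtime(priority=1):
    """Try to switch this process to SCHED_FIFO; return True if it was allowed."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False   # typically a permission error for unprivileged users
    # Put the default policy back so the rest of the program is unaffected.
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
    return True
```

On Linux, SCHED_OTHER reports a range of (0, 0), so there is no priority to raise under the default policy, while the two real-time policies report a real range.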
Regarding thread specific data access: If your LinuxThreads library uses floating stacks (for ix86 this means it has been built with --enable-kernel=2.4 and for i686) it already will be faster.
For other TLS enhancements take a look at http://people.redhat.com/drepper/tls.pdf.
Why poll? or why M:N? by Kludge · 2002-11-09 01:37 · Score: 1

Perhaps I'm a bit ignorant here, but why poll at all? In my programs I never poll. If I ever need to wait on more than 1 I/O stream, I start a new thread that just blocks, while my other threads go about their business. The only other reason I ever create more threads is for better performance, to utilize multiple processors. Which leads me to ask, why would one need M:N? M:N doesn't help performance if you're waiting on I/O from the kernel or another process, and it doesn't help if you're trying to get a performance boost. And for what other reason would one create another thread?
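
For comparison, the polling style being asked about looks roughly like the sketch below: one thread waits on several streams at once with select() instead of parking one blocked thread per stream. The socketpair setup and the serve_once name are assumptions made only so the example is self-contained.

```python
import select
import socket


def serve_once(n_streams=3):
    """Read one message from each of several streams using a single thread."""
    pairs = [socket.socketpair() for _ in range(n_streams)]
    # Pretend each peer has sent us something.
    for i, (writer, _reader) in enumerate(pairs):
        writer.sendall(f"msg{i}".encode())

    index_of = {reader: i for i, (_writer, reader) in enumerate(pairs)}
    pending = list(index_of)
    received = {}
    while pending:
        # Block until at least one stream is readable (or one second passes).
        ready, _, _ = select.select(pending, [], [], 1.0)
        if not ready:
            break
        for reader in ready:
            received[index_of[reader]] = reader.recv(64).decode()
            pending.remove(reader)

    for writer, reader in pairs:
        writer.close()
        reader.close()
    return received
```

With a handful of streams the thread-per-stream version is just as good; the polling form starts to matter when the number of mostly idle streams gets large.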

Not trying to be antagonistic here, just curious.
1. Re:Why poll? or why M:N? by Anonymous Coward · 2002-11-09 02:41 · Score: 0
  
  Kludge said "And for what other reason would one create another thread?"
  
  Convenience. It's a reason, although not a good reason. Some problems are easier to solve as lots of parallel threads than as a state machine. But they almost always run faster without threads.
  
  So overall, you're right, M:N should stay the hell out of glibc and we should have the better kernel instead.
  
  Even better, Linux should have real async IO (with no threads under the hood). Because not creating a thread will always be faster than creating one, no matter how good the thread implementation is (and same for context-switching).
2. Re:Why poll? or why M:N? by GooberToo · 2002-11-09 04:55 · Score: 2, Insightful
  
  Because 1:1 implementations are well known to not scale well because of context switch overhead and synchronization overhead.
  
  For systems that don't require true high-end scalability, 1:1 works fairly well. It's because of this that M:N has some proponents.
3. Re:Why poll? or why M:N? by Anonymous Coward · 2002-11-10 06:38 · Score: 0
  
  Have no idea why this was considered over rated. Sure wish some people would learn how to properly moderate.
4. Re:Why poll? or why M:N? by Anonymous Coward · 2002-11-10 22:21 · Score: 0
  
  The POINT of the Linux implementation is that it's a new, magic, amazing 1:1 model that DOES SCALE WELL, thanks to Linux's new O(1) scheduling algorithm (which is the first real advance in scheduling tech for about 15 years...)
5. Re:Why poll? or why M:N? by GooberToo · 2002-11-11 02:53 · Score: 2
  
  The point is, it does nothing to address context switches. Switching to another thread still requires a complete context change. Period. While the O(1) scheduler allows for more timely scheduling (thus lower latency), it does nothing to prevent the actual required context change. Other thread models, given *some types* of workloads, can continue to outscale 1:1 implementations (and the inverse is also true). This is especially true when the threaded workloads exceed available hardware. The result is the same amount of work with fewer context switches. This can result in improved cache hits as well as the obvious gain of work from not having to switch process/thread context.
  
  So your POINT is actually incorrect. The correct statement is that the 2.6 kernel, when using a 1:1 model, will allow for greater scalability than its previous implementation but has yet to prove it addresses all workloads better than all other threading models. In fact, it's expected that M:N will still have its place for some types of workloads even after 2.6 is released.
  
  I personally expect that 1:1 and M:N will both become more common and that there is a time and place for both.
6. Re:Why poll? or why M:N? by GooberToo · 2002-11-11 02:57 · Score: 2
  
  which is the first real advance in scheduling tech for about 15 years...
  
  IIRC, BSD has had an O(1) scheduler for a very long time now. Many RT OSes have also had them. This is evolutionary for Linux and is not a "real advance in scheduling tech". It is, however, a real advance on an evolutionary basis, for Linux.
Linux processes are not LWP's by Anonymous Coward · 2002-11-09 01:51 · Score: 2, Interesting

The article makes the statement
Virtually all compliance problems can be traced to the decision to use lightweight processes... New processes are created by clone(), with a shared-everything approach. While the new process is lighter due to the sharing

This is not true. Every process in Linux is a full-weight process. The fact that some of those processes may map to the same memory space does not make them in any way lighter to the other parts of the kernel. What Linux does have is a lightweight clone() system call.
Solaris is an excellent example of the differences between processes, lightweight processes (lwp's or kernel threads), and user threads.
1. Re:Linux processes are not LWP's by Serveert · 2002-11-09 08:35 · Score: 1
  
  You're spot on the money; wish I could mod you up. I've been programming for years on Solaris and it handles thousands of threads just fine.
  
  --
  2 years and no mod points. Join reddit. Because openness is good.
Linux will prevail by mithras+the+prophet · 2002-11-09 02:36 · Score: 3, Insightful

I am not well-versed in the world of Linux (I have my own allegiances) but am being drawn to it more and more. Reading the article, it felt very clear to me that Linux will prevail (with a nod to William Faulkner's Nobel speech).

Consider a few quotes from the article:

The LinuxThreads implementation of the POSIX threads standard (pthreads), originally written by Xavier Leroy

A group at IBM and Intel, led by Bill Abt at IBM, released the first version of the New Generation POSIX Threads (NGPT) library in May 2001

On March 26-27, 2002, Compaq hosted a meeting to discuss the future replacement for the LinuxThreads library. In attendance were members of the NGPT team, some employees of (then distinct) Compaq and Hewlett-Packard, and representatives of the glibc team

On September 19, 2002, Ulrich Drepper and Ingo Molnar (also of Red Hat) released an alternative to NGPT called the Native POSIX Thread Library (NPTL)

Perhaps others have already pointed this out, but I am newly impressed with the universal nature of Linux. The power of an operating system that *everyone* is interested in improving, and has the opportunity to improve, is awesome. Yes, Microsoft has tremendous resources, and very earnest, good-willed, brilliant people. But to improve Microsoft's kernels, you have to work for Microsoft. That means switching the kid's schools, moving to Redmond, etc. etc. On the other hand, everyone from IBM to HP to some kid in, say, Finland, can add a good idea to Linux. When the kernel's threads implementation is a topic for conversation at conferences, with multiple independent teams coming up with their best ideas, Linux is sure to win in the long run.

I'm struck by the parallels to my own field of scientific research: Yes, the large multinational companies have made tremendous contributions in materials science, semiconductors, and biotech. They work on the "closed-source", or perhaps "BSD" model of development. But it is the "GPL"-like process of peer-reviewed, openly shared, and collaborative academic science that has truly prevailed.

--
four nine eighteen twenty-7 thirty-nine forty-7 fiftyeight sixty-nine seventy-9 eighty-8 one-hundred-and-nine one-twenty
Linux needs WSAAsyncSelect :) by Anonymous Coward · 2002-11-09 02:54 · Score: 0

Standard select sucks in so many ways it isn't funny; Berkeley definitely did not get everything right.
Re:How about scheduling & thread-specific stor by parabyte · 2002-11-09 03:10 · Score: 3

Thank you for the hints on thread local storage; I am glad to see this has been addressed.
Regarding changing of priorities, I think that with SCHED_OTHER the priority is being automatically modified by the scheduler to distribute cycles in a more fair fashion.
I tried both SCHED_RR and SCHED_FIFO and changing priorities basically works, but it seemed to me that changing priorities did not have an immediate effect as required to implement priority ceiling locks.
For example, when boosting the priority of a thread to the ceiling priority, and the thread is the only one with this priority, I expect it to run without being preempted by anyone before the priority is lowered or the process blocks. On the other hand, when lowering the priority, I expect a higher prio thread to be executed immediately. I would also expect the order of unblocking threads to be correctly adjusted when their priority was changed while suspended.
However, it seems that priority changes do not much affect the actual timeslice or the unblocking order, but I did not have the means to find out what exactly happens; using a debugger is outright impossible with fine-grained multi threaded programs.
Is it possible that some system thread needs to run in between to do some housekeeping? Do you have any hints about the scheduler's inner workings?
Thank you
p.

--
Without order, nothing can exist. Without chaos, nothing can be created.
Solaris is to blame for all this by 0x0d0a · 2002-11-09 03:42 · Score: 2

It's because of Solaris's inability to do 1:1 decently that we had Sun pushing for POSIX being M:N and consequently making the entire Linux world miserable.

1:1 is a cleaner, simpler model.

--
May we never see th
I thought it was 3.0? by MicroBerto · 2002-11-09 03:46 · Score: 2

It's not going to be 3.0?? I thought that was the decision since so many changes and additions/features are being put into this kernel..

--
Berto
1. Re:I thought it was 3.0? by WNight · 2002-11-09 05:26 · Score: 3, Informative
  
  There aren't really any incompatibilities with older code, so you don't need to go to a new kernel version like you would if you broke anything.
  
  In one of the discussions with Linus on this issue he said there was a planned change that broke something but it wouldn't be in for this version. Because that would warrant a major version change of its own, he didn't want to go from 2.5 to 3.0 and then from 3.3 or something to 4.0; he'd rather go from 2.9 (or so) to 3.0, and avoid the version inflation.
  
  I agree. There's no stigma in having a product numbered 1.x or 2.x, it simply means you got it right early on, without needing to break old applications too often.
2. Re:I thought it was 3.0? by AxelTorvalds · 2002-11-09 10:38 · Score: 2
  
  It probably will be 3.0. It's a pretty minor detail really. The tentative plan is 2.6 but it will most likely be a 3.0 release. There are huge differences between it and 2.0 and substantial ones between it and 2.4.
  FWIW, it's looking to be a hell of a kernel.
Learn to mod by Anonymous Coward · 2002-11-09 04:25 · Score: 0

Why do slashdot moderators keep modding things as "informative" and "insightful" without even checking if they're true??
Compare to Solaris evolution by dbrower · 2002-11-09 06:35 · Score: 5, Insightful

For a long time, Sun used M:N threading, and many people thought this was a good idea. They have recently changed their minds, and been moving towards 1:1.
The change in thinking for this is argued in this Sun Whitepaper, and this FAQ.
If one believes the Sun guys have a clue, you can take this as a vote in favor of 1:1.
IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.
Once you run a reasonable number of threads, you can be quickly driven to internal queueing of work from thread to thread; and by the time you have done that, you may already have reached a point of state abstraction that lets you run event driven in a very small number of threads, approaching NCPUs as the lower useful limit. Putting all your state in per-thread storage or on the thread stack is a sign of weak state abstraction.
-dB

--
"It if was easy to do, we'd find someone cheaper than you to do it."
1. Re:Compare to Solaris evolution by be-fan · 2002-11-09 08:15 · Score: 3, Insightful
  
  IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.
  >>>>>>>>>
  Typical *NIX developer. Threads are useful for two things:
  
  1) Keeping CPUs busy. This is where the whole NCPU business comes from.
  2) Keeping the program responsive. *NIX developers, with their fear of user-interactive applications, seem to ignore this point. If an external event (be it a mouse click or network connection) needs the attention of the program, the program should respond *immediately* to that request. Now, you can achieve this either by breaking up your compute thread into pieces, checking for pending requests after a specific amount of time, or you can just let the OS handle it. The OS is going to be interrupting your program every 1-10 ms anyway (timer interrupt) and with a good scheduler, it's trivial for it to check to see if another thread has become ready to run. The second model is far cleaner than the first. A thread becomes a single process that does a single specific task. No internal queueing of work is necessary, and threads split up according to logical abstractions (different tasks that need to be done) instead of physical ones (different CPUs that need to be kept busy).
  
  --
  A deep unwavering belief is a sure sign you're missing something...
2. Re:Compare to Solaris evolution by ProtonMotiveForce · 2002-11-09 08:39 · Score: 1
  
  The 4*NCPUS remark has no bearing in reality.
  
  A lot of development models require threads that usually just sit there, but will occasionally wake up and act. Say you have a UI that needs updating, a logfile that needs written to, a few network sockets that need listening to, etc... There's no reason not to use as many threads as you need.
  
  The ease in programming outweighs the nonexistent issues in performance because most of your threads aren't doing anything most of the time anyway.
  
  Now, if you're talking about hardcore computation in multiple threads, then obviously the fewer threads the better.
3. Re:Compare to Solaris evolution by dbrower · 2002-11-09 08:39 · Score: 3, Insightful
  
  I'm perfectly happy devoting a whole thread to UI events to get responsiveness. I shouldn't need 100 of them behind the scenes doing the real work if I only have 1 or 4 cpus.
  If your application design calls for 100 concurrently operating threads, there is something broken about the decomposition.
  -dB
  
  --
  "It if was easy to do, we'd find someone cheaper than you to do it."
4. Re:Compare to Solaris evolution by ProtonMotiveForce · 2002-11-09 08:43 · Score: 1
  
  100 threads - maybe. 10 threads on a single CPU, nonsense. And the number of CPU's has no bearing. Threads in this situation are just a way to increase responsiveness and to ease development effort. Writing complex event loops is silly unless you're doing intense numerical computations or something else where the CPUs are going to be nearing 100% load.
5. Re:Compare to Solaris evolution by dbrower · 2002-11-09 08:47 · Score: 2
  
  Yes, I'm overstating the position, but in my experience, using scads of threads is usually seen to be a mistake by those who have gone far down the road. If you have a thousand connections, are you better off with 1000 threads, or one/a few in select, and ~4*NCPU workers?
  The issue is thinking through the scalability implications of using lots of threads. What the application ends up doing is relying on the thread scheduler to make proper policy decisions. If the application maintains its own work queue(s) internally, it is in a far better position to make correct decisions. Yes it's more work, but it works better.
  -dB
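  A minimal sketch of the "internal work queue plus ~4*NCPU workers" shape described above, under the assumption that each unit of work can be expressed as a plain function call. `run_pool` and its parameters are hypothetical names for illustration; the 4x multiplier is just the rule of thumb from this thread, not a measured optimum.

```python
import os
import queue
import threading

def run_pool(jobs, handler, workers=None):
    """Process jobs with a fixed pool of workers fed from one internal queue."""
    if workers is None:
        workers = 4 * (os.cpu_count() or 1)  # the "~4*NCPU" rule of thumb
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            if job is None:          # sentinel: no more work for this worker
                return
            out = handler(job)
            with lock:               # results list is shared between workers
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results

squares = run_pool(range(1000), lambda n: n * n, workers=4)
print(len(squares))
```

  A thousand jobs are handled by four threads here; the application, not the thread scheduler, decides the order in which work is taken up, because it owns the queue.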
  
  --
  "It if was easy to do, we'd find someone cheaper than you to do it."
6. Re:Compare to Solaris evolution by dbrower · 2002-11-09 08:50 · Score: 2
  
  If you're not designing for the cases where the system CPU is being heavily utilized, it doesn't make much difference what you do. It's a multitasking world-- you don't know how much is being chewed up by the mpeg decoders playing/scaling streams to the little windows underneath your gui application.
  -dB
  
  --
  "It if was easy to do, we'd find someone cheaper than you to do it."
7. Re:Compare to Solaris evolution by Salamander · 2002-11-09 10:25 · Score: 2
  
  anyone who runs more than about 4*NCPUS threads in a program is an idiot
  
  I'd look at it a different way, and count likely-to-be-active threads. If that number's much greater than NCPUS you're probably hurting yourself, but threads that are certain or almost certain to be sleeping (e.g. periodic tasks that are between invocations) don't really count. I discuss this issue more in an article on my website about avoiding context switches (part of a larger article about server design) if you're interested.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
8. Re:Compare to Solaris evolution by Salamander · 2002-11-09 10:29 · Score: 2
  
  If you have a thousand connections, are you better off with 1000 threads, or one/a few in select, and ~4*NCPU workers?
  
  Neither. ;-) Listener/worker separation is a fundamentally flawed model; try symmetric multithreading instead.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
9. Re:Compare to Solaris evolution by dbrower · 2002-11-13 08:00 · Score: 1
  
  IMO, anyone who runs more than about 4*NCPUS threads in a program is an idiot; the benchmarks on 10^5 threads are absurd and irrelevant.
  >>>>>>>>> Typical *NIX developer.
  
  Then why is Microsoft pushing Fibres/Fibers instead of their kernel threads if you want to do lots of things fast? Why are they saying, "don't have too many more threads than you have cpus?"
  -dB
  
  --
  "It if was easy to do, we'd find someone cheaper than you to do it."
10. Re:Compare to Solaris evolution by dbrower · 2002-11-13 08:09 · Score: 2
  
  The linked article is pretty reasonable. I'd quibble about the active thread count thing, though; I think at the limits it's bad to abstract into something that will establish scads of threads even if they aren't all active at the same time. For instance, say you did do a thread-per-connection, but made it be the 'reading' thread only, so it could sit in read() or recv(). Whenever it got a message, it could queue it to a 'worker' thread that would consume cpu and send a response. For a zillion connections, this is a zillion stacks, which doesn't scale well. Probably better to i/o multiplex some fraction of connections into a smaller number of reader threads, and queue; or using the article's suggestion, have N generic threads, which are either doing demux reading or cpu intensive work. In either of these events, you are breaking the binding of connection/client state to a thread, which is the important state abstraction to do. Having broken clean, any number of possible implementations are easy to try. Without it, you are kind of stuck.
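  As a rough illustration of "breaking the binding of connection state to a thread", the sketch below keeps each connection's state in a small object and multiplexes all reads in whichever thread calls the loop. `Conn` and `serve_until_closed` are invented names, and the in-process `socketpair` clients exist only so the example runs without a network; a real server would hand complete messages to workers rather than just accumulating bytes.

```python
import selectors
import socket

class Conn:
    """Per-connection state lives here, not on any thread's stack."""
    def __init__(self, sock):
        self.sock = sock
        self.received = b""

def serve_until_closed(conns):
    """Multiplex reads for every connection in the single calling thread."""
    sel = selectors.DefaultSelector()
    for c in conns:
        c.sock.setblocking(False)
        sel.register(c.sock, selectors.EVENT_READ, data=c)
    open_count = len(conns)
    while open_count:
        for key, _ in sel.select():
            c = key.data                 # recover the state object, not a thread
            chunk = c.sock.recv(4096)
            if chunk:
                c.received += chunk
            else:                        # empty read: peer closed the connection
                sel.unregister(c.sock)
                c.sock.close()
                open_count -= 1
    sel.close()

# Demo: three in-process "clients" that each send one message and hang up.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, _) in enumerate(pairs):
    client.sendall(f"hello {i}".encode())
    client.close()
conns = [Conn(server) for _, server in pairs]
serve_until_closed(conns)
print([c.received for c in conns])
```

  Because nothing about a connection is tied to a particular thread, the same `Conn` objects could be served by one loop, a few reader threads, or N generic threads without changing how the state is stored.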
  -dB
  
  --
  "It if was easy to do, we'd find someone cheaper than you to do it."
Cleaniness and simplity by Anonymous Coward · 2002-11-09 06:40 · Score: 0

1:1 is a cleaner, simpler model.

Except for the cases when M:N is warranted.
1. Re:Cleaniness and simplity by Anonymous Coward · 2002-11-09 08:43 · Score: 0
  
  So use 1:1 and make your own polling for those cases. Or do you trust that the threading library is sufficiently intelligent to sort the threads for you? If you have to do anything manual, you're just doing the polling in a 1:1 model but without the overhead of writing the polling code again.
Re:I already have multithreading... by morgajel · 2002-11-09 06:45 · Score: 1

that's ok, I have a beowulf cluster...
now THAT's multithreading!

--
Looking for Book Reviews? Check out Literary Escapism.
how to tell if processes are threads? by treat · 2002-11-09 06:50 · Score: 2

Nice, I look forward to the improvements. One big problem is that I cannot find a way to determine that two processes are actually threads of the same process. It is possible to guess, of course, but is there a way to determine this conclusively while we wait for these improvements?
1. Re:how to tell if processes are threads? by Anonymous Coward · 2002-11-09 18:20 · Score: 0
  
  As far as I know, the upper 16 bits of the pid is the true process ID and the lower 16 bits is the thread ID. Linux is completely non-standard with respect to POSIX threading in this regard.
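  One way to check this on a kernel and thread library that use thread groups is to compare the `Tgid` field in `/proc/<id>/status`. This is a sketch under that assumption, plus the assumption that `/proc` is mounted; it is not something stated in this thread, and as I understand it, under the old LinuxThreads library each thread is its own thread group, so the comparison would not be conclusive there. `thread_group_id` and `same_process` are hypothetical helper names.

```python
import os

def thread_group_id(task_id):
    """Return the Tgid recorded in /proc/<task_id>/status (Linux-specific)."""
    with open(f"/proc/{task_id}/status") as f:
        for line in f:
            if line.startswith("Tgid:"):
                return int(line.split()[1])
    raise LookupError("no Tgid field found")

def same_process(task_a, task_b):
    """True when both task IDs report the same thread group."""
    return thread_group_id(task_a) == thread_group_id(task_b)

print(thread_group_id(os.getpid()) == os.getpid())
```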
Re:How about scheduling & thread-specific stor by Anonymous Coward · 2002-11-09 09:00 · Score: 0

Really? Wow... this is kind of scary. My OS class (Intro level) required implementing priority threading and priority donation on the very first assignment. I don't mean to be trolling, but doesn't the absence of this make the Linux kernel kind of ... archaic? Or am I completely misinterpreting your comment?
QUIT MODDING THIS GUY UP, DUMB ASS MODERATORS by Anonymous Coward · 2002-11-09 09:11 · Score: 0

Don't you fucking moderators read anything???
SexyKellyOsbourne is nothing but a fucking TROLL. Read her journal entries. He'll take a story and make it appear somewhat legitimate, and since no one checks their facts around here, SKO will get modded way up for passing off what appears to be an insightful/informative posting. Get A FUCKING CLUE. Look at her journal entries. He ADMITS it.
Re:How about scheduling & thread-specific stor by Anonymous Coward · 2002-11-09 10:06 · Score: 0

Regarding increasing the priority: yes, if you max the priority of one thread it should prevent any other threads of lower priority running. Anything else is a bug in the thread lib AFAICT. Same goes for threads which get a priority change while they are blocked. If there are multiple threads of the same priority, POSIX doesn't specify which thread runs first (it doesn't have to be fair, and it isn't necessarily FIFO; some pthreads implementations actually use LIFO queues). But my impression is that the priority change should affect the ordering, even if the change is done while the thread is blocked.

But, when lowering priority of a running thread, I would NOT expect it to be immediately pre-empted if there is a higher priority thread waiting, but rather just complete its time slice as normal. If you wanted preempt behaviour, you can force it with sched_yield().
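The boost, lower, then yield sequence discussed here can be sketched as below. This is only the priority-change step, not a full priority ceiling lock, and `priority_ceiling` is a made-up helper name. Switching to SCHED_FIFO normally needs elevated privileges, so the sketch falls back to running unboosted when the call is refused.

```python
import contextlib
import os

@contextlib.contextmanager
def priority_ceiling(ceiling):
    """Run the body at SCHED_FIFO priority `ceiling`, then restore and yield."""
    old_policy = os.sched_getscheduler(0)   # 0 means the caller
    old_param = os.sched_getparam(0)
    boosted = False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(ceiling))
        boosted = True
    except OSError:
        pass  # usually a permission error; carry on at the old priority
    try:
        yield boosted
    finally:
        if boosted:
            os.sched_setscheduler(0, old_policy, old_param)
            # Explicitly give up the CPU so a waiting higher-priority thread
            # does not have to wait for the current time slice to end.
            os.sched_yield()

with priority_ceiling(os.sched_get_priority_min(os.SCHED_FIFO)) as boosted:
    critical_section_ran = True
print(boosted, critical_section_ran)
```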

I believe that the NPTL no longer needs a service thread running (although I may be mistaken).
Re:So then... by Anonymous Coward · 2002-11-09 11:45 · Score: 0

you know what is really smooth as silk? Silk! It's the only thing as smooth as silk. Also, hot wet pussy. I'd rather stick my cock in hot wet pussy than some cloth made from worm shit.
Re:I already have multithreading... by Anonymous Coward · 2002-11-09 11:50 · Score: 0

might I be so bold as to suggest the addition of head and tail?
Re:WRONG!!! Half-Life was based, mostly, off of Qu by tensai · 2002-11-09 14:23 · Score: 1

And yes, people from Valve have confirmed the base was Quake1, not (as some people continue to claim, and I really wish I knew where the rumor started) Quake2.

I read an article, I think on Gamespot, about that. Valve got the Quake2 source from id and considered moving to that to get all those extras, but it would have taken longer to move away from the Quake codebase and they were already behind schedule (who isn't?). But they did integrate pieces of Quake 2 into HL, and that along with HL releasing after Quake 2 probably led to the incorrect assumption.
Re:GNU/HURD by Anonymous Coward · 2002-11-09 14:46 · Score: 1, Funny

Just like Communism, GNU/HURD will never take off.
Please don't compare GNU/HURD to communism. They are similar in that both sound good in theory, but have problems in the real world. However, the similarity ends there. Communism was responsible for loss of life and liberty for millions of people killed and imprisoned. Hurd, however, is released under GPL, and is therefore even worse.
Re:How about scheduling & thread-specific stor by Juergen+Kreileder · 2002-11-10 00:27 · Score: 1

Really? Wow... this is kind of scary. My OS class (Intro level) required implementing priority threading and priority donation on the very first assignment. I don't mean to be trolling, but doesn't the absence of this make the Linux kernel kind of ... archaic? Or am I completely misinterpreting your comment?
Of course Linux supports priorities. We're talking about LinuxThreads, a (not fully compliant) POSIX thread implementation.
In the current LinuxThreads implementation setting priorities for threads doesn't make much sense for the default scheduling policy. So changing a thread's priority isn't supported for this policy.
LinuxThreads also supports two realtime scheduling policies (round robin and FIFO). With these policies priorities have a clearly defined meaning and are supported.
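A quick way to see the difference between the default policy and the two realtime ones is to ask the system for the priority range each policy accepts. This is a small sketch using Python's `os` wrappers around the POSIX calls; `priority_ranges` is an illustrative name, and the exact numbers printed depend on the system it runs on.

```python
import os

def priority_ranges():
    """Map each scheduling policy name to its (min, max) static priority."""
    policies = {
        "SCHED_OTHER": os.SCHED_OTHER,  # default time-sharing policy
        "SCHED_FIFO": os.SCHED_FIFO,    # realtime, first in first out
        "SCHED_RR": os.SCHED_RR,        # realtime, round robin
    }
    return {
        name: (os.sched_get_priority_min(p), os.sched_get_priority_max(p))
        for name, p in policies.items()
    }

for name, (lo, hi) in priority_ranges().items():
    print(f"{name}: {lo}..{hi}")
```

On the Linux systems I am aware of, the default policy reports a single value, which matches the point above that setting a priority under it is not meaningful, while the realtime policies report a real range.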
Re:How about scheduling & thread-specific stor by Anonymous Coward · 2002-11-10 17:01 · Score: 0

So LinuxThreads has its own internal thread scheduling system which doesn't correctly implement priorities (beyond comparatively simple real-time). Not even a kernel issue - but an issue getting a kernel-level construct to work in userspace.
Whew - I feel much better now. Thanks!