Slashdot Mirror


Running 100,000 Parallel Threads

An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."

147 of 387 comments

  1. Hold this thread while I walk away by DoctorHibbert · · Score: 3, Funny

    The linux song

    --
    Arbitrary sig
  2. Re:Posix thread... by Wolfier · · Score: 5, Informative

    Your answer:

    http://www.cs.wustl.edu/~schmidt/ACE.html

    This is so far the best library I have used for pthread programming. Powerful, easy to use, and encapsulates message passing really well...

  3. 100,000 Linux threads by Anonymous Coward · · Score: 5, Funny
    1. Re:100,000 Linux threads by notanatheist · · Score: 2, Funny

      are those M$ employees looking for code?

    2. Re:100,000 Linux threads by Citizen+of+Earth · · Score: 2

      this image springs to mind

      Is that red splatter on the ground the remains of Bill Gates?

  4. Win ME Kicks that sorry statistic!!!! by SlimFastForYou · · Score: 4, Funny

    It takes two seconds to start 100,000 threads???? Piff! With my ME computer, It doesn't matter how many parallel threads I am running... I can stop them all instantly by simply attempting to use my computer :P.

    1. Re:Win ME Kicks that sorry statistic!!!! by CoolVibe · · Score: 4, Funny
      Pff... I can start a million threads on my FreeBSD box and stop them all in an instant...

      ...by hitting the reset button.

  5. I'm only a humble C programmer, but.... by cdrobbins · · Score: 4, Interesting

    And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?

    1. Re:I'm only a humble C programmer, but.... by SlimFastForYou · · Score: 5, Funny

      Just wait until Spyware For Linux(TM) comes out... With Bonzai Buddy For Linux(TM), Real Center For Linux(TM), XMMS Agent(TM), Linux Messenger(TM), Linux Update(TM), and FindFast for OpenOffice.org(TM). Then you will know why 100,000 parallel threads in two seconds is a good thing :P.

    2. Re:I'm only a humble C programmer, but.... by mattdm · · Score: 3, Insightful

      Java likes to run many threads very cavalierly, so it's likely to help there somewhat.

    3. Re:I'm only a humble C programmer, but.... by bm_luethke · · Score: 5, Informative

      Probably none. On the other hand, in the field I work in (high performance computing) this will be a great help. Currently we are running a 500,000 processor simulation on a four node cluster; startup and running are both a pain. Remember, one of the great things about Linux is some of the neat/useful applications being run on it (human genome, nuclear simulations, fluid simulations). Windows is a toy and geared toward "normal" users (read: very few threads, not processor intensive). Linux is more of a workhorse (many threads, computationally expensive, and high uptimes). While there are exceptions to this, look at advances such as this in that light. And finally, just because you won't use it compiling a kernel doesn't mean it's not needed.

      --
      ------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
    4. Re:I'm only a humble C programmer, but.... by Citizen+of+Earth · · Score: 2

      But my question is, what (if any) change is this going to make to my daily use of linux... Am I going to notice any performance differences?

      My question is why does the multithreading in Mozilla suck so badly on Linux and will this help it?

    5. Re:I'm only a humble C programmer, but.... by Subcarrier · · Score: 2

      But my question is, what (if any) change is this going to make to my daily use of linux...?

      Well, for one thing, you're now going to have to start typing a helluva lot faster. The machine is not going to slow you down. ;-)

      In truth, this is great news for those running servers but you probably won't notice much of a difference on a desktop, barring a few really thread heavy applications. UML (User Mode Linux) is one notorious example.

      --
      "I have opinions of my own, strong opinions, but I don't always agree with them." -- George H. W. Bush
    6. Re:I'm only a humble C programmer, but.... by Chops · · Score: 2

      The performance improvement won't mean much, but the POSIXization of the thread library might make a difference. Linux's thread support has up till now been pretty kludgy (signal handlers per-thread instead of per-process, wrong coredumps, etc.), and that made things like debugging threaded programs difficult; you may have run into this with gdb or whatever. Now that the Right Things have been coded in all over the map (kernel/libc/gcc/etc), we can drop the kludge and start doing it right.

    7. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 5, Informative

      KDE actively discourages threads. Perhaps that will change now. Likewise servers, such as apache, will speed up.

      I'm not so sure about that.

      A threaded model doesn't necessarily offer advantages -- Apache's multiprocess model is really just as good on platforms without serious performance penalties on fork(), and Boa (which neither forks nor threads) is much, much faster than either Apache mode (though of course on SMP systems multiple instances must be run to use all the available CPUs).

      Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application. Such an application has less overhead and is able to jump between its "subprocesses" only when needed -- and without the latencies involved by letting the OS handle said scheduling. Back in the Real World, I still write threaded code -- but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster.

    8. Re:I'm only a humble C programmer, but.... by joib · · Score: 2

      But then again, in the Real World (TM), different processes/threads often need to communicate with each other (for ex. scientific applications), or save memory by sharing stuff like script interpreters, db connections etc. (for eg. web servers).

    9. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 3, Insightful

      Yes, it still (like everything) depends on your application.

      That said, though, sharing (and putting locks around) your DB connections or script interpreters is an easy way to lose performance and introduce potential deadlocks (or other hard-to-track, hard-to-reproduce bugs due to bad shared state) as opposed to having each process able to operate completely independently from the others. Shared state is a Good Thing when it's genuinely needed -- but should be avoided when it's not.

      I'm not saying -- and I've never tried to say -- that threading is worthless; I just object to people who take the position that making an application multithreaded will necessarily make it faster.

    10. Re:I'm only a humble C programmer, but.... by ajs · · Score: 2

      You are correct. To state that in more specific terms, threads are a "big hammer" that can be applied to the need to manage multiple resources at once. In my experience (on general purpose hardware) you can always optimize that resource management better in a single process than the kernel can by performing lightweight context switching.

    11. Re:I'm only a humble C programmer, but.... by Fjord · · Score: 2

      I'd have agreed with you yesterday, but these improvements could change that. Something to consider. Kind of like the time (back in 98, I think) I realized my application ran faster in floating point than in fixed point, changes to the infrastructure can change the way you approach problems.

      --
      -no broken link
    12. Re:I'm only a humble C programmer, but.... by diablovision · · Score: 2, Insightful

      "Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application."

      Two words: Blocking IO. You are correct that multithreading imposes an inevitable overhead on CPU intensive tasks running on a single processor machine, but most applications are not processor bound. The fact is that almost all applications that do anything besides scientific work have large portions of their execution times used by blocking on IO. Multiple threads allow the time spent waiting on IO in one thread to be spent doing something else "useful" in another thread--provided your OS supports native threads (if not, one thread can block an entire process).
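A rough sketch of the blocking-I/O argument above, not anyone's production code. `time.sleep` stands in for a blocking read, and the function names (`blocking_io`, `run_serial`, `run_threaded`) are made up for illustration. With native threads the waits overlap, so four 0.2 s waits cost roughly one wait instead of four.

```python
import threading
import time

def blocking_io(delay):
    """Stand-in for a blocking read: the thread waits without using the CPU."""
    time.sleep(delay)

def run_serial(n, delay):
    """Do n blocking calls one after another in a single thread."""
    start = time.monotonic()
    for _ in range(n):
        blocking_io(delay)
    return time.monotonic() - start

def run_threaded(n, delay):
    """Do the same n blocking calls, one per thread, so the waits overlap."""
    start = time.monotonic()
    threads = [threading.Thread(target=blocking_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

if __name__ == "__main__":
    print("serial:   %.2fs" % run_serial(4, 0.2))
    print("threaded: %.2fs" % run_threaded(4, 0.2))
```

Note that this only shows the waiting being overlapped; it says nothing about CPU-bound work, which is the case the parent post was talking about.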

      "...but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster."

      Isn't this almost a tautology? Restating: "In the areas where threads make things easier, it is easier to use threads than to not use threads."

      --
      120 characters isn't enough to explain it.
    13. Re:I'm only a humble C programmer, but.... by pclminion · · Score: 2
      KDE actively discourages threads. Perhaps that will change now.

      I'm only guessing, but the reason KDE discourages threads is probably because it's a real bitch to write a truly thread-safe library, and they don't want to fuck with it.

      In other words, it probably isn't because of performance.

      If there are any core KDE developers reading, please correct me if I'm wrong.

    14. Re:I'm only a humble C programmer, but.... by Jay+L · · Score: 2

      Two words: Blocking IO.

      Right. If you are going to use a single-threaded process, you must use non-blocking I/O.

    15. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 2

      Two words: Blocking IO.

      Three words: Non-blocking IO. The APIs exist (such as the POSIX AIO standard) -- they're just rarely used.
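For the curious, here is a minimal single-threaded sketch of the idea. It is not POSIX AIO: it uses readiness-based non-blocking sockets through Python's `selectors` module, which is a different (and more commonly used) way to get the same effect. The names `echo_upper` and `demo` are hypothetical, and `socket.socketpair()` just fakes three clients inside one process.

```python
import selectors
import socket

def echo_upper(conns):
    """Serve every socket in conns from one thread until each peer stops sending."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    open_count = len(conns)
    while open_count:
        # select() only returns sockets that can be read without blocking
        for key, _ in sel.select():
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # Fine for tiny replies; a real server would also buffer writes
                conn.sendall(data.upper())
            else:
                # Empty read means the peer finished sending
                sel.unregister(conn)
                conn.close()
                open_count -= 1
    sel.close()

def demo():
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (client, _) in enumerate(pairs):
        client.sendall(b"request %d" % i)
        client.shutdown(socket.SHUT_WR)   # signal end of request
    echo_upper([server for _, server in pairs])
    replies = [client.recv(4096) for client, _ in pairs]
    for client, _ in pairs:
        client.close()
    return replies

if __name__ == "__main__":
    print(demo())
```

No thread ever blocks on one client while another has data ready, which is the whole point of the non-blocking approach.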

      Isn't this almost a tautology? Restating: "In the areas where threads make things easier, it is easier to use threads than to not use threads."

      "Useful" and "easier" aren't quite the same.

      There are places where threads are useful -- because the application demands it -- and places where threads are easier -- because the programmer doesn't have the time, will, knowledge or need to define a nonthreaded solution even if such a solution would be more efficient.

    16. Re:I'm only a humble C programmer, but.... by ajs · · Score: 2

      changes to the infrastructure can change the way you approach problems.

      Yes, certainly. However, thread management has more headaches than can be hidden by the kernel and libraries easily. At least some of that overhead evidences in your userspace program. In many cases when people use threads, they're really just not thinking about their application. That's fine if performance is not a strong concern, but when it is (as evidenced by recent work in the Web server arena), threads should just be a tool in your box, along with many other techniques.

    17. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 2

      a well-written single-threaded application that never makes a blocking system call (such as I/O) while there is useful work to be done will always be faster than an equivalent multithreaded application

      Since when did I suggest the use of blocking I/O? Any single-threaded application of this sort needs to use non-blocking I/O as a matter of course. See POSIX AIO for one non-blocking I/O API.

      Finally... you mention that error states are possible in threaded applications, and can be exercised by appropriately written code ("tests" which cripple a multithreaded application). Personally, having a number of almost impossible to find deadlocks hidden away in my system until some end-user chances upon one makes me very, very nervous.

    18. Re:I'm only a humble C programmer, but.... by Jay+L · · Score: 2

      Not as easy as you think.

      Well, I think it's not very easy, so it IS as easy as I think. It might not be as easy as someone else thinks, though. But it gets easier with practice and good libraries.

      At AOL, nearly all of our servers were single-threaded, based on a standard kernel of state-managing, event-calling support functions. We had non-blocking sockets, non-blocking database I/O, non-blocking DNS, really everything with the possible exception of local disk I/O (which is rarely necessary on a production server, and which can be solved with a cluster of "worker" processes doing the actual I/O).

      I worked on the mail system, so that's the part I know best. The difference in performance between sendmail (which forks multiple processes) and our own mail server (which ran single-threaded) was nothing short of astounding. Our days-late delivery problems disappeared almost instantly. This was on Suns and HPs; granted, we're now talking about forked processes rather than threads, and I don't know how the Sun and HP schedulers compare to Linux's, but the main point is that it's possible to write such a complex app single-threaded. I think the only significant thread-based app at AOL is AOLServer, which was developed independently.

      Writing single-threaded servers certainly takes skill and experience. Given the state-management problem, I had assumed that it was also more error-prone than writing threaded code, but from what I am learning about thread-safety, that may not be the case. But, no matter how little overhead you have doing a context switch, you have even less without it. At some performance level, that matters.

    19. Re:I'm only a humble C programmer, but.... by pthisis · · Score: 2

      Not as easy as you think

      But far easier than multithreaded programming.

      Threads are a way of saying "screw protected memory". They should be used only when you don't want memory protection within your application. Almost always, using threads is the wrong choice; multiple processes and/or a state machine with non-blocking I/O (depending on the problem) will accomplish the same ends as efficiently* and are much easier to implement.**

      *Remember that processes (COEs which don't share memory) are nearly as fast as threads in Linux, and faster in some cases. On other OSes (Irix, Solaris, Windows), processes are inefficient and threads are implemented more efficiently. That's a horrible hack to make up for ridiculously heavyweight processes; it's true that a small number of things can be optimized in a thread implementation (setting up VM mappings), but the actual speed implications of that are negligible in real-life programs. I'd be extremely surprised if you could find even one program exhibiting a measurable speed difference in Linux attributable to the scheduling, creation, and destruction properties of threads.

      **Threaded solutions often seem straightforward. The devil is in the details, though; locking, synchronization, and debugging issues tend to bite hard, and in the end I've never dealt with a problem where threading was a win over multiprocs and/or a state machine. The advantage of multiprocesses is not only in keeping memory protection; it also forces you to be explicit about what's shared and how that is communicated (and greatly simplifies debugging). Resulting designs tend to be much clearer and easier to make correct and maintain.

      Sumner

      --
      rage, rage against the dying of the light
    20. Re:I'm only a humble C programmer, but.... by pthisis · · Score: 3, Insightful

      I think you are writing as a person who has never had to use either.

      I have written a dynamic content server that over the past 2 years has served over 6 billion requests, with 5 9's of uptime. I've written several realtime instrument control applications. I've written a distributed text mining application that does index-assisted regex searches of 1/2 terabyte of data in

      Threads can really be life savers when used correctly. Sure you have to implement locking but that's what pthread_mutex is for.

      On low-mem devices making full copies of the process to spawn copies is just insane.

      1) Look up COW and memory sharing.
      2) I never said "use only processes". A combination of processes and event loops is the way to go 99% of the time. There are some corner cases where threads are useful, but they tend to be abused by people who think "threads are good" without considering the alternatives nor the ramifications of that choice.

      And on windows the Thread implementation is *intentional* not accidental. The idea is that people using threads will take advantage of the speed increase.

      It's not a speed increase. Thread switching and thread creation on Windows are slower than process creation and process switching on Linux. On a par, but slower. Process creation on Windows is laughably slow, though, and process switching is substantially slower than thread switching.

      It's not that Windows figured out how to make their threads go fast, it's that their processes were dog-slow and they had to create an entirely separate execution primitive to get any sort of reasonable concurrency. Linux did things the right way by making them both fast, and now allows you to choose between the two for _design_ reasons (do I want to share memory?) rather than artificial implementation reasons.

      You'll find a lot of knowledgeable people (Larry McVoy, former SGI kernel architect) who echo the same belief: use threads sparingly. Use as many threads as you have CPUs, and use processes instead if that makes more sense. Use more threads than that only if you're intimately familiar with the alternatives and know why they don't work, because while a state machine with non-blocking I/O may seem hard at first glance it'll almost certainly turn out to be easier to implement correctly, easier to debug, faster, and easier to maintain.

      Sumner

      --
      rage, rage against the dying of the light
  6. If you want to destroy my boxen. . . by endeitzslash · · Score: 3, Funny

    Launch 100,000 threads while I walk away. . .

    OK I'll shut up now.

  7. Parallelism by inkfox · · Score: 5, Interesting

    This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.

    --
    Says the RIAA: When you EQ, you're stealing bass!
    1. Re:Parallelism by Anonymous Coward · · Score: 2, Interesting
      I believe it said in the article/discussion that they were using a dual p4 for testing. That would imply that scaling isn't a problem

      Many algorithms work great for one extra processor but fail miserably with more.

      In most cases, you can just busy wait on a semaphore with two CPUs and never notice the hit. 8, 32 or 512 CPUs and you're going to throw away most of your processing time.

    2. Re:Parallelism by _Knots · · Score: 2

      I've been following LKML, not that I can contribute much, but still. Most of the scheduling work, if memory serves, is tested on large-way boxes (the number 32 leaps to mind).

      You are encouraged to read the list for yourself because it's early in the morning and my brain might be playing tricks on me.

      --Knots;

      --
      Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
    3. Re:Parallelism by cheese_wallet · · Score: 2

      My question to this is, I didn't think that the P4 could do SMP?

      The p4 xeons can.

    4. Re:Parallelism by NerveGas · · Score: 2

      Yeah, I wish that SMP was taking over. We're not really that much farther in that regard than we were 5 or 6 years ago, with the Pentium Pro. In fact, in some regards, we're WORSE off: Since the PPro, all Intel chips had SMP capabilities, even the "non-SMP capable" celerons. However, now the norm is for their chips to NOT have SMP capability.

      Yes, the Itanium and upcoming SledgeHammer are going to change things. But we've been hearing that for a decade. We'll see if things REALLY change or not.

      steve

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
  8. Great news! by zensonic · · Score: 2, Funny

    So now I'm able to open up 100.000 pr0n pictures in just 2 sec. Ubercool ;-)

    --
    Thomas S. Iversen
    1. Re:Great news! by Bishop923 · · Score: 2

      Only problem would be getting an HDD to transfer at 25.6 GB/s (assuming each pic was 500k)
      Now THAT would be impressive. :-)

  9. Re:Not 100,000 threads in parallel, just 50. by vvikram · · Score: 5, Informative


    Yeah right. And modded to "Informative"? Slashdot moderators are the _pits_.

    Read Ingo's reply to Linus. They _did_ start one test serially and also one in _parallel_. In short, he says that it's possible.

    vv

  10. Re:Not 100,000 threads in parallel, just 50. by the_quark · · Score: 2

    This could be huge for things like webservers, though, which spend a lot of their time kicking off new (logical) processes. As I understand it, on Linux, a big part of the reason Apache 2.0 hasn't taken off (aside from lack of availability of major packages) is that Apache 2.0's main win is in threading support. Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.

    So, am I right in thinking this means threading (and hence Apache 2.0) will be a big win for Linux web servers, now?

  11. Re:Not 100,000 threads in parallel, just 50. by mikec · · Score: 2

    A later post pointed out that Linus was wrong. They actually did both tests: one test created and destroyed threads as fast as possible; the other created 100K threads first and then killed them all.
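To make the difference between the two tests concrete, here is a scaled-down sketch. This is not the benchmark from the kernel list; `serial_test` and `parallel_test` are hypothetical names, and the thread count is kept small because Python threads are much heavier than what is being measured in the article.

```python
import threading

def serial_test(n):
    """Create and join one thread at a time; only one worker is ever alive."""
    done = 0
    def work():
        nonlocal done
        done += 1
    for _ in range(n):
        t = threading.Thread(target=work)
        t.start()
        t.join()
    return done

def parallel_test(n):
    """Start n threads that all stay alive at once, then release them together."""
    release = threading.Event()
    started = threading.Semaphore(0)
    def work():
        started.release()   # report that this thread is running
        release.wait()      # stay alive until told to finish
    threads = [threading.Thread(target=work) for _ in range(n)]
    for t in threads:
        t.start()
    for _ in range(n):
        started.acquire()   # wait until every thread has checked in
    alive = sum(t.is_alive() for t in threads)
    release.set()
    for t in threads:
        t.join()
    return alive

if __name__ == "__main__":
    print("serial, threads run:    ", serial_test(200))
    print("parallel, alive at once:", parallel_test(200))
```

The second function is the interesting one: it returns the number of threads that existed simultaneously, which is what "in parallel" means in the story.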

  12. Re:Not 100,000 threads in parallel, just 50. by DoctorHibbert · · Score: 2, Insightful

    True, however the feat is still quite impressive. By making the creation and destruction of threads cheaper, it frees developers from having to worry so much about the overall system impact when spawning threads.

    For instance, because of the expense many applications use thread pools, which are simply a bunch of idle threads that sit around doing nothing, waiting for work to do. These idle threads still take up system resources even though they're not actually using CPU. Not to mention the extra work the developers have to do to make the thread pools work for their applications.
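For anyone who hasn't met a thread pool, a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `handle` and `serve` names and the doubling "work" are placeholders, not anything from the article. Twenty requests are served by at most four long-lived threads instead of twenty short-lived ones.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def handle(request):
    """Placeholder work: return a result plus the name of the thread that ran it."""
    return request * 2, threading.current_thread().name

def serve(requests, workers=4):
    """Run every request on a fixed pool of reusable worker threads."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="pool") as pool:
        return list(pool.map(handle, requests))

if __name__ == "__main__":
    results = serve(range(20))
    print(sorted({name for _, name in results}))   # at most 4 distinct names
```

If thread creation really becomes as cheap as the article claims, the case for this extra machinery gets weaker, though a pool still caps how many threads run at once.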

    --
    Arbitrary sig
  13. Re:Not 100,000 threads in parallel, just 50. by grytpype · · Score: 2
    I'm afraid YOU didn't read the article very carefully. Ingo replied as follows to Linus's post:
    actually, that was Ulrich's other test, which tests the serial starting of 100,000 threads. the test i did started up 100,000 concurrent threads which shot up the load-average to a couple of thousands. [the default timeslice the parent has is enough to start more than 50,000 parallel threads a pop or so.]
    --

    - Have a picture

  14. Sounds cool, but all I could think of... by Geek+Tragedy · · Score: 5, Funny

    "Hello, my name is Ingo Molnar. You killed -9 my process: prepare to die."

    Sorry, had to :P

    1. Re:Sounds cool, but all I could think of... by unsinged+int · · Score: 5, Funny

      I think it's more commonly this:

      "Hello, my name is Ingo Molnar. You kill -9 my parent process. Prepare to vi."

  15. Re:Not 100,000 threads in parallel, just 50. by kinnunen · · Score: 5, Informative
    Read Ingo's posts too:
    actually, that was Ulrich's other test, which tests the serial starting of 100,000 threads. the test i did started up 100,000 concurrent threads which shot up the load-average to a couple of thousands. [the default timeslice the parent has is enough to start more than 50,000 parallel threads a pop or so.]
    And another one:
    Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
  16. NOOO!!!!! by Monkelectric · · Score: 3, Funny

    At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while(1) do { fork(); } ] in an attempt to extend deadlines or simply be assholes :)

    --

    Religion is a gateway psychosis. -- Dave Foley

    1. Re:NOOO!!!!! by powerlord · · Score: 2

      Hehehe I had a classmate do that accidentally.

      One of our final projects was to implement our own shell. This would of course necessitate a fork() command... he hadn't checked conditions quite right and managed to use up all the resources for his account. Fortunately someone had set the Ultrix (Unix on VAX) system up with a little intelligence. He only bombed his own account and had to get the Prof. to go in and kill the out of control Shell :)

      I on the other hand merely got half-baked tokenizing. Great teacher (pity they disbanded the Comp-Sci department around us).

      --
      This space for rent. All reasonable inquiries will be entertained at proprietors discretion.
    2. Re:NOOO!!!!! by ez76 · · Score: 5, Funny

      I am replying pre-emptively to dissuade the AC's who would otherwise reply to you and point out that your post should not have been modded funny because this innovation would not prevent fork() bombing because it involves spawning threads and not processes.

      I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of run-on sentences.

      I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of +1 bonus.

      I am further replying pre-emptively to dissuade the AC's who would mod this post down as off-topic because they do not get the parallel allusion to fork-bombing.

    3. Re:NOOO!!!!! by AJWM · · Score: 3, Funny

      I did something like that back in my school days on a dual-CPU Burroughs B6700, but with a twist: Each process forked itself twice, then waited. When it received a signal about a child process being killed, it spawned two more. I had a sleep of a few seconds or so in there so it didn't grow too fast.

      The fun part of that was when the system operators saw the processes replicating like crazy and started to kill them, that made it worse.

      Another fun trick with that machine was to set up a circularly-linked list and invoke the LLLU (linked list lookup) instruction on it...

      (Yeah, stupid things to do. At least I only did them during relatively quiet times.)

      --
      -- Alastair
    4. Re:NOOO!!!!! by inio · · Score: 5, Funny

      Dude, you seriously need to look into writing patents.

    5. Re:NOOO!!!!! by defile · · Score: 2

      For any admins subjected to such clever users, the correct way to handle a forkbomb is to first send STOP to all processes, which will prevent them from replacing the siblings which you kill, then you break out the nine.
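A small sketch of that two-pass approach, under the assumption that you already have the list of PIDs (collecting them is left out). `stop_then_kill` and `demo` are hypothetical names, and the demo uses harmless sleeping children rather than an actual fork bomb.

```python
import os
import signal
import subprocess
import sys

def stop_then_kill(pids):
    """Freeze every process first, then kill them, so none can respawn a sibling."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)   # SIGSTOP cannot be caught or ignored
    for pid in pids:
        os.kill(pid, signal.SIGKILL)   # "the nine"; works on stopped processes too

def demo(count=3):
    # Harmless stand-ins for the runaway processes
    child = [sys.executable, "-c", "import time; time.sleep(60)"]
    procs = [subprocess.Popen(child) for _ in range(count)]
    stop_then_kill([p.pid for p in procs])
    return [p.wait() for p in procs]   # negative value = killed by that signal

if __name__ == "__main__":
    print(demo())
```

The first pass is what matters: a stopped process cannot notice that its siblings are dying, so it cannot replace them.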

  17. Windows by jeffbru · · Score: 3, Interesting

    Just out of curiosity, how does the benchmark in windows compare?

    --
    - Jeff Brubaker
    1. Re:Windows by CoolVibe · · Score: 2, Troll
      Oh, I'll bet Microsoft could rig a system without the graphics, network, most driver subsystems and the GUI stuff to skimp on overhead and winge their way to a higher number of parallel threads in less time.

      Or they could just blatantly pay some other company that does "independent testing" *cough*mindcraft*cough* to lie about it :)

    2. Re:Windows by Courageous · · Score: 2, Insightful

      It is *impossible* to even allocate more than about 31,000 threads under windows on 32 bit machines. You simply CAN'T do it. The minimum thread stack size is 1 64KB page. You can only address 2GB of memory on a 32 bit windows OS. Do the math.
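Doing the math, as a sketch that takes the two figures in the comment (2 GB of address space, 64 KB minimum stack) at face value. `max_threads` is a made-up helper; the result is an upper bound, since code, heap and everything else also need address space.

```python
ADDRESS_SPACE = 2 * 1024**3   # 2 GB of address space per process (figure from the comment)
MIN_STACK = 64 * 1024         # 64 KB minimum stack per thread (figure from the comment)

def max_threads(address_space=ADDRESS_SPACE, stack=MIN_STACK):
    """Upper bound on threads if the address space held nothing but stacks."""
    return address_space // stack

if __name__ == "__main__":
    print(max_threads())                  # 32768
    print(max_threads(3 * 1024**3))       # 49152 with 3 GB of address space
    print(max_threads(stack=1024**2))     # 2048 if each stack reserves 1 MB
```

So the ceiling is 32,768, which fits "about 31,000" once the rest of the process is accounted for. The last line assumes a 1 MB reservation per stack, which I believe is the usual Windows default but is an assumption here; it lands close to the 2031 threads per process reported elsewhere in this discussion.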

      C//

    3. Re:Windows by bmajik · · Score: 2

      Two nitpicks:
      you can address 3GB with the /3GB switch :)
      you can address significantly more with AWE/PAE, but i dont know that you can use that additional memory for thread stacks.

      Just FYI, Yesterday i had SQL server 2k running with 1914 threads ( in AWE mode)

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    4. Re:Windows by Courageous · · Score: 2

      Well, that may be the case. I was making more of a reference to the reserve memory lower limit on a thread's stack size.

      C//

  18. Re:Not 100,000 threads in parallel, just 50. by ergo98 · · Score: 3, Insightful

    Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.

    That's called "making lemonade out of lemons". Clearly this test has shown that thread creation in Linux was horribly broken, not the flip side that process creation was so wonderfully good.

  19. Real World Example by robinjo · · Score: 2

    I'm building a project where there will be one huge database with up to 200 different companies connected to it pretty much nonstop. 1-10 users from every company depending on the time of the year. 2 threads for every connection.

    200*10*2=4000 threads.

    1. Re:Real World Example by khuber · · Score: 2
      Why would you use 2 threads per connection instead of the more common select() + worker thread pool? That doesn't seem like a scalable design.

      -Kevin

    2. Re:Real World Example by flux · · Score: 2, Interesting

      What is the fundamental reason select/poll should be that much faster anyway? Well, you win the context switch-times, if you can handle many clients in a tick. But on the other hand it does affect the way you need to design the code, and doing some stuff that never stalls without threads might be tricky.

      Just imagine a situation where a thread might need to calculate something, or initialize a big array. Now, if it's run under a select-loop, you need to do that in parts to avoid starving the server. With threads, you just do the trick and don't care about the rest of the world which keeps serving the clients, no matter how long you stay in the function.

    3. Re:Real World Example by khuber · · Score: 2, Informative
      I was talking about a hybrid design, not pure select. Of course you are right about the limitations of pure select. Thread per client starts to bog as the number of simultaneous clients increases.

      It's not practical to serve hundreds/thousands of clients with a thread per client model. A typical machine can't handle the load well because it has limited resources. It will thrash. By having a thread pool you place a limit (throttle if you will) on resource utilization. Most high performance, highly scalable web and app servers use this model or a variant.
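A stripped-down sketch of the hybrid model, with several simplifying assumptions: one request per connection, `socket.socketpair()` faking the clients, and byte reversal standing in for real work. The names `handle`, `serve` and `demo` are hypothetical. One thread waits on all the connections; a small fixed pool does the slow part.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

def handle(conn, data):
    """Runs on a pool thread: the slow part of a request, then the reply."""
    conn.sendall(data[::-1])   # reversing the bytes stands in for real work
    conn.close()

def serve(conns, workers=2):
    """One selector thread watches every connection; the pool caps concurrency."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        sel.register(conn, selectors.EVENT_READ)
    futures = []
    remaining = len(conns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            for key, _ in sel.select():
                conn = key.fileobj
                sel.unregister(conn)   # one request per connection in this sketch
                futures.append(pool.submit(handle, conn, conn.recv(4096)))
                remaining -= 1
    sel.close()
    for future in futures:
        future.result()                # re-raise any error from a worker

def demo(n=5, workers=2):
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _) in enumerate(pairs):
        client.sendall(b"req%d" % i)
    serve([server for _, server in pairs], workers)
    replies = [client.recv(4096) for client, _ in pairs]
    for client, _ in pairs:
        client.close()
    return replies

if __name__ == "__main__":
    print(demo())
```

Five clients are served with two worker threads; the `workers` argument is the throttle on resource use, no matter how many clients connect.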

      There is another architecture based on event driven state machines aka SPED (single process event driven) that is high performance and single process/single thread in its pure form. The Zeus web server does this.

      -Kevin

  20. boxen. . . by Catskul · · Score: 2, Troll

    Could you please refrain from using "boxen"? It makes my head hurt.

    --

    Im not here now... Im out KILLING pepperoni
    1. Re: boxen. . . by spongman · · Score: 2
      Non-words used as words is like, so 1999, man.
      And so is using time as an adjective.
    2. Re: boxen. . . by pthisis · · Score: 2

      What I want to know is why use boxen rather than boxes?
      "boxes" refers to the physical objects (ie the cases and contents thereof)

      "boxen" refers to the notional servers.

      My Linux boxen could be retasked to be FreeBSD boxen, but they'd still be the same boxes.

      AFAIK, "boxen" was derived from "VAXen". And it was never "VAXes", that would brand you as computer-illiterate as quickly as saying "What's the http for that?" or using "PC" to mean "Windows box".

      Sumner

      --
      rage, rage against the dying of the light
  21. whoa! by RestiffBard · · Score: 4, Funny

    I have no idea what the hell you're talking about but it certainly sounds impressive. :)

    --
    - /* dead coders leave no comments */
  22. Great by C0D3X · · Score: 3, Funny

    Now we finally have the power to run 99,999 pop up ads when we visit that pr0n site

  23. Re:Windows comparison by pVoid · · Score: 3, Interesting

    Very interestingly enough, either windows has a quota, or some sort of memory leak or something...

    Max I can create in a process is 2031 threads... That being done in 700ms.

    It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...

    will investigate more.

  24. Possible use by captaineo · · Score: 2

    Normally I am of the "use only as many threads as CPUs" school of thought, but I can think of a reason to use 100,000 threads - imagine a large FTP server, or a multi-homed HTTP server, where you need to provide each connected user with his own set of access privileges or filesystem context. A one-thread-per-connection server may be the easiest way to build security into the system.

    1. Re:Possible use by vsync64 · · Score: 2, Informative

      Except that threads, as far as I am aware, share the same address space. Multiple processes need to arrange to share memory, and therefore are less likely to trample on one another or careen out of control.
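
      A small sketch of the difference, assuming a Unix-like system where os.fork() is available (share_demo and bump are illustrative names): a write made by a thread is visible to the rest of the process, while the same write made in a forked child stays in the child's private copy.

```python
import os
import threading


def share_demo():
    counter = [0]

    def bump():
        counter[0] += 1

    # A thread runs in the same address space, so its write is visible here.
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    after_thread = counter[0]

    # A forked child gets its own copy of memory; its write never reaches us.
    pid = os.fork()
    if pid == 0:
        bump()
        os._exit(0)
    os.waitpid(pid, 0)
    after_fork = counter[0]
    return after_thread, after_fork


after_thread, after_fork = share_demo()
```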

      --
      TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
    2. Re:Possible use by jhines · · Score: 2

      Uber monster sim program, with a city full of residents, each run by a thread.

    3. Re:Possible use by bmajik · · Score: 2

      of course the penalty you pay for this is that fork() is expensive, and shared memory is a finite system resource. try the command "ipcs" on a sys-v type box.

      It is also generally the case that switching between processes is more expensive than switching between threads.

      to the parent poster : 1 thread per connection is a pretty naive way to do it, but its got advantages - simplicity. It's a moot point since on a stock OS you'd run out of socket descriptors long before you'd run into a thread-count maximum.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    4. Re:Possible use by be-fan · · Score: 4, Insightful

      "use only as many threads as CPUs"
      >>>>>>>>
      Then please stay away from my GUI apps. I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O. On my 300 MHz PII, that particular trait made Galeon unusable. It had one rendering thread for all the tabs, so when I was loading a complex page like /. in another tab, whatever tab I was actually reading would freeze up.

      --
      A deep unwavering belief is a sure sign you're missing something...
    5. Re:Possible use by captaineo · · Score: 2

      I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O

      Then they are stupid. I/O can be done asynchronously in a single-threaded program; you just need to use non-blocking I/O (sockets) or AIO (disk).

      As for Galeon - the rendering code needs optimization. Rendering in a background thread is NOT going to help. (think: the socket connection to the X server is inherently serialized; if rendering is the bottleneck then the rendering code itself is the problem)
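
      For the non-blocking socket case mentioned above, a minimal sketch using a local socket pair in place of a real network connection: the read returns immediately when nothing has arrived, and select() reports when data is ready, so one thread never has to sit in a blocking call.

```python
import select
import socket

a, b = socket.socketpair()
b.setblocking(False)

# Nothing has been sent yet: a blocking recv() would hang here,
# a non-blocking one reports "no data" right away.
try:
    b.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

a.sendall(b"hello")

# Ask the kernel which sockets are readable instead of blocking on one.
readable, _, _ = select.select([b], [], [], 1.0)
data = b.recv(1024) if readable else b""

a.close()
b.close()
```

      A GUI toolkit would hang the select() step off its own event loop rather than call it directly; that part is left out here.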

    6. Re:Possible use by be-fan · · Score: 2

      I/O can be done asynchronously, but that forces an interrupt-driven model, which (IMO) is even more complex than a threaded model. As someone who started programming in the 90's (when threads were commonplace) and started GUI programming by playing around with BeOS, it seems much more natural to me to just have a thread that sleeps unless its drawing or getting updated information about the window contents. AIO vs threads aside, the aversion to threads causes a major problem with GUI programming. Most current windowing systems tend to encourage multithreaded GUI apps (take a look at Win32, apps there use as many threads as BeOS apps ever did) while none encourage AIO. As a result, programmers who don't like threads end up not using *either* model.

      I don't know the details of the Galeon code, but it doesn't seem to be the connection to the X server that's holding it up. While it's loading a web page in one tab, it doesn't respond to events going on in another tab. Galeon could not possibly be spending all that time rendering (especially since Gecko is supposed to be so fast). What it seems to me is that the parsing and rendering and event handling are all occurring in one thread, so while the browser is parsing and loading one page, it can't render or respond to input events. Breaking the user-interaction into another thread would allow the GUI to respond while the page was loading.

      --
      A deep unwavering belief is a sure sign you're missing something...
    7. Re:Possible use by bmajik · · Score: 2

      maybe on linux - and maybe now. so does this announcement now mean that you can fork 100,000 processes in a matter of seconds as well ?

      (and what linux kernel lets you have 100k simultaneous processes ?)

      and given what i remember about fork, isn't it the case that you memcopy the entire address space of the forking process for each fork() (barring optimizations such as perhaps shared text segments) ?

      are you telling me that pthread_create() on _Every_ platform copies the entire process address space ? i dont think this is the case. a large orchestrated memcpy of course is a perf hit - one that afaik, forking required, and threading does not.

      Note that im not very well versed in how _linux_ threads work - i only know that they've always been " a bit different ".

      Cache synchronization in MT apps is hardly on the same scale as reading/writing to shared memory (or mmap()ed regions?!) If you demonstrate that your forked app works faster via mmap than a single-address space process with multiple threads writing to shared non-thread-local storage, on a platform that has a reasonable threads implementation (not necessarily linux, but i haven't followed linux's threading at all, honestly), i'll be pleasantly surprised.

      fork() has advantages. however, dismissing threading outright by claiming that fork() is equal or superior makes anything you claim dubious.

      re: thread switching vs process switching:
      this argument seems totally ridiculous. i can't possibly fathom how _any_ cache coherency solution that is thread-specific is time-comparative with flushing and reloading the tlb, flushing and reloading _all_ caches, and so on. and cache coherency has been well attacked in designs like the SGI O3k. Are you telling me that a cache-line or two is going to be _less_ efficient than dumping all caches and tlbs ?

      Locking isn't as bad as you claim. And the same fundamental problem(s) exist w.r.t. sharing resources whether you're talking threads or processes.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    8. Re:Possible use by captaineo · · Score: 2

      Although I disagree that threaded programs are inherently easier to write than state machine programs, I can understand your point of view. It's probably a matter of personal preference and experience. (yes, Win32 and BeOS both strongly encourage a multithreaded style of programming, but I don't think that decision was made purely on technical merit. Multithreading was the "cool" thing to do back when these APIs were invented...)

      If you need an example of AIO in a GUI application - consider Netscape 4 or Mozilla on Mac OS 9. These programs are multithreaded on most operating systems, but since Mac OS 9 has weak support for threads, the Netscape runtime simulates multiple threads using coroutines and AIO. (this is done transparently to code running on top of the runtime, so it still appears to be written in the multithreaded model...)

      I use Mozilla and I do notice that it sometimes becomes unresponsive to input while loading a page. But when I boot into Windows, Internet Explorer has no problem loading the same pages without noticeable delays. (not only are other IE windows not blocked, but the one that's loading isn't blocked for long either - so it can't be IE doing all the work in a background thread). I blame inefficient code in Gecko... Pushing the work into a background thread is only a band-aid; it's like pushing a bubble sort algorithm into a background thread when you could just use quicksort and get the same work done much faster.

    9. Re:Possible use by be-fan · · Score: 2

      It probably is a matter of personal preference. I like multithreading because if you look at each thread separately, each one by itself is linear. With state machines, everything is together, but it's non-linear.
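
      A toy sketch of the two styles, using a made-up two-step exchange (read a name, then read a body). linear_handler and StateMachineHandler are illustrative names only.

```python
def linear_handler(read):
    """Thread style: reads top to bottom; each read() may block."""
    name = read()
    body = read()
    return f"{name}: {body}"


class StateMachineHandler:
    """Event style: the same logic, split across callbacks and explicit state."""

    def __init__(self):
        self.state = "WANT_NAME"
        self.name = None
        self.result = None

    def on_data(self, chunk):
        if self.state == "WANT_NAME":
            self.name = chunk
            self.state = "WANT_BODY"
        elif self.state == "WANT_BODY":
            self.result = f"{self.name}: {chunk}"
            self.state = "DONE"


chunks = ["alice", "hello"]
linear_result = linear_handler(iter(chunks).__next__)

machine = StateMachineHandler()
for chunk in chunks:
    machine.on_data(chunk)
```

      Both produce the same answer; the difference is whether the order of steps lives in the code's layout or in a state variable.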

      The thing with AIO and multithreading is that it isn't really faster, but it *seems* faster (to the user). User-interface code takes very little CPU power, but it's extremely time sensitive. Doing UI handling asynchronously prevents any sort of background work (efficient or not) from influencing the speed of the UI.

      --
      A deep unwavering belief is a sure sign you're missing something...
  25. Re:Not 100,000 threads in parallel, just 50. by the_quark · · Score: 5, Informative

    No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed (the main advantage in Linux from threads was the shared memory space if your application wanted that sort of thing). That's why Apache 2.0 is bringing NT performance more in line with Linux 1.3 performance: NT's threading speed is a lot closer to Linux's forking speed. Again, I'd like to underscore I'm not an expert on this, and it's possible I'm mistaken about relative benchmarks (is NT w/Apache 2.0 a little faster than Linux w/Apache 1.3? Could be...) but I'm very confident of the basic underlying point, that Linux process creation is essentially comparable to other OSs' thread creation, perhaps even faster.

    See, for example, http://www.linux.cu/pipermail/linux-prog/2001-February/000027.html, just one of the first Google links that popped up when I went looking for proof that I'm not on crack: "Linux newcomers often are unaware of the substantial differences between Linux and other operating systems. To implement concurrency, they use multithreading exclusively, mistakenly assuming as high an overhead associated with Linux multiprocessing as on other platforms." In fact, knowing how fast Linux's process creation is relative to other systems' thread creation makes this even more impressive in my mind. This isn't just a bug fix; much like with process creation before it, Linux is doing something fundamentally better than its counterparts.

    Don't forget: Just because this is /. doesn't mean I'm just a Windows-hating troll. I try to make sure all my Windows-hating-troll-posts are at least backed up by facts. ;)

  26. Re:Alternative headline by Dahan · · Score: 4, Informative
    Gigantic performance problem in Linux code fixed after several years of "many eyes" scanning over it.

    Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.

    Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [text version] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):

    0.8.1 Create a fixed priority scheduler.
    Currently, the Linux scheduler is very different than the traditional Unix schedulers. Although the Linux scheduler is very efficient when only several processes are running, it is not scalable. In order to match the performance of *BSD and other Unices, another scheduling algorithm must be used.
    Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.
  27. Re:Windows comparison by Courageous · · Score: 4, Informative

    Every thread uses a minimum of *1 PAGE* of reserve memory for its stack, which is 64K. However, you have to go out of your way to use less than 1 megabyte of reserve memory. Since only 2GB of reserve memory (addressable memory) is available to user applications, this would fit your 2000 thread figure like a glove.
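
    The arithmetic behind that, as a quick sketch using only the figures quoted in this thread (2GB of user address space, 1 megabyte default reserve, 64K minimum):

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

user_address_space = 2 * GIB      # addressable by a user process, per the comment
default_stack_reserve = 1 * MIB   # default reserve per thread, per the comment
minimum_stack_reserve = 64 * KIB  # smallest reserve, per the comment

# Upper bound on threads if every stack reserves the default amount.
max_threads_default = user_address_space // default_stack_reserve

# Upper bound if every stack is shrunk to the minimum reserve.
max_threads_minimum = user_address_space // minimum_stack_reserve
```

    The default works out to 2048, a little above the 2031 reported earlier; the gap is presumably address space already taken by the program and its libraries, though that part is a guess.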

    C//

  28. nice, but... by g4dget · · Score: 4, Interesting

    It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?

    1. Re:nice, but... by Magnus+Reftel · · Score: 4, Informative

      According to a mail from Ingo Molnar halfway down the linked article, M:N threading doesn't really solve the real problem - it's good at switching back and forth between running threads, but the real reason for having very large amounts of threads (be they kernel or user space threads) to begin with, is to do IO, and for that, there is no real advantage of user space threads.

      More info on the 1:1 vs M:N issue can be read in the white paper

      --
      print "Yet another p{erl,ython} hacker\n",
    2. Re:nice, but... by g4dget · · Score: 2

      Thanks for the pointer. Sounds like they went with 1:1 for a good reason. I always thought of M:N threading as kind of a kludge and not entirely trustworthy anyway (scheduling and I/O become rather iffy).

  29. How will this affect Mozilla, OpenOffice... by 3770 · · Score: 4, Interesting

    How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.

    While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.

    Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?

    --
    The Internet is full. Go Away!!!
    1. Re:How will this affect Mozilla, OpenOffice... by alext · · Score: 2

      Threading performance may be poor on Linux, though personally I haven't noticed it, other aspects are fine though. I'd say that big Java applications start up about 20% faster on my Linux partition than on my Windows one using Sun JVM 1.3.1_04.

      In fact, Solaris LWP threading has caused me more headaches - it seems that the old N:M thread model can deadlock with native libraries such as the Oracle OCI drivers, theoretically using the alternate 1:1 model fixes this but I haven't yet proved the case to my own satisfaction. Read up on this new model here, or try it by putting /usr/lib/lwp in your LD_LIBRARY_PATH.

    2. Re:How will this affect Mozilla, OpenOffice... by Wesley+Felter · · Score: 2

      There is no new threading model. The thread APIs are the same, so when you install the right kernel and glibc all apps will benefit.

  30. Re:How long before by _Knots · · Score: 2

    Be careful who you call a dumb fuck. Netscape had a functional browser long before IE3, arguably the first usable version of IE. And it would not surprise me if Netscape 1 predated IE 1, though I can't say I know that for sure.

    Speeding The Net is an excellent book about Netscape vs Microsoft, in case anybody cares (it's been a long while since I read it, thus why my date memory is rusty).

    --
    Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
  31. Great... Now every lamer with no design knowledge by Alex+Belits · · Score: 2

    ...will start writing horrible monsters running hundreds and thousands of threads, and their creations will suffer from all other shortcomings of that decision.

    --
    Contrary to the popular belief, there indeed is no God.
  32. Re:Not 100,000 threads in parallel, just 50. by brianpane · · Score: 4, Informative
    Apache 2.0 doesn't actually do thread creation very frequently. The thread creation cost occurs mostly at startup. So the limiting factors for threaded Apache performance on Linux are mainly:
    • The speed with which the kernel can schedule and context-switch among threads
      For some recent data on this, see http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=103228014211983. The O(1) scheduler patch for 2.4 seems to help here.
    • Memory usage per thread
    • Concurrency limitations of the Apache code itself
      This has been improving gradually with successive 2.0 releases, as the remaining global locks are removed or optimized.
    • General robustness of the thread implementation
      The current (2.4) Linux threading implementation doesn't work well with debuggers.
    At first glance, it looks like the NPTL could be a win for threaded Apache on Linux, as it offers some solutions for the first and last of these issues.
  33. How *I* got kicked out of the computer lab by Naikrovek · · Score: 3, Funny

    I ran this in DOS:

    prompt "Enter Password:"

    No one could figure out that all i did was change the prompt from "$P$G" to that, and everyone was asking what the password was. haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.

    I got kicked out for a year (not beautiful).

  34. big deal by leomekenkamp · · Score: 4, Funny

    100.000 threads? What nonsense; everybody knows that no computer would ever use more than 640.

    --
    Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
    1. Re:big deal by sg_oneill · · Score: 2

      wry. verry wry. *g* (mod it up)

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
  35. Hooray for fixing the dynamic linking problem! by Foresto · · Score: 2, Interesting
    It looks like speed isn't the only improvement they've made with this library. From the notes:

    " - - libpthread should now be much more resistant to linking problems: even if the application doesn't list libpthread as a direct dependency functions which are extended by libpthread should work correctly."

    This ought to be a big help for those of us who write plug-in modules for servers like Apache 1.x and PHP. The existing thread library doesn't work properly unless the program executable explicitly links to it, which means that my shared libraries can't take advantage of standard thread management such as pthread_atfork().
  36. Does this help Apache 2.x by mustprotectdata · · Score: 2, Interesting

    Given that Apache 2.x can utilise threads as well as processes, does this mean that you can configure a large web server with, say "MaxSpareThreads 1000000" so that you can cope when you're slashdotted ;-)?

  37. 100000?! by thelexx · · Score: 3, Redundant

    640 should be enough for anybody!

    LEXX

    --
    "Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
  38. Wow! by mnordstr · · Score: 2

    Combine this with Apache2's Multi-threaded or Hybrid MPM and you'll have a heck of a web-server!

  39. Re:Not 100,000 threads in parallel, just 50. by Karellen · · Score: 5, Informative

    It's not process/thread _creation_ times that make the difference, it's the process/thread _context_switch_ times that really mount up, which is where Linux shines.

    And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

    K.

    --
    Why doesn't the gene pool have a life guard?
  40. Re:Alternative headline by himi · · Score: 3, Insightful

    Alternatively, you might want to consider that Linux's scheduler was very nicely tuned for far and away the most common case - where you have only a small number of running processes.

    Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into ameliorating. This has changed recently, as people have started using Linux in areas where this kind of thing /isn't/ insane, and hence these new developments have come along.

    You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply /have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers. These developments have come about because people needed them, and they didn't happen earlier because no one had needed them before. Go back and read the last few years of the lkml archives, and /then/ come back and talk about this kind of thing, when you understand /why/.

    himi

    --

    My very own DeCSS mirror.
  41. POSIX compliance ahead? by rkit · · Score: 2, Informative

    Scalability is a good thing, no doubt about that. However, there is another aspect that should be pointed out: the current thread API in linux is quite different from the POSIX specification and somewhat crufty. Just to mention the biggest problems:
    • missing cancellation points: testing whether a thread has been cancelled should be done in lots of system calls, but linux pthreads do not support this. Instead, you have to call pthread_testcancel() before and after every such call. A real drag.
    • signal handling: linux pthread signal handling is very different from the POSIX specification. However, proper signal handling is crucial for any real world application.
    • fork() will not work as expected. This is a real nuisance if you want proper daemon behaviour for your application.
    • documentation of linux-specific behaviour is poor. As a result, most of the existing literature on thread programming is pretty useless for linux.
    All these points can be worked around, for sure. Nevertheless, it makes writing portable software a nightmare. Porting threaded software to linux, well ... All in all, linux threads really need much better integration with the standard system API. A lot of applications could profit from multithreading. Just think of GUI responsiveness. Also, using threads makes some programming tasks much easier. No need for asynchronous hostname lookup, for example.
    A solid, well documented, standard conforming threads implementation will make linux a much nicer environment for serious programming than it already is. I am really looking forward to this.

    --
    sig intentionally left blank
    1. Re:POSIX compliance ahead? by inode_buddha · · Score: 2, Interesting

      Nobody ever said that linux-specific behavior is POSIX-compliant. Last I heard, POSIX is not about the specifics of any given UNIX-compatible or class of system. Rather, it attempts to be the abstraction and distillation of those class of systems, as codified by The Open Group. Please correct me if I am wrong in this idea. Linux simply "aims to be..." POSIX-compliant, as promulgated by the LSB, the FHS, et al.

      That all said, I totally agree with you -- especially regarding cancellation points, fork(), and documentation.

      Please bear in mind that much of this behavior will be inherited from whatever libc it is compiled against. IMO, this simply shows the power of C, nothing else.

      The above scenario simply points out the differences between OpenGroup/POSIX and GNU/FSF... if things like that "bug" you (no pun intended, seriously), then perhaps you should recompile with whatever "-- posixly-correct" options you have available.

      And yes, I have a copy of the SUSV3 spec right here, in fact.

      --
      C|N>K
  42. NGPT by p3d0 · · Score: 2

    Then, with NGPT (Next-Generation Posix Threads), those 100,000 threads would be in user space and may be even cheaper.

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    1. Re:NGPT by benh57 · · Score: 2

      Nope. Apparently NPTL is Four times faster than NGPT.

  43. Re:Posix thread... by smittyoneeach · · Score: 2

    So maybe there is a heavyweight library for some applications, and a lighter weight one for common use.
    Probably you do the light one, and include it in the heavy when required.
    Ah, the one-size-fits-all thought process...

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  44. Re:Alternative headline by croftj · · Score: 2, Insightful

    I think we need to pull some old stats out of our ass. This paper is about the 2.2.x kernel. Correct me if I'm wrong, but hasn't there been massive overhauling of the 2.4.x and 2.5.x kernels in the scheduling area?

    I think I'll just slam XP performance based off of NT benchmarks and articles. What the hell, they're both from MS, the argument must be valid.

    Get a grip!

    --
    -- Many men would appreciate a woman's mind more if they could fondle it
  45. Re:Posix thread... by DrunkenPenguin · · Score: 2, Funny

    ..actually.

    Your answer:

    http://www.linux.ncsu.edu/lug/lectures/rpm-pres/mg p00033.html

    This is so true to all of us ;)

  46. Re:How long before by Zeinfeld · · Score: 2
    Netscape is the direct descendant of NCSA Mosaic, the Ur browser. Frankly, I don't remember what the big deal about Netscape 1.0 was, relative to Mosaic, but there was much hype. Maybe something really hardcore, like introducing background colors?

    Wrong in every respect.

    First Mosaic was not the 'Ur browser'. Tim's NextStep browser was. Mosaic was browser number 15 or so. The significant things about Mosaic were that 1) it actually compiled without having to hack the code yourself or mess with 6 different support packages like tkwww and 2) it was the first X-Windows browser that did not look really amateur.

    Second, Netscape does not contain any code from Mosaic, although it was written by the same main author - Eric Bina. NCSA sold the commercial rights to Mosaic to Spyglass.

    Third IE was originally based on the Spyglass code, so if any browser is 'the direct descendant' it would be IE. Go look at the 'about' box on IE, although the original Mosaic actually had more lines of CERN code than NCSA code which were never acknowledged.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  47. Wait, there's more by Quixote · · Score: 2
    From the ensuing discussion on the list:
    Ingo:...Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
    Wow.. this is pretty good. The ability to spawn & run 1 million concurrent threads should keep even the most demanding users happy for a few years...

    OTOH, I hope this post doesn't become the butt of jokes a few months from now ("and you thought 1 million was a lot! Ha! My Palm 5000XL does more than that!")...

  48. Re:Not 100,000 threads in parallel, just 50. by Wdomburg · · Score: 2

    > The title and description is misleading. From the
    > comments further down in the article, Linus
    > points out that only 50 threads at a time were
    > running in parallel:

    And the next comment down is from Ingo:

    actually, that was Ulrich's other test, which
    tests the serial starting of 100,000 threads.

    the test i did started up 100,000 concurrent
    threads which shot up the load-average to a
    couple of thousands. [the default timeslice the
    parent has is enough to start more than 50,000
    parallel threads a pop or so.]

    So, yes, they did manage 100,000 threads running in parallel.

    Matt

  49. Not to mention.... by SwedishChef · · Score: 2

    Your egregious use of the word "egregious".

    --
    No one ever had to evacuate a city because the solar panels broke!
  50. Re:Ur browser??? by Anonymous Coward · · Score: 2, Informative

    I can only suppose you don't know what Ur is, maybe because you come from a very different culture...

    Anyway, and I'm really not well qualified to answer this, Ur was an ancient city-state from which a prominent ancestor of the Jewish-Christian-Islamic heritage came (Abraham, if I'm not wrong).

    This city, IIRC already found, was Sumerian (I'm not sure about this), the folks who are said to be the inventors of the wheel, among other neat things.

    So an Ur browser would be the primeval browser, in other words.

    Upon writing a note, one must be sure it will be understood; nonetheless, the "Ur" mention boosted the note level way up. All in all, I think it was great and I'm all for it.

    But explanations as these sometimes become necessary.

  51. Mod this up, please (was: POSIX compliance ahead?) by Observer · · Score: 2

    See subject. A useful 'heads up' post for folks like myself who tend to assume that Linux will follow the general Un*x-family behaviours we're familiar with from the commercially-sold variants.

    And yes, I would of course ;) check this assumption if I were to do some significant implementation for the Linux platform.

  52. Re:hmm by Graymalkin · · Score: 2

    Don't you mean

    My name is ingo Molnar.
    You kill -15 my parent process - Prepare to die.

    --
    I'm a loner Dottie, a Rebel.
  53. user-level threads are useless by RelliK · · Score: 2

    User-level threads cannot take advantage of multiple CPUs. True, they are somewhat faster on a single CPU system due to lower overhead, but that's all they are good for.

    --
    ___
    If you think big enough, you'll never have to do it.
  54. ACE is nice for big systems by 0x0d0a · · Score: 2

    ACE is nice for big systems.

    But it's also way overkill for small stuff. It's a whole distributed framework, not a wrapper around pthreads.

  55. Re:Windows comparison by Courageous · · Score: 2

    It's a Windows limit, and it's in the documentation.

    C//

  56. Re:Windows comparison by Courageous · · Score: 2

    The 64K page size is Windows' page size. I can only assume that the poster stating that the Intel hardware page size is 4K is correct. I would suppose this means that a Windows (2K, NT) page of 64K is assembled from 16 hardware pages, then. The Windows page size of 64K is in their documentation. I never paused to think about how this interfaces with hardware pages...

    C//

  57. Will `top' and `ps' be fixed? by truth_revealed · · Score: 2

    Currently in Linux every thread is assigned a distinct process ID, and as such, a process has as many entries in `top' and `ps' as it has threads. This makes it difficult to monitor processes externally, or even see the other processes' information. Has this issue been addressed? (I realize this is a user-space program issue, not a kernel issue).

  58. Re:Not 100,000 threads in parallel, just 50. by Sivar · · Score: 2

    Linus was... Wrong?!

    Whoa, that's going to completely shatter the world view of many Slashdotters.

    --
    Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
  59. Multithreaded core files on Linux? by truth_revealed · · Score: 2, Interesting

    I can't seem to find any info on whether Linux core files still produce one core file per thread or just one core file per process (as does Solaris). Has `gdb' been enhanced to handle multithreaded programs (or multithreaded core file) on Linux? If I have a thousand threads - I sure don't want 1000 core files in the event of a crash. Is there a way around this?

  60. Re:Alternative headline by Dahan · · Score: 2
    Correct me if I'm wrong ...

    Okay, you're wrong. This O(1) scheduler in 2.5.x is the "massive overhauling." (Yes, the patch has been around for a while... but as the article says, it's only recently been merged into 2.5)

  61. Re:Windows comparison by be-fan · · Score: 2

    Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) align data in 64KB increments. IIRC, there is also 64KB of virtual slack between memory mapped objects as well.

    --
    A deep unwavering belief is a sure sign you're missing something...
  62. Re:Windows comparison by sagei · · Score: 3, Informative

    Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread.

    This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.

    Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.

    Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.

    --

    Robert Love
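
    The per-task figures quoted above can be turned into a back-of-the-envelope estimate. This is only a sketch: it uses the 8KB kernel stack and the approximate 1.7KB task_struct from the comment, ignores user-space stacks and every other per-task structure, and the function name is made up for illustration.

```python
# Rough per-thread kernel memory on i386 with a 2.5-era kernel,
# using only the figures quoted in the comment above.
KERNEL_STACK_BYTES = 8 * 1024        # two 4 KB pages
TASK_STRUCT_BYTES = int(1.7 * 1024)  # "~1.7KB" slab object, approximate


def kernel_bytes_for(threads: int) -> int:
    """Kernel-side memory for `threads` tasks, ignoring user stacks."""
    return threads * (KERNEL_STACK_BYTES + TASK_STRUCT_BYTES)


total = kernel_bytes_for(100_000)
print(f"{total / 2**20:.0f} MiB for 100,000 threads")
```

    Under these assumptions, 100,000 threads need a bit under 1 GB of kernel memory before any user-space stack is counted.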

  63. Re:Not 100,000 threads in parallel, just 50. by sagei · · Score: 2

    No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed

    It was and still is implemented by the process creation methods. Threads were (and still are) the same as processes in Linux (to the kernel, anyhow). All process creation is done by do_fork(), which accepts clone() flags that specify what to share between the parent and the child. "Threads" (as opposed to normal processes) just happen to share a few things: address space, signal handlers, open files, etc.

    But yah, process creation in Linux is sick. Hold your head high.

    --

    Robert Love
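
    The difference in what gets shared can be observed from user space. The sketch below does not call clone() directly; it assumes a Unix-like system and relies on Python's threading module (a thread shares the address space) and os.fork() (the child gets its own copy). The variable names are illustrative only.

```python
import os
import threading

counter = [0]  # lives in this process's address space


def bump():
    counter[0] += 1


# A thread shares the address space, so its write is visible here.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter[0]

# A forked child gets its own copy, so its write is not visible here.
pid = os.fork()
if pid == 0:
    bump()
    os._exit(0)  # leave the child immediately, skipping cleanup handlers
os.waitpid(pid, 0)
after_fork = counter[0]

print(after_thread, after_fork)
```

    The counter moves when the thread writes to it and stays put when the forked child does, which is the "share the address space or not" choice described above.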

  64. Re:Not 100,000 threads in parallel, just 50. by AJWM · · Score: 2

    You're quite right, and in fact this predates Linux and NT -- Unix was always good at process creation, whereas VMS process startup was very heavy on overhead.

    It's not surprising that Linux (modelled on Unix) and NT (originally modelled on VMS) show similar characteristics. It's the reason that many Unix applications tend to be written as a bunch of cooperating processes, whereas NT apps are monolithic monsters with lots of threads.

    Unfortunately, thanks to a generation of CS students having learned bad habits on Windows, we're starting to see a lot of Linux apps written as monolithic monsters. (Of course there are few old Unix apps out there like that too, perhaps some old mainframe mentality leaking through.) There are advantages to cooperating/communicating processes vs the monolithic multithreaded approach: it's easier to test the components separately, it's easier to reuse the components to make different systems, and a bug in one place won't necessarily clobber the whole thing.

    --
    -- Alastair
  65. Other similar mischief... by TheLink · · Score: 3, Funny

    Some guys I know copied a Windows error dialog box and set it as a background image for the desktop, centered.

    Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...

    For example pic:
    http://www.adobe.com/support/techguides/operatingsystem/windows/winerrors.html
    Probably want to replace CCmail with Explorer or something more dear to heart ;).

    I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually call another colleague over and make the usual worried mumbles.

    http://www.sysinternals.com/ntw2k/freeware/bluescreensaver.shtml

    Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.

    Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"

    I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...

    But I'd probably need the 100K parallel thread kernel to serve up all those ad banners :).

    Bwahaha!
    Link.

    --
  66. Finally we have an answer to Sco unixware by Billly+Gates · · Score: 2

    SCO and Solaris can both create threads 10,000 times faster than the current Linux kernels, according to Sun's and SCO's marketing departments. My guess is that this was exaggerated, but it is one of the benefits of the big Unixes. Heavily threaded Linux apps have been rumoured to fly on UnixWare, while running slower on their own native platform! I guess Linux is maturing in this aspect. Does anyone who knows anything about Unix/Linux threading care to comment? I wonder if this will help Linux in server environments.

  67. Re:Windows comparison by Saint+Stephen · · Score: 2, Informative

    I've created over 200,000 processes on a PIII 550 laptop with 256 MB of RAM running Windows XP. Of course, it took a while (swapping).

    The process is called nothing.exe. Source Code: int WinMain(...) {Sleep(INFINITE);}

    I work at a lab, so I also ran it on a Compaq 8-way with 4-GB of ram. It worked but I don't remember how fast it went.

    However, there is a big gnarly limit in Windows that will cap the number of processes: the amount of memory allocated to virtual desktops or something. We researched it -- look it up. This is why you get limited to a few thousand processes or threads if they all do GUI stuff. The bad thing is that basically any function you call in user32 will register the thread as a GUI thread. It's all explained in the book Inside Windows 2000.

    Not meaning to troll, I'm just going to share a basic fact: it sucks that Windows threads are so expensive, and tens of thousands of threads (read: thread per client) *DO* suck on Windows. However, this is not the same thing as saying Windows doesn't scale -- you just have to code it differently. (Check out how many threads SQL Server uses when it's processing thousands of clients.) Stuff like I/O completion ports, AWE memory, and scatter/gather I/O is the way you have to go.

    Just because you *can* create hundreds of thousands of threads, doesn't mean it's a good idea or that your app won't run like shit on a 32-CPU machine!

  68. New locking primitive, "futex" by Animats · · Score: 2

    Hidden in the article was a reference to a new locking primitive, futex. I don't see a manpage on line for it, though. Where is this documented?

    1. Re:New locking primitive, "futex" by (startx) · · Score: 2

      There is some very primitive discussion of it here.

    2. Re:New locking primitive, "futex" by Animats · · Score: 2

      If it's becoming a kernel feature, there should be more documentation than some comments on the kernel mailing list. But I'm not finding any.

  69. This may not even make it INTO 2.5.x... by Wolfrider · · Score: 2, Interesting

    See here ( http://lwn.net/Articles/9632/ )
    and here ( http://lwn.net/Articles/10248/ )

    --Linus is being pigheaded about this patch, wanting to "keep the code simple" instead of implementing Ingo's **fast** + Fixed solution.

    To quote LWN:
    [ So it's fast - though a few extra features have been requested. But this patch has stirred up a bit of a debate. Rather than put in a complicated new PID allocator, it is asked, why not just make the maximum PID be very large? Then, in theory, the quadratic part of get_pid() will never run so the performance problems go away, and the code stays simpler. Linus prefers this approach, as do a number of other developers; he has put a simple patch along these lines into his pre-2.5.37 BitKeeper tree.

    Ingo disagrees, pointing out that any reasonable maximum PID size can be exceeded eventually. He would rather fix the problem than try to hide it behind a large process ID space. In the absence of real-world examples that show people being bitten by get_pid()'s behavior in a larger PID space, though, Linus appears unlikely to accept any more complicated fix.
    ]

    --
    .
    == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
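
    The trade-off in the LWN quote can be illustrated with a toy model. This is not the kernel's get_pid(), nor Ingo's allocator: next_pid_scan is a hypothetical function that tries candidate PIDs in order and counts its probes, where each probe stands in for one pass over the task list in a scan-based allocator.

```python
def next_pid_scan(in_use, last, max_pid):
    """Try candidates after `last`, wrapping at max_pid.

    Returns (pid, probes). Each probe stands in for one walk over the
    task list, so probes * tasks approximates the quadratic cost.
    """
    probes = 0
    pid = last
    while True:
        pid = pid + 1 if pid + 1 < max_pid else 1
        probes += 1
        if pid not in in_use:
            return pid, probes
        if probes >= max_pid:
            raise RuntimeError("PID space exhausted")


# PIDs 1..999 are taken, except 500, which has just been freed.
in_use = set(range(1, 1000)) - {500}

small = next_pid_scan(in_use, last=999, max_pid=1000)       # must wrap around
large = next_pid_scan(in_use, last=999, max_pid=1_000_000)  # never wraps
print(small, large)
```

    In the small space the search wraps and probes 500 candidates before finding the hole; in the large space the very next number is free. That is the effect Linus is counting on, and Ingo's objection is that the large space can still fill up eventually.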
  70. Linus didn't think much of O1 scheduler by jelle · · Score: 3, Interesting

    I remember that Linus made a remark that he thought the O(1) scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.
  71. Re:Careful there... by Dahan · · Score: 2
    I have carefully considered your points, and only have the following to say:
    • Consider that the Linux scheduler hasn't changed significantly in those THREE years.
    • Consider Ingo Molnar's post on the subject.
    • Consider providing some evidence for your position, rather than just saying that I'm wrong.
    • Bulleted lists are pretty. I can do that too.
    If you guys have some evidence that the paper I referenced is no longer valid, please post it (or references to it). Don't just tell me "oh, that paper's ancient; things are different now."

    'Cuz up until fairly recently, they weren't.

    P.S. And if anyone wants to compare Windows XP's scheduling performance with NT's, be my guest... I don't think you'll see much of a change. Remember that XP is just NT 5.1, and I haven't heard about any significant performance improvements in NT's scheduler. (The only vaguely scheduler-related change I remember is the addition of "fibers" in NT 4.0 SPsomething (3?))

  72. Re:Not 100,000 threads in parallel, just 50. by himi · · Score: 4, Interesting

    The latency issues that cause mp3 skipping under heavy load in Linux have nothing at all to do with context switching, and everything to do with /scheduling/ latency: how long it takes for a process that has work to do to actually get control of the cpu. Context switching has /nothing/ to do with that.

    The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.

    Context switching under Linux /is/ extremely fast - it's actually been measured (a lot), and it's something the kernel developers pay a lot of attention to and optimise very carefully. They literally count cpu cycles in these code paths. Context switching time is a serious performance limiter in many areas, so getting it right is important, and it's something that Linux does /very/ well.

    Go do some real research before you accuse someone who's right of karma whoring bullshit.

    himi

    --

    My very own DeCSS mirror.
  73. Re:Windows comparison by The+Panther! · · Score: 2

    Hardware page size is 4KB, as was noted elsewhere. The key element that I haven't seen mentioned is that Windows' virtual memory system has several ways to 'allocate' memory. There's reserving pages, and there's committing pages. In the case where you tell the OS you want memory, it reserves pages. That is to say, it does not actually take memory from the free physical memory, but instead creates a contiguous address space large enough for your request, but allocates no hardware RAM at those addresses.

    When you commit a page by accessing an address (read or write) that is not yet backed, it trips a hardware fault if the VM hasn't mapped a page to the address; the fault handler then searches for a free page and links the two together.

    The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k. Furthermore, there's no such advantage or realistic way for the operating system to align anything in memory physically, except in AGP ram. The VM system handles physical pages of memory exclusively, but does not manage AGP-allocated memory (IIRC). In other words, though the OS can align the address space to anything it likes, the OS layer cannot request any physical allocation mapping or alignment. So that comment about aligning memory for processes is quite unlikely.

    Now, the XBox (which runs a variant of the Win2k kernel) has a bit more control over VM, but it also does not support demand paging, so it cannot swap to the hard disk and give you RAM+HD effective memory. Shame, that. But, as a result, you have an API that allows hardware level allocation control. Still, the OS doesn't take advantage of it, AFAIK. It's for developers.

    --
    Any connection between your reality and mine is purely coincidental.
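
    A loose analogue of "address space first, physical pages on touch" can be written with an anonymous memory mapping. This is a sketch only: it uses Python's mmap module rather than the Win32 reserve/commit calls, it assumes a demand-paged OS, and it does not measure residency; it only shows the sparse access pattern the comment describes.

```python
import mmap

PAGE = mmap.PAGESIZE
REGION = 64 * 1024  # the 64 KB region discussed above

# An anonymous mapping sets aside address space for the whole region.
region = mmap.mmap(-1, REGION)

# Touch only the first and last page; on a demand-paged system only
# the touched pages need physical memory behind them.
touched = sorted({0, REGION - PAGE})
for offset in touched:
    region[offset] = 1

pages_total = REGION // PAGE
print(f"{len(touched)} of {pages_total} pages touched")
region.close()
```

    With 4 KB hardware pages, that is 2 pages touched out of 16, so the other 14 never need physical memory.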
  74. Re:Windows comparison by Courageous · · Score: 2

    The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k.

    Yes. Quite true. I had a problem a while back on Windows which took me a bit of reading through the documentation (and verifying with some low level sys calls) to determine that what was happening is that I was running out of "reserve memory". Which is to say that, while I had plenty of physical memory left, all the address space had been used up. You can do this very easily by creating thousands of threads on your computer. To get a large number of these threads, you'll have to push the default stack size to its minimum, 64K. I was a bit dissatisfied with this minimum, but I suppose I'll live with it now (or port to Linux) if I have to, or upgrade to a 64-bit OS if it becomes a practical limit in the future.

    C//
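
    The address-space ceiling described above is simple division. The sketch assumes 2 GB of user address space for a 32-bit process and a 1 MB default stack reserve; neither figure is from the comment, and the function name is illustrative.

```python
USER_ADDRESS_SPACE = 2 * 2**30   # assumed: 2 GB user space, 32-bit process
MIN_STACK_RESERVE = 64 * 2**10   # the 64K minimum from the comment
DEFAULT_RESERVE = 2**20          # assumed: 1 MB default stack reserve


def max_threads(address_space: int, reserve_per_thread: int) -> int:
    """Upper bound on threads if stacks were the only thing mapped."""
    return address_space // reserve_per_thread


print(max_threads(USER_ADDRESS_SPACE, MIN_STACK_RESERVE))  # 32768
print(max_threads(USER_ADDRESS_SPACE, DEFAULT_RESERVE))    # 2048
```

    Real limits are lower, since code, heaps and libraries occupy address space too, but the order of magnitude matches "thousands of threads".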

  75. Re:Not 100,000 threads in parallel, just 50. by himi · · Score: 2

    Yes, that's a performance issue, but it's not a /latency/ issue - the new process is running, and from there on in the latencies are only a few hundred cycles rather than measurable in microseconds. Until the next time the process enters the kernel, or page faults, or whatever. As far as latency goes, context switching is of minimal importance unless you're worried by latencies on the order of less than a microsecond (depending on hardware and the like, of course).

    The argument that threads trash cache less than full processes seems fairly bogus to me - the cache trashing will be much more dependent on the size of the working sets of all the running processes, and there's nothing to say that a thread will have a smaller working set than a process. The text segment will be shared, yes, but it's the same with multiple instances of /any/ process, because the program text will be mmapped read only, allowing the memory to be shared, and thus kept in cache. The TLB flush needed would be an added cost, but unless the cache really is being trashed completely by your program it'll be reloaded straight from cache, and that shouldn't be more than a few hundred cycles (I think - don't quote me on that).

    In any case, the real performance comparison isn't between multiple processes versus multiple threads, it's between a multithreaded implementation and a single-threaded one. In /that/ comparison threads come last, simply because they /have/ those kinds of cache interactions and so forth, where a single-threaded version won't. They also have overhead due to locking, greater debugging difficulties, and other added complexities. On the other hand, though, you can't make use of more than one processor without having multiple processes, whether they're threads or full processes . . .

    I think the biggest thing making threads attractive to people is the fact that a threaded approach will often make things simpler to think about in the design stage. You can make all the independent threads of control in your design /real/ threads of control in the implementation. That comes at a cost, though . . .

    Personally, I like the quote from Alan Cox that I've seen in a few people's .sig: "Threads are for people who can't program state machines". It's more complex than that, but it does seem to capture a lot of what motivates threaded designs.

    himi

    --

    My very own DeCSS mirror.
  76. Re:Windows comparison by pthisis · · Score: 2

    Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) align data in 64KB increments

    That's boneheaded. Linux supports page sizes up to at least 4MB, but it doesn't align everything on 4MB boundaries on the off chance that you might be using 4MB pages. It uses the appropriate alignments for the page sizes actually in use.

    An OS that has dropped all support for non-Intel hardware citing a portability concern which doesn't exist in portable OSes? As they say in Snatch, "It's spurious, mate. Not genuine."

    Sumner

    --
    rage, rage against the dying of the light
  77. Re:Not 100,000 threads in parallel, just 50. by pthisis · · Score: 2

    And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

    Last time I benchmarked, which was a long time ago (NT 3.51 days), Linux process-switch times were 5x faster than NT thread-switch times on the same hardware. Linux thread-switch times were on a par with process-switch times; NT thread-switch times were about 20x faster than NT process-switch times.

    I'd expect all those numbers to have changed, though.

    Sumner

    --
    rage, rage against the dying of the light
  78. Re:Windows comparison by Courageous · · Score: 2

    Yes, I know you are right. Amongst other things, I won't be stuck with 64K per thread stack in Linux, and as you say, I could use 64 bit alpha linux. I'm looking forward to Hammer, actually.

    C//

  79. Re:Windows comparison by sagei · · Score: 2

    Why it needs to be larger than one page? The kernel will trap access to page faults due to stack overflow, and will allocate additional stack to it anyway.

    It does not need to be bigger than one page, it just is. You are right, the stack is expanded via implicit mmap as it grows... but for performance reasons the default stack is usually measured in megabytes, not pages.

    Anything but the simplest of applications would use a page rather quickly. User-space applications are programmed to assume they have any size stack they want. Local variables are huge.

    In short, I was just commenting on the default. It can surely be lowered...

    --

    Robert Love
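
    Lowering the default is a one-line setting in most thread APIs. The sketch below uses Python's threading.stack_size(), which applies to threads created afterwards; the 256 KB value and the helper name are arbitrary choices for illustration, and the smallest size a platform accepts varies.

```python
import threading


def run_with_stack(stack_bytes: int, count: int) -> int:
    """Start `count` threads with the given stack size; return how many ran."""
    previous = threading.stack_size(stack_bytes)  # applies to later threads
    done = []
    try:
        threads = [threading.Thread(target=lambda: done.append(1))
                   for _ in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return len(done)


print(run_with_stack(256 * 1024, 50))
```

    A smaller stack means less address space reserved per thread, at the price of overflowing sooner if the code recurses deeply or keeps large local variables.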

  80. What "c10k problem"? by tlambert · · Score: 2

    I don't understand what the issue is here.

    I was able to run 1,600,000 simultaneous connections with a modified FreeBSD kernel, in June of 2001. Couldn't get much work done, but at about 300 baud per connection, after dividing up a gigabit ethernet link... you shouldn't expect to do much work.

    Without modifications, after a patch to the credential reference counting (since committed to FreeBSD 4.5), as long as a stock kernel is tuned correctly, it can still *easily* handle 100,000 simultaneous connections (16K of window space for each connection = 1.6G of mbufs).

    -- Terry
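
    The figures above can be checked with a little arithmetic. The sketch takes "16K" as 16 x 1024 bytes and the link as exactly 10^9 bit/s; both are assumptions, and the per-connection number is the raw share before any protocol overhead.

```python
# Figures from the comment above; KB is taken as 1024 bytes here.
WINDOW_BYTES = 16 * 1024
CONNECTIONS = 100_000

mbuf_bytes = CONNECTIONS * WINDOW_BYTES
print(f"{mbuf_bytes / 1e9:.2f} GB of mbufs")

# Raw share of a gigabit link across 1,600,000 connections,
# before any protocol overhead.
LINK_BITS_PER_SEC = 1_000_000_000
raw_share = LINK_BITS_PER_SEC / 1_600_000
print(f"{raw_share:.0f} bit/s per connection")
```

    That gives roughly 1.6 GB of window space for 100,000 connections, and a raw share of 625 bit/s per connection at 1,600,000 connections, the same order of magnitude as the "about 300 baud" quoted.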

  81. So? by tlambert · · Score: 2

    So? Use non-blocking I/O instead. Problem solved.

    -- Terry
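
    A minimal sketch of the non-blocking approach: one thread watches several connections and services whichever is readable. It uses Python's selectors module, with socketpair() standing in for real network clients; the message text and the count of three are made up for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = {}

# Three in-process "connections" stand in for network clients.
pairs = [socket.socketpair() for _ in range(3)]
for i, (server_end, client_end) in enumerate(pairs):
    server_end.setblocking(False)
    sel.register(server_end, selectors.EVENT_READ, data=i)
    client_end.sendall(f"hello from {i}".encode())

# One thread services every connection as it becomes readable.
while len(received) < len(pairs):
    for key, _events in sel.select(timeout=1.0):
        received[key.data] = key.fileobj.recv(1024).decode()
        sel.unregister(key.fileobj)

for server_end, client_end in pairs:
    server_end.close()
    client_end.close()
sel.close()
print(received)
```

    No connection gets its own thread or stack here; the cost is that the program has to keep per-connection state itself.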

  82. It is not broken. by 1101z · · Score: 2, Informative

    No, you will see a PID per thread, because that is how the scheduler knows to schedule things. You can check with the getpid() C library call from within the program. When they said it is a 1-to-1 mapping, that means there is a process per thread. Just look when you see all those processes with the same name, and see if they have the exact same memory usage. If they do, it means they are using the same memory and are threads. No matter how you implement threads, there has to be more than one process; otherwise, when the program blocks for I/O, all threads would be blocked.

    --
    One day people will learn the folly of Winbloze, Linux Rules!
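
    What a given system actually reports can be checked from inside a program. The sketch below assumes Python 3.8 or later (for threading.get_native_id()) and a current Linux with a POSIX-compliant thread library, which is newer than the setup described in the comment; on such a system all threads report one PID while each has its own kernel thread ID.

```python
import os
import threading

barrier = threading.Barrier(4)
seen = []


def report():
    barrier.wait()  # keep all threads alive at once so IDs are not reused
    seen.append((os.getpid(), threading.get_native_id()))
    barrier.wait()


threads = [threading.Thread(target=report) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in seen}
tids = {tid for _, tid in seen}
print(len(pids), "PID,", len(tids), "kernel thread IDs")
```

    The scheduler still sees one schedulable entity per thread, as described above; what differs on such systems is the number handed back to getpid().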
  83. Re:Not 100,000 threads in parallel, just 50. by Karellen · · Score: 2

    _Need_ the low latency patch? We don't.

    karellen $ uname -a
    Linux foo 2.4.17 #1 Sat Jul 13 12:21:18 GMT 2002 i686 unknown
    karellen $ cat /proc/cpuinfo | grep -E "model|cpu"
    cpu family : 6
    model : 3
    model name : AMD Duron(tm) Processor
    cpu MHz : 757.485
    cpuid level : 1
    karellen $ cat /proc/meminfo | grep MemTotal
    MemTotal: 126732 kB
    karellen $

    So, I'm running 2.4.17 on an AMD 750 with 128MB of RAM. You'll have to take my word that that's a stock 2.4.17, with no patches, but I'm playing a list of .ogg files with xmms, while ripping and ogging a CD in the background, with Mozilla running, and grabbing a mozilla window and moving it around the desktop (with opaque window moving switched on) really quickly for 20 seconds results in - no skipping.

    Yeah, reducing latency will be nice, but as far as I can tell, it's not actually needed for anything to do with the `user experience' at the moment.

    Don't know what you've got running in the background, but it must be pretty hefty.

    K.

    --
    Why doesn't the gene pool have a life guard?
  84. Ahhhh. So now it can by T.E.D. · · Score: 2

    ...run Ada 83 programs.

  85. Re:Great... Now every lamer with no design knowled by dvdeug · · Score: 2

    But while their threads will be slow, they will be able to handle the text the users are entering; vastly more useful than the most optimized eight-bit character horror you would turn out.

  86. Re:Great... Now every lamer with no design knowled by Alex+Belits · · Score: 2

    Trolling is supposed to be:

    1. Fast! Writing random mild insults almost a week after the original posting isn't as great as making a real-time flamewar immediately after posting.

    2. Accessible to a potential reader. Referring to an obscure recurring theme of my rants made months away from this article (byte-value transparency of protocols vs. Unicode references in RFCs) would require a potential troll spectator a lot of googling before he will be able to appreciate your comment.

    --
    Contrary to the popular belief, there indeed is no God.