Slashdot Mirror


Running 100,000 Parallel Threads

An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."

44 of 387 comments (clear)

  1. Hold this thread while I walk away by DoctorHibbert · · Score: 3, Funny

    The linux song

    --
    Arbitrary sig
  2. Re:Posix thread... by Wolfier · · Score: 5, Informative

    Your answer:

    http://www.cs.wustl.edu/~schmidt/ACE.html

    This is so far the best library I have used for pthread programming. Powerful, easy to use, and encapsulates message passing really well...

  3. 100,000 Linux threads by Anonymous Coward · · Score: 5, Funny
  4. Win ME Kicks that sorry statistic!!!! by SlimFastForYou · · Score: 4, Funny

    It takes two seconds to start 100,000 threads???? Piff! With my ME computer, It doesn't matter how many parallel threads I am running... I can stop them all instantly by simply attempting to use my computer :P.

    1. Re:Win ME Kicks that sorry statistic!!!! by CoolVibe · · Score: 4, Funny
      Pff... I can start a million threads on my FreeBSD box and stop them all in an instant...

      ...by hitting the reset button.

  5. I'm only a humble C programmer, but.... by cdrobbins · · Score: 4, Interesting

    And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?

    1. Re:I'm only a humble C programmer, but.... by SlimFastForYou · · Score: 5, Funny

      Just wait until Spyware For Linux(TM) comes out... With Bonzai Buddy For Linux(TM), Real Center For Linux(TM), XMMS Agent(TM), Linux Messenger(TM), Linux Update(TM), and FindFast for OpenOffice.org(TM). Then you will know why 100,000 parallel threads in two seconds is a good thing :P.

    2. Re:I'm only a humble C programmer, but.... by mattdm · · Score: 3, Insightful

      Java likes to run many threads very cavalierly, so it's likely to help there somewhat.

    3. Re:I'm only a humble C programmer, but.... by bm_luethke · · Score: 5, Informative

      probably none. On the other hand the field I work in (high performance computing) this will be a great help. Currently we are running a 500,000 processor simulation on a four node cluster, startup and running both is a pain. Remeber, on of the great things about linux is some of the neat/usefull applications being ran on it (human genome, nuclear simulations, fluid simulations). Windows is a toy and geared toward "normal" users (read very few threads not processor intensive). Linux is more of a workhorse (many threads, computationally expensive, and high uptimes). While there are exceptions to this look at advances such as this in that light. And finally, just because you won't use it compiling a kernel doesn't mean it's not needed.

      --
      ------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
    4. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 5, Informative

      KDE actively discourages threads. Perhaps that will change now. Likewise servers, such as apache, will speed up.

      I'm not so sure about that.

      A threaded model doesn't necessarily offer advantages -- Apache's multiprocess model is really just as good on platforms without serious performance penalties on fork(), and Boa (which neither forks nor threads) is much, much faster than either Apache mode (though of course on SMP systems multiple instances must be run to use all the available CPUs).

      Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application. Such an application has less overhead and is able to jump between its "subprocesses" only when needed -- and without the latencies involved by letting the OS handle said scheduling. Back in the Real World, I still write threaded code -- but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster.

    5. Re:I'm only a humble C programmer, but.... by cduffy · · Score: 3, Insightful

      Yes, it still (like everything) depends on your application.

      That said, though, sharing (and putting locks around) your DB connections or script interpreters is an easy way to lose performance and introduce potential deadlocks (or other hard-to-track, hard-to-reproduce bugs due to bad shared state) as opposed to having each process able to operate completely independantly from the others. Shared state is a Good Thing when it's genuinely needed -- but should be avoided when it's not.

      I'm not saying -- and I've never tried to say -- that threading is worthless; I just object to people who take the position that making an application multithreaded will necessarily make it faster.

    6. Re:I'm only a humble C programmer, but.... by pthisis · · Score: 3, Insightful

      I think you are writing as a person who has never had to use either.

      I have written a dynamic content server that over the past 2 years has served over 6 billion requests, with 5 9's of uptime. I've written several realtime instrument control applications. I've written a distributed text mining application that does index-assisted regex searches of 1/2 terabyte of data in Threads can really be life savers when used correctly. Sure you have to implement locking but that's what pthread_mutex is for.

      On low-mem devices making full copies of the process to spawn copies is just insane.

      1) Look up COW and memory sharing.
      2) I never said "use only processes". A combination of processes and event loops is the way to go 99% of the time. There are some corner cases where threads are useful, but they tend to be abused by people who think "threads are good" without considering the alternatives nor the ramifications of that choice.

      And on windows the Thread implementation is *intentional* not accidental. The idea is that people using threads will take advantage of the speed increase.

      It's not a speed increase. Thread switching and thread creation on Windows are slower than process creation and process switching on Linux. On a par, but slower. Process creation on Windows is laughably slow, though, and process switching is substantially slower than thread switching.

      It's not that Windows figured out how to make their threads go fast, it's that their processes were dog-slow and they had to create an entirely seperate execution primitive to get any sort of reasonable concurrency. Linux did things the right way by making them both fast, and now allows you to choose between the two for _design_ reasons (do I want to share memory?) rather than artificial implementation reasons.

      You'll find a lot of knowledgeable people (Larry McVoy, former SGI kernel architect) who echo the same belief: use threads sparingly. Use as many threads as you have CPUs, and use processes instead if that makes more sense. Use more threads than that only if you're intimately familiar with the alternatives and know why they don't work, because while a state machine with non-blocking I/O may seem hard at first glance it'll almost certainly turn out to be easier to implement correctly, easier to debug, faster, and easier to maintain.

      Sumner

      --
      rage, rage against the dying of the light
  6. If you want to destroy my boxen. . . by endeitzslash · · Score: 3, Funny

    Launch 100,000 threads while I walk away. . .

    OK I'll shut up now.

  7. Parallelism by inkfox · · Score: 5, Interesting

    This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.

    --
    Says the RIAA: When you EQ, you're stealing bass!
  8. Re:Not 100,000 threads in parallel, just 50. by vvikram · · Score: 5, Informative


    Yeah right. And modded to "Informative"? Slashdot moderators are the _pits_.

    Read ingo's reply to Linus. They _did_ start
    one test serially and also _parallelly_ . In short he says that its possible.

    vv

  9. Sounds cool, but all I could think of... by Geek+Tragedy · · Score: 5, Funny

    "Hello, my name is Ingo Molnar. You killed -9 my process: prepare to die."

    Sorry, had to :P

    1. Re:Sounds cool, but all I could think of... by unsinged+int · · Score: 5, Funny

      I think it's more commmonly this:

      "Hello, my name is Ingo Molnar. You kill -9 my parent process. Prepare to vi."

  10. Re:Not 100,000 threads in parallel, just 50. by kinnunen · · Score: 5, Informative
    Read Ingo's posts too:
    actually, that was Ulrich's other test, which tests the serial starting of 100,000 threads. the test i did started up 100,000 concurrent threads which shot up the load-average to a couple of thousands. [the default timeslice the parent has is enough to start more than 50,000 parallel threads a pop or so.]
    And another one:
    Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
  11. NOOO!!!!! by Monkelectric · · Score: 3, Funny

    At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while(1) do { fork(); } ] in an attempt to extend deadlines or simply be assholes :)

    --

    Religion is a gateway psychosis. -- Dave Foley

    1. Re:NOOO!!!!! by ez76 · · Score: 5, Funny

      I am replying pre-emptively to dissuade the AC's who would otherwise reply to you and point out that your post should not have been modded funny because this innovation would not prevent fork() bombing because it involves spawning threads and not processes.

      I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of run-on sentences.

      I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of +1 bonus.

      I am further replying pre-emptively to dissuade the AC's who would mod this post down as off-topic because they do not get the parallel allusion to fork-bombing.

    2. Re:NOOO!!!!! by AJWM · · Score: 3, Funny

      I did something like that back in my school days on a dual-CPU Burroughs B6700, but with a twist: Each process forked itself twice, then waited. When it received a signal about a child process being killed, it spawned two more. I had a sleep of a few seconds or so in there so it didn't grow too fast.

      The fun part of that was when the system operators saw the processes replicating like crazy and started to kill them, that made it worse.

      Another fun trick with that machine was to set up a circularly-linked list and invoke the LLLU (linked list lookup) instruction on it...

      (Yeah, stupid things to do. At least I only did them during relatively quiet times.)

      --
      -- Alastair
    3. Re:NOOO!!!!! by inio · · Score: 5, Funny

      Dude, you seriously need to look into writing patents.

  12. Windows by jeffbru · · Score: 3, Interesting

    Just out of curiousity, how does the benchmark in windows compare?

    --
    - Jeff Brubaker
  13. Re:Not 100,000 threads in parallel, just 50. by ergo98 · · Score: 3, Insightful

    Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.

    That's called "making lemonade out of lemons". Clearly this test has shown that thread creation in Linux was horribly broken, not the flip side that process creation was so wonderfully good.

  14. whoa! by RestiffBard · · Score: 4, Funny

    I have no idea what the hell you're talking about but it certainly sounds impressive. :)

    --
    - /* dead coders leave no comments */
  15. Great by C0D3X · · Score: 3, Funny

    Now we finally have the power to run 99,999 pop up ads when we visit that pr0n site

  16. Re:Windows comparison by pVoid · · Score: 3, Interesting

    Very interestingly enough, either windows has a quota, or some sort of memory leak or something...

    Max I can create in a process is 2031 threads... That being done in 700ms.

    It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...

    will investigate more.

  17. Re:Not 100,000 threads in parallel, just 50. by the_quark · · Score: 5, Informative

    No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed (the main advantage in Linux from threads was the shared memory space if your application wanted that sort of thing). That's why Apache 2.0 is bringing NT performance more in line with Linux 1.3 performance: NT's threading speed is a lot closer to Linux's forking speed. Again, I'd like to underscore I'm not an expert on this, and it's possible I'm mistaken about relative benchmarks (is NT w/Apache 2.0 a little faster than Linux w/Apache 1.3? Could be...) but I'm very confident of the basic underlying point, that Linux process creation is essentially comparable to other OSs' thread creation, perhaps even faster.

    See, for example, http://www.linux.cu/pipermail/linux-prog/2001-Febr uary/000027.html, just one of the first Google links that popped up when I went looking for proof that I'm not on crack: "Linux newcomers often are unaware of the substantial differences between Linux and other operating systems. To implement concurrency, they use multithreading exclusively, mistakenly assuming as high an overhead associated with Linux multiprocessing as on other platforms." In fact, knowing how fast Linux's process creation is relative to other systems' thread creation makes this even more impressive in my mind. This isn't just a bug fix; much like with process creation before it, Linux is doing something fundamentally better than its counterparts.

    Don't forget: Just because this is /. doesn't mean I'm just a Windows-hating troll. I try to make sure all my Windows-hating-troll-posts are at least backed up by facts. ;)

  18. Re:Alternative headline by Dahan · · Score: 4, Informative
    Gigantic performance problem in Linux code fixed after several years of "many eyes" scanning over it.

    Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.

    Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [text version] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):

    0.8.1 Create a fixed priority scheduler.
    Currently, the Linux scheduler is very different than the traditional Unix schedulers. Although the Linux scheduler is very efficient when only several processes are running, it is not scalable. In order to match the performance of *BSD and other Unices, another scheduling algorithm must be used.
    Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.
  19. Re:Windows comparison by Courageous · · Score: 4, Informative

    Very thread uses a minimum of *1 PAGE* of reserve memory for its statck, which is 64K. However, you have to go out of your way to use less than 1 megabyte of reserve memory. Since only 2GB of reserve memory (addressable memory) is available to user applications, this would fit your 2000 thread figure like a glove.

    C//

  20. nice, but... by g4dget · · Score: 4, Interesting

    It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?

    1. Re:nice, but... by Magnus+Reftel · · Score: 4, Informative

      According to a mail from Ingo Molnar halfway down the linked article, M:N threading doesn't really solve the real problem - it's good at switching back and forth between running threads, but the real reason for having very large amounts of threads (be they kernel or user space threads) to begin with, is to do IO, and for that, there is no real advantage of user space threads.

      More info on the 1:1 vs M:N issue can be read in the white paper

      --
      print "Yet another p{erl,ython} hacker\n",
  21. How will this affect Mozilla, OpenOffice... by 3770 · · Score: 4, Interesting

    How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.

    While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.

    Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?

    --
    The Internet is full. Go Away!!!
  22. Re:Not 100,000 threads in parallel, just 50. by brianpane · · Score: 4, Informative
    Apache 2.0 doesn't actually do thread creation very frequently. The thread creation cost occurs mostly at startup. So the limiting factors for threaded Apache performance on Linux are mainly:
    • The speed with which the kernel can schedule and context-switch among threads
      For some recent data on this, see http://marc.theaimsgroup.com/?l=apache-httpd-dev&m =103228014211983. The O(1) scheduler patch for 2.4 seems to help here.
    • Memory usage per thread
    • Concurrency limitations of the Apache code itself
      This has been improving gradually with successive 2.0 releases, as the remaining global locks are removed or optimized.
    • General robustness of the thread implementation
      The current (2.4) Linux threading implementation doesn't work well with debuggers.
    At first glance, it looks like the NPTL could be a win for threaded Apache on Linux, as offers some solutions first the first and last of these issues.
  23. How *I* got kicked out of the computer lab by Naikrovek · · Score: 3, Funny

    I ran this in DOS:

    prompt "Enter Password:"

    No one could figure out that all i did was change the prompt from "$P$G" to that, and everyone was asking what the password was. haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.

    I got kicked out for a year (not beautiful).

  24. big deal by leomekenkamp · · Score: 4, Funny

    100.000 threads? What nonsense; everybody knows that no computer would ever use more than 640.

    --
    Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
  25. 100000?! by thelexx · · Score: 3, Redundant

    640 should be enough for anybody!

    LEXX

    --
    "Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
  26. Re:Not 100,000 threads in parallel, just 50. by Karellen · · Score: 5, Informative

    It's not process/thread _creation_ times that make the difference, it's the process/thread _context_switch_ times that really mount up, which is where Linux shines.

    And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

    K.

    --
    Why doesn't the gene pool have a life guard?
  27. Re:Alternative headline by himi · · Score: 3, Insightful

    Alternatively, you might want to consider that Linux's scheduler was very nicely tuned for far and away the most common case - where you have only a small number of running processes.

    Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into amelioriating. This has changed recently, as people have started using Linux in areas where this kind of thing /isn't/ insane, and hence these new developments have come along.

    You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply /have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers. These developments have come about because people needed them, and they didn't happen earlier because no one had needed them before. Go back and read the last few years of the lkml archives, and /then/ come back and talk about this kind of thing, when you understand /why/.

    himi

    --

    My very own DeCSS mirror.
  28. Re:Possible use by be-fan · · Score: 4, Insightful

    "use only as many threads as CPUs"
    >>>>>>>>
    Then please stay away from my GUI apps. I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O. On my 300 MHz PII, that particular trait made Galeon unusable. It had one rendering thread for all the tabs, so when I was loading a complex page like /. in another tab, whatever tab I was actually reading would freeze up.

    --
    A deep unwavering belief is a sure sign you're missing something...
  29. Re:Windows comparison by sagei · · Score: 3, Informative

    Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread.

    This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.

    Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.

    Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.

    --

    Robert Love

  30. Other similar mischief... by TheLink · · Score: 3, Funny

    Some guys I know copied a Windows error dialog box and set it as a background image for the desktop, centered.

    Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...

    For example pic:
    http://www.adobe.com/support/techguides/oper atings ystem/windows/winerrors.html
    Probably want to replace CCmail with Explorer or something more dear to heart ;).

    I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually called another colleague over and made the usual worried mumbles.

    http://www.sysinternals.com/ntw2k/freeware/blues cr eensaver.shtml

    Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.

    Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"

    I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...

    But I'd probably need the 100K parallel thread kernel to serve up all those ad banners :).

    Bwahaha!
    Link.

    --
  31. Linus didn't think much of O1 scheduler by jelle · · Score: 3, Interesting

    I remember that Linus made a remark that he tought that the O1 scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.
  32. Re:Not 100,000 threads in parallel, just 50. by himi · · Score: 4, Interesting

    The latency issues that cause mp3 skipping under heavy load in Linux have nothing at all to do with context switching, and everything to do with /scheduling/ latency: how long it takes for a process that has work to do to actually get control of the cpu. Context switching has /nothing/ to do with that.

    The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.

    Context switching under Linux /is/ extremely fast - it's actually been measured (a lot), and it's something the kernel developers pay a lot of attention to and optimise very carefully. They literally count cpu cycles in these code paths. Context switching time is a serious performance limiter in many areas, so getting it right is important, and it's something that Linux does /very/ well.

    Go do some real research before you accuse someone who's right of karma whoring bullshit.

    himi

    --

    My very own DeCSS mirror.