Running 100,000 Parallel Threads
An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."
The linux song
Arbitrary sig
Your answer:
http://www.cs.wustl.edu/~schmidt/ACE.html
This is so far the best library I have used for pthread programming. Powerful, easy to use, and encapsulates message passing really well...
this image springs to mind
It takes two seconds to start 100,000 threads???? Piff! With my ME computer, It doesn't matter how many parallel threads I am running... I can stop them all instantly by simply attempting to use my computer :P.
And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?
Launch 100,000 threads while I walk away. . .
OK I'll shut up now.
This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.
Says the RIAA: When you EQ, you're stealing bass!
Yeah right. And modded to "Informative"? Slashdot moderators are the _pits_.
Read ingo's reply to Linus. They _did_ start
one test serially and also _parallelly_ . In short he says that its possible.
vv
"Hello, my name is Ingo Molnar. You killed -9 my process: prepare to die."
:P
Sorry, had to
At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while(1) do { fork(); } ] in an attempt to extend deadlines or simply be assholes :)
Religion is a gateway psychosis. -- Dave Foley
Just out of curiousity, how does the benchmark in windows compare?
- Jeff Brubaker
Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.
That's called "making lemonade out of lemons". Clearly this test has shown that thread creation in Linux was horribly broken, not the flip side that process creation was so wonderfully good.
I have no idea what the hell you're talking about but it certainly sounds impressive. :)
-
Now we finally have the power to run 99,999 pop up ads when we visit that pr0n site
Very interestingly enough, either windows has a quota, or some sort of memory leak or something...
Max I can create in a process is 2031 threads... That being done in 700ms.
It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...
will investigate more.
No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed (the main advantage in Linux from threads was the shared memory space if your application wanted that sort of thing). That's why Apache 2.0 is bringing NT performance more in line with Linux 1.3 performance: NT's threading speed is a lot closer to Linux's forking speed. Again, I'd like to underscore I'm not an expert on this, and it's possible I'm mistaken about relative benchmarks (is NT w/Apache 2.0 a little faster than Linux w/Apache 1.3? Could be...) but I'm very confident of the basic underlying point, that Linux process creation is essentially comparable to other OSs' thread creation, perhaps even faster.
r uary/000027.html, just one of the first Google links that popped up when I went looking for proof that I'm not on crack: "Linux newcomers often are unaware of the substantial differences between Linux and other operating systems. To implement concurrency, they use multithreading exclusively, mistakenly assuming as high an overhead associated with Linux multiprocessing as on other platforms." In fact, knowing how fast Linux's process creation is relative to other systems' thread creation makes this even more impressive in my mind. This isn't just a bug fix; much like with process creation before it, Linux is doing something fundamentally better than its counterparts.
/. doesn't mean I'm just a Windows-hating troll. I try to make sure all my Windows-hating-troll-posts are at least backed up by facts. ;)
See, for example, http://www.linux.cu/pipermail/linux-prog/2001-Feb
Don't forget: Just because this is
Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.
Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [text version] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):
Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.Very thread uses a minimum of *1 PAGE* of reserve memory for its statck, which is 64K. However, you have to go out of your way to use less than 1 megabyte of reserve memory. Since only 2GB of reserve memory (addressable memory) is available to user applications, this would fit your 2000 thread figure like a glove.
C//
It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?
How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.
While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.
Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?
The Internet is full. Go Away!!!
- The speed with which the kernel can
schedule and context-switch among threads
m =103228014211983.
The O(1) scheduler patch for 2.4 seems to help
here.
- Memory usage per thread
- Concurrency limitations of the Apache code
itself
- General robustness of the thread
implementation
At first glance, it looks like the NPTL could be a win for threaded Apache on Linux, as offers some solutions first the first and last of these issues.For some recent data on this, see http://marc.theaimsgroup.com/?l=apache-httpd-dev&
This has been improving gradually with successive 2.0 releases, as the remaining global locks are removed or optimized.
The current (2.4) Linux threading implementation doesn't work well with debuggers.
I ran this in DOS:
prompt "Enter Password:"
No one could figure out that all i did was change the prompt from "$P$G" to that, and everyone was asking what the password was. haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.
I got kicked out for a year (not beautiful).
100.000 threads? What nonsense; everybody knows that no computer would ever use more than 640.
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
640 should be enough for anybody!
LEXX
"Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
It's not process/thread _creation_ times that make the difference, it's the process/thread _context_switch_ times that really mount up, which is where Linux shines.
And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.
K.
Why doesn't the gene pool have a life guard?
Alternatively, you might want to consider that Linux's scheduler was very nicely tuned for far and away the most common case - where you have only a small number of running processes.
/isn't/ insane, and hence these new developments have come along.
/have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers. These developments have come about because people needed them, and they didn't happen earlier because no one had needed them before. Go back and read the last few years of the lkml archives, and /then/ come back and talk about this kind of thing, when you understand /why/.
Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into amelioriating. This has changed recently, as people have started using Linux in areas where this kind of thing
You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply
himi
My very own DeCSS mirror.
"use only as many threads as CPUs" /. in another tab, whatever tab I was actually reading would freeze up.
>>>>>>>>
Then please stay away from my GUI apps. I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O. On my 300 MHz PII, that particular trait made Galeon unusable. It had one rendering thread for all the tabs, so when I was loading a complex page like
A deep unwavering belief is a sure sign you're missing something...
Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread.
This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.
Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.
Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.
Some guys I know copied a Windows error dialog box and set it as a background image for the desktop, centered.
r atings ystem/windows/winerrors.html ;).
s cr eensaver.shtml
:).
Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...
For example pic:
http://www.adobe.com/support/techguides/ope
Probably want to replace CCmail with Explorer or something more dear to heart
I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually called another colleague over and made the usual worried mumbles.
http://www.sysinternals.com/ntw2k/freeware/blue
Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.
Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"
I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...
But I'd probably need the 100K parallel thread kernel to serve up all those ad banners
Bwahaha!
Link.
I remember that Linus made a remark that he tought that the O1 scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!
--- Hindsight is 20/20, but walking backwards is not the answer.
The latency issues that cause mp3 skipping under heavy load in Linux have nothing at all to do with context switching, and everything to do with /scheduling/ latency: how long it takes for a process that has work to do to actually get control of the cpu. Context switching has /nothing/ to do with that.
/is/ extremely fast - it's actually been measured (a lot), and it's something the kernel developers pay a lot of attention to and optimise very carefully. They literally count cpu cycles in these code paths. Context switching time is a serious performance limiter in many areas, so getting it right is important, and it's something that Linux does /very/ well.
The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.
Context switching under Linux
Go do some real research before you accuse someone who's right of karma whoring bullshit.
himi
My very own DeCSS mirror.