Running 100,000 Parallel Threads
An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."
And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?
This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.
Says the RIAA: When you EQ, you're stealing bass!
Just out of curiousity, how does the benchmark in windows compare?
- Jeff Brubaker
Very interestingly enough, either windows has a quota, or some sort of memory leak or something...
Max I can create in a process is 2031 threads... That being done in 700ms.
It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...
will investigate more.
There was a patch for an O(1) scheduler awhile. What this means is it takes the same amount of time to select what runs next and it's not affected by how much is running. But you won't notice an improvement unless you have about 200 processes running at the same time. This may be good for servers, and the like, but it's a lot slower if you have few processes running. Keep this in mind...
It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?
How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.
While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.
Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?
The Internet is full. Go Away!!!
" - - libpthread should now be much more resistant to linking problems: even if the application doesn't list libpthread as a direct dependency functions which are extended by libpthread should work correctly."
This ought to be a big help for those of us who write plug-in modules for servers like Apache 1.x and PHP. The existing thread library doesn't work properly unless the program executable explicitly links to it, which means that my shared libraries can't take advantage of standard thread management such as pthread_atfork().Given that Apache 2.x can utilise threads as well as processes, does this mean that you can configure a large web server with, say "MaxSpareThreads 1000000" so that you can cope when you're slashdotted ;-)?
What is the fundamental reason select/poll should be that much faster anyway? Well, you win the context switch-times, if you can handle many clients in a tick. But on the other hand it does affect the way you need to design the code, and doing some stuff that neveer stalls withouot threads might be tricky.
Just imagine a situation where a thread might need to calculate something, or initialize a big array. Now, if it's run under a select-loop, you need to do that in parts to avoid starving the server. With threads, you just do the trick and don't care about the rest of the world which keeps serving the clinets, no matter how long youo stay in the functino.
And does this mean the Java will start to really scale on linux?
>Are these completely independant, and competing, projects? Can these two groups work together and complement each other?
It is exactly this library that Redhat is hoping to stop. IBM's is a good library, but it is heavy. It will also complicate threads by moving part of the scheduling into user space. That is not neccesarily a bad thing, just bigger and more complicated. Redhat's should be thinner than the current pthreads lib. One thing that I have hated about pthreads is it's use of signals for some control. Now, if I read this correct, Redhat has moved all the control into the kernel which means the kernel handles it all.
Nobody ever said that linux-specific behavior is POSIX-compliant. Last I heard, POSIX is not about the specifics of any given UNIX-compatible or class of system. Rather, it attempts to be the abstraction and distillation of those class of systems, as codified by The Open Group. Please correct me if I am wrong in this idea. Linux simply simply "aims to be..." POSIX-compliant, as promulgated by the LSB, the FHS, et al. --
That all said, I totally agree with you -- especially regarding cancellation points, fork(), and documentation.
Please bear in mind that much of this behavior will be inherited from whatever libc it it compiled against. IMO, this simply shows the power of C, nothing else.
The above scenario simply points out the differences between OpenGroup/POSIX and GNU/FSF... if things like that "bug" you (no pun intended, seriously), then perhaps you should recompile with whatever "-- posixly-correct" options you have available.
And yes, I have a copy of the SUSV3 spec right here, in fact.
C|N>K
I can't seem to find any info on whether Linux core files still produce one core file per thread or just one core file per process (as does Solaris). Has `gdb' been enhanced to handle multithreaded programs (or multithreaded core file) on Linux? If I have a thousand threads - I sure don't want 1000 core files in the event of a crash. Is there a way around this?
See here ( http://lwn.net/Articles/9632/ )
and here ( http://lwn.net/Articles/10248/ )
--Linus is being pigheaded about this patch, wanting to "keep the code simple" instead of implementing Ingo's **fast** + Fixed solution.
To quote LWN:
[ So it's fast - though a few extra features have been requested. But this patch has stirred up a bit of a debate. Rather than put in a complicated new PID allocator, it is asked, why not just make the maximum PID be very large? Then, in theory, the quadratic part of get_pid() will never run so the performance problems go away, and the code stays simpler. Linus prefers this approach, as do a number of other developers; he has put a simple patch along these lines into his pre-2.5.37 BitKeeper tree.
Ingo disagrees, pointing out that any reasonable maximum PID size can be exceeded eventually. He would rather fix the problem than try to hid it behind a large process ID space. In the absence of real-world examples that show people being bitten by get_pid()'s behavior in a larger PID space, though, Linus appears unlikely to accept any more complicated fix.
]
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
I remember that Linus made a remark that he tought that the O1 scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!
--- Hindsight is 20/20, but walking backwards is not the answer.
The latency issues that cause mp3 skipping under heavy load in Linux have nothing at all to do with context switching, and everything to do with /scheduling/ latency: how long it takes for a process that has work to do to actually get control of the cpu. Context switching has /nothing/ to do with that.
/is/ extremely fast - it's actually been measured (a lot), and it's something the kernel developers pay a lot of attention to and optimise very carefully. They literally count cpu cycles in these code paths. Context switching time is a serious performance limiter in many areas, so getting it right is important, and it's something that Linux does /very/ well.
The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.
Context switching under Linux
Go do some real research before you accuse someone who's right of karma whoring bullshit.
himi
My very own DeCSS mirror.