Running 100,000 Parallel Threads
An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."
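For anyone who wants a feel for what "start and stop N threads in parallel" means, here is a minimal sketch in Python. It is not the benchmark from the story; the function name `spawn_and_join` and the default thread count are made up for illustration. Every thread blocks on one shared event, so all of them are alive at the same time before being released.

```python
import threading
import time

def spawn_and_join(n):
    """Start n threads that all block, release them together, and join them.

    Returns (peak, elapsed): how many of them were alive at once and how
    long the whole start/stop cycle took in seconds.
    """
    release = threading.Event()
    threads = [threading.Thread(target=release.wait) for _ in range(n)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    # All n threads are now parked on the event; subtract the main thread.
    peak = threading.active_count() - 1
    release.set()
    for t in threads:
        t.join()
    return peak, time.perf_counter() - t0

if __name__ == "__main__":
    peak, elapsed = spawn_and_join(1000)  # keep n modest; each thread reserves a stack
    print(f"{peak} threads alive at once, {elapsed:.3f}s to start and stop them")
```

The numbers this prints depend on the interpreter and the machine as much as on the kernel, so treat it as a toy and not as a measurement of the scheduler.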
I frequently hear people bitching about the pthread lib and how f*cked up it is... is this going to change the way we use threads too? :-)
The linux song
Arbitrary sig
M$ is suddenly also capable of doing the same...
this image springs to mind
frits lysdexic boewflu pr0st!
It takes two seconds to start 100,000 threads???? Piff! With my ME computer, it doesn't matter how many parallel threads I am running... I can stop them all instantly by simply attempting to use my computer :P.
Wow, Slashdot may have had over 100,000 threads too. But then it took more than five years.
And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?
Fantastic job, well done. Every little step counts, one day linux will be the primary OS on everybody's desktop!!!
Launch 100,000 threads while I walk away. . .
OK I'll shut up now.
This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.
Says the RIAA: When you EQ, you're stealing bass!
Imagine a Beowulf cluster of such Linux developer kernels!
"This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds..."
;-)
I didn't know the Linux kernel was a mother-to-be....
Got a link for that?
So now I'm able to open up 100.000 pr0n pictures in just 2 sec. Ubercool ;-)
Thomas S. Iversen
It's called "pulling the power cable."
// file: mice.h
#include "frickin_lasers.h"
warning goatse link! don't click!
Why so many threads? "Because we can :)"
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
How does this compare to other OSes, such as Solaris, NT, OSX, etc?
"Hello, my name is Ingo Molnar. You killed -9 my process: prepare to die."
:P
Sorry, had to
At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while (1) { fork(); } ] in an attempt to extend deadlines or simply be assholes :)
Religion is a gateway psychosis. -- Dave Foley
Not much longer. Microsoft is going to be brought down by the next generation of IIS worms.
Just out of curiosity, how does the benchmark in Windows compare?
- Jeff Brubaker
No, this is a goatse.cx link.
I'm building a project where there will be one huge database with up to 200 different companies connected to it pretty much nonstop. 1-10 users from every company depending on the time of the year. 2 threads for every connection.
200*10*2=4000 threads.
Could you please refrain from using "boxen". It makes my head hurt
Im not here now... Im out KILLING pepperoni
man, that chick is hot. Hot grits, anyone?
I have no idea what the hell you're talking about but it certainly sounds impressive. :)
-
Now we finally have the power to run 99,999 pop up ads when we visit that pr0n site
Interestingly enough, either Windows has a quota, or some sort of memory leak or something...
Max I can create in a process is 2031 threads... That being done in 700ms.
It's odd because I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...
will investigate more.
Normally I am of the "use only as many threads as CPUs" school of thought, but I can think of a reason to use 100,000 threads - imagine a large FTP server, or a multi-homed HTTP server, where you need to provide each connected user with his own set of access privileges or filesystem context. A one-thread-per-connection server may be the easiest way to build security into the system.
3 step plan:
1. write multithreaded web server
2. ???
3. PROFIT!
so this means Gary Kasparov can get beat at chess that much faster now?
I hate sigs.
It's much more interesting to have that many processes, not threads, in a UNIX-like system... Leave threads for those Windows users...
There was a patch for an O(1) scheduler a while back. What this means is that it takes the same amount of time to select what runs next, no matter how much is running. But you won't notice an improvement unless you have about 200 processes running at the same time. This may be good for servers and the like, but it's a lot slower if you have few processes running. Keep this in mind...
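To make the "same amount of time no matter how much is running" idea concrete, here is a toy sketch in Python. It is not the kernel's code: the class name `RunQueue`, the 140 priority levels, and the "lower number runs first" rule are assumptions chosen for illustration. The point is that picking the next task only looks at a fixed-size bitmap and one queue head, never at the whole task list.

```python
from collections import deque

NUM_PRIOS = 140  # assumed number of priority levels; lower number = runs first

class RunQueue:
    """Toy run queue: one FIFO per priority plus a bitmap of non-empty FIFOs."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0  # bit p is set exactly when queues[p] is non-empty

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        """Return the next task to run, or None if nothing is runnable."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit and turn it into an index.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[prio]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)
        return task

if __name__ == "__main__":
    rq = RunQueue()
    rq.enqueue("editor", 20)
    rq.enqueue("compiler", 30)
    rq.enqueue("audio", 5)
    print(rq.pick_next())  # the task with the lowest priority number
```

The work done in `pick_next` is bounded by the fixed number of priority levels, so it costs the same with 10 runnable tasks or 100,000.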
I thought Linux didn't have real threads, and they were implemented as processes... Am I missing something?
Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.
Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [text version] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):
Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.
I suppose this means that sites will want to switch to Linux/Apache in order to avoid being incapacitated when linked by Slashdot?
Every thread uses a minimum of *1 PAGE* of reserve memory for its stack, which is 64K. However, you have to go out of your way to use less than 1 megabyte of reserve memory. Since only 2GB of reserve memory (addressable memory) is available to user applications, this would fit your 2000 thread figure like a glove.
C//
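The arithmetic behind that claim, as a quick sketch. The 2GB address space and the 1MB and 64K reserve sizes are the figures quoted in the post above; the function name `max_threads` is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

def max_threads(address_space_bytes, stack_reserve_bytes):
    """Upper bound on threads if each one reserves a fixed stack region."""
    return address_space_bytes // stack_reserve_bytes

print(max_threads(2 * GIB, 1 * MIB))   # 2048 with the default 1MB reserve
print(max_threads(2 * GIB, 64 * KIB))  # 32768 with a 64K reserve
```

2048 is only an upper bound, since the program's own code, heap, and loaded libraries also live in that address space, which would presumably account for the gap down to the 2031 reported earlier.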
It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?
How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.
While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.
Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?
The Internet is full. Go Away!!!
...will start writing horrible monsters running hundreds and thousands of threads, and their creations will suffer from all other shortcomings of that decision.
Contrary to the popular belief, there indeed is no God.
That is just sooo typical. Just when I thought I'd be superl33t and, like, "underground" by trying a more minimalistic approach installing Win 3.11 on my superfly dual P4, you guys go ahead and spoil it all by telling me something about 100.000 concurrent threads on something called "Linux", which at least to me seems to be some kind of binary Viagra? You ruined all my plans for world domination. You guys are no fun.
But rest assured, I'll try again - with Win95.
(I'm sorry. I had to do it.)
I do security
And the nice thing is that the improvement was brought to us by that company everyone loves to hate, no not microsoft, but Red Hat.
I ran this in DOS:
prompt "Enter Password:"
No one could figure out that all i did was change the prompt from "$P$G" to that, and everyone was asking what the password was. haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.
I got kicked out for a year (not beautiful).
Your days of being ignored are over, dude.
100.000 threads? What nonsense; everybody knows that no computer would ever use more than 640.
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
" - - libpthread should now be much more resistant to linking problems: even if the application doesn't list libpthread as a direct dependency functions which are extended by libpthread should work correctly."
This ought to be a big help for those of us who write plug-in modules for servers like Apache 1.x and PHP. The existing thread library doesn't work properly unless the program executable explicitly links to it, which means that my shared libraries can't take advantage of standard thread management such as pthread_atfork().
As far as I can tell, the current use of "threads" mostly boils down to a faster way to fork(). From an algorithmic point of view, not all that interesting.
In any case, hats off to the Linux developers for filling out the features checklist.
Given that Apache 2.x can utilise threads as well as processes, does this mean that you can configure a large web server with, say "MaxSpareThreads 1000000" so that you can cope when you're slashdotted ;-)?
Sun's user level threads package has been available for years as the Sun lwp library. It is very fast and very portable. Use it with poll() or select() and some state machine glue, and you can implement almost any "threaded" algorithm which you can imagine.
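A rough sketch of what "user level threads plus state machine glue" can look like, written in Python with generators standing in for the per-thread state machines. This is not the Sun lwp API; `worker` and `run_all` are hypothetical names, and the scheduler is a plain round-robin loop with no I/O multiplexing.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': does one unit of work, then yields the CPU."""
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative switch point; the kernel is never involved

def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, drop it
        ready.append(task)      # still has work, goes to the back of the line

if __name__ == "__main__":
    log = []
    run_all([worker("a", 2, log), worker("b", 1, log)])
    print(log)
```

A real package would park tasks that are waiting on a file descriptor and block in poll() or select() when nothing is ready, instead of spinning through the queue.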
640 should be enough for anybody!
LEXX
"Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
May I have a cheesecake?
In any large group of people you will find a few idiots, a few luminaries, and a great number of average thinkers. Sometimes the only thing that separates idiots from luminaries is their lack of social grace. Welcome to democracy.
"I have opinions of my own, strong opinions, but I don't always agree with them." -- George H. W. Bush
Or perhaps know which part of the banana peel is the good part to smoke? =) GOOD JOB OPEN SOURCE!!! KEEP AT IT!
We've missed you man. Welcome back. Have some nice hot grits.
That was an interesting read.
640 threads must be enough for everyone.
don't forget that everyone can use the very simple and efficient native linux threads using the clone() sys call (see man clone)
since there is less overhead than using the more complicated Posix API, clone threads will always be faster.
Good job Ingo Molnar! Fantastic performance rise, but how about stability: is it less or more stable? I hope it is at least at the same level of stability.
Pulsed Media Seedboxes
This ought to make RedHat, Dell, IBM, and Oracle very happy, given a few of the newer contracts with large retailers using Oracle's back-end... if you read the article closely you notice that RH takes the claim for sponsoring a bunch of the work involved in developing this.
C|N>K
Combine this with Apache2's Multi-threaded or Hybrid MPM and you'll have a heck of a web-server!
And does this mean the Java will start to really scale on linux?
Alternatively, you might want to consider that Linux's scheduler was very nicely tuned for far and away the most common case - where you have only a small number of running processes.
Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into ameliorating. This has changed recently, as people have started using Linux in areas where this kind of thing /isn't/ insane, and hence these new developments have come along.
You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply /have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers. These developments have come about because people needed them, and they didn't happen earlier because no one had needed them before. Go back and read the last few years of the lkml archives, and /then/ come back and talk about this kind of thing, when you understand /why/.
himi
My very own DeCSS mirror.
Scalability is a good thing, no doubt about that. However, there is another aspect that should be pointed out: the current thread API in linux is quite different from the POSIX specification and somewhat crufty. Just to mention the biggest problems:
- missing cancellation points: testing whether a thread has been cancelled should be done in lots of system calls, but linux pthreads do not support this. Instead, you have to call pthread_testcancel() before and after every such call. A real drag.
- signal handling: linux pthread signal handling is very different from the POSIX specification. However, proper signal handling is crucial for any real world application.
- fork() will not work as expected. This is a real nuisance if you want proper daemon behaviour for your application.
- documentation of linux-specific behaviour is poor. As a result, most of the existing literature on thread programming is pretty useless for linux.
All in all, linux threads really need much better integration with the standard system API. A lot of applications could profit from multithreading. Just think of GUI responsiveness. Also, using threads makes some programming tasks much easier. No need for asynchronous hostname lookup, for example.
All these points can be worked around, for sure. Nevertheless, it makes writing portable software a nightmare. Porting threaded software to linux, well
A solid, well documented, standard conforming threads implementation will make linux a much nicer environment for serious programming than it already is. I am really looking forward to this.
sig intentionally left blank
[Ed: long list of requirements deleted]
Once all these prerequisites are met compiling glibc should be easy.
Phew!
Ur browser, naa, it were the World Wide Web who started it all
http://www.w3.org/People/Berners-Lee/WorldWideWeb.html
Okay, where did you come up with the 64K figure, and also the 1 mega (megi) byte figure?
All Intel processors have 4KiB pages. Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread. So all in all, you need 12KiB for each thread. Multiplying with the 100000 figure you get 1200000KiB or 1.144GiB, which is quite affordable for a 2GiB system.
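The same arithmetic as a small Python sketch, so the assumptions can be varied. The page counts come from the post above; the function name `total_kib` and its parameters are made up for illustration.

```python
def total_kib(threads, user_stack_pages=2, kernel_pages=1, page_kib=4):
    """Memory needed for `threads` threads, in KiB, under the post's assumptions."""
    return threads * (user_stack_pages + kernel_pages) * page_kib

def kib_to_gib(kib):
    return kib / (1024 * 1024)

total = total_kib(100_000)
print(total, round(kib_to_gib(total), 3))    # 1200000 KiB, about 1.144 GiB

# If the kernel side takes two pages per thread instead of one:
total2 = total_kib(100_000, kernel_pages=2)
print(total2, round(kib_to_gib(total2), 3))  # 1600000 KiB, about 1.526 GiB
```

Either way the total stays under 2GiB, but only as long as each thread really gets by with a one- or two-page user stack.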
Then, with NGPT (Next-Generation Posix Threads), those 100,000 threads would be in user space and may be even cheaper.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
That's a typical Linux zealot answer: "if Linux doesn't implement it properly, you don't need it, and if you need it you're an idiot".
I don't buy that. If I want to have a server with 10,000 clients, I want to be completely free to implement that with threads... Wasn't Linux supposed to shine on the server side? Likewise, if I want to create a browser with one thread per window, then let me be.
I don't want to be lectured by fascist arrogant pigs who are too lazy to implement correctly a proper Operating System (whose goal is to adapt to a wide variety of different uses). Fuck you. Have a nice day.
I think we need to pull some old stats out of our ass. This paper is about the 2.2.x kernel. Correct me if I'm wrong, but hasn't there been massive overhauling of the 2.4.x and 2.5.x kernels in the scheduling area?
I think I'll just slam XP performance based off of NT benchmarks and articles. What the hell, they're both from MS, so the argument must be valid.
Get a grip!
-- Many men would appreciate a woman's mind more if they could fondle it
The 1 MB is the default stack size for every Posix thread. It takes some effort to determine what the smallest valid stack size is, since PTHREAD_STACK_MIN doesn't specify enough space for the start_func stack frame. The parent post merely stated that the default is 1MB and you have to work to lower that, and he is correct. If the grand-parent poster was just creating threads without specifying a stack size, he would run out of RAM pretty quickly.
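As an illustration of "you have to work to lower that", here is a minimal sketch using Python's `threading.stack_size()`, which sets the stack size for threads created afterwards (in C the corresponding knob is `pthread_attr_setstacksize()`). The helper name `run_with_small_stacks` and the 256 KiB figure are assumptions for the example, not recommended values.

```python
import threading

def run_with_small_stacks(n, stack_bytes=256 * 1024):
    """Run n trivial threads with a reduced stack size, then restore the old setting."""
    old = threading.stack_size(stack_bytes)  # returns the previous setting (0 = platform default)
    results = []
    try:
        threads = [threading.Thread(target=results.append, args=(i,)) for i in range(n)]
        for t in threads:
            t.start()   # the stack is allocated here, using the size set above
        for t in threads:
            t.join()
    finally:
        threading.stack_size(old)  # do not leak the setting to the rest of the program
    return sorted(results)

if __name__ == "__main__":
    print(run_with_small_stacks(8))
```

How small the stack can safely go depends on what the thread's code actually does, which is exactly the effort the post is talking about.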
I wouldn't want to guess where the 64k figure came from.
Ingo:...Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
Wow.. this is pretty good. The ability to spawn & run 1 million concurrent threads should keep even the most demanding users happy for a few years...
OTOH, I hope this post doesn't become the butt of jokes a few months from now ("and you thought 1 million was a lot! Ha! My Palm 5000XL does more than that!")...
Yes, of course! Everybody knows that the only reason "every lamer with no design knowledge" doesn't do that now is because they take into account the shortcomings of running 50,000 parallel threads.
This crap gets a +2?
You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply /have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers.
That's a typical Linux zealot answer: "if Linux doesn't implement it properly, you don't need it, and if you need it you're an idiot".
I don't buy that. If I want to have a server with 10,000 clients, I want to be completely free to implement that with threads... Wasn't Linux supposed to shine on the server side? Likewise, if I want to create a browser with one thread per window, then let me be.
The goal of linux isn't to be the perfect OS for every single task. If you think that I have a bridge to sell you.
The goal of linux is to be the best it can be.
That's what's happening here. It is much like ignoring somebody asking a question you do not know the answer to.
Today's development environments do not do a great job of automatically parallelizing code, and there are very few outstanding multithreading programmers in this world. Thank goodness one of them is working on this for Linux. The improvement in threading support is critical for Linux to scale in shared memory, multi-processor environments (SMP, NUMA), but will only be important for certain applications. The typical Slashdot reader uses a dual processor Linux system at best, where excellent multithreading is not necessary.
- Portable MP3 Players (Done before)
- Net shooting guns (1960s James Bond movies)
- Build your own sub woofer (My friend built a 500mm (1'8") X 500mm X 1000mm(3'4") Sub Woofer in 1988 and then put it in his Ford Transit)
- Tiny Linux boxen (Seen 1000s of these and *BSD boxen as well)
But 100,000 Threads on Linux, now that's impressive; too bad it won't make one iota of a difference to most of us who use Linux for just reading
-Jasa -- Linux - The SOURCE will be with you, ALWAYS
If you want to do stupid things with your programs, that's fine by the kernel developers. Just don't expect /them/ to bend over backwards to make /your/ stupid design work as well as you want it to. That's your problem, and no one else's.
himi
My very own DeCSS mirror.
Since I absolutely suck at getting kernels from source to work correctly (I never get everything in there that I need, I guess), the question is: when does all this great stuff reach production? (To then be pre-packaged by RedHat et al.)
Acts 17:28, "For in Him we live, and move, and have our being."
(about 14 minutes 58 seconds faster than with earlier Linux kernels)
OK, it is a genuine and serious question I have: were these 15 extra minutes responsible for the painfully long start times of apps like Mozilla and OpenOffice? If so, as soon as I upgrade my distro, I will boot it into 2.5.
-><- no
Your egregious use of the word "egregious".
No one ever had to evacuate a city because the solar panels broke!
See subject. A useful 'heads up' post for folks like myself who tend to assume that Linux will follow the general Un*x-family behaviours we're familiar with from the commercially-sold variants.
And yes, I would of course ;) check this assumption if I were to do some significant implementation for the Linux platform.
yeah, had to say it, first time I do :)
Hahahahahahahahahahaha!
Gawd, I didn't expect that at all LOL. I swear I have tears rolling down my cheeks because I'm laughing so hard!
My name is Ingo Molnar. You kill my father - prepare to die. :)
er... sorry about that, I won't do it again
-nwp
User-level threads cannot take advantage of multiple CPUs. True, they are somewhat faster on a single CPU system due to lower overhead, but that's all they are good for.
___
If you think big enough, you'll never have to do it.
ACE is nice for big systems.
But it's also way overkill for small stuff. It's a whole distributed framework, not a wrapper around pthreads.
May we never see th
It's a Windows limit, and it's in the documentation.
C//
The 64K page size is Windows' page size. I can only assume that the poster is stating that the Intel hardware page size is 4K. I would suppose this means that a Windows (2K, NT) page of 64K is assembled from 16 hardware pages, then. The Windows page size of 64K is in their documentation. I never paused to think about how this interfaces with hardware pages...
C//
that's what the power button is for.
Actually Microsoft stole the name Internet Explorer. They were sued, the company eventually went bankrupt and Microsoft settled out of court.
Currently in Linux every thread is assigned a distinct process ID, and as such, a process has as many entries in `top' and `ps' as it has threads. This makes it difficult to monitor processes externally, or even see the other processes' information. Has this issue been addressed? (I realize this is a user-space program issue, not a kernel issue).
so that the moderation can't be swayed by bad moderation. As it is, any given post gets affected by at most about 10 moderators.
I can't seem to find any info on whether Linux core files still produce one core file per thread or just one core file per process (as does Solaris). Has `gdb' been enhanced to handle multithreaded programs (or multithreaded core file) on Linux? If I have a thousand threads - I sure don't want 1000 core files in the event of a crash. Is there a way around this?
Now finally systems can handle the huge demand for all those millions of .NET Web Services out there.
As I am in complete awe of the state of GNU. Where and how may I easily contribute (with focus) the only meaningful contribution I have....money?
I cannot express in words how the efforts of so many , for so many, resonates with my soul.
I am always offended by those who wish to call these contributors "communists" when in the course of battle giving one's life is "heroic". So why is giving just a piece of one's life "communistic"?
I need a place where GNU projects lay out their plans, budgets, and paypal accounts so I can participate in a meaningful way.
To every contributor I have just two words,
THANK YOU!
Okay, you're wrong. This O(1) scheduler in 2.5.x is the "massive overhauling." (Yes, the patch has been around for a while... but as the article says, it's only recently been merged into 2.5)
Wow, is Linux finally becoming a Real Unix OS? I won't see 50 "processes" when I do a 'ps' on a machine running Java applications, which causes less-than-aware users to claim it's using 500 Megs of memory and 10 processes?
Thank God, but why so long? HP-UX, Solaris, etc... have had a working, stable M:N implementation forever, and it worked. Linux's whole crap about "Eh, we're smarter than they are we'll use kernel threads but make them faster" was until now a load of crap - hubris, if you ask me.
Anyway, thank freaking God they finally have it working right.
The fact that YOU associate M$ to Microsoft only proves him right.
Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) align data in 64KB increments. IIRC, there is also 64KB of virtual slack between memory mapped objects as well.
A deep unwavering belief is a sure sign you're missing something...
Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread.
This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.
Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.
Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.
yeah, had to say it, first time I do :(
I feel sorry for his girlfriend. She must be totally wiped out...then maybe not:)
Hello.
The grandparent started at 2 because the author wasn't a cocksucking AC.
Thanks for playing.
- El Generoso
See me at http://goatse.cx/
will Mac OS Java shape up!?
Not only is there no version of J2DK1.4 for PPC(in Mac OS X or Debian/PPC), but the Mac OS X version takes a long time to load.
I reboot to Linux and run Forte over X(11R6) using my server's processor and ram(768mb as opposed to 128mb). I'd use OS X if I could get an X(11R6) server to work, but neither XDarwin nor Fink will last more than a minute before crashing.
Can this new technology be integrated into OS X or is it part of a static library that the JDK must be built with? Sun is infamous for building against outdated dynamic libraries; I had to 'ln -s' a library manually when installing the JDK.
You can't judge a book by the way it wears its hair.
Some guys I know copied a Windows error dialog box and set it as a background image for the desktop, centered.
Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...
For example pic: http://www.adobe.com/support/techguides/operatingsystem/windows/winerrors.html ;).
Probably want to replace CCmail with Explorer or something more dear to heart
I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually call another colleague over and make the usual worried mumbles.
http://www.sysinternals.com/ntw2k/freeware/bluescreensaver.shtml :).
Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.
Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"
I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...
But I'd probably need the 100K parallel thread kernel to serve up all those ad banners
Bwahaha!
Link.
- Consider the pace of Free Software development.
- Consider that the article was based on a study using the 2.2.5 kernel.
- Consider that that paper was from THREE years ago!
'Nuff said.
SCO and Solaris can both create threads 10,000 times faster than the current Linux kernels, according to Sun's and SCO's marketing departments. My guess is that this was exaggerated, but it is one of the benefits of the big Unixes. Heavily threaded Linux apps have been rumoured to fly on UnixWare where they would run slower on their own native platform! I guess Linux is maturing in this aspect. Does anyone who knows anything about Unix/Linux threading care to comment? I wonder if this will help Linux in server environments.
http://saveie6.com/
I've created over 200,000 processes on a PIII 550 laptop with 256 MB of RAM running Windows XP. Of course, it took a while (swapping).
The process is called nothing.exe. Source Code: int WinMain(...) {Sleep(INFINITE);}
I work at a lab, so I also ran it on a Compaq 8-way with 4-GB of ram. It worked but I don't remember how fast it went.
However, there is a big gnarly limit in Windows that will limit the # of processes: the amount of memory allocated to virtual desktops or something. We researched it -- look it up. This is why you get limited to a few thousand processes or threads if they all do GUI stuff. The bad thing is that basically any function you call in user32 will register the thread as a GUI thread. It is all explained in the book Inside Windows 2000.
Not meaning to troll, I'm just going to share a basic fact: it sucks that Windows threads are so expensive, and tens of thousands of threads (read: thread per client) *DOES* suck on Windows. However, this is not the same thing as saying Windows doesn't scale -- you just have to code it differently. (Check out how many threads SQL Server uses when it's processing thousands of clients.) Stuff like IO Completion ports, AWE memory, and Scatter/Gather IO is the way that you have to go.
Just because you *can* create hundreds of thousands of threads, doesn't mean it's a good idea or that your app won't run like shit on a 32-CPU machine!
i've tried to bring 2.5.37 up on 5 different machines, and they all crash anywhere from "OK, booting the kernel..." (hard lock) to getting all the way down to loading SCSI drivers, and getting "Powering off device 0." and then locking up.
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
I think I know the limit you are talking about: it's a handle limit in the GDI subsystem.
As for the 200k processes taking time to launch, that is quite normal, as launching a process is much heavier than just launching a thread.
The 2k threads I created were created in 700ms. which is very acceptable in my books.
And to confirm, yes, creating so many threads ain't the best idea.
Someone else mentioned thread pools as being a workaround, but only a workaround. I personally think thread pools are actually a way of doing things, and not a workaround for slow thread creation. In fact there are new WinNT APIs for thread pooling.
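For readers who have not used one, here is a minimal thread pool sketch in Python using the standard `concurrent.futures.ThreadPoolExecutor`. It is not the WinNT pooling API mentioned above; `handle`, `serve`, and the worker count are hypothetical names and values picked for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(request_id):
    """Stand-in for per-request work; reports which pooled thread ran it."""
    return request_id, threading.current_thread().name

def serve(requests, workers=4):
    """Process every request on a fixed set of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, requests))  # results come back in request order

if __name__ == "__main__":
    results = serve(range(100))
    names = {name for _, name in results}
    print(f"{len(results)} requests handled by {len(names)} threads")
```

A hundred requests get handled by at most four threads, so the cost of creating a thread is paid a handful of times instead of once per request.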
yada yada... I don't think anyone will actually ever read this post =)
In the systems programming world, threading, thread scheduling, and signal processing in threads were always considered Linux's primary weakness, and were the main strength of Solaris, especially for applications running in the telecom space. But with this announcement, I can see Solaris' last tech superiority over Linux crumbling.
I think Sun will need to quicken its re-invention pace.
-j
Yes.
Minimum loadable Memory Section in windows is 64K. I guess a thread creation creates a new stack on a newly created section boundary.
Hidden in the article was a reference to a new locking primitive, futex. I don't see a manpage on line for it, though. Where is this documented?
I can't believe how insane tux is, Ingo just continues to make Linux what it is. Keep up the good work Ingo.
-R.Dietrich
See here ( http://lwn.net/Articles/9632/ )
and here ( http://lwn.net/Articles/10248/ )
--Linus is being pigheaded about this patch, wanting to "keep the code simple" instead of implementing Ingo's **fast** + Fixed solution.
To quote LWN:
[ So it's fast - though a few extra features have been requested. But this patch has stirred up a bit of a debate. Rather than put in a complicated new PID allocator, it is asked, why not just make the maximum PID be very large? Then, in theory, the quadratic part of get_pid() will never run so the performance problems go away, and the code stays simpler. Linus prefers this approach, as do a number of other developers; he has put a simple patch along these lines into his pre-2.5.37 BitKeeper tree.
Ingo disagrees, pointing out that any reasonable maximum PID size can be exceeded eventually. He would rather fix the problem than try to hide it behind a large process ID space. In the absence of real-world examples that show people being bitten by get_pid()'s behavior in a larger PID space, though, Linus appears unlikely to accept any more complicated fix.
]
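A toy sketch of the trade-off described in the quote, in Python. It is not the kernel's get_pid() and not Ingo's patch; the function names and data structures are invented for illustration. The first allocator rescans the whole list of in-use PIDs for every candidate, which is where quadratic behavior comes from once the PID space is crowded. The second keeps a table of flags, so each candidate costs a single lookup:

```python
def next_pid_scan(in_use, last_pid, max_pid):
    # Toy slow path: every candidate PID is checked by walking the
    # whole list of in-use PIDs, so a crowded PID space means many
    # candidates times a long list.
    candidate = last_pid
    for _ in range(max_pid):
        candidate = candidate + 1 if candidate < max_pid else 1
        if all(pid != candidate for pid in in_use):
            in_use.append(candidate)
            return candidate
    raise RuntimeError("no free PID")


def next_pid_table(flags, last_pid, max_pid):
    # Toy fast path: flags[pid] is 1 when the PID is taken, so each
    # candidate costs one lookup regardless of how many PIDs are in use.
    candidate = last_pid
    for _ in range(max_pid):
        candidate = candidate + 1 if candidate < max_pid else 1
        if not flags[candidate]:
            flags[candidate] = 1
            return candidate
    raise RuntimeError("no free PID")


in_use = [1, 2, 3, 5]
flags = bytearray(9)          # indexes 0..8, PID 0 unused
for pid in in_use:
    flags[pid] = 1
print(next_pid_scan(in_use, 1, 8), next_pid_table(flags, 1, 8))
```

With a very large `max_pid`, as in the simpler approach Linus prefers, the first candidate is almost always free and the rescans rarely happen. The objection in the quote is that "almost always" is not "always".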
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
I remember that Linus made a remark that he thought that the O(1) scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!
--- Hindsight is 20/20, but walking backwards is not the answer.
The threads issue needs to be solved, and soon. We are using Java with Linux and get regular hangs. Conversations with IBM's Java support indicate that this is a problem with the Linux kernel, Java thread design, and underlying thread libraries on Linux. And no, we are not running thousands of threads, just two Java programs on a 2 CPU SMP machine. We eagerly await a fix.
If you can start that many threads per second then that is one more reason to just use processes instead of bothering with threads. But how much longer does it take process A to tell process B to set variable X to value Y than for thread A to just set B's X to Y?
So, in other words, when it comes to comparing threads, size does matter.
Just wanted to say yay for hungarians.
That's the sound of M:N threading whizzing past your head.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Hardware page size is 4KB, as was noted elsewhere. The key element that I haven't seen mentioned is that Windows' virtual memory system has several ways to 'allocate' memory. There's reserving pages, and there's committing pages. In the case where you tell the OS you want memory, it reserves pages. That is to say, it does not actually take memory from the free physical memory, but instead creates a contiguous address space large enough for your request, but allocates no hardware RAM at those addresses.
When you commit a page, for example by accessing (reading or writing) a page that is not yet allocated, the access trips a hardware fault if the VM hasn't mapped a page to the address; the fault handler then searches for a free page and links the two together.
The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k. Furthermore, there's no such advantage or realistic way for the operating system to align anything in memory physically, except in AGP ram. The VM system handles physical pages of memory exclusively, but does not manage AGP-allocated memory (IIRC). In other words, though the OS can align the address space to anything it likes, the OS layer cannot request any physical allocation mapping or alignment. So that comment about aligning memory for processes is quite unlikely.
Now, the XBox (which runs a variant of the Win2k kernel) has a bit more control over VM, but it also does not support demand paging, so it cannot swap to the hard disk and give you RAM+HD effective memory. Shame, that. But, as a result, you have an API that allows hardware level allocation control. Still, the OS doesn't take advantage of it, AFAIK. It's for developers.
Any connection between your reality and mine is purely coincidental.
The more I read about this, the more I feel going with the 2.4 kernel (or Linux at all, rather) is a mistake for a serious production server.
In other words, I've read tons of articles about all the fanciness being incorporated into the 2.5 kernel.
When is it expected to become stable? How long do I have to wait?
The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k.
Yes. Quite true. I had a problem a while back on Windows which took me a bit of reading through the documentation (and verifying with some low-level sys calls) to determine that what was happening was that I was running out of "reserve memory". Which is to say that, while I had plenty of physical memory left, all the address space had been used up. You can do this very easily by creating thousands of threads on your computer. To get a large number of these threads, you'll have to push the default stack size to its minimum, 64K. I was a bit dissatisfied with this minimum, but I suppose I'll live with it for now (or port to Linux) if I have to, or upgrade to a 64-bit OS if it becomes a practical limit in the future.
C//
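A back-of-the-envelope sketch of why address space runs out before physical memory does. The 2 GiB user address space and the 1 MiB default stack reservation are assumptions about a typical 32-bit Windows process, not figures from the post above. Only the 64K minimum comes from the post. The result is an upper bound, since code, heaps and libraries also consume address space:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def max_threads_by_address_space(user_address_space, stack_reserve):
    # Each thread reserves `stack_reserve` bytes of address space for
    # its stack whether or not the pages are ever touched, so the
    # address space divided by the reservation bounds the thread count.
    return user_address_space // stack_reserve


# Assumed: 2 GiB of user address space, 1 MiB default stack reservation.
print(max_threads_by_address_space(2 * GIB, 1 * MIB))   # 2048
# Same address space with the 64K minimum stack mentioned above.
print(max_threads_by_address_space(2 * GIB, 64 * KIB))  # 32768
```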
Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) align data in 64KB increments.
That's boneheaded. Linux supports page sizes up to at least 4MB, but it doesn't align everything on 4MB boundaries on the off chance that you might be using 4MB pages. It uses the appropriate alignments for the page sizes actually in use.
An OS that has dropped all support for non-Intel hardware citing a portability concern which doesn't exist in portable OSes? As they say in Snatch, "It's spurious, mate. Not genuine."
Sumner
rage, rage against the dying of the light
One of the nice things about Linux. You don't have to live with any of these 32-bit limitations if your application is big enough to justify 64-bitness. While Microsoft had NT running on Alpha, I understand it was essentially still a 32-bit OS - it was only truly ported to 64 bits when Itanium support was added. Linux, on the other hand, has had true 64-bit implementations running since '94 or '95, so you can be fairly confident that the niggling little 32-bit-isms have mostly been caught by now.
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
My thread-creation benchmark can create 100,000 threads in 10 seconds on my 800MHz Linux machine at work. No idea what kernel it's running, but I'm sure it's not recent. Furthermore, my 450MHz home machine running FreeBSD 4.5 runs the same benchmark in only five seconds. WTF?
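For anyone wanting to compare machines, a rough sketch of what such a thread-creation benchmark could look like in Python. This is not the benchmark referred to above, and interpreter overhead means its timings are not comparable to the figures quoted in this thread. The thread count is a parameter, so start small:

```python
import threading
import time


def time_thread_churn(count):
    # Start `count` threads that each do a trivial amount of work,
    # wait for all of them, and report how many ran and how long the
    # whole start/join cycle took.
    finished = []
    lock = threading.Lock()

    def worker():
        with lock:
            finished.append(1)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return len(finished), elapsed


ran, seconds = time_thread_churn(1000)
print(f"{ran} threads started and joined in {seconds:.3f}s")
```

Because each worker exits almost immediately, few threads are alive at the same moment. Measuring 100,000 simultaneously live threads would need the workers to block until all have started.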
> Finally, I do not know what the
> pthreads default stack size is
> (user-space? what is that?) but it is
> certainly larger than one page.
Why it needs to be larger than one page? The kernel will trap access to page faults due to stack overflow, and will allocate additional stack to it anyway.
Yes, I know you are right. Amongst other things, I won't be stuck with 64K per thread stack in Linux, and as you say, I could use 64 bit alpha linux. I'm looking forward to Hammer, actually.
C//
Why it needs to be larger than one page? The kernel will trap access to page faults due to stack overflow, and will allocate additional stack to it anyway.
It does not need to be bigger than one page, it just is. You are right, the stack is expanded via implicit mmap as it grows... but for performance reasons the default stack is usually measured in megabytes, not pages.
Anything but the simplest of applications would use a page rather quickly. User-space applications are programmed to assume they have any size stack they want. Local variables are huge.
In short, I was just commenting on the default. It can surely be lowered...
I don't understand what the issue is here.
I was able to run 1,600,000 simultaneous connections with a modified FreeBSD kernel, in June of 2001. Couldn't get much work done, but at about 300 baud per connection, after dividing up a gigabit ethernet link... you shouldn't expect to do much work.
Without modifications, after a patch to the credential reference counting (since committed to FreeBSD 4.5), as long as a stock kernel is tuned correctly, it can still *easily* handle 100,000 simultaneous connections (16K of window space for each connection = 1.6G of mbufs).
-- Terry
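The arithmetic behind those figures, as a small sketch. The function names are invented here. 16K is taken as 16 × 1024 bytes and the link as a raw 1,000,000,000 bit/s, both of which are assumptions about what the post meant:

```python
def window_memory_bytes(connections, window_bytes):
    # Buffer space needed if every connection holds a full window.
    return connections * window_bytes


def per_connection_bps(link_bps, connections):
    # Raw share of the link per connection, before protocol overhead.
    return link_bps / connections


print(window_memory_bytes(100_000, 16 * 1024))        # 1638400000
print(per_connection_bps(1_000_000_000, 1_600_000))   # 625.0
```

So 100,000 connections at 16K each come to roughly 1.6 GB, matching the mbuf figure above, and the raw per-connection share of a gigabit link across 1,600,000 connections is a few hundred bits per second, the same order of magnitude as the "about 300 baud" quoted.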
So? Use non-blocking I/O instead. Problem solved.
-- Terry
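A minimal sketch of what non-blocking I/O looks like in practice, using Python's `selectors` and a local socket pair standing in for a real network connection. The function name and the 1-second timeout are arbitrary choices for the example. A read with no data available returns an error immediately instead of stalling the thread, and the selector says when a read will succeed:

```python
import selectors
import socket


def echo_once_nonblocking(payload):
    # A connected socket pair stands in for a client/server connection.
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(b, selectors.EVENT_READ)
    try:
        # Nothing has been sent yet: a non-blocking recv raises at once
        # rather than putting the calling thread to sleep.
        try:
            b.recv(1024)
            would_block = False
        except BlockingIOError:
            would_block = True

        a.send(payload)

        # Ask the selector which sockets are readable, then read them.
        received = b""
        for key, _events in sel.select(timeout=1.0):
            received = key.fileobj.recv(1024)
        return would_block, received
    finally:
        sel.unregister(b)
        sel.close()
        a.close()
        b.close()


print(echo_once_nonblocking(b"ping"))
```

One thread running a loop like this can service many sockets, which is the alternative to one thread per connection being suggested.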
No, you will see a PID per thread, because that is how the scheduler knows to schedule things. The getpid() C library call from within the program. When they said it is a 1-to-1 mapping, that means that there is a process per thread. Just look when you see all those processes with the same name, and see if they have the exact same memory usage. If they do, it means they are using the same memory and are threads. No matter how you implement threads, there has to be more than one process; otherwise, when the program blocks for I/O, all threads would be blocked.
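One way to look at this from inside a program, sketched in Python (3.8 or later, on platforms where `threading.get_native_id()` is available). On a 1:1 implementation each thread has its own kernel-level ID, which is what the scheduler works with. Whether getpid() also differs per thread depends on the thread library. On a current Linux system it is expected to be the same for every thread in the program:

```python
import os
import threading


def collect_ids(thread_count):
    # The barrier keeps every thread alive until all have started, so
    # the kernel cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(thread_count)
    lock = threading.Lock()
    seen = []

    def worker():
        barrier.wait()
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pids = {pid for pid, _ in seen}
    thread_ids = {tid for _, tid in seen}
    return pids, thread_ids


pids, thread_ids = collect_ids(8)
print(len(pids), "process ID,", len(thread_ids), "kernel thread IDs")
```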
One day people will learn the folly of Winbloze, Linux Rules!
Aren't we all? (:
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
Hey, dork! We've never seen anyone use a redirect link to the goatse.cx site before. Wow, you must be, like, you know, like, rilly brite. Gosh, me wants be smirt lyke ewe.
Pain is merely failure leaving the body
> It does not need to be bigger than one
> page, it just is.
At least that isn't what is suggested by the documentation of linuxthreads (in Debian testing). In E.5 it says the following, implying that the default stack size is really just 1 page.
E.5: Does LinuxThreads implement pthread_attr_setstacksize() and pthread_attr_setstackaddr()?
These optional functions are provided in recent versions of LinuxThreads (0.8 and up). Earlier releases did not provide these optional components of the POSIX standard.
Even if pthread_attr_setstacksize() and pthread_attr_setstackaddr() are now provided, we still recommend that you do not use them unless you really have strong reasons for doing so. The default stack allocation strategy for LinuxThreads is nearly optimal: stacks start small (4k) and automatically grow on demand to a fairly large limit (2M). Moreover, there is no portable way to estimate the stack requirements of a thread, so setting the stack size yourself makes your program less reliable and non-portable.
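For comparison, a sketch of the same knob as exposed by Python's `threading.stack_size()`, which plays a role similar to pthread_attr_setstacksize() for threads created afterwards. The 512 KiB value and the function name are arbitrary choices for illustration, and some platforms may refuse the call with an error. The FAQ's caution still applies, since there is no portable way to know how small is safe:

```python
import threading


def run_with_stack_size(stack_bytes, thread_count):
    # stack_size() returns the previous setting (0 means the platform
    # default) and applies the new size to threads created from now on.
    previous = threading.stack_size(stack_bytes)
    try:
        results = []
        lock = threading.Lock()

        def worker(i):
            with lock:
                results.append(i)

        threads = [
            threading.Thread(target=worker, args=(i,))
            for i in range(thread_count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(results)
    finally:
        # Restore the earlier setting so later threads are unaffected.
        threading.stack_size(previous)


print(len(run_with_stack_size(512 * 1024, 50)), "threads ran with a 512 KiB stack")
```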
Except these things ARE NOT stupid. In no way.
...run Ada 83 programs.
But while their threads will be slow, they will be able to handle the text the users are entering; vastly more useful than the most optimized eight-bit character horror you would turn out.
Trolling is supposed to be:
1. Fast! Writing random mild insults almost a week after the original posting isn't as great as making a real-time flamewar immediately after posting.
2. Accessible to a potential reader. Referring to an obscure recurring theme of my rants made months away from this article (byte-value transparency of protocols vs. Unicode references in RFCs) would require a potential troll spectator a lot of googling before he will be able to appreciate your comment.
Contrary to the popular belief, there indeed is no God.