Running 100,000 Parallel Threads

Posix thread... by alexandre · 2002-09-21 17:26 · Score: 1

I frequently hear people bitching about the pthread lib and how f*cked up it is... is this going to change the way we use threads too? :-)

Re:Posix thread... by Wolfier · 2002-09-21 17:29 · Score: 5, Informative

Your answer:

http://www.cs.wustl.edu/~schmidt/ACE.html

This is so far the best library I have used for pthread programming. Powerful, easy to use, and encapsulates message passing really well...
Re:Posix thread... by Anonymous Coward · 2002-09-21 22:54 · Score: 1, Interesting

>Are these completely independent, and competing, projects? Can these two groups work together and complement each other?

It is exactly this library that Redhat is hoping to stop. IBM's is a good library, but it is heavy. It will also complicate threads by moving part of the scheduling into user space. That is not necessarily a bad thing, just bigger and more complicated. Redhat's should be thinner than the current pthreads lib. One thing that I have hated about pthreads is its use of signals for some control. Now, if I read this correctly, Redhat has moved all the control into the kernel, which means the kernel handles it all.
Re:Posix thread... by smittyoneeach · 2002-09-22 00:03 · Score: 2

So maybe there is a heavyweight library for some applications, and a lighter weight one for common use.
Probably you do the light one, and include it in the heavy when required.
Ah, the one-size-fits-all thought process...

--
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Re:Posix thread... by DrunkenPenguin · 2002-09-22 00:38 · Score: 2, Funny

..actually.

Your answer:

http://www.linux.ncsu.edu/lug/lectures/rpm-pres/mgp00033.html

This is so true to all of us ;)

Hold this thread while I walk away by DoctorHibbert · 2002-09-21 17:27 · Score: 3, Funny

The linux song

--
Arbitrary sig

100,000 Linux threads by Anonymous Coward · 2002-09-21 17:31 · Score: 5, Funny

this image springs to mind

Re:100,000 Linux threads by notanatheist · 2002-09-21 17:42 · Score: 2, Funny

are those M$ employees looking for code?
Re:100,000 Linux threads by Citizen+of+Earth · 2002-09-21 18:35 · Score: 2

this image springs to mind

Is that red splatter on the ground the remains of Bill Gates?

Win ME Kicks that sorry statistic!!!! by SlimFastForYou · 2002-09-21 17:32 · Score: 4, Funny

It takes two seconds to start 100,000 threads???? Piff! With my ME computer, It doesn't matter how many parallel threads I am running... I can stop them all instantly by simply attempting to use my computer :P.

Re:Win ME Kicks that sorry statistic!!!! by CoolVibe · 2002-09-21 18:12 · Score: 4, Funny

Pff... I can start a million threads on my FreeBSD box and stop them all in an instant...
...by hitting the reset button.
Re:Win ME Kicks that sorry statistic!!!! by grmoc · 2002-09-21 19:11 · Score: 1

I betcha your BSD box doesn't have enough memory/address space to start up 1 million threads... (since I'm guessing it's running on x86 hardware...)...

I'm only a humble C programmer, but.... by cdrobbins · 2002-09-21 17:34 · Score: 4, Interesting

And this is great news, and, indeed, impressive. But my question is, what (if any) change is this going to make to my daily use of linux (for gcc, reading slashdot, and that's about it...) Am I going to notice any performance differences?

Re:I'm only a humble C programmer, but.... by SlimFastForYou · 2002-09-21 17:39 · Score: 5, Funny

Just wait until Spyware For Linux(TM) comes out... With Bonzai Buddy For Linux(TM), Real Center For Linux(TM), XMMS Agent(TM), Linux Messenger(TM), Linux Update(TM), and FindFast for OpenOffice.org(TM). Then you will know why 100,000 parallel threads in two seconds is a good thing :P.
Re:I'm only a humble C programmer, but.... by cdrobbins · 2002-09-21 17:44 · Score: 1, Offtopic

My above comment was moderated as a troll, and yeah, maybe it sounded like that. But it's a serious question. I'd like to know what benefits us normal users will see.
Re:I'm only a humble C programmer, but.... by mattdm · 2002-09-21 17:58 · Score: 3, Insightful

Java likes to run many threads very cavalierly, so it's likely to help there somewhat.
Re:I'm only a humble C programmer, but.... by bm_luethke · 2002-09-21 18:11 · Score: 5, Informative

Probably none. On the other hand, in the field I work in (high performance computing) this will be a great help. Currently we are running a 500,000 processor simulation on a four node cluster; both startup and running are a pain. Remember, one of the great things about linux is some of the neat/useful applications being run on it (human genome, nuclear simulations, fluid simulations). Windows is a toy and geared toward "normal" users (read: very few threads, not processor intensive). Linux is more of a workhorse (many threads, computationally expensive, and high uptimes). While there are exceptions to this, look at advances such as this in that light. And finally, just because you won't use it compiling a kernel doesn't mean it's not needed.

--
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
Re:I'm only a humble C programmer, but.... by Citizen+of+Earth · 2002-09-21 18:31 · Score: 2

But my question is, what (if any) change is this going to make to my daily use of linux... Am I going to notice any performance differences?

My question is why does the multithreading in Mozilla suck so badly on Linux and will this help it?
Re:I'm only a humble C programmer, but.... by Jester99 · 2002-09-21 18:39 · Score: 1, Offtopic

And why wouldn't you want a mod system?

If you want to view what everybody posts, just set your default viewing threshold to -1. Simple as that. I found that scanning at +4 typically lets me get a good sense of things if I'm short on time. If I've got more time to spend, then I view at a lower threshold, like +2. If there's an interesting looking thread, then I'll view that whole thread.

However, I simply don't have the time to cut through all the noise to the signals on my own. Without the moderation system, I would just not be able to read comments manageably, at all. And that's just the truth.

The mod system does do a decent job of improving the S:N ratio, on balance.
Re:I'm only a humble C programmer, but.... by Subcarrier · 2002-09-21 18:55 · Score: 2

But my question is, what (if any) change is this going to make to my daily use of linux...?

Well, for one thing, you're now going to have to start typing a helluva lot faster. The machine is not going to slow you down. ;-)

In truth, this is great news for those running servers but you probably won't notice much of a difference on a desktop, barring a few really thread heavy applications. UML (User Mode Linux) is one notorious example.

--
"I have opinions of my own, strong opinions, but I don't always agree with them." -- George H. W. Bush
Re:I'm only a humble C programmer, but.... by bbtom · 2002-09-21 23:09 · Score: 1

Even worse than that, the infernal Comet Cursor...

--
catch (HumourFailureException e) { e.user.send("You, sir, are a humourless idiot."); }
Re:I'm only a humble C programmer, but.... by Chops · 2002-09-21 23:26 · Score: 2

The performance improvement won't mean much, but the POSIXization of the thread library might make a difference. Linux's thread support has up till now been pretty kludgy (signal handlers per-thread instead of per-process, wrong coredumps, etc.), and that made things like debugging threaded programs difficult; you may have run into this with gdb or whatever. Now that the Right Things have been coded in all over the map (kernel/libc/gcc/etc), we can drop the kludge and start doing it right.
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-22 01:30 · Score: 5, Informative

KDE actively discourages threads. Perhaps that will change now. Likewise servers, such as apache, will speed up.

I'm not so sure about that.

A threaded model doesn't necessarily offer advantages -- Apache's multiprocess model is really just as good on platforms without serious performance penalties on fork(), and Boa (which neither forks nor threads) is much, much faster than either Apache mode (though of course on SMP systems multiple instances must be run to use all the available CPUs).

Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application. Such an application has less overhead and is able to jump between its "subprocesses" only when needed -- and without the latencies involved by letting the OS handle said scheduling. Back in the Real World, I still write threaded code -- but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster.
Re:I'm only a humble C programmer, but.... by Mark+J+Tilford · 2002-09-22 01:36 · Score: 1

On a single CPU computer, yes. What about on an SMP computer?

--
-----------
100% pure freak
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-22 02:04 · Score: 1

On a SMP box, you'll generally get best performance if your application can run a standalone, unsynchronized, entirely separate process on each CPU.
Re:I'm only a humble C programmer, but.... by joib · 2002-09-22 03:07 · Score: 2

But then again, in the Real World (TM), different processes/threads often need to communicate with each other (e.g. scientific applications), or save memory by sharing stuff like script interpreters, db connections etc. (e.g. web servers).
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-22 04:03 · Score: 3, Insightful

Yes, it still (like everything) depends on your application.

That said, though, sharing (and putting locks around) your DB connections or script interpreters is an easy way to lose performance and introduce potential deadlocks (or other hard-to-track, hard-to-reproduce bugs due to bad shared state) as opposed to having each process able to operate completely independently from the others. Shared state is a Good Thing when it's genuinely needed -- but should be avoided when it's not.

I'm not saying -- and I've never tried to say -- that threading is worthless; I just object to people who take the position that making an application multithreaded will necessarily make it faster.
Re:I'm only a humble C programmer, but.... by ajs · 2002-09-22 05:19 · Score: 2

You are correct. To state that in more specific terms, threads are a "big hammer" that can be applied to the need to manage multiple resources at once. In my experience (on general purpose hardware) you can always optimize that resource management better in a single process than the kernel can by performing lightweight context switching.
Re:I'm only a humble C programmer, but.... by clubin · 2002-09-22 07:42 · Score: 1

"How about a system that scores posts based on the amount of replies they receive?"

So, you want "flamebait" to receive the most karma? It's the insightful comments that receive the fewest replies, as the replies would most likely just be "yeah, you're right" posts (save for the few that are further insightful, with thought inspired by the original). There is no sensible, reply-count-based system of rating.

However, given your taste in posts, maybe you should join kuro5hin, where one might say that posts that get people talking (and thus thinking, hopefully) are the most desired.
Re:I'm only a humble C programmer, but.... by Fjord · 2002-09-22 08:44 · Score: 2

I'd have agreed with you yesterday, but these improvements could change that. Something to consider. Kind of like the time (back in 98, I think) I realized my application ran faster in floating point than in fixed point, changes to the infrastructure can change the way you approach problems.

--
-no broken link
Re:I'm only a humble C programmer, but.... by diablovision · 2002-09-22 09:00 · Score: 2, Insightful

"Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application."

Two words: Blocking IO. You are correct that multithreading imposes an inevitable overhead on CPU intensive tasks running on a single processor machine, but most applications are not processor bound. The fact is that almost all applications that do anything besides scientific work have large portions of their execution times used by blocking on IO. Multiple threads allow the time spent waiting on IO in one thread to be spent doing something else "useful" in another thread--provided your OS supports native threads (if not, one thread can block an entire process).
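
The blocking-I/O point above can be illustrated with a minimal sketch. It assumes that `time.sleep` is an acceptable stand-in for a blocking read (it blocks only the calling thread), and the function names are made up for this example. It shows several waits overlapping when each one runs in its own thread; it is not a benchmark of any particular thread library.

```python
import threading
import time


def fake_blocking_io(delay):
    """Hypothetical stand-in for a blocking read: blocks only this thread."""
    time.sleep(delay)


def run_serial(n, delay):
    """Perform n blocking calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_blocking_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    """Perform n blocking calls in n threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_blocking_io, args=(delay,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("serial  :", run_serial(4, 0.05))
    print("threaded:", run_threaded(4, 0.05))
```

The serial run waits for each call in turn, while the threaded run spends roughly one delay in total because the waits overlap.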

"...but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster."

Isn't this almost a tautology? Restating: "In the areas where threads make things easier, it is easier to use threads than to not use threads."

--
120 characters isn't enough to explain it.
Re:I'm only a humble C programmer, but.... by pclminion · 2002-09-22 14:59 · Score: 2

KDE actively discourages threads. Perhaps that will change now.
I'm only guessing, but the reason KDE discourages threads is probably because it's a real bitch to write a truly thread-safe library, and they don't want to fuck with it.
In other words, it probably isn't because of performance.
If there are any core KDE developers reading, please correct me if I'm wrong.
Re:I'm only a humble C programmer, but.... by Jay+L · 2002-09-22 15:19 · Score: 2

Two words: Blocking IO.

Right. If you are going to use a single-threaded process, you must use non-blocking I/O.
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-22 20:55 · Score: 2

Two words: Blocking IO.

Three words: Non-blocking IO. The APIs exist (such as the POSIX AIO standard) -- they're just rarely used.

Isn't this almost a tautology? Restating: "In the areas where threads make things easier, it is easier to use threads than to not use threads."

"Useful" and "easier" aren't quite the same.

There are places where threads are useful -- because the application demands it -- and places where threads are easier -- because the programmer doesn't have the time, will, knowledge or need to define a nonthreaded solution even if such a solution would be more efficient.
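
As a rough illustration of the non-blocking style mentioned above, here is a minimal sketch using a socket put into non-blocking mode. This is ordinary `O_NONBLOCK`-style socket I/O, not the POSIX AIO interface (which Python's standard library does not expose), and `try_recv` and `demo` are names invented for the example.

```python
import select
import socket


def try_recv(sock, nbytes=4096):
    """Return whatever bytes are available, or None if the read would block."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        return None


def demo():
    reader, writer = socket.socketpair()
    reader.setblocking(False)

    first = try_recv(reader)       # nothing has been sent yet: returns None
    writer.sendall(b"hello")
    # Wait (with a timeout) until the kernel reports the socket as readable.
    select.select([reader], [], [], 1.0)
    second = try_recv(reader)      # data is now available

    reader.close()
    writer.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

The caller never sits inside `recv`; it either gets data or is told immediately that there is none, and can go do other work.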
Re:I'm only a humble C programmer, but.... by ajs · 2002-09-23 02:19 · Score: 2

changes to the infrastructure can change the way you approach problems.

Yes, certainly. However, thread management has more headaches than can be hidden by the kernel and libraries easily. At least some of that overhead evidences in your userspace program. In many cases when people use threads, they're really just not thinking about their application. That's fine if performance is not a strong concern, but when it is (as evidenced by recent work in the Web server arena), threads should just be a tool in your box, along with many other techniques.
Re:I'm only a humble C programmer, but.... by JohnsonJohnson · 2002-09-23 04:07 · Score: 1

There is one significant caveat to your assertion that a single threaded application will always outperform a multithreaded version. "a well-written single-threaded application that never makes a blocking system call (such as I/O) while there is useful work to be done will always be faster than an equivalent multithreaded application", emphasis mine. Since a large number of useful applications (DBs, mail clients, browsers, processor simulators, web servers etc.) make exactly this kind of blocking call, multithreading is very desirable for them.

To cover my own ass, I will add one more caveat. This performance advantage only shows up in throughput on a balanced workload. It is trivial to create "tests" which cripple a multithreaded application by finding a code path which causes all threads to block on a shared resource (i.e. force a DB to write simultaneously to tables that have little disk/memory locality), and on which a single threaded app composed to deal with that situation will show higher performance. Single threaded apps using that architecture generally fail in more balanced (you can argue whether that's real world or not) situations.
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-23 05:34 · Score: 2

a well-written single-threaded application that never makes a blocking system call (such as I/O) while there is useful work to be done will always be faster than an equivalent multithreaded application

Since when did I suggest the use of blocking I/O? Any single-threaded application of this sort needs to use non-blocking I/O as a matter of course. See POSIX AIO for one non-blocking I/O API.

Finally... you mention that error states are possible in threaded applications, and can be exercised by appropriately written code ("tests" which cripple a multithreaded application). Personally, having a number of almost impossible to find deadlocks hidden away in my system until some end-user chances upon one makes me very, very nervous.
Re:I'm only a humble C programmer, but.... by JohnsonJohnson · 2002-09-23 07:01 · Score: 1

Since when did I suggest the use of blocking I/O? Any single-threaded application of this sort needs to use non-blocking I/O as a matter of course. See POSIX AIO for one non-blocking I/O API.

Use of nonblocking I/O is in general almost as difficult as writing good multithreaded code. In effect the developer has to reimplement the logic of a multithreaded application in user space and wrap it around the nonblocking I/O calls. For applications such as a log file writer, of course, using nonblocking I/O without extra logic to check the results is fine. Consider the case of a plant controller though, where multiple sensors could be triggering events for multiple controllers asynchronously. In that case a multithreaded design may make for simpler code than writing (and optimizing) what is in effect a thread scheduler for a single threaded application. In short, unless one does not read the results of a previous write, there is no such thing as truly nonblocking I/O. You will always have to check a semaphore to determine whether it is safe to read (or have some subsystem below implement the logic).

Finally... you mention that error states are possible in threaded applications, and can be exercised by appropriately written code ("tests" which cripple a multithreaded application). Personally, having a number of almost impossible to find deadlocks hidden away in my system until some end-user chances upon one makes me very, very nervous.

Now you are taking liberties with my statements. I did not say multithreaded code necessarily contains an error state. I was thinking of the case of a TPC like test. One common architecture for a multithreaded application is to have a pool of threads available which can perform all tasks available to the system. If the "benchmark" causes all the available threads to be contending for a single resource (like a write to the same record) then the application may be unavailable for future requests thus reducing throughput. A single threaded application will generally batch requests, so the multiple writes will be condensed to a single write request and the throughput will be higher. Nothing prevents a multithreaded application from doing the same thing, but you are correct the overhead in communication required in a multithreaded application would probably prevent it from equalling a single threaded app's performance on such a test. Deadlocks are a challenge because the most popular current languages have poor support for writing multithreaded applications (compare the relatively unpopular Ada95 to C/C++ for example). Deadlocks are not a necessity any more than unstructured assembly code, but just as higher level languages encourage better structured code, newer languages encourage writing of reentrant, nonlocking code (Java for example strikes a reasonable middle ground between Ada and C as far as multiple threads go).
Re:I'm only a humble C programmer, but.... by tomstdenis · 2002-09-23 07:33 · Score: 1

Not as easy as you think. Most socket stuff is blocking to some degree and even still the "timed" stuff is platform dependent. At least with pthreads [which is nicely portable] you can work with trivial socket code in multiple threads.

Also things like web servers don't work nicely as single threaded applications. If you have ever written one [I have, http://tom.iahu.ca] it's very simple in a threaded model.

Tom

--
Someday, I'll have a real sig.
Re:I'm only a humble C programmer, but.... by Jay+L · 2002-09-23 08:05 · Score: 2

Not as easy as you think.

Well, I think it's not very easy, so it IS as easy as I think. It might not be as easy as someone else thinks, though. But it gets easier with practice and good libraries.

At AOL, nearly all of our servers were single-threaded, based on a standard kernel of state-managing, event-calling support functions. We had non-blocking sockets, non-blocking database I/O, non-blocking DNS, really everything with the possible exception of local disk I/O (which is rarely necessary on a production server, and which can be solved with a cluster of "worker" processes doing the actual I/O).

I worked on the mail system, so that's the part I know best. The difference in performance between sendmail (which forks multiple processes) and our own mail server (which ran single-threaded) was nothing short of astounding. Our days-late delivery problems disappeared almost instantly. This was on Suns and HPs; granted, we're now talking about forked processes rather than threads, and I don't know how the Sun and HP schedulers compare to Linux's, but the main point is that it's possible to write such a complex app single-threaded. I think the only significant thread-based app at AOL is AOLServer, which was developed independently.

Writing single-threaded servers certainly takes skill and experience. Given the state-management problem, I had assumed that it was also more error-prone than writing threaded code, but from what I am learning about thread-safety, that may not be the case. But, no matter how little overhead you have doing a context switch, you have even less without it. At some performance level, that matters.
Re:I'm only a humble C programmer, but.... by cduffy · 2002-09-23 08:14 · Score: 1

Use of nonblocking I/O is in general almost as difficult as writing good multithreaded code.

Absolutely. As I said, writing good singlethreaded code is often harder than writing multithreaded code for a similar function. However, it's also faster. My entire argument here is that threading is unwise as a performance-driven decision, not that threading is a poor idea on any other grounds (except for the deadlock-related potshots, which I'm willing to withdraw).

In that case a multithreaded design may make for simpler code than writing (and optimizing) what is in effect a thread scheduler for a single threaded application.

Absolutely -- I never intended to imply that a good single-threaded solution would be simplest of those options available. Note, however, that libraries implementing such a scheduler are already available, so one need not implement it ground-up. See Cheap Threads for one example.
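
The kind of user-space scheduler being discussed can be sketched very roughly with generators. This is only an illustration of cooperative round-robin scheduling in a single thread; it is not the API of Cheap Threads or of any other library, and `task` and `run_round_robin` are hypothetical names.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished: do not requeue it
        ready.append(current)      # task still has work: back of the queue


if __name__ == "__main__":
    log = []
    run_round_robin([task("a", 2, log), task("b", 3, log)])
    print(log)
```

Each task decides where it gives up the CPU, so there is no preemption and no locking, at the cost of having to structure every task around explicit yield points.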
Re:I'm only a humble C programmer, but.... by tomstdenis · 2002-09-23 09:08 · Score: 1

Yeah, note how you are writing code to work on one platform. Try writing a POSIX compatible web server with sockets, then watch as it doesn't work on half the platforms because "non-blocking" is something they don't support.

I still can't imagine how anyone could think it's a better solution. For example, consider a web server. When I submit a request with POST data I send a content-length. Now instead of just doing multiple recv()'s until you get it all you're going to have some global structure which will patch the data as select() determines it's available.

It's *possible* to not use threads and get the same job done; it's just not as conceptually simple. For example, in the threaded case you'd have one function to receive all of the POST data. In the non-threaded model you'd have two functions. One function which calls select() waiting for activity and another which delegates the handler for each active socket.
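
For comparison, the non-threaded structure described above might look roughly like the following sketch. It assumes each connection's Content-Length is already known, uses a deliberately tiny read size so that several select() rounds are needed, and keeps per-socket state in a dictionary; `receive_bodies` is a hypothetical helper, not code from any real server.

```python
import select
import socket


def receive_bodies(expected):
    """Collect request bodies in one thread.

    `expected` maps each connected socket to the number of body bytes
    (its Content-Length) that should be read from it.
    """
    buffers = {sock: b"" for sock in expected}
    pending = {sock for sock, length in expected.items() if length > 0}
    while pending:
        # Block until at least one pending socket has data to read.
        readable, _, _ = select.select(list(pending), [], [])
        for sock in readable:
            want = expected[sock] - len(buffers[sock])
            chunk = sock.recv(min(want, 4))  # tiny reads, for illustration
            if not chunk:                    # peer closed before finishing
                pending.discard(sock)
                continue
            buffers[sock] += chunk           # patch the data into its buffer
            if len(buffers[sock]) == expected[sock]:
                pending.discard(sock)        # this body is complete
    return buffers


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    body = b"name=alice"
    client_side.sendall(body)
    print(receive_bodies({server_side: len(body)})[server_side])
```

The per-connection buffer plays the role that the thread's stack plays in the threaded version, which is exactly the extra bookkeeping the comment is objecting to.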

On multi-threaded apps thread-swapping amounts to saving the registers and loading new ones. It's not entirely a difficult task [so to speak, hahaha punny]. Unless you have like a couple 1000 active threads you're not going to notice.

Tom

--
Someday, I'll have a real sig.
Re:I'm only a humble C programmer, but.... by pthisis · 2002-09-23 13:18 · Score: 2

Not as easy as you think

But far easier than multithreaded programming.

Threads are a way of saying "screw protected memory". They should be used only when you don't want memory protection within your application. Almost always, using threads is the wrong choice; multiple processes and/or a state machine with non-blocking I/O (depending on the problem) will accomplish the same ends as efficiently* and are much easier to implement**.

*Remember that processes (COEs which don't share memory) are nearly as fast as threads in Linux, and faster in some cases. On other OSes (Irix, Solaris, Windows), processes are inefficient and threads are implemented more efficiently. That's a horrible hack to make up for ridiculously heavyweight processes; it's true that a small number of things can be optimized in a thread implementation (setting up VM mappings), but the actual speed implications of that are negligible in real-life programs. I'd be extremely surprised if you could find even one program exhibiting a measurable speed difference in Linux attributable to the scheduling, creation, and destruction properties of threads.

**Threaded solutions often seem straightforward. The devil is in the details, though; locking, synchronization, and debugging issues tend to bite hard, and in the end I've never dealt with a problem where threading was a win over multiprocs and/or a state machine. The advantage of multiprocesses is not only in keeping memory protection; it also forces you to be explicit about what's shared and how that is communicated (and greatly simplifies debugging). Resulting designs tend to be much clearer and easier to make correct and maintain.

Sumner

--
rage, rage against the dying of the light
Re:I'm only a humble C programmer, but.... by tomstdenis · 2002-09-23 13:25 · Score: 1

I think you are writing as a person who has never had to use either. Threads can really be life savers when used correctly. Sure you have to implement locking but that's what pthread_mutex is for.
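
As a small illustration of the locking mentioned above, here is a sketch in which several threads increment a shared counter under a mutex. Python's `threading.Lock` stands in for `pthread_mutex_t` here, and `count_with_lock` is a name invented for the example.

```python
import threading


def count_with_lock(n_threads, increments):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:             # acquire; released on leaving the block
                counter += 1       # read-modify-write of shared state

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock(8, 10_000))
```

The lock makes each increment atomic with respect to the other threads, so no updates are lost regardless of how the threads are scheduled.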

On low-mem devices making full copies of the process to spawn copies is just insane.

And on windows the Thread implementation is *intentional* not accidental. The idea is that people using threads will take advantage of the speed increase.

Tom

--
Someday, I'll have a real sig.
Re:I'm only a humble C programmer, but.... by pthisis · 2002-09-23 13:52 · Score: 3, Insightful

I think you are writing as a person who has never had to use either.

I have written a dynamic content server that over the past 2 years has served over 6 billion requests, with 5 9's of uptime. I've written several realtime instrument control applications. I've written a distributed text mining application that does index-assisted regex searches of 1/2 terabyte of data in

Threads can really be life savers when used correctly. Sure you have to implement locking but that's what pthread_mutex is for.

On low-mem devices making full copies of the process to spawn copies is just insane.

1) Look up COW and memory sharing.
2) I never said "use only processes". A combination of processes and event loops is the way to go 99% of the time. There are some corner cases where threads are useful, but they tend to be abused by people who think "threads are good" without considering the alternatives nor the ramifications of that choice.
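
The memory-protection side of the process model can be seen in a minimal sketch, assuming a Unix system where `os.fork` is available. A forked child starts with a copy of the parent's memory (shared copy-on-write by the kernel until one side writes to it), so the child's change never reaches the parent. `fork_and_modify` is a hypothetical name.

```python
import os


def fork_and_modify():
    """Fork, let the child change a value, and report what the parent sees."""
    data = ["parent"]
    pid = os.fork()
    if pid == 0:
        # Child process: this write lands in the child's private copy.
        data[0] = "child"
        os._exit(0 if data[0] == "child" else 1)
    # Parent process: wait for the child, then look at our own copy.
    _, status = os.waitpid(pid, 0)
    return data[0], os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_modify())
```

Anything the two processes do want to share has to be passed explicitly (pipes, sockets, shared memory), which is the explicitness argued for earlier in the thread.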

And on windows the Thread implementation is *intentional* not accidental. The idea is that people using threads will take advantage of the speed increase.

It's not a speed increase. Thread switching and thread creation on Windows are slower than process creation and process switching on Linux. On a par, but slower. Process creation on Windows is laughably slow, though, and process switching is substantially slower than thread switching.

It's not that Windows figured out how to make their threads go fast, it's that their processes were dog-slow and they had to create an entirely separate execution primitive to get any sort of reasonable concurrency. Linux did things the right way by making them both fast, and now allows you to choose between the two for _design_ reasons (do I want to share memory?) rather than artificial implementation reasons.

You'll find a lot of knowledgeable people (Larry McVoy, former SGI kernel architect) who echo the same belief: use threads sparingly. Use as many threads as you have CPUs, and use processes instead if that makes more sense. Use more threads than that only if you're intimately familiar with the alternatives and know why they don't work, because while a state machine with non-blocking I/O may seem hard at first glance it'll almost certainly turn out to be easier to implement correctly, easier to debug, faster, and easier to maintain.

Sumner

--
rage, rage against the dying of the light
Re:I'm only a humble C programmer, but.... by Marijuana+al-Shehi · 2002-09-23 15:33 · Score: 1

What? Do you have any idea how much time (and how much RAM) it would take to launch 100,000 threads in Java? Do you even know how it is done? This may help:
Userspace threads : Java :: Kernel space threads : C on Linux.
Nice way to use cavalierly in a sentence though.

--
"I think all foreigners should stop interfering in the internal affairs of Iraq"
-- Paul Wolfowitz, 7/21/2003

If you want to destroy my boxen. . . by endeitzslash · 2002-09-21 17:35 · Score: 3, Funny

Launch 100,000 threads while I walk away. . .

OK I'll shut up now.

Re:If you want to destroy my boxen. . . by Wayfare · 2002-09-21 17:58 · Score: 1

*as* I walk away.

Sorry - had to.

Parallelism by inkfox · 2002-09-21 17:36 · Score: 5, Interesting

This is very cool; but does it scale to multiple CPU systems? More and more, SMP, split-bus and multi-core architectures are going to be taking over. If this holds up in those environments, Linux may actually have a leg up on some of the dedicated task heavyweights.

--
Says the RIAA: When you EQ, you're stealing bass!

Re:Parallelism by Anonymous Coward · 2002-09-21 18:53 · Score: 2, Interesting

I believe it said in the article/discussion that they were using a dual p4 for testing. That would imply that scaling isn't a problem

Many algorithms work great for one extra processor but fail miserably with more.
In most cases, you can just busy wait on a semaphore with two CPUs and never notice the hit. 8, 32 or 512 CPUs and you're going to throw away most of your processing time.
Re:Parallelism by _Knots · 2002-09-21 19:18 · Score: 2

I've been following LKML, not that I can contribute much, but still. Most of the scheduling work, if memory serves, is tested on large-way boxes (the number 32 leaps to mind).

You are encouraged to read the list for yourself because it's early in the morning and my brain might be playing tricks on me.

--Knots;

--
Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
Re:Parallelism by deKernel · 2002-09-22 01:50 · Score: 1

My question to this is, I didn't think that the P4 could do SMP?
Re:Parallelism by cheese_wallet · 2002-09-22 02:11 · Score: 2

My question to this is, I didn't think that the P4 could do SMP?

The p4 xeons can.
Re:Parallelism by NerveGas · 2002-09-25 14:01 · Score: 2

Yeah, I wish that SMP was taking over. We're not really that much farther in that regard than we were 5 or 6 years ago, with the Pentium Pro. In fact, in some regards, we're WORSE off: Since the PPro, all Intel chips had SMP capabilities, even the "non-SMP capable" celerons. However, now the norm is for their chips to NOT have SMP capability.

Yes, the Itanium and upcoming SledgeHammer are going to change things. But we've been hearing that for a decade. We'll see if things REALLY change or not.

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

IE/Mozilla by teasea · 2002-09-21 17:47 · Score: 1

Got a link for that?

Great news! by zensonic · 2002-09-21 17:48 · Score: 2, Funny

So now I'm able to open up 100.000 pr0n pictures in just 2 sec. Ubercool ;-)

--
Thomas S. Iversen

Re:Great news! by Bishop923 · 2002-09-21 18:54 · Score: 2

Only problem would be getting an HDD to transfer at 25.6 GB/s (assuming each pic was 500k)
Now THAT would be impressive. :-)
Re:Great news! by damiam · 2002-09-22 00:48 · Score: 1

No problem, just steal Google's RAM array.

--
It's hard to be religious when certain people are never incinerated by bolts of lightning.

Not 100,000 threads in parallel, just 50. by Tronster · 2002-09-21 17:51 · Score: 1, Informative

The title and description are misleading. From the comments further down in the article, Linus points out that only 50 threads at a time were running in parallel:

From: Linus Torvalds
Subject: Re: [ANNOUNCE] Native POSIX Thread Library 0.1
Date: Fri, 20 Sep 2002 06:01:47 +0000 (UTC)

Rik van Riel wrote:

>I agree, it's pretty silly. But still, I was curious how they
>managed to achieve it ;)

You didn't read the post carefully.

They started and waited for 100,000 threads.

They did not have them all running at the same time. I think the
original post said something like "up to 50 at a time".

Basically, the benchmark was how _fast_ thread creation is, not how many
you can run at the same time. 100k threads at once is crazy, but you can
do it now on 64-bit architectures if you really want to.

Linus

Re:Not 100,000 threads in parallel, just 50. by vvikram · 2002-09-21 18:01 · Score: 5, Informative

Yeah right. And modded to "Informative"? Slashdot moderators are the _pits_.

Read Ingo's reply to Linus. They _did_ start
one test serially and also one in _parallel_. In short, he says that it's possible.

vv
Re:Not 100,000 threads in parallel, just 50. by the_quark · 2002-09-21 18:01 · Score: 2

This could be huge for things like webservers, though, which spend a lot of their time kicking off new (logical) processes. As I understand it, on Linux, a big part of the reason Apache 2.0 hasn't taken off (aside from lack of availability of major packages) is that Apache 2.0's main win is in threading support. Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.

So, am I right in thinking this means threading (and hence Apache 2.0) will be a big win for Linux web servers, now?
Re:Not 100,000 threads in parallel, just 50. by mikec · 2002-09-21 18:03 · Score: 2

A later post pointed out that Linus was wrong. They actually did both tests: one test created and destroyed threads as fast as possible; the other created 100K threads first and then killed them all.
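For anyone trying to picture the two tests, here is a minimal sketch of their shapes. It is written in Python and is not the actual NPTL benchmark code, which isn't shown in this thread; the function names and the small thread counts are made up for illustration. The first function creates and joins threads one at a time, the second starts them all and keeps them alive together before letting them exit.

```python
import threading
import time


def serial_create_join(count):
    """Create one thread, wait for it to finish, then create the next."""
    started = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return time.perf_counter() - started


def concurrent_create_then_release(count):
    """Start every thread first, hold them all alive, then let them exit."""
    gate = threading.Event()
    threads = [threading.Thread(target=gate.wait) for _ in range(count)]
    started = time.perf_counter()
    for t in threads:
        t.start()
    # Every thread is blocked on the gate, so all of them exist at once here.
    peak = sum(t.is_alive() for t in threads)
    gate.set()  # release every waiting thread
    for t in threads:
        t.join()
    return time.perf_counter() - started, peak


if __name__ == "__main__":
    n = 200  # deliberately tiny compared with 100,000
    print("serial seconds:", serial_create_join(n))
    print("concurrent (seconds, peak):", concurrent_create_then_release(n))
```

The timings from an interpreted language say nothing about kernel thread-creation cost; the point is only the difference between "many threads over time" and "many threads at once".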
Re:Not 100,000 threads in parallel, just 50. by DoctorHibbert · 2002-09-21 18:03 · Score: 2, Insightful

True, however the feat is still quite impressive. By making the creation and destruction of threads cheaper, it frees developers from having to worry so much about the overall system impact when spawning threads.

For instance, because of the expense, many applications use thread pools, which are simply a bunch of idle threads that sit around doing nothing, waiting for work to do. These idle threads still take up system resources even though they're not actually using CPU. Not to mention the extra work the developers have to do to make the thread pools work for their applications.
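As a rough illustration of that extra work, here is a minimal hand-rolled pool in Python. The name `run_pool` and the sentinel-based shutdown are choices made for this sketch, not anything from NPTL or a particular application.

```python
import queue
import threading


def run_pool(jobs, workers=4):
    """Run callables from `jobs` on a fixed set of reusable worker threads."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()      # idle workers block here, using no CPU
            if job is None:       # sentinel: time to shut down
                return
            value = job()
            with lock:
                results.append(value)

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for t in pool:
        t.start()
    for job in jobs:
        work.put(job)
    for _ in pool:
        work.put(None)            # one sentinel per worker, queued after the jobs
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_pool([lambda i=i: i * i for i in range(10)], workers=3)))
```

Even this toy version needs a queue, a lock and a shutdown protocol, which is the bookkeeping cheap thread creation would let some programs skip.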

--
Arbitrary sig
Re:Not 100,000 threads in parallel, just 50. by grytpype · 2002-09-21 18:04 · Score: 2

I'm afraid YOU didn't read the article very carefully. Ingo replied as follows to Linus's post:
actually, that was Ulrich's other test, which tests the serial starting of 100,000 threads. the test i did started up 100,000 concurrent threads which shot up the load-average to a couple of thousands. [the default timeslice the parent has is enough to start more than 50,000 parallel threads a pop or so.]

--
- Have a picture
Re:Not 100,000 threads in parallel, just 50. by kinnunen · 2002-09-21 18:05 · Score: 5, Informative

Read Ingo's posts too:
actually, that was Ulrich's other test, which tests the serial starting of 100,000 threads. the test i did started up 100,000 concurrent threads which shot up the load-average to a couple of thousands. [the default timeslice the parent has is enough to start more than 50,000 parallel threads a pop or so.]
And another one:
Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
Re:Not 100,000 threads in parallel, just 50. by ergo98 · 2002-09-21 18:11 · Score: 3, Insightful

Under Linux, thread creation hasn't been much faster than process creation, because process creation was so dang fast.

That's called "making lemonade out of lemons". Clearly this test has shown that thread creation in Linux was horribly broken, not the flip side that process creation was so wonderfully good.
Re:Not 100,000 threads in parallel, just 50. by the_quark · 2002-09-21 18:39 · Score: 5, Informative

No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed (the main advantage in Linux from threads was the shared memory space if your application wanted that sort of thing). That's why Apache 2.0 is bringing NT performance more in line with Linux performance under Apache 1.3: NT's threading speed is a lot closer to Linux's forking speed. Again, I'd like to underscore I'm not an expert on this, and it's possible I'm mistaken about relative benchmarks (is NT w/Apache 2.0 a little faster than Linux w/Apache 1.3? Could be...) but I'm very confident of the basic underlying point, that Linux process creation is essentially comparable to other OSs' thread creation, perhaps even faster.

See, for example, http://www.linux.cu/pipermail/linux-prog/2001-February/000027.html, just one of the first Google links that popped up when I went looking for proof that I'm not on crack: "Linux newcomers often are unaware of the substantial differences between Linux and other operating systems. To implement concurrency, they use multithreading exclusively, mistakenly assuming as high an overhead associated with Linux multiprocessing as on other platforms." In fact, knowing how fast Linux's process creation is relative to other systems' thread creation makes this even more impressive in my mind. This isn't just a bug fix; much like with process creation before it, Linux is doing something fundamentally better than its counterparts.

Don't forget: Just because this is /. doesn't mean I'm just a Windows-hating troll. I try to make sure all my Windows-hating-troll-posts are at least backed up by facts. ;)
Re:Not 100,000 threads in parallel, just 50. by brianpane · 2002-09-21 19:19 · Score: 4, Informative
Apache 2.0 doesn't actually do thread creation very frequently. The thread creation cost occurs mostly at startup. So the limiting factors for threaded Apache performance on Linux are mainly:
- The speed with which the kernel can schedule and context-switch among threads
  For some recent data on this, see http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=103228014211983. The O(1) scheduler patch for 2.4 seems to help here.
- Memory usage per thread
- Concurrency limitations of the Apache code itself
  This has been improving gradually with successive 2.0 releases, as the remaining global locks are removed or optimized.
- General robustness of the thread implementation
  The current (2.4) Linux threading implementation doesn't work well with debuggers.
At first glance, it looks like the NPTL could be a win for threaded Apache on Linux, as it offers some solutions for the first and last of these issues.
Re:Not 100,000 threads in parallel, just 50. by kinnunen · 2002-09-21 19:26 · Score: 1

This is just the latest example of why slashdot needs a -1 bullshit/misinformation moderation option. The post isn't a troll, it's not really a flame[bait] either and it certainly isn't off-topic. That leaves -1 overrated. The -1 is fine, but it doesn't change the moderation reason shown with the comment. So if you moderate a +5 informative post with -1 overrated, the next idiot moderator will just see the +4 informative and think "hey this is a good post, it really should be +5 informative".
Re:Not 100,000 threads in parallel, just 50. by Karellen · 2002-09-21 22:32 · Score: 5, Informative

It's not process/thread _creation_ times that make the difference, it's the process/thread _context_switch_ times that really mount up, which is where Linux shines.

And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

K.

--
Why doesn't the gene pool have a life guard?
Re:Not 100,000 threads in parallel, just 50. by Wdomburg · 2002-09-22 01:53 · Score: 2

> The title and description are misleading. From the
> comments further down in the article, Linus
> points out that only 50 threads at a time were
> running in parallel:

And the next comment down is from Ingo:

actually, that was Ulrich's other test, which
tests the serial starting of 100,000 threads.

the test i did started up 100,000 concurrent
threads which shot up the load-average to a
couple of thousands. [the default timeslice the
parent has is enough to start more than 50,000
parallel threads a pop or so.]

So, yes, they did manage 100,000 threads running in parallel.

Matt
Re:Not 100,000 threads in parallel, just 50. by Sivar · 2002-09-22 05:04 · Score: 2

Linus was... Wrong?!

Whoa, that's going to completely shatter the world view of many Slashdotters.

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
Re:Not 100,000 threads in parallel, just 50. by sagei · 2002-09-22 06:53 · Score: 2

No, seriously. Process creation under Linux was time-similar to thread creation on other OSs. That's because Linux was as fast at creating *a process* as other OSs are at creating *a thread*. IIRC, threading was initially implemented in Linux from the process-creation methods, so it was similar in speed

It was and still is implemented by the process creation methods. Threads were (and still are) the same as processes in Linux (to the kernel, anyhow). All process creation is done by do_fork(), which accepts clone() flags that specify what to share between the parent and the child. "Threads" (as opposed to normal processes) just happen to share a few things: address space, signal handlers, open files, etc.

But yah, process creation in Linux is sick. Hold your head high.
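The sharing difference is easy to see from user space. The sketch below does not call clone() directly; it uses Python's `threading` and `os.fork()` as stand-ins for "share the address space" and "don't share it", and the `counter` dict and function names are invented for the example.

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


def bump_in_thread():
    """A thread shares the parent's address space, so the parent sees the write."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def bump_in_child_process():
    """A forked child works on its own copy; the parent's value is untouched."""
    before = counter["value"]
    pid = os.fork()
    if pid == 0:
        bump()          # modifies only the child's copy of the data
        os._exit(0)     # exit the child without running the parent's cleanup
    os.waitpid(pid, 0)
    return before, counter["value"]


if __name__ == "__main__":
    print("after thread:", bump_in_thread())
    print("around fork (before, after):", bump_in_child_process())
```

Run on a Unix-like system, the thread's increment is visible to the caller while the forked child's increment is not.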

--

Robert Love
Re:Not 100,000 threads in parallel, just 50. by AJWM · 2002-09-22 07:05 · Score: 2

You're quite right, and in fact this predates Linux and NT -- Unix was always good at process creation, whereas VMS process startup was very heavy on overhead.

It's not surprising that Linux (modelled on Unix) and NT (originally modelled on VMS) show similar characteristics. It's the reason that many Unix applications tend to be written as a bunch of cooperating processes, whereas NT apps are monolithic monsters with lots of threads.

Unfortunately, thanks to a generation of CS students having learned bad habits on Windows, we're starting to see a lot of Linux apps written as monolithic monsters. (Of course there are a few old Unix apps out there like that too, perhaps some old mainframe mentality leaking through.) There are advantages to cooperating/communicating processes vs the monolithic multithreaded approach: it's easier to test the components separately, it's easier to reuse the components to make different systems, and a bug in one place won't necessarily clobber the whole thing.

--
-- Alastair
Re:Not 100,000 threads in parallel, just 50. by gregorio · 2002-09-22 09:12 · Score: 1

And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

And why do we need a low-latency patch just to listen to an mp3 and move windows without having to listen to a lot of noise (when you move a window)?
What you just said is just plain BS, hoping for karma. If Linux had good context switching performance, we wouldn't need a low latency patch.
Re:Not 100,000 threads in parallel, just 50. by himi · 2002-09-22 14:14 · Score: 4, Interesting

The latency issues that cause mp3 skipping under heavy load in Linux have nothing at all to do with context switching, and everything to do with /scheduling/ latency: how long it takes for a process that has work to do to actually get control of the cpu. Context switching has /nothing/ to do with that.

The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.

Context switching under Linux /is/ extremely fast - it's actually been measured (a lot), and it's something the kernel developers pay a lot of attention to and optimise very carefully. They literally count cpu cycles in these code paths. Context switching time is a serious performance limiter in many areas, so getting it right is important, and it's something that Linux does /very/ well.

Go do some real research before you accuse someone who's right of karma whoring bullshit.

himi

--

My very own DeCSS mirror.
Re:Not 100,000 threads in parallel, just 50. by napir · 2002-09-23 07:09 · Score: 1

The actual time spent in the kernel during a context switch seems like it would be only a small part of the penalty you'd pay during a (process-changing) context switch. Switching from one process to another means moving to a different address space, which means that most of the stuff in your caches is going to be trash. The number of cache misses you're going to suffer when the process first starts running again seems like it would be the largest performance problem. This is the main argument for using threads instead of separate processes, not how long they take to create.
Re:Not 100,000 threads in parallel, just 50. by himi · 2002-09-23 13:29 · Score: 2

Yes, that's a performance issue, but it's not a /latency/ issue - the new process is running, and from there on in the latencies are only a few hundred cycles rather than measurable in microseconds. Until the next time the process enters the kernel, or page faults, or whatever. As far as latency goes, context switching is of minimal importance unless you're worried by latencies on the order of less than a microsecond (depending on hardware and the like, of course).

The argument that threads trash cache less than full processes seems fairly bogus to me - the cache trashing will be much more dependent on the size of the working sets of all the running processes, and there's nothing to say that a thread will have a smaller working set than a process. The text segment will be shared, yes, but it's the same with multiple instances of /any/ process, because the program text will be mmapped read only, allowing the memory to be shared, and thus kept in cache. The TLB flush needed would be an added cost, but unless the cache really is being trashed completely by your program it'll be reloaded straight from cache, and that shouldn't be more than a few hundred cycles (I think - don't quote me on that).

In any case, the real performance comparison isn't between multiple processes versus multiple threads, it's between a multithreaded implementation and a single-threaded one. In /that/ comparison threads come last, simply because they /have/ those kinds of cache interactions and so forth, where a single-threaded version won't. They also have overhead due to locking, greater debugging difficulties, and other added complexities. On the other hand, though, you can't make use of more than one processor without having multiple processes, whether they're threads or full processes . . .

I think the biggest thing making threads attractive to people is the fact that a threaded approach will often make things simpler to think about in the design stage. You can make all the independent threads of control in your design /real/ threads of control in the implementation. That comes at a cost, though . . .

Personally, I like the quote from Alan Cox that I've seen in a few people's .sig: "Threads are for people who can't program state machines". It's more complex than that, but it does seem to capture a lot of what motivates threaded designs.

himi

--

My very own DeCSS mirror.
Re:Not 100,000 threads in parallel, just 50. by pthisis · 2002-09-23 13:40 · Score: 2

And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.

Last time I benchmarked, which was a long time ago (NT 3.51 days), Linux process switch times were 5x faster than NT thread-switch times on the same hardware. Linux thread-switch times were on a par with process-switch times, NT thread-switch times were about 20x faster than NT process-switch times.

I'd expect all those numbers to have changed, though.

Sumner

--
rage, rage against the dying of the light
Re:Not 100,000 threads in parallel, just 50. by Karellen · 2002-09-24 07:32 · Score: 2

_Need_ the low latency patch? We don't.

karellen $ uname -a
Linux foo 2.4.17 #1 Sat Jul 13 12:21:18 GMT 2002 i686 unknown
karellen $ cat /proc/cpuinfo | grep -E "model|cpu"
cpu family : 6
model : 3
model name : AMD Duron(tm) Processor
cpu MHz : 757.485
cpuid level : 1
karellen $ cat /proc/meminfo | grep MemTotal
MemTotal: 126732 kB
karellen $

So, I'm running 2.4.17 on an AMD 750 with 128MB of RAM. You'll have to take my word that that's a stock 2.4.17, with no patches, but I'm playing a list of .ogg files with xmms, while ripping and ogging a CD in the background, with Mozilla running, and grabbing a mozilla window and moving it around the desktop (with opaque window moving switched on) really quickly for 20 seconds results in - no skipping.

Yeah, reducing latency will be nice, but as far as I can tell, it's not actually needed for anything to do with the `user experience' at the moment.

Don't know what you've got running in the background, but it must be pretty hefty.

K.

--
Why doesn't the gene pool have a life guard?

Best Quote: by Fallen+Kell · 2002-09-21 18:02 · Score: 1

Why so many threads? "Because we can :)"

--
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"

Sounds cool, but all I could think of... by Geek+Tragedy · 2002-09-21 18:04 · Score: 5, Funny

"Hello, my name is Ingo Molnar. You killed -9 my process: prepare to die."

Sorry, had to :P

Re:Sounds cool, but all I could think of... by unsinged+int · 2002-09-21 18:46 · Score: 5, Funny

I think it's more commonly this:

"Hello, my name is Ingo Molnar. You kill -9 my parent process. Prepare to vi."
Re:Sounds cool, but all I could think of... by Sycle · 2002-09-22 01:02 · Score: 1

Ingo Molnar is credited in the summary (try the top of the page) as being the major contributor to the project.

It's nice you got half of the reference, anyway.
Re:Sounds cool, but all I could think of... by rweir · 2002-09-22 15:00 · Score: 1

Talking of vi and patches...

No one's as manly as Al Viro

NOOO!!!!! by Monkelectric · 2002-09-21 18:07 · Score: 3, Funny

At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while(1) do { fork(); } ] in an attempt to extend deadlines or simply be assholes :)

--

Religion is a gateway psychosis. -- Dave Foley

Re:NOOO!!!!! by powerlord · 2002-09-21 18:18 · Score: 2

Hehehe I had a classmate do that accidentally.

One of our final projects was to implement our own shell. This would of course necessitate a fork() command... he hadn't checked conditions quite right and managed to use up all the resources for his account. Fortunately someone had set the Ultrix (Unix on VAX) system up with a little intelligence. He only bombed his own account and had to get the Prof. to go in and kill the out of control Shell :)

I on the other hand merely got half-baked tokenizing. Great teacher (pity they disbanded the Comp-Sci department around us).

--
This space for rent. All reasonable inquiries will be entertained at proprietors discretion.
Re:NOOO!!!!! by ez76 · 2002-09-21 18:50 · Score: 5, Funny

I am replying pre-emptively to dissuade the AC's who would otherwise reply to you and point out that your post should not have been modded funny because this innovation would not prevent fork() bombing because it involves spawning threads and not processes.

I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of run-on sentences.

I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of +1 bonus.

I am further replying pre-emptively to dissuade the AC's who would mod this post down as off-topic because they do not get the parallel allusion to fork-bombing.
Re:NOOO!!!!! by AJWM · 2002-09-21 18:58 · Score: 3, Funny

I did something like that back in my school days on a dual-CPU Burroughs B6700, but with a twist: Each process forked itself twice, then waited. When it received a signal about a child process being killed, it spawned two more. I had a sleep of a few seconds or so in there so it didn't grow too fast.

The fun part of that was when the system operators saw the processes replicating like crazy and started to kill them, that made it worse.

Another fun trick with that machine was to set up a circularly-linked list and invoke the LLLU (linked list lookup) instruction on it...

(Yeah, stupid things to do. At least I only did them during relatively quiet times.)

--
-- Alastair
Re:NOOO!!!!! by stephanruby · 2002-09-21 18:59 · Score: 1

At school (before I graduated so long ago) we would "fork bomb" the compute servers [ while(1) do { fork(); } ] in an attempt to extend deadlines or simply be assholes :)
That explains why some professors wouldn't give us any extensions when the labs' servers were down.
Re:NOOO!!!!! by inio · 2002-09-21 20:24 · Score: 5, Funny

Dude, you seriously need to look into writing patents.
Re:NOOO!!!!! by Agent+Orange · 2002-09-21 20:50 · Score: 1

Heh, I remember defining an infinitely-recursive function that kept spawning subshells in shell once. Only _then_ did I discover that the main server didn't have any limits...

15,000 zombie processes later....*CRASH*

oops :)
Re:NOOO!!!!! by inode_buddha · 2002-09-21 22:46 · Score: 1

Fond memories of the Burroughs B6900 at my local college here... 25 years later it was replaced by a Gateway2000 with dual Xeons...

--
C|N>K
Re:NOOO!!!!! by defile · 2002-09-22 06:29 · Score: 2

For any admins subjected to such clever users, the correct way to handle a forkbomb is to first send STOP to all processes, which will prevent them from replacing the siblings which you kill, then you break out the nine.
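The stop-first, kill-second order can be sketched as below. This is a toy under stated assumptions: the "runaway" processes are harmless sleeping children created by a made-up `spawn_sleepers` helper, and the waitpid() bookkeeping only works because they are our own children. An admin cleaning up someone else's processes would just send the two signals.

```python
import os
import signal
import time


def spawn_sleepers(count):
    """Hypothetical stand-ins for runaway processes: children that just sleep."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            time.sleep(60)
            os._exit(0)
        pids.append(pid)
    return pids


def stop_then_kill(pids):
    """Freeze every process first, then kill them; report what ended each one."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)   # a stopped process cannot fork replacements
    for pid in pids:
        os.kill(pid, signal.SIGKILL)   # "the nine"
    ended_by = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # reap the child so no zombie is left
        ended_by[pid] = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return ended_by


if __name__ == "__main__":
    print(stop_then_kill(spawn_sleepers(3)))
```

Killing without stopping first gives the survivors a window to respawn what was just killed, which is the race the two-pass order avoids.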
Re:NOOO!!!!! by Error27 · 2002-09-23 13:58 · Score: 1

at the kernel level threads are just processes with shared memory.

Windows by jeffbru · 2002-09-21 18:08 · Score: 3, Interesting

Just out of curiousity, how does the benchmark in windows compare?

--
- Jeff Brubaker

Re:Windows by CoolVibe · 2002-09-21 18:22 · Score: 2, Troll

Oh, I'll bet Microsoft could rig a system without the graphics, network, most driver subsystems and the GUI stuff to skimp on overhead and winge their way to a higher number of parallel threads in less time.
Or they could just blatantly pay some other company that does "independent testing" *cough*mindcraft*cough* to lie about it :)
Re:Windows by Courageous · 2002-09-21 19:01 · Score: 2, Insightful

It is *impossible* to even allocate more than about 31,000 threads under windows on 32 bit machines. You simply CAN'T do it. The minimum thread stack size is 1 64KB page. You can only address 2GB of memory on a 32 bit windows OS. Do the math.
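Doing the math with the figures given above (2GB of user address space, 64KB minimum stack per thread) looks like this. The function name is made up, and the result is an upper bound that ignores everything else living in the address space, which is presumably why the practical figure quoted is a bit lower.

```python
KIB = 1024
GIB = 1024 ** 3


def max_threads(address_space_bytes, stack_reserve_bytes):
    """Upper bound on threads if stacks were the only thing in the address space."""
    return address_space_bytes // stack_reserve_bytes


print(max_threads(2 * GIB, 64 * KIB))   # 32768
print(max_threads(3 * GIB, 64 * KIB))   # 49152, if 3GB were addressable instead
```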

C//
Re:Windows by akintayo · 2002-09-22 03:42 · Score: 1

maybe i am confused, but i thought windows had a switch that would allow the user application to address 3G of memory

--
Woe be on to them, all who rise against poor people, shall perish in a the end. Buju Banton
Re:Windows by bmajik · 2002-09-22 04:05 · Score: 2

Two nitpicks:
you can address 3GB with the /3GB switch :)
you can address significantly more with AWE/PAE, but i dont know that you can use that additional memory for thread stacks.

Just FYI, Yesterday i had SQL server 2k running with 1914 threads ( in AWE mode)

--
My opinions are my own, and do not necessarily represent those of my employer.
Re:Windows by Courageous · 2002-09-22 04:39 · Score: 2

Well, that may be the case. I was making more of a reference to the reserve memory lower limit on a thread's stack size.

C//

Real World Example by robinjo · 2002-09-21 18:11 · Score: 2

I'm building a project where there will be one huge database with up to 200 different companies connected to it pretty much nonstop. 1-10 users from every company depending on the time of the year. 2 threads for every connection.

200*10*2=4000 threads.

Re:Real World Example by khuber · 2002-09-21 20:07 · Score: 2

Why would you use 2 threads per connection instead of the more common select() + worker thread pool? That doesn't seem like a scalable design.
-Kevin
Re:Real World Example by flux · 2002-09-21 20:46 · Score: 2, Interesting

What is the fundamental reason select/poll should be that much faster anyway? Well, you win on context-switch times, if you can handle many clients in a tick. But on the other hand it does affect the way you need to design the code, and doing some stuff that never stalls without threads might be tricky.

Just imagine a situation where a thread might need to calculate something, or initialize a big array. Now, if it's run under a select-loop, you need to do that in parts to avoid starving the server. With threads, you just do the trick and don't care about the rest of the world, which keeps serving the clients no matter how long you stay in the function.
Re:Real World Example by khuber · 2002-09-21 23:39 · Score: 2, Informative

I was talking about a hybrid design, not pure select. Of course you are right about the limitations of pure select. Thread per client starts to bog as the number of simultaneous clients increases.
It's not practical to serve hundreds/thousands of clients with a thread per client model. A typical machine can't handle the load well because it has limited resources. It will thrash. By having a thread pool you place a limit (throttle if you will) on resource utilization. Most high performance, highly scalable web and app servers use this model or a variant.
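A minimal sketch of the hybrid, assuming Python's `selectors` and `ThreadPoolExecutor` in place of raw select() and a hand-built pool: one thread watches every connection, and only ready requests are handed to a bounded set of workers. The uppercase-echo "protocol", the function names and the in-process socket pairs standing in for clients are all invented for the example.

```python
import selectors
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(conn, data):
    """Worker-side job: the potentially slow part runs off the select loop."""
    conn.sendall(data.upper())


def serve(conns, pool_size=4):
    """Watch all connections from one thread; hand ready requests to a bounded pool."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        sel.register(conn, selectors.EVENT_READ)
    open_conns = len(conns)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        while open_conns:
            for key, _ in sel.select():
                conn = key.fileobj
                data = conn.recv(4096)
                if data:
                    pool.submit(handle, conn, data)
                else:                       # peer closed its end
                    sel.unregister(conn)
                    open_conns -= 1
    sel.close()


def demo(clients=3):
    """Drive the server with socket pairs standing in for real clients."""
    pairs = [socket.socketpair() for _ in range(clients)]
    server = threading.Thread(target=serve, args=([s for s, _ in pairs],))
    server.start()
    replies = []
    for _, client in pairs:
        client.sendall(b"ping")
        replies.append(client.recv(4096))
        client.close()
    server.join()
    for s, _ in pairs:
        s.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

The pool size is the throttle: however many clients connect, at most `pool_size` requests are being worked on at once.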
There is another architecture based on event driven state machines aka SPED (single process event driven) that is high performance and single process/single thread in its pure form. The Zeus web server does this.
-Kevin
Re:Real World Example by saurik · 2002-09-22 06:23 · Score: 1

Another model is SEDA ("Staged Event-Driven Architecture"). This was mainly examined by the guy who wrote what became Java's new java.nio package.

Here's a link: http://www.cs.berkeley.edu/~mdw/proj/sandstorm/

If I remember correctly, SEDA uses a few thread pools to handle clients at different stages of their work.

From his website:
"We have built a number of applications to demonstrate the SEDA framework. Haboob is a a high-performance Web server including support for both static and dynamic pages that outperforms both Apache and Flash (which are implemented in C) on a SPECWeb99-like benchmark. Other applications include a Gnutella packet router and Arashi, a Web-based email service similar to Yahoo! Mail."

boxen. . . by Catskul · 2002-09-21 18:20 · Score: 2, Troll

Could you please refrain from using "boxen". It makes my head hurt

--

Im not here now... Im out KILLING pepperoni

Re: boxen. . . by Naikrovek · 2002-09-21 20:21 · Score: 1

agreed, "boxen" is not a word. "Boxes" is a word.

Non-words used as words is like, so 1999, man.
Re: boxen. . . by BlacKat · 2002-09-21 22:33 · Score: 1

Maybe you need to go visit dictionary.com?

http://www.dictionary.com/cgi-bin/dict.pl?term=boxen&db=*

Which has this entry from the jargon file:

boxen /bok'sn/ (By analogy with VAXen) A fanciful plural of box often encountered in the phrase "Unix boxen", used to describe commodity Unix hardware. The connotation is that any two Unix boxen are interchangeable.
Re: boxen. . . by spongman · 2002-09-22 02:05 · Score: 2

Non-words used as words is like, so 1999, man.
And so is using time as an adjective.
Re: boxen. . . by BlacKat · 2002-09-22 08:28 · Score: 1

How the hell was my post "overrated"?

It was directly to the point at hand, that the word "boxen" is a valid "word" in use by many people.

Yeesh.
Re: boxen. . . by bsartist · 2002-09-22 15:59 · Score: 1

Could you please refrain from using "Im" instead of "I'm?" It makes my head hurt.

--
Lost: Sig, white with black letters. No collar. Reward if found!
Re: boxen. . . by heffrey · 2002-09-22 21:03 · Score: 1

What I want to know is why use boxen rather than boxes?
Re: boxen. . . by BlacKat · 2002-09-23 08:23 · Score: 1

Got me, tho "boxen" just sounds better to me than "boxes". ;)

It's also acceptable according to the Jargon File so hey... Boxen it is!
Re: boxen. . . by pthisis · 2002-09-23 13:28 · Score: 2

What I want to know is why use boxen rather than boxes?
"boxes" refers to the physical objects (ie the cases and contents thereof)

"boxen" refers to the notional servers.

My Linux boxen could be retasked to be FreeBSD boxen, but they'd still be the same boxes.

AFAIK, "boxen" was derived from "VAXen". And it was never "VAXes", that would brand you as computer-illiterate as quickly as saying "What's the http for that?" or using "PC" to mean "Windows box".

Sumner

--
rage, rage against the dying of the light

whoa! by RestiffBard · 2002-09-21 18:27 · Score: 4, Funny

I have no idea what the hell you're talking about but it certainly sounds impressive. :)

--
- /* dead coders leave no comments */

Great by C0D3X · 2002-09-21 18:28 · Score: 3, Funny

Now we finally have the power to run 99,999 pop up ads when we visit that pr0n site

Re:Windows comparison by pVoid · 2002-09-21 18:30 · Score: 3, Interesting

Very interestingly enough, either windows has a quota, or some sort of memory leak or something...

Max I can create in a process is 2031 threads... That being done in 700ms.

It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...

will investigate more.

Possible use by captaineo · 2002-09-21 18:34 · Score: 2

Normally I am of the "use only as many threads as CPUs" school of thought, but I can think of a reason to use 100,000 threads - imagine a large FTP server, or a multi-homed HTTP server, where you need to provide each connected user with his own set of access privileges or filesystem context. A one-thread-per-connection server may be the easiest way to build security into the system.

Re:Possible use by vsync64 · 2002-09-21 18:41 · Score: 2, Informative

Except that threads, as far as I am aware, share the same address space. Multiple processes need to arrange to share memory, and therefore are less likely to trample on one another or careen out of control.

--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
Re:Possible use by jhines · 2002-09-22 03:23 · Score: 2

Uber monster sim program, with a city full of residents, each run by a thread.
Re:Possible use by bmajik · 2002-09-22 04:11 · Score: 2

of course the penalty you pay for this is that fork() is expensive, and shared memory is a finite system resource. try the command "ipcs" on a sys-v type box.

It is also generally the case that switching between processes is more expensive than switching between threads.

to the parent poster: 1 thread per connection is a pretty naive way to do it, but it's got an advantage: simplicity. It's a moot point, since on a stock OS you'd run out of socket descriptors long before you'd run into a thread-count maximum.
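The descriptor ceiling mentioned above can be checked from a program. A minimal sketch for Unix-like systems, using the standard `resource` module (the `fd_limits` helper name is made up for the example):

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

Each connected socket consumes one descriptor, so the soft limit bounds the number of simultaneous connections regardless of how many threads the kernel can handle.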

--
My opinions are my own, and do not necessarily represent those of my employer.
Re:Possible use by be-fan · 2002-09-22 06:45 · Score: 4, Insightful

"use only as many threads as CPUs"
>>>>>>>>
Then please stay away from my GUI apps. I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O. On my 300 MHz PII, that particular trait made Galeon unusable. It had one rendering thread for all the tabs, so when I was loading a complex page like /. in another tab, whatever tab I was actually reading would freeze up.

--
A deep unwavering belief is a sure sign you're missing something...
Re:Possible use by Nevyn · 2002-09-22 07:09 · Score: 1

of course the penalty you pay for this is that fork() is expensive

Not on Linux, fork() is about the same speed as thread creation ... by design.

and shared memory is a finite system resource. try the command "ipcs" on a sys-v type box.

Wrong again, you can use mmap() for file-backed shared memory.

It is also generally the case that switching between processes is more expensive than switching between threads.

Somewhat, in that you can keep the TLB etc., but you have cache synchronisation problems with threads that you don't get with processes (i.e. when multiple threads write to the same point in memory, it bounces around their caches). And of course pretty much everyone gets the locking wrong in threads, so the app either serialises on locks or goes really fast and stops.
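As a sketch of the mmap() point (Unix-only, and not taken from any code in this thread): a file-backed mapping created before fork() stays shared between parent and child, so a write in one is visible in the other. The helper name `share_via_mmap` and the single-integer layout are assumptions for the example.

```python
import mmap
import os
import struct
import tempfile


def share_via_mmap(value):
    """Child writes an int into a file-backed mapping; parent reads it back."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)        # file must be as large as the mapping
        # On Unix the default mapping flag is MAP_SHARED, so it survives fork() shared.
        shared = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        pid = os.fork()
        if pid == 0:
            try:
                struct.pack_into("i", shared, 0, value)   # child writes
            finally:
                os._exit(0)                    # never fall back into parent code
        os.waitpid(pid, 0)
        (seen,) = struct.unpack_from("i", shared, 0)      # parent reads
        shared.close()
    return seen


print(share_via_mmap(1234))
```

No SysV IPC segment is involved, so nothing shows up in `ipcs`.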

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Possible use by captaineo · 2002-09-22 10:54 · Score: 2

I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O

Then they are stupid. I/O can be done asynchronously in a single-threaded program; you just need to use non-blocking I/O (sockets) or AIO (disk).
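A minimal sketch of that single-threaded style, using Python's `selectors` module. Local socket pairs stand in for network connections, and `event_loop_demo` is a made-up name; a real program would register its actual sockets instead.

```python
import selectors
import socket


def event_loop_demo(messages):
    """Read from several sockets in one thread without blocking on any of them."""
    sel = selectors.DefaultSelector()
    buffers = {}
    for msg in messages:
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ)
        buffers[reader] = bytearray()
        writer.sendall(msg)          # stand-in for a remote peer sending data
        writer.close()               # EOF marks this stream as finished

    open_readers = len(buffers)
    while open_readers:
        for key, _events in sel.select(timeout=1.0):
            sock = key.fileobj
            try:
                data = sock.recv(4096)
            except BlockingIOError:
                continue             # spurious wakeup: nothing to read after all
            if data:
                buffers[sock] += data
            else:                    # empty read: the peer closed its end
                sel.unregister(sock)
                sock.close()
                open_readers -= 1
    sel.close()
    return sorted(bytes(b) for b in buffers.values())


print(event_loop_demo([b"beta", b"alpha"]))
```

The loop only touches sockets that are ready, so a slow peer never stalls the others.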

As for Galeon - the rendering code needs optimization. Rendering in a background thread is NOT going to help. (think: the socket connection to the X server is inherently serialized; if rendering is the bottleneck then the rendering code itself is the problem)
Re:Possible use by be-fan · 2002-09-22 11:27 · Score: 2

I/O can be done asynchronously, but that forces an interrupt-driven model, which (IMO) is even more complex than a threaded model. As someone who started programming in the 90's (when threads were commonplace) and started GUI programming by playing around with BeOS, it seems much more natural to me to just have a thread that sleeps unless it's drawing or getting updated information about the window contents. AIO vs threads aside, the aversion to threads causes a major problem with GUI programming. Most current windowing systems tend to encourage multithreaded GUI apps (take a look at Win32, apps there use as many threads as BeOS apps ever did) while none encourage AIO. As a result, programmers who don't like threads end up not using *either* model.

I don't know the details of the Galeon code, but it doesn't seem to be the connection to the X server that's holding it up. While it's loading a web page in one tab, it doesn't respond to events going on in another tab. Galeon could not possibly be spending all that time rendering (especially since Gecko is supposed to be so fast). What it seems to me is that the parsing and rendering and event handling are all occurring in one thread, so while the browser is parsing and loading one page, it can't render or respond to input events. Breaking the user-interaction into another thread would allow the GUI to respond while the page was loading.
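The split being proposed can be sketched as follows. This is a toy, not Galeon's or Gecko's architecture: `load_page` fakes a slow load with a sleep, and the "UI loop" just records a repaint whenever it has nothing else to do. All names are made up for the example.

```python
import queue
import threading
import time


def load_page(url, events):
    time.sleep(0.2)                    # stand-in for slow network I/O and parsing
    events.put(("loaded", url))


def run_ui(url):
    """UI loop stays responsive while a worker thread loads the page."""
    events = queue.Queue()
    threading.Thread(target=load_page, args=(url, events), daemon=True).start()
    handled = []
    while True:
        try:
            kind, _payload = events.get(timeout=0.01)   # never block for long
        except queue.Empty:
            handled.append("repaint")  # free to redraw or handle input here
            continue
        handled.append(kind)
        if kind == "loaded":
            return handled


history = run_ui("http://example.invalid/")
print(history.count("repaint"), "repaints before the page finished loading")
```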

--
A deep unwavering belief is a sure sign you're missing something...
Re:Possible use by bmajik · 2002-09-22 12:06 · Score: 2

maybe on linux - and maybe now. so does this announcement now mean that you can fork 100,000 processes in a matter of seconds as well ?

(and what linux kernel lets you have 100k simultaneous processes ?)

and given what i remember about fork, isn't it the case that you memcopy the entire address space of the forking process for each fork() (barring optimizations such as perhaps shared text segments) ?

are you telling me that pthread_create() on _Every_ platform copies the entire process address space ? i dont think this is the case. a large orchestrated memcpy of course is a perf hit - one that afaik, forking required, and threading does not.

Note that im not very well versed in how _linux_ threads work - i only know that they've always been " a bit different ".

Cache synchronization in MT apps is hardly on the same scale as reading/writing to shared memory (or mmap()ed regions?!) If you demonstrate that your forked app works faster via mmap than a single-address space process with multiple threads writing to shared non-thread-local storage, on a platform that has a reasonable threads implementation (not necessarily linux, but i haven't followed linux's threading at all, honestly), i'll be pleasantly surprised.

fork() has advantages. however, dismissing threading outright by claiming that fork() is equal or superior makes anything you claim dubious.

re: thread switching vs process switching:
this argument seems totally ridiculous. i can't possibly fathom how _any_ cache coherency solution that is thread-specific is time-comparative with flushing and reloading the tlb, flushing and reloading _all_ caches, and so on. and cache coherency has been well attacked in designs like the SGI O3k. Are you telling me that a cache-line or two is going to be _less_ efficient than dumping all caches and tlbs ?

Locking isn't as bad as you claim. And the same fundamental problem(s) exist w.r.t. sharing resources whether you're talking threads or processes.

--
My opinions are my own, and do not necessarily represent those of my employer.
Re:Possible use by captaineo · 2002-09-22 12:19 · Score: 2

Although I disagree that threaded programs are inherently easier to write than state machine programs, I can understand your point of view. It's probably a matter of personal preference and experience. (yes, Win32 and BeOS both strongly encourage a multithreaded style of programming, but I don't think that decision was made purely on technical merit. Multithreading was the "cool" thing to do back when these APIs were invented...)

If you need an example of AIO in a GUI application - consider Netscape 4 or Mozilla on Mac OS 9. These programs are multithreaded on most operating systems, but since Mac OS 9 has weak support for threads, the Netscape runtime simulates multiple threads using coroutines and AIO. (this is done transparently to code running on top of the runtime, so it still appears to be written in the multithreaded model...)

I use Mozilla and I do notice that it sometimes becomes unresponsive to input while loading a page. But when I boot into Windows, Internet Explorer has no problem loading the same pages without noticeable delays. (not only are other IE windows not blocked, but the one that's loading isn't blocked for long either - so it can't be IE doing all the work in a background thread). I blame inefficient code in Gecko... Pushing the work into a background thread is only a band-aid; it's like pushing a bubble sort algorithm into a background thread when you could just use quicksort and get the same work done much faster.
Re:Possible use by be-fan · 2002-09-22 13:45 · Score: 2

It probably is a matter of personal preference. I like multithreading because if you look at each thread separately, each one by itself is linear. With state machines, everything is together, but it's non-linear.

The thing with AIO and multithreading is that it isn't really faster, but it *seems* faster (to the user). User-interface code takes very little CPU power, but it's extremely time sensitive. Doing UI handling asynchronously prevents any sort of background work (efficient or not) from influencing the speed of the UI.

--
A deep unwavering belief is a sure sign you're missing something...
Re:Possible use by vsync64 · 2002-09-22 14:08 · Score: 1

and given what i remember about fork, isn't it the case that you memcopy the entire address space of the forking process for each fork() (barring optimizations such as perhaps shared text segments) ?

Ever hear of copy-on-write? This kind of technique is why that's a big deal. The point of that, if I understand correctly, is that while the program thinks its memory space has been copied, it's still in the exact same place until it actually tries to write to that memory. At that point, I imagine the system would still only copy the page being written to.
This is where Linux memory allocation used to fall down under stress sometimes (I'm pretty sure I saw an article about them fixing it recently). Programs could allocate all sorts of memory, but until they tried to write to it, the memory wouldn't be taken out of free space. If software suddenly starts claiming nonexistent memory it was promised, the system runs out of memory and it has to kill things off.

--
TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
Re:Possible use by imroy · 2002-09-23 00:34 · Score: 1
...isn't it the case that you memcopy the entire address space of the forking process for each fork()...

No. There's this little trick called Copy On Write (COW). It uses the MMU and works like this.
1. Process fork()'s
2. Kernel sets up the child process.
3. Instead of copying the whole memory space over, it simply points the child's pagetable entries at the parent's pages. Both the parent's and child's pages are also marked as read-only (this is the important part).
Now, when either process goes to write to data in their memory spaces:
1. An exception is raised because the page is marked as read-only and the CPU jumps into the MM code of the kernel.
2. The kernel now does the copy for real, but only of that page.
3. The parent and child now have their own separate copy of this page of data that they can work with.
Tada! Data is only copied when it's needed. Depending on the program involved, this may be a lot or a little. For programs that will immediately perform an exec() after the fork, vfork() gives even better performance. Under Linux (according to the vfork(2) man page), vfork doesn't even copy the pagetables and it suspends the parent until an exec() is performed. It's kind of a hack, but it saves some work when all you want to do is spawn a new program. Which isn't all that uncommon in the Unix/Linux environment.
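The page-level copying itself is invisible to a program, but its effect is easy to observe: after fork(), a write in the child never shows up in the parent. A Unix-only sketch (the helper name `fork_isolation_demo` is made up, and the pipe is only there so the child can report what it saw):

```python
import os


def fork_isolation_demo():
    """Child overwrites a buffer after fork(); the parent's copy is untouched."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            data[:] = b"child!"              # the write gives the child a private copy
            os.write(write_fd, bytes(data))  # report the child's view to the parent
        finally:
            os._exit(0)
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


print(fork_isolation_demo())
```

Whether the kernel copied one page or many to make this happen is an implementation detail; in an interpreter like CPython many pages are touched for bookkeeping reasons alone.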
Re:Possible use by rossy · 2002-09-27 20:09 · Score: 1

Hear, hear! I think the fear is that software race conditions will be a problem. I've worked with "LOTS" of GUI apps that can't keep up with my typing speed. In an earlier life I found that the X-windows distribution was limiting the xterm baud rate to 9600 baud! (Upping the rate made stuff scroll by faster in SunOS). I hate it if I have to wait more than 3 seconds for ANY screen repaint, and if ever the app can't keep up with my typing speed. Someday I would hope that with 2GHz clock cycles of 500ps, I can get a couple of characters fed into the CPU! I don't think the real problem is processor speed, or CPU/memory bandwidth. In most cases, it is priorities. We all want to be listened to. I think that multi-threading has to be the answer to this problem... a software agent and GUI signalling setup that tells you that the CPU has the info, and is working on it.

--
Ross Youngblood

Garry Kasparov by Raven42rac · 2002-09-21 18:37 · Score: 1

so this means Garry Kasparov can get beat at chess that much faster now?

--
I hate sigs.

Threads? What about processes? by WetCat · 2002-09-21 18:38 · Score: 1

It would be much more interesting to have that many processes,
not threads, on a UNIX-like system...
Leave threads for those Window-ers...

Re:Threads? What about processes? by Mr+Z · 2002-09-22 03:04 · Score: 1

This is Linux. In Linux, the difference between a thread and a process is more-or-less whether the VM space is shared or copy-on-write. That's pretty much it as I understand it.

Starting 100000 processes, therefore, seems like it should be doable, so long as there aren't so many "dirty" pages in the forked executables that the system thrashes itself to death handling the COW pages. This is where having a 64-bit system and a lot of RAM would help you. (You need the 64-bit system to address the RAM sanely. The whole "highmem" kludge on x86 is just that, a kludge.)
--Joe

--
Program Intellivision!

This isn't for everyone, though by Anonymous Coward · 2002-09-21 18:42 · Score: 1, Interesting

There was a patch for an O(1) scheduler a while back. What this means is that it takes the same amount of time to select what runs next, no matter how much is running. But you won't notice an improvement unless you have about 200 processes running at the same time. This may be good for servers and the like, but it's a lot slower if you have few processes running. Keep this in mind...
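To make "same amount of time regardless of how much is running" concrete, here is a toy sketch of the usual trick: one run queue per priority level plus a bitmap of non-empty levels, so picking the next task is a find-first-set-bit and a dequeue. This is an illustration only, not the kernel patch; the class name, the 140-level default, and "lower number means higher priority" are assumptions for the example.

```python
from collections import deque


class PriorityArray:
    """Toy constant-time scheduler queue: one FIFO per priority plus a bitmap."""

    def __init__(self, levels=140):
        self.queues = [deque() for _ in range(levels)]
        self.bitmap = 0                    # bit p is set iff queues[p] is non-empty

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if not self.bitmap:
            return None                    # nothing runnable
        # Isolate the lowest set bit: that is the best non-empty priority level.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        run_queue = self.queues[prio]
        task = run_queue.popleft()
        if not run_queue:
            self.bitmap &= ~(1 << prio)    # level drained, clear its bit
        return task


pa = PriorityArray()
pa.enqueue("editor", 5)
pa.enqueue("audio", 1)
pa.enqueue("compile", 5)
print([pa.pick_next() for _ in range(4)])
```

The work done in `pick_next` depends on the fixed number of priority levels, not on how many tasks are queued.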

Threads? by SashaM · 2002-09-21 18:48 · Score: 1

I thought Linux didn't have real threads, and they were implemented as processes... Am I missing something?

Re:Threads? by Anonymous Coward · 2002-09-22 00:01 · Score: 1, Informative

A thread is a "light weight process". Threads have many definitions and types.
In java's green threads, the JVM has 1 process and many user-space threads.
In Solaris, there are actually 2 types of threads: a kernel and a user-space thread.

Linux has had a fast process creation and context-switch so Linus chose to implement kernel threads in terms of a process (clone is the creation function). Pthread is simply a posix API wrapper for a linux thread.

While there are many thread libraries, pthread is the main one now. NGPT (Next Gen Posix Thread from IBM) was being positioned to replace pthread with an M:N implementation. It is heavier than current pthread, but with the ability to create both kernel and user space threads.

Redhat's library may actually be the replacement just due to its simplicity (read this as fast and less buggy).

Re:Alternative headline by Dahan · 2002-09-21 18:49 · Score: 4, Informative

Gigantic performance problem in Linux code fixed after several years of "many eyes" scanning over it.

Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.

Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [text version] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):

0.8.1 Create a fixed priority scheduler.
Currently, the Linux scheduler is very different than the traditional Unix schedulers. Although the Linux scheduler is very efficient when only several processes are running, it is not scalable. In order to match the performance of *BSD and other Unices, another scheduling algorithm must be used.

Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.

no more slashdot effect? by at10u8 · 2002-09-21 18:51 · Score: 1

I suppose this means that sites will want to switch to Linux/Apache in order to avoid being incapacitated when linked by Slashdot?

Re:Windows comparison by Courageous · 2002-09-21 18:59 · Score: 4, Informative

Every thread uses a minimum of *1 PAGE* of reserve memory for its stack, which is 64K. However, you have to go out of your way to use less than 1 megabyte of reserve memory. Since only 2GB of reserve memory (addressable memory) is available to user applications, this would fit your 2000 thread figure like a glove.
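The arithmetic behind that estimate, taking the 2GB and 1MB figures from the comment at face value (the `max_threads` helper is made up for the example):

```python
ADDRESS_SPACE = 2 * 1024**3       # 2 GB of user address space (figure from the comment)
STACK_RESERVE = 1 * 1024**2       # 1 MB reserved per thread stack (figure from the comment)
MIN_RESERVE = 64 * 1024           # the 64K minimum mentioned in the comment


def max_threads(address_space, stack_reserve, other_reserved=0):
    """Upper bound on threads if each one reserves stack_reserve bytes."""
    return (address_space - other_reserved) // stack_reserve


print(max_threads(ADDRESS_SPACE, STACK_RESERVE))   # 2048
print(max_threads(ADDRESS_SPACE, MIN_RESERVE))     # 32768
```

An observed ceiling of 2031 sits just under the 2048 bound, which would be consistent with a small part of the address space being taken by the program itself.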

C//

nice, but... by g4dget · 2002-09-21 19:00 · Score: 4, Interesting

It's nice that the Linux kernel can handle that many threads. But user level threads generally are even more lightweight, and high performance implementations like those on Solaris provide both user level and kernel level threads and map the former onto the latter. Is Linux going to get something similar? Is Sun perhaps donating their implementation? Or are these new kernel threads so lightweight and quick that they are competitive with Solaris on their own, without the mess and complication of adding user level threads?

Re:nice, but... by Magnus+Reftel · 2002-09-21 21:50 · Score: 4, Informative

According to a mail from Ingo Molnar halfway down the linked article, M:N threading doesn't really solve the real problem - it's good at switching back and forth between running threads, but the real reason for having very large amounts of threads (be they kernel or user space threads) to begin with, is to do IO, and for that, there is no real advantage of user space threads.

More info on the 1:1 vs M:N issue can be read in the white paper

--
print "Yet another p{erl,ython} hacker\n",
Re:nice, but... by g4dget · 2002-09-21 21:59 · Score: 2

Thanks for the pointer. Sounds like they went with 1:1 for a good reason. I always thought of M:N threading as kind of a kludge and not entirely trustworthy anyway (scheduling and I/O become rather iffy).

How will this affect Mozilla, OpenOffice... by 3770 · 2002-09-21 19:01 · Score: 4, Interesting

How will this change affect Mozilla, the Sun JVM and OpenOffice, for instance.

While it probably is generally true that it will take some time for most applications to start using the new threading model some larger applications could support it fairly soon.

Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?

--
The Internet is full. Go Away!!!

Re:How will this affect Mozilla, OpenOffice... by madmarcel · 2002-09-21 21:12 · Score: 1

Sun JVM?? Hmmm...

That reminds me of a little Java game I wrote for an assignment last year. It was a cheap rip-off of Zelda. Being a lazy sod, I'd set it up such that each monster on the screen had its own unique 'thread' + a thread for the player + a thread for the 2D buffering.
<<Did anyone mention lamos writing monster applications? ;P >>

It ran reasonably well, but I did notice that on my triple-boot machine (Win98, linux & winNT - yes, I'm a sicko :) the game ran at very different speeds depending on which OS I used...linux being the slowest of the lot :(
Not sure it has anything to do with the thread-handling or perhaps the way linux handles graphics or (more likely) the JVM being optimized differently/more/less for each OS :\

<<Resists temptation to dust off old java game and start hacking>>
Re:How will this affect Mozilla, OpenOffice... by egghat · 2002-09-21 23:26 · Score: 1

Interesting.

What can one do about it? Use blackdown's JVM for example?

TIA for your answers.

Bye egghat.

--
-- "As a human being I claim the right to be widely inconsistent", John Peel
Re:How will this affect Mozilla, OpenOffice... by alext · 2002-09-22 05:48 · Score: 2

Threading performance may be poor on Linux, though personally I haven't noticed it; other aspects are fine, though. I'd say that big Java applications start up about 20% faster on my Linux partition than on my Windows one using Sun JVM 1.3.1_04.

In fact, Solaris LWP threading has caused me more headaches - it seems that the old N:M thread model can deadlock with native libraries such as the Oracle OCI drivers, theoretically using the alternate 1:1 model fixes this but I haven't yet proved the case to my own satisfaction. Read up on this new model here, or try it by putting /usr/lib/lwp in your LD_LIBRARY_PATH.
Re:How will this affect Mozilla, OpenOffice... by Wesley+Felter · 2002-09-22 07:54 · Score: 2

There is no new threading model. The thread APIs are the same, so when you install the right kernel and glibc all apps will benefit.
Re:How will this affect Mozilla, OpenOffice... by psamuels · 2002-09-23 16:09 · Score: 1

There is no new threading model. The thread APIs are the same, so when you install the right kernel and glibc all apps will benefit.

True, but note that if they just drop this stuff straight into glibc there could be some application incompatibilities. Ulrich Drepper explicitly mentions this: if your app relies on (read: works around) any of Linux's former deviations from the POSIX thread standard, it will break now that glibc pthreads are POSIX-compliant.

At the prospect of breaking ABI compatibility (something the glibc developers are loath to do - glibc 2.0 was supposed to be the last non-backward-compatible C library for Linux) - I imagine they will tread rather carefully when it comes to "transparently" replacing Linuxthreads with NPTL.

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Re:How long before by _Knots · 2002-09-21 19:03 · Score: 2

Be careful who you call a dumb fuck. Netscape had a functional browser long before IE3, arguably the first usable version of IE. And it would not surprise me if Netscape 1 predated IE 1, though I can't say I know that for sure.

Speeding The Net is an excellent book about Netscape vs Microsoft, in case anybody cares (it's been a long while since I read it, thus why my date memory is rusty).

--
Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1

Great... Now every lamer with no design knowledge by Alex+Belits · 2002-09-21 19:03 · Score: 2

...will start writing horrible monsters running hundreds and thousands of threads, and their creations will suffer from all other shortcomings of that decision.

--
Contrary to the popular belief, there indeed is no God.

In light of recent news.... by Gyorg_Lavode · 2002-09-21 19:26 · Score: 1

the development kernel will now be nicknamed "Heroin Spider".

(I'm sorry. I had to do it.)

--
I do security

How *I* got kicked out of the computer lab by Naikrovek · 2002-09-21 19:45 · Score: 3, Funny

I ran this in DOS:

prompt "Enter Password:"

No one could figure out that all i did was change the prompt from "$P$G" to that, and everyone was asking what the password was. haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.

I got kicked out for a year (not beautiful).

big deal by leomekenkamp · 2002-09-21 19:53 · Score: 4, Funny

100.000 threads? What nonsense; everybody knows that no computer would ever use more than 640.

--
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.

Re:big deal by sg_oneill · 2002-09-21 20:28 · Score: 2

wry. verry wry. *g* (mod it up)

--
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
Re:big deal by leomekenkamp · 2002-09-22 04:28 · Score: 1

That comment was definitively posted as a joke.
I even wonder who would consider it flamebait...

--
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.

Hooray for fixing the dynamic linking problem! by Foresto · 2002-09-21 19:57 · Score: 2, Interesting

It looks like speed isn't the only improvement they've made with this library. From the notes:

" - - libpthread should now be much more resistant to linking problems: even if the application doesn't list libpthread as a direct dependency functions which are extended by libpthread should work correctly."

This ought to be a big help for those of us who write plug-in modules for servers like Apache 1.x and PHP. The existing thread library doesn't work properly unless the program executable explicitly links to it, which means that my shared libraries can't take advantage of standard thread management such as pthread_atfork().
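For readers who have not used pthread_atfork(): it registers handlers that run just before fork() and just after it, in the parent and in the child. Python exposes a close analogue, `os.register_at_fork`, which is used in the Unix-only sketch below. This is an illustration of the idea, not the C API the comment is about; the `events` list and `fork_and_report` helper are made up for the example.

```python
import os

events = []

# Handlers run around every later os.fork() call in this process.
os.register_at_fork(
    before=lambda: events.append("before"),
    after_in_parent=lambda: events.append("parent"),
    after_in_child=lambda: events.append("child"),
)


def fork_and_report():
    """Fork once; return the handler sequence seen by the parent and by the child."""
    events.clear()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.write(write_fd, ",".join(events).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    child_events = os.read(read_fd, 256).decode().split(",")
    os.close(read_fd)
    os.waitpid(pid, 0)
    return list(events), child_events


print(fork_and_report())
```

A typical use is taking a lock in the "before" handler and releasing it in both "after" handlers, so the child never inherits a lock held by a thread that no longer exists.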

Does this help Apache 2.x by mustprotectdata · 2002-09-21 20:16 · Score: 2, Interesting

Given that Apache 2.x can utilise threads as well as processes, does this mean that you can configure a large web server with, say "MaxSpareThreads 1000000" so that you can cope when you're slashdotted ;-)?

100000?! by thelexx · 2002-09-21 20:23 · Score: 3, Redundant

640 should be enough for anybody!

LEXX

--
"Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999

Group think by Subcarrier · 2002-09-21 20:27 · Score: 1, Offtopic

In any large group of people you will find a few idiots, a few luminaries, and a great number of average thinkers. Sometimes the only thing that separates idiots from luminaries is their lack of social grace. Welcome to democracy.

--
"I have opinions of my own, strong opinions, but I don't always agree with them." -- George H. W. Bush

but will it know to presoak my wash? by Rooked_One · 2002-09-21 20:37 · Score: 1

Or perhaps know which part of the banana peel is the good part to smoke? =) GOOD JOB OPEN SOURCE!!! KEEP AT IT!

Re:How long before by mumkin · 2002-09-21 21:04 · Score: 1

Netscape is the direct descendant of NCSA Mosaic, the Ur browser. Frankly, I don't remember what the big deal about Netscape 1.0 was, relative to Mosaic, but there was much hype. Maybe something really hardcore, like introducing background colors?

Re:How long before by MyAss · 2002-09-21 21:31 · Score: 1

Actually I think the two big features were the "stop" button and the ability to open more than one connection at a time. (So it would load the images much faster)

--

They misunderestimated me. -- George W. Bush

100,000 parallel threads by inode_buddha · 2002-09-21 22:28 · Score: 1

This ought to make RedHat, Dell, IBM, and Oracle very happy, given a few of the newer contracts with large retailers using Oracle's back-end... if you read the article closely you notice that RH takes the claim for sponsoring a bunch of the work involved in developing this.

--
C|N>K

Wow! by mnordstr · 2002-09-21 22:28 · Score: 2

Combine this with Apache2's Multi-threaded or Hybrid MPM and you'll have a heck of a web-server!

the linux c10k problem - solved? And Java? by Anonymous Coward · 2002-09-21 22:47 · Score: 1, Interesting

Does this solve the c10k problem? As I can start a thread for every socket? See the C10K problem

And does this mean that Java will start to really scale on linux?

Re:the linux c10k problem - solved? And Java? by Mr+Z · 2002-09-22 14:31 · Score: 1

And why, perchance, does it remain unsolved without actual investigation?

There is an issue of timeslice granularity -- each thread will only get a timeslice occasionally when there are many threads. So, you'll need to crank up HZ a bit and/or service multiple clients per thread.

The main thing, though, is that if you consider that the threads share their VM (that is, their program code and static data), you could probably do 10000 clients without too much heartache. The "COW" space (the amount of data that's private due to "copy on write") is the bounding factor.

Now, 10000 clients with 100% dynamic content? Ha. Too many dirty pages. 10000 clients with largely static content? Sure -- just sendfile it all. :-)
--Joe

--
Program Intellivision!
Re:the linux c10k problem - solved? And Java? by Mr+Z · 2002-09-23 02:04 · Score: 1

Yes, I read the page. The person I was responding to was only talking about RAM usage for the stack, so that's all I was talking about.
--Joe

--
Program Intellivision!

Re:Alternative headline by himi · 2002-09-21 22:51 · Score: 3, Insightful

Alternatively, you might want to consider that Linux's scheduler was very nicely tuned for far and away the most common case - where you have only a small number of running processes.

Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into ameliorating. This has changed recently, as people have started using Linux in areas where this kind of thing /isn't/ insane, and hence these new developments have come along.

You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply /have/ to realise that the kernel developers care about how people actually use the system, rather than crappy benchmarketing numbers. These developments have come about because people needed them, and they didn't happen earlier because no one had needed them before. Go back and read the last few years of the lkml archives, and /then/ come back and talk about this kind of thing, when you understand /why/.

himi

--

My very own DeCSS mirror.

POSIX compliance ahead? by rkit · 2002-09-21 23:15 · Score: 2, Informative

Scalability is a good thing, no doubt about that. However, there is another aspect that should be pointed out: the current thread API in linux is quite different from the POSIX specification and somewhat crufty. Just to mention the biggest problems:
- missing cancellation points: testing whether a thread has been cancelled should be done in lots of system calls, but linux pthreads do not support this. Instead, you have to call pthread_testcancel() before and after every such call. A real drag.
- signal handling: linux pthread signal handling is very different from the POSIX specification. However, proper signal handling is crucial for any real world application.
- fork() will not work as expected. This is a real nuisance if you want proper daemon behaviour for your application.
- documentation of linux-specific behaviour is poor. As a result, most of the existing literature on thread programming is pretty useless for linux.
All these points can be worked around, for sure. Nevertheless, it makes writing portable software a nightmare. Porting threaded software to linux, well ... All in all, linux threads really need much better integration with the standard system API. A lot of applications could profit from multithreading. Just think of GUI responsiveness. Also, using threads makes some programming tasks much easier. No need for asynchronous hostname lookup, for example.
A solid, well documented, standard conforming threads implementation will make linux a much nicer environment for serious programming than it already is. I am really looking forward to this.

--
sig intentionally left blank

Re:POSIX compliance ahead? by inode_buddha · 2002-09-22 01:30 · Score: 2, Interesting

Nobody ever said that linux-specific behavior is POSIX-compliant. Last I heard, POSIX is not about the specifics of any given UNIX-compatible system or class of systems. Rather, it attempts to be the abstraction and distillation of that class of systems, as codified by The Open Group. Please correct me if I am wrong in this idea. Linux simply "aims to be..." POSIX-compliant, as promulgated by the LSB, the FHS, et al.

That all said, I totally agree with you -- especially regarding cancellation points, fork(), and documentation.

Please bear in mind that much of this behavior will be inherited from whatever libc it is compiled against. IMO, this simply shows the power of C, nothing else.

The above scenario simply points out the differences between OpenGroup/POSIX and GNU/FSF... if things like that "bug" you (no pun intended, seriously), then perhaps you should recompile with whatever "-- posixly-correct" options you have available.

And yes, I have a copy of the SUSV3 spec right here, in fact.

--
C|N>K
Re:POSIX compliance ahead? by rkit · 2002-09-22 03:10 · Score: 1

Yeah, I know, linux only aims at POSIX compliance... Despite some things in linux pthreads that are non-optimal, performance is reasonable, and everything runs quite stably once you have found out how to do it, so my critique may sound unfair. I notice you do not agree on my complaint about signal behaviour, and to be honest, this is a somewhat arcane thing also on other systems :-(
However, my main point is that there is no real smooth integration of threads in glibc. IMHO this is much more important than large-scale scalability. These guys are working to change that, which is a Very Good Thing(TM). As it is, using threads only makes sense if you really need them, and have time at hand to explore the nifty details that must be correct for a real world application.

--
sig intentionally left blank
Re:POSIX compliance ahead? by Nevyn · 2002-09-22 07:17 · Score: 1

*sigh* feel free to read the data pointed to by /.
signal handling: linux pthread signal handling is very different from the POSIX specification. However, proper signal handling is crucial for any real world application.
fork() will not work as expected. This is a real nuisance if you want proper daemon behaviour for your application.

These were both fixed as part of the same rewrite that helped Linux scale pthreads to 100s of thousands of threads.

Not sure about the cancellation points, but the library was fairly largely rewritten ... so I'd check that data too as it may well be out of date.

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:POSIX compliance ahead? by inode_buddha · 2002-09-22 17:32 · Score: 1

True, there doesn't seem to be any smooth way for glibc to deal with this... I must have missed your point then. Please don't misunderstand about "fair" criticism; constructive criticism is always useful, and I certainly didn't think your comment was "unfair" -- just realistic. I truly hope that there are clues in this "mini-conversation" that everyone can use.
It's great that this sort of thing is being dealt with, though I'll probably never use it on my desktop to its fullest extent. I've applied the relevant patches to my plain-vanilla 2.4.19 kernel and rebuilt; on my main workstation (dual P3/1 GB RAM) the difference becomes noticeable (in a good way) under heavy loads but not on a day-to-day basis. I need to find some way to measure things regarding that, if you have any ideas...

--
C|N>K

Re:Windows comparison by IkeTo · 2002-09-21 23:56 · Score: 1

Okay, where did you come up with the 64K figure, and also the 1 mega (megi) byte figure?

All Intel processors have 4KiB pages. Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread. So all in all, you need 12KiB for each thread. Multiplying with the 100000 figure you get 1200000KiB or 1.144GiB, which is quite affordable for a 2GiB system.
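
The arithmetic above is easy to check. A minimal Python sketch, taking the comment's own assumptions at face value (4 KiB pages, a two-page user stack, one page for task_struct plus kernel stack); the function names are made up for illustration:

```python
PAGE_KIB = 4  # page size in KiB, as stated in the comment


def per_thread_kib(stack_pages=2, kernel_pages=1):
    """KiB per thread: user stack pages plus the task_struct/kernel stack page."""
    return (stack_pages + kernel_pages) * PAGE_KIB


def total_gib(threads, kib_per_thread):
    """Total footprint in GiB (1 GiB = 1024 * 1024 KiB)."""
    return threads * kib_per_thread / (1024 * 1024)


if __name__ == "__main__":
    kib = per_thread_kib()
    gib = total_gib(100_000, kib)
    print(f"{kib} KiB per thread, {gib:.3f} GiB for 100,000 threads")
```

Changing stack_pages or kernel_pages shows how quickly the total moves if the per-thread assumptions turn out to be different.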

NGPT by p3d0 · 2002-09-21 23:57 · Score: 2

Then, with NGPT (Next-Generation Posix Threads), those 100,000 threads would be in user space and may be even cheaper.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

Re:NGPT by Nevyn · 2002-09-22 07:20 · Score: 1

Benchmarks are showing NPTL to be about 4x faster than the IBM NGPT library (Note that NPTL was written knowing about the IBM work, and knowing how to make it better).

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:NGPT by benh57 · 2002-09-22 07:38 · Score: 2

Nope. Apparently NPTL is Four times faster than NGPT.
Re:NGPT by p3d0 · 2002-09-22 23:56 · Score: 1

Hmm. I was wondering what would happen if they implement m:n on top of NPTL, but it would probably just slow it down and introduce the headaches of two-level scheduling.
Thanks for the link.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

Re:Alternative headline by croftj · 2002-09-22 00:36 · Score: 2, Insightful

I think we need to pull some old stats out of our ass. This paper is about the 2.2.x kernel. Correct me if I'm wrong, but hasn't there been massive overhauling of the 2.4.x and 2.5.x kernels in the scheduling area?

I think I'll just slam XP performance based off of NT benchmarks and articles. What the hell, they're both from MS, so the argument must be valid.

Get a grip!

--
-- Many men would appreciate a woman's mind more if they could fondle it

Re:How long before by Zeinfeld · 2002-09-22 00:42 · Score: 2

Netscape is the direct descendant of NCSA Mosaic, the Ur browser. Frankly, I don't remember what the big deal about Netscape 1.0 was, relative to Mosaic, but there was much hype. Maybe something really hardcore, like introducing background colors?

Wrong in every respect.

First, Mosaic was not the 'Ur browser'. Tim's NextStep browser was. Mosaic was browser number 15 or so. The significant things about Mosaic were that 1) it actually compiled without having to hack the code yourself or mess with 6 different support packages like tkwww and 2) it was the first X-Windows browser that did not look really amateur.

Second, Netscape does not contain any code from Mosaic, although it was written by the same main author - Eric Bina. NCSA sold the commercial rights to Mosaic to Spyglass.

Third, IE was originally based on the Spyglass code, so if any browser is 'the direct descendant' it would be IE. Go look at the 'about' box on IE, although the original Mosaic actually had more lines of CERN code than NCSA code, which were never acknowledged.

--
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/

Re:Windows comparison by Anonymous Coward · 2002-09-22 00:49 · Score: 1, Informative

The 1 MB is the default stack size for every Posix thread. It takes some effort to determine what the smallest valid stack size is, since PTHREAD_STACK_MIN doesn't specify enough space for the start_func stack frame. The parent post merely stated that the default is 1MB and you have to work to lower that, and he is correct. If the grand-parent poster was just creating threads without specifying a stack size, he would run out of RAM pretty quickly.
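
For a feel of what "working to lower that" involves without writing C, here is a hedged sketch in Python, where `threading.stack_size()` sets the stack size requested for threads created afterwards. The 256 KiB figure and the helper name are arbitrary choices for illustration, not a recommended minimum; Python itself rejects anything under 32 KiB and the platform may impose its own limits.

```python
import threading


def run_with_small_stack(stack_bytes, n_threads):
    """Start n_threads threads with the requested stack size; return how many ran."""
    previous = threading.stack_size(stack_bytes)  # returns the old setting
    done = []
    try:
        threads = [
            threading.Thread(target=done.append, args=(i,))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)  # restore whatever was configured before
    return len(done)


if __name__ == "__main__":
    print(run_with_small_stack(256 * 1024, 50), "threads ran on 256 KiB stacks")
```

The point is the same as with pthread attributes: the default is only a default, and the smallest value that still works depends on what the thread function actually does.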

I wouldn't want to guess where the 64k figure came from.

Wait, there's more by Quixote · 2002-09-22 01:11 · Score: 2

From the ensuing discussion on the list:
Ingo:...Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
Wow... this is pretty good. The ability to spawn & run 1 million concurrent threads should keep even the most demanding users happy for a few years...

OTOH, I hope this post doesn't become the butt of jokes a few months from now ("and you thought 1 million was a lot! Ha! My Palm 5000XL does more than that!")...

Now that's impressive by Jasa · 2002-09-22 02:00 · Score: 1

I was beginning to think this site's standards were getting low:-

Portable MP3 Players (Done before)
Net shooting guns (1960s James Bond movies)
Build your own sub woofer (My friend built a 500mm (1'8") X 500mm X 1000mm(3'4") Sub Woofer in 1988 and then put it in his Ford Transit)
Tiny Linux boxen (Seen 1000s of these and *BSD boxen as well)

But 100,000 threads on Linux, now that's impressive. Too bad it won't make one iota of a difference to most of us who use Linux for just reading /.

--
-Jasa -- Linux - The SOURCE will be with you, ALWAYS

Re:Alternative headline by himi · 2002-09-22 02:18 · Score: 1

If you want to do stupid things with your programs, that's fine by the kernel developers. Just don't expect /them/ to bend over backwards to make /your/ stupid design work as well as you want it to. That's your problem, and no one else's.

himi

--

My very own DeCSS mirror.

2.5 - 2.6? by Dunkirk · 2002-09-22 02:34 · Score: 1

Since I absolutely suck at getting kernels from source to work correctly (I never get everything in there that I need I guess), the question is: When does all this great stuff reach production? (To then be pre-packaged by RedHat, et al.)

--
Acts 17:28, "For in Him we live, and move, and have our being."

Re:2.5 - 2.6? by psamuels · 2002-09-23 16:22 · Score: 1

When does all this great stuff reach production? (To then be pre-packaged by RedHat, et al.)

2.6 is supposed to feature-freeze on All Hallows' Eve, and there is enough buzz about this that it may actually happen on schedule. Expect a few months of shakedown before 2.6.0 gold, then (if they're smart) Red Hat will pound on it for a couple of months more, minimum, before selling it to the people to whom they promise a stable OS.

Then again, I guess it's possible that Red Hat, being the sponsor for this work, will backport it to their 2.4-based kernels and package it up for the enterprise. (*shudder* - Doesn't "enterprise" mean "business"? Why does it always, in the computing world, imply "big business"?)

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Faster app start times? by Pope+Raymond+Lama · 2002-09-22 02:46 · Score: 1

(about 14 minutes 58 seconds faster than with earlier Linux kernels)
OK, here is a genuine and serious question: were these extra 15 minutes responsible for the painfully long start times of apps like Mozilla and OpenOffice?

If so, as soon as I upgrade my distro, I will boot it into 2.5.

--
-><- no .sig is good sig.

Re:Faster app start times? by Mr+Z · 2002-09-22 14:44 · Score: 1

It'll have nothing to do with it. Mozilla is at most a handful of threads.

The bulk of the startup time for those apps is in dealing with all the dynamic linking and in faulting in all the code pages from disk. There's also a fair bit of time in loading non-code resources from the various files the app keeps. (eg. all the XUL and chrome that defines the user interface in Mozilla.)

One way possibly to speed up loading large apps is to statically link 'em (icky!) and then do "cat app > /dev/null" before starting it so it's in the disk cache first. That'll help a little. It won't solve the data-file issue, but then you could cat all those to /dev/null prior to app startup also.

The reason catting a file to /dev/null gives a speedup is that the cat operation is linear. Thus, it gets good utilization from the disk. (assuming your filesystem isn't heavily fragmented, that is.) When an app relies on page faults to bring its code in from disk, the faults may be pseudorandomly spread throughout the file. Further, they may be timewise further apart, so the various I/O clustering algos (eg. elevator seek) won't cluster the reads.
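
The "cat to /dev/null" trick is nothing more than a linear read whose data is thrown away. A rough Python equivalent, as a sketch only (the helper name and the 1 MiB chunk size are arbitrary, and whether the pages stay cached afterwards is up to the kernel):

```python
import os
import tempfile


def prewarm(path, chunk_size=1 << 20):
    """Read a file front to back in large chunks and discard the data.

    The only purpose is the side effect: a sequential read lets the kernel
    pull the file into its cache in one linear pass.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Demo on a throwaway file; in practice you would pass the application binary.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (3 * (1 << 20) + 5))
    try:
        print(prewarm(tmp.name), "bytes read")
    finally:
        os.unlink(tmp.name)
```

Running it over the binary and its data files before launch does the same job as the shell one-liner.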

That said, you're still stuck with the CPU time the app takes to initialize and configure itself internally. Starting a statically linked Mozilla or OpenOffice from a RAM disk would still take a non-trivial amount of time.
--Joe

--
Program Intellivision!
Re:Faster app start times? by psamuels · 2002-09-23 16:36 · Score: 1

One way possibly to speed up loading large apps is to statically link 'em (icky!) and then do "cat app > /dev/null" before starting it so it's in the disk cache first.

ELF runtime linking is inefficient and does unpleasant things to your COW pages. I haven't investigated this in full, but Jakub Jelinek of Red Hat has written a program to "pre-link" your ELF libraries and executables, so that you can mmap, sanity check, and go without any runtime relocation.

Not a new idea, actually - IRIX, I believe, solves the same problem in pretty much the same way. Digital Unix attempts to avoid it by maintaining a cache of "allocated" address ranges and letting the compile-time linker read and update said cache, in the hopes that no libraries' addresses will clash at runtime and thus no relocations would need to be made. But then, DU had the luxury of a 64-bit address space, so they could afford to "waste" it as it will never run out. (Please no jokes about 640k of address space being enough for anybody - except from people who actually understand how huge a 64-bit address space actually is.)

That said, you're still stuck with the CPU time the app takes to initialize and configure itself internally.

That's what the Emacs unexec solution is for!

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
Re:Faster app start times? by Mr+Z · 2002-09-24 13:59 · Score: 1

ELF runtime linking is inefficient and does unpleasant things to your COW pages. I haven't investigated this in full, but Jakub Jelinek of Red Hat has written a program to "pre-link" your ELF libraries and executables, so that you can mmap, sanity check, and go without any runtime relocation.

Awesome! Any word on when this will be 'production worthy'? Also, do you know of a website that has any performance numbers or other details?
--Joe

--
Program Intellivision!
Re:Faster app start times? by psamuels · 2002-09-24 22:55 · Score: 1

Any word on when this will be 'production worthy'?

No, the first I heard of it was as I was poking around a Red Hat ftp archive looking for some RPMs for chasing a bug reported by a RH user (turns out he was using an old version of my package, but I digress...) and I noticed the prelink package sitting there somewhere. It sounded a lot like the IRIX thing - and I've thought off and on over the years that it would be nice for Linux to have that - so I read the README.

Google for "jakub jelinek prelink" - it'll tell you a lot more than I can.

Also, do you know of a website that has any performance numbers or other details?

Thanks to Google we have his original announcement, where he claims to have reduced the startup overhead in Konqueror (before opening its first X window) from 0.510 seconds to 0.011 seconds. Of course this is a best-case scenario, as prelinking will be the most noticeable for big binaries with lots of shared library dependencies.

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Not to mention.... by SwedishChef · 2002-09-22 02:53 · Score: 2

Your egregious use of the word "egregious".

--
No one ever had to evacuate a city because the solar panels broke!

Re:Ur browser??? by Anonymous Coward · 2002-09-22 02:59 · Score: 2, Informative

I can only suppose you don't know what Ur is, maybe because you come from a very different culture...

Anyway, and I'm really not well qualified to answer this, Ur was an ancient city-state from which a prominent ancestor of the Jewish-Christian-Islamic heritage came (Abraham, if I'm not wrong).

This city, IIRC already found, was Sumerian (I'm not sure about this); the Sumerians are the folks said to be the inventors of the wheel, among other neat things.

So an Ur browser would be the primeval browser, in other words.

Upon writing a note, one must be sure it will be understood; nonetheless, the "Ur" mention boosted the note level way up. All in all, I think it was great and I'm all for it.

But explanations as these sometimes become necessary.

Mod this up, please (was: POSIX compliance ahead?) by Observer · 2002-09-22 03:22 · Score: 2

See subject. A useful 'heads up' post for folks like myself who tend to assume that Linux will follow the general Un*x-family behaviours we're familiar with from the commercially-sold variants.

And yes, I would of course ;) check this assumption if I were to do some significant implementation for the Linux platform.

Imagine a Beowulf Cluster of this! by SirDaShadow · 2002-09-22 04:05 · Score: 1

yeah, had to say it, first time I do :)

hmm by luphus · 2002-09-22 04:07 · Score: 1

My name is Ingo Molnar.
You kill my father - prepare to die.

er... sorry about that, I won't do it again :)

-nwp

Re:hmm by Graymalkin · 2002-09-22 04:14 · Score: 2

Don't you mean

My name is Ingo Molnar.
You kill -15 my parent process - Prepare to die.

--
I'm a loner Dottie, a Rebel.
Re:hmm by luphus · 2002-09-22 04:25 · Score: 1

heh, very nice :)

-nwp

user-level threads are useless by RelliK · 2002-09-22 04:33 · Score: 2

User-level threads cannot take advantage of multiple CPUs. True, they are somewhat faster on a single CPU system due to lower overhead, but that's all they are good for.

--
___
If you think big enough, you'll never have to do it.

ACE is nice for big systems by 0x0d0a · 2002-09-22 04:36 · Score: 2

ACE is nice for big systems.

But it's also way overkill for small stuff. It's a whole distributed framework, not a wrapper around pthreads.

--
May we never see th

Re:Windows comparison by Courageous · 2002-09-22 04:36 · Score: 2

It's a Windows limit, and it's in the documentation.

C//

Re:Windows comparison by Courageous · 2002-09-22 04:44 · Score: 2

The 64K page size is Windows' page size. I can only assume that the poster stating that the Intel hardware page size is 4K is correct. I would suppose this means that a Windows (2K, NT) page of 64K is assembled from 16 hardware pages, then. The Windows page size of 64K is in their documentation. I never paused to think about how this interfaces with hardware pages...

C//

Will `top' and `ps' be fixed? by truth_revealed · 2002-09-22 04:59 · Score: 2

Currently in Linux every thread is assigned a distinct process ID, and as such, a process has as many entries in `top' and `ps' as it has threads. This makes it difficult to monitor processes externally, or even see the other processes' information. Has this issue been addressed? (I realize this is a user-space program issue, not a kernel issue).

Re:Will `top' and `ps' be fixed? by psamuels · 2002-09-23 16:39 · Score: 1

Currently in Linux every thread is assigned a distinct process ID

Ah, but not any more. With the new NPTL, the getpid() call will return the same number for each thread in a process. I don't actually know how or if this perturbs the /proc data, though I expect it doesn't, meaning that the procps utilities will still have to be rewritten to properly portray the new abstractions.
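
On a current system this is easy to observe from user space. A small Python sketch (it assumes Python 3.8+ for `threading.get_native_id()`; the helper name is illustrative): every thread reports the process ID it sees together with its own kernel-level thread ID.

```python
import os
import threading


def collect_ids(n_threads=4):
    """Return (pid, native thread id) pairs reported from inside each thread."""
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)  # keep all threads alive at the same time

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        barrier.wait()
        with lock:
            results.append(ids)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"pid={pid} native thread id={tid}")
```

With the behaviour described above, the pid column is identical for all threads while the native thread IDs differ; under the older model it is the pid column that would have varied.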

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Multithreaded core files on Linux? by truth_revealed · 2002-09-22 05:11 · Score: 2, Interesting

I can't seem to find any info on whether Linux core files still produce one core file per thread or just one core file per process (as does Solaris). Has `gdb' been enhanced to handle multithreaded programs (or multithreaded core file) on Linux? If I have a thousand threads - I sure don't want 1000 core files in the event of a crash. Is there a way around this?

Re:Alternative headline by Dahan · 2002-09-22 06:22 · Score: 2

Correct me if I'm wrong ...

Okay, you're wrong. This O(1) scheduler in 2.5.x is the "massive overhauling." (Yes, the patch has been around for a while... but as the article says, it's only recently been merged into 2.5)

Re:Windows comparison by be-fan · 2002-09-22 06:41 · Score: 2

Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) align data in 64KB increments. IIRC, there is also 64KB of virtual slack between memory mapped objects as well.

--
A deep unwavering belief is a sure sign you're missing something...

Re:Windows comparison by sagei · 2002-09-22 06:49 · Score: 3, Informative

Each Linux thread has two things of its own: its own stack, which can be as small as 1 or 2 pages if the code to run is simple enough, and also its own task_struct, which is 1 page including kernel stack for the thread.

This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.

Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.

Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.

--

Robert Love

Other similar mischief... by TheLink · 2002-09-22 07:33 · Score: 3, Funny

Some guys I know copied a Windows error dialog box and set it as a background image for the desktop, centered.

Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...

For example pic:
http://www.adobe.com/support/techguides/operatingsystem/windows/winerrors.html
Probably want to replace CCmail with Explorer or something more dear to heart ;).

I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually call another colleague over and make the usual worried mumbles.

http://www.sysinternals.com/ntw2k/freeware/bluescreensaver.shtml

Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.

Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"

I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...

But I'd probably need the 100K parallel thread kernel to serve up all those ad banners :).

Bwahaha!
Link.

--

Too many replies beneath your current threshold

Finally we have an answer to Sco unixware by Billly+Gates · 2002-09-22 08:09 · Score: 2

SCO and Solaris can both create threads 10,000 times faster than the current Linux kernels, according to Sun's and SCO's marketing departments. My guess is that this was exaggerated, but it is one of the benefits of the big Unixes. Heavily threaded Linux apps have been rumoured to fly on UnixWare where they would run slower on their own native platform! I guess Linux is maturing in this aspect. Does anyone who knows anything about Unix/Linux threading care to comment? I wonder if this will help Linux in server environments.

--
http://saveie6.com/

Re:Windows comparison by Saint+Stephen · 2002-09-22 08:19 · Score: 2, Informative

I've created over 200,000 processes on a PIII 550 laptop with 256 MB of RAM running Windows XP. Of course, it took a while (swapping).

The process is called nothing.exe. Source Code: int WinMain(...) {Sleep(INFINITE);}

I work at a lab, so I also ran it on a Compaq 8-way with 4-GB of ram. It worked but I don't remember how fast it went.

However, there is a big gnarly limit in Windows that will limit the # of processes: the amount of memory allocated to virtual desktops or something. We researched it -- look it up. This is why you get limited to a few thousand processes or threads if they all do GUI stuff. The bad thing is basically any function you call in user32 will register the thread as a GUI thread. It explains it all in the book Inside Windows 2000.

Not meaning to troll, I'm just going to share a basic fact: it sucks that Windows threads are so expensive, but tens of thousands of threads *DOES* suck (read: thread per client) on Windows. However, this is not the same thing as saying Windows doesn't scale -- you just have to code it differently. (Check out how many threads SQL Server uses when it's processing thousands of clients.) Stuff like IO Completion ports, AWE memory, and Scatter/Gather IO is the way that you have to go.

Just because you *can* create hundreds of thousands of threads, doesn't mean it's a good idea or that your app won't run like shit on a 32-CPU machine!

how does anyone know? by XO · 2002-09-22 08:47 · Score: 1

i've tried to bring 2.5.37 up on 5 different machines, and they all crash anywhere from "OK, booting the kernel..." (hard lock) to getting all the way down to loading SCSI drivers, and getting "Powering off device 0." and then locking up.

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/

Re:Windows comparison by pVoid · 2002-09-22 09:33 · Score: 1

I think I know the limit you are talking about: it's a handle limit in the GDI subsystem.

As for the 200k processes taking time to launch, it is quite normal, as launching a process is much more heavy than just launching a thread.

The 2k threads I created were created in 700ms. which is very acceptable in my books.

And to confirm, yes, creating so many threads ain't the best idea.
Someone else mentioned thread pools as being a workaround, but only a workaround. I personally think thread pools are actually a way of doing things, and not a workaround for slow thread creation. In fact there are new WinNT APIs for thread pooling.
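
To make the "way of doing things" concrete, here is a minimal thread-pool sketch in Python using the standard `concurrent.futures` module. It is not the WinNT pooling API mentioned above, just the same idea in portable form; the function names and the trivial "work" are placeholders.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def serve_clients(n_clients, n_workers=8):
    """Handle n_clients requests on a fixed pool instead of one thread per client."""
    workers_seen = set()
    lock = threading.Lock()

    def handle(client_id):
        with lock:
            workers_seen.add(threading.get_ident())
        return client_id * 2  # stand-in for real per-client work

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        replies = list(pool.map(handle, range(n_clients)))
    return replies, len(workers_seen)


if __name__ == "__main__":
    replies, workers = serve_clients(10_000)
    print(f"{len(replies)} requests handled by {workers} worker threads")
```

Ten thousand requests get answered, but the process never holds more than eight worker threads, so per-thread stack and scheduling costs stay flat as the client count grows.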

yada yada... I don't think anyone will actually ever read this post =)

Solaris Deathnail? by zapatero · 2002-09-22 09:33 · Score: 1

In the systems programming world, threading, thread scheduling, and signal processing in threads were always considered Linux's primary weakness, and were the main strength of Solaris, especially for applications running in the telecom space. But with this announcement, I can see Solaris' last tech superiority over Linux crumbling.

I think Sun will need to quicken its re-invention pace.

-j

Re:Windows comparison by pVoid · 2002-09-22 09:42 · Score: 1

Yes.

Minimum loadable Memory Section in windows is 64K. I guess a thread creation creates a new stack on a newly created section boundary.

New locking primitive, "futex" by Animats · 2002-09-22 10:02 · Score: 2

Hidden in the article was a reference to a new locking primitive, futex. I don't see a manpage on line for it, though. Where is this documented?

Re:New locking primitive, "futex" by Mr+Z · 2002-09-22 16:19 · Score: 1

A futex is a mutual-exclusion primitive that's based around file and file operations rather than some heavier mechanism. The idea is that the kernel doesn't keep track of all the folks locking the file and so on, with all the extra separate bookkeeping to handle when processes exit and so on. Futexes live mostly in user-space, making them very fast.

I don't pretend to know a fraction of the details--only that I've heard of them before and what their benefits are, roughly. Read more here.
--Joe

--
Program Intellivision!
Re:New locking primitive, "futex" by (startx) · 2002-09-23 13:32 · Score: 2

There is some very primitive discussion of it here.
Re:New locking primitive, "futex" by Animats · 2002-09-23 14:26 · Score: 2

If it's becoming a kernel feature, there should be more documentation than some comments on the kernel mailing list. But I'm not finding any.

This may not even make it INTO 2.5.x... by Wolfrider · 2002-09-22 10:12 · Score: 2, Interesting

See here ( http://lwn.net/Articles/9632/ )
and here ( http://lwn.net/Articles/10248/ )

--Linus is being pigheaded about this patch, wanting to "keep the code simple" instead of implementing Ingo's **fast** + Fixed solution.

To quote LWN:
[ So it's fast - though a few extra features have been requested. But this patch has stirred up a bit of a debate. Rather than put in a complicated new PID allocator, it is asked, why not just make the maximum PID be very large? Then, in theory, the quadratic part of get_pid() will never run so the performance problems go away, and the code stays simpler. Linus prefers this approach, as do a number of other developers; he has put a simple patch along these lines into his pre-2.5.37 BitKeeper tree.

Ingo disagrees, pointing out that any reasonable maximum PID size can be exceeded eventually. He would rather fix the problem than try to hide it behind a large process ID space. In the absence of real-world examples that show people being bitten by get_pid()'s behavior in a larger PID space, though, Linus appears unlikely to accept any more complicated fix.
]
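
For readers wondering what a dedicated PID allocator buys, here is a toy sketch of the general idea: keep one in-use flag per PID so that checking a candidate is a single lookup instead of a walk over every task. This is an illustration only, not Ingo's patch and not kernel code; the class and method names are hypothetical.

```python
class PidMap:
    """Toy PID allocator: one in-use flag per PID, searched from the last PID handed out."""

    def __init__(self, max_pid):
        self.max_pid = max_pid
        self.in_use = bytearray(max_pid + 1)  # index 0 is never handed out
        self.last = 0

    def alloc(self):
        """Return the next free PID after the last one allocated, wrapping around."""
        for step in range(self.max_pid):
            pid = (self.last + step) % self.max_pid + 1
            if not self.in_use[pid]:
                self.in_use[pid] = 1
                self.last = pid
                return pid
        raise RuntimeError("PID space exhausted")

    def free(self, pid):
        """Mark a PID as available again."""
        self.in_use[pid] = 0


if __name__ == "__main__":
    pids = PidMap(max_pid=32768)
    print([pids.alloc() for _ in range(3)])
```

Making the PID space very large, as in the simpler approach quoted above, attacks the same problem from the other side: collisions become rare enough that the expensive path almost never runs.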

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??

Re:This may not even make it INTO 2.5.x... by psamuels · 2002-09-23 15:22 · Score: 1

--Linus is being pigheaded about this patch, wanting to "keep the code simple" instead of implementing Ingo's **fast** + Fixed solution.

Linus raised some valid objections, Ingo reworked some of the code, and it's all in 2.5.36. No worries.

[ So it's fast - though a few extra features have been requested.

Did you click on the "few extra features" bit? It's the funniest l-k post I've seen in some time. That whole sub-thread (sorry) was pretty good.

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Linus didn't think much of O1 scheduler by jelle · 2002-09-22 10:22 · Score: 3, Interesting

I remember that Linus made a remark that he thought that the O1 scheduler wouldn't impact Linux much at all, and that its development would not be a biggie for Linux, downplaying the importance of what it can achieve. Go Ingo for keeping at it!

--
--- Hindsight is 20/20, but walking backwards is not the answer.

Re:Careful there... by Dahan · 2002-09-22 11:22 · Score: 2

I have carefully considered your points, and only have the following to say:

Consider that the Linux scheduler hasn't changed significantly in those THREE years.
Consider Ingo Molnar's post on the subject.
Consider providing some evidence for your position, rather than just saying that I'm wrong.
Bulleted lists are pretty. I can do that too.

If you guys have some evidence that the paper I referenced is no longer valid, please post it (or references to it). Don't just tell me "oh, that paper's ancient; things are different now."

'Cuz up until fairly recently, they weren't.

P.S. And if anyone wants to compare Windows XP's scheduling performance with NT's, be my guest... I don't think you'll see much of a change. Remember that XP is just NT 5.1, and I haven't heard about any significant performance improvements in NT's scheduler. (The only vaguely scheduler-related change I remember is the addition of "fibers" in NT 4.0 SPsomething (3?))

Java and Linux by wwi · 2002-09-22 12:23 · Score: 1

The threads issue needs to be solved, and soon. We are using Java with Linux and get regular hangs. Conversations with IBM's Java support indicate that this is a problem with the Linux kernel, Java thread design, and underlying thread libraries on Linux. And no, we are not running thousands of threads, just two Java programs on a 2 CPU SMP machine.

We eagerly await a fix.

Re:Windows comparison by red_gnom · 2002-09-22 12:50 · Score: 1

So,in other words when it comes to comparing threads, size does matter.

Woosh by p3d0 · 2002-09-23 06:22 · Score: 1

That's the sound of M:N threading whizzing past your head.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

Re:Windows comparison by The+Panther! · 2002-09-23 10:04 · Score: 2

Hardware page size is 4KB, as was noted elsewhere. The key element that I haven't seen mentioned is that Windows' virtual memory system has several ways to 'allocate' memory. There's reserving pages, and there's committing pages. In the case where you tell the OS you want memory, it reserves pages. That is to say, it does not actually take memory from the free physical memory, but instead creates a contiguous address space large enough for your request, but allocates no hardware RAM at those addresses.

When you commit a page, for example by accessing a page (read or write) that is not yet allocated, it trips a hardware fault if the VM hasn't mapped a page to the address; the fault handler then searches for a free page and links the two together.

The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k. Furthermore, there's no such advantage or realistic way for the operating system to align anything in memory physically, except in AGP ram. The VM system handles physical pages of memory exclusively, but does not manage AGP-allocated memory (IIRC). In other words, though the OS can align the address space to anything it likes, the OS layer cannot request any physical allocation mapping or alignment. So that comment about aligning memory for processes is quite unlikely.

Now, the XBox (which runs a variant of the Win2k kernel) has a bit more control over VM, but it also does not support demand paging, so it cannot swap to the hard disk and give you RAM+HD effective memory. Shame, that. But, as a result, you have an API that allows hardware level allocation control. Still, the OS doesn't take advantage of it, AFAIK. It's for developers.

--
Any connection between your reality and mine is purely coincidental.

When does the .5 kernel actually become .6 ? by zaqattack911 · 2002-09-23 11:53 · Score: 1

In other words, I've read tons of articles about all the fanciness being incorporated into the .5 kernel.

When is it expected to become stable? How long do I have to wait?

The more I read about this, the more I feel that going with the .4 kernel (or Linux at all, rather) is a mistake for a serious production server.

Re:Windows comparison by Courageous · 2002-09-23 12:10 · Score: 2

The end result is, even if Windows does try to create 64k worth of memory segment space for a process, unless it is actually reading or writing to a byte in each 4k chunk, its internal VM will not allocate physical memory for the whole 64k.

Yes. Quite true. I had a problem a while back on Windows which took me a bit of reading through the documentation (and verifying with some low level sys calls) to determine that I was running out of "reserve memory". Which is to say that, while I had plenty of physical memory left, all the address space had been used up. You can do this very easily by creating thousands of threads on your computer. To get a large number of these threads, you'll have to push the default stack size to its minimum, 64K. I was a bit dissatisfied with this minimum, but I suppose I'll live with it now (or port to linux) if I have to, or upgrade to a 64 bit os if it becomes a practical limit in the future.

C//
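
The address-space arithmetic behind running out of "reserve memory" can be sketched roughly as follows. Only the 64K minimum comes from the comment above; the 2 GiB of user address space and the 1 MiB default stack reservation are assumed figures for 32-bit Windows, and `max_threads` is a hypothetical helper name.

```python
# Back-of-the-envelope: how many thread stacks fit in a process's address space?
# Assumptions (not from the post): 2 GiB of user address space and a
# 1 MiB default stack reservation per thread.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def max_threads(address_space: int, stack_reserve: int) -> int:
    """Upper bound on thread count if stacks were the only reservations."""
    return address_space // stack_reserve


if __name__ == "__main__":
    user_space = 2 * GIB
    print("1 MiB stacks:", max_threads(user_space, 1 * MIB))
    print("64 KiB stacks:", max_threads(user_space, 64 * KIB))
```

The real ceiling would be lower, since code, heaps, and libraries also occupy address space. The point is only that reserved stacks can exhaust addresses long before physical memory runs out.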

Re:Windows comparison by pthisis · 2002-09-23 13:32 · Score: 2

Err, Windows NT does use the native 4KB page size on Intel, but is designed to be expandable to systems with up to a 64KB page size. As a result, certain operations (like the reserve mapping that goes on for the thread stack) aligns data in 64KB increments

That's boneheaded. Linux supports page sizes up to at least 4MB, but it doesn't align everything on 4MB boundaries on the off chance that you might be using 4MB pages. It uses the appropriate alignments for the page sizes actually in use.

An OS that has dropped all support for non-Intel hardware citing a portability concern which doesn't exist in portable OSes? As they say in Snatch, "It's spurious, mate. Not genuine."

Sumner

--
rage, rage against the dying of the light

Re:Windows comparison by psamuels · 2002-09-23 15:52 · Score: 1

I was a bit dissatisfied with this minimum, but I suppose I'll live with it now (or port to linux) if I have to, or upgrade to a 64 bit os if it becomes a practical limit in the future.

One of the nice things about Linux. You don't have to live with any of these 32-bit limitations if your application is big enough to justify 64-bitness. While Microsoft had NT running on Alpha, I understand it was essentially still a 32-bit OS - it was only truly ported to 64 bits when Itanium support was added. Linux, on the other hand, has had true 64-bit implementations running since '94 or '95, so you can be fairly confident that the niggling little 32-bit-isms have mostly been caught by now.

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

I totally don't get this story. by jefp · 2002-09-23 22:38 · Score: 1

My thread-creation benchmark can create 100,000 threads in 10 seconds on my 800MHz Linux machine at work. No idea what kernel it's running, but I'm sure it's not recent. Furthermore, my 450MHz home machine running FreeBSD 4.5 runs the same benchmark in only five seconds. WTF?
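
For comparison, a thread-creation benchmark of the simplest kind might look like the sketch below. This is not the poster's benchmark (which is not shown); `create_and_join` is a hypothetical name, and the loop starts and joins one thread at a time, so at most one extra thread exists at any moment.

```python
import threading
import time


def create_and_join(n: int) -> float:
    """Start n do-nothing threads one at a time, joining each; return seconds taken."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()  # the thread is gone before the next one is created
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = create_and_join(1000)
    print(f"1000 sequential threads in {elapsed:.3f}s")
```

A loop like this measures create/destroy cost only. Keeping 100,000 threads alive at the same time also stresses stack allocation and the scheduler, which is a different test.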

Re:Windows comparison by IkeTo · 2002-09-24 03:01 · Score: 1

> Finally, I do not know what the
> pthreads default stack size is
> (user-space? what is that?) but it is
> certainly larger than one page.

Why does it need to be larger than one page? The kernel will trap the page faults caused by the stack growing past its end, and will allocate additional stack anyway.

Re:Windows comparison by Courageous · 2002-09-24 05:40 · Score: 2

Yes, I know you are right. Amongst other things, I won't be stuck with 64K per thread stack in Linux, and as you say, I could use 64 bit alpha linux. I'm looking forward to Hammer, actually.

C//

Re:Windows comparison by sagei · 2002-09-24 06:06 · Score: 2

Why does it need to be larger than one page? The kernel will trap the page faults caused by the stack growing past its end, and will allocate additional stack anyway.

It does not need to be bigger than one page, it just is. You are right, the stack is expanded via implicit mmap as it grows... but for performance reasons the default stack is usually measured in megabytes, not pages.

Anything but the simplest of applications would use a page rather quickly. User-space applications are programmed to assume they have any size stack they want. Local variables are huge.

In short, I was just commenting on the default. It can surely be lowered...

--

Robert Love
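
Lowering the default is usually a one-line affair. In C the call is pthread_attr_setstacksize(); the sketch below shows the same idea through Python's `threading.stack_size()`, which sets the stack size for threads created afterwards. The 256 KiB figure and the `run_with_stack` helper are illustrative assumptions, not recommendations.

```python
import threading


def run_with_stack(size_bytes: int) -> int:
    """Run a small computation in a thread created with the given stack size."""
    # stack_size() returns the previous setting; 0 means "platform default".
    previous = threading.stack_size(size_bytes)
    result = []
    try:
        worker = threading.Thread(target=lambda: result.append(sum(range(1000))))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result[0]


if __name__ == "__main__":
    print(run_with_stack(256 * 1024))
```

Whether a given size is safe depends entirely on how deep the thread's call chain and local variables go, which is the point made above about user-space code assuming a large stack.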

What "c10k problem"? by tlambert · 2002-09-24 06:14 · Score: 2

I don't understand what the issue is here.

I was able to run 1,600,000 simultaneous connections with a modified FreeBSD kernel, in June of 2001. Couldn't get much work done, but at about 300 baud per connection, after dividing up a gigabit ethernet link... you shouldn't expect to do much work.

Without modifications, after a patch to the credential reference counting (since committed to FreeBSD 4.5), as long as a stock kernel is tuned correctly, it can still *easily* handle 100,000 simultaneous connections (16K of window space for each connection = 1.6G of mbufs).

-- Terry
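
The mbuf figure quoted above is plain multiplication, spelled out here as a sketch. It assumes "16K" means 16 KiB (16,384 bytes) and that the result is read in decimal gigabytes; `mbuf_bytes` is a hypothetical helper name.

```python
def mbuf_bytes(connections: int, window_bytes: int) -> int:
    """Buffer space needed if every connection holds one full window."""
    return connections * window_bytes


if __name__ == "__main__":
    total = mbuf_bytes(100_000, 16 * 1024)
    print(f"{total:,} bytes = {total / 1e9:.2f} GB")
```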

So? by tlambert · 2002-09-24 06:17 · Score: 2

So? Use non-blocking I/O instead. Problem solved.

-- Terry

It is not broken. by 1101z · 2002-09-24 06:17 · Score: 2, Informative

No, you will see a pid per thread, because that is how the scheduler knows to schedule things. You can check with the getpid() C library call from within the program. When they said it is a 1-to-1 mapping, that means there is a process per thread. Just look for all those processes with the same name, and see if they have the exact same memory usage. If they do, it means they are using the same memory and are threads. No matter how you implement threads, there has to be more than one process; otherwise, when the program blocks for I/O, all threads would be blocked.

--
One day people will learn the folly of Winbloze, Linux Rules!

Re: It is not broken. by psamuels · 2002-09-24 06:57 · Score: 1

No, you will see a pid per thread, because that is how the scheduler knows to schedule things.

Indeed we have not got rid of the concept of one pid per thread. But we do have an additional number, the thread group ID. The tgid corresponds to the pid of the thread group leader, and this is what will be returned by getpid() in the new pthreads. (What we call a pid is, and has always been, actually a thread ID.) This is exposed to /proc via a field in /proc/{tid}/status - one need only hack the procps utilities to make something useful of this.

An interesting question came up on l-k recently: how to maintain (efficiently) the uniqueness of a tgid. What if you are a thread group leader, you spawn off some threads, then you die, and eventually your pid is reused. Then another process could be a thread group leader with the same tgid as your thread group. If I remember correctly, the fix is simple and elegant: if the thread group leader dies before its "siblings", it sits around in a zombie state until they all die, all the while preventing reuse of its pid.

Just look for all those processes with the same name, and see if they have the exact same memory usage. If they do, it means they are using the same memory and are threads.

Not necessarily. Try the following:
for x in 1 2 3 4 5; do sleep 60 & disown; done
ps ux | grep sleep

Notice how all five copies of 'sleep' have exactly the same name and memory usage, yet they are independent processes. If you really want to see evidence for threads, you need to check their tgids.
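
Checking tgids from inside a program could look roughly like the sketch below, which is not part of the original post. It assumes a Linux system that exposes per-thread status files under /proc/self/task/ (the layout on current kernels, which may differ from the kernels discussed here) and Python 3.8+ for `threading.get_native_id()`. `parse_status` and `thread_ids` are hypothetical helper names.

```python
import os
import threading


def parse_status(text: str) -> dict:
    """Turn the 'Key: value' lines of a /proc status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def thread_ids() -> tuple:
    """Return (Pid, Tgid) as the kernel reports them for the calling thread."""
    tid = threading.get_native_id()
    with open(f"/proc/self/task/{tid}/status") as fh:
        fields = parse_status(fh.read())
    return int(fields["Pid"]), int(fields["Tgid"])


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    seen = []
    worker = threading.Thread(target=lambda: seen.append(thread_ids()))
    worker.start()
    worker.join()
    print("getpid():", os.getpid())
    print("main thread (Pid, Tgid):", thread_ids())
    print("worker thread (Pid, Tgid):", seen[0])
```

Threads of one program should report the same Tgid, while the five independent `sleep` processes above would each report their own.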

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Re:Windows comparison by psamuels · 2002-09-24 06:37 · Score: 1

I'm looking forward to Hammer, actually.

Aren't we all? (:

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Re:This Ingo Molnar ?= the author I. Molnar by haruchai · 2002-09-24 17:50 · Score: 1

Hey, dork! We've never seen anyone use a redirect link to the goatse.cx site before. Wow, you must be, like, you know, like, rilly brite. Gosh, me wants be smirt lyke ewe.

--
Pain is merely failure leaving the body

Re:Windows comparison by IkeTo · 2002-09-26 02:05 · Score: 1

> It does not need to be bigger than one
> page, it just is.

At least that isn't what is suggested by the documentation of linuxthreads (in Debian testing). In E.5 it says the following, implying that the default stack size is really just 1 page.

E.5: Does LinuxThreads implement pthread_attr_setstacksize() and pthread_attr_setstackaddr()?
These optional functions are provided in recent versions of LinuxThreads (0.8 and up). Earlier releases did not provide these optional components of the POSIX standard.

Even if pthread_attr_setstacksize() and pthread_attr_setstackaddr() are now provided, we still recommend that you do not use them unless you really have strong reasons for doing so. The default stack allocation strategy for LinuxThreads is nearly optimal: stacks start small (4k) and automatically grow on demand to a fairly large limit (2M). Moreover, there is no portable way to estimate the stack requirements of a thread, so setting the stack size yourself makes your program less reliable and non-portable.

Ahhhh. So now it can by T.E.D. · 2002-09-26 04:06 · Score: 2

...run Ada 83 programs.

Re:Great... Now every lamer with no design knowled by dvdeug · 2002-09-27 20:42 · Score: 2

But while their threads will be slow, they will be able to handle the text the users are entering; vastly more useful than the most optimized eight-bit character horror you would turn out.

Re:Great... Now every lamer with no design knowled by Alex+Belits · 2002-09-28 15:34 · Score: 2

Trolling is supposed to be:

1. Fast! Writing random mild insults almost a week after the original posting isn't as great as making a real-time flamewar immediately after posting.

2. Accessible to a potential reader. Referring to an obscure recurring theme of my rants made months before this article (byte-value transparency of protocols vs. Unicode references in RFCs) would require a lot of googling from a potential troll spectator before he is able to appreciate your comment.

--
Contrary to the popular belief, there indeed is no God.

Slashdot Mirror
