Should Servers be Mono-Process or Multithreaded?

Posted by ryuzaki0 on Wednesday July 12, 2006 @12:50PM from the your-preferred-architecture dept.

An anonymous reader wonders: "How would you design the fastest possible Linux-based server application today? A few years ago, the thinking was that multi-threading was not the way to go — instead, high-performance servers used an event-driven, mono-process model (consider lighttpd and haproxy). However, things have changed. Today CPUs have dual cores, and over the next few years this is only likely to increase. Also, the 2.6 Linux kernel has made multi-threading much more efficient. So I'm wondering, does Slashdot think that modern high performance server software should be designed to be multi-threaded, or does it still make more sense to use an event driven, mono-process architecture, despite the advances in the Linux 2.6 threading and the arrival of multi-core CPUs?"

10 of 96 comments

Re:combination by PhrostyMcByte · 2006-07-12 13:23 · Score: 2, Insightful

If you're going to be serving more than a few connections at a time, it's easy for threads to eat monstrous amounts of resources. It's better if you can handle network connections via a single thread.

I disagree. While using a single thread per connection is definitely a retarded way to go, making I/O operations asynchronous and handling callbacks via a thread pool is definitely the fastest and most efficient way to build a scalable daemon today. And once you get used to the idea, it is even much easier than using select()/poll()-based approaches. (disclaimer: I have no experience doing this in Linux)

If anyone is interested I have been slowly piecing together a library to make this easier for Windows developers.
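As a rough illustration of the "wait on one thread, run callbacks on a small pool" idea, here is a minimal Python sketch. It is readiness-based (built on `selectors`), not the completion-based asynchronous I/O the comment has in mind on Windows, and the names `serve_ready` and `echo_upper` are made up for this example.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def serve_ready(connections, handler, workers=2):
    """Wait for readiness on one thread; run `handler` callbacks on a small pool."""
    selector = selectors.DefaultSelector()
    for conn in connections:
        selector.register(conn, selectors.EVENT_READ)
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while selector.get_map():
            for key, _ in selector.select():
                conn = key.fileobj
                selector.unregister(conn)   # one request per connection in this sketch
                data = conn.recv(4096)      # does not block: conn was reported readable
                futures.append(pool.submit(handler, conn, data))
    selector.close()
    return [future.result() for future in futures]


def echo_upper(conn, data):
    """Hypothetical callback: reply with the upper-cased request."""
    conn.sendall(data.upper())
    return len(data)
```

The pool size stays fixed no matter how many connections are registered, which is the point being argued: the number of threads follows the number of CPUs, not the number of clients.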

Re:Debugging by samkass · 2006-07-12 13:39 · Score: 2, Insightful

>

Is this intended as flamebait? Performance in Java is as important as performance anywhere else. Most of the performance in any complex system depends more on the algorithms and how easy it is to debug a complex, fast algorithm, than on any inherent speed advantage of a particular language. I'd say the question of what design of network I/O and threading maximizes performance is at least as important a question in Java as any other language.

--
E pluribus unum

Neither. Multiprocess usually Prevails by KidSock · 2006-07-12 14:03 · Score: 4, Insightful

You forgot multiprocess. Like anything in software the answer is, "it depends on the application". But one of the most overlooked and frequently very important factors that affects performance is cache locality. If the CPU has to fetch something from main memory (or heaven forbid it actually has to dredge it up from disk) the program has to wait. That wait time is often much, much greater than the execution time of the target code. Aside from simply writing small code (that only gets you so far), one way to get better cache locality is to break up your processing into a pipeline. Mail servers frequently do this. One process will accept connections, do some sanity checking, and write the message to another process. The next process juggles addresses for routing and writes it to another process. That process might then work on delivery either locally or remotely. What happens (or what is supposed to happen under high load) is that one process becomes hot and processes as many messages as it can until the buffer to the next process is full. Then the next process runs, processing all of those messages until it either runs out of stuff to process or can't write anything more to the next process in the pipeline. If you have multiple cores / CPUs this scales pretty well too.
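The pipeline described above can be sketched with plain `fork()` and pipes. This is only an illustration under simple assumptions (Unix only, newline-delimited text messages); `run_pipeline` and the stage functions passed to it are hypothetical names, not code from any mail server.

```python
import os


def run_pipeline(messages, stages):
    """Fork a feeder plus one process per stage, chained by pipes (Unix only)."""
    read_end, write_end = os.pipe()
    pids = []

    pid = os.fork()
    if pid == 0:                       # feeder process: writes the raw messages
        try:
            os.close(read_end)
            with os.fdopen(write_end, "w") as dst:
                for message in messages:
                    dst.write(message + "\n")
        finally:
            os._exit(0)
    os.close(write_end)
    pids.append(pid)

    for transform in stages:
        next_read, next_write = os.pipe()
        pid = os.fork()
        if pid == 0:                   # stage process: read, transform, pass on
            try:
                os.close(next_read)
                with os.fdopen(read_end) as src, os.fdopen(next_write, "w") as dst:
                    for line in src:
                        dst.write(transform(line.rstrip("\n")) + "\n")
            finally:
                os._exit(0)
        os.close(read_end)             # parent keeps only the newest read end
        os.close(next_write)
        pids.append(pid)
        read_end = next_read

    with os.fdopen(read_end) as out:   # parent drains the last stage
        results = [line.rstrip("\n") for line in out]
    for pid in pids:
        os.waitpid(pid, 0)
    return results
```

Each pipe is a bounded kernel buffer, so a stage that gets ahead simply blocks in `write()` until the next stage catches up, which is the "runs until the buffer to the next process is full" behaviour described in the comment.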

But again, "it depends on the application". The above pipelining method only performs well if you're processing items in an assembly line fashion. If you're an HTTP proxy server you wouldn't want that model. You would probably want a single process libevent type of thing. I have some code that doesn't use either of those models. It's a multiprocess model but event driven with *everything* in shared memory. It's very close to a multithreaded model but I needed security context switching. Also, contrary to popular belief, threaded servers are slower than equivalent multiprocess servers. So in general, the benefit of a multithreaded server is pretty much just about convenience for the programmer. Since you can achieve the same effect by just creating an allocator from a big chunk of shared memory mapped before anything is forked, there's very little reason to use threads at all.
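A toy version of "an allocator from a big chunk of shared memory mapped before anything is forked" might look like the following. It is a sketch, not the commenter's code: `SharedArena` is a hypothetical bump allocator whose bookkeeping lives only in the parent, so all allocations are assumed to happen before `fork()`.

```python
import mmap
import os
import struct


class SharedArena:
    """Bump allocator over an anonymous shared mapping created before fork()."""

    def __init__(self, size):
        self.buf = mmap.mmap(-1, size)   # anonymous mapping; MAP_SHARED by default on Unix
        self.size = size
        self.used = 0                    # allocator state is per-process, not shared

    def alloc(self, nbytes):
        if self.used + nbytes > self.size:
            raise MemoryError("arena exhausted")
        offset, self.used = self.used, self.used + nbytes
        return offset


def child_writes(value):
    """Fork a child that stores `value` in shared memory; the parent reads it back."""
    arena = SharedArena(4096)
    slot = arena.alloc(8)                # reserve the slot before forking
    pid = os.fork()
    if pid == 0:
        try:
            struct.pack_into("q", arena.buf, slot, value)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return struct.unpack_from("q", arena.buf, slot)[0]
```

A real allocator would keep its free pointer inside the mapping and guard it with a process-shared lock so that children can allocate too; that part is omitted here.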

Re:combination by PhrostyMcByte · 2006-07-12 15:59 · Score: 2, Insightful

So the long and short of it is that it is not obvious that threads are faster than independent processes. In fact if you really want to scale, processes are likely to be faster.
I'm not arguing about threads versus processes, I don't care about that. Just that 2 threads will be more performant than 200 threads on a 2-CPU machine. And apparently modern production web daemons agree with me.

Threading --- hype, more hype and extra hyped hype by inflex · 2006-07-12 16:06 · Score: 4, Insightful

start-rant:

Threads are useful, that's granted - but it would seem a lot of people are trying to convert wholesale over to this threading model just for the hell of it, running along with the apparent reasoning that threading is "lighter" than processes. Maybe threads are lighter/cheaper on Windows systems - but a Unix system with copy-on-write forking is _DESIGNED_ to handle processes. Right now, a lot of the time, threads are a hack. Unix and processes work nicely together.

As for "maximising" available resources, well don't forget there's typically another couple of dozen processes running on any given Unix setup. More so on a multi-user, multi-purpose machine (let's say a WWW, email and DNS setup, and throw in SpamAssassin for lots of fun): there's no shortage of available processes to use up a CPU. On a monolithic system where it's running only one process, sure, threads become useful there to spread the load.

My gripe basically boils down to a lot of people going along and choosing to use threads rather than forking because they think that it's "cool" or (supposedly) "lighter" - not because they've done any real world testing/checking. Remember, Unix was built around the idea of many small processes/programs working together, so that'd tend to naturally allow usage of multiple CPUs without any exotic hacks. /rant.

The state of the art is hybrid of the two by NittanyTuring · 2006-07-12 18:37 · Score: 2, Insightful

Threads are used because they are easy for development. They can also keep the multiple units of a parallel processor busy. However, using more threads than there are processing units introduces overhead into the system. There are better ways to do task scheduling... like event-driven models.

So, why not combine the multi-threaded and event-driven models? Some very interesting research has been done in this area. Check out Staged Event-Driven Architecture, or SEDA. Like threads, it has high fairness in scheduling. Like event-driven models, it scales to a large number of concurrent requests. In fact, it degrades gracefully even during a Slashdot effect.
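To make the staged idea concrete, here is a small sketch in the spirit of SEDA: each stage owns a bounded event queue and a few worker threads, and hands its output to the next stage. It is not the SEDA implementation; `Stage`, `run_stages` and the two handlers are hypothetical names chosen for illustration.

```python
import queue
import threading

_STOP = object()   # sentinel telling a worker thread to exit


class Stage:
    """One stage: a bounded event queue drained by a small pool of threads."""

    def __init__(self, handler, emit, workers=2, capacity=64):
        self._inbox = queue.Queue(maxsize=capacity)
        self._handler = handler
        self._emit = emit
        self._threads = [threading.Thread(target=self._run) for _ in range(workers)]
        for thread in self._threads:
            thread.start()

    def _run(self):
        while True:
            event = self._inbox.get()
            if event is _STOP:
                return
            self._emit(self._handler(event))

    def submit(self, event):
        self._inbox.put(event)          # blocks when full: back-pressure on the caller

    def close(self):
        for _ in self._threads:
            self._inbox.put(_STOP)      # queued behind all pending events
        for thread in self._threads:
            thread.join()


def run_stages(events):
    """Push events through a two-stage pipeline and return the sorted results."""
    results = []
    render = Stage(str.upper, results.append)   # list.append is thread-safe in CPython
    parse = Stage(str.strip, render.submit)
    for event in events:
        parse.submit(event)
    parse.close()                               # drain the upstream stage first
    render.close()
    return sorted(results)
```

The bounded queue is what gives the overload behaviour: when a stage falls behind, `submit` blocks (or, with `put_nowait`, could reject the event) instead of letting work pile up without limit.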

Event devices by bytesex · 2006-07-12 20:23 · Score: 2, Insightful

We need proper event devices in systems programming, both between threads and processes. I want to be able to select() (or poll() or epoll() dammit) and wait on not only file-descriptors, but semaphores, system signals, threads waking up out of a sleep() call, and even a crucial variable changing its value. And I want it API-proof (no signals) and statically initializable. And tomorrow. And a pony.

Seriously though, I'm not sure what mechanisms the WIN32 API uses under the hood for its event-waiting calls, but they have a good side to them. For any device that has a potentially blocking system call associated with it, I can define an event that I can wait for. Of course the rest of the whole interface is clunky as hell, but why hasn't anyone come up with something similar in *IX?
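The usual workaround on Unix is the "self-pipe trick": back the event with a pipe so that it becomes just another file descriptor for select()/poll()/epoll(). A minimal sketch follows; `SelectableEvent` and `wait_any` are hypothetical names. (Linux later added eventfd and signalfd, which serve a similar purpose.)

```python
import os
import selectors
import socket
import threading


class SelectableEvent:
    """An event flag backed by a pipe, so select()/poll()/epoll() can wait on it."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        os.set_blocking(self._read_fd, False)

    def fileno(self):                  # lets selectors treat this like a file object
        return self._read_fd

    def set(self):
        os.write(self._write_fd, b"\0")

    def clear(self):
        try:
            while os.read(self._read_fd, 4096):
                pass
        except BlockingIOError:
            pass                       # pipe is drained

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)


def wait_any(selector, timeout=None):
    """Return the labels of every registered source that is ready."""
    return sorted(key.data for key, _ in selector.select(timeout))
```

A worker thread calls `set()` when it finishes, and the main loop waits on the event and on its sockets in the same `select()` call.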

--
Religion is what happens when nature strikes and groupthink goes wrong.

Re:Multiple processes by Alex Belits · 2006-07-12 21:30 · Score: 2, Insightful

I don't understand how processes could possibly be faster than threads doing the same task. If processes are better because there is no interaction between them, then threads doing the same task would also have no interaction between them, while incurring faster context switches.

Because there IS interaction between threads -- mutex handling takes resources, too, and checking i/o status also requires time and syscalls (the OS just supplies pending i/o information to its scheduler by itself). And because the data access mechanism within your application is likely to be a much worse scheduler than the scheduler in the OS.

Of course it seems silly in this day and age to have a single thread/process handling a single connection in anything resembling a high-performance server.

Actually, it's not (unless you are in Windows). For quite a while it was the only way network i/o was possible in Java at all. And while it was a bad decision for Java, it was based on a valid idea that sleeping threads or processes eat fewer resources than what is necessary for multiplexed handling of completely unrelated connections within one thread.

In that case, each thread/process should be able to handle multiple connections at once. It should execute a request, grab the next request off the queue or wait for one, ad infinitum. With all threads in a single process, it is easy to balance the load queues between threads; with separate processes it is almost impossible to move a request from a busy process to a waiting one.

You don't need to do that -- the i/o scheduler in the kernel does that for you already, and when that is not good enough, you can have a separate process that dispatches requests by whatever set of rules. As opposed to Windows, Unix-like systems allow socket passing between processes.
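Socket passing between processes can be sketched as follows: a dispatcher sends an open socket to an already-running worker over a Unix-domain channel. This assumes Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`; `worker` and `dispatch` are hypothetical names, and a `socketpair` stands in for a connection returned by `accept()`.

```python
import os
import socket


def worker(channel):
    """Receive one connection's descriptor over `channel` and serve it."""
    _, fds, _, _ = socket.recv_fds(channel, 16, 1)
    conn = socket.socket(fileno=fds[0])        # wrap the received descriptor
    conn.sendall(b"worker got: " + conn.recv(1024))
    conn.close()


def dispatch(request):
    """Hand a live socket to a forked worker and return the worker's reply."""
    to_worker, from_dispatcher = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        try:
            to_worker.close()
            worker(from_dispatcher)
        finally:
            os._exit(0)
    from_dispatcher.close()

    # Created after the fork, so the worker can only obtain it by fd passing.
    client, accepted = socket.socketpair()
    client.sendall(request)
    socket.send_fds(to_worker, [b"fd"], [accepted.fileno()])
    accepted.close()                           # the worker now owns its own copy

    reply = client.recv(1024)
    os.waitpid(pid, 0)
    client.close()
    to_worker.close()
    return reply
```

The dispatcher is free to apply whatever rules it likes when choosing a worker; only the descriptor crosses the process boundary, not the data.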

Also, it's hard to understand why reducing total physical footprint and amount of cache invalidations is only good on systems with bad VM systems or "crippled" schedulers (whatever that means).

Because with a good VM they are smaller in the first place. In any case, when you deal with scalable systems that have to handle huge amounts of requests simultaneously, your cache will be severely beaten just because of the large amount of data involved.

--
Contrary to the popular belief, there indeed is no God.

Programmers by SpeedBump0619 · 2006-07-13 02:18 · Score: 2, Insightful

While most of the discussion here covers the technical aspects of the question, I think that the biggest factor here is being overlooked. I've been employed now in 5 different environments, 4 of which used threading for parallelism. For some reason many programmers just have a problem wrapping their minds around shared memory problems. The number of times I have reviewed code changes and seen someone not lock a mutex, or forget to release a semaphore, is mind boggling.
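One common defence against the "forgot to release" class of bug is to take locks only through a scoped construct, so the release cannot be skipped. A small Python sketch, with the hypothetical names `Counter` and `hammer`:

```python
import threading


class Counter:
    """Shared counter whose lock is only ever taken through a `with` block."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:        # released on every exit path, including exceptions
            self.value += 1


def hammer(counter, threads=4, per_thread=10_000):
    """Increment the counter from several threads and return the final value."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value
```

This helps with forgotten releases but not with forgetting to take the lock at all, which still needs review or tooling to catch.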

As far as I'm concerned threading should only be considered when you have either:
1) a high tolerance for extremely difficult debugging
2) a customer who likes it (pays you more) when your program crashes
3) a team of programmers experienced with threading (at least 75%...if not 100% you *must* review code changes)

In modern Unix systems most people won't be able to really tell the speed difference between create_thread and fork. If you screw up pushing data through a pipe it just doesn't come out the other end, but if you screw up using shared memory you won't know until something unexpected (and seemingly unrelated) happens.

If you can build it that way... by mengel · 2006-07-13 02:47 · Score: 2, Insightful

... the fastest implementation is always the lean, event-loop (e.g. while/select loop) version that never blocks. This is because when a threaded implementation works properly on a single CPU, that's what it ends up doing anyway -- code in each thread runs, until it does something that blocks that thread, and another thread wakes up... If you split out those same code segments into a common event loop, you get rid of the thread context-switch overhead. You can even break up long event-handlers with "Idle events" -- you do part of the work, and send yourself an "idle event" to remind yourself to do the next chunk when you don't have new work coming in. This is generally less overhead than a thread time-slicing context switch.
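A minimal sketch of such a loop, with "idle events" modelled as generators that do one chunk of work per turn. `run_loop` and `chunked_sum` are hypothetical names; I/O callbacks are stored as the `data` argument of `selectors` registrations.

```python
import selectors
import socket
from collections import deque


def run_loop(selector, idle_jobs):
    """Dispatch ready I/O first; when nothing is ready, run one chunk of idle work."""
    while selector.get_map() or idle_jobs:
        events = []
        if selector.get_map():
            # Poll without blocking while idle work is pending, otherwise sleep.
            events = selector.select(0 if idle_jobs else None)
        for key, _ in events:
            key.data(key.fileobj)            # call the registered handler
        if not events and idle_jobs:
            job = idle_jobs.popleft()
            try:
                next(job)                    # do one chunk of the long job
                idle_jobs.append(job)        # re-queue it: the "idle event"
            except StopIteration:
                pass                         # job finished


def chunked_sum(numbers, chunk, log):
    """A long computation split into chunks; yields control between chunks."""
    total = 0
    for start in range(0, len(numbers), chunk):
        total += sum(numbers[start:start + chunk])
        log.append(("chunk", total))
        yield
```

Handlers must never block, and each chunk must be short, since nothing else runs while one of them is executing.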
If you then want to take advantage of multiple CPUs efficiently, you need to look at the event handler code in the above implementation, and see if you have race conditions if two branches are run in parallel. If not, you can just fork a couple of processes and/or threads to tag-team the same event stream, and the code just cruises and uses more CPUs.
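Forking several workers that share one event stream can be sketched like this. It assumes Linux, where AF_UNIX `SOCK_SEQPACKET` sockets preserve message boundaries so each `recv()` hands exactly one job to whichever worker is free; `run_workers` is a hypothetical name, and jobs are assumed to be non-empty byte strings.

```python
import os
import socket


def run_workers(jobs, handler, nworkers=2):
    """Fork `nworkers` processes that all pull jobs from the same socket."""
    job_tx, job_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    res_rx, res_tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    pids = []
    for _ in range(nworkers):
        pid = os.fork()
        if pid == 0:
            try:
                job_tx.close()
                res_rx.close()
                while True:
                    job = job_rx.recv(4096)   # the kernel picks which worker gets it
                    if not job:               # empty read: the parent closed job_tx
                        break
                    res_tx.send(handler(job))
            finally:
                os._exit(0)
        pids.append(pid)
    job_rx.close()
    res_tx.close()

    for job in jobs:                          # fine for small batches; a real server
        job_tx.send(job)                      # would interleave sending and receiving
    results = [res_rx.recv(4096) for _ in jobs]
    job_tx.close()                            # signals end-of-stream to the workers
    for pid in pids:
        os.waitpid(pid, 0)
    return results
```

Results arrive in completion order, not submission order, which is only safe when the handlers are free of the race conditions discussed above.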
If there are race conditions between branches, you take those branches that need to share a resource, make a single separate thread/process for those handlers, and forward the required events to that thread/process so they get handled sequentially.
This style of code is often harder to design than a more obvious multiple-threads type implementation, but it is faster (and often easier to maintain) when properly done. In either case, the source of obscure bugs is race conditions that are overlooked.

--
- "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'