Slashdot Mirror


Should Servers be Mono-Process or Multithreaded?

An anonymous reader wonders: "How would you design the fastest possible Linux-based server application today? A few years ago, the thinking was that multi-threading was not the way to go — instead, high-performance servers used an event-driven, mono-process model (consider lighttpd and haproxy). However, things have changed. Today CPUs have dual cores, and over the next few years this is only likely to increase. Also, the 2.6 Linux kernel has made multi-threading much more efficient. So I'm wondering, does Slashdot think that modern high performance server software should be designed to be multi-threaded, or does it still make more sense to use an event driven, mono-process architecture, despite the advances in the Linux 2.6 threading and the arrival of multi-core CPUs?"

3 of 96 comments (clear)

  1. info by illuminatedwax · · Score: 5, Informative

    Check out the C10K page for a very detailed discussion about this.

    --
    Did you ever notice that *nix doesn't even cover Linux?
  2. Consider Fault Tolerance & Thread Safety Too by thebiss · · Score: 5, Informative

    Since you didn't say what kind of server you're building, I'm going to assume:
    - that you're building a custom-purpose, client-server or message processing application,
    - it needs to be highly parallel to be efficient
    - the language is C, C#, or C++, and not Java (process-based servers in Java?)

    I have done this before, using both processes and threads, for the same application. Consider the impact of application faults on your design, and then consider how hard it will be to create thread-safe code.

    o A highly multithreaded server, where threads are performing complex and/or memory demanding tasks, will be susceptable to complete failure of all running jobs on all threads, if just a single thread SEGFAULTs. And despite your best testing efforts, complex code (1M+ lines) will at some point, somewhere, fail.

    o Threaded code must be thread safe. Static variables, shared data structures, and factories all previously accessed through a singleton must now be protected with guard functions and semaphores. Race conditions need to be considered. Design for this up front. It will be much harder to add it later.

    The Project

    I worked on a team which added an asychronous processing engine to a web application. The engine was responsible for performing memory and time-intensive financial analysis and reporting for 16,000 accountants, so that they could close a large company's financial books. Unlike a webserver triggered by on-line end users, this engine is triggered by events in the company's financial database: once the database raises the "ready" flag, this engine begins running as many reports as it can, as fast as it can, on behalf of the 16K users. The analysis and report code was 2 million lines of C++, running on AIX.

    Process implementation

    The initial implementation used processes. A dispatcher job monitored the database for the ready flag, and then forked children of itself to analyze slices of the data, and generate the reports. One child job was used for each analysis and report pair, and the manager controlled how many jobs ran in parallel, maintaining a scoreboard of which jobs succeeded, and which failed.

    Due to the complexity of the system, failures (core) occasionally occurred. The monitor would record this, retry the failed analysis up to 3 times, and keep a uniquely named core file of the event. Other analysis reports would continue to be generated, otherwise unharmed by the thrashing thread. Approximately once every 90 days, the development team would collect the few cores generated, use the gbd/xldb debugger to determine the cause of failure, and correct the fault.

    The downsides of this? The solution was slowed because couldn't re-use resources like database connections (they were destroyed with each process), and more memory was used than need be. DB2 caching helped somewhat, but potential performance improvements remained.

    Threaded implementation

    In a large company, there are IT standards, and one of the standards at my company is that applications shall never, ever, ever fork(), even if running on a large dedicated machine. After losing the fight against this, my team re-architected the report engine. Largely
    the same as the previous, the new engine waits for the "ready" signal, and then spawns pthreads (POSIX threads) as workers to analyze the data and generate the report. In theory, it was robust.

    The alpha version of this solution immediately failed (cored) during testing. We neglected to identify the less obvious non-thread-safe code in the application, and failed to identify several race conditions. Unlike previous failures, this faults were total: a SEGFAULT in code on one of 20 threads would halt the entire application. And the corefile generated was now huge - it contained a snapshot of memory for all 20 running jobs, instead of just the one of interest.

    Extensive root-cause analysis, design, and restart management solved this, and the current version is as robust, and a good bit faster, than the previous. At a significant price.

    --
    Beware: I believe all are created equal, and have the right to life, liberty, and the pursuit of happiness.
  3. Real life examples by sdfad1 · · Score: 5, Informative

    I cannot speculate, but I can look at what people are doing today. One thing that I have noticed, is the widespread research into, with compelling arguments, for massively multithreaded programming techniques. See Erlang for example. It is designed right from the beginning for this sort of problem - high throughput, high reliability, high uptime telephony networks.

    As a rough benchmark, someone's got this.

    That's an order of magnitude increase in "performance" (depends on what you mean by performance". I thought I'll do a casual informal test of my own, with a decent static file size (instead of the 1 byte used in that benchmark)

    Server Software: Yaws/1.56
    Document Length: 402 bytes

    Concurrency Level: 500
    Time taken for tests: 8.480740 seconds
    Complete requests: 5000
    Requests per second: 589.57 [#/sec] (mean)
    Time per request: 848.074 [ms] (mean)

    Server Software: Apache/2.0.54
    Document Length: 402 bytes

    Concurrency Level: 500
    Time taken for tests: 29.787216 seconds
    Complete requests: 5000
    Requests per second: 167.86 [#/sec] (mean)
    Time per request: 2978.722 [ms] (mean)

    Output edited to get past lameness filter.

    Err crap, I could have sworn the first time I tried this, when Yaws was first installed, its performance was worse! Oh well, perhaps it's something I've inadvertently done since then. Could have been due to my computer reboot (this is a desktop PC). It seems I've proven my point, although I was trying to disprove it. Standard caveats regarding benchmarks apply. Both servers are default Ubuntu installs with no configuration changes - I didn't compile anything manually.

    Additionally it has also been noted that:

    > Linus Torvalds: 100k threads at once is crazy Using Posix style threads, I'd have to agree. Posix threads were just not designed with this level of usage in mind. Which is why concurrent lanugages like Erlang and Mozart/Oz don't use Posix threads.

    Well, that's where it could be headed anyway - a multiprocessor system with green threads (ie simulated threads, like Java ones) implementing massive concurency and redundancy. Some prototypes for systems like this are already available, and being used. Cheers.