Building a Better Webserver
msolnik writes: "The guys over at Ace's Hardware have put up a new article going over the basics, and not-so-basics, of building a new server. This is very informative; I think everyone should devote 5 minutes and a can of Dr Pepper to this article."
"Real multithreading" is really no panacea. See the notes from John Ousterhout's talk, Why Threads Are A Bad Idea (for most purposes).
Consider a user with a typical analog modem that has an average maximum downstream throughput of, say, 5 KB/s. If this user is trying to download the general message board index page, about 200 KB in size (rather small by today's standards), it will require a solid 40 seconds to complete this single download.... To maximize the efficiency of the network itself, we can compress the output stream and thus, compress the site. HTML is often very repetitive, so it's not impossible to reach a very high compression ratio. The 200 KB request mentioned above required 40 seconds of sustained transfer on a 5 KB/s link. If that 200 KB request can be compressed to 15 KB, it will require only 3 seconds of transfer time.
Except that 56 Kbps modems get 5 KBps throughput by compressing the data! If the client and server compress, the modems won't be able to; the net effect is lots of extra work on the server side, and probably no increased throughput for the modem user.
The server might or might not see a decrease in latency, and in the number of sockets needed simultaneously; it depends on how much it can "stuff" the intermediate "pipes". The server will see an overall decrease in bandwidth needed to serve all the pages.
Ironically, broadband customers (who presumably don't have any compression between their clients and Internet servers) will see pages load faster. (And the poor cable modem providers from the previous story will be happy.)
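The transfer-time arithmetic in the quoted passage is easy to check, and a quick experiment shows how well repetitive markup compresses. This is only a sketch: the sample page below is made-up markup, not the actual message board index, and identical rows compress far better than a real page would.

```python
import gzip


def transfer_seconds(size_kb: float, rate_kb_per_s: float) -> float:
    """Seconds needed to move size_kb kilobytes at a sustained rate."""
    return size_kb / rate_kb_per_s


# Figures from the quoted passage: 200 KB page, 5 KB/s link, 15 KB compressed.
uncompressed_time = transfer_seconds(200, 5)  # 40.0 seconds
compressed_time = transfer_seconds(15, 5)     # 3.0 seconds

# Hypothetical repetitive markup standing in for a message board index.
row = b"<tr><td class='subject'>Re: topic</td><td class='author'>user</td></tr>\n"
page = b"<table>\n" + row * 2000 + b"</table>\n"

packed = gzip.compress(page)
ratio = len(page) / len(packed)

if __name__ == "__main__":
    print(f"uncompressed: {uncompressed_time:.0f} s, compressed: {compressed_time:.0f} s")
    print(f"sample page: {len(page)} bytes -> {len(packed)} bytes ({ratio:.0f}:1)")
```

Whether the modem user actually sees that gain depends on how much the modem's own compression was already delivering, which is the point made above.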
Stupid job ads, weird spam, occasional insight at
One thing that does seem to work against the onslaught is a throttling webserver. If you haven't got the bandwidth, etc., to serve a sudden flood of requests, probably the best thing to do is to just start 503'ing -- at least people get a quick 'come back later' message instead of just dead air.
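A minimal sketch of that idea, using Python's standard http.server. The names Throttle, ThrottlingHandler, make_server and the limit of 2 are all made up for illustration, and a real server would pick its limit from measured capacity.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 2  # hypothetical cap on requests served at the same time


class Throttle:
    """Counts requests in flight and refuses new ones past the limit."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)

    def try_enter(self) -> bool:
        # Non-blocking: returns False right away when every slot is taken.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()


THROTTLE = Throttle(MAX_IN_FLIGHT)


class ThrottlingHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes, extra_headers=()):
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if not THROTTLE.try_enter():
            # Over capacity: answer quickly instead of leaving dead air.
            self._reply(503, b"Server busy, come back later.\n",
                        [("Retry-After", "30")])
            return
        try:
            self._reply(200, b"Hello\n")  # stand-in for the real page
        finally:
            THROTTLE.leave()


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Build the server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), ThrottlingHandler)
```

Calling make_server(8080).serve_forever() would start it. The refusal costs only a few header bytes, so the server can keep answering even when it cannot keep serving.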
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
In the part about databases and persistent connections they confuse the issues more than a bit. The real problem is not too many processes, which would automatically make threads look better, but the symmetry among processes -- any request must be servable by every process, so all processes end up with database connections. This is a problem particular to Apache and Apache-like servers, not a fundamental issue with processes and threads.
In my server (fhttpd) I used a completely different idea: processes are still processes, but they can be specialized, and requests that don't run database-dependent scripts are directed to processes that don't have database connections, so reasonable performance is achieved if the webmaster defines different applications for different purposes. I haven't posted any updates to the server's source in the last two years (I was rather busy at the job that I am now leaving), but even the published version 0.4.3, despite lacking the clustering and process management mechanism that I am working on now, performed well in situations where "lightweight" and "heavyweight" tasks were separated.
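To make the specialization idea concrete, here is a rough sketch of routing requests to pools of processes where only one pool holds database connections. It is not fhttpd's configuration or code; the pool names, sizes, and URL prefixes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    workers: int
    holds_db_connection: bool


# Hypothetical pools: many cheap workers, few database-backed ones.
LIGHT = Pool("static", workers=8, holds_db_connection=False)
HEAVY = Pool("forum", workers=2, holds_db_connection=True)

# Hypothetical routing table: only these URL prefixes need the database.
ROUTES = [("/forum/", HEAVY), ("/search", HEAVY)]


def pick_pool(path: str) -> Pool:
    """Send a request to the database pool only if its path needs one."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return LIGHT


def db_connections_needed(pools) -> int:
    """Connections held when only database pools keep one per worker."""
    return sum(p.workers for p in pools if p.holds_db_connection)


def db_connections_if_symmetric(pools) -> int:
    """Connections held when every worker must be able to serve anything."""
    return sum(p.workers for p in pools)
```

With these made-up numbers the specialized layout holds 2 persistent connections, while a symmetric layout with the same 10 workers would hold 10.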
Contrary to the popular belief, there indeed is no God.
If you haven't noticed by now, Ace's Hardware has a neat little indicator on each page that shows the processing time and queue time the page spent getting to you (very bottom left-hand corner of each page). Most are about 74ms - 112ms for me. This, plus the results of some pings and traceroutes, leads me to believe they're heavily BANDWIDTH bound right now, not CPU bound. I do hope Ace puts up a summary of the Slashdot effect as well as some other data for us to pore over. Some MRTG router graphs of the bandwidth usage would be *really* nice, too.