Building a Better Webserver
msolnik writes: "The guys over at Aces' Hardware have put up a new article going over the basics, and not-so-basics, of building a new server. This is a very informative, I think everyone should devote 5 minutes and a can of Dr Pepper to this article."
Microsoft has written several white papers of this sort already. Of course, they're Microsoft, so that means I can kiss my +2 bonus goodbye. Seriously, though.
If you fall off a building, go real limp, because maybe you'll look like a dummy and people will be like hey, free dummy
but I figure it's best to spend the money early on, get a good setup going that can handle high volumes
Throwing money at the problem is exactly the WRONG approch. You need to start by spending time PLANNING and RESEARCHING the best way to do things.
For example, if you are setting up a dynamic site like ./, which is serving 100 pages/second. It obviously needs to be dynamic, so you need a database to store all the comments in.
There are two ways to do this, one is to serve content straight out of the database, but this means that for every page you serve up, there needs to be a database query. (the database queries are the expensive part in terms of time it takes to serve a page). The other way would be to serve the articals as static pages which are generated every minute or so by a process on the database and pushed down to the web server, which serves these up as static pages.
The advantage of this is that insted of 100 database queries per minute, you end up with, maybe 10 queries per minute to populate the static pages. Sure, you site is no longer 100% dynamic, but it is a whole lot faster, and you have saved thousands of dollars to boot!
This is just one small off-the-top-of-my-head example of where PLANNING sould become way before spending any money.
One thing that does seem to work against the onslaught is a throttling webserver. If you haven't got the bandwidth etc to serve a sudden onslaught of requests, probably the best thing to do is to just start 503'ing -- at least people get a quick message 'come back later' instead of just dead air.
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
Actually, that's a terribly wasteful way to go. If you work on an easily-scalable infrastructure, then you can pretty much purchase capacity as it's needed, which not only frees up capital for a longer time, you end up spending a lot less, as the price of computers is always dropping, and the performance is always going up.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
I am amazed at how people buy into the myth of cheap PC?s. Yes, if you are technically oriented and are not running critical applications, a cheap PC will be ok. On the other hand, I have been involved with several enterprises in which my employer insisted on going with cheap PC?s at the expense of short- and long-term productivity. One certainly cannot get a server class PC for $500, and there is few if any available for $1000. I would not say that a Blade would make a good office machine, but it seems to be a good choice for a server.
One other factor to consider is that the gzip transfer encoding compresses much better than the algorithm in the modem. Part of this is the algorithm with its larger dictionary size, the other part is the `pure' data stream being fed to it. It is just the html, not the html interspersed with ppp, ip, and tcp headers.
If you haven't noticed by now, Ace's Hardware has a neat little indicator on each page that shows time processing and queue time it spent getting to you (very bottom left-hand corner of each page). Most are about 74ms - 112ms for me. This, plus the result of some pings and traceroutes leads me to belive they're heavily BANDWIDTH bound right now, not CPU bound. I do hope Ace puts up a summary of the Slashdot effect as well as some other data for us to pour over. Some MRTG router graphs of the bandwidth usage would be *really* nice, too.