High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
There is no useful information in that infomercial. They seem to have judged "reliability" from vendor brochures and a couple of days of testing; reliability is when your uptime is > 1 year.
This article should be called "M. Joe-Average-Overclocker Builds A Web Server".
This quote is funny:
That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.
It's well known that single CPU computers can't handle simultaneous queries, eh!
Every time that you click on a link and get bumped back to the front page here on Slashdot, it's a failure of mysql. So much for high-performance.
Why hasn't Slashdot changed to postgresql?
I thought this was a good question, if slightly off-topic.
-Kevin
Well, they're about slashdotted now. They lost my last request, and it says they have almost 2000 anonymous users. I sometimes think the reason I like reading Slashdot isn't because of the great links and articles, but instead because I like being a part of the goddamned Slashdot effect. :)
Which brings me to the point. Ya know, about the only site that can handle the Slashdot effect is Slashdot. So maybe Taco should write an article like this (or maybe he has?). The Slashdot guys know what they're doing, we should pay attention. Although I find it interesting that when slashdot does "go down," the only way I know is because for some reason it's telling me I have to log in (which is a lot nicer than Squid telling me the server's gone).
--
Daniel
-Kevin
Big sites, really big sites, put caching in the application. The biggest thing to cache is session data, easy if you're running a single box but harder if you need to cluster (and you certainly do need to cluster if you're talking about a high-volume site; nobody makes single machines powerful enough for that). Clustering means session affinity and that means more complicated software. (Aside: Is there any open source software that manages session affinity yet?)
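As a rough illustration of what session affinity amounts to, here is a minimal sketch: hash the session ID and always send the same session to the same box, so its cached session data stays in one place. The backend names and the `pick_backend` helper are made up for illustration, and a real balancer also has to cope with boxes joining and leaving the pool.

```python
import hashlib

# Hypothetical backend pool; the names are illustrative only.
BACKENDS = ["web1", "web2", "web3"]

def pick_backend(session_id: str, backends=BACKENDS) -> str:
    """Map a session ID to one backend so its session data stays on one box."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an integer, then wrap to the pool size.
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]

if __name__ == "__main__":
    # The same session ID always lands on the same backend.
    print(pick_backend("session-1234"))
```

The weakness of plain modulo hashing is that changing the pool size remaps most sessions, which is part of why this "means more complicated software" in practice.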
Frankly speaking, Intel-based hardware would not be my first choice for building a high-volume site (although "millions of page views per month" is really only a moderate volume site; sites I have worked on do millions per /day/). It would probably be my third or fourth choice. The hardware reliability isn't really the problem, it can be good enough, the issue is single box scalability.
To run a really large site you end up needing hundreds or even thousands of Intel boxes where a handful of midrange Suns would do the trick, or even just a couple of high-end Suns or IBM mainframes. Going the many-small-boxes route your largest cost ends up being maintenance. Your people spend all their time just fixing and upgrading boxes. Upgrading or patching in particular is a pain in the neck because you have to do it over such a broad base. It's what makes Windows very impractical as host for such a system; less so for something like Linux because of tools like rdist, but even so you have to do big, painful upgrades with some regularity.
What you need to do is find a point where the box count is low enough that it can be managed by a few people and yet the individual boxes are cheap enough that you don't go broke.
These days the best machines for that kind of application are midrange Suns. It will probably be a couple of years before Intel-based boxes are big and fast enough to realistically take that away ... not because there isn't the hardware to do it (though such hardware is, as yet, unusual) but because the available operating systems don't scale well enough yet.
jim frost
jimf@frostbytes.com
I seem to remember that there was an article just after the WTC attacks last year, which discussed how Slashdot had handled the massive surge in traffic after other online sites went down.
From memory it involved switching to static pages, and dropping gifs, etc.
Unfortunately the search engine on Slashdot really sucks - so I couldn't find the piece in question.
What do you think about handling capacity? Do you see sites with a lot of spare capacity? We'd have trouble meeting demand if we lost a server during prime hours (and it happens).
-Kevin
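One rough way to put numbers on the spare-capacity question above, as a sketch only: size the pool so that peak load still fits after losing a box. The function names and the figures in the usage lines are made-up placeholders, not measurements from any site in this thread.

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float, spares: int = 1) -> int:
    """Servers required to carry peak load even with `spares` boxes down."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(peak_rps / per_server_rps) + spares

def survives_loss(n_servers: int, peak_rps: float, per_server_rps: float) -> bool:
    """True if the pool still covers peak load after losing one server."""
    return (n_servers - 1) * per_server_rps >= peak_rps

if __name__ == "__main__":
    # Hypothetical numbers: 900 requests/sec at peak, 250 requests/sec per box.
    print(servers_needed(900, 250))
    print(survives_loss(4, 900, 250))
```

A pool that only just covers peak load with every box up is exactly the "trouble meeting demand if we lost a server" situation.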
3) replicate your databases to all machines so db access is always LOCAL
This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.
This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.
But 2.5 million hits a day is still just a moderate volume site to me. One of the sites I worked on sees in excess of a hundred million hits per day these days; it was up over ten million hits per day back in 1998.
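To put those daily figures on a common footing, here is a small sketch that converts hits per day into an average request rate. It assumes traffic spread evenly over 24 hours, which real sites never see, so peak rates will be noticeably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rps(hits_per_day: float) -> float:
    """Average requests per second implied by a daily hit count."""
    return hits_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    for label, hits in [("2.5 million/day", 2.5e6),
                        ("10 million/day", 10e6),
                        ("100 million/day", 100e6)]:
        print(f"{label}: about {average_rps(hits):.0f} requests/sec on average")
```

That works out to roughly 29, 116 and 1,157 requests per second respectively.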
I don't happen to know what Slashdot does for volume, but Slashdot is a very simplistic site when it comes to content production. Each page render doesn't take much horsepower and sheer replication can be used effectively. Things get more complicated when you're doing something like trying to figure out what stuff a user is likely to buy given their past buying history and/or what they're looking at right now.
If you really think a 4-way Intel box is equivalent to a 12-way Sun, well, it's clear you don't know what you're talking about. You're wrong even if all you're talking about is CPU, and of course I/O bandwidth is what makes or breaks you -- and there's no comparison in that respect.
jim frost
jimf@frostbytes.com
I'd post sooner, but it took forever to get to the article.. here are my thoughts...
First off SCSI.
IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.
Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.
I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the downtime costs are measured in minutes... Mission critical, failure is not an option.
OS
Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my Counter-Strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that cracked my back to just port another box to Linux and run it there.
Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use Slackware (might be moving to Gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an X server, Mozilla, Tux Racer, or anything other than what it needs to deliver content and get new stuff up to the server.
CPUs (really a chassis problem):
I've owned AMD's MP and Intel's Xeon dual boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the overheating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.
+++ UGUCAUCGUAUUUCU
Not so...
You can cache with technologies like Sleepycat's DBM (db3).
We have a PHP application that caches lookup tables on each local server. If it can't find the data in the local cache, then it hits our PostgreSQL database. The local DBM cache gets refreshed every hour.
Typical comparison
-------------------
DB access time for query:  .02 secs
Local cache (db3) time:    .00003 secs
Web server load dropped from a typical 0.7 to an acceptable 0.2, and the load on the DB server dropped like a rock! This is with over a million requests (no graphics, just GETs to the PHP script) every day.
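The pattern described here (check a local cache first, fall back to the database on a miss, throw the cache away every hour) can be sketched roughly as follows. This is Python rather than the poster's PHP, it uses an in-memory dict instead of db3, and `fetch_from_db` is a hypothetical stand-in for the real PostgreSQL query.

```python
import time

class LocalLookupCache:
    """Cache-aside lookup table that is emptied once per refresh interval."""

    def __init__(self, fetch_from_db, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch_from_db   # hypothetical callable that queries the DB
        self._ttl = ttl_seconds       # one hour, as in the post
        self._clock = clock           # injectable so the refresh can be tested
        self._data = {}
        self._loaded_at = clock()

    def get(self, key):
        now = self._clock()
        if now - self._loaded_at >= self._ttl:
            # Hourly refresh: drop everything and start refilling on demand.
            self._data.clear()
            self._loaded_at = now
        if key not in self._data:
            # Cache miss: go to the database and remember the answer.
            self._data[key] = self._fetch(key)
        return self._data[key]

if __name__ == "__main__":
    cache = LocalLookupCache(lambda key: f"row for {key}")
    print(cache.get("country:NL"))  # first call goes to the "database"
    print(cache.get("country:NL"))  # second call is served locally
```

Taking the two timings in the post (.02 secs and .00003 secs), a local hit is on the order of several hundred times cheaper than the query, which fits with the load drop on both the web and DB servers.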
We also tuned the heck out of Apache (Keepalive, # of children, life of children etc).
Some other things we realized after extensive testing:
1. Apache 2.0 sucks big time! Until modules like PHP and mod_perl are properly optimized, there's not much point in moving there.
2. AolServer is great for Tcl, but not for PHP or other plugin technologies
Because of all these changes, we were able to switch from a backhand cluster of 4 machines, back down to a single dual processor machine, with another machine available on hot standby. Beat that!
We serve up between 5 and 7 million pageviews daily to up to 100,000 individual IPs.
Decent speed to me is one in which the server is no longer the bottleneck; in other words, serving up dynamic content you should be able to saturate the pipe that you are connected to.
I have never replaced the power supply because of energy costs; it simply isn't a factor in the overall scheme of things (salaries, bandwidth, amortization of equipment).
500-700 MHz machines are fine for most medium volume sites. I would only consider a really fast machine to break a bottleneck, and I'd have a second one on standby in case it burns up.
MP3 Search Engine
Red Hat is a pain to strip down to a bare minimum web server; I prefer OpenBSD [openbsd.org]. Sleek and elegant like the early days of Linux distros.
Huh?
for i in `rpm -qa|grep ^mod_`;do rpm -e $i;done
rpm -e apache
cd ~/src/apache.xxx
./configure --prefix=/usr/local/apache \
--enable-rule=SHARED_CORE \
--enable-module=so
make
make install
With mod_so (DSO - Dynamic Shared Object) support, module installation is trivial.
"A mind is a terrible thing to taste."
We're talking about a totally different scale, really.
jim frost
jimf@frostbytes.com
The mere fact that they recommended 7200 rpm Western Digital drives for their high performance system gives me the impression they haven't a clue.
I disagree with the assertion that a 10,000 rpm SCSI drive is more prone to failure than a 7,200 IDE drive because it "moves faster". I've had far more failures with cheap IDE drives than with SCSI drives. Not to mention that IDE drives work great with minor loads, but when you start really cranking on them, the bottlenecks of IDE start to haunt the installation.
In a surprising number of cases, it really isn't. For example, storing user preferences for visiting a given web page; there is never a case where you need to relate the different users to each other. The powerful aggregation abilities of relational databases are irrelevant, so why incur the overhead (performance-wise, cost-wise, etc.)?
Even when aggregating such information is useful, I've often found off-line duplication of the information to databases (which you can then query the hell out of, without affecting the production system) a better way to go.
If a flat file will do the job, use that instead of a database.
Love many, trust a few, do harm to none.