High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
I'd suggest everybody with the need of a high-performance web server to try out
fnord. It's extremely small, and pretty fast (without any special performance hacks!), see here.
A monkey is doing the real work for me.
Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.
With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.
In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
-- Ed Avis ed@membled.com
.. if their webservers are as reliable as the ones in the article..
:P
i guess there's only one way to find out..
slashdotters! advance!
There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.
This article should be called "M. Joe-Average-Overclocker Builds A Web Server".
This quote is funny:
That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.
It's well known that single CPU computers can't handle simultaneous queries, eh!
If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.
"three times?" Can somebody point me to some evidence for this sort of rather bald assertion?
The article seemed way too focused on hardware.
Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.
* I prefer SCSI over IDE
* RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.
* I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).
I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.
SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
-- Ed Avis ed@membled.com
Step one: Submit story on high performance web servers.
Step two: ???
Step three: Die of massive slashdotting, loss of reputation and business
Still, if someone has a link to a cache...
Karma:This parrot is dead! (and so is the joke.)
... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.
/content/article/1549/ HTTP/1.0
[~] edwin@topaz>time telnet www.hardwareanalysis.com 80
Trying 217.115.198.3...
Connected to powered.by.nxs.nl.
Escape character is '^]'.
GET
Host: www.hardwareanalysis.com
[...]
Connection closed by foreign host.
real 1m21.354s
user 0m0.000s
sys 0m0.050s
Do as we say, don't do as we do.
bash$
Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.
This implies that you shouldn't store servers in high altitudes, because they move faster up there due to earth rotation.
Hmmm, I think we know now why these Mars missions tend to fail so often.
Owner of a Mensa membership card.
Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!
According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:
Draw your own conclusions.
Furry cows moo and decompress.
I don't understand.
Their article is about building a high performance web server, and they tell people to use Apache.
Apache is featureful, but it has never been designed to be fast.
Zeus is designed for high performance.
The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.
{{.sig}}
Have a good weekend,
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
I'd post sooner, but it took forever to get to the article.. here are my thoughts...
First off SCSI.
IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.
Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.
I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the down time cost are measured in minutes... Mission critical, failure is not an option.
OS
Lets assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack is was the straw that cracked my back to just port another box to Linux and run it there.
Red Had does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that are bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use slackerware (might be moving to gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an xserver, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.
CPU's (really a chassis problem):
I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the over heating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.
+++ UGUCAUCGUAUUUCU
It's pretty clear that whomever wrote that article has never run a really high-volume web site.
I've designed and implemented sites that actually handle millions of dynamic pageviews per day, and they look rather different from what these guys are proposing.
A typical configuration includes some or all of:
- Firewalls (at least two redundant)
- Load balancers (again, at least two redundant)
- Front-end caches (usually several) -- these cache entire pages or parts of pages (such as images) which are re-used within some period of time (the cache timeout period, which can vary by object)
- Webservers (again, several) - these generate the dynamic pages using whatever page generation you're using -- JSP, PHP, etc.
- Back-end caches (two or more)-- these are used to cache the results of database queries so you don't have to hit the database for every request.
- Read-only database servers (two or more) -- this depends on the application, and would be used in lieu of the back end caches in certain applications. If you're serving lots of dynamic pages which mainly re-use the same content, having multiple, cheap read-only database servers which are updated periodically from a master can give much higher efficiency at lower cost.
- One clustered back-end database server with RAID storage. Typically this would be a big Sun box running clustering/failover software -- all the database updates (as opposed to reads) go through this box.
And then:
- The entire setup duplicated in several geographic locations.
If you build -one- server and expect it to do everything, it's not going to be high-performance.