High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.
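To put "millions of page views every month" in perspective, here is some rough arithmetic that converts a monthly page-view count into a request rate. It is a sketch under stated assumptions. It assumes a 30-day month, a hypothetical number of requests per page (HTML plus images) and a hypothetical peak-to-average ratio. None of these figures come from the article.

```python
# Back-of-envelope traffic estimate. All inputs are illustrative assumptions.
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def average_requests_per_second(page_views_per_month: int, requests_per_page: int = 1) -> float:
    """Average request rate implied by a monthly page-view figure."""
    return page_views_per_month * requests_per_page / SECONDS_PER_MONTH


def peak_requests_per_second(
    page_views_per_month: int, requests_per_page: int = 1, peak_factor: float = 10.0
) -> float:
    """Average rate scaled by an assumed (hypothetical) peak-to-average ratio."""
    return average_requests_per_second(page_views_per_month, requests_per_page) * peak_factor


if __name__ == "__main__":
    # 10 million page views a month, ~20 requests per page, tenfold peak (all assumed)
    print(round(average_requests_per_second(10_000_000), 2))
    print(round(peak_requests_per_second(10_000_000, 20, 10.0)))
```

With 10 million page views a month, the average comes to under 4 pages per second. Even with 20 requests per page and a tenfold peak, that is only about 770 requests per second. For static files that already sit in the disk cache, handling that rate is mostly a question of network bandwidth.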
With dynamically generated web content it's different, of course. But there you will normally be fetching from a database to generate the web pages, in which case you should consult articles on speeding up database access.
In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
-- Ed Avis ed@membled.com
If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.
"three times?" Can somebody point me to some evidence for this sort of rather bald assertion?
The article seemed way too focused on hardware.
Anyone who's ever worked on a big server in this cash-strapped world will know that you need to squeeze every last ounce of capacity out of Apache and your web applications.
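One common rule of thumb for Apache's prefork MPM is to cap MaxClients (renamed MaxRequestWorkers in Apache 2.4) so that all child processes fit in RAM at the same time. A server that starts swapping slows down dramatically. The sketch below does only that division. The RAM, reserved memory and per-child sizes are hypothetical inputs that you would measure on your own box, for example from the resident size of the running httpd processes.

```python
def estimate_max_clients(total_ram_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Rough upper bound on prefork children that fit in RAM without swapping.

    total_ram_mb: physical memory in the box (hypothetical input)
    reserved_mb:  memory kept for the OS, database, caches, etc. (assumed)
    per_child_mb: measured resident size of one Apache child (assumed)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Integer division: a partial child still needs a whole process's memory.
    return max(available // per_child_mb, 0)


if __name__ == "__main__":
    # e.g. 2 GB box, 512 MB reserved, 20 MB per child -> candidate MaxClients value
    print(estimate_max_clients(2048, 512, 20))
```

This gives a starting point rather than a tuned value. Per-child memory grows with modules like mod_perl or mod_php, so re-measure after any change to the application.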
I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.
SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
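The claim can be checked with Amdahl's law. The sketch below assumes a job that takes 1.0 time unit on one full-speed CPU, of which a fraction p can run in parallel. It ignores memory bandwidth, caches and SMP locking overhead, so it models only the raw CPU trade-off.

```python
def run_time(parallel_fraction: float, cpus: int, cpu_speed: float) -> float:
    """Time for a job that takes 1.0 units on one full-speed CPU (Amdahl's law).

    parallel_fraction: share of the work that can be spread across CPUs (p)
    cpus:              number of processors
    cpu_speed:         speed of each processor relative to the fast one
    """
    serial = (1.0 - parallel_fraction) / cpu_speed        # runs on one CPU only
    parallel = parallel_fraction / (cpus * cpu_speed)     # split across all CPUs
    return serial + parallel


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        one_fast = run_time(p, 1, 1.0)
        two_slow = run_time(p, 2, 0.5)
        print(f"p={p}: one fast CPU {one_fast:.2f}, two half-speed CPUs {two_slow:.2f}")
```

Two half-speed CPUs take 2 - p units, compared with 1 unit for the single fast CPU, so they only break even when the work is perfectly parallel (p = 1). A web server handling many independent requests can come close to that. This narrows the gap, but it does not reverse it in this model.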
-- Ed Avis ed@membled.com
-Kevin
It's pretty clear that whoever wrote that article has never run a really high-volume web site.
I've designed and implemented sites that actually handle millions of dynamic pageviews per day, and they look rather different from what these guys are proposing.
A typical configuration includes some or all of:
- Firewalls (at least two redundant)
- Load balancers (again, at least two redundant)
- Front-end caches (usually several) -- these cache entire pages or parts of pages (such as images) which are re-used within some period of time (the cache timeout period, which can vary by object)
- Webservers (again, several) -- these generate the dynamic pages using whatever page-generation technology you use -- JSP, PHP, etc.
- Back-end caches (two or more) -- these are used to cache the results of database queries so you don't have to hit the database for every request.
- Read-only database servers (two or more) -- whether you need these depends on the application, and in certain applications they would be used in lieu of the back-end caches. If you're serving lots of dynamic pages which mainly re-use the same content, having multiple cheap read-only database servers, updated periodically from a master, can give much higher efficiency at lower cost.
- One clustered back-end database server with RAID storage. Typically this would be a big Sun box running clustering/failover software -- all the database updates (as opposed to reads) go through this box.
And then:
- The entire setup duplicated in several geographic locations.
If you build -one- server and expect it to do everything, it's not going to be high-performance.
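The back-end cache and read-only database tiers described above can be sketched in a few lines. This is a minimal, single-process illustration that uses assumed names. TTLCache and QueryRouter are hypothetical, and the database connections are stand-in callables. It does not reflect how any particular caching product or database driver works. Reads are served from a cache with a fixed timeout, and cache misses go round-robin to the read-only servers. Writes always go to the single master.

```python
import itertools
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple

# Hypothetical names throughout; callables stand in for real DB connections.
Backend = Callable[[str], Any]


class TTLCache:
    """Back-end cache: remembers query results for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the database entirely
        value = loader()   # miss or expired: go to a database server
        self._store[key] = (now + self.ttl, value)
        return value


class QueryRouter:
    """Sends writes to the single master and spreads reads over read-only replicas."""

    def __init__(self, master: Backend, replicas: List[Backend], cache: TTLCache):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin over read-only servers
        self.cache = cache

    def read(self, sql: str) -> Any:
        # Cached reads never touch a replica until the entry expires.
        return self.cache.get_or_load(sql, lambda: next(self._replicas)(sql))

    def write(self, sql: str) -> Any:
        # All updates go through the master; the cache is NOT invalidated here.
        return self.master(sql)


if __name__ == "__main__":
    def fake_backend(name: str) -> Backend:
        return lambda sql: f"{name}:{sql}"

    router = QueryRouter(fake_backend("master"),
                         [fake_backend("r1"), fake_backend("r2")],
                         TTLCache(ttl=60))
    print(router.read("SELECT * FROM stories"))   # served by r1, then cached
    print(router.read("SELECT * FROM stories"))   # served from the cache
    print(router.write("UPDATE stories SET ..."))  # always the master
```

Note the trade-off that this sketch models. A cached read may be up to `ttl` seconds stale, and a replica may lag behind the master. That is acceptable for pages that mainly re-use the same content, but not for data that must reflect an update immediately. A real deployment would also need thread safety, cache eviction and failover when a replica goes down.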