High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
I'd suggest that anybody who needs a high-performance web server try out fnord. It's extremely small and pretty fast (without any special performance hacks!), see here.
A monkey is doing the real work for me.
The guys use a 10'000 RPM drive for "reliability" and "performance" ... 10k drives are LESS reliable, since they move faster. Moreover, they're not even necessarily that much faster.
Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.
With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.
In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
-- Ed Avis ed@membled.com
.. if their webservers are as reliable as the ones in the article..
:P
i guess there's only one way to find out..
slashdotters! advance!
There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.
This article should be called "M. Joe-Average-Overclocker Builds A Web Server".
This quote is funny:
That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.
It's well known that single CPU computers can't handle simultaneous queries, eh!
If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.
"three times?" Can somebody point me to some evidence for this sort of rather bald assertion?
The article seemed way too focused on hardware.
Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.
* I prefer SCSI over IDE
* RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.
* I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).
I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.
SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
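To put rough numbers on that last claim, here is a back-of-the-envelope sketch in the style of Amdahl's law. The function name and the sample serial fractions are made up for illustration, and the model ignores memory bandwidth, cache effects, and the fact that a busy web server usually has many independent requests to spread across CPUs, so treat it as an approximation only.

```python
def run_time(serial_fraction, cpus, speed):
    """Relative wall-clock time for one unit of work.

    serial_fraction: share of the work that cannot be split across CPUs.
    cpus: number of processors.
    speed: per-CPU speed relative to the single fast CPU (1.0 = full speed).
    """
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / cpus) / speed


# Compare one full-speed CPU against two half-speed CPUs.
for serial in (0.0, 0.1, 0.5):
    one_fast = run_time(serial, cpus=1, speed=1.0)
    two_slow = run_time(serial, cpus=2, speed=0.5)
    print(f"serial={serial:.1f}  1 fast CPU: {one_fast:.2f}  "
          f"2 half-speed CPUs: {two_slow:.2f}")
```

In this model the two slow CPUs only tie the single fast one when the work splits perfectly (serial fraction 0); any serial portion makes them slower.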
-- Ed Avis ed@membled.com
Step one: Submit story on high performance web servers.
Step two: ???
Step three: Die of massive slashdotting, loss of reputation and business
Still, if someone has a link to a cache...
Karma:This parrot is dead! (and so is the joke.)
... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.
[~] edwin@topaz>time telnet www.hardwareanalysis.com 80
Trying 217.115.198.3...
Connected to powered.by.nxs.nl.
Escape character is '^]'.
GET /content/article/1549/ HTTP/1.0
Host: www.hardwareanalysis.com
[...]
Connection closed by foreign host.
real 1m21.354s
user 0m0.000s
sys 0m0.050s
Do as we say, don't do as we do.
bash$
Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.
-Kevin
An article about creating high performance webservers being slashdotted
Microsoft IIS is to webserving as KFC is to healthy eating
Server has nothing to do with it.
10,000 slashdotters * 500k pages = 5 gigs in about an hour.
These figures are both estimates, but you can see that network congestion is obviously more of a bottleneck than their performance server.
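For anyone who wants to check the arithmetic, a quick sketch follows. It assumes decimal units (1k = 1000 bytes) and traffic spread evenly over the hour, neither of which the estimate above spells out; the function name is made up for the example.

```python
def average_mbit_per_s(visitors, page_kbytes, seconds):
    """Average line rate needed to push one page to every visitor."""
    total_bytes = visitors * page_kbytes * 1000  # assumes 1k = 1000 bytes
    return total_bytes * 8 / seconds / 1_000_000


# 10,000 visitors pulling a 500k page each.
total_gb = 10_000 * 500 * 1000 / 1e9
rate = average_mbit_per_s(10_000, 500, 3600)
print(f"{total_gb:.1f} GB in an hour needs about {rate:.1f} Mbit/s sustained")
```

That works out to a bit over 11 Mbit/s on average, which is already more than a 10 Mbit link can carry, and real traffic arrives in bursts rather than evenly.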
Pain lasts, kid. It's how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!
According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:
Draw your own conclusions.
Furry cows moo and decompress.
1. goto here :)
2. click buy
3. upon delivery open box and plug in
4. turn on Apache with the click of a button
5. happily serve up lots of content
6. (optional) wait for attacks from ppl for suggesting using apple hardware...
I don't understand.
Their article is about building a high performance web server, and they tell people to use Apache.
Apache is featureful, but it has never been designed to be fast.
Zeus is designed for high performance.
The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.
The article is about *WEB* high performance.
I don't see your point. "ping" has never been designed to benchmark web servers AFAIK.
My servers don't answer to "ping". Does it mean that the web server is down? Nope... it's up and running...
"ping" is not an all-in-one magic tool. By using "ping" you can test a "ping" server. Nothing else.
Have a good weekend,
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
http://www.microsoft.com/backstage/whitepaper.htm
-Kevin
1) use multiple machines / round robin DNS
2) use decent speed hardware but stay away from
'top of the line' stuff (fastest processor,
fastest drives) because they usually are not
more reliable
3) replicate your databases to all machines so
db access is always LOCAL
4) use a front end cache to make sure you use
as little database interaction as you can
get away with (say flush the cache once per
minute)
5) use decent switching hardware and routers, no
point in having a beast of a server hooked up
to a hub now is there...
that's it ! reasonable price and lots of performance
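Point 4 can be sketched roughly like this: a tiny in-process cache that rebuilds a page at most once per minute. The class name, the `build` callback, and the injectable clock are all invented for the example, and a real front end would more likely be a separate caching proxy than a dict inside the application.

```python
import time


class FrontEndCache:
    """Keep generated pages for ttl_seconds before hitting the database again."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._store = {}            # key -> (expires_at, value)

    def get(self, key, build):
        """Return the cached value for key, calling build() only when stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # still fresh: no database work
        value = build()             # stale or missing: regenerate once
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical usage: render_front_page stands in for the database queries.
def render_front_page():
    return "<html>front page</html>"


cache = FrontEndCache(ttl_seconds=60.0)
page = cache.get("/", render_front_page)
```

With a one-minute lifetime, a page that gets hammered still costs only one database round trip per minute, at the price of content being up to a minute old.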
MP3 Search Engine
- Use lots, and I mean lots of graphics. Cute ones, animated ones, you name it and people expect to see them. Skimping here will hurt your image.
- CSS style sheets may be the way of the future, but just for now make sure you include dozens or even hundreds of font tags, color tags, and tables in your site. Trust us. This has the added benefit of increasing your page file size by at least 30%. You do want a robust site right?
- Make sure you are serving plenty of third party ads! Their bandwidth matters also, and you know the way to make money on the web is by serving lots of "fun" animated ads. This will not slow down the user experience of your site one bit! Those ad people are slick, they know that you are building a high bandwidth / high performance site and will be expecting the traffic.
- A site is not a high performance site until it has withstood the infamous Slashdot effect. You will want to post a link to your site on /. post haste to begin testing.
That should be enough to get you started. Now you too can build a rocking 200K per page site, and having read our hardware guidelines, you can expect it to perform just as well as ours did. One more free tip: Placing a cool dynamic hit counter or traffic meter on your site in a prominent position will encourage casual visitors to hit the reload button again and again, driving the performance of your site through the roof.
Have a good weekend,
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
3) replicate your databases to all machines so
db access is always LOCAL
This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.
This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.
I'd post sooner, but it took forever to get to the article.. here are my thoughts...
First off SCSI.
IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.
Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.
I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the downtime costs are measured in minutes... Mission critical, failure is not an option.
OS
Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that cracked my back to just port another box to Linux and run it there.
Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use slackerware (might be moving to gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an xserver, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.
CPU's (really a chassis problem):
I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the overheating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.
+++ UGUCAUCGUAUUUCU
Their IDE-RAID is actually software RAID. The SCSI myth can go off the shelf, sure, but don't take the RAID myth down.
The Promise FastTrak and Highpoint and a few others are not actually hardware RAID controllers. They are regular controllers with enough firmware to allow BIOS calls to do drive access via software RAID (located in the firmware of the controller), and OS drivers that implement the company's own software RAID implementation at the driver level, thereby doing things like making only one device appear to the OS. Some of the chips have some performance improvements over a purely software RAID solution, such as the ability to do data comparisons between two drives in a mirror during reads, but that's about it. If you ever boot them into a new install of windows without preloading their "drivers", guess what? Your "RAID" of 4 drives is just 4 drives. The hardware recovery options they have are also pretty damned worthless when it comes to a comparison with real RAID controllers - be they IDE or SCSI.
A good solution to the IDE RAID debacle is the controllers by 3Ware (very fine) or the Adaptec AAA series controllers (also pretty fine). These are real hardware controllers with onboard cache, hardware XOR acceleration for RAID 5 and the whole bit.
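For anyone wondering what the XOR hardware is actually accelerating, here is a toy sketch of RAID 5 style parity. The function name and the three tiny "stripes" are made up for illustration; a real controller works on whole disk blocks and rotates the parity across drives, which this does not attempt to show.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data stripes on three drives, parity stored on a fourth.
data = [b"web ", b"serv", b"er!!"]
parity = xor_blocks(data)

# Drive 1 dies: XOR the surviving stripes with the parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every write has to update the parity this way, so without dedicated XOR hardware that work lands on the host CPU.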
Anyway, I'm not really all that taken aback that this webserver is floundering a bit, but seems really responsive when the page request "gets through," so to speak. If it's not running low on physical RAM, it's probably got a lot of processes stuck in D state due to the shit Promise controller. A nice RAID controller would probably have everything the disks are thrashing on in a RAM cache at this point.
~GoRK
Ooh! Ooh! I really want you guys to teach me how to build a high performance webserver! What's that? You can't, because your webserver is down? Curses!
(Obligatory disclaimer for humor-impaired: yes I understand that the slashdot effect is generally caused by lack of bandwidth rather than lack of webserver performance.)
Karma: Incomprehensible (Mostly affected by posting at +5, reading at -1, and metamoderating everything unfair.)
Too bad "millions of page views every month" is simply not even in the realm that would require "High-Performance Web Server"(s). These guys need to come back and write an article once they've served up 5+ million page views per day. Not hits. Page views.
The mere fact that they recommended 7200 rpm Western Digital drives for their high performance system gives me the impression they haven't a clue.
I disagree with the assertion that a 10,000 rpm SCSI drive is more prone to failure than a 7,200 IDE drive because it "moves faster". I've had far more failures with cheap IDE drives than with SCSI drives. Not to mention that IDE drives work great with minor loads, but when you start really cranking on them, the bottlenecks of IDE start to haunt the installation.