High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
I'd suggest that everybody who needs a high-performance web server try out fnord. It's extremely small and pretty fast (without any special performance hacks!), see here.
A monkey is doing the real work for me.
The guys use 10'000 RPM drives for "reliability" and "performance" ... 10k drives are LESS reliable, since they move faster. Moreover, they're not even necessarily that much faster.
Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.
With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.
In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
-- Ed Avis ed@membled.com
.. if their webservers are as reliable as the ones in the article..
:P
i guess there's only one way to find out..
slashdotters! advance!
There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.
This article should be called "M. Joe-Average-Overclocker Builds A Web Server".
This quote is funny:
That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.
It's well known that single CPU computers can't handle simultaneous queries, eh!
If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.
"three times?" Can somebody point me to some evidence for this sort of rather bald assertion?
The article seemed way too focused on hardware.
Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.
* I prefer SCSI over IDE
* RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.
* I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).
Every time that you click on a link and get bumped back to the front page here on Slashdot, it's a failure of mysql. So much for high-performance.
Why hasn't Slashdot changed to postgresql?
I thought this was a good question, if slightly off-topic.
I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.
SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
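A rough way to put numbers on that, as a sketch only: assume some fraction of the work splits perfectly across CPUs and the rest is serial (an Amdahl-style model). The function name and the parallel fractions below are made up for illustration, and the model ignores locking and scheduling overhead, which would only make the SMP box look worse.

```python
def relative_time(parallel_fraction: float, cpus: int, speed: float) -> float:
    """Run time relative to one full-speed CPU (Amdahl-style model).

    parallel_fraction: share of the work that splits perfectly across CPUs
    cpus:              number of processors
    speed:             per-CPU speed relative to the single full-speed CPU
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cpus) / speed


# Compare two half-speed CPUs against one full-speed CPU.
for p in (0.5, 0.9, 1.0):
    ratio = relative_time(p, cpus=2, speed=0.5)
    print(f"parallel fraction {p:.1f}: dual half-speed box takes {ratio:.2f}x as long")
```

In this model the dual box only catches up when the workload is perfectly parallel, and it never comes out ahead.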
-- Ed Avis ed@membled.com
Step one: Submit story on high performance web servers.
Step two: ???
Step three: Die of massive slashdotting, loss of reputation and business
Still, if someone has a link to a cache...
Karma:This parrot is dead! (and so is the joke.)
... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.
[~] edwin@topaz>time telnet www.hardwareanalysis.com 80
Trying 217.115.198.3...
Connected to powered.by.nxs.nl.
Escape character is '^]'.
GET /content/article/1549/ HTTP/1.0
Host: www.hardwareanalysis.com
[...]
Connection closed by foreign host.
real 1m21.354s
user 0m0.000s
sys 0m0.050s
Do as we say, don't do as we do.
bash$
Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.
-Kevin
An article about creating high performance webservers being slashdotted
Microsoft IIS is to webserving as KFC is to healthy eating
Server has nothing to do with it.
10,000 slashdotters * 500k pages = 5gigs in about an hour.
these figures are both estimates, but you can see that network congestion is obviously more of a bottleneck than their performance server.
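Spelling the arithmetic out as a quick sketch, using the same two estimates (10,000 visitors, 500 kB per page) and assuming decimal units and traffic spread evenly over the hour, which real traffic never is:

```python
def average_mbit_per_s(visitors: int, page_kbytes: float, seconds: float) -> float:
    """Average link rate needed to ship one page to every visitor."""
    total_bits = visitors * page_kbytes * 1000 * 8  # kB -> bytes -> bits
    return total_bits / seconds / 1_000_000


total_gb = 10_000 * 500 / 1_000_000  # kB -> GB
rate = average_mbit_per_s(10_000, 500, 3600)
print(f"{total_gb:.0f} GB total, {rate:.1f} Mbit/s averaged over an hour")
```

That average alone is more than a 10 Mbit/s link can carry, and the peak in the first few minutes will be well above the average.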
Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!
According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:
Draw your own conclusions.
Furry cows moo and decompress.
> One processor at full speed is almost always better than two at half the speed.
You can safely drop that 'almost'.
1. goto here :)
2. click buy
3. upon delivery open box and plug in
4. turn on Apache with the click of a button
5. happily serve up lots of content
6. (optional) wait for attacks from ppl at suggesting using apple hardware...
I don't understand.
Their article is about building a high performance web server, and they tell people to use Apache.
Apache is featureful, but it has never been designed to be fast.
Zeus is designed for high performance.
The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.
The article is about *WEB* high performance.
I don't see your point. "ping" has never been designed to benchmark web servers AFAIK.
My servers don't answer to "ping". Does it mean that the web server is down? Nope... it's up and running...
"ping" is not an all-in-one magic tool. By using "ping" you can test a "ping" server. Nothing else.
Have a good weekend,
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
http://www.microsoft.com/backstage/whitepaper.htm
-Kevin
Servers will generally carry on answering pings even if they're heavily overloaded. Lag or missing packets generally point to a congested or bad link.
1) use multiple machines / round robin DNS
2) use decent speed hardware but stay away from
'top of the line' stuff (fastest processor,
fastest drives) because they usually are not
more reliable
3) replicate your databases to all machines so
db access is always LOCAL
4) use a front end cache to make sure you use
as little database interaction as you can
get away with (say flush the cache once per
minute)
5) use decent switching hardware and routers, no
point in having a beast of a server hooked up
to a hub now is there...
that's it ! reasonable price and lots of performance
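Point 4 is the one that is easiest to show in code. A minimal sketch, assuming an in-process cache where each entry expires 60 seconds after it was built (slightly different from flushing everything once a minute, but the same idea). TTLCache and render_front_page are hypothetical names, and the page builder is a stand-in for whatever actually hits the database:

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny front-end cache: an entry is rebuilt once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # still fresh: no database work
        value = loader()               # missing or expired: rebuild once
        self._store[key] = (now, value)
        return value


db_hits = 0


def render_front_page() -> str:
    """Stand-in for a page built from database queries (hypothetical)."""
    global db_hits
    db_hits += 1
    return f"<html>front page, build #{db_hits}</html>"


cache = TTLCache(ttl_seconds=60.0)
for _ in range(1000):
    page = cache.get("/", render_front_page)
print(f"1000 requests, {db_hits} database-backed render(s)")
```

As long as the thousand requests arrive within the same minute, only the first one touches the database.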
MP3 Search Engine
I was really excited to see this article, because oddly enough I am seriously considering setting up my own webserver. In fact I am thinking of running slashcode. So far everyone has been saying that the article generally sucks. So the question remains: where should I start? I was thinking of buying a few of my company's used PCs and building a cluster... that scares me a bit, as I'm not a computer genius, but I can get a great deal on these computers (between 5 and 10 500 MHz wintel computers).
OK, I know that was rambling, so to recap simply: is it better to go with an expensive single MP solution like the article, or with a cheaper cluster of slow/cheap computers?
Business News and Resources: www.usasource.net
What kind of 'high performance' web server uses back-leveled software? Apache 2.x may not be totally API compliant, but it certainly provides more than 1.3x in terms of performance.
I am glad they used an IDE RAID, however. The SCSI myth can now go on the shelf.
- Use lots, and I mean lots of graphics. Cute ones, animated ones, you name it and people expect to see them. Skimping here will hurt your image.
- CSS style sheets may be the way of the future, but just for now make sure you include dozens or even hundreds of font tags, color tags, and tables in your site. Trust us. This has the added benefit of increasing your page file size by at least 30%. You do want a robust site right?
- Make sure you are serving plenty of third party ads! Their bandwidth matters also, and you know the way to make money on the web is be serving lots of "fun" animated ads. This will not slow down the user experience of your site one bit! Those ad people are slick, they know that you are building a high bandwidth / high performance site and will be expecting the traffic.
- A site is not a high performance site until it has withstood the infamous Slashdot effect. You will want to post a link to your site on /. post haste to begin testing.
That should be enough to get you started. Now you too can build a rocking 200K per page site, and having read our hardware guidelines, you can expect it to perform just as well as ours did. One more free tip: Placing a cool dynamic hit counter or traffic meter on your site in a prominent position will encourage casual visitors to hit the reload button again and again, driving the performance of your site through the roof.
Thanks, I wish I hadn't posted early in this article so I could use my mod points. Now, my only question is how fast is decent speed? I'm about to build my own server (actually I'm going to have some help, but I want to at least sound like I know what I'm doing), nothing fancy. I don't expect a huge hit count or anything, so would using older (500-750 MHz) second-hand computers, with properly upgraded memory and storage, work? Also, would you recommend replacing the power supply? One of the guys who's helping me swears that will save me money in the long run on energy costs, but I don't know if it's worth the cost.
Business News and Resources: www.usasource.net
Does building this high performance web server prevent you from being slashdotted?
Draw your own conclusions.
How nice of them to share that information.
The obvious conclusion is that my cable modem could take a minor slashdotting if Cox did not crimp the upload and block ports. Information could be free but thanks to the local Bell's efforts to kill DSL things will get worse until someone fixes the last mile problem.
The bit about IDE being faster than SCSI was a shocker. You would think that some lower RPM SCSIs set to stripe would have greater speed and equivalent heating. The good IDE performance is good news.
Friends don't help friends install M$ junk.
> 3) replicate your databases to all machines so
> db access is always LOCAL
This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.
This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.
The company I work for successfully runs our webserver(php & MySQL) on an old pentium 166. We have several thousand visitors every month & use it for an ftp site for suppliers, a router, firewall, gateway & squid server.
:)
I think that your 700 MHz machine would work fine for just web pages.
I'd post sooner, but it took forever to get to the article.. here are my thoughts...
First off SCSI.
IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.
Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.
I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the downtime costs are measured in minutes... Mission critical, failure is not an option.
OS
Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that cracked my back, so I just ported another box to Linux to run it there.
Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use Slackware (might be moving to gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an X server, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.
CPU's (really a chassis problem):
I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the overheating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.
+++ UGUCAUCGUAUUUCU
Not so...
You can cache with technologies like Sleepycat's DBM (db3).
We have a PHP application that caches lookup tables on each local server. If it can't find the data in the local cache, then it hits our Postgresql database. The local DBM cache gets refreshed every hour.
Typical comparison
-------------------
DB access time for query: .02 secs
Local cache (db3) time:   .00003 secs
Our server load dropped from a typical 0.7 to an acceptable 0.2, and the load on the DB server dropped like a rock! This is with over a million requests (no graphics, just GETs to the PHP script) every day.
We also tuned the heck out of Apache (Keepalive, # of children, life of children etc).
Some other things we realized after extensive testing:
1. Apache 2.0 sucks big time! Until modules like PHP and mod_perl are properly optimized, there's not much point in moving there.
2. AolServer is great for Tcl, but not for PHP or other plugin technologies
Because of all these changes, we were able to switch from a backhand cluster of 4 machines, back down to a single dual processor machine, with another machine available on hot standby. Beat that!
ICMP REPLY doesn't exist. Maybe you mean ICMP ECHO REPLY which has nothing to do with MTU discovery.
I'll just mention a couple of items:
1) For a high performance web server one *needs* SCSI. SCSI can handle multiple requests at one time and performs some disk-related processing itself, whereas IDE can only handle requests for data single file and uses the CPU for disk-related processing a lot more than SCSI does.
SCSI disks also have higher mean times to failure than IDE. The folks writing this article may have gotten benchmark results showing their RAID 0+1 array matched the SCSI setup *they* used for comparison, but most of the reasons for choosing SCSI are what I mention above -- not the comparative benchmark results.
2) For a high performance webserver, FreeBSD would be a *much* better choice than Redhat Linux. If they wanted to use Linux, Slackware or Debian would have been a better choice than Redhat Linux for a webserver. Ask folks in the trenches, and lots will concur with what I've written on this point due to maintenance, upgrading, and security concerns over time on a production webserver.
3) Since their audience is US based, It would make sense to co-lo their server in the USA. Both from the standpoint of how many hops packets take from their server to their audience, and from the logistical issues of hardware support -- from replacing drives to calling the data center if there are problems. Choosing a USA data center over one in Amsterdam *should* be a no brainer. Guess that's what happens when anybody can publish to the web. Newbies beware!!
You are pinging Sourceforge.
Ooh! Ooh! I really want you guys to teach me how to build a high performance webserver! What's that? You can't, because your webserver is down? Curses!
(Obligatory disclaimer for humor-impaired: yes I understand that the slashdot effect is generally caused by lack of bandwidth rather than lack of webserver performance.)
Karma: Incomprehensible (Mostly affected by posting at +5, reading at -1, and metamoderating everything unfair.)
We serve up between 5 and 7 million pageviews daily to up to 100,000 individual IPs.
Decent speed to me is one at which the server is no longer the bottleneck; in other words, serving up dynamic content you should be able to saturate the pipe that you are connected to.
I have never replaced the power supply because of energy costs; it simply isn't a factor in the overall scheme of things (salaries, bandwidth, amortization of equipment).
500-700 MHz machines are fine for most medium volume sites. I would only consider a really fast machine to break a bottleneck, and I'd have a second one on standby in case it burns up.
MP3 Search Engine
Really? There was an earlier discussion on this topic. (Related to 9/11 or some other day with extremely high traffic.)
From that discussion I got the impression that what happens when you are bumped to the front page is that you have tried to access a story with a non-standard setup. (What you get if you are logged in and change your view preferences.) The system is set up so that some servers only serve static content. (Because that's what most users view.)
During high load situations a dynamic request is sometimes sent to a static serving server. This is when you are bumped to the front page. (Unfortunately I couldn't find anything about this in the FAQ/About, so I can't verify it.)
Too bad "millions of page views every month" is simply not even in the realm that would require "High-Performance Web Server"(s). These guys need to come back and write an article once they've served up 5+ million page views per day. Not hits. Page views.
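To put the two figures side by side, here is a back-of-the-envelope sketch. The 3 million per month figure is only my assumption for what "millions" means, and these are averages, so real peaks will be several times higher:

```python
def avg_pages_per_second(page_views: float, days: float) -> float:
    """Average page views per second over the given number of days."""
    return page_views / (days * 24 * 3600)


monthly_site = avg_pages_per_second(3_000_000, days=30)  # assumed "millions per month"
daily_site = avg_pages_per_second(5_000_000, days=1)     # 5 million page views per day
print(f"{monthly_site:.2f} pages/s versus {daily_site:.1f} pages/s on average")
```

On these assumptions the second site is doing roughly fifty times the work of the first.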
We tested it in the workshop by hooking it up to a 3Com X500 Terabit Switch, and using over 500 RedHat servers to ping -f. This baby handled it well -- the time we'd spent optimizing the Oracle backend really paid off.
Yeah. Or maybe I should just have more coffee...
Carousel is a lie!
1. load it full of pr()n
2. post the link on /.
3. check back in 30seconds
if it still works, it's high-performance
The mere fact that they recommended 7200 rpm Western Digital drives for their high performance system gives me the impression they haven't a clue.
I disagree with the assertion that a 10,000 rpm SCSI drive is more prone to failure than a 7,200 IDE drive because it "moves faster". I've had far more failures with cheap IDE drives than with SCSI drives. Not to mention that IDE drives work great with minor loads, but when you start really cranking on them, the bottlenecks of IDE start to haunt the installation.
Guy who didn't read the article makes an uninformed M$ bash and gets modded to four...
The Microsoft line was the poster's sig. Check your Slashdot preferences, there's an option to include a "--" between post content and sig. I don't know why this isn't on by default, it eliminates mistakes like this.
(I added the "--" to my sig myself because it seems a lot of people don't have this enabled)
I like my women like my coffee... pale and bitter.
This setup doesn't account for HA or scalability. With hardware as cheap as it is today there is no excuse for not using multiple servers to avoid downtime and to allow for maintenance without taking the site down. Also, what about backup? Not even mentioned. Lastly, I don't fully agree with the RAID 0+1. For a large database, yes, but on a small setup like this I wouldn't. The article seems to imply the data is read more than written, and RAID 5 has better read performance.
So the article was missing a lot for a professional setup.