Real World Webserver Price vs. Performance Figures?
Borgoth asks: "At my company we just broke 10 million pageviews per day. We use 5 2-processor 1U off-the-shelf Intel boxes running Apache, Linux, mod_perl, and MySQL. This averages out to about 2 million pageviews per day per server (about 20 million hits/server, including images). Most of our pages have some dynamism using mod_include SSIs, and maybe one pageview in five directly results in a db query. We think we should be pretty happy that we're doing so much with so little, but we don't really have any idea how much horsepower other sites are using in their server farms. So, what sort of webfarms do Slashdot readers maintain, and how does their performance compare?"
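The question's daily totals are easier to compare with other sites once they are turned into per-server rates. Here is a small sketch of that arithmetic using the figures from the question (10 million pageviews over 5 servers, about 20 million hits per server per day). It assumes traffic is spread evenly over 24 hours, so real peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average rate per second for a daily total, assuming an even spread."""
    return per_day / SECONDS_PER_DAY


def per_server(total: float, servers: int) -> float:
    """Share of a total handled by each of `servers` identical boxes."""
    return total / servers


if __name__ == "__main__":
    pageviews = per_server(10_000_000, 5)  # 2,000,000 per server per day
    hits = 20_000_000                      # per server per day, from the question
    print(f"{per_second(pageviews):.1f} pageviews/s per server")  # 23.1
    print(f"{per_second(hits):.1f} hits/s per server")            # 231.5
```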
I've been looking at this lately. Most sites seem to be able to do about a million pages a day per webhead.
The answer for Slashdot is more complex because we have three groups of servers.
Article/comment servers can handle 200K page views apiece.
Index/All can handle 100K.
Static/XML can take a million per server.
I have a fix going in this week that should raise the Article/Comment numbers. For Index, I am looking at a new system for caching the stories that should increase what the index servers can handle.
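With per-group capacities like the ones quoted above, sizing each group is a round-up division per group. This is a rough sketch: the capacities come from the comment, but the traffic split in the example is hypothetical, and it ignores peaks and spare boxes for redundancy.

```python
import math

# Rough per-server daily capacities quoted in the comment above.
CAPACITY = {
    "article/comment": 200_000,
    "index": 100_000,
    "static/xml": 1_000_000,
}


def servers_needed(daily_pageviews: dict) -> dict:
    """Round up so each group has enough boxes for its share of traffic."""
    return {
        group: math.ceil(views / CAPACITY[group])
        for group, views in daily_pageviews.items()
    }


if __name__ == "__main__":
    # Hypothetical traffic split, for illustration only.
    demand = {"article/comment": 1_500_000, "index": 700_000, "static/xml": 2_000_000}
    print(servers_needed(demand))  # {'article/comment': 8, 'index': 7, 'static/xml': 2}
```

Splitting the farm this way means a slow page type (index) only costs extra boxes in its own group.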
Who cares what everyone else does?
What is your system load? If it's less than 1 per processor (so under 2 on a dual-CPU box), you've got processor power to spare. If it's higher than that, you could add more processors IF you think that site response is too slow.
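A quick way to apply the load check on a Unix box is to compare the load average against the CPU count rather than a fixed 1. A minimal sketch follows; `has_spare_cpu` is a hypothetical helper name. Note that on Linux the load average also counts tasks stuck in uninterruptible sleep (usually waiting on disk), so a high number can point at the disks as much as the CPUs.

```python
import os


def has_spare_cpu(load_avg: float, cpus: int) -> bool:
    """Compare the load average with the number of CPUs, not with 1."""
    return load_avg < cpus


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(f"5-min load {five_min:.2f} on {cpus} CPUs; "
          f"spare capacity: {has_spare_cpu(five_min, cpus)}")
```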
What is the throughput to your disks? Actually measure it with vmstat or a similar tool. If that shows your disks are constantly maxed out, you could add servers to spread the disk activity around, or, if you've got a centralized database, build a faster disk subsystem. Smart architecture helps too. Don't run the database on the same processors that run scripts and serve pages. Use the database's own load-handling features to improve that specific part of the site. See which pages you can generate statically; I doubt every single page on a site needs to come from the database.
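One common way to "generate pages statically" is to pre-render a page to a file that Apache then serves like any other static file, regenerating it only when it gets old. Here is a minimal sketch under that assumption; the template, paths and the five-minute refresh are hypothetical, and a real site would pull the content from its database.

```python
import os
import tempfile
import time
from pathlib import Path


def render_page(title: str, body: str) -> str:
    # Hypothetical template; a real site would fill this in from the database.
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>\n"


def is_stale(path: Path, max_age_s: float) -> bool:
    """True if the file is missing or older than max_age_s seconds."""
    return not path.exists() or time.time() - path.stat().st_mtime > max_age_s


def publish_static(path: Path, html: str) -> None:
    """Write the page to a temp file, then rename it over the target so the
    web server never serves a half-written file (rename is atomic on POSIX
    when both paths are on the same filesystem)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; the web server must be able to read it
    os.replace(tmp, path)


if __name__ == "__main__":
    out = Path(tempfile.gettempdir()) / "static-demo" / "index.html"
    if is_stale(out, max_age_s=300):
        publish_static(out, render_page("Home", "Hello from a pre-rendered page"))
    print(out.read_text(encoding="utf-8"))
```

Run from cron or triggered on content changes, this turns a database hit per pageview into one per refresh.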
Here is my anecdotal evidence from the site I run:
The total outfit is 8 servers, 6 of them active: 1 DB server with one hot backup (dual P-III 750, 1.5GB), 4 web servers (~1.1GHz, 1GB), and 1 uniprocessor dedicated image server (1GHz, 1GB) with a hot backup.
The 4 web servers toss out a combined total of about 1.5 million pageloads a day, of which 1.4 million are dynamically generated using FastCGI/Perl and the rest are .shtml pages and stylesheets. A lot of the data queried from the DB server can be, and is, cached on the web heads for better performance, so even at peak times the DB server doesn't have to do much more than 80 queries/sec. The image server, however, running stock Apache 1.3, serves something like 3 million requests a day without much sweat, since it's all static content.
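The comment doesn't say how the web heads cache DB data, but a simple time-to-live cache in front of the query is one plausible way to do it. Here is a minimal sketch; the class name, key and 60-second TTL are hypothetical. In a FastCGI or pre-forked setup, each process would hold its own copy of such a cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: each value expires ttl seconds after loading."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Any, Tuple[float, Any]] = {}

    def get_or_load(self, key: Any, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: no database round trip
        value = loader()         # miss or expired: ask the database
        self._data[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    calls = 0

    def fake_query():
        # Hypothetical stand-in for a real DB query.
        global calls
        calls += 1
        return ["story 1", "story 2"]

    cache = TTLCache(ttl=60)
    for _ in range(100):
        cache.get_or_load("front_page_stories", fake_query)
    print(calls)  # 1
```

A short TTL like this trades a little staleness for far fewer queries at peak.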
All told, that works out to each web server doing something like 375,000 pageviews a day (1.5 million spread over 4 web servers). I don't have a barometer for whether that's good or not, but honestly I worry more about bandwidth than computrons.
I think you should be pretty happy with what you're doing. I don't know the current figures, but last September Slashdot was doing 2.4 million pageviews a day with ~10 web heads (as gleaned from 'Taco's journal). Understand that's not an apples-to-apples comparison, since I'd guess you're serving more static content, while Slashdot (and my site) are by and large dynamic.
I'd say you should probably talk to JW Smythe. Not too long ago he posted a comment on an article about bandwidth and porn, and from that post he seems like someone who could help with your question.
Frankly, I don't think even Slashdot gets as many page views per day as you do.
My company runs off a dual 933MHz box with 2GB of RAM and is currently serving 1 million pageviews per day. Most of the site is cached with PHP/MySQL.
You are using mod_gzip, aren't you? Depending on content, you may be able to reduce your bandwidth usage by 50%, at the expense of some CPU time.
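To get a feel for the savings before turning mod_gzip on, you can gzip a sample page yourself and compare sizes. A minimal sketch with Python's standard `gzip` module follows; the sample HTML is hypothetical and deliberately repetitive, so real pages will vary. Already-compressed content such as JPEG, GIF or PNG images won't shrink much, which is why the savings depend on content.

```python
import gzip


def compression_ratio(body: bytes, level: int = 6) -> float:
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(body, compresslevel=level)) / len(body)


if __name__ == "__main__":
    # Hypothetical, deliberately repetitive HTML; real savings depend on content.
    rows = "".join(f"<tr><td class='cell'>item {i}</td></tr>\n" for i in range(500))
    page = f"<html><body><table>{rows}</table></body></html>".encode()
    print(f"{len(page)} bytes -> {compression_ratio(page):.0%} of original")
```

The CPU cost mentioned above is the `gzip.compress` call, paid on every response unless the compressed output is cached.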
As many others have pointed out, the question really should have been "What setup are you running YOUR site on, and how much traffic are you handling?" This is in no way apples to apples.
We are comfortably serving 2.5M dynamically generated pageviews every month across 3 webheads, 1 software load balancer and two large DB servers. This is all mod_perl work. Last I looked, we were doing about 1.5TB/month in bandwidth from these dynamic pages.
Webhead data (currently 3, adding 2 more soon):
2x1.67Ghz Athlon
3GB RAM / 18GB SCSI disk (only used for logs; content is read over NFS)
LB data (we're moving this to a Cisco CSS 11050):
1x1.4Ghz PIII
2GB RAM / disk unimportant; it's never touched.
Software load balancer: Pound, quite an amazing piece of software.
DB server (one live, one hot-spare)
4x1.6Ghz Xeon (PowerEdge 6650)
4GB RAM / big-ass disks and a 40GB database
MySQL currently sees about 500-600 queries per second. We need to implement more server-side caching, though; we are seeing an alarming 54% query cache hit rate (MySQL 4.0.12).
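A query cache hit rate like the 54% above is usually worked out from two status counters. MySQL increments `Qcache_hits`, not `Com_select`, when a SELECT is answered from the cache, so their sum covers all SELECTs. Here is a sketch of that calculation; the counter values are hypothetical stand-ins for what `SHOW STATUS` would report.

```python
def query_cache_hit_rate(qcache_hits: int, com_select: int) -> float:
    """Fraction of SELECTs answered from the MySQL query cache.

    Qcache_hits counts SELECTs served from the cache; Com_select counts
    the ones that actually had to be executed.
    """
    total = qcache_hits + com_select
    return qcache_hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counter values, as read from SHOW STATUS.
    print(f"{query_cache_hit_rate(540_000, 460_000):.0%}")  # 54%
```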
One thing I'm looking at is doing less computation on the front-facing web servers and instead using SOAP to build the page components on a separate cluster of application servers. Preliminary testing is promising.
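If the front-end servers assemble pages from components built elsewhere, fetching those components concurrently keeps page latency close to the slowest component rather than the sum of all of them. Here is a minimal sketch of that design. `fake_remote_call` is a hypothetical stand-in for a SOAP (or any other RPC) call to an application server; no real SOAP library is used.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def assemble_page(components: Dict[str, Callable[[], str]]) -> str:
    """Fetch every component concurrently, then join them in the given order."""
    with ThreadPoolExecutor(max_workers=len(components) or 1) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in components.items()}
        parts = [futures[name].result() for name in components]
    return "\n".join(parts)


def fake_remote_call(name: str, delay: float) -> Callable[[], str]:
    # Hypothetical stand-in for a remote call to an application server.
    def call() -> str:
        time.sleep(delay)
        return f"<div id='{name}'>...</div>"
    return call


if __name__ == "__main__":
    parts = {n: fake_remote_call(n, 0.2) for n in ("header", "stories", "sidebar")}
    start = time.monotonic()
    print(assemble_page(parts))
    # Close to one component's delay, not the sum of all three.
    print(f"assembled in {time.monotonic() - start:.2f}s")
```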
We were getting about a million pages a day or so, and 95% or more of it was highly dynamic and database-driven (plus there was a very active forums section). We had five 1U dual-CPU web servers (Apache/PHP/ColdFusion) and 3 database servers (MySQL with replication), and were at probably 40 to 50 percent capacity. It would have been much better with more caching and/or a better-indexed database.
If it's Linux, you are wrong in a couple of ways.
1. Linux maps threads to processes, so you get a mass of processes anyway.
2. If you want to run things that are not thread-safe, like PHP, you have to pre-fork. In fact, PHP's web site says not to run PHP with Apache 2.x on a UNIX-like OS on any production web site, because most of the libraries it uses are not thread-safe. That means mod_perl and the other mod_* modules are going to have the same problems. It may work or it may not, and that is not what I want to bet my job on.
3. Single-process, single-threaded web servers are nice for static content, but not for most modern high-volume web sites, which have lots of dynamic parts.
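The pre-fork model from point 2 is what Apache's prefork MPM uses. Worker processes are forked in advance, they share one listening socket, and each handles one request at a time in a single thread, so non-thread-safe code is never run concurrently inside one process. Here is a minimal Unix-only sketch of the idea, not Apache's implementation. To keep the demo short, each worker handles exactly one connection and exits, whereas real workers serve many requests over their lifetime.

```python
import os
import socket


def handle(conn: socket.socket) -> None:
    # Stand-in for real request handling: reply with this worker's PID.
    conn.sendall(str(os.getpid()).encode())


def prefork_demo(n_workers: int = 4) -> list:
    """Fork n_workers single-threaded children that share one listening
    socket, send one request to each, and return the PIDs that answered."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_workers)
    port = listener.getsockname()[1]

    children = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: one process, one thread, so non-thread-safe code is fine.
            try:
                conn, _ = listener.accept()
                with conn:
                    handle(conn)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        children.append(pid)

    listener.close()  # the children still hold the listening socket open

    replies = []
    for _ in range(n_workers):
        with socket.create_connection(("127.0.0.1", port)) as client:
            data = b""
            while True:
                chunk = client.recv(64)
                if not chunk:
                    break
                data += chunk
            replies.append(int(data))

    for pid in children:
        os.waitpid(pid, 0)  # reap the workers
    return replies


if __name__ == "__main__":
    print(sorted(prefork_demo(4)))
```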