Real World Webserver Price vs. Performance Figures?
Borgoth asks: "At my company we just broke 10 million pageviews per day. We use five dual-processor 1U off-the-shelf Intel boxes running Apache, Linux, mod_perl, and MySQL. This averages out to about 2 million pageviews per day per server (about 20 million hits per server, including images). Most of our pages have some dynamism using mod_include SSIs, and maybe one pageview in five directly results in a DB query. We think we should be pretty happy that we're doing so much with so little, but we don't really have any idea how much horsepower other sites are using in their server farms. So, what sort of web farms do Slashdot readers maintain, and how does their performance compare?"
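The per-server figures in the question are easier to compare once they are turned into per-second rates. A minimal sketch, assuming traffic is spread evenly over all 24 hours (real traffic is peakier than that). The ratio of 10 hits per pageview comes from the question's 2 million pageviews vs. 20 million hits per server.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_server_rates(pageviews_per_day, servers, hits_per_pageview=1.0):
    """Return (pageviews/day, pageviews/s, hits/s) per server, averaged over 24h."""
    pv_day = pageviews_per_day / servers
    pv_sec = pv_day / SECONDS_PER_DAY
    return pv_day, pv_sec, pv_sec * hits_per_pageview

# 10 million pageviews/day over 5 servers, ~10 hits (HTML + images) per pageview.
pv_day, pv_sec, hits_sec = per_server_rates(10_000_000, 5, hits_per_pageview=10)
print(f"{pv_day:,.0f} pageviews/day, {pv_sec:.1f} pageviews/s, {hits_sec:.0f} hits/s per server")
```

This prints "2,000,000 pageviews/day, 23.1 pageviews/s, 231 hits/s per server", which is a more useful number to hold up against other people's farms than a daily total.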
You probably won't find a whole lot of comparable situations, even on Slashdot, except maybe Slashdot itself.
But if you give us the URL of your web site, the kind folks here at /. would be happy to give it a load test for ya. :)
It would be nice if the editors read their own site; then maybe you'd get some good answers.
If they maintained their own servers.
If they're not already compiling an answer that isn't a flippant troll like this;)
My buddy over at Oesterly.com seems to think that a Pentium 100 and 128MB is sufficient.
It's a bit like saying "we just shipped 5000 thingies last month using 3 vehicles". Um, 5000 beanie babies or 5000 tractor engines?
Was the vehicle a rowboat or a train?
Every site is different. I don't really care that the servers are 1U; I'd rather be told things like how large the database is, and whether it's mostly cached reads or read-write activity. How big is the pipe? What is the CPU speed and RAM size? What is the speed and type of disk? How many bytes are transferred?
Incidentally, a much more important number is peak capacity, i.e., what is your 5-minute peak load? Whatever you can reasonably handle for 5-10 minutes you can probably handle constantly, but a supposedly high-volume site can melt down when it gets flashed up on the morning news or Slashdot.
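One way to pull the 5-minute peak out of your own logs is a sliding window over request timestamps. A minimal sketch, assuming the access log has already been parsed into a sorted list of times in seconds (log parsing is not shown); the traffic in the demo is synthetic.

```python
from collections import deque

def peak_rate(timestamps, window_seconds=300):
    """Highest average request rate (req/s) over any `window_seconds` span.

    `timestamps` must be sorted ascending, in seconds.
    """
    window = deque()
    busiest = 0
    for t in timestamps:
        window.append(t)
        # Drop requests that fall outside the window ending at time t.
        while window[0] <= t - window_seconds:
            window.popleft()
        busiest = max(busiest, len(window))
    return busiest / window_seconds

# Synthetic hour: 1 request/s, plus a 60-second burst of 3,000 requests.
steady = list(range(3600))
burst = [1800 + i * 0.02 for i in range(3000)]
timestamps = sorted(steady + burst)

average = len(timestamps) / 3600
print(f"average {average:.2f} req/s, 5-minute peak {peak_rate(timestamps):.2f} req/s")
```

In this made-up hour the average is about 1.83 req/s but the worst 5 minutes run at 11 req/s, six times higher, which is why sizing for the average is risky.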
Imagine a Beowulf cluster of these! DOES IT RUN LINUX?! now?! does it?! DOES IT?! *IMPLODES* wtffffffffffffffffffffffffffffffff
Offload static content (images and the like) to a purely static server, like thttpd. Then you can focus the dynamic servers on serving purely dynamic stuff, and optimize accordingly. Also, MySQL 4's query cache is a great thing, so if you're not using it yet, look into it.
Who cares what everyone else does?
What is your system load? If it's less than 1 per processor (so under 2 on a dual-CPU box), you've got processor power to spare. If it's higher than that, you could add more processors IF you think that site response is too slow.
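A small sketch of normalizing the load average by CPU count. os.getloadavg() and os.cpu_count() are standard library calls, but getloadavg() only exists on Unix-like systems. Note that on Linux the load average also counts processes stuck waiting on disk, so a high number can point to I/O trouble rather than a CPU shortage.

```python
import os

def load_per_cpu(load_avg: float, cpus: int) -> float:
    """Load average divided by CPU count; roughly, below 1.0 means CPU time is left over."""
    return load_avg / cpus

if hasattr(os, "getloadavg"):  # Unix-like systems only
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"5-minute load {five:.2f} on {cpus} CPUs = {load_per_cpu(five, cpus):.2f} per CPU")
```

On one of the question's dual-CPU boxes, a load of 1.5 works out to 0.75 per CPU: busy, but not saturated.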
What is the throughput to your disks? Actually benchmark this with vmstat or something like that. If that shows that your disks are constantly maxed you could get more servers to spread the disk activity around, or you could build a faster disk subsystem if you've got a centralized database. Smart architecting helps too. Don't run the database on the same processors that run scripts and serve pages. Use the database load handling features to improve that specific part of the site. See what pages you can generate statically - I doubt that every single page on a site needs to be from the database.
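For the disk side, here is a rough sketch that summarizes vmstat's bi/bo (blocks in/out per second) and wa (percent of CPU time waiting on I/O) columns. The column names are the ones printed by Linux procps vmstat; other platforms lay the output out differently. The sample text is made up for illustration. In practice you would feed the function the output of something like `vmstat 5 12`. vmstat's first data line is an average since boot, so the sketch skips it.

```python
def parse_vmstat(text):
    """Turn vmstat output into one {column: int} dict per sample line."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[1].split()  # lines[0] is the group banner (procs, memory, ...)
    return [dict(zip(header, map(int, line.split()))) for line in lines[2:]]

def disk_summary(samples):
    """Average blocks in/out per second and % of CPU time spent waiting on I/O."""
    n = len(samples)
    return {
        "blocks_in": sum(s["bi"] for s in samples) / n,
        "blocks_out": sum(s["bo"] for s in samples) / n,
        "io_wait_pct": sum(s["wa"] for s in samples) / n,
    }

# Made-up output in Linux procps layout (e.g. from `vmstat 5 3`).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0  51200  20480 409600    0    0    12    30  150   300  5  2 90  3
 2  3      0  50100  20480 410200    0    0   900  1400  820  1900 20 10 20 50
 1  4      0  49800  20480 410900    0    0  1100  1300  860  2100 18 12 15 55
"""

samples = parse_vmstat(SAMPLE)[1:]  # drop the since-boot average
summary = disk_summary(samples)
print(summary)
```

In the sample, I/O wait averages over 50% with processes piling up in the b (blocked) column. That pattern suggests a disk-bound box rather than a CPU-bound one.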
Here is my anecdotal evidence for the site I run:
The total outfit is 8 servers, 6 active: 1 DB server with one hot backup (dual P-III 750, 1.5GB), 4 web servers (~1.1GHz, 1GB), and 1 uniprocessor dedicated image server (1GHz, 1GB) with a hot backup.
The 4 web servers handle a combined total of about 1.5 million pageloads a day, of which 1.4 million are dynamically generated using FastCGI/Perl and the rest are .shtml pages and stylesheets. A lot of the data queried from the DB server can be, and is, cached on the web heads for better performance, so even at peak times the DB server doesn't have to handle much more than 80 queries/sec. The image server, running stock Apache 1.3, does something like 3 million requests a day without much sweat, since it's all static content.
All told, that works out to each web server doing something like 325,000 pageviews a day. I don't have a barometer of whether that's good or not, but honestly I worry more about bandwidth than computrons.
I think you should be pretty happy with what you're doing. I don't know the current figures, but last September Slashdot was doing 2.4 million pageviews a day with ~10 web heads (as gleaned from 'Taco's journal). Understand that's not an apples-to-apples comparison, since I guess you're serving more static content, while Slashdot (and my site) are by and large dynamic.
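Caching DB results on the web heads, as described above, can be as simple as a dictionary with an expiry time. This is a minimal sketch. The 60-second TTL, the key name, and the lambda standing in for a real query are all hypothetical. A real deployment would also bound the cache size and decide how to invalidate entries when the underlying rows change.

```python
import time

class TTLCache:
    """Hold query results in process memory for `ttl` seconds before re-querying."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock   # injectable so tests can fake the time
        self._store = {}     # key -> (time stored, value)

    def get(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no database round trip
        value = loader()     # stale or missing: run the real query
        self._store[key] = (now, value)
        return value

# Usage, with a hypothetical loader standing in for a DB query:
front_page = TTLCache(ttl=60)
rows = front_page.get("latest_stories", lambda: ["story 1", "story 2"])
```

With a 60-second TTL, a popular query hits the database at most once a minute per web head, however many pageviews it serves.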
I'd say that you should probably talk to JW Smythe. Not too long ago he posted on an article about bandwidth and porn, and from his post he seems like someone who could help you with your question.
Frankly, I don't think that even Slashdot gets as many page views per day as you do.
My company runs off a dual 933 with 2 gigs of RAM and is currently serving 1 million pageviews per day. Most of the site is cached with PHP/MySQL.
You are approaching the point where the information you'll get from others won't apply to you, because you are pioneering new territory with your company's technology.
You have a website that has its needs. I can't imagine what kind of application you are using, how much memory it needs, whether it is processor intensive or disk intensive, or both. Depending on how your website works, there are a variety of solutions available. One solution to one problem might actually cause more problems for you if applied inappropriately.
It might make a lot of sense to consolidate the database onto an advanced server -- with 2 procs, RAID SCSI drives, and a fair amount of memory. It might make a lot of sense to get cheaper boxes with more memory and only one processor to run the web servers. Perhaps you can mount them all off of one giant NFS file server, and have the data that the web servers need held in a cache on the web server. It might make a lot of sense to go talk to IBM and Sun and see what they have to offer as well. It might also make a lot of sense to redesign the way your web application works to reduce the load.
But no one can tell you the right way to do it, because your situation is unique. No one can even give you a good estimate of cost. Your best bet if you are truly lost is to hire someone to analyze your code, your servers, and your needs, and come up with a plan. Those guys cost a bit of money, and finding a good one is near impossible. You're better off studying up on what your website really needs and experimenting with possible solutions.
This is where you start to realize why web people can earn up to 6 digits. We don't just design web sites or program applications. We have to make sure they scale as well.
You are using mod_gzip, aren't you? Depending on content, you may be able to reduce your bandwidth usage by 50%, at the expense of some CPU time.
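To get a feel for the trade-off on your own pages, you can compress a saved copy of a typical page with Python's gzip module. This only estimates the ratio and says nothing about mod_gzip's own configuration. The repetitive HTML below is made up, so real pages will usually compress somewhat less.

```python
import gzip

def gzip_savings(body: bytes, level: int = 6) -> float:
    """Fraction of bytes saved by gzip-compressing `body` (negative if it grows)."""
    compressed = gzip.compress(body, compresslevel=level)
    return 1 - len(compressed) / len(body)

# Made-up, table-heavy HTML; substitute a page saved from your own site.
page = ("<tr><td class='item'>row</td><td class='value'>42</td></tr>\n" * 500).encode()
print(f"{len(page)} bytes, {gzip_savings(page):.0%} smaller after gzip")
```

Already-compressed content such as GIF and JPEG images gains essentially nothing, which is why compression is normally applied only to text types like HTML and CSS.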
You neglected to mention what DBMS you use. Or is it a given nowadays that everybody uses MySQL?
Which is my cue for my usual anti-MySQL flame. Except that it's old, I'm tired of doing it, you've all heard it. Still, I'd like to see some serious benchmarks comparing MySQL with PostgreSQL, Firebird, and Berkeley DB. With attention to realistic web-style queries, scalability and (except for Berkeley DB, of course) complex queries.
I just got this puppy with 1GB of ECC memory, and it is doing a fine job even with the high demand for college party pics. You owe it to yourself to check out the elevated horizontal body shot and the wet-fun-fountain photos :-)
Offtopic -2, Lovely Ladies +5
As many others have pointed out, the question really should have been "What setup are you running MY site on and how much traffic are you handling?" This is, in no way, apples to apples.
We are comfortably serving 2.5M dynamically generated pageviews every month across 3 webheads, 1 software load balancer and two large DB servers. This is all mod_perl work here. Last I looked we were doing about 1.5TB/month in bandwidth from these dynamic pages.
Webhead data (currently 3, adding 2 more soon):
2x1.67GHz Athlon
3GB RAM / 18GB SCSI disk (only used for logs; content is read over NFS)
LB data (we're moving this to a Cisco CSS 11050):
1x1.4GHz PIII
2GB RAM / Disk unimportant, it's never touched.
Software load balancer: Pound, quite an amazing piece of software.
DB server (one live, one hot-spare)
4x1.6GHz Xeon (PowerEdge 6650)
4GB RAM / Big ass disks and a 40GB database
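For readers who haven't used a software load balancer, the core idea is a loop that hands requests to back ends in turn and skips dead ones. The sketch below is plain round-robin with a manual up/down flag. It is not Pound's actual scheduling code, since Pound has its own back-end selection, session tracking and health checks. The back-end names are hypothetical.

```python
from itertools import cycle

class RoundRobinPool:
    """Hand out back ends in turn, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        # At most one full lap around the ring before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy back ends")

pool = RoundRobinPool(["web1:80", "web2:80", "web3:80"])  # hypothetical hosts
```

Calling `pool.pick()` per incoming request spreads load evenly across identical webheads. Weighted or least-connections schemes matter more when the boxes differ in power.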
MySQL currently sees about 500-600 queries per second on the DB. We need to implement more server-side caching though, we are seeing an alarming 54% query cache hit rate (4.0.12).
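The hit rate can be computed from SHOW STATUS counters. Based on the MySQL manual's description of the query cache, Com_select counts SELECTs that were actually executed rather than answered from the cache, and Qcache_hits counts those served from it. That suggests hits / (hits + Com_select) as the hit rate. A sketch; the counter values are made up to match the 54% above.

```python
def query_cache_hit_rate(status):
    """Share of SELECTs answered from the query cache.

    `status` maps SHOW STATUS variable names to integer values.
    Com_select: SELECTs actually executed (misses and non-cacheable queries).
    Qcache_hits: SELECTs served straight from the cache.
    """
    hits = status["Qcache_hits"]
    executed = status["Com_select"]
    total = hits + executed
    return hits / total if total else 0.0

# Made-up counters that happen to give a 54% hit rate.
sample = {"Qcache_hits": 540_000, "Com_select": 460_000}
print(f"{query_cache_hit_rate(sample):.0%}")
```

Watching this ratio before and after adding application-side caching shows whether the new cache is actually taking SELECTs off the database.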
One thing I'm looking at is less computation on the forward-facing webservers. Instead, using SOAP to build the page components from a separate cluster of application servers. Preliminary testing is promising.
We got about a million pages a day or so... 95% or higher was highly dynamic database driven (plus there was a very active forums section). 5 1U 2CPU webservers (apache/php/coldfusion), 3 database servers (mysql w/ replication), and were at prob 40 to 50 percent capacity... would have been much better if there was more caching and/or a better indexed database
Too many people seem to concentrate on processing power and hardware while neglecting the software side of things.
Using a web server that pre-forks (for example, Apache 1.3.x) is probably the best way to dramatically reduce performance and scalability in most situations. The sheer number of processes under high load makes most schedulers crap themselves.
Multithreading (Apache 2.x is an example) can greatly improve performance and scalability, as can single-process, single-threaded web servers built on multiplexed non-blocking I/O, such as thttpd, Boa or Zeus.
Once one has selected a server that works efficiently for one's content and fine-tuned the OS, then one can move on to actual processing power and system throughput.
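The single-process, single-threaded model is easiest to see in code. Below is a toy sketch using Python's selectors module: one loop multiplexes the listening socket and every client socket, so there is no process or thread per connection. It illustrates the technique only and is not how thttpd, Boa or Zeus are written. It assumes tiny requests and replies (partial writes are not queued), and it always answers with the same fixed response.

```python
import selectors
import socket

RESPONSE = (b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
            b"Content-Length: 3\r\n\r\nok\n")

def serve(listener, max_requests=None):
    """One thread, one select loop, many connections."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    served = 0
    while max_requests is None or served < max_requests:
        for key, _ in sel.select():
            if key.data is None:               # listening socket: new client
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=bytearray())
                continue
            conn, buf = key.fileobj, key.data  # client socket with its buffer
            chunk = conn.recv(4096)
            if chunk:
                buf += chunk                   # extends the registered bytearray
                if b"\r\n\r\n" not in buf:     # headers incomplete: wait for more
                    continue
                # Fine for a tiny reply; a real server would register
                # EVENT_WRITE and send in pieces instead.
                conn.sendall(RESPONSE)
                served += 1
            sel.unregister(conn)               # done, or client hung up
            conn.close()
    sel.close()
```

Pass a bound, listening socket to `serve()`. An idle connection costs only a buffer and a selector registration, not a whole process, which is the scalability argument made above.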
Think about your network, load balancing, and other sorts of issues.
For example, I ran a site for a while that was fairly poorly built from an application perspective. However, the client had prepped a flash load (i.e., a bursty, concentrated load) for a specific time period, and I had about a month to prepare. The problem was that we couldn't rewrite the application part of the site to ease the congestion, nor could we rewrite some apps to be distributed across multiple servers. (They stored state on the server.)
So, I brought in a Foundry ServerIron, and used the URL switching to map all static files/items to a pair of Ultra 5 workstations. These had a bunch of memory and had iPlanet Enterprise Server configured with very aggressive caching parameters. For the dynamic content, I also increased any caching parameters available.
(This is high level, but you get the idea. Basically, serve as much out of memory as possible. Among the other tuning issues: turn off name resolution, obviously, and make sure you aren't I/O bound, or network bound for that matter.)
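URL switching of the kind described boils down to a rule that looks at the request path and picks a server pool. A rough sketch follows. The extension list and the pool names "static" and "dynamic" are hypothetical, and a ServerIron is configured with its own policy syntax, not code like this.

```python
from urllib.parse import urlsplit

# Hypothetical rule set: extensions treated as static files.
STATIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".html"}

def choose_pool(url: str) -> str:
    """Return 'static' or 'dynamic' for a request URL, based on its file extension."""
    path = urlsplit(url).path.lower()              # query string is dropped here
    dot, slash = path.rfind("."), path.rfind("/")
    extension = path[dot:] if dot > slash else ""  # ignore dots in directory names
    return "static" if extension in STATIC_EXTENSIONS else "dynamic"

for url in ["/images/logo.GIF", "/style/site.css?v=2", "/cgi-bin/cart.pl?item=7"]:
    print(url, "->", choose_pool(url))
```

The point of the split is that the static pool can be tuned purely for caching, while the application servers only ever see requests that actually need them.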
The day came around and we served 5 or 6 million hits in two hours or so. The average load on the servers was around 0.1. In fact, even on the servers with the static content getting lots of hits, there was only really disk activity when access logs were flushed to disk (every 30 seconds).
So, don't just think about servers; consider all options when trying to balance and handle your load.
I'd like to point out that I talked to the author of InnoDB tables, and he assured me that InnoDB tables use a "double write MVCC" mechanism to assure the D part of ACID. So, the only part MySQL is still missing to be ACID is check constraints.
Lotus/IBM Domino Go Webserver 4.6.2.#
Equipment needed to handle 2 million page views a day: a PII 350 with 256MB RAM, plus a PII 400 for MySQL, running OS/2 Warp Server for e-Business (the latest release is Feb 2003; releases we've used are 2003, 2001, 1999 and Warp Server Advanced 1996). The SQL server can sometimes come close to breaking a sweat, but (1) we have a dual SMP box waiting for us to work up the initiative to make the switchover, and (2) the web server doesn't come close to breaking a sweat; sometimes I wonder if it even knows it's doing anything. More memory is always a good thing for dynamic pages, though. It's a shame Domino Go is kind of tough to come by: it comes with WSeB releases and as an obscure security option (sold with another IBM update package under that package's name), and that performance is only there under Warp (and maybe AIX) because of Domino Go's intensive use of threads (up to 4,000 per CPU).
Oh - and unless you have extremely fast disks and caching controllers, JFS (included) or HPFS386 (add-on) is the better choice over HPFS due to larger (up to the machine's max available RAM) caches and the fact that they are designed to pipeline the data in a better fashion to the httpd or directly to the network card(s). HPFS (either variant) will save you the chore of needing to de-fragment, which is its big advantage over JFS if you are serving lots of files - IBM calls it "fragmentation resistant" though it's near fragmentation proof... I've got machines with thousands of directories and in many of those, thousands of files, some written as long ago as 7 years, and low single digit fragmentation for the drives.
Sorry to say, nothing yet I've tried comes even remotely close... IBM really blew it... this is what Warp Server Advanced was designed for... on one CPU it still beats NT on 4 CPUs for this type of serving (or any actually), and WSeB 1999 till present is even faster and better optimized. We've worked with some big clients (on the graphical end of web stuff) who have each opted for various other web solutions (Linux, NT, 2K, XP, Be, etc), one big one had MS's help in setting up and installing their network (big cash outlay, big stake in getting the results wanted). They ended up with 6 dual CPU boxes that still hit 15% or more "Server too busy" errors to try to match our traffic.
Next sad note is that there are numerous IBM 4-way, 8-way (and even a few 16-, 32-, and occasionally that rare 64-way) boxes out there that are VERY cheap (some 4-ways in the $700 range) - they're used and refurbished (sometimes by IBM themselves) - and Warp Server (Advanced or eBusiness) flies on them... native 64-way per node support. I know it can do clusters, but I'm not sure how many nodes it is designed for... but with up to 64 in a single node, why would anyone want to?
WSeB (with Domino Go, WebSphere, and the availability from IBM or elsewhere of MySQL, DB/2, Notes and more) is still for sale - as are eComStation and eComStation Pro (SMP version) - though eCS doesn't come with Domino Go AFAIK. Finding WSeB for sale may be a pain though...
Probably not a viable solution for you, but we and many of our customers aren't turning back any time soon. Good luck with whatever you do choose, though...