Load Balancing Heavy Websites on Current Tech?
squared99 asks: "I have just delved into some research on a setup for very high traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. This setup seems to be from 2000, is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as a required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"
Anything fairly new should be alright; I think the big problem is your pipe size. I mean, if you have 30,000 new connections and only 300 kbps, it's not going to transfer data very well.
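To put rough numbers on the pipe-size point, here is a minimal Python sketch. It assumes "300 kbs" means 300 kilobits per second shared evenly by all 30,000 connections, and it uses a hypothetical 20 KB page; real traffic is burstier, so treat this as back-of-the-envelope arithmetic only.

```python
def per_connection_bps(link_bps: float, connections: int) -> float:
    """Bandwidth each connection gets if the link is shared evenly."""
    return link_bps / connections


def seconds_to_send(page_bytes: int, link_bps: float, connections: int) -> float:
    """Time to push one page down a single connection's share of the link."""
    return page_bytes * 8 / per_connection_bps(link_bps, connections)


if __name__ == "__main__":
    link = 300_000   # assumed: 300 kilobits per second
    conns = 30_000
    page = 20_000    # hypothetical 20 KB page
    print(per_connection_bps(link, conns))     # 10.0 bits per second each
    print(seconds_to_send(page, link, conns))  # 16000.0 seconds per page
```

At about ten bits per second per connection, a 20 KB page would take over four hours to arrive, which is why extra server hardware cannot make up for an undersized pipe.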
Akamai
This is how they do it: http://meta.wikimedia.org/wiki/Wikimedia_servers
... and you've just missed your greatest opportunity for this by not providing a link to your website! ;-)
Paul B.
Many sites are moving towards utility-based hosting or virtualized setups. The problem with high-capacity sites is that you often end up having to purchase enough servers to deal with peak time, but don't need those servers during off hours. Utility-based hosting services charge you for what you _use_ and allow you to scale as needed. Savvis (http://www.savvis.net/), I know, offers a utility hosting platform based on Inkra, 3Par and blade servers. IBM has a similar setup.
Use Akamai for static content, and take a look at LiveJournal's setup for dynamic content (MySQL-based master-master replication).
Other people are much more qualified than I am to answer the number-of-servers question, though.
RTF Server Load Balancing by Tony Bourke. After reading that book you will at least know what you need to look for. Also, you can outsource your load balancing, if that is optimal for your needs, using something like Akamai's servers (Microsoft.com uses Akamai; Netcraft confirms).
Take Pound, a few web server machines, a database server and an NFS server (no Coda, AFS or GFS needed in most cases) and you should be set. This is a setup that I installed for a high-traffic website, and it is very stable.
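Pound does this for you through its own configuration, but the core idea behind such a balancer is easy to sketch. The Python below is a hypothetical illustration, not Pound's code or config: requests rotate across web servers (the names `web1` to `web3` and the `is_healthy` callback are assumptions), and any server that fails a health check is skipped.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinPool:
    """Rotate requests across backends, skipping any that fail a health check."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]):
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        self._is_healthy = is_healthy

    def next_backend(self) -> str:
        # Try each backend at most once per call, so a fully failed pool
        # raises immediately instead of spinning forever.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    down = {"web2"}  # pretend web2 failed its health check
    pool = RoundRobinPool(["web1", "web2", "web3"], lambda b: b not in down)
    print([pool.next_backend() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

A real balancer usually runs its health checks on a timer rather than on every request; calling `is_healthy` inline just keeps the sketch short.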
From those, you will get an idea of the type and scope of technology the Slash teams use to maintain one of the world's most popular sites.
Granted, your team is not as skilled as the crack techs at /. central, but the specs on that page will get you pointed in the right direction.
Yeah, right.
I've worked on both the Windows and Linux development sides of a shop that receives about a million hits a day on each side. On both sides, the bottleneck was always the database.
Both sides used pretty much the same setup for webservers: 4 load-balanced webservers with Hyper-Threading at around 3 GHz (at the moment; we always buy at around 75% of the fastest processors out there, to save money). These sit in data centers with multiple 10 Gbps connections, and each has a hot-swappable copy of the entire system running at another data center.
I've found SQL Server can take quite a bit more abuse than replicated MySQL, but MySQL is extremely fast. However, we have 4 admins for the MySQL servers, while I admin the SQL Servers myself in addition to my programming and tech support roles. The biggest downside to SQL Server is price.
My suggestion: tread very lightly on the database and you'll be able to handle more load than you'd expect. Use cron to generate static pages from the database when possible, or have static pages generated automatically when you update your database. Also look into caching mechanisms for frequently used data.
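One way to have static pages generated automatically on a database update is to call a publish step from the same code path that writes to the database. A minimal Python sketch, assuming a hypothetical `publish_static` helper, a toy `render_page` template, and a docroot that the web server serves directly: it renders the page, writes it to a temporary file, and renames it into place so visitors never see a half-written page.

```python
import html
import os
import tempfile
from pathlib import Path


def render_page(title: str, body: str) -> str:
    """Hypothetical template: escape the data and wrap it in minimal HTML."""
    t, b = html.escape(title), html.escape(body)
    return f"<html><head><title>{t}</title></head><body><h1>{t}</h1><p>{b}</p></body></html>"


def publish_static(docroot: Path, slug: str, title: str, body: str) -> Path:
    """Regenerate one static page; call right after the database update commits."""
    docroot.mkdir(parents=True, exist_ok=True)
    target = docroot / f"{slug}.html"
    # Write to a temp file in the same directory, then atomically rename it
    # over the old page so the web server never serves a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=docroot, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(render_page(title, body))
    os.chmod(tmp_name, 0o644)  # mkstemp creates 0600; the web server must read it
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        page = publish_static(Path(d), "story-1", "Hello & welcome", "First <b>post</b>")
        print(page.read_text(encoding="utf-8"))
```

The cron variant is the same `publish_static` call run over every page on a schedule instead of on each update.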
And, as in any programming project: profile, tweak, measure, and patch.
We use Foundry ServerIrons. We have two of them set up in an active/standby configuration. We've got approximately 35 servers (5 services) load balanced between the SIs, and average traffic of over 50 Mb/s just to those services. The SIs are very robust, and I'm quite pleased we got them.
I run quizilla.com, a pseudo-entertainment site that does 60-70 million pages a month, at least 2/3rds being dynamic database backed.
The site FAQ has the gritty details, but basically everything runs on 8 web servers with a cluster of 4 database servers. mod_perl is used for the most heavily trafficked pages, though some less-used pages are still static CGIs.
With the way I have it set up, this farm has reached its limit: the web servers get pegged pretty constantly during peak hours, and the database servers aren't far behind (mostly due to lack of RAM).
The site makes heavy use of Memcached, as well as a homebrew ghetto load-balancing system based on Apache mod_rewrite and some ancillary code.
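With several memcached boxes, the usual approach is for each client to hash the key to pick a server, so every web head agrees where a value lives without any coordination. A minimal Python sketch using CRC32-modulo hashing; this is an assumption (real clients vary, and many use consistent hashing so that adding a server does not remap most keys), and the server addresses are hypothetical.

```python
import zlib
from typing import Sequence


def server_for_key(key: str, servers: Sequence[str]) -> str:
    """Pick the memcached server responsible for a key.

    Every web server runs the same function over the same server list,
    so they all agree on where a given key lives.
    """
    if not servers:
        raise ValueError("need at least one cache server")
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


if __name__ == "__main__":
    pool = ["cache1:11211", "cache2:11211", "cache3:11211"]  # hypothetical hosts
    for key in ("user:42", "quiz:1001", "page:/index"):
        print(key, "->", server_for_key(key, pool))
```

The downside of plain modulo hashing shows up when you add a box: most keys move to a different server at once, so the cache is effectively cold until it refills.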
If I had my druthers, I'd keep the number of machines but make the web heads 2.8-3 GHz Xeons or Opterons with 1.5 GB RAM each, and the database servers dual 1.8 GHz Xeons with at least 3 GB RAM each. Ideally memcache would have at least 2 GB, but more is always better. My guess is that a setup like that would run my site at 100 million quick pages a month, instead of the current situation, where pages often take 5 seconds or more.
One big thing that you don't really notice until you try to build things on this scale is that optimization is king: optimize the hell out of your code. A stray regex might not look expensive, but when it's running twenty times a second on every machine, it quickly adds up.
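The stray-regex point is just multiplication: per-call cost times call rate. A minimal Python sketch for measuring it; the tag-stripping regex, the sample page, and the rate of 20 pages per second with 50 calls per page are hypothetical stand-ins for whatever your own hot path does.

```python
import re
import timeit

# Compiled once at import time rather than rebuilt inside the request handler.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html_text: str) -> str:
    """Hypothetical hot-path helper: remove HTML tags with a regex."""
    return _TAG_RE.sub("", html_text)


def core_fraction(seconds_per_call: float, calls_per_second: float) -> float:
    """Fraction of one CPU core consumed by an operation at a given call rate."""
    return seconds_per_call * calls_per_second


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p>" * 200
    runs = 1000
    per_call = timeit.timeit(lambda: strip_tags(sample), number=runs) / runs
    rate = 20 * 50  # assumed: 20 pages/s, 50 calls per page
    print(f"{per_call * 1e6:.1f} us per call, "
          f"{core_fraction(per_call, rate):.1%} of one core")
```

A result above 100% means that one helper alone would need more than a full core at that rate, on every machine in the farm.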
Code is almost always the weakest link in a big cluster, in that things are seldom sufficiently planned. I've had huge growing pains because I never planned on scaling past one machine, so moving to 2, 3, 4 and up to 8 has been a real hassle, making things work "right" in a massive cluster. Plan for clustering from the get-go if you have even the slightest inkling the site will see high traffic volumes.
It is impossible to answer your question unless you define "heavy" traffic.
Some people might consider a hundred thousand pageviews per day to be heavy. Others might consider a million pageviews per day to be heavy.
From experience, a hundred thousand for a reasonable application can be handled on one server. A million would probably require 2 to 4.
At my work we use Ultramonkey with LVS-kiss and Mon.
Our hardware infrastructure includes 2 load balancers running in a failover configuration, with 3 backend web servers (1.8 GHz, 512 MB RAM, 40 GB HDD, 100 Mbps network). That hosts over 60 million page views a month and also supports real-time failover. For monitoring, there are tools out there that use MRTG/RRD for cluster statistics.
Check out Mon and Mon.cgi
Check out http://www.netscaler.com/. The companies behind the top 10 websites on the internet have; maybe you should too.
Disclaimer: I work for Netscaler, but the customers we have gained should help in your decision.
It's somewhat dated, but the FUD-busting response to the Mindcraft fiasco has all the formulas for figuring out what hardware you need for your pipe. You only need to plug in current processor specs to see what you need. I could only find it in the archives: http://web.archive.org/web/20040409223206/http://cs.alfred.edu/~lansdoct/mstest.html
As previously mentioned, Pound is a wicked, lean load balancer/HA arbitrator that runs well on Linux.
-psy
NFS sucks. Use something like CVS to keep your webroots in sync, and have each server host its own copy of the content.
You get rid of a massive single point of failure (the NAS), and you get a little closer to linear scalability (adding another webserver doesn't put more load on your NFS box).
-- DrZaius - Minister of Sciences and Protector of the Faith
Outsource to geocities /ducks
Hmm... would your experience be MySQL-based? :P
Carnage Blender serves about 1M pages a day off a single (dual 2.4GHz xeon; 4 GB ram) machine. Those are database-backed pages, with a lot more updates than most read-only-ish sites.
Powered by postgresql. And a lot of tuning.
Some examples here. The examples are heavy on Corporate speak, but you were asking about a large Web/Content architecture, right?
This page shows the server specifications for some of the busiest message boards on the internet, along with what software they use. You'll see the configurations are quite disparate, from the 90+ servers serving the anime fans at Gaia Online on 100% open source software, to the sometimes single server hosting some of the other top 50 forums.
Hey, it's one method.
Hey man, I love Carnage Blender!
I had to stop playing because it was too addictive, but you've got a cool thing there.
But before I switched, I got demos from all three players and put them in a head-to-head contest. I would suggest doing the same. In a lab setting, we couldn't hit the devices hard enough to pick a clear winner based on performance. When I looked at administration and features, the F5 pulled to the front.
The GUI is clear and concise for those who like GUIs. The OS in the old version (4.x) was BSD, while the new version (9.x) is Linux-based. You get full root access to the box, so you can write scripts in the shell language of your choice. The last time I worked with the CSS, it had a proprietary scripting language; it may be different now.
The power for me as a network guy is the scripting. I can rebuild our entire site in about 30 seconds thanks to the scripts. I've also been able to move some of the administration to admins that normally wouldn't be qualified to work with a load balancer because I can give them scripts and restrict their access so that they can only run the scripts I specify.
SSL proxies are a cool feature as well. It's not documented, but there is an easy process to convert IIS certificates to Apache-style certificates for use on the F5. It helps a lot when you are adding and removing servers on a regular basis.
Our site takes enough hits to keep 4 servers loaded all day long. The load average on the F5 never hits 1.0. This is with about 40 Mbps of traffic running through the F5.
One mistake that I see lots of people make is using a PC-based load balancer. A hardware device (Foundry ServerIron, Nortel Alteon, Cisco CSM, etc.) is well worth the money (especially if you get it on eBay).
- 26 July 2004 article
- 27 Nov 2004 article
There are a few on that site about database server performance, too.
Our web site serves about 3 million pages a day on two gigabit links. The balancing rules are quite complex, since the content is split across multiple servers and SANs. We use software load balancers: Zeus ZXTM on Gentoo Linux. The nice thing about software load balancers is that you can easily replace the hardware if it fails; having a spare PC is way cheaper than having a spare load balancer. We are very pleased with ZXTM so far: very reliable, fast, and very flexible. It uses a PHP-like scripting language to process requests, and you can really handle any specific backend architecture with that.
I know the topic is mostly about hardware, but optimization is actually a pretty big thing that can get you in trouble.
If you're serving static content, you need to keep your graphics as small as possible. For instance, you don't need a full-width graphic like the top of the reply area here on Slashdot; you need just the one corner graphic, with the rest filled in by a table background color.
Going past the basics, and assuming it's dynamic content, you'll need to make sure your code is optimized (i.e., not the type of code someone just slapped together). I can't tell you how big a difference it makes when I take a page I wrote when I started programming in PHP and adjust it now that I know a heck of a lot more. Also, if you find that certain pages are more popular and your data isn't changing hour by hour, just cache the results of the page as static content. Every hour or so the cache can be dumped and rebuilt if needed. This saves a lot of reads from the server for the exact same data.
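The cache-it-and-dump-it-hourly idea maps onto a small time-to-live cache. A minimal in-process Python sketch; the `PageCache` class, its one-hour default, and the `render` callback are hypothetical. With this approach each web server holds its own copy, which is fine for pages that only need to be roughly fresh.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Keep rendered pages in memory and re-render them after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh copy: no database work at all
        page = render()          # miss or expired: hit the database once
        self._entries[key] = (now, page)
        return page

    def invalidate(self, key: str) -> None:
        """Drop a page early, e.g. when the underlying data changes."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = PageCache(ttl_seconds=3600)
    # Hypothetical renderer standing in for a database-backed page.
    print(cache.get_or_render("/top10", lambda: "<ul><li>...</li></ul>"))
```

The injectable `clock` exists only so the expiry can be tested without waiting an hour.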
You'll also need to optimize the database as much as possible.
General things that can help with dynamic pages include simple things like date/time output on pages: let JavaScript do that (although I don't like to use it, it takes a very, very slight load off the server), etc.
I have seen a lot of people say that they need better hardware when they just need to optimize their software and/or increase the pipe size. I even ran into one person who refused to use a specific database because it was killing his site (he had forgotten to change the max connections setting...).
The answer to this question depends entirely on how heavy each request you serve is. If you are just throwing together some PHP code that pulls information from a few databases, possibly updating some others, on every hit, it could require quite a large number of machines to handle the load. If you are clever, effectively making the results static pages, it may take very few systems.
A good starting place is to just measure it: test how long it takes to serve a page like the ones you expect to publish, and come up with an average time to serve that request. Multiply this by the number of expected requests per second and you get a rough idea of the number of machines you need to handle the load. Very rough, but it's a starting place.
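This estimate can be written as a small formula: average seconds per request times peak requests per second gives the number of CPU cores kept busy, which you then divide by cores per machine and by a utilization target to leave headroom. A Python sketch; the 70% utilization default and the example numbers are assumptions, and it ignores queueing, I/O waits and database load, so it gives a floor rather than a plan.

```python
import math


def machines_needed(avg_service_seconds: float,
                    peak_requests_per_second: float,
                    cores_per_machine: int = 1,
                    target_utilization: float = 0.7) -> int:
    """Rough web-server count: busy cores / usable cores per machine."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_cores = avg_service_seconds * peak_requests_per_second
    usable_per_machine = cores_per_machine * target_utilization
    return max(1, math.ceil(busy_cores / usable_per_machine))


if __name__ == "__main__":
    # Hypothetical: 200 ms per dynamic page, 100 requests/s at peak, 2 cores per box.
    print(machines_needed(0.2, 100, cores_per_machine=2))   # 15
    # The same traffic served from cache at ~10 ms per page.
    print(machines_needed(0.01, 100, cores_per_machine=2))  # 1
```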
For example, the last time my site got slashdotted, users were hitting a page that is generated from a database. Through clever design, these pages get cached quite heavily, so only the first view requires 200-ish ms to generate. After that, unless the database is updated, the pages take around 10 ms to serve. Serving a static page through Apache takes around 8 ms.
During the heaviest hit period, there was only around 4% CPU load on the server. Without the caching, this probably would have been more like 80%.
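The gap between 4% and roughly 80% CPU follows from the cache hit ratio. Using the poster's figures (about 200 ms to generate a page, about 10 ms to serve it from cache), the average cost per request is a weighted mix of the two. A small Python sketch; the hit ratios tried are assumptions, since the post does not give one.

```python
def average_ms(hit_ratio: float, hit_ms: float = 10.0, miss_ms: float = 200.0) -> float:
    """Expected time per request for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


if __name__ == "__main__":
    for ratio in (0.0, 0.9, 0.99, 1.0):
        avg = average_ms(ratio)
        print(f"hit ratio {ratio:.0%}: {avg:.1f} ms/request, "
              f"{200.0 / avg:.1f}x cheaper than no caching")
```

As the hit ratio approaches 100%, the cost per request approaches 10 ms, a 20x reduction from 200 ms, which is consistent with the ratio between the 80% and 4% CPU figures.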
Nobody can tell you what the answer to this question will be for your situation. I can tell you that everyone plans for their new site to be as popular as Slashdot, but I would remind you that trying to come out of the chute able to handle Slashdot's load is probably a waste of time and money. Sure, if you have a few hundred thousand dollars to spend on hardware, you can happily build up an infrastructure that will handle huge loads. However, if it takes a year or two for those loads to come, by then you can probably buy the same computing horsepower for half of what it costs today. In the meantime, why not spend the time you would have spent architecting this massive database cluster and set of Apache workers on providing content and marketing for your site instead?
It's easy, as a geek, to spend time on the geeky network and computing parts of a design, but the marketing and content side is the one most likely to make it a Slashdot.
Sean