Load Balancing Heavy Websites on Current Tech?
squared99 asks: "I have just delved into some research on a setup for very high-traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. This setup seems to be from 2000 and is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as a required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"
Akamai
http://meta.wikimedia.org/wiki/Wikimedia_servers This is how they do it.
Use Akamai for static content, and take a look at LiveJournal's setup for dynamic content (master-master replication based on MySQL).
Other people are much more qualified than I am to answer the number-of-servers question, though.
groklaw, wired and slashdot. The holy trinity of work-based time wasting.
Read Server Load Balancing by Tony Bourke. After reading that book you will at least know what you need to look for. Also, if it suits your needs, you can outsource your load balancing to something like Akamai's servers (Microsoft.com uses Akamai, Netcraft confirms).
At my work we use Ultramonkey with LVS-kiss and Mon.
Our hardware consists of 2 load balancers running as a failover pair in front of 3 backend web servers (each 1.8 GHz, 512 MB RAM, 40 GB HDD, 100 Mbps network). That setup serves over 60 million page views a month and supports real-time failover. For monitoring, there are tools out there that use MRTG/RRD for cluster statistics.
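To put that monthly figure in perspective, here is a rough back-of-the-envelope sketch that turns page views per month into an average per-second rate per server. The 30-day month, the even spread across servers, and the 5x peak factor are assumptions made for illustration, not numbers from the setup above. Note also that one page view usually triggers several HTTP requests (images, CSS), so the actual request rate would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(page_views_per_month, days_in_month=30):
    """Average page views per second, assuming a 30-day month by default."""
    return page_views_per_month / (days_in_month * SECONDS_PER_DAY)


def per_server_rate(page_views_per_month, servers, peak_factor=1.0, days_in_month=30):
    """Page views per second per server, assuming load is spread evenly.

    peak_factor is a hypothetical multiplier for busy periods; it is an
    assumption, not a measured value.
    """
    return average_rate(page_views_per_month, days_in_month) * peak_factor / servers


if __name__ == "__main__":
    monthly = 60_000_000
    print(f"average: {average_rate(monthly):.1f} page views/s")
    print(f"per server, 3 servers: {per_server_rate(monthly, 3):.1f}/s")
    # Assumed worst case: traffic at 5x the average while one server is down.
    print(f"per server, 2 servers at 5x peak: {per_server_rate(monthly, 2, peak_factor=5):.1f}/s")
```

On average that works out to roughly 23 page views a second across the cluster, which is why a handful of modest boxes can carry it. The sizing question is really about peaks and about how many servers you can lose.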
Check out Mon and Mon.cgi
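For anyone wondering what the load balancer and the monitor are doing between them, here is a minimal sketch of round-robin scheduling with health-checked failover. It is not how LVS or Mon are implemented; the class name, method names, and backend names (web1 to web3) are all made up for illustration. The idea is just that a monitor marks backends up or down and the scheduler skips the ones that are down.

```python
class RoundRobinPool:
    """Toy round-robin scheduler that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        """Called by a (hypothetical) monitor when a health check fails."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Called by the monitor when a known backend recovers."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    pool = RoundRobinPool(["web1", "web2", "web3"])
    print([pool.pick() for _ in range(4)])
    pool.mark_down("web2")  # simulate a failed health check
    print([pool.pick() for _ in range(3)])
```

With three backends, losing one pushes its share onto the other two, so each survivor has to have headroom for about 50% more traffic than it normally sees.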
Check out http://www.netscaler.com/. The companies behind the top 10 websites on the internet have; maybe you should too.
Disclaimer: I work for Netscaler, but the customers we have gained should help in your decision.
NFS sucks. Use something like CVS to keep your webroots in sync and have each server host its own copy of the content.
You get rid of a massive single point of failure (the NAS) and you get a little closer to linear scalability (adding another webserver doesn't put more load on your NFS box).
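As a rough illustration of the "each server keeps its own copy" idea, here is a sketch that compares a master webroot against a local copy by content hash and works out what needs copying or deleting. This is not CVS and not what the poster runs; it is a swapped-in, simplified stand-in, and the function names are invented for the example. In practice you would let CVS or a similar tool do this for you.

```python
import hashlib
from pathlib import Path


def manifest(webroot):
    """Map each file's path (relative to webroot) to its SHA-256 digest."""
    root = Path(webroot)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def plan_sync(master, replica):
    """Compare two manifests; return (paths to copy, paths to delete).

    A path is copied if it is missing from the replica or its digest
    differs; it is deleted if it no longer exists on the master.
    """
    to_copy = sorted(p for p, digest in master.items() if replica.get(p) != digest)
    to_delete = sorted(p for p in replica if p not in master)
    return to_copy, to_delete


if __name__ == "__main__":
    # Hypothetical manifests; real ones would come from manifest("/some/webroot").
    master = {"index.html": "aaa", "css/site.css": "bbb"}
    replica = {"index.html": "old", "stale.html": "ccc"}
    print(plan_sync(master, replica))
```

Each webserver only talks to the master when content changes, so serving a page never touches shared storage.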
-- DrZaius - Minister of Sciences and Protector of the Faith