Slashdot Mirror


Load Balancing Heavy Websites on Current Tech?

squared99 asks: "I have just delved into some research on a setup for very high-traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. This setup seems to be from 2000, is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as a required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"

15 of 63 comments

  1. One Word by cuntzilla · · Score: 2, Informative
  2. Prime Example: wikipedia by dyftm · · Score: 5, Informative
    1. Re:Prime Example: wikipedia by joebp · · Score: 2, Insightful

      72 servers and it still runs slower than any other website of its popularity.

    2. Re:Prime Example: wikipedia by FooAtWFU · · Score: 3, Insightful

      Well, few sites of that popularity are quite as 'read-write'. When you have people submitting edits to articles every second, things get a little trickier.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    3. Re:Prime Example: wikipedia by FooAtWFU · · Score: 2, Insightful
      Exactly. MediaWiki and the Wikimedia sites are put together with off-the-shelf components: Apache, PHP, MySQL, Squid, and a few caching systems for various data whose names escape me at the moment.

      A complete-and-total system rewrite in something that's not PHP would do wonders for efficiency, but the development manpower is not there; it would take an enormous amount of effort to get it usable, let alone useful.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
  3. Test, test, test... by PaulBu · · Score: 3, Funny

    ... and you've just missed your greatest opportunity for this by not providing a link to your website! ;-)

    Paul B.

  4. Google for "Virtualized" or "Utility Hosting" by matheny · · Score: 2, Interesting

    Many sites are moving towards utility-based hosting or virtualized setups. The problem with high-capacity sites is that you often end up having to purchase enough servers to deal with peak time, but don't need the servers during off hours. Utility-based hosting services charge you for what you _use_ and allow you to scale as needed. Savvis (http://www.savvis.net/), I know, offers a utility hosting platform based on Inkra, 3Par and blade servers. IBM has a similar setup.
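
    The peak-versus-usage trade-off described above can be put into rough numbers. This is only a sketch: the hourly load profile and the per-server capacity of 50 requests per second are made-up illustration values, not figures from Savvis, IBM, or any real site.

```python
import math


def servers_needed(requests_per_sec, capacity_per_server):
    """Servers required for a given load, never fewer than one."""
    return max(1, math.ceil(requests_per_sec / capacity_per_server))


def server_hours(hourly_load, capacity_per_server):
    """Return (fixed, elastic) server-hours for a list of hourly loads."""
    per_hour = [servers_needed(load, capacity_per_server) for load in hourly_load]
    fixed = max(per_hour) * len(per_hour)  # buy for the peak, run it all day
    elastic = sum(per_hour)                # pay only for what each hour needs
    return fixed, elastic


# Hypothetical day: 18 quiet hours at 20 req/s, 6 busy hours at 200 req/s.
hourly_load = [20] * 18 + [200] * 6
fixed, elastic = server_hours(hourly_load, capacity_per_server=50)
print(f"fixed: {fixed} server-hours, elastic: {elastic} server-hours")
```

    The spikier the traffic, the bigger the gap between the two numbers; for a flat load profile they come out the same.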

  5. Take a look at livejournal's setup by Artega+VH · · Score: 2, Informative

    Akamai for static content and take a look at livejournal's setup for dynamic content (master-master replication based on mysql).

    Other people are much more qualified than I to answer the number of servers questions though.

    --
    groklaw, wired and slashdot. The holy trinity of work based time wasting.
    1. Re:Take a look at livejournal's setup by Cecil · · Score: 2, Insightful

      Please god, don't *ever* duplicate Livejournal's setup. It's a horrible, nasty hack and anyone who uses Livejournal will tell you that it doesn't work very well either. Although it's gotten better in the last year or so. But that's way, way, way more computing power than they should need to run that site. It's mostly a sign of a system that expanded without any real future-proof planning at all, which isn't really their fault, but if you have the opportunity to think it over and actually plan things, please do it better. You'll thank yourself later.

  6. Pound by slashflood · · Score: 3, Interesting

    Take Pound, a few web server machines, a database server and a NFS server (no Coda, AFS or GFS needed in most cases) and you should be set. This is a setup that I installed for a high traffic website and it is very stable.
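
    For anyone who hasn't used a balancer like this, the core idea is just handing each request to the next web server in the list and skipping machines that are down. A minimal sketch of that idea follows; it is not Pound's actual algorithm or configuration, and the class name and backend names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call so a dead pool cannot loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical hosts
balancer.mark_down("web2")
print([balancer.pick() for _ in range(4)])
```

    A real balancer would also run periodic health checks to call mark_down and mark_up on its own.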

  7. Your question cannot be answered by Guspaz · · Score: 3, Insightful

    It is impossible to answer your question unless you define "heavy" traffic.

    Some people might consider a hundred thousand pageviews per day to be heavy. Others might consider a million pageviews per day to be heavy.

    From experience, a hundred thousand pageviews per day for a reasonable application can be handled on one server. A million would probably require 2 to 4.
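
    To put those daily figures in per-second terms, here is a quick back-of-the-envelope calculation. The peak factor of 3 is an assumption for illustration only; real traffic curves vary, and a pageview usually triggers more than one HTTP request.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(pageviews_per_day):
    """Average pageviews per second if traffic were perfectly even."""
    return pageviews_per_day / SECONDS_PER_DAY


def peak_rate(pageviews_per_day, peak_factor=3.0):
    """Rough peak rate; peak_factor is an assumed multiplier, not a measured one."""
    return average_rate(pageviews_per_day) * peak_factor


for daily in (100_000, 1_000_000):
    print(
        f"{daily:>9,} pageviews/day: "
        f"avg {average_rate(daily):.2f}/s, peak ~{peak_rate(daily):.2f}/s"
    )
```

    A hundred thousand a day averages out to a little over one pageview per second, and a million to under a dozen, so the per-server cost of each page matters far more than the raw count.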

    1. Re:Your question cannot be answered by dubl-u · · Score: 2, Insightful

      It is impossible to answer your question unless you define "heavy" traffic.

      Amen to that.

      Step one is to figure out what you mean by heavy traffic. Slashdot is probably at a couple million pageviews per day, and Alexa tells us that there are nearly 1500 sites bigger. A top-10 site will get circa 1000x what Slashdot gets.

      In step two, figure out what kind of traffic you're dealing with. Most of Slashdot's page views are probably just hits on the front page or current article by guests, so they can be heavily cached. I'd guess maybe 15% of Slashdot's page views are ones that need to be seriously dynamic. That's a bonus, as even a commodity server these days can give you quite a lot of static traffic. And it's important to think about what kind of static content you're serving. Slashdot's is mostly HTML, and you'll do things very differently for a media-heavy site like Flickr or Atom Films, and very differently again for something like Orbitz or Base Camp.

      Step three is to start asking yourself some serious questions about what kind of data you have, where it will live, how much it gets changed, what kind of transactional integrity you need to have, what kind of reliability you're willing to pay for, and how it will get to the places that need to serve it up.

      Step four is to think broadly about the possible architectures. Yes, your average web site is basically an engine for turning HTTP requests into SQL queries, and turning SQL result sets into HTML. But there are many more ways of storing, managing, and rendering your data than that, many of which have radical performance implications. A great example is Google's architecture; if they'd tried to build it with a standard web approach, they'd be six or eight orders of magnitude poorer.

      Then in step five, build a cartoon version of your architecture and test it until it bleeds. Even better, build models of your top three architectures and see how they work. The only way you'll know if you can take massive load is to take massive load. Yes, this can be a pain to set up, but it's much, much less pain than you'll feel when a few hundred thousand people watch your site fail.

      And then for the last step, build your site incrementally, regularly testing performance as you go. Suppose it takes you six months to build it. If you save all your testing until the end, you've got six months of code to dig through to find the culprits, and six months during which you might have baked in an assumption that leaves you screwed. If you start out small and add to your test suite over time, you're much more likely to find problems when they're small and cheap to fix.

      And since this is Slashdot, I'll add step 7: Profit!
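
      For step five, a load test can start as small as the sketch below: fire a batch of concurrent requests and look at errors and tail latency. It is a toy under stated assumptions, not a replacement for a proper load-testing tool; run_load and http_fetch are hypothetical helper names, and the URL in the usage note is a placeholder.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url, timeout=5.0):
    """Build a zero-argument callable that fetches url once."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    return fetch


def run_load(fetch, total_requests, concurrency):
    """Call fetch total_requests times across concurrency threads."""
    def timed_call(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:
            ok = False  # count any failure as an error rather than aborting
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for ok, elapsed in results if ok)
    errors = sum(1 for ok, _ in results if not ok)
    # Simple index-based 95th percentile over successful requests.
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else None
    return {"requests": total_requests, "errors": errors, "p95_seconds": p95}
```

      Something like run_load(http_fetch("http://localhost:8080/"), total_requests=1000, concurrency=50) against a test box gives a first number to watch as the site grows; rerun it after each increment so regressions show up while they are still small.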

  8. Ultramonkey + LVS-Kiss + Mon by Plake · · Score: 2, Informative

    At my work we use Ultramonkey with LVS-kiss and Mon.

    Our hardware infrastructure is 2 load balancers running as a failover pair, with 3 web servers in the backend (1.8 GHz, 512 MB RAM, 40 GB HDD, 100 Mbps network). That setup handles over 60 million page views a month and also supports real-time failover. For monitoring, there are tools out there that use MRTG/RRD for cluster statistics.

  9. Obvious answer... by ebrandsberg · · Score: 2, Informative

    Check out http://www.netscaler.com/. The companies behind the top 10 websites on the internet have; maybe you should too.

    Disclaimer, I work for Netscaler, but the customers we have gained should help in your decision.

  10. Re:Some more considerations by DrZaius · · Score: 3, Informative

    NFS sucks. Use something like CVS to keep your webroots in sync and have each server host its own copy of the content.

    You get rid of a massive single point of failure (the NAS) and you get a little closer to linear scalability (adding another webserver doesn't put more load on your NFS box).

    --
    -- DrZaius - Minister of Sciences and Protector of the Faith