Slashdot Mirror


Load Balancing Heavy Websites on Current Tech?

squared99 asks: "I have just delved into some research on a set up for very high traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. This setup seems to be from 2000, is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as a required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"

7 of 63 comments

  1. Re:Prime Example: wikipedia by Anonymous Coward · · Score: 1, Insightful

    Since I can't reach the Wikipedia server around two out of three times, I wouldn't call this a successful example.

  2. Re:Take a look at livejournal's setup by Cecil · · Score: 2, Insightful

    Please god, don't *ever* duplicate Livejournal's setup. It's a horrible, nasty hack and anyone who uses Livejournal will tell you that it doesn't work very well either. Although it's gotten better in the last year or so. But that's way, way, way more computing power than they should need to run that site. It's mostly a sign of a system that expanded without any real future-proof planning at all, which isn't really their fault, but if you have the opportunity to think it over and actually plan things, please do it better. You'll thank yourself later.

  3. Re:Prime Example: wikipedia by joebp · · Score: 2, Insightful

    72 servers and it still runs slower than any other website of its popularity.

  4. Your question cannot be answered by Guspaz · · Score: 3, Insightful

    It is impossible to answer your question unless you define "heavy" traffic.

    Some people might consider a hundred thousand pageviews per day to be heavy. Others might consider a million pageviews per day to be heavy.

    From experience, a hundred thousand pageviews per day for a reasonable application can be handled on one server. A million would probably require 2 to 4.
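To put those daily figures in per-second terms, here is a rough back-of-the-envelope sketch. The peak factor, the per-server capacity, and the spare count are illustrative assumptions, not measurements from any real site, and it counts page renders only (images and other assets would add requests on top).

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(pageviews_per_day: float) -> float:
    """Average pageviews per second if traffic were spread evenly over a day."""
    return pageviews_per_day / SECONDS_PER_DAY


def servers_needed(pageviews_per_day: float,
                   per_server_rps: float,
                   peak_factor: float = 3.0,
                   spares: int = 1) -> int:
    """Servers needed to carry an assumed peak, plus spare machines.

    peak_factor (peak load / average load), per_server_rps and spares are
    all assumptions; measure your own application before trusting them.
    """
    peak_rps = average_rps(pageviews_per_day) * peak_factor
    return max(1, math.ceil(peak_rps / per_server_rps)) + spares


if __name__ == "__main__":
    for daily in (100_000, 1_000_000):
        print(daily, round(average_rps(daily), 2),
              servers_needed(daily, per_server_rps=20))
```

A hundred thousand pageviews per day averages out to a little over one per second, and a million to roughly 11.6 per second, which is why the answer depends so heavily on how expensive each page is to build.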

    1. Re:Your question cannot be answered by dubl-u · · Score: 2, Insightful

      It is impossible to answer your question unless you define "heavy" traffic.

      Amen to that.

      Step one is to figure out what you mean by heavy traffic. Slashdot is probably at a couple million pageviews per day, and Alexa tells us that there are nearly 1500 sites bigger. A top-10 site will get circa 1000x what Slashdot gets.

      In step two, figure out what kind of traffic you're dealing with. Most of Slashdot's page views are probably just hits on the front page or current article by guests, so they can be heavily cached. I'd guess maybe 15% of Slashdot's page views are ones that need to be seriously dynamic. That's a bonus, as even a commodity server these days can give you quite a lot of static traffic. And it's important to think about what kind of static content you're serving. Slashdot's is mostly HTML, and you'll do things very differently for a media-heavy site like Flickr or Atom Films, and very differently again for something like Orbitz or Base Camp.
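As a minimal sketch of the "heavily cached" idea for guest page views: the class below keeps a rendered page for a short time and only calls the expensive renderer when the copy has expired. `TTLPageCache`, the `render` callable, and the 30-second lifetime are hypothetical names and values chosen for illustration; this is not how Slashdot does it, and a real deployment would more likely put a caching proxy in front of the application. It is also not thread-safe as written.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Cache rendered pages for a fixed time-to-live (illustrative only)."""

    def __init__(self,
                 render: Callable[[str], str],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render      # expensive function: path -> HTML
        self._ttl = ttl_seconds    # how long a cached page stays fresh
        self._clock = clock        # injectable so tests can control time
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Missing or stale: render once and remember the result.
        self.misses += 1
        html = self._render(path)
        self._store[path] = (now, html)
        return html
```

Only requests that look the same for every guest would go through something like this; pages for logged-in users would bypass it and hit the dynamic path.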

      Step three is to start asking yourself some serious questions about what kind of data you have, where it will live, how much it gets changed, what kind of transactional integrity you need to have, what kind of reliability you're willing to pay for, and how it will get to the places that need to serve it up.

      Step four is to think broadly about the possible architectures. Yes, your average web site is basically an engine for turning HTTP requests into SQL queries, and turning SQL result sets into HTML. But there are many more ways of storing, managing, and rendering your data than that, many of which have radical performance implications. A great example is Google's architecture; if they'd tried to build it with a standard web approach, they'd be six or eight orders of magnitude poorer.

      Then in step five, build a cartoon version of your architecture and test it until it bleeds. Even better, build models of your top three architectures and see how they work. The only way you'll know if you can take massive load is to take massive load. Yes, this can be a pain to set up, but it's much, much less pain than you'll feel when a few hundred thousand people watch your site fail.
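A very small load-test harness along those lines might look like the sketch below. `run_load` and its parameters are hypothetical names, and it only drives load from one machine with threads, so treat it as a way to exercise a prototype rather than a substitute for a proper load-testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load(request_fn: Callable[[], None],
             total_requests: int,
             concurrency: int) -> Dict[str, float]:
    """Call request_fn total_requests times from a pool of worker threads.

    Returns the request count, error count, overall throughput, and an
    approximate 95th-percentile latency in seconds.
    """
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be >= 1")

    def timed_call(_: int):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False  # any exception counts as a failed request
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": float(total_requests),
        "errors": float(errors),
        "throughput_rps": total_requests / wall if wall > 0 else float("inf"),
        "p95_seconds": latencies[p95_index],
    }
```

In practice `request_fn` would be something like a function that calls `urllib.request.urlopen` on a test URL of your own, and you would keep raising `concurrency` until errors or latency start to climb.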

      And then for the last step, build your site incrementally, regularly testing performance as you go. Suppose it takes you six months to build it. If you save all your testing until the end, you've got six months of code to dig through to find the culprits, and six months during which you might have baked in an assumption that leaves you screwed. If you start out small and add to your test suite over time, you're much more likely to find problems when they're small and cheap to fix.

      And since this is Slashdot, I'll add step 7: Profit!

  5. Re:Prime Example: wikipedia by FooAtWFU · · Score: 3, Insightful

    Well, few sites of that popularity are quite as 'read-write'. When you have people submitting edits to articles every second, things get a little trickier.

    --
    The World Wide Web is dying. Soon, we shall have only the Internet.

  6. Re:Prime Example: wikipedia by FooAtWFU · · Score: 2, Insightful

    Exactly. MediaWiki and the Wikimedia sites are put together with off-the-shelf components: Apache, PHP, MySQL, Squid, and a few caching systems for various data whose names escape me at the moment.

    A complete-and-total system rewrite in something that's not PHP would do wonders for efficiency, but the development manpower is not there; it would take an enormous amount of effort to get it usable, let alone useful.

    --
    The World Wide Web is dying. Soon, we shall have only the Internet.