Slashdot Mirror


Handling 1,000,000 Hits a Day?

Mr. Ed asks: "Hi! We run a very busy subscription-based service and we receive about 1,000,000 hits per day. My question is what is the best way to deal with this? DBM based access isn't very flexible and so I would like to switch to MySQL/mod_auth_mysql but MySQL doesn't seem to be able to deal with the load very well. What is everyone else doing? Am I going to have to go to a closed commercial solution or is there some open source software that can handle this sort of a beating? Thanks in advance!" We've talked about optimizing Apache/MySQL in a production environment, but this sounds more like a configuration problem. Thoughts?

2 of 22 comments

  1. My experience by Matts · Score: 5

    I worked for a while at an extremely large web site (in excess of 50m page views a day). Here's how they cope:

    Distributed servers. Their content is served from several different servers around the world. Static content is distributed with a simple script that copies static content to other servers. I think this is only really necessary when your hits reach the scale of Yahoo (although it wasn't Yahoo I was working at).

    shtml. Server side includes provided enough templating facilities to "get by" for most content.

    No fluff. Cookies and javascript were mostly banned. You had to get extra special permission to use either.

    Simple Perl CGI. Although the content wasn't particularly dynamic, simple Perl CGI scripts can go an awfully long way, and they often scale better than most people assume.

    The questioner's comment, though, was about scaling authentication to 1m hits/day. So let's deal with that.

    I'm pretty certain that mod_auth_mysql will be enough for you. You don't need locking, transactions or any fancy facilities, so MySQL's raw speed will do you just fine. Handling DBM files for that many users or hits is just going to kill you.

    If that doesn't work, consider writing your own authentication handler, either in C, or pick up mod_perl.
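
    As a rough illustration of what a hand-rolled handler might do, here is a minimal Python sketch (not mod_perl or C, and not anything the poster described) that puts a short-lived in-memory cache in front of the credential lookup, so repeat requests from the same subscriber don't each cost a database query. The class name, the `lookup` callable, the TTL and the PBKDF2 hashing are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, Optional, Tuple

# (salt, password_hash) as it might be stored per user; the layout is an assumption.
Credentials = Tuple[bytes, bytes]


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (illustrative parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)


class CachingAuthenticator:
    """Hypothetical auth check that caches backend lookups for ttl_seconds."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[Credentials]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup          # stands in for the MySQL/DBM query
        self._ttl = ttl_seconds
        self._clock = clock
        # user -> (expiry time, credentials or None for "no such user")
        self._cache: Dict[str, Tuple[float, Optional[Credentials]]] = {}
        self.backend_calls = 0         # counts how often the backend was hit

    def _credentials(self, user: str) -> Optional[Credentials]:
        now = self._clock()
        cached = self._cache.get(user)
        if cached is not None and cached[0] > now:
            return cached[1]
        # Cache miss or expired entry: ask the backend and remember the answer.
        # Note: this sketch never evicts entries, so the cache grows unbounded.
        self.backend_calls += 1
        record = self._lookup(user)
        self._cache[user] = (now + self._ttl, record)
        return record

    def check(self, user: str, password: str) -> bool:
        record = self._credentials(user)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(hash_password(password, salt), expected)
```

    The trade-off is that a changed or revoked password can keep working (or failing) until the cached entry expires, so the TTL has to be chosen with that in mind.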

    To all the other posters going on about how you need Zeus or khttpd to serve that many hits: you obviously don't run a site taking that many hits. The benchmarks that show Zeus beating Apache involve loads of about 200 million hits a day. I don't know anybody mad enough to try that on a single server. The reality is that Apache provides the right level of stability, configuration options and speed to suit almost every site out there.
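
    To put the figures in this thread side by side, the daily totals can be turned into per-second rates. The sketch below is plain arithmetic in Python; the peak factor is an assumed multiplier for illustration, not a number from the thread, since real traffic is never spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(hits_per_day: float) -> float:
    """Average hits per second if traffic were spread evenly over the day."""
    return hits_per_day / SECONDS_PER_DAY


def peak_rate(hits_per_day: float, peak_factor: float = 5.0) -> float:
    """Rough peak estimate; peak_factor is an assumption, not a measurement."""
    return average_rate(hits_per_day) * peak_factor


if __name__ == "__main__":
    # Daily figures quoted in the thread.
    workloads = [
        ("questioner's site", 1_000_000),
        ("large site (page views)", 50_000_000),
        ("Zeus benchmark level", 200_000_000),
    ]
    for label, hits in workloads:
        print(f"{label:<26} {average_rate(hits):8.1f}/s average, "
              f"{peak_rate(hits):9.1f}/s at an assumed 5x peak")
```

    A million hits a day averages out to under 12 per second, while 200 million a day is over 2,300 per second, which is why the benchmark numbers say little about the questioner's load.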

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  2. The obvious by drix · Score: 5

    Well, this might be painfully apparent, but you'd be amazed how many times I've seen people do stupid things like turn their front page into an .asp script merely to have today's date on it (cron! I tell them). Anyway, make everything static, as much as possible. Work out from your stats which pages are receiving the most hits and endeavor to make them as static as possible. This is what most big sites do: Yahoo receives well over 100 million hits daily and they sure as hell aren't dynamically creating anything besides search results. Any page that isn't created or modified in direct response to a user request can be generated ahead of time by a script and served as a static file.
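
    The cron approach can be sketched roughly as follows: a small script renders the front page once a day and writes it out as a plain file, so the web server never runs any code per request. This is a minimal Python sketch; the template, the file names and the crontab schedule are all made up for illustration.

```python
import datetime
import os
import tempfile

# Hypothetical template; a real front page would be read from a file.
TEMPLATE = (
    "<html><body><h1>Welcome</h1>"
    "<p>Today is {today}.</p></body></html>\n"
)


def render_front_page(today: datetime.date) -> str:
    """Fill in the only 'dynamic' part of the page: the date."""
    return TEMPLATE.format(today=today.isoformat())


def publish(html: str, destination: str) -> None:
    """Write the page to a temp file, then swap it into place.

    os.replace is atomic when both paths are on the same filesystem, so
    visitors see either the old page or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the web server must read it
        os.replace(tmp_path, destination)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Example crontab entry (path and time are placeholders), run shortly after midnight:
#   5 0 * * * /usr/bin/python3 /path/to/build_front_page.py
```

    A script run from cron would simply call `publish(render_front_page(datetime.date.today()), "index.html")` with the real document root in place of the bare file name.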

    That being said, don't enslave yourself to Apache. There are much faster ways of shooting static text over a socket, namely Zeus and khttpd, the kernel HTTP server. khttpd only serves static pages, but by running inside the kernel rather than in user space it bypasses all the overhead that comes with a user-level process. Needless to say, it's really, really fast (especially compared to Apache), and I have no doubt that even on a modest Pentium 1 you could crank out hundreds of thousands of pages a day using static HTML and khttpd. It can also be used in combination with another web server, so you don't have to sacrifice any dynamic functionality. Check out Zeus if you need more features. Either way, they're faster than Apache. Other than that, all the obvious advice applies: put the SQL server on a separate box and make SQL queries as sparingly as possible. If your site is image-laden, consider putting the images on a separate box. Or clone your boxes and employ load balancing. Or just pray for Apache 2 to be released :) Lots of ways to skin the cat here.

    --

    I think there is a world market for maybe five personal web logs.