Slashdot Mirror


Building a Scaleable Apache Site?

bobm writes "I'm looking for feedback on any experience building a scaleable site. This would be a database-driven site, not just a bunch of static pages. I've been looking for pointers to what other people have learned (either the easy way or the hard way). I would like to keep it Apache-based and am looking for feedback on the maximum number of child processes that you've been able to run, etc. Hardware-wise, I'm looking at using quad Xeons or even Sun E10K systems. I would like to stay non-clustered if possible."

3 of 60 comments

  1. Cache, Cache, Cache by Longstaff · · Score: 5, Insightful
    is *the* most important word around for dynamic sites.

    I've built a site that's able to handle 1-2 million dynamic page views per day. There's not a single static page on the whole site except for the 404 page.
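For scale, here is a quick back-of-the-envelope conversion of 1-2 million page views per day into an average request rate. This assumes traffic is spread evenly over 24 hours; real traffic is burstier, so peak load will be some multiple of this average (the comment doesn't say how much).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(pages_per_day: int) -> float:
    """Average pages served per second if traffic were spread evenly over a day."""
    return pages_per_day / SECONDS_PER_DAY


for daily in (1_000_000, 2_000_000):
    print(f"{daily:,} pages/day ~ {average_rate(daily):.1f} pages/s")
# prints:
# 1,000,000 pages/day ~ 11.6 pages/s
# 2,000,000 pages/day ~ 23.1 pages/s
```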

    /. doesn't generate these pages on the fly; they're generated by a background process that runs every minute or so, and the result is stored as a file. There's no reason to requery the database if you don't have to.
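A minimal sketch of that kind of background regenerator, assuming a hypothetical `render()` callable that runs the database queries and returns the finished HTML. The file is written to a temporary name and then renamed over the target, so the web server never serves a half-written page. The 60-second interval mirrors "every minute or so" and is only an example value.

```python
import os
import tempfile
import time
from typing import Callable


def regenerate_once(render: Callable[[], str], out_path: str) -> None:
    """Render a page and publish it atomically as a static file."""
    directory = os.path.dirname(os.path.abspath(out_path))
    # The temp file must live on the same filesystem as the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(render())
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600 files; let the web server read it
        os.replace(tmp_path, out_path)  # readers see either the old page or the new one
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise


def run_forever(render: Callable[[], str], out_path: str, interval_seconds: float = 60.0) -> None:
    """Regenerate the page roughly every interval_seconds."""
    while True:
        started = time.monotonic()
        regenerate_once(render, out_path)
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval_seconds - elapsed))
```

The database is then queried once per interval instead of once per request, no matter how many visitors hit the page.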

    One trick we currently use is a small daemon that runs on our app servers (a custom Java app). It's essentially a TCP socket interface to a hash table whose entries carry an expiration timestamp. Here's how the site works:

    1. request comes in
    2. front end server takes GET params and queries the local cache daemon to see if those objects are local
    3. if the objects are local - great - slap them together and deliver the page, otherwise
    4. query the database for the object info
    5. populate the cache daemon
    6. deliver the page
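The lookup flow above is the cache-aside pattern. A minimal in-process sketch of it follows, assuming a hypothetical `query_db(key)` function for step 4. The original daemon is a separate Java process reached over TCP; that network layer is omitted here, and a real shared daemon would also need locking around the table.

```python
import time
from typing import Any, Callable, Dict, Tuple

_MISS = object()  # sentinel so a cached None is distinguishable from a cache miss


class ExpiringCache:
    """A hash table whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return _MISS
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale entry: evict lazily on read
            return _MISS
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def fetch_object(key: str, cache: ExpiringCache, query_db: Callable[[str], Any]) -> Any:
    """Cache-aside lookup following steps 2-5 of the list above."""
    value = cache.get(key)     # steps 2-3: is the object already local?
    if value is _MISS:
        value = query_db(key)  # step 4: fall back to the database
        cache.put(key, value)  # step 5: populate the cache for next time
    return value
```

Within the TTL, repeated requests for the same object never touch the database; once an entry expires, the next request refreshes it.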

    Another trick we use is dumping the output from one dynamic page so it can be included by another. Have a page that generates nothing but a single element (e.g., a slashbox). Have a mechanism on the back end that requests that page and stores the result as a text file. The dynamic page (say, PHP or JSP) then just uses an include directive pointing to the static text file, which can contain formatted HTML.
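A rough sketch of both halves of this trick, assuming a hypothetical fragment URL and output path. `snapshot_fragment` is the back-end job that requests the fragment-only page and saves it; `render_page` is a Python stand-in for the PHP/JSP include directive, not the actual mechanism the comment describes. Fetched bytes are assumed to be UTF-8.

```python
import os
import urllib.request


def snapshot_fragment(url: str, out_path: str, timeout: float = 10.0) -> None:
    """Request the fragment-only page and save its HTML as a static file."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        html = resp.read().decode("utf-8")  # assumes the fragment is served as UTF-8
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp_path, out_path)  # swap in the new copy in one step


def render_page(fragment_path: str, main_html: str) -> str:
    """Stand-in for the PHP/JSP include: splice the saved fragment into the page."""
    try:
        with open(fragment_path, encoding="utf-8") as f:
            box = f.read()
    except FileNotFoundError:
        box = ""  # fragment not generated yet: render the page without it
    return f"<html><body>{main_html}{box}</body></html>"
```

Including a file is a cheap disk read, so the cost of building the fragment is paid once per snapshot rather than once per page view.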

    Of course, the real weak point of the system (without clustering) is the database. Make sure that your data is indexed properly and that your queries are optimised. We have 2 tables with over a million rows each that get hit all the time. Proper data layout, quick queries, and the local caches help our puny dual P3-733 (non-Xeon) with a paltry 1GB of RAM dish out well over a million dynamic pages per day.
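One way to check that a hot query is actually using an index is to ask the database for its query plan. The sketch below uses SQLite (via Python's built-in `sqlite3`) purely as a stand-in, since the comment doesn't name its database; the `stories` table and `idx_stories_section` index are hypothetical. Other databases have their own `EXPLAIN` variants.

```python
import sqlite3
from typing import Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the detail text


def compare_plans() -> Tuple[str, str]:
    """Show the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (id INTEGER PRIMARY KEY, section TEXT, title TEXT, posted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO stories (section, title, posted) VALUES (?, ?, ?)",
        [(f"section-{i % 50}", f"story {i}", i) for i in range(1000)],
    )
    query = "SELECT * FROM stories WHERE section = ?"
    before = query_plan(conn, query, ("section-7",))  # no index yet: full table scan
    conn.execute("CREATE INDEX idx_stories_section ON stories (section)")
    after = query_plan(conn, query, ("section-7",))   # planner can now search via the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

On a table with a million rows, the difference between a full scan and an index search is what decides whether the database keeps up.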
  2. Re:Kegel's site by Longstaff · · Score: 4, Insightful

    mod parent up - great link!

  3. NFS vs. rsync? by LedZeplin · · Score: 2, Insightful

    Several of you have discussed using NFS so that clustered web servers can access a shared web root. My current setup uses rsync to distribute the files to the cluster nodes, and I'm wondering: why NFS? It seems that the rsync method would be a lot more failure-resistant. If my primary server goes down, the cluster nodes can still serve the site as it was at the time of failure. With an NFS server you would need high-availability failover; otherwise all the cluster nodes are SOL, right? I'm curious what the plus side of NFS is; maybe I'm missing out on something.
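A sketch of the rsync push described here, run from the primary server. The node names and paths are hypothetical, and it assumes rsync is installed on every machine and that the primary can log in to each node. Because each node ends up with its own full local copy, a node can keep serving when the primary is down, which is the failure resistance the comment is pointing at.

```python
import subprocess
from typing import Iterable, List


def rsync_command(src_dir: str, node: str, dest_dir: str) -> List[str]:
    """Build an rsync invocation that mirrors src_dir onto node:dest_dir."""
    return [
        "rsync",
        "-az",       # archive mode (keeps permissions, times, symlinks) plus compression
        "--delete",  # remove files on the node that were deleted on the primary
        src_dir.rstrip("/") + "/",  # trailing slash: copy the contents, not the directory
        f"{node}:{dest_dir.rstrip('/')}/",
    ]


def push_to_nodes(src_dir: str, nodes: Iterable[str], dest_dir: str) -> List[str]:
    """Sync every node; return the ones that failed so they can be retried."""
    failed = []
    for node in nodes:
        result = subprocess.run(rsync_command(src_dir, node, dest_dir))
        if result.returncode != 0:
            # A failed run may leave a mix of old and new files on this node; retry it.
            failed.append(node)
    return failed
```

Usage might look like `push_to_nodes("/var/www/site", ["web1", "web2"], "/var/www/site")`, run from cron or after each deploy.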