Slashdot Mirror


Scaling Server Setup for Sharp Traffic Growth?

Ronin asks: "We are a young startup developing yet another collaborative platform for academic users. Our platform (a) requires users to log on to the website for extended periods of time, and (b) is content intensive: stuff like courses, quizzes and assignments gets posted regularly. We're using a LAMP setup on a 1 GB P4 server. Our user base is small (about 1,200 users, 5-7% connected at any given time) but we expect it to grow rapidly. We expect sharp traffic growth, and are working to scale our server software & hardware setup linearly. What kind of server setup plan should we go for, keeping in mind our content-heavy application and that we may have to scale up rapidly? Can anyone share his/her experience with LAMP in dealing with scalability of high-traffic sites? Taking clues from the Wikimedia servers, we understand that the final configuration involves proxy caching for content, database master/slave servers and NFS servers. We of course don't have such high traffic, but it will be interesting to note what kind of server config you'd go for."

4 of 19 comments

  1. Well, for the early growth by SmallFurryCreature · · Score: 4, Informative
    When your current server is growing too small but you are not yet ready or willing to go to the big boys and simply buy a solution.

    The easiest way is to split your server. A typical website has 4 pieces which can be easily separated: scripts (Perl or PHP if you use LAMP), images/static content, the database and finally logging.

    If your site is "normal" you can get some very good results by splitting these tasks up across different servers. Each task really asks for a different hardware setup.

    Scripts tend to be small but require a fair chunk of crunching power. There is little point in SCSI as the scripts could just remain in memory without ever needing to swap. Depending on your scripts you don't need gigs of memory either. What you could really use is a multi-core machine. Server-side scripting practically begs for multicore. Why process one page request at a time when you can do 2 or 4? It may be me but I had better results with a dual P3 than a single P4.

    Images are almost the opposite. Depending of course on your site they could easily come to several gigs and, worse, constantly change. If you cannot fit your content in memory you better have a fast HD. SCSI still is the best for this. What you may want to look into is whether your hardware is optimized. I believe that Linux has some support for more direct throughput (reducing the number of times the image is shuffled around before going out of your network card). Raw CPU power is less important. Here I also had some really good experiences preferring multi-core over raw gigahertz power.

    Database is in a class of its own. With certain databases there isn't even a benefit to having multicore, it seems, perhaps due to whole-table locking. The main advantage you can get by separating it from the rest is that it means your Apache server can concentrate on one task. Also, removing the outside connection on your database is a nice bit of extra security. The database server really depends on the way your site is set up. For a typical page request I usually assume the following: 1 script request, a dozen image requests, 3 database queries (verification, retrieval, update).
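
    To make that per-page mix concrete, here is a rough back-of-the-envelope sketch in Python. The user count and the 5-7% concurrency come from the question; the pages-per-minute figure and the function name are assumptions made up purely for illustration, so plug in your own measurements.

```python
# Rough per-tier request rates from the per-page mix described above:
# 1 script request, about 12 image requests, 3 database queries.
# ASSUMPTION: the pages-per-user-per-minute value is an illustrative guess,
# not a measured number.

REQUESTS_PER_PAGE = {"script": 1, "image": 12, "database": 3}


def tier_load(total_users, concurrent_fraction, pages_per_user_per_minute):
    """Return estimated requests per second for each tier."""
    concurrent_users = total_users * concurrent_fraction
    pages_per_second = concurrent_users * pages_per_user_per_minute / 60.0
    return {tier: pages_per_second * n for tier, n in REQUESTS_PER_PAGE.items()}


if __name__ == "__main__":
    # 1,200 users with 7% connected, each assumed to load 2 pages a minute.
    for tier, rps in tier_load(1200, 0.07, 2).items():
        print(f"{tier}: {rps:.1f} requests/s")
```

    Even with generous guesses the numbers stay small at this user count, which is a hint to measure before buying four machines.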

    Logging is often overlooked but it takes up a serious amount of resources. Not logging is an option of sorts but I don't like it. Switching it to a machine dedicated to the task can seriously speed up your other servers AND provide a level of extra security. A logging server can be very lightweight and just needs a decent HD setup.

    Anyway, that is the amateur's way to save a website creaking at the seams when there is no money to get a pro solution. It is a hassle as you have now got four machines to admin, but it is easy to set up and usually does not require a major redesign.

    Load balancing and stuff sounds nice but most customers get such odd reactions when they hear the prices charged.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Well, for the early growth by sootman · · Score: 4, Informative

      These are all very good tips. There are also several things you can do with just one box:

      - PHP has lots of caching options available and other things that can boost performance. Learn them. One good overview is in the PowerPoint slideshow here. Just like you can't put a heavy building on a weak foundation, it's very hard to speed up an app that's badly written in the first place.

      - SQL can be badly misused. Make sure that your page uses as few queries as possible and that those queries are as good as possible. Don't use PHP for things that SQL does very well--joins, filtering, etc. Your goal should be for every database query to return as much information as you need to build the page and not an ounce more.

      - you can take a half-step towards multiple boxes by running multiple servers on one box. Apache is great but it's overkill for static work like serving images--look at tux, boa, lighttpd, thttpd, etc. for those duties. For example, serve the app from www.example.com on Apache and the images from images.example.com via Boa. Or, have Apache on :80 and serve images via Boa on :8080.

      - the last thing to do before splitting up to multiple servers is to get one better box. from the box you describe, you might realize a 200-300% improvement with a fast dual-CPU box with 2-4 GB RAM and either a) RAID or b) different disks for different tasks--logs (writes) on one, images (reads) on another, etc.

      - be scientific. measure, make one change, and measure again.

      - many things can be quickly tested before being fully implemented. turn off logging and see if performance improves. if it doesn't, then there's no reason to go through the trouble of making /var/log/ an NFS-mounted share. visit the site using a browser with images turned off to see how much faster it is when images aren't being asked for.

      - on a related note, determine where the bottlenecks are before optimizing. There's no reason to split image-serving duties if the only image you have is your logo and a couple nav buttons.

      - if possible, when you're done, do a writeup and submit it to slashdot. I always say "the best way to be successful is to find someone who has done what you want to do and copy them" and your experiences might help the next person who's in the same boat you're in now.

      - talk to people who have experience building fast servers. there's lots of stuff to know. for just one example, I've often heard that PIIIs and PIII Xeons are better than P4s for almost all server duties. there are religious wars in server land as well--SCSI vs. ATA, etc.--but talk to a few people and patterns will emerge.
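
      As an illustration of the SQL point above (let the database do the join instead of looping in application code), here is a small sketch. It uses Python and an in-memory SQLite database only so it is self-contained; the same idea applies to PHP and MySQL. The table names, columns and sample rows are all hypothetical.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with hypothetical course data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE assignments (
            id INTEGER PRIMARY KEY, course_id INTEGER, name TEXT);
        INSERT INTO courses VALUES (1, 'Algebra'), (2, 'Biology');
        INSERT INTO assignments VALUES
            (1, 1, 'Quiz 1'), (2, 1, 'Quiz 2'), (3, 2, 'Lab 1');
    """)
    return conn


def titles_loop(conn):
    """The pattern to avoid: one extra query per row, joined in app code."""
    rows = conn.execute(
        "SELECT name, course_id FROM assignments ORDER BY id").fetchall()
    queries = 1
    result = []
    for name, course_id in rows:
        title = conn.execute(
            "SELECT title FROM courses WHERE id = ?", (course_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def titles_join(conn):
    """The same result from a single query that lets SQL do the join."""
    rows = conn.execute(
        "SELECT c.title, a.name FROM assignments a "
        "JOIN courses c ON c.id = a.course_id ORDER BY a.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = make_db()
    print(titles_loop(conn))
    print(titles_join(conn))
```

      Both functions return the same rows; the loop version simply issues one more query for every assignment, which is the cost that grows with your content.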

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  2. Start with a scalable pipe by aminorex · · Score: 3, Insightful

    Cheap scalability means load balancing over commodity components, which you can add quickly to a set for linear scaling. The first challenge is where the client traffic comes in the door. If you can't get them in, you can't serve them. When you add commodity components, you reduce MTBF, so your configuration needs to do dynamic failover and rebalancing.

    The best way I know to scale your front door is to start with two netfilter firewalls sharing a MAC, and getting load balance by MAC layer filtering rules. It's pretty easy to plug in additional firewall transit capacity and to script-in failover using a heartbeat daemon. You can do firewalls in failover pairs more quickly and easily than you can do odd-numbered rings, but both are quite doable by relatively straightforward scripting and configuration.

    I strongly recommend against breaking your traffic into categories, like static pages, etc., and balancing load by moving different categories to different servers. If you do that, you end up with way too much hardware underloaded, and way too much hardware overloaded, and either no failover provisioning, or else a very complex failover configuration. Instead, make the individual servers identical, and cheap. Just add more clones to the pack as needed, and keep the traffic balanced.
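
    The "pack of identical clones" idea can be sketched in a few lines of Python. To be clear, this is not the MAC-layer netfilter setup described above; it is only an application-level illustration of round-robin balancing with failover, and the class name, method names and node names are all hypothetical.

```python
class ClonePool:
    """Round-robin over identical servers, skipping any marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def add_clone(self, node):
        """Scale out: add one more identical server to the rotation."""
        if node not in self.nodes:
            self.nodes.append(node)
        self.healthy.add(node)

    def mark_down(self, node):
        """Failover: a heartbeat check would call this when a node dies."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Rebalance: put a recovered node back into the rotation."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none are left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    pool = ClonePool(["app1", "app2", "app3"])
    pool.mark_down("app2")
    print([pool.pick() for _ in range(4)])
```

    Because every node is interchangeable, failover is just "skip the dead one" and scaling is just "append another"; no per-category routing rules are needed.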

    By this time you're starting to see my basic approach to scalable commodity 'nix clusters. See this lame ASCII art for detail. It amounts to a series of independently scalable layers: firewalling, app serving, db caching, db serving.

    The memcached layer is indicated if you have a lot of read-only db traffic. These nodes are cheap, don't even really need hard drives. You could boot them off of CD or off the network, diskless. They hold as much RAM as possible. The number of MC servers required depends strongly on how much RAM each can hold but the amount of RAM required per DB node depends on the characteristics of your application DB traffic.
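
    For anyone who hasn't used a cache layer like this, the read path looks roughly like the Python sketch below. A plain dict stands in for the memcached nodes so the example runs on its own; the class, its methods and the loader callback are assumptions for illustration, not memcached's actual client API.

```python
class ReadCache:
    """Check the cache first; fall back to the database only on a miss."""

    def __init__(self, load_from_db):
        # load_from_db is a hypothetical callable: key -> value from the DB.
        self._load = load_from_db
        self._store = {}  # stand-in for the RAM on the memcached nodes
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value

    def invalidate(self, key):
        """Call after a write so the next read goes back to the database."""
        self._store.pop(key, None)


if __name__ == "__main__":
    fake_db = {"course:1": "Algebra"}
    cache = ReadCache(fake_db.__getitem__)
    cache.get("course:1")  # miss, goes to the database
    cache.get("course:1")  # hit, served from RAM
    print(cache.hits, cache.misses)
```

    The more of your traffic is repeated reads, the more of it never reaches the DB servers at all, which is why this layer pays off for read-heavy sites.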

    I'd rather install a memcached server and keep a hot DB spare than try to maintain transparent failover on a DB cluster. Coherence requirements complicate the performance curves when you have multiple DBs accepting write operations, which can lead to unpleasant surprises. Delay scaling your DB cluster as long as you can.

    --
    -I like my women like I like my tea: green-
  3. Don't forget about C-JDBC. by aminorex · · Score: 4, Interesting

    I should mention that if you didn't code-in memcached, you probably don't want to retrofit it just for performance tuning or capacity scaling. In that case, I would suggest C-JDBC. You don't need to use a Java AS node in order to use C-JDBC, either.

    I haven't made a production deployment of C-JDBC, so I defer to the experience of others, but from my research, it looks like a hot ticket for scaling DB performance while simultaneously isolating you from the specificities of a given DB product.

    --
    -I like my women like I like my tea: green-