Scaling Server Setup for Sharp Traffic Growth?
Ronin asks: "We are a young startup developing a yet another collaborative platform for academic users. Our platform (a) requires users to log-on to the website for extended period of time, and (b) is content intensive - stuff like courses, quizzes and assignments gets posted regularly. We're using a LAMP setup on a 1 GB P4 server. Our user base is small (about 1,200 users, 5-7% connected at any given time) but we expect it to grow rapidly. We expect sharp traffic growth, and are working to scale our server software & hardware setup linearly. What kind of server setup plan should we go for keeping in mind our content heavy application and that we may have to scale up rapidly. Can anyone share his/her experience with LAMP in dealing with scalability of high-traffic sites? Taking clues from the Wikimedia servers, we understand that the final configuration involves proxy caching for content, database masters/slave servers and NFS servers. We of course don't have such a high traffic, but it will be interesting to note what kind of server config you'd go for."
The easiest way is to split your server. A typical website has 4 pieces wich can be easily seperated. Scripts (if you use lamp perl or php), images/static content, the database and finally logging.
If you site is "normal" you can get some very good results by splitting these tasks up across different servers. Each task really asks for a different hardware setup.
Scripts tend to be small but require a fair chunk of crunching power. There is little point in scsi as the scripts could just remain in memory without ever needing to swap. Depending on your scripts you don't need gigs of memory either. What you could really use is a multi-core machine. server side scripting practically begs for multicore. Why process one page request when you can do 2 or 4 etc. It may be me but I had better results with dual P3 then single P4.
Images are almost the opposite. Depending of course on your site they could easily come to several gigs and worse constantly change. IF you cannot fit your content in memory you better have a fast hd. SCSI still is the best for this. CPU power on the other hand is less important. What you may want to look into is that your hardware is optomized. I believe that Linux has some support for more direct throughput (reducing the amount of times the image is shuffled around before going out of your network card). Raw CPU is less needed. Here I also got some really good experiences in preffering multi core over raw gigahertz power.
Database is in a class of its own. With certain databases there isn't even a benefit to having multicore it seems perhaps due to whole table locking. The main advantage you can get by seperating it from the rest is that it means your apache server can concentrate on one task. Also removing the outside connection on your database is a nice bit of extra security. Database server really depends all on the way your site is setup. For a typical page request I usually asume the following, 1 script request, a dozen image requests, 3 database queries. (verification, retrieval, update)
Logging is often overlooked but it takes up a serious amount of resources. Not logging is an option of sorts but I don't like it. Switching it to a machine dedicated to the task can seriously speed up your other servers AND provide a level of extra security. A logging server can be very lightweight and just needs a decent HD setup.
Anyway that is the amateurs way to save a website creaking at the seams when their is no money to get a pro solution. It is a hazzle as you now got four machines to admin but it is easy to setup and usually does not require a major redesign.
Load balancing and stuff sounds nice but most customers get such odd reactions when they here the prices charged.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I should mention that if you didn't code-in memcached, you probably don't want to retrofit it, just for performance tuning or capacity scaling. In that case, I should suggest C-JDBC. You don't need to use a Java AS node in order to use C-JBDC, either.
I haven't made a production deployment of C-JDBC, so I defer to the experience of others, but from my research, it looks like a hot ticket for scaling DB performance while simultaneously isolating you from the specificities of a given DB product.
-I like my women like I like my tea: green-