On Building High Volume Dynamic Web Sites
"Apart from this I have been talking to commercial vendors like BEA (I was very impressed) who provided application servers with load-balancing, replication, etc., starting at $20,000 (Australian) -- they run sites like Amazon.com, Qwest, Wells-Fargo etc.
There is an issue here (is there? I don't have any experience to really know hence am asking you) ... I can build a custom solution with load balancing written at the application level. But how does this affect my maintainability (for example Amazon.com moving from just books to all sorts of other stuff .. how long did it take to redesign the site etc.)?
The site I first built could potentially hold information about a million refugees, and allowed searching on most fields regarding information on a person (wildcard queries). Unfortunately, on doing some stress testing (with around 700,000 records) I found that at most 15 hits could be handled every ten seconds. I optimized the code, switched to a faster JDBC driver, wrote a simple load balancer (and I mean very simple), limited searching to a few fields, and prevented bad wildcard queries (e.g., a wildcard at the start would make little if any use of the index). Consequently, I managed to get the system to handle slightly more load (200 hits in 5 seconds). (Hardware was a dual Pentium II 450MHz I think, 512MB RAM, 2x8GB Ultra-wide SCSI hard drives, and running Linux of course.) BTW, the Kosova refugees articles have a lot of misinformation, e.g. encrypted databases, and the time to actually build it was one week (and two weeks of overcoming red tape, etc.)."
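The search restrictions described in the question (a short list of searchable fields, and no wildcard at the start of a pattern) could be checked before a query ever reaches the database. What follows is only a rough sketch, assuming the backend uses SQL LIKE-style patterns; the field names, the MIN_PREFIX threshold, and the function names are all hypothetical and not taken from the original system.

```python
# Illustrative sketch only: SEARCHABLE_FIELDS, MIN_PREFIX and the function
# names are hypothetical, not part of the system described in the question.
SEARCHABLE_FIELDS = {"surname", "given_name", "home_town"}

# SQL LIKE wildcards plus the shell-style ones users tend to type.
WILDCARDS = ("%", "_", "*", "?")

# Require this many literal characters before the first wildcard so the
# database has a prefix it can look up in an index.
MIN_PREFIX = 2


def literal_prefix_length(pattern: str) -> int:
    """Count the characters that appear before the first wildcard."""
    count = 0
    for ch in pattern:
        if ch in WILDCARDS:
            break
        count += 1
    return count


def validate_search(field: str, pattern: str) -> bool:
    """Return True only for searches that are allowed to reach the database."""
    if field not in SEARCHABLE_FIELDS:
        return False
    return literal_prefix_length(pattern.strip()) >= MIN_PREFIX
```

With these assumed settings, a search for "Smi%" on surname is accepted, while "%mith" (leading wildcard) and "S%" (prefix too short) are rejected before any SQL is run.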
Philip & Alex's Guide to Web Publishing and the Web Tools Review are good sources of information on this topic. Both can be easily found at http://www.photo.net/. Philip Greenspun, who created photo.net and wrote the Guide to Web Publishing, is also the founder of ArsDigita. ArsDigita does web development consulting and offers a free, open source toolkit for building robust, high-utilization sites. The previous poster directed you to a good info source; I'm not sure why they were rated down to 0...
- Ignore your application server vendor. They have to pass on some of the cost to Oracle, and they don't really manage Amazon.com with their product - but they probably do some small part of it so they can say that legally. I'm willing to bet that it's the most unreliable part of Amazon.com.
- Use well-known, well-respected, and evolved tools. These include things like mod_perl, Apache, and Oracle. Java servlets are getting there, but you saw that they don't scale fantastically (and their JDBC drivers are much slower than Perl's equivalent); they just aren't that fast yet on large projects. AOLserver also looks like a fairly nippy option, but you need to use Tcl to program it AFAIK.
- Tune your database. This can't be stressed enough. It may take the rest of your life, but do it anyway. And if you can't do it, then hire a professional. These guys are expensive though - but you get what you pay for in this respect.
- Split up your hardware. A separate DB and Web server can increase your application's speed no end due to removing contention for resources.
- Cache! Cache whatever you can. If using something like mod_perl then stick the "Oops" proxy server in front of it to cache page accesses (there are good reasons why this speeds things up). Cache stuff in your server's RAM. Cache stuff in shared memory.
- Be ready to spend. Running a fast, high-traffic web site is expensive. There are no ifs or buts about this unless you don't mind downtime. PhilG of "Philip and Alex's" fame estimates something like $100,000+++ a year to run a web site like this, taking into account Oracle costs, support, DBA costs (yes, you do need one), hardware and network costs.
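To make the "cache stuff in your server's RAM" point from the list above a bit more concrete, here is a minimal sketch of an in-process cache with a time-to-live. It is only an illustration under stated assumptions: the TTLCache class and its get_or_compute method are made-up names, the cache lives in a single process (so it is not the shared-memory or proxy caching also mentioned), and it never evicts entries except by replacing expired ones.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical per-process cache: keeps each value for ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()  # e.g. a database query or page render
        self._entries[key] = (now + self._ttl, value)
        return value
```

A page handler might call something like cache.get_or_compute(("search", "Smi%"), run_query), where run_query is whatever expensive lookup you already have; repeated requests within the TTL then never touch the database. The trade-off is staleness: readers may see data up to ttl_seconds old.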
And read "Philip and Alex..." - even if you only get the web version - somewhere off http://photo.net. He debunks the myths about application servers reducing the cost and time of developing this sort of thing. And read "The Mythical Man-Month" - that also debunks the idea of reducing the time to develop complex things. Good luck!
Matt. Want XML + Apache + Stylesheets? Get AxKit.