Slashdot Mirror


Supporting Tens Of Thousands Of Users With Apache?

embo writes: "The company I work for was recently approached by an academic organization looking for advice on providing web space for 30,000 - 40,000 users. They are limited by budget, so I'd like to recommend something with Linux and Apache. They are thinking of offering around 50 MB of disk space per user (which at maximum utilization would be ~2 TB of data storage), and no database-driven content (though they want to allow CGI through Perl and Python, for example)." This is a huge undertaking. Can anyone think of solutions better than the ones embo outlines below?
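As a quick check on the storage figure quoted above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and that every user fills the whole quota; the function name is only illustrative.

```python
# Worst-case storage if every user fills the full quota.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
QUOTA_MB = 50


def max_storage_tb(users, quota_mb=QUOTA_MB):
    """Return the worst-case storage need in terabytes."""
    return users * quota_mb / 1_000_000


if __name__ == "__main__":
    for users in (30_000, 40_000):
        print(users, "users ->", max_storage_tb(users), "TB")
```

So the range in the question works out to between 1.5 TB and 2 TB at full utilization, which matches the "~2 TB" estimate for the upper bound.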

"I can only think of a couple of ways of doing this. One is to have an enormous single fileserver and a cluster of Apache web servers that NFS-mount the home directories to serve up web pages. The users would then FTP into the main fileserver to store their web pages. To me, this seems wildly inefficient, and you have no real redundancy if the main fileserver crashes unless you are using a SAN (which is very expensive), or you have a hot backup that is rsync'ed or something. And I'm thinking that rsyncing up to 2TB of data would be an exercise in futility.

My other thought was to have several back-end file servers with a fixed number of users on each server, and then send all HTTP requests through an LDAP server first, which would then do a redirect to the machine that user's web page resides on. The big problem then is how to make sure users are FTP'ing into the machine that their account is on. They may also use FrontPage extensions with Apache, and this could complicate things even further.

I know there has got to be a better architecture for this. How do enormous sites like Yahoo and Excite tackle this problem? They have hundreds of thousands of users! Better yet, how could they tackle it with Open Source tools? Would, for example, a Turbo Linux cluster help with this problem at all, or would I still have to replicate the data across every node in the cluster (meaning I'd need up to 2TB of storage for each cluster node!)? Then what happens if they decide they want to add another 10,000 users? I can't find pointers to information, or ideas on how to do this *anywhere*. Can you fellow Slashdotters give me any advice?"
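The second idea in the question (a fixed number of users per back-end server, with a directory lookup that drives both the HTTP redirect and the FTP login host) can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain in-memory dictionary stands in for the LDAP directory, and the class name, host names, and `/~user/` URL layout are hypothetical rather than anything specified in the post.

```python
from collections import Counter


class UserDirectory:
    """In-memory stand-in for the LDAP lookup described in the question."""

    def __init__(self, backends, users_per_backend):
        self.backends = list(backends)        # hypothetical back-end host names
        self.users_per_backend = users_per_backend
        self.home = {}                        # username -> back-end host

    def add_backend(self, host):
        """Grow capacity later without moving any existing account."""
        self.backends.append(host)

    def create_account(self, username):
        """Place a new user on the first back end that still has room."""
        if username in self.home:
            return self.home[username]
        counts = Counter(self.home.values())
        for host in self.backends:
            if counts[host] < self.users_per_backend:
                self.home[username] = host
                return host
        raise RuntimeError("all back ends are full; add a host first")

    def ftp_host(self, username):
        """Same lookup as HTTP, so FTP lands on the machine holding the files."""
        return self.home.get(username)

    def http_redirect(self, path):
        """Map a request for /~user/... to (status, Location header value)."""
        if not path.startswith("/~"):
            return 404, None
        username, _, rest = path[2:].partition("/")
        host = self.home.get(username)
        if host is None:
            return 404, None
        return 302, f"http://{host}/~{username}/{rest}"


if __name__ == "__main__":
    d = UserDirectory(["fs1.example.edu", "fs2.example.edu"], users_per_backend=2)
    for name in ("alice", "bob", "carol"):
        d.create_account(name)
    print(d.http_redirect("/~carol/index.html"))
    print(d.ftp_host("alice"))
```

Because each assignment is stored rather than recomputed, adding another 10,000 users in this sketch means calling `add_backend` with a new host; existing accounts stay where they are, and no node needs a full copy of the data.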
