Slashdot Mirror


Supporting Tens Of Thousands Of Users With Apache?

embo writes: "The company I work for has been approached recently by an academic organization looking for advice on providing web space for 30,000 - 40,000 users. They are limited by budget, so I'd like to recommend something with Linux and Apache. They are thinking of offering around 50 MB of disk space per user (which at maximum utilization would be ~2 TB of data storage), and no database driven content (though they want to allow CGI through Perl and Python, for example)." This is a huge undertaking. Can anyone think of solutions better than the ones embo outlines below?

"I can only think of a couple of ways of doing this. One is to have an enormous single fileserver and have a cluster of apache web servers that NFS mount the home directories to serve up web pages. Then the users FTP into the main file server to store their web pages. To me, this seems wildly inefficient, and you have no real redundancy if the main fileserver crashes unless you are using a SAN (which is very expensive), or you have a hot backup that is rsync'ed or something. And I'm thinking that rsyncing up to 2TB of data would be an exercise in futility.

My other thought was to have several back end file servers with a fixed number of users on each server, and then send all HTTP requests through an LDAP server first, which would then do a redirect to the machine that user's web page resided on. The big problem then is how to make sure users are FTP'ing into the machine that their account is on. They may also use FrontPage extensions with Apache, and this could complicate things even further.
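The lookup-then-redirect idea above can be sketched roughly as follows. This is only an illustration: the `USER_HOSTS` dictionary stands in for the LDAP directory, and the host names, the `/~user/` URL layout, and the function name are all assumptions, not anything the submitter specified.

```python
from typing import Optional

# Hypothetical directory data. In the setup described above, this lookup
# would be a query against the LDAP server instead of a dictionary.
USER_HOSTS = {
    "alice": "web1.example.edu",
    "bob": "web2.example.edu",
}


def redirect_target(path: str, directory: dict = USER_HOSTS) -> Optional[str]:
    """Return the URL to redirect a /~user/... request to, or None if unknown."""
    if not path.startswith("/~"):
        return None
    # Split "/~alice/pics/a.gif" into user "alice" and the rest "pics/a.gif".
    user, _, rest = path[2:].partition("/")
    host = directory.get(user)
    if host is None:
        return None
    return f"http://{host}/~{user}/{rest}"
```

The same directory could in principle answer the FTP question too, since a help page or login front end could report which back end holds a given account.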

I know there has got to be a better architecture for this. How do enormous sites like Yahoo and Excite tackle this problem? They have hundreds of thousands of users! Better yet, how could they tackle it with Open Source tools? Would, for example, a Turbo Linux cluster help this problem any, or would I still have to replicate the data across every node in the cluster (meaning I'd need up to 2TB of storage for each cluster node!) Then what happens if they decide they want to add another 10,000 users? I can't find pointers to information, or ideas on how to do this *anywhere*. Can you fellow Slashdotters give me any advice?"

7 of 33 comments

  1. Use FreeBSD without CGI allowed by Dr.+Sp0ng · · Score: 3

    First of all, I love Linux and I'm not a huge FreeBSD fan. However, FreeBSD with Apache is probably a better choice for such a large site - it's known to hold up with these types of heavy loads.

    Also, recommend to them to not allow CGI scripting - that would be a NIGHTMARE to support with 30,000 users. Not only would there be a huge amount of security holes, imagine the amount of server power that would take.

    Of course, if you have a large amount of money to spend, get an S/390 and give each user a virtual machine running Linux :-) Then security isn't a concern, since the most they can fuck up is their own stuff.

    1. Re:Use FreeBSD without CGI allowed by barracg8 · · Score: 2
      • Not only would there be a huge amount of security holes, imagine the amount of server power that would take.
      Just because you give 30,000 people permission to run CGI scripts, does not mean that 30,000 people go out and learn perl :-)

      I doubt that you should worry about the processing overhead that CGI may add - it will hardly be used. Now the security issues, well that's another matter.

      cheers,
      G

    2. Re:Use FreeBSD without CGI allowed by barracg8 · · Score: 2

      Excellent, I'll check this out.

      Many thanks,
      G

  2. What is the expected usage? by cperciva · · Score: 2

    They say they have 40,000 users and want to provide 50MB per user.

    Ok, but how much are people *really* going to use? The university I am at has about 20,000 users, and provides each with 50MB of disk space (to be used for everything including webspace). In total about 100GB is used, so the average per person is only 5MB. Since this includes much more than just webspace, I'm guessing that you'd find that 200GB would be more than enough for your users.
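The arithmetic behind that estimate, as a small sketch. It uses decimal units (1 GB = 1000 MB) to match the round figures quoted above; the function names are made up for illustration.

```python
MB_PER_GB = 1000  # decimal units, matching the round numbers in the comment


def average_usage_mb(total_used_gb: float, users: int) -> float:
    """Average space actually used per user, in MB."""
    return total_used_gb * MB_PER_GB / users


def projected_storage_gb(users: int, mb_per_user: float) -> float:
    """Total storage needed for a user count at a given per-user figure, in GB."""
    return users * mb_per_user / MB_PER_GB


# Observed at the commenter's university: 100 GB used by 20,000 users.
observed_avg = average_usage_mb(100, 20_000)

# Worst case: every one of 40,000 users fills a 50 MB quota.
worst_case = projected_storage_gb(40_000, 50)

# Expected case: 40,000 users at the observed average.
expected = projected_storage_gb(40_000, observed_avg)

print(observed_avg, worst_case, expected)
```

The gap between the worst case (about 2 TB) and the expected case (about 200 GB) is the whole point: the quota is a ceiling, not a sizing figure.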

    Another note which might be of interest: my university runs Apache on Solaris, with the file system on a separate NFS-mounted box. The webserver (which is also the FTP server and telnet server) is a four-processor Sun box IIRC.

  3. Software Virtual Hosts by winterstorm · · Score: 2

    The most straightforward way is to avoid the use of a fileserver entirely. The problem you're having is that you're assuming you're going to use URLs of the form http://www.sitename.edu/path-to-userdir/. If you instead give everyone their own subdomain this is quite easy.

    Set up 10 Apache/Linux-or-BSD servers with 3000 users each. Set up a single DNS server that manages the subdomain "users.sitename.edu". Then give each user a subdomain of "username.users.sitename.edu". Map each subdomain to the IP of the appropriate server. You can manually configure Apache or you can use one of the dynamic boot-time configuration schemes.
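A rough sketch of how the per-user DNS mapping could be generated. The fill-servers-in-order assignment policy, the 192.0.2.x documentation addresses, and the function names are assumptions for illustration; only the zone name and the 3000-users-per-server figure come from the comment.

```python
def assign_servers(usernames, server_ips, users_per_server=3000):
    """Fill servers in order with sorted usernames, users_per_server each."""
    capacity = len(server_ips) * users_per_server
    if len(usernames) > capacity:
        raise ValueError(f"{len(usernames)} users exceed capacity of {capacity}")
    return {
        name: server_ips[i // users_per_server]
        for i, name in enumerate(sorted(usernames))
    }


def zone_records(assignment, zone="users.sitename.edu"):
    """Render one A record per user in standard zone-file syntax."""
    return [f"{name}.{zone}. IN A {ip}" for name, ip in sorted(assignment.items())]


# Tiny example with 2 users per server instead of 3000.
demo = assign_servers(["carol", "alice", "bob"], ["192.0.2.1", "192.0.2.2"], 2)
for line in zone_records(demo):
    print(line)
```

Note that a purely alphabetical fill makes later additions awkward, so a real deployment would more likely store each user's server in an account database and generate the zone from that.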

    Of course this could be a problem if your client has a policy about DNS subdomain allocation. And 50MB * 3000 users per server is still big (up to 150GB per server)... you might want to buy a big big disk array and plug all the Linux servers into it.

    For that matter a rack full of Sun Netra T1s and a fibre channel disk array should be cheap enough and supported by Sun to boot. Netras are a dream to manage.

  4. 50MB is quite a bit of space by scotpurl · · Score: 2

    If it's HTML, graphics, and some animations, then 10 meg is still plentiful. But if you keep the quotas down, say, to a meg per person, it gets easier to do backups, it prevents the warez, MP3, and pr0n sites. Or at least limits them.

    If folks want more than that, they can pay extra, or go to a third-party hosting system.

  5. Many servers and Squid by Citrix · · Score: 2

    If you decide to go with more than one web server take a look at Squid. It can reverse proxy web requests to make many servers look like one. You should be able to split the user names on alpha ranges.
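Splitting user names on alpha ranges might look something like the sketch below. This only shows the mapping logic; it is not Squid configuration or the Squid redirector interface, and the range boundaries and back-end host names are invented for the example.

```python
import bisect

# Hypothetical split: a-g on web1, h-o on web2, p-z on web3.
RANGE_STARTS = ["a", "h", "p"]
BACKENDS = ["web1.example.edu", "web2.example.edu", "web3.example.edu"]


def backend_for(username: str) -> str:
    """Pick the back-end server whose alpha range covers the username."""
    first = username[:1].lower()
    # Index of the last range start that is <= the first letter.
    idx = bisect.bisect_right(RANGE_STARTS, first) - 1
    # Names that sort before "a" (digits, empty) fall into the first range.
    return BACKENDS[max(idx, 0)]
```

User names are rarely spread evenly across the alphabet, so the boundaries would need to be chosen from the real account list rather than by equal letter counts.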

    From the Squid FAQ

    The Squid redirector can make one accelerator act as a single front-end for multiple servers. If you need to move parts of your filesystem from one server to another, or if separately administered HTTP servers should logically appear under a single URL hierarchy, the accelerator makes the right thing happen.

    This doesn't quite solve your FTP problem. I did a quick search and didn't find anything that would direct an FTP connection to a different server based on username. It shouldn't be hard to adapt an FTP proxy to do this for you, but I've never tried.

    It wouldn't be hard to write a quick PHP/CGI help page that, given the user name, would provide the user with the correct server address. Or you could make a few DNS entries like a.ftp.host, b.ftp.host, etc., and if the user's name was tom they would use t.ftp.host.
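The first-letter scheme is simple enough to sketch as the core of such a help page. The `ftp.host` domain is the placeholder from the comment; the function name and the choice to reject names that do not start with an ASCII letter are assumptions.

```python
def ftp_host_for(username: str, domain: str = "ftp.host") -> str:
    """Return the per-letter FTP host name for a user, e.g. tom -> t.ftp.host."""
    first = username[:1].lower()
    # Only a-z have DNS entries in this scheme, so reject anything else.
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"no FTP host defined for username {username!r}")
    return f"{first}.{domain}"


print(ftp_host_for("tom"))
```

Because the host name is derived purely from the user name, each letter's DNS entry can be repointed when accounts move without users having to learn a new address.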

    Or you could ask Geocities for their user management software ;~)
    Leknor
    http://Leknor.com

    "So many idiots, so few comets"