Slashdot Mirror


Building a "Distributed" FTP Server?

austad asks: "At my company, we run a fairly large Web site. It's distributed on multiple servers in three geographic locations. In each of these locations, we have several Real Video servers which all serve the same content for redundancy and load balancing. We have a central FTP server in one geographic location that files get uploaded to, and then replicated out to the Real Video servers. The problem with this model is that there is a single point of failure. We would like to put an identical FTP server in each location, and when a producer wants to upload a file, they are randomly directed to an active FTP server (we use a distributed DNS system that will direct users to machines that are marked as 'up'), and they upload the file."

"The problem is keeping the other two FTP servers current. How does each FTP server know who has the most current file tree? What if multiple producers are uploading simultaneously, and each has been directed to a different FTP server? Keep in mind that when replicating, we need to delete files on the Realservers that are no longer in the file tree on the FTP server."

2 of 13 comments

  1. You need some intelligence in the uploading by davew · Score: 3

    The other comments about rsync et al. are spot on for replicating (rsync is great), but they don't address the problem of authority: if a file exists on a particular server, is it new (so replicate) or old (so delete)?

    I think you need an upload procedure to get around that. Try this:

    • Restrict uploads to a particular "upload" directory on every server.
    • Wait for a file to be uploaded to this directory.
    • Use a separate rsync to copy this file to the appropriate place on the nominal master server.
    • Use your regular rsync to synchronise the mirrors with the master.

    There are a couple of issues with this, but you can get around them with a little added complexity in your uploading-to-master algorithm. It's true that if the master server goes down, you can't update; but the master doesn't need to be static, it just needs to be consistent. If uploads are that critical, you can use another protocol - say, DNS? - to designate an arbitrary server as master.
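
    As a rough illustration of the procedure above, here is a minimal Python sketch. The host name, the paths, and every function name are hypothetical, chosen only for the example. It assumes rsync over SSH is already set up between the servers, that every upload lands in a single flat content directory on the master, and that a file found in the upload directory has finished uploading (detecting in-progress uploads is left out).

```python
import subprocess
from pathlib import Path

# All names below are assumptions for illustration, not part of the original setup.
MASTER = "master.example.com"          # hypothetical nominal master
UPLOAD_DIR = Path("/srv/ftp/upload")   # hypothetical upload-only directory
CONTENT_DIR = "/srv/ftp/content/"      # hypothetical published file tree


def push_to_master_cmd(uploaded: Path, master: str = MASTER) -> list[str]:
    """Build the rsync command that copies one upload into the master's tree."""
    return ["rsync", "-a", str(uploaded), f"{master}:{CONTENT_DIR}"]


def mirror_from_master_cmd(master: str = MASTER) -> list[str]:
    """Build the rsync command a mirror runs to match the master exactly.

    --delete removes local files that no longer exist on the master.
    """
    return ["rsync", "-a", "--delete", f"{master}:{CONTENT_DIR}", CONTENT_DIR]


def pending_uploads(upload_dir: Path = UPLOAD_DIR) -> list[Path]:
    """Return the regular files waiting in the upload directory, sorted by name."""
    if not upload_dir.is_dir():
        return []
    return sorted(p for p in upload_dir.iterdir() if p.is_file())


def process_uploads(upload_dir: Path = UPLOAD_DIR, run=subprocess.run) -> int:
    """Push each pending upload to the master; drop the local copy on success."""
    pushed = 0
    for path in pending_uploads(upload_dir):
        result = run(push_to_master_cmd(path), check=False)
        if result.returncode == 0:
            path.unlink()  # the master now holds the authoritative copy
            pushed += 1
    return pushed
```

    Each server would call process_uploads() periodically and then run the command from mirror_from_master_cmd(). Because only the master's tree is ever treated as authoritative, the "new or old?" question never has to be answered on a mirror.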

    Dave

  2. Well, it sounds to me... by Zaffle · Score: 3
    like you need rsync. From what little I know of it, it basically maintains a mirror of directories. I think it's normally used one way (as in, mirroring from a central server), but I can't see why you couldn't use it both ways. Run rsync in a cron job, say every 10 mins, and that should be fine. I would definitely take a close look at rsync if I were you.

    Taking a very quick look at the documentation myself, I see that you'd probably have an rsync server running on each site, and then have a cron job run on each site that mirrors every other site. If all 3 sites do this, it should mirror pretty well. The lag time will probably be something like 2T, where T is the time between cron job runs.
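
    To make the "every site mirrors every other site" idea concrete, here is a small Python sketch that only generates the rsync commands and crontab lines each site would need. The host names, the rsync daemon module name, and the local path are all hypothetical. The sketch deliberately leaves out --delete: with pulls in every direction, --delete could remove a fresh upload that a peer has not received yet, while without it a deleted file gets copied back from a peer. That is the authority question raised elsewhere in this thread, and this sketch does not solve it.

```python
# All names below are assumptions for illustration only.
SITES = ["site-a.example.com", "site-b.example.com", "site-c.example.com"]
MODULE = "content"                  # hypothetical rsync daemon module
LOCAL_TREE = "/srv/ftp/content/"    # hypothetical local file tree


def pull_cmd(source: str) -> list[str]:
    """Build the rsync command that pulls a peer's tree into the local tree.

    -a preserves timestamps and permissions; -u skips files that are
    newer on the receiving side.
    """
    return ["rsync", "-au", f"rsync://{source}/{MODULE}/", LOCAL_TREE]


def pull_plan(sites: list[str]) -> dict[str, list[list[str]]]:
    """Map each site to the commands it runs to pull from every other site."""
    return {
        me: [pull_cmd(other) for other in sites if other != me]
        for me in sites
    }


def crontab_line(cmd: list[str], minutes: int = 10) -> str:
    """Format a command as a crontab entry that runs every `minutes` minutes."""
    return f"*/{minutes} * * * * " + " ".join(cmd)


if __name__ == "__main__":
    for site, cmds in pull_plan(SITES).items():
        print(f"# crontab for {site}")
        for cmd in cmds:
            print(crontab_line(cmd))
```

    With three sites, each one ends up with two cron entries, one per peer.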

    With regard to your specific what-if questions, I think the best way to answer those will be to try it out yourself. :) Hope that helps.

    --

    I use to have a funny sig, but slash cut it off, and I forgot what the punchline was.