Building a "Distributed" FTP Server?
austad asks: "At my company, we run a fairly large Web site. It's distributed on multiple servers in three geographic locations. In each of these locations, we have several Real Video servers which all serve the same content for redundancy, and load balancing. We have a central FTP server in one geographic location that files get uploaded to, and then replicated out to the Real Video servers. The problem with this model is that there is a single point of failure. We would like to put an identical FTP server in each location, and when a producer wants to upload a file, they are randomly directed to an active FTP server (we use a distributed DNS system that will direct users to machines that are marked as "up"), and they upload the file."
"The problem is keeping the other two FTP servers current. How does each FTP server know who has the most current file tree? What if multiple producers are uploading simultaneously, and each has been directed to a different FTP server? Keep in mind that when replicating, we need to delete files on the Realservers that are no longer in the file tree on the FTP server."
How about TurboLinux clustering? They are open-sourcing it pretty soon. I dunno, may be worth a shot though: http://www.turbolinux.com/product/cluster.html
Whereas rsync, I think, allows multiple machines to stay closely in sync when presented with new content at runtime.
Or I might be wrong :)
--
matthewg {matthewg@zevils.com} (Matthew Sachs), not at home
The other comments about rsync et al. are spot on for replicating (rsync is great), but they don't address the problem of authority: if a file exists on a particular server, is it new (so replicate it) or old (so delete it)?
I think you need an upload procedure to get around that. Try this:
There are a couple of issues with this, but you can get around them with a little added complexity in your uploading-to-master algorithm. If the master server goes down then it's true, you can't update; but the master doesn't need to be static, it just needs to be consistent. If uploads are that critical, you can use another protocol - say DNS? - to designate an arbitrary server as master.
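One way to express the "master is the authority" idea is to compare the master's file list against each replica's and derive both the copies and the deletions from that comparison alone. This is a minimal sketch, assuming each server can produce a manifest mapping relative paths to some fingerprint (size plus mtime, or a checksum); the names `Manifest` and `plan_replication` are hypothetical, and how the manifests are collected is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest: relative path -> content fingerprint
# (e.g. "size:mtime" or a checksum). Gathering it is out of scope here.
Manifest = Dict[str, str]


def plan_replication(master: Manifest, replica: Manifest) -> Tuple[List[str], List[str]]:
    """Return (to_copy, to_delete) that would bring `replica` in line with `master`.

    The master is the single authority: anything it has that the replica lacks,
    or holds in a different version, is copied; anything the replica has that
    the master does not is deleted.
    """
    to_copy = sorted(p for p, fp in master.items() if replica.get(p) != fp)
    to_delete = sorted(p for p in replica if p not in master)
    return to_copy, to_delete
```

Because deletions are decided only by the master's list, a file that exists solely on a replica is never mistaken for a new upload, which is the ambiguity raised above.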
Dave
--
I love rsync. I have a client who cannot handle rotating backup tapes (I know... I know), so I took their tape drive from them, and now I rsync their fileserver to mine and back up from my local machine once a day. And you don't HAVE to have an rsync server running on each end, just the executable. It can launch itself over rsh (ewww) or ssh (woohoo!). Depending on the size of the FTP server, you will have some lag before all the sites update.
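For the no-daemon setup described above, the rsync invocation just names ssh as the remote shell. Here is a minimal sketch that builds such a command line; the function name and the example host and paths are hypothetical, while `-a`, `-z`, `-e ssh` and `--delete` are standard rsync options.

```python
from typing import List


def rsync_over_ssh(src_dir: str, remote_host: str, dest_dir: str,
                   delete: bool = False) -> List[str]:
    """Build an rsync command that pushes src_dir to remote_host:dest_dir over ssh.

    -a recurses and preserves permissions, times and symlinks; -z compresses
    in transit; -e ssh runs rsync on the far end through ssh, so only the
    rsync executable is needed there, not an rsync daemon.
    """
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Also remove files on the destination that no longer exist in the source.
        cmd.append("--delete")
    # Trailing slash: copy the contents of src_dir, not the directory itself.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{dest_dir}")
    return cmd
```

The list can be handed to `subprocess.run(cmd, check=True)`, or printed with `shlex.join(cmd)` to paste into a shell script.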
Taking a very quick look at the documentation myself, I see that you'd probably have an rsync server running on each site, and then have a cron job run on each site that mirrors every other site. If all 3 sites do this, it should mirror pretty well. The lag time will probably be something like 2T, where T is the time between cron job runs.
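The every-site-mirrors-every-other-site arrangement can be written out as crontab entries. This is a minimal sketch, assuming three sites reachable over ssh; the site names, hostnames, the `mesh_crontabs` helper and the `*/N` schedule are all illustrative, not taken from the thread.

```python
from typing import Dict, List


def mesh_crontabs(sites: Dict[str, str], tree: str, minutes: int) -> Dict[str, List[str]]:
    """For each site, return crontab lines that pull `tree` from every other site.

    `sites` maps a (hypothetical) site name to its hostname; `minutes` is T,
    the interval between cron runs.
    """
    if not 1 <= minutes <= 59:
        raise ValueError("minutes must be between 1 and 59 for a */N schedule")
    tabs: Dict[str, List[str]] = {}
    for site in sites:
        lines = []
        for peer, host in sites.items():
            if peer == site:
                continue  # a site never pulls from itself
            # No --delete: in a pull-from-everyone mesh, a peer that has not yet
            # received a fresh upload would otherwise cause it to be erased here.
            lines.append(f"*/{minutes} * * * * rsync -az -e ssh {host}:{tree}/ {tree}/")
        tabs[site] = lines
    return tabs
```

Leaving out `--delete` keeps new uploads safe, but it also means deletions never propagate, which is exactly the authority problem raised earlier in the thread. A real deployment would probably also wrap each command in `flock` so overlapping runs cannot write into the same tree at once.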
As for your specific what-if questions, I think the best way to answer those will be to try it out yourself. :) Hope that helps.
---
I used to have a funny sig, but slash cut it off, and I forgot what the punchline was.
How about some Cisco DistributedDirector love along with Veritas clustering/mirroring solutions. If you're running Solaris, you can't go wrong with this combination. If you're running Linux or something else, then s/Veritas/CODA/ or something.
Red Hat provides Piranha, which can load-balance FTP sessions. What he is looking for is really Coda with Piranha, or commercial AFS with Piranha if you have 15K to blow.
There you can build identical "points of failure", so that if one falls out, the other one takes over. Or something like that. Good luck! /pyder.....
Your own FTP server would do nicely. Log all incoming files in a special place and then set up a cron job that mirrors these files to the other servers (you'd have to use a special user whose transfers were not logged the same way, so you wouldn't end up mirroring the same file hundreds of times). Similarly, a delete request would be passed on to the other servers.
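The log-driven idea might look like the sketch below: read the transfer log, drop anything done by the special mirroring account, and keep the uploads and deletes that still need to be replayed on the other servers. The log line format, the `mirror` account name and the function names are all assumptions for illustration; only `STOR` and `DELE` are real FTP command names.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    timestamp: int  # Unix time of the transfer
    op: str         # "STOR" (upload) or "DELE" (delete), named after the FTP commands
    path: str


MIRROR_USER = "mirror"  # hypothetical account used only for server-to-server copies


def actions_to_propagate(log_lines: List[str]) -> List[Action]:
    """Turn a (hypothetical) transfer log into uploads/deletes to replay on peers.

    Each line is assumed to look like "<unix-time> <user> <STOR|DELE> <path>".
    Lines written by MIRROR_USER are skipped, so files that arrived by
    mirroring are not mirrored again.
    """
    actions = []
    for line in log_lines:
        parts = line.split(maxsplit=3)  # path may contain spaces
        if len(parts) != 4:
            continue  # ignore malformed lines
        ts, user, op, path = parts
        if user == MIRROR_USER or op not in ("STOR", "DELE"):
            continue
        actions.append(Action(int(ts), op, path))
    return actions
```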
There is quite likely an FTP server available that is flexible enough with its logging to do this. The capability would not have to be in the FTP server; it could be a script that searched the server's log files. However, implementing it on the server side lets you ensure that the mirroring is accurate and spares any log-parsing scripts from having to parse dates and times (unless your server logs in Unix time; in that case you would just store the timestamp of the last transfer the script mirrored, and only be concerned with transfers after it).
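The "store the last tick" approach amounts to a checkpoint: keep the newest Unix timestamp already handled and, on the next run, pick up only entries after it. This is a minimal sketch with a hypothetical `entries_since` helper operating on `(unix_time, payload)` pairs.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (unix_time, payload), e.g. a parsed log line


def entries_since(entries: List[Entry], last_tick: int) -> Tuple[List[Entry], int]:
    """Return entries newer than last_tick (oldest first) and the checkpoint to store next.

    The new checkpoint is the newest timestamp seen, or last_tick unchanged
    if nothing new arrived.
    """
    fresh = sorted((e for e in entries if e[0] > last_tick), key=lambda e: e[0])
    new_tick = fresh[-1][0] if fresh else last_tick
    return fresh, new_tick
```

One caveat of a one-second checkpoint: a transfer logged in the same second as the checkpoint, but after the script ran, would be skipped by the strict `>`. Using `>=` instead re-processes the boundary second, which is harmless as long as replaying an upload or delete is idempotent.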
I would suggest running an rsync every so often just to make sure.
The key here would be to ensure that everything you are doing is accurate. This is a "high-profile" environment. You might want to consider something other than FTP, e.g., HTTP POSTs. (Yes, there are problems with using this method to upload large files, chiefly that there is no progress indication. However, since most users will be on a fast network, this should not be too much of a problem, and a Java(Script) applet that broke the upload into manageable chunks and displayed the progress to the user might cut it.) An HTTP POST would let you keep track of other information along with the file, such as specific user comments.
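The chunked-upload idea can be sketched independently of the client technology. This illustrates only the chunking and progress bookkeeping, in Python rather than Java(Script); the helper name is hypothetical, and the HTTP POST for each chunk (endpoint, field names) is deliberately left out because the thread specifies none.

```python
from typing import Callable, Iterator, Tuple


def chunks_with_progress(data: bytes, chunk_size: int,
                         report: Callable[[int, int], None]) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pieces of `data`, calling report(sent, total) after each.

    Each chunk would be sent as its own POST; the offset lets the server
    reassemble the file in order. report() runs when the caller asks for the
    next piece, i.e. after the previous chunk has been handled.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(data)
    for offset in range(0, total, chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield offset, chunk
        report(offset + len(chunk), total)
```

A caller would loop over the generator, POST each piece, and let `report` update a progress bar. Because the server receives explicit offsets, a failed chunk can be retried without resending the whole file.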
Kenneth