Sharing a Subset of Data Between 2 Sites?
"Some people spend 95% of their time in lab 2, so that is their 'home' server, but when they come to lab 1 for a week's stay or so, they scp/rsync their files to the lab 1 server, and at the end of the week push the changes back to lab 2. When people login to a workstation, they usually remain logged in for days at a time and xlock the screen. [If we can get this caching system working], it would mean that people moving between the labs would not need to copy files around since there would always be a 'local' copy.
The network between the labs is not fast enough to automount lab 1's server directly on the lab 2 workstations, especially since some files can be over 300 MB in size. We have a VPN (via freeswan) between the different labs, so all data transmitted is encrypted. Also, because lab 2 has 1/6 the capacity of lab 1's RAID, it can hold only cached copies of data that is in use or likely to be in use.
Crontab entries for nightly copies are not useful because people often turn up at either lab on any given day.
The 3 servers currently run 2.4.18 with XFS, so any solution should be compatible with XFS, but at a real push we could consider changing to another filesystem."
If you have a very reliable connection, you may want to go for AFS.
If the connection is not reliable (or not fast enough), you may want to try Coda, a distributed filesystem that supports disconnected operation. Beware: AFS is a mature project, while Coda may still be a work in progress.
Don't over-engineer. Keep it simple: use CVS or rsync.
http://tinyurl.com/3t236
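To make the rsync suggestion concrete, here is a minimal sketch of the pull/push cycle the poster already does by hand, wrapped so the same options are used in both directions. The host name `lab2` and the directory paths are hypothetical placeholders, and the choice of options is an assumption about what suits a slow link with large files, not a tested recipe for this site.

```python
import subprocess


def build_rsync_cmd(src, dest, dry_run=False):
    """Return an rsync argument list that copies src to dest over ssh."""
    cmd = [
        "rsync",
        "-a",         # archive mode: recurse, keep permissions and timestamps
        "-z",         # compress data in transit, which helps on a slow link
        "--partial",  # keep partially transferred files so large ones can resume
        "--update",   # skip files that are newer on the receiving side
        "-e", "ssh",  # use ssh as the remote shell
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied, change nothing
    cmd += [src, dest]
    return cmd


def sync(src, dest, dry_run=False):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_cmd(src, dest, dry_run), check=True)
```

On arrival at lab 1 one would call something like `sync("lab2:/home/alice/project/", "/home/alice/project/")`, and the reverse at the end of the week. Note that `--update` only avoids overwriting newer files; it does not detect conflicting edits made on both sides, and it does not propagate deletions.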
http://www.cis.upenn.edu/~bcpierce/unison/
Unison works very well and is designed for this kind of thing.
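As a rough illustration of how Unison could replace the manual push and pull, the sketch below builds a command line that synchronises one subdirectory in both directions over ssh. The host `lab2`, the user directory and the `project` path are hypothetical, and it assumes Unison is installed on both machines in compatible versions.

```python
import subprocess


def build_unison_cmd(local_root, remote_host, remote_root, paths=(), batch=True):
    """Return a unison argument list for a two-way sync over ssh.

    remote_root is an absolute path such as "/home/alice"; Unison writes
    absolute remote paths with a double slash after the host name.
    """
    cmd = ["unison", local_root, f"ssh://{remote_host}/{remote_root}"]
    if batch:
        cmd.append("-batch")  # ask no questions; conflicting changes are skipped
    for p in paths:
        cmd += ["-path", p]   # limit the sync to this path relative to the roots
    return cmd


def sync(local_root, remote_host, remote_root, paths=()):
    """Run unison and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_unison_cmd(local_root, remote_host, remote_root, paths),
                   check=True)
```

Restricting the run with `-path` is one way to keep only in-use data on the smaller lab 2 RAID: each person syncs just the directories they are working on rather than the whole home directory.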
BTW: weekly backups?! Daily, surely?
Similar to AFS and Coda, suggested above, but with local caching to allow much higher performance. Also works in disconnected mode.