Sharing a Subset of Data Between 2 Sites?
"Some people spend 95% of their time in lab 2, so that is their 'home' server, but when they come to lab 1 for a week or so, they scp/rsync their files to the lab 1 server and, at the end of the week, push the changes back to lab 2. When people log in to a workstation, they usually stay logged in for days at a time and lock the screen with xlock. [If we can get this caching system working], it would mean that people moving between the labs would not need to copy files around, since there would always be a 'local' copy.
The network between the labs is not fast enough to automount lab 1's server directly on the lab 2 workstations, especially since some files can be over 300 MB in size. We have a VPN (via FreeS/WAN) between the labs, so all transmitted data is encrypted. Also, because lab 2's RAID has only 1/6 the capacity of lab 1's, it can hold cached copies only of data that is in use or likely to be used.
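The capacity constraint amounts to a size-bounded cache that evicts the least recently used files when space runs out. Below is a minimal Python sketch of that bookkeeping, assuming an LRU policy (one common choice, not something the poster specified); the class name, file names, and the 500 MB capacity are hypothetical.

```python
from collections import OrderedDict


class SizeBoundedLRU:
    """Hypothetical cache index: tracks file sizes, evicts least recently used files."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # path -> size in bytes, oldest access first

    def access(self, path: str, size: int) -> list:
        """Record an access to path; return the paths evicted to make room."""
        if path in self.files:
            self.files.move_to_end(path)  # cache hit: mark as most recently used
            return []
        if size > self.capacity:
            raise ValueError(f"{path} is larger than the whole cache")
        evicted = []
        while self.used + size > self.capacity:
            old_path, old_size = self.files.popitem(last=False)  # drop the oldest entry
            self.used -= old_size
            evicted.append(old_path)
        self.files[path] = size  # cache miss: the file would be fetched and stored
        self.used += size
        return evicted


MB = 1024 * 1024
cache = SizeBoundedLRU(capacity_bytes=500 * MB)  # hypothetical cache size
for name in ("a.dat", "b.dat", "c.dat", "d.dat"):
    cache.access(name, 100 * MB)
print(cache.access("big.dat", 300 * MB))  # prints ['a.dat', 'b.dat']
```

A single 300 MB file pushes out two smaller ones; with files this large, a small cache can churn quickly.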
Nightly copies via crontab entries are not useful, because people often access their files from both sites on the same day.
The 3 servers currently run Linux 2.4.18 with XFS, so any solution should be compatible with XFS, but at a real push we could consider switching to another filesystem."
If you have a very reliable connection, you may want to go for AFS.
If the connection is not reliable (or not fast enough), you may want to try Coda, a distributed filesystem that supports disconnected operation. Beware: AFS is a mature project, while Coda may still be a work in progress.
Don't over-engineer it; keep it simple and use CVS or rsync.
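For the rsync route, the manual workflow from the question (pull at the start of a visit, push at the end) can be wrapped in a few lines. This is a minimal sketch, assuming ssh access between the servers over the VPN; the host `lab2server`, the paths, and the helper names are hypothetical, and it only prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess


def rsync_cmd(src: str, dest: str, delete: bool = False) -> list:
    """Build an rsync command line that mirrors src into dest over ssh."""
    # -a: archive mode, -z: compress in transit, --partial: keep partially
    # transferred files so a dropped link does not restart large files from zero
    cmd = ["rsync", "-az", "--partial", "-e", "ssh"]
    if delete:
        cmd.append("--delete")  # also remove files that no longer exist at the source
    cmd += [src, dest]
    return cmd


def sync(src: str, dest: str, delete: bool = False, dry_run: bool = True) -> None:
    cmd = rsync_cmd(src, dest, delete)
    if dry_run:
        print(shlex.join(cmd))  # show what would run
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    HOME = "lab2server:/home/alice/project/"  # hypothetical home server and path
    LOCAL = "/home/alice/project/"            # trailing slash: sync directory contents
    sync(HOME, LOCAL)               # start of the visit: pull from the home server
    sync(LOCAL, HOME, delete=True)  # end of the visit: push changes back
```

rsync is one-way: it does not detect files edited at both sites between syncs, and `--delete` on the push removes anything at the destination that is missing locally.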
http://tinyurl.com/3t236
I'm not sure caching a subset of your files would work very well. You might instead consider installing some additional machines at the main location and letting your researchers log in to them remotely, using X or VNC if necessary. This should work much better than maintaining a partial local cache if you expect many cache misses, especially since some of those files are so large.
http://www.cis.upenn.edu/~bcpierce/unison/
Unison works very well and is designed for exactly this kind of thing.
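Below is a minimal sketch of a two-way Unison run between a local home directory and the lab 2 server, assuming ssh access over the VPN; the host, paths, the `project` subtree, and the helper name are hypothetical. `-path` restricts the sync to a subtree, which fits the "subset of data" requirement, and `-batch` runs without prompts.

```python
import shlex


def unison_cmd(local_root: str, remote_host: str, remote_root: str,
               paths: list, batch: bool = True) -> list:
    """Build a two-way Unison sync between a local root and an ssh-reachable root."""
    # remote_root must be absolute; Unison's ssh root syntax then shows a
    # double slash between the host and the path (ssh://host//abs/path).
    remote = f"ssh://{remote_host}/{remote_root}"
    cmd = ["unison", local_root, remote]
    for p in paths:
        cmd += ["-path", p]  # sync only these subtrees, relative to both roots
    if batch:
        cmd.append("-batch")  # ask no questions; conflicting changes are skipped
    return cmd


if __name__ == "__main__":
    cmd = unison_cmd("/home/alice", "lab2server", "/home/alice", ["project"])
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Because batch mode skips conflicts rather than guessing, running it interactively now and then is worthwhile.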
BTW, weekly backups?! Surely daily?
Similar to the AFS and Coda suggestions above, but with local caching for much higher performance. It also works in disconnected mode.