Slashdot Mirror


Sharing a Subset of Data Between 2 Sites?

eldrich asks: "We have two labs. The main lab (lab 1) has 1.2TB of on-line data storage: two machines with 600GB RAID-5s hung off of them. These happily service about 30 Linux machines via NFS over fast ethernet. There are 5-6 WinXP machines that connect via SMB and Samba. The lab is on a private network with a single firewall between it and the world, and we use LDAP for practically everything (hostnames, usernames, passwords, autofs, etc). The students' lab (lab 2) is 40 miles away, with 8 workstations and 2 WinXP machines. This lab also has a small RAID-5 Linux server with 180GB of space, which serves via NFS and Samba. Sometimes we have people from lab 2 at lab 1, and while they are at the main lab they need their files. What I want to do is make lab 2's 180GB RAID a subset cache of the 1.2TB one in lab 1. This puts everyone's main storage at lab 1 (which is backed up weekly), but a local copy can be cached on the lab 2 RAID system. This gives the students a local copy for fast access, with all the safety of the backups made from our system. Does anyone know of a filesystem or programs that can help with this?"

"Some people spend 95% of their time in lab 2, so that is their 'home' server, but when they come to lab 1 for a week's stay or so, they scp/rsync their files to the lab 1 server, and at the end of the week push the changes back to lab 2. When people log in to a workstation, they usually remain logged in for days at a time and xlock the screen. [If we can get this caching system working], it would mean that people moving between the labs would not need to copy files around, since there would always be a 'local' copy.

The network between the labs is not fast enough for direct automounting of lab 1's server on the lab 2 workstations, especially since some files can be over 300MB in size. We have a VPN (via freeswan) between the labs, so all data transmitted is encrypted. Also, because lab 2 has roughly 1/6 the capacity of lab 1's RAID, it can only hold cached copies of data that is in use or likely to be in use.

Crontab entries that copy files overnight are not useful, because people can turn up at either lab on any given day.

The 3 servers currently run 2.4.18 with XFS so any solution should be compatible with XFS but at a real push we could consider changing the filesystem to another one."
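The capacity constraint in the question (a 180GB server caching part of a 1.2TB one) amounts to choosing which files stay local. A minimal sketch of one possible policy follows, assuming a least-recently-used rule; the `FileInfo` type, the `select_for_cache` function, and the use of last-access timestamps are all hypothetical and not something the poster describes. Access times may also be unreliable on filesystems mounted with `noatime`.

```python
from typing import List, NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical record describing one file on the main (lab 1) server."""
    path: str
    size: int         # size in bytes
    last_used: float  # Unix timestamp of the last access


def select_for_cache(files: List[FileInfo], capacity: int) -> List[str]:
    """Pick the files to keep on the smaller cache server.

    Walks the files from most to least recently used and keeps each one
    that still fits in the remaining capacity. A file that does not fit
    is skipped, and smaller, older files may still be kept after it.
    """
    kept = []
    used = 0
    for info in sorted(files, key=lambda f: f.last_used, reverse=True):
        if used + info.size <= capacity:
            kept.append(info.path)
            used += info.size
    return kept


if __name__ == "__main__":
    candidates = [
        FileInfo("thesis/data.bin", 100, last_used=3.0),
        FileInfo("old/scan.img", 300, last_used=2.0),
        FileInfo("notes.txt", 50, last_used=1.0),
    ]
    print(select_for_cache(candidates, capacity=200))
```

Anything not selected would have to be fetched from lab 1 on demand, which is the slow path the poster wants to avoid for files in active use.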

1 of 23 comments

  1. Me too! by G4from128k · Score: 0, Redundant

    I too would like such a capability. We don't have terabytes of data, but my wife and I find it frustrating to co-create documents and manage who has which version on which machine while ensuring the portability of my wife's laptop and providing the speed of accessing files locally. Ideally, we would like all of our 12,000 shared files to be in at least two or three places at once (cached on my machine, cached on her laptop, and stored on a central file server).

    I'm envisioning some type of write-through file caching and distributed access control system that maintains near real-time synchronization between a local copy of a directory, an ostensibly identical copy of that directory on a remote server, and any other machines that "share" that directory. I suspect that a relatively soft access control system would be OK, in the sense that you could open your local copy of the file and propagate a lock afterward. Also, in the event of a network disconnect (e.g., when the laptop is on an airplane), the local system would journal any changes to the cached/shared file set and transmit/reconcile those changes when the network was reconnected.

    BTW, being one of those silly Mac users, I want a system that is totally transparent without extra steps (like a CVS check-out/check-in process), nasty batch processes, etc. When I open a file or close a file, I expect the system to appropriately handle the ugly details of caching, propagating changes to other machines, alerting me that the file is in use by someone else, etc.

    --
    Two wrongs don't make a right, but three lefts do.
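
The journal-and-reconcile behavior described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Journal` class, the `reconcile` function, and the per-file integer version numbers are hypothetical, and the server is modeled as a plain dictionary rather than any real file service.

```python
from typing import Dict, List, Tuple

# Hypothetical server model: path -> (version number, content)
ServerFiles = Dict[str, Tuple[int, str]]


class Journal:
    """Records edits made to cached files while the network is down."""

    def __init__(self) -> None:
        # path -> (server version the first offline edit was based on, latest content)
        self.pending: Dict[str, Tuple[int, str]] = {}

    def record(self, path: str, base_version: int, content: str) -> None:
        # Keep the base version of the first offline edit; later edits
        # to the same file only replace the journaled content.
        base = self.pending.get(path, (base_version, ""))[0]
        self.pending[path] = (base, content)


def reconcile(journal: Journal, server: ServerFiles) -> List[str]:
    """Replay journaled edits on the server and return conflicting paths.

    An edit is applied only if the server copy is still at the version
    the edit was based on. Otherwise someone else changed the file in the
    meantime, so the edit stays in the journal and is reported.
    """
    conflicts = []
    for path, (base, content) in list(journal.pending.items()):
        current_version = server.get(path, (0, ""))[0]
        if current_version == base:
            server[path] = (current_version + 1, content)
            del journal.pending[path]
        else:
            conflicts.append(path)
    return conflicts


if __name__ == "__main__":
    server = {"report.doc": (1, "draft"), "budget.xls": (1, "q1")}
    journal = Journal()
    journal.record("report.doc", 1, "draft, edited on the plane")
    journal.record("budget.xls", 1, "q1, my numbers")
    server["budget.xls"] = (2, "q1, her numbers")  # changed elsewhere meanwhile
    print(reconcile(journal, server))
```

Conflicting edits are left in the journal instead of being overwritten on either side, so a person (or a merge tool) can decide which copy wins.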