Organizing Data Across a Heterogeneous Net?
angst_ridden_hipster asks: "Like many people, I have a bunch of machines I use regularly. These include Linux machines, BSD machines, a Mac OS X machine, and a Windows machine. These machines are on a number of networks. All have Internet connectivity. Some of them are always powered on; a few are not. Obviously, I have a bunch of accounts. And, it goes without saying, I have a bunch of data. What are the best approaches to sharing data? I want to be able to securely access my home data while at work, and from one machine to another, etc. Opening ssh terminals is the approach I have traditionally used, but I'm beginning to wonder if some mirroring software (e.g., Unison) might be in order. It would double as a backup system while also guaranteeing availability. Would it be wiser to tunnel NFS over ssh? Or is there some better option?
Assuming I actually start mirroring data across multiple machines, I'll need to organize it in a portable taxonomy. This is almost easy, since I use Cygwin on the Windows machines, so I can assume a standard Unix-ish directory structure. But it gets more complicated when scripts or other code are involved. What about application- or platform-specific data? How do other people organize their data, anyway? Are there any useful standards? I'm hoping people will describe their approaches, and why they think they're (not) the best."
I found this on Google: the Amoeba WWW Home Page.
This seems to me to be a unique way of sharing data, since it isn't machine-centric. Rather, it focuses on the user and the user's data. I have no experience with Amoeba, but on the face of it, it seems to answer this person's question.
My question is this: why has interest in Amoeba dried up? (Or has it?) What with the proliferation of alternative OSes over the past few years, why hasn't Amoeba caught on?
"kio_fish is a kioslave for KDE 2/3 that lets you view and manipulate your remote files using just a simple shell account and some standard Unix commands on the remote machine. You get full filesystem access without setting up a server - no NFS, Samba, ..."
It works through SSH, so everything is encrypted.
I use this with the Konqueror file browser, but all KDE apps can transparently access files on remote hosts using this amazing utility, which required no special setup on either end, at least on my systems.
Solved all my data sharing needs - and andromeda solved the rest :)
sig sig sputnik
I have twelve computers in my apartment and use all of them for something or other. Several are just test machines, but even with those, I used to run into situations all the time where I saved something on one machine and forgot to do anything with it.
:)
My solution was to write a series of little scripts to copy data from common share points on each machine to a large, central data store, and into a "backed-up" directory on the workstations. Presently my central data store is 600GB of IDE disks in a RAID1 array (10 disks total). If I lose the central fileserver, all my data, and the scripts needed to recreate the information in that 600GB, are still sitting out on my workstations.
It's kind of a brute-force approach, but it works OK. I'm not sure how well it would work for non-local systems, though.
I'm sure there are better ways to do what I do, too, but it's nice to have a single place to look for my MP3s or whatever, while knowing they're backed up in multiple locations as well.
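The poster's scripts aren't shown, but a minimal sketch of the same brute-force idea in shell might look like this. It assumes (hypothetically) that each workstation's share point is already mounted on the fileserver under /mnt/share/<host> via NFS or Samba, and that the central store lives at /data/store; the paths and function names are illustrative only.

```shell
#!/bin/sh
# collect-shares.sh: hypothetical sketch of copying per-machine share
# points into one central store. Assumes each share point is already
# mounted locally (e.g. NFS/Samba) at $SHARE_ROOT/<host>.

SHARE_ROOT=${SHARE_ROOT:-/mnt/share}   # assumed mount location
STORE=${STORE:-/data/store}            # assumed central store location

# Copy one share point into the store, recursing (-R) and preserving
# permissions and timestamps (-p).
collect_share() {
    src=$1 dest=$2
    [ -d "$src" ] || { echo "skipping missing share: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # "src/." copies the directory's contents, not the directory itself.
    cp -Rp "$src/." "$dest/"
}

# Walk every mounted share point and copy it to $STORE/<host>.
collect_all() {
    for share in "$SHARE_ROOT"/*; do
        [ -d "$share" ] || continue
        collect_share "$share" "$STORE/${share##*/}"
    done
}

if [ "${1:-}" = "--run" ]; then
    collect_all
fi
```

Run it as `sh collect-shares.sh --run` from cron on the fileserver. Like the original approach, it only ever adds and overwrites, so files deleted on a workstation stay in the store.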
-- I wanna decide who lives and who dies - Crow T. Robot, MST3K
I might be off base here, but...
Why not use Gnutella or a similar P2P system? There are clients for basically any OS out there, and the files don't have to reside in a central location.
It works for the Internet, so why not for your own 'mini-internet'?
One modification you would want to make is to have it generate a listing of everything you have.
Could you use SSH tunneling with a system like that?
There's this great standard for sharing files over the internet called the World Wide Web. Perhaps you've heard of it?
Seriously -- run a webserver + WebDAV on each of your machines. Then you can read/write from anywhere, and with any platform.
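On the client side, WebDAV needs only a handful of HTTP methods, so a few curl wrappers go a long way. This is a sketch assuming curl is installed and a DAV share exists at the hypothetical URL https://home.example.org/dav; it uses PUT to upload (curl -T), MKCOL to create a directory, and PROPFIND with a "Depth: 1" header to list one level.

```shell
#!/bin/sh
# dav.sh: hypothetical curl wrappers for a WebDAV share.
# DAV_BASE is an assumed URL; point it at your own server.
DAV_BASE=${DAV_BASE:-https://home.example.org/dav}

# Join a base URL and a relative path with exactly one slash.
dav_url() (
    base=${1%/}    # drop one trailing slash from the base
    rel=${2#/}     # drop one leading slash from the path
    printf '%s/%s\n' "$base" "$rel"
)

# Upload a local file: curl -T sends it as an HTTP PUT.
dav_put() {
    curl -fsS -T "$1" "$(dav_url "$DAV_BASE" "$2")"
}

# Download a remote file to a local path with a plain GET.
dav_get() {
    curl -fsS -o "$2" "$(dav_url "$DAV_BASE" "$1")"
}

# Create a remote directory (WebDAV MKCOL).
dav_mkdir() {
    curl -fsS -X MKCOL "$(dav_url "$DAV_BASE" "$1")"
}

# List one level of a remote directory; the server answers in XML.
dav_ls() {
    curl -fsS -X PROPFIND -H 'Depth: 1' "$(dav_url "$DAV_BASE" "$1")"
}
```

For example, `dav_put ~/notes.txt notes.txt` at work and `dav_get notes.txt ~/notes.txt` at home. If the share requires a login, add curl's `-u user` or `--netrc` option to these calls.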
Systems like YouServ/uServ provide a web server, access control, and mirroring/replication support in a single package. That way, even when only some of your machines are online, the data from every machine remains accessible. Unfortunately the system is not available for general public use, but it may be released as open source soon.
Use the excellent rsync from Paul Mackerras (of pppd fame) and Andrew Tridgell (Samba team) in combination with OpenSSH and SSH for Windows (both based on Tatu Ylonen's work; OpenSSH is maintained by an expert team including Markus Friedl and the recently monkey-cracked Dug Song, among others).
Set up your accounts to rsync-upload changes to whichever server is most secure when you log out, and use a cron job on that server to rsync-download to all the other servers nightly. You can make a tar backup part of the system also.
You will have to remember what's going on so you don't modify the same file differently on two different systems within 24 hours. If you want to overcome that shortcoming by making this work on an immediate sync basis rather than periodically, you'll need something like SGI's fam (included with recent Linux distros) to trigger the updating processes.
You should already be 90% there if you have your ssh keys set up for passwordless login. Passwordless public-key logins are not significantly less secure than password logins in most situations (granted, hostile system administrators can get you, but a BOFH can trojan your login anyway).
Lots of people use this technique to sync CVS trees over slow links. Rsync is very efficient for that kind of thing (large volume of files, low number of changed bytes).
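A sketch of the rsync-over-SSH scheme described above. The hub name, the synced directory ($HOME/sync), the host list and the script name are assumptions; -a (archive), -z (compress) and -e ssh are standard rsync options. Setting DRY_RUN=1 prints the commands instead of running them, which is handy while setting things up.

```shell
#!/bin/sh
# rsync-sync.sh: hypothetical sketch of "push to the hub at logout,
# fan out from the hub nightly". All names below are assumptions.
HUB=${HUB:-hub.example.org}          # the "most secure" server
SYNC_DIR=${SYNC_DIR:-$HOME/sync}     # the tree you want mirrored
# Every machine except the hub itself.
HOSTS=${HOSTS:-"box1.example.org box2.example.org"}

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On a workstation (e.g. from ~/.bash_logout): upload local changes.
push_to_hub() {
    run rsync -az -e ssh "$SYNC_DIR/" "$HUB:sync/"
}

# On the hub, nightly from cron: copy the tree out to every other host.
fan_out() {
    for h in $HOSTS; do
        run rsync -az -e ssh "$SYNC_DIR/" "$h:sync/"
    done
}

# On the hub: a dated tar backup of the tree in the current directory.
tar_backup() {
    run tar -czf "sync-$(date +%Y%m%d).tar.gz" -C "$SYNC_DIR" .
}

case "${1:-}" in
    push)   push_to_hub ;;
    fanout) fan_out && tar_backup ;;
esac
```

For example, call `rsync-sync.sh push` from ~/.bash_logout on each workstation, and put `30 3 * * * $HOME/bin/share/rsync-sync.sh fanout` in the hub's crontab. Unattended runs like this need key-based SSH logins: create a key with ssh-keygen and append its public half to ~/.ssh/authorized_keys on each host. The sketch deliberately omits --delete, so deletions do not propagate. Adding -u (--update) makes rsync skip files that are newer on the receiving side, which softens, but does not remove, the same-file-on-two-machines caveat.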
This question (or ones like it) has come up many times on Slashdot. I'm currently looking at doing something like this myself, and I'm obviously not the only one. While that lays the groundwork for a good open source project (i.e., a distro that is set up for something like this, or a project that easily combines several tools to do this kind of thing), what I think we really need is a good HOWTO. Maybe there already are one or more related HOWTOs about setting up this type of file access. There have already been a number of good suggestions posted here. We need to gather these and others into a HOWTO so that it's not a research project every time someone starts exploring distributed data and how to consolidate the mess. (And no, I'm not volunteering yet, since I haven't done this myself and don't currently have the resources. But if nothing happens in a while, maybe I will...) If you know of a HOWTO or other site that covers this, please post it here.
Who said Freedom was Fair?
In addition to the usual $HOME/Documents/(doctype|projectname) structure, you mention binaries; you probably have a bundle of $HOME/bin and $HOME/lib files for each platform? In my case, I changed to $HOME/bin/$arch and $HOME/bin/share; replace $arch as appropriate, and set up your profile(s) to set PATH and LD_LIBRARY_PATH as needed. (Most shells give you some way to tell which platform you're on, or you can resort to "uname=`uname -a`; if [ "$uname" = "..." ]; then export arch="..."; else ... fi".) For example: $HOME/bin/cygwin-win32, $HOME/bin/linux-i386, $HOME/bin/darwin-ppc; put Perl and shell scripts in $HOME/bin/share and always include that in PATH.
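The per-architecture PATH idea can be sketched as a profile fragment. Instead of matching the whole `uname -a` string, it keys on `uname -s` and `uname -m`. The mapping to the directory names above and the parallel $HOME/lib/$arch layout are assumptions to adjust for your own machines.

```shell
# ~/.profile fragment (sketch): pick $HOME/bin/$arch for this machine.
# The uname-to-directory mapping below is an assumption; extend as needed.

# Print the arch directory name for an OS name and machine type
# (defaults: this machine's `uname -s` and `uname -m`). Unknown systems
# fall back to a lowercased "os-machine" with spaces turned into "_".
detect_arch() (
    os=${1:-$(uname -s)}
    machine=${2:-$(uname -m)}
    case "$os:$machine" in
        Linux:i?86)                           echo linux-i386 ;;
        Darwin:"Power Macintosh"|Darwin:ppc*) echo darwin-ppc ;;
        CYGWIN*)                              echo cygwin-win32 ;;
        *) printf '%s-%s\n' "$os" "$machine" | tr 'A-Z ' 'a-z_' ;;
    esac
)

arch=$(detect_arch)
# Architecture-specific binaries first, then shared Perl/shell scripts.
PATH="$HOME/bin/$arch:$HOME/bin/share:$PATH"
# Assumed parallel library layout; keeps any existing LD_LIBRARY_PATH.
LD_LIBRARY_PATH="$HOME/lib/$arch${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export arch PATH LD_LIBRARY_PATH
```

For instance, `detect_arch Linux i686` prints linux-i386, while an unlisted system such as `detect_arch FreeBSD i386` falls back to freebsd-i386, so a new machine gets a predictable directory name even before you add a case for it.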
For the Macs, there's also netatalk. Turn off exporting "hidden" files, and Samba is more polite about both Unix dotfiles and the plethora of netatalk Finder Info / Resource Fork files; but don't use Samba/NFS to move a file without moving its resource fork as well! (A touch annoying, that.) The same applies to Finder Info files in OS X on BSD partitions. You can always mount a partition as HFS; then Samba moves files around nicely (the resource fork moves with the data fork), but GNU/Unix clients get fussy about the permissions (or lack thereof). This only applies to files set up for "Classic" Mac OS.
DAV sharing rocks for remote access, and Linux and OS X have it natively; I think it's available for Win32 as well. There are some DAV-like interfaces available for browsers. (Does OS X have DAV-over-SSL? I think so, but can anyone confirm or deny?) A browser interface with an input type="file" uploader and the usual download setup is pretty near a universal interface.
In extreme cases, you can share files over IMAP; it's pretty much limited to text, but I've used it more than once to dump shell scripts, useful recipes, and software-in-progress out onto a network. The limitations are in the client software, not the IMAP protocol. IMAP from maildirs is particularly cool, since you can just MIME-encode files and prepend a "fake" header with a pretty short shell script to push them in; no mbox mangling required.
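Here is roughly what such a short script might look like, as a sketch: it wraps one file in a minimal single-part MIME message with made-up headers and delivers it Maildir-style (write into tmp/, then rename into new/), so an IMAP server reading that Maildir should list it as a new message. The function name, the header values and the availability of a `base64` command are assumptions.

```shell
#!/bin/sh
# maildir-push.sh: hypothetical sketch. Usage: maildir_push MAILDIR FILE
# Wraps FILE in a single-part base64 MIME message and delivers it into
# MAILDIR the Maildir way: write to tmp/, then rename into new/.
maildir_push() (
    md=$1 file=$2
    name=${file##*/}
    host=$(uname -n)
    mkdir -p "$md/tmp" "$md/new" "$md/cur" || exit 1
    # mktemp gives this delivery a unique name inside tmp/.
    tmpf=$(mktemp "$md/tmp/push.XXXXXX") || exit 1
    {
        printf 'From: %s@%s\n' "${USER:-$(id -un)}" "$host"
        printf 'Subject: %s\n' "$name"
        printf 'Date: %s\n' "$(LC_ALL=C date '+%a, %d %b %Y %H:%M:%S %z')"
        printf 'MIME-Version: 1.0\n'
        printf 'Content-Type: application/octet-stream; name="%s"\n' "$name"
        printf 'Content-Transfer-Encoding: base64\n'
        printf 'Content-Disposition: attachment; filename="%s"\n' "$name"
        printf '\n'              # a blank line ends the header
        base64 < "$file"         # body: the file, base64-encoded
    } > "$tmpf" || { rm -f "$tmpf"; exit 1; }
    # The rename makes the finished message appear in new/ in one step.
    mv "$tmpf" "$md/new/$(date +%s).${tmpf##*/}.$host"
)
```

For example, `maildir_push ~/Maildir ~/bin/share/backup.sh` makes the script show up as a new message in any IMAP client pointed at that mailbox.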
If you use CVS for this kind of thing, cron jobs can make things a lot smoother when dealing with program settings and the like (I use it for stuff like my bookmarks). Just have a cron job sync each of your machines with a central repository at short intervals. That way you should be able to keep files consistent across all machines, and if something goes wrong you can roll back to an earlier version.
Add a bit of clever scripting, and you might also handle whole directories automagically (CVS works on individual files).
One word of caution: be careful with binary files, and with programs that restructure files, since that's not what CVS is made for (you can mark files as binary with -kb, though).
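A sketch of the cron-driven CVS approach, including a bit of the "clever scripting" for new files: it asks CVS which files are unknown (the "? name" lines from `cvs -nq update`), adds them, marking likely binaries with -kb, then updates and commits. The script name, the extension list used to guess binaries and the 15-minute schedule are assumptions, and it presumes the directory is already a checkout from your central repository.

```shell
#!/bin/sh
# cvs-autosync.sh: hypothetical sketch, e.g. from crontab:
#   */15 * * * * $HOME/bin/share/cvs-autosync.sh $HOME/settings
# Assumes the directory is already a checkout of your settings module.

# Read `cvs update` output on stdin; print the paths marked "? " (unknown).
unknown_files() {
    sed -n 's/^? //p'
}

# Print "-kb" for files that look binary by extension (assumed list).
cvs_add_flags() {
    case "$1" in
        *.jpg|*.png|*.gif|*.gz|*.zip|*.mp3|*.pdf) echo "-kb" ;;
    esac
}

autosync() {
    cd "$1" || return 1
    # -n: report what update would do without touching any files.
    cvs -nq update 2>/dev/null | unknown_files | while IFS= read -r f; do
        # Unquoted on purpose: an empty flag list expands to nothing.
        cvs -Q add $(cvs_add_flags "$f") "$f"
    done
    cvs -q update -d                              # pull other machines' changes
    cvs -q commit -m "autosync from $(uname -n)"  # push this machine's changes
}

if [ -n "${1:-}" ]; then
    autosync "$1"
fi
```

A new directory is added to the repository as soon as `cvs add` runs on it, and its contents show up as unknown files on the next pass, which is how this picks up whole directories over time. If two machines edit the same file, CVS should refuse to commit the result until the merge conflict is resolved by hand.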