Organizing Data Across a Heterogeneous Net?
angst_ridden_hipster asks: "Like many people, I have a bunch of machines I use regularly. These include Linux machines, BSD machines, a Mac OS X machine, and a Windows machine. These machines are on a number of networks. All have internet connectivity. Some of them are always powered on. A few of them are not. Obviously, I have a bunch of accounts. And, it goes without saying, I have a bunch of data. What are the best approaches to sharing data? I want to be able to securely access my home data while at work, and from one machine to another, etc. Opening ssh terminals is the approach I have traditionally used, but I'm beginning to wonder if some mirroring software (e.g., Unison) might be in order. It'd provide the function of backups, as well as guaranteeing availability. Would it be wiser to tunnel nfs over ssh? Or is there some better option?
Assuming I actually start mirroring data across multiple machines, I'll need to organize it in a portable taxonomy. This is almost easy, since I use cygwin on the Windows machines, so I can assume a standard Unix-ish directory structure. But this gets more complicated when there are scripts or other code involved. What about application/platform-specific data? How do other people organize their data, anyway? Are there any useful standards? I'm hoping people will describe their approaches, and why they think they're (not) the best."
I've been thinking of tackling this problem for a while too. The best I can come up with is to abstract the 'directory' (the list of what you have) and make that the thing you replicate and keep accessible, with convenience as the priority. Then, when you need to do something with the data, your directory knows where it is and how to get at it. At that point convenient access to the data itself isn't as crucial, so transparently gluing all these platforms and protocols together isn't quite as important.
For me, I'd just like a top-down, real-time view of what I have, with convenient access to it. Getting at it anywhere and anytime isn't quite as crucial for me.
Maybe you make a little daemon that can monitor your data repositories at several sources and 'merge' the data listings at a central source for publishing to multiple sources again?
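To make the 'merge the listings' idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the function names (list_files, merge_listings) and the source labels are made up, and a real daemon would also need scheduling, a transport between machines, and some way to publish the result.

```python
import os
from collections import defaultdict

def list_files(root):
    """Return the relative paths of all files under root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalize separators so listings from Windows and Unix compare.
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)

def merge_listings(listings):
    """Merge {source: [paths]} into a central {path: [sources holding it]}."""
    catalog = defaultdict(list)
    for source, paths in listings.items():
        for path in paths:
            catalog[path].append(source)
    return {path: sorted(sources) for path, sources in sorted(catalog.items())}
```

Each machine would run something like list_files on its shared directory and ship the result to the central box, which calls merge_listings to see which files live where.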
"Old man yells at systemd"
What you need is something known as a "server." A server is where you can store all your files, and in some cases, account information.
The right kind of server can do AppleShare, NFS, and SMB, allowing all your other machines to mount the shares and make them appear as local drives. This keeps all your data in one place, allowing for easy backups, and also makes it easy to get at the same files from any computer.
My personal preference is a Linux computer with several cheap IDE drives, each on its own IDE controller (no slave drives). The drives are configured as software RAID 5 and ext3. Regular backups to a tape drive are set up through cron. Samba handles file sharing, printing, roaming profiles, and PDC duties for Windoze. Netatalk 1.6cvs handles file sharing duties for pre-OSX systems. NFS is used for file sharing to *nix systems. The only thing I'm missing is a NetInfo daemon for Linux so it can act as a complete configuration server for NeXTSTEP, OPENSTEP, and MacOS X systems.
First of all, separate your home life and work life. Then separate the data. I understand that once in a while you need data from one place at the other, but avoid those situations.
At work: that is IS's problem. Store all work data on the work machines, and make IS do the backups. Use SSH, or other VPN when you want to work from home. Compile (or whatever) at work as much as possible. If you have data that you need on the road, get a laptop or PDA for work, and synchronize that when you are at work.
At home: set up a Linux box (a 386 is enough, though you might want more) with a big disk, a UPS, and a network card. Put it in a closet or on a shelf. Install Samba and Netatalk; NFS is built in (there are better options than NFS if you look, but NFS is there). Use one login for all machines.
Laptops are a problem, because you often want to use them where you can't get to the network. The first solution to that problem is 802.11. Use it at home, and look for open access on the road. With a good VPN (ssh+nfs) you can get to your network server from many places. I manually synchronize only the files I need, but my laptop is rarely used outside of 802.11 areas. If you travel often, you might need more (Coda? AFS?).
Unfortunately, much of the data I have is not sufficiently structured for an RDBMS. To be more specific, I have about 5 GB of digital photographs / scanned negatives, 1 GB of email archives, 1 GB of various and sundry text files, 100 MB of assorted MS Office-type documents, 100 MB of source code (only about half of which is in CVS), and 500 MB of web site source material (Photoshop files, HTML, etc).
So I figure that the filesystem is the best database for this kind of information. But I could well be wrong!
Eloi, Eloi, lema sabachtani?
www.fogbound.net
Now this is not totally fair, since it implies a pointy haired boss situation. All it really means is that you would have to have a better definition of the problem.
What it seems you really need is an application, a database, that constantly monitors in real time the status and availability of your various resources. This would tie into your other data services, so that when you do a query on "XP sourcecode", or whatever, one of the results you get comes from this resource monitor database, saying "the resource is offline" or "the data is available, but you don't have access rights", etc., depending on the resource status and other real-time conditions.
It occurs to me that a clever design of the database might be able to do the resource availability query in advance of the actual access to the data, so that you do not get a crash if a child record or some other piece is unavailable.
Currently, I do not know of any tool that does this, although obviously this is not my area of expertise.
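As a rough sketch of the idea in Python: check a resource's status before touching the data, and report that status as part of the query result. Everything here is an assumption for illustration (the Status names, check_resource, query, and the catalog of label-to-path entries); it only looks at locally visible paths such as mounted shares, and a path hidden behind an unreadable parent directory would be reported as offline.

```python
import os
from enum import Enum

class Status(Enum):
    AVAILABLE = "the data is available"
    NO_ACCESS = "the data is available, but you don't have access rights"
    OFFLINE = "the resource is offline"

def check_resource(path):
    """Classify a path without opening it, so a missing share cannot crash a query."""
    if not os.path.exists(path):
        return Status.OFFLINE      # e.g. the share is not mounted
    if not os.access(path, os.R_OK):
        return Status.NO_ACCESS    # present, but not readable by this user
    return Status.AVAILABLE

def query(catalog, term):
    """catalog maps a label to a path; return the status of every matching label."""
    return {
        label: check_resource(path)
        for label, path in catalog.items()
        if term.lower() in label.lower()
    }
```

A query for "xp" against a catalog containing "XP sourcecode" would then come back with a status instead of an error when that machine is powered off.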
"It is a greater offense to steal men's labor, than their clothes"
Here's my situation: I have a dual-booting Linux/Win98 machine at home, a Win98 laptop, a Linux server sitting in some network in a galaxy far, far away; and a bunch of other computers around the world.
At one point, managing all my data (I would change a bit here, and a bit there, then try to copy and synchronize by hand) was manageable, but I got real tired of it real fast. I considered putting together a CVS server, and then synchronizing that way, but it's really overkill and not a very user-friendly solution anyway.
Enter Unison. Now I just have a few directories designated as shared, and they get synchronized by Unison automatically. At home, my data is on a FAT partition, which is accessible to both Linux and Win98.
The good thing about this is that since I synchronize with the laptop when I'm connected, I get to use my data even when I'm on the move - not so with NFS. And I get free backups as well - I have roughly 2 gigs of data, which would be a hassle to back up any other way. Besides, if I took tape backups, I would have to manually carry them off-site in case of a fire; now Unison takes care of backups to and from my remote machines.
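For anyone wondering what a two-way synchronizer has to decide, here is a minimal sketch in Python. It is not Unison's actual implementation, just the general idea of comparing each replica against the state recorded at the last sync; the function name plan_sync and the path-to-fingerprint dictionaries are assumptions made for the example.

```python
def plan_sync(archive, a, b):
    """Decide what to do per path.

    archive, a, and b each map path -> content fingerprint.
    archive is the state both replicas had after the last sync.
    A missing key means the file does not exist on that side.
    """
    actions = {}
    for path in sorted(set(archive) | set(a) | set(b)):
        old, in_a, in_b = archive.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            continue  # identical on both sides (or deleted on both)
        a_changed = in_a != old
        b_changed = in_b != old
        if a_changed and b_changed:
            actions[path] = "conflict"  # changed differently on both sides
        elif a_changed:
            actions[path] = "a -> b"    # propagate a's change (or deletion)
        else:
            actions[path] = "b -> a"    # propagate b's change (or deletion)
    return actions
```

Only the "conflict" entries need a human; everything else can be copied in one direction, which is why keeping a record of the last synced state matters.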
This works well for me to keep about 30 accounts in sync. Most of them just get a minimal checkout of my home directory (5 MB or so), while 3 or 4 get the whole home directory and rsynced files (5 GB). The CVS repository is about half a gigabyte in size these days.
Once something that allows proper file rename tracking, like Subversion, comes along, I plan to stop using rsync altogether, and just check all the files in.
As has been noted elsewhere in this thread, one of the key things is coming up with a consistent directory structure and sticking with it.
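One way to make "pick a structure and stick with it" enforceable is to write the layout down once and have scripts look paths up instead of hard-coding them. A minimal Python sketch follows; the category names and directories in LAYOUT are hypothetical, not a standard.

```python
from pathlib import Path

# Hypothetical categories, loosely based on the kinds of data mentioned
# in this thread. The point is that every machine uses the same table.
LAYOUT = {
    "photos": "data/photos",
    "mail": "data/mail",
    "docs": "data/docs",
    "src": "src",
    "web": "www",
}

def data_dir(category, home=None):
    """Return the directory for a category under the given (or current) home."""
    base = Path(home) if home is not None else Path.home()
    return base.joinpath(*LAYOUT[category].split("/"))
```

Because the lookup goes through pathlib, the same table works under cygwin, Linux, BSD, and Mac OS X as long as each account's home directory is the root of the tree.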
see shy jo
This is not a Fugazi
If someone replaces a machine on my home network so they can snoop on my home directory, I'm much more concerned about the security of my apartment than my network. There's no way I'd use standard NFS over the internet, but there's no reason to either. I can use ssh to log in or tunnel VNC across the internet into my home systems, so there really isn't a reason not to use NFS at home. Even ignoring security, I can't think of any network file system that actually performs OK over the internet.
BTW I've never seen a "lost" mount with Solaris 8, so you've got some other problem.
I'd stay away from an OS that can't implement something as old and common as NFS.