How Do You Sync & Manage Your Home Directories?
digitalderbs writes "A problem plaguing most people with multiple computers is the arduous task of synchronizing files between them: documents, pictures, code, or data. Everyone seems to have their own strategies, whether they involve USB drives, emailed attachments, rsync, or a distributed management system, all of which have varying degrees of success in implementing fast synchronization, interoperability, redundancy and versioning, and encryption. Myself, I've used unison for file synchronization and rsnapshot for backups between two Linux servers and a Mac OS X laptop. I've recently considered adding some sophistication by implementing a version control system like subversion, git, or bazaar, but have found some shortcomings in automating commits and pushing updates to all systems. What system do you use to manage your home directories, and how have they worked for you for managing small files (e.g. dot configs) and large (gigabyte binaries of data) together?"
I recently started playing around with Dropbox for some smaller folders than my entire home directory and haven't yet run into any major problems. And the versioning it provides is nice as well, and as a plus they don't consider the deleted files that they still retain versions of as part of the quota.
I use multiple OS X, Linux, and FreeBSD machines daily. One cannot sync all home directory files, as all the config stuff differs between Gentoo, Debian, FreeBSD, Tiger, and Leopard. So it's mostly down to documents, graphics, and a few audio and video files. For the larger ones, I use a usb stick, the smaller ones I email to myself so they're always available via IMAP servers. But most of all I have a bootable, customized version of systemrescuecd installed on a 16GB usb stick, which at any given moment has all the currently important stuff I need. It works well enough for me.
Caveat Utilitor
I wouldn't recommend using one tool for every purpose. I wouldn't want to store multi-GB files in SVN, and I wouldn't want to store all my code on an external hard drive. Maybe using DropBox, or rsyncing with a server somewhere would work.
For small backups, every ten minutes, I use backintime (based on rsync). For larger, nightly or more rare backups, I use rdiff-backup. Both work over the LAN, or to locally-mounted hard drives.
I carry a 16 Gig USB flash drive with my working files on it. I've been using this method since the days of 100 Meg Zip drives and just keep upgrading the media. My flash drive is automatically backed up to my backup server at home in the middle of the night so, if I forget it at the office, I'm only a few hours behind. Besides, I can use free LogMeIn to log into the office computer and transfer a file if it's got new and important information on it. It works the same way in reverse if I forget it at home. Since my working files are on the USB drive which is also compatible with my Linux machines, it really doesn't make much difference which machine I plug it into. Did I mention encryption? That's a good idea in case you lose the drive if you've got any sensitive information on it.
"Do the Right Thing. It will gratify some people and astound the rest." - Mark Twain
I use git, with flashbake and cron to automate commits, and a simple cron job to automatically update a backup copy on an external hard drive.
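For anyone wondering what an unattended commit boils down to, here is a minimal sketch in plain git. It is not flashbake and not the poster's setup; the function name `autocommit`, the script path, and the 15 minute interval are all made up for illustration.

```shell
# Hypothetical helper: commit everything that changed in one git repository.
# This is NOT flashbake; it only illustrates the unattended-commit idea.
autocommit() (
    repo=$1
    cd "$repo" || exit 1
    # Stage new, modified, and deleted files.
    git add -A || exit 1
    # "git diff --cached --quiet" exits 0 when nothing is staged.
    if git diff --cached --quiet; then
        echo "nothing to commit"
    else
        git commit -q -m "auto-commit $(date '+%Y-%m-%d %H:%M')" && echo "committed"
    fi
)

# Example crontab line (assumed path). The script it names would source this
# file and call something like: autocommit "$HOME/notes"
# */15 * * * * /home/me/bin/autocommit.sh
```

Skipping the commit when nothing is staged keeps the history from filling up with empty entries every time cron fires.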
I spent a long time tackling this, as I am situated at different locations on different days.
I have 2 desktops and a laptop which must remain sync'd and encrypted. I use TrueCrypt for the encryption.
On my Windows boxes - SyncBack handles it. It can be triggered on write or on insertion, or just periodically. It has version control support. It will sync over FTP (poorly) and can create zip files, burn CDs, etc. It's a Swiss Army knife of sync tools.
The key for getting the most out of a sync program is granularity. Inevitably, you'll have exceptions, and you don't want a PASS/FAIL result for your entire backup set. It works much better to sort files into categories and sync the individual groups than to try to make one profile that does your entire disk array. My 2 cents.
I embed all my documents in porn and post them on various web forums. The recovery procedure involves spidering my spam folder. I recently found my high school history term paper in a jpg of Marilyn Chambers.
Insightful and funny are really the same thing, except one has a punch line.
Unison
For backups I used to swear by rsync plus hardlinks. But since Time Machine came out it's oh so much better. For one thing rsync is still a bit unstable on huge directory trees that contain lots of hard links. And it also chokes on some extended attributes, forks and file types, though it keeps getting better (perhaps it's perfect now). Rsync + hardlinks also does not retain the ownership, privileges and ACLs faithfully either.
But even if rsync + hardlinks didn't have those troubles, Time Machine is so flawless it's just the thing to use. What is especially nice about Time Machine is the recovery and inspection process. It's not too hard to figure out what files changed (there's even a 3rd party GUI application for this) and because this info is stored in metadata it's faster and more reliable to retrieve than a massive find command looking at time stamps. The Time Machine interface for partial recoveries is intuitive and easy to drill down. In many cases it's even application aware, so you can drill not on the file system itself but on, say, your mail folders in the Mail application. This is actually a pretty stunning achievement; it needs to be seen to be believed how paradigm shifting it is.
And full recoveries could not be easier. You just boot off the CD and within ten clicks you have picked the source and destination and it has done a series of idiot checks. While that might not seem too amazing, it sure is comforting. Recovering from a backup is a mildly nerve-wracking process because there are lots of ways to goof and maybe even wreck your original (like oops, I didn't do a --delete, or I didn't tell it to reassign links, or worse, I copied the wrong direction).
Here's a super nice tip: you can have two disks operating with Time Machine that you rotate. Actually the best way I've found is to have one constantly attached, then on Fridays attach the other one, redirect Time Machine to it, let it back up all the changes since last Friday, then detach it and let Time Machine go back to the main disk.
You can even use this as a way to sync your two computers, though it's better as a backup than as a sync. Have Time Machine back up just your home directory to a thumb drive and take this from home to work. Plug it in at work and back it up. Then revert this to the backup from home. Now home and work are synced, plus, if there was one special file or two that was newer at work, well, you have that in the backup you made! (By the way, doing this kind of thing requires fiddling with the backup cookie so two computers can share the same repository. Google this if you want to know how.)
Some drink at the fountain of knowledge. Others just gargle.
I have found that using Subversion (svn) with the aid of a bash script that is run manually actually works really well and provides a number of special advantages. Here's how I have it constructed:
First, I don't actually make my whole home directory a svn checkout. I have a subdirectory in it that is the checkout, and my bash script ensures there are symlinks into it for the things I want sync'd. This makes it easy to have some differences between locations. In particular, I can have a different .bashrc for one machine than another, but keep them both in svn as separate files; it is just a matter of making the symlink point to the one I want to use in each location. My bash script will make the symlink if the file doesn't exist, and warn if the file does exist but isn't a symlink. It does this for a number of files.
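The symlink behaviour described here (create the link if nothing is there, warn if a real file is in the way) could be sketched roughly as follows. This is a guess at the shape of such a helper, not the actual script; `ensure_link`, the messages, and the `bashrc.<hostname>` naming are all assumptions.

```shell
# Hypothetical helper; the names and messages are illustrative only.
# ensure_link TARGET LINK
#   - creates LINK -> TARGET when LINK does not exist
#   - leaves an existing symlink alone
#   - warns (and changes nothing) when LINK is a regular file or directory
ensure_link() (
    target=$1
    link=$2
    # Test -L first: a dangling symlink fails the -e test below.
    if [ -L "$link" ]; then
        echo "ok: $link"
    elif [ -e "$link" ]; then
        echo "warning: $link exists but is not a symlink" >&2
        exit 1
    else
        ln -s "$target" "$link" && echo "linked: $link"
    fi
)

# Example: pick a per-machine .bashrc from the checkout (file names assumed).
# ensure_link "$HOME/homedir/bashrc.$(hostname)" "$HOME/.bashrc"
```

The helper never overwrites anything, so a stray hand-edited .bashrc on a new box survives until you decide what to do with it.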
Another benefit of this method is that I don't put all my files in one checkout. The core files I'll want in all my home directories (e.g. .bashrc, .vimrc, ssh config and public keys, etc.) go in a checkout called "homedir". But my documents go elsewhere. And my sensitive files (e.g. private keys) go somewhere else still. I choose what is appropriate to install at each location (usually just the "homedir" checkout on boxes I don't own). My bash script detects which checkouts I have and does the appropriate steps.
The bash script not only sets up the symlinks but it also does an "svn status" on each checkout so I'll know if there are any files I've created that I haven't added, or any files I've modified that I haven't committed. I prefer not to automate adds and commits. I'll definitely see any pending things when I run my sync script, and can simply do an "svn add" or "svn commit" as necessary.
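Detecting which checkouts are present and running "svn status" on each might look something like the sketch below. Only "homedir" is a name taken from the description above; `documents`, `private`, and both function names are invented, and the real script may well do this differently.

```shell
# Hypothetical sketch. Only "homedir" is named in the post; "documents" and
# "private" are made-up names for the other checkouts.
# list_checkouts BASE NAME...  prints each BASE/NAME that looks like an svn
# working copy (it has a .svn directory at its top level).
list_checkouts() (
    base=$1
    shift
    for name in "$@"; do
        if [ -d "$base/$name/.svn" ]; then
            echo "$base/$name"
        fi
    done
)

# report_pending BASE NAME...  runs "svn status" in each checkout found, so
# unadded ("?") and modified ("M") files are shown but nothing is committed.
report_pending() {
    list_checkouts "$@" | while IFS= read -r dir; do
        echo "== $dir"
        svn status "$dir"
    done
}

# Example:
# report_pending "$HOME" homedir documents private
```

Nothing here adds or commits; it only reports, which matches the preference for keeping those steps manual.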
I also prefer not to automate the running of the sync script. I like being in control of my bandwidth usage, especially when connected via slow links (e.g. Verizon EV-DO, AT&T GPRS). Plus dealing with conflicts is much easier when it is interactive (although I can usually avoid that scenario). It also simplifies authentication to run it from my shell, as it can just use my ssh agent (which I forward, which is setup in my sync'd ssh config).
The sync bash script takes care of a few other edge-case issues, like dealing with files in ~/.ssh that have to have certain permissions and whatnot. And I've taken care to ensure that the script doesn't just blow away files; it will warn if things don't look right, and leaves it to me to fix it.
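The ~/.ssh permissions step matters because Subversion tracks only the executable bit, not full file modes, so a fresh checkout will not come out with the tight modes OpenSSH wants. A rough sketch of such a fix-up follows; `fix_ssh_perms` and the list of file names are assumptions, and the modes are simply the conventional ones (700 for the directory, 600 for private files), not something stated in the post.

```shell
# Hypothetical helper. The modes are the conventional ones OpenSSH expects
# (directory 700, private keys and config 600); the post does not list them.
fix_ssh_perms() (
    dir=$1
    [ -d "$dir" ] || { echo "warning: $dir is missing" >&2; exit 1; }
    chmod 700 "$dir"
    for f in "$dir"/id_* "$dir"/config "$dir"/authorized_keys; do
        # Unmatched globs and absent files are skipped, never created.
        [ -f "$f" ] || continue
        case $f in
            *.pub) chmod 644 "$f" ;;   # public keys may stay world-readable
            *)     chmod 600 "$f" ;;   # everything else is owner-only
        esac
    done
)

# Example:
# fix_ssh_perms "$HOME/.ssh"
```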
Using Subversion has another big advantage: it is likely to be installed already in many places. So when I'm given an account on someone's computer, I can usually get my environment just the way I like it in a few short steps:
svn co svn+ssh://my.server.tld/my/path/to/svn/trunk/homedir ~/homedir
~/homedir/bin/mysync # This is my bash script to do the syncing
# Correct any complaints about .bashrc not being a symlink and whatnot
~/homedir/bin/mysync
# Log out and back in, or source .bashrc
No fuss, no muss. No downloading some sync package and building it just to get your .bashrc or .vimrc on a random box, or asking the admin to install something. Subversion is usually there, and even if it isn't, most admins are happy to install it. Subversion deals well with binary files, and even large files. For bulk things (like a music library), I'm more likely to rsync it, partly because it is bulk, partly because it doesn't benefit from versioning, and partly because it only needs to be a unidirectional sync. I could easily add that to my sync script.
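The one-way rsync for bulk data mentioned above could be as small as this. It is only a sketch: `mirror_dir` and the example paths are made up, and `-a --delete` is one reasonable choice of flags rather than the poster's.

```shell
# Hypothetical helper: one-way mirror of SRC into DEST using rsync.
mirror_dir() (
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "warning: $src is missing" >&2; exit 1; }
    # Trailing slashes: copy the contents of src into dest.
    # -a preserves times, modes, and symlinks; --delete removes files from
    # dest that no longer exist in src, so dest ends up a mirror of src.
    rsync -a --delete "$src/" "$dest/"
)

# Example (paths and host assumed):
# mirror_dir "$HOME/Music" "backuphost:Music"
```

Because --delete removes things on the destination, getting the direction wrong is destructive; adding -n for a dry run first is a cheap sanity check.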
I am simply in the habit of typing "mysync" from time to time (my .bashrc puts ~/bin/ in my $PATH). This works for me very nicely. Some people may prefer a little more automation, and of course my script could automatically do adds and commits, and even skip the log messages. But I prefer a bit more process; after all, this is my data we're talking about!
If there is interest, I may post my sync script.