Ask Slashdot: Temporary Backup Pouch?
An anonymous reader writes "It looks simple. I've got a laptop and a USB HDD for backups. With rsync, I only move changes to the USB HDD for subsequent backups. I'd like to move these changes to a more portable USB stick when I'm away, then sync again to the USB HDD when I get home. I figured with the normality of the pieces and the situation, there'd be an app for that, but no luck yet. I'm guessing one could make a hardlink parallel-backup on the laptop at the same time as the USB HDD backup. Then use find to detect changes between it and the actual filesystem when it's time to back up to the USB stick. But there would need to be a way to preserve paths, and a way to communicate deletions. So how about it? I'm joe-user with Ubuntu. I even use grsync for rsync. After several evenings of trying to figure this out, all I've got is a much better understanding of what hardlinks are and are not. What do the smart kids do? Three common pieces of hardware, and a simple-looking task."
First, ignore the people who encourage you not to try, and who point you in other directions. Sure, there are much better ways of doing this, but who cares? The whole point is that you should be able to do whatever you want -- and actually doing this is going to leave you _so_ much smarter, trust me.
Some douche criticized you for not knowing beforehand why hard links wouldn't work. . . . because, you know, you should have been born knowing everything about filesystems. To hell with him, sally forth on your journey of discovery, this can be hella fun and you'll get an awesome feeling of accomplishment.
First off, you're going to have trouble using rsync with the flash drive, because I assume your constraint is that you can't fit everything on the flash drive, it's only big enough to hold the differences.
Next, come to terms with the fact that you'll need to do some shell scripting. Maybe more than just some, maybe a lot, but you can do it.
I'd recommend cutting your hard drive in two -- through partitions or whatever -- to make sure that "system" is fully segmented from "data." No sense wasting all your time and effort getting backups of /proc/ and /dev/, or, hell, even /bin/ and /usr/. Those things aren't supposed to change all that much, so get your backups of /home/ and /var/ and /etc/ working first. Running system updates on the road is rarely worth it, and will be the least of your concerns if you end up needing to recover.
Next, remind yourself how rsync was originally intended to work at a high level. It takes checksums of chunks of files to see which chunks have changed, and only transfers the changed chunks over the wire in order to minimize network use. Only over time did it evolve to take on more tasks -- but you're not using it for its intended purpose to begin with, since you're not using any network here. So rsync might not have to be your solution while travelling unless you start rsyncing to a personal cloud or something -- but its first principles are definitely a help as you come up with your own design.
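To make the "checksums of chunks" idea concrete, here is a minimal Python sketch of it. It is a simplification and not how rsync itself is implemented: real rsync uses a rolling checksum so it can match blocks at any offset, while this only compares fixed-size blocks at the same position. The names `block_sums` and `changed_blocks` and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_sums(data, block_size=BLOCK_SIZE):
    """Return one MD5 hex digest per fixed-size block of `data`."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indexes of blocks in `new` that differ from the same block in `old`.

    Blocks past the end of `old` count as changed, since they are new data.
    """
    old_sums = block_sums(old, block_size)
    new_sums = block_sums(new, block_size)
    return [
        i for i, digest in enumerate(new_sums)
        if i >= len(old_sums) or digest != old_sums[i]
    ]


old = b"aaaabbbbccccdddd"
new = b"aaaaBBBBccccddddeeee"
print(changed_blocks(old, new))  # [1, 4]
```

Only blocks 1 and 4 would need to travel; the other three are already on the far side.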
The premise is that, while travelling, you need to know exactly what files have changed since your last full backup, and you need to store those changes on the flash drive so that you can apply the changes to a system restored from the full backup you left at home. You won't be able to do a full restore while in the field, and you won't be able to roll back mistakes made without going home, but I don't think either of those constraints would surprise you too much, you likely came to terms with them already.
So, when doing the full backup at home, also store a full path/file listing with file timestamps and MD5 or CRC or TLA checksums either on your laptop or on the flash disk, preferably both.
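A rough sketch of that listing step, in Python rather than shell purely for readability. The function names, the JSON format and the choice of MD5 are my assumptions, not anything the poster's setup requires.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under `root` (by relative path) to size, mtime and MD5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            manifest[rel] = {
                "size": st.st_size,
                "mtime": int(st.st_mtime),
                "md5": file_md5(full),
            }
    return manifest


def save_manifest(manifest, path):
    """Write the manifest as JSON so a later run can load and compare it."""
    with open(path, "w") as fh:
        json.dump(manifest, fh, indent=1, sort_keys=True)
```

You would run something like `save_manifest(build_manifest("/home/joe"), "/media/stick/manifest.json")` right after the full backup; both paths are placeholders.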
Then, when running a "backup" in the field, have your shell script generate that same report again, and compare it against the report you made with the last full backup. If the script detects a new file, it should copy that file to the flash disk. If the script detects a changed timestamp, or a changed checksum, it should also copy over the file. When storing files on the flash disk, the script should create directories as necessary to preserve paths of changed/new files.
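The compare-and-copy step might look roughly like this in Python. It assumes each manifest is a dict mapping a relative path to some signature (timestamp, checksum, whatever you recorded); `diff_manifests` and `copy_to_stick` are hypothetical names.

```python
import os
import shutil
import tempfile


def diff_manifests(old, new):
    """Return (added, changed) relative paths, comparing new against old.

    A file counts as changed if its recorded signature differs in any way.
    """
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed


def copy_to_stick(root, stick, rel_paths):
    """Copy each file from `root` to `stick`, recreating its directory path."""
    for rel in rel_paths:
        dest = os.path.join(stick, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # copy2 also tries to preserve timestamps and permission bits.
        shutil.copy2(os.path.join(root, rel), dest)
```

Calling `copy_to_stick(root, stick, added + changed)` puts only the differences on the flash disk, laid out under the same relative paths they have on the laptop.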
For bonus points, if the script detects a deleted file, it should add it to a list of files to be deleted. For extra bonus points, it should store file permissions and ownerships in its logfiles as replayable commands.
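For the bonus points, one way to sketch it in Python: work out which paths vanished, then emit plain shell commands that can be replayed against the restored system at home. The function names and the shape of `perms` (path mapped to mode, owner, group) are assumptions for illustration; you would still need to collect those values yourself, for example with `os.stat`.

```python
import shlex


def deleted_paths(old, new):
    """Paths present in the old manifest but missing from the new one."""
    return sorted(p for p in old if p not in new)


def replay_commands(deleted, perms):
    """Build shell command lines to replay deletions and permissions.

    `perms` maps a path to a (mode, owner, group) tuple. Paths are quoted
    so names with spaces survive being pasted into a shell.
    """
    lines = ["rm -f -- " + shlex.quote(p) for p in deleted]
    for path, (mode, owner, group) in sorted(perms.items()):
        quoted = shlex.quote(path)
        lines.append("chmod %o -- %s" % (mode, quoted))
        lines.append(
            "chown %s:%s -- %s" % (shlex.quote(owner), shlex.quote(group), quoted)
        )
    return lines


for line in replay_commands(["old file.txt"], {"notes.txt": (0o644, "joe", "joe")}):
    print(line)
```

Read the generated file before running it at home; a list of `rm` commands deserves a second look.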
The script would do a terrible job at being "efficient" for renamed files, but the same is true for rsync, so whatevs.
I built a very similar set of scripts for managing VMware master disk images and diff files about ten years ago, and it took me two 7hr days of scripting/testing/documenting -- this should be a similar effort for a 10-yr-younger me. I learned *so* much in doing that back then that I'm jealous of the fun that you'll have in doing this.
Of course, document the hell out of your work. Post it on sourceforge or something, GPL it, put it on your resume.
Actually, I'm not even a US citizen, and I travel in South East Asia. When talking about shitty internet, I know what shitty internet is. For example, when I'm staying in Cambodia, the internet can (and often does) go down for a whole day and night. The speed is also ridiculously slow. You can try to get around some of the downtime by getting mobile internet as a fallback, but if there's a wider outage, there's nothing you can do.
Yet, I've found Dropbox to be the best backup solution. Files will get there eventually, and I don't need to do anything. There's also a revision history for files, so if you upload corrupted files or something like that you can reverse it. You can access them from other computers in case your laptop goes poof (happened to me). And the most important thing - if you get robbed or lose your luggage, you will still have access to your files (and of course, I keep my laptop encrypted).
The good sides of online cloud backup far outweigh the negative ones or worries about bandwidth. Especially since most of the time the files that need backup aren't large. No one in their right mind would try to sync their media files.
Obviously Skydrive is of no use, but there are several other alternatives that would be better suited to this purpose. Although if, as he says, it is for use while travelling, an internet-based system is useless.
That's why I liked Crashplan when I first saw it. This may sound like a sales pitch but I'm just a happy customer.
With Crashplan you can have multiple destinations for your backup set. I usually have three:
- same HD in case I accidentally deleted some files.
- USB HD for faster recovery in case my primary HD breaks.
- Online "in the cloud", in case my house burns down etc.
Crashplan detects when I plug in the USB HD and automatically starts updating the backup on it. If there's no internet, the first two destinations will still keep me pretty safe. Once the internet is back, it catches up on the cloud destination.
It works just fine on my Linux Mint laptop as well as my Windows desktop pc.
rsync doesn't handle deletions
rsync handles deletions just fine - that's why it has a --delete option...