Ask Slashdot: Temporary Backup Pouch?
An anonymous reader writes "It looks simple. I've got a laptop and a USB HDD for backups. With rsync, I only move changes to the USB HDD for subsequent backups. I'd like to move these changes to a more portable USB stick when I'm away, then sync again to the USB HDD when I get home. I figured with the normality of the pieces and the situation, there'd be an app for that, but no luck yet. I'm guessing one could make a hardlink parallel-backup on the laptop at the same time as the USB HDD backup. Then use find to detect changes between it and the actual filesystem when it's time to back up to the USB stick. But there would need to be a way to preserve paths, and a way to communicate deletions. So how about it? I'm joe-user with Ubuntu. I even use grsync for rsync. After several evenings of trying to figure this out, all I've got is a much better understanding of what hardlinks are and are not. What do the smart kids do? Three common pieces of hardware, and a simple-looking task."
Oh dear.
Hardlinks don't span storage devices. They are directory entries that share the same inode on a single storage device. Soft links do span devices, but they are only pointers to a path, so "backup" using softlinks and you have a bunch of pointers to data that is still on the original system. NOT on the thumb drive!
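If you want to see the difference for yourself, here is a minimal sketch. It assumes GNU coreutils (for stat -c) and works in a throwaway temp directory; the file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo: a hard link is a second name for the same inode; a soft link is a
# small file that stores a path. Assumes GNU coreutils (stat -c).
set -eu

demo=$(mktemp -d)
echo "hello" > "$demo/original.txt"

ln    "$demo/original.txt" "$demo/hard.txt"   # second name, same inode
ln -s "$demo/original.txt" "$demo/soft.txt"   # separate inode holding a path

inode_of() { stat -c %i "$1"; }   # inode number; does not follow symlinks

orig_inode=$(inode_of "$demo/original.txt")
hard_inode=$(inode_of "$demo/hard.txt")
soft_inode=$(inode_of "$demo/soft.txt")
soft_target=$(readlink "$demo/soft.txt")
echo "original: $orig_inode  hard: $hard_inode  soft: $soft_inode -> $soft_target"

# Remove the original name: the hard link still reaches the data,
# the soft link now points at nothing.
rm "$demo/original.txt"
hard_content=$(cat "$demo/hard.txt")
if [ -e "$demo/soft.txt" ]; then soft_state=ok; else soft_state=dangling; fi
echo "after rm: hard.txt contains '$hard_content', soft.txt is $soft_state"

rm -rf "$demo"
```

Trying the ln without -s across two mount points (say, laptop disk to thumb drive) should simply fail, which is the whole point above.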
Use one of the backup packages out there, you are not at the point of rolling your own.
Not even close.
Never answer an anonymous letter. - Yogi Berra
Since you are an ubuntu user, and it looks like you just need a nice rsync front-end to handle backup of the same data to two different drives, I'll suggest unison-gtk.
Very nice, simple front-end, and will do what I think you need.
You can never know everything, and part of what you do know will always be wrong. Perhaps even the most important part.
I hesitate to offer this, because I've not experimented with it in the precise scenario you describe. However, being another Joe User with ubuntu, I took a look at rsync as a way to implement backups between my home PC and an Apple Time capsule that I was using as a secondary backup device.
After some tinkering I settled on Unison, which is available in the Ubuntu repositories. It's essentially a two-way file synchronizer that uses an rsync-like transfer algorithm, with a few bells and whistles. You get 2-way directory replication between your 'local' and 'remote' file systems [though they could both be local or both remote if you choose] and you can essentially script multiple different backups into the single interface. For example, I have "Office" for documents, spreadsheets and the like, "Photos", for camera images, "Music", and so on.
Like most tools, Unison is imperfect, but it's simple to use once set up. The key point with it, as with any product you put in this space, will be knowing and keeping track of your definitive data source. If you have a document that exists on both your local and backup systems, and you edit that file separately at each location, then run Unison, only the most chronologically recent copy will be preserved. To go beyond this level of functionality and get to something that can intelligently merge changes, I think you're going to need something more like a CVS tool... There are hugely expensive proprietary solutions (like Livelink), but I've not come across anyone using a good FOSS alternative. HTH...
I use DirSyncPro to automate my backup tasks. Not sure how to set it up for your particular task, or whether you can, but it might be worth looking into. A lot of options while still being easy to use.
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.
You Sir are completely correct.
The smart kids use Linux.
Duplicity uses librsync to generate the changeset that rsync would use, then stores the change set. If you stored the change set to the USB drive, this could then be "restored" to the destination drive, perhaps? I don't know if there's any way to do this out of the box, or with a bit of scripting, or if this would need to be a whole new toolchain.
there are better solutions than rsync.
rdiff-backup
duplicity
for example.
i probably don't understand what you are trying to accomplish.
I can't see how an internet-based system would be useless. SkyDrive and Dropbox both can sync files when you get an internet connection. I am traveling too (have been for 4 months) and that's what I do, even while internet is really crap at times. But it will get synced eventually, and it gets synced automatically without me doing anything. On top of that, de-duplication and only syncing the parts that need to be uploaded saves bandwidth.
rsync and other low level solutions are much more work and on top of that you need to carry around extra devices that might get destroyed too. But with SkyDrive or Dropbox the files will always be there no matter what happens.
this is what dump(8) does
As you say, the internet is really crap at times when you are travelling so why make life difficult? It is also fair to say that you obviously think of travelling as a bit of wandering around in the US. Once you broaden your horizons you will find that the internet is often not even an option.
Skydrive is not going to integrate with Ubuntu (have you read the summary yet?) so it is a stupid option whereas there is a dropbox client. It is still flakey and not going to be easy to use as required so he is still better off doing something that will work well and therefore get done regularly. If he is using some client for a service that sometimes works and sometimes doesn't you can guarantee that the time when he needs that backup will be one of the times that it did not work.
I love stacking my barbecues in the shed at the end of summer - you can't beat a bit of grill on grill action.
First, ignore the people who encourage you not to try, and who point you in other directions. Sure, there are much better ways of doing this, but who cares? The whole point is that you should be able to do whatever you want -- and actually doing this is going to leave you _so_ much smarter, trust me.
Some douche criticized you for not knowing beforehand why hard links wouldn't work. . . . because, you know, you should have been born knowing everything about filesystems. To hell with him, sally forth on your journey of discovery, this can be hella fun and you'll get an awesome feeling of accomplishment.
First off, you're going to have trouble using rsync with the flash drive, because I assume your constraint is that you can't fit everything on the flash drive, it's only big enough to hold the differences.
Next, come to terms with the fact that you'll need to do some shell scripting. Maybe more than just some, maybe a lot, but you can do it.
I'd recommend cutting your hard drive in two -- through partitions or whatever -- to make sure that "system" is fully segmented from "data." No sense wasting all your time and effort getting backups of /proc/ and /dev/, or, hell, even /bin/ and /usr/. Those things aren't supposed to change all that much, so get your backups of /home/ and /var/ and /etc/ working first. Running system updates on the road is rarely worth it, and will be the least of your concerns if you end up needing to recover.
Next, remind yourself how rsync was originally intended to work at a high level. It takes checksums of chunks of files to see which chunks have changed, and only transfers the changed chunks over the wire in order to minimize network use. Only over time did it evolve to take on more tasks -- but you're not using it for its intended purpose to begin with, since you're not using any network here. So rsync might not have to be your solution while travelling unless you start rsyncing to a personal cloud or something -- but its first principles are definitely a help as you come up with your own design.
The premise is that, while travelling, you need to know exactly what files have changed since your last full backup, and you need to store those changes on the flash drive so that you can apply the changes to a system restored from the full backup you left at home. You won't be able to do a full restore while in the field, and you won't be able to roll back mistakes made without going home, but I don't think either of those constraints would surprise you too much, you likely came to terms with them already.
So, when doing the full backup at home, also store a full path/file listing with file timestamps and MD5 or CRC or TLA checksums either on your laptop or on the flash disk, preferably both.
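A rough sketch of that listing step, assuming GNU find, sort, xargs and md5sum, and file names without newlines. make_manifest and the example paths are names I made up. It records checksums only; timestamps could be added with find -printf if you want them.

```shell
#!/usr/bin/env bash
# Sketch: write a sorted "checksum  path" manifest for a directory tree.
# make_manifest is a hypothetical helper name, not an existing tool.
set -eu

make_manifest() {
    local root=$1 out=$2
    # cd into the tree so the manifest holds relative paths (./dir/file).
    # The redirect is opened before the cd, so a relative $out still works.
    ( cd "$root" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$out"
}

# Example (hypothetical paths), run right after the full backup at home:
#   make_manifest "$HOME" /media/usbstick/manifest-full.md5
```

Keep the manifest outside the tree you are listing, otherwise it ends up listing itself.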
Then, when running a "backup" in the field, have your shell script generate that same report again, and compare it against the report you made with the last full backup. If the script detects a new file, it should copy that file to the flash disk. If the script detects a changed timestamp, or a changed checksum, it should also copy over the file. When storing files on the flash disk, the script should create directories as necessary to preserve paths of changed/new files.
For bonus points, if the script detects a deleted file, it should add it to a list of files to be deleted. For extra bonus points, it should store file permissions and ownerships in its logfiles as replayable commands.
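Here is one way the field run could look, as a sketch and not a finished tool. It assumes GNU coreutils (comm, cut, cp --parents), an md5sum-style manifest from the last full backup ("checksum, two spaces, ./path"), and file names without newlines. field_backup, files/ and deleted.list are names I invented. It does not record permissions or ownership as replayable commands; cp -p only preserves what the stick's filesystem allows.

```shell
#!/usr/bin/env bash
# Sketch: copy new/changed files to the stick, preserving paths, and list
# files that have disappeared since the old manifest was written.
set -eu

field_backup() {
    local root=$1 old_manifest=$2 stick new_manifest
    mkdir -p "$3/files"
    stick=$(cd "$3" && pwd)          # absolute path, since we cd below
    new_manifest=$(mktemp)
    ( cd "$root" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$new_manifest"

    # Lines only in the new manifest are new files or changed checksums.
    # An MD5 is 32 characters plus two spaces, so the path starts at column 35.
    comm -13 <(sort "$old_manifest") <(sort "$new_manifest") | cut -c35- |
    while IFS= read -r path; do
        # --parents recreates the directory part of the path under files/
        ( cd "$root" && cp -p --parents "$path" "$stick/files/" )
    done

    # Paths present in the old manifest but not the new one were deleted.
    comm -23 <(cut -c35- "$old_manifest" | sort) \
             <(cut -c35- "$new_manifest" | sort) > "$stick/deleted.list"

    rm -f "$new_manifest"
}

# Example (hypothetical paths):
#   field_backup "$HOME" "$HOME/.manifest-full.md5" /media/usbstick/diffs
```

Back home, the idea would be to copy files/ over the restored tree and then remove whatever is named in deleted.list.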
The script would do a terrible job at being "efficient" for renamed files, but same is true for rsync, so whatevs.
I built a very similar set of scripts for managing VMWare master disk images and diff files about ten years ago, and it took me two 7hr days of scripting/testing/documenting -- this should be a similar effort for a 10-yr-younger me. I learned *so* much in doing that back then that I'm jealous of the fun that you'll have in doing this.
Of course, document the hell out of your work. Post it on sourceforge or something, GPL it, put it on your resume.
If I understand your problem right, how about dar? It can make an empty archive of your main backup to act as a reference (just file info, no files). Then it makes archives relative to that, with just changed files. It can then apply the changes to the original dir, including deletions, if you need that.
Forgot to mention:
To accomplish this, you'll need to read up on:
- bash
- find
- grep
- awk
- sed
- md5sum
- chmod/chown
- mkdir -p
- diff/patch (for general reference, and also look up binary diffing tools)
Extra extra extra bonus points if you compress the changed files when storing them on the flash drive.
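For the compression part, a small sketch using tar, assuming a tar that supports -T (GNU tar does) and a plain text list of changed paths relative to the source root, one per line. pack_changes and the paths are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: pack a list of changed files into a single gzip-compressed archive.
# pack_changes is a hypothetical helper name.
set -eu

pack_changes() {
    local root=$1 list=$2 archive=$3
    # The list comes in on stdin and the archive goes out on stdout, so both
    # paths are resolved before the cd and may be relative.
    ( cd "$root" && tar -czf - -T - ) < "$list" > "$archive"
}

# Example (hypothetical paths):
#   pack_changes "$HOME" /tmp/changed.list /media/usbstick/changes.tar.gz
```

One archive instead of many small files also tends to be kinder to a slow flash drive, at the cost of having to unpack it before you can look at anything.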
Actually, I'm not even US citizen, and I travel in South East Asia. When talking about shitty internet, I know what shitty internet is. For example when I'm staying in Cambodia, internet can (and often does) go down for the whole day and night. It also happens often. The speed is also ridiculously slow. You can try to get around some of the downtimes by getting mobile internet for backup, but if there's a wider outage, there's nothing you can do.
Yet, I've found Dropbox to be the best backup solution. Files will get there eventually, and I don't need to do anything. There's also revision history of files, so if you upload corrupted files or something like that you can reverse it. You can access them from other computers in case your laptop goes poof (happened to me). And the most important thing - if you get robbed or lose your luggage, you will still have access to your files (and of course, I keep my laptop encrypted).
The good sides of online cloud backup far outweigh the negative ones or worries about bandwidth. Especially since most of the time the files that need backup aren't large. No one in their right mind would try to sync their media files.
Obviously SkyDrive is of no use, but there are several other alternatives that would be better suited to this purpose. Although if, as he says, it is for use while travelling, an internet-based system is useless.
That's why I liked Crashplan when I first saw it. This may sound like a sales pitch but I'm just a happy customer.
With Crashplan you can have multiple destinations for your backup set. I usually have three:
- same HD in case I accidentally deleted some files.
- USB HD for faster recovery in case my primary HD breaks.
- Online "in the cloud", in case my house burns down etc.
Crashplan detects when I plug in the USB HD and automatically starts updating the backup on it. If there's no internet the first two destinations will still keep me pretty safe. Once the internet is back it catches up on the cloud destination.
It works just fine on my Linux Mint laptop as well as my Windows desktop pc.
Or he could save himself a ton of grief and just use rdiff-backup, which happens to use librsync, produces incremental differential backups, stores said backups as files you can simply browse, works equally well on local and remote filesystems, and is dead simple to use. I've used it for years now on a ton of systems.
Write failed: Broken pipe
Try keeping current on the status of Dropbox and SkyDrive services so you can pull your data before they disappear.
Email? Twitter? Facebook? All kind of "push notification" technologies where you don't really need to do anything if you use them.
Besides, we are talking about Microsoft here. A company that has ridiculously long phase-outs for their products as a standard practice so businesses feel safe using them (seriously, they announced that version 4.0 of Silverlight will see end of support two years from now). If there is any tech company in the world that you can trust not to end support suddenly, it's Microsoft.
From what I have parsed the OP wants to have a full back-up on a USB-HDD and the diffs on the USB-Flash, because the Flash is limited in size.
Just write two rsync (or grsync) scenarios: one for HDD and the other for the Flash. On the HDD you will have a directory that is a mirror copy of your laptop. On the Flash you will keep the diffs for the time between syncs to the HDD.
When at home
1. rsync your laptop to the HDD (mirror).
2. copy the incremental stuff from the Flash to a separate directory (e.g. diff-2012-May-21) on the HDD, and wipe the Flash.
On the road:
just rsync diffs to that Flash.
I guess the recovery plan is quite obvious too. Should any _one_ of those three devices die, you are still good to go.
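One possible way to do the "rsync diffs to the Flash" step without having the HDD at hand, as a sketch only: touch a stamp file after each full sync at home, then on the road feed rsync the files modified since then. This assumes GNU find and an rsync with --files-from and --from0; the function names and paths are made up. It goes by modification time only, so it will miss deletions and files copied in with an old timestamp.

```shell
#!/usr/bin/env bash
# Sketch: copy only files newer than the last full backup onto the stick.
# mark_full_backup and sync_newer are hypothetical helper names.
set -eu

# At home, right after rsyncing the laptop to the HDD.
mark_full_backup() { touch "$1"; }   # $1: stamp file

# On the road: files modified after the stamp go to the stick, paths intact.
sync_newer() {
    local src=$1 stamp=$2 stick=$3
    mkdir -p "$stick"
    # %P prints each path relative to $src, NUL-terminated for --from0.
    find "$src" -type f -newer "$stamp" -printf '%P\0' |
        rsync -a --from0 --files-from=- "$src/" "$stick/"
}

# Example (hypothetical paths):
#   mark_full_backup "$HOME/.last-full-backup"
#   sync_newer "$HOME" "$HOME/.last-full-backup" /media/usbstick/diffs
```

Running sync_newer repeatedly between trips home just refreshes the same set on the stick, since the stamp only moves when you do the full sync.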
...a stunned silence fell upon the hall.
No, he's made part of his username into a fake uid to make it look like he's been here longer. (hint: the second one is his uid).
For 99.9% of all users a backup is simply that, a failsafe in case their main HD gets lost / damaged. So what if Dropbox or SkyDrive suddenly were to go out of business (as unlikely as that is, you'd know in advance)? You suddenly lose access to that safety copy of your data and will know right away because the client cannot connect anymore. But you still have your primary copy of everything, nothing was lost, you can just switch providers or change your backup strategy. The chances that something would happen right then, in the window between the cloud provider failing and you making another copy with another provider, are incredibly low. If you can't take that risk then you'd have a third backup anyways.
Try keeping current on the status of Dropbox and SkyDrive services so you can pull your data before they disappear.
You clearly have never used DropBox. It's just a shared folder that populates on every computer you install it on. If DropBox were to die right this instant, you would still have all of your data on every one of your computers - it would just stop syncing.
If you are worried about the revision history, you could pick one computer to run a rsync job between the DropBox folder and another folder of your choice, or if you are on a Mac just use TimeMachine, or if you are on Windows run something like Areca backup or any number of other free incremental backup solutions. You should be doing this anyway.
Then when DropBox goes out of business, you can switch to one of the several competitors out there and continue as before.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.