Ask Slashdot: Temporary Backup Pouch?
An anonymous reader writes "It looks simple. I've got a laptop and a USB HDD for backups. With rsync, I only move changes to the USB HDD for subsequent backups. I'd like to move these changes to a more portable USB stick when I'm away, then sync again to the USB HDD when I get home. I figured with the normality of the pieces and the situation, there'd be an app for that, but no luck yet. I'm guessing one could make a hardlink parallel-backup on the laptop at the same time as the USB HDD backup. Then use find to detect changes between it and the actual filesystem when it's time to back up to the USB stick. But there would need to be a way to preserve paths, and a way to communicate deletions. So how about it? I'm joe-user with Ubuntu. I even use grsync for rsync. After several evenings of trying to figure this out, all I've got is a much better understanding of what hardlinks are and are not. What do the smart kids do? Three common pieces of hardware, and a simple-looking task."
Just a suggestion, but SkyDrive is great. You get 25GB for free and it is also going to be integrated fully into Windows 8 in future.
What do the smart kids do?
The smart kids don't use linux.
Oh dear.
Hardlinks don't span storage devices. They are names that share the same inode on a single storage device. Soft links do, but they are pointers to a path, so "backup" using softlinks and you have a bunch of pointers to data that is on the original system. NOT on the thumb drive!
Use one of the backup packages out there, you are not at the point of rolling your own.
Not even close.
Never answer an anonymous letter. - Yogi Berra
Since you are an ubuntu user, and it looks like you just need a nice rsync front-end to handle backup of the same data to two different drives, I'll suggest unison-gtk.
Very nice, simple front-end, and will do what I think you need.
You can never know everything, and part of what you do know will always be wrong. Perhaps even the most important part.
I hesitate to offer this, because I've not experimented with it in the precise scenario you describe. However, being another Joe User with ubuntu, I took a look at rsync as a way to implement backups between my home PC and an Apple Time capsule that I was using as a secondary backup device.
After some tinkering I settled on Unison, which is available in the ubuntu repositories. It's essentially a sophisticated rsync front end, with a few bells and whistles. You get 2-way directory replication between your 'local' and 'remote' file systems [though they could both be local or both remote if you choose] and you can essentially script multiple different backups into the single interface. For example, I have "Office" for documents, spreadsheets and the like, "Photos", for camera images, "Music", and so on.
Like most tools, Unison is imperfect, but it's simple to use once set up. The key point with it, as with any product you put in this space, will be knowing and keeping track of your definitive data source. If you have a document that exists on both your local and backup systems, and you edit that file separately at each location, then run Unison, only the most chronologically recent copy will be preserved. To go beyond this level of functionality and get to something that can intelligently merge changes, I think you're going to need something more like a CVS tool... There are hugely expensive proprietary solutions (like Livelink), but I've not come across anyone using a good FOSS alternative. HTH...
I use DirSyncPro to automate my backup tasks. Not sure how to set it up for your particular task, or whether you can, but it might be worth looking into. A lot of options while still being easy to use.
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics nonwithstanding.
Duplicity uses librsync to generate the changeset that rsync would use, then stores the change set. If you stored the change set to the USB drive, this could then be "restored" to the destination drive, perhaps? I don't know if there's any way to do this out of the box, or with a bit of scripting, or if this would need to be a whole new toolchain.
there are better solutions than rsync.
rdiff-backup
duplicity
for example.
i probably don't understand what you are trying to accomplish.
this is what dump(8) does
Hahaha, "temporary backup pouch" how gay.
I stuck my pen0r in your momz pouch. Wait I didnt because I am gay with Rob Malda. Or cowboy neal. You haven't heard from ol' neal in a while because he's still tied up in my basement with a Broos Willys look-a-like!
SUCK MY BALLS SLASHDOT!
-KD
If all you are doing is copying files from laptop to your USB HD.
Then prepare to be shattered.
A work colleague who did exactly the same as you went to restore some files that he had removed previously (space issue) and learnt the hard way.
Came to work and asked - How can I recover files from my HD? I can still open them from the HD but the images are all corrupt....
My advice was ALWAYS have 3 (THREE) copies of any important data.
1) Live (Laptop)
2) First backup
3) Second backup for when your first backup FAILS while you are trying to restore from it.
Full backups are best. Move the backup off site (Even in the back shed with an airtight sealed container).
First, ignore the people who encourage you not to try, and who point you in other directions. Sure, there are much better ways of doing this, but who cares? The whole point is that you should be able to do whatever you want -- and actually doing this is going to leave you _so_ much smarter, trust me.
Some douche criticized you for not knowing beforehand why hard links wouldn't work. . . . because, you know, you should have been born knowing everything about filesystems. To hell with him, sally forth on your journey of discovery, this can be hella fun and you'll get an awesome feeling of accomplishment.
First off, you're going to have trouble using rsync with the flash drive, because I assume your constraint is that you can't fit everything on the flash drive, it's only big enough to hold the differences.
Next, come to terms with the fact that you'll need to do some shell scripting. Maybe more than just some, maybe a lot, but you can do it.
I'd recommend cutting your hard drive in two -- through partitions or whatever -- to make sure that "system" is fully segmented from "data." No sense wasting all your time and effort getting backups of /proc/ and /dev/, or, hell, even /bin/ and /usr/. Those things aren't supposed to change all that much, so get your backups of /home/ and /var/ and /etc/ working first. Running system updates on the road is rarely worth it, and will be the least of your concerns if you end up needing to recover.
Next, remind yourself how rsync was originally intended to work at a high level. It takes checksums of chunks of files to see which chunks have changed, and only transfers the changed chunks over the wire in order to minimize network use. Only over time did it evolve to take on more tasks -- but you're not using it for its intended purpose to begin with, since you're not using any network here. So rsync might not have to be your solution while travelling unless you start rsyncing to a personal cloud or something -- but its first principles are definitely a help as you come up with your own design.
The premise is that, while travelling, you need to know exactly what files have changed since your last full backup, and you need to store those changes on the flash drive so that you can apply the changes to a system restored from the full backup you left at home. You won't be able to do a full restore while in the field, and you won't be able to roll back mistakes made without going home, but I don't think either of those constraints would surprise you too much, you likely came to terms with them already.
So, when doing the full backup at home, also store a full path/file listing with file timestamps and MD5 or CRC or TLA checksums either on your laptop or on the flash disk, preferably both.
Then, when running a "backup" in the field, have your shell script generate that same report again, and compare it against the report you made with the last full backup. If the script detects a new file, it should copy that file to the flash disk. If the script detects a changed timestamp, or a changed checksum, it should also copy over the file. When storing files on the flash disk, the script should create directories as necessary to preserve paths of changed/new files.
For bonus points, if the script detects a deleted file, it should add it to a list of files to be deleted. For extra bonus points, it should store file permissions and ownerships in its logfiles as replayable commands.
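A minimal sketch of the manifest idea described above, assuming GNU coreutils and findutils. The function names make_manifest and copy_changes and every path are made up for illustration, md5sum is just one of the checksum choices mentioned, and file names containing newlines are not handled. Permissions and ownership replay (the "extra bonus points") are left out.

```shell
# Sketch only. make_manifest / copy_changes and all paths are hypothetical names.

# Write one "md5  ./relative/path" line per regular file under $1 into $2.
# $2 must be an absolute path outside $1, because the function changes directory.
make_manifest() (
    cd "$1" || exit 1
    find . -type f -exec md5sum {} + | sort > "$2"
)

# Copy files that are new or changed since manifest $2 from tree $1 into
# $3/files (keeping relative paths) and list vanished paths in $3/deleted.txt.
copy_changes() (
    src=$1 old=$2 dest=$3
    export LC_ALL=C                      # same sort order for sort and comm
    new=$(mktemp) || exit 1
    make_manifest "$src" "$new" || exit 1
    mkdir -p "$dest/files" || exit 1

    # Lines found only in the new manifest: a new file, or an old path whose
    # checksum changed. The path starts at column 35 of md5sum's output.
    sort "$old" > "$new.old"
    comm -13 "$new.old" "$new" | cut -c35- | while IFS= read -r path; do
        mkdir -p "$dest/files/$(dirname "$path")"
        cp -p "$src/$path" "$dest/files/$path"
    done

    # Paths in the old manifest that no longer exist at all: deletions.
    cut -c35- "$new.old" | sort > "$new.oldpaths"
    cut -c35- "$new" | sort > "$new.newpaths"
    comm -23 "$new.oldpaths" "$new.newpaths" > "$dest/deleted.txt"
    rm -f "$new" "$new.old" "$new.oldpaths" "$new.newpaths"
)
```

At home you would run something like make_manifest /home/me /home/me-full.manifest right after the full backup, and on the road copy_changes /home/me /home/me-full.manifest /media/stick (both paths are placeholders).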
The script would do a terrible job at being "efficient" for renamed files, but same is true for rsync, so whatevs.
I built a very similar set of scripts for managing VMWare master disk images and diff files about ten years ago, and it took me two 7hr days of scripting/testing/documenting -- this should be a similar effort for a 10-yr-younger me. I learned *so* much in doing that back then that I'm jealous of the fun that you'll have in doing this.
Of course, document the hell out of your work. Post it on sourceforge or something, GPL it, put it on your resume.
It seems like the poster confuses two tasks: Backup and version control.
For the former, use archiving tools to perform full and incremental backup. How is it done? You could use find to list files with certain criteria, e.g. last modified timestamps. Pass that list to tar using the -T flag, where you also use -X to exclude files and directories like "*/.thumbnails" and "*/.[cC]ache*". Once the tar is done, use your favourite checksum tool (md5sum, shasum) to store a checksum of the archive in a separate file. Once you get home, move the archives, verify the checksums, and you're done.
As pointed out in the summary, deleted directories and files will be an issue, thus, you perform a full backup from time to time. The time frame will depend on how big the changes are, and how much data you have. Personally, I've settled on every second week for /home, but that does not include huge files like RAW files from a DSLR.
As for version control, set up your own git repository with git init, copy it to the laptop with git clone, and you're ready to go. Pro tip: Make sure you name your home computer with a real or "fake" DNS (e.g. in /etc/hosts, or ~/.ssh/config). You can then simply refer to it as "home" wherever you are, and tools like git and svn stay happy.
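For the find-plus-tar backup described in this comment, here is a rough sketch assuming GNU tar and sha256sum (md5sum or shasum would work the same way). The function name, the stamp file and the exclude file are hypothetical; all four arguments need to be absolute paths because the function changes into the source tree. File names containing newlines are not handled.

```shell
# Sketch only: archive files changed since a reference timestamp.
# make_incremental_archive, the stamp file and all paths are hypothetical.

# $1 = tree to back up, $2 = stamp file touched at the last backup,
# $3 = archive to write, $4 = file with one exclude pattern per line.
make_incremental_archive() (
    src=$1 stamp=$2 archive=$3 excludes=$4
    cd "$src" || exit 1
    # List regular files modified after the stamp and hand the list to tar
    # on stdin (-T -); -X skips anything matching the exclude patterns.
    find . -type f -newer "$stamp" -print |
        tar -czf "$archive" -X "$excludes" -T - || exit 1
    # Store a checksum next to the archive, using the bare file name so the
    # pair can be moved to another disk and still be verified there.
    cd "$(dirname "$archive")" || exit 1
    sha256sum "$(basename "$archive")" > "$archive.sha256"
)
```

After copying the archive and its .sha256 file home, running sha256sum -c on the .sha256 file from inside the directory that holds both is the verification step.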
If I understand your problem right, How about dar? It can make an empty archive of your main backup to act as a reference (just file info, no files). Then it makes archives relative to that, with just changed files. It can then apply the changes to the original dir, including deletions, if you need that.
Local backups via TimeMachine when your laptop is not connected to the backup disk.
Forgot to mention:
To accomplish this, you'll need to read up on:
- bash
- find
- grep
- awk
- sed
- md5sum
- chmod/chown
- mkdir -p
- diff/patch (for general reference, and also look up binary diffing tools)
Extra extra extra bonus points if you compress the changed files when storing them on the flash drive.
If you want to stick with rsync for backups, what you want to do is get beyond having just a mirror on the external drive. After all, a backup should help you recover from mistakes, and mirroring will replicate your mistakes to the external drive too.
Instead, keep a series of backup snapshots on the external drive, representing your data at a certain point in time. Each rsync pass creates a new snapshot, like /mnt/backup/2012-05-20/ which represents your internal drive and shares common files by hardlink with older snapshots such as /mnt/backup/2012-05-19/ but doesn't have links to things that were deleted. Merging backups between two different external devices is then a matter of transferring around whichever dated snapshots you want to mirror or migrate between devices.
Here's a hint about how to implement one such snapshot:
mkdir /mnt/backup/2012-05-20
rsync -Ravx --link-dest=/mnt/backup/2012-05-19/. --exclude=/var/{tmp,cache} --exclude=/tmp /. /mnt/backup/2012-05-20/.
see the man page for more details, and the general inductive step is left as an exercise to the reader. In practice, this sort of backup has very little overhead for snapshots of non-changing files. I allocate approximately 150% of my source data volume for my backup volume to maintain a long-term history of many daily, weekly, and monthly backups. My script decimates older backups (with rm -rf /mnt/backup/YYYY-MM-DD) to turn a series of dailies into weeklies, weeklies into monthlies, etc.
The pathological case for this kind of backup is a huge file that slowly grows, such as /var/log/btmp on older Linux systems where logrotate did not rotate that file. An optimization for rotated logs is to make sure they get a name like basename.YYYY-MM-DD instead of basename.1, so that the name of older files doesn't change each time it rotates. Also use appropriate exclude patterns so you don't waste time and space backing up junk you don't need. You can even go the other way and only selectively retain stuff like:
rsync -Ravx --link-dest=/mnt/backup/2012-05-19/. /./etc /./boot /./home /./var/log /mnt/backup/2012-05-20/.
This is an example of the power of the -R (--relative) naming scheme understood by rsync. The position of the extra "." inside the source file paths is not a typo, but actually essential to the purpose of this example. Learn it. Live it. Love it.
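One possible shape for the "inductive step", as a sketch rather than the poster's actual script: pick the newest existing dated snapshot as the --link-dest for today's run. The function name take_snapshot and the directory layout are assumptions; the backup root should be an absolute path on a filesystem that supports hard links, and excludes and -R path handling are left out to keep it short.

```shell
# Sketch only: dated rsync snapshots sharing unchanged files via hard links.
# take_snapshot and the directory layout are hypothetical.

# $1 = source tree, $2 = backup root holding YYYY-MM-DD snapshot directories.
take_snapshot() (
    src=$1 root=$2
    today=$(date +%Y-%m-%d)
    # Most recent existing snapshot other than today's, if any. ISO dates sort
    # chronologically as plain strings.
    prev=$(ls -1 "$root" 2>/dev/null |
           grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' |
           grep -v "^$today\$" | sort | tail -n 1)
    mkdir -p "$root/$today" || exit 1
    if [ -n "$prev" ]; then
        # Unchanged files become hard links into the previous snapshot.
        rsync -a --link-dest="$root/$prev/" "$src/" "$root/$today/"
    else
        # First run: nothing to link against, so this is a full copy.
        rsync -a "$src/" "$root/$today/"
    fi
)
```

Thinning out old snapshots stays a plain rm -rf of the dated directory, since every snapshot is a complete tree on its own.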
They'd have the skinny on pouches for sure.
I have not used it myself, but I believe that git-annex does exactly what the OP asks for: http://git-annex.branchable.com/
You can get started on fire.
If you mod me down the terrorists will have won
To detect the changes, you can utilise snapshotting (rsnapshot package perhaps?) to do it and it would allow you to see the changes from a day-to-day view.
All you need to then do is transfer the changes listed under the daily.1 (or appropriate folder) to your USB stick. It maintains paths, which is one of the items you're after.
As for deletions, I think a daily email of the snapshot.log file could list this information for you. I wouldn't want it more complicated than that I guess...
Hope this helps,
@chayharley
timestamps.
could you:
store the timestamp of latest backup, using touch filename.
run find with -newer timestampfile as one of the params, can't remember others.
pipe output to tar.
http://dbaspot.com/shell/399852-creating-tar-file-only-files-have-changed-since-specified-date.html
Install Windows, use a briefcase
Windows hard links do not span volumes, you are probably thinking of junctions. See:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365006(v=vs.85).aspx
Best post!
Or he could save himself a ton of grief and just use rdiff-backup, which happens to use librsync, produces incremental differential backups, stores said backups as files you can simply browse, works equally well on local and remote filesystems, and is dead simple to use. I've used it for years now on a ton of systems.
Write failed: Broken pipe
With backup drives and file transfers, I also tend to run into the problem that I have different UIDs on different systems. Maybe not such a problem for the OP, but you mention backups of /var which is typically full of files owned by system users (e.g. cups, and mysqld/apache if you use the laptop for web development).
Avantslash: low-bandwidth mobile slashdot.
Sweet post. Yes indeed you get full five points for reading comprehension and reading between the lines. And of course for great advice. Have been AC on /. since a couple of months after it started, so no problem ignoring the inevitable chaff.
I'm just rather baffled that what I'm looking for isn't already done; figure I must be blind. And at 50 I'm getting slow and less interested in skinning the cat myself. But if it comes to that (we'll see how the thread looks like by morning) then now I've got great advice for the next step. Thank you so much.
I don't know if I'm one of the "smart kids" or not, but I'm a standard non-technical user and have found LuckyBackup or BackInTime run along with an online sync/backup service like DropBox or SpiderOak the most handy options.
Both LuckyBackup & BackInTime are GUI tools that set up rsync rules (even complicated ones) with an easy point-and-click interface, then schedule them in cron. They can do anything rsync can: synchronize the drives so the backup matches the current, or make a backup of everything present plus never delete anything, and they won't waste time/energy by backing up files that haven't changed;
LuckyBackup can be set to keep up to 99 snapshots of anything that changes, and they're structured in the exact same way as the original. BackInTime can have unlimited snapshots, and each backup is in a different nested folder by date/time, with unchanged files within each folder being hard links back to the most recent backup copy. Both programs just create file copies, not compressed archives.
Right now, I'm using LuckyBackup for my regular files, and I have BackInTime handling my writing directory so I can go back to an unlimited extent in case -- as happened once -- I realize that I had made a major change several months ago (more backup dates than LuckyBackup tolerates in snapshots) that turned out to be a horrible mistake, so I don't have to try to reconstruct the original from memory.
I use the web/online backup solution partly to keep my computers in sync without a thumbdrive. It's also because it acts as a free minute-by-minute backup with a few months of snapshots, so if an .odt file becomes corrupt while I'm working on it, I don't lose everything since the previous system backup. I lost about 30 hours of intense revisions a couple of years ago because the thumbdrive I was saving & transporting my files on had a glitch that evidently had messed up everything I'd been saving for a few days -- and as it turns out, it's not possible to extract text from a bad .odt file even with a hex editor.
Apathy Sucks, Nobody for President!
Interesting, since I used rdiff-backup in the past and found it a pain. If files are stored as diffs of diffs of diffs of diffs of a full copy, it is rather easy to corrupt the backup. These days, I make backups using rsync, with
For the first backup, omit the --link-dest argument. Only modified files are stored. As long as you don't have tons of big files that have only a few bytes changed, I don't see the advantage of rdiff-backup. However, it requires that the backup filesystem supports hard links (see my other comment on the use of a unix filesystem on a flash drive). When you come back home, you can do something similar (with --delete) to sync back to your regular backup drive.
The modify-window option is there because I have to back up Windows filesystems as well.
Replying to myself: of course, I realize that the OP cannot use a hard-link backup if the usb drive cannot hold all his important data. It's too long ago that I used rdiff-backup; can you reliably split the master backup and the differential backups to different filesystems (say the drive at home and the usb stick)? Preferably without risking corrupted backups if it involves manually merging diff trees.
From what I have parsed the OP wants to have a full back-up on a USB-HDD and the diffs on the USB-Flash, because the Flash is limited in size.
Just write two rsync (or grsync) scenarios: one for HDD and the other for the Flash. On the HDD you will have a directory that is a mirror copy of your laptop. On the Flash you will keep the diffs for the time between syncs to the HDD.
When at home
1. rsync your laptop to the HDD (mirror).
2. copy the incremental stuff from the Flash to a separate directory (e.g. diff-2012-May-21) on the HDD, and wipe the Flash.
On the road:
just rsync diffs to that Flash.
I guess the recovery plan is quite obvious too. Should any _one_ of those three devices die, you are still good to go.
...a stunned silence fell upon the hall.
Look at the --delete option of rsync.
unison has already been suggested multiple times.
I used unison. It's perfect to sync from A to B (it only syncs the diffs) then modify B and later sync B to A
You also can modify A and B at the same time as long as it's not the same file, then sync and then A and B are identical.
You can even sync in cycles: A->B->C->A with modifications on all three directory trees and it still works
Unison also handles deletions on both sides fine.
Hint: use the -group -owner -times flags
Atari rules... ermm... ruled.
The first problem to consider is how you determine which files to backup. Filesystems like xfs, zfs, and btrfs have nice convenient ways to get a list of changed files (and for xfs and zfs, the contents of those files as well). For ext2/3/4 (and other older unixy filesystems) look at "dump". And of course, if you're working with a completely dumb filesystem, you can always use rsync (if your backup disk is remotely accessible) or some external/manual indexing to figure out what files to backup.
If your filesystem supports some form of dump (send for zfs), you can use that to create your incremental changes. If you only have a list of files, use tar, or rsync. If you want to keep a full backup on the same drive, you can use rsync's batch mode (see the manpage) to efficiently generate incremental backups, for filesystems that don't do a good job of that.
You don't want to hard link between your live tree and a backup tree. That will result in the changes showing up in both trees, obscuring the changes when you run a backup. It's a technique used with rsync for snapshotting, where two backups trees represent the state of the original filesystem at different times. To make that work, the links are broken for the files that differ between the two snapshots.
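The rsync batch mode mentioned above could be used roughly like this. It is a sketch under one big assumption: the laptop keeps a reference copy that is identical to the mirror on the home disk. --only-write-batch and --read-batch are real rsync options; the function names and paths are made up. Deletions are not shown, so check the man page before combining this with --delete.

```shell
# Sketch only: carry rsync changes on a stick as a batch file.
# write_changes / apply_changes and all paths are hypothetical.

# $1 = live tree, $2 = reference copy on the laptop, $3 = batch file on the stick.
# --only-write-batch records the update without touching the reference copy.
write_changes() {
    rsync -a --only-write-batch="$3" "$1/" "$2/"
}

# $1 = batch file from the stick, $2 = mirror on the home disk.
# The mirror must match the reference copy the batch was computed against.
apply_changes() {
    rsync -a --read-batch="$1" "$2/"
}
```

Because the reference copy is never updated on the road, each new batch holds everything changed since the last sync at home, so only the latest batch needs to be applied. After applying it, the reference copy has to be brought up to date as well.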
Or learn Perl. Perl can easily do the same things as bash, find, grep, awk, sed, ...
Avoiding having to learn all the intricacies of all these tools was one of the main purposes of Perl.
Try MacBak; it works on Ubuntu as well:
https://github.com/daemonza/MacBak
For years I have used rsync scripts.
My problem was syncing a desktop and a laptop. So I made upload_to and download_from scripts to sync as needed.
I also try to keep a third master backup copy on a different server so all three are synced.
One problem comes when trying to work on both desktop and laptop simultaneously. Just map a drive and modify files on one side.
I think git has got what you seek. http://git-scm.com/
Wait. Stop scrolling for a sec. O.K. Thanks. - P
You could set up a robocopy script to do incremental backups. Then when you run it just set the 'edited after' property to only backup files since your last home backup. There are plenty of templates for this online as well as a GUI application if you prefer it.
In the grand internet tradition of answering a loosely related question which is no use at all to the asker, I will say that the "smart kids" might use something like ZFS, which almost handles this for you. (Take snapshots, save delta streams on your USB stick. Requires the backup to be a ZFS copy, not just the same files.)
Useless right now at least. But I've been pretty happy with switching my storage to ZFS, even if the Linux version sucks. (I mostly don't use the Linux version.) I'd recommend it to anyone who doesn't mind a bit of transitional pain.
Easiest solution is probably to use tar's incremental backups. The -g argument creates a relatively small file listing the files already backed up, so future incremental runs can skip them. If you keep the incremental files on the laptop then you can put each days actual tar backup on whatever devices you have handy.
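A small sketch of the tar approach, assuming GNU tar (the -g / --listed-incremental option is GNU specific). The function names and paths are hypothetical. The level-1 run works on a copy of the level-0 state file, so every run on the road archives all changes since the last full backup rather than since the previous run.

```shell
# Sketch only: GNU tar incremental dumps.
# full_backup / incremental_backup and all paths are hypothetical.

# $1 = tree, $2 = snapshot (state) file kept on the laptop, $3 = archive to write.
# Level 0: start from a fresh state file so everything is archived.
full_backup() {
    rm -f "$2"
    tar --create --gzip --listed-incremental="$2" --file="$3" -C "$1" .
}

# Level 1: use a copy of the level-0 state file, leaving the original untouched.
incremental_backup() {
    cp "$2" "$2.level1" &&
    tar --create --gzip --listed-incremental="$2.level1" --file="$3" -C "$1" .
}
```

At home, full_backup would write to the USB HDD; on the road, incremental_backup would write to the stick while the state file stays on the laptop.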
Back in 1993 when I was new to *NIX, I asked a seasoned sysadmin which scripting language I should learn. He started listing all those same tools .... then said - "or you could just learn perl."
I did and you sir are 100% correct. I know grep, find, ksh/sh, sed very well - my awk is extremely weak. Perl hasn't let me down all these years. It is still my go-to scripting language whether I'm hacking some crap code together OR performing a system design and pushing out a beautiful website with XML or json services (check out Perl-Dancer) in a few hours.
Perl can be written cross platform - damn you Windows file systems - with a little effort. I'm constantly hacking Windows Perl scripts using strawberry perl.
Buy a bigger flash drive! 64GB isn't too expensive.
If you are running Linux, then 64G should easily handle your OS and documents with many versions.
It is only video, audio and photos that will eat up lots of storage and you need to get those to a server in a different country ASAP anyway. Use rsync+ssh to do that for the big files.
I saw a 32G USB flash drive for $17 yesterday. Unless you are storing HiDef video, this should cover a few weeks overseas. Be certain to encrypt it AND your laptop, but leave a tiny OS that boots so you can show it to customs and border control.
Do not carry your passphrases with you. Keep the KeePassX data file in the cloud somewhere ... so it isn't on the machine at border crossings. You will not physically be able to decrypt the HDD this way.
and I know this is not what he asked for, but wouldn't the simplest solution be to purchase a second external drive (maybe an SSD for durability) and actually have a complete backup on the road... Or even just take his current external with him - he has it backed up in the cloud any way...
I ask because he never stated why that external drive was stuck at home..
If that won't do, another possible solution.
1) I don't see a need to sync the USB stick when he returns - just perform your usual backup when you return and only care about the USB stick if you have a failure before returning home.
If (1) is workable for him then: 2) how much data will he really need to backup when out-and-about? Can't you setup an 'away from home' backup profile that will only backup the things you'll be changing while away - document, current working folders, but skip the movies, porn and music for this backup (it's still at home and in the cloud anyway)..
Since he expects the deltas to fit on a USB stick, I'm assuming he's not wanting to backup a heap of video editing or some other hungry activity...
Never happened. True story.
Just do a clone with Git. You can track changes, deletions and it can resolve conflicts easily.
There are multiple "watch" apps out there in various languages that will run a script every time a directory changes.
Google "watching files with ruby"
Substitute ruby for python or perl or...
A fool throws a stone into a well and a thousand sages can not remove it.
if lsusb | grep
then
backup script
fi
and so on
This is how my backup script works: I do incremental backups to an external USB HDD if it is connected, and just create local restore points if it is not.
Someone had suggested using Git and I was going to suggest the same. If you are only backing up documents then it should be easy enough to create repos on the USB HDD, Laptop and USB drive. You can then commit/merge changes between repos to keep in sync, perhaps use some shell scripts to ease administration. Also, I use a product called Super Flexible File Synchronizer to sync a subfolder on my laptop's filesystem with a WebDAV server. It's got lots of features and supports Linux, Windows and Mac. http://www.superflexible.com/
Agreed. If you wade into this you're going to want to use Perl.
I wrote pretty much exactly this about 15 years ago in OS/2 REXX of all things. Trying to use system tools will end up being way too slow; you're going to want to use an in-memory hash. REXX supported hashes before they were called hashes. (indirect array references, I think? can bash do that?)
If you do write this, please do share it somewhere.
Since you can't use dump(8) as others have pointed out, maybe you can do something with UnionFS. After you do your full backup to USB HDD and are about to leave on a trip, mount a unionfs over top of your critical filesystem(s)... Then every day, copy the union layer off to thumb-drive.
Or you can learn all the utilities if you want to learn right and be a better sysadmin and work with other sysadmins.
Perl is for programmers that like to pretend they know how to be sysadmins and don't need to share their tools with other sysadmins.
If you want to "copy" junk with its metajunk from an ext3 filesystem on to a FAT32 filesystem, remember that you can always create an 8GB file with dd from /dev/zero, run mkfs.ext3 against that file, and then mount that file as an ext3 filesystem thanks to the loopback adapter. You won't be able to read that junk from a Windows machine, but you probably won't care, and if you create an 8GB file on a 16GB FAT32 flash disk, you'll still have 8GB of space available for use in Windows -- and Windows will be able to copy the 8GB filesystem file and stuff.
Someone else's explanation: http://nst.sourceforge.net/nst/docs/user/ch04s04.html
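A sketch of the loopback trick, with one caveat I am sure of: FAT32 cannot hold a single file of 4 GiB or more, so the image has to stay below that. The function names and paths are hypothetical; formatting and mounting need root and the e2fsprogs tools, and are shown only as an outline.

```shell
# Sketch only: an ext3 filesystem inside a file on a FAT32 stick.
# create_image / format_image / mount_image and all paths are hypothetical.

# $1 = image file on the stick, $2 = size in MiB (must be below 4096).
create_image() {
    [ "$2" -lt 4096 ] || { echo "image too large for FAT32" >&2; return 1; }
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# $1 = image file. Run once. -F is needed because the target is a regular
# file rather than a block device.
format_image() {
    mkfs.ext3 -q -F "$1"
}

# $1 = image file, $2 = mount point. Needs root.
mount_image() {
    mkdir -p "$2" && mount -o loop "$1" "$2"
}
```

Once mounted, the image behaves like any ext3 filesystem, so hard links, ownership and permissions survive on the stick.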
I know people hate it, but setting up an auto-running batch script for backup upon plug-in. I've had no issues doing this from Windows to Linux.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
You might try bitpocket (see https://github.com/sickill/bitpocket or http://ku1ik.com/2011/07/18/bitpocket-as-a-dropbox-alternative.html ) or Unison.
- David A. Wheeler (see my Secure Programming HOWTO)
Isn't incremental tar just made for that ?
http://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html
Thems fightin words. Perl is for everyone.
find . -newer last_backup_timestamp | cpio -o > snapshot$(date +%Y%m%d) && touch last_backup_timestamp
((lambda (x) (x x)) (lambda (x) (x x))) http://www.endpointcomputing.com a scientific approach to custom computing.
Extra extra extra bonus points if you compress the changed files when storing them on the flash drive.
Hint for the bonus question: gzip ;)
Not everything that can be measured matters; Not everything that matters can be measured.
This is the only readworthy comment in this whole thread. Thank you for posting this!
Have 2 different rsync tasks: copy laptop to stick (for when you're on the go) and copy laptop to backup HDD (for when you get home). If your laptop breaks while on the go, you copy the stick to the HDD.
If you're worried about losing all your data at customs or whatever, you need a networked system.
You can't have an 8GB file in a FAT32 filesystem. You'd need exFAT for that, but if you're going to use a filesystem that many existing Windows installations can't read, you might as well format the stick with a proper filesystem in the first place.
hotplugd + duid's and rsync (which you sort of already use) on OpenBSD makes this a cakewalk, the real kind, not the Iraq kind. Or find their equivalents in linux.
Perl is for anyone, sure, but it's certainly not for everyone.
If the author and the user are the same singular individual, and always will be, then sure, perl can be a fun toy and a timesaver.
But if you're mentoring junior sysadmins as part of your succession plan in a collaborative and evolving ecosystem, Perl is pretty much the worst choice available.
bzip is better than gzip if space is at a premium. There are even multi-core versions of bzip2 that are very efficient. You could also look at p7zip. If you want a really efficient compressor, try nanozip as well, although its page says it is still experimental, but it seems to be at the top of several compression benchmarks.
Make full "level 0" dumps at home to the big disk. Make delta "level 1" or incrementing levels dumps to the flash drive. Each level will back up everything that changed since the previous level.
Now, you want the backups at home to be a file copy - there's no reason you can't do that and then do level 1 dump backups on the road - ie, never actually make a level 0. Just do your rsync and then update /etc/dumpdates to reflect your rsync-level-0.
This is exactly what dump was designed to do, and it's going to be a lot easier than hacking rsync to fit.
If you have time to figure out what this guy needs, then you need a job or at least to be paid for doing the analysis. Geesh.
Yes, bash can do that, and I highly recommend learning not only bash but also plain Bourne Shell - still, if you want to create actual backup tool and release it, I'd go with Perl. If you want it to be cross platform and not limited to *nix systems in that, again perl.
If you haven't mastered the *nix tools mentioned, doing this with shell scripts, especially limiting yourself to plain Bourne Shell (Heirloom Bourne Shell is something to look into if your system, like linux distros, has /bin/sh as symlink to bash/dash), and those tools is a *great* learning experience, but another consideration for shell scripting vs. perl is speed and memory use. Consider these for making correct choice.
Also, there can be other reasons against perl and for shell scripting - want to be able to share it between *nix systems that you might not have perl, just for one. Shell scripting can be great for cross platform, but limits you to *nix likes, and often limits you to plain Bourne Shell - one of many things bash has but plain sh has not is hashes, and you can program a way around that, but it will be hackish, slow and resource eating - but it's also awesome experience :)
In capitalist USA corporations control the government.
...also, some people can be jerks about these choices, like this poster replying about perl:
Perl is for programmers that like to pretend they know how to be sysadmins and don't need to share their tools with other sysadmins.
...that is such a load... while with shell scripting combined with *nix tools you can get far, there are limits - scripts are great when used where they best fit, and sometimes the cost of hackish solutions around limitations, costs of taking tools beyond their optimal area, etc. are just too large to justify - except for doing something just for fun. Perl is a serious programming language, and even in sysadmin's scripts perl can sometimes be better than shell scripts (especially if it's not even meant to be shared with others), but there is also a point where a project starts to look more like program than script (not going into debating of the exact difference between those, I know anyone who can program probably knows what I mean), and that's one point where one should consider if shell scripting should give way to some other high level programming language. And as a high level programming language, perl has earned its place, the above quote has no place in real world.