Ask Slashdot: Asynchronous RAID-1 Free Software Backup For Laptops?
First time accepted submitter ormembar writes "I have a laptop with a 1 TB hard disk. I use rsync to perform my backups (hopefully quite regularly) on an external 1 TB hard disk. But with such a large hard disk, backups take quite some time because rsync scans the whole disk for updates (15 minutes on average). Is there some kind of asynchronous RAID-1 free software that would record in a journal all the changes that I make on the disk and replay this journal later, when I plug my external hard disk into the laptop? I guess that it would be faster than the usual backup solutions (rsync, unison, you name it) that scan whole partitions every time. Do you feel the same annoyance when backing up laptops?"
Use mdadm -C -b internal to create the array with an internal write-intent bitmap. Detach and re-add the mirror at will and it will only sync the difference.
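A minimal sketch of that workflow, assuming the array is /dev/md0, the internal partition is /dev/sda2 and the external one is /dev/sdb1 (all three names are placeholders, not from the post). Creating an array overwrites metadata on its members, so the helper only prints each command unless DRY_RUN=0 is set.

```shell
# Placeholder device names (assumptions); adjust before use.
MD=${MD:-/dev/md0}
INTERNAL=${INTERNAL:-/dev/sda2}
EXTERNAL=${EXTERNAL:-/dev/sdb1}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# One-time setup: RAID-1 with an internal write-intent bitmap
# (long form of "mdadm -C -b internal").
create_mirror() {
    run mdadm --create "$MD" --level=1 --raid-devices=2 \
        --bitmap=internal "$INTERNAL" "$EXTERNAL"
}

# Before unplugging: mark the external member failed, then remove it.
detach_external() {
    run mdadm "$MD" --fail "$EXTERNAL" --remove "$EXTERNAL"
}

# After plugging back in: only regions flagged in the bitmap are resynced.
reattach_external() {
    run mdadm "$MD" --re-add "$EXTERNAL"
}
```

Calling detach_external before unplugging and reattach_external after plugging back in is the whole cycle. The external disk may show up under a different device name after re-plugging, so check that before re-adding.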
RAID is not backup.
Wouldn't solve his problem. Time Machine takes considerable time to prep and start a backup before it actually does any work. I'd guess it's doing the same sort of thing as rsync: gathering a list of changes.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
How is traversing the whole directory tree with find different from what rsync does?
Running a daemon that lists modified files using inotify might work.
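One way this could look, as a rough sketch: it assumes inotifywait from the inotify-tools package and rsync 3.1.0 or newer (for --delete-missing-args). The function names and the journal path are made up for illustration, and the journal must live outside the watched tree.

```shell
# Hypothetical locations (assumptions, not from the thread).
WATCH_DIR=${WATCH_DIR:-$HOME}
JOURNAL=${JOURNAL:-/var/tmp/backup-journal}

# Long-running recorder: append the full path of every changed,
# created, deleted or moved file to the journal.
record_changes() {
    inotifywait -m -r -e close_write,create,delete,move \
        --format '%w%f' "$WATCH_DIR" >> "$JOURNAL"
}

# De-duplicate the journal and make paths relative to WATCH_DIR.
# Assumes WATCH_DIR has no sed metacharacters and no file name
# contains a newline.
pending_paths() {
    sort -u "$JOURNAL" | sed "s|^$WATCH_DIR/||"
}

# Replay: copy only the journaled paths, delete on the destination
# what no longer exists on the source, then clear the journal.
replay_journal() {
    dest=$1
    pending_paths |
        rsync -a --delete-missing-args --files-from=- "$WATCH_DIR/" "$dest/" &&
        : > "$JOURNAL"
}
```

Run record_changes in the background at login and replay_journal /media/backup/home (a placeholder mount point) when the disk is plugged in. Two caveats: recursive watches on a large tree can exceed the fs.inotify.max_user_watches limit, and changes made while a replay is running can be lost when the journal is cleared, so an occasional full rsync is still worthwhile.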
thegodmovie.com - watch it
Are you backing up EVERYTHING on the laptop -- OS and data included? Even if you are only backing up your home directory, there is stuff you don't need to back up, like the .thumbnails directory, which can be quite large. Try using rsync's exclude option to restrict the backup to only what you care about.
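For illustration, a small sketch of such an exclude. The destination path is hypothetical, and .cache is an extra exclude added here as an assumption; only .thumbnails comes from the comment. The command is printed rather than run unless DRY_RUN=0.

```shell
SRC=${SRC:-$HOME}
DEST=${DEST:-/media/backup/home}   # hypothetical mount point
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Mirror the home directory, skipping throwaway directories.
# The trailing slash in each pattern restricts it to directories.
backup_home() {
    run rsync -a --delete \
        --exclude='.thumbnails/' \
        --exclude='.cache/' \
        "$SRC/" "$DEST/"
}
```

Fewer files to stat means a shorter scan, which is the part the submitter is complaining about.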
DNA
AKA mrascii
Time Machine takes about 15 minutes to do the prep work before it starts copying for me, on a 2012 Retina MBP with 16 GB of RAM and only 256 GB of disk space ... 64 GB taken by an unbacked-up Boot Camp partition and another 120 or so eaten by Windows VMs that don't get backed up either ... i.e. it's not a slow spinning platter backing up a terabyte of data.
I see no indication of any Journal, it certainly isn't making it faster. Pretty freaking slow actually.
In this case, it sounds like you want a fast on-demand sync rather than a RAID.
However, you could possibly use md-raid for this if you're a Linux user.
Have the internal disk(s) as a degraded md-raid1 partition. When you connect the backup disk, have it become part of the RAID and the disks should sync up. That said, it likely won't be any faster than rsync, quite possibly slower as it'll have to go over the entire volume.
Alternate solutions:
* Have a local folder that does daily syncs/backups. Move those to the external storage when it's connected.
CAVEATS: Takes space until the external disk is available.
* Use a differential filesystem, or maybe something like a COW (copy-on-write) filesystem. Have the COW system sync over to the backup disk (when connected) and then merge it into the main filesystem tree after sync.
For example, /home is a combination of /mnt/home-ro (ro) and /mnt/home-rw (rw, COW filesystem). When external media is connected, /mnt/home-rw is synced to external media, then back over /mnt/home-ro.
The OP doesn't mention which OS he's on - the tools he mentions both run across multiple OSes. Would be helpful to know. I know as a group we probably assume some form of Linux but... I use MS Home Server at the house to back up my family's multiple Windows machines. Runs on crappy hardware, does incrementals on a schedule, allows file-level or bare-metal restore, keeps daily/weekly/fulls as long as I ask it to. I know we aren't a Windows-friendly crowd but this product does exactly what it promises and does it pretty well.
"Would you, could you, with a goat?" Dr Seuss
This doesn't match my experience. Time Machine fires up in the background, does its thing, and then stops shortly thereafter. Certainly much less than 15 minutes. More like five or less. This is on a new-ish iMac with a 3TB internal drive.
It wouldn't even be noticeable were it not for the fact that I can hear the TM destination drive (sitting on a shelf behind me) spin up once an hour.
People who say "sheeple" have about as much sophistication as an AOL user, and in fact are probably actually AOL users.
CrashPlan is free, but not open, and I think it will do everything you need. You can back up to an external disk, over the network to one of your own machines, or to a friend who also runs it. Great key-based encryption support. If you want, you can pay them for offsite backups (which is a great deal as well, in my opinion). It's cross-platform and easy to use. Never underestimate the benefits of off-site backups.
It's different in that you don't have to sit and wait for it, and doing the backup will consist of only the actual copying.
I suggest you look again at rsync.
- It compares changed files and copies only what has been changed. Changed files are identified by differing sizes or mtimes (by default).
- rsync can also handle removed files with the --delete option.
- It can do the entire filesystem tree in a single command.
- There are filter options so you can include/exclude what paths to copy (e.g. you don't want to copy /proc, and there are some directories such as /tmp and /run which you may not care about).
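A sketch of a whole-tree run with a filter list, under some assumptions: the exclude file location is arbitrary, /sys, /dev, /mnt and /media are additions beyond the directories named in the comment (the last two keep the backup disk from being copied into itself), and the command is only printed unless DRY_RUN=0.

```shell
EXCLUDES=${EXCLUDES:-/tmp/rsync-excludes.txt}   # arbitrary location
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Keep the directories themselves, skip their contents.
write_excludes() {
    cat > "$EXCLUDES" <<'EOF'
/proc/*
/sys/*
/dev/*
/tmp/*
/run/*
/mnt/*
/media/*
EOF
}

# Mirror the root filesystem to the given destination.
# -a archive, -A ACLs, -X xattrs, -x stay on one filesystem,
# --delete removes files that vanished from the source.
full_backup() {
    dest=$1
    write_excludes
    run rsync -aAXx --delete --exclude-from="$EXCLUDES" / "$dest/"
}
```

For example, full_backup /media/backup/root (a placeholder path) prints the command it would run. A real run needs root to read every file.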
Windows Backup (since Vista) uses Volume Shadow Copy (VSS) to do block-level reverse incremental backup, i.e. it uses the journaling file system to track changed blocks and only copies over the changed blocks.
Not only that, it also backs up to a virtual hard disk file (VHD), which you can attach (mount) separately. This file system will hold the complete history, i.e. you can use the "previous versions" feature to go back to a specific backup of a directory or file.
Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
This is my current experience with mine too. However during the prep stage it is making room on my time machine drive to receive the changes. Consolidating the older files will take time.
When my drive was new and had plenty of space, the prep stage was much shorter.
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Holy cow people, you're missing the OP's point. It's taking 15 minutes to SCAN the 1TB drive.
I've run into the same problem on Windows and Linux, especially for remote rsync updates on Linux over slow wireless connections. It's not the 1TB that kills, since I can read 4TB drives with hundreds of movies in seconds. It's the number of files that kills performance.
My solution on Windows is to take some of the directories with 10,000 files and put them into an archive (think clipart directories). Zip, TrueCrypt, tar, whatever. This speeds up reading the sub-directories immensely. Obviously, this only works for directories that are not accessed frequently. Also, FAT32 is much faster on 3000+ files in a directory than NTFS is. Most of my TrueCrypt volumes with LOTS of files are using FAT32 just because of the directory reading speed.
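The same trick on a Unix-like system could look like the sketch below, using tar. The function name is made up for illustration, and note that it removes the original directory once the archive has been written and listed back successfully.

```shell
# Pack a rarely used directory of many small files into a single
# archive next to it, check that the archive reads back, then remove
# the original so the backup tool sees one file instead of thousands.
pack_dir() {
    dir=${1%/}
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    tar -czf "$dir.tar.gz" -C "$parent" "$name" &&
        tar -tzf "$dir.tar.gz" > /dev/null &&
        rm -rf "$dir"
}
```

For example, pack_dir ~/clipart leaves ~/clipart.tar.gz behind. The trade-off is that changing a single file inside means the whole archive gets copied on the next backup, which is why this only suits directories that rarely change.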
On Linux systems, I just run rsync on SUB-directories. I run the frequently accessed ones more often and the less-accessed directories less often. Simple? No. But my rsyncs are all across the wire, so I need the speed. Plus some users are on cell-phone wireless plans, so I need to minimize data usage.
Religion and science are both 90% crap..but that doesn't negate the other 10%.
Well, of course I goofed, it's not that easy (well it is, read on). A snapshot keeps track of what has changed, yes, but it records not the new state, but the old state. What you want to transfer over is the new state. So you can use the snapshot for the location of changed state (for its metadata only), and the parent volume for the actual state.
That's precisely what lvmsync does. That's the tool you want to do what I said above, only that it'll actually work :)
A successful API design takes a mixture of software design and pedagogy.
You're holding it wrong. ;)
rsync 2.x was horribly slow as it would scan the entire source looking for changed files, build a list of files, and then (once the initial scan was complete) would start to transfer data to the destination.
rsync 3.x starts building the list of changed files, and starts transferring data right away.
Unless you are changing a tonne of files between each rsync, it shouldn't take more than a few minutes using rsync 3.x to back up a 1 TB drive. Unless it's an uber-slow PoS drive, of course. :)
We use rsync to back up all our remote school servers. Very rarely does a single server backup take more than 30 minutes, and that's for 4 TB of storage using 500 GB drives (generally only a few GB of changed data). And that's across horrible ADSL links with only 0.768 Mbps upload speeds!
Going disk-to-disk should be even faster.
Two pools: internalPool and externalPool.
Use ZFS send and receive to migrate your data from internal to external. You can do the whole fs, or an incremental if you keep a couple of snaps local on your internal disk. This can get excessive if you have a lot of delta or you want to keep them a long time.
http://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html
Of course you will need a system that can use ZFS; there are more options for that than for Time Machine. It's block-level and it's fast, and it doesn't depend on just one device: you can have multiple devices. (I like to keep some of my data at work. Why? My backup solution is in the same house that would burn, if it burned...)
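A rough sketch of the incremental cycle with the two pool names from this comment. The dataset name "home" and the snapshot names are placeholders, and commands are only printed unless DRY_RUN=0.

```shell
SRC=${SRC:-internalPool/home}   # dataset name "home" is a placeholder
DST=${DST:-externalPool/home}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Take a new snapshot, send only the blocks changed since the previous
# one, then drop the previous snapshot on the internal pool.
incremental_backup() {
    prev=$1
    new=$2
    run zfs snapshot "$SRC@$new" || return 1
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "zfs send -i $SRC@$prev $SRC@$new | zfs receive -F $DST"
    else
        zfs send -i "$SRC@$prev" "$SRC@$new" | zfs receive -F "$DST" || return 1
    fi
    run zfs destroy "$SRC@$prev"
}
```

For example, incremental_backup monday tuesday. This presumes the first snapshot was already copied in full with a plain zfs send piped into zfs receive, because an incremental stream only applies if the external pool still holds the snapshot named as prev.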
Unix, an obscure operating system developed by bored researchers in an attempt to get a better game playing experience.
Btrfs send/receive should possibly do the trick. After first cloning the disk, and before every subsequent transfer, create a reference snapshot on the laptop, and delete the previous one after the transfer. (The snapshot has to be read-only, hence -r, or btrfs send will refuse it.)
$ btrfs subvolume snapshot -r /mnt/data/orig /mnt/data/backup43
$ btrfs send -p /mnt/data/backup42 /mnt/data/backup43 | btrfs receive /mnt/backupdata
$ btrfs subvolume delete /mnt/data/backup42
I haven't tried this for myself, so the necessary disclaimer: this may eat your disk or kill a kitten ;-)
Just curious, why do you require access time? I set 'noatime' on all partitions.