Slashdot Mirror


Ask Slashdot: Asynchronous RAID-1 Free Software Backup For Laptops?

First time accepted submitter ormembar writes "I have a laptop with a 1 TB hard disk. I use rsync to perform my backups (hopefully quite regularly) to an external 1 TB hard disk. But, with such a large hard disk, it takes quite some time to perform backups because rsync scans the whole disk for updates (15 minutes on average). Is there somewhere a kind of asynchronous RAID-1 free software that would record in a journal all the changes that I make to the disk and replay this journal later, when I plug my external hard disk into the laptop? I guess that it would be faster than the usual backup solutions (rsync, unison, you name it) that scan the whole partitions every time. Do you feel the same annoyance when backing up laptops?"

227 comments

  1. find & diff by Spazmania · · Score: 0

    You can find | sort | diff ahead of time (maybe in the background) and then constrain the rsync to only the files recorded to have changed.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
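A minimal sketch of the "ahead of time" idea, assuming GNU find (for -printf) and file names without embedded newlines; make_listing, changed_paths and deleted_paths are hypothetical helper names. It still walks the whole tree, but the walk can run in the background (e.g. from cron) before the backup disk is plugged in.

```shell
# Sketch only, meant to be sourced. Assumes GNU find (-printf) and file names
# without embedded newlines; the function names are hypothetical.

# Write a sorted "mtime size relative-path" listing of regular files under $1 to $2.
make_listing() {
    find "$1" -type f -printf '%T@ %s %P\n' | LC_ALL=C sort > "$2"
}

# Paths whose "mtime size path" line appears only in the newer listing $2,
# i.e. files that are new or whose mtime or size changed since listing $1.
changed_paths() {
    LC_ALL=C comm -13 "$1" "$2" | cut -d' ' -f3-
}

# Paths present in the older listing $1 but absent from the newer listing $2.
deleted_paths() {
    old_paths=$(mktemp)
    new_paths=$(mktemp)
    cut -d' ' -f3- "$1" | LC_ALL=C sort -u > "$old_paths"
    cut -d' ' -f3- "$2" | LC_ALL=C sort -u > "$new_paths"
    LC_ALL=C comm -23 "$old_paths" "$new_paths"
    rm -f "$old_paths" "$new_paths"
}
```

Usage, with hypothetical paths: make_listing "$HOME" new.lst in the background, then at backup time changed_paths old.lst new.lst > changed.txt followed by rsync -a --files-from=changed.txt "$HOME"/ /mnt/backup/home/, and finally mv new.lst old.lst. In this sketch, files reported by deleted_paths still have to be removed from the copy separately.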
    1. Re:find & diff by rrohbeck · · Score: 3, Insightful

      How is traversing the whole directory tree with find different from what rsync does?
      Running a daemon that lists modified files using inotify might work.
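A minimal sketch of such an inotify journal, assuming the inotify-tools package (which provides inotifywait) is installed; the journal location and the function names are hypothetical. inotifywait -r sets one watch per directory, so very large trees may need a higher fs.inotify.max_user_watches, and any change made while the watcher is not running is simply missed.

```shell
# Sketch only, meant to be sourced. Needs inotifywait from inotify-tools;
# JOURNAL's default location and the function names are hypothetical.
JOURNAL=${JOURNAL:-$HOME/.backup-journal}

# Run in the background: append the absolute path of every file that is written,
# created, deleted or moved anywhere under $1 (one inotify watch per directory).
watch_changes() {
    inotifywait -m -r -q \
        -e close_write -e create -e delete -e moved_from -e moved_to \
        --format '%w%f' "$1" >> "$JOURNAL"
}

# Print the journaled paths under root $1, made relative to it and de-duplicated,
# one per line, as rsync --files-from expects.
pending_paths() {
    root=${1%/}/
    awk -v r="$root" 'index($0, r) == 1 { print substr($0, length(r) + 1) }' "$JOURNAL" |
        LC_ALL=C sort -u
}
```

Usage: start watch_changes "$HOME" & at login; when the disk is attached, run pending_paths "$HOME" > /tmp/pending, then rsync -a --ignore-missing-args --files-from=/tmp/pending "$HOME"/ /mnt/backup/home/ (hypothetical mount point), and truncate the journal with : > "$JOURNAL". --ignore-missing-args (rsync 3.1.0 and later) skips journaled files that have since been deleted. Events arriving between reading and truncating the journal can slip through, so an occasional full rsync is still worthwhile.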

    2. Re:find & diff by ormembar · · Score: 1

      OK. But you still scan the whole disk with a find command.

    3. Re:find & diff by iserlohn · · Score: 1

      That will still take ages...

      Why not give BitTorrent Sync a go? It's a decentralized "Dropbox" on steroids!

      http://labs.bittorrent.com/experiments/sync.html

    4. Re:find & diff by emt377 · · Score: 0

      How is traversing the whole directory tree with find different from what rsync does?

      It's different in that you don't have to sit and wait for it and doing the backup will consist of only the actual copying. That said, updatedb already scans (for locate), so modifying this to spit out a list of actual state changes (atime,ctime,mtime) since the last run, and using this to construct one or more rsync commands might be the easiest approach. Updatedb also notices when things are removed, permitting these to be removed from the clone as well (and perhaps moved into an archive for later time travel, making it useful as an actual backup).

    5. Re:find & diff by Anonymous Coward · · Score: 0

      "Ahead of time" is the key part.

    6. Re:find & diff by Anonymous Coward · · Score: 5, Informative

      It's different in that you don't have to sit and wait for it and doing the backup will consist of only the actual copying

      I suggest you look again at rsync.
        - It compares files and copies only what has changed. Changed files are identified by differing size or modification time (by default).
        - rsync can also handle removed files with the --delete option.
        - It can do the entire filesystem tree in a single command.
        - There are filter options so you can include/exclude which paths to copy (e.g. you don't want to copy /proc, and there are some directories such as /tmp and /run which you may not care about).

    7. Re:find & diff by Anonymous Coward · · Score: 1

      and perhaps moved into an archive for later time travel, making it useful as an actual backup

      A better way to handle that is to use a copy-on-write filesystem and take snapshots after each backup. That way you get the tree in each snapshot as it was at the time, without duplicating space.
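A sketch of the copy-on-write variant using btrfs, assuming the backup disk is formatted btrfs and mounted at a hypothetical /mnt/backup, with a subvolume named "current" created once beforehand (btrfs subvolume create /mnt/backup/current). Each run mirrors the source into current, then freezes it as a read-only snapshot; unchanged data is shared between snapshots. DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch only, meant to be sourced. BACKUP is a hypothetical mount point of a
# btrfs-formatted backup disk whose "current" subvolume already exists.
BACKUP=${BACKUP:-/mnt/backup}

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# $1 = source directory, $2 = optional snapshot name (defaults to a timestamp).
# First mirror the source into the working subvolume, then freeze that state
# as a read-only snapshot; the snapshot is only taken if rsync succeeded.
backup_and_snapshot() {
    name=${2:-$(date +%Y%m%d-%H%M%S)}
    run rsync -a --delete "${1%/}/" "$BACKUP/current/" &&
        run btrfs subvolume snapshot -r "$BACKUP/current" "$BACKUP/$name"
}
```

Old snapshots can later be dropped individually with btrfs subvolume delete, without affecting the others.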

    8. Re:find & diff by Desler · · Score: 2

      Did you even read the title of the submission? He wants FOSS.

    9. Re:find & diff by rrohbeck · · Score: 2

      Rsync copies only changed files. The time-consuming part is reading all directories in the directory tree.

    10. Re:find & diff by SuperTechnoNerd · · Score: 2

      Exactly. I think rsync will do nicely. I use it for nightly backups and I rotate through 5 increments. The oldest goes to the bit bucket. Note the -l (hard link) option to cp.
      A snippet:


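      # [more rotations above]
      dest="$BACKUP_DEST/$(basename "$i")"
      # Hard-link copy of the newest increment (assumes the rotations above
      # have already moved increment.1 out of the way).
      if [ -d "$dest/increment.0" ]; then
        cp -al "$dest/increment.0" "$dest/increment.1"
      fi

      # rsync replaces changed files with fresh copies (temp file + rename), so the
      # hard links in increment.1 keep pointing at the old versions.
      rsync -av --delete --exclude-from="$EXCLUDE_LIST" "$i/" "$dest/increment.0/"
      touch "$dest/increment.0"
      done
      echo "Backup Complete on $(date)"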

    11. Re:find & diff by brodock · · Score: 1

      BtSync is not officially open source, but the authors are considering opening it: http://forum.bittorrent.com/topic/17782-bittorrent-sync-faq-unofficial/

    12. Re:find & diff by leonardluen · · Score: 1

      When has a typical user ever planned ahead?

    13. Re:find & diff by egarland · · Score: 1

      Calling stat() on each file, on filesystems that require it, is usually orders of magnitude more time-consuming than the actual directory reading. IMO, filesystems should store mtime in the directory entry and readdir calls should return it.

      --
      set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
    14. Re:find & diff by ploppy · · Score: 1

      Do you know anything about hard links? Hint: you have multiple directory entries pointing to the same file (inode).

    15. Re:find & diff by Anonymous Coward · · Score: 0

      I think you missed the key point: rsync has to scan the whole tree to figure out what has changed, and that takes a considerable amount of time. Having some kind of file watcher (inotify, whatever) to feed the list of changed/deleted files to rsync would be a ++win!

    16. Re:find & diff by Dadoo · · Score: 1

      I suggest you look again at rsync

      However, he'll want to keep in mind that, depending on his environment, he may have some other issues. For instance, I'd like to use it at work, but I can't because file access times are important to us, and rsync changes the access times on the source files. Last I checked, there was no option to make it stop that, so I'm stuck with tar.

      --
      Sit, Ubuntu, sit. Good dog.
    17. Re:find & diff by hobarrera · · Score: 2

      Why don't you try "--link-dest"? It's pseudo-incremental: unchanged files are hard-linked to the previous backup, meaning there's no space or bandwidth consumption for unchanged files, but each day's replica is a full backup.

    18. Re:find & diff by Anonymous Coward · · Score: 0

      While all of that is true, it's not relevant to the problem. rsync must still look at the file size and timestamp of each file that it's considering copying in order to decide whether to copy it or not. That takes time. Sure, it's fast to stat a file, but when you multiply that by potentially millions of files, it can take a while. In the original poster's case it takes 15 minutes.

      emt377 and Spazmania are on the right track. Scan the filesystem for files that have changed since the last backup in the background as a cronjob. That work can be automated. The scan can exclude folders or whatever just like what you'd do for rsync. Then have the results of the scan put the filenames in a file. Then, when you're ready to perform the backup, the list of files already exists because the scan of files that have changed happened earlier. You can run rsync and use the list of files as an include filter to tell it what to copy to the destination. That way rsync can focus 100% on copying without incurring any delays from checking which files have changed or not.
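A sketch of that cron-driven scan, assuming GNU find (for -cnewer and -printf) and a hypothetical state directory. It compares against a stamp file left by the last successful backup, checking both mtime and ctime so that files copied or extracted with old preserved mtimes (whose ctime is still new) are caught too. Deleted files are not detected this way, so an occasional full rsync -a --delete is still needed.

```shell
# Sketch only, meant to be sourced. Assumes GNU find; STATE and the function
# names are hypothetical.
STATE=${STATE:-$HOME/.backup-state}

# Scan $1 for regular files modified (mtime) or changed (ctime) since the
# last-backup stamp and write their paths, relative to $1, to $STATE/changed.txt.
prepare_list() {
    mkdir -p "$STATE"
    # Take the next stamp before scanning, so edits made during the scan or
    # before the backup are picked up again next time instead of being lost.
    touch "$STATE/stamp.next"
    if [ -e "$STATE/stamp" ]; then
        find "$1" -type f \( -newer "$STATE/stamp" -o -cnewer "$STATE/stamp" \) -printf '%P\n'
    else
        find "$1" -type f -printf '%P\n'   # first run: list everything
    fi > "$STATE/changed.txt"
}

# Call only after rsync --files-from="$STATE/changed.txt" has succeeded.
commit_stamp() {
    mv "$STATE/stamp.next" "$STATE/stamp"
}
```

Usage: prepare_list "$HOME" from cron; at backup time, rsync -a --files-from="$STATE/changed.txt" "$HOME"/ /mnt/backup/home/ && commit_stamp (the mount point is hypothetical).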

    19. Re:find & diff by Anonymous Coward · · Score: 0

      Last I checked, there was no option to make it stop that, so I'm stuck with tar.

      For good reason - it's the filesystem itself that updates the atime, and if the file is modified rsync reads the file. So the filesystem sees it as accessed. That's unavoidable (directly).

      A workaround is to create a read-only filesystem snapshot of the source and rsync from the snapshot. That way it doesn't affect the live files.
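A sketch of that workaround with LVM, assuming the data sits on an LVM logical volume with an ext4 filesystem and the volume group has free space for the snapshot. The names (vg0, home, /mnt/snap, the 5G snapshot size) are hypothetical, and it needs root. Reading through the read-only snapshot mount leaves the live files' atimes alone. Newer rsync releases (3.2.0 onward) also document an --open-noatime option, which may be simpler where it applies; check your version's man page.

```shell
# Sketch only, meant to be sourced. All names below are hypothetical.
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# $1 = volume group, $2 = logical volume, $3 = backup destination directory.
backup_from_snapshot() {
    vg=$1; lv=$2; dest=$3
    snap="$lv-bak"     # hypothetical snapshot name
    mnt=/mnt/snap      # hypothetical mount point
    # 5G is the space reserved for blocks that change while the snapshot exists.
    run lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv" || return 1
    run mkdir -p "$mnt"
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run rsync -a --delete "$mnt/" "$dest/"
        status=$?
        run umount "$mnt"
    else
        status=1
    fi
    # Always drop the snapshot so it cannot fill up and become invalid.
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}
```

For example: backup_from_snapshot vg0 home /media/external/home.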

    20. Re:find & diff by Anonymous Coward · · Score: 0

      That works well, though personally I still think copy-on-write filesystem snapshots are better. They mean if you accidentally edit any of the files you haven't also modified the file in every other backup you have.

    21. Re:find & diff by Anonymous Coward · · Score: 0

      It's not open. No thanks!

    22. Re:find & diff by Dadoo · · Score: 1

      For good reason - it's the filesystem itself that updates the atime, and if the file is modified rsync reads the file. So the filesystem sees it as accessed. That's unavoidable (directly).

      Tar does it. Why can't rsync? Sorry, but that makes it pretty much useless for backup (in all the cases I have to work with), and most of the other IT people with whom I've discussed this agree.

    23. Re:find & diff by nullchar · · Score: 3, Interesting

      Just curious, why do you require access time? I set 'noatime' on all partitions.

    24. Re:find & diff by egarland · · Score: 1

      IMO, wherever there are no hard links, inodes should be inlined into the directory entries and read and stored in cache whenever the directory entry is read. Hard links make up a small percentage of files, especially in typical large-scale storage systems. Inlining inodes should solve a lot of unixy performance issues, but retrofitting arbitrary placement of inodes into filesystem code is prohibitively difficult because Unix expects inodes to be accessible independently of a directory entry. A nice middle ground might be making inode inlining a filesystem creation option which disables hard linking on that filesystem. This is obviously doable since NTFS can be mounted on Linux.

      Another nice option might be to attach to each directory entry a separate piece of metadata that changes whenever the inode or the file it points to changes. That way, a scan of a filesystem for changes could be both quick and thorough. Essentially, a directory-stored mtime.

  2. TimeMachine by Anonymous Coward · · Score: 1, Insightful

    Just buy a mac :-)

    1. Re:TimeMachine by BitZtream · · Score: 4, Insightful

      Wouldn't solve his problem. TimeMachine takes considerable time to prep and start a backup before it starts actually doing any work. I'd guess it's likely doing the same sort of thing as rsync: gathering a list of changes.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:TimeMachine by Anonymous Coward · · Score: 0

      Additionally, TimeMachine is prone to the occasional "Time Machine must create a new backup for you", effectively destroying all previous file history. If there's one thing I expect from a data duplication system, it's that it does not corrupt itself, barring hardware failures.

    3. Re:TimeMachine by omnichad · · Score: 1

      Does it destroy the old backup or create a new folder? I realize you might want to delete the old backup to save space when it does this, but you wouldn't have to do it.

      I wish there was something more like Time Machine for Windows and Linux - especially the part where there's dated directories with hard links back to the original revision of the files.

    4. Re:TimeMachine by omnichad · · Score: 1

      Part of why Time Machine takes so long is that it has to create hard links to every file that hasn't changed. If you look in your backup folder, there's a list of dated directories. Each one is a complete image of your hard drive at the time of the backup. The deduplication comes from hard links, so you can delete older backups directly from the filesystem without messing anything up. While it does take time, I really wish more backup systems were set up that way.
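A small demonstration of the hard-link deduplication described here, using GNU cp -al and stat -c in a throwaway directory (demo_hardlinks is a hypothetical helper). Both snapshot directories point at the same inode, and deleting the older one leaves the data reachable through the newer one.

```shell
# Sketch only, meant to be sourced. Needs GNU coreutils (cp -al, stat -c).
# Prints: <inode in day1> <inode in day2> <link count> <file content>
demo_hardlinks() {
    root=$(mktemp -d)
    mkdir "$root/day1"
    echo "report v1" > "$root/day1/report.txt"
    cp -al "$root/day1" "$root/day2"            # each file gets a second link
    ino1=$(stat -c %i "$root/day1/report.txt")
    ino2=$(stat -c %i "$root/day2/report.txt")
    links=$(stat -c %h "$root/day2/report.txt")
    rm -rf "$root/day1"                          # delete the older snapshot...
    content=$(cat "$root/day2/report.txt")       # ...the data is still there
    echo "$ino1 $ino2 $links $content"
    rm -rf "$root"
}
```

Linux does not allow hard links to directories, so cp -al, rsync --link-dest and similar schemes have to create one link per file; that per-file cost is what Time Machine sidesteps with directory hard links.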

    5. Re: TimeMachine by robpow · · Score: 1

      This +100 ... I honestly haven't spared a second thought for managing a backup regime since I switched to Macs. A Time Capsule (you can get third-party ones as well as Apple ones) backs up the Macs in the house. If you have two TCs on the network, Time Machine will even alternate between them so you can have two backups in different parts of the house.

    6. Re:TimeMachine by multimediavt · · Score: 1

      Wouldn't solve his problem. TimeMachine takes considerable time to prep and start a backup before it starts actually doing any work, I'd guess its likely doing the same sort of thing that Rsync, gathering a list of changes.

      AFAIK, Time Machine is a GUI frontend for rsync. Watch Activity Monitor.app when it fires up. That will tell you. I don't use Time Machine, personally, I know how to use rsync.

    7. Re:TimeMachine by Smurf · · Score: 2

      AFAIK, Time Machine is a GUI frontend for rsync. Watch Activity Monitor.app when it fires up. That will tell you. I don't use Time Machine, personally, I know how to use rsync.

      No, Time Machine is NOT a frontend for rsync. Yes, you can achieve something that resembles Time Machine by using the --link-dest option.

      I use rsync --link-dest regularly through a script called tym ("Time rsYnc Machine") to back up stuff on systems at work for which I don't have admin privileges to configure Time Machine (oh, I haven't done it in a few weeks, I should do it asap!). So I know it has some drawbacks compared to TM, the main two being:

      • It always traverses the whole directory hierarchy looking for changes. Time Machine doesn't always do that.
      • It always creates a hard link for every file being backed up that has not changed. Hard links are very inexpensive, but it still takes a considerable amount of time if you need to create over a million hard links *every* time you back up.

      If you read John Siracusa's excellent OS X Leopard review you will find that Time Machine avoids traversing the whole hierarchy because it taps into FSEvents which keeps a record of the files that have been modified since the last backup. TM will only do a full, "deep" traversing if it decides that the record is stale (not sure how it does that) and only then the backup takes an inordinate amount of time.

      In Siracusa's review you will also find that Time Machine creates hard links to directories for which none of the content has changed since the last backup (as odd as that may sound) thus avoiding the creation of the possibly hundreds of thousands of hard links for all the files inside them.

    8. Re:TimeMachine by RulerOf · · Score: 2

      I wish there was something more like Time Machine for Windows and Linux - especially the part where there's dated directories with hard links back to the original revision of the files.

      As far as I'm aware, the "File History" feature in Windows 8 will do this, and it's much more granular than what was sort of "built in" by the "Previous Versions" tab on a file or folder's properties. However, set up properly, even the "Previous Versions" feature that dates back to at least Vista (if not XP SP3, I don't recall offhand) will provide exactly what you're asking for: browsable point-in-time snapshots of your files/folders.

      One of the things that piqued my interest in MS Data Protection Manager was that it would keep 15-minute snapshots of "covered" systems, both servers and workstations, and those snapshot backups snapped directly into the "previous versions" tab on the files. It allowed our users to recover old copies of things often enough at the site we deployed it at. It was still a pain in the ass product though... :P

      --
      Boot Windows, Linux, and ESX over the network for free.
    9. Re:TimeMachine by Smurf · · Score: 1

      Wouldn't solve his problem. TimeMachine takes considerable time to prep and start a backup before it starts actually doing any work, I'd guess its likely doing the same sort of thing that Rsync, gathering a list of changes.

      No, it doesn't. It only takes a considerable amount of time to prep if you haven't backed up in many days. If you have backed up recently the prep time is quite short. And if you use the default configuration (in which it backs up every hour) the prep time is almost nil.

      If you read John Siracusa's excellent OS X Leopard review you will find that Time Machine avoids traversing the whole hierarchy because it taps into FSEvents which keeps a record of the files that have been modified since the last backup.

    10. Re:TimeMachine by darkstar101 · · Score: 1

      rsync can do daily hard linked backups:

      create the initial backup:
      rsync -a /directory/to/backup "${backup_vol}/$(date +"%Y%m%d")"

      run rsync in a nightly cron job (date -d is GNU date syntax):
      rsync -a --link-dest="${backup_vol}/$(date -d yesterday +"%Y%m%d")" /directory/to/backup "${backup_vol}/$(date +"%Y%m%d")"

      or if you want to just link to the last backup:
      rsync -a --link-dest="$(ls -d "${backup_vol}"/* | tail -1)" /directory/to/backup "${backup_vol}/$(date +"%Y%m%d")"

    11. Re:TimeMachine by Smurf · · Score: 1

      Part of why Time Machine takes so long is that it has to create hard links to every file that hasn't changed.

      No, it does not. If you read John Siracusa's excellent OS X Leopard review... oh, fuck it. Just read my reply to a sister comment of yours.

      tl;dr: FSEvents and hard links to directories.

    12. Re:TimeMachine by hedwards · · Score: 1

      Not really, if the file is still on the same disk, then it's not backed up. This is nice for times when you fat finger something, but it's not going to protect you from HDD problems or other hardware failure.

      Personally, I use CrashPlan to back up to an external HDD as well as to their servers. When I need a restore, I pretty much always use the local copy, but I'm still protected in case that becomes unavailable.

    13. Re:TimeMachine by omnichad · · Score: 1

      Time Machine uses an external HDD.

    14. Re:TimeMachine by Zenin · · Score: 1

      Check out the --link-dest option of rsync, it does exactly as you're describing, hard linking to unchanged files.

      The dated directories you'll have to code yourself with a trivial amount of shell scripting.

      --
      My /. uid is better then your /. uid
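A sketch of that scripting, assuming YYYYMMDD directory names under a hypothetical BACKUP_VOL mount point. previous_backup picks the newest dated directory other than today's, so re-running on the same day never links a backup to itself; DRY_RUN=1 prints the rsync command instead of running it.

```shell
# Sketch only, meant to be sourced. BACKUP_VOL is a hypothetical mount point.
BACKUP_VOL=${BACKUP_VOL:-/mnt/backup}

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Newest YYYYMMDD directory in $BACKUP_VOL other than $1 (today); empty if none.
# The glob expands in sorted order, so the last match is the newest.
previous_backup() {
    for d in "$BACKUP_VOL"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$d" ] && [ "${d##*/}" != "$1" ] && echo "$d"
    done | tail -n 1
}

# $1 = source directory. Unchanged files are hard-linked to the previous backup.
dated_backup() {
    today=$(date +%Y%m%d)
    prev=$(previous_backup "$today")
    if [ -n "$prev" ]; then
        run rsync -a --link-dest="$prev" "${1%/}/" "$BACKUP_VOL/$today/"
    else
        run rsync -a "${1%/}/" "$BACKUP_VOL/$today/"   # first run: full copy
    fi
}
```

Since every dated directory is a complete tree, pruning is just rm -rf on the oldest ones.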
    15. Re:TimeMachine by FLaSh+SWT · · Score: 1

      Not true in my experience. I can plug my Thunderbolt hard drive into my MacBook Pro and Time Machine is done in just a few minutes, even when I've got about 30 GB of new photos on the laptop and haven't run the backup in over a week. It is very fast.

    16. Re:TimeMachine by hedwards · · Score: 1

      Not with hardlinks it doesn't.

    17. Re:TimeMachine by omnichad · · Score: 1

      I have an external hard drive named TimeMachine. In it is a folder named Backups.backupdb. In that is a folder named the same as my computer. Inside that is a list of folders named after datetimes. Each one of the 51 dated full hard drive backup folders contains the full contents of my hard drive. Yes, it uses file AND directory hard links to link to files that are unchanged. Each backup is almost 100GB stored on a 250GB drive that still has 92GB free. That would not be possible without hard link deduplication.

  3. mdadm can do this by Fruit · · Score: 5, Informative

    Use mdadm -C -b internal to create a bitmap. Detach and re-add the mirror at will and it will only sync the difference.
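A sketch of the full cycle, with hypothetical device names (/dev/sda2 internal, /dev/sdb1 external, /dev/md0 mirror) and DRY_RUN=1 to print the commands instead of running them; it needs root. Creating the array destroys what is on both partitions, so this is for a fresh setup; an existing array can get a bitmap with mdadm --grow /dev/md0 --bitmap=internal. The external member is flagged --write-mostly so reads are served from the internal disk.

```shell
# Sketch only, meant to be sourced. Device names are hypothetical.
MD=${MD:-/dev/md0}
INTERNAL=${INTERNAL:-/dev/sda2}
EXTERNAL=${EXTERNAL:-/dev/sdb1}

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# One-time setup: RAID-1 with an internal write-intent bitmap.
create_mirror() {
    run mdadm --create "$MD" --level=1 --raid-devices=2 --bitmap=internal \
        "$INTERNAL" --write-mostly "$EXTERNAL"
}

# Before unplugging: mark the external member failed and remove it.
detach_backup() {
    run mdadm "$MD" --fail "$EXTERNAL" --remove "$EXTERNAL"
}

# After plugging back in: re-add it; the bitmap limits resync to dirty regions.
attach_backup() {
    run mdadm "$MD" --re-add "$EXTERNAL"
}
```

Resync progress shows up in /proc/mdstat. Whether the laptop boots cleanly with the mirror degraded most of the time depends on the distribution's initramfs and mdadm configuration, so test that before relying on it.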

    1. Re:mdadm can do this by Anonymous Coward · · Score: 1

      Use mdadm -C -b internal to create a bitmap. Detach and readd the mirror at will and it will only sync the difference.

      I have enough Linux experience that I've used mdadm from the command line to make RAID1 partitions, but I still don't understand what you posted. Could someone clarify or post a link explaining that?

    2. Re:mdadm can do this by Anonymous Coward · · Score: 5, Informative

      Effectively you create a RAID 1 mirror. When you remove the external drive, the RAID degrades. The RAID bitmap keeps track of changes. When you plug the external drive in, you just have to tell it to bring it up to date, which syncs only the changes.

    3. Re:mdadm can do this by Anonymous Coward · · Score: 1

      I think he's saying to create a mirrored array, create a bitmap on it to record what's out of sync, and then degrade the array by removing one of the disks.

      Then to backup, you re-add the removed disk, and it should only copy over the parts that have changed.

      I can't say I've ever added a networked disk to a raid array, though. I'm not confident it's a good idea.

    4. Re:mdadm can do this by kasperd · · Score: 2

      Use mdadm -C -b internal to create a bitmap. Detach and readd the mirror at will and it will only sync the difference.

      I am going to test this on my next laptop, or if I decide to upgrade my current with an SSD some day.

      Meanwhile, I do have a couple of questions. How automated is this going to be? Will it automatically start to sync, once the USB/eSata disk is connected?

      Can I safely attach that disk to another computer for reading? I am worried such operation might corrupt data, even if I don't write anything. If I connect the external disk to a workstation, do I risk that the RAID layer will declare the SSD to be dead and record this fact on the external disk? Is reading from the external disk going to perform a journal replay and thereby perform some unintended writes? Is the raid layer going to increase the event counter on the external disk and potentially run past the SSD or end up at the same event counter due to the same number of cycles, but on different machines?

      --

      Do you care about the security of your wireless mouse?
    5. Re:mdadm can do this by bitMonster · · Score: 3, Informative

      Actually, that is done for HA pairs. You can use nbd (network block device) and then create a RAID-1 pair across the local disk and the nbd. There are better alternatives now (such as drbd), but I'm not aware of any problem with nbd+RAID. Jeff

    6. Re:mdadm can do this by Anonymous Coward · · Score: 0

      Since the external drive is usually slower, you may also find this useful:
      echo writemostly >/sys/devices/virtual/block/mdXXX/md/dev-YYY/state
        - device will only be subject to read requests if there are no other options. This applies only to raid1 arrays.

    7. Re:mdadm can do this by ormembar · · Score: 2

      What will happen if the laptop hard disk fails? Let's say the laptop hard disk is disk0 in the RAID-1 configuration. The external hard disk is disk1. The degraded RAID-1 is due to the presence of disk0 and the absence of disk1. If disk0 fails for some reason, can I put a new "empty" disk0 in the laptop and mirror disk1 to disk0? I am not sure how to do that with mdadm.

    8. Re:mdadm can do this by Anonymous Coward · · Score: 0

      Wouldn't disk1 be the de facto boot disk since it is the only one left standing? How to boot from it is an exercise left to the reader.

    9. Re:mdadm can do this by fnj · · Score: 2

      Seriously? If the drive in the laptop fails, it has failed in any scenario. It doesn't matter what strategy you use to back up. You are looking at installing a new one and copying the backup in any event. In any backup scenario you have to do an added trick with grub to copy the boot sector to the second drive. Then all you have to do to recover is pop a new drive in the laptop and dd the backup drive to the new drive, boot sector, partition table, file system, files and all.

    10. Re:mdadm can do this by Fruit · · Score: 4, Interesting

      If you boot the laptop with disk1 and a blank disk, mdadm will see disk1 as the raidset, in degraded mode. Just add the blank disk just as you would if a disk failed in a regular setup. Do test this beforehand. :)

    11. Re:mdadm can do this by Anonymous Coward · · Score: 0

      You would want to manually detach the external disk from the array before removing it from the system.

      You might be able to assemble the raid array using just the external drive and the --readonly option to mdadm, but even then I would not count on it. You might have to force add this back to the laptop's array if it has been marked as active since being detached.

      Note, a 'read only' mount of ext3/4 will not be truly read only... it will still try to modify superblock info such as last mount time even though it will prevent writes to the filesystem content.
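A sketch of a more cautious way to look at the external disk on another machine, with hypothetical device names and DRY_RUN=1 to print the commands; it needs root. blockdev --setro blocks writes at the block layer, mdadm --assemble --readonly --run starts the one-disk mirror read-only, and the ext3/ext4 noload mount option skips journal replay. These reduce the risk but are not a guarantee, so try it on a spare copy first.

```shell
# Sketch only, meant to be sourced. Device names are hypothetical.
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# $1 = member partition, $2 = md device to create, $3 = mount point.
inspect_readonly() {
    # Refuse writes to the partition at the block layer first.
    run blockdev --setro "$1" || return 1
    # Start the degraded (one-disk) mirror read-only.
    run mdadm --assemble --readonly --run "$2" "$1" || return 1
    # ro plus noload: mount ext3/ext4 without replaying the journal.
    run mount -o ro,noload "$2" "$3"
}
```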

    12. Re:mdadm can do this by Anonymous Coward · · Score: 1

      If disk0 fails for some reason, can I put a new "empty" disk0 in the laptop and mirror disk1 to disk0? I am not sure how to do that with mdadm.

      Yes, the RAID would be rather useless if you couldn't repair it.

      You simply boot off the external disk and then use the mdadm commands to add the new drive in the laptop to the existing array. Remove the old drive from the array while you're at it, since it'll never be used again. The new drive will be detected as a brand-new, out-of-date member and synced over. Once it's synced, you can boot off the drive in the laptop.
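A sketch of those steps after booting from the external disk, assuming the new internal disk is /dev/sda with a RAID partition /dev/sda2 at least as large as the old one, a BIOS/GRUB setup, and a mirror at /dev/md0 (all hypothetical; check /proc/mdstat for the real md name). DRY_RUN=1 prints the commands instead of running them. The new member gets a full resync rather than a bitmap-based one, since it has no history in the array.

```shell
# Sketch only, meant to be sourced. Device names are hypothetical.
MD=${MD:-/dev/md0}

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# $1 = new internal RAID partition, $2 = the whole new disk (for the boot loader).
replace_internal() {
    # Forget members that are no longer connected; may find nothing to remove.
    run mdadm "$MD" --remove detached
    # Add the new partition; this starts a full resync from the external disk.
    run mdadm "$MD" --add "$1" || return 1
    # BIOS + GRUB systems only: make the new disk bootable again.
    run grub-install "$2"
}
```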

    13. Re:mdadm can do this by Anonymous Coward · · Score: 0

      dd would not be best in this case, it'd be better to boot off the external drive and add the new drive to the array.

      Reason being dd will copy the raid superblock containing the array member IDs, which will rather confuse things if both are connected at the same time.

    14. Re:mdadm can do this by Anonymous Coward · · Score: 0

      For bonus points have 2 external disks in the array and alternate the one you sync to. Then if the laptop drive fails during the backup it's only corrupted the data on one of the drives, leaving you the other as a recent backup (which you can then sync to the other external drive and a replacement internal one).

    15. Re:mdadm can do this by GigaplexNZ · · Score: 1

      I doubt alternating disks would work if using a bitmap to only sync incrementally.

    16. Re:mdadm can do this by nullchar · · Score: 1

      Agreed, it would be better to boot from another device (usb) and then use mdadm to rebuild the array (sync disk1 to the replaced disk0).

    17. Re:mdadm can do this by Anonymous Coward · · Score: 0

      OK, I'll bite.
      I used the rsync scripts here to do a sync, and it worked, and it also synced all the OS 9 files too.

      So I modified it to work on different drives on different days.

      Just so you know, your idea was absolutely brilliant. absolutely.

  4. Time Machine by Roger+W+Moore · · Score: 0, Offtopic

    Time Machine on a Mac laptop does exactly this - it uses a journal of filesystem changes to update only the files it needs to. This is probably not much use to you, since I'm guessing that if you had a Mac you would not be asking this question, but it would be a system to look at if there is no FOSS alternative and you want to code your own.

    1. Re:Time Machine by BitZtream · · Score: 3, Informative

      TimeMachine takes about 15 minutes to do the prep work before it starts copying for me, on a 2012 Retina MBP with 16 GB of RAM and only 256 GB of disk space ... 64 GB taken by a BootCamp partition that isn't backed up and another 120 or so eaten by Windows VMs that don't get backed up either ... i.e. it's not a slow spinning platter backing up a terabyte of data.

      I see no indication of any journal; it certainly isn't making it faster. Pretty freaking slow, actually.

    2. Re:Time Machine by h4rr4r · · Score: 1

      Considering how slow it is I doubt it.

      I sometimes use a Mac, I still prefer rsnapshot over some backup that is likely hard to deal with if you don't have another mac.

    3. Re:Time Machine by zieroh · · Score: 3, Interesting

      This doesn't match my experience. Time Machine fires up in the background, does its thing, and then stops shortly thereafter. Certainly much less than 15 minutes. More like five or less. This is on a new-ish iMac with a 3TB internal drive.

      It wouldn't even be noticeable were it not for the fact that I can hear the TM destination drive (sitting on a shelf behind me) spin up once an hour.

      --
      People who say "sheeple" have about as much sophistication as an AOL user, and in fact are probably actually AOL users.
    4. Re:Time Machine by zieroh · · Score: 1

      Sorry, internal drive is 2TB. Time Machine destination is 3TB.

    5. Re:Time Machine by Anonymous Coward · · Score: 0

      I have the same machine with a 2TB disk and back up to a Western Digital My Book... it only backs up the changes and takes less than 15 seconds. You're either a liar or an idiot.

      #CHOOSE

    6. Re:Time Machine by Bill_the_Engineer · · Score: 3, Interesting

      This is my current experience with mine too. However during the prep stage it is making room on my time machine drive to receive the changes. Consolidating the older files will take time.

      When my drive was new and had plenty of space, the prep stage was much shorter.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    7. Re:Time Machine by killmofasta · · Score: 1

      But Time Machine requires a Mac that can run Mac OS X 10.5, and it is useless with Classic, even on a triple boot. It does not work on HFS+ volumes that have been used by 10.4 or OS 9.

      Time Machine is useless to me and my client... so your premise is faulty.

      No, you mean buy a RECENT Macintosh.

    8. Re:Time Machine by sl4shd0rk · · Score: 1

      TimeMachine takes about 15 minutes to do the prep work

      Yes, because naturally he's using a Mac. He must have certainly been in a Starbucks or Panera when he posted as well. In the Bay Area, no less.

      --
      Join the Slashcott! Feb 10 thru Feb 17!
    9. Re:Time Machine by robably · · Score: 1

      Eh? Over 15 minutes? Are you backing up to an AirPort Disk rather than a wired disk? The bottleneck there would be the wireless, not your computer.

      I backup a 2012 MacBook Air every evening to a 1TB 5400RPM USB drive - plug it in, it detects it, and the backup is done in 3 minutes.

    10. Re:Time Machine by DigiShaman · · Score: 1

      TimeMachine will back up a running VM. It just has to back up the entire VM each time, whereas with the OS X environment only the delta changes transfer after your original full backup. To address the underlying performance issue, however, just replace the internal drive with an SSD. It's that bloody simple of a solution. I can run my VMs, the rest of my Mac applications and TimeMachine all at the same time. Previously I couldn't do this with a standard HDD due to saturated disk I/O (hung with a spinning rainbow wheel mouse cursor).

      --
      Life is not for the lazy.
    11. Re:Time Machine by wonkey_monkey · · Score: 2

      Read from the top level and you'll see that no-one's made the assumption that he's using a Mac. This has simply become a side discussion on TM.

      --
      systemd is Roko's Basilisk.
    12. Re:Time Machine by Blakey+Rat · · Score: 2

      TM could be doing 15 minutes of work on your own HD before it bothers spinning-up the external, you realize.

      You may be correct, but your evidence doesn't match your assertion.

    13. Re:Time Machine by greg1104 · · Score: 1

      Time Machine keeps an event store journal of changes, the process is described at How Time Machine Works its Magic. What you're describing might be a "deep scan" pass. It's also possible you're touching a lot of directories with updates, which makes the optimization they apply not as useful.

      There are cases where the event store makes Time Machine backups nearly instant, which is never the case for rsync based approaches being complained about here.

    14. Re:Time Machine by Alan+Shutko · · Score: 1

      Yes, something more recent than 2004.

      What are you doing that means you need to keep OS 9 and machines older than 9 years running?

      (FWIW, I don't think the OP has this problem, if he's got a laptop with a 1TB internal disk.)

    15. Re:Time Machine by Roger+W+Moore · · Score: 1

      TimeMachine takes about 15 minutes to do the prep work before it starts copying for me

      Well with my 2012 non-Retina MBP with a 1TB disk it only takes a few minutes at most. I guess those extra screen pixels must really slow it down! ;-)

    16. Re:Time Machine by omnichad · · Score: 1

      I think the slowdown is from the hard links it creates in the backup directory on the external drive. That takes a lot of time. Every file that's changed gets written to the backup directory as a new file. Every file that hasn't changed gets written as a hard link to the inode of the original backup of that file. So if you have 200,000 files, and 10 of them changed, you still have to write 200,000 entries for the backup.
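      The hard-link scheme described above can be sketched in Python: unchanged files in a new snapshot become hard links to the previous snapshot's copy, and only changed files are actually copied (rsync's --link-dest option works on the same idea). This is a simplified model, assuming a "changed if size or mtime differs" test; it is not Time Machine's actual implementation.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def unchanged(src: Path, prev: Path) -> bool:
    """Quick check: treat a file as unchanged if size and whole-second mtime match."""
    if not prev.is_file():
        return False
    a, b = src.stat(), prev.stat()
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source: Path, prev_snap: Optional[Path], new_snap: Path) -> dict:
    """Create new_snap from source, hard-linking files unchanged since prev_snap."""
    stats = {"linked": 0, "copied": 0}
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (new_snap / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = new_snap / rel / name
            prev = prev_snap / rel / name if prev_snap is not None else None
            if prev is not None and unchanged(src, prev):
                os.link(prev, dst)  # new directory entry, same inode: no data copied
                stats["linked"] += 1
            else:
                shutil.copy2(src, dst)  # copy2 keeps mtime so the next quick check works
                stats["copied"] += 1
    return stats
```

      Note that every unchanged file still costs one new directory entry (the os.link call), which is the per-file overhead the comment is describing.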

      Still - I don't ever see 15 minutes. I'm curious what's causing your problem and wonder if just blowing away your backup and starting over wouldn't help.

    17. Re:Time Machine by omnichad · · Score: 1

      AirPorts are really flaky for TM backups. The bottleneck I've seen with them is that they just quit working and need to be reset. Even over Ethernet.

    18. Re:Time Machine by omnichad · · Score: 1

      As a mac owner, I'm sure you realize that mention of it being Retina is only related to Apple not using model numbers in almost all of their documentation and sales pages (except maybe in the fine print).

    19. Re:Time Machine by omnichad · · Score: 2

      Welcome to the future. We can even use variable-width fonts now.

    20. Re:Time Machine by multimediavt · · Score: 1

      TimeMachine takes about 15 minutes to do the prep work before it starts copying for me, on a 2012 Retina MBP with 16GB of RAM and only 256GB of disk space ... 64 GB taken by an unbacked up BootCamp part and another 120 or so eaten in Windows VMs that don't get backed up either ... i.e. It's not a slow spinning platter backing up a terabyte of data.

      I see no indication of any journal; it certainly isn't making it faster. Pretty freaking slow actually.

      To what are you backing up, and how much data do you generate in a backup interval? It sounds like you're backing up to a network storage device on a wireless network or just a SLOW network, OR you are generating hundreds of megabytes, if not gigabytes, of data during a backup interval. Basically, either something is very wrong or you are a data hog, for an SSD-equipped machine to back up that slowly.

    21. Re:Time Machine by Smurf · · Score: 1

      I bet you don't back up very frequently, and Time Machine determines that the record of modified files kept by FSEvents is stale. That would force it to do a deep scan, i.e., it traverses the whole directory hierarchy to figure out what has changed, much like rsync does.

      If you back up every couple of days the whole backup including prep time should take under a couple of minutes. That's particularly true if you keep the default functionality of Time Machine (that is, backing up every hour).
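      The fallback described above can be modelled in a few lines. This is a hypothetical sketch, not the FSEvents API: assume a journal that records changed directories with increasing event IDs but only retains the most recent events. A backup whose last-seen ID is older than the oldest retained event cannot trust the journal and has to do a deep scan.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Hypothetical bounded journal of (event_id, directory) records."""
    capacity: int
    events: deque = field(default_factory=deque)
    next_id: int = 1

    def record(self, directory: str) -> None:
        self.events.append((self.next_id, directory))
        self.next_id += 1
        if len(self.events) > self.capacity:
            self.events.popleft()  # the oldest history is discarded

    def changes_since(self, last_id: int):
        """Return dirs changed after last_id, or None if part of that history was lost."""
        if self.events and self.events[0][0] > last_id + 1:
            return None  # gap between last_id and the oldest retained event: stale
        return sorted({d for eid, d in self.events if eid > last_id})


def plan_backup(journal: ChangeJournal, last_id: int) -> str:
    dirs = journal.changes_since(last_id)
    if dirs is None:
        return "deep scan"  # walk the whole tree, like rsync
    if not dirs:
        return "nothing to do"
    return f"scan {len(dirs)} dir(s): {', '.join(dirs)}"
```

      Frequent backups keep last_id inside the retained window, which is why hourly Time Machine runs stay fast while infrequent ones fall back to the full walk.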

    22. Re:Time Machine by Smurf · · Score: 1

      No, when zieroh says "Time Machine fires up in the background, does its thing, and then stops shortly thereafter" he is talking about the Time Machine icon spinning around in the menu bar. That will happen throughout the whole backup process, including the prep.

      I'm pretty sure the difference is that zieroh backs up very frequently, maybe using the default functionality (i.e., backup every hour), while BitZtream is more like most of us and backs up every few weeks (or when he decides he has accumulated so many changes that losing them would really be painful).

    23. Re:Time Machine by Smurf · · Score: 1

      No, it does not. If you read John Siracusa's excellent OS X Leopard review... oh, wait, you are the same guy. Never mind.

    24. Re:Time Machine by dbIII · · Score: 1

      With respect, there are machines running OS X on which it isn't worth installing a recent version that has Time Machine. Not all Macs have Intel.

    25. Re:Time Machine by Anonymous Coward · · Score: 0

      Or he's got more than 10 files on his hard drive. Idiot.

    26. Re:Time Machine by Anonymous Coward · · Score: 0

      As an English speaker, I'm sure you realize that the mention of extra screen pixels was meant as a joke, something that was reinforced by the use of the emoticon ";-)".

    27. Re:Time Machine by Anonymous Coward · · Score: 0

      I've found that as well. I mainly use it for NAS disk alone.

  5. DRBD by Anonymous Coward · · Score: 0

    Take a look at DRBD.

    1. Re:DRBD by Desler · · Score: 1

      But what about Dr. Feelgood?

    2. Re:DRBD by greg1104 · · Score: 1

      Specifically, look at how DRBD handles recovery after an outage of the replication network. The situations where the disk isn't plugged in will look just like the network outage scenario DRBD handles. I'm not sure whether this will be more or less efficient than the mdadm bitmap approach outlined above, but those are the two main ways people do this specific operation.

  6. Obligatory by Anonymous Coward · · Score: 5, Informative

    RAID is not backup.

    1. Re:Obligatory by XanC · · Score: 5, Informative

      True. I'd recommend he check out rdiff-backup, which keeps snapshots of previous syncs. Fantastic tool.

    2. Re:Obligatory by hawguy · · Score: 2

      RAID is not backup.

      It is in this situation, since he wants to mirror to an external disk, then break the mirror and unplug the disk.

      It's no worse than running "rsync --delete" to the backup medium. (Well, OK, slightly worse: if the mirror fails in the middle, the backup disk is left in an inconsistent state and could be unreadable. But an interrupted rsync would also leave an unknown number of files/folders unsynced, so it isn't a perfect backup either.)

      As long as you have more than one backup disk, then a mirror is as safe as rsync. There may be better solutions, but either backup solution will let you recover your system from the backup disk if there's a failure of the primary system.

      Back in the day (before I could make filesystem level or SAN level snapshots) that used to be how I did backups of a large database system (where "large" was 15GB, which tells you how long ago it was). I'd mirror the production system disks to a separate set of disks on the live system (the disks were already mirrored, so this was a "third mirror"), after the mirror was complete (which took most of the night) I'd quiesce the database and filesystem in the morning, break the mirror, then mount the disks on another machine to backup to tape. But I could have chosen to just pull the disks in that RAID set out of the array and put them in the tape cabinet as the backup and it would have still been a backup.

    3. Re:Obligatory by Crimey+McBiggles · · Score: 3, Informative

      Just because you've hacked RAID into part of a backup strategy does not mean that backup is a standard use-case for RAID. It's far too easy for the wrong disk to get overwritten because of all the things RAID is set up to do by default. With rsync, you're telling the disks exactly which direction the data needs to flow.

      In a production environment, there's also a greater chance of failure using RAID because of the whole "plugging / unplugging drives" thing. Sure, it's rare, but your operating system and/or motherboard may or may not enjoy having drives attached and detached from its SATA bus.

      Hearing the above, a systems administrator would assume you're confused between the terms "backup" and "mirror". It's a non-standard use-case, so the admin that arrives after you've moved on to another job will have to deal with that confusion.

      --
      Crimey
    4. Re:Obligatory by hawguy · · Score: 2

      Just because you've hacked RAID into part of a backup strategy does not mean that backup is a standard use-case for RAID. It's far too easy for the wrong disk to get overwritten because of all the things RAID is set up to do by default. With rsync, you're telling the disks exactly which direction the data needs to flow.

      In a production environment, there's also a greater chance of failure using RAID because of the whole "plugging / unplugging drives" thing. Sure, it's rare, but your operating system and/or motherboard may or may not enjoy having drives attached and detached from its SATA bus.

      Hearing the above, a systems administrator would assume you're confused between the terms "backup" and "mirror". It's a non-standard use-case, so the admin that arrives after you've moved on to another job will have to deal with that confusion.

      My RAID backup strategy was fully supported and recommended by the manufacturer of the storage array, and was a big selling point. It wasn't a hack. Even tape backups can suffer problems from overwriting the wrong tape if someone does something stupid. "Oh hey, the backup system says this tape isn't expired yet, I'm sure I loaded the right tape, so I'll just do a hard-erase so I can write to it"

    5. Re:Obligatory by hawguy · · Score: 2

      My RAID backup strategy was fully supported and recommended by the manufacturer of the storage array, and was a big selling point. It wasn't a hack. Even tape backups can suffer problems from overwriting the wrong tape if someone does something stupid. "Oh hey, the backup system says this tape isn't expired yet, I'm sure I loaded the right tape, so I'll just do a hard-erase so I can write to it"

      Here's a Sun/Oracle doc that explains the procedure:

      http://docs.oracle.com/cd/E19683-01/817-2530/6mi6gg886/index.html

      How to Use a RAID 1 Volume to Make an Online Backup
      You can use this procedure on any file system except root (/). Be aware that this type of backup creates a “snapshot” of an active file system. Depending on how the file system is being used when it is write-locked, some files and file content on the backup might not correspond to the actual files on disk.

      The following limitations apply to this procedure:

      * If you use this procedure on a two-way mirror, be aware that data redundancy is lost while one submirror is offline for backup. A multi-way mirror does not have this problem.

      * There is some overhead on the system when the reattached submirror is resynchronized after the backup is complete.

    6. Re:Obligatory by Score+Whore · · Score: 1

      You're getting way too detailed to be implicating the highly generic term RAID in your list of fault conditions.

      I will point out however that your premise that a system won't know which way to sync the data is wrong. Any running RAID implementation that syncs from a recently attached disk to the currently in use disk is just broken and would never get out of QA.

      However using a RAID1 to mirror to an external drive isn't going to be a particular benefit unless the raid implementation manages a changed block map. If the implementation does have a changed block map then this is the exact use case intended.
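      To make the "changed block map" idea concrete, here is a toy Python model of a mirror with a dirty-chunk map. It is a simplified illustration, assuming fixed-size chunks and in-memory byte arrays; it is not how md's write-intent bitmap or DRBD are actually implemented, but it shows why a resync after reattaching only has to copy the chunks written while the mirror was unplugged, and why the copy direction is never ambiguous.

```python
CHUNK = 4  # bytes per tracked chunk (real implementations use far larger chunks)


class DetachableMirror:
    """Toy RAID-1: a primary disk plus a mirror that can be unplugged."""

    def __init__(self, size: int):
        self.primary = bytearray(size)
        self.mirror = bytearray(size)
        self.attached = True
        self.dirty = set()  # indices of chunks written while detached

    def write(self, offset: int, data: bytes) -> None:
        self.primary[offset:offset + len(data)] = data
        if self.attached:
            self.mirror[offset:offset + len(data)] = data
        else:
            first, last = offset // CHUNK, (offset + len(data) - 1) // CHUNK
            self.dirty.update(range(first, last + 1))

    def detach(self) -> None:
        self.attached = False

    def reattach(self) -> int:
        """Copy only dirty chunks primary -> mirror; return how many were copied."""
        for c in sorted(self.dirty):
            s = slice(c * CHUNK, (c + 1) * CHUNK)
            self.mirror[s] = self.primary[s]  # always primary to mirror, never back
        copied = len(self.dirty)
        self.dirty.clear()
        self.attached = True
        return copied
```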

    7. Re:Obligatory by phorm · · Score: 1

      It runs into some pretty nasty issues if a backup is interrupted though. Or at least it did when last I used it a few years back.

    8. Re:Obligatory by Sloppy · · Score: 1

      It is, if you then disconnect half of it and move it offsite! I'm not sure that's the best way to do backups, though.

      If I were this guy, I'd look into why it takes rsync so long to read the dir tree. This is one of those situations where no matter how much people say "Linux filesystems don't suffer from fragmentation," I nevertheless suspect you're suffering from highly fragmented directories. Let me guess: do you repeatedly come close to filling the disk? Maybe it's time to do this: after the next rsync, destroy your original with a new mkfs.whatever (I hope you have at least two backups) and then cp the data back to it.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    9. Re:Obligatory by Anonymous Coward · · Score: 0

      This.

    10. Re:Obligatory by jon3k · · Score: 1

      It is when it's detached and asynchronous. In fact it's not really even RAID anymore.

  7. umm yah by Anonymous Coward · · Score: 0

    It's called Time Machine. It basically takes a series of local snapshots, then offloads them when you reconnect. You could achieve the same thing with LVM or Shadow Copy snapshots and a script as well.

  8. DRBD or ZFS by Anonymous Coward · · Score: 1

    You could try DRBD (whole-disk level) or ZFS with a detached mirror or snapshots. Both keep track of changes and resync only what has changed since the last sync.

    1. Re:DRBD or ZFS by Anonymous Coward · · Score: 0

      The "zfs send" command is one I have used for backing up live servers in the past:
      1. Take a snapshot (1 second for my 500 GB RAID 1).
      2. Use the zfs send command, and zfs receive at the other end.
      3. Watch as only the data that changed between the snapshot and the external image gets updated.

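      e.g.
      1. zfs snapshot myharddrive/test@20130719-1600
      2. zfs send -i myharddrive/test@20130718-1600 myharddrive/test@20130719-1600 | ssh testbox zfs receive test/test@20130719-1600
      (The -i form assumes an earlier snapshot, here a hypothetical @20130718-1600, already exists on both sides; a plain zfs send of a single snapshot transmits the whole dataset.)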

      If you don't have ZFS you can also do a bitmap update with mdadm, but I find it simpler to auto-snapshot and script my systems to zfs send the latest snapshot over ssh. That way you don't have to worry about the receiving end, what state it is in, or what storage capacity it has. It is also resilient to interrupted backups.
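      Scripting "send the latest snapshot" mostly comes down to finding the newest snapshot both sides already share and sending an incremental stream from it. A minimal Python sketch, assuming snapshot names sort chronologically (as date-stamped names do), that there is at least one local snapshot, and that you gather both name lists yourself (e.g. from zfs list -t snapshot locally and over ssh). Dataset and host names are placeholders.

```python
from typing import List, Optional


def latest_common(local: List[str], remote: List[str]) -> Optional[str]:
    """Newest snapshot name present on both sides (names must sort by date)."""
    common = set(local) & set(remote)
    return max(common) if common else None


def send_pipeline(dataset: str, local: List[str], remote: List[str],
                  host: str, target: str) -> Optional[str]:
    """Return the shell pipeline to run, or None if the remote is up to date."""
    newest = max(local)
    base = latest_common(local, remote)
    if base == newest:
        return None
    if base is None:
        # No shared snapshot: full stream (the receiving dataset must not exist yet).
        send = f"zfs send {dataset}@{newest}"
    else:
        # Incremental stream: only blocks changed between base and newest.
        send = f"zfs send -i {dataset}@{base} {dataset}@{newest}"
    return f"{send} | ssh {host} zfs receive {target}"
```

      Because the base is recomputed from what the receiver actually has, an interrupted transfer simply gets retried from the last snapshot that made it across.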

  9. ZFS: Snapshot + send by Anonymous Coward · · Score: 2, Interesting

    Cleanest implementation of this I've seen is with ZFS.

    You take a snapshot of your filesystem, and then zfs send it to your remote backup server, which replicates that snapshot by replaying the differences. If you are experiencing poor speed due to read/write buffering issues, pipe through mbuffer.

    The only issue is that it requires that you have your OS on top of ZFS.

    1. Re:ZFS: Snapshot + send by Anonymous Coward · · Score: 0

      Don't worry, the only OSes worth using can boot from ZFS or comparable FS.

    2. Re:ZFS: Snapshot + send by berenddeboer · · Score: 1

      +1. People use ZFS for minute-by-minute snapshot replication between data centres.

      --
      If I had a sig, I would put it here.
  10. ZFS by Anonymous Coward · · Score: 1

    You want two ZFS filesystems. One local laptop pool, one backup pool (and it really should have two disks, but one will work fine). Snapshot your laptop filesystem periodically (cron or something), and then zfs send/receive that snapshot to the backup pool when you have access.

  11. Exclude directories by Anonymous Coward · · Score: 5, Informative

    Are you backing up EVERYTHING on the laptop, OS and data included? Even if you are only backing up your home directory, there is stuff you don't need to back up, like the .thumbnails directory, which can be quite large. Try using rsync's exclude option to restrict the backup to only what you care about.
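    For example, a small Python wrapper that builds the rsync command line with a few exclude patterns. The patterns and paths are only illustrative assumptions; pick your own after checking which directories are large but disposable. -a, --delete and --exclude are standard rsync options.

```python
import subprocess
import sys
from typing import List

# Illustrative patterns only: caches and thumbnails are usually safe to skip.
EXCLUDES = [".thumbnails/", ".cache/", "*.tmp"]


def rsync_argv(src: str, dst: str, excludes: List[str]) -> List[str]:
    """Build an rsync invocation mirroring src into dst, minus excluded paths."""
    argv = ["rsync", "-a", "--delete"]
    argv += [f"--exclude={pattern}" for pattern in excludes]
    # Trailing slash on src: copy the directory's contents, not the directory itself.
    argv += [src.rstrip("/") + "/", dst]
    return argv


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python backup.py /home/me /media/backup/home
    sys.exit(subprocess.run(rsync_argv(sys.argv[1], sys.argv[2], EXCLUDES)).returncode)
```

    Excluding directories also saves scan time, since rsync does not descend into an excluded directory at all.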

    DNA
    AKA mrascii

  12. COW or desync'ed RAID by phorm · · Score: 5, Informative

    In this case, it sounds like you want a fast on-demand sync rather than a RAID.

    However, if you're a Linux user, you could possibly use md RAID (mdadm) for this.
    Set up the internal disk(s) as a degraded md RAID-1 array. When you connect the backup disk, have it become part of the RAID and the disks should sync up. That said, it likely won't be any faster than rsync, and quite possibly slower, as it'll have to go over the entire volume.

    Alternate solutions:
    * Have a local folder that does daily syncs/backups. Move those to the external storage when it's connected.
        CAVEATS: Takes space until the external disk is available
    * Use a differential filesystem, or maybe something like a COW (copy-on-write) filesystem. Have the COW system sync over to the backup disk (when connected) and then merge it into the main filesystem tree after sync
        For example, /home is a combination of /mnt/home-ro (ro) and /mnt/home-rw (rw, COW filesystem). When external media is connected, /mnt/home-rw is synced to external media, then back over /mnt/home-ro
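    The second alternative can be sketched as follows, assuming the read-only and COW layers are plain directories (for instance the lower and upper directories of an overlay mount; the paths in the comment are the author's examples). Only files present in the COW layer are copied, so the cost scales with what changed rather than with the whole disk. Deletions (which overlayfs records as whiteout entries) are deliberately not handled, and the merge should be done with the overlay unmounted, since overlayfs does not allow its underlying directories to change while mounted.

```python
import os
import shutil
from pathlib import Path
from typing import List


def sync_and_merge(cow: Path, ro: Path, external: Path) -> List[str]:
    """Copy every file in the COW layer to external, then fold it into ro."""
    moved = []
    for root, _dirs, files in os.walk(cow):
        rel = Path(root).relative_to(cow)
        for name in files:
            changed = Path(root) / name
            for dest_root in (external, ro):  # external backup first, then merge
                dest = dest_root / rel / name
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(changed, dest)
            moved.append(str(rel / name))
    # Only after both copies succeed is the COW layer emptied.
    for entry in cow.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return sorted(moved)
```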

    1. Re:COW or desync'ed RAID by cthulhu11 · · Score: 1

      1) ZFS snapshots 2) CrashPlan

  13. My solution by kiriath · · Score: 1, Interesting

    Is to not try to keep 1TB of crap on a laptop... or anywhere for that matter. Travel light says me ;)

    1. Re:My solution by Ravaldy · · Score: 1

      That works for sales, managers and those who don't touch lots of data. Programmers are a good example of people who need a large arsenal when going on site: customer DBs, versions of software, tools and more. All that amounts to lots of data. I usually recommend keeping the data on a removable USB drive that you back up when you aren't using it.

    2. Re:My solution by h4rr4r · · Score: 1

      Have you heard of the internet?
      It is super cool, you can leave the data in your datacenter and get to it from anywhere! You can even show the customer right on the server instead of dealing with your laptop and a painfully slow USB connection.

    3. Re:My solution by ccool · · Score: 1

      How does a painfully slow USB connection compare with a painfully slow Internet connection?

    4. Re:My solution by h4rr4r · · Score: 2

      You don't transfer anywhere near as much data over it.

      You leave that on the server and use the internet just for the nice cheap display.

    5. Re:My solution by Anonymous Coward · · Score: 0

      Last I checked, your Internet Explorer doesn't do MS database connections.

    6. Re:My solution by Ravaldy · · Score: 1

      Have you ever gone into remote locations such as mines in Brazil and Africa? You don't get Internet, and if you do, it's as slow as a 14k modem.

  14. OS? by ralf1 · · Score: 4, Insightful

    The OP doesn't mention which OS he's on, and the tools he mentions both run across multiple OSes. It would be helpful to know. I know as a group we probably assume some form of Linux, but... I use Windows Home Server at the house to back up my family's multiple Windows machines. It runs on crappy hardware, does incrementals on a schedule, allows file-level or bare-metal restore, and keeps daily/weekly/full backups as long as I ask it to. I know we aren't a Windows-friendly crowd, but this product does exactly what it promises and does it pretty well.

    --
    "Would you, could you, with a goat?" Dr Seuss
    1. Re:OS? by The+MAZZTer · · Score: 1

      I use robocopy on Windows for my 1:1 backup copy since it will use timestamps and file sizes to determine if a file needs to be synced or not. But I assume rsync does the same thing.
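      rsync does: by default its "quick check" skips a file whose size and modification time match on both sides (the --checksum option switches to comparing content instead). A minimal Python sketch of that test follows; note that it still has to stat every file, which is exactly the full-tree walk the original question complains about.

```python
import os
from pathlib import Path
from typing import List


def needs_copy(src: Path, dst: Path) -> bool:
    """rsync-style quick check: copy if dst is missing or size/mtime differ."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # mtime compared to the whole second here, a simplification
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def files_to_copy(src_root: Path, dst_root: Path) -> List[str]:
    """Walk src_root (stat-ing every file) and list relative paths to transfer."""
    out = []
    for root, _dirs, files in os.walk(src_root):
        for name in files:
            src = Path(root) / name
            rel = src.relative_to(src_root)
            if needs_copy(src, dst_root / rel):
                out.append(str(rel))
    return sorted(out)
```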

    2. Re:OS? by DamageLabs · · Score: 1

      Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

      The only thing that comes close, but still not there completely, is the legacy MS (Veritas) backup utility. And that one is far from automated.

    3. Re: OS? by Anonymous Coward · · Score: 0

      Backup software on Windows has been able to use the NTFS change journal feature to avoid scanning file systems for file level change for... I'd guess at least around ten years now.

      I've been waiting for the Linux community to "invent" this feature for a long time.

      It's been long enough that I'd recommend looking into either a ZFS, NTFS or Time Machine capable system/file server if you intend to do frequent incremental backups of your data. Certainly ZFS or NTFS for systems with large numbers of files to scan. You need block level or journaling for that.

      You waste a ton of time and resources to get near-continuous backups on Linux, and for what, a mirror?? At least finagle rsync into doing point-in-time backup sets for you. /enterprise backup admin

    4. Re:OS? by CanHasDIY · · Score: 1

      Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

      The only thing that comes close, but still not there completely, is the legacy MS (Veritas) backup utility. And that one is far from automated.

      What about SyncToy? Seems to work pretty well, at least it does for me.

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
    5. Re:OS? by JoshRosenbaum · · Score: 1

      Robocopy doesn't keep the ACM dates across volumes. So it is certainly not a 1:1 copy.

      Maybe I'm misunderstanding you, but robocopy does keep dates across volumes. You can also control whether or not you want to copy them. File times are copied by default, and for directories you add the /DCOPY:T switch. Are you speaking of some other underlying file system date?

    6. Re:OS? by DamageLabs · · Score: 1

      What about SyncToy? Seems to work pretty well, at least it does for me.

      SyncToy doesn't preserve the created date, nor does it automagically sync on network drive / local drive discovery.

      Too much manual work.

    7. Re:OS? by DamageLabs · · Score: 1

      Are you speaking of some other underlying file system date?

      Check the directory created date and the "." and ".." sublink dates on FATXX.

      Then get back to me.

      On NTFS, the dir modified date also gets updated. But that is by design of the filesystem.

      Sadly, in this day and age, the only way to get a fully identical copy of a directory tree is by cloning the whole drive, with all the other data. Yet NU could clone dirs easily some 20 years ago.

    8. Re:OS? by JoshRosenbaum · · Score: 1

      Hmm, that's good to know that I should watch out for this if using FAT. I pretty much only use NTFS these days, so this is not something I've ever noted.

      I checked out the options in robocopy and I wonder if one of these two would fix this issue: /FFT :: assume FAT File Times (2-second granularity). /DST :: compensate for one-hour DST time differences.

      It seems lame that it isn't handled automatically by default, with an option to switch it off.

    9. Re:OS? by DamageLabs · · Score: 1

      Those switches are only relevant if copying from FAT to NTFS or vice versa. They do not help with dir dates being wrong (defaulted to the current date).

      As I said, it's strange that I still use a nearly extinct tool, NT Backup, to get a near-identical copy of dir trees. And on both filesystems, especially on NTFS, it is very obvious that (and when) the tree was copied.
      Ghost is also a very useful tool that still does a great file-by-file volume copy, but it never could do a tree-to-tree copy on Windoze.

  15. Re:you backup 1TB from a laptop? by ormembar · · Score: 1

    Well, it's all my research work from the last 20 years. That's a lot of data, and you never know which data you need when you travel. So, when I change laptops, I copy all my data from the old disk to the new disk. That's also why I want to back up only the diff, and not spend my time scanning the disk to find these differences (which can be spread all over the disk).

  16. CrashPlan by Nerdfest · · Score: 3, Informative

    CrashPlan is free, but not open, and I think it will do everything you need. You can back up to an external disk, over the network to one of your own machines, or to a friend who also runs it. Great key-based encryption support. If you want, you can pay them for offsite backups (which is a great deal as well, in my opinion). It's cross-platform and easy to use. Never underestimate the benefits of off-site backups.

    1. Re:CrashPlan by EW87 · · Score: 1

      I second this

    2. Re:CrashPlan by lw54 · · Score: 1

      For the last month, I've been using CrashPlan to back up a 5.5TB filesystem over AFP to a remote AFS file share over the Internet. I did the initial backup across the LAN and then moved the drive array to its final destination. I'm now a few weeks in after the move and for the last 4 days, it has not backed up and is instead synchronizing block information. 4 days in and it's up to 59.9%. It spent 1.5 days about a week ago doing something like recursive block purging. I wish the client could do these housekeeping chores while also performing the backup.

    3. Re:CrashPlan by Nerdfest · · Score: 1

      That is a long time. I think I had something similar when I 'adopted' a backup. Once it's in sync the backups are quite quick, with pretty much no 'start-up scan time'.

    4. Re:CrashPlan by lw54 · · Score: 1

      Excellent. Thank you!

  17. RSync by Anonymous Coward · · Score: 0

    RSync. Pretty simple, works on multiple operating systems.

    It doesn't do exactly this, but it gets the job done efficiently.

  18. Use a backup tool by Anonymous Coward · · Score: 0

    Such as CrashPlan [www.crashplan.com]. You can do local backups to an external hard drive or a friend for free. You only need to pay if you want to use their cloud storage instead of or in addition to local storage. You'll capture versions, it will be differential, and the data is deduplicated. Plus, CrashPlan is extremely fast for data that is compressible and/or dedup-able.

  19. Just use Windows Backup by benjymouse · · Score: 3, Insightful

    Windows Backup (since Vista) uses Volume Shadow Copy (VSS) to do block-level reverse incremental backup, i.e. it uses the journaling file system to track changed blocks and only copies over the changed blocks.

    Not only that, it also backs up to a virtual hard disk file (VHD), which you can attach (mount) as a separate volume. This file system will hold the complete history, i.e. you can use the "previous versions" feature to go back to a specific backup of a directory or file.

    --
    Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    1. Re:Just use Windows Backup by h4rr4r · · Score: 1

      Lots of backup software uses VSS, pretty much any credible backup software on windows. It totally lacks automation, which is a pretty big downside.

      I doubt he is using windows, since he mentions rsnapshot.

    2. Re:Just use Windows Backup by DigiShaman · · Score: 2

      Unless you're running Windows 8 or Server 2012, Windows Backup on Windows 7 and below is functionally obsolete due to the new 3TB+ drives that use 4K-sector Advanced Format technology. As long as you can still find working 2TB drives and you don't have that much data to back up, you'll be fine with Windows Backup. Otherwise, upgrade the OS or use ARCserve D2D, which I know works well (and is expensive too).

      http://support.microsoft.com/kb/2510009

      --
      Life is not for the lazy.
    3. Re:Just use Windows Backup by benjymouse · · Score: 1

      It totally lacks automation, which is a pretty big downside.

      wbadmin.exe is available since Vista (where the VSS based image backup was introduced).

      How is that a total lack of automation?

      --
      Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    4. Re:Just use Windows Backup by h4rr4r · · Score: 1

      You can use it for automation sure, but out of the box it does not do any. Nearly no windows user will know how to use that. It would need a shiny wizard and other mythical figures to do that for you.

      My personal favorite is to have bacula do it, that is even less end user friendly though. It does mean all the schedules live on the server not the client, which is nice.

    5. Re: Just use Windows Backup by fluffy99 · · Score: 1

      Home versions of windows don't support scheduled backups. You might be able to hack something yourself using task scheduler and a batch file though.

    6. Re: Just use Windows Backup by benjymouse · · Score: 2

      Home versions of windows don't support scheduled backups. You might be able to hack something yourself using task scheduler and a batch file though.

      No, that is not correct.

      At least in Windows 7 *all* editions have the full image capability. Only the professional/enterprise editions can backup to a *network* drive. But in this case it is a local or attached disk, so the edition really does not matter.

      --
      Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    7. Re:Just use Windows Backup by benjymouse · · Score: 1

      You can use it for automation sure, but out of the box it does not do any. Nearly no windows user will know how to use that. It would need a shiny wizard and other mythical figures to do that for you.

      My personal favorite is to have bacula do it, that is even less end user friendly though. It does mean all the schedules live on the server not the client, which is nice.

      I have this feeling that if I had shown you the GUI part first, you would have dismissed that as "not automation". Hence, I chose to point out the wbadmin.exe tool.

      BTW, wbadmin.exe *is* "in-box". So your assertion "but out of the box it does not do any" is false. And yes, you can *also* set up daily scheduled backup with the GUI. It will actually create a task in task scheduler which uses (surprise!) wbadmin.exe!

      --
      Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    8. Re:Just use Windows Backup by benjymouse · · Score: 3, Interesting

      Unless you're running Windows 8 or Server 2012, Windows Backup on Windows 7 and below is functionally obsolete due to the new 3TB + drives now in 4k sector Advanced Format technology.

      Nice. So because you can buy large-capacity drives, backup solutions immediately become "functionally obsolete", even on systems that don't have such a drive? Tell me, did you buy a new BMW when Apple changed the connector for the iPhone 5? You know, the old BMWs are now "functionally obsolete".

      Not that it matters much here anyway, because you got it wrong. Windows Backup *will* back up to drives larger than 3TB, as long as they use 512e Advanced Format, where the drive presents 512-byte logical sectors but uses 4096-byte physical sectors. The solution is to use the GPT (GUID Partition Table) format. This will work for Vista and up.

      Drives that are 4K native (4096-byte logical sectors) cannot be used with Windows 7 / Server 2008 R2; that's a limitation of the OS, not of the backup software.

      --
      Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
    9. Re:Just use Windows Backup by drinkypoo · · Score: 1

      Windows Backup *will* back up to drives larger than 3TB, as long as they use 512e Advanced Format, where the drive presents 512-byte logical sectors but uses 4096-byte physical sectors. The solution is to use the GPT (GUID Partition Table) format. This will work for Vista and up.

      My god, it's so simple! Why would anyone ever use Linux, which is all hard and stuff?

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:Just use Windows Backup by DigiShaman · · Score: 1

      Exactly! But because Windows Backup is part of the OS, you can't just upgrade the app without the OS.

      BTW, this just happened recently when I rolled out a Windows SBS 2011 box not too long ago. And I've seen this issue on a Windows 2008 R2 as well after a previous IT group jacked with the VSS writers and tried all sorts of backup programs that failed. Duh, it's the 4k sectors not being supported by Windows VSS properly! The external drives also didn't support 512e. Actually, I was a bit surprised by that last part as I could not format in NTFS specifically stating 512. WTF?!

      Below is a copy of the FSUTIL output for the drive in question (edited due to Slashdot's junk filter).

      C:\Windows\system32>FSUTIL FSINFO NTFSINFO G:
      NTFS Volume Serial Number :
      Version : 3.1
      Bytes Per Sector : 4096
      Bytes Per Cluster : 4096
      Bytes Per FileRecord Segment :4096

      --
      Life is not for the lazy.
    11. Re:Just use Windows Backup by DigiShaman · · Score: 1

      Because Linux sucks balls for businesses that need a real AD forest, GPOs, and full Microsoft OS and app integration. About the only thing Linux is good for in a Microsoft environment is LAMP boxes virtualized in either ESXi or Hyper-V, or as an appliance OS platform such as a Dell KACE deployment server or a phone system's voicemail server.

    12. Re:Just use Windows Backup by h4rr4r · · Score: 1

      This is no different from claiming ls is an automation tool because you can cron it. Anything you can call from a cron-like system can be automated, but cron/scheduled tasks are the secret sauce, not this tool.

  20. Get a Pi by Anonymous Coward · · Score: 0

    Get a Raspberry Pi, plug the drive into it, and have it run the rsync daemon (set it up to auto-start, in case a power failure triggers a reboot). Plug the Pi's Ethernet cable into your router, then have a scheduler / cron on your laptop run a script that triggers rsync (this works via Cygwin too, for Windows users). Have the script verify that you are connected to your home (work / whatever) network, and if so, call up the Pi and start the backup.
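    A minimal Python sketch of the laptop side, under assumptions: the home subnet (192.168.1.0/24), the Pi's hostname (backup-pi.local) and the rsync daemon module name (backup) are hypothetical placeholders, and "am I at home?" is approximated by checking which local address the OS would use for outbound traffic.

```python
import ipaddress
import socket
import subprocess

# Hypothetical values: adjust to your own network, Pi hostname and rsync module.
HOME_NET = "192.168.1.0/24"
PI_MODULE_URL = "rsync://backup-pi.local/backup/"


def local_ip() -> str:
    """Return the local address the OS would use for outbound traffic.

    Connecting a UDP socket sends no packets; it only makes the OS pick a route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("192.0.2.1", 9))  # TEST-NET-1 address, never actually contacted
        return s.getsockname()[0]


def on_home_network(ip: str, home_net: str = HOME_NET) -> bool:
    """True if `ip` falls inside the home subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(home_net)


def rsync_command(source: str, dest_url: str = PI_MODULE_URL) -> list[str]:
    """Build the rsync call that pushes `source` to the Pi's rsync daemon."""
    # Trailing slash: copy the directory's contents, not the directory itself.
    return ["rsync", "-a", source.rstrip("/") + "/", dest_url]


def maybe_backup(source: str) -> bool:
    """Run the backup only when we appear to be on the home network."""
    try:
        ip = local_ip()
    except OSError:
        return False  # no route at all: certainly not at home
    if not on_home_network(ip):
        return False
    subprocess.run(rsync_command(source), check=True)
    return True
```

    Called from cron, `maybe_backup("/home/user")` only starts rsync when the address check passes. rsync daemon modules are read-only by default, so the module on the Pi has to be configured as writable for a push to work.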

    I set mine to try this on my work system at 11 am every day, because by that time I had come to work, dealt with my morning emails, etc., and gone off to do something else, so by and large it would happen when I was not at the keyboard. You probably won't notice the slowdown anyway, even if you are there.

    For bonus points, add a second script that checks the logs and alerts you if you've gone a while without backing up (in case the Pi dies or something).
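    The staleness check could be as small as this Python sketch. The log format (an ISO timestamp followed by "backup ok" for each successful run) and the 3-day threshold are assumptions for illustration; the backup script would have to write such lines itself.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Hypothetical log format, one line per successful run, e.g.:
#   2013-07-02T11:00:05 backup ok
SUCCESS_MARKER = "backup ok"


def last_success(lines: Iterable[str]) -> Optional[datetime]:
    """Return the timestamp of the most recent successful backup, if any."""
    latest = None
    for line in lines:
        line = line.strip()
        if not line.endswith(SUCCESS_MARKER):
            continue
        stamp = datetime.fromisoformat(line.split(" ", 1)[0])
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def backup_is_stale(lines: Iterable[str], now: datetime,
                    max_age: timedelta = timedelta(days=3)) -> bool:
    """True if no successful backup happened within `max_age` of `now`."""
    latest = last_success(lines)
    return latest is None or now - latest > max_age
```

    Run daily from cron, it would send an alert (email or otherwise) whenever `backup_is_stale(...)` returns True.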

  21. Re: Dr. Seuss Question by hyades1 · · Score: 0

    From the back, or in its throat?

    --
    I've calculated my velocity with such exquisite precision that I have no idea where I am.
  22. Whooosh by jayteedee · · Score: 3, Interesting

    Holy cow, people, you're missing the OP's point. It's taking 15 minutes to SCAN the 1 TB drive.

    I've run into the same problem on Windows and Linux, especially for remote rsync updates on Linux over slow wireless connections. It's not the 1 TB that kills performance, since I can scan 4 TB drives holding hundreds of movies in seconds. It's the number of files.

    My solution on Windows is to take some of the directories with 10,000 files and put them into an archive (think clipart directories): Zip, TrueCrypt, tar, whatever. This speeds up reading the subdirectories immensely. Obviously, this only works for directories that are not accessed frequently. Also, FAT32 is much faster than NTFS on directories with 3,000+ files. Most of my TrueCrypt volumes with LOTS of files use FAT32 just because of the directory reading speed.
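    To find candidate directories for archiving, a quick Python sketch like this could list the directories that directly hold at least 10,000 files (the threshold from the post; the function name is made up). It has to walk the whole tree once, which is exactly the slow part, so treat it as a one-off survey rather than something to run before every backup.

```python
import os


def crowded_directories(root: str, threshold: int = 10_000) -> list[tuple[str, int]]:
    """Return (directory, file_count) for directories holding >= threshold files.

    Only files directly inside each directory are counted, since it is the
    per-directory entry count that makes listings slow.
    """
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) >= threshold:
            crowded.append((dirpath, len(filenames)))
    # Worst offenders first.
    crowded.sort(key=lambda item: item[1], reverse=True)
    return crowded
```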

    On Linux systems, I just run rsync on SUB-directories. I run the frequently accessed ones more often and the less-accessed directories less often. Simple, no? My rsyncs are all across the wire, so I need the speed. Plus, some users are on cell-phone wireless plans, so I need to minimize data usage.
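    The per-subdirectory schedule might be sketched like this in Python. The subdirectory names and intervals are hypothetical, and storing the last-run times between invocations is left to the caller.

```python
from datetime import datetime, timedelta

# Hypothetical tiers: how often each subdirectory should be synced.
SCHEDULE = {
    "accounting": timedelta(hours=1),
    "documents": timedelta(hours=1),
    "projects": timedelta(days=1),
    "installers": timedelta(days=30),
}


def due_directories(last_run: dict[str, datetime], now: datetime,
                    schedule: dict[str, timedelta] = SCHEDULE) -> list[str]:
    """Return the subdirectories whose interval has elapsed since their last sync."""
    due = []
    for subdir, interval in sorted(schedule.items()):
        previous = last_run.get(subdir)
        if previous is None or now - previous >= interval:
            due.append(subdir)
    return due


def rsync_commands(subdirs: list[str], src_root: str, dest_root: str) -> list[list[str]]:
    """One rsync per due subdirectory, so each run only walks that subtree."""
    return [["rsync", "-a", f"{src_root}/{d}/", f"{dest_root}/{d}/"] for d in subdirs]
```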

    --
    Religion and science are both 90% crap..but that doesn't negate the other 10%.
    1. Re:Whooosh by ormembar · · Score: 1

      I used to do that: a scan using the find command to find modified directories, then --exclude directives in rsync so that only the changed directories were backed up. At that time, I was also measuring the size of the subdirectories so that each part backed up less than 1 GB. But the find search still took too long for my taste. In the end, the time saved was very limited (if the whole thing wasn't actually slower than a full rsync).

    2. Re:Whooosh by jayteedee · · Score: 1

      That takes longer, since the find command scans the entire directory and file structure to find the directories. It also takes longer because querying the size takes more than just querying the name. I just used rsync to scan some of the directories hourly (accounting data, document directories, etc.). Other directories were synced daily, and others only monthly (install directories, tools, etc.). I had to force the users into a certain file hierarchy, but that's what sysadmins are for :)

    3. Re:Whooosh by UnknownSoldier · · Score: 1

      Yup, almost everyone missed the point: it's about having to deal with shitty file systems.

      Agreed about using the "dumb" FAT32 FS for speedy access!

      It's too bad you can't load the FS metadata into a RAM drive or onto an SSD, kind of like how ZFS gives you the option of putting the ZIL on an SSD.

    4. Re:Whooosh by jayteedee · · Score: 1

      Agreed. My first thought was ZFS, but with a laptop I figured it was more than likely a Windows box. Plus, I wouldn't use BSD on a laptop either, and I don't quite trust ZFS on Linux yet... (but it's getting close). I also agree on the ZIL on SSD. I can keep quite a few VMs (websites) in cache on the SSD and hardly have to worry about the speed of the HDs. Plus you get backups at the filesystem level. It's one of those tools I can't believe I've lived without all these years.

    5. Re:Whooosh by benjymouse · · Score: 3, Informative

      My solution on windows is to take some of the directories with 10,000 files and put them into an archive (think clipart directories).

      I hope you are not an IT professional. Windows comes with a perfectly good backup solution built in. It uses the Volume Shadow Copy Service (VSS) to track changes as they occur and subsequently backs up only the changed blocks. There's no need to scan *anything*, as the journaling file system has already recorded a full list of changes in the journal.

      The backup is basically stored in a VHD virtual hard disk (plus some catalog metadata around it), so you can even attach the VHD and browse it. By default it lets you browse the latest backup, but the Previous Versions feature lets you browse back in time to any earlier backup still stored in the VHD (the oldest backups will be pruned when the capacity is needed). The VHD is an inverse incremental backup: it stores the latest backup as the readily available version, and only the incremental (block-level) differences for previous backup sets.

      Moreover, VSS ensures consistent backups for the many applications that are VSS-aware (VSS writers), e.g. database systems like Oracle and SQL Server, Active Directory, the registry, etc. VSS coordinates with the applications so that, exactly when the snapshot is taken, they have flushed all state to disk. This means applications don't need to be stopped to get a consistent backup: database systems will not see a restore of a backup taken from a running system as a "crash" (as they would without such a service) from which they must recover through some other means (typically a roll-forward log).

    6. Re:Whooosh by Anonymous Coward · · Score: 0

      inotify. Keep a list of changed files. Feed this to rsync or rdiff-backup.
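      A minimal Python sketch of that pipeline, assuming inotify-tools' `inotifywait` is the event source (run as shown in the code comment, with its output appended to a log). It deduplicates the reported paths and writes a list for `rsync --files-from`, whose entries are read relative to the source directory. Deleted files are skipped and would need separate handling; also, inotify watches do not survive a reboot, so an occasional full rsync is still a good idea.

```python
import os
from typing import Iterable

# Assumed event source (inotify-tools), left running in the background, e.g.:
#   inotifywait -m -r -e close_write,moved_to,create,delete --format '%w%f' /home/user >> changes.log
# Each line it prints is the full path of a file that changed.


def collect_changes(event_lines: Iterable[str]) -> set[str]:
    """Deduplicate the paths reported by the watcher (one path per line)."""
    return {line.rstrip("\n") for line in event_lines if line.strip()}


def write_files_from(changed: set[str], root: str, list_path: str) -> int:
    """Write a list for `rsync --files-from`, with paths relative to `root`.

    Paths that no longer exist (deleted files) are skipped, since rsync would
    report them as errors. Returns the number of entries written.
    """
    root = os.path.abspath(root)
    entries = sorted(
        os.path.relpath(path, root)
        for path in changed
        if os.path.exists(path)
        and os.path.commonpath([root, os.path.abspath(path)]) == root
        and os.path.abspath(path) != root
    )
    with open(list_path, "w") as fh:
        fh.writelines(entry + "\n" for entry in entries)
    return len(entries)
```

      With the list written to changed.txt, the copy step would be something like `rsync -a --files-from=changed.txt /home/user/ /mnt/backup/home/user/`, after which the event log can be truncated.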

  23. I use two superb products by CAOgdin · · Score: 1

    1. For keeping two drives synchronized, check out GoodSync. It's powerful. I use it to keep two separate computers holding identical copies of two major folders of data synchronized, so if one goes down, there's minimal loss of data (1 hour, max). I use this, for example, to keep a client's two 1 TB collections of photos and iTunes synchronized. http://www.goodsync.com/

    2. For making backups that are compact, efficient and easy to recover, look at "Drive SnapShot". It's inexpensive and robust, and I've never experienced a restore failure. I make "Drive SnapShot" images of every computer, every night, in a development environment. That way, if the thing I just did breaks the system, I can restore a 100 GB drive in less than an hour by booting from a CD and pointing to the backup on an external drive. http://www.drivesnapshot.de/en/index.htm

  24. Do it on a lower level. by tibit · · Score: 2

    I'd use LVM and filesystem snapshots. The snapshot does the trick of journaling your changes, and only your changes. You can ship the snapshot over to the backup volume simply by netcat-ing it over the network. The backup's logical volume needs to have the same size as the original volume. It's really a minimal-overhead process. Once you create the new snapshot volume on the backup, the kernels on both machines are essentially executing a zero-copy sendfile() syscall. It doesn't get any cheaper than that.

    Once the snapshot is transferred, your backup machine can rsync or simply merge the now-mounted snapshot to the parent volume.

    --
    A successful API design takes a mixture of software design and pedagogy.
    1. Re:Do it on a lower level. by tibit · · Score: 3, Informative

      Well, of course I goofed, it's not that easy (well it is, read on). A snapshot keeps track of what has changed, yes, but it records not the new state, but the old state. What you want to transfer over is the new state. So you can use the snapshot for the location of changed state (for its metadata only), and the parent volume for the actual state.

      That's precisely what lvmsync does. That's the tool you want to do what I said above, only that it'll actually work :)
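      The idea can be illustrated with a toy Python model. This is a simplified sketch of the principle, not lvmsync's actual implementation: the class and function names are made up, and a real tool works from the snapshot's on-disk copy-on-write metadata rather than a Python dict.

```python
class CowVolume:
    """Toy copy-on-write snapshot: the origin always holds the current data,
    while the snapshot stores the *old* contents of each chunk the first time
    that chunk is overwritten."""

    def __init__(self, chunks: list[bytes]):
        self.origin = list(chunks)            # live, current data
        self.snapshot: dict[int, bytes] = {}  # chunk index -> pre-change data

    def write(self, index: int, data: bytes) -> None:
        # Preserve the old chunk only on the first write after the snapshot.
        if index not in self.snapshot:
            self.snapshot[index] = self.origin[index]
        self.origin[index] = data

    def changed_chunks(self) -> list[int]:
        # The snapshot's metadata is exactly the set of chunks touched since it was taken.
        return sorted(self.snapshot)


def make_delta(vol: CowVolume) -> dict[int, bytes]:
    """lvmsync-style delta: locations from the snapshot, *new* data from the origin."""
    return {i: vol.origin[i] for i in vol.changed_chunks()}


def apply_delta(backup: list[bytes], delta: dict[int, bytes]) -> None:
    for index, data in delta.items():
        backup[index] = data
```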

  25. BtSync by Fuzion · · Score: 0

    How about BtSync?

    It's based on the BitTorrent protocol, and it can sync over the internet as well.

    --
    "Knowledge makes us accountable." - Che Guevara
  26. Will your backup have you backup up and running? by Marrow · · Score: 1

    If you are spending time messing with a system that is not going to provide you with a running computer after a quick trip to the store for a new hard drive, then maybe you should rethink your goals.
    And perhaps you would regret the time spent less if you knew that in the event of an emergency, your backup would not only save your data, but prevent a re-installation and updates and more updates and more updates, and hunting for installation media and typing in software keys.
    AIX had/has a nice system for backups: it created a bootable backup tape. Just turn the key, boot from the tape, say go, and your machine was recovered completely. The closest I have seen to that recently is Clonezilla.

  27. Step 1 get a real backup by silas_moeckel · · Score: 1

    Making a mirror every now and again is not a backup strategy. This is the canned "RAID is NOT a backup and never will be" advice. For a single laptop, something like Backblaze is probably a better bet.

    --
    No sir I dont like it.
  28. Backup solution by Anonymous Coward · · Score: 0

    Yes, there is a product named ActiveImage Protector from Japan; an English version is available from www.activeimage.net. It creates a log of changed blocks between backups (incremental backups). The base backup is a full disk or volume image (its smart sector feature backs up only the sectors in use), and each subsequent incremental backup only backs up changed blocks. Very good compression and very fast backup. There are versions for Windows and Linux. You can mount backup images to restore files selectively, or boot from a recovery CD to restore boot volumes and disks.

  29. 15 minutes is a fast 1-terabyte sync by Anonymous Coward · · Score: 0

    You are unlikely to find anything faster than rsync, or as reliable. 15 minutes to synchronise your backup with your live data is an impressively short time. Did you ever look at the "speedup" figure that rsync reports in its final message? It is the ratio of the total data size to the bytes actually sent and received, i.e. roughly how much faster the sync is than transferring the complete dataset over the same bandwidth. USB3 has an absolute maximum of 4.8 Gbit/s, which is 600 MBytes/s (or about 2 TBytes per hour), but you are unlikely to get better than 1 Gbit/s (about 0.45 TBytes per hour) of sustained disk data transfer.
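    The arithmetic above, as a small Python sketch using decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the helper names are made up.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to MBytes/s (8 bits per byte)."""
    return gbit_per_s * 1e9 / 8 / 1e6


def tbyte_per_hour(mbyte_per_s: float) -> float:
    """Sustained MBytes/s expressed as TBytes per hour."""
    return mbyte_per_s * 1e6 * 3600 / 1e12


def hours_to_copy(total_bytes: float, mbyte_per_s: float) -> float:
    """Time for a full, non-incremental copy at a sustained rate."""
    return total_bytes / (mbyte_per_s * 1e6) / 3600
```

    For example, `hours_to_copy(1e12, 125)` comes out at about 2.2 hours for a full 1 TB copy at a sustained 1 Gbit/s, which puts a 15-minute sync in perspective.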

  30. Re:Will your backup have you backup up and running by Anonymous Coward · · Score: 0

    I do wish Linux had something like mksysb/sysback.

    Unfortunately, there isn't much in the way of bare-metal restorable backup utilities for Linux, unless I reboot into another OS and run an imaging program. Even wbadmin.exe on Windows can create images for me that I restore by booting from recent Windows OS media, running wbadmin to restore, walking off, and rebooting into a functioning OS.

    If I were to recommend a backup utility for Linux, I'd probably just worry about syncing off application data and documents, since it takes -far- less time to reinstall the OS and apps than to try to piece together a working Linux box from backups.

  31. Upgrade your rsync! by phoenix_rizzen · · Score: 4, Informative

    You're holding it wrong. ;)

    rsync 2.x was horribly slow as it would scan the entire source looking for changed files, build a list of files, and then (once the initial scan was complete) would start to transfer data to the destination.

    rsync 3.x starts building the list of changed files, and starts transferring data right away.

    Unless you are changing a tonne of files between each rsync, it shouldn't take more than a few minutes using rsync 3.x to backup a 1 TB drive. Unless it's an uber-slow PoS drive, of course. :)

    We use rsync to backup all our remote school servers. Very rarely does a single server backup take more than 30 minutes, and that's for 4 TB of storage using 500 GB drives (generally only a few GB of changed data). And that's across horrible ADSL links with only 0.768 Mbps upload speeds!

    Going disk-to-disk should be even faster.

    1. Re:Upgrade your rsync! by Skater · · Score: 1

      I was hoping someone would say something like this. I do the exact same thing with my "media" drive, a two-terabyte drive with our pictures, home videos, MP3s, etc. on it. I have another external 2 TB drive. It really doesn't take that long for the rsync to work: I start it and it finishes a couple of minutes later, even when I haven't done it for 30 or 60 days. I've never sat there and timed it, because I usually start it and go do something else, but I don't think it takes 15 minutes on average; maybe 5 or 10. It sounds like something else is wrong in his setup. I'm curious, though, so I'll add start/end timestamps to the script I use (which just calls rsync with the correct options, so I don't accidentally make a second copy of the media drive under the original backup copy, or something like that) so I can monitor it.
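      A wrapper along those lines could look like this Python sketch (not the poster's actual script; the paths in the usage line are hypothetical). It also encodes the rsync trailing-slash rule that prevents the nested second copy: with a trailing slash, rsync copies the *contents* of the source directory into the destination.

```python
import subprocess
import time
from datetime import datetime


def rsync_args(source: str, dest: str) -> list[str]:
    # "src/" copies the contents of src into dest; "src" would create dest/src,
    # i.e. a second, nested copy of the drive inside the backup.
    return ["rsync", "-a", source.rstrip("/") + "/", dest]


def timed_run(cmd: list[str], log=print) -> float:
    """Run `cmd`, log start and end timestamps, and return the elapsed seconds."""
    log(f"start {datetime.now().isoformat(timespec='seconds')}: {' '.join(cmd)}")
    started = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - started
    log(f"end {datetime.now().isoformat(timespec='seconds')} ({elapsed:.1f} s)")
    return elapsed
```

      Usage would be something like `timed_run(rsync_args("/media/drive", "/mnt/backup"))`.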

    2. Re:Upgrade your rsync! by Mryll · · Score: 1

      Yeah, I use rsync in Cygwin to distribute from a Win7 desktop machine to both an external My Book-style drive and a Linux machine with a RAID array. When using the size/time/date matching method to a non-Windows machine, it is important to use the --modify-window=1 flag recommended in the man page; otherwise the timestamps may fail to match, resulting in more data transfer. I run it nightly from a scheduled task, and it does not take very long to complete.
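      To see why the flag matters, here is a toy Python model of rsync's default "quick check" (transfer a file unless its size and modification time both match), with mtimes as whole seconds. It follows the man page's description of --modify-window, where two timestamps count as equal if they differ by no more than the window; the man page's typical case is FAT's 2-second timestamp resolution. The function names are made up, and real rsync has other modes (e.g. --checksum, --size-only) not modeled here.

```python
def same_mtime(a: int, b: int, modify_window: int = 0) -> bool:
    """Timestamps (whole seconds) are equal if they differ by <= modify_window."""
    return abs(a - b) <= modify_window


def needs_transfer(src: tuple[int, int], dst: tuple[int, int], modify_window: int = 0) -> bool:
    """Toy quick check: transfer unless size and mtime both match.

    Each argument is a (size_in_bytes, mtime_in_whole_seconds) pair.
    """
    (src_size, src_mtime), (dst_size, dst_mtime) = src, dst
    return src_size != dst_size or not same_mtime(src_mtime, dst_mtime, modify_window)
```

      A timestamp rounded by one second on the destination forces a transfer with the default window of 0, but not with `modify_window=1`.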

    3. Re:Upgrade your rsync! by Mryll · · Score: 1

      (Other than the initial run of course)

    4. Re:Upgrade your rsync! by Anonymous Coward · · Score: 0

      rsync 3.x starts building the list of changed files, and starts transferring data right away.

      That doesn't sound good. To simplify his requirements: his drive contains n files and he changes 1 file. If your solution's running time is O(n) he's not interested. He wants O(1) in this case.

    5. Re:Upgrade your rsync! by Anonymous Coward · · Score: 0

      He'll have the same problem with rsync 3.x. Notice that he's not complaining about the time to transfer the data. He's talking about the time needed to scan the drive to determine which files need to be transferred. It takes time to get the size and timestamp of each file so rsync can determine whether it needs to be transferred. That will always need to be done, no matter which rsync version is used.

    6. Re:Upgrade your rsync! by Anonymous Coward · · Score: 0

      > rsync 3.x starts building the list of changed files, and starts transferring data right away.

      Thanks for posting that! I used to use rsync to back up a drive that stored backups of 24 Windows desktops made with BackupPC (http://backuppc.sourceforge.net/). BackupPC worked great, but created a tremendous number of files. It took rsync more than two weeks to finish the initial scan and start transmitting. I'm glad to see that problem has finally been fixed.

  32. ZFS - incremental/snapshot? by Roskolnikov · · Score: 4, Informative

    Two pools: internalPool and externalPool.

    Use ZFS send and receive to migrate your data from internal to external. You can send the whole file system, or send incrementals if you keep a couple of snapshots locally on your internal disk; this can get excessive if you have a lot of changes or want to keep history for a long time.

    http://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html

    Of course, you will need a system that can use ZFS; there are more options for that than for Time Machine. It's block-level and it's fast, and it doesn't depend on just one device: you can have multiple devices (I like to keep some of my data at work. Why? My backup solution is in the same house, which would burn if it burned...)
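    A hedged Python sketch that only builds the command lines for this workflow, so they can be reviewed (or pasted) before running. The dataset and snapshot names are hypothetical; `zfs snapshot`, `zfs send -i` and `zfs receive` are the standard subcommands covered in the Oracle documentation linked above.

```python
import shlex
from typing import Optional

# Hypothetical dataset names: internalPool/home holds the live data,
# externalPool/home receives the copies.
SRC = "internalPool/home"
DST = "externalPool/home"


def snap(dataset: str, name: str) -> str:
    return shlex.quote(f"{dataset}@{name}")


def snapshot_cmd(dataset: str, name: str) -> str:
    return f"zfs snapshot {snap(dataset, name)}"


def send_cmd(dataset: str, new: str, dest: str, previous: Optional[str] = None) -> str:
    """Full stream when there is no previous snapshot, incremental (-i) otherwise."""
    incremental = f"-i {snap(dataset, previous)} " if previous else ""
    return f"zfs send {incremental}{snap(dataset, new)} | zfs receive {shlex.quote(dest)}"


def destroy_cmd(dataset: str, name: str) -> str:
    # Older local snapshots can go once a newer one has been received;
    # keep the latest one as the base for the next incremental.
    return f"zfs destroy {snap(dataset, name)}"
```

    If the destination has been changed since the last receive, the incremental receive will refuse; `zfs receive -F` rolls it back first. Note that a plain shell pipeline only reports the exit status of `zfs receive`.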

    --
    Unix, an obscure operating system developed by bored researchers in an attempt to get a better game playing experience.
    1. Re:ZFS - incremental/snapshot? by UnknownSoldier · · Score: 1

      Very nice suggestion about using two pools !

      >of course you will need a system that can use ZFS

      Actually, I was surprised how well "ZFS on Linux" works if you don't have a FreeNAS/BSD system.
      * http://zfsonlinux.org/

      It's too bad the ZFS on Linux documentation is total garbage, but at least it was relatively painless to get it working on a spare Ubuntu box. IIRC, the ZFS on Linux setup was:

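      sudo apt-get update
      sudo apt-get install uuid-dev

      wget http://archive.zfsonlinux.org/downloads/zfsonlinux/spl/spl-0.6.1.tar.gz
      wget http://archive.zfsonlinux.org/downloads/zfsonlinux/zfs/zfs-0.6.1.tar.gz
      tar xzf spl-0.6.1.tar.gz
      tar xzf zfs-0.6.1.tar.gz

      # SPL (Solaris Porting Layer) has to be built and installed before ZFS
      cd spl-0.6.1
      ./configure
      make
      sudo make install
      cd ..

      # then the same steps for ZFS itself
      cd zfs-0.6.1
      ./configure
      make
      sudo make install
      cd ..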

    2. Re:ZFS - incremental/snapshot? by steak · · Score: 1

      I declare you the winner.

    3. Re:ZFS - incremental/snapshot? by Anonymous Coward · · Score: 0

      Alternatively, you could add the ZoL guys' PPA and use only apt-get:

      sudo add-apt-repository ppa:zfs-native/stable
      sudo apt-get update
      sudo apt-get install ubuntu-zfs

  33. DRBR? by Anonymous Coward · · Score: 0

    Set up DRBD from the initial configuration, replicating to the secondary drive. I haven't used it for a while, but IIRC changed sectors are flagged for replication, and when the secondary device is brought online, they get replicated. I know DRBD was designed for machine-to-machine block replication, but it might be workable on the same machine. It might be a good feature request if it doesn't work that way (but then that wouldn't help you right now).
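    The mechanism described here (record which blocks changed while the secondary is away, then copy only those when it returns) is essentially the asynchronous RAID-1 the OP asked about. A toy Python model follows; it only illustrates the idea under simplifying assumptions and is not DRBD's implementation, although DRBD does keep a bitmap of out-of-sync blocks for resynchronization.

```python
class AsyncMirror:
    """Toy asynchronous RAID-1: writes always hit the primary; while the
    secondary is offline, only the indices of dirty blocks are recorded,
    and they are replayed when the secondary comes back."""

    def __init__(self, nblocks: int):
        self.primary = [b""] * nblocks
        self.secondary = [b""] * nblocks
        self.dirty: set[int] = set()
        self.secondary_online = True

    def write(self, index: int, data: bytes) -> None:
        self.primary[index] = data
        if self.secondary_online:
            self.secondary[index] = data
        else:
            self.dirty.add(index)  # remember *where*, not the data itself

    def disconnect(self) -> None:
        self.secondary_online = False

    def reconnect(self) -> int:
        """Copy only the dirty blocks from the primary; return how many were resynced."""
        for index in sorted(self.dirty):
            self.secondary[index] = self.primary[index]
        count = len(self.dirty)
        self.dirty.clear()
        self.secondary_online = True
        return count
```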

  34. lsyncd + some queue by Anonymous Coward · · Score: 0

    Lsyncd plus some kind of queue to sync when the destination device is available.

    1. Re: lsyncd + some queue by Anonymous Coward · · Score: 0

      Lsyncd is indeed nice: it uses inotify to be informed about changes by the kernel, instead of periodically having to scan the entire filesystem. On top of that, it works transparently in the background, just using ssh.

  35. rsync is fast for me by Anonymous Coward · · Score: 0

    I currently back up 2.5 TB of data using rsync. It takes about 2-3 minutes to determine the changes and then whatever time the actual copying takes. Post your actual rsync command; maybe you are doing something strange that isn't necessary.

    1. Re:rsync is fast for me by ormembar · · Score: 1
      Last rsync (version 3.0.9) was:
      $ rsync -av --delete /home/. /auto/passport/home/. | tee -a ~/backup.log
      ...
      sent 1001882570 bytes received 106527 bytes 1775002.83 bytes/sec
      total size is 527398084971 speedup is 526.3

      It took 10 minutes to scan the 500 GB partition (ext4). From the other posts, I guess the time is due to the number of files rsync has to scan.
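      The numbers in that log can be checked with a small Python sketch. The regexes are written for the summary format shown above, and they strip thousands separators in case an rsync version prints them. rsync's speedup is the total size divided by the bytes sent plus received, and the average rate lets you back out the run time.

```python
import re

SENT_RE = re.compile(r"sent ([\d,]+) bytes\s+received ([\d,]+) bytes\s+([\d,.]+) bytes/sec")
TOTAL_RE = re.compile(r"total size is ([\d,]+)\s+speedup is ([\d,.]+)")


def _num(text: str) -> float:
    return float(text.replace(",", ""))


def parse_summary(log_text: str) -> dict[str, float]:
    sent, received, rate = map(_num, SENT_RE.search(log_text).groups())
    total, speedup = map(_num, TOTAL_RE.search(log_text).groups())
    return {
        "sent": sent,
        "received": received,
        "total": total,
        "reported_speedup": speedup,
        # rsync's speedup: total size / bytes actually sent + received.
        "computed_speedup": total / (sent + received),
        # Wall-clock time of the run implied by the average rate.
        "elapsed_s": (sent + received) / rate,
    }
```

      On the log above this gives a speedup of about 526.35 and a run time of roughly 564 seconds (about 9.4 minutes). Only about 1 GB of the 527 GB was sent, so most of that time went into scanning.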

  36. git-annex by Anonymous Coward · · Score: 0

    Maybe git-annex, with the assistant running (in direct mode)? I'd like to hear how well that works, though. It would create a git directory to hold metadata about the files, and it would be constrained to your home directory.

  37. Some alternatives (ubuntu) by brodock · · Score: 1

    Ubuntu's documentation lists some alternatives (https://help.ubuntu.com/community/BackupYourSystem). One that is not listed there, and that I used many years ago, is Unison. I found it faster and better than rsync; it also does binary diffs, so big files where only some "metadata" changed transfer faster.

  38. Rebit by scsirob · · Score: 1

    Not free, but reasonably priced and worth every penny. Rebit (rebit.com) keeps track of all changes and sends new versions of a file to a local hard disk, a network share, or both. In case of a crash, you can recover from a boot CD-ROM, and I've used that to transfer my files to a new computer too. They have cloud-enabled versions too.

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
  39. Uhh... Why the rush? by moorley · · Score: 1

    I usually hate making posts where I question the questioner rather than provide an answer, but with 1 TB of data you should put on your patience cap. It will take as long as it takes.

    To break down what you are wanting:
    I want a backup based on a journaling-file-system sort of thing that works incrementally, slowing down every disk operation by a few milliseconds so I can shave 15 minutes off a backup procedure, but I still have to send the same data. I don't think that would be very wise. The best existing method is to mirror a volume, but you're still experiencing the same "15 minutes" of delay.

    The best thing you can have is a "fire and forget" procedure where you can walk away and let it run.

    locate (based on updatedb) does not capture or sort on file modification dates, so you are going to be left with a recursive file system search no matter what.

    You could use find to generate a list of files that have been modified since a certain date and then feed that to tar. That way you can pre-generate an incremental backup in a file that you can copy over. Then let whatever backup solution you like make a full backup from time to time. You can set up a script that runs a few times during your work day to generate the file, so that at least every 24 hours there is a tar file you can copy over when you get a chance.
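    Roughly what `find DIR -type f -newer STAMPFILE` feeding tar would do, as a Python sketch (function names made up). It still walks the whole tree, but in the background, so the copy later is just one archive. The archive should be written outside the tree being scanned.

```python
import os
import tarfile


def modified_since(root: str, since: float) -> list[str]:
    """Files under `root` whose mtime is newer than `since` (a Unix timestamp)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime > since:
                    hits.append(path)
            except FileNotFoundError:
                pass  # deleted while we were walking
    return sorted(hits)


def write_incremental(root: str, since: float, archive: str) -> int:
    """Pack the changed files into a gzipped tar, with paths relative to `root`."""
    files = modified_since(root, since)
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            tar.add(path, arcname=os.path.relpath(path, root), recursive=False)
    return len(files)
```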

    Good luck!

    --
    "Don't fear death... fear not living..." -me :)
  40. mechanical drives are slow by Anonymous Coward · · Score: 0

    I know this post won't help much, but I just want to say that mechanical laptop drives are really slow. I haven't tried using a laptop with a solid-state drive, though. I just benchmarked an old 900 MHz Celeron netbook: the average read transfer rate of its drive was about 33 megabytes per second, vs. 60 megabytes per second for my desktop SATA drive.

    Come to think of it, my external USB drive has an average transfer rate of about 30 MB/s too. Copying big files takes a long time. I remember spending 40 minutes or more copying a 20 GB MMORPG so that I could move it from one computer to another.

  41. Btrfs send/receive by jandar · · Score: 4, Informative

    Btrfs send/receive should do the trick. After first cloning the disk, create a reference snapshot on the laptop before every subsequent transfer, and delete the previous one after the transfer.

    # btrfs send only accepts read-only snapshots, hence -r
    $ btrfs subvolume snapshot -r /mnt/data/orig /mnt/data/backup43
    $ btrfs send -p /mnt/data/backup42 /mnt/data/backup43 | btrfs receive /mnt/backupdata
    $ btrfs subvolume delete /mnt/data/backup42

    I haven't tried this myself, so the necessary disclaimer: this may eat your disk or kill a kitten ;-)

    1. Re:Btrfs send/receive by bill_mcgonigle · · Score: 1

      Yeah, I was thinking of the ZFS equivalent, but either should work for exactly what the OP is asking for.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  42. Rsnapshot by buttfuckinpimpnugget · · Score: 0

    rsync + hard links; works just like Time Machine. http://www.rsnapshot.org/
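    The hard-link trick can be sketched in a few lines of Python (function names made up; rsnapshot itself drives rsync, and rsync exposes the same idea through its --link-dest option). Each snapshot looks like a full copy, but files unchanged since the previous snapshot are hard links into it, so they take no extra space.

```python
import os
import shutil
from typing import Optional, Tuple


def unchanged(src: str, old: str) -> bool:
    """rsync-style quick check: same size and same whole-second mtime."""
    a, b = os.stat(src), os.stat(old)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot_with_links(source: str, previous: Optional[str], target: str) -> Tuple[int, int]:
    """Create `target` as a full-looking copy of `source`.

    Files unchanged since the `previous` snapshot are hard-linked to it;
    everything else is copied. Returns (linked, copied).
    """
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel_dir), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target, rel_dir, name)
            old = os.path.join(previous, rel_dir, name) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                os.link(old, dst)  # no new data written
                linked += 1
            else:
                shutil.copy2(src, dst)  # copy2 keeps the mtime for the next quick check
                copied += 1
    return linked, copied
```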

  43. Btrfs send & receive by flux · · Score: 2

    Btrfs has tools for doing this. It also comes with find-new, which finds exactly which files have changed between snapshots, and does so almost instantaneously.

    Though Btrfs might not be the solution for ensuring data integrity at this point... But setting up hourly snapshots of your drives can be quite nice when you accidentally destroy something you've created after the last backup.

    1. Re:Btrfs send & receive by ssam · · Score: 1

      >Though Btrfs might not be the solution for ensuring data integrity at this point.

      It's certainly close, though. It also has a bunch of data integrity features (like checksumming) that make it far safer than ext (and most other filesystems apart from ZFS). If you have slightly dodgy hardware, btrfs will let you know, whereas your data may silently corrupt on ext4.

    2. Re:Btrfs send & receive by flux · · Score: 1

      That is quite correct, but not a week goes by without someone joining #btrfs and asking how to recover. And I personally restored my / just a week ago. Technically I shouldn't have had to, because I was using btrfs raid10 and only one of the drives had had issues (it was the old drive I was putting back; after btrfs scrub and btrfs balance, somehow all the drives had errors). By the way, the system did not survive the death of the drive either; I had to reboot it without the drive, whereas Linux md survives these things very nicely.

      But I'm hopeful about the future. Btrfs IS a really nice thing to have, at least if you have an SSD and fragmentation isn't an issue for you (say, for virtual machine images). I'm planning to put it on some spinning media as well; we'll see how that works out.

    3. Re:Btrfs send & receive by nullchar · · Score: 1

      Do you lose any functionality by using btrfs on top of an md device? (vs btrfs' own raid)

    4. Re:Btrfs send & receive by flux · · Score: 1

      I lose the btrfs functionality of repairing bad blocks. I'm pretty sure md doesn't notice an error unless it comes from the backing device in the form of an I/O error. So if btrfs gets a bad block from md, there's not much to be done, except to guess which of the mirrors has the good copy and resync manually (if btrfs manages multiple devices, it can just do that automatically). I don't know how that would even be done. But at least I would know which file has the error, and I can restore it from the backups. Metadata is duplicated, so that shouldn't be an issue.

      I also lose the ability to change the data layout with ease; in btrfs, going from raid10 to raid1 to raid0, or adding new devices, is just a matter of running btrfs balance (and btrfs device add). Md doesn't even support adjusting the number of devices in a raid10 array, nor conversions from raid1 to raid10 (that can be done manually, but you don't have redundancy during the process).

      In future I may lose the forthcoming ability to set redundancy level per subvolume or possibly even per directory/file.

    5. Re:Btrfs send & receive by thingummy · · Score: 1

      I have a recovery question. It is all my fault, but where do you suggest I ask for help? I asked at FedoraForums, but didn't get much help. Thanks.

    6. Re:Btrfs send & receive by nullchar · · Score: 1

      Thanks for the informative reply. I've been a long time md user, but not yet experimented with btrfs - I'll definitely try its native raid.

      I'm very excited for the checksum-on-read process to alert on corrupted data.

    7. Re:Btrfs send & receive by flux · · Score: 1

      irc: FreeNode #btrfs

      But be patient; people probably won't respond immediately. Then there is also the mailing list.

  44. BTRFS send/receive snapshot by Anonymous Coward · · Score: 0

    http://lwn.net/Articles/506244/

  45. Re: you backup 1TB from a laptop? by Anonymous Coward · · Score: 1

    Sorry, but something doesn't sound right here. Even with the amount of data you're talking about, it sounds like something else is slowing it down, namely the system/kernel configuration.

    Do some I/O monitoring if you haven't. You may have a task or process priority problem slowing things down.

    Just a thought...

  46. ZFS with -o copies=2 (or 3..) by Anonymous Coward · · Score: 0

    It stores the requested number of copies (2, 3, ...) of each file block. It's a kind of single-disk RAID, or as close as you can reasonably get. Thanks to checksums, it will also figure out which block(s) are corrupted.

  47. iFolder by maz2331 · · Score: 1

    I use iFolder for this. It has clients for Windows, Linux, and Mac platforms, and works reasonably well. The server was a bit of a pain to get set up though. It used to be a Novell product but has spun off as its own open source project. You can check it out at ifolder.com

  48. DRBD by david-the-go · · Score: 1

    It is relatively straightforward to use DRBD over the loopback address 127.0.0.1 with protocol A to set exactly this up. I'd link to the blog post that demonstrates how I did this, but I don't want to get slashdotted. (Googling "drbd-for-ssdusb nz" should show you how to do this, if you care enough to google for it.)

  49. Ext. disk speed issue by Anonymous Coward · · Score: 0

    It sounds more like you have a slow USB 2.0 external drive or something similar. If your laptop offers eSATA, go with that, as it is faster than USB 2.0. Even if you have USB 3.0, eSATA can still be faster, assuming your motherboard supports SATA at 6.0 Gbps.

  50. incron by ze_jua · · Score: 1

    It performs actions on file changes. It can also build a list of updated/created/deleted files on the fly.

    Then you just have to use this list to sync only the changed files.

  51. DRBD by Amorphous · · Score: 1

    Put your filesystem on a DRBD device?

    You'll get a consistent clone every time you leave your laptop connected to your home network long enough for it to sync the changed blocks.

  52. XFS - incremental backups with xfsdump by Anonymous Coward · · Score: 0

    Use XFS with xfsdump: do a level 0 dump, then subsequent level 1 and level 2 dumps. Each incremental level only dumps files changed since the last lower-level dump, so it'll finish in seconds.
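    A toy Python model of the level semantics, assuming the classic dump rule that a level-N dump includes the files modified since the most recent dump at any lower level (level 0 takes everything). It does not model how xfsdump actually finds those files; the class and method names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DumpHistory:
    """Track when each dump level (0-9) last ran."""
    last_at_level: dict[int, float] = field(default_factory=dict)

    def base_time(self, level: int) -> float:
        """Reference time for a level-N dump: the most recent dump at any lower level."""
        earlier = [t for lvl, t in self.last_at_level.items() if lvl < level]
        return max(earlier, default=float("-inf"))

    def run(self, level: int, now: float, mtimes: dict[str, float]) -> list[str]:
        """Return the files a level-`level` dump taken at time `now` would include."""
        since = self.base_time(level)
        selected = sorted(path for path, mtime in mtimes.items() if mtime > since)
        self.last_at_level[level] = now
        return selected
```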

  53. Re:you backup 1TB from a laptop? by viperidaenz · · Score: 2

    Do you modify all your research work from the last 20 years? If not, exclude it from backup, since you already have it backed up and are not changing it.

  54. IBM's TSM by jabuzz · · Score: 1

    A real backup product, and since 6.3 it does journal-based backups for ext2, ext3, ext4, XFS, ReiserFS, JFS, VxFS, and NSS.

    The other option I have seen (surprisingly for GPFS, as TSM does not do journal-based backups for GPFS even though both are IBM products) is to register with DMAPI (this would only work for XFS, I think) and then use that to capture all activity on the file system. You could then use that to generate your list of files to back up. Admittedly, this is going to require you to get your hands dirty and do some coding. I am also not sure what state DMAPI support in XFS is in.

  55. Come on, this is 101 stuff by Rob_Bryerton · · Score: 1

    Come on, this is 101 stuff, though very entertaining to see the usual geek overkill suggestions of RAID mirrors, ZFS pools, and other fun stuff. We sure love our overkill, huh? (I'm guilty too)

    Anyway, the first thing I would do is ask myself "why am I watching a backup run?" Whether it takes 5 minutes or 50 minutes is irrelevant; who watches their backups run? You log the job output and email the results, or use whatever your favorite technique is.

    Second, set up a daily cron job to launch a shell script a little before the earliest time you expect to get home. In this script, put a simple loop that checks whether you're at home (more below); if you're not, the loop sleeps for 10 minutes (or whatever) and checks again. Set an upper bound on the total sleep time (6 hours or so); if you're out really late or all night, the script exits cleanly without performing the backup and tries again the next day. It can tell that you are at home and ready to launch the rsync from one of several conditions: the presence of an NFS mount, an IP address in a certain range, your USB drive being mounted (perhaps with a file named ".backup_device" on it), whatever. When the condition is true, break out of the loop, and the next line runs the rsync command. Easy!
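    The wait-then-back-up loop described above can be sketched in Python instead of shell, which lets the polling logic be tested on its own. The marker-file check is one of the conditions the comment lists. The paths, the marker's location and the rsync flags (-a --delete) are assumptions to adapt.

```python
"""Wait for the backup disk to appear, then run rsync once.

Hypothetical setup: the external drive mounts at /media/backup and carries
a marker file named .backup_device. Adjust all paths to your own system.
"""
import os
import subprocess
import time

MARKER = "/media/backup/.backup_device"  # hypothetical marker file
SOURCE = "/home/me/"                     # hypothetical source directory
DEST = "/media/backup/home/"             # hypothetical destination
POLL_SECONDS = 10 * 60                   # check every 10 minutes
MAX_WAIT_SECONDS = 6 * 60 * 60           # give up after 6 hours of sleeping


def wait_for(condition, poll: float, max_wait: float, sleep=time.sleep) -> bool:
    """Poll `condition` until it is true or the total sleep budget is spent."""
    waited = 0.0
    while not condition():
        if waited >= max_wait:
            return False
        sleep(poll)
        waited += poll
    return True


def main() -> int:
    if not wait_for(lambda: os.path.exists(MARKER), POLL_SECONDS, MAX_WAIT_SECONDS):
        print("backup disk never appeared; skipping until tomorrow")
        return 0  # exit cleanly; cron starts us again the next day
    result = subprocess.run(["rsync", "-a", "--delete", SOURCE, DEST])
    return result.returncode
```

    Have the cron job call main() (for example through a one-line wrapper) and mail its output, as the comment suggests.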

    One of the most important aspects of a backup job is that it is scheduled and runs automatically. If you have to rely on a person to manually start it, then sooner or later (like, within a week) you'll start putting it off or simply forget to do it.

  56. What about owncloud? by Anonymous Coward · · Score: 0

    It effectively syncs directories (like /home) and versions files too. Put RAID/DRBD under it and you are set.

  57. ZFS by Anonymous Coward · · Score: 0

    ZFS snapshots and incremental send/receive do exactly what the submitter wants. See Oracle's documentation for details on how the commands work:

    http://docs.oracle.com/cd/E19253-01/819-5461/gfwqb/index.html

    You can use ZFS on Darwin, FreeBSD, Illumos, Linux, Mac OS X and Solaris.
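    Here is a minimal Python sketch of one incremental cycle. It assumes a source dataset on the laptop and a pool on the external disk; names such as tank/home and backup/home are hypothetical. It takes a new snapshot and pipes `zfs send -i old new` into `zfs receive`. The stream carries only the blocks that changed between the two snapshots, so nothing has to rescan the directory tree.

```python
"""Build and run one ZFS incremental snapshot + send/receive cycle.

Dataset names are hypothetical; only standard zfs subcommands are used.
"""
import subprocess
from typing import List, Tuple


def incremental_commands(src: str, dst: str, prev: str,
                         new: str) -> Tuple[List[str], List[str], List[str]]:
    snapshot = ["zfs", "snapshot", f"{src}@{new}"]
    send = ["zfs", "send", "-i", f"{src}@{prev}", f"{src}@{new}"]
    receive = ["zfs", "receive", dst]
    return snapshot, send, receive


def run_incremental(src: str, dst: str, prev: str, new: str) -> None:
    snapshot, send, receive = incremental_commands(src, dst, prev, new)
    subprocess.run(snapshot, check=True)
    # Equivalent of: zfs send -i src@prev src@new | zfs receive dst
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # let the sender see SIGPIPE if the receiver exits early
    receive_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or receive_rc != 0:
        raise RuntimeError(f"zfs send/receive failed ({send_rc}, {receive_rc})")
```

    The very first run needs a full stream (`zfs send tank/home@first | zfs receive backup/home`) before incrementals can build on it. If the backup copy was changed after its last snapshot, `zfs receive -F` may be needed to roll it back first.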

  58. lsync to the rescue! by Anonymous Coward · · Score: 0

    lsyncd (http://code.google.com/p/lsyncd/) is what you are looking for. It watches for changes and copies them over, avoiding a scan of the complete directory tree.
    When it starts up, it uses rsync to make sure both sides are in sync.
    I used it in the past and it worked as advertised. Nowadays I just run rsync overnight once a week, so the long run time is no longer an issue.

  59. or a desktop, the AD servers don't know by raymorris · · Score: 1

    The agency where I work is all Microsoft, all the time. They use Active Directory for everything. After I had been working there a year, security did a scan of the network and found out I was running Linux. I ran Linux for a year in that environment and had no trouble. My Linux desktop played just fine with their AD forest. LibreOffice had no trouble with any of the Microsoft Office documents I needed to handle. I've heard that some version of LibreOffice had some trouble with some feature as implemented by some version of MS Office. I saw no problems, though.

    I did get in trouble for running an unauthorized OS. Now I still work at a command line, but on a Mac whenever I need to be connected to the office network. OS X is a certified Unix.

    1. Re:or a desktop, the AD servers don't know by Anonymous Coward · · Score: 0

      And is OSX an authorized OS where you work?

  60. lsyncd by Anonymous Coward · · Score: 0

    If it's Linux, we use a tool called lsyncd that I'm pretty sure will do what you want. It monitors for file changes using inotify and ships modified files to the remote host via rsync. https://code.google.com/p/lsyncd/

  61. That's a great solution. wish I had mod points by raymorris · · Score: 1

    That's a great solution, and one that actually answers the OP's question, assuming he can do a little shell scripting. I spent YEARS designing a great enterprise grade backup solution using LVM. I'm the maintainer of Linux::LVM, but I hadn't heard of lvmsync before. I'm sure I'll use it sometime.

  62. snapshots by Anonymous Coward · · Score: 0

    If you're using LVM (I hope so), use snapshots.

  63. BackInTime by perles · · Score: 1

    I use BackInTime under similar conditions and it works quickly and flawlessly. It uses diff, rsync and hard links to preserve the history of every snapshot. Check it out: http://backintime.le-web.org/
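    A common way to keep a full history cheaply is the trick behind rsync's --link-dest option. Each new snapshot is a complete directory tree, but files unchanged since the previous snapshot are hard links to the existing copy, so only changed files take new space. The Python sketch below is a simplified illustration of that idea, not BackInTime's code. It compares size and mtime only, follows symlinks instead of preserving them, and does no error handling.

```python
"""Hard-link snapshots: unchanged files share storage with the previous snapshot.

Simplified sketch of the rsync --link-dest idea, not any tool's actual code.
"""
import os
import shutil
from typing import Dict, Optional


def snapshot(source: str, new_snap: str, prev_snap: Optional[str]) -> Dict[str, int]:
    stats = {"linked": 0, "copied": 0}
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(new_snap, rel_dir), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(new_snap, rel)
            old = os.path.join(prev_snap, rel) if prev_snap else None
            st = os.stat(src)
            if old and os.path.isfile(old):
                ost = os.stat(old)
                if ost.st_size == st.st_size and int(ost.st_mtime) == int(st.st_mtime):
                    os.link(old, dst)  # unchanged: new name for the same data
                    stats["linked"] += 1
                    continue
            shutil.copy2(src, dst)  # new or changed: real copy, mtime preserved
            stats["copied"] += 1
    return stats
```

    Usage: `snapshot("/home/me", "/media/backup/2013-06-02", "/media/backup/2013-06-01")` (paths hypothetical). Each dated directory then looks like a full copy, but only files that changed cost extra space.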