Slashdot Mirror


Linux Backups Made Easy

mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic "snapshot-style" backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.

23 of 243 comments

  1. First Mirror by doublem · · Score: 4, Informative

    I had the chance to be the first post, but decided to mirror the site first.

    My mirror is here

    --
    "Live Free or Die." Don't like it? Then keep out of the USA
  2. Re:'man dump' by GigsVT · · Score: 2, Informative

    Read the fucking article, that's the point. He uses hard links to make a second copy of the backed up directory, exploiting the fact that rsync always unlinks before changing a file, thereby effectively doing incremental backups without wasting hard disk space.
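
    A minimal sketch of the mechanism described above, not taken from the article: it imitates rsync's default behaviour (write a new file, then rename it over the old one) with plain `mv`, and assumes GNU cp for `-al`. The temp directory and file names are made up for the demo.

```shell
# Demo: a hard-linked snapshot keeps the old version when a file is replaced.
# Everything lives in a throwaway temp directory (illustrative only).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/backup.0"
echo "version 1" > "$demo_dir/backup.0/notes.txt"
echo "unchanged" > "$demo_dir/backup.0/readme.txt"

# Snapshot: GNU cp -al recreates the tree with hard links, copying no file data.
cp -al "$demo_dir/backup.0" "$demo_dir/backup.1"

# Imitate what rsync does by default for a changed file: write a new
# temporary file, then rename it over the old name. The old inode is untouched.
echo "version 2" > "$demo_dir/backup.0/.notes.txt.tmp"
mv "$demo_dir/backup.0/.notes.txt.tmp" "$demo_dir/backup.0/notes.txt"

cat "$demo_dir/backup.1/notes.txt"   # prints "version 1"
```

    The unchanged readme.txt is still one inode shared by both directories, so it costs no extra space. Editing a file in place (rather than replacing it) would change both snapshots at once.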

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  3. So... by Squeezer · · Score: 1, Informative

    Slashdot is now a reference for tutorials? Ever tried www.tldp.org or www.linuxtoday.com? (They post links to tutorials.)

    --
    Does the name Pavlov ring a bell?
  4. Because Linus says dump isn't reliable. by glrotate · · Score: 5, Informative
    1. Re:Because Linus says dump isn't reliable. by zrodney · · Score: 3, Informative

      he says right there in the linked article that
      dump can't reliably back up the filesystem
      because of the kernel filesystem caching, and that
      future kernel development is headed further in that
      direction, so you might as well not depend on dump.

      seems pretty reasonable to me, go ahead and use
      dump if you like though

  5. Re:'man dump' by TarpaKungs · · Score: 2, Informative

    Hi

    rsync --backup-dir ...

    2 years ago I wrote a script to do pretty much what the linked product does, i.e. maintain a duplicate set of data areas on another machine via rsync.

    I use the --backup-dir option to relocate copies of the files which the current rsync run would otherwise delete or modify.

    With a bit of rotation, we can have users helping themselves to a full view of their
    home directory as of last night and also be able to restore files effectively from each day of the week going back 7 days in our case.
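
    Since the script itself was never posted, here is only a rough sketch of how such a rotation could look. The `current/` and `changed/<weekday>` layout and the function names are hypothetical; `--backup` and `--backup-dir` are the real rsync options.

```shell
# Sketch of a nightly mirror that moves replaced or deleted files into a
# per-weekday directory. The directory layout is an assumption for illustration.

# Print the directory that receives today's old versions (1 = Monday ... 7 = Sunday).
changed_dir_for_today() {
    printf '%s/changed/%s\n' "$1" "$(date +%u)"
}

# nightly_mirror SRC DEST
nightly_mirror() {
    src=$1
    dest=$2
    # A relative --backup-dir would be resolved inside the destination tree,
    # so insist on an absolute path.
    case $dest in
        /*) ;;
        *) echo "dest must be an absolute path" >&2; return 1 ;;
    esac
    changed=$(changed_dir_for_today "$dest")
    # Reusing a weekday slot: discard what it held a week ago.
    rm -rf "$changed"
    mkdir -p "$dest/current" "$changed"
    # Files that this run would overwrite or delete are moved into $changed.
    rsync -a --delete --backup --backup-dir="$changed" "$src/" "$dest/current/"
}
```

    Something like `nightly_mirror /home /backups/home` from cron would then leave last night's full view in `current/` and up to seven days of replaced files under `changed/`.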

    Sure does cut down on the number of tape restore requests.

    As mentioned, it is incredibly efficient - we deal with about 900GB of data backed up in this way - but rsync actually transfers only about 10-30GB of differences each night.

    Only problem is my script was a crap prototype which is why I'm not letting anyone see it ;-)

    But I do have a design in my head for a more professional effort (will be opensourced) - I might even get enough peace at work to write it one day!

    --
    Why can't women be like Hedy Lamarr - beautiful, talented and inventors of frequency-hopping spread-spectrum techn
  6. Re:'man dump' by GigsVT · · Score: 4, Informative

    It's an expression, it's not particularly abusive.

    rm -rf backup.3
    mv backup.2 backup.3
    mv backup.1 backup.2
    cp -al backup.0 backup.1
    rsync -a --delete source_directory/ backup.0/

    There. That's the script basically. Add more snapshot levels as needed, stick it in cron at whatever interval you need.
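
    The five lines above fail on the first few runs, when the older snapshot directories do not exist yet. A slightly more defensive sketch follows; the function names and the default of four levels are my assumptions, and `cp -al` again assumes GNU cp.

```shell
# rotate_snapshots DIR [LEVELS]
# Shift backup.N to backup.N+1, then hard-link backup.0 into backup.1.
rotate_snapshots() {
    snap_root=$1
    levels=${2:-4}     # snapshots kept: backup.0 .. backup.(LEVELS-1)
    if [ "$levels" -lt 2 ]; then
        echo "need at least 2 levels" >&2
        return 1
    fi
    oldest=$((levels - 1))
    rm -rf "$snap_root/backup.$oldest"
    i=$oldest
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        # Skip levels that have not been created yet (first few runs).
        if [ -d "$snap_root/backup.$prev" ]; then
            mv "$snap_root/backup.$prev" "$snap_root/backup.$i"
        fi
        i=$prev
    done
    if [ -d "$snap_root/backup.0" ]; then
        cp -al "$snap_root/backup.0" "$snap_root/backup.1"
    fi
}

# take_snapshot SRC DIR [LEVELS]: rotate, then bring backup.0 up to date.
take_snapshot() {
    rotate_snapshots "$2" "${3:-4}" || return 1
    mkdir -p "$2/backup.0"
    rsync -a --delete "$1/" "$2/backup.0/"
}
```

    A cron job could call, for example, `take_snapshot /home /backups/home 7` once a night.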

    dump only supports ext2/3. This supports any file system, and retrieving a file from backups is as simple as running "cd" to the directory of the snapshot you need and "cp" to copy the file out.

    I run backups from Linux to IRIX and other UNIXs using gnu rsync and openssh. This little trick is going to be very handy for me. I can't waste my time worrying about which filesystem type the files came from originally.

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  7. Check out glastree by Soylent+Beige · · Score: 3, Informative

    Been using a script called glastree on several production file servers for quite some time now.
    It works just great! At one site I've got about 7 weeks of depth from 3 different servers all
    mirrored via ssh-nfs on one lowly Penti 133. We still spin tapes mind you, but glastree has
    been flawless.

    Been meaning to buy the author a virtual beer for some time now . . .

    http://igmus.org/code/

    From the website:
    'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
    --
    Everyone hates me because I'm paranoid.
    1. Re:Check out glastree by Jeremy+Wohl · · Score: 2, Informative
      Hi, I'm the glastree author.

      Yes, my software does essentially this, wrapped up in a nice utility (though, you get day resolution).

      What we want, of course, is a better replica of plan9's dumpfs, featuring a real filesystem layer and compressed block differences. This is on my TODO list.

      -jeremy

  8. SSH comment needs to be added! by Anonymous Coward · · Score: 3, Informative
    This sounds great, and I would like to thank the author for the article. Only one thing really should be added: the way to do rsync for a backup server is over ssh with a passwordless connection. (see http://www.unixadm.net/howto/rsync-ssh.html with google cache)

    It should probably also be done from the real server to the backup server so that you can not just break one machine and get into all. (if you break into the real machine as root then you should be able to get into the backup machine)

    This allows the backup machine to have only one open port: ssh, which can be tcpwrapped to allow connections only from the machines that it backs up.
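
    As a rough illustration of the push direction, the sketch below wraps a single rsync-over-ssh call. The function name, the `RSYNC` override and the host and path in the usage note are hypothetical; `-a`, `--delete` and `-e ssh` are standard rsync options.

```shell
# push_backup SRC USER@HOST:DEST
# Run on the real server, so the backup box only has to accept inbound ssh.
# Set RSYNC=echo to print the arguments instead of running rsync.
push_backup() {
    ${RSYNC:-rsync} -a --delete -e ssh "$1" "$2"
}
```

    For example: `push_backup /home/ backup@backuphost:/backups/home/`. For unattended runs the ssh key has to be usable without a prompt, either through an agent or by having no passphrase.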

  9. rdiff-backup is easier and more efficient by heydan · · Score: 5, Informative

    The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.
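
    To make the space cost concrete, here is a small demo of the hard-link scheme's behaviour only (it does not run rdiff-backup). It assumes GNU cp, and the 1 MiB "inbox" is a stand-in for a large mailbox.

```shell
# Show that a small append to a big file costs a full extra copy
# under hard-link snapshots.
work=$(mktemp -d)
mkdir "$work/backup.0"
# A 1 MiB stand-in for a large mailbox.
dd if=/dev/urandom of="$work/backup.0/inbox" bs=1024 count=1024 2>/dev/null

cp -al "$work/backup.0" "$work/backup.1"   # snapshot: no extra data stored yet

# A new message arrives; the next sync replaces the file with a new inode.
{ cat "$work/backup.0/inbox"; echo "one new message"; } > "$work/backup.0/inbox.tmp"
mv "$work/backup.0/inbox.tmp" "$work/backup.0/inbox"

# The snapshots no longer share the file, so both full copies occupy disk space.
wc -c "$work/backup.0/inbox" "$work/backup.1/inbox"
```

    A reverse-diff scheme would instead keep the current file plus a few bytes describing how to get back to the older one.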

    http://rdiff-backup.stanford.edu/

    1. Re:rdiff-backup is easier and more efficient by mikerubel · · Score: 2, Informative
      Thanks for mentioning this!

      Rdiff-backup is an excellent utility, and Ben Escoto (its author) and I link to each other. You must realize, though, that the purposes are different. Rdiff-backup is more space efficient for things like text, email, and so on. My rotating snapshot trick is less space-efficient, but much simpler for the average user to understand ("just go into your snapshot directory and copy the old file back into reality"). It works on all kinds of files, and barely touches the CPU (since it isn't doing diffs). I would use rdiff-backup for administrative backups of email, code, and that sort of thing, where text is involved and user restore is not an issue.

      Different tools for different jobs!

      Mike

    2. Re:rdiff-backup is easier and more efficient by heydan · · Score: 2, Informative

      Yes, but rdiff-backup uses librsync to do its work. It benefits from exactly the same algorithm that rsync does. I agree it's very efficient. I'm just saying you don't avoid any of the work of computing diffs by using rsync as opposed to rdiff-backup so that should not be a reason to choose one method over the other.

    3. Re:rdiff-backup is easier and more efficient by sc0rpi0n · · Score: 3, Informative

      I've used rsync for my backups until now, but I've downloaded rdiff-backup 0.9.5 and I love it already!

      New users: use the development version, it's a lot more efficient if you have a lot of small files, because it uses librsync instead of executing rdiff for each file. I've measured a factor of 20 speedup on my devel directory!

  10. Not snapshots by Florian+Weimer · · Score: 5, Informative

    The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.

    At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).

    1. Re:Not snapshots by mikerubel · · Score: 2, Informative
      These are not snapshots in the sense of LVM or NetApp; they do not freeze the whole filesystem at a particular point in time between atomic transactions. This technique is a hack for something like a small-office file server. It helps deal with accidental deletions or overwrites, which seem to account for the majority of restore jobs. Think of it as an easier and more intuitive replacement for tar-to-tape. If you're running a database where every transaction counts, you'll need to spend the money and buy a more reliable system!

      Mike

  11. Re:What I'd really like... by gordon_schumway · · Score: 5, Informative
    Then you should check out LVM. From the LVM HOWTO:
    A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which is an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state, later sections of this document give some examples of this.
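    A hedged sketch of how a backup from such a snapshot might be scripted. The function names, the `-snap` suffix, the 1G snapshot size and the `DRY_RUN` switch are my assumptions, not from the HOWTO; running it for real needs root and an LVM volume group with free space.

```shell
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_from_lvm_snapshot VG LV MOUNTPOINT DEST
backup_from_lvm_snapshot() {
    vg=$1; lv=$2; mnt=$3; dest=$4
    snap="${lv}-snap"
    # Create a copy-on-write snapshot; 1G is the space reserved for blocks
    # that change on the origin volume while the snapshot exists.
    run lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv" || return 1
    run mount -o ro "/dev/$vg/$snap" "$mnt" &&
        run rsync -a --delete "$mnt/" "$dest/"
    status=$?
    # Always clean up, even if the mount or copy failed.
    run umount "$mnt"
    run lvremove -f "/dev/$vg/$snap"
    return $status
}
```

    With `DRY_RUN=1`, a call such as `backup_from_lvm_snapshot vg0 home /mnt/snap /backups/home` just lists the five commands it would run.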
    --

    Ha! I kill me!

  12. Re:not that great. by _Shorty-dammit · · Score: 1, Informative

    Because RAID isn't going to give you a file in the form that it was in 3 days ago; it's only going to safeguard your current data. The point here is to give you a backup as well as access to data as it was yesterday or the day before.

  13. Re:'man dump' by mikerubel · · Score: 2, Informative
    Hi TarpaKungs,

    I was originally using the --backup-dir trick, and you're right, it allows you to back up the same data. The advantage to doing it as described in the article is that you get what appear to be full backups at each increment. This makes it simpler for your users, who can now think of the backup directories as full backups.

    Hope that helps--

    Mike

  14. Re:What I'd really like... by nettdata · · Score: 3, Informative

    A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image.

    We used to do this years ago before any such "options" were provided by drive manufacturers.

    We were doing large Oracle backups, and there were issues with taking too much time to do a backup.

    What we did was to throw some extra drives into the (at the time, software) RAID, so that we had a mirror of what we wanted to backup. At backup time, we'd shut down the Oracle instance, break the mirror, and then re-start the Oracle instance. The whole procedure resulted in less than 2 minutes of downtime for the instance, which was more than acceptable. We'd then take the "broken" mirror, re-mount it under a "temp" mount point, and then take our time backing it up (it usually took about 6-8 hours). Once we were finished backing it up, we'd then re-attach the broken mirrors and re-silver it. This was all done via software RAID, before journalling was available.

    We did this about once a week, and it worked out great.

    --



    $0.02 (CDN)
  15. Re:simple encrypted backup by kcurrie · · Score: 2, Informative

    the only downside is that you need to feed a password:

    Not if you use the ssh-agent, and maybe keychain.
    Before you run that command in a script, put the keychain lines ahead of it:

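    # Start or reuse an ssh-agent holding the key, then load its environment.
    keychain -q /root/.ssh/id_dsa
    . /root/.ssh-agent-box750
    # Double quotes so that $3 is expanded locally before ssh sends the command.
    tar cvzf - "$1" | ssh "$2" "cd '$3' && tar xvzf -"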

    Now the first time you run the command, it will ask you for your key passphrase, but any subsequent runs will work passwordlessly.

    I use a similar script with rsync and it works great. Set up a cron job to automatically do the backup, and once after the box boots start a manual backup (thus loading the key), and it'll work automatically from there.

    Keychain can be found here: http://www.gentoo.org/projects/keychain.html

    --
    -- I speak only for myself.
  16. Re:I use this method at work by Cool+E · · Score: 2, Informative

    you might want to take a look at this for doing backups via rsync over ssh.

  17. rsync-backup - a similar approach by wstearns · · Score: 2, Informative

    I have a similar script called rsync-backup. This one does automatic daily snapshots, works over ssh, and uses rsync and hardlinks (to save space), chroot, and an ssh forced command for security.
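
    The rsync-backup script itself is not shown here, so the following is only a generic sketch of what an ssh forced command can check. `SSH_ORIGINAL_COMMAND` is set by sshd; the function name and the exact filter are my assumptions.

```shell
# is_allowed_command CMD
# Succeed only for an rsync server invocation without shell metacharacters.
is_allowed_command() {
    case $1 in
        *";"*|*"&"*|*"|"*|*'`'*|*'$'*|*"("*|*"<"*|*">"*) return 1 ;;
        "rsync --server "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use as the forced command's entry point:
#   if is_allowed_command "$SSH_ORIGINAL_COMMAND"; then
#       exec $SSH_ORIGINAL_COMMAND
#   fi
#   exit 1
```

    The key used for backups would then be tied to this wrapper through a `command="..."` option in authorized_keys, so a stolen key can run rsync in server mode and nothing else. The rsync distribution also ships a helper called rrsync for the same purpose.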

    --
    Mason, Buildkernel and more: http://www.stearns.org/