Slashdot Mirror


Linux Backups Made Easy

mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic "snapshot-style" backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.

75 of 243 comments

  1. thank you... by cmckay · · Score: 3, Insightful

    ...for posting a link to the Google cache in the story description on the main page! mfago, you are a genius!

    Perhaps more article submitters (or editors) could add these links more frequently?

    1. Re:thank you... by zrodney · · Score: 2, Interesting

      Google cache

      yes -- that was a refreshing change from the
      usual postings where the page is /.ed . thank you!

    2. Re:thank you... by weird+mehgny · · Score: 2

      Also wait for Google blocking requests referred from slashdot.org when they find out how much bandwidth is at stake :)

      I prefer the idea that has been suggested by many previously, putting copies of linked articles right here on Slashdot.

    3. Re:thank you... by Monkeyman334 · · Score: 2, Insightful

      No way, I'd rather Joe Blow's server go down than waste google's bandwidth. Google doesn't have any ads on their cache pages. Slashdot should set up their own caching, or pay for a caching service, if they want to link it from the main page.

    4. Re:thank you... by Anonymous Coward · · Score: 4, Insightful

      Are you serious? Crush some guy's server rather than using the publicly available Google copy, because the Google page DOESN'T HAVE ADS?????? Who pays this guy for his server and bandwidth?? Do you make sure every page you view has ads on it? Are you a marketing exec or something??

      This "ads pay for everything on the internet" mentality is INSANE!!

    5. Re:thank you... by seanadams.com · · Score: 2

      Yes, let's not take unfair advantage of the hapless fools at google who aren't putting ads on their cached pages. Give me a break. Last I checked, those cached pages start with some text about google, and they contain google.com in the URL. EYEBALLS == REVENUE for Google.

      I'm sure they're perfectly happy to get the exposure from Slashdot linking to their cache. If they weren't, I'm sure their programmers could figure out if ($ENV{'HTTP_REFERER'} =~ /slashdot/i) { print "Content-type: text/plain\n\nplease don't link directly to our cache."; }

  2. First Mirror by doublem · · Score: 4, Informative

    I had the chance to be the first post, but decided to mirror the site first.

    My mirror is here

    --
    "Live Free or Die." Don't like it? Then keep out of the USA
  3. 'man dump' by isa-kuruption · · Score: 2, Interesting

    What's wrong with dump? It works great, and you can send stuff to gzip, bzip2, etc for data compression... even pipe the stuff over ssh to a server somewhere else. Dump also supports incremental backups. It also works on a lower level than rsync (which works on the filesystem level) and supports multiple volumes easily.

    1. Re:'man dump' by dhogaza · · Score: 2

      Why not read the article? Then you'll see why the author thinks rsync is a better tool for network-based backups. You may not agree with the author but if you actually took the time to read the article you'd see that he is fully aware of the existence of dump.

    2. Re:'man dump' by isa-kuruption · · Score: 2

      The other thing I forgot to mention is that rsync does not support incremental backups. Sure, it will incrementally update the tree on the other end, but it will not allow you to go back to your filesystem snapshot from last saturday if you have done an rsync of your data since that point. It doesn't effectively keep a backup of old data, it just syncs the current data. This would make it difficult to recover from, for example, a box that was hacked and trojanned last week when you've done an rsync since.

    3. Re:'man dump' by GigsVT · · Score: 2, Informative

      Read the fucking article, that's the point. He uses hard links to make a second copy of the backed up directory, exploiting the fact that rsync always unlinks before changing a file, thereby effectively doing incremental backups without wasting hard disk space.
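      The hard-link behaviour described above can be tried in a scratch directory. This is only a sketch: it assumes GNU cp (for the -l flag), the file names are made up, and the temp-file-plus-rename step merely imitates how rsync replaces a changed file when it is not told to update in place.

```shell
#!/bin/bash
# Sketch only: shows why a hard-linked copy keeps the old version of a file.
# Assumes GNU cp (for -l); every path lives in a throwaway temp directory.
set -eu

snapshot_demo() {
    local work
    work=$(mktemp -d)
    mkdir "$work/backup.0"
    echo "version 1" > "$work/backup.0/notes.txt"
    echo "unchanged" > "$work/backup.0/static.txt"

    # "Copy" the tree with hard links: new names, no duplicated file data.
    cp -al "$work/backup.0" "$work/backup.1"

    # Replace the file by writing a new one and renaming it over the old
    # name. The old inode is untouched, so backup.1 still points at it.
    echo "version 2" > "$work/backup.0/notes.txt.new"
    mv "$work/backup.0/notes.txt.new" "$work/backup.0/notes.txt"

    echo "backup.0: $(cat "$work/backup.0/notes.txt")"
    echo "backup.1: $(cat "$work/backup.1/notes.txt")"
    # A file that never changed is still a single inode with two names.
    if [ "$work/backup.0/static.txt" -ef "$work/backup.1/static.txt" ]; then
        echo "static.txt: one inode, two names"
    fi
    rm -rf "$work"
}

snapshot_demo
```

      Editing notes.txt in place (for example with `echo >>`) would change both trees, which is why the trick depends on files being replaced rather than rewritten.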

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    4. Re:'man dump' by isa-kuruption · · Score: 2

      There's no need to get abusive, geesh... grow up.

      I specifically said "rsync" does not support that. Whether he fools around with a script to do it for him, is another story. MY POINT is that dump does this ALL FOR YOU, and would have required LESS time to implement and would be a more reliable solution, as it was designed for doing filesystem backups!

    5. Re:'man dump' by TarpaKungs · · Score: 2, Informative

      Hi

      rsync --backup-dir ...

      2 years ago I wrote a script to do pretty much what the linked product does - ie: maintain a duplicate set of data areas on another machine via rsync.

      I use the --backup-dir option to relocate copies of the files which the current rsync run would otherwise delete or modify.

      With a bit of rotation, we can have users helping themselves to a full view of their
      home directory as of last night and also be able to restore files effectively from each day of the week going back 7 days in our case.
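
      A rough sketch of that kind of rotation, not the script mentioned above (which was never published). The function name, the directory layout and the day1..day7 slot names are all assumptions made for illustration; the rsync flags themselves (--backup, --backup-dir, --delete) are standard.

```shell
#!/bin/bash
# Sketch only: keep a live mirror plus one directory of displaced files
# per weekday. All names and paths here are hypothetical.
set -eu

# nightly_sync SRC MIRROR CHANGED_ROOT
# CHANGED_ROOT should be an absolute path; a relative --backup-dir is
# interpreted relative to the destination.
nightly_sync() {
    local src=$1 mirror=$2 changed_root=$3
    local slot
    slot="$changed_root/day$(date +%u)"   # day1 (Monday) .. day7 (Sunday)

    # Reuse the slot from a week ago, giving roughly 7 days of history.
    rm -rf "$slot"

    # Files that this run would delete or overwrite in MIRROR are moved
    # into $slot instead of being lost.
    rsync -a --delete --backup --backup-dir="$slot" "$src/" "$mirror/"
}
```

      Called from cron as something like `nightly_sync /home /backups/home /backups/changed`, the mirror always shows last night's state and each day slot holds what that night's run replaced.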

      Sure does cut down on the number of tape restore requests.

      As mentioned it is incredibly efficient - we deal with about 900GB of data backed up in this way - but rsync actually transfers about only 10-30GB of differences each night.

      Only problem is my script was a crap prototype which is why I'm not letting anyone see it ;-)

      But I do have a design in my head for a more professional effort (will be opensourced) - I might even get enough peace at work to write it one day!

      --
      Why can't women be like Hedy Lamarr - beautiful, talented and inventors of frequency-hopping spread-spectrum techn
    6. Re:'man dump' by GigsVT · · Score: 4, Informative

      It's an expression, it's not particularly abusive.

      rm -rf backup.3
      mv backup.2 backup.3
      mv backup.1 backup.2
      cp -al backup.0 backup.1
      rsync -a --delete source_directory/ backup.0/

      There. That's the script basically. Add more snapshot levels as needed, stick it in cron at whatever interval you need.
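
      For more snapshot levels, the same steps can be wrapped in a loop. This is a sketch under assumptions, not the article's script: the function names are invented, it assumes GNU cp for -al, and it keeps the backup.N naming used above.

```shell
#!/bin/bash
# Sketch only: the five-line rotation above, generalised to N levels.
set -eu

# rotate_snapshots DIR LEVELS
# Keeps DIR/backup.0 .. DIR/backup.(LEVELS-1); LEVELS must be at least 2.
rotate_snapshots() {
    local dir=$1 levels=$2
    if [ "$levels" -lt 2 ]; then
        echo "rotate_snapshots: need at least 2 levels" >&2
        return 1
    fi
    local last=$((levels - 1)) i prev

    # Drop the oldest snapshot, then shift the others up by one.
    rm -rf "$dir/backup.$last"
    i=$last
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -d "$dir/backup.$prev" ]; then
            mv "$dir/backup.$prev" "$dir/backup.$i"
        fi
        i=$prev
    done

    # Hard-link copy of the newest tree; costs almost no extra space.
    if [ -d "$dir/backup.0" ]; then
        cp -al "$dir/backup.0" "$dir/backup.1"
    fi
}

# take_snapshot SRC DIR LEVELS: rotate, then bring backup.0 up to date.
take_snapshot() {
    rotate_snapshots "$2" "$3"
    rsync -a --delete "$1/" "$2/backup.0/"
}
```

      A cron entry calling `take_snapshot /home /mnt/backup 7` would then keep seven generations; both paths are placeholders.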

      dump only supports ext2/3. This supports any file system, and retrieving a file from backups is as simple as running "cd" to the directory of the snapshot you need and "cp" the file out.

      I run backups from Linux to IRIX and other UNIXs using gnu rsync and openssh. This little trick is going to be very handy for me. I can't waste my time worrying about which filesystem type the files came from originally.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    7. Re:'man dump' by mikerubel · · Score: 2, Informative
      Hi TarpaKungs,

      I was originally using the --backup-dir trick, and you're right, it allows you to back up the same data. The advantage to doing it as described in the article is that you get what appear to be full backups at each increment. This makes it simpler for your users, who can now think of the backup directories as full backups.

      Hope that helps--

      Mike

  4. Works great! by schmutze · · Score: 3, Insightful

    I work with Mike and started using his scripts a while back for my own department. With HD space so cheap these days, it makes sense to have an online backup. Especially for those of us who can't afford a NetApp. It really saves time for restoring those every day user deletes. Way to go Mike!

  5. I use this method at work by Bistronaut · · Score: 2

    I am the "computer guy" for a small company, and I use this method to make back-ups of our Samba file server. It's great! The main file server has Samba and everyone works off of it. The backup server has almost twice the disk space, but it doesn't really need that much. It never seems to be more than a couple of percent bigger. I keep 'snapshots' going back various time intervals up to a week, and do the tape backup off of the backup machine early in the morning. Thank you Mike Rubel!

    1. Re:I use this method at work by Cool+E · · Score: 2, Informative

      you might want to take a look at this for doing backups via rsync over ssh.

  6. Re:Bandwidth isn't free, thats what. by NiteHaqr · · Score: 2, Insightful

    But where's the sense of achievement of getting /.d if we all use the cache - /.ing is a sign that you have raised yourself above trollbait level.

    It's a sign of peer approval.

  7. Because Linus says dump isn't reliable. by glrotate · · Score: 5, Informative
    1. Re:Because Linus says dump isn't reliable. by zrodney · · Score: 3, Informative

      he says right there in the linked article that
      dump can't reliably back up the filesystem
      because of the kernel filesystem caching, and that
      future kernel development is headed further in that
      direction, so you might as well not depend on dump.

      seems pretty reasonable to me, go ahead and use
      dump if you like though

    2. Re:Because Linus says dump isn't reliable. by donutello · · Score: 2

      So Linux 2.4 was released with a major known bug in it which causes a critical backup feature to not work at all putting you at risk of losing all your work?

      I thought we beat up Billy Boy for doing that.

      --
      Mmmm.. Donuts
    3. Re:Because Linus says dump isn't reliable. by cloudmaster · · Score: 2

      Dump doesn't work with reiserFS, sync or no sync. AFAIK, it only works with the ext* systems, and it depends on the filesystem's internal structure being known. Low-level backups are bad.

      Tar or other systems that get the files through the regular file reading interface are better because they take advantage of the filesystem interface abstraction layer instead of going around it. That works well, and there's no reason to do backups otherwise. None. Not a single one. IMHO. :)

    4. Re:Because Linus says dump isn't reliable. by MSG · · Score: 2

      Dump doesn't work with reiserFS

      The fact that reiserfs doesn't include a "dump" of its own isn't a failing in "dump", but a failure of the ReiserFS developers.

      Yes, dump is and always was fs-specific. That's something that's always been understood.

      It's also the only way to back up ACL's and other extended metadata. Data backup is good, but file metadata is important, too. You wouldn't back up your data with no file names, would you? File names are a small part of the metadata associated with a file. Tar and cpio only get a subset of that data.

      As filesystems move toward extending the amount of metadata they store (ACL's, and extended attributes now, ReiserFS is moving toward ever more complex metadata), backup programs are going to have to be extended to store that information in the archives. Until they do, only dump is reliable.

      Spread the word.

    5. Re:Because Linus says dump isn't reliable. by cloudmaster · · Score: 2

      My point is that going around the kernel-provided filesystem access methods is bad. Dump's *implementation* is a bad one. If there's data stored that can't be read using standard utilities and the standard filesystem interface, then it shouldn't need to be backed up.

    6. Re:Because Linus says dump isn't reliable. by MSG · · Score: 2

      The problem isn't that the data can't be read by standard interfaces, but that tar and cpio just don't know about them yet. ACL's, for example, are a critical feature, long missing from open source UNIX platforms (they're common on some other UNIX platforms, and of course NT).

      Tar and cpio back up the standard UNIX permission set, but that set is really inadequate. Until they can back up the full set of ACL's, they're basically useless on systems that use them.

  8. Check out glastree by Soylent+Beige · · Score: 3, Informative

    Been using a script called glastree on several production file servers for quite some time now.
    It works just great! At one site I've got about 7 weeks of depth from 3 different servers all
    mirrored via ssh-nfs on one lowly Penti 133. We still spin tapes mind you, but glastree has
    been flawless.

    Been meaning to buy the author a virtual beer for some time now . . .

    http://igmus.org/code/

    From the website:
    'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
    --
    Everyone hates me because I'm paranoid.
    1. Re:Check out glastree by Jeremy+Wohl · · Score: 2, Informative
      Hi, I'm the glastree author.

      Yes, my software does essentially this, wrapped up in a nice utility (though, you get day resolution).

      What we want, of course, is a better replica of plan9's dumpfs, featuring a real filesystem layer and compressed block differences. This is on my TODO list.

      -jeremy

  9. What about MAC OS X??? by Eric_Cartman_South_P · · Score: 2
    Does anyone know how something like this would work on Mac OS X? The backup utility is the only thing I like about their .NET^H^H^HMAC service.

    1. Re:What about MAC OS X??? by BlueGecko · · Score: 2

      SilverKeeper (http://www.silverkeeper.com/) from LaCie is the only free backup solution for OS X of which I am aware. While not as full-featured as Retrospect, it's not bad if all you want to backup is your /User directory and maybe a few other things. You can set up specified things to backup, and then restore them or synchronize them. (In fact, its synchronization feature makes it extremely handy with an iPod, where you can use it to ensure that the Documents folder on both devices is always the same without having to delete the folder on the iPod and then recopy it each time.) If you need to backup everything on the disk, however, pretty much your only choice is going to be to use the extremely buggy ditto command with the command line utilities for manipulating .dmgs, or alternatively to purchase Retrospect. .Mac's backup solution is awful and does not seem, IMVHO, to offer anything over SilverKeeper. You'd be better off spending that $100 on Retrospect anyway if that is the only thing you are interested in.

    2. Re:What about MAC OS X??? by Permission+Denied · · Score: 2
      This will work fine with OS X if you use UFS.

      This won't work with HFS because of the file forks. If you use UFS with OS X, the file forks appear as normal files. Eg, if you have a file named "foo", "._foo" is the resource fork. I don't know where they keep the finder fork, and I've never cared to investigate.

      Here's a tip if you have to use OS X for a file server of any kind: use two partitions (or two disks), one HFS and one UFS. The OS and any applications are installed on the HFS partition and all data goes on UFS. Use HFS for the OS because a lot of stuff breaks when running under UFS and UFS performance is still roughly twice as bad as HFS in 10.2 (run your own little benchmarks if you don't believe me). Keep user data on UFS so you can use tools like tar, rsync, etc. to back up and manipulate files. Remember, tar won't work on most HFS files (those with forks). If you're deploying OS X Server, you should definitely keep user data on a separate partition anyway since any tiny little mistake (eg, LDAP typo in Directory Assistant) will require a reformat-reinstall.

      Another tip: if you create a tarball off a UFS filesystem and then untar that onto a HFS filesystem, it will preserve the forks correctly. This has come in quite useful in making "setup" scripts for end-user machines, where all the applications to install are stored in tarballs created on a UFS machine and you can untar them onto the target HFS machine (the advantage is that you can script this - add in a couple of niutil commands and you can recreate a user machine in a couple minutes from one script).

      I have a couple of OS X Server machines (bosses like the GUI user management stuff). I just tried rsync over NFS to a Linux box and it works fine since the data is on a UFS partition on the OS X Server box. PITA to set up an NFS share remotely (since I don't have Macs at home -> no Remote Desktop, no usable VNC servers for OS X -> have to do it over ssh -> must figure out how NFS exports are stored in netinfo -> gnashing of teeth), but it works and I might try this little trick next week since we're not doing anything systematic for backups on the OS X boxen.

      Also, radmind is a great tool for managing filesystems of OS X client machines. It supports HFS (by using AppleSingle internally).

  10. SSH comment needs to be added! by Anonymous Coward · · Score: 3, Informative
    This sounds great, I would like to thank the author for the article. Only one thing really should be added. The way that you should do rsync for a back up server is to do rsync over ssh with a passwordless connection. (see http://www.unixadm.net/howto/rsync-ssh.html with google cache)

    Also, it should probably also be done from the real server to the backup server so that you can not just break one machine and get into all. (if you break into the real machine as root then you should be able to get into the backup machine)

    This allows the backup machine to have only one open port: ssh, which can be tcpwrapped to allow connections only from the machines that it backs up.
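
    A minimal sketch of that push setup, with heavy assumptions: the function names, key path, user and host are all hypothetical, and the key type is just one reasonable choice. The ssh and rsync options used (-i, BatchMode, -e) are standard ones.

```shell
#!/bin/bash
# Sketch only: push backups from the real server to the backup box over
# ssh with a passphrase-less key. All names and paths are placeholders.
set -eu

# make_backup_key KEYFILE
# Creates KEYFILE and KEYFILE.pub with an empty passphrase so that cron
# can use the key unattended. Only the .pub half goes to the backup box.
make_backup_key() {
    ssh-keygen -q -t ed25519 -N "" -f "$1"
}

# push_backup SRC REMOTE KEYFILE
# REMOTE is an rsync-over-ssh target such as user@host:/path.
# KEYFILE must not contain spaces, since it is embedded in the -e string.
push_backup() {
    local src=$1 remote=$2 key=$3
    # BatchMode makes ssh fail instead of prompting if the key is refused.
    rsync -a --delete -e "ssh -i $key -o BatchMode=yes" "$src/" "$remote"
}
```

    For example, `push_backup /home backupuser@backuphost:/backups/home /root/.ssh/backup_key` from a nightly cron job, after appending backup_key.pub to the backup user's authorized_keys on the backup machine.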

    1. Re:SSH comment needs to be added! by spencerogden · · Score: 2

      The way ssh keys work, the remote machine (the machine being logged into and backed up) only has the public key, which as its name suggests is meant to be public.

  11. oh my, rsync backups roxxor by Dr.+Awktagon · · Score: 3, Interesting

    I've been doing backups this way on Linux for aLongTime(tm). On FreeBSD I've also used dump/restore to an NFS-mounted RAID drive (does dump work okay on Linux these days? I've always been afraid to try it for some reason, maybe earlier versions weren't stable).

    rsync is just so cool. First of all, it can work over the network through ssh, or through its own daemon (faster), or on a local filesystem. You can "pull" backups from the server or "push" them from the client. Over the network, it can divide the files into blocks and just send the blocks that are different. It has a fairly sophisticated way to specify files to exclude/include (for instance, exclude /home/*/.blah/* can be used to not save the contents of everybody's .blah directory, but keep the directory itself). You can set up a script to just backup given subdirectories so you can checkpoint your important project without backing up the whole show. etc etc.

    I use it both to save over the network using the rsync daemon, and to a local separate drive. On a local drive it's great, because you can easily retrieve files that you've accidentally deleted, just using cp. It's also great for stuff like "diff -r /etc /backups/etc" to see if something changed.

    I never thought of his technique for incremental backups, but since it uses hard links, I wonder how that interferes with the original hard links in your files?? Looks interesting.

    There are many flags and options that rsync has, here are the ones I use to pull complete backups from another host onto a local drive (yeah --archive is a bit redundant here).

    rsync --verbose --archive --recursive --links --hard-links \
    --perms --owner --group --devices --times --sparse \
    --delete --delete-excluded --numeric-ids --stats --partial \
    --password-file=/root/.rsyncd.password \
    rsync://backupuser@xyz.dom.com/full/ \
    /backups/systems/xyz/
  12. rdiff-backup is easier and more efficient by heydan · · Score: 5, Informative

    The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.

    http://rdiff-backup.stanford.edu/

    1. Re:rdiff-backup is easier and more efficient by mikerubel · · Score: 2, Informative
      Thanks for mentioning this!

      Rdiff-backup is an excellent utility, and Ben Escoto (its author) and I link to each other. You must realize, though, that the purposes are different. Rdiff-backup is more space efficient for things like text, email, and so on. My rotating snapshot trick is less space-efficient, but much simpler for the average user to understand ("just go into your snapshot directory and copy the old file back into reality"). It works on all kinds of files, and barely touches the CPU (since it isn't doing diffs). I would use rdiff-backup for administrative backups of email, code, and that sort of thing, where text is involved and user restore is not an issue.

      Different tools for different jobs!

      Mike

    2. Re:rdiff-backup is easier and more efficient by heydan · · Score: 2, Informative

      Yes, but rdiff-backup uses librsync to do its work. It benefits from exactly the same algorithm that rsync does. I agree it's very efficient. I'm just saying you don't avoid any of the work of computing diffs by using rsync as opposed to rdiff-backup so that should not be a reason to choose one method over the other.

    3. Re:rdiff-backup is easier and more efficient by sc0rpi0n · · Score: 3, Informative

      I've used rsync for my backups until now, but I've downloaded rdiff-backup 0.9.5 and I love it already!

      New users: use the development version, it's a lot more efficient if you have a lot of small files, because it uses librsync instead of executing rdiff for each file. I've measured a factor 20 speedup on my devel directory!

  13. What I'd really like... by MadAndy · · Score: 5, Interesting
    This method, like most backup solutions, doesn't take a backup as at a specific instant, but instead takes it over a period of time - the length of time required to make the backup, which can be a problem if the data being backed up is changing all the time.

    A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.

    I wonder whether such a thing would be possible in software. Possibly it can even be done through cunning application of the tools that we already have. I imagined that you might be able to do something like it by extending the loopback device interface. Does anyone out there have any cunning ideas?

    1. Re:What I'd really like... by gordon_schumway · · Score: 5, Informative
      Then you should check out LVM. From the LVM HOWTO:
      A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which is an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state, later sections of this document give some examples of this.
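
      As a sketch of how that might look in a backup script: the volume group, volume, mount point and destination are placeholders, the 1G copy-on-write size is an arbitrary guess, and the function name is invented. It needs root and an existing LVM volume, so treat it as an outline rather than something to paste in.

```shell
#!/bin/bash
# Sketch only: back up from a frozen LVM snapshot instead of the live
# volume. Requires root; all names passed in are hypothetical.
set -eu

# backup_via_lvm_snapshot VG LV MOUNTPOINT DEST
backup_via_lvm_snapshot() {
    local vg=$1 lv=$2 mnt=$3 dest=$4
    local snap="${lv}_snap"

    # Reserve space for blocks that change on the origin while the
    # snapshot exists; if it fills up, the snapshot becomes invalid.
    lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv"

    mkdir -p "$mnt"
    mount -o ro "/dev/$vg/$snap" "$mnt"

    # Copy from the frozen view; the live volume keeps taking writes.
    rsync -a --delete "$mnt/" "$dest/"

    umount "$mnt"
    lvremove -f "/dev/$vg/$snap"
}
```

      Because of `set -e`, a failed rsync stops the script with the snapshot still mounted, so a real version would want a cleanup trap.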
      --

      Ha! I kill me!

    2. Re:What I'd really like... by Cyberdyne · · Score: 2
      A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.

      Sounds like the Network Appliance Filer's "snapshot" feature, but less advanced. (You can also get exactly the feature described under Linux purely in software, via LVM, now.) Under the NetApp version, you gain an extra directory ".snapshot", which contains previous versions of each file. So, if you screw up editing some file (delete/corrupt it, whatever) you can just grab a previous snapshot copy. Like having a series of online backups - but without all the extra space+hardware needs. Like CVS, but without the hassle (or fine-grained control) of doing "commits". Just tell the Filer "take a snapshot now" and 30 seconds later, it's done. Or "take snapshots every hour".

      Neat feature - you could almost get this using LVM under Linux, but not quite...

    3. Re:What I'd really like... by Fweeky · · Score: 2

      FreeBSD 5 will ship with UFS snapshots which will do what you want; it's also used to freeze the disk state for background fsck's, among other things. They're even stackable.

    4. Re:What I'd really like... by afidel · · Score: 2

      This and the remote mirroring is why I love our netapps so much. I have never had to pull files from tape for anything that is on the netapp because we have it set to pull snapshots hourly during the day and each day for a week plus each friday for a month. This way you have tight granularity for the day and week and can still pull back a file from up to a month ago. I don't care that our net f880 cluster is around $150K for only 4TB of raw space, or about 2 TB of usable space, it pays for itself in lower admin time and the basically zero loss of data it provides (yes we still do tape backups but mostly for disaster recovery, like I said I have never in 2 years pulled anything from tape for the netapp.)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:What I'd really like... by nettdata · · Score: 3, Informative

      A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image.

      We used to do this years ago before any such "options" were provided by drive manufacturers.

      We were doing large Oracle backups, and there were issues with taking too much time to do a backup.

      What we did was to throw some extra drives into the (at the time, software) RAID, so that we had a mirror of what we wanted to backup. At backup time, we'd shut down the Oracle instance, break the mirror, and then re-start the Oracle instance. The whole procedure resulted in less than 2 minutes of downtime for the instance, which was more than acceptable. We'd then take the "broken" mirror, re-mount it under a "temp" mount point, and then take our time backing it up (it usually took about 6-8 hours). Once we were finished backing it up, we'd then re-attach the broken mirrors and re-silver it. This was all done via software RAID, before journalling was available.

      We did this about once a week, and it worked out great.

      --



      $0.02 (CDN)
  14. Not snapshots by Florian+Weimer · · Score: 5, Informative

    The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.

    At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).

    1. Re:Not snapshots by mikerubel · · Score: 2, Informative
      These are not snapshots in the sense of LVM or NetApp; they do not freeze the whole filesystem at a particular point in time between atomic transactions. This technique is a hack for something like a small-office file server. It helps deal with accidental deletions or overwrites, which seem to account for the majority of restore jobs. Think of it as an easier and more intuitive replacement for tar-to-tape. If you're running a database where every transaction counts, you'll need to spend the money and buy a more reliable system!

      Mike

  15. why get so complex? by ywwg · · Score: 3, Funny

    if [ `df |grep /mnt/backup |wc -l` != "1" ]
    then
    echo Backup drive not mounted, skipping procedure
    exit 2
    fi
    cd /mnt/backup
    rsync -vaz --exclude-from=/root/exclude $1 $2 $3 $4 $5 / .


    where exclude =
    /mnt/cdrom /mnt/usb /mnt/backup /mnt/abyss1 /mnt/abyss2 /proc /tmp


    stick in a cronjob. you can also add --delete if you want. it's basic, but easy.
    1. Re:why get so complex? by p3d0 · · Score: 2

      Want to know why so complex? READ THE ARTICLE. It's explained quite clearly.

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  16. at WHAT time in the morning? by TheGratefulNet · · Score: 2
    at 4.20? is that right? the tutorial he included on rsync alluded to this.

    I guess it's better to trust your server at 4.20 than the operator. well, for many operators that is. even if it's 4.20pm, I'd still prefer to let the machine do the critical work instead of some sysadmins. knowing what I know about many sysadmins at 4.20 that is..

    [hint: double entendre on 420. not sure if the author knew this or not. or maybe I just stated what was terribly obvious.]

    --
    "It is now safe to switch off your computer."
  17. Flexbackup by tzanger · · Score: 2

    I don't consider snapshot backups backups; they're snapshots.

    I've been using a utility called Flexbackup -- it's a perl script which will do multi-level backups (i.e. incremental), spew to tape or file, use tar, afio or dump and compression. Oh yes, and it will use rsh/ssh for network backups. I wish I could buy the author a beer or few but it seems to be unsupported now. Oh well.

    Email me if you want a copy and can't find it. I've also got a patch to fix a minor table of contents bug with modern versions of mt.

    1. Re:Flexbackup by p3d0 · · Score: 2
      I don't consider snapshot backups backups; they're snapshots.
      Care to explain the difference for the uninitiated? Why can't a "snapshot" serve the same function as a backup?
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    2. Re:Flexbackup by tzanger · · Score: 2

      you may be interested in this [bentlogic.net]. it does backups via rsync over ssh. it's still in development so i'm sure features can be added if users request them.

      It's got the exact same basic problem that the backup method featured in this article has -- they're snapshots, not multi-level backups. Each "pull" is a complete copy; I can't say "Give me all the files which have changed since my last level-x backup." That's what Flexbackup (well, tar or afio) allows me to do. That's exactly what rsync isn't designed to do, as far as I can tell (I've used it before but I'm not an expert at it).

    3. Re:Flexbackup by tzanger · · Score: 2

      Care to explain the difference for the uninitiated? Why can't a "snapshot" serve the same function as a backup?

      I didn't say it couldn't serve as a backup, but it's not a backup in the sense that I can keep the last 6 months' worth of changes and pull from any of them. With snapshots I need to either keep 6 months of full daily backups or postprocess the daily snapshots and turn them into differential backups.

      An example might help. I do daily backups of our servers. Let's call the daily backups level 3 backups. Now each week I do a level 2 backup. Each month I do a level 1 backup, and every quarter I do a level 0 backup. Let's analyze:

      • Level 0 - full backup, every quarter
      • Level 1 - Monthly backup, just changes from the last month's backup
      • Level 2 - weekly, just changes from last week's backup
      • Level 3 - daily, just changes from yesterday

      I store the Level 0 backups on DVD-(+?)RW, and the rest on two 6-tape magazines. Level 1&2 on DDS3 IIRC, and Level 3 on DDS. I can pull back any file changed in the last quarter, just like someone could pull back a file from a particular day in CVS.

      With full snapshot backups this would take an insane amount of disk space. As I said earlier I could postprocess the snapshots and create differential backups, but why do the extra work when tar/afio does this automatically? rsync isn't that special, and with an incredible script like Flexbackup it's even less special.
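
For readers who have not seen level-based backups, here is a minimal sketch of the idea using GNU tar's --listed-incremental snapshot files. It is not Flexbackup's own logic (which is not shown in this thread), it covers only two of the four levels, and every path and file name is a throwaway example under a temporary directory.

```shell
# Two backup levels with GNU tar snapshot files (requires GNU tar).
work=$(mktemp -d)
mkdir "$work/data" "$work/backups"
echo "never touched" > "$work/data/static.txt"
echo "version 1"     > "$work/data/edited.txt"

# Level 0: a fresh snapshot file makes tar dump everything.
tar -C "$work" --listed-incremental="$work/backups/level0.snar" \
    -cf "$work/backups/level0.tar" data

sleep 1   # keep the edit clearly later than the level 0 run
echo "version 2" > "$work/data/edited.txt"

# Level 1: run against a copy of the level 0 snapshot file, so the archive
# holds only what changed since level 0 and level0.snar stays reusable.
cp "$work/backups/level0.snar" "$work/backups/level1.snar"
tar -C "$work" --listed-incremental="$work/backups/level1.snar" \
    -cf "$work/backups/level1.tar" data

# List what the level 1 archive actually contains.
tar -tf "$work/backups/level1.tar"
```

Deeper levels would follow the same pattern: copy the snapshot file of the level above, then run tar against the copy.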

      It would be great if rsync could tell the other end "this file has changed, here are the changes" and have the backing-up end copy the file and apply the changes -- i.e. allowing the creation of differential backups. That's not what it's designed for, though.

    4. Re:Flexbackup by tzanger · · Score: 2

      I hate to break it to you but that is what rsync does. If the file already exists where it is copying to, it will send the delta (think diff, but more efficient, and it works with binary files) and only update the changes.

      You didn't read carefully enough: "and have the backing-up end copy the file and apply the changes". I don't want one snapshot; I want a base snapshot and then any changes to be saved in an entirely new tree structure.

      Basically this: Take your snapshot normally. Now ask for all the files that changed between the snapshot and today. rsync sends the diff. Now for each file mentioned in the diff, copy the entire file from the snapshot to another directory and apply the diff to that copy. Now your new directory has full files that are up to date, but only the diffs were sent over. That is not what rsync does.

  18. Are backups the right solution? by anthony_dipierro · · Score: 2

    It seems that it would be much more efficient if each application handled its own backup scheme. I don't need to backup my whole drive. Certainly not my mp3s or my applications.

    1. Re:Are backups the right solution? by mikerubel · · Score: 2
      Anthony,

      You can exclude any part of the filesystem from the backups, or particular types of files, or files that match a particular pattern; see the "exclude" section in the rsync man page.

      I'm not sure I agree that applications should handle their own backups! Don't forget that applications are run as their owners, so if they are broken or hacked, they can destroy the backups too. Far better, I think, to have the backups moved to where user-level processes can't touch them. And probably a lot simpler too!

      Mike

    2. Re:Are backups the right solution? by anthony_dipierro · · Score: 2

      You can exclude any part of the filesystem from the backups, or particular types of files, or files that match a particular pattern; see the "exclude" section in the rsync man page.

      I don't know about you, but my filesystem certainly isn't organized enough for that to be useful.

      Don't forget that applications are run as their owners, so if they are broken or hacked, they can destroy the backups too.

      Well, I was thinking more along the lines of backing up to a third party server over the internet, in which case there wouldn't be permission to delete old copies until after a certain period of time. I dunno, in the case of my system, there's very little that needs to be backed up. In fact, I really can't think of anything.

  19. Hard links and file diffs? by p3d0 · · Score: 2
    I'm wondering what happens to the hard links when rsync decides it only needs to update part of a file. If it is guaranteed to write a brand-new file with the merged changes, that's good. If, on the other hand, it changes the backup file in-place, then all the older backups that are only hard links will also see those changes, and that's a Bad Thing.

    Anyone know anything about this issue? I can't find the necessary info in the rsync docs.

    Judging by the fact that this technique does seem to work, I presume that rsync never modifies a file in-place, but I wonder if that's a guarantee, or just the current behaviour?

    (Also, I am aware of the --whole-file command-line argument, but that's an orthogonal issue.)

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  20. The answer? by p3d0 · · Score: 2
    I just found the answer looking through Mike Rubel's source code:
    # step 4: rsync from the system into the latest snapshot (notice that
    # rsync behaves like cp --remove-destination by default, so the destination
    # is unlinked first. If it were not so, this would copy over the other
    # snapshot(s) too!)
    I wonder how he discovered this? I can't find it in the man page.
    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    1. Re:The answer? by mikerubel · · Score: 2, Interesting
      I wonder how he discovered this? I can't find it in the man page.

      Rsync source code, then a lot of testing! :)

      Mike

      ps: You're right, if there is any change in the file, the original is unlinked first, then the new one is written over top of it. So it does work as advertised! Thanks for your help answering questions btw.
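
The hard-link concern in this sub-thread can be reproduced without rsync at all. The sketch below uses plain ln, mv and shell redirection to contrast replacing a file under its name with rewriting it in place. As far as I know, rsync by default writes a temporary file and renames it into place, which behaves like the first case, and its --inplace option (in versions that have it) behaves like the second. File and directory names are invented.

```shell
# Why replacing a file keeps older hard-linked snapshots intact.
work=$(mktemp -d)
mkdir "$work/snapshot.0" "$work/snapshot.1"
echo "original" > "$work/snapshot.0/file.txt"
# snapshot.1 shares the same inode instead of holding a second copy.
ln "$work/snapshot.0/file.txt" "$work/snapshot.1/file.txt"

# Safe update: write a new file, then rename it over the old name.
# Only the directory entry changes; the old inode lives on in snapshot.1.
echo "updated" > "$work/snapshot.0/file.txt.tmp"
mv "$work/snapshot.0/file.txt.tmp" "$work/snapshot.0/file.txt"
after_rename=$(cat "$work/snapshot.1/file.txt")

# Unsafe update: re-link the two names, then rewrite the file in place.
# Both names point at the same inode, so the old snapshot changes too.
ln -f "$work/snapshot.0/file.txt" "$work/snapshot.1/file.txt"
echo "clobbered" > "$work/snapshot.0/file.txt"
after_inplace=$(cat "$work/snapshot.1/file.txt")

echo "after rename:   snapshot.1 holds '$after_rename'"
echo "after in-place: snapshot.1 holds '$after_inplace'"
```

After the rename the older snapshot still holds 'original'; after the in-place write it holds 'clobbered', which is the Bad Thing the parent was worried about.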

  21. CVS, cron, and an RW by Screaming+Lunatic · · Score: 2
    That's what I'm using at the moment. I use a cron job to throw all my important directories into my repository every night. Then I burn it onto an RW.

    This works because I don't throw my mp3/ogg, pr0n, etc into the repository. I'll have to figure out a new solution when I hit the 650MB/800MB limit, but it works for now. I'll probably just have my repository on a different computer and use ssh, or get another HD specifically for backup purposes.

    I started using this system after reading the Pragmatic Programmer. They recommend using CVS for everything that is important. It's great for more than just code. And this way, whenever I install a new distro, I have all my settings since I save my .emacs, .mozilla, .kde, .etc directories.

  22. chattr +u by steveha · · Score: 2

    As others have noted, you can get snapshots using LVM.

    What I would really like, however, is the ability to have the file system keep versions of a file as the file is written to or deleted; I don't want a snapshot every hour, I want a new single-file snapshot for every change to the file. And I want to be able to set or clear an attribute to control which files/directories this gets done in (i.e., chattr +u, which currently doesn't really do anything). And I want the old snapshots to age and vanish on their own, say, 3 days after they are made (or however many days the sysadmin chooses).

    Under Windows, with Norton Utilities, you can get this sort of functionality with the Norton Protected Recycle Bin. I have been wishing for this on Linux for quite some time.

    I remember reading about something called the "Snap filesystem" which would someday offer this, but I can't find anything about it now on the web.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  23. Re:Win2k has free backups made easy, too! by Enahs · · Score: 2
    Really. And backing up via "samba" preserves file permissions and file ownerships properly?



    I'd like to hear from you on this subject.

    --
    Stating on Slashdot that I like cheese since 1997.
  24. 4:20 by Sean+Clifford · · Score: 2
    You're right, 4:20 is a good time to do this. Let the system do the work for me while the sysadmin has a toke^H^H^H^H smoke. :)

    IMHO, this is a great solution - I've been looking for something like this for fuss-free backups at work. Voilà.

    Being the only "computer guy" at work sucks ass when you're the programmer/sysadmin/engineer/tech. Gah.

  25. What about backups to tape ? by Fred_A · · Score: 2, Interesting

    Backing up to your disk is all very good against errors of manipulation, but what if the disk fails?

    And what about people like me who backup to a DLT (or whatever) tape drive? Not much use then either.

    In any case I don't see this as being extremely useful in the real world (i.e. beyond the casual backing up of a home machine)...

    --

    May contain traces of nut.
    Made from the freshest electrons.
    1. Re:What about backups to tape ? by Cool+E · · Score: 2, Insightful

      Hard drives come out much cheaper than tape, even in the long run. You don't need removable disks; you just need to have the machine in a different building if possible. A tape library to hold the amount of data that I need to hold would be over 5K, and then I would have to buy tapes, which are around $100 apiece. That doesn't seem very economical to me, given that for less money I can build two 1TB (yes, that's a T for terabyte) backup systems and put them in separate buildings. That way, if one completely fails, I still have all of my backups.

  26. i've been slashdotted! by mikerubel · · Score: 4, Interesting
    This slashdotting comes as a bit of a surprise; many readers have sent me improved scripts that I haven't quite gotten around to posting yet. I'll try to put them up later this weekend when the slashdotting dies down.

    The site was never down; it's just that my roommate, a windows user, noticed the connection was slow and reset the cable modem. He's quite upset about being unable to play Warcraft III. :)

    I've never had a slashdot nick before, so I just created this one today. I'll try to go through some of the comments and provide useful feedback.

    Thanks for your interest everyone!

    Mike

    1. Re:i've been slashdotted! by soloport · · Score: 3, Funny

      I know that listing my actual backup configuration here is a security risk; please be kind and don't use this information to crack my site. However, I'm not a security expert, so if you see any vulnerabilities in my setup, I'd greatly appreciate your help in fixing them. Thanks!

      First suggestion: Don't list your actual backup configuration.

  27. Re:simple encrypted backup by kcurrie · · Score: 2, Informative

    the only downside is that you need to feed a password:

    Not if you use the ssh-agent, and maybe keychain.
    Before you run that command in a script, put the keychain lines before it:

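    # Load (or start) ssh-agent with the key; the passphrase is asked for only once.
    keychain -q /root/.ssh/id_dsa
    . /root/.ssh-agent-box750
    # $1 = local path, $2 = remote host, $3 = remote directory (expanded locally).
    tar cvzf - "$1" | ssh "$2" "cd '$3' && tar xvzf -"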

    Now the first time you run the command, it will ask you for your key passphrase, but any subsequent runs will work passwordlessly.

    I use a similar script with rsync and it works great. Set up a cron job to automatically do the backup, and once after the box boots start a manual backup (thus loading the key), and it'll work automatically from there.

    Keychain can be found here: http://www.gentoo.org/projects/keychain.html

    --
    -- I speak only for myself.
  28. Email messages shouldn't change at all. by Dwonis · · Score: 2

    You should be using maildirs

    1. Re:Email messages shouldn't change at all. by Electrum · · Score: 2

      No, you shouldn't. Maildir solves a problem that doesn't need solving, and it is much, much slower than just about any other mailbox format.

      Really? How do you safely modify or delete a message from an mbox file? You make a new file and copy the existing one while changing the data that you need changed and then atomically replace the original file. This means you use double the space of the mbox file and take the time to rewrite the entire file. Or, you modify the original mbox file and hope the system doesn't crash while doing so or you risk corrupting the entire thing. And you have to deal with locking in both cases. How is this better than Maildir?

      As for being slow, there are benchmarks to prove you wrong (on courier-mta.org).

  29. Re:what's the big deal? by s88 · · Score: 2, Insightful

    'Cause he documented and shared it; you didn't. That's the big deal.

  30. Re:But how do you do cron+ssh+rsync? by SCHecklerX · · Score: 2
    I like rsync and it works great over ssh. But there seems to be no way to run rsync as a cron job because it will hang asking for the ssh password. Keys and ssh-agent seem like the solution - until you try it and find that they don't work with cron :(

    Use an RSA key with no password. If you are paranoid enough to be using ssh, you should be paranoid enough to be using the strong authentication provided by using RSA Keys.

    You don't really need ssh-agent.
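
A rough sketch of what that setup could look like, as configuration fragments and not a tested recipe. The key path /root/.ssh/backup_key, the account backup, the host names backuphost and fileserver, and the wrapper /usr/local/bin/backup-only are all hypothetical. The authorized_keys options are one common way to limit the damage if a passphrase-less key leaks; the wrapper would have to inspect SSH_ORIGINAL_COMMAND and run only the expected rsync server command.

```
# One-time setup on the machine being backed up (empty passphrase):
#   ssh-keygen -t rsa -N "" -f /root/.ssh/backup_key

# root's crontab: run the backup at 4:20 every morning
20 4 * * * rsync -a -e "ssh -i /root/.ssh/backup_key" /home/ backup@backuphost:/backups/home/

# ~backup/.ssh/authorized_keys on backuphost (one line; key material elided)
command="/usr/local/bin/backup-only",no-pty,no-port-forwarding ssh-rsa AAAA... root@fileserver
```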

  31. Re:Critical daily backups done by the clueless. by Old+Man+Kensey · · Score: 2
    Yet another Anonymous Coward spewed forth:

    [referring to using 'tar' to do daily backups]
    And people wonder why computer techs get a bad name.

    Eh? There's nothing wrong with tar per se. For example, let's say you want to transport your backups over a network securely (i.e., via ssh). Your choices are:

    1. Allow ssh access with no password (public-key access, preferably). I'm leery of this, because allowing anything like this to run automatically means entrusting all the auth data to the machine, where it can be compromised.

    2. Copy the backups asynchronously from making them, allowing user-initiated authentication. This was the approach I opted for when I had to put together a backup system overnight at one company.

    Couple of cron jobs that ran incremental tar's on a list of directories, storing them in the scratch partition with higher permissions (so user processes cleaning up after themselves couldn't nuke them accidentally). Then at my leisure I would run the transport script (mornings about 10 AM, typically) which would suck the backups across and copy them to the tape. This worked fine for the time the project was active. Note that I was backing up to tape, which meant I needed to manually rotate tapes anyway, so this system helped ensure that new backups didn't overwrite old ones if I came in late -- and we definitely did not want these backups exported to our network. I also had the advantage of only needing to worry about one server.

    Just because tar is old and a bit... esoteric at times, doesn't mean it's therefore automatically a stupid idea to use it. If it's what you know, and it gets the job done, there's no need to feel guilty about not using a fancier system. Even Linus likes tar, because it's rock-solid reliable.

    Now if you have (faint hope) a valid criticism of this guy's use of tar in his environment, then I'm all ears. But I doubt that, since he didn't give enough detail for you to have one.

    I don't know why I even bother with this given it's an AC post, except that assholes like this are a major reason why Linux advocates get a bad rep.

    --
    -- Old Man Kensey
  32. rsync-backup - a similar approach by wstearns · · Score: 2, Informative

    I have a similar script called rsync-backup. This one does automatic daily snapshots, works over ssh, and uses rsync and hardlinks (to save space), chroot, and an ssh forced command for security.

    --
    Mason, Buildkernel and more: http://www.stearns.org/