Linux Backups Made Easy

Posted by michael on Saturday September 7, 2002 @05:06AM from the no-more-tapes dept.

mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic "snapshot-style" backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.

19 of 243 comments

thank you... by cmckay · 2002-09-07 05:12 · Score: 3, Insightful

...for posting a link to the Google cache in the story description on the main page! mfago, you are a genius!

Perhaps more article submitters (or editors) could add these links more frequently?
1. Re:thank you... by Anonymous Coward · 2002-09-07 06:38 · Score: 4, Insightful
  
  Are you serious? Crush some guy's server rather than using the publicly available Google copy, because the Google page DOESN'T HAVE ADS?????? Who pays this guy for his server and bandwidth?? Do you make sure every page you view has ads on it? Are you a marketing exec or something??
  
  This "ads pay for everything on the internet" mentality is INSANE!!
First Mirror by doublem · 2002-09-07 05:13 · Score: 4, Informative

I had the chance to be the first post, but decided to mirror the site first.

My mirror is here

--
"Live Free or Die." Don't like it? Then keep out of the USA
Works great! by schmutze · 2002-09-07 05:20 · Score: 3, Insightful

I work with Mike and started using his scripts a while back for my own department. With HD space so cheap these days, it makes sense to have an online backup, especially for those of us who can't afford a NetApp. It really saves time for restoring those everyday user deletes. Way to go Mike!
Because Linus says dump isn't reliable. by glrotate · 2002-09-07 05:36 · Score: 5, Informative

"So anybody who depends on "dump" getting backups right is already playing russian roulette with their backups." Linus Torvalds
That's probably one good reason.
1. Re:Because Linus says dump isn't reliable. by zrodney · 2002-09-07 05:50 · Score: 3, Informative
  
  he says right there in the linked article that
  dump can't reliably back up the filesystem
  because of the kernel filesystem caching, and that
  future kernel development is headed further in that
  direction, so you might as well not depend on dump.
  
  seems pretty reasonable to me, go ahead and use
  dump if you like though
Re:'man dump' by GigsVT · 2002-09-07 05:45 · Score: 4, Informative

It's an expression, it's not particularly abusive.

rm -rf backup.3
mv backup.2 backup.3
mv backup.1 backup.2
cp -al backup.0 backup.1
rsync -a --delete source_directory/ backup.0/

There. That's the script basically. Add more snapshot levels as needed, stick it in cron at whatever interval you need.
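
For anyone who wants more than three levels, here is a rough sketch of the same rotation as a function. The function names, the directory layout, and the default of 3 levels are my own assumptions, not something from the tutorial, and it relies on GNU cp for "-al". It also skips levels that don't exist yet, so the first few runs don't fail on a missing backup.2.

```shell
# Hypothetical helper: rotate hard-linked snapshots backup.0 .. backup.N
# inside directory $1, keeping $2 old levels (default 3).
rotate_snapshots() {
    dest=$1
    levels=${2:-3}

    # Drop the oldest snapshot, if there is one.
    rm -rf "$dest/backup.$levels"

    # Shift backup.(N-1) -> backup.N, ..., backup.1 -> backup.2.
    i=$levels
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -d "$dest/backup.$prev" ]; then
            mv "$dest/backup.$prev" "$dest/backup.$i"
        fi
        i=$prev
    done

    # Hard-link copy of the newest snapshot (GNU cp: -a archive, -l link).
    if [ -d "$dest/backup.0" ]; then
        cp -al "$dest/backup.0" "$dest/backup.1"
    fi
}

# Rotate, then bring backup.0 up to date with the source tree.
take_snapshot() {
    src=$1
    dest=$2
    rotate_snapshots "$dest" "${3:-3}"
    mkdir -p "$dest/backup.0"
    rsync -a --delete "$src/" "$dest/backup.0/"
}
```

Something like "take_snapshot /home /mnt/backup 7" from cron would then keep seven old levels plus the current one.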

dump only supports ext2/3. This supports any file system, and retrieving a file from backups is as simple as running "cd" to the directory of the snapshot you need and "cp" the file out.

I run backups from Linux to IRIX and other UNIXs using gnu rsync and openssh. This little trick is going to be very handy for me. I can't waste my time worrying about which filesystem type the files came from originally.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Check out glastree by Soylent+Beige · 2002-09-07 05:47 · Score: 3, Informative

Been using a script called glastree on several production file servers for quite some time now.
It works just great! At one site I've got about 7 weeks of depth from 3 different servers all
mirrored via ssh-nfs on one lowly Penti 133. We still spin tapes mind you, but glastree has
been flawless.

Been meaning to buy the author a virtual beer for some time now . . .

http://igmus.org/code/

From the website:
'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
--
Everyone hates me because I'm paranoid.
SSH comment needs to be added! by Anonymous Coward · 2002-09-07 05:54 · Score: 3, Informative

This sounds great, and I would like to thank the author for the article. Only one thing really should be added: the way to do rsync for a backup server is to run it over ssh with a passwordless connection (see http://www.unixadm.net/howto/rsync-ssh.html, with Google cache).
Also, it should probably be done from the real server to the backup server, so that you cannot just break into one machine and get into all of them. (If you break into the real machine as root, then you should be able to get into the backup machine.)
This allows the backup machine to have only one open port: ssh, which can be tcpwrapped to allow connections only from the machines that it backs up.
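
A minimal sketch of what such a push could look like, assuming a dedicated key has already been set up. The host name, user, and key path below are placeholders I made up, not values from the article or the linked howto. DRY_RUN=1 just prints the command so you can look at it before trusting it.

```shell
# Hypothetical names: BACKUP_HOST, BACKUP_USER, and the key path are
# placeholders, not values from the article.
BACKUP_HOST=${BACKUP_HOST:-backup.example.com}
BACKUP_USER=${BACKUP_USER:-backup}
SSH_KEY=${SSH_KEY:-/root/.ssh/backup_key}

# Push a local directory to the backup host over ssh.
# With DRY_RUN=1 the command is printed (without shell quoting)
# instead of being executed.
push_backup() {
    src=$1
    dest=$2
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what you want under cron.
    set -- rsync -a --delete -e "ssh -i $SSH_KEY -o BatchMode=yes" \
        "$src/" "$BACKUP_USER@$BACKUP_HOST:$dest/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, "DRY_RUN=1 push_backup /home /backups/home" shows the rsync command that a real run would execute.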
oh my, rsync backups roxxor by Dr.+Awktagon · 2002-09-07 05:56 · Score: 3, Interesting

I've been doing backups this way on Linux for aLongTime(tm). On FreeBSD I've also used dump/restore to an NFS-mounted RAID drive (does dump work okay on Linux these days? I've always been afraid to try it for some reason, maybe earlier versions weren't stable).

rsync is just so cool. First of all, it can work over the network through ssh, or through its own daemon (faster), or on a local filesystem. You can "pull" backups from the server or "push" them from the client. Over the network, it can divide the files into blocks and send just the blocks that are different. It has a fairly sophisticated way to specify files to exclude/include (for instance, exclude /home/*/.blah/* can be used to not save the contents of everybody's .blah directory, but keep the directory itself). You can set up a script to just back up given subdirectories so you can checkpoint your important project without backing up the whole show. etc etc.

I use it both to save over the network using the rsync daemon, and to a local separate drive. On a local drive it's great, because you can easily retrieve files that you've accidentally deleted, just using cp. It's also great for stuff like "diff -r /etc /backups/etc" to see if something changed.

I never thought of his technique for incremental backups, but since it uses hard links, I wonder how that interferes with the original hard links in your files?? Looks interesting.
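
The hard-link part is easy to poke at in a scratch directory. The sketch below is only a demo with made-up file names: it hard-links a "snapshot" with GNU cp, then replaces the file by writing a new one and renaming it over the old name, which is how rsync updates a changed file by default. As far as I know, --archive on its own does not preserve hard links that exist inside the source tree; that needs --hard-links.

```shell
# Demo in a throwaway directory: how a hard-linked snapshot behaves.
demo=$(mktemp -d)
mkdir "$demo/backup.0"
echo "version 1" > "$demo/backup.0/notes.txt"

# Snapshot by hard-linking (GNU cp): both names point at one inode.
cp -al "$demo/backup.0" "$demo/backup.1"
inode_new=$(stat -c %i "$demo/backup.0/notes.txt")
inode_old=$(stat -c %i "$demo/backup.1/notes.txt")
echo "inodes: $inode_new $inode_old"

# Replace the file by writing a new one and renaming it into place.
# The snapshot keeps pointing at the old inode.
echo "version 2" > "$demo/backup.0/.notes.txt.tmp"
mv "$demo/backup.0/.notes.txt.tmp" "$demo/backup.0/notes.txt"

cat "$demo/backup.1/notes.txt"   # still the old contents
cat "$demo/backup.0/notes.txt"   # the new contents
```

Editing the file in place instead of renaming over it would change both names at once, which is why the snapshots should be treated as read-only.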

There are many flags and options that rsync has, here are the ones I use to pull complete backups from another host onto a local drive (yeah --archive is a bit redundant here).

rsync --verbose --archive --recursive --links --hard-links \
    --perms --owner --group --devices --times --sparse \
    --delete --delete-excluded --numeric-ids --stats --partial \
    --password-file=/root/.rsyncd.password \
    rsync://backupuser@xyz.dom.com/full/ \
    /backups/systems/xyz/
rdiff-backup is easier and more efficient by heydan · 2002-09-07 05:57 · Score: 5, Informative

The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.

http://rdiff-backup.stanford.edu/
1. Re:rdiff-backup is easier and more efficient by sc0rpi0n · 2002-09-07 13:50 · Score: 3, Informative
  
  I've used rsync for my backups until now, but I've downloaded rdiff-backup 0.9.5 and I love it already!
  
  New users: use the development version, it's a lot more efficient if you have a lot of small files, because it uses librsync instead of executing rdiff for each file. I've measured a factor 20 speedup on my devel directory!
What I'd really like... by MadAndy · 2002-09-07 06:09 · Score: 5, Interesting

This method, like most backup solutions, doesn't take a backup as at a specific instant, but instead takes it over a period of time - the length of time required to make the backup, which can be a problem if the data being backed up is changing all the time.
A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.
I wonder whether such a thing would be possible in software. Possibly it can even be done through cunning application of the tools that we already have. I imagined that you might be able to do something like it by extending the loopback device interface. Does anyone out there have any cunning ideas?
1. Re:What I'd really like... by gordon_schumway · 2002-09-07 06:21 · Score: 5, Informative
  
  Then you should check out LVM. From the LVM HOWTO:
  A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which is an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state, later sections of this document give some examples of this.
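
A rough sketch of how that could be combined with rsync, under some loud assumptions: the volume group "vg0", the logical volume "home", the 1G snapshot size, and the mount point are all placeholders, and a real run needs root plus free space in the volume group. With DRY_RUN=1 it only prints the commands.

```shell
# Assumed names: volume group "vg0", logical volume "home", and the
# mount point are placeholders. Needs root and free extents in the VG.
VG=${VG:-vg0}
LV=${LV:-home}
SNAP=${SNAP:-home_snap}
SNAP_MNT=${SNAP_MNT:-/mnt/snap}

# Print each command; execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Freeze the volume as a snapshot, copy from the frozen view, clean up.
backup_from_snapshot() {
    dest=$1
    run lvcreate --size 1G --snapshot --name "$SNAP" "/dev/$VG/$LV" || return 1
    run mkdir -p "$SNAP_MNT"
    run mount -o ro "/dev/$VG/$SNAP" "$SNAP_MNT" &&
        run rsync -a --delete "$SNAP_MNT/" "$dest/"
    status=$?
    # Always try to unmount and remove the snapshot afterwards.
    run umount "$SNAP_MNT"
    run lvremove -f "/dev/$VG/$SNAP"
    return "$status"
}
```

The snapshot only has to hold the blocks that change while the copy runs, so it can be much smaller than the volume itself, but it must not fill up before the backup finishes.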
  
  --
  Ha! I kill me!
2. Re:What I'd really like... by nettdata · 2002-09-07 09:25 · Score: 3, Informative
  
  A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image.
  
  We used to do this years ago before any such "options" were provided by drive manufacturers.
  
  We were doing large Oracle backups, and there were issues with taking too much time to do a backup.
  
  What we did was to throw some extra drives into the (at the time, software) RAID, so that we had a mirror of what we wanted to backup. At backup time, we'd shut down the Oracle instance, break the mirror, and then re-start the Oracle instance. The whole procedure resulted in less than 2 minutes of downtime for the instance, which was more than acceptable. We'd then take the "broken" mirror, re-mount it under a "temp" mount point, and then take our time backing it up (it usually took about 6-8 hours). Once we were finished backing it up, we'd then re-attach the broken mirrors and re-silver it. This was all done via software RAID, before journalling was available.
  
  We did this about once a week, and it worked out great.
  
  --
  
  $0.02 (CDN)
Not snapshots by Florian+Weimer · 2002-09-07 06:12 · Score: 5, Informative

The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.

At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).
why get so complex? by ywwg · 2002-09-07 06:13 · Score: 3, Funny

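# Skip the run unless exactly one mounted filesystem matches /mnt/backup.
if [ "$(df | grep -c /mnt/backup)" != "1" ]
then
    echo "Backup drive not mounted, skipping procedure"
    exit 2
fi
cd /mnt/backup || exit 2
# Extra rsync options (e.g. --delete) can be passed on the command line.
rsync -vaz --exclude-from=/root/exclude "$@" / .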

where exclude =

/mnt/cdrom
/mnt/usb
/mnt/backup
/mnt/abyss1
/mnt/abyss2
/proc
/tmp

stick in a cronjob. you can also add --delete if you want. it's basic, but easy.
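
One possible tightening of the mount check, as a sketch only: grepping df output also matches paths like /mnt/backup2, so this hypothetical is_mounted helper (my name, not part of the script above) compares the mount point field in /proc/mounts exactly. It assumes Linux and a mount point path without spaces.

```shell
# Succeeds only if $1 is listed as a mount point in /proc/mounts (Linux).
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' /proc/mounts
}
```

In the script it would be used as: is_mounted /mnt/backup || { echo "Backup drive not mounted, skipping procedure"; exit 2; }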
i've been slashdotted! by mikerubel · 2002-09-07 08:49 · Score: 4, Interesting

This slashdotting comes as a bit of a surprise; many readers have sent me improved scripts that I haven't quite gotten around to posting yet. I'll try to put them up later this weekend when the slashdotting dies down.
The site was never down; it's just that my roommate, a Windows user, noticed the connection was slow and reset the cable modem. He's quite upset about being unable to play Warcraft III. :)
I've never had a slashdot nick before, so I just created this one today. I'll try to go through some of the comments and provide useful feedback.
Thanks for your interest everyone!
Mike
1. Re:i've been slashdotted! by soloport · 2002-09-07 13:04 · Score: 3, Funny
  
  I know that listing my actual backup configuration here is a security risk; please be kind and don't use this information to crack my site. However, I'm not a security expert, so if you see any vulnerabilities in my setup, I'd greatly appreciate your help in fixing them. Thanks!
  
  First suggestion: Don't list your actual backup configuration.