Linux Backups Made Easy
mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic "snapshot-style" backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.
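For anyone who cannot reach the tutorial, the general rotate, hard-link and sync idea can be sketched in a few lines of shell. This is only an illustration, not the tutorial's actual script: the function name make_snapshot, the snapshot.N directory names and the example paths are made up here, and it assumes GNU cp (for -al) and rsync are installed.

```shell
#!/bin/sh
set -eu

# make_snapshot SRC DEST KEEP   (hypothetical helper, not from the tutorial)
# Keeps snapshot.0 (newest) through snapshot.KEEP (oldest) under DEST.
make_snapshot() {
    src=$1
    dest=$2
    keep=$3

    mkdir -p "$dest"

    # Drop the oldest snapshot, then shift the others up by one.
    rm -rf "$dest/snapshot.$keep"
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -d "$dest/snapshot.$prev" ]; then
            mv "$dest/snapshot.$prev" "$dest/snapshot.$i"
        fi
        i=$prev
    done

    # Hard-link copy of the newest snapshot: takes almost no extra space.
    if [ -d "$dest/snapshot.0" ]; then
        cp -al "$dest/snapshot.0" "$dest/snapshot.1"
    fi

    # Bring snapshot.0 up to date. Files that changed are replaced in
    # snapshot.0 only; the hard-linked copies in snapshot.1 keep the old data.
    rsync -a --delete "$src/" "$dest/snapshot.0/"
}

# Example call (paths are placeholders):
#   make_snapshot /home /backup/home 3
```

Run from cron, each call leaves snapshot.0 as the newest copy and snapshot.1 through snapshot.KEEP as older ones. Unchanged files are shared between snapshots through hard links, so each extra snapshot should cost roughly the space of the files that changed.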
...for posting a link to the Google cache in the story description on the main page! mfago, you are a genius!
Perhaps more article submitters (or editors) could add these links more frequently?
I had the chance to be the first post, but decided to mirror the site first.
My mirror is here
"Live Free or Die." Don't like it? Then keep out of the USA
What's wrong with dump? It works great, and you can send stuff to gzip, bzip2, etc for data compression... even pipe the stuff over ssh to a server somewhere else. Dump also supports incremental backups. It also works on a lower level than rsync (which works on the filesystem level) and supports multiple volumes easily.
I work with Mike and started using his scripts a while back for my own department. With HD space so cheap these days, it makes sense to have an online backup. Especially for those of us who can't afford a NetApp. It really saves time for restoring those every day user deletes. Way to go Mike!
Do not install your programs to it. But wouldn't I have to reinstall most of my programs if I reinstalled Windows anyways?
I am the "computer guy" for a small company, and I use this method to make back-ups of our Samba file server. It's great! The main file server has Samba and everyone works off of it. The backup server has almost twice the disk space, but it doesn't really need that much. It never seems to be more than a couple of percent bigger. I keep 'snapshots' going back various time intervals up to a week, and do the tape backup off of the backup machine early in the morning. Thank you Mike Rubel!
And the "small" partition had better be > 4GB otherwise downloading 1.1GB files over http may just come unstuck unless you move your windows temp directory.
I am really tempted to mod this up as Funny, but I am afraid that you are serious. There are so many things wrong with your statements that I don't know where to start. I guess I'll just have to assume that this was a joke, and you weren't really serious. If you *were* serious, then you really really need to educate yourself about these new fangled computer dealies.
My beliefs do not require that you agree with them.
Slashdot is now a reference for tutorials? Ever try www.tldp.org or www.linuxtoday.com (they post links to tutorials).
Does the name Pavlov ring a bell?
But where's the sense of achievement in getting /.'d if we all use the cache? /.ing is a sign that you have raised yourself above trollbait level.
It's a sign of peer approval.
That's probably one good reason.
I use tar to maintain critical daily backups. I am still pretty new to Linux, so does this essentially do the same thing?
Dawn of the Dead
I use rdist to do much the same thing.
A simple example for my home directory is:
#
# Make a local copy of the contents of the home directory.
# Also make a local copy populated with hard links.
#
# This has the effect of preserving snapshots through time
# without too much overhead. (Cost of hard links + changed files.)
#
~ -> localhost
install -oremove /misc/backup/current ;
except ~/tmp ;
cmdspecial ~ "DATE=`/bin/date +\"%%Y-%%m-%%d.%%T\"` ; cp -al /misc/backup/current /misc/backup/snapshot.$DATE" ;
Note that I get dated backup directories, and that I can add as many "except" clauses as I want, so I don't need to back up junk directories (.mozilla caches, etc.).
My backup drive is mounted via automount, so it is rarely mounted. Just change "localhost" to host the backup on another machine.
I guess the whole thing goes to prove that, within anything computer related, there is more than one way to do it. Clever tutorial, gang. =^_^=
This sig no verb.
And people wonder why computer techs get a bad name.
Except that with your approach you have to get up every 4 hours through the night to replicate what this guy has achieved.
Move along, sheep. Go wait for your shepherd to tell you what to do.
My beliefs do not require that you agree with them.
Been using a script called glastree on several production file servers for quite some time now.
It works just great! At one site I've got about 7 weeks of depth from 3 different servers, all mirrored via ssh-nfs on one lowly Penti 133. We still spin tapes, mind you, but glastree has been flawless.
Been meaning to buy the author a virtual beer for some time now...
http://igmus.org/code/
From the website:
'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
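The "constant, sliding window" idea can be approximated for dated snapshot directories with a small shell function. This is not glastree's prune utility, just a sketch under stated assumptions: the function name prune_snapshots is made up, and it assumes the directories are named snapshot.YYYY-MM-DD (or similar) so that sorting by name also sorts by age.

```shell
#!/bin/sh
set -eu

# prune_snapshots ROOT KEEP   (hypothetical helper)
# Keep only the KEEP newest snapshot.* directories directly under ROOT.
# Relies on date-stamped names, which the shell expands in sorted order,
# so the oldest snapshots come first.
prune_snapshots() {
    root=$1
    keep=$2

    # Count the existing snapshot directories.
    count=0
    for dir in "$root"/snapshot.*; do
        [ -d "$dir" ] || continue
        count=$((count + 1))
    done

    # Remove the oldest ones until only KEEP remain.
    excess=$((count - keep))
    for dir in "$root"/snapshot.*; do
        [ "$excess" -gt 0 ] || break
        [ -d "$dir" ] || continue
        rm -rf "$dir"
        excess=$((excess - 1))
    done
}

# Example call (path is a placeholder):
#   prune_snapshots /misc/backup 14
```

Pruning by name rather than by modification time is a deliberate choice here: a hard-link copy made with cp -a preserves the source directory's timestamps, so a snapshot directory's mtime does not necessarily say when the snapshot was taken.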
--
Everyone hates me because I'm paranoid.
We have a hybrid network of Win2k and Linux servers at work. Our backup server is a Win2k box with a little over a terabyte of storage. We have an internal "utility" Linux box that is running Samba and rsync. For our production Linux boxes, we only have rsync to use for backups. The interesting way we back up the production boxes is by rsyncing to a backup share on the Win2k backup server that is mounted on the utility box using samba. At first, we thought that this would make things kinda slow, but actually they run at full speed.
Just thought I'd share our little Linux backup experience.
Experienced Windows (l)users here in Asia usually use 3 partitions: one for Windows and its applications, another for Ghost and the Ghost image used to restore the Windows-and-applications partition, and a third for data like Word files and the ICQ database...
*(change all pronouns to the appropriate gender)
Really? You can use Win2k to back up your Linux drives? How does that work?
There's a little daemon called 'Samba.' It's fairly obscure, and doesn't have many users, but I will explain how it works:
one can create SMB shares available to Windows clients, and then use the rather trivial Win2k backup util.
You might find mention of 'samba' in the bowels of Google somewhere, but I don't think anyone still uses it.
It's ok to use easy stuff for backups, just as users of automatic transmissions aren't pussies or limp wristed buffoons for picking something simple and automated.
Backups are for wimps. Real men upload their data to an FTP site and have everybody else mirror it.
Also, it should probably also be done from the real server to the backup server so that you can not just break one machine and get into all. (if you break into the real machine as root then you should be able to get into the backup machine)
This allows the backup machine to have only one open port: ssh, which can be tcpwrapped to allow connections only from the machines that it backs up.
I've been doing backups this way on Linux for aLongTime(tm). On FreeBSD I've also used dump/restore to an NFS-mounted RAID drive (does dump work okay on Linux these days? I've always been afraid to try it for some reason, maybe earlier versions weren't stable).
rsync is just so cool. First of all, it can work over the network through ssh, or through its own daemon (faster), or on a local filesystem. You can "pull" backups from the server or "push" them from the client. Over the network, it divides the files into blocks and just sends the blocks that are different. It has a fairly sophisticated way to specify files to exclude/include (for instance, exclude /home/*/.blah/* can be used to not save the contents of everybody's .blah directory, but keep the directory itself). You can set up a script to just back up given subdirectories so you can checkpoint your important project without backing up the whole show. etc etc.
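The exclude pattern mentioned above can be tried out locally with a short sketch. The function name backup_tree and the paths are made up for illustration; the one assumption that matters is that the source tree has a home/ directory at its top level, because a pattern starting with / is matched relative to the root of the transfer.

```shell
#!/bin/sh
set -eu

# backup_tree SRC DEST   (hypothetical helper)
# Mirror SRC into DEST, skipping the contents of every home/*/.blah
# directory while still creating the (empty) .blah directory itself.
backup_tree() {
    src=$1
    dest=$2
    rsync -a --delete --exclude='/home/*/.blah/*' "$src/" "$dest/"
}

# Example call (paths are placeholders):
#   backup_tree /mnt/fileserver /backups/fileserver
```

The pattern ends in /* so it only matches things inside .blah, not .blah itself, which is why the directory survives in the backup.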
I use it both to save over the network using the rsync daemon, and to a local separate drive. On a local drive it's great, because you can easily retrieve files that you've accidentally deleted, just using cp. It's also great for stuff like "diff -r /etc /backups/etc" to see if something changed.
I never thought of his technique for incremental backups, but since it uses hard links, I wonder how that interferes with the original hard links in your files?? Looks interesting.
There are many flags and options that rsync has, here are the ones I use to pull complete backups from another host onto a local drive (yeah --archive is a bit redundant here).
The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.
http://rdiff-backup.stanford.edu/
I was about to start using --backup-dir with my rsyncs to do incrementals, but this is a lot more slick. Right now I just run it with --delete weekly, so my live backups vary from none to 7 days old for deleted files. We run tapes too, so it wasn't a big deal, but the tape robot is on the way out, so I needed to get true incrementals going soon.
It's stories like this that keep me reading Slashdot. (Other than ranting on YRO stories, but that is nowhere near as cool as a neat trick like this)
I've had enough abrasive sigs. Kittens are cute and fuzzy.
A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.
I wonder whether such a thing would be possible in software. Possibly it can even be done through cunning application of the tools that we already have. I imagined that you might be able to do something like it by extending the loopback device interface. Does anyone out there have any cunning ideas?
The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.
At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).
where exclude =
stick in a cronjob. you can also add --delete if you want. it's basic, but easy.
Did anyone else think to themselves..I'm gunna click on that link just because it said go easy on it?
#!/bin/sh
rm -Rf /SAVE/bkup.tar.gz.5
mv /SAVE/bkup.tar.gz.4 /SAVE/bkup.tar.gz.5
mv /SAVE/bkup.tar.gz.3 /SAVE/bkup.tar.gz.4
mv /SAVE/bkup.tar.gz.2 /SAVE/bkup.tar.gz.3
mv /SAVE/bkup.tar.gz.1 /SAVE/bkup.tar.gz.2
mv /SAVE/bkup.tar.gz /SAVE/bkup.tar.gz.1
tar -zcf /SAVE/bkup.tar.gz /etc /var/spool/mail /home /var/www
Then I have an FTP script that runs once per day on the OTHER server sitting there (dare I say, the MS box) that grabs the BKUP.TAR.GZ from the linux box..And does much the same as far as replication.
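The chain of mv commands above can also be written as a loop, which makes the number of generations a parameter. This is only a sketch of the same rotation: the function name rotate_archives is made up, and unlike the script above it quietly skips generations that do not exist yet instead of letting mv complain.

```shell
#!/bin/sh
set -eu

# rotate_archives BASE KEEP   (hypothetical helper)
# Shift BASE -> BASE.1 -> BASE.2 ... -> BASE.KEEP, discarding the oldest.
rotate_archives() {
    base=$1
    keep=$2

    # The oldest generation falls off the end.
    rm -f "$base.$keep"

    # Shift the numbered generations up by one, oldest first.
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -e "$base.$prev" ]; then
            mv "$base.$prev" "$base.$i"
        fi
        i=$prev
    done

    # The current archive becomes generation 1.
    if [ -e "$base" ]; then
        mv "$base" "$base.1"
    fi
}

# Example use, mirroring the script above:
#   rotate_archives /SAVE/bkup.tar.gz 5
#   tar -zcf /SAVE/bkup.tar.gz /etc /var/spool/mail /home /var/www
```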
= Grow a brain...
I guess it's better to trust your server at 4.20 than the operator. well, for many operators that is. even if it's 4.20pm, I'd still prefer to let the machine do the critical work instead of some sysadmins. knowing what I know about many sysadmins at 4.20 that is..
[hint: double entendre on 420. not sure if the author knew this or not. or maybe I just stated what was terribly obvious.]
--
"It is now safe to switch off your computer."
I don't consider snapshot backups backups; they're snapshots.
I've been using a utility called Flexbackup -- it's a perl script which will do multi-level backups (i.e. incremental), spew to tape or file, use tar, afio or dump and compression. Oh yes, and it will use rsh/ssh for network backups. I wish I could buy the author a beer or few but it seems to be unsupported now. Oh well.
Email me if you want a copy and can't find it. I've also got a patch to fix a minor table of contents bug with modern versions of mt.
Speaking from experience.
It seems that it would be much more efficient if each application handled its own backup scheme. I don't need to backup my whole drive. Certainly not my mp3s or my applications.
The problem I've had with rsync is that it seems to have a file tree size limit at which it spits up and fails. I don't know how to work around this. Yes, I do evil things like having 20,000 files in my directories, but I have sound reasons for doing so, not sheer laziness. Anybody know how to fix this?
Some drink at the fountain of knowledge. Others just gargle.
Anyone know anything about this issue? I can't find the necessary info in the rsync docs.
Judging by the fact that this technique does seem to work, I presume that rsync never modifies a file in-place, but I wonder if that's a guarantee, or just the current behaviour?
(Also, I am aware of the --whole-file command-line argument, but that's an orthogonal issue.)
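Why the in-place question matters for hard-linked snapshots can be shown without rsync at all. The sketch below uses two made-up helper functions: update_by_rename writes a new file and moves it over the old one (which, as far as I know, is how rsync updates a changed file by default; later releases document an --inplace option that changes this), and update_in_place overwrites the existing file directly.

```shell
#!/bin/sh
set -eu

# Replace a file by writing a temporary copy and renaming it over the
# original. The original inode is left untouched.
update_by_rename() {
    tmp="$1.tmp.$$"
    printf '%s\n' "$2" > "$tmp"
    mv "$tmp" "$1"
}

# Overwrite the existing file's contents. The inode stays the same, so
# every hard link to it sees the new data.
update_in_place() {
    printf '%s\n' "$2" > "$1"
}

work=$(mktemp -d)

# "current" and "snapshot" start out as two names for the same file.
echo "version 1" > "$work/current.txt"
ln "$work/current.txt" "$work/snapshot.txt"

update_by_rename "$work/current.txt" "version 2"
echo "snapshot after rename-style update: $(cat "$work/snapshot.txt")"

# Re-link, then update in place: the snapshot changes too.
ln -f "$work/current.txt" "$work/snapshot.txt"
update_in_place "$work/current.txt" "version 3"
echo "snapshot after in-place update: $(cat "$work/snapshot.txt")"

rm -rf "$work"
```

With the rename-style update the snapshot keeps the old contents; with the in-place update the old contents are gone from the snapshot as well, which is exactly the failure the question is worried about.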
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Well, funny this should come up as I just got my new backup drive yesterday and was going to sit down to write a backup script. Thanks! :)
This sig has been temporarily disconnected or is no longer in service
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
A little plug for my own script, to make personal backups. It gained its name because it selectively backs up only the directories you specify and ignores those you tell it to skip. I use it to create a selective backup of my system, leaving out things I back up separately like music, etc., and I can usually make it end up with a 400-500mb iso of just system dirs with confs, the kernel source, and my home dirs with all configurations intact. Some example files are included.
:)
Take a look if you're interested
Click on "backpack"
-r
This works because I don't throw my mp3/ogg, pr0n, etc into the repository. I'll have to figure out a new solution when I hit the 650MB/800MB limit, but it works for now. I'll probably just have my repository on a different computer and use ssh, or get another HD specifically for backup purposes.
I started using this system after reading The Pragmatic Programmer. They recommend using CVS for everything that is important. It's great for more than just code. And this way, whenever I install a new distro, I have all my settings since I save my .emacs, .mozilla, .kde, .etc directories.
rsync has faithfully served my backup needs for many a year, but recently I've needed to conserve disk space and fell back to tar because I couldn't find any way to run rsync on a compressed backup to update it... the closest thing I found was to patch the kernel for a compressed filesystem support and then run rsync on that... any tips anyone?
no no no, what I want, is a DVD burner that holds a stack of dvds and automatically burns them, then dumps them into storage for me.
Liberty.
Anyone have any solutions for creating win32 backups w/o an expensive commercial package?
-- Eric
As others have noted, you can get snapshots using LVM.
What I would really like, however, is the ability to have the file system keep versions of a file as the file is written to or deleted; I don't want a snapshot every hour, I want a new single-file snapshot for every change to the file. And I want to be able to set or clear an attribute to control which files/directories this gets done in (i.e., chattr +u, which currently doesn't really do anything). And I want the old snapshots to age and vanish on their own, say, 3 days after they are made (or however many days the sysadmin chooses).
Under Windows, with Norton Utilities, you can get this sort of functionality with the Norton Protected Recycle Bin. I have been wishing for this on Linux for quite some time.
I remember reading about something called the "Snap filesystem" which would someday offer this, but I can't find anything about it now on the web.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
I'd like to hear from you on this subject.
Stating on Slashdot that I like cheese since 1997.
I don't know, I've never tried it. I use rsync.
IMHO, this is a great solution - I've been looking for something like this for fuss-free backups at work. Voilà.
Being the only "computer guy" at work sucks ass when you're the programmer/sysadmin/engineer/tech. Gah.
Backing up to your disk is all very good against errors of manipulation, but what if the disk fails?
And what about people like me who backup to a DLT (or whatever) tape drive? Not much use then either.
In any case I don't see this as being extremely useful in the real world (i.e. beyond the casual backing up of a home machine)...
May contain traces of nut.
Made from the freshest electrons.
The site was never down; it's just that my roommate, a windows user, noticed the connection was slow and reset the cable modem. He's quite upset about being unable to play Warcraft III. :)
I've never had a slashdot nick before, so I just created this one today. I'll try to go through some of the comments and provide useful feedback.
Thanks for your interest everyone!
Mike
I was going to suggest cpbk but the web page seems to have disappeared. I hope someone else has the source for this because all I bothered to download was the RPM.
Here is part of the man page that describes cpbk:
Backup Copy is basically a smart copy program that allows
a user to copy mass files from one place to another. When
copying over a previous copy, the key features will allow
copying only of new or non existing files in the backup.
This results in saving time and less load on the drive.
Built into the same feature of copying new files only, is
a file removal procedure. If a file is removed from the
source path, the same file will be removed when the next
backup is performed. This provides a backup that is
exactly the same as the source without filling up the
drive. As an added option, all files that will be over-
written or deleted when doing a copy over a previous
backup, have the opportunity to be stored in a trash bin.
You can leave this trash bin to grow and grow just in case
you need a backup of your backup. When you start running
out of disk space you will need to remove or clean up the
trash bin.
It's dead simple to use. Just cpbk srcdir destdir and all files in your backup directory (including subdirectories) are updated to the latest version in the source directory and deleted files are removed and so on.
Slick as all hell.
If you're a zombie and you know it, bite your friend!
Interestingly enough, the server has plenty of extra capacity, it's the cable modem connection that seems to have been saturated.
Thanks in advance for being courteous and using a mirror or the google cache!
Mike
the only downside is that you need to feed a password:
Not if you use the ssh-agent, and maybe keychain.
Before you run that command in a script, put this code before it:
keychain -q /root/.ssh/id_dsa
. /root/.ssh-agent-box750
tar cvzf - $1 | ssh $2 "( cd $3; tar xvzf - )"
Now the first time you run the command, it will ask you for your key passphrase, but any subsequent runs will work passwordlessly.
I use a similar script with rsync and it works great. Set up a cron job to automatically do the backup, and once after the box boots start a manual bkup (thus loading the key), and it'll work automatically from there.
Keychain can be found here: http://www.gentoo.org/projects/keychain.html
-- I speak only for myself.
I do it BOFH-style. 1.2 seconds, no hassle with tapes:
backup * > /dev/null
I like rsync and it works great over ssh. But there seems to be no way to run rsync as a cron job because it will hang asking for the ssh password. Keys and ssh-agent seem like the solution - until you try it and find that they don't work with cron :(
"Don't belong. Never join. Think for yourself. Peace." V.Stone, Microsoft Corporation
You have to be careful on Macs, in my experience. Simply copying all your Mac files to another Unix file system can sometimes lose the extra stuff in Mac files like resource forks and creator codes. I'm not an expert on this, but Macs have the Unix command "ditto" (see man ditto) which is like 'cp' but takes care of that extra stuff. I don't know what rsync would do.
Some drink at the fountain of knowledge. Others just gargle.
You should be using maildirs
'Cause he documented and shared it; you didn't. That's the big deal.
Keychain is indeed a great tool, it really makes ssh shine.
In fact it even works on winblows, using cygwin.
I use it all the time to synchronise files via cvs.
Have you ever wanted to do backups easily and cheaper than tape? Well then Stitch is the answer for you. Use the features of rsync without the hassle of setting it up. This is not meant for data center scale backups, but for small departments or institutions. A mirror of the page can be found here. All that's required on the client is ssh, so there isn't a client side to maintain. It does incremental backups on a monthly rotation and allows you to easily restore systems.
First, where are you getting all of this money for tape? I want some. Second, CD-RWs are not practical for backing up large amounts of data, and neither are DVD-R disks. I have to back up 0.5TB of data, and yes, that's a T as in terabyte. Therefore disks work much better. I just have a 1TB RAID 5 array with one hot spare, which cost under 5K, to back up systems.
Third, hard drives come out as being much cheaper than tape, even in the long run. You don't need removable disks; you just need to have the machine in a different building if possible. A tape library to hold the amount of data that I need to hold would be over 5K, and then I would have to buy tapes, which are around $100 apiece. That doesn't seem very economical to me, given that for less money I can build two 1TB (and yes, that's still a T for terabyte) backup systems and put them in separate buildings. That way if one completely fails I still have all of my backups.
Fourth, I really want to see you try to fit 0.5TB (and yes, again, that's a T for terabyte) on a CD-RW. Just let me know when you manage to do that.
Size problems I've run into with Apple Software Restore (4GB limit) and rsync (appears to have either a file count limit or a directory tree depth limit) are concerns. Can you say if this new rsync has removed this limit, or if psync has it?
Some drink at the fountain of knowledge. Others just gargle.
First, tar is not reliable.
Second, you could use ssh keys.
Third, look here for something that already does this.
tar isn't that reliable
This has absolutely nothing to do with this thread and should be moderated down to -1
Linus says dump is deprecated. Although I hear that patches have been added to make it stable in the most recent 2.4.x kernels.
I've lost count of the number of times you've said that tar isn't reliable. What on earth do you mean?
status is failure. status is failure
I have a similar script called rsync-backup. This one does automatic daily snapshots, works over ssh, and uses rsync and hardlinks (to save space), chroot, and an ssh forced command for security.
Mason, Buildkernel and more: http://www.stearns.org/
I have been working all night trying to figure out why rsync keeps crashing my server. Yes, compiled and installed 2.5.4 (supposedly stable).
I am backing up from /dev/hda1 to /dev/hdc1
After 6 reboots and 12 fsck I finally got a complete image on /dev/hdc1 and now I actually get an error message (and clean error exit) saying:
building file list ... done
var/log/
var/log/boot.msg
var/log/boot.omsg
var/log/httpd/
var/log/httpd/access.combined.8sep02.log
var/log/httpd/access.combined.log
var/log/httpd/error_log
rsync: error writing 4 unbuffered bytes - exiting: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(464)
rsync: error writing 70 unbuffered bytes - exiting: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(464)
A quick search of Usenet on Google shows the same problem reported twice in June 2002. How buggy is rsync? Anybody else (besides us three) have stability problems?
..Trevor..
I encountered similar errors using rsync. It was a while ago and I really can't recall the exact error codes, but the problem was a lack of disk space on the receiving machine.
http://blog.grcm.net/
If you use tar to back up files it can screw you over, because if the tar archive gets corrupted 4K into the file then you can't restore the rest of your files. If you insist on using something like tar, I have heard good things about BRU
That error doesn't say that it's not completing the backups because of too many files. It sounds like you are running out of space.
C'mon guys,
I haven't admitted making a mistake like 'out of space' for years- Not since I started writing Linux columns for BYTE.com -lol
I have been trying to work the problem out with the rsync mailing list. Every second message I receive tells me it's my kernel or my hardware - sheesh.
Anyway - rsync v2.5.5 no longer crashes the server (THIS IS A KNOWN BUG IN 2.5.4 - can you believe that? The 'fork' bug - too many forks while executing rsync as root will bring down all open processes) (thanks to Paul Haas for that info)
I now have a consistent and repeatable rsync failure. Whenever it tries to copy /var/log/httpd/access.combined.log
I guess it doesn't like having to argue with Apache over who has the log open -lol
..Trevor..