Linux Backups Made Easy
mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic "snapshot-style" backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.
...for posting a link to the Google cache in the story description on the main page! mfago, you are a genius!
Perhaps more article submitters (or editors) could add these links more frequently?
I had the chance to be the first post, but decided to mirror the site first.
My mirror is here
"Live Free or Die." Don't like it? Then keep out of the USA
What's wrong with dump? It works great, and you can send stuff to gzip, bzip2, etc for data compression... even pipe the stuff over ssh to a server somewhere else. Dump also supports incremental backups. It also works on a lower level than rsync (which works on the filesystem level) and supports multiple volumes easily.
I work with Mike and started using his scripts a while back for my own department. With HD space so cheap these days, it makes sense to have an online backup, especially for those of us who can't afford a NetApp. It really saves time when restoring those everyday user deletes. Way to go Mike!
I am the "computer guy" for a small company, and I use this method to make back-ups of our Samba file server. It's great! The main file server has Samba and everyone works off of it. The backup server has almost twice the disk space, but it doesn't really need that much. It never seems to be more than a couple of percent bigger. I keep 'snapshots' going back various time intervals up to a week, and do the tape backup off of the backup machine early in the morning. Thank you Mike Rubel!
But where's the sense of achievement of getting /.'d if we all use the cache? /.ing is a sign that you have raised yourself above trollbait level.
It's a sign of peer approval.
That's probably one good reason.
Been using a script called glastree on several production file servers for quite some time now.
It works just great! At one site I've got about 7 weeks of depth from 3 different servers, all mirrored via ssh-nfs on one lowly Penti 133. We still spin tapes, mind you, but glastree has been flawless.
Been meaning to buy the author a virtual beer for some time now...
http://igmus.org/code/
From the website:
'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
--
Everyone hates me because I'm paranoid.
Also, it should probably also be done from the real server to the backup server so that you can not just break one machine and get into all. (if you break into the real machine as root then you should be able to get into the backup machine)
This allows the backup machine to have only one open port: ssh, which can be tcpwrapped to allow connections only from the machines that it backs up.
I've been doing backups this way on Linux for aLongTime(tm). On FreeBSD I've also used dump/restore to an NFS-mounted RAID drive (does dump work okay on Linux these days? I've always been afraid to try it for some reason, maybe earlier versions weren't stable).
rsync is just so cool. First of all, it can work over the network through ssh, or through its own daemon (faster), or on a local filesystem. You can "pull" backups from the server or "push" them from the client. Over the network, it divides the files into blocks and sends only the blocks that are different. It has a fairly sophisticated way to specify files to exclude/include (for instance, exclude /home/*/.blah/* can be used to not save the contents of everybody's .blah directory, but keep the directory itself). You can set up a script to just back up given subdirectories so you can checkpoint your important project without backing up the whole show. etc etc.
I use it both to save over the network using the rsync daemon, and to a local separate drive. On a local drive it's great, because you can easily retrieve files that you've accidentally deleted, just using cp. It's also great for stuff like "diff -r /etc /backups/etc" to see if something changed.
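A minimal sketch of the exclude rule described above, assuming an rsync binary on the PATH. The helper name backup_tree and the choice of flags are illustrative assumptions, not necessarily what this poster runs; only the pattern /home/*/.blah/* comes from the comment.

```shell
#!/bin/sh
# backup_tree SRC DEST (hypothetical helper): mirror SRC into DEST.
#   -a         archive mode: recurse, keep permissions, times, symlinks
#   --delete   remove files from DEST that no longer exist in SRC
#   --exclude  skip the contents of every .blah directory under /home,
#              but still create the (empty) directory itself
# The leading "/" anchors the pattern at the top of the transfer (SRC).
backup_tree() {
    rsync -a --delete --exclude='/home/*/.blah/*' "$1/" "$2/"
}
```

Because the pattern ends in /*, only what is inside .blah is skipped; the directory entry itself is still copied.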
I never thought of his technique for incremental backups, but since it uses hard links, I wonder how that interferes with the original hard links in your files?? Looks interesting.
There are many flags and options that rsync has, here are the ones I use to pull complete backups from another host onto a local drive (yeah --archive is a bit redundant here).
The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.
http://rdiff-backup.stanford.edu/
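For comparison, here is a rough sketch of the hard-link scheme itself, assuming GNU cp (for the -l option). The function name, the snap.0/snap.1 layout and the two-snapshot depth are made up for illustration, and the loop is only a simplified stand-in for the rsync step (it does not handle deleted files). It shows the trade-off described above: unchanged files cost no extra space, but a changed file is stored again in full.

```shell
#!/bin/sh
# make_snapshot SRC DEST (hypothetical helper).
# DEST/snap.0 is the newest snapshot, DEST/snap.1 the one before it.
make_snapshot() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    rm -rf "$dest/snap.1"
    if [ -d "$dest/snap.0" ]; then
        mv "$dest/snap.0" "$dest/snap.1"
        # Hard-link copy: a new directory tree whose files share
        # inodes (and therefore disk blocks) with the older snapshot.
        cp -al "$dest/snap.1" "$dest/snap.0"
    else
        mkdir "$dest/snap.0"
    fi
    # Simplified stand-in for the rsync step: for each file whose
    # content differs, write a fresh copy under a temporary name and
    # rename it into place. The rename breaks the hard link, so the
    # older snapshot keeps the old content.
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        if ! cmp -s "$src/$f" "$dest/snap.0/$f" 2>/dev/null; then
            mkdir -p "$dest/snap.0/$(dirname "$f")"
            cp -p "$src/$f" "$dest/snap.0/$f.tmp.$$"
            mv "$dest/snap.0/$f.tmp.$$" "$dest/snap.0/$f"
        fi
    done
}
```

After two runs with one file edited in between, the unchanged files in snap.0 and snap.1 are the same inode, while the edited file exists as two complete copies.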
A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.
I wonder whether such a thing would be possible in software. Possibly it can even be done through cunning application of the tools that we already have. I imagined that you might be able to do something like it by extending the loopback device interface. Does anyone out there have any cunning ideas?
The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.
At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).
where exclude =
stick in a cronjob. you can also add --delete if you want. it's basic, but easy.
I guess it's better to trust your server at 4.20 than the operator. well, for many operators that is. even if it's 4.20pm, I'd still prefer to let the machine do the critical work instead of some sysadmins. knowing what I know about many sysadmins at 4.20 that is..
[hint: double entendre on 420. not sure if the author knew this or not. or maybe I just stated what was terribly obvious.]
--
"It is now safe to switch off your computer."
I don't consider snapshot backups backups; they're snapshots.
I've been using a utility called Flexbackup -- it's a perl script which will do multi-level backups (i.e. incremental), spew to tape or file, use tar, afio or dump and compression. Oh yes, and it will use rsh/ssh for network backups. I wish I could buy the author a beer or few but it seems to be unsupported now. Oh well.
Email me if you want a copy and can't find it. I've also got a patch to fix a minor table of contents bug with modern versions of mt.
It seems that it would be much more efficient if each application handled its own backup scheme. I don't need to backup my whole drive. Certainly not my mp3s or my applications.
Anyone know anything about this issue? I can't find the necessary info in the rsync docs.
Judging by the fact that this technique does seem to work, I presume that rsync never modifies a file in-place, but I wonder if that's a guarantee, or just the current behaviour?
(Also, I am aware of the --whole-file command-line argument, but that's an orthogonal issue.)
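To make the concern concrete, here is a small sketch of the two ways a file can be updated, and what each does to a hard-linked older copy. The helper names are hypothetical. As far as I know, rsync's default is the second style (write a temporary file, then rename it over the old name), but treat that as an assumption to check against the documentation for your version rather than a documented guarantee.

```shell
#!/bin/sh
# update_in_place FILE TEXT: overwrite the existing inode.
# Every hard link to FILE sees the new content, so an older
# snapshot sharing that inode would be silently changed too.
update_in_place() {
    printf '%s\n' "$2" > "$1"
}

# update_by_rename FILE TEXT: write a new inode, then rename it
# over FILE. Other hard links keep pointing at the old inode,
# so an older snapshot keeps the old content.
update_by_rename() {
    printf '%s\n' "$2" > "$1.tmp.$$"
    mv "$1.tmp.$$" "$1"
}
```

The hard-link snapshot technique only stays correct if the tool doing the update behaves like update_by_rename.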
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
This works because I don't throw my mp3/ogg, pr0n, etc into the repository. I'll have to figure out a new solution when I hit the 650MB/800MB limit, but it works for now. I'll probably just have my repository on a different computer and use ssh, or get another HD specifically for backup purposes.
I started using this system after reading the Pragmatic Programmer. They recommend using CVS for everything that is important. It's great for more than just code. And this way, whenever I install a new distro, I have all my settings since I save my .emacs, .mozilla, .kde, .etc directories.
As others have noted, you can get snapshots using LVM.
What I would really like, however, is the ability to have the file system keep versions of a file as the file is written to or deleted; I don't want a snapshot every hour, I want a new single-file snapshot for every change to the file. And I want to be able to set or clear an attribute to control which files/directories this gets done in (i.e., chattr +u, which currently doesn't really do anything). And I want the old snapshots to age and vanish on their own, say, 3 days after they are made (or however many days the sysadmin chooses).
Under Windows, with Norton Utilities, you can get this sort of functionality with the Norton Protected Recycle Bin. I have been wishing for this on Linux for quite some time.
I remember reading about something called the "Snap filesystem" which would someday offer this, but I can't find anything about it now on the web.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
I'd like to hear from you on this subject.
Stating on Slashdot that I like cheese since 1997.
IMHO, this is a great solution - I've been looking for something like this for fuss-free backups at work. Voilà.
Being the only "computer guy" at work sucks ass when you're the programmer/sysadmin/engineer/tech. Gah.
Backing up to your disk is all very good against errors of manipulation, but what if the disk fails?
And what about people like me who backup to a DLT (or whatever) tape drive? Not much use then either.
In any case I don't see this as being extremely useful in the real world (i.e. beyond the casual backing up of a home machine)...
May contain traces of nut.
Made from the freshest electrons.
The site was never down; it's just that my roommate, a windows user, noticed the connection was slow and reset the cable modem. He's quite upset about being unable to play Warcraft III. :)
I've never had a slashdot nick before, so I just created this one today. I'll try to go through some of the comments and provide useful feedback.
Thanks for your interest everyone!
Mike
the only downside is that you need to feed a password:
Not if you use the ssh-agent, and maybe keychain.
Before you run that command in a script, put these lines ahead of it:
keychain -q /root/.ssh/id_dsa
. /root/.ssh-agent-box750
tar cvzf - "$1" | ssh "$2" "cd '$3' && tar xvzf -"
Now the first time you run the command, it will ask you for your key passphrase, but any subsequent runs will work passwordlessly.
I use a similar script with rsync and it works great. Set up a cron job to automatically do the backup, and once after the box boots start a manual bkup (thus loading the key), and it'll work automatically from there.
Keychain can be found here: http://www.gentoo.org/projects/keychain.html
-- I speak only for myself.
You should be using maildirs
'Cause he documented and shared it; you didn't. That's the big deal.
Use an RSA key with no password. If you are paranoid enough to be using ssh, you should be paranoid enough to be using the strong authentication provided by using RSA Keys.
You don't really need ssh-agent.
[referring to using 'tar' to do daily backups]
And people wonder why computer techs get a bad name.
Eh? There's nothing wrong with tar per se. For example, let's say you want to transport your backups over a network securely (i.e., via ssh). Your choices are:
1. Allow ssh access with no password (public-key access, preferably). I'm leery of this, because allowing anything like this to run automatically means entrusting all the auth data to the machine, where it can be compromised.
2. Copy the backups asynchronously from making them, allowing user-initiated authentication. This was the approach I opted for when I had to put together a backup system overnight at one company.
Couple of cron jobs that ran incremental tars on a list of directories, storing them in the scratch partition with higher permissions (so user processes cleaning up after themselves couldn't nuke them accidentally). Then at my leisure I would run the transport script (mornings about 10 AM, typically) which would suck the backups across and copy them to the tape. This worked fine for the time the project was active. Note that I was backing up to tape, which meant I needed to manually rotate tapes anyway, so this system helped ensure that new backups didn't overwrite old ones if I came in late -- and we definitely did not want these backups exported to our network. I also had the advantage of only needing to worry about one server.
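For readers who have not done incremental tars before, here is one way such a cron job could look, assuming GNU tar and its --listed-incremental option. This is a sketch, not the setup described above: the helper name and the three arguments are hypothetical.

```shell
#!/bin/sh
# incremental_tar DIR STATE ARCHIVE (hypothetical helper).
# GNU tar records what it has already saved in the snapshot file STATE.
#   First run (STATE does not exist yet): full, level-0 archive.
#   Later runs: only files added or changed since STATE was updated.
incremental_tar() {
    tar --create --gzip \
        --listed-incremental="$2" \
        --file="$3" \
        --directory="$(dirname "$1")" "$(basename "$1")"
}
```

Each run needs a fresh ARCHIVE name (a date stamp, say) so that a new incremental never overwrites an older one. Restoring means extracting the full archive first and then each incremental in order.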
Just because tar is old and a bit... esoteric at times, doesn't mean it's therefore automatically a stupid idea to use it. If it's what you know, and it gets the job done, there's no need to feel guilty about not using a fancier system. Even Linus likes tar, because it's rock-solid reliable.
Now if you have (faint hope) a valid criticism of this guy's use of tar in his environment, then I'm all ears. But I doubt that, since he didn't give enough detail for you to have one.
I don't know why I even bother with this given it's an AC post, except that assholes like this are a major reason why Linux advocates get a bad rep.
-- Old Man Kensey
I have a similar script called rsync-backup. This one does automatic daily snapshots, works over ssh, and uses rsync and hardlinks (to save space), chroot, and an ssh forced command for security.
Mason, Buildkernel and more: http://www.stearns.org/