What Software Do You Use for Unix Backups?
jregel asks: "Linus has stated that dump should not be considered a reliable backup program, and both tar and cpio have their limitations. So what are Slashdot readers doing for backing up Linux servers and workstations? (you do backup, right?)" Given this bit of news, have you used anything other than the standard Unix staple to back up your Linux boxes? If you were forced off of tar, cpio and dump, what would you use as a replacement?
You know, I was thinking about the same thing since I had problems with a recent restore from a compressed dump archive. I was missing some files probably because I ran the dump from an active file system.
I found out that Solaris has a very interesting command: fssnap.
It creates a read-only snapshot of your filesystem intended for backup operations.
You create a snapshot, dump the snapshot, then delete the snapshot and the dump is consistent.
I wonder if there's something like this for Linux...
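The create/dump/delete sequence above is independent of any one tool, so here is a minimal sketch of the pattern in Python. The three callables (create, delete, dump) and the device name are hypothetical stand-ins, not real fssnap or dump invocations; the only point is that the snapshot gets removed even when the dump step fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def snapshot(create: Callable[[], str], delete: Callable[[str], None]) -> Iterator[str]:
    """Create a snapshot, yield its device path, and always delete it afterwards."""
    device = create()          # would wrap the platform's snapshot command
    try:
        yield device
    finally:
        delete(device)         # runs even if the backup step raises


def backup_from_snapshot(create, delete, dump) -> None:
    """Dump the frozen snapshot rather than the live filesystem."""
    with snapshot(create, delete) as device:
        dump(device)


if __name__ == "__main__":
    # Stand-in callables that only record the order of operations.
    log = []
    backup_from_snapshot(
        create=lambda: (log.append("create") or "/dev/hypothetical-snap0"),
        delete=lambda dev: log.append(f"delete {dev}"),
        dump=lambda dev: log.append(f"dump {dev}"),
    )
    print(log)
```

In real use each callable would shell out to whatever snapshot and backup commands the platform provides.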
I think the 2 above are both excellent, Taper for the less demanding environment, BUpEdge for a system with multiple drives.
My only beef with Taper (and I'd use it otherwise, on my home system) is that when you do an "e"xclude or "i"nclude of a directory, it scans the entire subtree, which can take *forever* (like when excluding /var/squid) instead of just simply skipping that directory.
I'm actually doing a 100 GB backup as we speak... so good timing on the Ask Slashdot.
mindslip
I use rsync over ssh too; I back it up to a machine at work (which I can reach from home). It basically does my whole home directory except for a few excludes for stuff that's a bit sensitive (ssh keys, keychain, ICQ history), which I manually back up to CD now and then. The machine at work is then backed up with TSM.
The rsync over ssh style of backup is so easy it's addictive!
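For anyone who wants to script this, here is a rough sketch that only builds the rsync command line and prints it. The source path, the remote host and the exclude patterns are made up for illustration; -a, -z, -e and --exclude are standard rsync options.

```python
import shlex
from typing import List, Sequence


def rsync_command(source: str, remote: str, excludes: Sequence[str] = ()) -> List[str]:
    """Build an rsync-over-ssh command line; nothing is executed here."""
    cmd = ["rsync", "-az", "-e", "ssh"]           # archive mode, compress, ssh transport
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += [source, remote]
    return cmd


if __name__ == "__main__":
    cmd = rsync_command(
        "/home/me/",                               # hypothetical source directory
        "me@work.example.com:backups/home/",       # hypothetical destination
        excludes=[".ssh/", ".keychain/"],          # hypothetical sensitive paths
    )
    print(shlex.join(cmd))                         # hand cmd to subprocess.run to execute it
```

Keeping the command as a list rather than a string avoids quoting problems if a path contains spaces.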
I have been extremely happy with Amanda. Single centralised backup server running amanda-server. Multiple workstations running the amanda-client. Amanda automagically schedules backups based on sensible heuristics. I just tell Amanda how many tapes I have, how many workstations I have, and Amanda does all the hard work of working out how much tape capacity is required and how often it should schedule incrementals/fulls.
The server/client protocol has been designed to avoid reliance on dangerous security holes like rsh. The server sends the client a "send me your dump" message. The client then connects back to the server and delivers it the output from dump or tar. You can configure exclusion lists on the client if you're worried about sending certain files or filesystems. You can also encrypt the data stream and/or use Kerberos for authentication.
If I forget to load a blank tape then Amanda plays it safe. It doesn't overwrite last night's backup: instead it stores incrementals into the "holding disk". Amanda will then flush the held backups to the next blank tape.
Amanda emails me reports after every backup with a neat summary of what went right/wrong. It also gives you several hours' advance warning if you forget to load a blank tape or if any of the workstations are offline.
The only downside of Amanda is that it is fiddly to set up. The documentation is poor and the configuration files are cryptic. But if you're willing to invest some time and effort then you can't do much better (for free) than Amanda.
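To make the "plays it safe" behaviour concrete, here is a toy model of the holding-disk idea described above. This is not Amanda's code, configuration or scheduling logic; the class and method names are invented, and it only mimics the rule "no blank tape means keep the backup on disk and flush it to the next blank tape".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBackupServer:
    """Toy model only: park backups on disk when no blank tape is loaded."""
    holding_disk: List[str] = field(default_factory=list)
    tapes: List[List[str]] = field(default_factory=list)

    def nightly_run(self, backup: str, blank_tape_loaded: bool) -> str:
        if not blank_tape_loaded:
            # Never overwrite the previous tape; keep tonight's data on disk.
            self.holding_disk.append(backup)
            return "held"
        # A blank tape is available: flush anything held, then tonight's backup.
        self.tapes.append(self.holding_disk + [backup])
        self.holding_disk = []
        return "taped"


if __name__ == "__main__":
    server = ToyBackupServer()
    print(server.nightly_run("monday-incremental", blank_tape_loaded=False))
    print(server.nightly_run("tuesday-incremental", blank_tape_loaded=True))
    print(server.tapes)
```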
Features:
For those who don't know: AMANDA cannot append to tapes.
Every time you back up with AMANDA it must start from the beginning of the tape.
So, if you want backups every day, you must have a tape for every day.
(http://amanda.sourceforge.net/fom-serve/cache/29
Some people have already mentioned Amanda.
In addition to Amanda, I have had good luck with star, written by Jörg Schilling. star is very feature-rich, fast, standards compliant and has been around since 1985. Give it a try!
The star-users mailing list is here. You can also look at the man page and finally download it.
Corporate Gadfly
http://backuppc.sourceforge.net/
Automated backups to an online disk server, open source, and a really nice web interface as well as command line interface.
It uses Samba and ssh to back up and restore Windows and Unix machines.
You can have it restore any files/folders from a backup you select, using the same methods (Samba or ssh), or it can send the restored files to your browser as a tar or zip file.
I recently replaced a machine using Amanda and a DLT drive with a fileserver using a RAID 5 array and BackupPC. Best switch ever.
> The problem is tar always archives the entire space which makes it difficult to
> backup, say gigabytes of data, daily.
>
> A decent backup tool (as opposed to an archival tool) must absolutely have
> incremental backup support.
Er?
tar --help
[snip]
Operation modifiers:
  -G, --incremental          handle old GNU-format incremental backup
  -g, --listed-incremental   handle new GNU-format incremental backup
[snip]
Local file selection:
  -N, --newer=DATE           only store files newer than DATE
      --newer-mtime          compare date and time when data changed only
[snip]
This is in tar (GNU tar) 1.12
(Which is really really old actually.. slackware 3.2 dist)
There are also tons of options to exclude directories and files, to force it to span disks, and pretty much match in any way you need.
I've been making incremental backups (and even restored a few) for a while now.
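For illustration, the "only store files newer than DATE" idea can be sketched with Python's standard tarfile module. This is not how GNU tar implements it, and unlike --listed-incremental it keeps no snapshot file, so it cannot notice deleted or renamed files. The function name and the cutoff handling are my own assumptions.

```python
import os
import stat
import tarfile


def archive_newer_than(root: str, archive_path: str, cutoff: float) -> list:
    """Archive regular files under root whose mtime is later than cutoff.

    cutoff is a Unix timestamp. Keep archive_path outside root so the
    archive does not try to include itself.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                info = os.lstat(path)              # lstat: do not follow symlinks
                if stat.S_ISREG(info.st_mode) and info.st_mtime > cutoff:
                    rel = os.path.relpath(path, root)
                    tar.add(path, arcname=rel)     # store paths relative to root
                    added.append(rel)
    return added
```

A nightly job could pass the timestamp of the previous run as cutoff and fall back to a full archive once a week.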
Have you even read Linus's comments?
Dump works by reading the raw data partition. That works great with an unmounted partition, or if you have a very limited OS that does not perform any caching.
But Linux is different - it's now using the cached pages as the primary content, usually flushing them to disk only as the pages are dropped. This is the approach used by most mature OSes, but Linux doesn't yet have an interface for "dump" programs to query the OS for updated but unwritten sectors.
So dump is the worst of all possible things now. Not only will you get incomplete live files, you can get incomplete files even if the users have all terminated but the pages haven't been flushed to disk yet. That's non-deterministic, and there's simply no way for you to perform reliable dumps.
On the practical side, dump is specific to the filesystem. When everyone ran ext2, that wasn't a problem. But now people may have a mixture of ext2, ext3, reiserfs, xfs, jfs, and probably even other formats. Each requires its own dump and restore, and that requires a lot more effort.
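The gap between "the application wrote it" and "it is on the disk" can be seen from user space too. A small sketch, assuming a POSIX system: a write first lands in buffers and the page cache, and only an explicit flush plus fsync asks the kernel to push that one file to the device. This narrows the window for a single file; it does not make reading the raw partition of a mounted filesystem safe.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the kernel to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)            # lands in Python's buffer first
        f.flush()                # Python buffer -> kernel page cache
        os.fsync(f.fileno())     # page cache -> device, for this file only
```

Without the last two calls, a tool reading the raw device could still see the old contents of those blocks.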
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken