What Software Do You Use for Unix Backups?
jregel asks: "Linus has stated that dump should not be considered a reliable backup program, and both tar and cpio have their limitations. So what are Slashdot readers doing for backing up Linux servers and workstations? (You do back up, right?)" Given this bit of news, have you used anything other than the standard Unix staples to back up your Linux boxes? If you were forced off of tar, cpio and dump, what would you use as a replacement?
If you were forced off of tar, cpio and dump, what would you use as a replacement?
I'd use dd of course...
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
You know, I was thinking about the same thing, since I had problems with a recent restore from a compressed dump archive. I was missing some files, probably because I ran the dump from an active file system.
I found out that Solaris has a very interesting command: fssnap
It creates a read-only snapshot of your filesystem intended for backup operations.
You create a snapshot, dump the snapshot, then delete the snapshot, and the dump is consistent.
I wonder if there's something like this for Linux...
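The create, dump, delete sequence described above can be sketched in Python as follows. This is only an illustration of the ordering and cleanup, not a working fssnap wrapper: the function name backup_from_snapshot is made up for this example, and the actual snapshot and dump commands are deliberately left to the caller rather than guessed at here.

```python
import subprocess
import sys


def backup_from_snapshot(create_cmd, dump_cmd, delete_cmd):
    """Run create -> dump -> delete, always deleting the snapshot afterwards.

    Each argument is an argv list. The real snapshot and dump commands are
    NOT shown here; the caller supplies them (an assumption of this sketch).
    """
    subprocess.run(create_cmd, check=True)      # 1. freeze a read-only view
    try:
        subprocess.run(dump_cmd, check=True)    # 2. back up the frozen view
    finally:
        subprocess.run(delete_cmd, check=True)  # 3. drop the snapshot even if the dump failed


if __name__ == "__main__":
    # Placeholder commands that only print the name of each step.
    def step(name):
        return [sys.executable, "-c", f"print({name!r})"]

    backup_from_snapshot(step("create"), step("dump"), step("delete"))
```

The try/finally is the point of the sketch: a failed dump should not leave a stale snapshot behind.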
I wrote my own (Perl) script that copies all my "important" files (basically the stuff in my home directory that can't be reconstructed by other means, plus all the system config files) to a new directory tree using cpio. It then burns the copied tree to CD-RW and verifies the CD against the copied tree.
I operate a 4 disc system, so I always have the last four backups on CD, and I keep the copied trees around (uncompressed) for as long as I have disk space. So far I've not needed the CDs (I store 2 of them offsite in case of disaster), but the copied filesystem trees have come in useful a couple of times.
The only drawback is that it's not appropriate for backing up huge quantities of data (like lots of audio or video files), as CD media is quite limited in size. But when rewritable holographic storage comes along, I'll be able to just change my function that decides which files are "important".
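For anyone who wants to try the same copy-then-verify idea, here is a rough Python sketch (the script described above is Perl and uses cpio, so this is not that script). The is_important rule is a hypothetical stand-in, since the real selection rule isn't given, and verification here compares two directory trees rather than a burned CD.

```python
import filecmp
import shutil
from pathlib import Path


def is_important(path: Path) -> bool:
    """Hypothetical selection rule; the poster's real rule is not given."""
    return path.suffix not in {".mp3", ".avi"}


def copy_tree(source: Path, dest: Path) -> list[Path]:
    """Copy every important file under source into dest, keeping relative paths.

    dest is assumed to live outside source.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if src.is_file() and is_important(src):
            rel = src.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied


def verify(source: Path, dest: Path, relative_paths: list[Path]) -> list[Path]:
    """Return the relative paths whose copy differs byte-for-byte from the original."""
    return [
        rel for rel in relative_paths
        if not filecmp.cmp(source / rel, dest / rel, shallow=False)
    ]
```

An empty list from verify means every copied file matched; changing what counts as "important" only means editing the one function.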
#exclude <ms/windows.h>
I think the 2 above are both excellent, Taper for the less demanding environment, BUpEdge for a system with multiple drives.
My only beef with Taper (and I'd use it otherwise, on my home system) is that when you do an "e"xclude or "i"nclude of a directory, it scans the entire subtree, which can take *forever* (like when excluding /var/squid), instead of just simply skipping that directory.
I'm actually doing a 100gb backup as we speak... so good timing on the Ask Slashdot.
mindslip
Seems to me that Linus (or another kernel hacker) should fix the ext2 race condition reported in that thread, rather than blithely dismiss the problem with, "dump was a stupid program in the first place."
I use rsync over ssh too; I back up to a machine at work (which I can reach from home). It basically does my whole home directory except for a few excludes for stuff that's a bit sensitive (ssh keys, keychain, ICQ history), which I manually back up to CD now and then. The machine at work is then backed up with TSM.
The rsync over ssh style of backup is so easy it's addictive!
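A small Python sketch of what such a job might look like, assuming rsync and ssh are installed on both ends. The host name, paths and exclude patterns in the usage note are made up; only the rsync options -a, -z, -e and --exclude are real.

```python
import subprocess


def rsync_command(source, remote, excludes=()):
    """Build an rsync-over-ssh argv list.

    -a  archive mode (recursive, keeps permissions and times)
    -z  compress data in transit
    -e  remote shell to use, here ssh
    """
    cmd = ["rsync", "-az", "-e", "ssh"]
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += [source, remote]
    return cmd


def run_backup(source, remote, excludes=()):
    """Run the transfer; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(rsync_command(source, remote, excludes), check=True)
```

Something like run_backup("/home/me/", "me@backup.example.org:backups/home/", [".ssh/", "icq-history/"]) would then go in a cron job, with the sensitive directories kept out of the transfer.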
I have been extremely happy with Amanda. Single centralised backup server running amanda-server. Multiple workstations running the amanda-client. Amanda automagically schedules backups based on sensible heuristics. I just tell Amanda how many tapes I have, how many workstations I have, and Amanda does all the hard work of working out how much tape capacity is required and how often it should schedule incrementals/fulls.
The server/client protocol has been designed to avoid reliance on dangerous security holes like rsh. The server sends the client a "send me your dump" message. The client then connects back to the server and delivers it the output from dump or tar. You can configure exclusion lists on the client if you're worried about sending certain files or filesystems. You can also encrypt the data stream and/or use Kerberos for authentication.
If I forget to load a blank tape then Amanda plays it safe. It doesn't overwrite last night's backup: instead it stores incrementals into the "holding disk". Amanda will then flush the held backups to the next blank tape.
Amanda emails me reports after every backup with a neat summary of what went right/wrong. It also gives you several hours advance warning if you forget to load a blank tape or if any of the workstations are offline.
The only downside of Amanda is that it is fiddly to set up. The documentation is poor and the configuration files are cryptic. But if you're willing to invest some time and effort, then you can't do much better (for free) than Amanda.
Features:
For those who don't know: AMANDA cannot append to tapes.
Every time you back up with AMANDA, it must start from the beginning of the tape.
So, if you want backups every day, you must have a tape for every day.
(http://amanda.sourceforge.net/fom-serve/cache/29
They say tar has its limitations. I really don't understand.
I've worked with different unixen and Linux distros, so I just don't want to be dependent on something that isn't installed by default everywhere. tar already has a VERY well known format and execution parameters.
I've lost my fair share of data to buggy hard drives and dumb mistakes like pulling off the IDE cable while the system is running. So cron does daily backups using tar cfj, driven by a file that lists the other files to be backed up. This way I don't have to back up the whole partition. To restore a certain file, just tar xvfj backup2.tar.bz2 path/to/file
The cron setup removes the old backup2.bz2 and renames backup.bz2 to backup2.bz2, so I have the data for the past two days. Besides incremental backup, which I don't need due to this setup, what else could I need? And by the way, backup.bz2 is copied off onto an NFS share elsewhere in case my whole RAID setup crashes or the XFS filesystem bombs out. This setup can be replicated on FreeBSD, Solaris and many others.
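The two-generation rotation above could look roughly like this in Python, using the standard tarfile module instead of the tar command. The file names backup.tar.bz2 and backup2.tar.bz2 and the function names are assumptions for illustration; the cron job itself and the copy to NFS are left out.

```python
import os
import tarfile
from pathlib import Path


def read_file_list(list_path: Path) -> list[str]:
    """One path per line; blank lines are skipped."""
    return [line.strip() for line in list_path.read_text().splitlines() if line.strip()]


def rotate_and_backup(list_path: Path, backup_dir: Path) -> Path:
    """Keep two generations: backup.tar.bz2 (newest) and backup2.tar.bz2 (previous)."""
    current = backup_dir / "backup.tar.bz2"
    previous = backup_dir / "backup2.tar.bz2"
    if current.exists():
        os.replace(current, previous)  # overwrites the older generation in one step
    with tarfile.open(current, "w:bz2") as archive:
        for name in read_file_list(list_path):
            archive.add(name)  # directories are added recursively
    return current


def restore_one(archive_path: Path, member: str, dest: Path) -> None:
    """Extract a single member instead of unpacking the whole archive."""
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extract(member, path=dest)
```

Because os.replace overwrites the previous generation, there is no separate delete step to get wrong.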
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
Some people have already mentioned Amanda.
In addition to amanda, I have good luck with star coded by Jörg Schilling. star is very feature-rich, fast, standards compliant and has been around since 1985. Give it a try!
The star-users mailing list is here. You can also look at the man page and finally download it.
Corporate Gadfly
Jonathan Archer: the most beaten up Enterprise captain in Star Trek history
http://backuppc.sourceforge.net/
Automated backups to an online disk server, open source, and a really nice web interface as well as command line interface.
It uses samba and ssh to backup and restore to windows and unix machines.
You can have it restore any files/folders in a backup you select, using the same methods (samba or ssh), or it can send the restored files to your browser in a tar or zip file.
I recently replaced a machine using amanda and a DLT drive with a fileserver using a raid 5 array and backuppc. Best switch ever.
Tivoli Storage Manager is the only "backup solution" that I have ever seen that truly works well without a lot of tweaking and twiddling.
I've worked at places using Legato and Amanda, where restoring from backup was an unreliable and error-prone process more likely to be a waste of time than anything else.
TSM is not cheap, but is worth every penny. We have one full time and one part time employee handle the backup/restore jobs for about 2000 servers. Try that with Legato or Amanda.
Conformity is the jailer of freedom and enemy of growth. -JFK
> The problem is tar always archives the entire space which makes it difficult to
> backup, say gigabytes of data, daily.
>
> A decent backup tool (as opposed to an archival tool) must absolutely have
> incremental backup support.
Er?
tar --help
[snip]
Operation modifiers:
-G, --incremental handle old GNU-format incremental backup
-g, --listed-incremental handle new GNU-format incremental backup
[snip]
Local file selection:
-N, --newer=DATE only store files newer than DATE
--newer-mtime compare date and time when data changed only
[snip]
This is in tar (GNU tar) 1.12
(Which is really really old actually.. slackware 3.2 dist)
There are also tons of options to exclude directories and files, to force it to span disks, and to match files in pretty much any way you need.
I've been making incremental backups (and even restored a few) for a while now.
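To show the idea behind --newer without depending on any particular tar version, here is a rough Python equivalent that archives only files modified after a cutoff time. It is a sketch of date-based selection only and makes no attempt to reproduce --listed-incremental; the function names are made up for this example.

```python
import tarfile
from pathlib import Path


def files_newer_than(root: Path, cutoff: float) -> list[Path]:
    """Files under root whose modification time is later than cutoff (epoch seconds)."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > cutoff
    )


def incremental_archive(root: Path, cutoff: float, archive_path: Path) -> list[str]:
    """Write only the changed files to a tar archive; return the stored member names.

    archive_path should live outside root so the archive does not pick itself up.
    """
    names = []
    with tarfile.open(archive_path, "w") as archive:
        for path in files_newer_than(root, cutoff):
            name = str(path.relative_to(root))
            archive.add(path, arcname=name)
            names.append(name)
    return names
```

Passing the time of the previous backup as the cutoff gives a simple incremental, though a scheme this simple cannot tell you which files were deleted since then.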
Have you even read Linus's comments?
Dump works by reading the raw data partition. That works great with an unmounted partition, or if you have a very limited OS that does not perform any caching.
But Linux is different - it's now using the cached pages as the primary content, usually flushing them to disk only as the pages are dropped. This is the approach used by most mature OSes, but Linux doesn't yet have an interface for "dump" programs to query the OS for updated but unwritten sectors.
So dump is the worst of all possible things now. Not only will you get incomplete live files, you can get incomplete files even if the users have all terminated but the pages haven't been flushed to disk yet. That's non-deterministic, and there's simply no way for you to perform reliable dumps.
On the practical side, dump is specific to the filesystem. When everyone ran ext2, that wasn't a problem. But now people may have a mixture of ext2, ext3, reiserfs, xfs, jfs, and probably even other formats. Each requires its own dump and restore, and that requires a lot more effort.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken