What Software Do You Use for Unix Backups?
jregel asks: "Linus has stated that dump should not be considered a reliable backup program, and both tar and cpio have their limitations. So what are Slashdot readers doing for backing up Linux servers and workstations? (you do backup, right?)" Given this bit of news, have you used anything other than the standard Unix staple to back up your Linux boxes? If you were forced off of tar, cpio and dump, what would you use as a replacement?
If you were forced off of tar, cpio and dump, what would you use as a replacement?
I'd use dd of course...
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
You know, I was thinking about the same thing since I had problems with a recent restore from a compressed dump archive. I was missing some files probably because I ran the dump from an active file system.
I found out that Solaris has a very interesting command: fssnap
It creates a read-only snapshot of your filesystem intended for backup operations.
You create a snapshot, dump the snapshot, then delete the snapshot, and the dump is consistent.
I wonder if there's something like this for Linux...
I wrote my own (Perl) script that copies all my "important" files (basically stuff in my home directory that can't be reconstructed by other means, plus all the system config files) to a new directory tree using cpio. It then burns the copied tree to CD-RW and verifies the CD against the copied tree.
I operate a 4 disc system, so I always have the last four backups on CD and I keep the copied trees around (uncompressed) for as long as I have disk space. So far I've not needed the CDs (I store 2 of them offsite in case of disaster) but the copied filesystem trees have come in useful a couple of times.
The only drawback of this is that it's not appropriate for backing up huge quantities of data (like lots of audio or video files), as the CD media is quite limited in size - but when rewritable holographic storage comes along I'll be able to just change my function that decides which files are "important".
#exclude <ms/windows.h>
Amanda comes up a lot. It can't span tapes.
Veritas also comes up a lot. Aside from cost, did you know Veritas can't back up single files larger than 2GB in size on Linux clients?
On paper, BRU looks pretty darned good. I haven't yet put that theory into practice.
I think the 2 above are both excellent, Taper for the less demanding environment, BUpEdge for a system with multiple drives.
My only beef with Taper (and I'd use it otherwise, on my home system) is that when you do an "e"xclude or "i"nclude of a directory, it scans the entire subtree, which can take *forever* (like when excluding /var/squid), instead of just simply skipping that directory.
I'm actually doing a 100gb backup as we speak... so good timing on the Ask Slashdot.
mindslip
Seems to me that Linus (or another kernel hacker) should fix the ext2 race condition reported in that thread, rather than blithely dismiss the problem with, "dump was a stupid program in the first place."
I use rsync over ssh too; I back it up to a machine at work (which I can reach from home). It basically does my whole home directory except for a few excludes for stuff that's a bit sensitive (ssh keys, keychain, ICQ history) which I manually backup to CD now and then. The machine at work is then backed up with TSM.
The rsync over ssh style of backup is so easy it's addictive!
This backup machine keeps seven generations of daily backups on one disk (cp -al, so no duplicating of static data), and a few weekly ones on the other disk. Every night it rsyncs things off-site (to my home). That rsync has turned out to be unreliable (probably my adsl), so I have a script that does it in small bits and pieces. Takes a few hours in the early morning.
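For anyone wondering how the "cp -al, so no duplicating of static data" trick works, here is a minimal sketch of the idea in Python. It is not the poster's script: the function name `snapshot` and the directory layout are made up for illustration, and it only handles regular files, deciding "unchanged" by size and modification time alone.

```python
import os
import shutil

def snapshot(source, prev_snap, new_snap):
    """Build new_snap from source, hard-linking files that are unchanged
    since prev_snap (same size and mtime) instead of copying them again.
    prev_snap may be None for the very first generation.
    Returns (number of files linked, number of files copied)."""
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(new_snap, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # assumption: skip symlinks and special files
            dst = os.path.normpath(os.path.join(new_snap, rel, name))
            old = os.path.normpath(os.path.join(prev_snap, rel, name)) if prev_snap else None
            st = os.stat(src)
            if old and os.path.isfile(old):
                ost = os.stat(old)
                if ost.st_size == st.st_size and int(ost.st_mtime) == int(st.st_mtime):
                    os.link(old, dst)  # same data on disk, no extra space used
                    linked += 1
                    continue
            shutil.copy2(src, dst)  # copy2 keeps the mtime for the next comparison
            copied += 1
    return linked, copied
```

Each generation looks like a full copy, but a file that never changes is stored once no matter how many generations reference it. Files deleted from the source simply do not appear in the next generation.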
In Murphy We Turst
I use "mkisofs /etc /root /home -R -T -o backup.iso && cdrecord dev=0,0,0 speed=4 blank=fast -data backup.iso" to create an ISO image, which will be burned to the CDRW disk. That's all I need to backup my workstation. And restoring the data doesn't require any special tools.
cdbkup is a little more sophisticated - multiple levels, multiple disks.
"CDBKUP is a professional-grade open-source package for backing up filesystems onto CD-Rs or CD-RWs."
Note to ACs: I won't mod you up, even if you are being funny or insightful. So take a chance! It's not real life!
I have been extremely happy with Amanda. Single centralised backup server running amanda-server. Multiple workstations running the amanda-client. Amanda automagically schedules backups based on sensible heuristics. I just tell Amanda how many tapes I have, how many workstations I have, and Amanda does all the hard work of working out how much tape capacity is required and how often it should schedule incrementals/fulls.
The server/client protocol has been designed to avoid reliance on dangerous security holes like rsh. The server sends the client a "send me your dump" message. The client then connects back to the server and delivers it the output from dump or tar. You can configure exclusion lists on the client if you're worried about sending certain files or filesystems. You can also encrypt the data stream and/or use Kerberos for authentication.
If I forget to load a blank tape then Amanda plays it safe. It doesn't overwrite last night's backup: instead it stores incrementals into the "holding disk". Amanda will then flush the held backups to the next blank tape.
Amanda emails me reports after every backup with a neat summary of what went right/wrong. It also gives you several hours advance warning if you forget to load a blank tape or if any of the workstations are offline.
The only downside of Amanda is that it is fiddly to set up. The documentation is poor and the configuration files are cryptic. But if you're willing to invest some time and effort then you can't do much better (for free) than Amanda.
Features:
For those who don't know: AMANDA cannot append to tapes.
Every time you backup with AMANDA it must start from the beginning of the tape.
So, if you want backups every day, you must have a tape for every day.
(http://amanda.sourceforge.net/fom-serve/cache/29
Arkeia is a powerful one, but not free software. there are two versions, a free one for small offices and a more powerful costly one. ...quick browse of the site does not reveal the free version, i don't think it exists anymore for 5.x (maybe i am not recalling correctly).
anyway, arkeia can back up windows, linux, unix, and mac osx.
Use my userscript to add story images to Slashdot. There's no going back.
I hear rdiff-backup is good, but I still mainly use rsync with the incremental rsync type scripts that use hardlinks for versioning. We use it here to backup over 2TB of data over a 512kbit link. Since you never need to do a "full" backup, the bandwidth is plenty.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Jason
"FORMAT C:" - Kills bugs dead!
Why should you need to modprobe it at all? Don't you have it set up in /etc/modules to auto-install the module at boot?
They say tar has its limitations. I really don't understand.
I've worked with different unixen and Linux distros, so I just don't want to be dependent on something that isn't installed by default everywhere. tar already has a VERY well known format and execution parameters.
I've lost my fair share of data to buggy hard drives and dumb mistakes like pulling off the IDE cable while the system is running. So cron does daily backups using tar cfj, driven by a file that lists the files to be backed up. This way I don't have to back up the whole partition. To restore a certain file, just tar xvfj backup2.tar.bz2
The cron setup removes the old backup2.bz2 and renames backup.bz2 to backup2.bz2, so I have the data for the past two days. Besides incremental backup, which I don't need due to this setup, what else could I need? And by the way, the backup.bz2 is copied off onto an NFS share elsewhere in case my whole RAID setup crashes or the XFS filesystem bombs out. This setup can be replicated onto FreeBSD, Solaris and many others.
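The two-generation rotation described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `rotate_and_backup`, `backup.tar.bz2` and `backup2.tar.bz2` are hypothetical, and the list file is assumed to hold one path per line. The real setup uses tar and cron directly.

```python
import os
import tarfile

def rotate_and_backup(list_file, backup_dir):
    """Keep two generations: backup.tar.bz2 (newest) and backup2.tar.bz2
    (the one before it). list_file holds one path to back up per line."""
    newest = os.path.join(backup_dir, "backup.tar.bz2")
    older = os.path.join(backup_dir, "backup2.tar.bz2")
    if os.path.exists(newest):
        # os.replace overwrites the old second generation in one step
        os.replace(newest, older)
    with open(list_file) as fh:
        paths = [line.strip() for line in fh if line.strip()]
    with tarfile.open(newest, "w:bz2") as tar:
        for path in paths:
            tar.add(path)  # directories are added recursively
    return newest
```

A path in the list that no longer exists makes `tar.add` raise an error, which is probably what you want from an unattended job: a loud failure rather than a silently incomplete archive.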
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
Some people have already mentioned Amanda.
In addition to amanda, I have good luck with star coded by Jörg Schilling. star is very feature-rich, fast, standards compliant and has been around since 1985. Give it a try!
The star-users mailing list is here. You can also look at the man page and finally download it.
Corporate Gadfly
Jonathan Archer: the most beaten up Enterprise captain in Star Trek history
For backing up my FreeBSD home server I use a second (identical) HDD in a swappable IDE bracket on a standard plain ole onboard IDE controller (the 2nd channel to be precise). Though hotswapping isn't really supported on these controllers, it does seem to work :)
Making a backup is easy. I just plug in the bracket and start a homebrew script which:
- enables and inits the hotswap IDE channel
- mounts the partitions on the hotswap HDD
- removes system immutable flags on files on the hotswap HDD (so that they can be overwritten)
- copies over all new files (/sbin, /home, just everything) to the hotswap HDD, using the "ssync" tool
- resets the system immutable flags to their original state
- umounts the partitions
- disables the IDE channel
- logs the above to /var/log/backup
- mails the log to me
A whole backup takes about 25 minutes on an almost full 30GB disk in a P200 (it only copies the new/changed files), and compared to tape it's very cheap. If the master drive fails, I swap HDDs and the whole server works again, without any configuration whatsoever.
http://backuppc.sourceforge.net/
Automated backups to an online disk server, open source, and a really nice web interface as well as command line interface.
It uses samba and ssh to backup and restore to windows and unix machines.
You can have it restore any files/folders in a backup you select, using the same methods (samba or ssh) as well as it can send the restore files to your browser in a tar or zip file.
I recently replaced a machine using amanda and a DLT drive with a fileserver using a raid 5 array and backuppc. Best switch ever.
Tivoli Storage Manager is the only "backup solution" that I have ever seen that truly works well without a lot of tweaking and twiddling.
I've worked at places using Legato and Amanda, where restoring from backup was an unreliable and error-prone process more likely to be a waste of time than anything else.
TSM is not cheap, but is worth every penny. We have one full time and one part time employee handle the backup/restore jobs for about 2000 servers. Try that with Legato or Amanda.
Conformity is the jailer of freedom and enemy of growth. -JFK
Companies with money can get a NetApp box for critical data. Here you can absolutely use dump, tar or cpio. They create a "snapshot" of a file before backing it up.
Unfortunately we are talking a minimum of $40k for this type of solution.
If the snapshot concept could be written into ext4, then dumps would be great.
I always put a caveat into my backup policies to cover this issue.
Anybody out there know if Bru does any better?
Does Veritas have a workaround for this?
Amanda is just a wrapper around dump.
At our dotCom company, we bought EMC boxes and I was REALLY excited about the TimeFinder concept. But then I found out that it doesn't really find time, it just makes backups.
:)
I had thought we had found the answer to getting a six-month project done in 3 months - use "TimeFinder" by EMC.
-Peter
The problem with most suggestions here is that it seems the average Slashdot reader is a Linux hobbyist or works as the IT manager for a small office that happens to run Linux. What happens when you need to back up 6TB/night and don't want to pay someone to sit around swapping tapes all night? Sometimes it just isn't practical to purchase another SAN solution to facilitate an rsync. Or what if you have a collection of high capacity LTO tape drives at your disposal, but don't have the budget for something larger and automated, or smaller with an autoloader? I think automation and efficiency are almost as important as reliability and cost. Not everyone can afford a StorageTek Powderhorn silo, or needs the versatility of expensive products such as Veritas NetBackup. Then again, sometimes tar or rsync just don't cut it in an enterprise environment where data is mission critical.
> The problem is tar always archives the entire space which makes it difficult to
> backup, say gigabytes of data, daily.
>
> A decent backup tool (as opposed to an archival tool) must absolutely have
> incremental backup support.
Er?
tar --help
[snip]
Operation modifiers:
  -G, --incremental          handle old GNU-format incremental backup
  -g, --listed-incremental   handle new GNU-format incremental backup
[snip]
Local file selection:
  -N, --newer=DATE           only store files newer than DATE
      --newer-mtime          compare date and time when data changed only
[snip]
This is in tar (GNU tar) 1.12
(Which is really really old actually.. slackware 3.2 dist)
There are also tons of options to exclude directories and files, to force it to span disks, and pretty much match in any way you need.
I've been making incremental backups (and even restored a few) for a while now.
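To make the "only store files newer than DATE" idea concrete, here is a small Python sketch using the standard tarfile module. It loosely mirrors what tar's --newer-mtime selection does and is not a replacement for it: the function name `archive_newer` is made up, and unlike --listed-incremental it keeps no record of deleted files.

```python
import os
import tarfile

def archive_newer(root, archive_path, since):
    """Archive regular files under root whose mtime is later than `since`
    (seconds since the epoch). Returns the relative paths that were added.
    archive_path should live outside root so the archive skips itself."""
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for dirpath, _dirs, files in os.walk(root):
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and os.path.getmtime(path) > since:
                    rel = os.path.relpath(path, root)
                    tar.add(path, arcname=rel)
                    added.append(rel)
    return added
```

Passing the timestamp of the last full backup as `since` gives a level-1 style incremental; restoring means unpacking the full archive first and the incremental on top of it.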
Here's a howto for rsync snapshot backups. I keep daily backups for two weeks, weekly backups for two months, and monthly backups forever. I rolled my own wrappers for this stuff in a few hours.
It is about eight zillion times better than tapes. I have hot, random access to all versions of all my files. Thanks to the hard linking, space used is moderate. Since it backs up to a remote computer, backups are instantly off site. And if I want to verify my backups, I don't have to feed in eight million tapes; I just write a little perl script.
I recommend it highly!
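The "daily for two weeks, weekly for two months, monthly forever" schedule is easy to express as a pruning rule. The sketch below is one possible reading of it, not the poster's wrappers: the 14 and 60 day cutoffs, the choice of the first snapshot in each ISO week and each month, and the name `snapshots_to_keep` are all assumptions.

```python
from datetime import date

def snapshots_to_keep(snapshot_dates, today, daily_days=14, weekly_days=60):
    """Return the set of snapshot dates to keep:
    - every snapshot younger than daily_days,
    - the first snapshot of each ISO week younger than weekly_days,
    - the first snapshot of each calendar month, forever."""
    keep = set()
    seen_weeks = set()
    seen_months = set()
    for d in sorted(snapshot_dates):  # oldest first
        age = (today - d).days
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        month = (d.year, d.month)
        first_of_week = week not in seen_weeks
        first_of_month = month not in seen_months
        seen_weeks.add(week)
        seen_months.add(month)
        if age < daily_days or (age < weekly_days and first_of_week) or first_of_month:
            keep.add(d)
    return keep
```

Anything not in the returned set can be deleted. With hard-linked snapshots, removing a generation only frees the files that no other generation still references.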
Rsync can also be used to make some very nice incremental "snapshot" backups.
I believe you are wrong. EVMS (which was built by IBM and is distributed for free under the GPL) provides software RAID (0, 1, 5) and filesystem snapshots, and has both GUI and CLI tools for Linux.
It's a simple patch you can add to any 2.4 kernel.
Have you even read Linus's comments?
Dump works by reading the raw data partition. That works great with an unmounted partition, or if you have a very limited OS that does not perform any caching.
But Linux is different - it's now using the cached pages as the primary content, usually flushing them to disk only as the pages are dropped. This is the approach used by most mature OSes, but Linux doesn't yet have an interface for "dump" programs to query the OS for updated but unwritten sectors.
So dump is the worst of all possible things now. Not only will you get incomplete live files, you can get incomplete files even if the users have all terminated but the pages haven't been flushed to disk yet. That's non-deterministic, and there's simply no way for you to perform reliable dumps.
On the practical side, dump is specific to the filesystem. When everyone ran ext2, that wasn't a problem. But now people may have a mixture of ext2, ext3, reiserfs, xfs, jfs, and probably even other formats. Each requires its own dump and restore, and that requires a lot more effort.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
The ISO9660 FS has some pretty strict limits on number of files in a directory (~1024) and length of filenames under Rock Ridge extensions (~30s, I think). If you exceed this, you'll be unable to retrieve those "extra" files - I know after being burned by it in the past.
(Obviously I don't like working in directories with thousands of entries, but some tools will produce them, and it's easy to accidentally hit numbers like that with mail or news spools, etc.)
As for the RW media, you do realize that they have a limited lifetime, right? Are you validating the discs you write, or going on blind faith?
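Validating a burned disc against the tree it was made from does not need special tools. A minimal sketch, assuming the disc is mounted somewhere readable and that comparing SHA-256 digests file by file is good enough (the function names here are made up for illustration):

```python
import hashlib
import os

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as fh:
                # read in 1 MiB chunks so large files do not fill memory
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests

def verify_copy(original, copy):
    """Return (missing, differing): files absent from the copy, and files
    present in both trees whose contents differ."""
    a, b = tree_digests(original), tree_digests(copy)
    missing = sorted(set(a) - set(b))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, differing
```

The `missing` list is the part that would catch files silently dropped by a filesystem limit; `differing` catches media that has started to go bad. It only works if the names on the disc match the originals, so truncated filenames would show up as missing.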
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
You do realise that dump doesn't give you a filesystem snapshot? Even on Solaris - the most venerable of modern UNIX - the manpage for ufsdump clearly states:
There's a good reason why nobody seriously uses dump anymore.
And Linux does support filesystem snapshots. The Linux LVM explicitly lists it as a feature.
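For the curious, the snapshot-then-backup sequence with LVM looks roughly like the sketch below. The volume group, logical volume, snapshot size, mount point and archive path are all placeholders, and the helper names are invented; the sketch only builds the command lines and prints them unless told otherwise, since the real thing needs root and an actual LVM setup.

```python
import subprocess

def snapshot_backup_commands(vg, lv, archive, snap_size="1G",
                             snap_name="backup_snap",
                             mount_point="/mnt/backup_snap"):
    """Return the command lines for: create snapshot, mount it read-only,
    archive it with tar, unmount, remove the snapshot."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        ["lvcreate", "--size", snap_size, "--snapshot", "--name", snap_name, origin],
        ["mount", "-o", "ro", snap, mount_point],
        ["tar", "-czf", archive, "-C", mount_point, "."],
        ["umount", mount_point],
        ["lvremove", "-f", snap],
    ]

def run_snapshot_backup(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order.
    Note: no cleanup is attempted if a step fails part way through."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `run_snapshot_backup(snapshot_backup_commands("vg0", "home", "/backup/home.tar.gz"))` just prints the five steps. The snapshot size only has to hold the blocks that change on the origin volume while the backup is running, not a full copy of it.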
Moderators, this person was not informative, they were simply wrong.
Prayerware(TM)2.0 http://www.prayerware.com/