Ask Slashdot: >2GB Backup Software for Linux?
Fer asks:
"Are there backup programs for
Linux that do not have filesystem or volume
size limits? I am trying to make a full backup of
a 22 GB FTP server, using 4 GB TR-4 tapes. I have tried tar,
dump, afio, taper, and afbackup, and every one of
them either did not allow >2GB volumes or had
weird problems with >4GB filesystems. Currently
I am using dd to do the job, but I think there
must be another option. Any suggestions on free
programs which I may use?"
The Intel 32-bit(ness), or any kind of system architecture, has nothing to do with >2GB file size problems. 64-bit+ integers are used all the time and you probably don't even think about it, e.g. 128-bit encryption, NTFS, timestamps, version numbers on MS binaries, etc. It's just a bit more of a bother on non-64-bit processors.
When we had 16-bit DOS, did it mean we could only have 64K files? You might say, "but 32-bit only addresses 2GB (signed) of memory or whatever". Programs don't even typically load 2-4GB worth of file at a time, but that doesn't mean we can't or shouldn't use the contents of an 80GB file just because we have a 32-bit processor.
A true 64-bit file system can exist on a 32-bit machine, which would lead to having >2GB files. In fact, if you abstract the filesystem from the OS well, allow for 64-bit references, you could have any kind of large file system, like SGI, WinNT, etc... Maybe he just needs to expand the API / working integer sizes in the kernel to support true(r) 64-bit file systems. Maybe that's what this is all about.
We use a program called Lone-Tar at the ISP where I work. We have a couple of +2gig partitions and it handles them nicely. It also has internal dialogs for setting up cron jobs, and tons of other stuff. I like it a lot, as much as one can like a backup program I guess. Lone-Tar.com.
I would recommend the Amanda backup system, which we have used at work for many years and which deals nicely with these problems.
Well, I'm just now installing Legato Networker for UNIX and it seems pretty good.
:( Arcserve DOES have a Linux client, but that software is so bad I've never even tried it.
:)
We've had endless trouble with Arcserve for Windows NT. It backs up UNIX clients very slowly, and trashes its databases on a regular basis. I finally called up Arcserve to scream at them about it, and the upshot is that their database system cannot handle more than 16 million records, about 1GB, and the sheer volume of files we need to back up overwhelms their database engine. Of course, it doesn't fail gracefully, it just quietly corrupts itself without telling you. I have been dodging bullets with that system for months.
Apparently, if you install a SQL server and use it to store your database of records, it works fine for larger installations, but I'm so pissed at them about not documenting such a basic limitation of their system that I've been exploring other alternatives.
I did a test install of Legato Networker on a Solaris 2.5.1 box, and it seems to work pretty well. It spools multiple volumes to tape at the same time, seems to run A LOT faster than Arcserve did under UNIX (about the same speed as Arcserve/NT backing up NT files, about 2.5MB/minute to a 40GB SCSI DLT tape unit), and the restores run fine. It is a bit slow about responding to commands, sometimes taking a minute or so to set itself up for the next job, but it seems good on Unix at least.
Unfortunately, there is no Linux client for Networker.
Have you used Networker in anything other than a Novell environment? S'possible that their Netware stuff isn't so good, while their UNIX stuff is fine. I have all of one week experience with it, and while it seems fine to me, I haven't really loaded it down yet.
If you want to use files larger than 2 GB (the largest number that will fit in a signed 32-bit integer), then get a 64-bit system to put them on. If you want to have simple, efficient, easy to code and easy to port seeks within your files, then you're going to want to be able to use a signed integer to seek back and forth (let's not even mention the trouble with mmap() if your files are larger than a pointer can index...)
The *last* thing Linux should do about 2GB files is try and use hack after kludge to satisfy people who want to use Intel chips but don't want to hear about their limitations.
On some Linux installations, I have used the BRU 2000 backup software. It costs a couple of hundred $$$, IIRC, but it is really excellent software with many features. So if you are willing to spend some money, that should work for you. I must defer to others, however, in the area of doing it with free tools.
I use standard GNU tar v 1.12 to back up several systems to 12G DAT tapes; and have never had a problem using the '--multi-volume' switch to put an 18G filesystem on 2 tapes.
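For anyone who wants to try the multi-volume approach without using up tapes, here is a minimal sketch. It assumes GNU tar and bash; the function names are made up for illustration, and plain files stand in for the tape volumes. With a real drive you would normally name a single tape device and change tapes when tar prompts for the next volume.

```shell
#!/bin/bash
# Sketch only: assumes GNU tar. Function names are hypothetical.

# multivolume_create LENGTH_KB DIR VOLUME...
# Archives DIR across the listed volumes, moving on to the next volume
# once LENGTH_KB * 1024 bytes have been written to the current one.
multivolume_create() {
    local length_kb=$1 dir=$2
    shift 2
    local files=() vol
    for vol in "$@"; do
        files+=(--file="$vol")
    done
    tar --create --multi-volume --tape-length="$length_kb" "${files[@]}" "$dir"
}

# multivolume_extract DEST VOLUME...
# Restores into DEST, reading the volumes in the order given.
# Absolute volume paths are the safe choice because --directory is used.
multivolume_extract() {
    local dest=$1
    shift
    local files=() vol
    for vol in "$@"; do
        files+=(--file="$vol")
    done
    tar --extract --multi-volume "${files[@]}" --directory="$dest"
}
```

List at least as many volumes as the data needs. If tar runs out of listed volumes it falls back to prompting for the next one.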
I've had problems with tar (and the other commands) as well when the number of files was extremely large, regardless of file size. For example, I had 200,000+ small files totalling only about 1 gig, and tar still had problems. I ended up just breaking my backups down into batches, something like:
tar -cf part-a-of-tree.tar /files/parta
tar -cf part-b-of-tree.tar /files/partb
etc...
-Booya "No Try Not. Do or do not, there is no try." -Yoda
I'm currently doing an 8+ gig backup using kdat from kde and it works well (no compression tho)
:(
:)
we had some trouble with backing up to another server via nfs mount that would only allow me to do a 2 gig max file
if i use the tar with compression i can get up to 24 gig backup to tape (12 gig without compression)
but kdat allows you to span tapes and keeps a nice little index of all previous files backed up on that tape (very nice gui app)
not sure if that will help but it's all i got right now
You can also use AMANDA backup, which I use on my GNU/Linux machines for backup. It seems to handle the large backup sizes acceptably.
Finally, you can always just split up huge files using dd.
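A rough sketch of the dd approach, assuming bash. The function name, the chunk size argument and the output naming scheme are all invented for illustration.

```shell
#!/bin/bash
# Sketch only: dd_split and its naming scheme are hypothetical.

# dd_split FILE CHUNK_MB PREFIX
# Writes FILE out as PREFIX.000, PREFIX.001, ... with each piece at most
# CHUNK_MB mebibytes long.
dd_split() {
    local file=$1 chunk_mb=$2 prefix=$3
    local block=1048576            # dd block size: 1 MiB
    local size chunks i
    size=$(wc -c < "$file")
    # Round up so a final partial chunk is not lost.
    chunks=$(( (size + chunk_mb * block - 1) / (chunk_mb * block) ))
    for (( i = 0; i < chunks; i++ )); do
        dd if="$file" of="$(printf '%s.%03d' "$prefix" "$i")" \
           bs="$block" skip=$(( i * chunk_mb )) count="$chunk_mb" \
           2>/dev/null || return 1
    done
}

# Reassemble later with: cat PREFIX.* > restored-file
# (the zero-padded suffixes keep the pieces in order)
```

To stay under a 2GB file size limit, a chunk size of 2047 keeps every piece below 2^31-1 bytes, for example `dd_split /backup/ftp.img 2047 /backup/ftp.img.part` (paths are placeholders).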
Cheers,
Joshua.
--jon. Postel is dead. May we all mourn his, and our, loss.
Here is our cron.daily/daily.dump file:
This dumps all three of our partitions out to a single tape. The options are: 0 ("zero"), which dumps the entire filesystem (our tape drive is fast, so we don't bother specifying a dump level > 0, which is for doing various levels of incremental backups); u, which updates the human-readable /etc/dumpdates file; B, for the number of blocks ("kilobytes") the tape is long (this is your problem); and finally f, the device to dump to.
One of the things that really gets people is how to pass arguments correctly to dump. A little diagram might serve as an aid:
Hope that helps! We use the /dev/nst0 device to write to the tape three times without the thing rewinding. This is the key to putting more than one filesystem on a single tape.
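As a rough illustration of the options described above, a script along these lines could do the job. This is a sketch, not the poster's actual cron file: the tape capacity, the filesystem list and the function names are placeholders, and it assumes bash plus the Linux dump and mt utilities. The point it tries to show is the argument order: with old-style keys, the arguments follow in the same order as the letters that take them, so the size for B comes before the device for f.

```shell
#!/bin/bash
# Sketch only: values and function names below are assumptions.
TAPE=/dev/nst0        # non-rewinding tape device
TAPE_KB=4000000       # assumed tape capacity in kilobytes (placeholder)

# Print the dump command for one filesystem.
# Key letters: 0 = full dump, u = update /etc/dumpdates,
#              B = tape size (takes TAPE_KB), f = device (takes TAPE).
dump_command() {
    printf 'dump 0uBf %s %s %s\n' "$TAPE_KB" "$TAPE" "$1"
}

# backup_all FILESYSTEM...
# Dry run by default: prints the commands. Set RUN=1 to execute them.
backup_all() {
    local fs
    for fs in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            dump 0uBf "$TAPE_KB" "$TAPE" "$fs" || return 1
        else
            dump_command "$fs"
        fi
    done
    # Rewind only once, after the last filesystem has been written.
    if [ "${RUN:-0}" = "1" ]; then
        mt -f "$TAPE" rewind
    fi
}
```

Calling `backup_all / /usr /home` prints one dump line per filesystem, so the argument order can be checked before anything touches the tape.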
If anybody has any questions about using dump, I would be happy to help.
-AP
jordanh@remotepoint.com
WARNING: A tape or hard disk in a fireproof container will still be destroyed in a fire. Most fireproof containers are designed to save paper from burning by a combination of steaming away water and thermal insulation. As such, the internal temperature of the container will easily get over 210 degrees F. Most tapes and hard disks will be destroyed at that point.
I'd be interested to know what trouble you've been having with dump. I've been doing dumps of 2+ gig partitions for ages with dump and it works very well. Perhaps you're experiencing some other problem which is not related to the backup program you're using? If you can send me more information about where dump fails, I'd be happy to have a look at it.
I highly recommend the Sony AIT drives (I think Seagate or Quantum also sells a variation of the same format). They do 25 GB native per 8mm tape, at 5MB/s. The drives are under $2000, and tapes are around $60. It may be a bit overkill for what you need, but the speed is VERY nice. It does compression as well, but people that quote compressed capacities should be shot. :)
They also have a cool feature that allows storing directory info on NVRAM on the tape cartridge - 16 KB or so.
And because it's Sony, it's definitely likely to stay around. I think they still sell Betamax decks, and I kinda think they know what they are doing when it comes to helical scan recording equipment.
Tar and others should work fine as long as you are writing directly to tape, instead of to a temp file. Linux has a 2GB (2^31-1) maximum file size, so if your backup software is trying to spool to disk before streaming to tape, it may fail.
Amanda handles this by splitting the disk files into 2 GB chunks and reassembling them when it writes to tape. It also deals well with network backups. The filesystem side backend is dump or GNU TAR, so it's fairly standard in that regard. I've had no problems with 8+ GB filesystems using Amanda.
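To put numbers on the limit and the chunking, here is a small arithmetic sketch in bash. The function names are made up, and the chunk size is assumed to be the full 2^31-1 bytes. The post only says "2 GB chunks", so the exact size Amanda uses is not confirmed here.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical.

# Largest file size representable in a signed 32-bit offset: 2^31 - 1.
max_file_size_32() {
    echo $(( (1 << 31) - 1 ))
}

# chunks_needed TOTAL_BYTES [CHUNK_BYTES]
# Number of chunks needed to hold TOTAL_BYTES, rounding up.
# CHUNK_BYTES defaults to the signed 32-bit limit.
chunks_needed() {
    local total=$1
    local limit=${2:-$(max_file_size_32)}
    echo $(( (total + limit - 1) / limit ))
}

max_file_size_32                             # prints 2147483647
chunks_needed $(( 8 * 1024 * 1024 * 1024 ))  # 8 GiB  -> prints 5
chunks_needed $(( 22 * 1024 * 1024 * 1024 )) # 22 GiB -> prints 12
```

An 8 GiB image needs 5 chunks rather than 4 because each chunk is one byte short of 2 GiB.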
I would not recommend using e2fsdump - AFAIK, it's still beta, and I had problems with the interactive restore and some other issues. Because it accesses the filesystem at a lower level than standard file access (I believe), I'd be careful with trusting important backups to it.
TAR is definitely a safer choice.
BTW, I have a question myself... does anyone know how to get TAR (or something else) to restore ownership on symlinks? Typically it doesn't matter, but Apache checks symlink ownership for the SymLinksIfOwnerMatch option, and every time I restore or copy a web partition, I have to go through and fix all the links that are now root owned.
ftp://ftp.legato.com/pub/Unsupported/Linux_Client/ has both 4.2 and 5.1 client kits, in .gz and .rpm formats. The clients are unsupported, but they work well for us.
We use Networker to back up ~500 GB, spread across 30 clients (NT, Digital Unix/Tru64, and Linux). Backup performance is excellent (by interleaving sessions over two network cards and the local disks, it can keep two loaders running at ~5MB/sec each).
I don't know what the maximum "partition" size is, but we've backed up 150GB file domains with no problems.
Restore performance is slower, of course, but emphatically not an "all day event"; it takes a few seconds to find what you need in the database, and a couple minutes to load the tape (we're using twin 280GB DLT loaders). After that, the speed is the same as it would be for tar/dump/whatever; the tape drive must seek to your files and read, and that can take up to an hour.
If your files are spread across multiple tapes (either because you're using incremental or differential backups, or because a single saveset spans multiple tapes), then it can be as long as two hours. If you have only a few clients, these times are reduced somewhat.
The only time I've spent an entire day doing restores is when we lost the Networker server (and its media indices), and had to use Networker's bootstrap procedure to bring back the index, followed by regular restores to bring back everything else. Because I hadn't bothered to keep hardcopies of the logs, Networker had to scan the tapes for a suitable bootstrap. The searching alone took a few hours.
A couple caveats, though: it's not cheap, and it's not easy.
Networker was designed for the kind of environment we've set up, and you may find it overkill for one or two clients. The GUI is marginal, but the command-line tools can completely eliminate it, and do more besides.
Expect to spend a couple of weeks configuring it, and a couple more getting comfortable with the (extremely powerful, IMHO) command-line tools.
You'll need a capable server to hold the media indices -- we keep data in the index for a quarter, and the database is over 2GB. We're using a dual-CPU Alpha 4100 @600MHz w/2GB memory, running Tru64 Unix (it's used for a number of other things, of course).
NB: starting with Networker 5, you can have the tape devices and databases on separate machines, which reduces the need for one mammoth server to do backups and media management. It's also good if you have multiple sites separated by sub-LAN-speed links; you can put a tape device on each LAN.
cheers,
mike
I have used a program called CTAR to cure these problems. Check it out here.
http://www.ctar.com