Recoverable File Archiving with Free Software?
Viqsi asks: "Back in my Win32 days, I was a very frequent user of RAR archives. I've had them get hit by partial hardware failures and still be recoverable, so I've always liked them, but they're completely non-Free, and the mini-RMS in my brain tells me this could be a problem for long-term archival. The closest free equivalent I can find is .tar.bz2, and while bzip2 has some recovery ability, tar is (as far as I have ever been able to tell) incapable of recovering anything past the damaged point, which is unacceptable for my purposes. I've recently had to pick up a copy of RAR for Linux to dig into one of those old archives, so this question's come back up for me again, and I still haven't found anything. Does anyone know of a file archive type that can recover from this kind of damage?"
Ever heard of parity archives?
the mini-RMS in my brain
You really ought to have that looked at.
Are you sure it's unacceptable that tar archives are breakable? The way I see it, you'll tar your files, then bzip them, and finally put them on a backup server/CD/DVD. The bzip layer will provide the auto-repairing features; I don't see how it could break between creating the tar and bzipping it. Is this for a normal environment? If your hard drive breaks during or after creating the tar, then the bzip would fail, no? If not, please tell us more about your situation.
There used to be a cpio-like archiver called afio that was designed for exactly these situations. Of course, that might not be much help on non-Unix systems (unless you plan on running it under Cygwin), but I remember having great success with it for the old QIC tapes, which in my experience were the worst backup medium ever for important data (better to have no backup than to think you have a good one and end up with a dead tape).
--That's the point of being root, you can do anything you want, even if it's stupid.
Store the recovery information outside the archive. Par2 works really well. You can configure how much redundancy you want (2% should be fine for occasional bit errors, 30% if you burn it to a CD that might get mangled, etc.). It's a work in progress, but it's already really useful.
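Par2 itself uses Reed-Solomon codes. The simplest case of the same idea is a single XOR parity block, which can rebuild any one lost block of a set. Below is a minimal Python sketch that assumes equal-sized blocks. The function names are made up for illustration and are not part of any Par2 tool.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """A single parity block: enough to rebuild any ONE lost data block."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length (pad the last one)")
    return xor_blocks(blocks)


def rebuild(blocks, parity):
    """Replace the one missing block (None) with the XOR of the parity and the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can repair exactly one lost block")
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


# Four data blocks plus one parity block is 25% redundancy.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(data)
print(rebuild([b"AAAA", None, b"CCCC", b"DDDD"], parity))
# [b'AAAA', b'BBBB', b'CCCC', b'DDDD']
```

Two lost blocks would need a second, independent recovery block. That is where Reed-Solomon-style codes like Par2's come in.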
The format you're looking for is any format you like stored on reliable storage.
Why bother with all the intricacies of a pseudo-fault-tolerant data structure? Ultimately, the best archive format for recovery will be one that simply stores the whole archive twice, doubling space requirements and improving immunity to lost sectors on drives. At which point one asks, "Why don't I just stick to simple files and archives and use reliable storage that handles this crap for me, for all my data, automagically?" Storage of every sort keeps getting cheaper and bigger. If you have any interest in the longevity of your data, there's almost no excuse these days for not using the data mirroring built into virtually every OS, doubling your storage cost (and your read performance) while freeing yourself from worrying about drive failure.
11*43+456^2
True, tar cannot handle a single error... all files past that error are lost.
On the other hand, cpio (and its clones) can handle missing or damaged data without losing the undamaged portions that follow (you lose only the archived file that contains the damage). Off the top of my head, it is the only common free format I can think of that is capable of this.
- Preferences: Solaris 10 (servers), Ubuntu (desktops), Solaris 11 (personal servers) -
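cpio can resynchronize because every header begins with a plain-ASCII magic string and records its own name and data lengths. A reader can therefore just look for the next magic. Below is a minimal Python sketch of that idea, assuming the SVR4 "newc" format (magic 070701). The helper names are hypothetical. A real reader would validate more, since the magic string could also turn up inside file data.

```python
import re

MAGIC = b"070701"  # magic of the SVR4 "newc" cpio header (assumed variant)


def _pad4(n):
    """NUL bytes needed to round n up to a multiple of 4."""
    return -n % 4


def newc_entry(name, data, ino):
    """One entry: 110-byte ASCII header, NUL-terminated name, padding, data, padding."""
    name_b = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check: 13 fields of 8 hex digits each
    fields = [ino, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_b), 0]
    header = MAGIC + b"".join(b"%08X" % f for f in fields)
    head = header + name_b + b"\0" * _pad4(len(header) + len(name_b))
    return head + data + b"\0" * _pad4(len(data))


def write_cpio(files):
    """Concatenate the entries plus the TRAILER!!! record that ends the archive."""
    body = b"".join(newc_entry(n, d, i + 1) for i, (n, d) in enumerate(files))
    return body + newc_entry("TRAILER!!!", b"", 0)


def salvage_cpio(archive):
    """Try every occurrence of the magic and keep the entries whose header parses."""
    found = {}
    for m in re.finditer(re.escape(MAGIC), archive):
        start = m.start()
        header = archive[start:start + 110]
        try:
            fields = [int(header[6 + 8 * k:14 + 8 * k], 16) for k in range(13)]
        except ValueError:
            continue  # damaged header: move on to the next magic
        filesize, namesize = fields[6], fields[11]
        name_end = start + 110 + namesize
        name = archive[start + 110:name_end - 1].decode("utf-8", "replace")
        data_start = name_end + _pad4(110 + namesize)
        data = archive[data_start:data_start + filesize]
        if len(data) == filesize and name != "TRAILER!!!":
            found[name] = data
    return found


files = [("a.txt", b"alpha " * 10), ("b.txt", b"bravo " * 7), ("c.txt", b"charlie")]
damaged = bytearray(write_cpio(files))
damaged[6:14] = b"ZZZZZZZZ"  # wreck the first header's inode field
print(sorted(salvage_cpio(bytes(damaged))))  # ['b.txt', 'c.txt']
```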
Wouldn't simply running tar with --ignore-failed-read achieve the desired results? It wouldn't simply stop once it hits an error. Instead, tar will proceed beyond the error and probably just write out junk data (if anything at all) for the corrupted part of the archive.
DISCLAIMER: I haven't tried this, and I'm not entirely sure this is what you want.
RAR is free for decompression: the source is available, there are heaps of precompiled decompression binaries for your OS of choice, and it's supported by a whole heap of popular free archive programs. Just burn the latest source onto every CD you make and you should be fine.
They are backing up data to a MiniDV camcorder, using a simple command-line utility to add forward error correction so that pin-sized holes in the tape cause no data loss.
-- John.
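The parent doesn't say which code that utility uses, so this is a generic illustration of the technique only. A Hamming(7,4) code corrects one flipped bit per 7-bit codeword. Interleaving the codewords spreads a burst of damage (a pinhole) across many of them, so that each codeword sees at most one error. Below is a Python sketch with hypothetical function names. Practical schemes typically use stronger codes such as Reed-Solomon.

```python
def hamming74_encode(nibble):
    """4 data bits -> 7-bit codeword; positions 1..7 hold p1 p2 d0 p4 d1 d2 d3."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Fix up to one flipped bit and return the 4 data bits."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the 1-based position of the bad bit
    return c[2] | c[4] << 1 | c[5] << 2 | c[6] << 3


def encode_interleaved(data):
    """Two codewords per byte; emit bit 0 of every codeword, then bit 1, and so on."""
    words = []
    for byte in data:
        words += [hamming74_encode(byte & 0x0F), hamming74_encode(byte >> 4)]
    return [w[i] for i in range(7) for w in words]


def decode_interleaved(bits, nbytes):
    """Undo the interleave, correct each codeword and rebuild the bytes."""
    nwords = 2 * nbytes
    words = [[bits[i * nwords + j] for i in range(7)] for j in range(nwords)]
    nibbles = [hamming74_decode(w) for w in words]
    return bytes(nibbles[k] | nibbles[k + 1] << 4 for k in range(0, nwords, 2))


# A "pinhole": a burst of consecutive flipped bits as long as the interleave depth.
message = b"backup!"
bits = encode_interleaved(message)
for k in range(20, 20 + 2 * len(message)):
    bits[k] ^= 1
print(decode_interleaved(bits, len(message)))  # b'backup!'
```

The interleave depth (here, the number of codewords) sets the longest burst that can be survived. A longer burst would put two errors into some codeword.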
It's an amazing technology... only quite involved.
Basically, you concatenate all the files together (cat should do), print them out on good 32 lb paper, get a professor's signature, and file it in a college library... I've heard those things stick around for centuries.
The gzip Recovery Toolkit
http://www.urbanophile.com/arenn/hacking/gzrt/gzr
The gzip Recovery Toolkit has a program, gzrecover, that attempts to skip over bad data in a gzip archive, plus a patch to GNU tar that lets tar skip over bad data and extract whatever files might be there. This saved me from exactly the above situation. Hopefully it will help you as well.
[...]
Here's an example:
$ ls *.gz
my-corrupted-backup.tar.gz
$ gzrecover my-corrupted-backup.tar.gz
$ ls *.recovered
my-corrupted-backup.tar.recovered
$ tar --recover -xvf my-corrupted-backup.tar.recovered > /tmp/tar.log 2>&1 &
$ tail -f /tmp/tar.log
I had a backup of my hard drive on another drive, in tar.gz form.
Of course, when the big day came and my hard drive broke, it turned out the other drive had bad sectors!
First, a comment: never ever ever ever use tar.gz to back up anything you'd like to have back.
You can recover stuff easily from tar past the break point - files in tar are basically concatenated together. So you miss the rest of the current file, but you can find the next header+file easily.
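The resync works because tar members start on 512-byte boundaries. Each header carries the "ustar" magic (in POSIX and GNU archives) and a checksum over its own bytes. Below is a minimal Python sketch of that scan, assuming an uncompressed ustar-style archive. salvage_tar and make_tar are hypothetical helpers, not GNU tar features. A file that itself contains a tarball would produce extra matches.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data both sit on 512-byte boundaries


def header_checksum_ok(block):
    """Stored checksum = sum of header bytes, counting the 8-byte checksum field as spaces."""
    try:
        stored = int(block[148:156].split(b"\0")[0].strip(), 8)
    except ValueError:
        return False
    return stored == sum(block[:148]) + 8 * ord(" ") + sum(block[156:])


def salvage_tar(raw):
    """Check every 512-byte boundary for a valid header and pull out the file after it."""
    found = {}
    for off in range(0, len(raw) - BLOCK + 1, BLOCK):
        block = raw[off:off + BLOCK]
        if block[257:262] != b"ustar" or not header_checksum_ok(block):
            continue  # a data block or a damaged header
        try:
            info = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
        except tarfile.TarError:
            continue
        if info.isfile():
            found[info.name] = raw[off + BLOCK:off + BLOCK + info.size]
    return found


def make_tar(files):
    """Build a small ustar archive in memory for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = [("one.txt", b"1" * 1000), ("two.txt", b"2" * 300), ("three.txt", b"3" * 50)]
raw = bytearray(make_tar(files))
raw[0:BLOCK] = b"\xff" * BLOCK  # destroy the first header outright
print(sorted(salvage_tar(bytes(raw))))  # ['three.txt', 'two.txt']
```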
But gzip does not byte-align its data! That's, in my opinion, amazingly stupid. It saves a couple of bits per file, but makes recovery a real hassle.
I had to go through the file past the bad point bit by bit (literally) to figure out whether that was the next data block.
Back in my Win32 days, I was a very frequent user of RAR archives.
Babelfish translation: I was a huge warez kiddie.
On a related note, were there any widespread, legitimate uses of
Why bother with recoverability?
Total loss of a file seems more likely than isolated bit flips.
When your storage hardware/media starts flipping bits, it's probably going to die pretty soon.
And more often than not, your storage hardware/media just dies before you experience any bit flips.
You talk about your laptop computer and being on a budget. If you can't afford to make copies of your important files and store them elsewhere, then either your files aren't important, or successfully maintaining a personal computer system is beyond your financial means.
The odds of your laptop failing within the next 3 years are near 90%.
I suggest you get big HDDs (different brands) and backup everything on them. Then burn stuff you want to archive onto CD-Rs or DVD-Rs.
I've had more than a few HDDs fail in the past year (I still have my data, though). Ironically, two of those HDDs were the ones I used to back stuff up.
Tar alone can recover past a damaged point: it will 'read' past the erroneous data and recover your data. I believe cpio exhibits the same behavior. It is when you compress the archive (with .gz or .bz2) that it may become unrecoverable. If you use tar alone, however, you will always be able to recover some of the data in a damaged archive.
/^([Ss]ame [Bb]at (time, |channel.)){2}$/
Zip is the way to go. Sure, it doesn't produce smaller archives, but it *is* a standard. It's also "Free", for the ideologues among us. Every system has it, including default installs of Windows. The originator of the article didn't say whether he needs this in a corporate environment. If he does, then zip is the failsafe archive choice. Who's to say his workplace won't convert to Windows in the future? If it does, he's assured of having an archive format that can be repaired.
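Zip keeps damage local because every member is compressed on its own and carries its own CRC-32, and a central directory at the end lists where each member starts. Below is a minimal Python sketch using the standard zipfile module; the helper names are hypothetical. It damages one member and shows that the others still extract. It only detects the bad member and doesn't repair it.

```python
import io
import zipfile
import zlib


def build_zip(files):
    """Each member is deflated separately and carries its own CRC-32."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            zf.writestr(name, data)
    return buf.getvalue()


def extract_what_survives(raw):
    """Read every member; collect the names that fail to inflate or fail their CRC check."""
    good, bad = {}, []
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():
            try:
                good[info.filename] = zf.read(info)
            except (zipfile.BadZipFile, zlib.error, EOFError):
                bad.append(info.filename)
    return good, bad


files = [("a.txt", b"apple " * 200), ("b.txt", b"banana " * 200), ("c.txt", b"cherry " * 200)]
raw = bytearray(build_zip(files))
with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    offset = zf.getinfo("b.txt").header_offset
# Local header: 30 fixed bytes, then the file name and extra field, then the data.
name_len = int.from_bytes(raw[offset + 26:offset + 28], "little")
extra_len = int.from_bytes(raw[offset + 28:offset + 30], "little")
raw[offset + 30 + name_len + extra_len + 2] ^= 0xFF  # damage b.txt's compressed data
good, bad = extract_what_survives(bytes(raw))
print(sorted(good), bad)  # ['a.txt', 'c.txt'] ['b.txt']
```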
There is a patent on a recovery scheme by M. Rabin (I don't have the number handy). The patent covers "n+k" recovery schemes, in which n blocks of data are protected using k recovery blocks. The patent is quite old.
I wonder if rar, par and par2 infringe on this patent?
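For anyone unfamiliar with the "n+k" idea: any n of the n+k stored symbols are enough to rebuild the data. Below is a toy Python sketch using polynomial evaluation over the prime field GF(257), a Reed-Solomon-style construction. It illustrates the general technique only; I'm not claiming it is the patented scheme or what rar or par actually do. Par2 works in GF(2^16). Here a recovery symbol can come out as 256, which doesn't fit in a byte, which is one reason real codes use binary fields. All names are hypothetical.

```python
P = 257  # a prime just above 255, so every byte value is an element of GF(P)


def lagrange_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(den, P-2, P) is 1/den
    return total


def encode(group, k):
    """The n data symbols are the values at x = 1..n; k recovery symbols are the values at n+1..n+k."""
    n = len(group)
    points = list(zip(range(1, n + 1), group))
    return list(group) + [lagrange_at(points, x) for x in range(n + 1, n + k + 1)]


def decode(symbols, n):
    """Rebuild the n data symbols from ANY n survivors (lost symbols are None)."""
    known = [(x, y) for x, y in enumerate(symbols, start=1) if y is not None]
    if len(known) < n:
        raise ValueError("more than k symbols lost")
    return [lagrange_at(known[:n], x) for x in range(1, n + 1)]


data = list(b"Hello")                      # n = 5
stored = encode(data, 3)                   # n + k = 8 symbols
stored[0] = stored[3] = stored[6] = None   # lose any 3 of them
print(bytes(decode(stored, 5)))            # b'Hello'
```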
tarfix may help with some of those archive issues.
But, the archive format is not going to save you. Use multiple media. You need more than one physical archive for better safety, regardless of format. Hell, you'll probably die before some of today's media fails.
Your ISP might get pissed.
... my Subversion archive is now larger than a CD can hold. Is there a tool I can use to split the big file across two CDs, preferably something that doesn't need another piece of software to reassemble the big file?
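No special software should be needed to put the pieces back: concatenating the parts in order restores the original byte for byte. The standard Unix split and cat tools rely on exactly that. Below is a minimal Python sketch of the same approach. The part naming (.000, .001, ...) and the default size are assumptions.

```python
import shutil

# About 700 MiB; an 80-minute data CD-R holds 360,000 sectors x 2048 bytes = 737,280,000 bytes.
CD_BYTES = 700 * 1024 * 1024


def split_file(path, part_size=CD_BYTES, buf_size=1 << 20):
    """Write path.000, path.001, ... each at most part_size bytes; return their names."""
    parts = []
    with open(path, "rb") as src:
        while True:
            block = src.read(min(buf_size, part_size))
            if not block:
                break  # nothing left: don't create an empty part
            name = "%s.%03d" % (path, len(parts))
            with open(name, "wb") as dst:
                written = 0
                while block:
                    dst.write(block)
                    written += len(block)
                    if written >= part_size:
                        break
                    block = src.read(min(buf_size, part_size - written))
            parts.append(name)
    return parts


def join_files(parts, out_path):
    """Reassembly is plain concatenation, i.e. what `cat file.* > file` does."""
    with open(out_path, "wb") as dst:
        for name in parts:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, dst)
```

For example, split_file("repo.dump") (a hypothetical file name) on a 1 GB dump would produce repo.dump.000 and repo.dump.001. On the other machine, `cat repo.dump.000 repo.dump.001 > repo.dump` puts the file back together.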
RAR has one of the best recovery methods, since it has several of them.
During compression:
Recovery Record (-rr option)
It has a Recovery Record: data appended to the actual RAR file that lets you recover from errors within the file. The default RR takes 1% of the archive and lets you recover 0.6%. You can change this behaviour to get more recoverability by specifying -rr[N]p with a larger percentage.
Recovery Volume (-rv option)
Furthermore, RAR supports PAR-like volumes called REV that can recover whole missing files. For all intents and purposes, REV is PAR, except that it's integrated into the RAR utility. All you type is unrar *.rar and RAR will recover the files for you, through either the RR or the REV volumes. No need to muck around with twenty different utilities just to ensure a proper file.
Non-Solid Archiving (-s- option)
RAR also supports non-solid archiving, meaning each file is saved using fresh compression statistics. You will lose some space with this method, but you will gain speed (you don't need to decompress the first 20 files to get at the 21st) as well as partial recoverability (if file 20 is corrupt, you can still decompress file 21).
During decompression:
Keep Broken Files (-kb option)
By default, like most archiving software, RAR will not save a file that is known to be corrupt unless you explicitly force it to do so.
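The trade-off behind non-solid archiving is easy to see with zlib standing in for RAR's compressor. That substitution is an assumption: RAR's own algorithm differs. Separate streams keep damage inside one file, while one shared stream makes everything after the damage depend on it. A streaming reader would still get the files before the damaged byte. Below is a Python sketch with hypothetical helper names.

```python
import zlib


def pack_solid(files):
    """One compression stream for everything (like a solid archive)."""
    return zlib.compress(b"".join(files))


def pack_non_solid(files):
    """A fresh stream, and fresh statistics, for every file (like RAR's -s-)."""
    return [zlib.compress(f) for f in files]


def unpack(blob):
    """Decompress, or return None if the data is damaged (zlib also checks an Adler-32)."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def damage(blob, offset):
    """Flip every bit of one byte."""
    return blob[:offset] + bytes([blob[offset] ^ 0xFF]) + blob[offset + 1:]


files = [("file %02d " % i).encode() * 50 for i in range(1, 22)]  # files 1..21

non_solid = pack_non_solid(files)
non_solid[19] = damage(non_solid[19], 10)  # corrupt file 20
print([unpack(b) == f for b, f in zip(non_solid, files)][18:])  # [True, False, True]

packed = pack_solid(files)
solid = damage(packed, len(packed) // 2)
print(unpack(solid))  # None: this one-shot call gives up on the whole shared stream
```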
I highly recommend checking out the RAR command-line manual.
Eugene Roshal is GOD
You definitely shouldn't use RAR for archival purposes, but for extracting existing archives, try unrarlib. It includes a library for accessing the contents of RAR files, and an "unrar" utility based on this library. It is dual-licensed under the GPL and a more restrictive license.
Mirroring HDs only protects against fatal failures of a single HD. Motor stops spinning? Then the other HD takes over.
It does NOT protect against failures on the disc. Errors while writing or reading or other fun stuff.
For true backup you need the following.
If you follow all that, use proper storage conditions, prevent unauthorized access, and keep equipment around to read your storage medium, then you can make backups with reasonable certainty that they will actually work.
Of course, this never happens in real life. Only governments can be bothered. In real life, people invest in expensive tapes and back up everything on fresh tapes, then throw away the tape drive during an upgrade.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
No, two files failing isn't likely to happen. We are, however, dealing with disaster recovery here. Disasters are always disastrous.
Of course, error recovery won't work with a total failure like, say, a fire. Then your second copy is the better solution.
So two copies is a good idea. Error recovery is a good idea.
Two copies with error recovery is absolutely brilliant.
What do you mean, do I own stock in media storage companies? What a silly question.
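Two copies on their own can't tell you which one is right where they disagree; a little error-detection data settles that. Below is a minimal Python sketch. It assumes you stored a list of per-block SHA-256 checksums alongside the two copies when the backup was made. The block size and names are hypothetical.

```python
import hashlib

BLOCK = 4096  # hypothetical block size for the checksum list


def block_digests(data, block=BLOCK):
    """SHA-256 of every block, recorded when the backup is made and kept with it."""
    return [hashlib.sha256(data[i:i + block]).digest() for i in range(0, len(data), block)]


def merge_copies(copy_a, copy_b, digests, block=BLOCK):
    """For each block, take whichever copy still matches the recorded checksum."""
    out = bytearray()
    for n, digest in enumerate(digests):
        for copy in (copy_a, copy_b):
            piece = copy[n * block:(n + 1) * block]
            if hashlib.sha256(piece).digest() == digest:
                out += piece
                break
        else:
            raise ValueError("block %d is bad in both copies" % n)
    return bytes(out)


original = bytes(range(256)) * 50  # 12,800 bytes: 3 full blocks plus 1 partial
digests = block_digests(original)
copy_a = bytearray(original)
copy_a[100] ^= 1  # bad sector in block 0 of copy A
copy_b = bytearray(original)
copy_b[BLOCK + 5] ^= 1  # bad sector in block 1 of copy B
print(merge_copies(bytes(copy_a), bytes(copy_b), digests) == original)  # True
```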
From http://dar.linux.free.fr/:
dar is a shell command, that makes backup of a directory tree and files. It is released under the GNU General Public License (GPL in the following) and actually has been tested under Linux, Windows and Solaris. Since version 2.0.0 an Application Interface (API) is available to open the way to external independent Graphical User Interfaces (GUIs). An extension of this API (in its version 2) is in the air, for release 2.1.0, and would overcome some limitation of API version 1. This API relies on the libdar library which is the core part of DAR programs and, as such, is released under the GPL. In consequences, to use it, your program must be released under the GPL, no commercial use will be tolerated
Archive Testing
thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in the archive. Only the file where data corruption occurred will not be possible to restore, but dar will restore the other even when compression is used.
Parchive http://parchive.sourceforge.net/:
Parchive: Parity Archive Volume Set
The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal. Our new goal with version 2.0 of the specification is to improve. It extends the idea of version 1.0 and takes the recovery process beyond the file-level barrier. This allows for more effective protection with less recovery data, and removes some previous limitations on the number of recoverable parts. See Par1 compared to Par2 for a more detailed view of the differences.
A quick perusal of the QuickPar website suggests that at least some Par2 clients can restore based on two damaged files and incomplete recovery files:
In the past, however, I've had to fetch remote files over a noisy connection where the remote server wasn't thoughtful enough to create Par files or even set up rsync. What I've thought would be nice is an application that can look for correspondences between three checksum-failed files to try to create one good one. I don't suppose Parchive can do that?
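If the three downloads are the same length and the damage is scattered (bytes changed, not inserted or dropped), a byte-wise two-out-of-three vote gets most of the way there. The result should still be checked against the published checksum. Below is a minimal Python sketch; it is independent of Parchive, and the function name is hypothetical.

```python
def majority_merge(a, b, c):
    """Byte-wise 2-of-3 vote. Returns the merged bytes and the offsets with no majority."""
    if not len(a) == len(b) == len(c):
        raise ValueError("the three copies must be the same length")
    merged, unresolved = bytearray(), []
    for offset, (x, y, z) in enumerate(zip(a, b, c)):
        if x == y or x == z:
            merged.append(x)
        elif y == z:
            merged.append(y)
        else:
            merged.append(x)  # all three disagree: keep copy a's byte, but report it
            unresolved.append(offset)
    return bytes(merged), unresolved


good = b"The quick brown fox jumps over the lazy dog"
a, b, c = bytearray(good), bytearray(good), bytearray(good)
a[4], b[10], c[20] = ord("X"), ord("Y"), ord("Z")  # different damage in each download
merged, unresolved = majority_merge(a, b, c)
print(merged == good, unresolved)  # True []
```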
It's been possible to do that for well over a decade, using various utilities such as tarx. I've successfully recovered files after a damaged point in a tarball many times. (Sigh, I used to use an old AT&T UNIX with a #$*@# broken tar, which occasionally created corrupt tarballs).
See this post on the Sun Managers list circa 1993, and the venerable comp.sources.unix collection, volume 24, for the sources.
1. Upgrade your tar ;-)
2. Ditch the old *nix box and switch to Linux / OS/X / *BSD
3. Switch to GNU tar or Joerg Schilling's "star"
4. ???
5. PROFIT!
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??