Slashdot Mirror


Recoverable File Archiving with Free Software?

Viqsi asks: "Back in my Win32 days, I was a very frequent user of RAR archives. I've had them get hit by partial hardware failures and still be recoverable, so I've always liked them, but they're completely non-Free, and the mini-RMS in my brain tells me this could be a problem for long-term archival. The closest free equivalent I can find is .tar.bz2, and while bzip2 has some recovery ability, tar is (as far as I have ever been able to tell) incapable of recovering anything past the damaged point, which is unacceptable for my purposes. I've recently had to pick up a copy of RAR for Linux to dig into one of those old archives, so this question's come back up for me again, and I still haven't found anything. Does anyone know of a file archive type that can recover from this kind of damage?"

27 of 80 comments

  1. where have you been? by Anonymous Coward · · Score: 3, Informative

    ever heard of parity archives?

    1. Re:where have you been? by jason.stover · · Score: 5, Informative

      Here's the Parchive SourceForge site, with links to the PAR2 utilities, spec, etc.

  2. wow man by Anonymous Coward · · Score: 3, Funny

    > the mini-RMS in my brain

    You really ought to have that looked at..

    1. Re:wow man by Viqsi · · Score: 4, Funny

      Y'know, I would've done that a long time ago, but my health care provider doesn't cover ideologuectomies. They claim that it doesn't threaten your physical life, just your social one. The bastards.

      :D

      --
      viqsi - See "vixen"
      If we do not change our direction we are likely to end up where we are headed.
  3. Try apio by innosent · · Score: 4, Informative

    There used to be a cpio-like archiver called apio that was designed for those types of situations. Of course, that might not be much help for non-Unix systems (unless you plan on running it in Cygwin), but I remember having great success with it on the old QIC tapes, which were in my experience the worst backup medium for important data ever (better to have no backup than to think you have a good one but have a dead tape).

    --
    --That's the point of being root, you can do anything you want, even if it's stupid.
    1. Re:Try apio by innosent · · Score: 4, Informative

      Sorry, I believe it was afio

      --
      --That's the point of being root, you can do anything you want, even if it's stupid.
  4. Par2 works great by dozer · · Score: 5, Informative

    Store the recovery information outside the archive. Par2 works really well. You can configure how much redundancy you want (2% should be fine for occasional bit errors, 30% if you burn it to a CD that might get mangled, etc.). It's a work in progress, but it's already really useful.
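
    As a rough back-of-the-envelope sketch of what those percentages buy you: PAR2 splits the input into fixed-size blocks, and each recovery block can stand in for one damaged or missing source block. The Python below only illustrates that arithmetic. The function name and the 2000-block archive are made up for the example, and rounding up is an assumption; the real par2 tools may round differently.

```python
def recovery_blocks(source_blocks: int, redundancy_percent: int) -> int:
    """Approximate number of recovery blocks for a redundancy level.

    Assumption: the count is rounded up. Actual PAR2 clients may
    round differently.
    """
    # Integer ceiling division avoids floating-point surprises.
    return -(-source_blocks * redundancy_percent // 100)


# Hypothetical archive that was split into 2000 source blocks.
light = recovery_blocks(2000, 2)    # for occasional bit errors
heavy = recovery_blocks(2000, 30)   # for media that might get mangled
print(light, heavy)
```

    So under these assumptions the 2% setting could repair a few dozen bad blocks out of 2000, while 30% could absorb several hundred.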

    1. Re:Par2 works great by Stubtify · · Score: 4, Informative
      Allow me to second this. Par2 is everything the first PAR files were and more. No matter what has been wrong, I've always been able to recover with a 10% parity set (even this seems like a lot of overkill, except on USENET). Interestingly enough, Par files have revolutionized USENET; I can't remember the last time I needed a fill.

      good overview here: PAR2 files

      comparison between v1 and 2: here

    2. Re:Par2 works great by Zapper · · Score: 2, Funny

      Just a pity that no sane amount of PAR files will compensate for my ISP's lame news feed. :-(

      --
      So much to do, so little bandwidth.
      --
      Try Mozilla
  5. Yeah by photon317 · · Score: 3, Insightful


    The format you're looking for is any format you like stored on reliable storage.

    Why bother with all the intricacies of a pseudo-fault-tolerant data structure? Ultimately the best archive format for recovery will be one that just duplicates the whole archive, doubling space requirements and improving immunity to lost sectors on drives. At which point one asks, "Why don't I just stick to simple files and archives, and use reliable storage that handles this crap for me, for all my data, automagically?" Storage of any sort just keeps getting cheaper and bigger. If you have any interest in the longevity of your data, there's almost no excuse for not using the data mirroring built into virtually every OS these days: you double your storage cost, but you also double your read performance and stop worrying about drive failure.

    --
    11*43+456^2
    1. Re:Yeah by Viqsi · · Score: 3, Insightful

      > Why bother with all the intricacies of a pseudo-fault-tolerant data structure?

      I'm on a laptop. I like my laptop. It's a very nice laptop. However, it doesn't exactly support that kind of hardware upgrade, and I am still ultimately on a bit of a budget.

      I kind of put forth the question not only out of the hope that a Magical Solution To All My Archival Problems would Mystically Appear (puff of smoke optional but appreciated) but because I want to find something I also feel like I can unreservedly recommend to non-ideological friends who are looking for, say, something slightly more reliable than ZIP files. I could've mentioned that in the article post, but it was already getting long. :)

      --
      viqsi - See "vixen"
      If we do not change our direction we are likely to end up where we are headed.
    2. Re:Yeah by sasami · · Score: 4, Insightful

      > Par archives is just a scam popularized by clueless Usenet abusers. Think about it, if those files really could reconstruct a corrupt rar archive, why not post only the smaller par files ... Get yourself double copies and you'll be far better off

      Ignore this post. It's either a troll or an idiot.

      PAR files substitute for missing pieces. They don't regenerate the whole file by themselves. Go look up how RAID 5 parity works. They're not called PAR files for nothing.

      Just because you don't understand how something works has no bearing on the fact that it does work. Except in certain performance-sensitive cases, doubling up is the least intelligent way of adding redundancy.
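
      For anyone who wants to see the RAID 5 idea in miniature, here is a minimal Python sketch using plain XOR parity. It is only an illustration: the block contents and the helper name are made up, and PAR2 itself uses Reed-Solomon codes rather than simple XOR, which is what lets it replace more than one missing piece.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up data blocks plus one parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend block 1 was lost. XOR-ing the survivors with the parity
# block yields the missing block, because x ^ x cancels out.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

      Note that the parity block alone is useless: it only fills a hole when the other pieces are present, which is exactly why posting "only the smaller par files" would never work.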

      ---
      Dum de dum.

      --
      Freedom is not the license to do what we like, it is the power to do what we ought.
  6. cpio by Kevin+Burtch · · Score: 5, Informative


    True, tar cannot handle a single error... all files past that error are lost.

    On the other hand, cpio (and clones) can handle missing/damaged data without losing the undamaged portions that follow (you only lose the archived file that contains the damage). It is the only common/free format I can think of (off the top of my head) that is capable of this.

    --
    - Preferences: Solaris 10 (servers), Ubuntu (desktops), Solaris 11 (personal servers) -
    1. Re:cpio by Anonymous Coward · · Score: 2, Informative

      > On the other hand, cpio (and clones) can handle missing/damaged data without losing the undamaged portions that follow (you only lose the archived file that contains the damage). It is the only common/free format I can think of (off the top of my head) that is capable of this.

      ZIP also supports this (the command is "zip -F" with Info-ZIP, the standard zip/unzip program on Linux).
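
      The reason ZIP can do this is that every member carries its own header and CRC-32, so damage to one member does not poison the others. The Python sketch below shows that property with the standard zipfile module. It is not what "zip -F" does internally (that repairs the archive's structure); the file names and contents are made up for the example.

```python
import io
import zipfile

# Build a small archive in memory. ZIP_STORED keeps the payloads
# uncompressed so the damage below is easy to aim.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"first file contents")
    zf.writestr("b.txt", b"second file contents")
    zf.writestr("c.txt", b"third file contents")

# Simulate a bad sector: change one byte inside b.txt's stored data.
damaged = bytearray(raw.getvalue())
pos = damaged.find(b"second file contents")
damaged[pos] ^= 0xFF

recovered, lost = {}, []
with zipfile.ZipFile(io.BytesIO(bytes(damaged))) as zf:
    for name in zf.namelist():
        try:
            recovered[name] = zf.read(name)
        except zipfile.BadZipFile:  # raised on a CRC-32 mismatch
            lost.append(name)

print(sorted(recovered), lost)
```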

    2. Re:cpio by Detritus · · Score: 2, Informative

      I know that I've recovered data from damaged tar archives in the past. I just ran some tests with intentionally damaged tar files, using GNU tar from FreeBSD 5.2.1. GNU tar successfully recovered the data from all of the damaged tar files. It just skips over the damaged bits and resynchronizes at the next valid file header.
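
      The same skip-and-resynchronize idea can be tried without GNU tar. The sketch below uses Python's standard tarfile module, whose ignore_zeros option skips empty and invalid blocks and keeps looking for members. This is Python's reader, not GNU tar's code, and the member names and the way the header gets damaged are made up for the demonstration.

```python
import io
import tarfile


def build_archive() -> bytes:
    """Create a three-member ustar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in [("a.txt", b"first\n"),
                              ("b.txt", b"second\n"),
                              ("c.txt", b"third\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data: bytes, resync: bool) -> list:
    """List the members a reader can see, with or without resyncing."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:", ignore_zeros=resync) as tar:
        return tar.getnames()


good = build_archive()

# Find where b.txt's 512-byte header starts, then scribble over part
# of it so that its checksum no longer matches.
with tarfile.open(fileobj=io.BytesIO(good), mode="r:") as tar:
    header_offset = tar.getmember("b.txt").offset
damaged = bytearray(good)
damaged[header_offset:header_offset + 16] = b"X" * 16

strict_names = member_names(bytes(damaged), resync=False)
resynced_names = member_names(bytes(damaged), resync=True)
print(strict_names, resynced_names)
```

      The strict reader stops at the first bad header, which matches the "everything past the error is lost" experience; the resyncing reader loses only the member whose header was damaged.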

      --
      Mea navis aericumbens anguillis abundat
  7. Re:Are you sure tar is unacceptable? by wiswaud · · Score: 5, Informative

    Say you make a big tar, bzip2 it, and store the file on a CD.
    Two years later you want the data back.
    There's a read error at some point within the .tar.bz2, and it gives you some garbage data.
    bzip2recover (shipped with bzip2) can salvage all the other 900 kB blocks of the original tar file, everything except the damaged block.
    But tar just chokes at that point and you lose everything past the read error: the data past the error was recovered, but tar can't use it.
    It's quite frustrating.

  8. Tar options by aster_ken · · Score: 2, Insightful

    Wouldn't simply running tar with --ignore-failed-read achieve the desired results? It wouldn't simply stop once it hits an error. Instead, tar will proceed beyond the error and probably just write out junk data (if anything at all) for the corrupted part of the archive.

    DISCLAIMER: I haven't tried this, and I'm not entirely sure this is what you want.

  9. RAR isn't completely non-free by Kris_J · · Score: 3, Informative

    RAR is free for decompression, with source available. There are heaps of precompiled decompression binaries for your OS of choice, and it's included in a whole heap of popular free archive programs. Just burn the latest source onto every CD you make and you should be fine.

    1. Re:RAR isn't completely non-free by Kris_J · · Score: 3, Insightful
      But if you purchase it, as I have, you get a product you can use from now until forever, so long as your OS supports it, plus you can get the decompression source so that you (or someone else) can always write a decompressor for a future platform. Surely you don't need to worry about replacing it until both the following are true: None of the versions you've purchased run on your current platform AND no version compatible with your current platform is available (at a reasonable price). At that point you stop creating RAR archives and simply keep the decompressor around (porting and recompiling as necessary).

      (Personally, I don't care about recovery records, I just keep two copies of everything, and I moved to 7-zip -- which can decompress RAR -- about six months ago.)

  10. Take a look at dvbackup/rsbep by jhoger · · Score: 3, Interesting

    They back up data to a MiniDV camcorder, using a simple command-line utility to add forward error correction, so that holes in the tape the size of a pin cause no data loss.

    -- John.

  11. Yes... by caesar79 · · Score: 4, Funny

    It's an amazing technology...only quite involved.
    Basically you concatenate all the files together (cat should do), print it out on good 32lb paper, get a professor's signature and file it in a college lib...heard those things stick around for centuries

  12. tar/gzip recovery toolkit by wotevah · · Score: 4, Informative
    A quick google search turns up the link shown at the end of this post, from which I quote:

    The gzip Recovery Toolkit

    The gzip Recovery Toolkit has a program - gzrecover - that attempts to skip over bad data in a gzip archive and a patch to GNU tar that enables that program to skip over bad data and extract whatever files might be there. This saved me from exactly the above situation. Hopefully it will help you as well.
    [...]
    Here's an example:

    $ ls *.gz
    my-corrupted-backup.tar.gz
    $ gzrecover my-corrupted-backup.tar.gz
    $ ls *.recovered
    my-corrupted-backup.tar.recovered
    $ tar --recover -xvf my-corrupted-backup.tar.recovered > /tmp/tar.log 2>&1 &
    $ tail -f /tmp/tar.log

    http://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html
  13. RAR Archives by vasqzr · · Score: 4, Funny


    > Back in my Win32 days, I was a very frequent user of RAR archives.

    Babelfish translation: I was a huge warez kiddie.

    On a related note, were there any widespread, legitimate uses of .RAR? I only remember .ARJ and .ZIP

    1. Re:RAR Archives by jonadab · · Score: 2, Insightful

      > were there any wide-spread, legitimate uses of .RAR?

      RAR was heavily used in Germany, among the gamer community. A lot of Descent
      players for example distributed their custom levels, missions, textures,
      hogfile utilities, savegame editors, and whatnot in RAR format. It was
      annoying; I had to go hunt down and download a RAR extractor just to install
      some of the stuff.

      The usual argument was that RAR was "better" than ZIP either because of the
      compression rates or because of the partial recoverability or whatever. My
      opinion on the matter has always been that for distributing stuff over the
      internet, the most ubiquitous format is automatically the best, so ZIP is
      better than RAR irrespective of technical issues, due to compatibility concerns.
      By the same reasoning, gzip is automatically better than bzip2, and no amount
      of technical superiority makes a good enough reason to use bzip2 over gzip.
      Frankly, for anything that's not inherently *nix-specific, ZIP is better than
      gzip for the same reason. Not everyone agrees with me about this, obviously.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  14. tarfix by morelife · · Score: 3, Insightful

    tarfix

    may help some of those archive issues.

    But, the archive format is not going to save you. Use multiple media. You need more than one physical archive for better safety, regardless of format. Hell, you'll probably die before some of today's media fails.

  15. Rar has one of the best Recovery methods by AdamPiotrZochowski · · Score: 2, Informative


    RAR has one of the best recovery methods, as it has multiple of them.

    during compression:
    Recovery Record (-rr option)

    RAR can append a Recovery Record to the actual rar file: extra
    data that lets you recover from errors within the file. The
    default RR takes 1% of the archive and lets you recover 0.6%. You
    can trade space for more recoverability by specifying -rr[N]p
    with a larger percentage.

    Recovery Volume (-rv option)

    Furthermore, RAR supports PAR-like recovery volumes called REV
    that can recover whole missing files. For all practical purposes REV
    is PAR, except that it's integrated into the RAR utility: all you type
    is unrar *.rar and RAR will recover the files for you, through either
    RR or REV. No need to muck around with twenty different utilities
    just to end up with a proper file.

    Non Solid Archiving (-s- option)

    Furthermore, RAR supports non-solid archiving, meaning each file is
    compressed using fresh compression statistics. You will lose some
    space with this method, but you will gain speed (you don't need to
    decompress the first 20 files to get at the 21st) as well as
    partial recoverability (if file 20 is corrupt, you can still
    decompress file 21).
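
    To make the solid versus non-solid trade-off concrete, here is a small Python sketch. It uses zlib as a stand-in for RAR's compressor (an assumption made purely for illustration), with made-up file names and contents, and it damages each stream by flipping its final checksum byte.

```python
import zlib

files = {
    "one.txt": b"alpha " * 50,
    "two.txt": b"bravo " * 50,
    "three.txt": b"charlie " * 50,
}


def damage(blob: bytes) -> bytes:
    """Flip the final byte, which belongs to the stream's checksum."""
    return blob[:-1] + bytes([blob[-1] ^ 0xFF])


def try_decompress(blob: bytes):
    """Return the decompressed data, or None if the stream is rejected."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


# Non-solid: one independent stream per file; damage only two.txt.
non_solid = {name: zlib.compress(data) for name, data in files.items()}
non_solid["two.txt"] = damage(non_solid["two.txt"])
non_solid_ok = [n for n, blob in non_solid.items()
                if try_decompress(blob) is not None]

# Solid: everything in one stream, with the same single-byte damage.
solid = damage(zlib.compress(b"".join(files.values())))
solid_ok = try_decompress(solid) is not None

print(non_solid_ok, solid_ok)
```

    With independent streams only the damaged file is rejected, while the single solid stream is rejected as a whole by a strict decoder. A dedicated recovery tool might still salvage some data from a solid stream, so treat this as the worst case.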

    during decompression:
    Keep Broken Files (-kb option)

    By default, like most archiving software, RAR will not save a file
    that it knows is corrupt unless you explicitly force it to do
    so.

    I highly recommend checking out the command-line manual for RAR.

    Eugene Roshal is GOD

  16. You can, too, recover tar archives!! (see: tarx) by Helen+O'Boyle · · Score: 2, Informative
    { The poster is looking for alternatives to tar, because he has concerns about tarball content recovery. }

    It's been possible to do that for well over a decade, using various utilities such as tarx. I've successfully recovered files after a damaged point in a tarball many times. (Sigh, I used to use an old AT&T UNIX with a #$*@# broken tar, which occasionally created corrupt tarballs).

    See this post on the Sun Managers list circa 1993, and the venerable comp.sources.unix collection, volume 24, for the sources.