Slashdot Mirror


File Systems Best Suited for Archival Storage?

Amir Ansari asks: "There have been many comparisons between various archival media (hard drive, tape, magneto-optical, CD/DVD, and so on). Of course, the most important characteristics are permanence and portability, but what about the file systems involved? For instance, I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)? Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage? Is journaling important?"

3 of 105 comments

  1. What about error correction? by F00F · · Score: 5, Interesting

    I've been wondering lately why no common file systems seem to implement error correcting codes (ECC/EDAC).

    In hardware, there's often a checksum, ECC/Hamming code, parity bit, Reed-Solomon code, etc. to detect and/or correct for inadvertent bit flips. But, as far as I know, no error correcting information is ever stored within the filesystem itself. Certainly the filesystem tracks how many blocks are dedicated to a particular file, and how many bytes long the file is, and one can always hash the file twelve ways to Sunday to make sure that it hasn't changed since it was originally hashed, but none of that helps repair errors to the file should the medium that's being used to store it decay beyond what's already correctable via the medium access hardware.
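
    To make the detection-versus-correction distinction concrete, here is a minimal Python sketch of a Hamming(7,4) code, the simplest of the codes listed above. A hash of the file can only report that something changed; the three parity bits here also identify which of the seven bits flipped, so it can be flipped back. The function names are illustrative and not taken from any filesystem, the sketch assumes at most one flipped bit per codeword (two flips get miscorrected), and a real filesystem-level scheme would operate on whole blocks rather than single bits.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode data bits [d1, d2, d3, d4] as a 7-bit codeword (positions 1-7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)                       # work on a copy
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4        # the syndrome spells out the bad position
    if pos:
        c[pos - 1] ^= 1               # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                      # simulate a flipped bit at position 5
    print(hamming74_decode(word))     # ([1, 0, 1, 1], 5)
```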

    I can imagine scenarios where, for example, the RAM buffer in a hard drive is upset and perfectly encodes the wrong bit into a file (or even multiple stripes + parity in a RAID). In this case, the medium access hardware is useless (the data was, after all, encoded perfectly wrong), but ECC in the filesystem would detect and potentially correct the error the next time the file was read back, even if it were decades later. I appreciate that it would add overhead, and thus maybe shouldn't be the default, but I don't see it being even an option anywhere, and some people would pay the performance penalty to get the data integrity benefit.
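
    A file-level version of that idea can be sketched with the same building blocks RAID uses: a checksum per block to locate the damage, plus one XOR parity block to rebuild it. The Python below is a minimal illustration under stated assumptions, not how any existing filesystem works: the 4096-byte block size and all helper names are hypothetical, CRC32 stands in for whatever checksum a real design would pick, the CRC list and parity block are assumed to be stored intact somewhere else, and only one damaged block per file can be repaired. Tools such as par2 use Reed-Solomon codes instead, so that several damaged blocks can be recovered.

```python
import zlib

BLOCK = 4096  # hypothetical block size in bytes


def split_blocks(data: bytes, size: int = BLOCK) -> list[bytes]:
    """Cut data into fixed-size blocks, zero-padding the last one."""
    return [data[i:i + size].ljust(size, b"\0") for i in range(0, len(data), size)]


def xor_blocks(blocks: list[bytes], size: int = BLOCK) -> bytes:
    """XOR equal-sized blocks together (RAID-5 style parity)."""
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(size, "big")


def protect(data: bytes):
    """Return (blocks, per-block CRC32 list, parity block, original length)."""
    blocks = split_blocks(data)
    return blocks, [zlib.crc32(b) for b in blocks], xor_blocks(blocks), len(data)


def recover(blocks, crcs, parity, length) -> bytes:
    """Locate a bad block via its CRC32, rebuild it from parity, return the data."""
    bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != crcs[i]]
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} damaged blocks; one parity block repairs only one")
    if bad:
        i = bad[0]
        # XOR of every other block plus the parity block reproduces the missing one.
        rebuilt = xor_blocks([b for j, b in enumerate(blocks) if j != i] + [parity])
        if zlib.crc32(rebuilt) != crcs[i]:
            raise ValueError("rebuilt block fails its checksum (parity damaged?)")
        blocks = blocks[:i] + [rebuilt] + blocks[i + 1:]
    return b"".join(blocks)[:length]


if __name__ == "__main__":
    data = bytes(range(256)) * 64            # 16 KiB of sample data, four blocks
    blocks, crcs, parity, n = protect(data)
    blocks[2] = b"\xff" + blocks[2][1:]      # simulate a "perfectly wrong" write
    print(recover(blocks, crcs, parity, n) == data)   # True
```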

    Especially in instances like encrypted (or compressed, or both) loopback file systems where one bad bit can destroy an entire partition, why don't we have more data assurance layers available? Or have I just not found them?
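
    The amplification effect is easy to demonstrate for the compressed case. This minimal Python sketch uses zlib as a stand-in for whatever compression a loopback layer might apply (encryption is not modeled here) and flips one bit in the middle of the stored bytes. Uncompressed, exactly one byte is damaged; compressed, zlib normally rejects the entire stream with a decode error or a failed Adler-32 check, so `zlib.decompress` recovers nothing at all. The helper names are hypothetical.

```python
import zlib


def flip_bit(buf: bytes, index: int, bit: int = 0) -> bytes:
    """Return a copy of buf with one bit inverted (hypothetical helper)."""
    out = bytearray(buf)
    out[index] ^= 1 << bit
    return bytes(out)


def intact_bytes(data: bytes, compress: bool) -> int:
    """Flip one bit mid-way through the stored form; count bytes that still match."""
    stored = zlib.compress(data) if compress else data
    damaged = flip_bit(stored, len(stored) // 2)
    try:
        restored = zlib.decompress(damaged) if compress else damaged
    except zlib.error:
        return 0  # the decoder rejected the stream, so nothing is readable this way
    return sum(a == b for a, b in zip(data, restored))


if __name__ == "__main__":
    sample = b"".join(b"record %d\n" % i for i in range(2000))
    print("plain:     ", intact_bytes(sample, compress=False), "of", len(sample))
    print("compressed:", intact_bytes(sample, compress=True), "of", len(sample))
```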

    Whining of which, what was the deal with GNU ecc? Everyone speaks of "oh, yeah, the algorithm was deeply flawed, bummer..." but I don't ever see any details ...

  2. Non-IT answer by Overzeetop · · Score: 4, Interesting

    The best file system for archival purposes is the one you're using today. Why? Because if you want that archive to be readable in any expedient manner, you are going to have to constantly monitor and update the media on which it is stored. All media will degrade over time, and you will have no idea how bad that degradation has been until you re-read it. No vendor will compensate you for the loss of your data, because there is some data which simply cannot be recreated.

    If you want archival storage, you need to have your data on- or near-line, and rewrite the data to the "new" hardware every couple of years. By choosing a filesystem that is current, you are more likely to be able to read it in a couple years than if you (try to) stick with a single filesystem. I know this sounds like a lot of work, but if the data is truly worth archiving, it's worth keeping both the storage mechanism and format up to date.
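
    Part of that upkeep can be automated. The minimal Python sketch below (all function names hypothetical) records a SHA-256 manifest of the archive, copies it to the new drive, and re-reads every copy against the manifest. Running `verify()` against the old drive between migrations is also a cheap way to find out how much degradation has crept in, and because the manifest is just relative paths and digests, it does not depend on any particular filesystem.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for piece in iter(lambda: f.read(chunk), b""):
            h.update(piece)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256."""
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    return [rel for rel, digest in manifest.items()
            if not (root / rel).is_file() or sha256_file(root / rel) != digest]


def migrate(src: Path, dst: Path) -> list[str]:
    """Copy the archive at src to dst, then re-read every copy against the manifest."""
    manifest = build_manifest(src)
    for rel in manifest:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)   # copies contents plus timestamps
    return verify(dst, manifest)


# Example (paths are hypothetical):
#   problems = migrate(Path("/mnt/old_archive"), Path("/mnt/new_archive"))
```

    One caveat: reading a file back immediately after writing it may be served from the operating system's cache rather than the new disk, so a second `verify()` pass after remounting the drive is a more honest check. Saving the manifest alongside the archive (for example as JSON) lets later scrubs compare against the original digests.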

    --
    Is it just my observation, or are there way too many stupid people in the world?

  3. Worry about the hardware, not software by MightyYar · · Score: 4, Interesting

    Thanks to the emulation community, I can read data from an old Commodore 64, Apple ][e, Atari, etc. on any modern computer running any mainstream operating system. What I cannot easily do is hook up an old Apple ][e disk drive to my modern hardware. The filesystem will not really matter so much, because even if Wintel goes the way of the Commodore 64, someone will make a DOSBox-esque emulator for it. Getting data off of an ATA, SATA, USB, or Firewire drive might be more challenging once new hardware ceases to support those standards.

    Personally, I just throw stuff on external hard drives. 3-5 years later, the new drives are so much bigger, faster, and cheaper that it becomes economical to consolidate to a new drive. I still have data from a 286 that had nothing but floppies, an Apple ][e, and 2 dead Macintoshes. Even my old Windows 95 computer lives on as a VirtualPC image. I don't really use them that much, but the Apple ][e and 286 stuff is under 50 megs, and the VirtualPC image is 2GB. The images of the old Mac hard drives total less than 1GB... it's simply not worth deleting them and it's kind of fun to have my old computers still around, if only "virtually".

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.