Slashdot Mirror


File Systems Best Suited for Archival Storage?

Amir Ansari asks: "There have been many comparisons between various archival media (hard drive, tape, magneto-optical, CD/DVD, and so on). Of course, the most important characteristics are permanence and portability, but what about the file systems involved? For instance, I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)? Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage? Is journaling important?"

5 of 105 comments

  1. Re:What about error correction? by whovian · · Score: 2, Informative

    ZFS supports checksums (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies), but its license is incompatible with the GPL (http://linux.inet.hr/zfs_filesystem_for_linux.html). However, Ricardo Correia has an alpha version of ZFS for FUSE/Linux (http://zfs-on-fuse.blogspot.com).

    --
    To-do List: Receive telemarketing call during a tornado warning. Check.
  2. Re:Don't overlook popularity by RupW · · Score: 4, Informative

    "Does anyone use RAR outside of the copyright infringement scene?"

    Yep, I do. It's widely accepted, better than zip, and better than .tar.gz or .tar.bz2 because it orders the files more intelligently than tar before trying to compress them. .tar.rz goes some way toward addressing that, but you have to do it in two steps because rzip doesn't pipe. .tar.rz compression is about equivalent for large numbers of small files, but rzip will often beat rar on single large files.

    The killer feature back in the day was the first good implementation of disk splitting. But the compression still stands up now.

    On my 'if I ever get free time' list is to implement rar's file ordering in GNU tar to see if that helps gzip and bzip2 catch up with RAR's compression ratio. But I've no idea if/when I'll ever get around to that.

    -- paid-up RAR user since 1996.
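
The reordering idea above can be sketched in a few lines. This is only an illustration: the comment does not describe RAR's actual ordering, so the sketch assumes that grouping files by extension (then by name) is a reasonable stand-in. The helper names `rar_like_order` and `build_tar_gz` are hypothetical, and nothing here measures whether the reordering really improves the compression ratio.

```python
import io
import os
import tarfile


def rar_like_order(paths):
    """Sort paths by extension, then by file name.

    Assumption: grouping similar file types together approximates the
    "more intelligent" ordering mentioned in the comment. This is not
    RAR's real algorithm.
    """
    def key(path):
        ext = os.path.splitext(path)[1].lower()
        return (ext, os.path.basename(path).lower(), path)
    return sorted(paths, key=key)


def build_tar_gz(files, order=None):
    """Build an in-memory .tar.gz from a {name: bytes} mapping.

    If `order` is given, it is called on the list of names to decide
    the order in which members are written to the archive.
    """
    names = list(files)
    if order is not None:
        names = order(names)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Comparing `len(build_tar_gz(files))` with `len(build_tar_gz(files, order=rar_like_order))` on a real directory tree would show whether the ordering helps for that particular data set.
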
  3. Re:No Filesystem is Best by Anonymous Coward · · Score: 2, Informative

    It depends: a 100% par set for a 100GB archive would take forever even on fast machines. Even a simple "small" 4GB par set for a DVD backup takes hours on an Opteron 250.

  4. Tape by vadim_t · · Score: 2, Informative

    Here's why: IMO, unless you're doing it for a company, the most important thing is convenience.

    If it's your job, sure, you'll do it whether it's convenient or not.

    If it isn't, you'll quickly get tired of messing with CDs, plugging/unplugging hard drives, etc. So I went with the most convenient media possible: tape. Stick a tape into the drive, walk away, store when it spits it out. It doesn't interfere with the computer's usage since nothing else uses tape.

    For absolute convenience, get a tape robot from ebay. Then it can be completely automatic.

    Filesystem: use plain tar to write to the tape. If you must use compression, compress files individually, not the whole tape.
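
A minimal sketch of the "compress files individually" advice, assuming gzip as the compression format and an in-memory buffer standing in for the tape (writing to a real tape device is not shown). The function names are hypothetical. The point is that the tar container itself stays uncompressed and each member is a separate gzip stream, so each file can be decompressed on its own rather than depending on one compressed stream that spans the whole archive.

```python
import gzip
import io
import tarfile


def tar_with_per_file_gzip(files):
    """Return an uncompressed tar archive whose members are gzipped one by one.

    `files` is a {name: bytes} mapping. Each member is stored as
    "<name>.gz" and is an independent gzip stream.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            packed = gzip.compress(data)
            info = tarfile.TarInfo(name=name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue()


def read_member(archive, name):
    """Extract and decompress a single member without touching the others."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.extractfile(name + ".gz")
        return gzip.decompress(member.read())
```

For example, `read_member(tar_with_per_file_gzip({"notes.txt": b"hello"}), "notes.txt")` gives back the original bytes.
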

    Paranoid implementation: Tapes have file marks, so you can ask the tape drive to give you file #1, for instance. You can use this to store some useful material in a format that will always be recoverable so long as you have a drive that can read the tape. Store it like this:

    File 1: Text document explaining what all this stuff is, and what's on the tape.
    File 2: RFC for tar format
    File 3: RFC for compression format
    File 4: source for tar program
    File 5: source for decompression program
    File 6: backup

    A tape formatted like this should be readable so long as a drive capable of reading the data on it survives. To ensure that, go with a popular tape format that is reliable, open, and has a high capacity (so that it's unlikely to become obsolete too fast).

  5. ZFS - FTW by GuyverDH · · Score: 3, Informative

    While not as widely used (yet), it will eventually become the de facto standard in safe filesystems.

    I've thrown all kinds of problems at it, and it has yet to lose a single byte of data.
    Add to that snapshots taken every (x) minutes, and you can look back in time as easily as reading a folder.

    With RAIDZ2 in the latest releases, you can set up sets that can withstand the loss of 2 physical drives. If you couple multiple RAIDZ2 sets into a single pool, you've increased the redundancy even further. With plain old JBOD and multiple controllers, you can reach levels of availability that only expensive EMC/Hitachi/StorEdge systems have reached in the past.

    It's open source as well (although it's the Sun flavor at this time), and is being worked on at www.opensolaris.org. I believe Sun is contemplating switching it to GPL at this time.

    --
    Who is general failure, and why is he reading my hard drive?