Slashdot Mirror


Ask Slashdot: Best File System For the Ages?

New submitter Kormoran writes: After many, many years of internet, I have accumulated terabyte HDDs full of software, photos, videos, eBooks, articles, PDFs, music, etc. that I'd like to save forever. The problem is, my HDDs are fine, but some files are corrupting. Some videos show missing keyframes and some photos are ill-colored. RAID systems can protect online data (to a degree), but what about offline storage? Is there a software solution, like a file system or a file format, specifically tailored to avoid this kind of bit rot?

21 of 475 comments

  1. Stone tablet and chisel by Anonymous Coward · · Score: 5, Funny

    I prefer to chisel the 0s and 1s into a stone tablet. Very secure, no bit rot.

    1. Re: Stone tablet and chisel by I'm just joshin · · Score: 4, Funny

      I give you the 15
      *drops one*
      10 commandments...

    2. Re:Stone tablet and chisel by magarity · · Score: 4, Funny

      Yes, weathering. That is why casting in bronze is vastly superior to mere chiseling in stone.

  2. bit rot by Anonymous Coward · · Score: 5, Informative

    zfs

    1. Re:bit rot by Narcocide · · Score: 5, Insightful

      It's pretty sad that in this day and age, only one person has highlighted the relevance of ZFS here, and they're an AC. Someone mod parent up. RAID is borderline necessary if you don't have multiple backups, (to recover from in the event of random corruption caused by gamma rays from outer space or a butterfly flapping their wings on another continent or whatever) but so far as I know, only ZFS has built-in checksumming to detect/prevent the data corruption in the first place.

    2. Re:bit rot by tlhIngan · · Score: 4, Informative

      It's pretty sad that in this day and age, only one person has highlighted the relevance of ZFS here, and they're an AC. Someone mod parent up. RAID is borderline necessary if you don't have multiple backups, (to recover from in the event of random corruption caused by gamma rays from outer space or a butterfly flapping their wings on another continent or whatever) but so far as I know, only ZFS has built-in checksumming to detect/prevent the data corruption in the first place.

      No, RAID is not sufficient to prevent bit-rot. In fact, RAID can accelerate it. You see, using a redundant mode like 1, 5, 6, most controllers (software and hardware) will only read enough disks to get the data: 1 drive in the case of RAID1, N-1 for RAID5 and N-2 for RAID6 (the non-parity ones, to save a parity calculation). But the drives can return bit errors - it's rare, but it does happen (there's an undetectable fault error rate, something along the lines of 1 in 10^20 bytes read or so will have an undetected error). And the RAID controller will happily return this to you, since it didn't check the redundant drives to verify correctness. And it's possible it gets written back corrupted, thus causing corruption.

      You really need something like ZFS which puts a checksum on every file and verifies it, so if it does get an error it can resolve it.
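
      A toy sketch of that verify-and-repair idea, in Python. This is not how ZFS is implemented (ZFS does its checksumming per block inside the file system); the function names and the two-copy "mirror" are made up for illustration. The point is only that a checksum recorded at write time lets a reader tell which copy is bad and heal it from a good one.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest used as the stored checksum."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies, expected):
    """Return the first copy matching `expected` and overwrite bad copies with it."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; nothing to repair from")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # heal the damaged copy in place
    return good


# Two mirrored copies of one block; the checksum is recorded at write time.
original = b"family photo bytes"
stored_sum = checksum(original)
mirror = [bytearray(original), bytearray(original)]

mirror[0][3] ^= 0x01  # simulate a single flipped bit on the first disk

data = read_with_repair(mirror, stored_sum)
print(data == original, bytes(mirror[0]) == original)
```

      A plain RAID1 read would have handed back whichever copy it happened to read; here the flipped bit is caught and the bad copy is rewritten.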

    3. Re:bit rot by __aaclcg7560 · · Score: 4, Informative

      You really need something like ZFS which puts a checksum on every file and verifies it, so if it does get an error it can resolve it.

      ZFS also has its own flavors of RAID 1/5/6.

    4. Re:bit rot by MightyMartian · · Score: 4, Informative

      Who's to say ZFS will be around in a few decades?

      The real solution here is relatively frequent backups, multiple copies in different filesystem and physical formats (ie. flash, hard drive, optical). Over time you just keep moving your file store to the new mediums. I have files that are over twenty five years old now, some of them coming from DOS and Windows 3.1, others from my old original Slackware 3 installs. Along the way some of those files have been on CD-Rs, DVDs, early USB thumb drives, various hard drives running everything from FAT, FAT32, ReiserFS, HPFS, NTFS, ext2 and ext3. And I'll keep on doing that until I drop dead, and I'll leave it up to my family to decide whether they want to keep any of the documents, pictures, music files, videos and so on that I've been collecting.

      At no point do I ever assume a mere file system sitting on one physical and/or logical volume is ever going to do the job of keeping my files available over the long haul. RAID and file systems in all their glory are not intended for that. Multiple physical copies at multiple locations on multiple types of media, that's the only real way to assure your files remain accessible and safe over time.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    5. Re:bit rot by Spazmania · · Score: 4, Informative

      He said a filesystem for the ages. While it has wonderful features, ZFS isn't even a filesystem for this age, let alone ages to come. FAT32 and ISOFS are your best bets for being readable 20 years from now.

      Bear in mind that your hard disk checksums each block and returns an error if the block is uncorrectable upon read rather than give you bad data. So, if you're getting bit rot at all then you have a hardware problem.

      With or without a hardware problem you want to be able to recover your data. The answer is par2, such as parchive or QuickPar. Par2 uses a Reed-Solomon code to take a set of source files and produce a set of recovery files such that the original files can be checked for correctness and up to N original files can be corrected where N is the number of recovery files created.

      And that's your answer. A filesystem like FAT32 or ISOFS that's likely to still be implemented in future OSes, plus recovery files that let you rebuild anything that suffers from bit rot.
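
      To give a feel for how recovery data works, here is a minimal Python sketch. It deliberately swaps Reed-Solomon for plain XOR parity, which is far weaker: one parity block can rebuild exactly one lost block, whereas par2 can produce many recovery blocks. The block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-size source blocks (hypothetical data).
source = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(source)  # the single "recovery block"

# Pretend block 1 was lost; XOR of the survivors and the parity restores it.
survivors = [source[0], source[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == source[1])
```

      Note that parity alone only rebuilds a block you already know is bad; you still need a checksum per block to find out which one that is.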

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    6. Re:bit rot by nmb3000 · · Score: 4, Insightful

      (there's an undetectable fault error rate, something along the lines of 1 in 10^20 bytes read or so will have an undetected error)

      I just want to call this out because it's so important. That number, 10^20, sounds big, but considering the size of modern drives it's really not.

      Randomly picking the WD 8TB Red NAS drive (WD60EFRX), which is designed for consumer RAID, as an example:

      The spec sheet says the URE (unrecoverable read error) rate is at worst 1 per 10^14 bits read. However, that drive holds 8 x 10^12 bytes! If you were to read every single byte you should expect about 0.64 unrecoverable read errors, which works out to roughly a 47% chance of hitting at least one.

      (8 x 8 (bits per byte) x 10^12) / (1 x 10^14) = 64,000,000,000,000 / 100,000,000,000,000 = 0.64

      Correct my math if I'm wrong, but this should make anyone think twice about using any kind of RAID as a "backup" solution. If you have a disk fail you have close to a 50/50 chance of introducing corrupt data during the rebuild process!
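
      For anyone who wants to check the arithmetic: 0.64 is the expected number of read errors over a full-drive read, so it takes one more step to turn it into a probability. A minimal sketch, assuming errors are independent and the spec-sheet rate applies uniformly (both are assumptions, not something the spec sheet promises):

```python
import math

drive_bytes = 8e12          # 8 TB drive, as in the example above
bits_read = drive_bytes * 8
ure_rate = 1e-14            # spec-sheet worst case: 1 error per 10^14 bits read

expected_errors = bits_read * ure_rate
# Poisson approximation: P(at least one error) = 1 - exp(-expected_errors)
p_at_least_one = 1 - math.exp(-expected_errors)

print(f"expected errors: {expected_errors:.2f}")
print(f"P(at least one): {p_at_least_one:.2f}")
```

      Under those assumptions the chance of at least one error in a full read is a bit under one in two, which is still far too high to treat a rebuild as safe.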

      Frankly, ZFS-style checksumming is the future of file systems. It has to be for any data you care about.

      --
      "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
      /)
    7. Re:bit rot by grcumb · · Score: 4, Funny

      (there's an undetectable fault error rate, something along the lines of 1 in 10^20 bytes read or so will have an undetected error)

      I just want to call this out because it's so important. That number, 10^20, sounds big, but considering the size of modern drives it's really not.

      Vhrist, you guys. Why so p[aranoid? FAT has been workking just fine since day one, and there's not reason to beliveve it won't keep workingn that way for

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    8. Re:bit rot by Spazmania · · Score: 4, Informative

      "ZFS isn't even a filesystem for this age" - WTF does that even mean?

      It means that even back when FAT was a Johnny-come-lately it already had greater market penetration than ZFS. With decades behind it and broad market penetration today, there's good reason to believe it won't vanish with the advent of the next development in filesystem architecture. ZFS is likely to be a blip on the radar, a pause before the next innovation. Not what you want for an archival format.

      Bit-rot is an issue inherent to any storage medium

      Bit rot, aka corrupted data, is not inherent to correctly operating hardware. As implemented, you'll see tens of thousands of unreadable blocks on a hard disk before you see a single one in which data has been undetectably corrupted. Every single sector gets a checksum in hardware and if the checksum does not pass you get the famous Abort Retry Ignore. For most storage you get Forward Error Correction coding so that some number of bit errors can be corrected on read before having to throw an error.

      When you see bit rot, the storage media is usually not at fault. More often the data passes through faulty non-parity RAM, a noisy memory bus or an overheated controller and gets corrupted on its way to storage rather than getting corrupted at rest on the storage. It died when you used an overclocked piece of garbage to copy it from an old hard disk to a newer, bigger one.
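
      One cheap guard against that kind of in-transit damage is to hash the source and the copy after every migration. A rough Python sketch (the helper names are made up for illustration, and the demo uses a throwaway temp file rather than real drives):

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src, dst):
    """Copy src to dst, then re-read both files and compare their digests."""
    shutil.copyfile(src, dst)
    src_sum = sha256_of(src)
    dst_sum = sha256_of(dst)
    if src_sum != dst_sum:
        raise IOError(f"copy of {src} does not match the source")
    return dst_sum


# Demo on a throwaway file; real use would point at the old and new drives.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "old_disk.bin")
dst = os.path.join(workdir, "new_disk.bin")
with open(src, "wb") as f:
    f.write(os.urandom(4096))

digest = verified_copy(src, dst)
print(digest == sha256_of(src))
```

      One caveat: the re-read may be served from the OS cache rather than the platters, so for real assurance re-verify after a remount or on a different machine, and keep the digests alongside the archive.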

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  3. Error correction codes. PAR2, btrfs, partitions,VM by raymorris · · Score: 5, Informative

    The magic phrase to Google is "error correction codes" (ECC).

    PAR2 uses Reed-Solomon error correction. parchive is the ECC file format specification; for Linux you will want PyPar or par2tbb, and on Windows you use a GUI called QuickPar.

    Btrfs can be set to use ECC on a single disk.

    You can slice a single disk into partitions and then use RAID1 or LVM mirroring, or RAID5 or RAID6. LVM can also be useful to divide (and combine) any number of drives into any number of volumes, then you can RAID across the volumes.

    If you Google "ecc disk", "ecc backup", or "ecc archive" you'll find other options, with details about each option.

  4. ZFS and lots of redundancy by Chewbacon · · Score: 4, Informative

    ZFS will guard against bit rot. That's not enough. RAID isn't enough. You need redundancy outside your home or office. Cloud may be expensive for the amount of data you have, but Amazon S3 may be the most affordable in that range. You could get S3 for maybe $15-20 a month if you have a terabyte of data. If that's cost prohibitive, rotate external drives regularly and keep one at work. You'll lose very little data since you're archiving things.

    --
    Chewbacon
    The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.
  5. Re:Terabytes over decades on NTFS by Narcocide · · Score: 5, Insightful

    Schrodinger's bit rot. If you never look in the box again after putting the cat in it, you can pretend it lived forever.

  6. How about getting rid of it? by swb · · Score: 4, Insightful

    You've got terabytes of information you will never access again. How about just getting rid of most of it? Pick some subset you want to keep and then buy 3 HDDs and create triple copies of it. Repeat this every year and you'll probably not lose any of the information.
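
    With three copies you can also settle disagreements by majority vote when you do the yearly refresh. A minimal Python sketch, assuming each file fits in memory; the function name and sample data are hypothetical:

```python
import hashlib
from collections import Counter


def pick_majority(copies):
    """Return the content that at least two of the copies agree on, else None."""
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes < 2:
        return None  # all copies disagree; no way to tell which is right
    return copies[digests.index(winner)]


good = b"holiday video bytes"
rotted = b"holiday vidEo bytes"  # one copy with a corrupted byte

print(pick_majority([good, rotted, good]) == good)
print(pick_majority([b"a", b"b", b"c"]))
```

    Whichever copy loses the vote gets overwritten from the winner before the next round.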

  7. OFFLINE Storage, with FS Access by williamyf · · Score: 5, Informative

    That's a job for the Linear Tape File System (LTFS)

    https://en.wikipedia.org/wiki/...

    Tape is (still) the best medium for Long Term Storage. Over the years tape (or more likely, the engineers) has aggressively incorporated in the standards things like FEC codes (from Reed-Solomon to more exotic ones nowadays).

    And since 2010, with LTFS, you can access the files with the convenience of a normal filesystem (but bear in mind, access is slow as hell).

    Back up your data to tape (more than one set), and send it to specialized offline storage facilities (climate controlled: i.e. temperature/humidity/dust/light control) from different providers, in different geographical areas.

    Since there is now only one true tape standard (LTO-7, released in 2015; the tape business has been shrinking, so the proliferation of standards seems to be over), chances are that if you use it today you will still find equipment to read it 50 years from now. Nonetheless, keep a few (as in two or more) SYSTEMS (Computer+Drive+SW) set up so that you can re-read. A cheapo micro form factor mobo with an Atom Processor (but NOT the Atom C2000 series PLEASE), Linux, a 1Gbps NIC and a tape drive should be more than enough. ....

    Now, for Online, as other posters have said, ZFS WITH ECC memory (and therefore, a very expensive Xeon, or AMD server type mobo) and JBOD will do the trick.

    --
    *** Suerte a todos y Feliz dia!
  8. Re:Error correction codes. PAR2, btrfs, partitions by heypete · · Score: 4, Informative

    QuickPar on Windows is long-obsolete. MultiPar is the more modern variant.

  9. ZFS on Linux has software RAID. by Futurepower(R) · · Score: 4, Informative

    An Introduction to the Z File System (ZFS) for Linux.

    Quote: "ZFS is capable of many different RAID levels, all while delivering performance that's comparable to that of hardware RAID controllers."

    That sounds good to me. I want to avoid hardware RAID because, when hardware RAID controllers fail, they are often difficult to replace.

  10. Slash rot by Excelcia · · Score: 5, Insightful

    Concur. File corruption due to "age" will not occur without hard read errors. Also, "ill-coloured photos" likely would not be ill-coloured in the case of actual data corruption, but would have whole blocks of hash in them. The user claims to have multiple terabyte sized hard drives - hard drives in this size category used for archival storage are simply not old enough to be suffering data corruption due to age. The only hard drives suffering so are MFM hard drives that likely the poster wouldn't have a clue how to even interface into a current computer. Hard drives used for archival data storage will likely not age degrade before the interface standard they are based on becomes obsolete. Thus, a perfectly reasonable archival data storage strategy is to simply copy data from one hard drive to a newer (likely much larger and faster) drive when the next generation interface becomes standard, and before the previous generation is totally obsolete. For example, one can still get PATA + SATA USB adapters, SATA + M.2 adapters, etc.

    If the user who submitted this question is actually experiencing a problem at all, I suggest it is PEBCAK. A better explanation is that the poster is not actually experiencing current problems at all, but is simply trying to sound important with inflated claims of reams of data, and that Slashdot has been had.

    Further, no person with Slashdot posting authority should have been ignorant of any of the issues in this question that make its legitimacy questionable at best, and certainly not Slashdot worthy in any circumstance.

    1. Re:Slash rot by cvdwl · · Score: 4, Insightful

      Ahh, there's the Slashdot of old that I miss so much.

      --
      ... grumble, grumble, grumble, mutter, mutter, Millenium... Hand... Shrimp, I tol' 'em, I tol' 'em.