Ask Slashdot: Best File System For the Ages?
New submitter Kormoran writes: After many, many years of internet, I have accumulated terabyte HDDs full of software, photos, videos, eBooks, articles, PDFs, music, etc. that I'd like to save forever. The problem is, my HDDs are fine, but some files are corrupting. Some videos show missing keyframes and some photos are ill-colored. RAID systems can protect online data (to a degree), but what about offline storage? Is there a software solution, like a file system or a file format, specifically tailored to avoid this kind of bit rot?
I prefer to chisel the 0s and 1s into a stone tablet. Very secure, no bit rot.
zfs
I've got somewhere between 20 and 30 TB that has been accumulating for more than 20 years on NTFS, and I've never seen any examples of "bit rot". My files today are identical to what they were 20+ years ago. I have to wonder what kind of filesystem the poster is using.
I don't respond to AC's.
The only historically tried and proven method of storing information for the long term.
If the bits on your drive are changing while the drive is offline, that isn't a filesystem issue. A filesystem issue would be if your OS wrote the wrong information to the drive, but that can't happen with an offline drive.
Still, RAID is a good choice for redundancy.
Or paper: http://ollydbg.de/Paperbak/#1
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
The magic phrase to Google is "error correction codes" (ECC).
PAR2 uses Reed-Solomon error correction. Parchive is the ECC file format specification; on Linux you will want PyPar or par2tbb, and on Windows you use a GUI called QuickPar.
Btrfs can be set to use ECC on a single disk.
You can slice a single disk into partitions and then use RAID1 or LVM mirroring, or RAID5 or RAID6. LVM can also be useful to divide (and combine) any number of drives into any number of volumes; then you can RAID across the volumes.
If you Google "ecc disk", "ecc backup", or "ecc archive" you'll find other options, with details about each option.
ZFS will guard against bit rot. That's not enough. RAID isn't enough. You need redundancy outside your home or office. Cloud may be expensive for the amount of data you have, but Amazon S3 may be the most affordable in that range. You could get S3 for maybe $15-20 a month if you have a terabyte of data. If that's cost prohibitive, rotate external drives regularly and keep one at work. You'll lose very little data since you're archiving things.
Chewbacon
The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.
I'd go for any Linux file system because Linux is the platform that evolves the least. It's still in the 90s so in 2037 it will still be current.
(Watch out for the hater storm! Here they come!)
But it's kinda true if you omit the snideness of the first statement. Because it's maintained by the user base, it's less likely to "devolve" into something incompatible due to market pressure. I, myself, would go for an Apple file system, but Apple isn't so keen on keeping the Mac current and that doesn't bode well for the future. There might be a great change on the horizon.
That's a well-known problem to photographers: photo colors are affected over time. Keep the photo negatives in a safe place!
Slashdot, fix the reply notifications... You won't get away with it...
"Is there a software solution, like a file system or a file format, specifically tailored to avoid this kind of bit rot?"
Yes, ZFS is specifically tailored for this. Configure a zpool running RAID-Z2 with a hot spare or RAID-Z3. Half a dozen 6TB or 8TB disks should suffice.
Set it to auto-scrub regularly. Send logs and warnings to your email, and pay attention to them. (This is the hard part). Especially pay attention if they stop arriving. (This is even harder).
I have used Nexenta for some time, but the free product has a limit of 18TB of raw storage. If I was starting today I would use FreeNAS which has no such restriction.
The other comments about the futility of trying to do this long term are worth heeding, but that doesn't mean you shouldn't try. The key is to make this an active project rather than a passive archive, and to re-evaluate the best approach every few years.
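For anyone who wants the "active project" habit without committing to ZFS yet, the scrub idea (record checksums once, re-verify them on a schedule) can be roughly approximated at file level on any filesystem. This is only a minimal sketch: the function names are made up for illustration, and it only detects damage, it cannot repair anything the way a redundant zpool can.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its current digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    problems = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            problems.append(rel)
    return sorted(problems)
```

Build the manifest right after archiving, keep it somewhere other than the drive it describes, and run verify() whenever the drive gets plugged in. A non-empty result is the cue to restore those files from another copy.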
ZFS is nice and I use it, but it makes assumptions about sane gear that are not safe on desktop-grade hardware. BTRFS, which I also use, works great. But for your specific use case, snapraid is the thing to use. By that use case I mean things that never change: a big pile of files you keep adding to. Mind you, you're going to have to replace drives over time.
No sir, I don't like it.
HDDs will die. If you want something that will last for many decades or even centuries without getting corrupted then you need to stop using a volatile filesystem. The best option is to go with write once media. The best option I know is M-DISC.
M-DISC's design is intended to provide greater archival media longevity.[3][4] Millenniata claims that properly stored M-DISC DVD recordings will last 1000 years.[5] While the exact properties of M-DISC are a trade secret,[6] the patents protecting the M-DISC technology assert that the data layer is a "glassy carbon" and that the material is substantially inert to oxidation and has a melting point between 200 and 1000 C.[7][8] -- Wikipedia
Did you even bother reading the wiki you linked to or did you just copy and paste the first paragraph ?
"However, according to the French National Laboratory of Metrology and Testing at 90 C and 85% humidity the DVD+R with inorganic recording layer such as M-DISC show no longer lifetimes than conventional DVD±R.[11]"
Seriously, minimalism is underrated. There is such a thing as too much useless data. It's hard to catalog, it's hard to track, and if you sat down and sorted out what you actually could still use, most of it is probably worthless or you'd never find the time to use it ever again. You might say "well, it's still worth storing IN CASE I ever find a use for it", but that's a typical data-hoarder sentiment that is unsustainable. You can't just keep buying media to store everything and never delete; it's a management nightmare that results in these very issues.
I guarantee you, if you find you've deleted something and actually want to get it back, it's available somewhere on the Internet. If it's NOT, then it's a candidate for keeping. That's how minimalism works.
USB has been around 20 years, and it could be another 20 before we lose USB 2.0 / 3.0 compatibility.
Before that we had FireWire 400/800 and SCSI I/II/III. Won't be long before Apple obsoletes USB 1/2/3 for something with a much smaller connector.
I've had a ship-of-Theseus ZFS pool that I started years ago on a set of PATA drives. RAID-Z2 on OpenSolaris. It's since moved to SATA drives, been expanded a few times, and moved from Debian to FreeBSD to now FreeNAS.
Set up a pool with the level of redundancy you need and, as technology changes, use a system compatible with the old and new tech and just replace drives as needed.
This is the problem with maintaining your own hardware, and a really useful use case for cloud storage, so long as you can trust the provider to keep the hardware up to date while your files stay clean, private and available.
If you want to keep your data private, get it off the Internet. No cloud provider can guarantee your data will stay private, much less clean and available.
You've got terabytes of information you will never access again. How about just getting rid of most of it? Pick some subset you want to keep, then buy 3 HDDs and create triple copies of it. Repeat this every year and you'll probably not lose any of the information.
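The reason three copies beat two: with two you can tell that they differ, but not which one is right. With three, a majority vote settles it. A minimal sketch of that idea, assuming each copy of a file has been read into memory (the function name is made up for illustration; for big files you would compare hashes computed in chunks instead):

```python
import hashlib
from collections import Counter


def pick_good_copy(copies):
    """Return the content that a strict majority of copies agree on.

    copies: list of bytes objects, one per drive.
    Raises ValueError when no strict majority exists.
    """
    digests = [hashlib.sha256(data).hexdigest() for data in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority: copies disagree too much")
    return copies[digests.index(winner)]
```

When doing the yearly re-copy, write the winning content to all three new drives so a rotted copy is not carried forward.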
In addition to ZFS, BTRFS also handles bitrot. I'm running a 4-disk BTRFS RAID 10 in my closet, mounted on a development machine on my desk via NFS. It's been working fine for about a year, and I've scheduled a scrub a couple of times a month whose purpose is exactly this: to catch and correct bitrot. It does so by using a CRC32C checksum, and if it detects a problem on one slice it overwrites that slice with the data from the good slice.
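Roughly, the scrub logic described above looks like the sketch below. It is a toy model, not Btrfs code: blocks are Python bytes in two lists standing in for the two mirror halves, and zlib.crc32 (plain CRC-32) stands in for the CRC32C that Btrfs uses. The function name is made up for illustration.

```python
import zlib


def scrub(mirror_a, mirror_b, checksums):
    """Check every block on two mirrors and repair from the good copy.

    mirror_a, mirror_b: lists of bytes blocks (modified in place).
    checksums: CRC-32 of each block, recorded at write time.
    Returns (repaired, unrecoverable) lists of block indexes.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(checksums):
        a_ok = zlib.crc32(mirror_a[i]) == expected
        b_ok = zlib.crc32(mirror_b[i]) == expected
        if a_ok and b_ok:
            continue
        if a_ok:
            mirror_b[i] = mirror_a[i]  # overwrite the bad copy
            repaired.append(i)
        elif b_ok:
            mirror_a[i] = mirror_b[i]
            repaired.append(i)
        else:
            unrecoverable.append(i)  # both copies fail: restore from backup
    return repaired, unrecoverable
```

The last branch is why the checksum alone is not enough: it tells you a block is bad, but the repair needs a second copy that still verifies.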
Also I have offline and offsite backups of very important items.
When using BTRFS, read the wiki and settle on a kernel version and btrfs tools version that are sufficiently up to date. It has stabilized enough for these kinds of things, but only if you are careful to run an up-to-date version that isn't marked as buggy on the wiki.
It looks like there are (at least) two with CRC: zfs and btrfs. Here's info for btrfs CRCs: https://en.wikipedia.org/wiki/...
You'd still need a backup or RAID solution to replace a bad black.
If only Slashdot posts had CRC or something like that, the posts wauld say what the poster intended.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
That's a job for Linear Tape File System
https://en.wikipedia.org/wiki/...
Tape is (still) the best medium for long-term storage. Over the years tape (or more likely, the engineers) has aggressively incorporated things like FEC codes into the standards (from Reed-Solomon to more exotic ones nowadays).
And since 2010, with LTFS, you can access the files with the convenience of a normal filesystem (but bear in mind, access is slow as hell).
Back up your data to tape (more than one set), and send it to specialized offline storage facilities (climate controlled, i.e. temperature/humidity/dust/light control) from different providers, in different geographical areas.
Since there is now only one true tape standard (LTO-7, released in 2015; the tape business has been shrinking, so the proliferation of standards seems to be over), chances are that if you use it today you will still find equipment to read it 50 years from now. Nonetheless, keep a few (as in two or more) SYSTEMS (computer + drive + software) set up so that you can re-read the tapes. A cheapo micro form factor mobo with an Atom processor (but NOT the Atom C2000 series, PLEASE), Linux, a 1 Gbps NIC and a tape drive should be more than enough. ....
Now, for Online, as other posters have said, ZFS WITH ECC memory (and therefore, a very expensive Xeon, or AMD server type mobo) and JBOD will do the trick.
*** Good luck to all and have a happy day!
It's not the only solution of its type, but it is imo the best:
http://www.snapraid.it/
It is perfect for your kind of situation - long term, reliable, efficient storage of lots of data that seldom changes. Think of it as offline RAID backup: it works like RAID, but it computes parity during your backup operations, "offline".
The beauty of it, imo, is that it is not file system dependent. It works with NTFS, EXT2, HFS, whatever. It works on Linux, Windows, Macs, whatever. You don't need special controllers, and your hard drives do not have to be matched to each other. You can even include drives on different buses (some on USB, some on SATA, whatever).
It doesn't mess with your data at all - your files are stored normally and can be accessed normally, there is no difference between using it and not using it under normal operation - there is no performance impact at all (it only does anything during backup operations - and even then it is very lightweight if your data doesn't change drastically day to day). You just schedule it to run on a regular basis and it does its thing. It detects and recovers from bit rot in much the same way as ZFS (although you need double parity or more to really ensure full protection from multiple drive failures). You can be as paranoid as you want, it just takes more storage to be more paranoid :)
It isn't good for frequently changing data, and it isn't so great for huge amounts of small files either. It takes a long time to generate the initial parity if you have lots of data. You have to be comfortable with command line usage and you have to have some way to schedule jobs. Those issues aside, for things like media libraries and archival storage, it is easily the least painful, most effective solution I have ever used. And it's free to boot (and open source).
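If the parity idea is unfamiliar, here is a toy illustration of how one parity drive can rebuild one lost data drive. It uses plain XOR, as in classic single-parity RAID. That is an assumption about the general technique, not a description of SnapRAID's internals, and the "drives" are just equal-length byte strings.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Three "data drives", each holding one equal-sized block.
drives = [b"photos--", b"videos--", b"ebooks--"]

# The parity drive is computed during the scheduled sync, not on every write.
parity = xor_blocks(drives)

# Drive 1 dies; rebuild it from the surviving drives plus parity.
survivors = [drives[0], drives[2], parity]
rebuilt = xor_blocks(survivors)
```

One parity block covers exactly one missing block per position, which is why surviving two failed drives takes a second, differently computed parity.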
Highly Recommended.
- sigs are stupid
QuickPar on Windows is long-obsolete. MultiPar is the more modern variant.
A lot of bit rot is actually caused by faulty RAM.
When data is moved around, it has to go through RAM, and even smart filesystems like ZFS may not help you there. Servers usually have ECC memory for that reason and ZFS explicitly recommends it.
I agree on PAR2, simply because it's a file you can easily copy around, take backups of and so on. From a 1GB file I have ~3000 source blocks and ~30 recovery blocks, so I can recover from a lot of bit flips or failed 4 KB sectors for a 1% size gain. If it's a photo set I usually make sure I can recover at least one completely missing photo. The nice thing is that it's sufficiently overkill that you can probably go through several hardware generations without checking/repairing before you accumulate an unrecoverable number of errors. Which is good, because it's fairly CPU intensive, so I wouldn't really want to go through an 8TB drive often. But I've found that an on-demand check when I actually need it is fine for content that is "in storage". It's not like it happens very often, or applications and other more bit-flip sensitive formats would be screwed up quite often.
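The arithmetic behind those numbers, as a quick sketch. The block counts are the approximate figures from the comment above and the variable names are only for illustration. It relies on the fact that one PAR2 recovery block can replace one damaged source block, however much of that block is damaged.

```python
file_size = 1_000_000_000   # 1 GB file, from the comment
source_blocks = 3000        # approximate count from the comment
recovery_blocks = 30        # approximate count from the comment

block_size = file_size / source_blocks          # about 333 KB per block
overhead = recovery_blocks / source_blocks      # fraction of extra storage
recovery_bytes = recovery_blocks * block_size   # about 10 MB of PAR2 data

# A bad 4 KB sector inside one block costs one recovery block;
# if it straddles a block boundary it costs two.
worst_case_bad_sectors = recovery_blocks // 2

print(f"block size ~{block_size / 1000:.0f} KB, overhead {overhead:.0%}")
print(f"tolerates at least {worst_case_bad_sectors} scattered bad sectors")
```

So the 1% buys tolerance for up to 30 damaged blocks, no matter where in the file they fall.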
Live today, because you never know what tomorrow brings
An Introduction to the Z File System (ZFS) for Linux.
Quote: "ZFS is capable of many different RAID levels, all while delivering performance that's comparable to that of hardware RAID controllers."
That sounds good to me. I want to avoid hardware RAID because, when hardware RAID controllers fail, they are often difficult to replace.
Concur. File corruption due to "age" will not occur without hard read errors. Also, "ill-coloured photos" likely would not be ill-coloured in the case of actual data corruption, but would have whole blocks of hash in them. The user claims to have multiple terabyte-sized hard drives - hard drives in this size category used for archival storage are simply not old enough to be suffering data corruption due to age. The only hard drives suffering so are MFM hard drives that the poster likely wouldn't have a clue how to even interface to a current computer. Hard drives used for archival data storage will likely not degrade with age before the interface standard they are based on becomes obsolete. Thus, a perfectly reasonable archival data storage strategy is to simply copy data from one hard drive to a newer (likely much larger and faster) drive when the next generation interface becomes standard, and before the previous generation is totally obsolete. For example, one can still get PATA + SATA USB adapters, SATA + M.2 adapters, etc.
If the user who submitted this question is actually experiencing a problem at all, I suggest it is PEBCAK. A better explanation is that the poster is not actually experiencing current problems at all, but is simply trying to sound important with inflated claims of reams of data, and that Slashdot has been had.
Further, no person with Slashdot posting authority should have been ignorant of any of the issues in this question that make its legitimacy questionable at best, and certainly not Slashdot worthy in any circumstance.
btrfs is in mainline now, and has had a number of years to settle down. Even if you don't like the more advanced features, it has some that tick all the boxes:
- good on-disk checksums to detect errors (incl bit rot) for metadata and data
- RAID modes to protect from whole disk failures
- realtime and online scrubbing to detect and recover from checksum failures from another copy (RAID1) or rebuilding from parity (RAID5/6) - no action required from user (contrast with PAR solutions proposed)
- subvolumes for segregation of data if needed, especially if there is a desire to consolidate multiple older drives and especially useful to pool capacity from these disparate sources to implement RAID modes.
- online reshaping for the above
So, even if you're not accessing the data frequently, if OP cares about data I'm sure it's no hassle to plug them in once a year and let a scrub run (for a couple of days if needed, I know this part of the code is still terribly slow). Even if btrfs is deprecated today, it will be a long time before support is removed from the kernel, and even longer before the last distro stops supporting it, and longer yet before that last distro release refuses to boot on whatever incarnation of hardware is available to plug the drives into. All the while the data is free to be migrated onto new spare/surplus drives and a new filesystem if needed.
Make copies of things you care about occasionally on new media. If you don't care about something, let it rot. It's very liberating, kind of like burning your desk.
Have you read my blog lately?
First off, people in this thread seem to think that all the information people are saving is about them.
I have lots of data as well, but most of it is about my kids and family. My kids are starting to hit their teens, and we still go back and watch old family videos. Most are short, under 2 min. They capture points of time in our lives that we won't get back. Our old house, in a different state... friends we had then, neighbors. Not like full documentaries on them, but fading memories. It's not important to anyone else. And if I leave that to them, they may keep it or destroy it. But that is their choice to make. With the ability to capture all of this with much more ease than the previous generation, why shouldn't we? I only have a few photos of my grandparents, and I would like to have more. But they don't exist. I don't have any videos of them at all.
It's not about being famous, or because you are important in the world's eye. There are entire professions dedicated to preserving that. It's about whatever YOU want to preserve. I once found a website that had information about my family name, I had never seen it anywhere before. Pictures and scans of documents, etc. I saved that information off, and about a year after that, the site went away. Since then, ancestry.com came to be, and I was able to use that information I saved to help stitch together our family tree.
We are in the information age, I don't understand why someone would be so opposed to preserving that.
You sound very angry about something, you should probably figure out what that is before it's too late.
My beliefs do not require that you agree with them.