File Systems Best Suited for Archival Storage?
Amir Ansari asks: "There have been many comparisons between various archival media (hard drive, tape, magneto-optical, CD/DVD, and so on). Of course, the most important characteristics are permanence and portability, but what about the file systems involved? For instance, I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)? Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage? Is journaling important?"
If you are not constantly editing the information (and you won't be, since it's for archival purposes), the admittedly major downsides of not being journalled and being prone to fragmentation are non-issues. You might run into problems with capacity limits and/or file size limits, though.
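If the file system in question is FAT32, the file size limit is 4 GiB minus one byte, so it is worth checking an archive set before copying it over. A minimal sketch, assuming a plain directory tree; the function name and the `limit` parameter are made up for illustration:

```python
import os

# FAT32 records each file's size in a 32-bit field, so a single file
# can be at most 4 GiB minus one byte.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def files_too_big_for_fat32(root, limit=FAT32_MAX_FILE_SIZE):
    """Return a sorted list of (path, size) for files under root larger than limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip broken symlinks and anything that is not a regular file.
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized)
```

Anything it reports would have to be split (or stored on a different file system) before it could go onto a FAT32 volume.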
I would say that ubiquity, not being open source, is the most important factor in being able to read something in the future. FAT32 is certain to be easily, if not legally, accessible for the very short expected lifetime of an external hard drive. To improve data recovery capabilities, you might like to create some archives in RAR format for error checking, with PAR2 files for redundancy and recovery. Hard drive space is cheap, so for safety keep the uncompressed files as well as the archives. Since hard drives fail, you should have more than one of them. And ideally, make DVDs also. I created some files with early betas of OpenOffice.org 2, and it was not at all easy to open them once the file format changed before the final release. As another example, despite it being open source, the legal problems of Reiser may cause that file system to be inconvenient to access in the future. An outdated but very popular legacy format will have support that lasts far longer than people want it to. Because of the high market share that WordPerfect had in the days of Noah, even now you can open WordPerfect files in Word and OpenOffice.org. If you think FAT32 will be unreadable anytime soon, think again.
Is this going to be relatively live, with data being mirrored onto it regularly, or is this going to be written once and accessed occasionally from then on? If you're only going to write to it a very small portion of the time (or even treat it as WORM), journaling will be useless to you, since anything that takes out your data won't be stopped by it.
How far into the future are you going to need it? I understand the whole "not wanting to become unreadable" concern, but honestly, no one's going to bother re-implementing a filesystem to look at their old vacation photos. Pick a popular filesystem, and you'll be sure of support down the line. FAT's still doing just fine for itself, and the ISO filesystems for CDs and DVDs will be readable as long as people are making drives for them.
All of the data integrity features on filesystems aren't going to protect against disk failure/media wearing out, and error correction on that scale is beyond the scope of any one disk to handle. Like the department jokingly advised, parity files and other methods can handle this in a robust, media-spanning manner, and protect against everything from a few flipped bits to a whole-disk data loss (assuming you have enough parity data).
I think the reason not much talk about filesystems has been going on is because they're mostly irrelevant for this task. They're designed to handle the issues of a live environment; the issues that archives face are beyond the capability of how you choose to store your data on each piece of media to solve.
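Catching the "few flipped bits" case mentioned above does not need anything from the file system either. Here is a rough sketch of a SHA-256 manifest that could be stored alongside the archive and re-checked later. It only detects damage; repairing it is what parity files are for. The function names are invented for this example:

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def find_damaged(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    damaged = []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != digest:
            damaged.append(rel)
    return damaged
```

Build the manifest when the archive is written, keep a copy of it on every disk, and run the check whenever a disk is pulled off the shelf.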
You'd be MUCH better off creating PAR2 files for the archive set, instead.
If you made 2 copies of the archive on the media, and piece 10 of both sets dies, you've lost everything. If you made 1 copy of the archive and a 10% par set, any pieces adding up to 10% of the original data (data and parity alike) could die and you'd still have your data. If you made a 100% par set, you could lose half of the data and parity combined and still recover. And it doesn't matter which portions.
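To see why "it doesn't matter which portions", here is a toy sketch. PAR2 itself uses Reed-Solomon coding and can make many recovery blocks; this example swaps in plain XOR parity, which gives exactly one recovery block and so can rebuild exactly one lost piece, whichever one it is. The function names are invented for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_into_pieces(data, piece_count):
    """Split data into piece_count equal-length pieces, zero-padding the tail."""
    piece_len = -(-len(data) // piece_count)  # ceiling division
    padded = data.ljust(piece_len * piece_count, b"\0")
    return [padded[i * piece_len:(i + 1) * piece_len] for i in range(piece_count)]


def recover_missing(pieces, parity):
    """Rebuild the single piece marked None from the survivors plus parity."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can rebuild exactly one lost piece")
    survivors = [piece for piece in pieces if piece is not None]
    repaired = list(pieces)
    # XOR of all surviving pieces and the parity block equals the lost piece.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired
```

With ten pieces, that one parity block costs 10% extra space and covers the loss of any single piece, whereas a second full copy costs 100% extra and still fails if the same piece dies in both.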
Add to that the fact that if you lost piece 10 in archive 1 and piece 9 in archive 2, it would not be much fun to figure out which pieces are dead and assemble a full archive again. With PAR2, the tool will do the work for you.
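For comparison, this is roughly the manual work the two-copy approach needs. The sketch assumes you recorded a SHA-256 digest for every piece when the archive was made (without that you cannot even tell which copy of a piece is the good one), and all the names are hypothetical:

```python
import hashlib


def piece_digest(piece):
    """SHA-256 digest of one archive piece."""
    return hashlib.sha256(piece).hexdigest()


def merge_copies(copy_a, copy_b, expected_digests):
    """For each piece, keep whichever copy still matches its recorded digest.

    Returns (merged_pieces, unrecoverable_indexes). A piece that is dead in
    both copies is left as None and its index is reported.
    """
    merged, unrecoverable = [], []
    for index, digest in enumerate(expected_digests):
        for candidate in (copy_a[index], copy_b[index]):
            if candidate is not None and piece_digest(candidate) == digest:
                merged.append(candidate)
                break
        else:
            # Neither copy of this piece survived intact.
            merged.append(None)
            unrecoverable.append(index)
    return merged, unrecoverable
```

It works when the two copies fail in different places, but it cannot help once the same piece is gone from both.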
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
In one form or another anyway. People keep asking about the _best_ way to store data for a long time (for some definition of best).
My take on this problem is that you should use the best you reasonably can today. Then in 5 years' time, when there is a new technology out there, move over to that for archiving your new data AND move your old data over while you still have working hardware.
I went from floppy disks to LS-120 drives. From LS-120 drives to CDs. From CDs to DVDs. I'll go from DVDs to whichever of HD or BD seems best in a couple of years (unless something else crops up). I might use hard drives instead but I'm not sure yet. The point is I don't need to decide until I need to store that much.
If you're playing in the big leagues, do the same with the various formats of giganto-capacity tape storage, etc.
Plan around the shelf-life and working life of the hardware you can get and the answer drops out.
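The "move your old data over" step is where things can quietly go wrong, so it is worth verifying the copy while the old media still reads. A minimal sketch, assuming both the old and new media are mounted as ordinary directories; `migrate_tree` and `sha256_of` are names made up for this example:

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(old_root, new_root):
    """Copy every file from old_root to new_root.

    Returns the source paths whose copies did not hash the same as the
    original. Note that reading a file back straight after writing it may
    be served from the OS cache rather than from the new media itself.
    """
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        relative = os.path.relpath(dirpath, old_root)
        target_dir = os.path.normpath(os.path.join(new_root, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            shutil.copy2(source, target)  # copies contents and timestamps
            if sha256_of(source) != sha256_of(target):
                mismatched.append(source)
    return mismatched
```

An empty result means every copied file hashed the same as its source; only then would it make sense to retire the old media.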
I use ext3 on my external backup disks because:
- it is much better and more reliable than FAT32
- it is both open source and (relatively) widely used, so I expect there will always be some way to read it
- it can easily be read by attaching it to any machine and booting some Linux LiveCD or bootable USB.
- the OS which traditionally can read ext2/3 is itself open source and also widely used, so there is no fear that it would become unavailable
For archival and backup, I feel all these advantages far outweigh the slight inconvenience that the disks are not readable directly by Windows and Mac, requiring either a driver or a reboot into Linux.
The important point is to label the disks very clearly. Otherwise, someone connecting them to a Windows or Mac machine may believe the disk is empty and re-partition/re-format it! I would not only put a big explanatory label on the disk's case, but also name the volume something like "Linux-..." or "Linux-ext3-...", and also explain to persons involved (manager(s) + people handling the disks) that they are not readable in Windows (some people don't read even big labels...).
Buy something that has dedicated commercial support for the next 20-40 years
You mean like DEC or any of the other out-of-business dinosaurs?
As someone who has been through this, I can only say: do NOT buy anything that depends on "dedicated commercial support"; the companies and industry standards you think are going to be around for "20-40 years" are probably either not going to be, or they are not going to give a damn about you.
Use open standards and open formats, with multi-vendor support; that's the only way to go. And you need to keep your eyes open and move to new formats and standards as the world changes.
If LTO is the right choice, it's the right choice because of that. But I'm not convinced that LTO is going to be long-lived enough as a standard, no matter how many companies have tied their fortunes to it right now.
Stone? Easily chipped or cracked if dropped, low tensile strength, not very portable? No thanks.
Try thin metal plates. A little more difficult to etch by hand (which can be alleviated by using the right malleability of gold), but well worth it for the long-term benefits of damage-resistance and portability.
Explore2fs is written and supported by one person and currently doesn't list support for Vista. I would find it hard to recommend to someone else that they use this and expect it to be a reliable solution 5 to 10 years down the road. And if it was so easy to support ext2 on OS X, then why is there no reliable support for Tiger? Last I checked into it (about a month ago) there was ONE person working on the project, and it had been sitting idle for a while. Given that a lot of Mac users are also Linux users, I don't see why there wouldn't be widespread support if it was "quite easy". The advantage of the FAT filesystem is that it has been around forever with few changes. It will support MOST archival requirements for file size, etc.
The downside of gold is that invading Conquistadors (or otherwise no-good people) might try to melt it down into bars or bullion, destroying your data.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Sure, I could use archives with checksums or RAID, but it'd be nice if there was an option to sacrifice some speed and space on a single form of storage to improve the reliability without going to such cumbersome lengths.