File Systems Best Suited for Archival Storage?
Amir Ansari asks: "There have been many comparisons between various archival media (hard drive, tape, magneto-optical, CD/DVD, and so on). Of course, the most important characteristics are permanence and portability, but what about the file systems involved? For instance, I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)? Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage? Is journaling important?"
ZFS supports checksums (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies), but it is incompatible with the GPL (http://linux.inet.hr/zfs_filesystem_for_linux.html). However, Ricardo Correia has an alpha version of ZFS for FUSE/Linux (http://zfs-on-fuse.blogspot.com).
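On a file system without built-in checksums, a rough stand-in is to keep your own checksum manifest next to the archive. The sketch below is only an illustration of that idea, not how ZFS does it internally (ZFS checksums blocks, not whole files); the function names `sha256_of`, `build_manifest`, and `changed_files` are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List recorded files that are now missing or whose digest differs."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Build the manifest when the archive is written, store it with the archive, and run `changed_files` against it whenever you want to check for silent corruption.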
RAR's killer feature back in the day was the first good implementation of disk splitting, but its compression still stands up now.
On my 'if I ever get free time' list is implementing RAR's file ordering in GNU tar, to see whether that helps gzip and bzip2 catch up with RAR's compression ratio. But I have no idea if or when I'll ever get around to that.
-- paid-up RAR user since 1996.
It depends: a 100% par set for a 100GB archive would take forever even on the faster machines. Even a "small" 4GB par set for a DVD backup takes hours on an Opteron 250.
Here's why: IMO, unless you're doing it for a company, the most important thing is convenience.
If it's your job, sure, you'll do it whether it's convenient or not.
If it isn't, you'll quickly get tired of messing with CDs, plugging and unplugging hard drives, and so on. So I went with the most convenient medium possible: tape. Stick a tape into the drive, walk away, and store it when the drive spits it out. It doesn't interfere with using the computer, since nothing else uses the tape drive.
For absolute convenience, get a tape robot from eBay. Then it can be completely automatic.
Filesystem: use plain tar to write to the tape. If you must use compression, compress files individually, not the whole tape.
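A minimal sketch of that advice: an uncompressed tar whose members are each gzip-compressed on their own. The likely point of doing it this way is that a damaged spot then costs one file, where a single compressed stream could lose everything after the damage. The function name `archive_individually_compressed` is hypothetical, and the code reads each file fully into memory, which is only reasonable for modest file sizes.

```python
import gzip
import io
import tarfile
from pathlib import Path


def archive_individually_compressed(source_dir, tar_path):
    """Write a plain tar in which every member is its own gzip stream."""
    source_dir = Path(source_dir)
    # Mode "w" means no outer compression around the tar itself.
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            # Compress this one file independently of all the others.
            compressed = gzip.compress(path.read_bytes())
            name = path.relative_to(source_dir).as_posix() + ".gz"
            info = tarfile.TarInfo(name=name)
            info.size = len(compressed)
            info.mtime = int(path.stat().st_mtime)
            tar.addfile(info, io.BytesIO(compressed))
```

Keep `tar_path` outside `source_dir`, otherwise the archive would try to include itself. To write to tape, `tar_path` would be the tape device node, which is left out here because it depends on the system.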
Paranoid implementation: Tapes have file marks. You can ask the tape drive to give you file #1, for instance. You can use this to store some useful material in a format that will always be recoverable, as long as you have a drive that can read the tape. Store it like this:
File 1: Text document explaining what all this is and what's on the tape.
File 2: Specification of the tar format (it is defined by POSIX, not by an RFC)
File 3: RFC for compression format
File 4: source for tar program
File 5: source for decompression program
File 6: backup
A tape formatted like this should be readable as long as a drive capable of reading it survives. To ensure that, go with a popular tape format that is reliable, open, and high-capacity (so that it's unlikely to become obsolete too fast).
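The layout above can be sketched as a small generator for the File 1 text document. This is only an illustration: the names `TAPE_LAYOUT` and `describe_tape`, the wording of the entries, and the tape label are all assumptions, and actually writing each item to the drive as a separate tape file is left out because it depends on the hardware.

```python
# Ordered contents of the tape, one entry per tape file (hypothetical wording).
TAPE_LAYOUT = [
    "Text document explaining the tape and its contents (this file)",
    "Specification of the tar format",
    "Specification of the compression format",
    "Source code for the tar program",
    "Source code for the decompression program",
    "The backup itself, as a tar archive",
]


def describe_tape(layout, label):
    """Return the plain-text description meant to be stored as File 1."""
    lines = [
        f"Tape label: {label}",
        "Each numbered item below is a separate tape file, divided by file marks.",
        "",
    ]
    for number, description in enumerate(layout, start=1):
        lines.append(f"File {number}: {description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(describe_tape(TAPE_LAYOUT, "home-archive-01"), end="")
```

Plain ASCII text is the safest choice for this first file, since it has to be understandable before any of the other tools on the tape have been rebuilt.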
While ZFS is not as widely used (yet), I expect it to eventually become the de facto standard in safe filesystems.
I've thrown all kinds of problems at it, and it has yet to lose a single byte of data.
Add to that snapshots: take one every (x) minutes, and you can look back in time as easily as reading a folder.
With RAIDZ2 in the latest releases, you can set up sets that can withstand the loss of 2 physical drives. If you combine multiple RAIDZ2 sets into a single pool, each set can independently lose up to 2 drives. With plain old JBOD and multiple controllers, you can reach levels of availability that only expensive EMC/Hitachi/StorEdge systems have reached in the past.
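The arithmetic behind such a pool can be sketched roughly as below. It assumes equal-sized drives and ignores metadata and padding overhead, so the capacity figure is an approximation; the function name `raidz2_pool_summary` is made up for this example. Note the difference between the guaranteed and best-case numbers: the pool survives only while no single set has lost more than 2 drives.

```python
def raidz2_pool_summary(set_widths, drive_size_tb):
    """Estimate capacity and tolerated drive losses for a pool of RAIDZ2 sets.

    set_widths: number of drives in each RAIDZ2 set, e.g. [6, 6].
    drive_size_tb: size of one drive; all drives are assumed equal.
    """
    if any(width < 3 for width in set_widths):
        raise ValueError("each RAIDZ2 set needs more drives than its 2 parity drives")
    # Each set gives up the equivalent of 2 drives to parity.
    usable_tb = sum((width - 2) * drive_size_tb for width in set_widths)
    return {
        "usable_tb": usable_tb,
        # Any 2 failures are survivable, wherever they land.
        "guaranteed_drive_losses": 2 if set_widths else 0,
        # More failures are survivable only if spread 2 per set.
        "best_case_drive_losses": 2 * len(set_widths),
    }
```

For example, two 6-drive sets of 1 TB drives give roughly 8 TB usable, survive any 2 failures, and survive up to 4 if the failures are split evenly between the sets.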
It's open source as well (although it's the Sun flavor for now), and it is being worked on at www.opensolaris.org. I believe Sun is contemplating switching it to the GPL.