A Good Filesystem for Storing Large Binaries?
jZnat asks: "I own hundreds of gigabytes of binary data, usually backed up from other mediums such as CDs and DVDs. However, I cannot figure out which filesystem would be best for storing all this reliably. What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around. Trying something new for once, I tried using SGI's XFS due to its promising details, but I was met with countless IO errors after trying to write large amounts of data to it. I feel that Ext3 is not optimal for this; ReiserFS is too slow when it comes to reading large data files; and Reiser4 isn't mature enough to entrust my digital assets to. What filesystem would be most appropriate for these needs?"
I know this is not exactly what you are looking for, but database companies have a very similar problem on their hands, and since filesystems usually are not very good for this type of work, they usually come up with their own systems for handling raw disks. For example, Oracle has its ASM (Automatic Storage Management). You might want to look into these to see whether they can be customized for your problem, or contact the relevant companies for specifics.
If programs would be read like poetry, most programmers would be Vogons.
ZFS has some built-in volume management & data integrity functions that would probably work for you. I don't believe that it is available for Linux, but it is freely available via Solaris & OpenSolaris.
http://www.sun.com/software/solaris/zfs.jsp
Conformity is the jailer of freedom and enemy of growth. -JFK
Most fancy filesystems like ReiserFS are optimized for performance with lots and lots of tiny files, where the disk reads a little at a time and seeking, sorting, assembling, slicing, etc. take most of the time. Here you have a few big files, so performance is the least of your worries: the hard drive's read/write speed will be the bottleneck, and seeks, directory reads, etc. will be scarce and fast. The choice of filesystem therefore won't change much in terms of speed. (It MAY hurt a lot in that department, as with, say, compressed filesystems, but it won't push speed above what the hard disk delivers, and most filesystems will perform about the same.) What you can do is optimize the filesystem for capacity, reducing its overhead and getting closer to the "advertised disk capacity".
Just cut the number of inodes way down on ext3 (the inode count is fixed when the filesystem is created, so that means mke2fs with -i or -T largefile rather than tune2fs). Or look for -simple- filesystems that don't do tricks to optimize speed (because these usually waste disk space) and just store your files in a straightforward manner.
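To get a feel for how much space fewer inodes could free up, here is a rough back-of-the-envelope sketch. The 300 GiB volume, the 128-byte inode size and the two bytes-per-inode ratios are assumptions picked for illustration; real defaults depend on your mke2fs version and configuration, and the function name is made up.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def inode_table_bytes(fs_bytes, bytes_per_inode, inode_size=128):
    """Approximate space taken by inode tables (ignores other metadata)."""
    inode_count = fs_bytes // bytes_per_inode
    return inode_count * inode_size


if __name__ == "__main__":
    fs_size = 300 * GIB  # hypothetical archive volume
    ratios = [
        ("one inode per 8 KiB", 8 * 1024),   # assumed small-file ratio
        ("one inode per 4 MiB", 4 * MIB),    # assumed large-file ratio
    ]
    for label, ratio in ratios:
        overhead = inode_table_bytes(fs_size, ratio)
        print(f"{label}: {overhead / MIB:.1f} MiB of inode tables")
```

Under these assumptions the difference is a few gigabytes on a volume this size, which is a real but modest gain.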
Anagram("United States of America") == "Dine out, taste a Mac, fries"
If you're looking for a filesystem to archive things indefinitely, avoid exotic new kids on the block with limited OS support and even more limited toolkit support.
You want a filesystem you'll be able to read at any point in the future and, should the worst happen, one which you'll have a reasonable chance of being able to recover.
ext2 and fat32 tend to write files in nice large chunks and there are lots and lots of recovery tools for damaged filesystems. Journaled filesystems like to put little pieces all over the place, and recovery of a badly damaged filesystem is next to hopeless.
There is no call for a complex filesystem just because you want to store large files. ext2 (and to some extent fat32) will do just fine, and you'll be glad for them someday in the future when something breaks.
Around 1997, I discovered the magic of mpeg-layer3. I hung out in #mpeg3 on EFnet and was part of what was probably the first ever mp3 trading circle. An acquaintance of mine had a CD of the rare Nirvana/Jesus Lizard single, which had Nirvana's "Oh The Guilt" on it. I borrowed it from him, ripped it to WAV, encoded it as a 256kbps mp3 and returned the CD. Over the next year or so, quite a few people nabbed the song from me during normal trading sessions in #mpeg3. Sometime later I made a boo-boo and lost a folder permanently, and one of the files in it was that song. I was bummed, as the person I borrowed the CD from was gone, and the CD was long out of print and cost a lot of money if you happened to find a copy. I forgot about it.
Quite a few years later, I think ~2002, I was on some p2p app, typed in "Oh the Guilt" and got a hit. I downloaded it, and it was a 256kbps mp3 of the song. The file modification date was in 1997, and the tags were typed in exactly the way I would have put them if I had encoded the song. I can't prove it, but I'm pretty sure I got my file back.
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
The only problem I've ever run into on JFS was trying to use it with NFS (specifically recommended against at the time). The JFS team was very responsive to my problems and very shortly had all the issues worked out. And during all the problems, the FS never lost a file that had been successfully written, even though the kernel locked up in interesting ways. Note that Linux JFS maps to AIX's JFS2 and is extent based, so it is a very different animal from the original JFS. I'm not sure if this has been implemented on the mainframe as well (AIX is NOT the native OS of the IBM mainframe, FWIW), but given the amount of error checking that goes into their mainframe products, I find it doubtful that files disappeared for any reason other than user error.
You are in a maze of twisted little posts, all alike.
Ext2fs mounted with the 'sync' option.
For large sequential writes, nothing could possibly be more reliable or any faster. Your hard drive's pure IO speed will be the bottleneck unless you are writing to multiple files simultaneously, in which case fancy filesystems come in handy.
If that doesn't suit your needs, you haven't described them well enough for anyone to understand.
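The 'sync' mount option makes every write on the filesystem synchronous. If you'd rather not mount that way, an application can ask for a similar guarantee per file. A minimal sketch, assuming a POSIX-like system; write_durably is a made-up helper name, and this is not equivalent to the mount option, just the same idea applied to one file:

```python
import os


def write_durably(path, chunks):
    """Write chunks sequentially, then force them out to the device.

    Returns the number of bytes written.
    """
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the disk
    return total
```

Whether the data is truly on the platters after fsync still depends on the drive's own write cache, so treat this as best effort.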
I feel hungry.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
But, after trying just about every FS under the sun for my backups, on Linux, FreeBSD, and OS X, I finally settled on Mac HFS+ with journaling and case-sensitivity enabled. I have a 900GB RAID formatted with it, and I'm storing some files that are 7GB+. I haven't had any issues with it at all.
Yep, it means you will probably need a Mac, but Linux does have HFS support (I don't know how good it is). Everything is working out great, and it supposedly has some sort of auto-defrag, but I'm too lazy to actually verify this.
Need Free Juniper/NetScreen Support? JuniperForum
Which is why I asked if the poster was serious about using WORM or not.
ISO-9660 is not the same as UDF. If you have UDF and ISO-9660 on the same volume, it is because someone mastered a hybrid filesystem structure onto the disc, which was the norm on first-generation DVDs.
ISO-9660 contains no optimizations for being a WORM filesystem: there are no linking records in ISO-9660 to allow re-writing of data into new blank spots on non-rewriteable storage media. UDF supports these linking blocks.
When I wrote a UDF filesystem for Linux, I tested it by building the structures into blocks on hard disk storage devices. It can be done, but UDF is designed as a highly interchangeable FS rather than a high-performance one.
[completely offtopic]
I always caught shit from other mp3'ers back in the late 90's because of my 'huge' 256kb songs. People that would download from me would frequently complain that my files were too big and that there was no use encoding them at bitrates that high because "128kbps was already CD Quality".
It was also really easy to start flamewars by bringing up the topic. You could just go into an mp3 IRC channel, make an offhand comment like "128kbps mp3 files sound like crap; 192-256 is really needed to approach true CD Quality", and people would immediately start arguing with you, probably in a subconscious effort to justify the fact that they had spent the last three months encoding their entire CD collection at 128kbps.
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.