Slashdot Mirror


A Good Filesystem for Storing Large Binaries?

jZnat asks: "I own hundreds of gigabytes of binary data, usually backed up from other mediums such as CDs and DVDs. However, I cannot figure out which filesystem would be best for storing all this reliably. What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around. Trying something new for once, I tried using SGI's XFS due to its promising details, but I was met with countless IO errors after trying to write large amounts of data to it. I feel that Ext3 is not optimal for this; ReiserFS is too slow when it comes to reading large data files; and Reiser4 isn't mature enough to entrust my digital assets to. What filesystem would be most appropriate for these needs?"

20 of 214 comments

  1. JFS by member57 · · Score: 4, Informative

    I use JFS on RAID 5, no errors, uptime of 200+ days currently. Handling large files 200-300MB each all day long. Excellent performance.

    --
    If Kerry was the answer, it must have been a stupid question.
    The UN - The largest "political" cause of death.
  2. The Google Filesystem by benploni · · Score: 5, Funny

    Google made a filesystem for exactly that purpose: storing HUGE files highly reliably. OK, so it's not publicly available, but it's still perfect for you (other than that).

    1. Re:The Google Filesystem by hanwen · · Score: 4, Insightful
      Google made a filesystem for exactly that purpose: storing HUGE files highly reliably. OK, so it's not publicly available, but it's still perfect for you

      I doubt that. To run GFS (assuming you have the code), you need to have a big honking cluster to replicate data across machines. Also, it assumes different file semantics, so you need to hand-code your apps to use the different reading and writing semantics. It only works well for appending writes and streaming reads. Furthermore, GFS does not have file-locking, and concurrent writes will leave your files in an undefined state.

      --

      Han-Wen Nienhuys -- LilyPond

  3. Filesystem choice... by strredwolf · · Score: 3, Insightful

    So let's get this straight:

    You need a filesystem that can be "burned" to a medium, yet have error correction capability.

    Journaling doesn't do this. Journaling is there so that, when you get a power surge in the middle of a write, you can get some of the data back. Currently no regular FS provides error correction.

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
  4. Possibly... by dcapel · · Score: 4, Funny

    p2pfs?

    Just upload to bittorrent, ftp, or some other p2p system, and redownload it if you need it again!

    Some small security issues may apply though...

    --
    DYWYPI?
  5. Not Linux, but try ZFS by duffbeer703 · · Score: 4, Interesting

    ZFS has some built-in volume management & data integrity functions that would probably work for you. I don't believe that it is available for Linux, but it is freely available via Solaris & OpenSolaris.

    http://www.sun.com/software/solaris/zfs.jsp

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  6. Hand-tuned ext2/ext3? by Vo0k · · Score: 4, Interesting

    Most fancy filesystems like ReiserFS are optimized for performance with lots and lots of tiny files, where the disk reads little at a time and seeking, sorting, assembling, slicing etc. take most of the time. Here you have a few big files, so performance is your least worry - the harddrive read/write speed will be the bottleneck, and all the seeks, directory reads etc. will be scarce and fast. Therefore the choice of filesystem won't change much in terms of speed. (It MAY hurt a lot in that department with, say, compressed filesystems, but it won't speed things up above what the harddisk does, and most filesystems will perform just the same.) What you can do is optimize the filesystem for capacity, reducing its overhead and getting closer to the "advertised disk capacity".

    Just use tune[23]fs to reduce number of inodes significantly on the ext3fs. Or look for -simple- filesystems that don't do tricks in optimization of speed (because these usually waste diskspace), just store your files in a straightforward manner.
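
    A rough sketch of the arithmetic behind the inode suggestion, in Python. The numbers are assumptions for illustration and not from the post: 128-byte inodes, a 300 GiB filesystem, and one inode per 8 KiB versus one per 4 MiB. The function name is hypothetical. As far as I know, on ext2/ext3 the inode count is fixed when the filesystem is created (the -i bytes-per-inode or -N options of mke2fs), so this saving would have to be chosen at that point.

```python
def inode_table_bytes(fs_bytes: int, bytes_per_inode: int, inode_size: int = 128) -> int:
    """Approximate space taken by inode tables (assumed model: one fixed-size
    inode is reserved for every `bytes_per_inode` bytes of filesystem)."""
    inode_count = fs_bytes // bytes_per_inode
    return inode_count * inode_size


GIB = 1024 ** 3
fs_size = 300 * GIB  # assumed filesystem size

dense = inode_table_bytes(fs_size, 8 * 1024)          # many inodes: suits tiny files
sparse = inode_table_bytes(fs_size, 4 * 1024 * 1024)  # few inodes: suits large files

print(f"dense inode tables:  {dense / GIB:.2f} GiB")
print(f"sparse inode tables: {sparse / GIB:.4f} GiB")
print(f"space reclaimed:     {(dense - sparse) / GIB:.2f} GiB")
```

    Under these assumptions the reclaimed space is a few gigabytes out of 300, so it is a modest gain rather than a dramatic one.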

    --
    Anagram("United States of America") == "Dine out, taste a Mac, fries"
  7. So just poo poo all the options then ask for one by thegrassyknowl · · Score: 5, Insightful

    journaling methods to prevent data loss due to some natural disaster while data is being shifted around

    Journalling doesn't do this! Journalling helps reduce file system corruption in the event of a catastrophic failure while modifying the file system - ie, it's possible to bring it back to the last clean state before it crashed - journalling does not prevent data loss. You might say "well filesystem corruption and data loss are the same", but they are not. If the filesystem is corrupted, the data is not lost. It just becomes not easily retrievable. If the data is lost then it becomes entirely irretrievable.

    I tried using SGI's XFS due to its promising details, but I was met with countless IO errors

    Have you considered your hardware is shit? I use XFS on terabytes of raided disks and have been for more years than I remember... 5 or so? I don't see any I/O errors. XFS is very reliable and I trust it with my data.

    I feel that Ext3 is not optimal for this

    Well not all of your post was dumb!

    ReiserFS is too slow when it comes to reading large data files

    How is it slow? It takes a few microseconds longer to access the first data sector because it does some extra processing first? Give me a break. Filesystem performance for journalled filesystems is mostly bound by writing speed, and this is a function of how the journal is updated. I doubt you would notice the difference in read speed unless you ran a million tests over a million different files, took some sort of average for the filesystems and quibbled over a few milliseconds.

    Reiser4 isn't mature enough to entrust my digital assets to

    You entrust your assets digitally? Shit, why do you trust any filesystem? They are all buggy. Give me a break.

    If you don't like it, keep backups on other media; buy a tape drive and a robot and get in bed with a good archiving company to securely store the backups. Don't come on here and poo poo all of the file systems known to man then tell me "is there anything better"? About the only 4 in common use you left out were JFS (good for large databases but not much use if you have a lot of small files), FAT[12/16/32] (not much good for anything really), NTFS (see FAT, but more complex) and ISO9660. I'll concede there are others, but if you want something that's in common use so you can actually retrieve your data when the world turns to shit...

    Anywho!

    --
    I drink to make other people interesting!
  8. Comparison of File Systems by NuclearDog · · Score: 5, Informative

    Comparison of FileSystems (from Wikipedia)

    Personally, I run two 300GB drives in RAID1 on UFS and am quite satisfied with it, but you seem to be incredibly, incredibly picky, so I'm sure you could find something wrong with it ;P

    ND

    --
    This statement is forty-five characters long.
  9. I/O Errors??? by Stephen+Samuel · · Score: 4, Informative
    If you're getting lots of I/O errors with XFS, I'd be inclined to look at a hardware problem (unless the I/O errors consist of attempts to read past the end of the partition -- which could be caused by you manually specifying the partition size, rather than letting mkfs.xfs figure it out).

    Like someone else said -- try using badblocks(8) -- or just use dd to make sure you can read the entire partition without errors.
    Bad disks do happen -- even new ones. Production code in Linux is generally very stable, and (unlike with windows), you can usually start with the presumption that things like I/O errors are caused by real hardware problems of some sort (even if it's just bad/loose cables).
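
    A minimal sketch of the "read the entire partition" check in Python, roughly what a plain dd read to /dev/null tells you. The function name and the 1 MiB chunk size are assumptions for illustration. It does not replace badblocks(8); it only reports where sequential reads raised an I/O error.

```python
def scan_readable(path: str, chunk_size: int = 1024 * 1024) -> list[int]:
    """Read `path` from start to end and return the byte offsets of
    chunks whose read raised an I/O error (empty list if none did)."""
    bad_offsets = []
    offset = 0
    # buffering=0 so each read maps to one request against the device or file
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                data = dev.read(chunk_size)
            except OSError:
                # Note the failing chunk, then skip past it and keep going
                bad_offsets.append(offset)
                offset += chunk_size
                dev.seek(offset)
                continue
            if not data:  # end of device or file
                break
            offset += len(data)  # short reads are possible, so count actual bytes
    return bad_offsets
```

    Run as root against a device node such as /dev/sdb1 (a hypothetical path), an empty result means every chunk could be read; any offsets returned point at regions worth checking with badblocks or the drive's SMART data.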

    --
    Free Software: Like love, it grows best when given away.
  10. Keep it simple. ext2 or fat32. by Radak · · Score: 4, Interesting

    If you're looking for a filesystem to archive things indefinitely, avoid exotic new kids on the block with limited OS support and even more limited toolkit support.

    You want a filesystem you'll be able to read at any point in the future and, should the worst happen, one which you'll have a reasonable chance of being able to recover.

    ext2 and fat32 tend to write files in nice large chunks and there are lots and lots of recovery tools for damaged filesystems. Journaled filesystems like to put little pieces all over the place, and recovery of a badly damaged filesystem is next to hopeless.

    There is no call for a complex filesystem just because you want to store large files. ext2 (and to some extent fat32) will do just fine, and you'll be glad for them someday in the future when something breaks.

  11. It worked for me by toadlife · · Score: 5, Interesting

    Around 1997, I discovered the magic of mpeg-layer3. I hung out in #mpeg3 on effnet and was part of what was probably the first ever mp3 trading circle. An acquaintance of mine had a CD of the rare Nirvana/Jesus Lizard single, which had Nirvana's "Oh The Guilt" on it. I borrowed it from him, ripped it to wave, encoded it as a 256KB mp3 and returned the CD. Over the next year or so, quite a few people nabbed the song from me during normal trading sessions in #mpeg3. Sometime later I made a boo-boo and lost a folder permanently, and one of the files in it was that song. I was bummed, as the person I borrowed the CD from was gone and the CD was long out of print and cost a lot of money if you happened to find a copy. I forgot about it.

    Quite a few years later - I think ~2002 - I was on some p2p app, typed in "Oh the Guilt" and got a hit. I downloaded it, and it was a 256KB mp3 of the song. The file modification date was in 1997, and the tags were typed in exactly the way I would have put them if I had encoded the song. I can't prove it, but I'm pretty sure I got my file back.

    --
    I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
  12. Re: ext3 works fine, did you try it? by Matt+Perry · · Score: 4, Informative
    I feel that Ext3 is not optimal for this
    Did you try it with ext3? I have 688G in a RAID5 array spread across four 250GB drives. I use ext3 and I store lots of large files (15GB free on the array right now). I have about 156GB of DVD images, mostly movies that I own and have ripped to watch using daemon tools on Windows. Some of them are rips of training video DVDs I bought for software that I use like Adobe Premiere and Audition. I frequently move large AVI files to and from the array for video projects that I'm working on. These files originate on my Windows box and can be as large as 13GB (for an hour of video footage). I've been using ext3 for years and it's never let me down or given me any problems.
    --
    Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
  13. Re:Keep it simple. ext2 or fat32. by dougmc · · Score: 3, Informative
    There is no call for a complex filesystem just because you want to store large files. ext2 (and to some extent fat32) will do just fine
    fat32 cannot handle files over 4 GB in size at all. That alone probably renders it totally unsuitable for this person's needs.

    Beyond that, I'd say pretty much anything will work fine -- most of the optimizations found in filesystems are needed for lots of small files, not a few large files. For large files, the speeds they can be accessed by various filesystems are not likely to vary more than a few percent unless you let the files get fragmented (which probably isn't a big concern here.)

    And you are right -- if something does go wrong, ext2 or ext3 will probably give you the most options for recovering it. NTFS probably has even more recovery options (and FAT even more, as mentioned), but I'm guessing the OS will be *nix. But really, if your goal is reliability, you don't want some esoteric filesystem that can recover from disk errors (because ultimately, none can, though I guess one could be designed to keep ECC codes on the same disk transparently -- but I'm aware of no such filesystem existing) -- you want multiple copies of your data. Keeping 5-10% (or more) par2 files for your archive can help a lot in recovering it if your media goes partially bad, and having md5sums or CRC32s of all archived files can help determine if you did recover something accurately, but really there's little substitute for multiple copies of important data in multiple geographical locations. (And no -- RAID is not a substitute for backups, no matter how many mirrored drives you have. Not that I saw anybody suggest this yet, but it seems to always come up in response to questions like this, so consider this to be a preemptive mention of that.)
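
    For the md5sum part, a minimal Python sketch of keeping a checksum manifest and checking an archive (or a recovered copy) against it. The function names are hypothetical, and the manifest is just an in-memory dict here; in practice you would store it alongside the backups. It covers detection only: the par2 files mentioned above are what would let you repair damage.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size: int = 1024 * 1024) -> str:
    """MD5 of a file, read in chunks so multi-gigabyte files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its MD5 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_md5(target) != expected:
            damaged.append(rel)
    return damaged
```

    Build the manifest once right after archiving, then run verify_manifest against each copy from time to time; an empty list means every file still matches its recorded checksum.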

  14. Ext2 rw,sync by evilviper · · Score: 4, Interesting
    What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around.

    Ext2fs mounted with the 'sync' option.

    For large sequential writes, nothing could possibly be more reliable or any faster. Your hard drive's pure IO speed will be the bottleneck unless you are writing to multiple files simultaneously, in which case fancy filesystems come in handy.

    If that doesn't suit your needs, you haven't described them well enough for anyone to understand.

    I feel that Ext3 is not optimal for this;

    I feel hungry.
    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  15. Re:FAT16 by John+Nowak · · Score: 3, Funny

    You must be really fun at parties.

  16. Re:Ext3 or XFS. by baptiste · · Score: 5, Informative
    Check out the latest. What? 2003? Haven't there been any bug fixes since then?

    While it sucks you've lost data because of XFS, many people use it heavily every day without issue (I'm one of them). I've deployed XFS across mail, database, and web servers without issue. Your statements about it are total FUD. The reason the last 'release' was in 2003 is that, not long after that, XFS was accepted into the kernel itself. Thus there was no longer a need to 'release' XFS patches for the kernel. If you look at the command packages, you'll see them being updated on a regular basis.

    As for bugs, I think your statement of bugs not being fixed is incorrect as well. Check the closed bug list. You'll see many that are being closed. Also, in your open bug list above, it does appear rather long. But MANY of those bugs are from users who opened a bug saying 'XFS Crashed On Me' and then never followed up with more info. The XFS developers haven't cleaned many of those out it seems. Bugs in the 200s date from 2003, bugs from the 300's from 2004. Late 300's and 400's from 2005.

    So I hate you've had data loss - I wouldn't wish that on anybody (having experienced a RAID5 triple disk failure combined with backup tape failure. Thank goodness for OnTrack!) But don't post FUD about a filesystem that has performed very well for a lot of people and continues to be improved and innovative.

  17. TAR files written to raw partitions by vrmlguy · · Score: 4, Funny
    Don't laugh! Most (if not all) filesystems are optimized to handle the opposite of what you want. TAR files are designed for tape, so you won't be seeking all over the disk to get meta information, instead you'll get your data at the maximum speed supported by your hardware. TAR files are designed so that you can append files to them later, so you can use a *big* partition and just keep dumping stuff into it.

    The only drawbacks are that you have to read the entire partition sequentially to find things, and you can't delete files. Both of these can be fixed with a bit of Perl. Write a program that maintains an index of offsets to the files, then you can use "dd" to skip to the correct offset and read from there. More dangerously, write a program that deletes files from the middle of an archive and shuffles everything backwards to fill in the gaps. You'll want to make sure that no one is trying to read the TAR partition while this is running.
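
    The offset index could be sketched like this, in Python rather than the Perl suggested above. The function names are hypothetical. It assumes an uncompressed archive and ordinary (non-sparse) files, and it relies on the offset_data attribute that CPython's tarfile sets on each member, which marks where that member's data begins.

```python
import tarfile


def build_index(archive_path: str) -> dict[str, tuple[int, int]]:
    """Scan the archive once and map each regular file's name to
    (offset of its data within the archive, size in bytes)."""
    index = {}
    # Mode "r:" refuses compressed input, where raw offsets would be useless
    with tarfile.open(archive_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(archive_path: str, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Fetch one file by seeking straight to its data, skipping the scan."""
    offset, size = index[name]
    with open(archive_path, "rb") as raw:
        raw.seek(offset)
        return raw.read(size)
```

    The one sequential pass happens in build_index; after that the index can be saved somewhere else and each lookup is a single seek, which is the same effect as the "dd" skip described above. Appending to the archive means adding the new entries to the index as well.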

    --
    Nothing for 6-digit uids?
  18. FMWORM by davecb · · Score: 3, Informative
    Also spelled FM-WORM, a filesystem which looks like a normal NFS server but knows intimately what needs to be done to deal with WORM disks.

    It's a commercial product from Siemens, which I used years ago for Sietec's large-scale imaging product.

    There is probably a Linux port: We ran it on almost everything in existence (;-))

    --dave

    --
    davecb@spamcop.net
  19. Re:Retry XFS by dwater · · Score: 3, Insightful

    I never run a computer without a UPS. IMO, they should be built into the power supplies (I wonder if you can buy such things...). I guess I've had too many computers die on me due to power issues, but power can be very unreliable in my location. UPSes are so cheap it's not worth it to *not* use them.

    --
    Max.