A Good Filesystem for Storing Large Binaries?
jZnat asks: "I own hundreds of gigabytes of binary data, usually backed up from other mediums such as CDs and DVDs. However, I cannot figure out which filesystem would be best for storing all this reliably. What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around. Trying something new for once, I tried using SGI's XFS due to its promising details, but I was met with countless IO errors after trying to write large amounts of data to it. I feel that Ext3 is not optimal for this; ReiserFS is too slow when it comes to reading large data files; and Reiser4 isn't mature enough to entrust my digital assets to. What filesystem would be most appropriate for these needs?"
JFS is about the only one in Linux that wasn't mentioned.
JFS.
Next question.
I use JFS on RAID 5, no errors, uptime of 200+ days currently. Handling large files 200-300MB each all day long. Excellent performance.
If Kerry was the answer, it must have been a stupid question.
The UN - The largest "political" cause of death.
Google made a filesystem for exactly that purpose: storing HUGE files highly reliably. OK, so it's not publicly available, but it's still perfect for you (other than that).
FAT16 is pretty damn good for DVD backups.
I've been using JFS for about two months now, and it's been quite a pleasant experience with my anime storage. I run it on a 1.6TB array, four 400GB hard drives. Its performance is damn fast from what I've observed copying to/from a JFS FireWire drive. I trust it enough to keep data that I can't back up on it until I can get another identical array to mirror with - only drive failure seems likely to kill this FS, and these drives are about 3 months old, so failure isn't that much of a concern anymore. I'd recommend it for storage; I haven't tried it as a system FS yet.
So let's get this straight:
You need a filesystem that can be "burned" to a medium, yet have error correction capability.
Journaling doesn't do this. Journaling is for when you get a power surge in the middle of a write, you can get some of the data back. Currently no regular FS can do that.
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
Chop the files up. Study Google File System. They store things as 'shards' which are little pieces that don't care where they live. The redundancy gives them speed and robustness. There is a white paper on teh Interweb about this.
I know this is not exactly what you are looking for, but database companies have a very similar problem on their hands, and since filesystems usually are not well suited to this type of work, they usually come up with their own systems for handling raw disks. For example, Oracle has its ASM (Automatic Storage Management). You might want to look into these to see whether they can be customized for your problem, or contact the relevant companies for specifics.
If programs would be read like poetry, most programmers would be Vogons.
Okay, why is this modded troll again? He misunderstood the direction of the backups, he wasn't trolling! If anything, it's -1 WRONG, not troll! Anyway - he backs up data FROM DVDs and CDs, not TO
reiser or jfs are both solid for this kind of work, with large file and volume support. personally i swear by reiser for my 2tb volume, and have had no problems so far, although there is a minor speed penalty when working with several multi-gigabyte files at once, something to do with shared fs locks/mutexes i'd imagine.
OTOH JFS is quite stable, and though it has less of the elegance in feature set I find in reiser, tends to make up for it with enhanced ruggedness and its handling of large volume/files.
Really can't recommend anything else, as you say, reiser4 is still untested for reliability imho, xfs has issues that vary from kernel to kernel, and ext3 appears quite primitive in comparison, although its journaling seems comparable to the other choices.
JFS if you need the speed, it's dead fast at large scales, slower with small files, otherwise Reiser3 is an excellent all-round performer.
The first rule of USENET is you do not talk about USENET.
p2pfs?
Just upload to bittorrent, ftp, or some other p2p system, and redownload it if you need it again!
Some small security issues may apply though...
DYWYPI?
ZFS has some built-in volume management & data integrity functions that would probably work for you. I don't believe that it is available for Linux, but is freely available via Solaris & OpenSolaris
http://www.sun.com/software/solaris/zfs.jsp
Conformity is the jailer of freedom and enemy of growth. -JFK
If you're not writing to the files extensively, ext3 is perfectly fine. If you don't need the data journalling, read `man mount` regarding ext3 mount options.
Personally, I'm using XFS for the same task. Why? Because it segments the filesystem, allowing segment locking instead of filesystem locking, which is nice if you're writing multiple big files at once. I've never had a problem with it.
If you are getting countless IO errors, have you done a `badblocks` on the disk? Have you tried a different IO card or disk?
Be relentless!
I know you listed that as not working right, but I've had very good experiences with it. I do a lot of audio editing (~5G/song) and also run multiple VMWare images (all 2G+). No problems at all.
Are you sure your HD is good? What distro are you using?
"It ain't a war against drugs.it's a war against personal freedom" --Bill Hicks
Most fancy filesystems like ReiserFS are optimized for performance with lots and lots of tiny files, where the disk reads little at a time and seeking, sorting, assembling, slicing etc. take most of the time. Here you have a few big files, so performance is the least of your worries - the hard drive's read/write speed will be the bottleneck, and all the seeks, directory reads etc. will be scarce and fast. Therefore the choice of filesystem won't change much in terms of speed. (A filesystem MAY make things a lot slower, as compressed filesystems do, but it won't speed things up beyond what the hard disk can do, and most filesystems will perform just the same.) What you can do is optimize the filesystem for capacity, reducing its overhead and getting closer to the "advertised disk capacity".
Just use tune[23]fs to significantly reduce the number of inodes on the ext3fs. Or look for -simple- filesystems that don't do tricks to optimize speed (because these usually waste disk space) and just store your files in a straightforward manner.
Anagram("United States of America") == "Dine out, taste a Mac, fries"
journaling methods to prevent data loss due to some natural disaster while data is being shifted around
Journalling doesn't do this! Journalling helps reduce file system corruption in the event of a catastrophic failure while modifying the file system - i.e., it's possible to bring it back to the last clean state before it crashed - journalling does not prevent data loss. You might say "well, filesystem corruption and data loss are the same", but they are not. If the filesystem is corrupted, the data is not lost. It just becomes not easily retrievable. If the data is lost then it becomes entirely irretrievable.
I tried using SGI's XFS due to its promising details, but I was met with countless IO errors
Have you considered your hardware is shit? I use XFS on terabytes of raided disks and have been for more years than I remember... 5 or so? I don't see any I/O errors. XFS is very reliable and I trust it with my data.
I feel that Ext3 is not optimal for this
Well not all of your post was dumb!
ReiserFS is too slow when it comes to reading large data files
How is it slow? It takes a few microseconds longer to access the first data sector because it does some extra processing first? Give me a break. Filesystem performance for journalled filesystems is mostly bound by writing speed, and this is a function of how the journal is updated. I doubt you would notice the difference in read speed unless you ran a million tests over a million different files, took some sort of average for the filesystems and quibbled over a few milliseconds.
Reiser4 isn't mature enough to entrust my digital assets to
You entrust your assets digitally? Shit, why do you trust any filesystem? They are all buggy. Give me a break.
If you don't like it, keep backups on other media; buy a tape drive and a robot and get in bed with a good archiving company to securely store the backups. Don't come on here and poo poo all of the file systems known to man then ask me "is there anything better"? About the only 4 in common use you left out were JFS (good for large databases but not much use if you have a lot of small files), FAT[12/16/32] (not much good for anything really), NTFS (see FAT, but more complex) and ISO9660. I'll concede there are others, but if you want something that's in common use so you can actually retrieve your data when the world turns to shit...
Anywho!
I drink to make other people interesting!
Have you looked at ZFS from Sun?
I believe at this moment it's only available on Solaris, but it's open source.
Personally I haven't had the chance to try it myself, but from what I read it seems really impressive, at least on paper. Might want to check it out.
if solaris is an option go zfs. hands down best fs if you can run it, incredible flexibility and scalability.
also, you mentioned something about burning to dvd, filesystems won't really help with that, i'd look more into taring your filesystem/ fs segments into disc sized segments, then making extra par2 files for error-resilience.
really, backing up 2tb of live data is a f*ing nightmare however you look at it, usually when i reach that hurdle i just build another machine, copy over active data, and put the old box in the closet, a sort of living backup if you will, if i ever need to go back to it. at the cost of hd's now, especially with removable sata chassis's it's the only way to handle the sheer size of data you'll be dealing with.
The first rule of USENET is you do not talk about USENET.
Actually, I wouldn't use ANY filesystem for this sort of work. The files won't change in size and I doubt they'll be deleted. It would seem more sensible to battery-back the RAM on the computer and the hard drive, use a raw partition for the data and a "sequential index" database to figure out where the data starts and how long it is. Batteries guarantee that the state of the computer will never be lost (as the RAM is now non-volatile), so you won't need journalling, and if you can guarantee the data will never fragment, you don't need the overhead of a filesystem.
It would be better if you could find some way of using core memory, rather than magnetic disks or optical media. Magnetic disks have a lifespan of a decade or less, optical media won't even last that. Core memory, on the other hand, requires a refresh rate of about once a century and is unlikely to have significant errors within the remaining lifetime of western civilization. There are only two drawbacks. It's sloooow (anyone with a Commodore 64 disk drive? anyone care to imagine something that's about a hundred times slower?) and it's bulky (you could probably replace your CD collection - but you'd need to put the people on Rhode Island somewhere else first).
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
XFS has issues that go version by version. I've had volumes work perfectly for months suddenly start spitting out I/O spam after a simple kernel upgrade. My guess is poorly applied patches, as many xfs kernel patches are added downstream by a distro or kernel contributor (MM/gentoo). just my 2c, YMMV.
The first rule of USENET is you do not talk about USENET.
I have a 3 TB XFS file system and a 10 TB XFS file system that are regularly accessed by multiple processes that read and write hundreds of gigabytes each without write errors and with excellent performance (several hundred MByte/s sustained). You may have other hardware or software issues if you're seeing errors with XFS. Try to figure out the root cause of your problem before you try another FS.
As a general rule, the latest and greatest stuff will be full of bugs. Give zfs a few years before you trust it with anything critical.
Comparison of FileSystems (from Wikipedia)
;P
Personally, I run two 300GB drives in RAID1 on UFS and am quite satisfied with it, but you seem to be incredibly, incredibly picky, so I'm sure you could find something wrong with it
ND
This statement is forty-five characters long.
UDF is commonly used on DVD media. I think it fits the criteria, being writable (unlike iso9660) but optimized for reading.
Like someone else said -- try using badblocks(8) -- or just use dd to make sure you can read the entire partition without errors.
Bad disks do happen -- even new ones. Production code in Linux is generally very stable, and (unlike with windows), you can usually start with the presumption that things like I/O errors are caused by real hardware problems of some sort (even if it's just bad/loose cables).
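For anyone who wants the dd-style read test in script form, here is a minimal Python sketch. The function name `scan_readable` and the 1 MiB chunk size are my own choices for illustration, not anything badblocks or dd defines. It only reads, so it cannot damage data, but it also cannot tell you why a read failed (disk, cable, or controller).

```python
import os

def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` (a file or block device) start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the
    starting offset of every chunk whose read raised an I/O error.
    """
    bad_offsets = []
    total = 0
    with open(path, "rb", buffering=0) as f:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            f.seek(offset)
            want = min(chunk_size, size - offset)
            try:
                data = f.read(want)
            except OSError:
                # Unreadable region: note it and skip past this chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    return total, bad_offsets
```

Point it at the partition (as root) or at one big file; an empty list of bad offsets means every chunk could be read back.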
Free Software: Like love, it grows best when given away.
If you're looking for a filesystem to archive things indefinitely, avoid exotic new kids on the block with limited OS support and even more limited toolkit support.
You want a filesystem you'll be able to read at any point in the future and, should the worst happen, one which you'll have a reasonable chance of being able to recover.
ext2 and fat32 tend to write files in nice large chunks and there are lots and lots of recovery tools for damaged filesystems. Journaled filesystems like to put little pieces all over the place, and recovery of a badly damaged filesystem is next to hopeless.
There is no call for a complex filesystem just because you want to store large files. ext2 (and to some extent fat32) will do just fine, and you'll be glad for them someday in the future when something breaks.
ext{2,3} can handle nice, big files, and just about any version of linux can read it. You can even get modules for the Microsoft world to read ext2 filesystems. If you're looking at a read-mostly filesystem, then journaling won't get you much (other than making for a fast FSCK if you lose power).
Just remember to specify '-T largefile' on the mke2fs command line to optimize for larger files.
If you haven't thought about it yet, I'd also suggest raid5 rather than raid0. It's worth the extra expense to be able to recover from a dying drive (which will happen, sooner or later).
Free Software: Like love, it grows best when given away.
Look at this, "Six Sigma Security Inc." is an authorized sales agent for citywatcher.com...
http://www.sixsigmasecurity.u
Sounds like a bunch of hooey to me.
Around 1997, I discovered the magic of mpeg-layer3. I hung out in #mpeg3 on effnet and was part of what was probably the first ever mp3 trading circle. An acquaintance of mine had a CD of the rare Nirvana/Jesus Lizard single, which had Nirvana's "Oh The Guilt" on it. I borrowed it from him, ripped it to wave, encoded it as a 256KB mp3 and returned the CD. Over the next year or so, quite a few people nabbed the song from me during normal trading sessions in #mpeg3. Sometime later I made a boo-boo and lost a folder permanently, and one of the files in it was that song. I was bummed, as the person I borrowed the CD from was gone and the CD was long out of print and cost a lot of money if you happened to find a copy. I forgot about it.
Quite a few years later - I think ~2002 - I was on some p2p app, typed in "Oh the Guilt" and got a hit. I downloaded it, and it was a 256KB mp3 of the song. The file modification date was in 1997, and the tags were typed in exactly the way I would have put them if I had encoded the song. I can't prove it, but I'm pretty sure I got my file back.
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
Since nearly all modern drives remap bad sectors automagically unless they're in truly pitiful shape, I'd check my cabling, connectors, termination (depending on the storage hardware platform you're using). I/O errors are not the result of inappropriateness of a filesystem for a given task, they're the result of lost data long before your filesystem ever gets a chance to f*ck it up.
STOP . AMERICA . NOW
You're saying that data recovery of journaling filesystems is worse than that of non-journaling ones? What is it that you know and that hundreds of ReiserFS, ext3, NTFS and XFS programmers don't?
Beyond that, I'd say pretty much anything will work fine -- most of the optimizations found in filesystems are needed for lots of small files, not a few large files. For large files, the speeds they can be accessed by various filesystems are not likely to vary more than a few percent unless you let the files get fragmented (which probably isn't a big concern here.)
And you are right -- if something does go wrong, ext2 or ext3 will probably give you the most options for recovering it. NTFS probably has even more recovery options (and FAT even more, as mentioned), but I'm guessing the OS will be *nix. But really, if your goal is reliability, you don't want some esoteric filesystem that can recover from disk errors (because ultimately, none can, though I guess one could be designed to keep ECC codes on the same disk transparently -- but I'm aware of no such filesystem existing) -- you want multiple copies of your data. Keeping 5-10% (or more) par2 files for your archive can help a lot in recovering it if your media goes partially bad, and having md5sums or CRC32s of all archived files can help determine if you did recover something accurately, but really there's little substitute for multiple copies of important data in multiple geographical locations. (And no -- RAID is not a substitute for backups, no matter how many mirrored drives you have. Not that I saw anybody suggest this yet, but it seems to always come up in response to questions like this, so consider this to be a preemptive mention of that.)
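The checksum half of that advice is easy to script. A minimal sketch in Python follows; the function names are hypothetical, and this covers only the "did I recover it accurately" check, not the par2 repair data, which needs a separate tool.

```python
import hashlib
import os

def file_md5(path, chunk_size=1024 * 1024):
    """MD5 of a file, read in chunks so multi-gigabyte files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest

def verify(root, manifest):
    """Return the sorted relative paths that are missing or have changed."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_md5(full) != digest:
            bad.append(rel)
    return sorted(bad)
```

Build the manifest once when the archive is written, store a copy of it somewhere other than the archive disk, and run `verify` after any restore or migration.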
I can't seem to make a growable RAID 5 configuration; at least not growable in a useful way.
My plan was to migrate the small striped set to a larger set that included the old drives by making a raid set of the new drives, copy the data, then add drives (I would call that 'Horizontal'); or move all the data to the tops of the drives, and make a raid set of the lower segments of all the drives, and expand the segments. ('Vertical')
I'm up to using EVMS, but no useful Expand options appear to be available for the RAID4/5 MD region manager; all I can do is concatenate space to the array, not expand the array itself. Except I once got it to add and resync a 'drive' to the array, making the capacity of the array larger... except that only happened when I tried making the raid area only the first 100 gigs of each drive, and grow the segment size uniformly... instead of growing the segments, it added a link region of the empty space on the drives as if it were a single separate drive...
Ext2fs mounted with the 'sync' option.
For large sequential writes, nothing could possibly be more reliable or any faster. Your hard drive's pure IO speed will be the bottleneck unless you are writing to multiple files simultaneously, in which case fancy filesystems come in handy.
If that doesn't suit your needs, you haven't described them well enough for anyone to understand.
I feel hungry.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
You're saying that data recovery of journaling filesystems is worse than that of non-journaling ones?
I did say on a badly damaged filesystem. Simple filesystems tend to write files out in larger contiguous segments, which makes them worlds easier to recover when something utterly trashes your filesystem.
Yes, for typical day-to-day power loss type filesystem damage, journaling is great, but if I'm having to try to recover data from a filesystem that's lost 50% of its bits, I want it to be ext2 or fat.
What's your complaint with NTFS, other than that it is closed-source? Are there any reliable comparisons of NTFS with traditional open-source alternatives?
If you don't know where you are going, you will wind up somewhere else.
At the risk of sounding silly, I will suggest HFS+, Journaled (but without ACL's). This, of course, requires that you use Mac OS X. However, some very good disk repair utilities exist for Mac OS X. As far as performance is concerned, it should be capable of easily handling uncompressed NTSC video without dropouts in a proper configuration, so I can't see that it would be a problem for storing DVD's.
You don't need all that razzmatazz. If you lose your pr0n, just download it from bittorrent again.
I would retry XFS, I've been using it for decades to serve video files, ISOs and other large files with no problems...if you end up using XFS, make sure you have a UPS - though you say that your application is write-once-read-many, so XFS caching writes isn't a problem - and XFS has aggressive caching that will increase performance and extend drive life by reducing load on the disk drives.
...I'm no expert, but I feel compelled to give my opinion.
Consider making your files{,ystem} read-only, and only writing when you need to add something. That'll protect (somewhat) you against problems when you update your kernel (not that I've ever seen any of those). Don't feel obliged to constantly update your OS to the latest version...stay a few versions behind and try to find out if anyone else has any problems before you update.
Also, consider using RAID1 to backup your data - disks are cheap, RAID1 will automatically copy your filesystem to another device, and many controllers allow drives to be hot-unplugged these days so you can use the redundant devices as a backup. With s/w RAID1, you can even use a single cheap, big, slow drive to mirror another faster RAID array, then unplug it and store it off-site somewhere.
Yes, I'd choose to get XFS working.
Not quite automagically - every time I've had a bad block develop, the drive has only remapped it on writing to it. Reading just returns various types of error (makes sense - if you're trying to recover the block in question, it might read successfully one time in a thousand, then you can write it back, and all is well again). I'm pretty sure that SCSI can return warnings that a block was hard to read, allowing the OS to rewrite the block. Evidence I've seen suggests *BSD makes use of this approach to try and pre-empt failed blocks.
ext2/3 will use triple indirect addressing to reach blocks when the file size reaches a few gigabytes. If performance is more important than stability I would go with a filesystem that supports extents - like reiserfs or jfs - both of which are reasonably stable.
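To see where "a few gigabytes" comes from, here is a small worked calculation in Python. It assumes the classic ext2 inode layout: 12 direct block pointers, then one single-, one double- and one triple-indirect pointer, with 4-byte block numbers. The function name is made up for this example.

```python
def indirect_thresholds(block_size=4096, direct=12, pointer_size=4):
    """Largest file size (in bytes) reachable at each addressing level."""
    per_block = block_size // pointer_size          # pointers per indirect block
    direct_limit = direct * block_size
    single_limit = direct_limit + per_block * block_size
    double_limit = single_limit + per_block ** 2 * block_size
    triple_limit = double_limit + per_block ** 3 * block_size
    return {
        "direct": direct_limit,
        "single": single_limit,
        "double": double_limit,   # files larger than this need triple indirection
        "triple": triple_limit,
    }

for bs in (1024, 4096):
    limits = indirect_thresholds(bs)
    print(bs, limits["double"])
# 1024 67383296      (about 64 MiB)
# 4096 4299210752    (about 4 GiB)
```

So with 4 KiB blocks, a file only needs triple indirection once it passes roughly 4 GiB, which a DVD image can just about do; with 1 KiB blocks it happens at around 64 MiB.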
Nothing, the programmers will tell you that themselves. Journalled filesystems aren't for protecting your files. They're for protecting your filesystems.
The point of a journal is that you can roll back to a consistent state of the filesystem easily in case of error -- not that you can roll back to a consistent state for a given file (or indeed any file on the filesystem). In point of fact, it's usually more difficult to recover actual data from a journalled filesystem than a traditional one, because the writing process is much more complex. What's more, if an automatic fsck is needed, it's actually a little more likely to lose some data on a journalled filesystem because a non-journal filesystem recovers based on the found files while a journalled one recovers based on a separately recorded journal (you do put your journals on a separate block device from the filesystem, right? Otherwise you're mostly wasting your effort.)
The main point of a journalled filesystem is that when both redundant circuits in your datacenter go at once (which happens) the boxes will be able to at least boot and get through the automatic fscks without a tech needing to drive out there and run the fscks himself.
All's true that is mistrusted
Why does it require OS X? Linux deals well with HFS+ in my personal experience.
You're right about the tools though. DiskWarrior is *godlike*. I cannot believe the stuff it has pulled off... stuff that made fsck cry.
If you need WORM, it will usually be for compliance purposes. And if you need it for compliance purposes, the choice of filesystem simply isn't yours. Netapp, EMC, IBM and others all provide out-of-the-box WORM solutions for not too much money that are certified WORM devices - i.e. you can turn around to your local regulator with a piece of paper in your hand that proves your WORM is really WORM. From my experience with Netapp WORM devices, you are going to be stuck with XFS - incidentally, I spent a good portion of my time yesterday trying to recover from a broken XFS filesystem on a Netapp device, 4 days after our support contract ran out (isn't it always like that?) - trying to recover an XFS filesystem is like taking sandpaper to the platters......
People who think they know everything are a great annoyance to those of us who do.
Well, I guess the best filesystem for this kind of stuff is ISO9660. Very optimized for WORM access, no file fragmentation, and anything new you write will not destroy any existing data. Guaranteed.
The only WORM optimized FS that I know of is UDF.
Of course you mention several FS' that have no support for WORM so I'm doubtful if you are serious about using WORM media?
UDF is an open international standard that supports writing data to ROM/WORM/WMRM and sequential access only media (tape). It is completely optimized to support journaling/versioning through the properties of WORM.
DVD's use a ROM version of UDF.
UDF can also exploit the UDF on DVD's to create a hybrid FS of the DVD's on a UDF encoded WORM media that points to the various files on the UDF DVD disc filesystems and then allows versioning of those files on the WORM storage ie you can make updates to the ROM files via the WORM media.
If you were serious about using WORM you should check it out.
For all my WORM disks, I use either ISO 9660 or ISO 9660/UDF bridge format.
Yeah, I simply burn CD-Rs or DVDs. DVDs have the nice property of being easily stored off-site. And files are in nice large contiguous blocks, so even if the filesystem dies you can still recover a lot. Unlike XJFReiFS 2.3.1.5, DVDs will be readable in 50 years' time.
And if you need to burn really large files, just use, well, zip. And perhaps some par2 files.
Though, seriously, they're coming up with a UDF variant for hard drives too.
SCO employee? Check out the bounty
XFS is very reliable and I trust it with my data.
Man, this can't be over-emphasized. I switched a few dozen TB of data from Ext3 to XFS and my once-in-every-couple-of-months corruption problems just went away magically.
People love to do laundry lists with filesystem features, but "peace of mind" is the one item that XFS brings that the others don't.
Seeing as the POSIX API isn't transactional, data journalling doesn't really guarantee a whole lot about the integrity of your data. It only really guarantees that your files won't contain "random" data after a crash (e.g. data from other files). It certainly cannot guarantee that data will still be readable by your application in any meaningful way -- in many (most?) cases a partially updated file is as useless as a completely garbled one.
First of all, you shouldn't worry about losing data while "shifting it around". The source data shouldn't be unlinked until the operation has completed. A "sync" mounted filesystem on a drive without write caching should guarantee that the data has actually been written. Then the biggest threat to your data is going to be drive failure, which can be lessened by the use of RAID5.
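The "don't unlink the source until the copy is done" rule can be written down as a short Python sketch. `safe_move` is a hypothetical helper, not a standard library function. Note that the read-back check may be served from the page cache rather than the platters, so it catches copy errors but is not proof that the bits are on disk.

```python
import hashlib
import os
import shutil

def _sha256(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def safe_move(src, dst):
    """Copy src to dst, force it to disk, verify it, and only then delete src."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1024 * 1024)
        fout.flush()
        os.fsync(fout.fileno())   # ask the OS to push the data to the device
    if _sha256(src) != _sha256(dst):
        # Leave the source alone; the caller can retry or investigate.
        raise OSError(f"copy of {src} does not match; source left in place")
    os.remove(src)
```

If the machine dies at any point before the final line, the original file is still there, which is the whole point.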
As for which filesystem, I would humbly suggest the tried-and-true time-tested UFS. I don't think UDF is really what you want, because it sounds more like you are writing data and modifying it "infrequently" instead of just writing it once.
(I'm assuming software RAID here, obviously.)
Use LVM on top of MD/RAID. When enough of your physical drives have the requisite extra free space you can just create a *new* MD volume from the free space and add that to your logical volume. Example: If you had a 100GB drive with 1 partition in the RAID and replace it with a 150GB drive, say, you'd just allocate 100GB to 1 partition and allocate 50GB to a second partition, readding the first partition into the old RAID volume and waiting for it to rebuild. Do this for all your new drives. When enough of your drives have unused 50GB partitions you just create a new MD/RAID volume out of those and add that MD volume to your LV(s). Simple, really.
HAND.
The only drawbacks are that you have to read the entire partition sequentially to find things, and you can't delete files. Both of these can be fixed with a bit of Perl. Write a program that maintains an index of offsets to the files, then you can use "dd" to skip to the correct offset and read from there. More dangerously, write a program that deletes files from the middle of an archive and shuffles everything backwards to fill in the gaps. You'll want to make sure that no one is trying to read the TAR partition while this is running.
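The offset index is only a few lines; here is a sketch in Python rather than Perl, using the standard `tarfile` module. It assumes an uncompressed tar, and the function names are invented for the example. The path can be a regular .tar file or, presumably, the raw partition device holding the archive.

```python
import tarfile

def build_index(archive_path):
    """Map member name -> (byte offset of its data, size in bytes)."""
    index = {}
    # "r:" means plain, uncompressed tar; offsets are meaningless otherwise.
    with tarfile.open(archive_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(archive_path, index, name):
    """Fetch one file by seeking straight to its data, skipping the scan."""
    offset, size = index[name]
    with open(archive_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Building the index still costs one sequential pass over the headers, so save it somewhere (and back it up); after that every lookup is a single seek, the same thing the "dd skip" trick does.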
Nothing for 6-digit uids?
I can't see why it would matter one whit what file system you use. The only problem you list is ReiserFS being too slow (which I doubt, but anyway...). Is speed the worry?? Well, just pick a bigger block size. Make a 64kb block size or something, and just go with whatever file system you would normally use.
I tried using SGI's XFS due to its promising details, but I was met with countless IO errors after trying to write large amounts of data to it.
Whoops, this is just a troll article. Sorry, nothing to see here.
I'm willing to bet, based on the logical inconsistencies of your post, that you are infringing on copyright by downloading movies, music, and other entertainment from various internet sources.
I could be wrong, but I doubt it.
Slashdot - where whining about luck is the new way to make the world you want.
I know, I know, *BSD is dying and all, but it's still silly to leave out UFS :)
The number of inodes in a filesystem is an option that can only be specified upon filesystem creation. i.e. to mke2fs, not tune2fs.
retrorocket.o not found, launch anyway?
What you're looking for is Universal Disk Format or UDF.
It is an open standard supported by all of the major OSes and manufacturers and is the filesystem of choice for Ultra Density Optical WORM and rewritable disks.
There are drivers for Linux, Windows and all of the major UNIXes. Here is the obligatory Wikipedia entry.
Hard disk filesystems like XFS, JFS, Reiser, ZFS etc. are all wonderful at what they do but they are unsuitable for WORM disks.
Stick Men
Eh, if you read the various forums for Mac these days, the fix for most problems is always suggested to be: fsck, then fix permissions. I've yet to see another OS that needs a 'fix permissions' tool at all. Oh, and HFS+ has lost me important system files. This with the most recent version of Panther. It doesn't impress me at all.
A drive will not remap on a failed read. It will remap on a read that was successful but almost unsuccessful; it will also remap on a write that was unsuccessful.
Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
The Lucene/Nutch guys have a port that's usable for some apps.
Since F/OSS ports exist, it seems it was very insightful, and if there was something funny about it, I missed it.
Others have said good things in general (XFS,JFS,ext3).
I looked into filesystem comparisons in setting up a MythTV box.
My issues were:
(1) efficient use of hard drive space, and
(2) performance.
Efficient use = filesystem settings have a big effect on amount of usable space.
For ext2/3:
-m 0 = setting 'reserved space for root' to 0%. Default is 5%, which can be 10-20 GB these days, all unusable to non-root users
-T ____ = can tell ext2/3 to optimize inodes and byte-per-inode for different size average files. Largefile versus news spools (tons of small files). Because of the way that a file can be spread out and mapped across the filesystem, this has an effect on 'wasted' space, and maybe performance (# of inode entries per file to lookup).
-b, -i, -N = can set block size, bytes-per-inode, and total # of inodes directly. Advanced control over filesystem creation
I never got around to looking into this detail for XFS/JFS - they seem to have fewer such options.
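To put rough numbers on the ext2/3 settings above, here is a small Python sketch of the arithmetic. The bytes-per-inode ratios (8192 for the default, 1 MiB for '-T largefile', 4 MiB for '-T largefile4') and the 128-byte inode size are my assumptions about typical mke2fs defaults of that era; check your own mke2fs.conf before trusting them. It also ignores journal and block-group overhead.

```python
GIB = 1024 ** 3

# Assumed bytes-per-inode values; the real ones come from your mke2fs.conf.
INODE_RATIOS = {
    "default": 8192,
    "largefile": 1024 ** 2,
    "largefile4": 4 * 1024 ** 2,
}


def reserved_bytes(fs_bytes, reserved_percent=5):
    """Space held back for root, as controlled by the -m option."""
    return fs_bytes * reserved_percent // 100


def inode_table_bytes(fs_bytes, bytes_per_inode, inode_size=128):
    """Approximate space taken by inode tables: one inode per bytes_per_inode."""
    return (fs_bytes // bytes_per_inode) * inode_size


if __name__ == "__main__":
    fs = 300 * GIB  # hypothetical partition size
    print("reserved at -m 5:", reserved_bytes(fs) / GIB, "GiB")
    for name, ratio in INODE_RATIOS.items():
        print(name, "inode tables:", inode_table_bytes(fs, ratio) / GIB, "GiB")
```

On a 300 GiB partition the default 5% reserve works out to 15 GiB, which is in line with the 10-20 GB figure above.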
Performance: I'll leave it to others to talk about filesystem performance with large files in general.
MythTV takes a lot of writing, and as it turns out, deleting, of large temporary files for the TV features (records, pause, FF/RR). After some reading online, I've found MythTV performance is drastically impacted by filesystem choice due to all of the deleting.
http://www.mythtv.org/docs/mythtv-HOWTO-24.html#ss24.2
http://www.gossamer-threads.com/lists/mythtv/users/52672
---SNIP---
> My last reply to myself. Based on a Googled reference, I was able to
> break my XFS 4G file size barrier by formatting the partition 'mkfs.xfs
> -dagsize=4g'. So, here are the complete results:
>
> Time to delete a 10G file, fastest to slowest:
>
> JFS: 0.9s, 0.9s
> XFS: 1.3s
> EXT3: 1.4s, 2.3s
> EXT2: 1.6s
> REISERFS: 6.2s
> EXT3 -T largefile4: 5.9s, 10.2s
>
> After running the XFS test, there didn't seem to be any point in
> reformatting the partition again, so I left it on XFS, but I think I
> would be happy with JFS, XFS, or EXT3 w/o '-T largefile4'.
>>>>
wepprop at sbcglobal
Feb 8, 2004, 2:33 AM
Post #21 of 22 (4121 views)
Re: Changing filesystems? [In reply to]
Robert Kulagowski wrote:
> Interesting. If others care to weigh in, I can either re-write the
> "Advanced Partitioning" section in the HOWTO, or whack it completely.
>
> William, can you give some background on the hardware used for your
> tests? I'd be curious if this data holds up across various drive types,
> LVM, etc. (Without trying to exhaustively test all the possibilities,
> that is)
It appears, based on my personal experience alone, that file deletes are
the only system operations that can stress the hard drive enough to
produce dropped frames. Unfortunately, as others have pointed out,
recordings and deletions go together in Myth. So, unusual as it may be,
it does make at least some sense to take file deletion performance into
account when deciding which filesystem to use for a video partition,
especially for people with multiple tuners.
The really ironic result from my personal perspective is that it would
appear that using the '-T largefile4' setting for ext3, which I was so
pleased with because it gave me an extra 2G of storage, may well have
been responsible for all those recordings I had ruined by frame drops.
Assuming it works out, though, I could really get to like this XFS
filesystem because it appears to give me slightly more storage space
than ext3 w/ '-T largefile4' did and it has pretty fast deletes as well.
---SNIP---
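If you want to repeat the delete test quoted above on your own hardware, something like this Python sketch would do. It is not the script used in the mailing list post (that wasn't shown); the function name and the mount point in the usage note are made up for illustration. It writes real data rather than a sparse file, forces it to disk, and times only the removal.

```python
import os
import tempfile
import time


def time_delete(directory, size_bytes, chunk_bytes=1024 * 1024):
    """Write a file of size_bytes in directory, sync it, and time its removal."""
    fd, path = tempfile.mkstemp(dir=directory)
    chunk = b"\0" * chunk_bytes
    with os.fdopen(fd, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f.write(chunk[:n])  # actually write the bytes so blocks get allocated
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data has reached the disk first
    start = time.perf_counter()
    os.remove(path)
    return time.perf_counter() - start
```

For a 10G test on a hypothetical video partition you would call time_delete("/mnt/video", 10 * 1024 ** 3). Run it a few times; the quoted numbers show the same filesystem varying quite a bit between runs.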
Just another vote of confidence for XFS.
In a scientific application, I stream data from ADC boards to an XFS partition at a rate of 40-60 MB/s (depending on hard drive and write location). The data goes into a single file which eventually fills the partition (typically 250-300 GB) in a few hours. The data is then scanned through a few dozen times during analysis. No XFS problems yet.
I also experimented a bit with ReiserFS and JFS before deciding to stick with XFS.
You're obviously looking for a filesystem optimized for porn. I'm impressed that you've managed to accumulate hundreds of gigs of the stuff. Perhaps there is a Porn File System out there somewhere?
Honestly, since I submitted this about a couple months ago, I just formatted the disks to ext3 and it's worked quite well since then, but any better ideas are always welcome.
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
> usually backed up from other mediums such as CDs and DVDs
This is a very euphemistic way of saying:
"I download moviez, mp3 and porn via P2P all day and even though I usually don't view any movie twice, I still don't want to throw away anything, because I just can't delete anything".
How that could get an "ask slashdot"-posting is left as an exercise to the reader.
Windows 2000 - from the guys who brought us edlin
I tried the ZFS beta on my 2TB raid array and it sucked ass big time. Combined with the myriad problems I had getting OpenSolaris to work properly without crashing I've since switched back to FreeBSD.
But, after trying just about every FS under the sun for my backups, on Linux, FreeBSD, and OS X, I finally settled on Mac HFS+ with journaling and case-sensitivity enabled. I have a 900GB RAID with it on it, and I'm storing some files that are 7GB+. I haven't had any issues with it at all.
Yep, it means you will probably need a mac, but Linux does have HFS support (I don't know how good it is). But everything is working out great, and supposedly has some sort of auto-defrag, but I'm too lazy to actually verify this.
Need Free Juniper/NetScreen Support? JuniperForum
Fix permissions has little to do with the file-system. Yes, it's dumb, but not because of the file-system. You need to fix-permissions when apps (Carbon-based usually) that were written without awareness of FS permissions (since they were sitting on an OS whose FS didn't have fs permissions) update important files that need to be read by a host of programs. When these ignorant apps write these files with their default umasks, it wreaks all sorts of havoc. You don't repair fs permissions because the filesystem loses permissions from time to time (since it doesn't), you do it because some programs are not forwards-compatible and inadvertently change the permissions.
Why not fork?
AIX ran on mainframe from 1990 to 1993.
Linux IS an OS and thus AIX does not run ON Linux.
These inaccuracies bring your post into question.
I see a lot of people advocating XFS. Which distro supports XFS the best?
It's a commercial product from Siemens, which I used years ago for Sietec's large-scale imaging product.
There is probably a Linux port: We ran it on almost everything in existence (;-))
--dave
davecb@spamcop.net
Journaled Universal Keeper for Entries of Binary Object eXtensiblly, or JUKEBOX for short.
No, NTFS with its mystical magical spare set of data, somehow manages to stay dead. It's garbage, the only upside is it dies a lot less than ext2/3.
I only wish I were trolling... as so much of my lost data would attest if it weren't lost.
For WORM-oriented storage Venti is really good; for the details see the paper "Venti: a new approach to archival storage".
It was originally developed for Plan 9 as the replacement of the dedicated fs kernel that used WORM jukeboxes, but now you can use it under Linux, BSD and other Unix systems because it's part of the "Plan 9 from User Space" port of Plan 9 tools to Unix systems.
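The core idea, as I understand it from the paper, is that each block is addressed by the SHA-1 hash of its contents (the paper calls this the block's score), so a block can be written once and never changed. A toy in-memory Python sketch of that idea follows; the class and method names are made up, and this is in no way the real Venti protocol or on-disk format.

```python
import hashlib


class WriteOnceStore:
    """Toy block store addressed by the SHA-1 of each block's content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Storing identical content again is a no-op; existing blocks are never overwritten.
        self._blocks.setdefault(score, data)
        return score

    def get(self, score: str) -> bytes:
        data = self._blocks[score]
        # The address doubles as an integrity check on every read.
        if hashlib.sha1(data).hexdigest() != score:
            raise ValueError("block does not match its score")
        return data
```

Because the address is derived from the content, duplicate data is stored once and corruption shows up on read, which is what makes the approach attractive for archival use.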
"When in doubt, use brute force." Ken Thompson
I have also been most impressed by XFS *except* for one very annoying issue that makes me not trust it. When editing a file and the machine goes down, the contents of that file are zero-padded, and not at the place of editing or such, but the file gets filled with zeros. Any known fixes or resolution for this? Cause?
> What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around.
Erm, Write Once Read Many would imply data isn't shifted, period.
> I feel that Ext3 is not optimal for this
You don't provide a reason for dismissing it out of hand. It's a nice solid mature filesystem (thanks to being ext2 + additional features) that's widely supported. And if you want to use it for large files, you just need to tune it appropriately, either by manually increasing block size and bytes per inode, or by using the -T flag to pick a preset as one of the other posters suggested.
I'm curious about the issues reported by some of the other posts. I've been dealing with terabytes of data across hundreds of filesystems, including data with high turnover (e.g. mailspools, log servers, etc) with no data loss that wasn't attributable to hardware, like RAID controllers without battery backup that were left in write-back mode. I don't care what filesystem you're running, it won't be able to recover data that was in volatile cache during a power event.
ZFS has 128 bit file tables. To write a file that would take up that much space to a hard drive would require more energy than has reached earth from the sun in the last few eons.
Why not take a look at BFS (or BeFS, depending who you ask)? It supports several petabytes of data, and it is specifically designed to handle large media files. Journaling is built in, as well as the handy metadata database (like Spotlight), if your OS can support that feature.
Of course, I can't guarantee there's a filesystem driver available for whatever OS you may be running, though the internet shows many hits for "bfs linux". You may be able to find something about that from haiku-os.org, or elsewhere on the internet.
Most of the performance numbers I've seen out there unfortunately don't take into account parallel accesses to large files by highly concurrent processes--the exact same kind of access patterns that enterprise-scale storage requirements demand. Instead, they concentrate on single-thread access patterns. This seems to be unrealistic.
Given enterprise-class accesses (extremely large files (hundreds of GB,) hundreds or thousands of accesses to those files per second, and tens or hundreds of processes accessing them simultaneously) XFS appears to excel. In terms of delete performance, XFS appears to do well there also.
I personally know of dozens of *large* companies and financial institutions who trust nothing *but* XFS as their FS of choice.
Your I/O errors indicate other issues, and are more likely not related to the design of the FS.
Make mine XFS. I just wish XFS were extended out to the *BSDs.
Please remain where you are. The RIAA will be arriving shortly to collect a copyright fee (99 Cents) and a convenience fee ($2000). There may also be unspecified interest and late charges.
www.wavefront-av.com
I found one of the Fedora Core kernels had problems with using external USB devices, MD (RAID 1 and 5) and XFS. One of the drives would go offline and corrupt the XFS file system.
If your database can't handle recovery from a snapshot, then it's junk. Get a new DB.
The POINT of all those transaction log files, DB journals, and all the other things that make ACID guarantees real, is to recover from failure. A filesystem snapshot looks exactly like a system that just had a power loss. Your database center may have redundant power backups and all that shiznit, but I bet the datacenter still has a Big Red Button, so your DB had better be able to do recovery ANYWAY.
Some DB backup tools are designed to work with filesystem snapshots. They create a fast recovery point and stop updating the main DB files until after the snapshot. Otherwise the DB has to do basically the same thing as a snapshot, holding open an hours long transaction while doing the backup. When the filesystem or block layer can do a better, more efficient job than the DB, it makes more sense to let it take the load.
Just trying to make you see that snapshots exist for good reason, they do the job very well, and you should look into using the feature; some day you might need it.
[completely offtopic]
I always caught shit from other mp3'ers back in the late 90's because of my 'huge' 256kbps songs. People that would download from me would frequently complain that my files were too big and that there was no use encoding them at bitrates that high because "128kbps was already CD Quality".
It was also really easy to start flamewars by bringing up the topic. You could just go into an mp3 IRC channel, make an offhand comment like "128kbps mp3 files sound like crap; 192-256 is really needed to approach true CD Quality", and people would immediately start arguing with you - probably in a subconscious effort to justify the fact that they had spent the last three months encoding their entire CD collection at 128kbps.
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
XFS shouldn't barf like that. I've run terabytes of data through it without a single hiccup. Currently I've about a terabyte of data on my home system. Even perfect software can't handle intermittent hardware failures, just warn you about them. I'd get your system checked out.
Fellowship 9/11
Well my preferred file system is this...
I keep my CDs in a large loose leaf folder. I keep my vinyl on top of the cupboard. I buy Cassette Tapes at the cheapest price I can. My partner records tapes of whatever we want for travelling. My books are semi-randomly arranged on shelves.
The web is my storage for text files and I use spreadsheets or programming languages to solve problems.
rgds,
Richard Rothwell
"All that is required for evil to triumph is that the good keep silent"
If it's really a WORM style of doing things I would skip a regular filesystem completely and go with the simplest thing available, which would be good old tar files. They are just file headers and raw data, so there is not much that can go wrong in terms of filesystem integrity, and even if it does they are reasonably easy to recover. They don't come with any built-in way to validate them, but that could probably be added on top. There should be a tarfs floating around somewhere that would allow them to be mounted like a normal filesystem.
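As for bolting validation on top, one way (just a sketch, not an established tool) is to keep a checksum manifest next to each archive and re-check it from time to time. The Python below uses only the standard tarfile and hashlib modules; the function names are my own.

```python
import hashlib
import tarfile


def sha256_of(fileobj, chunk_bytes=1024 * 1024):
    """Hash a file-like object in chunks so large members don't fill memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: fileobj.read(chunk_bytes), b""):
        digest.update(block)
    return digest.hexdigest()


def build_manifest(tar_path):
    """Return {member name: sha256 hex digest} for every regular file in the archive."""
    manifest = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile():
                manifest[member.name] = sha256_of(tar.extractfile(member))
    return manifest


def verify(tar_path, manifest):
    """Return names from the manifest that are missing or whose digest has changed."""
    current = build_manifest(tar_path)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

Build the manifest once when the archive is written, store it somewhere else, and an empty list from verify() later means every file still matches.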
Using Sun's QFS - which is a SAN optimised file system, it's great for storing large and small files - for instance disc images and readmes. You can have a variable cluster size, where the first n blocks of a file are a small number of blocks, like say 1k, and then the rest of the file is stored in clusters of up to 64M.
Oh, you wanted something free?
Then give ZFS a go - it's free and is available in Solaris and OpenSolaris - which you can run on both SPARC and x86.
Specialist Mac support for creative pros, Melbourne