Ask Slashdot: How Reliable are Enormous Filesystems in Linux?
Josh Beck submitted this
interesting question: "Hello. We're currently using
a Mylex PG card and a pile of disks to run a 120 GB RAID5 under
Linux. After some minor firmware issues with the Mylex
(which their tech support acknowledged and fixed right away!),
we've got a very stable filesystem with a good amount of
storage. My question, though, is how far will Linux and e2fs
go before something breaks? Is anyone currently using e2fs
and Linux to run a 500+ GB filesystem?"
Josh continues...
"I have plenty of faith
in Linux (over half our servers are Linux, most of the rest are
FreeBSD), but am concerned that few people have likely
attempted to use such a large FS under Linux...the fact
that our 120 GB FS takes something like 3 minutes to
mount is a bit curious as well, but hey, how often
do you reboot a Linux box?"
A journalled file system writes all of the proposed changes to control structures (superblock, directories, inodes) into a journalling area before making those writes to the actual filesystem, then removes them from the journal once they have been committed to disk. If the system goes down, you can get the disk back into a sane state by replaying the intention journal instead of checking every structure, so an fsck can take seconds instead of minutes (or hours).
For example, if you're going to unlink the last link to a file (aka delete the file), that involves an update to the directory, the inode, and the free list. If you're on a non-journalled system and the crash happens after only the directory was updated, you have a file with no link (see /lost+found); if only the directory and inode were updated, you have blocks missing from your free list. Both of these require scanning the whole disk in order to fix, but a journalled system would just redo the directory, inode, and free list updates from the journal and then it would be sane.
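The unlink example above can be sketched as a toy model in Python. This is only an illustration of the replay idea, not how e2fs or any real filesystem lays things out: the dictionary layout, the record names (`dir_remove`, `inode_free`, `blocks_free`) and the `crash_after` switch are all made up for the example.

```python
# Toy in-memory "filesystem"; every name here is hypothetical.
def make_fs():
    return {
        "directory": {"report.txt": 7},                  # name -> inode number
        "inodes": {7: {"links": 1, "blocks": [40, 41]}},
        "free_blocks": {42, 43},
        "journal": [],                                   # pending intent records
    }

def journal_unlink(fs, name):
    """Record every metadata change for the unlink before applying any of them."""
    ino = fs["directory"][name]
    blocks = list(fs["inodes"][ino]["blocks"])
    fs["journal"] = [
        ("dir_remove", name),
        ("inode_free", ino),
        ("blocks_free", blocks),
    ]

def apply_record(fs, record):
    """Apply one record. Each step is idempotent, so replaying it twice is harmless."""
    kind, arg = record
    if kind == "dir_remove":
        fs["directory"].pop(arg, None)
    elif kind == "inode_free":
        fs["inodes"].pop(arg, None)
    elif kind == "blocks_free":
        fs["free_blocks"].update(arg)

def unlink(fs, name, crash_after=None):
    """Journal the unlink, then apply it, optionally 'crashing' after N steps."""
    journal_unlink(fs, name)
    for step, record in enumerate(fs["journal"]):
        if crash_after is not None and step >= crash_after:
            return  # simulated power loss: the journal survives
        apply_record(fs, record)
    fs["journal"] = []  # all changes applied: retire the journal entries

def replay(fs):
    """Recovery after a crash: redo whatever the journal still lists."""
    for record in fs["journal"]:
        apply_record(fs, record)
    fs["journal"] = []

fs = make_fs()
unlink(fs, "report.txt", crash_after=1)  # only the directory got updated
replay(fs)                               # no full scan, just the journal
```

The point of the sketch is that recovery only walks the journal, whose size depends on the work in flight at crash time, not on the size of the disk.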
Problems with journalled filesystems include conflicts with caching systems (e.g., DPT controllers, RAID subsystems with cache) where the intention journal is not committed to physical disk before the writes to the filesystem commence.
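The ordering requirement behind that problem can be sketched from user space in Python. This is a rough analogy, not filesystem code: the file names and the `journalled_write` helper are invented for the example. `os.fsync` asks the operating system to push the data to the device, but a controller with its own write cache may acknowledge the write before it reaches the platters, which is exactly the hazard described above.

```python
import os
import tempfile

def journalled_write(journal_path, data_path, payload):
    """Hypothetical helper: journal first, in-place update second."""
    # Step 1: write the intent and force it out before touching the data.
    with open(journal_path, "wb") as journal:
        journal.write(payload)
        journal.flush()
        os.fsync(journal.fileno())  # request stable storage for the journal

    # Step 2: only now perform the in-place update.
    with open(data_path, "wb") as data:
        data.write(payload)
        data.flush()
        os.fsync(data.fileno())

    # Step 3: the update is on disk, so the journal entry can be retired.
    os.remove(journal_path)

workdir = tempfile.mkdtemp()
journal_path = os.path.join(workdir, "journal")
data_path = os.path.join(workdir, "data")
journalled_write(journal_path, data_path, b"new metadata")
```

If step 2 can reach the disk before step 1 does, a crash in between leaves a half-applied change with no journal record to replay, and you are back to a full scan.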