On the State of Linux File Systems

Posted by kdawson on Saturday November 29, 2008 @08:45AM from the here-hold-this-for-me dept.

kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."

still doing fs on top of RAID :-( by r00t · 2008-11-29 09:27 · Score: 5, Interesting

We're checksumming free disk space. That's dumb. It makes RAID rebuilds needlessly slow.
We're unable to adjust redundancy according to the value that we place on our data. Everything from the root directory to the access time stamps gets the same level of redundancy.
The on-disk structure of RAID (the lack of it!) prevents reasonable recovery. We can handle a disk that disappears, but not one that gets some blocks corrupted. We can't even detect it in normal use; that requires reading all disks.
We have extremely limited transactional ability. All we get for transactions is a write barrier.
There is no way to map from RAID troubles (not that we'd detect them) to higher-level structures.
With an integrated system, we could do so much better. Sadly, it's blocked by an odd sort of kernel politics. Radical change is hard. Giving up the simplicity of a layered approach is hard, even when it is obviously inferior. There is this idea that every new kernel component has to fit into the existing mold, even if the mold is defective.
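The point about corrupted blocks can be illustrated with a toy stripe. This is only a sketch under stated assumptions: one RAID-5-style stripe of three data blocks and one XOR parity block, plus a hypothetical table of per-block CRC32 checksums of the kind an integrated filesystem might keep. The names `xor_blocks`, `rebuild`, and `checksums` are made up for illustration and do not come from any real RAID or filesystem implementation.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(blocks, parity, missing):
    """Recompute the block at index `missing` from the other blocks and parity."""
    others = [block for i, block in enumerate(blocks) if i != missing]
    return xor_blocks(others + [parity])


# One stripe: three data blocks and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Assumption: per-block checksums stored apart from the blocks themselves.
checksums = [zlib.crc32(block) for block in data]

# Silent corruption: disk 1 still answers, but returns wrong bytes.
on_disk = [data[0], b"BxBB", data[2]]

# Parity alone: the mismatch shows up only after reading every block in the
# stripe, and it does not say which block is the bad one.
stripe_is_consistent = xor_blocks(on_disk) == parity

# With checksums: the bad block is identified directly, then rebuilt from
# the remaining blocks and parity.
bad = [i for i, block in enumerate(on_disk) if zlib.crc32(block) != checksums[i]]
repaired = rebuild(on_disk, parity, bad[0])
```

In this toy case the parity check only reports that the stripe is inconsistent, while the checksum table points at block 1, which can then be rebuilt the same way a missing disk would be.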
Re:Reiser4 by Ant P. · 2008-11-29 10:55 · Score: 5, Interesting

Reiser4 is still being maintained, by one ex-Namesys person IIRC.
The main problem is the Linux kernel devs - they were too busy trying to find reasons to keep it out of the kernel (I can agree with their complaints about code formatting, but after that they descend deep into BS-land) to actually improve it. From the outside it sounds a lot like the story about the RSDL scheduler - completely snubbed because it stepped on the toes of one kernel dev and his pet project.
Re:ZFS!! by Kent Recal · 2008-11-29 11:59 · Score: 5, Interesting

I hear you and I'm sure the filesystem developers have the same ideas in their heads.
The problem is that there are some really hard problems involved with these things.
In the end everybody wants basically the same thing: A volume that we can write files to.
This volume should live on a pool of physical disks to which we can add and remove disks at will and during runtime.
The unused space should always be used for redundancy, so when our volume is 50% full then we'd expect that 50% of the disks (no matter which) can safely fail at any time without data loss.
Furthermore we don't really want to care about any of these things. We just want to push physical disks into our server, or pull them, and the pool should grow/shrink automagically.
And of course we want to always be able to split a pool into more volumes, as long as there's free space in the pool we're splitting from. Ideally without losing redundancy in the process.
We want all these things and on top of that we want maximum IOPS and maximum linear read/write performance in any situation. Oh, and we won't really be happy until a pool can span multiple physical machines (that will auto re-sync after a network split and work as expected over really slow and unreliable networks), too.
ZFS is a huge step forward in many of these regards and there's a whole industry built solely around these problems.
Only time will tell which of these goals (and the ones that I omitted here) can really be achieved and how many of them can be addressed in a single filesystem.