On the State of Linux File Systems

← Back to Stories (view on slashdot.org)

On the State of Linux File Systems

Posted by kdawson on Saturday November 29, 2008 @08:45AM from the here-hold-this-for-me dept.

kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."

12 of 319 comments (clear)

The article is incorrect with respect to ext4... by tytso · 2008-11-29 08:49 · Score: 5, Informative

The article states that ext4 was a Bull project; and that is not correct.
The Bull developers are one of the companies involved with the ext4 development, but certainly by no means were they the primary contributers. A number of the key ext4 advancements, especially the extents work, was pioneered by the Clusterfs folks, who used it in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements which they supported commercially as an open source product); a number of their enhancements went on to become adopted as part of ext4. I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao form IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robuestly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4; that we have a large developer base, from many different companies. I believe that this wide base of developer is support is one of the reasons why ext3 was more succesful, than say, JFS or XFS, which had a much smaller base of developers, that were primarily from a single employer.
Re:what fs out there... by tytso · 2008-11-29 08:55 · Score: 5, Informative

Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
Re:The article is incorrect with respect to ext4.. by tytso · 2008-11-29 09:04 · Score: 5, Informative

Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/. The benchmarks were done in a scrupulously fair way; the exact hardware and software configurations used are given, and multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why if you want to do a fair comparison of filesystems, it is very difficult in the extreme to really do things right. You have to do multiple benchmarks, multiple workloads, multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.
As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the BTRFS numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.
Re:ZFS!! by TheRaven64 · 2008-11-29 09:15 · Score: 5, Funny

Sun has released it under a proper license and we can finally have 1 unified filesystem. The 'we' in this case being Solaris, Mac OS X, and FreeBSD users, of course.

--
I am TheRaven on Soylent News
still doing fs on top of RAID :-( by r00t · 2008-11-29 09:27 · Score: 5, Interesting

We're checksumming free disk space. That's dumb.
It makes RAID rebuilds needlessly slow.
We're unable to adjust redundancy according to
the value that we place on our data. Everything
from the root directory to the access time stamps
gets the same level of redundancy.
The on-disk structure of RAID (the lack of it!)
prevents reasonable recovery. We can handle a
disk that disappears, but not one that gets
some blocks corrupted. We can't even detect it
in normal use; that requires reading all disks.
We have extremely limited transactional ability.
All we get for transactions is a write barrier.
There is no way to map from RAID troubles (not
that we'd detect them) to higher-level structures.
With an integrated system, we could do so much
better. Sadly, it's blocked by an odd sort of
kernel politics. Radical change is hard. Giving
of the simplicity of a layered approach is hard,
even when obviously inferior. There is this idea
that every new kernel component has to fit into
the existing mold, even if the mold is defective.
I propose a new file system.. by Minigun_Fiend · 2008-11-29 09:36 · Score: 5, Funny

..called TLDRFS It simply ignores any files larger than 64KB.
Re:Lightweight by Anonymous Coward · 2008-11-29 09:46 · Score: 5, Funny

Appologies accepted.
Anonymous Cow
Re:ZFS!! by diegocgteleline.es · 2008-11-29 09:53 · Score: 5, Insightful

ZFS has redefined the way future filesystems are going to be designed. But there is no way that it's going to be the "last" filesystem.
As shocking as it may seem to those who have drunk the marketing kool aid, we'll see more filesystems. Filesystem research is as alive as it always was. They'll try to copy the good ideas of ZFS and they will try to avoid the disadvantages (which every software has). So you are never going to have "1 unified filesystem". It's never going to happen. And it's a good thing.
Re:what fs out there... by Knuckles · 2008-11-29 09:54 · Score: 5, Funny

You seem very knowledgeable regarding filesystems in general.
Dude, it should have been a hint when tytso wrote,

I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream.

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Alllrig.... Oh. by Gazzonyx · 2008-11-29 09:55 · Score: 5, Funny

Never have I been so happy and so angry in such a short period of time. I salute you, yet still shake my fist angrily in your general direction.

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Re:Reiser4 by Ant+P. · 2008-11-29 10:55 · Score: 5, Interesting

Reiser4 is still being maintained, by one ex-Namesys person IIRC.
The main problem is the Linux kernel devs - they were too busy trying to find reasons to keep it out of the kernel (I can agree with their complaints about code formatting, but after that they descend deep into BS-land) to actually improve it. From the outside it sounds a lot like the story about the RSDL scheduler - completely snubbed because it stepped on the toes of one kernel dev and his pet project.
Re:ZFS!! by Kent+Recal · 2008-11-29 11:59 · Score: 5, Interesting

I hear you and I'm sure the filesystem developers have the same ideas in their heads.
The problem is that there are some really hard problems involved with these things.
In the end everybody wants basically the same thing: A volume that we can write files to.
This volume should live on a pool of physical disks to which we can add and remove disks at will and during runtime.
The unused space should always be used for redundancy, so when our volume is 50% full then we'd expect that 50% of the disks (no matter which) can safely fail at any time without data loss.
Furthermore we don't really want to care about any of these things. We just want to push physical disks into our server, or pull them, and the pool should grow/shrink automagically.
And ofcourse we want to always be able to split a pool into more volumes, as long as there's free space in the pool we're splitting from. Ideally without losing redundancy in the process.
We want all these things and on top we want maximum IOPS and maximum linear read/write performance in any situation. Oh, and we won't really be happy until a pool can span multiple physical machines (that will auto re-sync after a network split and work-as-expected over really slow and unrealiable networks), too.
ZFS is a huge step forward in many of these regards and there's a whole industry built solely around these problems.
Only time will tell which of these goals (and the ones that I omitted here) can really be achieved and how many of them can be addressed in a single filesystem.