On the State of Linux File Systems

← Back to Stories (view on slashdot.org)

On the State of Linux File Systems

Posted by kdawson on Saturday November 29, 2008 @08:45AM from the here-hold-this-for-me dept.

kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."

3 of 319 comments (clear)

Min score:

Reason:

Sort:

The article is incorrect with respect to ext4... by tytso · 2008-11-29 08:49 · Score: 5, Informative

The article states that ext4 was a Bull project; and that is not correct.
The Bull developers are one of the companies involved with the ext4 development, but certainly by no means were they the primary contributers. A number of the key ext4 advancements, especially the extents work, was pioneered by the Clusterfs folks, who used it in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements which they supported commercially as an open source product); a number of their enhancements went on to become adopted as part of ext4. I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao form IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robuestly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4; that we have a large developer base, from many different companies. I believe that this wide base of developer is support is one of the reasons why ext3 was more succesful, than say, JFS or XFS, which had a much smaller base of developers, that were primarily from a single employer.
Re:what fs out there... by tytso · 2008-11-29 08:55 · Score: 5, Informative

Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
Re:The article is incorrect with respect to ext4.. by tytso · 2008-11-29 09:04 · Score: 5, Informative

Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/. The benchmarks were done in a scrupulously fair way; the exact hardware and software configurations used are given, and multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why if you want to do a fair comparison of filesystems, it is very difficult in the extreme to really do things right. You have to do multiple benchmarks, multiple workloads, multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.
As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the BTRFS numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.