On the State of Linux File Systems

Posted by kdawson on Saturday November 29, 2008 @08:45AM from the here-hold-this-for-me dept.

kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."

The article is incorrect with respect to ext4... by tytso · 2008-11-29 08:49 · Score: 5, Informative

The article states that ext4 was a Bull project; and that is not correct.
Bull is one of the companies involved with ext4 development, but its developers were by no means the primary contributors. A number of the key ext4 advancements, especially the extents work, were pioneered by the Clusterfs folks, who used them in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements, which they supported commercially as an open source product); a number of their enhancements went on to be adopted as part of ext4. I was the e2fsprogs maintainer and, especially in the last year, as the most experienced upstream kernel developer, have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao from IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robustly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4: we have a large developer base, from many different companies. I believe that this wide base of developer support is one of the reasons why ext3 was more successful than, say, JFS or XFS, which had a much smaller base of developers who were primarily from a single employer.
Re:what fs out there... by tytso · 2008-11-29 08:55 · Score: 5, Informative

Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
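The arithmetic above can be checked with a short sketch. It assumes a 2^15-block limit per extent (taken from the figures in the comment) and a 12-byte extent descriptor, a size inferred here from the 341-per-4k figure. Like those figures, it ignores any per-block header, and the names are illustrative rather than kernel identifiers.

```python
# Sketch of the extent arithmetic quoted above; names are illustrative.
MAX_BLOCKS_PER_EXTENT = 2 ** 15   # from the comment: 2^15 blocks per extent
EXTENT_DESCRIPTOR_BYTES = 12      # assumption, inferred from 341 descriptors per 4k block


def max_extent_bytes(block_size):
    """Largest contiguous range one extent can describe, in bytes."""
    return MAX_BLOCKS_PER_EXTENT * block_size


def descriptors_per_block(block_size):
    """How many whole extent descriptors fit in one block (header ignored)."""
    return block_size // EXTENT_DESCRIPTOR_BYTES


for block_size in (4096, 16384):
    mib = max_extent_bytes(block_size) // (1024 * 1024)
    print(f"{block_size // 1024}k blocks: {mib} MiB per extent, "
          f"{descriptors_per_block(block_size)} descriptors per block")
```

Under these assumptions the loop reports 128 MiB and 341 descriptors for 4k blocks, and 512 MiB and 1365 descriptors for 16k blocks, matching the numbers in the comment.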
Re:The article is incorrect with respect to ext4.. by tytso · 2008-11-29 09:04 · Score: 5, Informative

Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/. The benchmarks were done in a scrupulously fair way; the exact hardware and software configurations used are given, multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why, if you want to do a fair comparison of filesystems, it is extremely difficult to do things right. You have to do multiple benchmarks, multiple workloads, multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who, consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.
As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the btrfs numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.
Re:But what about Windows? by Ant+P. · 2008-11-29 10:33 · Score: 2, Informative

Unless you're dealing with backward firmware/BIOS code that only understands FAT, you should be using UDF. Vista supports it, OS X supports it, Linux supports it, and everything back to Windows 98 has read-only support, but you can get third-party drivers just like for ext2.
Re:What is more needed is a modern multi-platform by TrekkieGod · 2008-11-29 11:22 · Score: 2, Informative

"The only real problem I have is there doesn't exist a modern journaling FS which would work just as well on all 3 platforms."
I agree with you that's really important. I'd also like zfs to be that filesystem. However, as long as you don't need that drive to be the root drive of your respective file system, you might be interested in some of these links:

"I can use ext3, but cannot plug it into a Mac."
Give this a try. The latest news is that you get write support in Tiger, but I use it in Leopard without problems.
Also, don't worry about the ext2 part. Ext3 is designed to be backwards compatible with ext2. It can be mounted as ext2 (it just won't get journaling).
You didn't ask for it, so you might already know about this Windows driver. There are actually a couple out there; I think that one works the best (which is kind of unfortunate, because it's freeware, but proprietary).

"I can use NTFS, but cannot write to it on a Mac."
Sure you can, the same way you do it in Linux: through FUSE and ntfs-3g.

"I can use Mac's FS, but cannot plug it into Windows (unless I pay for a proprietary driver every time I use that disk on a different machine)"
Yeah, you got me there. MacDrive works really well, but I'd like a non-proprietary version myself.
For a removable drive that you can plug in anywhere, I'd actually go with NTFS. No FAT size restrictions, no permissions (actually a plus for a removable drive), and most Linux distributions come with ntfs-3g installed by default. That means you only have to install the driver in Mac OS X.

--
Warning: Opinions known to be heavily biased.
Re:ZFS!! by SanityInAnarchy · 2008-11-29 12:23 · Score: 2, Informative

You just have to draw the layers differently.
I've repeatedly proposed something, only to find that ZFS already implements it: Define one layer which is solely responsible for storing your bare primitives, like a sequence of data. It is the FS-level equivalent of malloc/free.
Then, implement everything else on top of that layer. Databases could sit directly on the layer -- no reason they need to pretend to create files. Filesystems would sit on that layer, implementing structures like directories and POSIX file permissions.
Of course, while I'm at it, I have this other idea -- unify disk caches. It should be possible for me to allocate my entire available free space as a shared cache, between my package manager, browser, everything -- then, provide a common mechanism for reclaiming that space when something wants disk space.
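The layering proposed above (a malloc/free-style store with everything else built on top) can be sketched as a toy in-memory model. This is only an illustration of the idea, not a description of how ZFS or any real filesystem is implemented; every class and method name below is hypothetical.

```python
# Toy in-memory illustration of the layering idea; all names are hypothetical.
import itertools


class BlobStore:
    """Lowest layer: stores bare byte sequences behind handles, like malloc/free."""

    def __init__(self):
        self._blobs = {}
        self._next_handle = itertools.count(1)

    def alloc(self, data):
        handle = next(self._next_handle)
        self._blobs[handle] = bytes(data)
        return handle

    def read(self, handle):
        return self._blobs[handle]

    def free(self, handle):
        del self._blobs[handle]


class ToyFilesystem:
    """Upper layer: adds names on top of the store; knows nothing about storage."""

    def __init__(self, store):
        self._store = store
        self._names = {}

    def write(self, path, data):
        # Replace any previous contents by freeing the old blob first.
        if path in self._names:
            self._store.free(self._names[path])
        self._names[path] = self._store.alloc(data)

    def read(self, path):
        return self._store.read(self._names[path])

    def unlink(self, path):
        self._store.free(self._names.pop(path))


store = BlobStore()
fs = ToyFilesystem(store)
fs.write("/etc/motd", b"hello")
db_row = store.alloc(b"row-1")  # a "database" using the store directly, with no file involved
```

The point of the split is that the filesystem and the database are peers: both are clients of the same allocation layer, and neither has to pretend to be the other.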

--
Don't thank God, thank a doctor!
Re:ZFS!! by szaka · 2008-11-29 12:42 · Score: 2, Informative

"ntfs-3g is hardly a default package in most distros"
Actually it's available for over 190 distributions and it's installed by default in the most popular ones, e.g. Ubuntu, openSUSE, Fedora, Mandriva, Slackware, etc. Btw, thanks to FUSE, NTFS-3G also works on FreeBSD, NetBSD, OpenSolaris, OS X and some others (more on the way).
Re:Reiser4 by StormReaver · 2008-11-29 14:22 · Score: 2, Informative

"From the outside it sounds a lot like the story about the RSDL scheduler - completely snubbed because it stepped on the toes of one kernel dev and his pet project."
ReiserFS v4 wasn't included in the mainline kernel because Hans was being an even greater prick than usual to the kernel maintainers who asked him to fix his bugs and adhere to kernel coding conventions.
RSDL wasn't included in the mainline kernel because Linus considered Con to be unreliable, and wanted to have a scheduler with a developer he could count on to maintain the code and fix bugs.
Re:But what about Windows? by BrentH · 2008-11-29 23:51 · Score: 2, Informative

You and the parent need to get the ublio version of ntfs-3g for OSX: http://macntfs-3g.blogspot.com/ Performance is excellent.
No xfs fsck? eh? by Booker · 2008-11-30 08:50 · Score: 2, Informative

"XFS is also nice, but the lack of a proper userspace fsck has turned me away there."
Eh? Man xfs_repair(8)
Just because it's not called "fsck" (and not run at boot time) does not mean that the functionality is not there when you need it.
A crash does not mean you need to run fsck; that is why you pay the price for the journaling overhead, right? When XFS detects errors at runtime, run xfs_repair and bask in the glory of "a proper userspace fsck."
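For anyone scripting that step, here is a minimal sketch. It assumes the xfs_repair tool from xfsprogs is installed and that the filesystem is unmounted; `-n` is its no-modify mode, which only reports problems. The helper names are hypothetical, and the device path is up to the caller.

```python
# Minimal sketch; helper names are hypothetical, xfs_repair and -n are real.
import subprocess


def xfs_repair_command(device, dry_run=True):
    """Build the argv for xfs_repair; dry_run adds -n (no-modify mode)."""
    cmd = ["xfs_repair"]
    if dry_run:
        cmd.append("-n")
    cmd.append(device)
    return cmd


def check_xfs(device):
    """Run a no-modify check on an unmounted device and return the exit status.

    Needs sufficient privileges to read the device.
    """
    return subprocess.run(xfs_repair_command(device)).returncode
```

Running the dry run first and only then dropping `-n` keeps the actual repair a deliberate second step.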