On the State of Linux File Systems
kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."
The article states that ext4 was a Bull project; that is not correct.
The Bull developers are from one of the companies involved with ext4 development, but by no means were they the primary contributors. A number of the key ext4 advancements, especially the extents work, were pioneered by the Clusterfs folks, who used them in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements which they supported commercially as an open source product); a number of their enhancements went on to become adopted as part of ext4. I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao from IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robustly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4: we have a large developer base, from many different companies. I believe that this wide base of developer support is one of the reasons why ext3 was more successful than, say, JFS or XFS, which had a much smaller base of developers who were primarily from a single employer.
A cute FA in some ways, but bereft of content. Wish there was something to see here, like comparisons regarding integrity, access costs, evolution from JFS and Andrews journaled FS, etc. No real meat (with apologies to the vegetarians out there). Just a lightweight historical analysis with some glib suggestions of current adaptations.
---- Teach Peace. It's Cheaper Than War.
Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
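For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes a limit of 2^15 blocks per extent (taken from the 16k example) and a 12-byte extent descriptor (an assumption inferred from the 341 and 1,365 figures). It also ignores any per-block header, which would lower the descriptor count slightly. The function names are made up for illustration.

```python
# Assumption: an extent can cover at most 2^15 blocks (from the comment's 16k example).
MAX_BLOCKS_PER_EXTENT = 2 ** 15
# Assumption: one extent descriptor takes 12 bytes (inferred from 4096 // 12 == 341).
EXTENT_DESCRIPTOR_BYTES = 12


def max_extent_bytes(block_size):
    """Largest contiguous range, in bytes, that one extent can describe."""
    return MAX_BLOCKS_PER_EXTENT * block_size


def descriptors_per_block(block_size):
    """Extent descriptors that fit in one block, ignoring any block header."""
    return block_size // EXTENT_DESCRIPTOR_BYTES


if __name__ == "__main__":
    for block_size in (4096, 16384):
        megabytes = max_extent_bytes(block_size) // (1024 * 1024)
        count = descriptors_per_block(block_size)
        print(f"{block_size} byte blocks: {megabytes} MB per extent, "
              f"{count} descriptors per block")
```

Under these assumptions, a single 4k block of descriptors could map 341 * 128 MB, which is where the "this is plenty" conclusion comes from.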
Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/. The benchmarks were done in a scrupulously fair way: the exact hardware and software configurations used are given, multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why a fair comparison of filesystems is extremely difficult to do right. You have to run multiple benchmarks, multiple workloads, and multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who, consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.
As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the BTRFS numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.
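To make the cherry-picking point concrete, here is a rough sketch of how one might tabulate results across the whole matrix instead of reporting a single cell. This is not how the boxacle benchmarks were produced. The data layout, the function names, and the "higher is better" throughput convention are all assumptions for illustration.

```python
from collections import defaultdict


def winners_by_cell(results):
    """Return the best filesystem for each (workload, config) cell.

    `results` maps (filesystem, workload, config) to a list of repeated
    throughput samples, where higher is assumed to be better.
    """
    means = defaultdict(dict)
    for (filesystem, workload, config), samples in results.items():
        means[(workload, config)][filesystem] = sum(samples) / len(samples)
    return {cell: max(scores, key=scores.get) for cell, scores in means.items()}


def filesystems_that_win_somewhere(results):
    """Filesystems that could be declared 'the winner' by picking one cell."""
    return set(winners_by_cell(results).values())
```

If `filesystems_that_win_somewhere` returns more than one name, then a report showing only one workload and configuration can be made to favor whichever filesystem the author prefers, which is exactly why the full matrix needs to be published.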
What Sun needs to do is release ZFS under a proper license so we can finally have 1 unified filesystem. Yes, we can use it under FUSE, but this brings unnecessary overhead and problems. It will be nice when we can transport disks around, similar to fat(32), and not have to worry about whether another OS will be able to read it or not. On top of that, CRC block checksumming, high performance, smb/nfs/iscsi support integrated, Volume AND partition manager.
Come on Sun! Are you listening??
Just my 2 bits. As a user of Linux in a software/algorithm context, my personal beefs with ext3 / the current kernel line are:
1) IO priority isn't linked to process priority, or at least not in a decent manner. It is all too easy to lock up the system with one IO-heavy process (or several of them), hurting even high-priority processes. Because the IO call is handled at the system level (buffering, etc.), it gets a relatively high priority (possibly falling under the RT scheduler), and as a result IO-heavy processes can choke other processes.
2) ext3+nfs simply sucks with very large numbers of files. I used to routinely have directories with 500,000 files (it is very easy to reach such numbers with a Cartesian product of options). The result is downright appalling performance.
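To show how quickly a Cartesian product of options reaches that file count, here is a small sketch. The option names and value counts are entirely hypothetical (they are not my actual parameters); they are chosen only so that the product comes out to 500,000.

```python
from itertools import product
from math import prod

# Hypothetical option grid: five parameters, one output file per combination.
option_values = {
    "algorithm": range(10),
    "window": range(10),
    "threshold": range(10),
    "seed": range(50),
    "dataset": range(10),
}


def count_runs(grid):
    """Number of combinations, and so output files, the grid produces."""
    return prod(len(values) for values in grid.values())


def run_names(grid):
    """Yield one hypothetical output file name per combination of options."""
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield "_".join(f"{key}{value}" for key, value in zip(keys, combo)) + ".out"


if __name__ == "__main__":
    print(count_runs(option_values))
```

Five modest parameters are enough; if every run writes its output into the same directory, that directory ends up with half a million entries.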
We're checksumming free disk space. That's dumb. It makes RAID rebuilds needlessly slow.
We're unable to adjust redundancy according to the value that we place on our data. Everything from the root directory to the access time stamps gets the same level of redundancy.
The on-disk structure of RAID (the lack of it!) prevents reasonable recovery. We can handle a disk that disappears, but not one that gets some blocks corrupted. We can't even detect it in normal use; that requires reading all disks.
We have extremely limited transactional ability. All we get for transactions is a write barrier.
There is no way to map from RAID troubles (not that we'd detect them) to higher-level structures.
With an integrated system, we could do so much better. Sadly, it's blocked by an odd sort of kernel politics. Radical change is hard. Giving up the simplicity of a layered approach is hard, even when it is obviously inferior. There is this idea that every new kernel component has to fit into the existing mold, even if the mold is defective.
...called TLDRFS. It simply ignores any files larger than 64KB.
You seem very knowledgeable regarding filesystems in general.
Dude, it should have been a hint when tytso wrote,
I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream.
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Never have I been so happy and so angry in such a short period of time. I salute you, yet still shake my fist angrily in your general direction.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Hans was a jerk who was difficult to work with, and now he is a convicted murderer. That doesn't change the fact that Reiser4 as is may be the best desktop file system for Linux users, even with plenty of room for improvement.
There are filesystems in development like Btrfs and Tux3 that look promising, but why should Reiser4 be abandoned? It is GPL. Anyone can pick it up and maintain it, or fork it.
Does anyone know anything about the future of Reiser4?
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Ah, anyone can edit wikipedia. I say he's just showing off to get the chicks!
Sad to see JFS being overlooked so. While it may not have the postmodern features to compete in the wake of ZFS, it's still in many cases the best current filesystem for Linux. It's remarkably crashproof, has the lowest CPU loading of any of {ext3 jfs xfs reiser3}, good all-round performance (generally either first or second in benchmarks) and is fast at deleting big files. I haven't used anything else in a couple of years - I used to put reiser3 on /var, but got fed up with its crash intolerance. It's sad to see jfs so overlooked, because at least until btrfs or tux3 come out it's arguably the best option available.
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
Over and above this, it'll need a new name. I know it doesn't make one iota technical difference, but people are fussy about such things; change the name, and people don't care if it was developed by fiends. Keep it and people will find excuses to edge away and it'll wither on the vine.
The Volkswagen was a runaway success despite its Nazi origins, but had it been named the "Hitlerwagen", things would have probably turned out a lot differently.
I don't think that there is a 100% "safe and accurate" way to display the file type, assuming you are depending on a possibly-hostile file to supply the information in the first place. There are, however, a few things that an operating system can do to make life safer for users:
1) Clearly mark executable files. Have some visual indication whether a file is set to be executable (this, of course, assumes that your operating system has an execute bit; if it doesn't, that's a bigger problem). This indication should be consistent, universal, and impossible to override with metadata or custom icons. It should apply both to CLI shells and GUIs. (Although not necessarily in the exact same way; however my personal preference for such an indicator, which is putting the file name in bold, would work both in a GUI and CLI environment.)
2) Don't use the same action to execute as to open. Using the same action (the double-click) both to "run" and to "open" -- which are two very different actions -- is probably responsible for the vast majority of user-propagated malware today. I would love to see an operating system rigorously enforce a separate 'run' action, so that a user clicking on what appears or claims to be a data file (intending to open an application and read that file) could not accidentally execute it.
3) Break the filesystem into 'data' and 'executable' sections, and bar files on the 'data' sections from being marked as executable under any circumstances. I don't think this would be as effective as #2, but it would probably involve less user retraining. In order for content to be executed, it would have to be copied or installed onto the executable partition (which in normal operation could even be mounted read-only).
You could do all of this with the data-type indicator as part of the file name, or as a separate piece of metadata; it doesn't really matter. There's no 'safety' advantage to doing it either way, it's just that keeping it in the file name is considered very ugly by a lot of people (myself included). I'm personally a fan of the way that the Mac used to do it, with a two part code (one for the file's actual type, the other for the application that either created it or should be used to open it), except that unlike the Mac, it should be easily editable by the user, and a lot of standardization and interoperability challenges would have to be solved. I'll be surprised if I see the filename.ext thing die in my lifetime, honestly. It's just too entrenched.
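As a rough illustration of suggestion 1 (marking executables by putting the name in bold), here is a minimal sketch for a CLI listing. It assumes a POSIX-style mode word with execute bits and an ANSI-capable terminal; the function names are invented for the example, and a GUI would need its own equivalent.

```python
import stat

# ANSI escape sequences for bold text; assumes an ANSI-capable terminal.
BOLD = "\033[1m"
RESET = "\033[0m"


def is_executable_mode(mode):
    """True if `mode` describes a regular file with any execute bit set."""
    any_execute_bit = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & any_execute_bit)


def render_name(name, mode):
    """Return the name in bold if the file is executable, unchanged otherwise."""
    if is_executable_mode(mode):
        return f"{BOLD}{name}{RESET}"
    return name
```

A listing tool could call `render_name(entry.name, entry.stat().st_mode)` for each entry from `os.scandir`. The point is that the decision depends only on the execute bit, never on the file name, icon, or any metadata the file itself supplies.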
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Repeato ad absurdium...
What is that gibberish supposed to mean? Christ, I hate mock-Latin. If you want a fancy-sounding term referring to repeating something again and again, use ad nauseam.
Wouldn't it be logical to assume a filesystem developer has an idea on what the workload and hardware will be like _before_ writing his filesystem, then picking a benchmark that suits his ideas on what a filesystem is supposed to do?
No, that would be illogical, unless again they were trying to craft bullshit benchmarks. The developer does not know how I will use the filesystem, and so any such benchmark is not useful to me. I also want to know how well the filesystem will perform if I have to perform some new task on it.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"