Is ext4 Stable For Production Systems?
dr_dracula writes "Earlier this year, the ext4 filesystem was accepted into the Linux kernel. Shortly thereafter, it was discovered that some applications, such as KDE, were at risk of losing files when used on top of ext4. This was diagnosed as a rift between the design of the ext4 filesystem and the design of applications running on top of ext4. The crux of the problem was that applications were relying on ext3-specific behavior for flushing data to disk, which ext4 was not following. Recent kernel releases include patches to address these issues. My questions to the early adopters of ext4 are about whether the patches have performed as expected. What is your overall feeling about ext4? Do you think it is solid enough for most users to trust it with their data? Did you find any significant performance improvements compared to ext3? Is there any incentive to move to ext4, other than sheer curiosity?"
I'm using ext4 on an encrypted partition on my tiny X41 tablet. The hard disk is 5400RPM IIRC, so when Ubuntu decides to run fsck due to a scheduled run or an unclean shutdown after a certain bug manifests itself, I don't have to sit there for 10 minutes or more waiting for fsck to run. That for me and many other casual users is probably the biggest advantage of ext4.
Does a laptop count as production? In the eyes of an everyday user, yes. My laptop is very much "production" IMHO, and I trust ext4 enough to not magically make all my school assignments disappear.
Digressing a bit, I haven't seen any of the data loss either, though I use GNOME and not KDE. I do think that if an application relies on specific undocumented behavior, the application should change, not the filesystem driver. It's acceptable that the kernel developers are doing their best to get temporary workarounds into place, but the permanent solution is to fix the applications so they don't depend on undocumented behavior.
If you worry about file corruption, I wouldn't touch XFS; that thing shredded files for me on every single unclean shutdown.
Maybe you should do something about whatever is causing the constant fsck'ing. You do realize it is quite abnormal for a system to have errors at each remount, don't you?
The problem is that some applications assume a behavior that is not supported by the POSIX definitions (the guarantees provided by the OS functions they're calling). However, it happens to be the behavior on existing filesystems and happens to be convenient. Now a new filesystem comes along and sticks to the POSIX definitions but does not follow this behavior. Application breaks, people complain.
As a simplified example, imagine you create file B, then delete file A. Existing filesystems happen to do this in order, so you always have at least one of A or B. (If the system crashed partway through, you might have both A and B.) Your application fails if neither A nor B is present. POSIX doesn't require that the operations be performed in order. New filesystem comes along and sometimes does them in the reverse order, so if the system crashes at the wrong time, neither A nor B is left on the filesystem.
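An application that needs "at least one of A or B" can enforce the ordering itself instead of relying on the filesystem. A minimal POSIX C sketch, assuming fsync() really reaches stable storage on the hardware in use; the function names create_then_delete and fsync_dir are hypothetical: create B, flush its data and its directory entry, and only then delete A.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Flush a directory so that names created in it are on disk.
   Linux allows fsync() on a directory opened read-only; POSIX does not
   promise this on every system. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Create new_path (B) with data, then delete old_path (A). B's data and
   directory entry are forced to disk before the unlink is issued, so a
   crash leaves A, B, or both, never neither (assuming the disk honors
   flushes). dir must be the directory that contains new_path. */
int create_then_delete(const char *dir, const char *new_path,
                       const char *old_path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(new_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* A short write is treated as failure; fine for small files. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    if (fsync_dir(dir) != 0)   /* B's name is now durable */
        return -1;
    return unlink(old_path);   /* only now remove A */
}
```

The cost is two synchronous flushes per update, which is exactly the kind of overhead applications were avoiding by relying on the old ordering.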
http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
FYI, Ts'o is the ext4 maintainer.
How should the apps behave? Write-then-rename is the best way to do what they want. If you can't trust the filesystem to rename a file (and not just fail to rename it, but leave its metadata wrong so that neither the new nor the original file is in the correct place), then what sort of program are you going to be able to run?
IranAir Flight 655 never forget!
I haven't had any problems thus far (no data loss, etc.)
How do you know? Do you do md5sums on every file? Most admins I've come across don't seem to, and it could be months or years before you find out, in which case any loss might easily end up outside your backup cycle.
They never say "I've plugged this scanner in over 1000 times and it's never died!"
Speaking as a help desk tech, they say that a lot. In fact, "it's always worked before" is probably the single most common form of whining callers do.
It's particularly amusing when someone complains that they've never had to replace a battery/toner cartridge before.
Liberte, Egalite, Fraternite (TM)
The apps don't fail on ufs.
--dave
davecb@spamcop.net
If the specification allows this kind of gray-area interpretation, it should be clarified to settle it definitively, either in favor of the FS or in favor of the app designers; but either way, the FS is written to spec while the apps are not.
The specs are not remotely ambiguous: they are in favor of the FS. The problem is that app developers got lazy and wrote
bar = open("/foo/bar", O_CREAT | O_WRONLY | O_TRUNC, 0644);
//some write operations
close(bar);
when the specs say they should write this (otherwise, if the write operations don't make it to the disk for any reason, the config file is truncated):
bar = open("/foo/bar.new", O_CREAT | O_WRONLY | O_TRUNC, 0644);
//some write operations
close(bar);
rename("/foo/bar.new", "/foo/bar");
Since the rename operation is atomic, the config files are always in a consistent state and changes are atomic; if you need durability (per ACID), you add O_SYNC to the flags (or follow every write with fsync(bar);) and check for the existence of a /foo/bar.new on startup. Isolation is achieved by locks, separate files, etc.
Also interesting: unlike fsync(), rename() isn't a very intensive operation; the above code basically says to the system "make sure it's in a consistent state next time I look at it, but don't panic if it doesn't make it to disk at all, just make sure the old version is still there."
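For reference, here is a compilable version of the write-then-rename pattern described above, as a sketch rather than a drop-in library routine. The helper name save_atomically and its durable flag are illustrative assumptions. With durable set, the temporary file is fsync()ed before the rename, which is the durability step mentioned above; with it cleared, the code behaves like the plain pattern, where the rename can reach disk before the data does.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace path with data by writing "path.new" and renaming it over path,
   so readers see either the old file or the complete new one.
   If durable is non-zero, the new data is fsync()ed before the rename so
   a crash cannot leave path as an empty or half-written file. */
int save_atomically(const char *path, const char *data, int durable)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                      /* path too long */

    int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len ||
        (durable && fsync(fd) != 0)) {
        close(fd);
        unlink(tmp);                    /* don't leave a partial .new behind */
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, path);           /* atomic swap of old for new */
}
```

Making the rename itself survive a crash additionally requires an fsync() on the containing directory. This sketch leaves that out.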
$ make available
KDE already did the second (what you list as correct), and most developers assumed that this was sufficient to keep the files in a consistent state, because rename() is atomic. The problem is the sync issue you mention afterwards: the failure mode being encountered was that the rename() executed instantly to clobber the old file, while the new file still contained no data on disk. If the machine crashed in the window between the rename() and the sync, you had neither the old nor the new file.
The main thing being discussed with KDE (and others) is how to fix this. Adding a sync() after every config update totally destroys performance, if you might update hundreds of small config files semi-frequently. See, for example, this discussion among Python folks for pros/cons of various options.
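One way to avoid paying for a flush per file is to write all the temporary files, flush once, then do all the renames. This is sketched here as an illustration, not as what KDE or the Python discussion settled on. The function name save_batch is hypothetical. The sketch also relies on Linux behavior: there, sync() waits for the writes to finish, while POSIX only requires it to schedule them.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Save n small files with one system-wide flush instead of one fsync()
   per file: write every "path.new", call sync() once, then rename each.
   Assumes Linux semantics, where sync() returns only after the data is
   written. */
int save_batch(const char *const paths[], const char *const data[], size_t n)
{
    char tmp[4096];
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(tmp, sizeof tmp, "%s.new", paths[i]);
        if (len < 0 || (size_t)len >= sizeof tmp)
            return -1;
        int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        size_t sz = strlen(data[i]);
        ssize_t w = write(fd, data[i], sz);
        if (close(fd) != 0 || w != (ssize_t)sz)
            return -1;
    }

    sync();  /* one flush covers every temporary file written above */

    for (size_t i = 0; i < n; i++) {
        snprintf(tmp, sizeof tmp, "%s.new", paths[i]);
        if (rename(tmp, paths[i]) != 0)
            return -1;
    }
    return 0;
}
```

The trade-off is that sync() flushes every dirty buffer on the system, not just these files. It only pays off when many small files are saved together.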
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Hey genius! It's called SARCASM! There is no need to insult me for sharing my experiences with ext4. I am not some idiot who thinks RAID is a backup either.
I am NOT happy with ext4, but in case you needed to know more about my system, my ext4 data is a MIRROR of the ext3 data, and I have an LTO-2 Ultrium drive to back up my filesystem regularly. I have off-site backups in fireproof, waterproof safes.
I also use e2image to get the metadata off the drive in case I need to reconstruct the contents.
every _exit() is the same, but every clone() is different.
I did, and the cause of the fscks was... (drumroll)... ext4.
A couple of months ago I installed Ubuntu 9.01, which used ext4 by default. Running it, I experienced data loss for the first time since I moved from ext2 to ext3 quite a few years ago now. I've just changed back to ext3, which has been rock solid for me since it first appeared in Red Hat or whatever distro it was I was using back then.
There's no such thing as Ubuntu 9.01. I'm assuming you mean Ubuntu 9.04 (aka "Jaunty"). If you installed that a few months ago, you installed it while it was still in pre-release status. It also uses ext3 by default, not ext4. See http://www.ubuntu.com/testing/jaunty/beta#Ext4%20filesystem%20support , where the Ubuntu team says "Ubuntu 9.04 Beta supports the option of installing the new ext4 file system. ext3 will remain the default filesystem for Jaunty, and we will consider ext4 as the default for the next release based on user feedback. There has been extensive discussion about the reliability of applications running on ext4 in the face of sudden system outages. Applications that use the conventional approach of writing data to a temporary file and renaming it to its final location will have their reliability expectations met in Ubuntu 9.04 beta; further discussion is ongoing in the kernel community."
The rename is precisely what is broken in EXT4!
Some corrections, although the sentiment is correct:
copy of A to A', ftruncate(A'), write(A'), rename(A' to A), host crash, causes the resulting file to contain A data and not A'
This is not what is wrong. If the file contained the old version of A it would be fine, this is the expected behavior. The problem is that the file contains some partially-written version of A' (usually a zero-length version).
Posix doesn't actually say that rename is a write barrier for data and metadata
Actually POSIX does say exactly that. The hole EXT4 weasels through is that POSIX says "anything can happen when the machine crashes".
apps didn't make use of fdatasync() / fsync() correctly
The apps *were* using these calls correctly, by not calling them. They are very slow and make guarantees that have nothing to do with the desired action, which is an atomic rename.
we can thank Linus & co for poisoning even the filesystem itself, which will get foisted on unknowing users...
Ext4 might have made it into the tree, but Linus hasn't made it a default. There are lots of things in the tree that are clearly marked as "experimental", and neither Linus nor anyone else expect or recommend that you use them in a production environment. If your distribution DOES make it a default, I would seriously suggest you find a different distro.