Is ext4 Stable For Production Systems?
dr_dracula writes "Earlier this year, the ext4 filesystem was accepted into the Linux kernel. Shortly thereafter, it was discovered that some applications, such as KDE, were at risk of losing files when used on top of ext4. This was diagnosed as a rift between the design of the ext4 filesystem and the design of applications running on top of ext4. The crux of the problem was that applications were relying on ext3-specific behavior for flushing data to disk, which ext4 was not following. Recent kernel releases include patches to address these issues. My questions to the early adopters of ext4 are about whether the patches have performed as expected. What is your overall feeling about ext4? Do you think is solid enough for most users to trust it with their data? Did you find any significant performance improvements compared to ext3? Is there any incentive to move to ext4, other than sheer curiosity?"
Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.
fsync-on-rename creates a much better environment for application developers and users alike. The Right Thing happens by default, and I maintain that nobody actually wants the unsafe rename behavior. Allowing an application "choice" in this respect is a red herring.
The only improvement I'd make it to flush the file involves on every rename, not just renames that happen to overwrite an existing file. Under the current scheme, an application doing the write-close-rename to replace a file will still be put in a bind if the file to write doesn't exist yet. (i.e., you can still end up with a zero-length file where no such file ever existed on a running system)
Well, the fsck times are really fast compared to ext3, and thank god, because EVERY time I reboot it requires an fsck, complaining about group descriptor checksums. Even if I unmount my ext4 filesystem and remount it without rebooting it gets all fscked up. I have a 3TB ext4 fs on LVM on RAID, that was NOT converted from ext3, but built on brand new drives. My similar ext3 filesystem has had so such problems.
ext4 takes about 7 minutes to fsck, ext3 took hours. I hope they fix this soon.
every _exit() is the same, but every clone() is different.
I was one of the people that spoke loudly when Ext4 caused 0-byte file corruption.
While I don't entirely agree that it's just "an application issue", because apps that work fine on every other filesystem should not need to be re-written specifically for Ext4, I am pleased at the work the devs have done to work around the problems. The kernel patches have eradicated the issues I had with corruption, and the performance is still great.
I never did official benchmarking to determine the extent, but my perception is that there's a noticeable performance increase when using Ext4 instead of Ext3.
If I were building a production server, I may think twice and just go with Ext3... unless the app would *greatly* benefit from Ext4. However, for a desktop system, I think Ext4 is a very good choice and ready for primetime.
I've been running ext4 for / , but left ext3 for /home where any KDE apps I run could fudge writes. No problems at all.
sudo mount --milk --sugar
Wha....? Are you seriously suggesting that applications/utilities need to be patched to deal with faulty (yes, faulty) filesystem semantics? For _every_ single filesystem they might encounter? The whole point behind a filesystem layer is to present a unified view of files to the user layer regardless of physical media or driver quirks.
The point is really that ext4 is/was broken, and IMO, any filesystem requiring patches to applications in order not to lose data is no filesystem at all. It's unbelievable (despite the technical benefits of ext4) that this would even be up for consideration.
Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.
And had it been enforced, as soon as all developers went thru and added the fsync calls everywhere it would have become necessary for file system maintainers to no-op fsync calls in order to regain any approximation of prior performance.
Flushing "one file" is not always sufficient. Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. And perhaps the higher level directory as well.
Sig Battery depleted. Reverting to safe mode.
hmm i think most of them are but im still having problems with mv, seriosuly can we stop this bullshit, ext4 was clearly not working!
If you cant rename a fucking file without risking total corruption of the file, at no point in renaming "settings-new" to "settings" should the file "settings" become unusable, What the fuck CAN kde4 do?
IranAir Flight 655 never forget!
Our 8TB raid system would get trashed after copying data onto it (group descriptor checksums on fsck). It looks like it was an ext4 bug. They fixed it about a week or two ago, here. Maybe it will get in your kernel soon. I'm not going to start ext4 on any production system for at least 6 months I think now.
The point is that you have expressed all sorts of fear about ext4 - oh no, I'm not letting it near my production boxes - but you have not applied the same standard to the applications that trashed their config files when run on ext4. Even though, strictly speaking, it is the applications that are buggy. You should be equally enthusiastic about getting rid of KDE and any other software that trashes configuration files; otherwise it looks like you are playing favourites and blaming ext4 in order to overlook the bugs in the apps you're attached to.
-- Ed Avis ed@membled.com