Slashdot Mirror


Is ext4 Stable For Production Systems?

dr_dracula writes "Earlier this year, the ext4 filesystem was accepted into the Linux kernel. Shortly thereafter, it was discovered that some applications, such as KDE, were at risk of losing files when used on top of ext4. This was diagnosed as a rift between the design of the ext4 filesystem and the design of applications running on top of ext4. The crux of the problem was that applications were relying on ext3-specific behavior for flushing data to disk, which ext4 was not following. Recent kernel releases include patches to address these issues. My questions to the early adopters of ext4 are about whether the patches have performed as expected. What is your overall feeling about ext4? Do you think it is solid enough for most users to trust it with their data? Did you find any significant performance improvements compared to ext3? Is there any incentive to move to ext4, other than sheer curiosity?"

28 of 289 comments

  1. Risk Vs Benefits Analysis by eldavojohn · · Score: 5, Insightful

    > Is ext4 Stable For Production Systems?

    Probably.

    > Is there any incentive to move to ext4, other than sheer curiosity?

    Ok so I'm guessing production = income = your ass? Let me turn your question back to you by asking, "What is driving this need to move to ext4?" Because so far, all you've told me is that you are considering risking your ass for sheer curiosity.

    I may be grossly misinformed but that is how the question sounds to me. And by "your ass" I don't mean oh-no-we-had-a-service-outage-for-five-minutes ... no, we could have a customer on the phone saying, "You mean to tell me that the modifications being made to my site for the past 24 hours are gone?!"

    If it ain't broke, don't fix it!

    I don't know about you but I'm too busy dealing with shit like this to ponder new potential problems I can put into play.

    Look through this page for a rough comparison of ext4 with other file systems. There's a better list of features for ext4 here that will tell you why you might need to switch to it. It is backward compatible with ext3 and ext2 so moving to it may be trivial. If you're dealing with more than 32000 subdirectories or need to partition some major petabytes/exabytes then you might not have a choice. Some of these benefits are probably not worth risking your ass for, but if there's a business need that cannot be met any easier way then back your shit up and do rigorous testing before you go live with it. If you're using Slashdot to feel out if the majority of users scream OMGNOES so you don't waste your time doing that, then that's fine. Just don't do this if you don't have to.

    I tell you what, there's a $288 desktop computer at Dell today that you can buy, put ext4 on and your OS of choice and your application(s) and whipping boy it into next century without risking anything. Where I work we have two servers in addition to our production servers. I don't think this is an uncommon scheme so if you have a development server, throw it on there and poke it with a stick. Then move it to the testing server and let your testers grape it for two weeks. Then you'll know.

    --
    My work here is dung.
    1. Re:Risk Vs Benefits Analysis by Joce640k · · Score: 4, Insightful

      > If it ain't broke, don't fix it!

      This.

      --
      No sig today...
  2. Ye by identity0 · · Score: 5, Funny

    I've been running ext4 on my system and everything's fi

    1. Re:Ye by dov_0 · · Score: 4, Interesting

      I've been running ext4 for / , but left ext3 for /home where any KDE apps I run could fudge writes. No problems at all.

      --
      sudo mount --milk --sugar /cup/tea /mouth /etc/init.d/relax start
    2. Re:Ye by TCM · · Score: 4, Insightful

      So you used the "riskier" fs for / where you don't actually need the features it provides and used the "more stable" fs where features could actually be useful because app/fs developers couldn't agree on semantics?

      Only on Linux...

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
  3. Wrong question by AmiMoJo · · Score: 5, Insightful

    You are asking the wrong question. Ext4 does not need fixing, the apps do.

    Are your apps patched yet?

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    1. Re:Wrong question by QuoteMstr · · Score: 5, Interesting

      Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.

      fsync-on-rename creates a much better environment for application developers and users alike. The Right Thing happens by default, and I maintain that nobody actually wants the unsafe rename behavior. Allowing an application "choice" in this respect is a red herring.

      The only improvement I'd make is to flush the file involved on every rename, not just renames that happen to overwrite an existing file. Under the current scheme, an application doing the write-close-rename to replace a file will still be put in a bind if the file to write doesn't exist yet. (i.e., you can still end up with a zero-length file where no such file ever existed on a running system)

    2. Re:Wrong question by k8to · · Score: 5, Insightful

      There was no single loser here.

      Ext4 should handle the case gracefully, but the apps will fail on other filesystems, and they *will* be run on those filesystems, so they should fix the bugs.

      --
      -josh
    3. Re:Wrong question by nwanua · · Score: 4, Interesting

      Wha....? Are you seriously suggesting that applications/utilities need to be patched to deal with faulty (yes, faulty) filesystem semantics? For _every_ single filesystem they might encounter? The whole point behind a filesystem layer is to present a unified view of files to the user layer regardless of physical media or driver quirks.

      The point is really that ext4 is/was broken, and IMO, any filesystem requiring patches to applications in order not to lose data is no filesystem at all. It's unbelievable (despite the technical benefits of ext4) that this would even be up for consideration.

    4. Re:Wrong question by blueg3 · · Score: 5, Informative

      The problem is that some applications assume a behavior that is not supported by the POSIX definitions (the guarantees provided by the OS functions they're calling). However, it happens to be the behavior on existing filesystems and happens to be convenient. Now a new filesystem comes along and sticks to the POSIX definitions but does not follow this behavior. Application breaks, people complain.

      As a simplified example, imagine you create file B, then delete file A. Existing filesystems happen to do this in order, so you always have at least one of A or B. (If the system crashed partway through, you might have both A and B.) Your application fails if neither A nor B is present. POSIX doesn't require that the operations be performed in order. New filesystem comes along and sometimes does them in the reverse order, so if the system crashes at the wrong time, neither A nor B is left on the filesystem.
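
The create-B-then-delete-A example above can be modelled as a toy simulation. This is only a sketch under stated assumptions: the "disk" is a set of file names, and `apply_ops`, `crash_states_in_order` and `crash_states_reordered` are hypothetical helpers invented for illustration, not a model of how any real filesystem journals its operations.

```python
from itertools import combinations

# Hypothetical model: the "disk" is just the set of file names that survived.
INITIAL = frozenset({"A"})
OPS = [("create", "B"), ("delete", "A")]  # order issued by the application


def apply_ops(state, ops):
    """Return the set of files left after applying ops to state."""
    files = set(state)
    for action, name in ops:
        if action == "create":
            files.add(name)
        else:
            files.discard(name)
    return frozenset(files)


def crash_states_in_order(initial, ops):
    """Only a prefix of the issued operations can have reached disk."""
    return {apply_ops(initial, ops[:n]) for n in range(len(ops) + 1)}


def crash_states_reordered(initial, ops):
    """Any subset of the issued operations may have reached disk."""
    return {
        apply_ops(initial, subset)
        for n in range(len(ops) + 1)
        for subset in combinations(ops, n)
    }


ordered = crash_states_in_order(INITIAL, OPS)
reordered = crash_states_reordered(INITIAL, OPS)

print("neither file left (in order): ", frozenset() in ordered)
print("neither file left (reordered):", frozenset() in reordered)
```

In this model the empty state, with neither A nor B present, is reachable only when the delete is allowed to land before the create.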

    5. Re:Wrong question by icebike · · Score: 4, Interesting

      > Face it: your side lost. "fsync everywhere" is an infeasible, untenable, and useless position to take.

      And had it been enforced, as soon as all developers went thru and added the fsync calls everywhere it would have become necessary for file system maintainers to no-op fsync calls in order to regain any approximation of prior performance.

      Flushing "one file" is not always sufficient. Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. And perhaps the higher level directory as well.
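
The file-plus-directory flush described above might look roughly like this in Python. It is a minimal sketch that assumes a POSIX system where a directory can be opened read-only and handed to `fsync()`. The helper name `fsync_file_and_parent` is made up for illustration, and whether a higher-level directory also needs flushing is left open here, as it is in the comment.

```python
import os


def fsync_file_and_parent(path):
    """Flush a file, then flush the directory that holds its name."""
    # Flush the file's own data and metadata.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory entry. Assumption: POSIX semantics, where a
    # directory opened read-only is a valid argument to fsync().
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would invoke `fsync_file_and_parent("settings")` after creating or renaming `settings`. Each call costs two flushes, which is part of the performance concern raised in this thread.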

      --
      Sig Battery depleted. Reverting to safe mode.
    6. Re:Wrong question by Anonymous Coward · · Score: 5, Insightful

      Only on Linux is it the user's fault that apps have data loss because the Linux kernel people changed filesystem semantics. At least Microsoft takes some responsibility for their mistakes :-/

      I did follow the ext4 debate. Here's my quick synopsis.

      • Linux kernel hacker discovers he can make a certain microbenchmark run 50% faster if he allows reordering of filesystem metadata writes ahead of filesystem data writes. Said hacker checks in code with a "now 50% faster!!!" message.
      • A few months later, users start discovering data corruption of KDE files. Specifically, a copy of A to A', ftruncate(A'), write(A'), rename(A' to A), host crash, causes the resulting file to contain A data and not A' data despite the well-known atomic "rename" that serves as a barrier.
      • Linux kernel hacker ignored problem as not-a-bug, since the apps didn't make use of fdatasync() / fsync() correctly, which (using Posix semantics) would have prevented data corruption. The detail to note here is that Posix doesn't actually say that rename is a write barrier for data and metadata, even though everyone would assume that it is a write barrier and ALL other filesystems have treated it as a write barrier. (And in my opinion as a professional systems programmer, this is an oversight in the Posix standard and not a desired behavior). So the linux kernel hacker is technically correct but has introduced a behavior that goes against all previous implementations.
      • Linux kernel hacker (and some Slashdot posters) attack KDE developers for being incompetent because they didn't read a sub-sub-sub clause of the Posix spec that (1) isn't mentioned in the man pages, (2) only gets read by kernel programmers anyway, and (3) is about two orders of magnitude more arcane than any documentation the average desktop app developer will ever read.
      • 90% of users and 80% of programmers wonder what the hell fdatasync() and fsync() and the difference between data and metadata write barriers are, and why the default behavior is to corrupt data.
      • Linux kernel hacker promises to commit a few patches to fix the problem, so as not to break software that has worked perfectly fine for the past 10 years.
      • Those of us with experience realize that since said kernel hacker didn't believe this was a problem in the first place, the patches are as likely to be half-hearted band-aids as to actually increase data integrity guarantees. Programming has a long and proud history of making a quick fix to satisfy "management" (in this case, the Linux community) that makes one symptom go away and doesn't actually fix the underlying problem.
      • We get an Ask Slashdot asking if the problem actually got fixed, because 99% of us do not have the technical expertise to understand patches to the Linux filesystem to figure out if this actually got fixed.

      I do have a moral to this story. Filesystems have one cardinal, inviolable rule. DO NOT CORRUPT THE USER'S DATA. The guarantee is that if a user makes a read, the user will get back either good data OR an error (or explicit indication of no data). Google likes filesystems that lose data - but they don't ever give back corrupt search results. Ext3 can reorder writes - but defaults to a safe 5-second flush rate to keep the window of unexpected corruptions small. Ext4 ignored this rule and allows silent data corruption so that this filesystem can be the best at certain microbenchmarks, and instead of accepting responsibility, the kernel hacker in question blames everybody else.

      The greatest danger to Linux's success is not Microsoft. It's the hubris of many Linux developers, users, and advocates, who are too busy disavowing responsibility and blaming everybody else to fix real users' problems. (And yes, I'm a follower of the Raymond Chen philosophy)

    7. Re:Wrong question by Jane+Q.+Public · · Score: 5, Funny

      Huh? Buddy, this is Slashdot. There are lots of single losers here.

    8. Re:Wrong question by RiotingPacifist · · Score: 4, Interesting

      Hmm, I think most of them are, but I'm still having problems with mv. Seriously, can we stop this bullshit? ext4 was clearly not working!
      If you can't rename a fucking file without risking total corruption of the file, what the fuck CAN kde4 do? At no point in renaming "settings-new" to "settings" should the file "settings" become unusable.

      --
      IranAir Flight 655 never forget!
    9. Re:Wrong question by QuoteMstr · · Score: 5, Insightful

      > But even then you might end up with a zero byte file, if your system crashes between the close and rename call. (Or between write and close, or doing write, or well anytime after open).

      This statement is incorrect. Suppose you want to atomically replace the contents of file "foo". Your application will write a file "foo.tmp", then call rename("foo.tmp", "foo"). At no time on a running system does any process observe a file called "foo" that does not have either the new or the old contents, and this invariant holds true whether or not "foo", "foo.tmp", or any other file has been flushed to the disk.

      On the filesystem level, the kernel can actually write the contents of foo.tmp to disk whenever is convenient. The only constraint is that the on-disk name record for "foo" must be updated to point to the new data blocks from foo.tmp only after these data blocks have themselves been written to disk. That's the issue here: without that ordering guarantee, the kernel can write a file's name record before its data blocks. If the system crashes after the name record is written but before the data blocks are, what's observed on the recovered system is a zero-length file.

      That's the problem here: the kernel is conjuring out of thin air a zero-length file that never actually existed on a running system.

      Forcing applications to call fsync is not only an onerous burden on application developers, but it also reduces performance because it gives the filesystem less freedom than the much looser constraint on rename above.

      > Bonus points for anyone who can give a realistic use case for DO_NOT_FLUSH_ON_CLOSE

      1. Application configuration files. You don't care that they hit the disk immediately, but only that when they do hit the disk, they're not corrupt
      2. /etc/mtab

      Flushing on close is the wrong thing: it far exceeds the minimum requirements that most applications actually need, which will substantially reduce performance.
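
For reference, the replace-via-rename pattern discussed in this comment, with the explicit `fsync()` that an application needs when the filesystem does not itself order data blocks before the rename, can be sketched as follows. The function name `replace_file_atomically` and the `.tmp-` prefix are assumptions made for illustration. This is not KDE's actual code, and it deliberately does not flush the containing directory.

```python
import os
import tempfile


def replace_file_atomically(path, data):
    """Replace path with data (bytes) using write, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data blocks to disk before the name is switched.
            os.fsync(tmp.fileno())
        # Atomically point the name at the new contents.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Dropping the `os.fsync()` line gives the plain write-close-rename sequence argued for above. On a running system the two behave identically, and the difference only shows up after a crash.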

    10. Re:Wrong question by QuoteMstr · · Score: 4, Insightful

      > The problem is that some applications assume a behavior that is not supported by the POSIX definitions

      POSIX is a red herring here. It covers the behavior of a running system, and makes no guarantees about atomicity or durability following a crash. After a crash and as far as POSIX goes, it's perfectly legitimate to overwrite the entire disk with hentai. Every crash recovery technique goes beyond POSIX because POSIX says nothing about crashes.

      > POSIX doesn't require that the operations be performed in order

      It most certainly does! On a running system, if you rename B over A, at no point does any process on the system observe a file called "A" that does not have either the contents of the old A or the contents of B. THIS ATOMICITY IS A FUNDAMENTAL POSIX GUARANTEE.

      Filesystems should do their best to honor this guarantee (which always applies on a running system, remember) even when the system crashes. Filesystems don't have to do that according to POSIX. Instead, they should do it because it's a sane thing to do, and doesn't violate anything POSIX guarantees. POSIX is not the arbiter of what a good system should be. It's perfectly reasonable to make guarantees that go beyond POSIX, and every real-world operating system does precisely that. POSIX guarantees are necessary but insufficient for a reasonable system in 2009.
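
The running-system guarantee described above can be exercised with a small experiment. This is a sketch, assuming a POSIX system and a local filesystem. The names `swap_repeatedly`, `observe` and `run_demo` are hypothetical, and the script says nothing about what happens after a crash.

```python
import os
import tempfile
import threading

OLD = b"old " * 4096
NEW = b"new " * 4096


def swap_repeatedly(path, rounds):
    """Alternate the contents of path by renaming a finished temp file over it."""
    directory = os.path.dirname(path)
    for i in range(rounds):
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(NEW if i % 2 == 0 else OLD)
        os.replace(tmp_path, path)


def observe(path, stop, unexpected):
    """Read path in a loop and record anything that is neither OLD nor NEW."""
    while not stop.is_set():
        with open(path, "rb") as f:
            content = f.read()
        if content not in (OLD, NEW):
            unexpected.append(len(content))


def run_demo(rounds=200):
    """Return the sizes of any partial or empty files the reader saw."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "A")
        with open(path, "wb") as f:
            f.write(OLD)
        stop, unexpected = threading.Event(), []
        reader = threading.Thread(target=observe, args=(path, stop, unexpected))
        reader.start()
        try:
            swap_repeatedly(path, rounds)
        finally:
            stop.set()
            reader.join()
    return unexpected


if __name__ == "__main__":
    print("unexpected observations:", run_demo())
```

On a running system the reader should never record a partial or empty file, which is the atomicity being argued for. A quiet run here proves nothing about durability after a crash.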

  4. Re:I think it's "safe enough" by eldavojohn · · Score: 5, Funny

    > I moved to ext4 as soon as it became available. I haven't had any problems thus far (no data loss, etc), and the increased speed is noticeable. So - in the opinion of a very casual Linux user - I would say that yes, it's "okay." I'm not sure I'd trust it with anything super serious, though. I could be the only one without any problems, after all. As always, you should tip-toe around anything bleeding-edge.

    Yeah, man, it's ok go ahead and flip your entire corporation's servers to ext4 over this weekend. A Slashdot user named buttfscking just said it is "safe enough."

    --
    My work here is dung.
  5. Um, yes, it's called fsck. by dandaman32 · · Score: 5, Informative

    I'm using ext4 on an encrypted partition on my tiny X41 tablet. The hard disk is 5400RPM IIRC, so when Ubuntu decides to run fsck due to a scheduled run or an unclean shutdown after a certain bug manifests itself, I don't have to sit there for 10 minutes or more waiting for fsck to run. That for me and many other casual users is probably the biggest advantage of ext4.

    Does a laptop count as production? In the eyes of an everyday user, yes. My laptop is very much "production" IMHO, and I trust ext4 enough to not magically make all my school assignments disappear.

    Digressing a bit, I haven't seen any of the data loss either, though I use GNOME and not KDE. I do think that if an application relies on specific undocumented behavior, that the application should change, not the filesystem driver. It's acceptable that the kernel developers are doing their best to get temporary workarounds into place, but the permanent solution is to fix the applications so they don't depend on undocumented behavior.

  6. ext4 is buggy by hamanu · · Score: 4, Interesting

    Well, the fsck times are really fast compared to ext3, and thank god, because EVERY time I reboot it requires an fsck, complaining about group descriptor checksums. Even if I unmount my ext4 filesystem and remount it without rebooting it gets all fscked up. I have a 3TB ext4 fs on LVM on RAID, that was NOT converted from ext3, but built on brand new drives. My similar ext3 filesystem has had no such problems.

    ext4 takes about 7 minutes to fsck, ext3 took hours. I hope they fix this soon.

    --
    every _exit() is the same, but every clone() is different.
    1. Re:ext4 is buggy by msuarezalvarez · · Score: 4, Informative

      Maybe you should do something about whatever the cause of the constant fsck'ing is. You do realize it is quite abnormal for a system to have errors at each remount, don't you?

    2. Re:ext4 is buggy by TCM · · Score: 5, Insightful

      But he uses R-A-I-D! R-A-I-D magically makes data bulletproof and immune to disaster as we all know.

      Seriously, running a 3TB RAID with a buggy fs and applauding faster fsck times instead of wondering why the fs gets fucked up constantly must be the peak of idiocy.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
  7. It's a good file system. by 3vi1 · · Score: 4, Interesting

    I was one of the people that spoke loudly when Ext4 caused 0-byte file corruption.

    While I don't entirely agree that it's just "an application issue", because apps that work fine on every other filesystem should not need to be re-written specifically for Ext4, I am pleased at the work the devs have done to work around the problems. The kernel patches have eradicated the issues I had with corruption, and the performance is still great.

    I never did official benchmarking to determine the extent, but my perception is that there's a noticeable performance increase when using Ext4 instead of Ext3.

    If I were building a production server, I may think twice and just go with Ext3... unless the app would *greatly* benefit from Ext4. However, for a desktop system, I think Ext4 is a very good choice and ready for primetime.

  8. Theodore Ts'o: Don't fear the fsync! by sirdude · · Score: 5, Informative

    After reading the comments on my earlier post, Delayed allocation and the zero-length file problem as well as some of the comments on the Slashdot story as well as the Ubuntu bug, it's become very clear to me that there are a lot of myths and misplaced concerns about fsync() and how best to use it. I thought it would be appropriate to correct as many of these misunderstandings about fsync() in one comprehensive blog posting.

    http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/

    FYI, Ts'o is the ext4 maintainer.

  9. Re:I think it's "safe enough" by BrokenHalo · · Score: 4, Informative

    > I haven't had any problems thus far (no data loss, etc)

    How do you know? Do you do md5sums on every file? Most admins I've come across don't seem to, and it could be months or years before you find out, in which case any loss might easily end up outside your backup cycle.
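
A checksum manifest of the kind hinted at above could be kept with a few lines of Python. This is a rough sketch: `file_md5`, `build_manifest` and `changed_files` are hypothetical helper names, and where the manifest is stored and how often it is compared are left to the admin.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest


def changed_files(root, manifest):
    """List files that were added, removed or altered since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A comparison like this cannot tell silent corruption from a legitimate edit, so it is most useful on data that is not supposed to change between runs.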

  10. We had this problem by xiox · · Score: 4, Interesting

    Our 8TB raid system would get trashed after copying data onto it (group descriptor checksums on fsck). It looks like it was an ext4 bug. They fixed it about a week or two ago, here. Maybe it will get in your kernel soon. I'm not going to start ext4 on any production system for at least 6 months I think now.

  11. Re:EXT4 is not broken? by Jurily · · Score: 4, Insightful

    > It's working exactly as designed. It's the applications that need fixing, no?

    Does it matter whose fault it is when users are losing config files? It worked fine before, and now one of my basic expectations concerning Linux is broken: that no matter what happens short of hardware failure, I will not lose the files I already have. We're disappointed, and pointing fingers does not help.

  12. Re:EXT4 is not broken? by Ed+Avis · · Score: 4, Interesting

    The point is that you have expressed all sorts of fear about ext4 - oh no, I'm not letting it near my production boxes - but you have not applied the same standard to the applications that trashed their config files when run on ext4. Even though, strictly speaking, it is the applications that are buggy. You should be equally enthusiastic about getting rid of KDE and any other software that trashes configuration files; otherwise it looks like you are playing favourites and blaming ext4 in order to overlook the bugs in the apps you're attached to.

    --
    -- Ed Avis ed@membled.com
  13. most apps already did the 2nd; still failed by Trepidity · · Score: 4, Informative

    KDE did already do the 2nd (what you list as correct), and most developers assumed that this was sufficient to keep the files in a consistent state, due to rename() being atomic. The problem is the sync issue you mention afterwards: the failure mode being encountered was that the rename() executed instantly to clobber the old file, while the new file still contained no data on disk. If the machine crashed in the window between the rename() and the sync, you have neither the old nor the new file.

    The main thing being discussed with KDE (and others) is how to fix this. Adding a sync() after every config update totally destroys performance, if you might update hundreds of small config files semi-frequently. See, for example, this discussion among Python folks for pros/cons of various options.
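
To get a feel for the cost being discussed, one could time a batch of small config updates with and without a flush before the rename. This is only a sketch: `update_config` and `time_updates` are hypothetical names, the `.new` suffix is an arbitrary choice, and the numbers depend entirely on the disk, filesystem and mount options in use.

```python
import os
import tempfile
import time


def update_config(path, text, durable):
    """Write text to a temp file and rename it over path."""
    tmp_path = path + ".new"
    with open(tmp_path, "w") as tmp:
        tmp.write(text)
        if durable:
            # Flush Python's buffer, then ask the kernel to hit the disk.
            tmp.flush()
            os.fsync(tmp.fileno())
    os.rename(tmp_path, path)


def time_updates(count, durable):
    """Return the seconds taken to update count small config files."""
    with tempfile.TemporaryDirectory() as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"app{i}.conf")
            update_config(path, "key=value\n", durable)
        return time.perf_counter() - start


if __name__ == "__main__":
    for durable in (False, True):
        elapsed = time_updates(100, durable)
        print(f"fsync={durable}: {elapsed:.3f}s for 100 files")
```

No expected timings are given here because they vary widely between machines. The point is only to compare the two runs on the same system.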