Slashdot Mirror


Apps That Rely On Ext3's Commit Interval May Lose Data In Ext4

cooper writes "Heise Open posted news about a bug report for the upcoming Ubuntu 9.04 (Jaunty Jackalope) which describes a massive data loss problem when using Ext4 (German version): A crash occurring shortly after the KDE 4 desktop files had been loaded results in the loss of all of the data that had been created, including many KDE configuration files." The article mentions that similar losses can come from some other modern filesystems, too. Update: 03/11 21:30 GMT by T : Headline clarified to dispel the impression that this was a fault in Ext4.

15 of 830 comments

  1. Not a bug by casualsax3 · · Score: 5, Informative
    It's a consequence of not writing software properly. Relevant links from later in the same comment thread, for those who might otherwise miss them:

    https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45

    https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

    1. Re:Not a bug by Anonymous Coward · · Score: 5, Informative

      Quoting Ts'o:

      "The final solution, is we need properly written applications and desktop libraries. The proper way of doing this sort of thing is not to have hundreds of tiny files in private ~/.gnome2* and ~/.kde2* directories. Instead, the answer is to use a proper small database like sqllite for application registries, but fixed up so that it allocates and releases space for its database in chunks, ...

      Linux reinvents windows registry?
      Who knows what they will come up with next.

    2. Re:Not a bug by davecb · · Score: 4, Informative

      Er, actually it removes the previous data, then waits to replace it for long enough that the probability of noticing the disappearance approaches unity on flaky hardware (;-))

      --dave

      --
      davecb@spamcop.net
    3. Re:Not a bug by OeLeWaPpErKe · · Score: 5, Informative

      Let's not forget that the only consequence of delayed allocation is the write-out delay changing. Instead of data being "guaranteed" on disk in 5 seconds, that becomes 60 seconds.

      Oh dear God, someone inform the president ! Data that is NEVER guaranteed to be on disk according to spec is only guaranteed on disk after 60 seconds.

      You should not write your applications to depend on filesystem-specific behavior. You should write them to the standard, and that means fsync(). No call to fsync(), no guarantee: look it up in the documentation (man 2 write).

      The rest of what Ted Ts'o is saying is optimization, speeding up the boot time for GNOME/KDE; it is not necessary for correct operation.

      Please don't FUD.

      You know what, I'll look up the docs for you:

      (quote from man 2 write)

      NOTES
             A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.

             If a write() is interrupted by a signal handler before any bytes are written, then the call fails with the error EINTR; if it is interrupted after at least one byte has been written, the call succeeds, and returns the number of bytes written.

      That brings up another point: almost nobody is ready for the second remark either (write() might return after a partial write, necessitating a second call).

      So the normal case for a "reliable write" would be this code :

      size_t written = 0;
      ssize_t r = write(fd, &data, sizeof(data));
      while (r >= 0 && r + written < sizeof(data)) {
              written += r; // partial write: continue after the bytes already written
              r = write(fd, (const char *)&data + written, sizeof(data) - written);
      }
      if (r < 0) { // error handling code, at the very least looking at EIO, ENOSPC and EPIPE for network sockets
      }

      and *NOT*

      write(fd, data, sizeof(data)); // will probably work

      Just because programmers continuously use the second method (just check a few sf.net projects) doesn't make it the right method (and as there is *NO* way to fix write to make that call reliable in all cases you're going to have to shut up about it eventually)

      Hell, even firefox doesn't check for either EIO or ENOSPC and certainly doesn't handle either of them gracefully, at least not for downloads.

    4. Re:Not a bug by Jurily · · Score: 4, Informative

      It just loses recent data if your system crashes before it has flushed what it's got in RAM to disk.

      No, that's the bug. It loses ALL data. You get 0 byte files on reboot.

    5. Re:Not a bug by caerwyn · · Score: 4, Informative

      You're right. The correct thing to do is to *always* call fsync() when you need a data guarantee, *regardless* of which FS you're on. That skipping it hasn't caused problems in the past is beside the point; those calls are the correct way of handling things.

      --
      The ringing of the division bell has begun... -PF
    6. Re:Not a bug by dmiller · · Score: 4, Informative

      You are doing it wrong; permanently failing on recoverable EINTR and EAGAIN errors. See here for how to do it right.
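A minimal sketch of a write loop that treats EINTR and EAGAIN as recoverable, as this comment asks for. It is not necessarily the code the comment links to; the helper name `write_all` and the choice to wait with poll() on a non-blocking descriptor are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, -1 on an unrecoverable error (errno is set). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t r = write(fd, p + done, len - done);
        if (r >= 0) {
            done += (size_t)r;  /* possibly a partial write: go on with the rest */
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted before any byte was written: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Non-blocking descriptor cannot take data now: wait until writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;              /* EIO, ENOSPC, EPIPE, ...: the caller must handle it */
    }
    return 0;
}
```

A caller would use it as `if (write_all(fd, &data, sizeof(data)) < 0) { /* report errno */ }`. A successful return still says nothing about the data being on disk; that takes fsync().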

    7. Re:Not a bug by Qzukk · · Score: 4, Informative

      A file system should take my data buffer, and after saying "Ok, I got it"

      There's your problem, you didn't even bother to ask if it got it, you just threw a ton of data into the file descriptor and closed it, now didn't you. And you want me on thedailywtf?

      But let's back up here, because there's more than just people too lazy to call fsync() in order to ask the file system to write the data to the disk and say "Ok, I got it".

      All that stuff about creating a backup copy and doing this and that, has to happen inside the file system.

      The filesystem does exactly what you tell it to do. If you don't want it to make a zero byte file, then DON'T USE O_TRUNC OR *truncate() TO EMPTY YOUR FILE. Make a new file, fill it up, rename it over the other file. Don't assume that in just a few instructions, you're going to be filling it back up with new data, because those instructions may never arrive.
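The "make a new file, fill it up, rename it over the other file" advice can be sketched roughly as follows. The helper name `replace_file`, the `.tmp` suffix and the 0644 mode are assumptions for illustration; the fsync() before rename() is not part of the sentence above but follows the fsync() advice given elsewhere in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf` without
 * ever truncating the existing file. Returns 0 on success, -1 on error. */
int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    /* Assumption: "<path>.tmp" in the same directory is free to use. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t r = write(fd, p + done, len - done);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            goto fail;
        }
        done += (size_t)r;      /* handle partial writes */
    }
    if (fsync(fd) < 0)          /* flush the new data before it replaces the old */
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, path) < 0) {    /* swap the new file in under the old name */
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

The old file is never opened with O_TRUNC, so at no point does an empty file exist under the real name. Note that this sketch does not copy the old file's permissions or ownership to the new one.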

      You don't like it? Try and convince people that (open file, erase all the data in it, do some stuff, write some data, do some more stuff, write some more data, write data to disk, close file) should be an uninterruptible atomic operation. You want a versioning filesystem? Take your pick.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
  2. Re:Bull by Anonymous Coward · · Score: 5, Informative

    This is NOT a bug. Read the POSIX documents.

    Filesystem metadata and file contents are NOT required to be synchronous, and a sync is needed to ensure they are synchronised.

    It's just down to retarded programmers who assume they can truncate/rename files and any data pending writes will magically meet up a-la ext3 (which has a mount option which does not sync automatically btw).

    RTFPS (Read The Fine POSIX Spec).

  3. Re:Bull by pc486 · · Score: 5, Informative

    Ext3 doesn't write out immediately either. If the system crashes within the commit interval, you'll lose whatever data was written during that interval. That's only 5 seconds of data if you're lucky, much more data if you're unlucky. Ext4 simply made that commit interval and backend behavior different than what applications were expecting.

    All modern fs drivers, including ext3 and NTFS, do not write immediately to disk. If they did then system performance would really slow down to almost unbearable speeds (only about 100 syncs/sec on standard consumer magnetic drives). And sometimes the sync call will not occur since some hardware fakes syncs (RAID controllers often do this).

    POSIX doesn't define flushing behavior when writing and closing files. If your application needs data to be in NV memory, use fsync. If it doesn't care, good. If it does care and it doesn't sync, it's a bad application and is flawed, plain and simple.

  4. Re:Excuses are false. This is a severe flaw. by Anonymous Coward · · Score: 4, Informative

    Delayed writes should lose at most any data between commit and actual write to disk. Ext4 loses the complete files (even their content before the write).

    You seem to misunderstand that's *exactly* what is happening.

    KDE is *DELETING* all of its config files, then writing them back out again in two operations.

    Three states now exist, the 'old old' state, where the original file existed, the 'old' state, where it is empty, and the 'new' state where it is full again.

    The problem is getting caught between step #2 and step #3, which on ext3 was mostly mitigated by the write delay being only 5 seconds.

    KDE is *broken* to delete a file and expect it to still be there if it crashes before the write.
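The empty intermediate state described above is easy to observe for any program that rewrites a file in place, whether or not KDE actually does so. A small sketch, with the function name `size_after_truncating_open` invented for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: open `path` the way a naive "save" routine does and
 * report the file size at that moment, before any new data is written.
 * Returns the size, or -1 on error. */
off_t size_after_truncating_open(const char *path)
{
    int fd = open(path, O_WRONLY | O_TRUNC);    /* old contents are gone here */
    if (fd < 0)
        return -1;

    struct stat st;
    off_t size = (fstat(fd, &st) == 0) ? st.st_size : -1;
    close(fd);
    return size;
}
```

For an existing file this returns 0: from the open() until the new data is written and flushed, the file is empty, and a crash in that window leaves it empty. A longer write-out delay only makes the window wider.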

  5. man 2 fsync by Nicolas+MONNET · · Score: 5, Informative

    The filesystem doesn't guarantee anything is written until you've called fsync and it has returned.
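A short sketch of what calling fsync and waiting for it to return looks like in practice. The Linux fsync(2) man page adds that flushing a file does not necessarily flush its directory entry, so this example also syncs the containing directory; the helper name `sync_file_and_dir` and its parameters are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush an open file, then the directory holding it.
 * `dirpath` must name the directory that contains the file behind `fd`.
 * Returns 0 on success, -1 on error (errno is set). */
int sync_file_and_dir(int fd, const char *dirpath)
{
    if (fsync(fd) < 0)          /* file data and metadata */
        return -1;

    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int rc = fsync(dfd);        /* the directory entry (new name, rename) */
    int saved = errno;
    close(dfd);
    errno = saved;              /* keep fsync's error code for the caller */
    return rc;
}
```

Only after this returns 0 may the application assume the data survives a crash, and even then only if the hardware honours the flush.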

  6. Re:To Anonymous Coward: by Bronster · · Score: 4, Informative

    mount -o sync. Enjoy your slow returns and strictly ordered writes.

  7. Re:Bull by LWATCDR · · Score: 4, Informative

    It isn't a flaw. It is documented and the programmers didn't follow the docs. There is a specific command called fsync to flush the buffers to prevent the problem.
    In fact here is a link to that call http://www.opengroup.org/onlinepubs/007908799/xsh/fsync.html

    Yes, if we had a perfect world we would have instant IO, but we do not. The flaw is in the application, plain and simple.
    They didn't use the API properly and it really is just that simple.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  8. Re:Excuses are false. This is a severe flaw. by Tadu · · Score: 5, Informative

    KDE is *broken* to delete a file and expect it to still be there if it crashes before the write.

    Nope, it writes a new file and then renames it over the old file, as rename() is documented to be an atomic operation: you either have the old file or the new file. What happens with ext4 is that you get the new file except for its data. While that may be correct from a POSIX lawyer point of view, it is still heavily undesirable.