Apps That Rely On Ext3's Commit Interval May Lose Data In Ext4
cooper writes "Heise Open posted news about a bug report for the upcoming Ubuntu 9.04 (Jaunty Jackalope) which describes a massive data loss problem when using Ext4 (German version): A crash occurring shortly after the KDE 4 desktop files had been loaded results in the loss of all of the data that had been created, including many KDE configuration files." The article mentions that similar losses can come from some other modern filesystems, too. Update: 03/11 21:30 GMT by T : Headline clarified to dispel the impression that this was a fault in Ext4.
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
This is NOT a bug. Read the POSIX documents.
Filesystem metadata and file contents are NOT required to be synchronous, and a sync is needed to ensure they are synchronised.
It's just down to retarded programmers who assume they can truncate/rename files and any data pending writes will magically meet up a-la ext3 (which has a mount option which does not sync automatically btw).
RTFPS (Read The Fine POSIX Spec).
It's even WORSE than just being asynchronous:
EXT4 reproducibly delays write ops, but commits the journal updates concerning those writes.
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Ext3 doesn't write out immediately either. If the system crashes within the commit interval, you'll lose whatever data was written during that interval. That's only 5 seconds of data if you're lucky, much more data if you're unlucky. Ext4 simply made that commit interval and backend behavior different than what applications were expecting.
All modern fs drivers, including ext3 and NTFS, do not write immediately to disk. If they did then system performance would really slow down to almost unbearable speeds (only about 100 syncs/sec on standard consumer magnetic drives). And sometimes the sync call will not occur since some hardware fakes syncs (RAID controllers often do this).
POSIX doesn't define flushing behavior when writing and closing files. If your application needs data to be in NV memory, use fsync. If it doesn't care, good. If it does care and it doesn't sync, it's a bad application and is flawed, plain and simple.
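For anyone wondering what "use fsync" looks like in practice, here is a minimal sketch in Python, whose os.write and os.fsync are thin wrappers over the POSIX calls. The helper name write_durably and the file name are made up for illustration, and fsync is still only as trustworthy as the hardware underneath it.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path, then ask the kernel to push them to stable storage.

    Hypothetical helper for illustration; not taken from any application
    discussed in this thread.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit in the page cache for a while.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "settings.conf")
    write_durably(demo_path, b"theme=dark\n")
    with open(demo_path, "rb") as f:
        print(f.read())
```

Note that this still truncates the file in place, so it only addresses the "is my data on disk" question, not the "old file or new file" question.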
Delayed writes should lose at most any data between commit and actual write to disk. Ext4 loses the complete files (even their content before the write).
You seem to misunderstand that's *exactly* what is happening.
KDE is *DELETING* all of its config files, then writing them back out again in two operations.
Three states now exist, the 'old old' state, where the original file existed, the 'old' state, where it is empty, and the 'new' state where it is full again.
The problem is getting caught between step #2 and step #3, which on ext3 was mostly mitigated by the write delay being only 5 seconds.
KDE is *broken* to delete a file and expect it to still be there if it crashes before the write.
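The three states described above can be sketched roughly like this. This is an illustration of the truncate-then-write pattern in general, not KDE's actual code; rewrite_in_place and the observe callback are hypothetical names.

```python
import os
import tempfile


def rewrite_in_place(path, new_data, observe=None):
    """Overwrite path by truncating it and writing new contents.

    Illustrative only: this is the risky pattern, shown to expose the
    window in which the file is empty.
    """
    # 'old old' -> 'old': opening with "wb" truncates the file to zero bytes.
    with open(path, "wb") as f:
        if observe is not None:
            # Report the size visible in the filesystem at this moment.
            observe(os.path.getsize(path))
        # 'old' -> 'new': contents are written, but nothing forces them to disk.
        f.write(new_data)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "kderc")
    with open(demo_path, "wb") as f:
        f.write(b"old settings\n")
    rewrite_in_place(demo_path, b"new settings\n", observe=print)
```

A crash while the file is in the middle state, or after the truncate has reached disk but before the data has, leaves an empty file behind.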
What's wrong with "After a file is closed, it's synced to disk"?!?
What, you want people to have to delay/stagger/coordinate their file closes in order to avoid overloading the filesystem? That is the wrong approach. close() just means that the application is done with the file. The sync calls are not a joke; they are there precisely because close() already has entirely sensible but different semantics. Anybody who wants close to also sync can code it that way without a problem. Anybody else probably does not want this behaviour in the first place.
This is not hidden in any way. A simple "man close" not only warns of this, it also refers the reader to the fsync call. Anybody getting bitten by this did not do their homework.
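As a rough sketch of "code it that way", an application that wants sync-on-close can opt in with a small wrapper. The name open_synced is hypothetical, and this assumes the caller is happy to pay the cost of an fsync on every close.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_synced(path, mode="wb"):
    """Open a file that is flushed and fsynced before it is closed.

    Hypothetical opt-in helper: plain close() gives no such guarantee.
    """
    f = open(path, mode)
    try:
        yield f
        # Push Python's userspace buffer to the kernel first...
        f.flush()
        # ...then ask the kernel to push it to the storage device.
        os.fsync(f.fileno())
    finally:
        f.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
    with open_synced(demo_path) as out:
        out.write(b"done with this file, and I want it on disk\n")
```

Everyone who does not use the wrapper keeps the cheap close() they presumably wanted.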
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
The filesystem doesn't guarantee anything is written until you've called fsync and it has returned.
mount -o sync. Enjoy your slow returns and strictly ordered writes.
It isn't a flaw. It is documented and the programmers didn't follow the docs. There is a specific call named fsync that flushes the buffers to prevent the problem.
In fact here is a link to that call http://www.opengroup.org/onlinepubs/007908799/xsh/fsync.html
Yes, if we had a perfect world we would have instant IO, but we do not. The flaw is in the application, plain and simple.
They didn't use the api properly and it really is just that simple.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Nope, it writes a new file and then renames it over the old file, as rename() says it is an atomic operation - you either have the old file or the new file. What happens with ext4 is that you get the new file except for its data. While that may be correct from a POSIX lawyer point of view, it is still heavily undesirable.
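For reference, the write-new-then-rename pattern with the fsync that the other posters are asking for would look something like the sketch below. atomic_replace is a hypothetical helper name, the directory fsync assumes a POSIX system, and note that mkstemp creates the temporary file with owner-only permissions.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace path with data so a crash leaves either the old or new file.

    Sketch under stated assumptions (POSIX, same filesystem for the
    temporary file and the target); not the code of any real application.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory so that the
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Make sure the new contents are on disk BEFORE the rename.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "kderc")
    atomic_replace(demo_path, b"first version\n")
    atomic_replace(demo_path, b"second version\n")
    with open(demo_path, "rb") as f:
        print(f.read())
```

Leave out the fsync before os.replace and you are back to the case described above: the rename can reach disk before the data does.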
Who modded this up? Jane Q. Public is completely clueless on this topic, but she manages to sound like she has an idea to fellow clueless moderators. She should be called out for the karma whoring ignoramus she is.
Some choice quotes from her on this thread.
Delayed allocation is like leading a moving target when shooting.
BadAnalogyGuy would be proud. Probably also worth mentioning that without delayed allocation, the system would be unbearably slow.
The longer you delay allocation after writing the journal (and Ext4 seems to take this to extremes), the more chance there is of something -- almost anything really -- going wrong
A kernel crash or power outage is certainly something that could go wrong. Modern journalling file-systems handle this gracefully by making sure the file-system is in a consistent state when it comes back up.
The filesystem is flawed, plain and simple.
You'll realize why that one is a gem when you read her next quote. As the discussion continues, she begins to realize how far off the mark she is and begins to correct...
It most definitely is a filesystem limitation. That is different from saying that it's the filesystem's fault.
Still off the mark, but perhaps she is beginning to figure out what a file system should offer and what the issue being discussed is.
If an application that reads and writes lots of small files fails under Ext4, then it is Ext4's fault, not the application. An application should be able to read and write lots of small files if it wants... I can think of a great many practical examples.
Go ahead and do that. But if you want to make sure your data is written, in case of a kernel crash or power outage, then you had better understand what is going on at the FS level.
As a user of a high-level language, I should not be expected to know the disk I/O API in a given OS. That is for the authors of the compiler or interpreter.
No, but you should understand the API of the language you are dealing with. Since when does a compiler handle disk I/O anyway? As for your interpreter, it is free to call fsync whenever it wants, but what has that got to do with the FS again?
Excuse the heck out of me, but the issue being discussed was a failure that was NOT due to a power loss or other such system problem. It was a crash caused by this very issue. If it were simply missing data due to power loss or some such, there would be no point in this discussion at all.
The purpose of this quote is to demonstrate that she both has no regard for TFA and has no idea what the issue being discussed is. I encourage anyone looking to give her mod points to actually RTFA and also do a bit of background reading on file systems and in particular delayed writes.
My point was and still is: if the data is not flushed to disk yet, it should either be accessible from the buffer, or not at all.
This sentence alone deserves a -1 Huh? If you do a write, and it is successful, then you can do a read on the same file and it will return what you wrote, whether or not it had been flushed to disk. This is the way it is supposed to work. Think about it for like 10 seconds and you'll begin to get it.
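If you want to see it rather than think about it, a quick sketch: write without any fsync, then read back through the same descriptor. The helper name write_then_read is made up for this example.

```python
import os
import tempfile


def write_then_read(path, data):
    """Write data without syncing, then read it straight back.

    Illustrative helper: the read is served by the kernel whether or not
    the bytes have reached the disk yet.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)            # lands in the page cache; no fsync here
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(data))
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "scratch.dat")
    print(write_then_read(demo_path, b"visible before any flush\n"))
```

The only situation where the unflushed data matters is when the machine goes down before the kernel gets around to writing it out.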
not supposed to have to worry about OS-specific details
WE ARE TALKING ABOUT UNEXPECTED KERNEL CRASHES AND POWER OUTAGES. If you care about that situation then you should get a clue before you start coding. If not, then what is the problem, or was it fault... er, sorry, limitation?
One should not have to know about syncing to do something like a few simple file writes
And one doesn't need to if she is not concerned with the rare possibility that the system CRASHES OR LOSES POWER in the next few minutes.
Anyway, I've never called out another poster like this before and now I feel dirty.