Apps That Rely On Ext3's Commit Interval May Lose Data In Ext4
cooper writes "Heise Open posted news about a bug report for the upcoming Ubuntu 9.04 (Jaunty Jackalope) which describes a massive data loss problem when using Ext4 (German version): A crash occurring shortly after the KDE 4 desktop files had been loaded results in the loss of all of the data that had been created, including many KDE configuration files." The article mentions that similar losses can come from some other modern filesystems, too. Update: 03/11 21:30 GMT by T : Headline clarified to dispel the impression that this was a fault in Ext4.
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54
Don't worry guys, I read the summary this time, and it only affects the German version of ext4.
Real reason for the bug report: Someone's angry and wants his porn back.
The problem here is that delaying writes speeds up things greatly but has this possible side-effect. For a shorter commit time, simply stay with ext3. You can also mount your filesystems "sync" for a dramatic performance hit, but no write delay at all.
Anyways, with modern filesystems data does not go to disk immediately, unless you take additional measures, like a call to fsync. This should be well known to anybody that develops software and is really not a surprise. It has been done like that on server OSes for a very long time. Also note that there is no loss of data older than the write delay period and this only happens on a system crash or power-failure.
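For anyone who has not seen the "additional measures" spelled out, here is a minimal sketch in C of a write followed by fsync. The helper name write_durably is made up for illustration; only open, write, fsync and close are real POSIX calls, and whether the data truly reaches the platter still depends on the drive and controller honouring the flush.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then block until the
 * kernel reports the data has been flushed. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may write fewer bytes than asked */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal, retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Without this call the data may sit in the page cache until the
     * filesystem decides to write it out. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The fsync is the expensive part, so this is something to do for data you cannot afford to lose, not for every temporary file.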
Bottom line: Nothing to see here, except a few people that do not understand technology and are now complaining that their expectations are not met.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It's amazing how fast a filesystem can be if it makes no guarantees that your data will actually be on disk when the application writes it.
Anyone who assumes modern filesystems are synchronous by default is deluded. If you need to guarantee your data is actually on disk, open the file with O_SYNC semantics. Otherwise, you take your chances.
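A rough sketch of what opening with O_SYNC looks like in C, assuming a POSIX system. The function name append_record_sync is invented for the example; the flag itself is standard. With O_SYNC, each write() is supposed to return only after the data has been handed to stable storage, which is why it is slow.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append one record to path using synchronous writes.
 * A short write is treated as an error to keep the sketch small.
 * Returns 0 on success, -1 on error. */
int append_record_sync(const char *path, const void *rec, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, rec, len);     /* returns after the data is flushed */
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Compared with calling fsync yourself, O_SYNC pays the flush cost on every single write, so it suits small, infrequent records better than bulk output.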
Moreover, there's no assertion that the filesystem was corrupt as a result of the crash. That would be a far more serious concern.
Meh, this is crap that happens only when the system crashes, and is pretty much unavoidable if you're doing a lot of caching in memory -- which, coincidentally, is what you need to do to maximize performance. This doesn't sound like the filesystem's "fault" or the application's "fault;" it's just the way things are. Everybody knows that if you don't cleanly unmount, most bets are off.
The problem is not the many small files, but the missing disk sync. The many small files just make the issue more obvious.
True, with ext4 this is more likely to cause problems, but any delayed write can cause this type of issue when no explicit flush-to-disk is done. And let's face it: fsync/fdatasync are not really a secret to any competent developer.
What however is a mistake, and a bad one, is making ext4 the default filesystem at this time. I say give it another half year, for exactly this type of problem.
In fact, there is no such thing as an OS bug! All good programmers should re-implement essential and basic operating system features in their user applications whenever they run into so-called "OS bugs." If you question this, you must be a bad programmer, obviously.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
The journal isn't being written before the data. Nothing is written for periods of 45-120 seconds, so as to batch up the writing into efficient lumps. The journal is there to make sure that the data on disk makes sense if a crash occurs.
If your system crashes after a write hasn't hit the disk, you lose either way. Ext3 was set to write at most 5 seconds later. Ext4 is looser than that, but with associated performance benefits.
This is NOT a bug. Read the POSIX documents.
Filesystem metadata and file contents are NOT required to be written synchronously, and a sync is needed to ensure they are synchronised.
It's just down to retarded programmers who assume they can truncate/rename files and any data pending writes will magically meet up a-la ext3 (which has a mount option which does not sync automatically btw).
RTFPS (Read The Fine POSIX Spec).
Ext3 doesn't write out immediately either. If the system crashes within the commit interval, you'll lose whatever data was written during that interval. That's only 5 seconds of data if you're lucky, much more data if you're unlucky. Ext4 simply made that commit interval and backend behavior different than what applications were expecting.
All modern fs drivers, including ext3 and NTFS, do not write immediately to disk. If they did then system performance would really slow down to almost unbearable speeds (only about 100 syncs/sec on standard consumer magnetic drives). And sometimes the sync call will not occur since some hardware fakes syncs (RAID controllers often do this).
POSIX doesn't define flushing behavior when writing and closing files. If your application needs data to be in NV memory, use fsync. If it doesn't care, good. If it does care and it doesn't sync, it's a bad application and is flawed, plain and simple.
This is the attitude that has the web stuck with IE.
There's a standard out there called POSIX. It's just like an HTML or CSS standard. If everyone pays attention to it, everything works better. If you fail to pay attention to it for your bit (writing files or writing web pages), it's not *my* fault if my conforming implementation (implementing the writing or the rendering) doesn't magically fix your bugs.
The ringing of the division bell has begun... -PF
Bullshit. It is not a filesystem limitation. POSIX tells you what you can expect from file system calls. Data committed to disk as soon as an fwrite or fclose returns is not something you can or should expect. (And this is true of every OS I've used in the last 20 years.)
A great many crap programmers think APIs ought to do what they'd like them to. But APIs don't. At best they do what they are specified to do.
The filesystem doesn't guarantee anything is written until you've called fsync and it has returned.
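To make the fwrite/fclose point concrete, here is a small sketch assuming a POSIX system with stdio. There are two layers of buffering: fflush pushes the stdio buffer into the kernel, and fsync on the underlying descriptor asks the kernel to push it to disk. The name save_text_durably is made up; fopen, fputs, fflush, fileno, fsync and fclose are the real calls.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write text to path through stdio and do not return
 * success until fsync has returned. Returns 0 on success, -1 on error. */
int save_text_durably(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }

    /* Step 1: stdio buffer -> kernel page cache. */
    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }

    /* Step 2: kernel page cache -> storage device. */
    if (fsync(fileno(f)) != 0) {
        fclose(f);
        return -1;
    }

    return fclose(f) == 0 ? 0 : -1;
}
```

Skipping step 2 is exactly the case discussed above: fclose returns, the program moves on, and the data may still only exist in memory.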
Does anyone else think that 150 seconds is a bit over the top in terms of writing to disk?
I could understand one or two seconds as you speculate more data might come that needs to be written.
5 seconds is a bit iffy, as with ext3.
150 seconds? That's surely a bug.
Nope, it writes a new file and then renames it over the old file, as rename() says it is an atomic operation - you either have the old file or the new file. What happens with ext4 is that you get the new file except for its data. While that may be correct from a POSIX lawyer point of view, it is still heavily undesirable.
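For reference, the write-then-rename pattern described here, with the fsync that the other posters are asking for, might look roughly like the sketch below. This is an illustration under assumptions, not what KDE actually does: the function name replace_file_durably and the fixed ".tmp" suffix are invented, and a real program would probably use mkstemp to avoid name clashes.

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum { PATH_BUF = 4096 };  /* assumed upper bound on path length */

/* Hypothetical helper: replace path with new contents so that after a crash
 * you see either the complete old file or the complete new one.
 * Returns 0 on success, -1 on error. */
int replace_file_durably(const char *path, const void *buf, size_t len)
{
    char tmp[PATH_BUF];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Flush the new data BEFORE the rename, so the rename can never
     * point at a file whose contents are still only in memory. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Flush the containing directory so the rename itself is on disk. */
    char dirbuf[PATH_BUF];
    strncpy(dirbuf, path, sizeof dirbuf - 1);
    dirbuf[sizeof dirbuf - 1] = '\0';
    int dfd = open(dirname(dirbuf), O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

The cost is one or two flushes per saved file, which is the performance trade-off the rest of the thread is arguing about.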
Ahh yes, I love developers like you. You assume your app is the only one running, and it must have full access to the entire IO bandwidth an HDD can provide.
And then an antivirus program updates while Firefox is starting and a video is transcoding, and your program either slows to a crawl or crashes after 30 seconds of not receiving or being able to write any data.
Recently I was playing Left4Dead when one of my HDDs in my RAID array died in a very audible way. All the drives spun down, then 3 of them came back online. IOPS went to zero for over 60 seconds. No data in or out to those devices!
Interestingly, Ventrilo kept running fine. Left4Dead completely froze, but a minute or so after the 3 drives came back online, it unfroze. (CPU catching up?) All the while I was freaking out on Ventrilo, much to my friends' amusement.
Pretty much everything else crashed, except for Portable Firefox... uTorrent crashed, but first it left corrupted files all over - appearing as undeletable folders, which require a format to remove.
Time for a disk wipe. Thank you, shitty developers! Next time, use the API properly, and if you must have it written to disk, sync it immediately after you write!
It's not going to happen immediately in any case. Some optimizations can only be done if you introduce a delay, and once introduced you have to deal with the fact that there's a delay. Just because it's one second instead of a minute doesn't mean your computer can't crash at precisely the wrong moment.
While I'm not an expert in filesystems, I'd expect writing a single file to be at least 4 writes: inode, data, update the directory the file is in, and a bitmap to show space allocation. If there's a journal add a write for the journal. Each of those will require a seek due to all of these things being in different places on the disk in most filesystems.
So your 40 small files just turned into roughly 200 seeks (5 writes each, counting the journal), which at 8ms each will take about 1.6 seconds to complete.
Now let's suppose we can batch things up. We need to write the inode and data for each file, and can do just one seek for the directory (the same for all), and the bitmap and journal can be updated in one operation. Now we're down to 2 writes per file, giving 80 seeks, plus 3 for metadata, giving 83 seeks, which can be done in 0.6 seconds.
But what if we do delayed allocation and create all the inodes and write all the data as one large contiguous area? We're now down to 5 writes total, with a seek time of 40ms. The time needed to write the data can probably be disregarded, since modern disks easily write at 50MB/s, and those 40 files with metadata probably amount to less than 32K.
And with some optimization, we just reduced the time it takes to write your 40 files to just 2% of the unoptimized time.
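The back-of-the-envelope numbers above can be checked with a few lines of C. This is only a sketch of the estimate, assuming 40 files, 8 ms per seek, and five writes per file in the unbatched case (inode, data, directory, bitmap, journal); the function names are made up, and real disks and filesystems will not behave this neatly.

```c
#include <stdio.h>

/* Unbatched: every file costs 5 separate seeks. */
int seeks_unbatched(int files) { return files * 5; }

/* Batched: inode + data per file, plus one seek each for the shared
 * directory, bitmap and journal updates. */
int seeks_batched(int files) { return files * 2 + 3; }

/* Delayed allocation: inodes, data, directory, bitmap, journal,
 * each written once as a contiguous chunk. */
int seeks_delayed_alloc(void) { return 5; }

/* Total seek time in milliseconds at a fixed cost per seek. */
double seek_time_ms(int seeks, double ms_per_seek)
{
    return seeks * ms_per_seek;
}

void print_seek_estimates(int files, double ms_per_seek)
{
    printf("unbatched:     %3d seeks, %6.0f ms\n",
           seeks_unbatched(files),
           seek_time_ms(seeks_unbatched(files), ms_per_seek));
    printf("batched:       %3d seeks, %6.0f ms\n",
           seeks_batched(files),
           seek_time_ms(seeks_batched(files), ms_per_seek));
    printf("delayed alloc: %3d seeks, %6.0f ms\n",
           seeks_delayed_alloc(),
           seek_time_ms(seeks_delayed_alloc(), ms_per_seek));
}
```

Calling print_seek_estimates(40, 8.0) gives 200 seeks and 1600 ms, 83 seeks and 664 ms, and 5 seeks and 40 ms, so the delayed-allocation case comes out at about 2.5% of the unbatched time.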
You're not going to get this sort of improvement without some sort of delay. If you insist on a per-file write you'll get really, really awful performance on the sort of workload you're using as an example. And you can even see it in practice, just boot a DOS box, and do benchmarks with and without smartdrv. Running something like a virus scanner should show a huge difference in the presence of a cache.