Ext4 Data Losses Explained, Worked Around

Posted by timothy on Thursday March 19, 2009 @05:41AM from the you-did-back-up-right dept.

ddfall writes "H-Online has a follow-up on the Ext4 file system — Last week's news about data loss with the Linux Ext4 file system is explained and new solutions have been provided by Ted Ts'o to allow Ext4 to behave more like Ext3."

Quick workaround - no patches required by canadiangoose · 2009-03-19 06:32 · Score: 5, Informative

If you mount your ext4 partitions with nodelalloc you should be fine. You will of course no longer benefit from the performance enhancements that delayed allocation brings, but at least you'll have all of your freaking data. I'm running Debian on Linux 2.6.29-rc8-git4, and so far my limited testing has shown this to be very effective.
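
To confirm that the option actually took effect, one could scan the mount table. The following is a minimal Python sketch, not a tested tool: it assumes the kernel lists nodelalloc among the options in /proc/mounts when delayed allocation is off, and the function name and the sample lines are made up for illustration.

```python
def ext4_mounts_without_delalloc(mounts_text):
    """Return mount points of ext4 filesystems whose options include nodelalloc.

    `mounts_text` is text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3].split(",")
        if fs_type == "ext4" and "nodelalloc" in options:
            result.append(mount_point)
    return result


# Hypothetical sample input; on a live system one could pass
# open("/proc/mounts").read() instead.
sample = (
    "/dev/sda1 / ext4 rw,relatime,nodelalloc 0 0\n"
    "/dev/sda2 /home ext4 rw,relatime 0 0\n"
)
print(ext4_mounts_without_delalloc(sample))  # prints ['/']
```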

--
Never eat more than you can lift -- Miss Piggy
Re:Workaround is disaster for laptops by Kjella · 2009-03-19 06:48 · Score: 5, Informative

Fixed code:
fwrite()
fsync() - sync this file before close
fclose()
rename()
Either you're a troll or an idiot; since you're AC'ing, I guess I got trolled. This will sync immediately and kill performance and battery life, since every block must be confirmed written before the process can continue. What you need to fix this is a delayed rename that happens after the delayed write.
Problem:
fwrite()
fclose()
rename()
*ACTUAL RENAME*
*TIME PASSES* <-- crash happens here = lose old file
*ACTUAL WRITE*
Real solution:
fwrite()
fclose()
rename()
*TIME PASSES* <-- crash happens here = keep old file
*ACTUAL WRITE*
*ACTUAL RENAME*

--
Live today, because you never know what tomorrow brings
A bad design that it is used everywhere by diegocgteleline.es · 2009-03-19 07:46 · Score: 5, Informative

"No write is guaranteed to be written to disk until the OS is shut down, everything can be cached in RAM for an indefinite amount of time." However that'd be real flaky and lead to data loss. That makes my FS useless. Doesn't matter if it is well documented, what matters is that the damn thing loses data on a regular basis.
It turns out that all modern operating systems work exactly like that. In ALL of them you need to use explicit synchronization (fsync and friends) to get a notification that your data has really been written to disk (and that's all you get, a notification, because the system could oops before fsync finishes). You can also mount your filesystem as "sync", which sucks.
Journaling and COW/transaction-based filesystems like ZFS only guarantee filesystem integrity, not that your data is safe. It turns out that Ext3 has the same problem; it's just that the window is smaller (5 seconds). And I wouldn't bet that HFS and ZFS don't have the same problem (btrfs is COW and transaction based, like ZFS, and has the same problem).
Welcome to the real world...
1. Re:A bad design that it is used everywhere by Tacvek · 2009-03-19 08:52 · Score: 5, Informative
  
  The Ext3 5 seconds thing is true, but that is not the important difference.
  On Ext3, with the default mount options, if one writes a file to disk and then renames the file, the write is guaranteed to come before the rename. This can be used to ensure atomic updates to files, by writing a temporary copy of the file with the desired changes, and then renaming the file.
  On Ext4, if one writes a file to the disk, and then renames the file, the rename can happen first. The result of this is that it is not possible to ensure atomic updates to files unless one uses fsync between the writing and the renaming. However, that would hurt performance, since fsync will force the file to be committed to disk right now, when all that is really important is that it is committed to disk before the rename is.
  Thankfully the Ext4 module will be gaining a new mount option that will ensure that a file is written to disk before the renaming occurs. This mount option should have no real impact on performance, but will ensure the atomic update idiom that works on Ext3 will also work on Ext4.
  
  --
  Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
Re:No kidding by Tacvek · 2009-03-19 08:40 · Score: 4, Informative

I don't think you have it right.
On Ext3 with "data=ordered" (a default mount option), if one writes the file to disk, and then renames the file, ext3 will not allow the rename to take place until after the file has been written to disk.
Therefore, if an application that wants to change a file uses the common pattern of writing to a temporary file and then renaming it (the rename is atomic on journaling file systems), and the system crashes at any point, then after reboot the file is guaranteed to be either the old version or the new version.
With Ext4, if you write a file and then rename it, the rename can happen before the write. Thus if the computer crashes between the rename and the write, on reboot the result will be a zero byte file.
The fact that the new version of the file may be lost is not the issue. The issue is that both versions of the file may be lost.
The end result is the write and rename method of ensuring atomic updates to files does not work under Ext4.
A new mount option that forces the rename to come after the data is written to disk is being added. Once that is available, the problem will be gone if you use that mount option. Hopefully it will be made a default mount option.
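
For applications that cannot rely on that mount option, the fsync-before-rename variant of the pattern can be sketched in Python as follows. This is a minimal sketch, not code from any of the applications discussed; the function name `atomic_write` and the `.tmp` suffix are illustrative assumptions.

```python
import os


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via a temporary file.

    Hypothetical helper for illustration. After a crash, `path` should hold
    either the old contents or the new contents, not a zero-byte file.
    """
    tmp_path = path + ".tmp"  # assumed naming scheme; must be on the same filesystem
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to commit the data to disk
    os.replace(tmp_path, path)  # rename over the original only after the sync
```

A call such as `atomic_write("settings.conf", b"key=value\n")` pays the cost of one fsync per update, which is the trade-off the rest of this thread argues about.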

--
Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
Re:LOL: Bug Report by zenyu · 2009-03-19 08:47 · Score: 4, Informative

They don't. Applications just need to concern themselves with the details of the APIs they use, and the guarantees those APIs do or don't provide.
Yup, and the problem has existed with KDE startup for years. I remember the startup files getting trashed when Mandrake first came out and I tried KDE for long enough to get hooked, and it's happened to me a few times a year ever since with every filesystem I've used. I just make my own backups of the .kde directory and fix this manually when it happens. I'm pretty good at this restore by now. Hopefully this bug in KDE will get fixed now that it is causing the KDE project such great embarrassment. I had a silent wish Ts'o would increase the default commit interval to 10 minutes when the first defenders of the KDE bug started squawking, but he was too gracious for that.
PS I use a lot of experimental graphics drivers for work, hence lockups during startup are common enough that I probably see this KDE bug more than most KDE users. But they really violate every rule of using config files: 1st. open with minimum permission needed, in this case read only, unless a write is absolutely necessary. 2nd. only update a file when it needs updating. 3rd. when updating a config file make a copy, commit it to disk, and then replace the original, making sure file permissions and ownership are unchanged, then commit the rename if necessary.
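
The three rules above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not KDE's code or a proposed patch: the function name `update_config` and the `.new` suffix are made up, and the calls used (`os.fchmod`, `os.fchown`, `os.fsync` on a directory descriptor) are Unix-specific.

```python
import os


def update_config(path, new_data):
    """Rewrite an existing config file at `path` only if its contents change.

    Hypothetical helper. Returns True if the file was replaced, False if it
    was left alone.
    """
    with open(path, "rb") as f:  # rule 1: read-only unless a write is needed
        if f.read() == new_data:
            return False         # rule 2: nothing to update, so do not write
        st = os.fstat(f.fileno())

    tmp_path = path + ".new"     # assumed temporary name in the same directory
    with open(tmp_path, "wb") as tmp:
        tmp.write(new_data)
        os.fchmod(tmp.fileno(), st.st_mode & 0o7777)       # keep permissions
        try:
            os.fchown(tmp.fileno(), st.st_uid, st.st_gid)  # keep ownership
        except PermissionError:
            pass                 # an unprivileged process may not be allowed to chown
        tmp.flush()
        os.fsync(tmp.fileno())   # rule 3: commit the copy to disk
    os.replace(tmp_path, path)   # then replace the original

    # Commit the rename itself by syncing the containing directory.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return True
```

Because the unchanged case returns before any write, a startup that only reads its settings never touches the disk, so there is nothing to fsync and nothing to lose in a crash.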
PS2 Those computer users saying an fsync will kill performance need to get a cluebat applied to them by the nearest programmer. 1st. There will be no fsyncs of config files at startup once the KDE startup is fixed. 2nd. fsyncs on modern filesystems are pretty fast, ext3 is the rare exception to that norm; this will be unnoticeable when you apply a settings change. 3rd. These types of programming errors are not the norm; I've graded first and second year computer science classes and each of the three major mistakes made would have lost you 20-30% of your score for the assignment.
Sounds like they need to talk to Kirk McKusick by argent · 2009-03-19 08:58 · Score: 4, Informative

Kirk McKusick spent a lot of time working out the right order to write metadata and file data in FFS and the resulting file system, FFS with Soft Updates, gets high performance and high reliability... even after a crash.
Workaround patches already in Fedora and Ubuntu by tytso · 2009-03-19 15:04 · Score: 4, Informative

It's really depressing that there are so many clueless comments in Slashdot --- but I guess I shouldn't be surprised.
Patches to work around buggy applications which don't call fsync() have been around since long before this issue got slashdotted, and before the Ubuntu Launchpad page got slammed with comments. In fact, I commented very early in the Ubuntu log that patches that detected the buggy applications and implicitly forced the disk blocks to disk were already available. Since then, both Fedora and Ubuntu are shipping with these workaround patches.
And yet, people are still saying that ext4 is broken, and will never work, and that I'm saying all of this so that I don't have to change my code, etc ---- when in fact I created the patches to work around the broken applications *first*, and only then started trying to advocate that people fix their d*mn broken applications.
If you want to make your applications such that they are only safe on Linux and ext3/ext4, be my guest. The workaround patches are all you need for ext4. The fixes have been queued for 2.6.30 as soon as its merge window opens (probably in a week or so), and Fedora and Ubuntu have already merged them into their kernels for their beta releases which will be released in April/May. They will slow down filesystem performance in a few rare cases for properly written applications, so if you have a system that is reliable, and runs on a UPS, you can turn off the workaround patches with a mount option.
Applications that rely on this behaviour won't necessarily work well on other operating systems, and on other filesystems. But if you only care about Linux and ext3/ext4 file systems, you don't have to change anything. I will still reserve the right to call them broken, though.