EXT4 Data Corruption Bug Hits Linux Kernel

Posted by Soulskill on Wednesday October 24, 2012 @07:22AM from the plenty-of-time-to-fix dept.

An anonymous reader writes "An EXT4 file-system data corruption issue has reached the stable Linux kernel. The latest Linux 3.4, 3.5, and 3.6 stable kernels have an EXT4 file-system bug described as an 'apparent serious progressive ext4 data corruption bug.' Kernel developers have found and bisected the kernel issue but are still working on a proper fix for the stable Linux kernel. The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

12 of 249 comments (clear)

Re:Bisected? by Slayne · 2012-10-24 07:30 · Score: 5, Informative

Nope - bisection is a common technique for tracking down the cause of a bug by doing a binary search through the code history.
https://en.wikipedia.org/wiki/Code_Bisection
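For readers unfamiliar with the idea, here is a minimal sketch of bisection as a binary search over an ordered commit history. It is an illustration only, not the kernel developers' actual workflow: the function name `bisect_first_bad`, the commit labels, and the `is_bad` predicate are all hypothetical, and the sketch assumes the oldest commit is good, the newest is bad, and the bug stays present once introduced.

```python
def bisect_first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumptions (hypothetical setup): commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return commits[hi]


# Made-up history of 16 commits in which the bug first appears at "c11".
history = [f"c{i}" for i in range(16)]
first_bad = bisect_first_bad(history, lambda c: int(c[1:]) >= 11)
print(first_bad)
```

Each test halves the remaining range, so 16 commits need only about four build-and-test cycles instead of up to 16.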
This is why I stick to Reiser by Anonymous Coward · 2012-10-24 07:33 · Score: 5, Funny

I know he'd never do anything to harm me or my data.
I don't see the problem then... by Zapotek · 2012-10-24 07:34 · Score: 5, Funny

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
We're talking about Linux users here...move along.
Really clever... by K.+S.+Kyosuke · 2012-10-24 07:36 · Score: 5, Funny

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."
They're trying to boost the average uptime of all installations by making people keep their machines turned on. It's just a continuation of the uptime war waged with the BSD folks!

--
Ezekiel 23:20
Interesting bug, but don't get excited. by dacut · 2012-10-24 07:38 · Score: 5, Informative

From Ted Ts'o's commentary, it's an optimization ("jbd2: don't write superblock when if its empty") gone awry:

The reason why the problem happens rarely is that the effect of the buggy commit is that if the journal's starting block is zero, we fail to truncate the journal when we unmount the file system. This can happen if we mount and then unmount the file system fairly quickly, before the log has a chance to wrap.
Basically, this optimization has the side effect of not updating the transaction log in this rare case. You can end up replaying old transactions after new ones, which will scramble metadata blocks. Given the rather unique conditions needed to hit this one, I'm not going to lose any sleep over any servers running without Ted's fix (though I'll certainly apply it once Red Hat releases the patch).
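A toy model may help show why replaying stale transactions is harmful. This is a heavily simplified sketch, not how jbd2 is implemented: the dictionary "disk", the list "journal", the block number, and the functions `replay`, `unmount`, and `simulate` are all hypothetical, and the model simply assumes that a newer write reaches the disk while stale records are still sitting in a journal that should have been emptied at unmount.

```python
def replay(disk, journal):
    """Apply every journal record to the disk image, oldest first."""
    for block, value in journal:
        disk[block] = value
    return disk


def unmount(journal, truncate=True):
    """In this toy model, a clean unmount should leave the journal empty."""
    if truncate:
        journal.clear()
    return journal


def simulate(truncate_on_unmount):
    disk = {}
    journal = []

    # Session 1: an update to block 7 is journaled and written to disk.
    journal.append((7, "metadata v1"))
    disk[7] = "metadata v1"
    unmount(journal, truncate=truncate_on_unmount)

    # Later: block 7 is rewritten with newer data (assumed, for the sake
    # of the illustration, not to be covered by the old journal records).
    disk[7] = "metadata v2"

    # Next mount: recovery replays whatever the journal still contains.
    replay(disk, journal)
    return disk[7]


print(simulate(truncate_on_unmount=True))   # journal was emptied
print(simulate(truncate_on_unmount=False))  # stale record is replayed
```

With truncation the block keeps its newer contents; without it, the stale record overwrites them, which is the "old transactions after new ones" effect described above.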
1. Re:Interesting bug, but don't get excited. by Anonymous Coward · 2012-10-24 09:34 · Score: 5, Informative
  
  Ubuntu users are at risk.
  http://www.ubuntuupdates.org/package/core/quantal/main/proposed/linux-image-3.5.0-18-generic
  Look for " jbd2: don't write superblock when if its empty
  - LP: #1066176"
  If any Ubuntu users have proposed repo enabled and they've updated to 3.5.0-18, they're vulnerable.
Re:Reiserfs became 'murderfs'... by Anonymous Coward · 2012-10-24 07:59 · Score: 5, Funny

So clearly the answer is General Tso's FS. Delicious, but you'll lose your data an hour later.
Your Papers Please by Anonymous Coward · 2012-10-24 08:05 · Score: 5, Funny

grammar nazi's
grammar Nazis
Summary is wrong by DrJimbo · 2012-10-24 08:05 · Score: 5, Informative

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
This is wrong. The problem occurs when the fs is unmounted too *soon*. Twice in a row. The bug only appears if the journal buffer does not wrap. You only get catastrophic results if this happens twice in a row.

--
We don't see the world as it is, we see it as we are.
-- Anais Nin
1. Re:Summary is wrong by Anonymous Coward · 2012-10-24 08:27 · Score: 5, Interesting
  
  This appears to be untrue. My latest tests suggest that it happens if a single unclean umount happens while the fs is mounted in 3.6.3. (At least, I saw corruption in /var after a single boot, followed by a rescue boot into 3.6.1 and fsck: every filesystem that had journal replay invoked also had corruption.)
  -- N., original reporter, not much enjoying his fifteen minutes of fame since it comes with happy fun filesystem corruption attached: captcha is 'contrite', how appropriate
Re:Low impact by jedidiah · 2012-10-24 08:14 · Score: 5, Insightful

> Windows has never had anything as serious as a file system corruption bug.
That you know of...
Since the Windows development process isn't open, there's no way for you to tell. You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.

--
A Pirate and a Puritan look the same on a balance sheet.
Most of the early stories on the web are wrong.... by tytso · 2012-10-24 13:42 · Score: 5, Informative

I have a Google+ post where I've posted my latest updates to this still-developing story:
https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7
Also, I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.