Data Corrupting ext3 Bug In Latest Linux 2.4.20

Posted by Hemos on Sunday December 1, 2002 @01:27PM from the bad-news dept.

An anonymous reader writes "Andrew Morton alerted readers of the Linux Kernel mailing list today that ext3 in the 2.4.20 kernel has a new bug that can easily cause file data corruption at unmount time. The bug will only affect people using ext3 in "data=journal" mode, which fortunately is not the default... Full details can be read on KernelTrap."

Ob Lame Comment by Trusty+Penfold · 2002-12-01 13:30 · Score: 5, Funny

I hope this bug doesn't corrupt the Slashdot datab%(@#LJASLO)aojda2
Re:another victory for open source by The+Bungi · 2002-12-01 13:58 · Score: 3, Insightful

Yes, remarkable, isn't it?
Even more remarkable is the fact that these stories always somehow fail to make the front page, while every 2-cent obscure vulnerability discovered in Internet Explorer and IIS is shoved front and center.
Slashdot needs a bit more balance in the way it covers things. If this had been a problem with the goddamn filesystem (!) in Windows you'd be seeing 900 posts to the tone of "Hah! M$ sucks!!!1!!".
Sad.
From LKML -- GET MIRRORS PEOPLE! by fire-eyes · 2002-12-01 14:01 · Score: 3, Informative

In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
which can very easily cause file data corruption at unmount time. This
was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
released, and three months after the bug was merged. Unfortunate timing)

This only affects filesystems which were mounted with the `data=journal'
option. Or files which are operating under `chattr -j'. So most people
are unaffected. The problem is not present in 2.5 kernels.
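
A rough way to check whether a machine is exposed, as a sketch only. It assumes the mount options are visible in an fstab-style table such as /etc/fstab; whether /proc/mounts also lists data=journal depends on the kernel, so treat that as an assumption. The function name is made up for illustration, and this will not catch individual files that have the journalling attribute set with chattr.

```shell
#!/bin/sh
# list_journal_mounts: hypothetical helper, not part of any distro.
# Reads an fstab-style table on stdin (device, mount point, fs type,
# options, ...) and prints the mount point of every ext3 entry whose
# option list contains data=journal.
list_journal_mounts() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        # wrap the option list in commas so only a whole option matches
        $3 == "ext3" && index("," $4 ",", ",data=journal,") > 0 { print $2 }
    '
}

# Example: list_journal_mounts < /etc/fstab
```

If this prints nothing, no ext3 filesystem in that table asks for data=journal.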

The symptoms are that any file data which was written within the thirty
seconds prior to the unmount may not make it to disk. A workaround is
to run `sync' before unmounting.

The optimisation was intended to avoid writing out and waiting on the
inode's buffers when the subsequent commit would do that anyway. This
optimisation was applied to both data=journal and data=ordered modes.
But it is only valid for data=ordered mode.

In data=journal mode the data is left dirty in memory and the unmount
will silently discard it.

The fix is to only apply the optimisation to inodes which are operating
under data=ordered.

--- linux-akpm/fs/ext3/fsync.c~ext3-fsync-fix Sat Nov 30 23:37:33 2002
+++ linux-akpm-akpm/fs/ext3/fsync.c Sat Nov 30 23:39:30 2002
@@ -63,10 +63,12 @@ int ext3_sync_file(struct file * file, s
 	 */
 	ret = fsync_inode_buffers(inode);
 
-	/* In writeback mode, we need to force out data buffers too. In
-	 * the other modes, ext3_force_commit takes care of forcing out
-	 * just the right data blocks. */
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
+	/*
+	 * If the inode is under ordered-data writeback it is not necessary to
+	 * sync its data buffers here - commit will do that, with potentially
+	 * better IO merging
+	 */
+	if (!ext3_should_order_data(inode))
 		ret |= fsync_inode_data_buffers(inode);
 
 	ext3_force_commit(inode->i_sb);

_

--
-- Note: If you don't agree with me, don't bother replying. I won't read it.
Re:another victory for open source by MattCohn.com · 2002-12-01 14:02 · Score: 4, Interesting

At the end of the link....

Andrew Morton wrote:
>
> ...
> The fix is to only apply the optimisation to inodes which are operating
> under data=ordered.
>

That "fix" didn't fix it. Sorry about that.

Please avoid ext3/data=journal until it is sorted out.

WELL. It seems that the Open Source people ARE on top of it, but please, don't turn a Linux bug into a way to bash Microsoft. A better comment would have been "Hm. Well, they did screw up but they are fixing it".

Klez and ILOVEYOU all have fixes. A lazy person who doesn't update and patch will have an insecure system regardless of whether it runs Windows, Linux, BSD, Mac OS X, or ANYTHING.

And no, people who run Linux AREN'T smarter and WON'T update more consistently, they just prefer Linux. And yes, newbies are more likely to be running Windows, but they wouldn't update no matter what OS they are on. And while newbies are more likely to run Windows, Gurus are NOT more likely to run *nix. It's getting old. You like Linux? Great. I'm sure that although things could be better you are very happy with your OS. I run Windows. Great. Although things could be better, I'm very happy with mine.
Update by fire-eyes · 2002-12-01 14:03 · Score: 3

In fact, there is a reply to that on LKML:

In fact it was reported on lkml on 18th July IIRC before 2.4.19 was
released if that is any help to you. 2.4.19 and 2.4.20 are affected
and I haven't tested previous releases. I was going to re-report it
sometime, but Alan brought it to light just the other day.

Nick

--
-- Note: If you don't agree with me, don't bother replying. I won't read it.
Re:So I'm clueless by J'raxis · 2002-12-01 14:28 · Score: 4, Informative

Unmounts happen at shutdown. You also need to unmount before scanning/fixing a filesystem. The whole bug here pertains to the fact that it isn't flushing ("syncing") the last 30 seconds of cached data to the disk beforehand. A cold reboot without unmounting could potentially cause all kinds of other data inconsistency problems to pop up.

The temporary fix seems to be to run sync manually. Stick "sync" in your /etc/rc.d/init.d/mountfs (or whatever it's called on your system) script right before the "umount" line.
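
Something along these lines, as a sketch only. The function name is invented for illustration and your init script will look different; the point is just that `sync` runs before `umount`, per the workaround quoted from the list.

```shell
#!/bin/sh
# safe_umount: hypothetical wrapper illustrating the workaround.
# Flush cached data to disk first, then unmount the given mount point.
safe_umount() {
    sync            # write out dirty data before the unmount
    umount "$1"     # unmount; the function returns umount's exit status
}

# Example (needs root): safe_umount /mnt/data
```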

--
Liberty in your lifetime
Re:Most Unsecure OS? Yep, It's Linux by GreyWolf3000 · 2002-12-01 14:50 · Score: 3, Insightful

From the troll that brought you the *BSD is DYING posts (all 5,425 of them) I'm sure. Okay, I'll bite.
Really though, CERT advisories are inadequate tools for measuring vulnerability. Assuming Linux+apache+ssh, etc., all had an equal number of bugs, the number of CERT advisories would be dramatically higher for Linux as opposed to Windows, since Microsoft forces people to hush up when a hole is found, and in the case of Linux, the bugs get reported several times, and the same hole in several distros likely becomes different bugs.
Hence, the article draws a similar conclusion to something like "Our army suffered more casualties than our opponent's army; hence, our opponent is the victor."

--
Slashdot: Where people pretend to be twice as smart as they really are by behaving like children.
Interesting by droyad · 2002-12-01 16:30 · Score: 3, Insightful

I just got a similar report of a bug from an accounting software vendor alerting us to a bug in Windows.

Apparently in W2k SP1 MS broke something that caused data not to be written from disk cache to the actual disk, which caused data corruption. This was only fixed in SP3.

I just find it interesting that this bug was not common knowledge as it is not really a "security" issue so they can't hide behind that smoke screen.
Re:Why isn't this on the front page? by walt-sjc · 2002-12-01 17:04 · Score: 4, Insightful

Um, maybe because regular non-developer type people don't run out and grab the latest kernel that just came out and compile it themselves for the hell of it. Instead, they run whatever version comes with their distro.

Anyone running the latest bleeding edge stuff keeps up with the LKML anyway, and KNOWS what is going on, way before it would hit a news site like /.

The sky is falling! Sheesh...
Re:another victory for open source by Phexro · 2002-12-01 19:16 · Score: 4, Insightful

"While the kernel which has fs corruption bug is supposed to be used by non-production, testing environment, and for those you like to use bleeding edge release."

Bzzt. 2.4 is the current stable Linux branch, and 2.4.20 is the latest stable version of that branch.

While this kind of thing is not uncommon in the development branch, it's awful to see in a point release of the stable branch.