Kernel Hackers On Ext3/4 After 2.6.29 Release

Posted by timothy on Wednesday March 25, 2009 @12:18AM from the good-things-come-from-certain-clashes dept.

microbee writes "Following the Linux kernel 2.6.29 release, several famous kernel hackers have raised complaints about what seems to be a long-standing performance problem related to ext3. Alan Cox, Ingo Molnar, Andrew Morton, Andi Kleen, Theodore Ts'o, and of course Linus Torvalds have all participated. It may shed some light on the status of Linux filesystems. For example, Linus Torvalds commented on the corruption caused by writeback mode, calling it 'idiotic.'"

15 of 316 comments

Idiotic by baadger · 2009-03-25 00:27 · Score: 5, Informative

Mirror for the thread:
http://thread.gmane.org/gmane.linux.kernel/811167/focus=811699
Let me guess... by Puls4r · 2009-03-25 00:43 · Score: 5, Funny

The server is running linux.
1. Re:Let me guess... by UnRDJ · 2009-03-25 00:46 · Score: 5, Funny
  
  too much karma for your tastes?
Re:lkml.org server is slashdotted. by FernandoTorres · 2009-03-25 00:44 · Score: 5, Funny

Well this is just my meta comment. I'll be writing my real comment later...
OK, then... *WHO* is the official ext3 "moron"? by Anonymous Coward · 2009-03-25 00:47 · Score: 5, Insightful

Quote from Linus:

"...the idiotic ext3 writeback behavior. It literally does everything the wrong way around - writing data later than the metadata that points to it. Whoever came up with that solution was a moron. No ifs, buts, or maybes about it."
In the interests of fairness... it should be fairly easy to track down the person or group of people who did this. Code commits in the Linux world seem to be pretty well documented.
How about ASKING them rather than calling them morons?
(note: they may very well BE morons, but at least give them a chance to respond before being pilloried by Linus)
TDz.
1. Re:OK, then... *WHO* is the official ext3 "moron"? by Anonymous Coward · 2009-03-25 01:09 · Score: 5, Insightful
  
  Torvalds knows exactly who it is, and most people following the discussion probably know, too.
  Also, there has been a fairly public discussion, including a statement by the person responsible.
  Not saying the name is Torvalds' attempt at saving grace. It is similar to a parent of two children saying, "I don't know who made the mess, but when I come back, it had better be cleaned up."
  Yes, Mr. Torvalds is fairly outspoken.
2. Re:OK, then... *WHO* is the official ext3 "moron"? by 644bd346996 · 2009-03-25 01:24 · Score: 5, Informative
  
  ext3 was merged to the mainline kernel in 2001. Git was created in 2005. I wouldn't trust any authorship evidence in a git repo for code predating the repo.
  The journalling behavior of ext3 was probably decided by Stephen Tweedie.
3. Re:OK, then... *WHO* is the official ext3 "moron"? by houghi · 2009-03-25 01:35 · Score: 5, Insightful
  
  Knowing the humor that Linus has, it could be himself.
  
  --
  Don't fight for your country, if your country does not fight for you.
4. Re:OK, then... *WHO* is the official ext3 "moron"? by Ecuador · 2009-03-25 02:03 · Score: 5, Funny
  
  Yep, we urgently need some kind of killer FS for Linux...
  Oh, wait...
  
  --
  Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Re:Slow performance by morgan_greywolf · 2009-03-25 00:51 · Score: 5, Funny

Well, they had to switch the lkml server to ext3 because posts kept getting killed and cut into pieces with their old filesystem and the admins just kept saying "Well, they must've gone to Russia."

--
My blog
Re:I would go further than Linus on this one... by Skuto · 2009-03-25 00:59 · Score: 5, Informative

You are confusing writeback caching with ext3/4's writeback option, which is simply something different.
The problem with all the ext3/ext4 discussions has been the ORDER in which things get written, not whether they are cached or not. (Hence the existence of an "ordered" mode.)
You want new data written first, and the references to that new data updated later, and most definitely NOT the other way around.
Linus seems to understand this much better than the people writing the filesystems, which is quite ironic.
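The same "data first, references later" ordering can be sketched at the application level. This is only an illustration, not what ext3 or ext4 do internally: the helper name `atomic_write` is made up for this example, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os
import tempfile


def atomic_write(path, data):
    """Hypothetical helper: write new data first, repoint the name afterwards."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # step 1: the new data is forced to disk
        os.replace(tmp_path, path)    # step 2: the name now refers to the new data
    except BaseException:
        # On failure, remove the temp file; the old file is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Step 3 (assumes POSIX): fsync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader of `path` sees either the complete old contents or the complete new contents, because the reference is only switched after the data it points to has been written.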
Re:lkml.org server is slashdotted. by Anonymous Coward · 2009-03-25 01:34 · Score: 5, Insightful

Well this is just my meta comment. I'll be writing my real comment later...
You forgot to include a link to the comment you'll be writing later.
Data - metadata ordering: softupdates by ivoras · 2009-03-25 02:14 · Score: 5, Informative

Somebody's going to mention it, so here it is: there was a BSD Unix research project that ended up as the soft-updates implementation (currently present in all modern free BSDs). It deals precisely with the ordering of metadata and data writes. The paper is here: http://www.ece.cmu.edu/~ganger/papers/softupdates.pdf. Regardless of what Linus says, soft-updates with strong ordering also does metadata updates before data updates, and also keeps track of ordering *within* metadata. It has proven to be very resilient (up to hardware problems).
Here's an excerpt:
We refer to this requirement as an update dependency, because safely writing the directory entry depends on first writing the inode. The ordering constraints map onto three simple rules: (1) Never point to a structure before it has been initialized (e.g., an inode must be initialized before a directory entry references it). (2) Never reuse a resource before nullifying all previous pointers to it (e.g., an inode's pointer to a data block must be nullified before that disk block may be reallocated for a new inode). (3) Never reset the last pointer to a live resource before a new pointer has been set (e.g., when renaming a file, do not remove the old name for an inode until after the new name has been written). The metadata update problem can be addressed with several mechanisms. The remainder of this section discusses previous approaches and the characteristics of an ideal solution.
There's some quote about this... something about those who don't know unix and about reinventing stuff, right :P ?
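To make the idea of an "update dependency" concrete, here is a rough sketch that treats the three rules in the excerpt as edges in a dependency graph and asks for a safe write order. It is not the soft-updates implementation; the write names and the `safe_write_order` helper are invented for illustration, and it relies on Python's standard `graphlib` module (3.9+).

```python
from graphlib import TopologicalSorter

# Each pending write maps to the set of writes that must reach disk before it.
# All names below are hypothetical labels, not real on-disk structures.
pending = {
    # Rule 1: initialize the inode before a directory entry points to it.
    "write dirent 'new.txt' -> inode 12": {"initialize inode 12"},
    "initialize inode 12": set(),
    # Rule 2: nullify the old pointer before the block is reused.
    "reallocate block 40 to inode 12": {"clear inode 7 pointer to block 40"},
    "clear inode 7 pointer to block 40": set(),
    # Rule 3: write the new name before removing the old one on rename.
    "remove old name 'a.txt'": {"write new name 'b.txt'"},
    "write new name 'b.txt'": set(),
}


def safe_write_order(dependencies):
    """Return one ordering of writes that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = safe_write_order(pending)
```

Any order returned this way writes each prerequisite before the update that depends on it, which is the property the three rules describe.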

--
-- Sig down
Re:lkml.org server is slashdotted. by linuxrocks123 · 2009-03-25 02:30 · Score: 5, Insightful

Actually, Linus was, as he sometimes is, completely clueless. He's unaware of the fact that filesystem journaling was *NEVER* intended to give better data integrity guarantees than an ext2-crash-fsck cycle and that the only reason for journaling was to alleviate the delay caused by fscking. All the filesystem can normally promise in the event of a crash is that the metadata will describe a valid filesystem somewhere between the last returned synchronization call and the state at the time of the crash. If you need more than that -- and you really, probably don't -- you have to do special things, such as running an OS that never, ever, ever crashes and putting a special capacitor in the system so the OS can flush everything to disk before the computer loses power in an outage.

--
vi ~/.emacs # I'm probably going to Hell for this.
Re:lkml.org server is slashdotted. by AigariusDebian · 2009-03-25 02:45 · Score: 5, Informative

On-disk state must always be consistent. That was the point of journaling, so that you do not have to do a fsck to get to a consistent state. You write to a journal what you are planning to do, then you do it, then you activate it and mark it done in the journal. At any point in time, if power is lost, the filesystem is in a consistent state: either the state before the operation or the state after the operation. You might get some half-written blocks, but that is perfectly fine, because they are not referenced in the directory structure until the final activation step is written to disk, and those half-written blocks are still considered empty by the filesystem.
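The sequence described above (journal the intent, do the write, activate, mark done) can be modelled with a toy in-memory "disk". This is a simplified sketch, not how ext3's journal is implemented: `ToyDisk`, `journaled_write`, `recover` and the `crash_after` parameter are all invented here to simulate power loss at each step.

```python
class Crash(Exception):
    """Simulated power loss."""


class ToyDisk:
    def __init__(self):
        self.blocks = {}     # block number -> bytes
        self.directory = {}  # file name -> block number
        self.journal = []    # ("begin" | "done", name, block) records


def journaled_write(disk, name, data, crash_after=None):
    block = max(disk.blocks, default=-1) + 1
    disk.journal.append(("begin", name, block))   # step 1: record the intent
    if crash_after == 1:
        raise Crash()
    disk.blocks[block] = data[: len(data) // 2]   # step 2: data, half written so far
    if crash_after == 2:
        raise Crash()
    disk.blocks[block] = data                     # data block is now complete
    disk.directory[name] = block                  # step 3: activate the new block
    if crash_after == 3:
        raise Crash()
    disk.journal.append(("done", name, block))    # step 4: mark done


def recover(disk):
    """After a crash, free blocks from unfinished operations that nothing references."""
    done = {(n, b) for kind, n, b in disk.journal if kind == "done"}
    referenced = set(disk.directory.values())
    for kind, n, b in disk.journal:
        if kind == "begin" and (n, b) not in done and b not in referenced:
            disk.blocks.pop(b, None)  # half-written block is treated as empty
    disk.journal.clear()


def read_file(disk, name):
    return disk.blocks[disk.directory[name]]
```

Whichever step the simulated crash hits, `read_file` after `recover` returns either the complete old contents or the complete new contents, never the half-written block. Freeing the old block after a successful write is left out to keep the sketch short.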