Slashdot Mirror


Data Corrupting ext3 Bug In Latest Linux 2.4.20

An anonymous reader writes "Andrew Morton alerted readers of the Linux Kernel mailing list today that ext3 in the 2.4.20 kernel has a new bug that can easily cause file data corruption at unmount time. The bug will only affect people using ext3 in "data=journal" mode, which fortunately is not the default... Full details can be read on KernelTrap."

50 comments

  1. Ob Lame Comment by Trusty+Penfold · · Score: 5, Funny

    I hope this bug doesn't corrupt the Slashdot datab%(@#LJASLO)aojda2

  2. another victory for open source by tps12 · · Score: 2, Flamebait

    Forgive me for gloating, but I'm once again elated at how quickly this bug was squashed. Literally hours after the kernel was released, we have a fix available. Meanwhile, Windorks are still getting hammered by the Klez and ILOVEYOU virii. It's a miracle Linux and less popular open source programs like *BSD haven't wiped out the competition entirely.

    Of course, I'm sure some of the more bleeding-edge types were bitten by this buglet, but I guess that comes with the territory; backup backup backup! I hope no Slashdotters lost any of their porn collections.

    --

    Karma: Good (despite my invention of the Karma: sig)
    1. Re:another victory for open source by The+Bungi · · Score: 3, Insightful
      Yes, remarkable, isn't it?

      Even more remarkable is the fact that these stories always somehow fail to make the front page, while every 2-cent obscure vulnerability discovered in Internet Explorer and IIS is shoved front and center.

      Slashdot needs a bit more balance in the way it covers things. If this had been a problem with the goddamn filesystem (!) in Windows you'd be seeing 900 posts to the tone of "Hah! M$ sucks!!!1!!".

      Sad.

    2. Re:another victory for open source by MattCohn.com · · Score: 4, Interesting

      At the end of the link....

      Andrew Morton wrote:
      > ...
      > The fix is to only apply the optimisation to inodes which are operating
      > under data=ordered.
      >

      That "fix" didn't fix it. Sorry about that.

      Please avoid ext3/data=journal until it is sorted out.

      WELL. It seems that the Open Source people ARE on top of it, but please, don't turn a Linux bug into a way to bash Microsoft. A better comment would have been "Hm. Well, they did screw up but they are fixing it".

      Klez and ILOVEYOU all have fixes. A lazy person who doesn't update and patch will have an insecure system regardless of whether it runs Windows, Linux, BSD, Mac OS X, or ANYTHING.

      And no, people who run Linux AREN'T smarter and WON'T update more consistently, they just prefer Linux. And yes, newbies are more likely to be running Windows, but they wouldn't update no matter what OS they are on. And while newbies are more likely to run Windows, Gurus are NOT more likely to run *nix. It's getting old. You like Linux? Great. I'm sure that although things could be better you are very happy with your OS. I run Windows. Great. Although things could be better, I'm very happy with mine.

    3. Re:another victory for open source by OneFix · · Score: 1

      If this had been a problem with the goddamn filesystem (!) in Windows you'd be seeing 900 posts to the tone of "Hah! M$ sucks!!!1!!".

      Regardless of your OS, you're stupid if you're putting new kernels on high availability systems.

    4. Re:another victory for open source by the+eric+conspiracy · · Score: 2, Informative

      Slashdot needs a bit more balance in the way it covers things. If this had been a problem with the goddamn filesystem (!) in Windows you'd be seeing 900 posts to the tone of "Hah! M$ sucks!!!1!!".

      Oh baloney.

      The fact is that the open source development process is just that, open. This means that users have access to versions of the kernel at all stages of development. This build is only a few days old. Clearly everyone should realize the amount of testing is too small for widespread production use.

      This kernel, and this bug, have NOT made it into any significant distributions of Linux. The only people using this version are bleeding edge types and testers who routinely compile their own kernels from source.

      If this was a case of, say RedHat 8.0 showing up with a file corruption bug, then, yes, it should be a front page article. This is nothing of the sort. This is a kernel version that might have shown up in Red Hat 8.1, say six months from now had it passed the test of time.

      I shudder to think what kinds of problems we would be reporting here if Microsoft gave its customers anything like the same level of access to its development process.

      After all, Microsoft is the company that shipped Windows ME and MS Smartphone.

      Score: -1, Pro-Microsoft

      If this is your typical posting, yes.

    5. Re:another victory for open source by Anonymous Coward · · Score: 0

      What can I say to that but:

      Preach on brother!

    6. Re:another victory for open source by Anonymous Coward · · Score: 1, Informative

      Klez and ILOVEYOU have been patched by MS, a LONG time ago. The only people getting hit by them are people not willing to run Windows Update. You're a freaking idiot.

    7. Re:another victory for open source by jsse · · Score: 2, Insightful

      Klez and ILOVEYOU all have fixes. A lazy person who doesn't update and patch will have an insecure system regardless of whether it runs Windows, Linux, BSD, Mac OS X, or ANYTHING.

      I'm not going to get into pro-some-OS flame war but I'd like to add one thing that you might have missed in the argument.

      The OS that was infected with Klez and ILOVEYOU is a production system.

      While the kernel which has the fs corruption bug is supposed to be used in non-production, testing environments, and by those who like to use bleeding-edge releases.

    8. Re:another victory for open source by Anonymous Coward · · Score: 0

      So? The fact is that this bug went uncaught until after the release. 2.4.20 is supposed to be STABLE. MS probably would not have had a bug this severe, though. While viruses might get through, using their FS how it's intended probably wouldn't result in FS corruption.

      I keep hearing this "OSS/FS bugs are caught more quickly in released software". That doesn't matter when the software is supposed to be stable.

    9. Re:another victory for open source by monthos · · Score: 1

      As much as I love open source, and I do agree that open source helped fix this bug, I think something like this should never have made it into a "stable" kernel.

    10. Re:another victory for open source by shaitand · · Score: 1, Flamebait

      The difference between this and every 2cent obscure vulnerability in IE and IIS is that in IE or IIS they would have been discovered 6 months or more after release, in turn Microsoft would deny they exist for another 3 months, then another month would go by before Microsoft would release a fix. In the meantime developers as talented as, or more so than, those at Microsoft who are unfortunate enough to work in a Windows environment would be forced to sit and wait because they can't look at the code and fix it themselves.

    11. Re:another victory for open source by shaitand · · Score: 1, Troll

      Do you actually work on computers for home users and small business? I do, among a great deal of other things, and can tell you that the security patches released by Microsoft hardly stop Klez. You can patch but the systems still get reinfected again and again without a proper anti-virus to catch the bug.

    12. Re:another victory for open source by Tumbleweed · · Score: 2, Flamebait

      You'd seem smarter if you didn't use the non-word 'virii'. The correct plural form is 'viruses'.

      And this is hardly a 'victory' for open source. Fixing a bug (or not, as the case apparently is) is never a victory. If they'd been able to put out a version of the kernel without a serious bug, now *that* could be considered a victory.

    13. Re:another victory for open source by The+Bungi · · Score: 0, Flamebait
      Gosh, you misunderstand me. I'm not trying to compare bugs or vulnerabilities or their significance or even their seriousness.

      Read my post again. Slowly. Then read it again. And then, if you understand what I was trying to say, post an intelligent reply. Otherwise keep your insight to yourself. Your attempt to rationalize whatever it is you conceived as my slight towards everybody's favorite OS was a waste of time and keyboard lubricant.

      If this is your typical posting, yes.

      ROFLMAO, and you even added me to your "foes" list - how charming.

    14. Re:another victory for open source by Anonymous Coward · · Score: 1, Informative

      Do you actually work on computers for home users and small business?

      Yes, I do. And a fully patched computer will NOT be infected by klez. My guess is that somehow you're screwing the system up. How bout this - stay away from Windows machines, they obviously don't like you.

    15. Re:another victory for open source by shaitand · · Score: 2

      It's been my experience that Windows machines don't like anybody ;) As for klez, I can see a "will not", "will so" battle ahead if I go into it. So let's just agree we've had different experiences on this one.

    16. Re:another victory for open source by Phexro · · Score: 4, Insightful

      "While the kernel which has the fs corruption bug is supposed to be used in non-production, testing environments, and by those who like to use bleeding-edge releases."

      Bzzt. 2.4 is the current stable Linux branch, and 2.4.20 is the latest stable version of that branch.

      While this kind of thing is not uncommon in the development branch, it's awful to see in a point release of the stable branch.

    17. Re:another victory for open source by Anonymous Coward · · Score: 0

      I'll take that as a concession of victory to myself. Thank you.

    18. Re:another victory for open source by Vlad_the_Inhaler · · Score: 2
      Nah.
      Back in the days I used to run computers under Windows, Win95b came out with the vfat32 file system as an option. It was new and an improvement over its predecessor, but bug free it was not.

      Coming closer to home, Linus released a 2.4 kernel a year ago (Thanksgiving 2001? The Turkey kernel) with a major data-corruption bug which was far worse than this one and affected configurations used by the majority. I don't use ext3 like that and can live with this new problem.

      --
      Mielipiteet omiani - Opinions personal, facts suspect.
    19. Re:another victory for open source by fudgefactor7 · · Score: 2

      "And no, people who run Linux AREN'T smarter and WON'T update more consistently, they just prefer Linux. And yes, newbies are more likely to be running Windows, but they wouldn't update no matter what OS they are on. And while newbies are more likely to run Windows, Gurus are NOT more likely to run *nix. It's getting old. You like Linux? Great. I'm sure that although things could be better you are very happy with your OS. I run Windows. Great. Although things could be better, I'm very happy with mine."

      Amen, Brother. I wish more *nix zealots thought like this.

    20. Re:another victory for open source by Harik · · Score: 1
      Bzzt. 2.4 is the current stable Linux branch, and 2.4.20 is the latest stable version of that branch.
      Bzzt right back. 2.4.19 is the latest stable version. 2.4.20-pre5 is an unstable, testing version for evaluation only. NOT FOR PRODUCTION MACHINES.
    21. Re:another victory for open source by Anonymous Coward · · Score: 0

      You're right, 2.4.19 is the latest stable version. Unfortunately, the official 2.4.20 release was last Saturday.

    22. Re:another victory for open source by Cro+Magnon · · Score: 2

      I still think FILE CORRUPTION bugs should make the front page! Yes, it's in a raw kernel, but the very people dumb enough to use yesterday's kernel are the ones most likely to read /. And while I don't have much sympathy for anyone who uses a new kernel for PRODUCTION, it should be well publicized for the benefit of home users who are fool enough to use it.

      --
      Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
    23. Re:another victory for open source by Cro+Magnon · · Score: 2

      Bzzt! 2.4.20 has been released. It's officially the latest "stable" version.

      --
      Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
  3. From LKML -- GET MIRRORS PEOPLE! by fire-eyes · · Score: 3, Informative

    In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
    which can very easily cause file data corruption at unmount time. This
    was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
    released, and three months after the bug was merged. Unfortunate timing)

    This only affects filesystems which were mounted with the `data=journal'
    option. Or files which are operating under `chattr -j'. So most people
    are unaffected. The problem is not present in 2.5 kernels.

    The symptoms are that any file data which was written within the thirty
    seconds prior to the unmount may not make it to disk. A workaround is
    to run `sync' before unmounting.

    The optimisation was intended to avoid writing out and waiting on the
    inode's buffers when the subsequent commit would do that anyway. This
    optimisation was applied to both data=journal and data=ordered modes.
    But it is only valid for data=ordered mode.

    In data=journal mode the data is left dirty in memory and the unmount
    will silently discard it.

    The fix is to only apply the optimisation to inodes which are operating
    under data=ordered.

    --- linux-akpm/fs/ext3/fsync.c~ext3-fsync-fix Sat Nov 30 23:37:33 2002
    +++ linux-akpm-akpm/fs/ext3/fsync.c Sat Nov 30 23:39:30 2002
    @@ -63,10 +63,12 @@ int ext3_sync_file(struct file * file, s
              */
             ret = fsync_inode_buffers(inode);

    -        /* In writeback mode, we need to force out data buffers too. In
    -         * the other modes, ext3_force_commit takes care of forcing out
    -         * just the right data blocks. */
    -        if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
    +        /*
    +         * If the inode is under ordered-data writeback it is not necessary to
    +         * sync its data buffers here - commit will do that, with potentially
    +         * better IO merging
    +         */
    +        if (!ext3_should_order_data(inode))
                     ret |= fsync_inode_data_buffers(inode);

             ext3_force_commit(inode->i_sb);


    --
    -- Note: If you don't agree with me, don't bother replying. I won't read it.
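
The post above says only filesystems mounted with `data=journal` are affected. A minimal shell sketch for spotting such mounts follows. It assumes the mount table you feed it actually lists the `data=` option for ext3, which the post does not confirm for /proc/mounts on every kernel; /etc/fstab uses the same first four columns and can be passed instead. The function name `journal_mounts` is made up for illustration, and the sketch does not look at per-file journaling set with chattr.

```shell
#!/bin/sh
# Print the mount point of every ext3 entry whose option list contains
# data=journal. Input columns: device, mount point, fs type, options.
# journal_mounts is a hypothetical helper name, not part of any tool.
journal_mounts() {
    awk '$3 == "ext3" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "data=journal") {
                print $2
                break
            }
        }
    }' "${1:-/proc/mounts}"
}
```

Calling `journal_mounts /etc/fstab` lists the mount points configured for full data journaling; with no argument it reads /proc/mounts. An empty result only means no such option was found in that table.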
  4. Update by fire-eyes · · Score: 3

    In fact, there is a reply to that on LKML:

    In fact it was reported on lkml on 18th July IIRC before 2.4.19 was
    released if that is any help to you. 2.4.19 and 2.4.20 are affected
    and I haven't tested previous releases. I was going to re-report it
    sometime, but Alan brought it to light just the other day.

    Nick

    --
    -- Note: If you don't agree with me, don't bother replying. I won't read it.
  5. So I'm clueless by OldMiner · · Score: 0, Redundant

    So, I'm clueless. But there are a lot of smart people on Slashdot. No, really, how often does one actually unmount a volume at home? In a production environment? When you shut down, is an unmount performed? If so, are the cached metadata and data flushed manually beforehand? Does this mean it's safer to simply reboot one's computer rather than carefully shut it down?

    --
    You like splinters in your crotch? -Jon Caldara
    1. Re:So I'm clueless by J'raxis · · Score: 4, Informative

      Unmounts happen at shutdown. You also need to unmount before scanning/fixing a filesystem. The whole bug here pertains to the fact that it isn't flushing ("syncing") the last 30 seconds of cached data to the disk beforehand. A cold reboot without unmounting could potentially cause all kinds of other data inconsistency problems to pop up.

      The temporary fix seems to be to run sync manually. Stick "sync" in your /etc/rc.d/init.d/mountfs (or whatever it's called on your system) script right before the "umount" line.
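
The "sync before umount" workaround can be sketched as a small wrapper, assuming a POSIX shell. The name `safe_umount` and the `SYNC` / `UMOUNT` override variables are hypothetical and exist only so the function can be dry-run without touching a real filesystem. This is an illustration of the workaround reported in this thread, not a confirmed fix.

```shell
#!/bin/sh
# Flush dirty buffers first, then unmount with whatever arguments were given.
# SYNC and UMOUNT are hypothetical hooks; by default the real commands run.
safe_umount() {
    # If the flush fails, do not go on to unmount.
    "${SYNC:-sync}" || return 1
    "${UMOUNT:-umount}" "$@"
}
```

Usage would look like `safe_umount /mnt/data`. In a shutdown script the same effect comes from a plain `sync` line placed directly before the existing `umount` line, as described above.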

    2. Re:So I'm clueless by iggymanz · · Score: 2

      hey, I like that! Like the good old BSD days when you did sync;sync;sync;halt; right after kicking all the (l)users off!

  6. So it was a dumb idea... by Ayanami+Rei · · Score: 2, Informative

    JUST DON'T SHUT YOUR SYSTEM OFF! MUWAHAHAHAHAA!!

    just kiddin'

    Fortunately, this bug didn't make it into 2.5 so it won't be propagated forward. Hint: the quick fix ISN'T a quick fix, it doesn't work.
    Either stick with 2.4.19, don't use journaled file data, or sync before umounting (I do that anyway... just superstitious I guess ^_^).

    It will take a few days to add some extra magic to the umount logic to flush all buffers in an intelligent way. Hopefully this optimization is worth the effort for dudes with high-uptime.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  7. Re:Most Unsecure OS? Yep, It's Linux by GreyWolf3000 · · Score: 3, Insightful
    From the troll that brought you the *BSD is DYING posts (all 5,425 of them) I'm sure. Okay, I'll bite.

    Really though, CERT advisories are inadequate tools for measuring vulnerability. Assuming Linux+apache+ssh, etc., all had equal number of bugs, the number of CERT advisories would be dramatically higher for Linux as opposed to Windows, since Microsoft forces people to hush up when a hole is found, and in the case of Linux, the bugs get reported several times, and the same hole in several distros likely becomes different bugs.

    Hence, the article draws a similar conclusion to something like "Our army suffered more casualties than our opponent's army; hence, our opponent is the victor."

    --
    Slashdot: Where people pretend to be twice as smart as they really are by behaving like children.
  8. Why isn't this on the front page? by Anonymous Coward · · Score: 1, Insightful

    Why didn't this make it to the front page? It would be prudent to warn the visitors who don't regularly check the developers section, so that they can take appropriate measures to avoid corruption. This is just plain irresponsible.

    1. Re:Why isn't this on the front page? by walt-sjc · · Score: 4, Insightful

      Um, maybe because regular non-developer type people don't run out and grab the latest kernel that just came out and compile it themselves for the hell of it. Instead, they run whatever version comes with their distro.

      Anyone running the latest bleeding edge stuff keeps up with the LKML anyway, and KNOWS what is going on, way before it would hit a news site like /.

      The sky is falling! Sheesh...

    2. Re:Why isn't this on the front page? by Anonymous Coward · · Score: 2, Insightful

      When 2.4.20 was released, the news made it to the front page. Wouldn't it be appropriate to notify the same people who were notified that this new kernel version was released and ready for download?
      I suspect that there are many Slashdot readers who will compile the latest kernel, but who do not read the developers section.
      I wouldn't consider 2.4.20 "bleeding edge", as it is the latest kernel in the current stable series, and as such is supposed to be safe for running. "Bleeding edge" would be the latest 2.5 kernel or possibly prerelease kernels in the 2.4 series.
      Again, this deserves to be on the front page.

  9. Greased Turkey, anyone? by cperciva · · Score: 2, Redundant

    I think the answer here is simply to avoid any Linux kernels released close to Thanksgiving.

    1. Re:Greased Turkey, anyone? by Anonymous Coward · · Score: 0

      I think the answer here is simply to avoid any Linux kernels released close to Thanksgiving.

      The last 4 words were really unnecessary.

  10. bad for linux tco :( by Anonymous Coward · · Score: 0, Flamebait

    Let's have a close look at the costs involved when running a Linux system.

    An important factor in Linux' cost is its maintenance. Linux requires a *lot* of maintenance, work doable only by the relatively few high-paid Linux administrators that put themselves - of course willingly - at a great place in the market. Linux seems to be needing maintenance continuously, to keep it from breaking down.

    Add to this the cost of loss of data. Linux' native file system, EXT2FS, is known to lose data like a firehose spouts water when the file system isn't unmounted properly. Other unix file systems are much more tolerant towards unexpected crashes. An example is the FreeBSD file system, which with soft updates enabled, performance-wise blows EXT2FS out of the water, and doesn't have the negative drawback of extreme data loss in case of a system breakdown.

    According to Linux advocates, an alternative to EXT2FS would be ReiserFS. Unfortunately, ReiserFS is still in beta stage. This means it is not intended for production use (although according to many Linux advocates this shouldn't be a problem, which makes me wonder how (little) valuable they find your data).

    The other proposed 'solution', EXT3FS, is nothing more than an ugly hack to put journaling into the file system. All the drawbacks of the ancient EXT2FS file system remain in EXT3FS, for the sake of 'forward- and backward compatibility'. This is interesting, considering that the DOS heritage in the Windows 9x/ME series was considered a very bad thing by the Linux community, even though it provided what could be called one of the best examples of compatibility, ever. When it's about Linux, compatibility constraints don't seem to be that much of a problem for Linux advocates.

    Back to Linux' cost. Factor in also the fact that crashes happen much more often on Linux than on other unices. On other unices, crashes usually are caused by external sources like power outages. Crashes in Linux are a regular thing, and nobody seems to know what causes them, internally. Linux advocates try to hide this fact by denying crashes ever happen. Instead, they have frequent "hardware problems".

    The steep learning curve compared to about any other operating system out there is a major factor in Linux' cost. The system is a mix of features from all kinds of unices, but not one of them is implemented right. A Linux user has to live with badly coded tools which have low performance, mangle data seemingly at random and are not in line with their specification. On top of that a lot of them spit out the most childish and unprofessional messages, indicating that they were created by 14-year olds with too much time, no talent and a bad attitude.

    I could go on and on and on, but the conclusion is clear. Linux is not an option for any one who seeks a professional OS with high performance, scalability, stability, adherence to standards, etc.

  11. Interesting by droyad · · Score: 3, Insightful

    I just got a similar report of a bug from an accounting software vendor alerting us to a bug in Windows.

    Apparently in W2k SP1 MS broke something that caused data not to be written from disk cache to the actual disk, which caused data corruption. This was only fixed in SP3.

    I just find it interesting that this bug was not common knowledge as it is not really a "security" issue so they can't hide behind that smoke screen.

    1. Re:Interesting by Anonymous Coward · · Score: 1, Funny

      Quick! post the story to the front page so we can make fun of Microsoft (Micro$oft haha) for having bugs!

  12. my porn collection by larry+bagina · · Score: 1
    Of course, I'm sure some of the more bleeding-edge types were bitten by this buglet, but I guess that comes with the territory; backup backup backup! I hope no Slashdotters lost any of their porn collections.

    Ironically, yes. Since it only affects you when you unmount a disk, and the only reason I unmounted a disk was to reboot the kernel after recompiling with the bug fix!

    Oh well, it's not as if i have anything better to do than surf for porn. And my dick could use a rest from the constant masturbation.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  13. More details and a request for information by DaveAtFraud · · Score: 2

    BTW, if you use ext3 with the default mount options, you will not run into this problem. It's only if you override the mount default of data=ordered and use the data=journal option that the problem even occurs.

    Hell, it took me several minutes of searching to even find out what the option was to even cause the problem. Something tells me this won't affect many people. Maybe someone who knows ext3 internals will enlighten us with why someone would want to use data=journal.

    --
    They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
    Ben
    1. Re:More details and a request for information by Clue4All · · Score: 2

      Are you kidding? One of the biggest selling points of ext3 was that it journaled both data and metadata, unlike other journaling filesystems that journal only metadata (that's what this option does, and it has been discussed in numerous ext3 articles that have appeared on Slashdot). That's why I use ext3. For Andrew Morton to downplay this like no one uses that mode is a big mistake. Surely the release should be pulled or the patch rolled back or SOMETHING.

      --

      Is your browser retarded?
    2. Re:More details and a request for information by DaveAtFraud · · Score: 2

      Thanks for the info. Internal details of file systems aren't way up there on my list so I appreciate a concise answer. A couple of things though:

      1) Journaling both data and metadata may have been a "selling point" of ext3 but journaling of data is off by default. This isn't a distro decision, that's the way it was described in the write up on the LKML. This could be why Andrew downplayed the impact. It takes some digging to even find out about the journaling options.

      2) Unfortunately, most of my experience with a journaling file system has been with reiser. With journaling file systems, my impression is that people ask too much of an operation that is inherently physically limited. Writing the data and writing the meta-data are two separate operations. reiserfs tries to keep small I/O in the journal but ended up with a complicated scheme that fails all too frequently (also, this was more for performance than robustness if I remember correctly). I fear the data=journal option for ext3 has simply demonstrated the same flaw: if you can write the data to the journal, why not write it where it belongs? If the answer is that the journal is simpler and thus faster to write to then you have incurred the complexity of having two separate file systems. You will note that the ext3 error occurred when an optimization was applied to the "data=journal" case that should not have been.

      Robust and fast usually are alternatives and are not usually compatible.

      --
      They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
      Ben
  14. Is the ext3 bug also in the -ac tree? by Anonymous Coward · · Score: 0

    can I get around this shyte by using alan cox's developmental tree?

  15. Redundant? by Anonymous Coward · · Score: 0
    Whoever moderated that does not understand what you are talking about. That greased turkey kernel caused me serious problems.

    I always wait several days now before I even start to experiment with a new kernel. Probably good practice anyway :-)