Slashdot Mirror


2.4.20 ext3 Data Corrupting Bug Fixed

An anonymous reader writes "The ext3 data corrupting bug found in the latest stable Linux kernel and reported by Slashdot here and here has been fixed. In this interesting KernelTrap story Andrew Morton describes the problem and offers a working patch. Evidently the bug has its roots in a much bigger design issue, something that won't likely be fixed in the current 2.4 kernel series. In any case, with Morton's patch applied your data will not be corrupted."

34 comments

  1. QA test cases. by Trusty Penfold · · Score: 3, Insightful


    Where can I find the QA documentation, test cases and scripts for ext3? I would like to verify that this bug, and variations thereof, will be caught before release in the future. Thanks.

    They don't seem to be at the ext3 home (linked to in the story).

    Open Source is useless without Open Procedures, Open Documentation and Open Quality Control.

    1. Re:QA test cases. by Anonymous Coward · · Score: 0

      >> Open Source is useless without Open Procedures, Open Documentation and Open Quality Control.

      Au contraire. Open source is useless with Open Bureaucracy.

  2. Should be front page. by jericho4.0 · · Score: 4, Insightful
    Why did /. have to cover this 3 times in the dev section? I know many non-dev types who jump on point releases as soon as they come out. They should know about this.

    I hate to say it, but maybe /. doesn't like stories that make Linux look bad.

    --
    "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    1. Re:Should be front page. by ignorant_newbie · · Score: 3, Insightful

      because anyone who runs the brand new shiniest version of anything on their machine should expect that it's not perfect, and should go look at resources for developers.

      slashdot:
      news for whiners, stuff for people who need things explained to them in very small words

    2. Re:Should be front page. by gmhowell · · Score: 2, Insightful

      Hmm, yup, 'ignorant newbie' just about sums it up. This is a 2.4 problem. You know, 2.4, the current STABLE, safe release of the Linux kernel?

      Sometime in the future, 2.4 will go down in history as one serious cluster-fuck of a kernel.

      --
      Jesus was all right but his disciples were thick and ordinary. -John Lennon
    3. Re:Should be front page. by GreenHell · · Score: 2, Funny

      Sometime in the future, 2.4 will go down in history as one serious cluster-fuck of a kernel.

      2.4.11 or 2.4.15 (aka "greased turkey") anyone? ;)

      --
      "I won't mod you down - I feel the need to call you a twit explicitly, rather than by implication."
    4. Re:Should be front page. by Anonymous Coward · · Score: 0

      It's probably for the same reason that any posts critical of Linux are moderated down and people start screaming FUD.

    5. Re:Should be front page. by FooBarWidget · · Score: 2

      "I hate to say it, but maybe /. doesn't like stories that make Linux look bad."

      The fact that you are modded up +4 proves that that is untrue.

  3. vs 3.0 by Lord Bitman · · Score: 3, Interesting

    And here we're talking about calling the next major release "3.0" while things as important as /the file system/ need to be majorly reworked. Perhaps we shouldn't jump the gun on this. 3.0 should not have things lying around in it that need to be completely re-worked if they're going to work right. It doesn't count as a culmination of significant changes since 2.0 if those changes won't actually be working in 3.0.1.

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
    1. Re:vs 3.0 by MrResistor · · Score: 2

      I don't know where you've been for the last couple of months, but the next release is going to be 2.6, according to Linus.

      I can only assume that the moderators that moderated this up are similarly misinformed, which is why I chose to reply rather than moderate this "Overrated" like it should be.

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    2. Re:vs 3.0 by Lord Bitman · · Score: 2

      And if I said we shouldn't go to the moon, I suppose you'd assume I meant "at all" instead of "again" and correct me there, too.

      --
      -- 'The' Lord and Master Bitman On High, Master Of All
    3. Re:vs 3.0 by eyez · · Score: 2

      From what I've read on lkml, Linus hasn't yet decided what the release will be, whether it's 2.6 or 3.0. In the meantime, he's calling it 2.6.

      --
      get 0wned. irc.w30wnzj00.com
    4. Re:vs 3.0 by MrResistor · · Score: 2

      From what I've read in interviews since the recent cruise it's definitely 2.6, and Linus didn't ever seriously consider calling it 3.0.

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    5. Re:vs 3.0 by slashdot_commentator · · Score: 2

      And here we're talking about calling the next major release "3.0" while things as important as /the file system/ need to be majorly reworked.

      2.4.x is the "stable" kernel. That means it's not supposed to incorporate radical changes to its infrastructure. Apparently, the maintainer thought they could add some "safe" changes off of the 2.5.x kernel research to add functionality. The team was wrong. The ideal correction would include a radical change, so it's going to be a kludge fix instead.

      The file system is getting major rework, IN the development kernel (2.5.x). 2.4 is not 3.0. 2.5 is not 3.0. 3.0 will be out when it's ready. Stop judging 3.0 (actually 2.6) based on what's going on in 2.4.

      Besides, only an incompetent would use ext3 in a production machine.

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
    6. Re:vs 3.0 by Vlad_the_Inhaler · · Score: 2
      So I'm incompetent. I had another major problem with a nic that was fixed in 2.4.20 and moved up this morning after having waited a week for bug reports. My root partition is ext3 (most of the others are Reiserfs) but I do not use journalling in the way that causes the problem.

      Actually, the server is an emergency backup / mirror server and it has been pretty unreliable for ages. I am not allowed to replace the nic and a kernel that does not require me to pull the power cable every time things go wrong is a big plus. Maybe there was another solution, but anything more than a day per month on that project is seen as lost time for me.

      As to your other point, I hope that Linus's feature freeze does not preclude fixes for problems like this making the next stable set of kernels. Whatever they are called.
      --
      Mielipiteet omiani - Opinions personal, facts suspect.
    7. Re:vs 3.0 by slashdot_commentator · · Score: 2

      The problem with using ext3 in a production system is that it is "new". That means it's subject to "bugs". Some bugs don't get picked up until many months after it's in use. On a filesystem, that means you can get data corruption and lose files/data for months before you realize there is a problem. (And the corruption would be handed down to your backups.) Also, with ext3 being new, it won't have many diagnostic tools or other utilities.

      I have heard BAD things about reiserfs. It's a fact that they don't journal the file data, just the metadata (the filesystem structures). In certain crashes, you can lose some data while rapidly bringing up the system. But there are other people who swear by it, and perhaps it's better than nothing.

      Myself, I use XFS. There are people who will grouse endlessly about it, but I've never encountered a problem with it. In any case, the whole point of a journaling filesystem is quick restart of the filesystems (no fsck) AND integrity of the data. Competent sysadmins don't use flaky filesystems or new kernels on PRODUCTION machines.

      Actually, the server is an emergency backup / mirror server and it has been pretty unreliable for ages.

      Aiieeee... How can it be an emergency backup/mirror server if it's unreliable? Mind you, it's child's play to use the machine for prototyping and backup merely by adding a hard drive to it, and doing your prototyping work on the second drive. How the heck can they refuse to replace the NIC if it's a clunker? It's a lousy 20 bucks. You can probably cannibalize an old machine's NIC for free.

      Maybe there was another solution, but anything more than a day per month on that project is seen as lost time for me.

      Screwing around for a day because the company is too cheap to spend $20 for a good NIC is ridiculous as well. It's about 1 hour of your salary. I've worked for cheap companies, but that's plain stupid. So is having you mess around with kernels released days ago.

      As to your other point, I hope that Linus's feature freeze does not preclude fixes for problems like this making the next stable set of kernels.

      The whole point of the feature freeze is to stop incorporating NEW features. Bugfixes are the only thing allowed in a frozen development kernel until release. It's a mistake to think of a stable kernel (2.4) as being bugfree for each release. There were shops that still ran 2.2 kernels, because they didn't like the "instability" of the 2.4 kernels.

      --
      There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
  4. Which as-shipped distros are affected by this? by Anonymous Coward · · Score: 1, Interesting

    Which distributions (Redhat, mandrake, debian, etc) are affected by this in their default ISO images? ie - which ones do I have to update just to get around this fatal error?

    1. Re:Which as-shipped distros are affected by this? by Cecil · · Score: 4, Informative

      Which distributions ship using ext3 filesystems by default and set them to journalled mode in their default ISO images? Um, none?

      Did you mean that you run your ext3 filesystems in full-journal mode, and would like to know if you have to update? Yes. Regardless of distro.

      In either case, please remember that journalled mode is NOT the default. The default is ordered. Unless you're explicitly setting your filesystem to full journalling, you aren't affected by this problem.

      HTH.
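
As a rough way to check whether a machine is even a candidate for this bug, the sketch below scans fstab-style text for ext3 entries that explicitly request `data=journal`. The function name `journalled_ext3_mounts` and the sample entries are made up for illustration, and it only looks at explicit mount options, so it would not notice full journalling enabled some other way.

```python
def journalled_ext3_mounts(fstab_text):
    """Return mount points of ext3 entries that explicitly ask for data=journal.

    Hypothetical helper for illustration: it expects fstab-style lines of the
    form "device mount_point fs_type options ..." and ignores everything else.
    """
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Only ext3 entries with the full-journal option are of interest;
        # the default (ordered) mode has no explicit data= option here.
        if fs_type == "ext3" and "data=journal" in options.split(","):
            hits.append(mount_point)
    return hits


# Sample entries are invented; on a real system you would read /etc/fstab.
SAMPLE = """\
/dev/hda1 /     ext3     defaults              1 1
/dev/hda2 /home ext3     defaults,data=journal 1 2
/dev/hda3 /var  reiserfs defaults              0 0
"""

if __name__ == "__main__":
    print(journalled_ext3_mounts(SAMPLE))  # prints ['/home']
```

An empty result only means no entry asks for full journalling explicitly; it is a quick sanity check, not proof that a system is unaffected.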

    2. Re:Which as-shipped distros are affected by this? by Anonymous Coward · · Score: 0

      Thanks, that was great. Basically if you have just about any distro and did a usual install, you can't be hit by this bug regardless. That will make a lot of clients happy.

  5. Re:Yet another proof by zcat_NZ · · Score: 1

    Obviously you should be running a totally bug-free OS that has never needed to be patched for filesystem-corruption bugs.

    --
    455fe10422ca29c4933f95052b792ab2
  6. Re:Yet another proof by zcat_NZ · · Score: 2, Informative

    On a less inflammatory note, it demonstrates something that most of us are already well aware of. Don't go enabling advanced features or running bleeding-edge kernels unless you either have good backups, or are happy to risk losing some data.

    You're an idiot if you don't have backups anyhow. The most reliable filesystem in the world isn't going to save you from a hard-drive failure, user error, malicious code, theft, flood, fire, lightning strike, earthquake.. These things eat data a lot more frequently than filesystem bugs!

    Expect data loss. Keep backups.

    --
    455fe10422ca29c4933f95052b792ab2
  7. It's a good thing I'm running FreeBSD by bcc123 · · Score: 0

    hmm...yeah... Talk about code maturity.

  8. Yah. by TheLink · · Score: 2

    Microsoft is damn annoying. NT's filesystem was just starting to get a bit more reliable with each service pack and they now say they are going to throw it totally away to introduce new bugs.

    Still, Linux has so many filesystems it's not funny. What are the odds of them getting in the way of the kernel in the future?

  9. One more reason to use XFS by mnmn · · Score: 2

    ...which is a lot more mature and thoroughly tested than the ext series. Here's how to install RedHat on XFS:

    Install redhat on ext3,
    configure redhat, esp the networking
    get online, get the latest 2.4 kernel
    get XFS patch and xfsprogs and install
    recompile a new kernel with XFS in it and boot.
    mkfs.xfs /dev/, mount /dev/(xfs) to /mnt
    cd /
    cp -a {bin,usr,etc,... except tmp,mnt,proc} /mnt
    fix /mnt/etc/fstab to point at new partition for older redhats.
    reboot.

    This still gives some obscure errors on bootup, maybe because of redundant scripts. Works very fast and stable for me. If you get around to fixing those errors, please roll out a HOWTO, since no one can take filesystem instability on production servers, yet everyone wants to use 2.4.

    --
    "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    1. Re:One more reason to use XFS by haruchai · · Score: 1

      Do you really have to do all that to use XFS on Redhat? For Mandrake, it's an option that you can choose during the initial setup. My entire filesystem is 100% XFS.

      --
      Pain is merely failure leaving the body
    2. Re:One more reason to use XFS by mnmn · · Score: 2


      Yes you have to.

      I'm aiming for RHCE so I have to use RedHat, and this is the only way to get a decent filesystem. Considering the news of ext3 instability, that's still more reason to walk the path of XFS.

      --
      "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
  10. Re:Yet another proof by Anonymous Coward · · Score: 0

    OK. I'm not replying to your post specifically, rather I'm replying to the idea that is repeated throughout this thread: DON'T USE BLEEDING EDGE KERNELS and all its variations. Yes, that's good advice. But let's look at this for one second, shall we? What's the version here? 2.4.20. Yup, the twentieth iteration of the stable series kernel. I should be able to install that mother fucker on my pacemaker. There really is no excuse for there to be bugs on a .20 (or .19, or .18, or even .10) release of a kernel. The fact that there are tells us that there is something fundamentally broken about the process.

  11. Re:Yet another proof by Ed Avis · · Score: 2

    So 2.4.20 is a 'bleeding-edge' kernel? Ext3 is a 'cutting-edge' feature?

    Are you saying that users should refrain from upgrading to newer releases even when those have been explicitly tagged as 'stable'? Where do you draw the line?

    I do think there is some truth in the argument that you shouldn't upgrade the kernel even from a stable series. Wait for your vendor to release an updated kernel package, if they judge it necessary. And maybe don't upgrade even then.

    But it is unfair in this case to criticize users for installing what they thought was a stable, tested, reliable kernel version. Ah well, mistakes happen.

    --
    -- Ed Avis ed@membled.com
  12. Re:Next story! by Anonymous Coward · · Score: 0
    Most stable: regrettably, probably yes. I had a number of annoying problems with the earlier 2.4 kernels, before Linus dumped 2.4 on Marcelo and started training whales on tightropes with the 2.5 series (lovely imagery, Alan!).

    Of course this particular problem could not have happened with 2.2 because there were no Journalling Filesystems (or has one of them been backported to one of the latest 2.2 kernels?).

    I'll stick with installing the newest kernel a week or so after it hits the streets. That saved me from the greased turkey a year ago.
  13. Re:Yet another proof by zcat_NZ · · Score: 1

    My understanding is that the bug doesn't affect the -default- journalling mode of ext3. You have to specifically change it using some filesystem-tuning utility.

    --
    455fe10422ca29c4933f95052b792ab2