Slashdot Mirror


Serious Bug In 2.4.15/2.5.0

John Ineson writes: "There is a bug in the latest kernel releases that causes fs corruption on umount. A lot of people have already been hit by this, so for now I suggest you hold fire on booting those new kernels. More dead-duck than greased-turkey. Two possible fixes are being discussed on linux-kernel." Colin Bayer adds links to a story at the Register and Al Viro's fix. Update: 11/25 00:39 GMT by T : Tarkie writes "Linux 2.4.16-pre1 is out, as detailed at NewsForge. If you've been having the filesystem corruptions, might be worth a try so that 2.4.16 can be out ASAP!"

33 of 498 comments

  1. Does anyone know... by Griim · · Score: 4, Insightful

    ...how something like this could have crept in and been missed? Was it a last-minute change that just didn't have time for testing, or was it (bad)luck-of-the-draw that no one noticed it?

    1. Re:Does anyone know... by Colin+Bayer · · Score: 5, Informative

      This bug was introduced when the kernel coders were trying to fix a bug that existed earlier (but, AFAIK, didn't cause fs corruption). It was introduced in pre9, but the final kernel was released within a few hours, so I guess nobody caught it in time.

      --
      Want Linux games? HERE.
  2. If you are already running it... by krogoth · · Score: 3, Funny

    I recommend turning your computer off with the power switch or by unplugging it, after you've made sure you can boot an older kernel. Since umounting is done when you shut down cleanly, you don't want to do that.

    --

    They that quote Benjamin Franklin on liberty and safety deserve neither.
    1. Re:If you are already running it... by PeterM+from+Berkeley · · Score: 4, Informative

      I wouldn't do what this guy says.
      You're pretty much guaranteed to corrupt your
      filesystem this way. Probably nothing fsck
      couldn't fix, but still.

      Other posters have suggested that you run
      "sync", then "shutdown -F", and reboot
      into a NON-2.4.15 kernel.

      "sync" will write all the unsaved data to
      the disk, and "shutdown -F" will cause
      an fsck to start after rebooting.

      PM
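
The sequence suggested above can be sketched as a dry run. This is only an illustration, not something any poster supplied: the function name `forced_fsck_shutdown_commands` is hypothetical, and the `-r now` / `-h now` arguments are an assumption, since the thread mentions only `sync` and `shutdown -F`. The sketch builds the command lines and does not execute anything.

```python
def forced_fsck_shutdown_commands(reboot=True):
    """Return the commands, in order, as argv lists (dry run only).

    Assumption: a sysvinit-style `shutdown` where -F requests an fsck
    on the next boot. The "-r now" / "-h now" arguments are added here
    for illustration; they are not taken from the thread.
    """
    mode = "-r" if reboot else "-h"
    return [
        ["sync"],                          # flush unsaved data to disk first
        ["shutdown", "-F", mode, "now"],   # then shut down, forcing fsck on boot
    ]


if __name__ == "__main__":
    for argv in forced_fsck_shutdown_commands():
        print(" ".join(argv))
```

Keeping this as a list of commands rather than running them makes the ordering (sync first, then shutdown) easy to check before touching a live system.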

  3. Re:Filesystems by MShook · · Score: 5, Informative

    You're correct, it happens regardless of filesystem. If you happen to be running 2.4.15 or 2.5.0, just remember to force an fsck on the next reboot (shutdown -F); that's the only way to clean the fs, because it will be marked clean even if it's not. Right now, the developers don't know how reiserfs would deal with this bug...

  4. Re:Alan Cox by linux_warp · · Score: 4, Interesting

    Also, straight out of Alan's diary:

    September 29th - Much kernel patching going on. The -ac kernel tree seems to be turning into the stable tree as Linus merges odder, weirder and more alarming things. I just hope he knows what he is doing.
    ---

    Sounds like confidence to me :)

  5. Re:Filesystems by Colin+Bayer · · Score: 5, Interesting

    It afflicts every filesystem. However, rebooting with the file /forcefsck extant forces it to run an fsck (and fix the corruption) on boot.

    Also of help might be the Alt+SysRq keys; if you sync the drives and unmount them in single user mode before reboot, you should reduce or eliminate the corruption.
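
A minimal sketch of the flag-file approach, assuming (as the comment says) that the boot scripts force an fsck when a file named `forcefsck` exists at the filesystem root. The helper name `request_forced_fsck` and its `root` parameter are hypothetical; `root` is there so the sketch can be tried against a scratch directory instead of `/`.

```python
import os


def request_forced_fsck(root="/"):
    """Create an empty `forcefsck` flag file under `root` and return its path.

    Assumption: the distribution's boot scripts check for this file and
    run a forced fsck when it is present. Writing to "/" needs root.
    """
    flag = os.path.join(root, "forcefsck")
    # Opening in append mode creates the file without truncating an existing one.
    with open(flag, "a"):
        pass
    return flag
```

Whether the flag file is honoured depends on the init scripts in use, so it is worth confirming that fsck actually runs on the next boot.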

    --
    Want Linux games? HERE.
  6. A Workaround by kanelephant · · Score: 4, Informative
    Al Viro gave this comment and workaround on lkml.
    Breakage happens when you umount filesystem (_any_ local filesystem, be it ext2, reiserfs, whatever) that still has dirty inodes.

    As a workaround - sync before umount (and don't boot unpatched 2.4.15/2.4.15-pre9 again, obviously).

    IOW, if you are running 2.4.15 - build a patched kernel, install it and do the following:
    * switch to single-user
    * sync
    * umount everything non-busy
    * remount the rest read-only
    * turn the thing off
    * boot with patched kernel or with anything before 2.4.15-pre9

    The filesystem corruption can be fixed by a forced fsck. (The fsck must be forced since the filesystem is marked clean.)
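
The ordering in the steps above can be sketched as a dry-run plan. This is an illustration under stated assumptions, not Al Viro's procedure verbatim: the function name, the `busy` set, and the use of `telinit 1` for "switch to single-user" are all hypothetical choices, and the sketch only builds command lines without running them.

```python
def umount_workaround_plan(mount_points, busy):
    """Return the workaround as an ordered list of argv lists (dry run only).

    mount_points: local mount points, e.g. ["/", "/home", "/var/log"]
    busy: mount points that cannot be unmounted ("/" is always treated as busy)
    """
    plan = [["telinit", "1"], ["sync"]]  # single-user first, then flush dirty data

    # Handle nested mount points before their parents.
    ordered = sorted(mount_points,
                     key=lambda p: p.rstrip("/").count("/"),
                     reverse=True)

    for mp in ordered:
        if mp != "/" and mp not in busy:
            plan.append(["umount", mp])                      # umount everything non-busy
    for mp in ordered:
        if mp == "/" or mp in busy:
            plan.append(["mount", "-o", "remount,ro", mp])   # remount the rest read-only
    return plan


if __name__ == "__main__":
    for argv in umount_workaround_plan(["/", "/home", "/var", "/var/log"], {"/var"}):
        print(" ".join(argv))
```

The point of the ordering is that `sync` comes before any `umount`, since the breakage is described as happening when a filesystem with dirty inodes is unmounted.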
    1. Re:A Workaround by kanelephant · · Score: 3, Informative

      Sorry, I didn't make that clear. If you follow the above advice you should not get any filesystem corruption. The last line is what to do if you have already got a corrupt filesystem!

    2. Re:A Workaround by Anonymous Coward · · Score: 3, Interesting

      The strange thing is, out of habit (years ago, you always had to remember to sync on Unix, and due to a bug, you always had to sync more than once), I always sync, sync, sync, umount...

  7. the patch from the kernel list by MentlFlos · · Score: 4, Informative

    I hope /. doesn't mangle this too badly, but if it does:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=100658174003122&w=2

    List: linux-kernel
    Subject: Re: 2.4.15-pre9 breakage (inode.c)
    From: Linus Torvalds
    Date: 2001-11-24 5:55:42

    On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
    >
    > --- 2.4.15pre9aa1/fs/inode.c.~1~ Thu Nov 22 20:48:23 2001
    > +++ 2.4.15pre9aa1/fs/inode.c Sat Nov 24 06:30:20 2001
    > @@ -1071,7 +1071,7 @@
    >          if (inode->i_state != I_CLEAR)
    >                  BUG();
    >  } else {
    > -        if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
    > +        if (!list_empty(&inode->i_hash)) {
    >                  if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
    >                          list_del(&inode->i_list);
    >                          list_add(&inode->i_list, &inode_unused);

    I have to say that I like this patch better myself - the added tests are
    not sensible, and just removing them seems to be the right thing.

    Linus

  8. Re:FS corruption? by Jagasian · · Score: 5, Insightful

    That's funny. I have been running Debian (stable) for a long time now, and I haven't had any filesystem corruption. In fact, I haven't had the OS crash either.

    It's better to compare Windows 2000 to another complete operating system, NOT a bleeding edge kernel. Compare Windows 2000 to Debian (stable), and Windows 2000 will look like a house of cards.

  9. Strange by imrdkl · · Score: 5, Insightful

    that a successful reboot of the system running the kernel is not in the regression suite. Does this error occur on every architecture?

  10. Re:Don't throw stones in Glass Houses by fishebulb · · Score: 3, Interesting

    Yes, this is quite a serious bug, but two things set this apart from MS. It will be fixed within 24-48 hours. And bugs like this are less frequent than MS's bug of the day (which very often are large holes).

  11. This is why I use FreeBSD by cperciva · · Score: 4, Insightful

    Come on guys, nobody is going to take linux seriously as long as problems like this -- or the VM saga -- keep popping up in supposedly stable kernels. FreeBSD has no trouble keeping separate -CURRENT and -STABLE trees; why can't linux do the same?

    1. Re:This is why I use FreeBSD by FattMattP · · Score: 3, Informative
      > This is not a stable kernel, as there is no development tree to iron out all the bugs.
      Well, I disagree with you there. The way things have always been done, and the way we tell people that they are done, is that x.<even#>.x is a stable kernel and x.<odd#>.x is a development kernel. Once you make that second number even, then it's interpreted by the whole community as stable, whether there's a development kernel or not, because that's what we've been taught and that's the way Linus has always done it. Continuing to put new features into the 2.4 tree rather than opening up 2.5 has led us to this unfortunate position. Hopefully, in the future, the development tree will open as soon as the next major stable release is made and we can avoid things like this.
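
The even/odd convention described above is easy to express as a check. A small sketch, assuming version strings of the form `major.minor.patch` with an optional suffix such as `-pre1`; the function name `kernel_series` is hypothetical.

```python
import re


def kernel_series(version):
    """Classify a kernel version by the even/odd rule on its second number.

    "2.4.15" -> "stable" (4 is even); "2.5.0" -> "development" (5 is odd).
    Suffixes such as "-pre1" are ignored.
    """
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError("unrecognised kernel version: %r" % version)
    minor = int(m.group(2))
    return "stable" if minor % 2 == 0 else "development"


if __name__ == "__main__":
    for v in ("2.4.15", "2.5.0", "2.4.16-pre1"):
        print(v, kernel_series(v))
```

Note that the rule only says how a release is labelled: by it, 2.4.15 counts as "stable" even though it carries this bug.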
      --
      Prevent email address forgery. Publish SPF records for y
  12. Re:Really... by Anonymous Coward · · Score: 5, Insightful

    (Inven: r-r, * to see, ESC) Wear/Wield which item? r
    You are wielding a Rant Stick (1d2) (+0,+0) (*slay* kernel developer)(a).

    It's not so much that it wasn't stable enough when it was released, but rather that they keep messing with 2.4 instead of making a 2.5. I think maybe Linus had this idea (at the end of 2.3) that the developers could focus on fixing bugs and make 2.4 really great. Unfortunately, they're volunteer developers, so they're working on things that excite them, which means insane stuff like VM rewrites and other "hey, let's try this" changes.

    This is why I still use 2.2 and will until there has been a 2.5 for a while (so the developers have a place to try their unstable new ideas) and 2.4 has gone into "bug-fix" mode (like 2.2 is now). It's really annoying, because I want some of the new features of 2.4 (the ones introduced back in 2.3), but can't afford to have the thing crashing on me, and don't want to spend a long time looking for a stable 2.4.X.

    Maybe next time, Linus won't wait so long to introduce a development version, or will at least refuse anything but bugfixes in so-called "stable" branches. Still, despite my complaining, I am happy that people have gone through all the trouble to write the Linux kernel, and will try to remember that. :)

  13. How can this be avoided in the future? by imrdkl · · Score: 3, Insightful

    Can someone give a joe-user guide to helping test new kernels?

  14. The discussion isn't over by Carnage4Life · · Score: 4, Informative

    The last post in that thread is this one by Andrea Arcangeli sometime this morning and from the looks of things (if you read the entire thread) there is conflict between Alexander Viro and Andrea on which is the better solution.

    Linus saying he prefers a patch on an initial viewing isn't the end of the situation for now. I'd suggest waiting a week and revisiting the thread to find out what the final word was.

  15. What I don't get ... by pauljlucas · · Score: 3, Insightful

    ... is why there seems to exist this rampant tendency among Linux-folk to upgrade one's kernel constantly. Unless a new kernel solves a problem you have, there is no reason to upgrade.

    --
    If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
  16. Patch download here by DeeKayWon · · Score: 4, Informative

    The mailing list converted tabs into spaces, causing patch to choke. Get the patch here.

  17. Things are working right not wrong: by amccall · · Score: 5, Insightful
    I've already seen 2 posts referring to "QA" and keeping the kernel stable, etc... If you are going to try the latest version of each package that comes out, you are going to get burned.

    This is one reason why distributions are so important. They do the QA, they make sure packages are stable, they apply the patches. If you want to download and run the latest edition of every package out, including the kernel, then you should expect some bumps in the road, because you are beta testing - even on a "stable" kernel series. Remember: release early, release often. You will have to do the QA, you will have to apply the patches, you will be burned. Some people like doing this to stay on the bleeding edge, others are a bit more cautious.

    If you want stable, solid kernels that are heavily QA'd, wait for packages to come out. Otherwise, post a bug report, and quit whining.

    --
    ------ 24.5% slashdot pure
    1. Re:Things are working right not wrong: by amccall · · Score: 4, Insightful
      > I say linux is an OS NOT a distro and the OS had a bloody problem with something that was declared stable. Waiting for distros is not an option for people who roll their own because of whatever special requirements they need. Wow you run debian good for you, not everyone has that luxury.


      Actually, I do "roll-my-own" and maintain a Linux distribution. I was not burned by this, because like other people "rolling their own/maintaining a distro" I do keep track of LKML posts.


      Anyone else doing this type of work, will hopefully learn from this - and NOT install the latest kernel the day after it's out. This type of thing has happened in EVERY series of stable kernels I can remember. And it will happen again.

      --
      ------ 24.5% slashdot pure
  18. See? by SuiteSisterMary · · Score: 4, Insightful

    If only this was Open Source Software, the source code could have been examined by thousands of highly motivated and intelligent hackers, who would have noticed the problem immediately. Wait....

    --
    Vintage computer games and RPG books available. Email me if you're interested.
  19. irony by FlyingDragon · · Score: 4, Funny
    From yesterday's discussion:

    > So who else is downloading 2.5 (Score:5, Funny)
    > by Chuck Chunder on Friday November 23, @02:23AM
    >
    > so they can be cool and trendy and be on the development tree while it's still stable?
    >
    > The Great Chunder Page - Alcohol Induced Fun!

    If you didn't think it was funny before, admit it -- it's pretty damn funny now.

  20. Re:Ok so Apple isn't the only one to screw up by amccall · · Score: 3

    Maybe, just maybe, that's because the iTunes player was an end-user product, and the kernel source is intended for adventurous users, developers, and distributions. If the default RedHat kernel of a stable RedHat release had a filesystem corruption error, that would be something to write home about - this isn't.

    --
    ------ 24.5% slashdot pure
  21. regression tests? by treat · · Score: 4, Redundant

    Is there any project to create a set of regression tests for the Linux kernel? This is not the first serious bug that would have been found with even the most basic set of regression tests.

  22. Big deal. by Shane · · Score: 5, Insightful

    It amazes me how big of a deal people make these types of issues out to be. I have heard of high standards but SH*T! The more I read Slashdot the more I realize that very few posters here actually work with much commercial-grade software. These types of issues occur frequently with every software vendor I deal with professionally: Cisco, Microsoft, IBM, RedHat, Checkpoint, etc. The difference is that when Cisco releases a new IOS image (which they do about twice as frequently as Linus does), they will quietly mark, say, a quarter of them DF, which stands for _DEFERRED_, i.e. SERIOUS BUG, DON'T USE, once it is discovered.

    This is why production implementations of software go through testing before deployment when at all possible. If you are running a Cisco IOS that is, say, less than a month old, you are taking a risk that there will be a serious bug that will hurt you. The same holds true for Linux kernels or any other piece of software. The more complicated the software, the harder it is to keep serious bugs from slipping through the cracks. It is _AMAZING_ that Linux has as few major issues as it does.

    Here is an exercise for you all: Go to www.microsoft.com go to their support section and read through all of the changelogs (they are hard to find) for all of the hot fixes, service packs and general software updates and you will see what I mean (And yes you will find file system corruption there too).

    --
    -- You can be a geeklord too :)
    1. Re:Big deal. by Shane · · Score: 3, Insightful

      The software _was_ released after it was tested. It was tested, a problem was found.. a patch was provided.. the patch was tested.. it was included.. kernel got released.. problem was discovered, a patch was created and it's about to be released.. that's how software works. You don't catch all of the issues..

      Now you can sit there and say "If Linus would have waited _blank_ period of time someone would have caught the problem before the release and this wouldn't have happened." You could also say that if Linus would just release -pre kernels and only release -stable kernels once a year we would have a REALLY stable kernel... the problem is that's not how the release early/release often model of development works. If you want that model use Microsoft, we all know how stable their software is.

      If you want serious QA use RedHat.. they do serious QA.. If you are running 0day software you get burned.. whether it's the latest Linux kernel, the latest Microsoft service pack or the latest Cisco IOS.

      Question: what is your example of software that is released "AFTER it's been tested"? I can't wait to go read through the change logs and find some bugs that should have been caught by this software's superior QA.

      --
      -- You can be a geeklord too :)
  23. That is a cop-out by Sanity · · Score: 5, Insightful
    Transferring the responsibility to distribution maintainers is a cop-out.

    The real problem is that new functionality is being added to the stable branch.

    The solution to this type of problem is simple: when a stable kernel is released, an unstable branch should be created immediately. New functionality was being added to the 2.4 branch by developers simply because there is nowhere else to put it.

    New functionality should never be added to a stable branch in a piece of software as mission-critical as a kernel, that is what the unstable/development branch is for.

    If the kernel maintainers want to accelerate the pace at which new functionality gets into a stable branch then they should increase the frequency with which development branches become stable.

  24. Re:Please spare us by Shane · · Score: 4, Informative

    First: this Linux bug does not cause the loss of the ENTIRE FILE SYSTEM. It leaves .lock files with invalid INODES, which can be repaired by manually running fsck. As to your challenge, these are just a few corruption problems with Windows 2000 that I found doing a simple search on www.microsoft.com.

    http://support.microsoft.com/support/kb/articles/Q268/8/97.ASP

    http://support.microsoft.com/support/kb/articles/Q258/0/75.ASP

    http://support.microsoft.com/support/kb/articles/Q273/2/45.ASP

    http://support.microsoft.com/support/kb/articles/Q298/9/36.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=16&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    http://support.microsoft.com/support/kb/articles/Q261/1/22.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=19&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    http://support.microsoft.com/support/kb/articles/Q255/5/69.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=23&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    --
    -- You can be a geeklord too :)
  25. I think you missed his point a bit. by Chuck+Chunder · · Score: 4, Insightful

    People downloading kernels from kernel.org, particularly in the first few days of a release, are part of the QA process, not the ultimate beneficiaries of one.

    The Open Source (or more correctly, bazaar or distributed) development model also distributes responsibility. If the possibility of losing your data is something you can't afford then you simply shouldn't be sitting on the cutting edge of kernel development.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park