Slashdot Mirror


Serious Bug In 2.4.15/2.5.0

John Ineson writes: "There is a bug in the latest kernel releases that causes fs corruption on umount. A lot of people have already been hit by this, so for now I suggest you hold fire on booting those new kernels. More dead-duck than greased-turkey. Two possible fixes are being discussed on linux-kernel." Colin Bayer adds links to a story at the Register and Al Viro's fix. Update: 11/25 00:39 GMT by T: Tarkie writes "Linux 2.4.16-pre1 is out, as detailed at NewsForge. If you've been having the filesystem corruptions, it might be worth a try so that 2.4.16 can be out ASAP!"

21 of 498 comments

  1. Stick with 2.4.15-pre8 by ShawnX · · Score: 2, Informative

    No problems with this kernel pre release :)

    --
    Everyone wants a Tux in their life.
  2. Re:Filesystems by MShook · · Score: 5, Informative

    You're correct, it happens regardless of filesystem. If you happen to be running 2.4.15 or 2.5.0, just remember to force a fsck on the next reboot (shutdown -F); that's the only way to repair the fs, because it will be marked clean even if it's not. Right now, the developers don't know how reiserfs would deal with this bug...
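
    A rough way to check whether the running kernel is one of the releases named in this thread is to compare the output of uname -r against them. This is only a sketch: the function name is_suspect_kernel is made up for illustration, and the pattern is deliberately broad, so a locally patched 2.4.15 or 2.5.0 build would still be flagged.

```shell
#!/bin/sh
# Hypothetical helper: flag kernel version strings this thread calls affected
# (2.4.15-pre9, 2.4.15 final, 2.5.0). Prereleases pre1..pre8 are treated as safe.
is_suspect_kernel() {
    case "$1" in
        2.4.15-pre[1-8]*)      return 1 ;;  # before pre9: not affected
        2.4.15|2.4.15-*)       return 0 ;;  # pre9, final, or final plus a suffix
        2.5.0|2.5.0-*)         return 0 ;;  # 2.5.0 carries the same code
        *)                     return 1 ;;
    esac
}

if is_suspect_kernel "$(uname -r)"; then
    echo "kernel $(uname -r) may have the umount bug: sync, then force an fsck"
else
    echo "kernel $(uname -r) is not one of the releases named in the thread"
fi
```

    A match is only a hint; it cannot tell a patched build from an unpatched one.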

  3. Re:Does anyone know... by Colin+Bayer · · Score: 5, Informative

    This bug was introduced when the kernel coders were trying to fix a bug that existed earlier (but, AFAIK, didn't cause fs corruption). It was introduced in pre9, but the final kernel was released within a few hours, so I guess nobody caught it in time.

    --
    Want Linux games? HERE.
  4. A Workaround by kanelephant · · Score: 4, Informative
    Al Viro gave this comment and workaround on lkml.
    Breakage happens when you umount filesystem (_any_ local filesystem, be it ext2, reiserfs, whatever) that still has dirty inodes.

    As a workaround - sync before umount (and don't boot unpatched 2.4.15/2.4.15-pre9 again, obviously).

    IOW, if you are running 2.4.15 - build a patched kernel, install it and do the following:
    * switch to single-user
    * sync
    * umount everything non-busy
    * remount the rest read-only
    * turn the thing off
    * boot with patched kernel or with anything before 2.4.15-pre9

    The filesystem corruption can be fixed by a forced fsck. (The fsck must be forced since the filesystem is marked clean.)
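
    The steps above could be typed at the console roughly as follows. This is a sketch that assumes a SysV-init system with the usual telinit, umount -a and halt commands; the run wrapper, the DRY_RUN variable and the function name are made up for illustration, and by default the script only prints each command instead of running it.

```shell
#!/bin/sh
# Dry run by default: print the commands rather than execute them.
# Assumption: root at the console; all names here are illustrative only.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

shutdown_without_dirty_inodes() {
    run telinit 1                 # switch to single-user
    run sync                      # flush dirty inodes before any umount
    run umount -a                 # unmounts what it can; busy filesystems stay mounted
    run mount -o remount,ro /     # remount what is left (here just /) read-only
    run halt -p                   # turn the thing off
}

shutdown_without_dirty_inodes
```

    In a real session the commands after telinit 1 would be typed by hand, since changing runlevel is likely to kill the script. Any other filesystem that stays busy would need its own remount,ro line.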
    1. Re:A Workaround by kanelephant · · Score: 3, Informative

      Sorry, I didn't make that clear. If you follow the above advice you should not get any filesystem corruption. The last line is what to do if you have already got a corrupt filesystem!

  5. the patch from the kernel list by MentlFlos · · Score: 4, Informative

    I hope /. doesn't mangle this too badly, but if it does:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=100658174003122&w=2

    List: linux-kernel
    Subject: Re: 2.4.15-pre9 breakage (inode.c)
    From: Linus Torvalds
    Date: 2001-11-24 5:55:42
    [Download message RAW]

    On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
    >
    > --- 2.4.15pre9aa1/fs/inode.c.~1~ Thu Nov 22 20:48:23 2001
    > +++ 2.4.15pre9aa1/fs/inode.c Sat Nov 24 06:30:20 2001
    > @@ -1071,7 +1071,7 @@
    > if (inode->i_state != I_CLEAR)
    > BUG();
    > } else {
    > - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
    > + if (!list_empty(&inode->i_hash)) {
    > if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
    > list_del(&inode->i_list);
    > list_add(&inode->i_list, &inode_unused);

    I have to say that I like this patch better myself - the added tests are
    not sensible, and just removing them seems to be the right thing.

    Linus

  6. NO! by Anonymous Coward · · Score: 2, Informative

    This is a common misconception! 2.4 is *not* "stable"! It is "testing"! Well, now that it's split in two I suppose it can officially be called "stable" (what a bad start!), but I don't consider it stable (though I'm just a lowly AC). AFAIC, 2.2 = "stable" and 2.4 = "testing". In a month or so, things will change and we'll have 2.4 = "stable" and 2.5 = "experimental". Until 2.5 turns into 2.6/3.0, at which point it will be "testing", and the cycle continues :)

  7. Re:Strange by Colin+Bayer · · Score: 2, Informative

    > Does this error occur on every architecture?

    Yep... since the affected files are in fs/, not arch/*, it's an architecture-independent problem. Good thing I have the Magic SysRq enabled. ;)

    --
    Want Linux games? HERE.
  8. Re:"QA" by Anonymous Coward · · Score: 1, Informative

    You cannot compare Linux and FreeBSD that way. FreeBSD is a complete OS, not just the kernel.

    I've NEVER seen filesystem corruption caused by my distribution (Red Hat) kernel.

    Compare FreeBSD with the whole of Red Hat Linux (probably the same for Debian, don't know about the others), and you'll see neither has this sort of problem.

  9. The discussion isn't over by Carnage4Life · · Score: 4, Informative

    The last post in that thread is this one by Andrea Arcangeli sometime this morning and from the looks of things (if you read the entire thread) there is conflict between Alexander Viro and Andrea on which is the better solution.

    Linus saying he prefers a patch on an initial viewing isn't the end of the situation for now. I'd suggest waiting a week and revisiting the thread to find out what the final word was.

  10. Make sure you have removed ext3 option, too by willamowius · · Score: 2, Informative

    For those who have tried ext3 in 2.4.15:
    Make sure you have reset the journaling flag on your filesystems, because your older kernel will not mount an unclean ext3 volume.

    Do a "tune2fs -O ^has_journal /dev/whatever".
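
    Before clearing the flag it may be worth checking whether the filesystem has a journal at all. A minimal sketch, assuming e2fsprogs is installed and using /dev/hdXN as a stand-in for the real device; the helper name has_journal_feature is made up, and it only parses text in the shape that tune2fs -l prints.

```shell
#!/bin/sh
# Hypothetical helper: read `tune2fs -l` output on stdin and succeed if the
# "Filesystem features:" line lists has_journal.
has_journal_feature() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Example with canned text in the format tune2fs -l prints:
sample='Filesystem features:      has_journal filetype sparse_super'
if printf '%s\n' "$sample" | has_journal_feature; then
    echo "journal present: tune2fs -O ^has_journal /dev/hdXN would remove it"
fi

# Real use (as root; /dev/hdXN is a placeholder device name):
#   tune2fs -l /dev/hdXN | has_journal_feature && tune2fs -O ^has_journal /dev/hdXN
```

    As far as I know, tune2fs will only clear has_journal on a filesystem that is unmounted or mounted read-only, so this is best done from single-user mode or a rescue disk.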

  11. Patch download here by DeeKayWon · · Score: 4, Informative

    The mailing list converted tabs into spaces, causing patch to choke. Get the patch here.
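
    If only the whitespace-mangled copy from the list archive is at hand, GNU patch can be asked to match whitespace loosely with -l (--ignore-whitespace). The sketch below first checks whether a patch file has lost its tabs; lost_tabs is a made-up helper name, inode.patch is a placeholder file name, and the check is only a heuristic that assumes the patched source is tab-indented, as kernel code normally is.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains no tab character at all,
# which for a diff against tab-indented source suggests the tabs were mangled.
lost_tabs() {
    tab="$(printf '\t')"
    ! grep -q "$tab" "$1"
}

# Usage sketch, run from the top of the kernel tree:
#   if lost_tabs inode.patch; then
#       patch -p1 -l --dry-run < inode.patch   # loose whitespace match, changes nothing
#   else
#       patch -p1 --dry-run < inode.patch
#   fi
# Drop --dry-run once the trial run reports no failed hunks.
```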

  12. Re:This is why I use FreeBSD by FattMattP · · Score: 3, Informative
    > This is not a stable kernel, as there is no development tree to iron out all the bugs.

    Well, I disagree with you there. The way things have always been done, and the way we tell people that they are done is that x.<even#>.x is a stable kernel and x.<odd#>.x is a development kernel. Once you make that second number even, then it's interpreted by the whole community as stable, whether there's a development kernel or not, because that's what we've been taught and that's the way Linus has always done it. Continuing to put new features into the 2.4 tree rather than opening up 2.5 has led us to this unfortunate position. Hopefully, in the future, the development tree will open as soon as the next major stable release is made and we can avoid things like this.
    --
    Prevent email address forgery. Publish SPF records for y
  13. Re:If you are already running it... by PeterM+from+Berkeley · · Score: 4, Informative

    I wouldn't do what this guy says.
    You're pretty much guaranteed to corrupt your
    filesystem this way. Probably nothing fsck
    couldn't fix, but still.

    Other posters have suggested that you use
    "shutdown -F" after running "sync",
    and rebooting into a NON-2.4.15 kernel.

    "sync" will write all the unsaved data to
    the disk, and "shutdown -F" will cause
    an fsck to start after rebooting.
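
    Spelled out as commands, that sequence might look like the sketch below. It assumes the SysV-init shutdown of the time, where -F asks for an fsck on the next boot; the run wrapper, the DRY_RUN variable and the function name are made up for illustration, and by default nothing is executed, only printed.

```shell
#!/bin/sh
# Dry run by default: print the commands rather than execute them.
# Assumption: SysV-init shutdown, where -F requests an fsck on the next boot.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reboot_with_forced_fsck() {
    run sync                    # write unsaved data to disk first
    run shutdown -r -F now      # reboot and force an fsck on the way up
}

reboot_with_forced_fsck
```

    The script cannot pick the kernel for you: the non-2.4.15 kernel still has to be selected at the boot loader.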

    PM

  14. Re:to clarify by macinslak · · Score: 2, Informative

    It seems the second set of commands got mangled, sorry:

    telinit S
    kill everything but your shell
    sync
    unmount everything but root
    sync
    reboot

  15. NT ok, Win2k fixes NTFS errors pretty well. by Otis_INF · · Score: 2, Informative

    As an owner of a lovely IBM 75GXP hdd, I can say Win2k fixes corrupted files on NTFS pretty well. NT4 is perhaps a different ballgame, there you have the chance to indeed get stuck with files which are not recoverable at all.

    --
    Never underestimate the relief of true separation of Religion and State.
  16. Re:Bad start by alhague · · Score: 2, Informative

    > for the brasilian guy, hum ?

    Nope. 2.4.15 was released by Linus ...

    al

  17. Re:Please spare us by Shane · · Score: 4, Informative

    First: This Linux bug does not cause the loss of the ENTIRE FILE SYSTEM. It leaves .lock files with invalid INODES which can be repaired by manually running fsck. As to your challenge, these are just a few corruption problems with Windows 2000 that I found doing a simple search on www.microsoft.com.

    http://support.microsoft.com/support/kb/articles/Q268/8/97.ASP

    http://support.microsoft.com/support/kb/articles/Q258/0/75.ASP

    http://support.microsoft.com/support/kb/articles/Q273/2/45.ASP

    http://support.microsoft.com/support/kb/articles/Q298/9/36.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=16&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    http://support.microsoft.com/support/kb/articles/Q261/1/22.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=19&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    http://support.microsoft.com/support/kb/articles/Q255/5/69.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=23&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000

    --
    -- You can be a geeklord too :)
  18. Re:That is a cop-out by ViXX0r · · Score: 2, Informative

    > The real problem is that new functionality is being added to the stable branch

    In this case, the real problem was that a bugfix (which is supposed to occur in stable kernels) was faulty and caused another bug.

    --
    University - a box of academia nuts.
  19. I found a REAL good patch by Anonymous Coward · · Score: 1, Informative

    I found a patch that solves the problem TOTALLY! The URL is www.freebsd.org

    Linux Sucks by the way