Serious Bug In 2.4.15/2.5.0
John Ineson writes: "There is a bug in the latest kernel releases, that causes fs corruption on umount. A lot of people have already been hit by this, so for now I suggest you hold fire on booting those new kernels. More dead-duck than greased-turkey. Two possible fixes are being discussed on linux-kernel."
Colin Bayer adds links to a story at the Register and Al Viro's fix. Update: 11/25 00:39 GMT by T : Tarkie writes "Linux 2.4.16-pre1 is out, as detailed at NewsForge. If you've been having the filesystem corruptions, might be worth a try so that 2.4.16 can be out ASAP!"
From the looks of the post, this bug occurs regardless of filesystem. Is that accurate, or would certain filesystems be unaffected? I'm guessing that it doesn't matter. Anyone care to clarify that?
No problems with this kernel pre release :)
Everyone wants a Tux in their life.
...how something like this could have crept in, and be missed? Was it a last-minute change that just didn't have time for testing, or was it (bad)luck-of-the-draw that no one noticed it?
I recommend turning your computer off with the power switch or by unplugging it, after you've made sure you can boot an older kernel. Since umounting is done when you shut down cleanly, you don't want to do that.
They that quote Benjamin Franklin on liberty and safety deserve neither.
Also, straight out of Alan's diary:
:)
September 29th - Much kernel patching going on. The -ac kernel tree seems to be turning into the stable tree as Linus merges odder, weirder and more alarming things. I just hope he knows what he is doing.
---
Sounds like confidence to me
You can find Andrea Arcangeli's fix at:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.15aa1/00_iput-unmount-corruption-fix-1
The users are the QA (why do you think Linus moved to 2.4 so early? To get more testers). If you don't like being a guinea pig, then wait about a week before moving to the newest kernel. Seriously, 7 days isn't that long, and all show-stoppers will have shown up long before then.
Dude. I hate to say this, but Windows 2000, while it may crash more, doesn't hose your filesystem nearly as often as Linux seems to these days. At what point do we get to start making the LinSux jokes?
PS> Don't flame me please. I just wiped Win2K off my harddrive this morning. Luckily, I downloaded the 2.4.15 tree but have been too lazy to compile it yet.
A deep unwavering belief is a sure sign you're missing something...
Isn't the 2.4 branch supposed to be stable? You know, the one that doesn't eat your disk. I think that this kernel should have gotten a little more testing for bugs of the catastrophic nature before it was deemed fit for general consumption.
I hope /. doesn't mangle this up too bad, but if it does:
http://marc.theaimsgroup.com/?l=linux-kernel&m=100658174003122&w=2
List: linux-kernel
Subject: Re: 2.4.15-pre9 breakage (inode.c)
From: Linus Torvalds
Date: 2001-11-24 5:55:42
On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> --- 2.4.15pre9aa1/fs/inode.c.~1~ Thu Nov 22 20:48:23 2001
> +++ 2.4.15pre9aa1/fs/inode.c Sat Nov 24 06:30:20 2001
> @@ -1071,7 +1071,7 @@
> if (inode->i_state != I_CLEAR)
> BUG();
> } else {
> - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
> + if (!list_empty(&inode->i_hash)) {
> if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
> list_del(&inode->i_list);
> list_add(&inode->i_list, &inode_unused);
I have to say that I like this patch better myself - the added tests are
not sensible, and just removing them seems to be the right thing.
Linus
This is a common misconception! 2.4 is *not* "stable"! It is "testing"! Well, now that it's split in two I suppose it can officially be called "stable" (what a bad start!), but I don't consider it stable (though I'm just a lowly AC). AFAIC, 2.2 = "stable" and 2.4 = "testing". In a month or so, things will change and we'll have 2.4 = "stable" and 2.5 = "experimental". Until 2.5 turns into 2.6/3.0, at which point it will be "testing", and the cycle continues :)
that a successful reboot of the system running the kernel is not in the regression suite. Does this error occur on every architecture?
You would still have to be careful until then - people who regularly mount and unmount read/write might want to be careful. I wonder if mounting read-only would help, or if the bug is below that level (from the discussion, it doesn't sound like it)?
They that quote Benjamin Franklin on liberty and safety deserve neither.
Yes, this is quite a serious bug, but two things set this apart from MS. First, it will be fixed within 24-48 hours. Second, bugs like this are a bit less frequent than MS's bug of the day (which very often is a large hole).
Come on guys, nobody is going to take linux seriously as long as problems like this -- or the VM saga -- keep popping up in supposedly stable kernels. FreeBSD has no trouble keeping separate -CURRENT and -STABLE trees; why can't linux do the same?
Tarsnap: Online backups for the truly paranoid
how hard is it really to compile a kernel?
:)
download the source, read the kernel-howto, go through the menu (or x config), make bzImage, etc
then repeat as necessary to get it to boot properly (ide root drive, load ide driver as module is always a good combo)
Need a Catering Connection
Can someone give a joe-user guide to helping test new kernels?
The last post in that thread is this one by Andrea Arcangeli sometime this morning and from the looks of things (if you read the entire thread) there is conflict between Alexander Viro and Andrea on which is the better solution.
Linus saying he prefers a patch on an initial viewing isn't the end of the situation for now. I'd suggest waiting a week and revisiting the thread to find out what the final word was.
... is why there seems to exist this rampant tendency among Linux-folk to upgrade one's kernel constantly. Unless a new kernel solves a problem you have, there is no reason to upgrade.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
For those who have tried ext3 in 2.4.15:
Make sure you have reset the journaling flag on your filesystems, because your older kernel will not mount an unclean ext3 volume.
Do a "tune2fs -O ^has_journal /dev/whatever".
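A minimal sketch of the tune2fs step, in Python. It only builds the command line and does not run anything; `build_clear_journal_cmd` and `as_shell` are hypothetical helper names made up for this example, and the device path is whatever you pass in.

```python
import shlex


def build_clear_journal_cmd(device):
    """Return the argv that clears the ext3 journal flag on `device`.

    Hypothetical helper: it only assembles the command quoted in the
    post ("tune2fs -O ^has_journal <device>") and never executes it.
    """
    if not device.startswith("/dev/"):
        raise ValueError("expected a device path under /dev, got %r" % device)
    return ["tune2fs", "-O", "^has_journal", device]


def as_shell(argv):
    """Render an argv list as a single shell-safe string for copy/paste."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # /dev/hda5 is a placeholder; substitute your own ext3 partition.
    print(as_shell(build_clear_journal_cmd("/dev/hda5")))
```

Keeping the command as a list rather than a string means it can later be handed to `subprocess.run` without worrying about how the shell treats the `^` character.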
I had an fs corruption with RH 7.2, using the kernel that came with the distro. It trashed the geometry of an entire drive. I was using a combo of ext2 and ext3 on the drive. I didn't lose anything, as I backup my system regularly.
I've since migrated to Mandrake 8.1, which is much more solid than RH 7.2. Yet, it too runs a 2.4 kernel variant. This distro on one boot failed to recognize the ext3 partitions. I migrated all of the ext3 partitions back to ext2.
I'd be very interested in learning if this is a problem that extends far back into the kernel tree.
Graham
Linux - Fast Pane Relief
The mailing list converted tabs into spaces, causing patch to choke. Get the patch here.
This is one reason why distributions are so important. They do the QA, they make sure packages are stable, they apply the patches. If you want to download and run the latest edition of every package out, including the kernel, then you should expect some bumps in the road, because you are beta testing - even on a "stable" kernel series. Remember: release early, release often. You will have to do the QA, you will have to apply the patches, you will be burned. Some people like doing this to stay on the bleeding edge, others are a bit more cautious.
If you want stable, solid kernels, that are heavily QA'd wait for packages to come out. Otherwise, post a bug report, and quit whining.
------ 24.5% slashdot pure
If only this was Open Source Software, the source code could have been examined by thousands of highly motivated and intelligent hackers, who would have noticed the problem immediately. Wait....
Vintage computer games and RPG books available. Email me if you're interested.
It's rotted.
> So who else is downloading 2.5 (Score:5, Funny)
> by Chuck Chunder on Friday November 23, @02:23AM
>
> so they can be cool and trendy and be on the development tree while it's still stable?
>
> The Great Chunder Page - Alcohol Induced Fun!
If you didn't think it was funny before, admit it -- it's pretty damn funny now.
There is a big difference between this and iTunes. iTunes affected ONLY those with spaces in their Disk Labels. This affects everyone on linux with moderate disk writes (probably won't damage an idle computer).
Similar response times. I'd classify this issue way worse ESPECIALLY since it should have gone through standard testing.
Rod Taylor
It seems the second set of commands got mangled, sorry:
telinit S
kill everything but your shell
sync
unmount everything but root
sync
reboot
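The "unmount everything but root" step can be sketched in Python as a small planner. It reads text in the /proc/mounts layout (device, mount point, type, options, ...) and returns the mount points to unmount, deepest first so that a nested mount goes before its parent. The function name, the ordering rule and the sample table are assumptions for illustration, and nothing here actually calls umount.

```python
def unmount_order(mounts_text):
    """Return every mount point except "/" ordered deepest-first.

    `mounts_text` is expected in /proc/mounts layout; the second
    whitespace-separated field of each line is the mount point.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if mount_point != "/" and mount_point not in points:
            points.append(mount_point)
    # More path separators means deeper, so /usr/local sorts before /usr.
    return sorted(points, key=lambda p: (-p.count("/"), p))


# Made-up mount table, purely for demonstration.
SAMPLE_MOUNTS = """\
/dev/hda1 / ext2 rw 0 0
/dev/hda2 /usr ext2 rw 0 0
/dev/hda3 /usr/local ext2 rw 0 0
/dev/hda5 /home ext2 rw 0 0
"""

if __name__ == "__main__":
    for mount_point in unmount_order(SAMPLE_MOUNTS):
        print("umount", mount_point)
```

On a live system the input would come from reading /proc/mounts; the sketch does not filter out pseudo-filesystems such as proc, which a real script would probably want to skip.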
Maybe, just maybe, that's because the iTunes player was an end-user product, and the kernel source is intended for adventurous users, developers, and distributions. If the default RedHat kernel of a stable RedHat release had a FileSystem corruption error, that would be something to write home about - this isn't.
------ 24.5% slashdot pure
For once I'm glad I have 56k and decided against downloading the new kernel just yet. For all those bitching 'cause their system got hosed: well, what did you expect? That's why you wait for the next post on slashdot saying something's wrong with the new kernel. Besides, what about 2.4.15 was so necessary that you had to have the latest incremental kernel? I'm rather happy with 2.4.8. Unless you're a developer/bug-tester/bleed-freak, what reason do you have to upgrade to the very latest kernel?
-
I have a response to those who have said that the open source QA process is to release early and let early adopters suffer the consequences. Are you sure? Are you saying this is good example of open source development? Are you saying this is the exemplar of the open source development process? This is a data loss bug.
In the open source development process, it's not a problem if the new release of Mozilla has a small problem with frames in XHTML, or if the new Linux kernel breaks support for USB joysticks. These are problems that can be fixed.
Some bugs are so serious, however, that they deserve extra attention. These are the "showstoppers." In every kernel release, Linus says something about not finding any showstoppers. That is, there are no data loss bugs or other serious bugs that he knows of. He wouldn't release it if he thought it had such a serious problem.
All I am saying is have a process that can perform rudimentary checks on the kernel to pick up any showstoppers. This process would take a few hours or at most a day. It would prevent situations like this, where the Linux community opens itself up for attack by all the brainwashed Microsoft zealots. Is this really flamebait?
So, will we start seeing -post releases?
Heh. I can see it now. 2.4.15-post1 :)
my old sig used to be funny, but then slashcode ate it and now it's not funny anymore
As an owner of a lovely IBM 75GXP hdd, I can say Win2k fixes corrupted files on NTFS pretty well. NT4 is perhaps a different ballgame, there you have the chance to indeed get stuck with files which are not recoverable at all.
Never underestimate the relief of true separation of Religion and State.
> for the brasilian guy, hum ?
...
Nope. 2.4.15 was released by Linus
al
Is there any project to create a set of regression tests for the Linux kernel? This is not the first serious bug that would have been found with even the most basic set of regression tests.
It amazes me how big of a deal people make these types of issues out to be. I have heard of high standards but SH*T! The more I read slashdot the more I realize that very few posters here actually work with much commercial grade software. These types of issues occur frequently with every software vendor I deal with professionally: Cisco, Microsoft, IBM, RedHat, Checkpoint, etc. The difference is that when Cisco releases a new IOS image (which they do about twice as frequently as Linus does), they will quietly mark, say, a quarter of them DF, which stands for _DEFERRED_, i.e. SERIOUS BUG, DON'T USE, once it is discovered.
This is why production implementations of software go through testing before deployment when at all possible. If you are running Cisco IOS that is, say, less than a month old, you are taking a risk that there will be a serious bug that will hurt you. The same holds true for Linux kernels or any other piece of software. The more complicated the software, the harder it is to keep serious bugs from slipping through the cracks. It is _AMAZING_ that Linux has as few major issues as it does.
Here is an exercise for you all: Go to www.microsoft.com go to their support section and read through all of the changelogs (they are hard to find) for all of the hot fixes, service packs and general software updates and you will see what I mean (And yes you will find file system corruption there too).
-- You can be a geeklord too
Finally some good points. This probably shouldn't be a reply to this
message but it's too late now.
I would point out that this bug does not turn up readily.
This bug allows a system to boot up normally, run fine, and then when you
reboot (and only when you reboot) some files are missing if (and only
if) their buffers were dirty when you rebooted.
This is NOT easy to catch. The average Linux system has upwards of 50,000
files. A few disappearing is not easy to notice. In addition, buffers
tend to get flushed pretty well during the shutdown process, so it
wouldn't show up too often either (I avoided it by accident due to a
peculiar RAID shutdown script I have that sync's and sleep's for a bit).
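
One way to notice a handful of files vanishing out of tens of thousands is to record a file list before rebooting and compare it afterwards. A rough sketch in Python follows; the function names are made up for this example, and `demo` fakes the "lost file" by deleting it in a scratch directory rather than by exercising the kernel bug.

```python
import os
import tempfile


def snapshot(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_files(before, after):
    """Return, sorted, the paths present in `before` but not in `after`."""
    return sorted(before - after)


def demo():
    """Create three files in a scratch directory, delete one, report it."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt", "c.txt"):
            with open(os.path.join(root, name), "w") as handle:
                handle.write("data\n")
        before = snapshot(root)
        # Stand-in for a file that did not survive the unmount.
        os.remove(os.path.join(root, "b.txt"))
        after = snapshot(root)
        return missing_files(before, after)


if __name__ == "__main__":
    print(demo())
```

For a real before/after-reboot comparison the first snapshot would have to be saved somewhere that is not itself at risk, for example on another machine.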
For the M$ zealots out there be careful to practice what you preach. One
of the core arguments of the Slaves of the Empire is that the Linux
zealots bash M$ but can't take criticism themselves. If you'll check your
precious windowsupdate.microsoft.com on a fresh Win98 install, you'll find
the IDE Hard Drive Cache Update. For the uninitiated, this patch fixes a
problem where Windows doesn't write all of the data to disk on
shutdown. Ironically, this tended to completely hose Win98 systems
beyond fixing by Scandisk (usually registry damage).
So, Win98 and Linux have similar problems. In a week, the Linux bug will
be history, but the M$ one is still being minted on CD and requires an
Internet download (because it's a "minor problem", the fix is to "wait
before shut down so the data is written"). I don't remember too much
babbling on Slashdot about that bug and it's been there for YEARS.
Gosh, I guess I should write this off as being dribble by "Linux
Bashers".
Oh, and to completely trash my karma, I've had disk corruption in a stable
FreeBSD due to a bug in FreeBSD code, so don't get too high on your own
superiority yet. You've got older code--sometimes it's a strength,
sometimes it's a weakness. Like the FreeBSD development process isn't
ever rocky.
I think Mauve has the most RAM. --PHB (Dilbert Comic)
I've seen lots of posts about 'We need to QA this!' and 'Are there any projects to try and QA the kernel releases?' Both of these miss the point. While we do need more people running the tests which do exist on the -pre releases, it comes down to Linus having an itchy trigger finger, so to speak. 2.4.15 in its final form did exist for a little while, but it wasn't long enough for anyone to go and give it a good test. There have often been requests for Linus to wait a few days from the last -pre to -final so other arches can sync up (2.4.15 only compiles on x86/sparc64/arm and alpha). If this had been released on Monday, none of this would have happened.
The real problem is that new functionality is being added to the stable branch.
The solution to this type of problem is simple: when a stable kernel is released, an unstable branch should be created immediately. New functionality was being added to the 2.4 branch by developers simply because there is nowhere else to put it.
New functionality should never be added to a stable branch in a piece of software as mission-critical as a kernel, that is what the unstable/development branch is for.
If the kernel maintainers want to accelerate the pace at which new functionality gets into a stable branch then they should increase the frequency with which development branches become stable.
Oh it isn't? Name one that has not had this same problem in the last year? Just one.
-- You can be a geeklord too
"Name one that has not had this same problem in the last year? Just one"
Windows 2000
Take this post as a challenge. Reply with a link that shows that there is/was a bug in Windows 2000 that caused the loss of an ENTIRE FILE SYSTEM a la Linux or Apple's iTunes.
When the so-called stable kernel can be released with such a huge bug, how can we tell the managers that Linux is stable and hassle-free?
Really - we need to make scripts that test just about every critical aspect of a kernel. That would be file systems, VM, IPC, SMP, hardware drivers, SCSI, IDE, ethernet, token ring and more.
Has anybody made such scripts? One thing is a broken, obscure driver, another thing is bugs that break everybody - like VM and now unmount.
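As a rough idea of what one such script could look like for the unmount case, here is a sketch in Python: write files with known checksums, cycle the filesystem, then check that every file is still there and intact. Everything here is an assumption for illustration; in particular `remount` is a hypothetical hook that the caller supplies to unmount and remount the filesystem under test, and the sketch does not do that itself.

```python
import hashlib
import os


def write_test_files(directory, count=20, size=4096):
    """Create `count` files of random data; return {name: sha256 hexdigest}."""
    expected = {}
    for i in range(count):
        name = "regress_%03d.dat" % i
        data = os.urandom(size)
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)
        expected[name] = hashlib.sha256(data).hexdigest()
    return expected


def verify_test_files(directory, expected):
    """Return the names that are missing or whose contents changed."""
    bad = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(directory, name)
        try:
            with open(path, "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
        except FileNotFoundError:
            bad.append(name)
            continue
        if actual != digest:
            bad.append(name)
    return bad


def run_cycle(directory, remount):
    """Write files, call the caller-supplied `remount` hook, verify."""
    expected = write_test_files(directory)
    remount()  # hypothetical hook: unmount + mount the filesystem under test
    return verify_test_files(directory, expected)
```

The sketch deliberately does not call sync or fsync before the hook runs, on the reasoning that a test aimed at this kind of bug wants data to still be dirty at unmount time. An empty list from `run_cycle` means nothing was lost on that pass; it proves little on its own, so a real suite would repeat it many times and on several filesystems.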
Stop the brainwash
Installed 2.4.15 the day this post came out. GAH! Now trying to deinstall the bird and go back to 2.4.14, and no matter what I do it says it's the greased turkey.
Back to 2.2.19 now to recompile 2.4.14...
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
Take this post as a challenge. Reply with a link that shows that there is/was a bug in Redhat Linux 7 that caused the loss of an ENTIRE FILE SYSTEM.
The point (which I'm sure you'll miss, but anyway) is that linux-2.4.15.tar.gz is not an operating system. Anyone with the knowhow to download, compile, and install 2.4.15 from source had better be able to run fsck when something like this happens.
Furthermore you way overstate the case when you assert this causes lost file systems. The vast majority of 2.4.15 corruption cases can be repaired with a fsck.
Personally, I consider the code red II worm to be a far greater threat to my data than linux-2.4.15.tar.gz.
First: this Linux bug does not cause the loss of the ENTIRE FILE SYSTEM. It leaves .lock files with invalid inodes, which can be repaired by manually running fsck. As to your challenge, these are just a few corruption problems with Windows 2000 that I found doing a simple search on www.microsoft.com.
http://support.microsoft.com/support/kb/articles/Q268/8/97.ASP
http://support.microsoft.com/support/kb/articles/Q258/0/75.ASP
http://support.microsoft.com/support/kb/articles/Q273/2/45.ASP
http://support.microsoft.com/support/kb/articles/Q298/9/36.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=16&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000
http://support.microsoft.com/support/kb/articles/Q261/1/22.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=19&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000
http://support.microsoft.com/support/kb/articles/Q255/5/69.ASP?LN=EN-US&SD=gn&FR=0&qry=file%20system%20corruption&rnk=23&src=DHCS_MSPSS_gn_SRCH&SPR=WIN2000
-- You can be a geeklord too
Odd minor version numbers are unstable (so 2.1, 2.3 and 2.5 are all unstable kernel branches).
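The odd/even rule from the post above is easy to check mechanically. A small Python sketch follows; `is_development_series` is a made-up name, and it encodes only the numbering convention as described in this thread.

```python
def is_development_series(version):
    """Return True if `version` (e.g. "2.5.0") has an odd minor number.

    Only the second dot-separated field is examined, so suffixes on the
    patch level such as "2.4.16-pre1" do not matter.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("expected at least major.minor, got %r" % version)
    minor = int(parts[1])
    return minor % 2 == 1


if __name__ == "__main__":
    for version in ("2.2.19", "2.4.15", "2.4.16-pre1", "2.5.0"):
        kind = "development" if is_development_series(version) else "stable"
        print(version, "->", kind)
```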
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
<pseudo-rant>
maybe there's a good side to your ISP going out of business and qwest dsl fscking you over changing your isp, making it harder to update your kernel 8)
</pseudo-rant>
but ultimately, i can't see its all that big of a deal. all you have to do is take a couple of weeks to get to the newest kernel. wait till its been out a fortnight, and you're golden
Brian Voils
"A university is what a college becomes when the faculty loses interest in students."
Nope. 2.4.15 was released by Linus
True, but it's still a "bad start", in the sense of "unpleasant", for the "brasilian guy", because his very first task involves the urgent need for a quick release of a bug-fixed 2.4.16...Imagine getting hired as a sysadmin and the very first morning you walk into the office 100 or so computers belonging to senior management all start propagating MS VB viruses amongst themselves and the rest of the system, crashing machines, emailing sensitive data to random people outside of the company, slowing network traffic to a crawl, etc. etc.....
Talk about "trial by fire"....
Hacker Public Radio is our Friend
Please, I'm obviously not as smart as you are, so can you please give me a list of the "large" holes of Windows that happen on a daily basis? My memory is obviously failing me already as I don't remember very many at all. Certainly not more than 400 "large holes" since W2K was released.
And odder still I do remember that every time that I have heard of a "major" flaw it was fixed very quickly, and then took a few days to go through the standard regression tests on all platforms and machines before it was publicly released. If you were affected by one of these problems, you could get the "unsupported" patch as soon as it was developed, but before they could complete testing.
You can't do complete testing of a patch in 24-48 hours and release it as public with support.
Also, when a "serious" problem does come out, the relevant MS developers are told to work 18 hours a day 7 days a week until it's solved.
It's one thing to say "hey, it looks like here's the problem, here we just corrected it and compiled it, that should do", and another completely to have performed all of the tests required to make sure that one small "fix" didn't corrupt something on some obscure hardware configuration that other major clients are using.
You're all so quick to cut down Microsoft and defend Linux when worse problems happen. You'll also have to explain to me how this is not completely hypocritical, because the logic on that one eludes me as well.
If God gave us curiosity
People downloading kernels from kernel.org, particularly in the first few days of a release, are part of the QA process, not the ultimate beneficiaries of one.
The Open Source (or more correctly, bazaar or distributed) development model also distributes responsibility. If the possibility of losing your data is something you can't afford then you simply shouldn't be sitting on the cutting edge of kernel development.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
I think I'm just anti-anti-microsoft, that's all. I think they get beat up far too often for things that aren't always their fault.
If God gave us curiosity
Vintage computer games and RPG books available. Email me if you're interested.
Actually dude, that's really unfair. Slashdot, for all its zealotry, wears its bias on its sleeve. You *know* Slashdot has its bias; that's why you're here, right?
Seriously, the professional astroturfers on this site love to whinge about how the slashdot sysops are anti-ms, but hey; they admit it don't they?
Compare that to MS owned news and the fact it NEVER criticizes windows, but pretends to be unbiased (One would even believe it if one didn't know better)
At the end of the day, it reminds me of a comment by Aust media theorist John Hartley that "Propaganda is more honest than news, because at least it admits its bias".
Think about it.
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
Well, at least 2.5 was fucked up! Now nobody can really say that 2.5 was ever stable!
...that make me glad I switched to FreeBSD a while ago. I like how the system is in CVS - bugs get patched and fixes checked in fast, and all one has to do is a 'make update' in /usr/src and do some rebuilding to update. No waiting for a release.
Linux does have a lot of things I miss - DRI/DRM still isn't working right, X and GTK in particular seem a bit slower - but it's absolutely rock solid. I've only managed to crash it once, and that was my own fault - loaded a KLD from 4.4-RELEASE into a 4.4-STABLE kernel. Nice panic there.
The ports system also is really nice; it could do dependencies a bit better, but it's generally fairly smart about it. And having the entire system source on disk and available is nice.
-- Veni, vidi, dormivi
If you've been having the filesystem corruptions,
*Everybody* will get corruption using 2.4.15/2.5.0, just don't use it.
Life's a bitch but somebody's gotta do it.
Face it, Linus is screwing up big time with 2.4. Sure he's human, but he's got to do something to catch all these stupid bugs.
2.4 is supposed to be a STABLE release. However, so far it has not been anything like that - VM issues, symlink probs, etc etc etc.
Doesn't look like there's any QA worth talking about. No regression testing.
Heck even the oft-flamed Redhat is doing a better job releasing stable kernels.
Yeah you've had disk corruption in FreeBSD, but overall it looks like Linux 2.4 standards are rather poor.
Why use Windows 98 for comparison? Windows is the far low end of the scale. Linux would be crap if it is just slightly better than Windows. So you got to aim a LOT higher than that. Looking at Linus' 2.4 STABLE, you'd be able to call Windows 95 stable too.
Linux has great potential - heck the Linux on a mainframe thing is great. But someone should come up with some decent regression tests so that we don't get STUPID problems like this for STABLE releases. I don't care if stupid stuff happens in dev releases.
Very disappointing.
Yeah we have high standards. If not we'd write the stuff ourselves right? ;)
Seriously tho, what do you want: a Linux that's slightly less unstable than Windows, or a Linux that's actually stable?
I don't understand why so many people here are using Microsoft to show why Linux isn't that bad.
This is STABILITY we are talking about. If you have to resort to mentioning Microsoft then Linux has become rather bad hasn't it?
If you are talking Joe Public acceptability then yeah mention Microsoft.
Maybe Linux should go towards the FreeBSD style of releasing - STABLE, CURRENT, DEVELOPMENT.
More regression tests before an actual release would help too.
Cheerio,
Link.
man that really made me laugh out loud.
Set your comment level to 0 and read any stories on /. that deal with Microsoft or Linux, and then say all of that again. The majority of users here do nothing but slam microsoft at every opportunity that they get, and yes, insult microsoft users. Sometimes even browsing at a level of 2 isn't enough to keep them out.
Wow what a hoot.
If God gave us curiosity
enterprise software, by Linux.
For absolutely needing to upgrade your enterprise-wide linux base in a hurry. Which could happen.
I completely stuffed my first 2 or 3 kernel patches/upgrades/compiles etc, but after a couple of dozen it becomes second nature, and in a stressful (read-Manager/client on your back wanting it done yesterday!!) situation that's what you need.
Plus, it is kind of fun and interesting.
Unless it was Apple. Then it still would destroy hard drives. :-)
Vintage computer games and RPG books available. Email me if you're interested.
"Without STABILITY Linux will NEVER succeed... "
ITYM `without stability we will be unable to continue using it for production purposes'. HTH. HAND.
"But the "TOTAL WORLD DOMINATION" is about the non "nerd/technical" people..."
Precisely why I don't give a fig about total world domination.
Go read the Advocacy-HOWTO.
~Tim
--
Rushing on down to the circle of the turn
This so-called history doesn't seem to have noticed that there is/was a version of DOOM for the Apple Macintosh.
It was a great game. My only exposure to the Mac
was through that game.
Stonewolf