2.4.20 ext3 Data Corrupting Bug Fixed
An anonymous reader writes "The ext3 data corrupting bug found in the latest stable Linux kernel and reported by Slashdot here and here has been fixed. In this interesting KernelTrap story Andrew Morton describes the problem and offers a working patch. Evidently the bug has its roots in a much bigger design issue, something that won't likely be fixed in the current 2.4 kernel series. In any case, with Morton's patch applied your data will not be corrupted."
Where can I find the QA documentation, test cases and scripts for ext3? I would like to verify that this bug, and variations thereof, will be caught before release in the future. Thanks.
They don't seem to be at the ext3 home (linked to in the story).
Open Source is useless without Open Procedures, Open Documentation and Open Quality Control.
I hate to say it, but maybe /. doesn't like stories that make Linux look bad.
"A language that doesn't affect the way you think about programming is not worth knowing" - Alan Perlis
And here we're talking about calling the next major release "3.0" while things as important as /the file system/ need major rework. Perhaps we shouldn't jump the gun on this. 3.0 should not have things lying around in it that need to be completely reworked if they're going to work right. It doesn't count as a culmination of significant changes since 2.0 if those changes won't actually be working until 3.0.1.
-- 'The' Lord and Master Bitman On High, Master Of All
Which distributions (Red Hat, Mandrake, Debian, etc.) are affected by this in their default ISO images? I.e., which ones do I have to update just to get around this fatal error?
Obviously you should be running a totally bug-free OS that has never needed to be patched for filesystem-corruption bugs.
On a less inflammatory note, it demonstrates something that most of us are already well aware of. Don't go enabling advanced features or running bleeding-edge kernels unless you either have good backups or are happy to risk losing some data.
You're an idiot if you don't have backups anyhow. The most reliable filesystem in the world isn't going to save you from a hard-drive failure, user error, malicious code, theft, flood, fire, lightning strike, earthquake... These things eat data a lot more frequently than filesystem bugs!
Expect data loss. Keep backups.
hmm...yeah... Talk about code maturity.
Microsoft is damn annoying. NT's filesystem was just starting to get a bit more reliable with each service pack and they now say they are going to throw it totally away to introduce new bugs.
Still, Linux has so many filesystems it's not funny. What are the odds of them getting in the way of the kernel in the future?
Install Red Hat on ext3,
configure Red Hat, especially the networking,
get online and get the latest 2.4 kernel,
get the XFS patch and xfsprogs and install them,
recompile a new kernel with XFS in it and boot.
mkfs.xfs
cd
cp -a {bin,usr,etc,... except tmp,mnt,proc}
fix
reboot.
This still gives some obscure errors on bootup, but maybe that's because of redundant scripts. It works very fast and is stable for me. If you get around to fixing those errors, please roll out a HOWTO, since no one can take filesystem instability on production servers, yet everyone wants to use 2.4.
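The copy step in the list above ("everything except tmp, mnt, proc") can be sketched roughly as follows. This is only an illustration under stated assumptions: the target path `/mnt/newroot` and the helper names are hypothetical and not from the post, and the script only prints the `cp -a` commands instead of running them. The skipped directories would presumably still need to be created empty on the new filesystem so they can serve as mount points.

```python
import os

# Top-level directories the post says to skip when copying the old root.
EXCLUDED = {"tmp", "mnt", "proc"}


def entries_to_copy(root_entries, excluded=EXCLUDED):
    """Return the sorted top-level names that are not in the skip list."""
    return sorted(name for name in root_entries if name not in excluded)


def copy_commands(source_root, target_root, names):
    """Build (but do not run) one `cp -a` argument list per entry."""
    return [["cp", "-a", os.path.join(source_root, name), target_root]
            for name in names]


if __name__ == "__main__":
    # "/mnt/newroot" is a hypothetical mount point for the new XFS partition.
    names = entries_to_copy(os.listdir("/"))
    for cmd in copy_commands("/", "/mnt/newroot", names):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review what would be copied before touching the new partition.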
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
OK. I'm not replying to your post specifically; rather, I'm replying to the idea that is repeated throughout this thread: DON'T USE BLEEDING EDGE KERNELS and all its variations. Yes, that's good advice. But let's look at this for one second, shall we? What's the version here? 2.4.20. Yup, the twentieth iteration of the stable series kernel. I should be able to install that mother fucker on my pacemaker. There really is no excuse for there to be bugs in a .20 (or .19, or .18, or even .10) release of a kernel. The fact that there are tells us that there is something fundamentally broken about the process.
So 2.4.20 is a 'bleeding-edge' kernel? Ext3 is a 'cutting-edge' feature?
Are you saying that users should refrain from upgrading to newer releases even when those have been explicitly tagged as 'stable'? Where do you draw the line?
I do think there is some truth in the argument that you shouldn't upgrade the kernel even from a stable series. Wait for your vendor to release an updated kernel package, if they judge it necessary. And maybe don't upgrade even then.
But it is unfair in this case to criticize users for installing what they thought was a stable, tested, reliable kernel version. Ah well, mistakes happen.
-- Ed Avis ed@membled.com
Of course this particular problem could not have happened with 2.2 because there were no Journalling Filesystems (or has one of them been backported to one of the latest 2.2 kernels?).
I'll stick with installing the newest kernel a week or so after it hits the streets. That saved me from the greased turkey a year ago.
My understanding is that the bug doesn't affect the -default- journalling mode of ext3. You have to specifically change it using some filesystem-tuning utility.
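For anyone wanting to check whether they ever changed the mode, here is a minimal sketch that scans fstab-style text for ext3 entries carrying an explicit `data=` option other than `data=ordered` (the ext3 default). It is a rough check, not a definitive test: the function name and sample entries are made up for illustration, the thread does not say which mode is affected, and a mode set some other way (for example with a tuning utility, as mentioned above) would not show up here.

```python
def non_default_ext3_mounts(fstab_text):
    """Return (mount_point, data_mode) for ext3 entries with a non-default data= option."""
    flagged = []
    for line in fstab_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("data=") and opt != "data=ordered":
                flagged.append((fields[1], opt[len("data="):]))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample entries; on a real system, read /etc/fstab instead.
    sample = (
        "/dev/hda1 /     ext3 defaults              1 1\n"
        "/dev/hda2 /home ext3 defaults,data=journal 1 2\n"
    )
    for mount_point, mode in non_default_ext3_mounts(sample):
        print(f"{mount_point}: data={mode}")
```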