Linux 4.0 Has a File-System Corruption Problem, RAID Users Warned
An anonymous reader writes: For the past few days kernel developers and Linux users have been investigating an EXT4 file-system corruption issue affecting the latest stable kernel series (Linux 4.0) and the current development code (Linux 4.1). It turns out that Linux users running the EXT4 file-system on a RAID0 configuration can easily destroy their file-system with this newest "stable" kernel. The cause has been identified and a fix exists, but the fix hasn't yet worked its way into the mainline kernel, so users should be warned before quickly upgrading to the new kernel on systems with EXT4 and RAID0.
Losing data goes with the territory if you're going to use RAID 0.
This is the new 4.0 kernel, a major version update less than a month old, that most Linux systems will not have yet... and the issue has already been patched.
Bleeding edge builds get what they expect, stable builds don't even notice
Puteulanus fenestra mortis
RAID 0 is only as unstable as its least stable component. Usually that is a drive, and most drives have fairly long MTBFs. The chance of a disk failure increases as a function of time and the number of drives deployed. A two-drive RAID 0 will be more stable than a five-drive RAID 0, which will be more stable than a 10-drive RAID 0 that's three years old. With higher RAID levels, a single drive failure (or, at some levels, multiple drive failures) stops being the point of failure. In this case, the point of failure is the kernel, so it's perfectly legitimate to consider this a really bad problem. Would you say the same thing if the bug affected RAID 1 or RAID 5?
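To put rough numbers on the "more drives, less stable" point, here is a minimal sketch. It assumes drives fail independently and uses a purely illustrative 1% per-drive failure probability over some fixed period; neither the figure nor the function name comes from any vendor data or from the kernel discussion.

```python
def raid0_failure_probability(per_drive_failure: float, drives: int) -> float:
    """Chance that a RAID 0 array is lost, i.e. at least one drive fails.

    Assumes independent drive failures (a simplification) and that losing
    any single member drive loses the whole striped array.
    """
    if not 0.0 <= per_drive_failure <= 1.0:
        raise ValueError("per_drive_failure must be between 0 and 1")
    if drives < 1:
        raise ValueError("drives must be at least 1")
    # The array survives only if every drive survives.
    return 1.0 - (1.0 - per_drive_failure) ** drives


if __name__ == "__main__":
    # 0.01 is an illustrative assumption, not a measured failure rate.
    for n in (2, 5, 10):
        print(f"{n:2d} drives: {raid0_failure_probability(0.01, n):.2%}")
    # prints:
    #  2 drives: 1.99%
    #  5 drives: 4.90%
    # 10 drives: 9.56%
```

None of this models a kernel bug, which hits the array regardless of how healthy the individual drives are.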
Would you say the same thing if the bug affected RAID 1 or RAID 5?
I suspect not, since his point seemed to be that you shouldn't be using RAID 0 for data that you care about anyway.
It doesn't really make it ok for a bug to exist that destroys RAID 0 volumes, but it does mitigate the seriousness of the damage caused. And it's true: Don't use RAID 0 to store data that you care about. I don't care if the MTBF is long, because I'm not worried about the mean time, but about the shortest possible time between failures. If we take 1,000,000 drives and the average failure rate is 1% for the first year, it's not that comforting to the 1% of people whose drives fail in that first year.
I understand if you are emotionally attached to Linux to the point where someone accidentally criticising it makes you feel uncomfortable, but you really should be able to figure out that "but... but... they're worse!" is no rational response :)
Or just use a power of 2 chunk size?
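For anyone unsure what a "power of 2 chunk size" means in practice, a minimal sketch of the check follows. The function name and the sample values are hypothetical, and the thread itself does not confirm which chunk sizes are affected, so treat this only as an illustration of the suggestion.

```python
def is_power_of_two(chunk_size: int) -> bool:
    """Return True if chunk_size is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so clearing the lowest
    # set bit with n & (n - 1) leaves zero.
    return chunk_size > 0 and (chunk_size & (chunk_size - 1)) == 0


if __name__ == "__main__":
    # Sample chunk sizes; the unit (e.g. KiB) is an assumption and does not
    # change the result as long as the unit itself is a power of two.
    for size in (64, 384, 512):
        print(size, is_power_of_two(size))
```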
What idiot configuration did someone have to have to trigger this bug?
Meanwhile, my Win keeps BSOD.
Really? Sounds like you're screwing something up pretty bad, haven't seen one of those in about 6 or 7 years.
It's not. However, it isn't beyond a reasonable expectation that a dodgy touchpad driver shouldn't be able to kill an OS.