Linux 4.0 Has a File-System Corruption Problem, RAID Users Warned

Posted by timothy on Thursday May 21, 2015 @01:23AM from the don't-store-the-ark-there dept.

An anonymous reader writes: For the past few days kernel developers and Linux users have been investigating an EXT4 file-system corruption issue affecting the latest stable kernel series (Linux 4.0) and the current development code (Linux 4.1). It turns out that Linux users running the EXT4 file-system on a RAID0 configuration can easily destroy their file-system with this newest "stable" kernel. The cause has been found and a fix exists, but the fix hasn't yet worked its way into the mainline kernel, so users should be wary of quickly upgrading to the new kernel on systems with EXT4 and RAID0.

29 of 226 comments

Linux is clearly unstable! by Anonymous Coward · 2015-05-21 01:26 · Score: 5, Funny

I'll stick with Windows Vista, thanks.
stable by rossdee · 2015-05-21 01:34 · Score: 4, Funny

this is obviously some strange usage of the word "stable" that I wasn't previously aware of.
1. Re:stable by Anonymous Coward · 2015-05-21 01:38 · Score: 5, Funny
  
  If you ever owned horses, you would understand what "stable" means in this context
2. Re:stable by Deep+Esophagus · 2015-05-21 02:13 · Score: 2
  
  This. My first thought upon reading TFS was, how did this ever pass peer review and testing to get into the "stable" kernel? They do still perform peer review and unit testing, don't they?
3. Re:stable by Trevelyan · 2015-05-21 02:20 · Score: 5, Informative
  
  It's stable as in terms of features and changes. i.e. No longer under development and will only receive fixes.
  
  However! Kernels from kernel.org are not for end users, if someone is using these kernels directly then they do so at their own risk.
  They are intended for integrators (distributions), whose integration will include their own patches/changes, testing, QA and end user support
  
  There is a reason that RHEL 7 is running Kernel 3.10 and Debian 8 is running 3.16. Those are the 'stable' kernels you were expecting.
  
  When kernel development moved from 2.5 to 2.6 (that later became 3.0), they stopped their odd/even number development/stable-release cycle. Now there is only development, and the integrators are expected to take the output of that to create stable-releases.
4. Re:stable by dave420 · 2015-05-21 03:01 · Score: 3, Insightful
  
  I understand if you are emotionally attached to Linux to the point where someone accidentally criticising it makes you feel uncomfortable, but you really should be able to figure out that "but... but... they're worse!" is no rational response :)
5. Re: stable by jbengt · 2015-05-21 06:08 · Score: 2
  
  I can routinely cause a BSOD (about 1/3 of the time) on my HP laptop running Windows 7 Pro if I use the touchpad at the log-in screen on start-up. It's apparently a known bug in the touchpad driver that will not get fixed.
6. Re: stable by oobayly · 2015-05-21 07:30 · Score: 3, Insightful
  
  It's not. However it isn't beyond a reasonable expectation that a dodgy touchpad driver shouldn't be able to kill an OS.
Warning: RAID 0 by Culture20 · 2015-05-21 01:37 · Score: 2, Interesting

RAID 0 is unstable to begin with. Medium case scenario here (for legitimate use) is some data gets corrupted on a compute node. Run the program on two nodes; if you get the same result on both, that result is probably fine. If you're running RAID0 on any filesystem that isn't temporary or at least easily replaceable, you're doing it wrong.
1. Re:Warning: RAID 0 by Enry · 2015-05-21 02:08 · Score: 2, Insightful
  
  RAID 0 is only as unstable as its least stable component. In this case it's most likely a drive failure, and most drives have fairly long MTBFs. The chances of a disk failure increase as a function of time and number of drives deployed. A two-drive RAID 0 will be more stable than a five-drive RAID 0 which will be more stable than a 10 drive RAID 0 that's three years old. In the case of higher RAID levels, you can remove a single (or multiple) drive failure as the point of failure. In this case, the point of failure is the kernel, so it's perfectly legitimate to consider this a really bad problem. Would you say the same thing if the bug affected RAID 1 or RAID 5?
2. Re:Warning: RAID 0 by nine-times · 2015-05-21 02:39 · Score: 4, Insightful
  
  Would you say the same thing if the bug affected RAID 1 or RAID 5?
  I suspect not, since his point seemed to be that you shouldn't be using RAID 0 for data that you care about anyway.
  It doesn't really make it ok for a bug to exist that destroys RAID 0 volumes, but it does mitigate the seriousness of the damage caused. And it's true: Don't use RAID 0 to store data that you care about. I don't care if the MTBF is long, because I'm not worried about the mean time, but the shortest possible time between failures. If we take 1,000,000 drives and the average failure rate is 1% for the first year, it's not that comforting to the 1% of people whose drives fail in that first year.
3. Re:Warning: RAID 0 by nine-times · 2015-05-21 03:47 · Score: 2
  
  Well, it mitigates the seriousness of the damage a bug should cause, assuming that people use RAID reasonably.
  I'm going to go ahead and say that it mitigates the seriousness of the damage caused in actuality since most IT people entrusted with serious and important data aren't going to be that stupid. I mean, yes, I've seen some pretty stupid things, and I've seen professional IT techs set up production servers with RAID 0, but it's a bit of a rarity. There could still be some serious damage, but much less than if it were a bug affecting RAID 5 volumes.
Why ext4 by silas_moeckel · 2015-05-21 01:38 · Score: 2

If you're running a brand spanking new kernel with data you do not care about, why use an old FS? There are plenty of newer, better FSes to choose from.

--
No sir I dont like it.
1. Re:Why ext4 by Rich0 · 2015-05-21 02:47 · Score: 2
  
  The problem is that the feature-list for ZFS is very enterprise-oriented.
  Why would you want to add just one drive to a server with 5x 6-drive RAID6 arrays? Just add another 6 drives at a time.
  On the other hand, if you have a PC with 3 drives in RAID5, you could easily want to turn that into a 4-drive RAID5 or a 5-drive RAID6 in-place.
  Btrfs has a lot of features that are useful for smaller deployments, like being able to modify the equivalent of a vdev in-place. ZFS on the other hand has a lot of features like ZIL that are very useful for larger deployments.
2. Re:Why ext4 by fnj · 2015-05-21 04:01 · Score: 2, Informative
  
  Name one that actually boots the Linux kernel, and doesn't just run in user space. (Yes, I am a fan of ZFS, but not the Linux implementation.)
  You really should get out more. ZFS on Linux is not to be confused with the ZFS Fuse project. You can boot from a ZoL filesystem. In general ZoL is about as stable, complete, and reliable as any ZFS.
3. Re: Why ext4 by wed128 · 2015-05-21 04:34 · Score: 2
  
  ReiserFS predates ext4, and it's hard to be an active software developer in prison.
4. Re:Why ext4 by houstonbofh · 2015-05-21 16:17 · Score: 2
  
  If you trace it back, all of that fear originates from one post on the FreeNAS forums. A post from one of the key developers says that you should use ECC for any server with critical data, but ZFS is neither more nor less sensitive to it.
New version ... by JasterBobaMereel · 2015-05-21 01:40 · Score: 5, Insightful

This is the new 4.0 kernel, a major version update less than a month old that most Linux systems will not have yet... and the issue has already been patched
Bleeding edge builds get what they expect, stable builds don't even notice

--
Puteulanus fenestra mortis
1. Re:New version ... by Anonymous Coward · 2015-05-21 02:14 · Score: 2, Insightful
  
  The last major Linux version update that actually meant something was 1->2. The "major version" bumps in the kernel are now basically just Linus arbitrarily renumbering a release. The workflow no longer has a notion of the next major version.
Re:Which RAID are they referring to? by bakaorg · 2015-05-21 01:46 · Score: 5, Informative

md raid. The bug was in the commit "md/raid0: fix bug with chunksize not a power of 2", causing, you guessed it, a bug with a chunksize that is not a power of two. I guess "fix" was a bit diversionary.

The real problem was a macro modifying arguments that were later expected to be the unmodified version.
Re:It's RAID 0 by Enry · 2015-05-21 01:58 · Score: 2

I have 4 drives in a RAID 10, so two RAID 1 arrays of two drives each combined together in a RAID 0. I did it mostly because I can add new drives at any time and just chain them onto the RAID 0.
Re:Which RAID are they referring to? by msauve · 2015-05-21 02:39 · Score: 4, Informative

No. There was a minor bug introduced at 3.14. The patch to fix that completely different issue went into 4.0 and caused this corruption issue.

--
"National Security is the chief cause of national insecurity." - Celine's First Law
Ahh there it is by drinkypoo · 2015-05-21 04:13 · Score: 2

Tunneled down into the articles, http://git.neil.brown.name/?p=... has the patch. I'm building a system with 4.0.4 right now so this was material to me

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Or just use a power of 2 chunk size? by tlambert · 2015-05-21 04:16 · Score: 3, Insightful

Or just use a power of 2 chunk size?
What idiot configuration did someone have to have to trigger this bug?
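
For reference, "power of 2" is cheap to test, and a power-of-two chunk size lets the offset within a chunk be taken with a bitmask instead of a division. The sketch below is only an illustration of that idea; the names `is_power_of_two` and `offset_in_chunk` are hypothetical and the code is not taken from the md driver.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if x has exactly one bit set, i.e. x is a power of two. */
int is_power_of_two(uint32_t x)
{
        return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Offset of a sector within its chunk (hypothetical helper).
 * A power-of-two chunk size allows a plain bitmask; any other
 * size needs a real modulo, which is the path that involves division.
 */
uint32_t offset_in_chunk(uint64_t sector, uint32_t chunk_sects)
{
        assert(chunk_sects != 0);
        if (is_power_of_two(chunk_sects))
                return (uint32_t)(sector & (chunk_sects - 1));
        return (uint32_t)(sector % chunk_sects);
}
```

For sector 1000, a chunk of 128 sectors gives offset 104 through the mask path, while a chunk of 24 sectors gives offset 16 through the modulo path.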
Re:Which RAID are they referring to? by MSG · 2015-05-21 04:55 · Score: 3, Informative

That fix is actually in the wrong place. The fix for that is tracked in kernel.org's bugzilla # 98501. I'm not linking directly as linking to bugzilla tends to place too high a load on those systems. It's impolite.
Neil Brown said that he'd push the fix to Linus "shortly" at 2015-05-20 23:06:58 UTC. I still don't see the fix in Linus' tree.
Watch for a fix titled "md/raid0: fix restore to sector variable in raid0_make_request"
Re:It's RAID 0 by kthreadd · 2015-05-21 05:12 · Score: 2, Insightful

Or it could work just fine. RAID 0 is not dangerous, you may just as well lose your data even if you only use a single drive. Hard drives and SSDs don't go bad so often that it's a problem.
Raid kills bugs dead! by TeknoHog · 2015-05-21 05:46 · Score: 5, Funny

Well, there goes that slogan.

--
Escher was the first MC and Giger invented the HR department.
In particular, NO redundancy. Reliability drops. by Ungrounded+Lightning · 2015-05-21 06:05 · Score: 5, Informative

Losing data goes with the territory if you're going to use RAID 0.
In particular, RAID 0 combines disks with no redundancy. It's JUST about capacity and speed, striping the data across several drives on several controllers, so it comes at you faster when you read it and gets shoved out faster when you write it. RAID 0 doesn't even have a parity disk to allow you to recover from failure of one drive or loss of one sector.
That means the failure rate is WORSE than that of an individual disk. If any of the combined disks fails, the total array fails.
(Of course it's still worse if a software bug injects additional failures. But don't assume, because "there's a RAID 0 corruption bug", that there is ANY problem with the similarly-named, but utterly distinct, higher-level RAID configurations which are directed toward reliability, rather than ONLY raw speed and capacity.)
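
As a rough illustration of why the array is less reliable than any single member: if each disk fails independently with probability p over some period (a simplifying assumption, since drives sharing a chassis and power supply are not fully independent), an n-disk RAID 0 loses data with probability 1 - (1 - p)^n. A small sketch, with a hypothetical function name:

```c
#include <assert.h>

/*
 * Probability that an n-disk RAID 0 fails, ASSUMING each disk fails
 * independently with probability p_disk over the period of interest.
 * The array survives only if every disk survives.
 */
double raid0_failure_prob(double p_disk, unsigned n_disks)
{
        double survive = 1.0;
        unsigned i;

        assert(p_disk >= 0.0 && p_disk <= 1.0);
        for (i = 0; i < n_disks; i++)
                survive *= 1.0 - p_disk;   /* all disks must survive */
        return 1.0 - survive;
}
```

With p = 0.01 per disk, two disks give roughly 0.0199 and ten disks roughly 0.0956, so the risk grows almost linearly with the number of members while p is small.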

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Re:It's RAID 0 by Forever+Wondering · 2015-05-21 11:40 · Score: 4, Informative

Based on the commit fixes, it's in a function called raid0_make_request, which is only used in raid0.c
raid 10 is in raid10.c, so it doesn't use this function.
The bug is based on the fact that a macro "sector_div" modifies its first argument [and returns the remainder]. I've removed the obligatory backslashes for clarity:

# define sector_div(n, b) ({
        int _res;
        _res = (n) % (b);
        (n) /= (b);
        _res;
})
This is used in some fifty files. Some just want the remainder [and they don't want the first arg changed so they do]:

sector_t tmp = sector;
rem = sector_div(tmp,blah);
This is effectively what the code wanted, but the actual fix was to do a restore afterwards:

sector_t sector = myptr->sector;
...
rem = sector_div(sector,blah);
...
sector = myptr->sector;
... // use sector [original value only please ;-)]
The last line to restore sector with the original value was the fix.
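
To make the failure mode concrete, here is a minimal user-space sketch of the pattern described above. It is not the kernel code: `sector_t` is a stand-in typedef, the macro is a simplified copy of the one quoted above (it relies on the GCC/Clang statement-expression extension), and the function names `split_buggy` and `split_fixed` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's sector type (assumption for this sketch). */
typedef uint64_t sector_t;

/*
 * Simplified copy of the macro quoted above: divides n IN PLACE and
 * evaluates to the remainder. Needs GCC/Clang statement expressions.
 */
#define sector_div(n, b) ({                             \
        uint32_t _res = (uint32_t)((n) % (b));          \
        (n) /= (b);                                     \
        _res;                                           \
})

/* Buggy pattern: sector is used again after sector_div clobbered it. */
sector_t split_buggy(sector_t sector, uint32_t chunk_sects, uint32_t *rem)
{
        assert(chunk_sects != 0);
        *rem = sector_div(sector, chunk_sects);
        return sector;          /* now the quotient, not the original */
}

/* Fixed pattern: restore the original value before using it again. */
sector_t split_fixed(sector_t orig, uint32_t chunk_sects, uint32_t *rem)
{
        sector_t sector = orig;

        assert(chunk_sects != 0);
        *rem = sector_div(sector, chunk_sects);
        sector = orig;          /* the restore that the fix adds */
        return sector;
}
```

With sector 1000 and a chunk of 24 sectors, both functions report a remainder of 16, but `split_buggy` hands back 41 (the quotient) while `split_fixed` hands back 1000.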
They should do a full code audit as there may be other places that could be a problem. I've reviewed half the files that use this macro and while they're not broken, some of the uses are fragile. I paraphrase: "sector_div considered harmful"
What they really need are a few more variants which are pure functions that could be implemented as inlines:
rem = sector_rem_pure(s,n)
s2 = sector_div_pure(s1,n)
Or, a cleaner sector_div macro:
sector_div_both(s,n,sector_return,rem_return)
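
One way the suggested variants could look, as a user-space sketch only. None of these helpers exist in the kernel as far as this thread shows; the names come from the suggestion above, `sector_t` is a stand-in typedef, and the signatures are assumptions. They are written as plain functions here; in a shared header they would more likely be static inlines.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's sector type (assumption for this sketch). */
typedef uint64_t sector_t;

/* Quotient only; the caller's variable is never modified. */
sector_t sector_div_pure(sector_t s, uint32_t n)
{
        assert(n != 0);
        return s / n;
}

/* Remainder only; the caller's variable is never modified. */
uint32_t sector_rem_pure(sector_t s, uint32_t n)
{
        assert(n != 0);
        return (uint32_t)(s % n);
}

/* Both results through explicit output parameters, so nothing is hidden. */
void sector_div_both(sector_t s, uint32_t n,
                     sector_t *sector_return, uint32_t *rem_return)
{
        assert(n != 0);
        *sector_return = s / n;
        *rem_return = (uint32_t)(s % n);
}
```

Because every argument is passed by value, a call such as `sector_rem_pure(sector, chunk_sects)` cannot change `sector`, which removes the need for a temporary copy or a restore afterwards.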

--
Like a good neighbor, fsck is there ...