EXT4 Data Corruption Bug Hits Linux Kernel
An anonymous reader writes "An EXT4 file-system data corruption issue has reached the stable Linux kernel. The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug described as an apparent serious progressive ext4 data corruption bug. Kernel developers have found and bisected the kernel issue but are still working on a proper fix for the stable Linux kernel. The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."
Nope - bisection is a common technique for tracking down the cause of a bug by doing a binary search through the code history.
https://en.wikipedia.org/wiki/Code_Bisection
No, this means the kernel has bug-like tendencies from time to time, but is not exclusively buggy. For instance when it's in college, or if it's at a bar, and has had a few drinks, well then it might be buggy, but normally at work and at home and to all its friends it acts stable.
I want to delete my account but Slashdot doesn't allow it.
I know he'd never do anything to harm me or my data.
The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
We're talking about Linux users here...move along.
The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
They're trying to boost the average uptime of all installations by making people keep their machines turned on. It's just a continuation of the uptime war waged with the BSD folks!
Ezekiel 23:20
Brilliant. Well, it certainly worries this Linux developer -- although I mostly rely on pre-3.0 kernels. Wasn't there a rule on Slashdot about mirroring articles before posting links to them?
In Soviet Russia, our new overlords are belong to all your base.
From Ted Ts'o's commentary, it's an optimization ("jbd2: don't write superblock when if its empty") gone awry:
Basically, this optimization has the side effect of not updating the transaction log in this rare case. You can end up replaying old transactions after new ones, which will scramble metadata blocks. Given the rather unique conditions needed to hit this one, I'm not going to lose any sleep over any servers running without Ted's fix (though I'll certainly apply it once RedHat releases the patch).
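To make the "replaying old transactions after new ones" failure concrete, here is a deliberately tiny toy model. It is not the actual jbd2 logic: the names `simulate`, `replay` and `sb_start` are hypothetical, and the "superblock" is reduced to a single index saying where recovery should start. It only illustrates why skipping a superblock update can let stale journal entries overwrite newer on-disk data.

```python
def replay(disk, journal, start):
    """Recovery: re-apply every journaled transaction from index `start` on."""
    for transaction in journal[start:]:
        disk.update(transaction)
    return disk


def simulate(update_superblock):
    """Toy model (assumption, not jbd2): one metadata block, one transaction."""
    disk = {"inode_table": "v1-old"}
    journal = []
    sb_start = 0  # toy "superblock" field: first journal entry to replay

    # A transaction is journaled, then checkpointed to its final location.
    journal.append({"inode_table": "v2"})
    disk["inode_table"] = "v2"

    # Correct behaviour: record that nothing in the journal needs replaying.
    # The "optimization gone awry" corresponds to skipping this update.
    if update_superblock:
        sb_start = len(journal)

    # Newer data reaches the disk later on.
    disk["inode_table"] = "v3-newest"

    # Next mount: recovery trusts whatever the superblock says.
    return replay(disk, journal, sb_start)


print(simulate(update_superblock=True))   # newest data survives
print(simulate(update_superblock=False))  # stale "v2" is replayed over it
```

In the toy, the stale superblock pointer is the whole problem: the journal contents are fine, but recovery starts from the wrong place.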
...and too deep. It awoke a being of segfaults and kernel panics.
The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.
When all you have is a hammer, every problem starts to look like a thumb.
At first I had mixed feelings of slight disappointment and concern, especially because it is the default filesystem in several distros, (including Android). Although, after some second thoughts, I have come to the following conclusions:
...please, guys, don't do it again!
1) It is part of the game: continuous development toward improvement (most of the time) and new features implies some pitfalls. So far, benefits are much larger than costs.
2) Despite the fact developers are still working on a fix, I wouldn't be surprised if one were found soon.
3)
This is why I don't use file systems less than 10 years old.
What they actually split in half is a sequence of changesets (also known as commits).
The idea is you have a sequence of changesets that take you from the last known good revision to the first known bad revision. By splitting that sequence in half and determining whether the revision in the middle is good or bad, you can in principle halve the number of revisions between last known good and first known bad, repeating until you find the revision that introduced the bug. Reality is messier because of nonlinear history, because some revisions may be "broken" such that it is not possible to determine if they are "good" or "bad", and because some bugs may be difficult to test for, but bisection is still a useful tool for finding problem revisions among a long history relatively easily.
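As a rough sketch of the halving described above, assuming a strictly linear history where every revision can be tested: `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for whatever "build it, boot it, try to reproduce the bug" procedure you actually have.

```python
from typing import Callable, Sequence, TypeVar

Rev = TypeVar("Rev")


def first_bad(revisions: Sequence[Rev], is_bad: Callable[[Rev], bool]) -> Rev:
    """Return the first bad revision in a linear history.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that once the bug appears it stays in every later revision.
    """
    lo = 0                    # index of the last known good revision
    hi = len(revisions) - 1   # index of the first known bad revision
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # bug already present: look earlier
        else:
            lo = mid          # still good: look later
    return revisions[hi]


# Hypothetical history of 100 revisions where revision 37 introduced the bug.
history = list(range(100))
print(first_bad(history, lambda rev: rev >= 37))  # 37
```

Each test halves the suspect range, so 100 revisions need at most 7 tests instead of up to 98.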
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
What term do we get to use for ext4 now? It's unfortunate that Theodore Tso is actually a pretty decent guy instead of being a murderer (and a jerk). So there aren't any obviously negative terms that come to mind.
But clearly, something needs to be done along these lines, as well as a legion of people who forever more claim that ext4 corrupts your data and you should never use it and stick with ext3 instead.
Need a Python, C++, Unix, Linux develop
Write large chunks of data to every filesystem and force the journals to cycle before reboot. If you have to ask "how often is too often?", then you're probably already in trouble.
Article suggested that people who shut down every day, say a laptop owner who doesn't use suspend/hibernate, will probably bump up against this. My suspicion is that those of us with uptimes of several months will have no trouble, but YMMV.
grammar nazi's
grammar Nazis
The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
This is wrong. The problem occurs when the fs is unmounted too *soon*. Twice in a row. The bug only appears if the journal buffer does not wrap. You only get catastrophic results if this happens twice in a row.
We don't see the world as it is, we see it as we are.
-- Anais Nin
presumably from this post, "being technical" only means complete knowledge of all tools.
I'm guessing you find it very hard to find work with that kind of understanding of what "being technical" implies.
I'm god, but it's a bit of a drag really...
I'm a laptop owner, who uses Dmcrypt, and with a 2 second boot time off SSD, i never bother hibernating. Better check what kernel....
Actually, XP is incompatible with the newest version of NTFS, as you will notice if you ever move HDs around various computers for some reason. Not quite the same thing, but easy to overlook. It can produce some very nasty problems.
I have to agree with you. This is one of the best demos of ZFS around :)
http://www.youtube.com/watch?v=QGIwg6ye1gE
ZFS solves 3 problems by taking a holistic approach:
* Volume Management
* File System
* Data Integrity
Instead of fragmenting the problem into 3 layers, each of which has only limited access and knowledge, a unified layer gives you more meta-information to make smarter decisions.
Some interesting essays:
https://blogs.oracle.com/bonwick/entry/raid_z
https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation
Windows can fuck up its file system just fine. It's just that Microsoft never warns its users about defects in Windows unless someone goes public first. Mostly they silently slip the fixes in with a bunch of other fixes. That is, if they fix the bugs at all.
Backups are important regardless of file system. In the absence of human error or hardware failure... sure enough your file system will still get fucked.
Also if I had a dollar for every time Windows fucked a partition table, I'd be driving a much nicer car.
> Windows has never had anything as serious as a file system corruption bug.
That you know of...
Since the Windows development process isn't open, there's no way for you to tell. You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.
A Pirate and a Puritan look the same on a balance sheet.
I love Chan9 and MS Research and I think a lot of what MS makes is "cool", but we are all human and mistakes WILL be made. Linux has a great track record. This is also why BTRFS will take a while to get traction in the Enterprise. EXT4 and ZFS are still getting bug fixes.
...or XFS with a recent kernel.
Views expressed do not necessarily reflect those of the author.
http://answers.microsoft.com/en-us/windows/forum/windows_cp-files/bug-report-serious-filesystem-corruption-and-data/17f69e19-92ca-4e1e-b9d5-f78f1ac4e963
Bugs happen. The difference here is that Linux development is done in the open so people find out about them.
I have used BSD. I found it .... quite striking. There's a hell of a lot of performance enhancement in Linux, and it really shows when you try to boot BSD and find it's ass-slow from the get-go. I even tried slapping down Debian-kfreebsd to compare something roughly the same and ... yeah it's just slow as shit. Solaris (both Sun Solaris and Nexenta = Ubuntu/Solaris) wasn't that slow.
Support my political activism on Patreon.
Hopefully Btrfs will conquer this.
Blame SUN, they chose a license for ZFS to ensure it never had proper in-kernel Linux support. They did that because Linux was eating their lunch and still is.
Source?
Cuz I'm looking:
http://en.wikipedia.org/wiki/Ntfs#Microsoft_Windows
http://www.tomshardware.com/forum/1249-63-ntfs-win7-windows
http://en.wikipedia.org/wiki/Ntfs#Versions
And just not seeing "XP is incompatible with the newest version of NTFS"
On the Oregon Coast born and raised, On the beach is where I spent most of my days
This is what you get when you use a filesystem that wasn't developed by a real company.
Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away. I thought this "problem" existed with ext4 for years.
Yeah, Micro$oft is evil, but their FS works. And file corruption isn't a serious issue except when hard drives fail, and, well, in that case...DERP!
This one occurred in October, so that's pretty doubtful, since none of the major distros are that up to date.
Bisect and disect are synonymous, they both mean "splitting in half."
Perhaps, if disect is a real word, but dissect means "cut up/apart", not specifically into two parts.
If God forks the Universe every time you roll a die, he'd better have a damned good memory.
... can we get the words "stable", "linux", and "kernel" into a single summary? I like this game.
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
They're mounting it wrong!
When you mount your disks, you need to be sure of proper head alignment. Make sure she's spun up properly as well, otherwise the disks could be surprised and jump away causing a crash. Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.
I said no... but I missed and it came out yes.
> Blame SUN, they chose a license for ZFS to ensure it never had proper in-kernel Linux support.
That's a myth / blatant lie.
Fork Yeah! The Rise and Development of illumos
http://www.youtube.com/watch?feature=player_detailpage&v=-zRN7XLCRhc#t=1460s
Why You Need ZFS
http://www.youtube.com/watch?v=6F9bscdqRpo
@5:40 I just want to clarify your comment "It would be illegal to ship"
@5:45 I think there is a perception issue that we need to tackle.
@5:55 One point that I would like to make, because I think I said earlier, is that I think we have much more in common than that which separates us.
@5:58 One of the most important things we all have in common is we are all open source systems.
@6:02 And we need to end this self-inflicted madness of open source licensing compatibility.
@6:12 I think that it is a bogeyman and we are letting it hold us back.
@6:19 You say it would be illegal to ship. I say no one has standing.
@6:24 The GPL was never ever designed to counteract other open source licenses.
@6:33 That is a complete rewrite of history to believe the GPL was designed to be at war with BSD or with CDDL.
@6:39 The GPL was at war with proprietary software. And thanks to the GPL and Stallman, open source won.
@6:45 That is the whole point. Open source won.
@6:49 We are pissing on our own victory parade by not allowing these technologies to flow between systems.
grammar nazi's?
*facepalm* I hope that was deliberate.
Free Martian Whores!
They split it in half?
I know it's wrong but I just got this mental image of someone moving all the 0's to one side of a page and all the 1's to the other side...
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
http://en.wikipedia.org/wiki/Common_Development_and_Distribution_License#GPL_incompatibility
There seems to be quite an argument over the matter.
If God forks the Universe every time you roll a die, he'd better have a damned good memory.
Nah, He only needs the latest SHA1 for each roll outcome commit as that'll point up the GIT tree :-D
You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.
You're looking in the wrong place!
They're called features, and they're on the technet website for all the world to see.
Like how in older Windows versions, disks would be auto-mounted, and NTFS didn't have native active/active capability. In other words, if you made the slightest mistake in your FC zoning, then you could kiss your multi-terabyte cluster volume goodbye.
ZFS has not already been debugged on linux. Is there even a non-FUSE ZFS implementation for linux?
I am not sure everything has to be done in one step. Do one thing and do it well. This holistic idea is nice in concept but often leads to the windows outcome. Not much gets done and what gets done is not that great if at any point "just works" just doesn't.
I think YOU are the one who didn't get the joke...
"That's right...I said it."
Bisecting is also a way of killing bugs - or perhaps Bisecting is when you act like an insect that goes both ways.
it is only after a long journey that you know the strength of the horse.
1) Windows 7 fucked up the Windows 8 partition not because of a bug, but because it isn't forward compatible
2) Microsoft does not recommend dual booting Windows 8 with older Windows versions
3) The guy was using a prerelease version of Windows 8
Show me a bug where a specific version of Windows corrupts its OWN filesystem (ie. the filesystem that comes with it). You can't, because it never happens.
Show me their development and I bet I find one.
And Netcraft confirmed it. I know. Everybody knows. You don't need to keep repeating it.
But, of course, zombies are known to be slow... You may be on to something.
Rethinking email
That's a myth / blatant lie.
You are going to have to come up with better arguments than that. Your quotes do not support that statement.
Sun was about as Linux-hostile as any company could get, basically from 1995 and forwards. They tried to do as much as they could to make sure that Linux did not benefit in any way from any Solaris or Sun technology.
Of course it makes sense that they tried to fight against the OS which was destined to make them obsolete. Luckily they did not have a particularly competent legal team.
Finally! A year of moderation! Ready for 2019?
Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.
On Linux front, I presume FS corruption bugs partly arise from the continuously evolving R&D development style of the kernel. New file systems get invented all the time and previous ones get tweaked. Can't say if it's good or bad, it's just another way of doing things. I myself have not wished much since the journal support of ext3.
That isn't a file system bug, that is progress. Would you consider it a bug if a Linux system from 1998 caused corruption on an ext4 volume?
Hell yeah.
If it'd tell me it doesn't know the file system and has no idea what to do with it,
that would be perfectly fine.
But corrupting a file system just because it is unknown to/unsupported by the
system trying to read it would be a huge bug.
There's a native kernel port of ZFS for Linux: http://zfsonlinux.org/
Dilbert RSS feed
> Your quotes do not support that statement.
I'm not sure how much clearer you can get than "Open source won. We are pissing on our own victory parade by not allowing these technologies to flow between systems."
You _do_ realize who said them, right?
Bryan Cantrill (wrote dtrace) worked with Jeff Bonwick (designed/wrote ZFS). They were both at Sun, for 14 and 20 years respectively. If you watch the "Fork Yeah!" video, the impression I get is that they wanted to open source as much as possible but were held back by legal.
The _only_ other two people who could weigh in would be the people who designed ZFS and the GPL.
* Jeff Bonwick, and
* Richard Stallman
I don't know anyone else who _would_ actually have credibility in settling the question. Do you?
From a fully updated Ubuntu 12.10 (no patch for this bug yet):
$ uname -r
3.5.0-17-generic
From the summary:
The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug
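A small sketch of the check being done by hand above: take the `uname -r` string and see whether it falls in one of the series named in the summary. The set `AFFECTED_SERIES` and the function names are assumptions for illustration. The thread does not say which exact point releases carry the bad commit, and distro kernels backport patches, so a series match is only a hint to go read your distro's changelog.

```python
import re

# Series named in the summary; exact affected point releases are not given.
AFFECTED_SERIES = {(3, 4), (3, 5), (3, 6)}


def kernel_series(release: str) -> tuple:
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_affected_series(release: str) -> bool:
    """True if the release belongs to a series mentioned in the summary."""
    return kernel_series(release) in AFFECTED_SERIES


print(in_affected_series("3.5.0-17-generic"))  # True
print(in_affected_series("3.2.0-4-amd64"))     # False
```

On a live system, `platform.release()` from the standard library returns the same string that `uname -r` prints.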
-1 overrated isn't the same thing as "I disagree".
Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.
I've never seen NTFS get corrupted. I have seen it delete multi-gigabyte files because they were open when Windows crashed.
I've never seen ext3 get corrupted, or delete multi-gigabyte files because they were still open when Linux crashed (or, more likely, went down due to a power failure).
I've never trusted ext4 after the early 'so what if I delete your data after a power failure?' arguments from the developers.
I just hope he's not storing his repo on ext4.
Do you even lift?
These aren't the 'roids you're looking for.
Still, for all of the shit that Linux users talk about Windows, WINDOWS has NEVER had anything as serious as a FILE system CORRUPTION bug.
Finally, someone talking sense ... oh wait.
http://www.computerworld.com/s/article/9054178/Microsoft_s_Windows_Home_Server_corrupts_files
"Microsoft's Windows Home Server CORRUPTS FILES"
"'Don't edit' list includes photos, as well as Quicken and QuickBooks files, warns Microsoft; no word on patch"
Never mind ...
The GPL was at war with proprietary software. And thanks to the GPL and Stallman, open source won.
Amen.
People reboot linux?
Considering how those who manage the curve are rude, obstructive and just downright mean - I think Linux does a great job in keeping up.
Nah!
You're wrong!!
The 0's go to the top of the page, and the 1's to the bottom!!!
(As the 0's have air bubbles that make them float...)
[An irrelevant irrelevancy?]
It's not from October:
linux-stable$ git show 14b4ed22a6
commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
Author: Eric Sandeen <sandeen@redhat.com>
Date: Sat Aug 18 22:29:40 2012 -0400
jbd2: don't write superblock when if its empty
commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.
Also FatPhil on SoylentNews, id 863
> they wanted to open source as much as possible but were held back by legal.
The legal dept. at Sun?
Also FatPhil on SoylentNews, id 863
Save yourself the extra write and extra opportunity for something to go wrong: disable the journal. worth considering in any case: http://pentabular.wordpress.com/ext4-on-laptop-ssd/
The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.
No. They found the bug, then bisected the commits between "last known working" and HEAD to discover what patch caused it.
Dewey, what part of this looks like authorities should be involved?
>> Windows has never had anything as serious as a file system corruption bug.
>That you know of...
So what were all those chkdsk errors after BSODs?
Also FatPhil on SoylentNews, id 863
If you watch the "Fork Yeah!" video, the impression I get is that they wanted to open source as much as possible but were held back by legal.
So what if certain engineers wanted to open source things? They didn't get to make that decision.
The quotes are implying that the GPL does not work and that you can combine CDDL-licensed code with GPL'd code and distribute the combination. That position is rather weird, but then again Sun did suffer from a reality distortion field when it came to legal issues. The only other person I have heard of with the same view is Jörg Schilling.
Finally! A year of moderation! Ready for 2019?
Nice try, but fail. That wasn't a bug in Windows, it was a bug in applications.
Really? Not according to Microsoft.
http://support.microsoft.com/kb/946676
"A BUG has been discovered in the way that the initial release of Windows Home SERVER manages FILE transfer and balancing across multiple hard drives. In certain cases, depending on application use patterns, timing, and the workload that is placed on the Windows Home Server-based computer, certain FILES could become CORRUPTED."
"... For distributing data across the different hard drives that are MANAGED by WINDOWS Home Server, the WINDOWS Home Server mini-filter driver REDIRECTS I/O ... A BUG has been discovered in the REDIRECTION mechanism which, in certain cases, depending on application use patterns, timing, and workload, may cause interactions between NTFS, the Memory Manager, and the Cache Manager to get out of sync. This causes CORRUPTED data to be written to FILES."
I doubt that's true. They may not have released a version with such a bug, but they probably did have them at some point. Remember, the vanilla kernel and LKML are the FOSS equivalent of the internal development process and its releases to QA.
If you want the post QA versions, use a distro kernel.
If it'd tell me it doesn't know the file system and has no idea what to do with it, that would be perfectly fine.
But corrupting a file system just because it is unknown to/unsupported by the system trying to read it would be a huge bug.
Windows did have this behaviour, by the way. In 2007 I had a Dell Inspiron laptop with two power buttons: one for Normal Windows and one for Media Center Windows. I had wiped the hard drive and installed Fedora on it. Powering with the normal button worked fine, but if by accident one were to power it on with the Media Center button then I would get the initial Media Center screen (I have no idea where that code was hiding, possibly in a hidden partition) and it would wipe all my ext3 filesystems.
It is dangerous to be right when the government is wrong.
It's inaccurate anyway. Ted looked at all ext4 changes and found one that he had a hunch might be related. It turns out that this hits 3.6.1 as well if you try hard enough: the change he spotted merely worsens the race window.
No bisection of any kind was involved: the only people who can bisect for a bug are those who can reproduce it reliably. (This probably means I'll have to try to do just that sooner or later, although the prospect of bisection with filesystem damage at each failed bisection step is not remotely appealing.)
-- N.
Windows has never had anything as serious as a file system corruption bug.
I believe they've accomplished this by ensuring that NTFS fails safely to a state of corrupt registry hive errors instead.
Ah I see, we have ambiguity about what "find a bug" means. From the user's perspective, "finding a bug" means producing the buggy behavior. But from the developer's perspective, "finding a bug" means finding the erroneous code. And we are talking about developers here. From my perspective, until the bug was "found" by bisecting it was only "known to exist", not found. See?
By the way, I've actually bisected bugs, have you? No? OK.
When all you have is a hammer, every problem starts to look like a thumb.
I have a Google+ post where I've posted my latest updates to this still-developing story:
https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7
Also, I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.
what's this mean about various versions of Android using ext4? I think I just flashed my tablet to use ext4 (ugh)... really don't want corruption on my tablet...
I've had whole NTFS partitions get corrupted, twice. In both instances, the partitions were formatted under Linux, specifically Ubuntu.
Lesson learnt is, never format an NTFS partition under Linux. Personally, I think this functionality should be disabled. It's way too dangerous.
Dropbox drops it like it's hot.
Windows NT 3.51 had a FAT bug where after a file was appended to, the correct file size was not re-written to the FAT. The only way to identify the file size was to read the entire file byte for byte. Microsoft denied that it was a bug, didn't publicize the undocumented feature and never changed that particular behavior.
> Windows has never had anything as serious as a file system corruption bug.
I'm going to assume that either you are joking, or you have only been using Windows for about 5 minutes.
On the off chance that you are actually serious, Geoff Chappell documented a case some years ago in which Windows would occasionally toggle a byte (might have been a word; can't remember now) on the hard drive. Just one byte in a random sector somewhere on the drive. Happy flower sunshine.
You should also Google "Windows disk corruption" and look at all the complaints and cries for help.
One reason why I tried Linux, then switched to it and have stuck with it, was because I was sick and tired of having to run scandisk and/or chkdsk at least once a week on my Windows systems just to keep them running. At the time, I was a contract programmer doing a ton of development, and believe me, if you were constantly working the hard drive (as I was), you WOULD have corruption issues. At random, no explanation. You learned to do constant backups and to be prepared for anything.
The only thing I've experienced even close to that under Linux is that the installer typically does a quick format instead of a full format. As a result, if you have a drive that's iffy and with bad sectors, the install will appear to complete successfully, but it won't work. The answer to that one is, "buy a new hard drive." :)
(I had to learn that one the hard way. If you get ANY errors on a hard drive, just replace the blasted thing. Don't wait, either. Do it now.)
Windows 7 seems to be fairly stable, but XP (just to name one) is notorious for just blowing things up at random. It might be a registry entry; it might be a corrupted executable image on disk. Who knows? But the standard cure is just to back up and reinstall.
Cogito, igitur comedam pizza.
OK, and now I'm probably off topic, but I'm an older guy and as we get older, we like to reminisce. (Between bellowed exhortations to remove one's feet from the lawn, of course.)
I remember a million years ago, when I was developing VxDs for Windows 95. I rigged up the debugger to go active early in the boot ... and had to disable it.
Windows 95 generated SO MANY faults during the boot, it took forever otherwise. I mean, it constantly klonged. Bang, bang, bang, one exception after another. They (mostly) went away when Windows 95 OSR2 appeared. :)
Ah, memories ... Blue Screens of Death .. .. random disk corruption ... it was a beautiful thing.
Cogito, igitur comedam pizza.
Still, for all of the shit that Linux users talk about Windows, Windows has never had anything as serious as a file system corruption bug.
I take it you've never experienced Win9x and Microsoft's FAT family of filesystems. If that's the case, you got lucky.
Whoa somebody's got their undies in a bunch.
True, I'm too easily trolled by armchair experts.
I didn't know bisecting was required for entrance into the cool club, but I guess I've been there awhile.
Then you know it's really more of a victims club, because if you're doing this the code base is probably pretty nasty. But it also likely means you know what you're doing. Good interview question: have you ever found a bug by bisecting? How does that work? What bug was it? How did you fix it? (The last two questions are needed to identify those who claim to have done something that they have in fact only read about. And you will run into these guys.)
When all you have is a hammer, every problem starts to look like a thumb.
Next time I find a bug, I will try that - it could be interesting!
Sent from my ASR33 using ASCII
I have many thumb drives formatted in ext4. I guess it will not be a good idea to use them on my 3.5 kernel based distro, then?
How about the Windows Home Server?
http://en.wikipedia.org/wiki/Windows_Home_Server#File_corruption
The first release of Windows Home Server, RTM (Release to manufacturing), suffered from a file corruption flaw whereby files saved directly to or edited on shares on a WHS device could become corrupted.[29] Only the files that had NTFS Alternate Data Streams were susceptible to the flaw.[30] The flaw led to data corruption only when the server was under heavy load at the time when the file (with ADS) was being saved onto a share.[31]
http://www.mueller-public.de - My site http://www.anr-institute.com/ - Advanced Natural Research Institute
Windows 7 should not have automounted the partition once it detected it wasn't forward compatible with the partition formatting. Forced mounting and formatting would be possible user choices. The bug is in the detection (there may not be any) or the action after the detection.
Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
The more recent patch at http://marc.info/?l=linux-kernel&m=135105626207228&w=2 fixes stuff.
Yes, it just won't receive the benefits of an Apple Fusion drive, but it does run fine.
Change is certain; progress is not obligatory.
The irony is that this *did* in fact happen because I was rebooting wrong. (There appears to be no way to reboot *right* reliably in my position, but rebooting while a umount is proceeding is definitely in some way wrong.)
I got bit by this one: http://support.microsoft.com/kb/925308 on volumes with hundreds of thousands of small files. All files whose size was a multiple of 4 KB were corrupted.
Also note that bisecting won't necessarily find the root cause of the bug. It will hopefully* find the commit where the bug became apparent, but the developer will still have to analyse what part of that commit made the bug apparent and whether the commit really introduced the bug or merely made an existing bug elsewhere more apparent.
* It is possible that the commit that introduced the bug will be a "broken" commit or immediately preceded by broken commits so that bisection can't accurately identify it, only give a range of commits that may have introduced the bug.
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
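The footnote's case can be sketched by letting the test return a third verdict for untestable commits. This is an illustrative simplification over a linear history, not how any particular tool implements skipping; `bisect_with_skips` and the `"good"` / `"bad"` / `"skip"` verdict strings are hypothetical names.

```python
from typing import Callable, List, Sequence


def bisect_with_skips(revisions: Sequence[str],
                      test: Callable[[str], str]) -> List[str]:
    """Return the suspect revisions: one entry if bisection is conclusive,
    several if untestable ("skip") commits sit next to the first bad one.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    """
    lo, hi = 0, len(revisions) - 1
    skipped = set()
    while True:
        # Untested, testable revisions strictly between good and bad.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            # Everything left between lo and hi is untestable.
            return list(revisions[lo + 1:hi + 1])
        mid = candidates[len(candidates) // 2]
        verdict = test(revisions[mid])
        if verdict == "good":
            lo = mid
        elif verdict == "bad":
            hi = mid
        else:
            skipped.add(mid)


# Hypothetical history: r5 shows the bug, but r4 does not even build.
history = [f"r{i}" for i in range(8)]

def verdict(rev: str) -> str:
    if rev == "r4":
        return "skip"
    return "bad" if int(rev[1:]) >= 5 else "good"

print(bisect_with_skips(history, verdict))  # ['r4', 'r5']
```

With no skipped commits the returned list has exactly one element, the first bad revision.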
They split it in half? I suspect you mean disected.
Actually, "Di" means two just as "Bi" does. Therefore, Bisected and Disected both mean "Cut into two pieces."
/endrant
I currently work on a product that uses fuse on top of xfs on top of LVM on top of RAID1. There are good solid reasons for the existence of each of those layers.
No filesystem is the best for all uses, and when ZFS tries to do everything it means that it doesn't play nice with the rest of the stack.
According to Ted Ts'o's latest update (https://plus.google.com/117091380454742934025/posts) this actually involved a combination of "umount -l" and shutting down while the filesystem was still mounted, and the user also had "nobarrier" set on the filesystem as well as "journal_async_commit".
So it sure looks like the user was playing fast and loose...this is not something that's going to hit your average person.
Actually, we do know of one really lovely bug in Windows Home Server. It didn't corrupt the filesystem metadata, but if you were foolish enough to save files to it from applications like Word the actual data got corrupted. Microsoft's advice was not to save data in ways that lead to it becoming corrupted; they didn't fix the underlying issue for months.
I have to disagree. ZFS was infamous for filesystem metadata corruption issues amongst people who tried to use it seriously. If you were lucky it detected the corruption and remounted read-only; otherwise it kernel panicked the moment you tried to mount the FS, and there was no way to recover data short of manually repairing the FS with a hex editor (ZFS didn't have a working fsck, partly for marketing reasons).
Perhaps the author of this summary could have been more precise. The bug is very unlikely to be triggered; here are some examples: https://lkml.org/lkml/2012/10/24/535 and http://phoronix.com/forums/showthread.php?74697-EXT4-Data-Corruption-Bug-Hits-Stable-Linux-Kernels&p=293446#post293446 . Still, it is a sensible precaution to downgrade to a safe version and wait for a patch. I have been using 3.6.2 on my two Gentoo boxes for a couple of days and nothing has happened. As a precaution I will downgrade until they release a fix.
> ZFS didn't have a working fsck, partly for marketing reasons
If your File System (FS) needs fsck to recover from errors your FS _design_ is shoddy and incomplete.
Your FS should NEVER get in that STATE in the first place! That's like locking the barn door after the horses escaped.
Unfortunately people want to trade security for performance.
I have bisected bugs, horizontally.
When I was in college the place we lived in had an infestation of 2 inch cockroaches.
Used to kill them with wax bullets.
Shoot at the floor at a low angle a few inches in front of the bug and the spray of wax would cut them in half.
Often the bottom half would run off and leave the top half.
End MGM. Get prospective parents of boys to Google: Men do complain
And this is why I wait before switching fs types. I waited almost 2 years after ext3 was considered stable before I switched from ext2. I just rebuilt my machine 2 days ago, and I almost, almost went with ext4. But that little voice of caution (read: paranoid subconscious :P) told me to hold off, and then someone pointed this thread out to me.
With that said, after reading the posts in the mailing lists, I am once again proud of the kernel developers and the hardcore linux geeks, for so quickly jumping on this problem, as well as the calm of the "victims". If a similar problem occurred in windows, hoo-boy, there would be an uprising.
--- Amateur musician: http://josh.morine.net/headbanger/
On the one hand I would tend to agree... on the other hand, there are a lot of developers and contributors who are not native English speakers. I've worked on projects with other devs who would show me some code, documentation, etc... and be completely shocked (and appalled) when I pointed out what (to me) were glaring spelling/grammatical errors. Hell, I am a native English speaker, and though I try my best, I can sometimes read and re-read text a hundred times and miss something stupid like "you're" instead of "your" or "it's" instead of "its". :P
--- Amateur musician: http://josh.morine.net/headbanger/
If they *found* the bug, they could just fix it. If they wanted to know what caused it, 'svn blame' would let them know.
If they merely *reproduced* the bug, then they might want to use bisection.
I'm a bit surprised that you got through putting a full-weight distro like Fedora on it and didn't notice the presence of peculiar partitioning schemes. Or was your "install" a matter of dropping the boot DVD into the drive and selecting the "unattended install" option? (It's been a long time since I did a Fedora myself - I don't know when / if it acquired such capabilities. Normally I like to know what is going onto my computers.)
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
I did the partitioning myself, I always have: two for alternative /'s, one for swap and one for /home. I really don't know where the Media Center code hid. Possibly in an EPROM? I actually still have the machine, but the screen is unusable. I could plug it into a monitor if I were really curious.
It is dangerous to be right when the government is wrong.
I guess it has come time to tell the truth.
First of all, the bug has never been bisected, and the whole story that hit Slashdot and some other news sites was based solely on Ted's speculation, which was never confirmed. In fact, at the end of the same day, Ted admitted that his hypothesis was wrong.
After a few days of investigation, the problem was traced to an experimental mounting option, which is not turned on by default and was intended for developers only. Accidentally, this option was not marked as "experimental", so it is available to users. https://lkml.org/lkml/2012/10/26/570
A guess : when you wrote your partition table and then made file systems on the partitions, you didn't clear the formatted partitions and overwrite everything with zeros. (Who does on the size of hard drives this decade?) So, even after writing your own partition table, and formatting the partitions, much of the Media Centre boot code could have survived. Second guess : the Media Centre hard drive had X sectors, but the partition scheme only covered X-[some] sectors. "some" could well be quite small (display a splash screen ; read some configuration file ; boot Windoze with certain parameters) ; conceivably just a few sectors. Just because writing compact code to the bare metal isn't exactly popular these days, doesn't mean that the Evil Empire couldn't hire Mel to do it.
every instruction he wrote could also be considered
a numerical constant.
He could pick up an earlier "add" instruction, say,
and multiply by it,
if it had the right numeric value.
No, I still don't understand the "separate constants" bit ; at least not while I'm sober. Hail Mel!
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
That was some tangent!
Here is a picture where both power buttons are visible, for the curious (it gives me shivers, I actually covered the second button after the second loss):
http://www.notebookreview.com/assets/10236.jpg
Interestingly, googling for some information on the Media Center (or Media Direct) I see almost nothing, as if there were never any issues with it or as if nobody ever used it!
It is dangerous to be right when the government is wrong.
Nah. To get the case I found you need not one experimental option, but *three*.
Specifically, you need nobarrier,journal_async_commit -- and the latter option implies journal_checksum, so it's really three options.
If you do all that, reboots / blockdev disconnections while an unmount is proceeding will not merely give you filesystem corruption on second mount (regardless of options the second time), but *silent* filesystem corruption on remount (journal_checksum and any other options will give you a journal abort and read-only remount, which is a pretty big clue that something is wrong, though the filesystem is still corrupted).
Fun stuff.
-- N.
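If you want to check whether any of your own ext4 filesystems are mounted with the combination described above, something along these lines would do it. This is only a sketch: it assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and it assumes the options appear there spelled as `nobarrier` (or the equivalent `barrier=0`) and `journal_async_commit`, which may not hold on every kernel version. The function name `risky_ext4_mounts` is made up for illustration.

```python
from typing import List


def risky_ext4_mounts(mounts_text: str) -> List[str]:
    """Return mount points of ext4 filesystems that combine
    barriers-off with journal_async_commit.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    device, mount point, fs type, comma-separated options, ...
    The option spellings checked here are an assumption about how the
    kernel reports them.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # malformed line or not ext4
        options = set(fields[3].split(","))
        barriers_off = "nobarrier" in options or "barrier=0" in options
        if barriers_off and "journal_async_commit" in options:
            hits.append(fields[1])
    return hits


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point in risky_ext4_mounts(f.read()):
            print("risky option combination on", mount_point)
```

It only reports the option combination; it says nothing about whether a filesystem has already been damaged.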
Meanwhile, the laptop whose graphics card I recently static'd has just been replaced, and for warranty reasons I'm cloning the hard drive before I power it up. According to a sticker on the machine it has "Windows 7" on it, but that just means that I need to clone the drive before I use the computer. How Win7 works and how it behaves doesn't even raise waning interest. Does this version of Windows still play media?
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
I wouldn't know either: I've been using Linux-based OSes since 2001, and the last time I did try to install Windows for a neighbour, it wouldn't open Word files out of the box! Ubuntu just so happens to open Word files out of the box, by the way.
Yeah, I had to give up on Fedora, but I do still prefer CentOS on the server even if I prefer Debian-based at home.
It is dangerous to be right when the government is wrong.