Slashdot Mirror


EXT4 Data Corruption Bug Hits Linux Kernel

An anonymous reader writes "An EXT4 file-system data corruption issue has reached the stable Linux kernel. The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug described as an apparent serious progressive ext4 data corruption bug. Kernel developers have found and bisected the kernel issue but are still working on a proper fix for the stable Linux kernel. The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

249 comments

  1. Bisected? by Anonymous Coward · · Score: 0, Troll

    Kernel developers have found and bisected the kernel issue...

    They split it in half? I suspect you mean disected.

    1. Re:Bisected? by Slayne · · Score: 5, Informative

      Nope - bisection is a common technique for tracking down the cause of a bug by doing a binary search through the code history.
      https://en.wikipedia.org/wiki/Code_Bisection

    2. Re:Bisected? by Gothmolly · · Score: 4, Funny

      No this means the kernel has bug-like tendencies from time to time, but is not exclusively buggy. For instance when it's in college, or if it's at a bar, and has had a few drinks, well then it might be buggy, but normally at work and at home and to all its friends it acts stable.

      --
      I want to delete my account but Slashdot doesn't allow it.
    3. Re:Bisected? by Alan+Shutko · · Score: 0

      No, they mean bisected.

      That's a procedure by which you do a binary search to find which patch caused a problem.

    4. Re:Bisected? by Anonymous Coward · · Score: 0

      Whoooooooooosh!!

    5. Re:Bisected? by Tough+Love · · Score: 1

      The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    6. Re:Bisected? by Anonymous Coward · · Score: 0

      I'm pretty sure the GP was making a grammatical comment that was, while technically correct, not applicable technically. English is a harsh mistress...

    7. Re:Bisected? by petermgreen · · Score: 4, Informative

      What they actually split in half is a sequence of changesets (also known as commits).

      The idea is you have a sequence of changesets that take you from the last known good revision to the first known bad revision. By splitting that sequence in half and determining if the revision in the middle is good or bad you can in principle halve the number of revisions between last known good and first known bad until you find the revision that introduced the bug. Reality is messier because of nonlinear history, because some revisions may be "broken" such that it is not possible to determine if they are "good" or "bad" and because some bugs may be difficult to test for, but still bisection is a useful tool for finding problem revisions among a long history relatively easily.
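
      The halving step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: history is linear, the first revision is known good, the last is known bad, and every revision can be tested. The names bisect_first_bad and is_bad and the r0..r99 revision labels are hypothetical, not part of any real tool.

```python
from typing import Callable, List, Sequence


def bisect_first_bad(revisions: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and that history is linear (oldest first).
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid    # the bug was introduced at or before mid
        else:
            good = mid   # the bug was introduced after mid
    return revisions[bad]


# Hypothetical history of 100 revisions where r37 introduced the bug.
revisions: List[str] = [f"r{i}" for i in range(100)]
tested: List[str] = []


def is_bad(rev: str) -> bool:
    tested.append(rev)           # record how many builds we had to test
    return int(rev[1:]) >= 37


first_bad = bisect_first_bad(revisions, is_bad)
print(first_bad, "found after", len(tested), "tests")
```

      With 100 revisions this needs at most seven tests rather than up to 98, which is why the approach pays off on long histories.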

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    8. Re:Bisected? by i_ate_god · · Score: 1

      presumably from this post, "being technical" only means complete knowledge of all tools.

      I'm guessing you find it very hard to find work with that kind of understanding of what "being technical" implies.

      --
      I'm god, but it's a bit of a drag really...
    9. Re:Bisected? by partyguerrilla · · Score: 1

      Bisect and disect are synonymous, they both mean "splitting in half."

    10. Re:Bisected? by newcastlejon · · Score: 2

      Perhaps, if disect is a real word, but dissect means "cut up/apart", not specifically into two parts.

      --
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
    11. Re:Bisected? by mcgrew · · Score: 1

      grammar nazi's?

      *facepalm* I hope that was deliberate.

    12. Re:Bisected? by FatdogHaiku · · Score: 2

      They split it in half?

      I know it's wrong but I just got this mental image of someone moving all the 0's to one side of a page and all the 1's to the other side...

      --
      You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
    13. Re:Bisected? by EMR · · Score: 3, Funny

      If God forks the Universe every time you roll a die, he'd better have a damned good memory.

      Nah, He only needs the latest SHA1 for each roll outcome commit as that'll point up the GIT tree :-D

    14. Re:Bisected? by CheshireDragon · · Score: 2

      I think YOU are the one who didn't get the joke...

      --
      "That's right...I said it."
    15. Re:Bisected? by fireman+sam · · Score: 1, Funny

      Bisecting is also a way of killing bugs - or perhaps Bisecting is when you act like an insect that goes both ways.

      --
      it is only after a long journey that you know the strength of the horse.
    16. Re:Bisected? by larry+bagina · · Score: 1

      I just hope he's not storing his repo on ext4.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    17. Re:Bisected? by Nivag064 · · Score: 3, Funny

      Nah!

      Your'e wrong!!

      The 0's go to the top of the page, and the 1's to the bottom!!!

      (As the 0's have air bubbles that make them float...)

      [An irrelevant irrelevancy?]

    18. Re:Bisected? by Anonymous Coward · · Score: 0

      Thank you. Today I learned something new.

    19. Re:Bisected? by Just+Some+Guy · · Score: 4, Informative

      The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.

      No. They found the bug, then bisected the commits between "last known working" and HEAD to discover what patch caused it.

      --
      Dewey, what part of this looks like authorities should be involved?
    20. Re:Bisected? by Anonymous Coward · · Score: 0

      A misspelling on my part, and apparently a common one, but yes, that's what I thought was meant.

      MW
      1: to separate into pieces : expose the several parts of (as an animal) for scientific examination
      2: to analyze and interpret minutely (dissect a problem)

    21. Re:Bisected? by Dan+East · · Score: 0

      The OPs mockery of the summary is justified. If bisecting is the process of finding the change in which a bug was introduced, then how can you do that to the bug AFTER it has been found?

      Kernel developers have found and bisected the kernel issue

      Should be "Kernel developers have bisected change sets and found the kernel issue".

      --
      Better known as 318230.
    22. Re:Bisected? by Anonymous Coward · · Score: 1

      It's inaccurate anyway. Ted looked at all ext4 changes and found one that he had a hunch might be related. It turns out that this hits 3.6.1 as well if you try hard enough: the change he spotted merely worsens the race window.

      No bisection of any kind was involved: the only people who can bisect for a bug are those who can reproduce it reliably. (This probably means I'll have to try to do just that sooner or later, although the prospect of bisection with filesystem damage at each failed bisection step is not remotely appealing.)

        -- N.

    23. Re:Bisected? by Tough+Love · · Score: 3, Informative

      Ah I see, we have ambiguity about what "find a bug" means. From the user's perspective, "finding a bug" means producing the buggy behavior. But from the developer's perspective, "finding a bug" means finding the erroneous code. And we are talking about developers here. From my perspective, until the bug was "found" by bisecting it was only "known to exist", not found. See?

      By the way, I've actually bisected bugs, have you? No? OK.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    24. Re:Bisected? by Anonymous Coward · · Score: 0

      By the way, I've actually bisected bugs, have you? No? OK.

      Whoa somebody's got their undies in a bunch. I didn't know bisecting was required for entrance into the cool club, but I guess I've been there awhile.

    25. Re:Bisected? by Just+Some+Guy · · Score: 0

      You're wrong. It's cute that you want to play "more pedantic than thou" with the big kids, but it's not flattering to you when you're not very good at it.

      --
      Dewey, what part of this looks like authorities should be involved?
    26. Re:Bisected? by Anonymous Coward · · Score: 0

      Gotta love the sensational "journalism" of Phoronix.

    27. Re:Bisected? by Tough+Love · · Score: 1

      Whoa somebody's got their undies in a bunch.

      True, I'm too easily trolled by armchair experts.

      I didn't know bisecting was required for entrance into the cool club, but I guess I've been there awhile.

      Then you know it's really more of a victims club because if you're doing this the code base is probably pretty nasty. But it also likely means you know what you're doing. Good interview question: have you ever found a bug by bisecting? How does that work? What bug was it? How did you fix it? (The last two questions are needed to identify those who claim to have done something that they have in fact only read about. And you will run into these guys.)

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    28. Re:Bisected? by Anne+Thwacks · · Score: 1

      Next time I find a bug, I will try that - it could be interesting!

      --
      Sent from my ASR33 using ASCII
    29. Re:Bisected? by petermgreen · · Score: 1

      Also note that bisecting won't necessarily find the root cause of the bug. It will hopefully* find the commit where the bug became apparent, but the developer will still have to analyse what part of that commit made the bug apparent and whether the commit really introduced the bug or merely made an existing bug elsewhere more apparent.

      * It is possible that the commit that introduced the bug will be a "broken" commit or immediately preceded by broken commits so that bisection can't accurately identify it, only give a range of commits that may have introduced the bug.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    30. Re:Bisected? by Anonymous Coward · · Score: 0

      Binary search? Noobs! All the bugs I introduce are hashed.

    31. Re:Bisected? by Mike_Theory · · Score: 2

      Kernel developers have found and bisected the kernel issue...

      They split it in half? I suspect you mean disected.

      Actually, "Di" means two just as "Bi" does. Therefore, Bisected and Disected both mean "Cut into two pieces."

      --
      /endrant
    32. Re:Bisected? by Anonymous Coward · · Score: 0

      Except for the little issue that Disected is NOT A WORD.

    33. Re:Bisected? by Keith+Henson · · Score: 2

      I have bisected bugs, horizontally.

      When I was in college the place we lived in had an infestation of 2 inch cockroaches.

      Used to kill them with wax bullets.

      Shoot at the floor at a low angle a few inches in front of the bug and the spray of wax would cut them in half.

      Often the bottom half would run off and leave the top half.

      --
      End MGM. Get prospective parents of boys to Google: Men do complain
    34. Re:Bisected? by sunderland56 · · Score: 1

      If they *found* the bug, they could just fix it. If they wanted to know what caused it, 'svn blame' would let them know.

      If they merely *reproduced* the bug, then they might want to use bisection.

  2. Low impact by Anonymous Coward · · Score: 0

    It's a good thing most stable releases are on 3.2 or 3.0 with commercial systems on even earlier versions.

    1. Re:Low impact by Anonymous Coward · · Score: 0, Flamebait

      Still, for all of the shit that Linux users talk about Windows, Windows has never had anything as serious as a file system corruption bug.

    2. Re:Low impact by hierofalcon · · Score: 0

      I suspect they were just more likely to find them during development since they have to reboot so often when updating Microsoft products. Reboots aren't nearly as frequent on Linux boxes. To say they never had them would be a stretch.

    3. Re:Low impact by Anonymous Coward · · Score: 1

      Actually, XP is incompatible with the newest version of NTFS, as you will notice if you ever move HDs around various computers for some reason. Not quite the same thing, but easy to overlook. It can produce some very nasty problems.

    4. Re:Low impact by Anonymous Coward · · Score: 1

      Windows can fuck up its file system just fine. It's just that Microsoft never warns its users about defects in Windows unless someone goes public first. Mostly they silently slip the fixes in with a bunch of other fixes. That is, if they fix the bugs at all.

      Backups are important regardless of file system. In the absence of human error or hardware failure... sure enough your file system will still get fucked.

      Also if I had a dollar for every time Windows fucked a partition table, I'd be driving a much nicer car.

    5. Re:Low impact by jedidiah · · Score: 5, Insightful

      > Windows has never had anything as serious as a file system corruption bug.

      That you know of...

      Since the Windows development process isn't open, there's no way for you to tell. You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    6. Re:Low impact by Bengie · · Score: 1

      I love Chan9 and MS Research and I think a lot of what MS makes is "cool", but we are all human and mistakes WILL be made. Linux has a great track record. This is also why BTRFS will take a while to get traction in the Enterprise. EXT4 and ZFS are still getting bug fixes.

    7. Re:Low impact by h4rr4r · · Score: 4, Informative

      http://answers.microsoft.com/en-us/windows/forum/windows_cp-files/bug-report-serious-filesystem-corruption-and-data/17f69e19-92ca-4e1e-b9d5-f78f1ac4e963

      Bugs happen. The difference here is that Linux development is done in the open so people find out about them.

    8. Re:Low impact by negRo_slim · · Score: 2

      Source?
      Cuz I'm looking:

      http://en.wikipedia.org/wiki/Ntfs#Microsoft_Windows
      http://www.tomshardware.com/forum/1249-63-ntfs-win7-windows
      http://en.wikipedia.org/wiki/Ntfs#Versions

      And just not seeing "XP is incompatible with the newest version of NTFS"

      --
      On the Oregon Cost born and raised, On the beach is where I spent most of my days
    9. Re:Low impact by Anonymous Coward · · Score: 0

      That isn't a file system bug, that is progress. Would you consider it a bug if a Linux system from 1998 caused corruption on an ext4 volume?

    10. Re:Low impact by bertok · · Score: 1

      You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.

      You're looking in the wrong place!

      They're called features, and they're on the technet website for all the world to see.

      Like how in older Windows versions, disks would be auto-mounted, and NTFS didn't have native active/active capability. In other words, if you made the slightest mistake in your FC zoning, then you could kiss your multi-terabyte cluster volume goodbye.

    11. Re:Low impact by Anonymous Coward · · Score: 1

      1) Windows 7 fucked up the Windows 8 partition not because of a bug, but because it isn't forward compatible
      2) Microsoft does not recommend dual booting Windows 8 with older Windows versions
      3) The guy was using a prerelease version of Windows 8

      Show me a bug where a specific version of Windows corrupts its OWN filesystem (ie. the filesystem that comes with it). You can't, because it never happens.

    12. Re:Low impact by h4rr4r · · Score: 1

      Show me their development and I bet I find one.

    13. Re:Low impact by jones_supa · · Score: 1, Troll

      Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.

      On Linux front, I presume FS corruption bugs partly arise from the continuously evolving R&D development style of the kernel. New file systems get invented all the time and previous ones get tweaked. Can't say if it's good or bad, it's just another way of doing things. I myself have not wished much since the journal support of ext3.

    14. Re:Low impact by the_other_chewey · · Score: 4, Insightful

      That isn't a file system bug, that is progress. Would you consider it a bug if a Linux system from 1998 caused corruption on an ext4 volume?

      Hell yeah.

      If it'd tell me it doesn't know the file system and has no idea what to do with it,
      that would be perfectly fine.

      But corrupting a file system just because it is unknown to/unsupported by the
      system trying to read it would be a huge bug.

    15. Re:Low impact by 0123456 · · Score: 1

      Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.

      I've never seen NTFS get corrupted. I have seen it delete multi-gigabyte files because they were open when Windows crashed.

      I've never seen ext3 get corrupted, or delete multi-gigabyte files because they were still open when Linux crashed (or, more likely, went down due to a power failure).

      I've never trusted ext4 after the early 'so what if I delete your data after a power failure?' arguments from the developers.

    16. Re:Low impact by sk999 · · Score: 4, Informative

      Still, for all of the shit that Linux users talk about Windows, WINDOWS has NEVER had anything as serious as a FILE system CORRUPTION bug.

      Finally, someone talking sense ... oh wait.

      http://www.computerworld.com/s/article/9054178/Microsoft_s_Windows_Home_Server_corrupts_files

      "Microsoft's Windows Home Server CORRUPTS FILES"
      "'Don't edit' list includes photos, as well as Quicken and QuickBooks files, warns Microsoft; no word on patch"

      Never mind ...

    17. Re:Low impact by Anonymous Coward · · Score: 0

      Except it wouldn't warn you. It simply would not recognize the file system and it would offer to format it for you. That happens because nobody has invented a time machine.

    18. Re:Low impact by Anonymous Coward · · Score: 0

      Nice try, but fail. That wasn't a bug in Windows, it was a bug in applications.

    19. Re:Low impact by Anonymous Coward · · Score: 0

      I think they were being sarcastic but didn't use the sarcasm font.

      I have had filesystem corruption caused by bugs and admitted and fixed by Microsoft from DOS to XP. I did skip the two bugs they claimed were an OS, Miwennium and Pista but DOS, WFW3.1/3.11/3.11 corporate/ Win95/98/2000/XP

    20. Re:Low impact by fatphil · · Score: 2

      >> Windows has never had anything as serious as a file system corruption bug.

      >That you know of...

      So what were all those chkdsk errors after BSODs?

      --
      Also FatPhil on SoylentNews, id 863
    21. Re:Low impact by sk999 · · Score: 4, Informative

      Nice try, but fail. That wasn't a bug in Windows, it was a bug in applications.

      Really? Not according to Microsoft.

      http://support.microsoft.com/kb/946676

      "A BUG has been discovered in the way that the initial release of Windows Home SERVER manages FILE transfer and balancing across multiple hard drives. In certain cases, depending on application use patterns, timing, and the workload that is placed on the Windows Home Server-based computer, certain FILES could become CORRUPTED."

      "... For distributing data across the different hard drives that are MANAGED by WINDOWS Home Server, the WINDOWS Home Server mini-filter driver REDIRECTS I/O ... A BUG has been discovered in the REDIRECTION mechanism which, in certain cases, depending on application use patterns, timing, and workload, may cause interactions between NTFS, the Memory Manager, and the Cache Manager to get out of sync. This causes CORRUPTED data to be written to FILES."

    22. Re:Low impact by sjames · · Score: 1

      I doubt that's true. They may not have released a version with such a bug, but they probably did have them at some point. Remember, the vanilla kernel and LKML are the FOSS equivalent of the internal development process and its releases to QA.

      If you want the post QA versions, use a distro kernel.

    23. Re:Low impact by dotancohen · · Score: 2

      If it'd tell me it doesn't know the file system and has no idea what to do with it, that would be perfectly fine.

      But corrupting a file system just because it is unknown to/unsupported by the system trying to read it would be a huge bug.

      Windows did have this behaviour, by the way. In 2007 I had a Dell Inspiron laptop with two power buttons: one for Normal Windows and one for Media Center Windows. I had wiped the hard drive and installed Fedora on it. Powering with the normal button worked fine, but if by accident one were to power it on with the Media Center button then I would get the initial Media Center screen (I have no idea where that code was hiding, possibly in a hidden partition) and it would wipe all my ext3 filesystems.

      --
      It is dangerous to be right when the government is wrong.
    24. Re:Low impact by ffflala · · Score: 1

      Windows has never had anything as serious as a file system corruption bug.

      I believe they've accomplished this by ensuring that NTFS fails safely to a state of corrupt registry hive errors instead.

    25. Re:Low impact by Anonymous Coward · · Score: 0

      I still stick to ext2 for db partitions (postgres already has its own journal, thank you) and ext3 for everything else. In 10 years I might consider BTRFS (for the zfs-clone capabilities, which will probably be close to zfs by then)

    26. Re:Low impact by petman · · Score: 2

      I've had whole NTFS partitions get corrupted, twice. In both instances, the partitions were formatted under Linux, specifically Ubuntu.

      Lesson learnt is, never format an NTFS partition under Linux. Personally, I think this functionality should be disabled. It's way too dangerous.

    27. Re:Low impact by Anonymous Coward · · Score: 0

      I don't think comparing Windows unfavorably to Linux by bringing up corruption over a network filesystem is something you really want to do *cough*NFS*cough*.

    28. Re:Low impact by OhANameWhatName · · Score: 1

      Windows NT 3.51 had a FAT bug where after a file was appended to, the correct file size was not re-written to the FAT. The only way to identify the file size was to read the entire file byte for byte. Microsoft denied that it was a bug, didn't publicize the undocumented feature and never changed that particular behavior.

    29. Re:Low impact by Anonymous Coward · · Score: 0

      > Windows has never had anything as serious as a file system corruption bug.

      That you know of...

      One of the NT 3.51 service packs (I'm going to guess SP5, but it's been a loooong time) tickled a bug in the NTFS filesystem driver... which manifested itself by corrupting the new copy of NTFS.SYS as it was copied onto the system disk. This resulted in an unbootable machine (and a small irony implosion).

    30. Re:Low impact by smpoole7 · · Score: 2

      > Windows has never had anything as serious as a file system corruption bug.

      I'm going to assume that either you are joking, or you have only been using Windows for about 5 minutes.

      On the off chance that you are actually serious, Geoff Chappell documented a case some years ago in which Windows would occasionally toggle a byte (might have been a word; can't remember now) on the hard drive. Just one byte in a random sector somewhere on the drive. Happy flower sunshine.

      You should also Google "Windows disk corruption" and look at all the complaints and cries for help.

      One reason why I tried Linux, then switched to it and have stuck with it, was because I was sick and tired of having to run scandisk and/or chkdsk at least once a week on my Windows systems just to keep them running. At the time, I was a contract programmer doing a ton of development, and believe me, if you were constantly working the hard drive (as I was), you WOULD have corruption issues. At random, no explanation. You learned to do constant backups and to be prepared for anything.

      The only thing I've experienced even close to that under Linux is that the installer typically does a quick format instead of a full format. As a result, if you have a drive that's iffy and with bad sectors, the install will appear to complete successfully, but it won't work. The answer to that one is, "buy a new hard drive." :)

      (I had to learn that one the hard way. If you get ANY errors on a hard drive, just replace the blasted thing. Don't wait, either. Do it now.)

      Windows 7 seems to be fairly stable, but XP (just to name one) is notorious for just blowing things up at random. It might be a registry entry; it might be a corrupted executable image on disk. Who knows? But the standard cure is just to back up and reinstall.

      --
      Cogito, igitur comedam pizza.
    31. Re:Low impact by smpoole7 · · Score: 2

      OK, and now I'm probably off topic, but I'm an older guy and as we get older, we like to reminisce. (Between bellowed exhortations to remove ones feet from the lawn, of course.)

      I remember a million years ago, when I was developing VxDs for Windows 95. I rigged up the debugger to go active early in the boot ... and had to disable it.

      Windows 95 generated SO MANY faults during the boot, it took forever otherwise. I mean, it constantly klonged. Bang, bang, bang, one exception after another. They (mostly) went away when Windows 95 OSR2 appeared. :)

      Ah, memories ... Blue Screens of Death .. .. random disk corruption ... it was a beautiful thing.

      --
      Cogito, igitur comedam pizza.
    32. Re:Low impact by UltraZelda64 · · Score: 1

      Still, for all of the shit that Linux users talk about Windows, Windows has never had anything as serious as a file system corruption bug.

      I take it you've never experienced Win9x and Microsoft's FAT family of filesystems. If that's the case, you got lucky.

    33. Re:Low impact by Anonymous Coward · · Score: 0

      Windows XP (pre SP1) and a 48 bit LBA addressed 200 GB hard disk. "Hilarious" corruption to audio and video files...

    34. Re:Low impact by Anonymous Coward · · Score: 0

      Is that more of file corruption or file system corruption?

      The end effect is similar - corrupted or missing data, but not necessarily the same thing.

      Your example seems to me like a particular having a special FUSE module that causes files to be corrupted sometimes.

      Not so much like the Linux/Windows kernel having an Ext4/NTFS bug that causes data loss.

      With the first, most Linux/Windows users in the world can say "meh, don't care". With the latter, the users go uh-oh better hope I have backups.

    35. Re:Low impact by devent · · Score: 1

      How about the Windows Home Server?
      http://en.wikipedia.org/wiki/Windows_Home_Server#File_corruption

      The first release of Windows Home Server, RTM (Release to manufacturing), suffered from a file corruption flaw whereby files saved directly to or edited on shares on a WHS device could become corrupted.[29] Only the files that had NTFS Alternate Data Streams were susceptible to the flaw.[30] The flaw led to data corruption only when the server was under heavy load at the time when the file (with ADS) was being saved onto a share.[31]

      --
      http://www.mueller-public.de - My site http://www.anr-institute.com/ - Advanced Natural Research Institute
    36. Re:Low impact by Neil+Boekend · · Score: 2

      Windows 7 should not have automounted the partition once it detected it wasn't forward compatible with the partition formatting. Forced mounting and formatting would be possible user choices. The bug is in the detection (there may not be any) or the action after the detection.

      --
      Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
    37. Re:Low impact by tirnacopu · · Score: 3, Interesting

      I got bit by this one: http://support.microsoft.com/kb/925308 on volumes with hundreds of thousands of small files. All who had a size multiple of 4kb were corrupted.

    38. Re:Low impact by Anonymous Coward · · Score: 0

      Well, note that this is the first ext* data corruption bug that I have ever had, in sixteen years of continuous Linux use, including the months when I had very bad RAM. The sheer fact that this bug made Phoronix, Slashdot, Heise and LWN before we'd even figured out what you had to do to trigger it suggests that such bugs are *rare*.

      In my experience ext4 is a reliable and trustworthy filesystem: my opinion is unchanged despite this one bug (and I note that fsck made the filesystems happy again after every instance of corruption, even if some data was chewed up).

        -- N.

    39. Re:Low impact by Anonymous Coward · · Score: 0

      It's probably improved but I had bad experiences trying to write to a Windows partition with ntfs-3g a few years back. It borked things.

    40. Re:Low impact by makomk · · Score: 1

      Actually, we do know of one really lovely bug in Windows Home Server. It didn't corrupt the filesystem metadata, but if you were foolish enough to save files to it from applications like Word the actual data got corrupted. Microsoft's advice was not to save data in ways that lead to it becoming corrupted; they didn't fix the underlying issue for months.

    41. Re:Low impact by Anonymous Coward · · Score: 0

      Umm, yeah, well.... if ANY operating system halts during a write operation, there is a chance of corruption. The EXT4 issue involves simply mounting and rebooting. Quite a difference, but not the kind of difference I'd expect someone like you to understand.

    42. Re:Low impact by Anonymous Coward · · Score: 0

      "Windows has never had anything as serious as a file system corruption bug."

      Huh? Aside from the less obvious problems that people have already mentioned with their SERVER distributions, you can find some unaddressed file/data loss bugs very easily.

      For example, Cut files out of a directory in windows 7. Go up a directory. Delete the directory. Go to another directory. Try paste. Granted, that's not technically the FS, but it's a pretty fucking obvious, pretty fucking severe bug.

      I seem to recall Windows 7 doing VERY BAD THINGS with file renames too (reproducibly), and files disappearing for a third, but it's been a while since I used it, so I don't recall the details now.

    43. Re:Low impact by Anonymous Coward · · Score: 0

      Wrong. It recognizes it as a ext2 volume with unsupported features and refuses to touch it.

    44. Re:Low impact by RockDoctor · · Score: 1

      That's a feature, not a bug. It's not a feature that you want, probably, but it's a feature that the people who designed the system and think that they still own it wanted.

      I'm a bit surprised that you got through putting a full-weight distro like Fedora on it and didn't notice the presence of peculiar partitioning schemes. Or was your "install" a matter of dropping the boot DVD into the drive and selecting the "unattended install" option? (It's been a long time since I did a Fedora myself - I don't know when / if it acquired such capabilities. Normally I like to know what is going onto my computers.)

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    45. Re:Low impact by dotancohen · · Score: 1

      I did the partitioning myself, I always have: two for alternative /'s, one for swap and one for /home. I really don't know where the Media Center code hid. Possibly in an EPROM? I actually still have the machine, but the screen is unusable. I could plug it into a monitor if I were really curious.

      --
      It is dangerous to be right when the government is wrong.
    46. Re:Low impact by RockDoctor · · Score: 1
      Well, if it wasn't in a partition ... it could (I suppose) have been in known absolute sectors on the hard drive, outside the partitioning system in the same way that the boot code in a BIOS knows to get the partition table from [device]/Sector0, then to read the partition table to find out where the boot code is located. So, one way that I could see it being done would be to have BIOS code that responds to the second power switch, and goes to [device]/Sector(BIOSmaxSectorNumber), then read a location from there with some boot code in it.

      A guess : when you wrote your partition table and then made file systems on the partitions, you didn't clear the formatted partitions and overwrite everything with zeros. (Who does on the size of hard drives this decade?) So, even after writing your own partition table, and formatting the partitions, much of the Media Centre boot code could have survived. Second guess : the Media Centre hard drive had X sectors, but the partition scheme only covered X-[some] sectors. "some" could well be quite small (display a splash screen ; read some configuration file ; boot Windoze with certain parameters) ; conceivably just a few sectors. Just because writing compact code to the bare metal isn't exactly popular these days, doesn't mean that the Evil Empire couldn't hire Mel to do it.

      every instruction he wrote could also be considered
      a numerical constant.
      He could pick up an earlier "add" instruction, say,
      and multiply by it,
      if it had the right numeric value.

      No, I still don't understand the "separate constants" bit ; at least not while I'm sober. Hail Mel!
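
      The second guess above (a drive with sectors that no partition covers) can be checked mechanically. The following is a minimal sketch, assuming a classic MBR layout with four primary entries; the helper names and the demo table are hypothetical, and nothing here claims this is how the Media Center boot code was actually stored.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # primary partition table in a classic MBR
ENTRY_SIZE = 16      # four entries of 16 bytes each


def parse_mbr_partitions(mbr: bytes):
    """Return (start_lba, sector_count) for each non-empty primary entry."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    parts = []
    for i in range(4):
        offset = TABLE_OFFSET + i * ENTRY_SIZE
        ptype = mbr[offset + 4]
        # Little-endian 32-bit start LBA and length live at bytes 8..15.
        start_lba, count = struct.unpack_from("<II", mbr, offset + 8)
        if ptype != 0 and count > 0:
            parts.append((start_lba, count))
    return parts


def uncovered_ranges(parts, total_sectors):
    """Sector ranges [start, end) that no partition covers (sector 0 is the MBR)."""
    gaps = []
    cursor = 1
    for start, count in sorted(parts):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, start + count)
    if cursor < total_sectors:
        gaps.append((cursor, total_sectors))
    return gaps


def build_demo_mbr():
    """Hypothetical table: one Linux partition, LBA 2048, 1000 sectors long."""
    mbr = bytearray(SECTOR_SIZE)
    struct.pack_into("<B3sB3sII", mbr, TABLE_OFFSET,
                     0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 1000)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    parts = parse_mbr_partitions(build_demo_mbr())
    print(uncovered_ranges(parts, total_sectors=4096))
```

      Any range this reports is space that a partitioning tool would leave alone, which is all the guess needs.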

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    47. Re:Low impact by dotancohen · · Score: 1

      That was some tangent!

      Here is a picture where both power buttons are visible, for the curious (it gives me shivers, I actually covered the second button after the second loss):
      http://www.notebookreview.com/assets/10236.jpg

      Interestingly, googling for some information on the Media Center (or Media Direct) I see almost nothing, as if there were never any issues with it or as if nobody ever used it!

      --
      It is dangerous to be right when the government is wrong.
    48. Re:Low impact by RockDoctor · · Score: 1
      If that computer and configuration was available in my country, I'd not have looked at it in any respect other than to check out the hardware specification to see if it was good enough value (hardware-wise) for the money. I'm not interested in "media centres" in any way shape or form (I don't waste time on music ; I put discs in the DVD player and then return them to the rental company ; podcast MP3s go onto the MP3 player for use on the bike) ; and I dumped Windows years ago after some abortion called Vista came into effect. So, there's nothing left to consider apart from how the hardware works.

      Meanwhile, the laptop whose graphics card I recently static'd has just been replaced, and for warranty reasons I'm just cloning the hard drive before I power it up. According to a sticker on the machine it has "Windows 7" on it, but that just means that I need to clone the drive before I use the computer. How Win7 works and how it behaves doesn't even raise waning interest. Does this version of Windows still play media?

      --
      Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
    49. Re:Low impact by dotancohen · · Score: 1

      I wouldn't know either: I've been using Linux-based OSes since 2001 and the last time I did try to install Windows for a neighbour, it wouldn't open Word files out of the box! Ubuntu just so happens to open Word files out of the box, by the way.

      Yeah, I had to give up on Fedora, but I do still prefer CentOS on the server even if I prefer Debian-based at home.

      --
      It is dangerous to be right when the government is wrong.
  3. This is why I stick to Reiser by Anonymous Coward · · Score: 5, Funny

    I know he'd never do anything to harm me or my data.

    1. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 2, Funny

      Or your wife?

    2. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 1

      There's a kind of sad irony to your comment. The people most enamored by the beauty of logic/algorithms/pure mathematics probably find it difficult to deal with the ugly realities of real life.

    3. Re:This is why I stick to Reiser by localhost8080 · · Score: 2, Funny

      yeah, reiser 4 has some killer features

    4. Re:This is why I stick to Reiser by psm321 · · Score: 1

      I know you're making a joke about the person, but I've had many corruption issues with ReiserFS. Granted, this was in its earlier days, but after it had been declared stable for use. I gave up on it after the problems, so no idea if later versions improved.

    5. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 1

      Only if she needed .... correction.....

    6. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 0

      He loved 'your' wife. It was his own he wasn't so keen on.

    7. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 0

      Weird. Reiser is the ONLY filesystem that HASN'T burnt me (since maybe 2000? 2.3.99-pre9). Of course, my problems with ext* have to do with stupid defaults related to journaling behavior, but what are you going to do, RTFM and then special notes on the filesystem around the internet?

      I think in general filesystems are fairly hard and people overthink just about everything, which lands us into broken behaviors like this new ext4 oops. Hell, even LogFS is making a resurgence, against any good sense.

    8. Re:This is why I stick to Reiser by davester666 · · Score: 2

      That was the problem. Somebody filed a bug report on his wife.

      --
      Sleep your way to a whiter smile...date a dentist!
    9. Re:This is why I stick to Reiser by Anonymous Coward · · Score: 0

      There are 16 in the list, versus (for instance) 288 writers who committed suicide (ignoring numerous subcategories). I wouldn't exactly call that a notable or significant trend.

  4. Reinventing the wheel by Anonymous Coward · · Score: 0

    It's a pity they can't use ZFS instead of re-inventing the wheel. The other pity is that the newest distros seem to force you to use EXT4 at installation (on your desktop).

    1. Re:Reinventing the wheel by UnknownSoldier · · Score: 4, Interesting

      I have to agree with you. This is one of the best demos of ZFS around :)
      http://www.youtube.com/watch?v=QGIwg6ye1gE

      ZFS solves 3 problems by taking a holistic approach:

      * Volume Management
      * File System
      * Data Integrity

      Instead of fragmenting the problem into 3 layers which each have only limited access and knowledge, a unified layer has more meta-information available to make smarter decisions.
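
      As a rough illustration of why a layer that sees both redundancy and integrity can make smarter decisions, here is a toy sketch, not ZFS code. It assumes a two-way mirror and keeps each block's checksum apart from the data; the class name, the use of SHA-256 and the in-memory "disks" are all stand-ins chosen for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 is only a stand-in checksum for this toy."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two in-memory 'disks' plus checksums stored apart from the data."""

    def __init__(self):
        self.copies = [{}, {}]   # block id -> bytes, one dict per disk
        self.pointers = {}       # block id -> expected checksum

    def write(self, block_id, data: bytes):
        for disk in self.copies:
            disk[block_id] = data
        self.pointers[block_id] = checksum(data)

    def read(self, block_id) -> bytes:
        expected = self.pointers[block_id]
        for disk in self.copies:
            data = disk[block_id]
            if checksum(data) == expected:
                # Knowing which copy is good lets us repair the bad one.
                for other in self.copies:
                    if checksum(other[block_id]) != expected:
                        other[block_id] = data
                return data
        raise IOError("every copy failed its checksum")


if __name__ == "__main__":
    mirror = ToyMirror()
    mirror.write(1, b"hello")
    mirror.copies[0][1] = b"hellp"   # simulate silent corruption on disk 0
    print(mirror.read(1), mirror.copies[0][1])
```

      A separate RAID layer without checksums would see two differing copies and have no way to tell which one is correct.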

      Some interesting essays:

      https://blogs.oracle.com/bonwick/entry/raid_z
      https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation

    2. Re:Reinventing the wheel by dimeglio · · Score: 1

      ...or XFS with a recent kernel.

      --
      Views expressed do not necessarily reflect those of the author.
    3. Re:Reinventing the wheel by h4rr4r · · Score: 1, Troll

      Hopefully BTRFS will conquer this.

      Blame SUN, they chose a license for ZFS to ensure it never had proper in kernel linux support. They did that because Linux was eating their lunch and still is.

    4. Re:Reinventing the wheel by UnknownSoldier · · Score: 4, Interesting

      > Blame SUN, they chose a license for ZFS to ensure it never had proper in kernel linux support.

      That's a myth / blatant lie.

      Fork Yeah! The Rise and Development of illumos
      http://www.youtube.com/watch?feature=player_detailpage&v=-zRN7XLCRhc#t=1460s

      Why You Need ZFS
      http://www.youtube.com/watch?v=6F9bscdqRpo
      @5:40 I just want to clarify your comment "It would be illegal to ship"
      @5:45 I think there is a perception issue that we need to tackle.
      @5:55 One point that I would like to make, because I think I said earlier that I think we have much more in common than that separates us.
      @5:58 One of the most important things we all have in common is we are all open source systems.
      @6:02 And we need to end this self inflicted madness of open source licensing compatibility.
      @6:12 I think that it is a boogeyman and we are letting it hold us back.
      @6:19 You say it would be illegal to ship. I say no one has standing.
      @6:24 The GPL was never ever designed to counter-act other open source licenses.
      @6:33 That is a complete rewrite of history to believe the GPL was designed to be at war with BSD or with CDDL.
      @6:39 The GPL was at war with proprietary software. And thank the GPL and Stallman, open source won.
      @6:45 That is the whole point. Open source won.
      @6:49 We are pissing on our own victory parade by not allowing these technologies to flow between systems.

    5. Re:Reinventing the wheel by UnknownSoldier · · Score: 0

      > Hopefully BTRFS will conquer this.

      While I agree btrfs looks very interesting, unfortunately they are not taking a holistic approach to the design, so currently they will never match what ZFS has. Now IF they take a step back and incorporate ALL the layers like ZFS does then they will have a chance.

      But do you really want another few years for btrfs to get it "right" when ZFS has already been debugged?

       

    6. Re:Reinventing the wheel by h4rr4r · · Score: 1
    7. Re:Reinventing the wheel by h4rr4r · · Score: 1

      ZFS has not already been debugged on linux. Is there even a non-FUSE ZFS implementation for linux?

      I am not sure everything has to be done in one step. Do one thing and do it well. This holistic idea is nice in concept but often leads to the windows outcome. Not much gets done and what gets done is not that great if at any point "just works" just doesn't.

    8. Re:Reinventing the wheel by amorsen · · Score: 1

      That's a myth / blatant lie.

      You are going to have to come up with better arguments than that. Your quotes do not support that statement.

      Sun was about as Linux-hostile as any company could get, basically from 1995 and forwards. They tried to do as much as they could to make sure that Linux did not benefit in any way from any Solaris or Sun technology.

      Of course it makes sense that they tried to fight against the OS which was destined to make them obsolete. Luckily they did not have a particularly competent legal team.

      --
      Finally! A year of moderation! Ready for 2019?
    9. Re:Reinventing the wheel by icebraining · · Score: 1

      There's a native kernel port of ZFS for Linux: http://zfsonlinux.org/

    10. Re:Reinventing the wheel by UnknownSoldier · · Score: 1

      > Your quotes do not support that statement.

      I'm not sure how clearer you can get with "Open source won. We are pissing on our own victory parade by not allowing these technologies to flow between systems."

      You _do_ realize who said them, right?

      Bryan Cantrill (wrote dtrace) worked with Jeff Bonwick (designed/wrote ZFS.) They were both together at Sun for 14 and 20 years respectively. If you watch the "Fork Yeah!" video the impression I get is that it looks like they wanted to open source as much as possible but were held back by legal.

      The _only_ other two people who could weigh in would be the people who designed ZFS and the GPL.

      * Jeff Bonwick, and
      * Richard Stallman

      I don't know anyone else who _would_ actually have credibility in settling the question. Do you?

    11. Re:Reinventing the wheel by VortexCortex · · Score: 1

      The GPL was at war with proprietary software. And thank the GPL and Stallman, open source won.

      Amen.

    12. Re:Reinventing the wheel by fatphil · · Score: 1

      > they wanted to open source as much as possible but were held back by legal.

      The legal dept. at Sun?

      --
      Also FatPhil on SoylentNews, id 863
    13. Re:Reinventing the wheel by Anonymous Coward · · Score: 0

      HA HA HA

      It is great that ZFS has data integrity protection features and will let you know unequivocally when your data has been trashed. It would be nice if all common filesystems in 2012 had this feature...

      But if you have never been bitten by ZFS bugs, you haven't used ZFS very much.

      An online filesystem with new features to prevent undetected corruption will never replace the need for backups.

    14. Re:Reinventing the wheel by amorsen · · Score: 1

      If you watch the "Fork Yeah!" video the impression I get is that it looks like they wanted to open source as much as possible but were held back by legal.

      So what if certain engineers wanted to open source things? They didn't get to make that decision.

      The quotes are implying that the GPL does not work and that you can combine CDDL-licensed code with GPL'd code and distribute the combination. That position is rather weird, but then again Sun did suffer from a reality distortion field when it came to legal issues. The only other person I have heard of with the same view is Jörg Schilling.

      --
      Finally! A year of moderation! Ready for 2019?
    15. Re:Reinventing the wheel by makomk · · Score: 1

      I have to disagree. ZFS was infamous for filesystem metadata corruption issues amongst people who tried to use it seriously. If you were lucky it detected the corruption and remounted read-only, otherwise it kernel panicked the moment you tried to mount the FS and there was no way to recover data short of manually repairing the FS with a hex editor (ZFS didn't have a working fsck, partly for marketing reasons).

    16. Re:Reinventing the wheel by UnknownSoldier · · Score: 1

      > ZFS didn't have a working fsck, partly for marketing reasons

      If your File System (FS) needs fsck to recover from errors your FS _design_ is shoddy and incomplete.

      Your FS should NEVER get in that STATE in the first place! That's like locking the barn door after the horses escaped.

      Unfortunately people want to trade security for performance.

    17. Re:Reinventing the wheel by Anonymous Coward · · Score: 0

      And how much data have you lost because of XFS? After many years of XFS and a few serious situations there's still no known data I have lost because of XFS, whereas ext4 managed to corrupt stuff literally every other week and, best of all, with not a single mention in any log that ext4 had sent data to the outernet. From time to time I still find, among the files that were copied over from ext4, ones that are corrupt, but no new files are corrupt and files protected by hashes no longer go corrupt either (used to happen every other week with ext4). And, yes, same hard drives.

  5. I don't see the problem then... by Zapotek · · Score: 5, Funny

    The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

    We're talking about Linux users here...move along.

    1. Re:I don't see the problem then... by Anonymous Coward · · Score: 1

      Knew i shouldn't have dual booted my machine...

    2. Re:I don't see the problem then... by tessellated · · Score: 0

      Where are my moderator points when I *need* them?
      +1 funny

      --
      'When the Going gets Weird, the Weird turn Pro.' - Hunter S. Thompson
    3. Re:I don't see the problem then... by Anonymous Coward · · Score: 0

      To add, a lot of people like to wait months before upgrading, if not years. At least I do, I'm always a number behind.

    4. Re:I don't see the problem then... by vistapwns · · Score: 1, Troll

      What is it about Linux users' jokes that remind me of the Iraqi Information Minister? ;)

      --
      "...I think the Microsoft hatred is a disease." - Linus Torvalds
    5. Re:I don't see the problem then... by starless · · Score: 1

      Even though my linux desktop machine runs for long periods without needing rebooting, there are exceptions:
      My several year old Pioneer television runs linux. It crashes and reboots if I change HD channels more than 5 or 6 times.
      My roku box needs to be rebooted from time to time.
      So does my android phone.

    6. Re:I don't see the problem then... by fatphil · · Score: 0

      Don't Gentoo users recompile their kernel and reboot at least daily?

      (I can take your flamebaits, but I demand as many funnys to make up for them!)

      --
      Also FatPhil on SoylentNews, id 863
    7. Re:I don't see the problem then... by RR · · Score: 1

      Even though my linux desktop machine runs for long periods without needing rebooting, there are exceptions: My several year old Pioneer television runs linux. It crashes and reboots if I change HD channels more than 5 or 6 times. My roku box needs to be rebooted from time to time. So does my android phone.

      All those are also unlikely to be running EXT4. They store the system on flash and use SquashFS, JFFS2, or YAFFS2. The ones that use eMMC might use EXT4, but Samsung just donated F2FS for that use.

      Also, they tend to use very old kernels.

      --
      Have a nice time.
    8. Re:I don't see the problem then... by Anonymous Coward · · Score: 0

      Trolls tell lies to get a rise, check the OPs comment (all 30-odd of them) history and see for yourself.

    9. Re:I don't see the problem then... by Rich0 · · Score: 1

      Nope - Greg does a decent job with the Gentoo stable kernels. Granted, the current Gentoo stable kernel has a different ext4 bug that can cause panics when files are deleted, which is why I'm running unstable at the moment (I was getting nightly crashes when tmpreaper ran). Oh, the irony.

    10. Re:I don't see the problem then... by Anonymous Coward · · Score: 0

      Actually, Samsung itself uses ext4 on its newest phones/tablets' eMMC partitions. If F2FS was any better than ext4, you'd think the Galaxy S3 would run that, and not ext4, right?

  6. Really clever... by K.+S.+Kyosuke · · Score: 5, Funny

    The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

    They're trying to boost the average uptime of all installations by making people keep their machines turned on. It's just a continuation of the uptime war waged with the BSD folks!

    --
    Ezekiel 23:20
    1. Re:Really clever... by Anonymous Coward · · Score: 0

      BSD will always win the uptime war. Linux should just stay home

    2. Re:Really clever... by OzoneLad · · Score: 1

      Actually, it's trying to punish you for having a crappy uptime.

  7. LKML Slashdotted by o'reor · · Score: 1

    Brilliant. Well, it certainly worries this Linux developer -- although I mostly rely on pre-3.0 kernels. Wasn't there a rule on Slashdot about mirroring articles before posting links to them ?

    --
    In Soviet Russia, our new overlords are belong to all your base.
    1. Re:LKML Slashdotted by Bill,+Shooter+of+Bul · · Score: 1

      Not that I've ever remembered. It was oft suggested in comments, but most websites are nearly slashdot proof these days. Kind of surprised that lkml is so sluggish under the load.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    2. Re:LKML Slashdotted by Anonymous Coward · · Score: 1

      Kind of surprised that lkml is so sluggish under the load.

      That's because they never wanted to reboot because of possible file system corruption. So they are still running on that 386DX they got 23 years ago.

    3. Re:LKML Slashdotted by Bill,+Shooter+of+Bul · · Score: 1

      I just pray someone hit the turbo button, we need all of that DX and all of the number co-processing it can give us.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    4. Re:LKML Slashdotted by Score+Whore · · Score: 2

      It was the 486DX that brought the FPU on chip. The 386DX had a 32-bit wide data bus and the 386SX had a 16-bit wide data bus, as well as only 24 bits of the address bus hooked up externally.
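
      To put a number on that address-bus difference, the arithmetic can be sketched as below. The 24-line figure comes from the comment above; the 32-line figure for the 386DX and the function name are supplied for the example.

```python
MIB = 1024 ** 2


def addressable_bytes(address_lines: int) -> int:
    """Each external address line doubles the reachable physical memory."""
    return 2 ** address_lines


if __name__ == "__main__":
    for name, lines in (("386SX", 24), ("386DX", 32)):
        print(name, addressable_bytes(lines) // MIB, "MiB")
```

      So the 24 external address lines cap the 386SX at 16 MiB of physical memory, against 4096 MiB for a full 32-bit address bus.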

    5. Re:LKML Slashdotted by Anonymous Coward · · Score: 0

      Turbo button, all computers need a turbo button! My next machine will have one again, even if it does absolutely NOTHING! :)

    6. Re:LKML Slashdotted by Bill,+Shooter+of+Bul · · Score: 1

      D'oh. Always forget that bit. DX was always whatever the manufacturer wanted it to be.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
  8. Interesting bug, but don't get excited. by dacut · · Score: 5, Informative

    From Ted Ts'o's commentary, it's an optimization ("jbd2: don't write superblock when if its empty") gone awry:

    The reason why the problem happens rarely is that the effect of the buggy commit is that if the journal's starting block is zero, we fail to truncate the journal when we unmount the file system. This can happen if we mount and then unmount the file system fairly quickly, before the log has a chance to wrap.

    Basically, this optimization has the side effect of not updating the transaction log in this rare case. You can end up replaying old transactions after new ones, which will scramble metadata blocks. Given the rather unique conditions needed to hit this one, I'm not going to lose any sleep over any servers running without Ted's fix (though I'll certainly apply it once RedHat releases the patch).
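
    The "old transactions replayed after new ones" failure can be sketched with a toy write-ahead log. This is a deliberately simplified model and not jbd2's real on-disk format or logic: the class, the single superblock sequence field and the choice of which field goes stale on a buggy unmount are all assumptions made for the illustration.

```python
class ToyJournal:
    """Toy write-ahead log; NOT jbd2's real format or recovery logic."""

    def __init__(self, slots=4):
        self.log = [None] * slots   # slot -> (sequence, {block: value})
        self.sb_seq = 1             # toy superblock: sequence replay starts from
        self.disk = {}              # metadata blocks as they sit on disk
        self.head = 0
        self.next_seq = 1

    def mount(self):
        # New session: log from slot 0, numbering from the superblock.
        self.head = 0
        self.next_seq = self.sb_seq

    def commit(self, updates):
        # Log the transaction, then write it through to the metadata.
        self.log[self.head] = (self.next_seq, dict(updates))
        self.disk.update(updates)
        self.head = (self.head + 1) % len(self.log)
        self.next_seq += 1

    def unmount(self, buggy=False):
        # A clean unmount records that the log holds nothing still needed.
        # The buggy path (assumed here) skips that superblock update.
        if not buggy:
            self.sb_seq = self.next_seq

    def recover(self):
        # After a crash: replay slots while sequence numbers are consecutive.
        want, replayed = self.sb_seq, []
        for slot in self.log:
            if slot is None or slot[0] != want:
                break
            self.disk.update(slot[1])
            replayed.append(slot[0])
            want += 1
        return replayed


def run(buggy):
    """Two short sessions, then a crash and recovery in the second one."""
    journal = ToyJournal()
    journal.mount()
    journal.commit({"inode 7": "old-1"})
    journal.commit({"inode 7": "old-2"})
    journal.unmount(buggy=buggy)
    journal.mount()
    journal.commit({"inode 7": "new"})
    journal.recover()   # crash: no unmount before recovery
    return journal.disk["inode 7"]


if __name__ == "__main__":
    print("clean unmount:", run(buggy=False))
    print("buggy unmount:", run(buggy=True))
```

    In the toy, the skipped update lets a stale record from the first session look like a valid successor to the new transaction, so recovery writes old metadata over new. It only shows up with a quick mount/unmount followed by a crash, which fits the "rather unique conditions" above.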

    1. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      though I'll certainly apply it once RedHat releases the patch

      So, in a couple of years you'll be 100% ok.

    2. Re:Interesting bug, but don't get excited. by Tough+Love · · Score: 4, Informative

      It means you could get an incorrect replay after a crash and end up needing to do a fsck. Good thing Ext2/3/4 fsck is awesome. Of course, having no replay bug will be much better. Note: the bug was introduced this October 8th. You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    3. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      So, you'd have to do two or more reboots in quick sequence to trigger it?

      I do that sometimes when tweaking stuff in /etc, just to make sure it comes up in a coherent state.

    4. Re:Interesting bug, but don't get excited. by NotBorg · · Score: 1

      You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.

      I'm a crazy, bad ass, rebel that uses ArchLinux for my workstation. Living wild and dangerous, I recklessly shutdown my heathen ext4 computer every night. I feel like I'm that evil mayhem guy on the Allstate commercials. RECALCULATING!

      --
      I want this account deleted.
    5. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      The summary says kernels 3.4, 3.5 and 3.6 are affected. There are certainly distributions out there using 3.4 and 3.5 kernels.

    6. Re:Interesting bug, but don't get excited. by Shimbo · · Score: 3, Insightful

      There are certainly distributions out there using 3.4 and 3.5 kernels.

      Yes, but not many of them will push kernel updates all the way through to end users in a couple of weeks.

    7. Re:Interesting bug, but don't get excited. by Bradmont · · Score: 1

      > it hasn't filtered through to distros yet.

      FTA:
      > Linux 3.4, 3.5, 3.6 stable kernels

      I'm running Ubuntu 12.10 stock kernel:
      % uname -r
      3.5.0-17-generic

    8. Re:Interesting bug, but don't get excited. by WuphonsReach · · Score: 2

      Note: the bug was introduced this October 8th.

      Probably one of the more informative comments here.

      --
      Wolde you bothe eate your cake, and have your cake?
    9. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 1

      Given that you're running RedHat servers, I don't think you'll need to lose any sleep over bugs in 3.4+ kernels for at least another two years.

    10. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 2, Insightful

      The offending commit is present in both Ubuntu's 12.10 and 13.04 generic kernels, though the package versions are in proposed repositories.

    11. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 5, Informative

      Ubuntu users are at risk.

      http://www.ubuntuupdates.org/package/core/quantal/main/proposed/linux-image-3.5.0-18-generic

      Look for " jbd2: don't write superblock when if its empty
              - LP: #1066176"

      If any Ubuntu users have proposed repo enabled and they've updated to 3.5.0-18, they're vulnerable.

    12. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      The bug is not in 3.5.0-17-generic, but it is in 3.5.0-18-generic. If you have the proposed repo enabled, don't update linux-image or you will be vulnerable.
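
      For anyone scripting that check, here is a minimal sketch. It encodes only the two data points given in this thread (3.5.0-17 without the commit, 3.5.0-18 from proposed with it) and answers "unknown" for everything else; the function names are hypothetical, and treating builds older than -17 as unaffected is an assumption.

```python
import platform
import re

# Data points from the thread: -17 lacks the commit, -18 (proposed) has it.
LAST_KNOWN_GOOD_ABI = 17
KNOWN_BAD_ABI = 18


def parse_ubuntu_release(release: str):
    """Split e.g. '3.5.0-17-generic' into ('3.5.0', 17, 'generic'), else None."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)-(\w+)", release.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


def ubuntu_3_5_has_commit(release: str):
    """True or False where the thread gives an answer, None otherwise."""
    parsed = parse_ubuntu_release(release)
    if parsed is None or parsed[0] != "3.5.0":
        return None
    abi = parsed[1]
    if abi == KNOWN_BAD_ABI:
        return True
    if abi <= LAST_KNOWN_GOOD_ABI:
        return False   # assumption: builds before -17 also lack the commit
    return None        # later builds may or may not carry a fix


if __name__ == "__main__":
    running = platform.release()   # same string as `uname -r` on Linux
    print(running, ubuntu_3_5_has_commit(running))
```

      A None result means the thread says nothing about that kernel, not that it is safe.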

    13. Re:Interesting bug, but don't get excited. by fatphil · · Score: 4, Informative

      $ git show eeecef0af5e
      commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3
      Author: Eric Sandeen <sandeen@redhat.com>
      Date: Sat Aug 18 22:29:40 2012 -0400

              jbd2: don't write superblock when if its empty

      --
      Also FatPhil on SoylentNews, id 863
    14. Re:Interesting bug, but don't get excited. by fatphil · · Score: 2

      That's Linus' tree. This is Greg's:

      linux-stable$ git show 14b4ed22a6
      commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
      Author: Eric Sandeen <sandeen@redhat.com>
      Date: Sat Aug 18 22:29:40 2012 -0400

              jbd2: don't write superblock when if its empty

              commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.

      --
      Also FatPhil on SoylentNews, id 863
    15. Re:Interesting bug, but don't get excited. by FoolishOwl · · Score: 1

      I keep my Fedora VM regularly updated from the stable repository.

      $ uname -a
      Linux fedora.vm 3.6.2-4.fc17.x86_64 #1 SMP Wed Oct 17 02:43:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

      So yes, at least one major Linux distribution is using the 3.6 kernel.

    16. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 1

      If you're using the proposed repo which is equivalent to Debian's experimental repos then you expect crashes, breakage and bugs.

    17. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      Linux htpc 3.6.3-1-ARCH #1 SMP PREEMPT Mon Oct 22 10:23:56 CEST 2012 x86_64 GNU/Linux

    18. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      > I'm running Ubuntu 12.10 stock kernel:
      % uname -r
      3.5.0-17-generic

      Have you updated your 3.5 kernel since Oct 8 with freshly obtained sources? Has the Ubuntu repo? If not then I don't think it is possible for your kernel to have this bug.

    19. Re:Interesting bug, but don't get excited. by Anonymous Coward · · Score: 0

      Proposed is a repo for final testing before moving into main distribution. Packages are supposed to be stable by the time they hit proposed. Kernel devs and other testers use PPA's for the really buggy stuff.

    20. Re:Interesting bug, but don't get excited. by drinkypoo · · Score: 1

      Packages are supposed to be stable by the time they hit proposed

      Packages are supposed to be believed stable by the time they hit proposed. If you're submitting bug reports then we thank you for using that repo, though. Someone has to.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  9. Not defined by Anonymous Coward · · Score: 0

    Please define "too often" .... ?!?!

    1. Re:Not defined by Anonymous Coward · · Score: 1

      Write large chunks of data to every filesystem and force the journals to cycle before reboot. If you have to ask "how often is too often?", then you're probably already in trouble.

      Article suggested that people who shut down every day, say a laptop owner who doesn't use suspend/hibernate, will probably bump up against this. My suspicion is that those of us with uptimes of several months will have no trouble, but YMMV.

    2. Re:Not defined by zonky · · Score: 1

      I'm a laptop owner, who uses Dmcrypt, and with a 2 second boot time off SSD, i never bother hibernating. Better check what kernel....

    3. Re:Not defined by h4rr4r · · Score: 2

      This one occurred in October, so it's pretty doubtful, since none of the major distros are that up to date.

    4. Re:Not defined by DeathFromSomewhere · · Score: 1

      From a fully updated Ubuntu 12.10 (no patch for this bug yet):

      $ uname -r
      3.5.0-17-generic

      From the summary:
      The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug

      --
      -1 overrated isn't the same thing as "I disagree".
    5. Re:Not defined by fatphil · · Score: 1

      It's not from October:

      linux-stable$ git show 14b4ed22a6
      commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
      Author: Eric Sandeen <sandeen@redhat.com>
      Date: Sat Aug 18 22:29:40 2012 -0400

              jbd2: don't write superblock when if its empty

              commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.

      --
      Also FatPhil on SoylentNews, id 863
  10. The file system dug too greedily... by Bovius · · Score: 3, Funny

    ...and too deep. It awoke a being of segfaults and kernel panics.

    1. Re:The file system dug too greedily... by Anonymous Coward · · Score: 0

      But there was candy. Glorious, glorious candy.

    2. Re:The file system dug too greedily... by Jade_Wayfarer · · Score: 1

      I'd prefer lazy over greedy any day...

      --
      Absence of proof != proof of absence.
  11. Part of the game by ntropia · · Score: 2

    At first I had mixed feelings of slight disappointment and concern, especially because it is the default filesystem in several distros, (including Android). Although, after some second thoughts, I have come to the following conclusions:

    1) it is part of the game: continuous development toward improvement (most of the time) and new features implies some pitfalls. So far, benefits are much larger than costs.

    2) Despite the fact developers are still working on a fix, I wouldn't be surprised if one were found soon.

    3) ...please, guys, don't do it again!

    1. Re:Part of the game by compro01 · · Score: 1

      This bug is only 10 days old. It's rather unlikely this has percolated down to anything important, much less Android, which still runs 3.0.31 from May.

      --
      upon the advice of my lawyer, i have no sig at this time
    2. Re:Part of the game by fatphil · · Score: 3, Informative

      It is *not* 10 days old.

      linux-stable$ git show 14b4ed22a6
      commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
      Author: Eric Sandeen <sandeen@redhat.com>
      Date: Sat Aug 18 22:29:40 2012 -0400

              jbd2: don't write superblock when if its empty

              commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.

              This sequence:

              # truncate --size=1g fsfile
              # mkfs.ext4 -F fsfile
              # mount -o loop,ro fsfile /mnt
              # umount /mnt
              # dmesg | tail

              results in an IO error when unmounting the RO filesystem:

              [ 318.020828] Buffer I/O error on device loop1, logical block 196608
              [ 318.027024] lost page write due to I/O error on loop1
              [ 318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8.

              This was a regression introduced by commit 24bcc89c7e7c: "jbd2: split
              updating of journal superblock and marking journal empty".

              Signed-off-by: Eric Sandeen <sandeen@redhat.com>
              Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
              Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
      index e149b99..484b8d1 100644
      --- a/fs/jbd2/journal.c
      +++ b/fs/jbd2/journal.c
      @@ -1354,6 +1354,11 @@ static void jbd2_mark_journal_empty(journal_t *journal)

                      BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
                      read_lock(&journal->j_state_lock);
      +               /* Is it already empty? */
      +               if (sb->s_start == 0) {
      +                       read_unlock(&journal->j_state_lock);
      +                       return;
      +               }
                      jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
                                          journal->j_tail_sequence);

      --
      Also FatPhil on SoylentNews, id 863
    3. Re:Part of the game by SiggyTheViking · · Score: 1

      >>>uname -r
      3.6.2-4.fc17.x86_64
      I suppose Fedora Core 17 isn't really important anymore, now that all the cool kids are on Arch...
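For anyone who would rather script this check than eyeball `uname -r`, here is a minimal sketch in Python. It only parses the leading numeric part of a release string such as the one above and compares it against a range. The helper names `kernel_version` and `in_range` are made up for illustration, and the bounds you pass in are your own assumption: nothing here claims to know which kernels are actually affected.

```python
import re


def kernel_version(release):
    """Return the leading numeric part of a `uname -r` string as a tuple of ints.

    A missing patch level (e.g. "3.7") is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def in_range(release, first, last):
    """True if the release falls inside the inclusive range [first, last].

    `first` and `last` are (major, minor, patch) tuples chosen by the caller.
    """
    return first <= kernel_version(release) <= last
```

On a live system the release string can be taken from `platform.release()` in the standard library and passed straight to `in_range`.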

  12. too new by Anonymous Coward · · Score: 1

    This is why I don't use file systems less than 10 years old.

  13. Reiserfs became 'murderfs'... by Omnifarious · · Score: 1

    What term do we get to use for ext4 now? It's unfortunate that Theodore Tso is actually a pretty decent guy instead of being a murderer (and a jerk). So there aren't any obviously negative terms that come to mind.

    But clearly, something needs to be done along these lines, and we need a legion of people who forevermore claim that ext4 corrupts your data and that you should never use it and should stick with ext3 instead.

    1. Re:Reiserfs became 'murderfs'... by Anonymous Coward · · Score: 5, Funny

      So clearly the answer is General Tso's FS. Delicious, but you'll lose your data an hour later.

    2. Re:Reiserfs became 'murderfs'... by Anonymous Coward · · Score: 0

      Must be all the MSG.

    3. Re:Reiserfs became 'murderfs'... by corychristison · · Score: 1

      What term do we get to use for ext4 now?

      EXTerminator 4. Because it's just awful. (Not really)
      EXTerminator 4. Because it's corrupt.
      EXTerminator 4. Because it's on a (data) killing spree.

    4. Re:Reiserfs became 'murderfs'... by Anonymous Coward · · Score: 0

      He should have stuck with chicken. And the military.

  14. Workaround by Anonymous Coward · · Score: 0

    After recently discovering pm-suspend on my desktop, I have found I never need to turn off my computer again! Use "sudo visudo" to get rid of the annoying pw prompt.

  15. Your Papers Please by Anonymous Coward · · Score: 5, Funny

    grammar nazi's

    grammar Nazis

    1. Re:Your Papers Please by Anonymous Coward · · Score: 0

      grammar nazi's

      grammar Nazis

      Grammar Nazis.

    2. Re:Your Papers Please by Abreu · · Score: 1

      Grammar Nazis nazi'd grammar nazi'd grammar Nazis.
      .
      .
      .
      .
      (yeah, I had to invent the verb "to nazi" for this to work, but hey! It's a joke!)

      --
      No sig for the moment.
    3. Re:Your Papers Please by Anonymous Coward · · Score: 0

      Grammar Nazis nazi'd grammar nazi'd grammar Nazis.
      .
      (yeah, I had to invent the verb "to nazi" for this to work, but hey! It's a joke!)

      For that to work, I had to redefine "joke".

  16. Summary is wrong by DrJimbo · · Score: 5, Informative

    The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

    This is wrong. The problem occurs when the fs is unmounted too *soon*. Twice in a row. The bug only appears if the journal buffer does not wrap. You only get catastrophic results if this happens twice in a row.

    --
    We don't see the world as it is, we see it as we are.
    -- Anais Nin
    1. Re:Summary is wrong by Anonymous Coward · · Score: 5, Interesting

      This appears to be untrue. My latest tests suggest that it happens if a single unclean umount happens while the fs is mounted in 3.6.3. (At least, I saw corruption in /var after a single boot, followed by a rescue boot into 3.6.1 and fsck: every filesystem that had journal replay invoked also had corruption.)

        -- N., original reporter, not much enjoying his fifteen minutes of fame since it comes with happy fun filesystem corruption attached: captcha is 'contrite', how appropriate

    2. Re:Summary is wrong by Anonymous Coward · · Score: 1

      Aside: let nobody say Oracle doesn't contribute to Linux, thankyouverymuch. No complaints from anyone @work while I tracked this down, even though it is quite far removed from DTrace work. (Admittedly it is sort of hard to work on anything while your filesystem is fried.)

    3. Re:Summary is wrong by DrJimbo · · Score: 1

      I suspect that unclean umounts may trigger the bug too but that does not contradict anything I said. I did not say there was no corruption when you hit the bug once, I said there was catastrophic corruption when you hit it twice in a row. If a bug can be triggered by a clean umount, it is not very surprising if it also gets triggered by an unclean umount.

      Your experience seems to confirm my correction. It is not about how *often* you mount, it is about how you umount. This is a non-trivial distinction because the misleading summary could tend to encourage some people who have been safely using a buggy kernel to unwittingly engage in behavior that triggers the bug, perhaps catastrophically.

      --
      We don't see the world as it is, we see it as we are.
      -- Anais Nin
    4. Re:Summary is wrong by Bronster · · Score: 1

      Man - I should wander back to the other place and read your war stories.

      So when is Oracle going to release ZFS to the Linux world rather than pushing btrfs which is still not finished?

    5. Re:Summary is wrong by Anonymous Coward · · Score: 0

      Agreed. This is confirmed by the fact that the only other person to report this problem reported it on a USB stick -- a slow-writing device which can easily get removed after a umount begins but before it terminates. It is quite possible that a normal unclean umount doesn't see this (I haven't tested that case yet). So this may well be a storm in a teacup over a bug that isn't likely to affect anyone much other than me.

      I consider this a reason to leave my shutdown scripts as they are, arguably insane though they be: they're stress-testing ext4 in ways that few are, and demonstrably combing out obscure bugs.

    6. Re:Summary is wrong by Anonymous Coward · · Score: 0

      My war stories? I have surprisingly few: this job is so much better than the last... I can thank the other place for this job, come to that. But yes, do come back, there are too few people there these days.

      As for releasing ZFS, that's a different division and I suspect I'd only be able to say anything about release dates if I knew nothing about it anyway! :)

      (As for releasing DTrace, well, there I *do* know something, so I say nothing. Bwahaha!)

    7. Re:Summary is wrong by Anonymous Coward · · Score: 0

      The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

      This is wrong. The problem occurs when the fs is unmounted too *soon*.
      Twice in a row.
      The bug only appears if the journal buffer does not wrap.
      You only get catastrophic results if this happens twice in a row.

      Hey Steve. Didn't you get slapped enough after your "you're holding it wrong" comment?

    8. Re:Summary is wrong by Anonymous Coward · · Score: 0

      And when will Oracle release ACFS drivers for 2.6.39-300... such a nice kernel, can't use it because I can't mount my backups filesystem...

    9. Re:Summary is wrong by Anonymous Coward · · Score: 1

      The danger of things like this hitting Phoronix and Slashdot while we're still characterizing the problem is that it can spread what turn out to be inaccuracies. My latest tests suggest that you have to reboot (or disconnect the USB key or whatever) *while umounting* to cause this corruption.

      3.6.3 and possibly 3.6.2 are particularly vulnerable, in that even a lazy umount followed by a five-second sleep was causing corruption: but 3.6.1 and possibly every Linux kernel ever (I haven't tested) *is* vulnerable, if you try hard enough: it's just that in that case you have to reboot right away after starting the umount if things are to go wrong.

      This is still a disturbing bug, but possibly not a dangerous one unless you have weird mount hierarchies, like me, with nested local and NFS mounts, requiring lazy umount to avoid the possibility of a permanent lockup waiting for an unresponsive NFS server in order to access the local fs's mount point during shutdown.

    10. Re:Summary is wrong by Anonymous Coward · · Score: 0

      When is Oracle going to support pNFS 4.1? When is the next Solaris release?

    11. Re:Summary is wrong by Anonymous Coward · · Score: 0

      I'm so sorry for you, but thank you for reporting this, so that the rest of us don't have to go through what you did.

      Small comfort, I know.

      You make me proud of F/OSS! I hope I can someday repay the favor.

  17. Re:For butts sake by bluefoxlucid · · Score: 2

    I have used BSD. I found it .... quite striking. There's a hell of a lot of performance enhancement in Linux, and it really shows when you try to boot BSD and find it's ass-slow from the get-go. I even tried slapping down Debian-kfreebsd to compare something roughly the same and ... yeah it's just slow as shit. Solaris (both Sun Solaris and Nexenta = Ubuntu/Solaris) wasn't that slow.

  18. In other words... by Anonymous Coward · · Score: 1, Funny

    This is what you get when you use a filesystem that wasn't developed by a real company.

    Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away. I thought this "problem" existed with ext4 for years.

    Yeah, Micro$oft is evil, but their FS works. And file corruption isn't a serious issue except when hard drives fail, and, well, in that case...DERP!

    1. Re:In other words... by interval1066 · · Score: 1

      Figures... AC calls out FOSS.

      This is what you get when you use a filesystem that wasn't developed by a real company.

      Sounds like M$ FUD to me, but whatever. Is M$ the only "real" company?

      Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away.

      I've got a list of "real" companies that haven't made good on many high-level flaws.

      I thought this "problem" existed with ext4 for years.

      You did? Would've made a nice /. article. Where are your notes regarding this flaw only you uncovered?

      Yeah, Micro$oft is evil, but their FS works.

      http://serverfault.com/questions/31709/how-to-workaround-the-ntfs-move-copy-design-flaw

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    2. Re:In other words... by Anonymous Coward · · Score: 0

      Ted Ts'o, the lead developer of EXT4, works for Google. Specifically, he's paid to work on the Linux kernel and filesystems. If Google isn't a company with a lot at stake with regard to filesystems, I don't know what is.

    3. Re:In other words... by Anonymous Coward · · Score: 0

      And the guy who found the bug (me) works for Oracle (though the bug struck his own personal, idiosyncratically configured system).

      Clearly Oracle have an interest in robust data storage :)

    4. Re:In other words... by Anonymous Coward · · Score: 0

      Well, the developer who authored the patch which introduced the bug works at RedHat. Is that enough of a "real company" for you? OTOH, Gnome3 is largely developed by RedHat guys, so maybe this and the bugs that plague Gnome3 say something about the quality of patches coming from them.

  19. Re:For butts sake by Anonymous Coward · · Score: 0

    BSD died about 10 years ago.

  20. LOL by Anonymous Coward · · Score: 0, Funny

    The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.

    "You're just rebooting it wrong."
    -Loonix filesystem developer

    1. Re:LOL by Anonymous Coward · · Score: 0

      And the followup: "You remounted it wrong."

    2. Re:LOL by Anonymous Coward · · Score: 1

      The irony is that this *did* in fact happen because I was rebooting wrong. (There appears to be no way to reboot *right* reliably in my position, but rebooting while a umount is proceeding is definitely in some way wrong.)

    3. Re:LOL by Anonymous Coward · · Score: 0

      So what happens if the power cuts off in your building while a umount is proceeding? Sometimes rebooting isn't a choice...this bug is awful. Just use BSD.

  21. How many times by MetalliQaZ · · Score: 1

    ... can we get the words "stable", "linux", and "kernel" into a single summary? I like this game.

    --
    "Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
  22. Well of course! by Panaflex · · Score: 2

    They're mounting it wrong!

    When you mount your disks, you need to be sure of proper head alignment. Make sure she's spun up properly as well, otherwise the disks could be surprised and jump away causing a crash. Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.

    --
    I said no... but I missed and it came out yes.
    1. Re:Well of course! by girlintraining · · Score: 1

      Another side effect of broken optimization is a tendency for some of its users to think they're funny...

      --
      #fuckbeta #iamslashdot #dicemustdie
    2. Re:Well of course! by Panaflex · · Score: 1

      All right, well fine. You have utterly destroyed my illusions of being quoted at the bottom of a slashdot page one day.

      I am such a foul and pestilent congregation of vapours. Thus, I take my leave to wallow in my sophomoric caricature of a man, having ripened my soul under the blistering gaze of.... girlintraining.

      --
      I said no... but I missed and it came out yes.
    3. Re:Well of course! by isorox · · Score: 3, Funny

      Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.

      I never had a problem with frequent mounting, however I have now found a side effect from a mount I performed last year. A child-process was forked into existence shortly after the mount, and now we find we're continuously receiving interrupts from the process, which has affected pretty much every aspect of system administration.

      I find that performing the mount is occasionally possible, but having to umount to give resources to deal with the child process (which often core dumps, and needs a lot of user interaction), before ejecting can lead to frustration and cold showers.

      Most of the time my team is simply trying to run sleep whenever we can.

    4. Re:Well of course! by girlintraining · · Score: 1

      I am such a foul and pestilent congregation of vapours. Thus, I take my leave to wallow in my sophomoric caricature of a man, having ripened my soul under the blistering gaze of.... girlintraining.

      A tale of woe and amazement! And lo, the girl did blow the poor distraught man a kiss from across the digital divide, before twirling her hair and skipping away giggling...

      --
      #fuckbeta #iamslashdot #dicemustdie
  23. EXT4 has had other issues... by Anonymous Coward · · Score: 0

    ...I believe that it had problems with large files (I don't know all of the details) at one point, too.
    This may still be an open issue.

    I stick with EXT3, but it has the "forever to perform a mkdir" issue after your filesystem crosses
    some file count threshold. But I've not had anything go sour with EXT3 even when the box has
    gone down hard from a power failure.

    Also, we're running Win 2008 server and this is the second time we've seen this where a whole
    partition becomes unusable. We have to restore the entire image from backup; it can't be repaired.

    CAPTCHA = sour grapes they're not!

  24. Re:For butts sake by marcosdumay · · Score: 1

    And Netcraft confirmed it. I know. Everybody knows. You don't need to keep repeating it.

    But, of course, zombies are known to be slow... You may be on to something.

  25. As opposed to.. by Anonymous Coward · · Score: 0

    ... going to the Uptime War Battle Royale?

  26. Re:then good thing i switched to OS X years ago! by Anonymous Coward · · Score: 0

    s/behind/ahead/
    s/Fusion/Fussy/

  27. Wait what? by freman · · Score: 2

    People reboot linux?

    1. Re:Wait what? by Anonymous Coward · · Score: 0

      Actually yes, many times a day.
      And you should plan for that if you want non-nerd adoption of Linux.

    2. Re:Wait what? by TheDarkMaster · · Score: 1

      All the time. After all, I, for example, am running a desktop here. Now if you're talking about a server...

      --
      Religion: The greatest weapon of mass destruction of all time
  28. Re:then good thing i switched to OS X years ago! by Anonymous Coward · · Score: 1

    Considering how those who manage the curve are rude, obstructive and just downright mean - I think Linux does a great job in keeping up.

  29. Re:For butts sake by Anonymous Coward · · Score: 0

    FreeBSD has never been slow for me. (At least I have never noticed it - possibly because with ports you don't build in stuff you don't need).

    (Other than when MySQL was only properly usable with linuxthreads, and the linuxulator didn't yet support them, but that really was a long time ago.)

    There are no good video card drivers other than Nvidia's, though. (Same with Solaris x86.)

    FreeBSD is not getting bloated to a ridiculous level either. (Stuff like DTRACE is worth it)

    If you build your main ports with the latest gcc (it's unfair to use the gcc 4.2 in the base system) and use an Nvidia video card, AHCI, and a 64-bit arch, stuff that is supported generally works pretty well. (The usbaudio / envy24 just works, unlike the combination of alsa and/or pulseaudio that always messes stuff up.)

  30. kill your journal by pentabular · · Score: 1

    Save yourself the extra write and extra opportunity for something to go wrong: disable the journal. worth considering in any case: http://pentabular.wordpress.com/ext4-on-laptop-ssd/

    1. Re:kill your journal by Rich0 · · Score: 1

      Uh, a journal helps prevent corruption of filesystem metadata by avoiding having it overwritten in place. You even get some benefit for data by doing ordered data writes.

      Granted, COW is better still, but we're not quite there yet on btrfs.

  31. Don't believe most of the early stories on the web by Anonymous Coward · · Score: 2, Informative

    I have a Google+ post where I've posted my latest updates to this still-developing story:

    https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7

    Also, I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.

  32. Most of the early stories on the web are wrong.... by tytso · · Score: 5, Informative

    I have a Google+ post where I've posted my latest updates to this still-developing story:

    https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7

    Also, I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.

  33. hmmm... Android? (using ext4?) by neurocutie · · Score: 1

    What does this mean for the various versions of Android using ext4? I think I just flashed my tablet to use ext4 (ugh)... I really don't want corruption on my tablet...

    1. Re:hmmm... Android? (using ext4?) by MtHuurne · · Score: 2

      Android is unaffected: the bug was introduced after Linux 3.6 and no Android kernel is anywhere near that recent.

  34. Re:Most of the early stories on the web are wrong. by Anonymous Coward · · Score: 0, Insightful

    Nobody smart reads gag+. Or failbook. lern2internet

  35. Shit! by torsmo · · Score: 1

    I have many thumb drives formatted in ext4. I guess it will not be a good idea to use them on my 3.5 kernel based distro, then?

  36. patch by anonieuweling · · Score: 2

    The more recent patch at http://marc.info/?l=linux-kernel&m=135105626207228&w=2 fixes stuff.

    1. Re:patch by Anonymous Coward · · Score: 1

      No it doesn't. I still see corruption even with this patch applied.

      Will people *please* stop jumping the gun? Verified-to-work fixes cannot magically appear before the only people who can reproduce the problem have verified that they work!

          -- N.

  37. Re:then good thing i switched to OS X years ago! by Ash-Fox · · Score: 1

    can Linux even run on a Apple Fusion drive?

    Yes, it just won't receive the benefits of an Apple Fusion drive, but it does run fine.

    --
    Change is certain; progress is not obligatory.
  38. Slackware by Anonymous Coward · · Score: 0

    After recently installing Slackware 14, I was a bit miffed that the distribution release had reverted at the last minute to kernel version 3.2.29. Now I am so grateful that Patrick played it safe.

  39. Grammatical errors by Anonymous Coward · · Score: 0

    Maybe it is a good idea to refuse patches with grammatical errors in the comments or descriptions.
    When the submitter has not made the effort to make at least the comments grammatically correct,
    probably the correctness of the code is questionable.

    1. Re:Grammatical errors by HeadBanger606 · · Score: 1

      On the one hand I would tend to agree... on the other hand, there are a lot of developers and contributors who are not native English speakers. I've worked on projects with other devs who would show me some code, documentation, etc... and be completely shocked (and appalled) when I pointed out what (to me) were glaring spelling/grammatical errors. Hell, I am a native English speaker, and though I try my best, I can sometimes read and re-read text a hundred times and miss something stupid like "you're" instead of "your" or "it's" instead of "its". :P

      --
      --- Amateur musician: http://josh.morine.net/headbanger/
  40. Re:For butts sake by Anonymous Coward · · Score: 0

    Um, if you're referring to the fairly recent Ubuntu/Fedora enhancements of speeding up boots by loading the daemons in parallel...
    The BSDs probably will never include that feature, as they are conservative and value having a simple, debuggable design.

    (But, well, your rant has no value if you give no concrete examples. I wonder who decided to moderate you upwards.)

  41. This means i can't say by Anonymous Coward · · Score: 0

    "have you tried turning it off and on again" anymore?!?!?

  42. there are reasons for layers by Chirs · · Score: 1

    I currently work on a product that uses fuse on top of xfs on top of LVM on top of RAID1. There are good solid reasons for the existence of each of those layers.

    No filesystem is the best for all uses, and when ZFS tries to do everything it means that it doesn't play nice with the rest of the stack.

    1. Re:there are reasons for layers by Anonymous Coward · · Score: 0

      The mere existence of something is not justification in and of itself. What are the "good solid reasons"?

      "citation needed" on your claim about ZFS not "play[ing] nice with the rest of the stack". Do you actually understand what ZFS is and how it is architected? Or are you just speaking out of your ass?

  43. it's actually even more esoteric... by Chirs · · Score: 1

    According to Ted Ts'o's latest update (https://plus.google.com/117091380454742934025/posts) this actually involved a combination of "umount -l" and shutting down while the filesystem was still mounted, and the user also had "nobarrier" set on the filesystem as well as "journal_async_commit".

    So it sure looks like the user was playing fast and loose...this is not something that's going to hit your average person.

    1. Re:it's actually even more esoteric... by lastx33 · · Score: 1

      According to Ted Ts'o's latest update (https://plus.google.com/117091380454742934025/posts) this actually involved a combination of "umount -l" and shutting down while the filesystem was still mounted, and the user also had "nobarrier" set on the filesystem as well as "journal_async_commit".

      So it sure looks like the user was playing fast and loose...this is not something that's going to hit your average person.

      I noticed this addendum too. Looks to be a very specific (and oddly ill considered) configuration required to trigger the bug. Quite a few other people here are going off on tangents and haven't read the addendum or your comment by the looks of it.

      --
      "You can lead a horse to water but a pencil must be lead!" - Stan Laurel
  44. more clarifications by perles · · Score: 1

    Perhaps the author of this summary could have been more precise. The bug is very unlikely to be triggered; here are some examples: https://lkml.org/lkml/2012/10/24/535 and http://phoronix.com/forums/showthread.php?74697-EXT4-Data-Corruption-Bug-Hits-Stable-Linux-Kernels&p=293446#post293446 . Still, it is a good measure to downgrade to a safe version and wait for a patch to come. I have been using 3.6.2 on my two Gentoo boxes for a couple of days and nothing happened. As a precaution I will downgrade until they release such a fix.

  45. Stable isn't quite always stable... by HeadBanger606 · · Score: 1

    And this is why I wait before switching fs types. I waited almost 2 years after ext3 was considered stable before I switched from ext2. I just rebuilt my machine 2 days ago, and I almost, almost went with ext4. But that little voice of caution (read: paranoid subconscious :P) told me to hold off, and then someone pointed out this thread to me. With that said, after reading the posts in the mailing lists, I am once again proud of the kernel developers and the hardcore linux geeks for so quickly jumping on this problem, as well as of the calm of the "victims". If a similar problem occurred in windows, hoo-boy, there would be an uprising.

    --
    --- Amateur musician: http://josh.morine.net/headbanger/
    1. Re:Stable isn't quite always stable... by Anonymous Coward · · Score: 1

      Believe me, if I hadn't run a backup the day before the failure, there would have been notably less calm evident. But with a current backup, why, corrupting the filesystem of my primary fileserver over and over again was very much less frightening than it would otherwise have been.

      Let the moral be: backups are good. (Even if my backups *were* stored on ext4 on a removable USB drive...)

        -- N.

  46. Well, after scaring anyone about FS corruption... by dmpot · · Score: 2

    I guess it has come time to tell the truth.

    First of all, the bug has never been bisected, and the whole story that hit Slashdot and some other news sites was based solely on Ted's speculation, which was never confirmed. In fact, at the end of the same day, Ted admitted that his hypothesis was wrong.

    After a few days of investigation, the problem was traced to an experimental mounting option, which is not turned on by default and was intended for developers only. Accidentally, this option was not marked as "experimental", so it is available to users. https://lkml.org/lkml/2012/10/26/570

  47. Re:Well, after scaring anyone about FS corruption. by Anonymous Coward · · Score: 1

    Nah. To get the case I found you need not one experimental option, but *three*.

    Specifically, you need nobarrier,journal_async_commit -- and the latter option implies journal_checksum, so it's really three options.

    If you do all that, reboots / blockdev disconnections while an unmount is proceeding will not merely give you filesystem corruption on second mount (regardless of options the second time), but *silent* filesystem corruption on remount (journal_checksum and any other options will give you a journal abort and read-only remount, which is a pretty big clue that something is wrong, though the filesystem is still corrupted).

    Fun stuff.

        -- N.
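
If you want to know whether any of your own filesystems carry that combination, a rough sketch in Python follows. It scans text in the `/proc/mounts` layout (device, mount point, type, comma-separated options) and lists ext4 mount points that have both `nobarrier` and `journal_async_commit`. The function name `mounts_with_combination` is invented for this example, and it assumes the kernel reports the options spelled exactly as they were given at mount time; a kernel that prints them differently (say `barrier=0`) would slip past this check.

```python
# Options named in the thread as jointly required to hit the corruption case.
COMBINATION = {"nobarrier", "journal_async_commit"}


def mounts_with_combination(mounts_text):
    """Return ext4 mount points whose option list contains every option in COMBINATION.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated fields, options in the 4th field.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = set(fields[3].split(","))
        if COMBINATION <= options:
            hits.append(fields[1])
    return hits
```

To check a running machine, read `/proc/mounts` into a string and pass it in; an empty list means no ext4 mount matched.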