Slashdot Mirror


Bad Lockup Bug Plagues Linux

jones_supa (887896) writes "A hard-to-track system lockup bug seems to have appeared over the last couple of Linux kernel releases. Dave Jones of Red Hat was the first to report frequent lockups with 3.18, and he later found that the issue is present in 3.17 too. The problem was at first suspected to be related to Xen. A patch dating back to 2005 was pushed for Xen to fix a vmalloc_fault() path similar to the one Dave reported. The patch had a comment that read 'the line below does not always work. Needs investigating!' But it looks like this issue was never properly investigated. Because of the nature of the bug and the difficulty of tracking it down, testers might be finding multiple, similar bugs within the kernel. Linus even suggested taking a look at the watchdog code. He also concluded that the Xen bug was a different issue. The bug hunt continues on the Linux Kernel Mailing List."

14 of 257 comments

  1. Re: Upgrade to Windows for improved stability! by Anonymous Coward · · Score: 5, Funny

    It is a Genuine Advantage!

  2. Re:But guys... by multisync · · Score: 4, Insightful

    I thought open source software was supposed to be better because everyone could see the code and spot problems.

    It is, they can and do.

    --
    I don't care why you're posting AC
  3. Re:But guys... by itzly · · Score: 5, Insightful

    That's why having a bug is worthy of a news item.

  4. Come on Slashdot, get your news current by bruce_the_loon · · Score: 5, Informative

    The last mail in the thread, dated the 26th of November, explains that the Xen bug was a Xen bug and that the lockup was something different and traceable once the chap experiencing the bug managed to get a kernel backtrace.

    --
    Trying to become famous by taking photos. Visit my homepage please.
  5. Re:But guys... by jones_supa · · Score: 3, Interesting

    I thought open source software was supposed to be better because everyone could see the code and spot problems.

    Too often when I find a bug (and even investigate its actual cause as well as I can) and talk about it on a mailing list or bug tracker, it's just crickets chirping. No one stands up and properly takes responsibility for the issue. I understand very well that this might be due to a lack of developer resources, but it still results in bad software.

    I have started wondering if modern software is simply too complex to be developed in high quality with the resources (manpower and funding) that open source gets.

  6. Re:But guys... by itzly · · Score: 5, Insightful

    This is not unique to open source software. Closed source code is also complex and short of developers. Bugs that aren't reported by big customers are easily ignored.

  7. Re:But guys... by 0123456 · · Score: 3, Interesting

    In my experience, closed source software comes with far fewer bugs to begin with. With OSS, even some essential features can be glitchy or only partially implemented.

    While I'd agree that much open source software is just hacked together and shipped when it does everything the developers care about, most of the bugs in our software (not open source per se, but our customers get all our source so they can modify it if they want) are caused by third-party, closed-source libraries that we use because licensing them was much cheaper than writing the same code from scratch. I haven't seen a single crash in a year that wasn't due to third-party, closed-source code.

    And, financially, it still makes sense, because developing workarounds for their bugs is still cheaper than writing the code from scratch.

  8. Re:Have they checked systemd? by binarylarry · · Score: 5, Funny

    It's not systemd related, you can check by opening a termin

    --
    Mod me down, my New Earth Global Warmingist friends!
  9. Try a stable distro like RH/CentOS. Or Mac by raymorris · · Score: 4, Informative

    > First got into it ... because Linux was totally stable

    If stable is your top priority, Fedora is approximately the worst possible choice. Fedora is essentially Red Hat Beta. If you want stable, the devel / beta branch is not for you. You'll probably be much happier with Red Hat or its twin, CentOS.

    Also, you mentioned that you did an "upgrade" to Debian Unstable. You didn't mention any _reason_ for doing that. If stability is a top priority for you, don't upgrade just because you can; don't fix it if it ain't broke.

    Mac OS X may indeed be a good choice for you as well. It is certified Unix, and if you use the command line in Linux you'll find that day-to-day tasks are the same on a Mac. System internals are different, of course, but bash, sed, awk, grep, and vim work just as they do on Linux.

  10. Re: Upgrade to Windows for improved stability! by binarylarry · · Score: 4, Funny

    So are you saying they failed your genuine advantage check?

    --
    Mod me down, my New Earth Global Warmingist friends!
  11. Some actual information by Anonymous Coward · · Score: 5, Informative

    So it may be a "bad" lockup bug in the sense that nobody knows exactly what causes it, but it's not "bad" in the sense that people should worry overly.

    Why?

    Dave Jones sees it only under insane loads (CPU loads of 150+) running a stress tester that is designed to do crazy things (trinity). And he can reproduce it on only one of his machines, and even there it takes hours. And it happens on a debug kernel that has DEBUG_PAGEALLOC and other explicit (and complex) debug code enabled. And even then the bug is a "Hmm. We made no progress in the last 21 seconds", rather than anything stranger.

    In other words, it's "bad" in the sense that any unknown behavior is bad, but it's unknown mainly because it's so hard to trigger. Nobody other than core developers should really care. And those developers do care, so it's not worrisome there either. It just takes longer to figure out, because the usual "bisect it" approach isn't very practical when a single reproduction attempt can take a day.
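
    To put rough numbers on why a day-long reproduction hurts bisection: a binary search over N candidate commits needs about ceil(log2 N) good/bad test rounds, and each "good" verdict is only trustworthy after running the reproducer for as long as it normally takes to fail. The sketch below is illustrative only; the commit count and hours per test are hypothetical inputs, not figures from the thread, and bisect_steps/bisect_days are made-up helper names.

```python
import math


def bisect_steps(num_commits: int) -> int:
    """Worst-case number of good/bad test rounds a binary search needs
    to isolate one commit among num_commits candidates."""
    if num_commits <= 1:
        return 0
    return math.ceil(math.log2(num_commits))


def bisect_days(num_commits: int, hours_per_test: float) -> float:
    """Wall-clock days, assuming every round needs one full-length
    reproduction attempt before its result can be trusted."""
    return bisect_steps(num_commits) * hours_per_test / 24


# Hypothetical inputs: 10,000 candidate commits, 24 hours per test round.
print(bisect_steps(10_000))       # 14
print(bisect_days(10_000, 24.0))  # 14.0
```

    The logarithm keeps the number of rounds small, but the per-round cost is what dominates here. Worse, calling a build "good" too early sends the search into the wrong half, so each round cannot simply be cut short.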

  12. Re:Have they checked systemd? by lgw · · Score: 5, Funny

    I blame systemd anyhow. The growing use of systemd is also the primary cause of global warming, and the declining honeybee population.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  13. Re:But guys... by jones_supa · · Score: 3, Informative

    Have you ever compared enterprise-class software (I also count Windows 7 Enterprise) with OSS? Windows does not even reliably support suspend-to-RAM (STR) and resume. Using multiple monitors is a PITA.

    Suspend and multiple monitors have always worked great in Windows for me. Under Linux, they have also worked fine on some machines, but I have occasionally experienced serious problems in those areas. Recently I have found that even laptop screen brightness adjustment cannot be expected to work reliably out of the box under Linux.

  14. Bug name by Lost+Race · · Score: 5, Funny

    Since every bug this year needs to have a catchy name for the headlines, I propose we call this one "Davy Jones' Lockup."