Slashdot Mirror


Bad Lockup Bug Plagues Linux

jones_supa (887896) writes "A hard-to-track system lockup bug seems to have appeared within the last couple of Linux kernel releases. Dave Jones of Red Hat was the first to report frequent lockups with 3.18. Later he found that the issue is present in 3.17 too. The problem was first suspected to be related to Xen. A patch dating back to 2005 was pushed for Xen to fix a vmalloc_fault() path that was similar to what Dave reported. The patch had a comment that read 'the line below does not always work. Needs investigating!' But it looks like this issue was never properly investigated. Because the bug is so difficult to track down, testers might be finding multiple similar bugs within the kernel. Linus even suggested taking a look at the watchdog code. He also concluded that the Xen bug is a different issue. The bug hunt continues on the Linux Kernel Mailing List."

5 of 257 comments

  1. Come on Slashdot, get your news current by bruce_the_loon · · Score: 5, Informative

    The last mail in the thread, dated the 26th of November, explains that the Xen bug was a Xen bug and that the lockup was something different and traceable once the chap experiencing the bug managed to get a kernel backtrace.

    --
    Trying to become famous by taking photos. Visit my homepage please.
  2. Try a stable distro like RH/CentOS. Or Mac by raymorris · · Score: 4, Informative

    > First got into it ... because Linux was totally stable

    If stable is your top priority, Fedora is approximately the worst possible choice. Fedora is essentially Red Hat Beta. If you want stable, the devel / beta branch is not for you. You'll probably be much happier with Red Hat or its twin, CentOS.

    Also, you mentioned that you did an "upgrade" to Debian Unstable. You didn't mention any _reason_ for doing that. If stability is a top priority for you, don't upgrade just because you can; if it ain't broke, don't fix it.

    Mac OS X may indeed be a good choice for you also. It is certified Unix, and if you use the command line in Linux you'll find that day-to-day tasks are the same on a Mac. System internals are different of course, but bash, sed, awk, grep, and vim work just like they do on Linux.

    1. Re:Try a stable distro like RH/CentOS. Or Mac by Anonymous Coward · · Score: 0, Informative

      > Fedora is essentially Red Hat Beta. If you want stable, the devel / beta branch is not for you.

      Netflix runs FreeBSD alpha in production. FreeBSD's new multi-core friendly firewall changes were run in production on a core router that had multiple 10Gb interfaces, in alpha for almost a year before entering beta. A lot of FreeBSD devs are sysadmins for some large datacenters, and many run FreeBSD current (alpha) in prod. Sounds like Linux distros' "beta" is worse than FreeBSD's "alphas".

  3. Some actual information by Anonymous Coward · · Score: 5, Informative

    So it may be a "bad" lockup bug in the sense that nobody knows exactly what causes it, but it's not "bad" in the sense that people should worry overly.

    Why?

    Dave Jones sees it only under insane loads (CPU loads of 150+) running a stress tester that is designed to do crazy things (trinity). And he can reproduce it on only one of his machines, and even there it takes hours. And it happens on a debug kernel that has DEBUG_PAGEALLOC and other explicit (and complex) debug code enabled. And even then the bug is a "Hmm. We made no progress in the last 21 seconds", rather than anything stranger.

    In other words, it's "bad" in the sense that any unknown behavior is bad, but it's unknown mainly because it's so hard to trigger. Nobody other than core developers should really care. And those developers do care, so it's not like it's worrisome there either. It just takes longer to figure out because the usual "bisect it" approach isn't very easy when it can take a day to reproduce.

  4. Re:But guys... by jones_supa · · Score: 3, Informative

    > Have you ever compared enterprise class software (I also count Windows 7 Enterprise) with OSS Software? Windows does not even reliably support STR and resume. Using multiple monitors is a PITA.

    Suspend and multiple monitors have always worked great in Windows for me. Under Linux, they have also worked fine on some machines, but I have occasionally experienced serious problems in those areas. Recently I have found that even laptop screen brightness adjustment cannot be expected to work reliably out of the box under Linux.