Slashdot Mirror


Hope For Fixing Longstanding Linux I/O Wait Bug

DaGoodBoy writes "There has been a long-standing performance bug in Linux since 2.6.18 that has been responsible for lagging interactivity and poor system performance across all architectures. It has been notoriously difficult to qualify and isolate, but in the last few days someone has finally gotten a repeatable test case! Turns out the problem may not even be disk related, since the test case triggers the bug just by transferring data between two processes or two threads. The test results are very revealing. The developer ran regression tests all the way back to version 2.6.15, and they show that the time to run the test has more than doubled by 2.6.28. Many, many people working on improving the desktop performance of Linux will be very happy to see this bug die. I know that I, personally, will find a way to send the guy who found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!"
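For readers wondering what a test of this kind might look like, here is a minimal sketch, not the test case attached to the bug report. It assumes that timing a fixed amount of data pushed through a pipe between two threads is a fair stand-in for "transferring data between two processes or two threads". The function name `time_pipe_transfer` and the default sizes are made up for illustration.

```python
import os
import threading
import time


def time_pipe_transfer(total_bytes=16 * 1024 * 1024, chunk_size=64 * 1024):
    """Push total_bytes through a pipe from a writer thread to the caller.

    Returns (bytes_received, elapsed_seconds). Hypothetical helper, not
    the code from the kernel bug report.
    """
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested, so count
                # what was actually accepted by the pipe.
                written = os.write(write_fd, chunk[:min(chunk_size, remaining)])
                remaining -= written
        finally:
            # Closing the write end lets the reader see end-of-file.
            os.close(write_fd)

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()

    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:  # empty read means the writer closed its end
            break
        received += len(data)

    thread.join()
    elapsed = time.perf_counter() - start
    os.close(read_fd)
    return received, elapsed


if __name__ == "__main__":
    received, elapsed = time_pipe_transfer()
    print(f"moved {received} bytes in {elapsed:.3f} s")
```

Running the same script on machines booted into different kernel versions and comparing the elapsed times would be one rough way to look for a regression of the sort described, though a single run says little on its own.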

4 of 180 comments

  1. Dang!! by camperdave · · Score: 5, Funny

    Dang! I was going for First Post, but my machine was stuck in some weird I/O wait state.

    --
    When our name is on the back of your car, we're behind you all the way!
  2. this is bad even for /. by Harik · · Score: 5, Informative

    Wow, not just badsummary, an utterly worthless summary. Here's the relevant discussion from LKML. Yes, this is all of it.

    Peter Zijlstra, replying to Andrew Morton:

    > In http://bugzilla.kernel.org/show_bug.cgi?id=12309 the reporters have
    > identified what appears to be a sched-related performance regression.
    > A fairly long-term one - post-2.6.18, perhaps.
    >
    > Testcase code has been added today. Could someone please take a look
    > sometime?

    There appear to be two different bug reports in there. One about iowait,
    and one I'm not quite sure what it is about.

    The second thing shows some numbers and a test case, but I fail to see
    what the problem is with it.

    This somewhat deflates the excitement evident in the OP. I mean, I know what he's talking about, these apparently random 1-2 second FREEZES while working, but if the guys on LKML aren't talking about it, it's probably not really being worked on.

  3. I second this by waslap · · Score: 5, Interesting

    I am overjoyed that my suspicions have finally been vindicated. I've been working 10+ hours a day on Linux for the last 13 years, and you tend to get in tune with your environment (I can still recite the bootup tune of my DOS XT today, even though I haven't worked on it for 20 years :-). Some time ago, after installing a new flavour of Linux, I immediately started complaining to fellow workers that something had gone wrong in the kernel, but it was not annoying enough to really do something about it; you start living with it. It sometimes manifests when I compile: my system simply locks up for 20-30 seconds, which is something I had never experienced before. I'd say it happens about once in every 50 compiles of the same program with gcc. During such occurrences I can't access anything on my desktop, which annoys me because I typically switch to another kterm session to prepare to run the build while it compiles (to keep up the productivity and all that). I have also seen strange ratios of I/O to CPU wait in 'top' nowadays, but I can probably ascribe that to CPUs that just became ridiculously fast and to the way top calculates its scores. Nevertheless, I've mumbled over and lambasted I/O wait in Linux ever since a very specific time in the past, and even though I didn't note the exact date, I'm sure it's related to this. Anyway, I found this intriguing enough to create a Slashdot account after years to share my joy that the bug's days are hopefully numbered now.

  4. Problem is Real by Anonymous Coward · · Score: 5, Informative

    For what it is worth, the problem is real.

    We have experienced massive negative effects with our MySQL server; downgrading to an earlier Linux kernel solves the problem. This has been very difficult to debug, as we never guessed that the OS would be a factor... we figured it had to be something we were doing. Only by chance did we try another distro / kernel, only to find that everything starts working fine when you downgrade.