Hope For Fixing Longstanding Linux I/O Wait Bug
DaGoodBoy writes "There has been a long-standing performance bug in Linux since 2.6.18 that has been responsible for lagging interactivity and poor system performance across all architectures. It has been notoriously difficult to qualify and isolate, but in the last few days someone has finally gotten a repeatable test case! Turns out the problem may not even be disk related, since the test case triggers the bug just by transferring data between two processes or threads. The test results are very revealing. The developer ran regressions all the way back to version 2.6.15, and they show that this bug has more than doubled the time needed to run the test in 2.6.28. Many, many people working at improving the desktop performance of Linux will be very happy to see this bug die. I know that I, personally, will find a way to send the guy that found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!"
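To make "transferring data between two threads" concrete, here is a minimal Python sketch of that kind of workload. It is NOT the test case attached to the bug report; the function name, the pipe-based transfer, and the sizes are all assumptions chosen for illustration. It pushes a fixed number of bytes from a writer thread to the main thread through a pipe and reports how long that took.

```python
import os
import threading
import time


def time_pipe_transfer(total_bytes=1 << 20, chunk_size=4096):
    """Send total_bytes from a writer thread to the caller through a pipe.

    Returns (bytes_received, elapsed_seconds). Hypothetical helper, not the
    test program from the kernel bug report.
    """
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        sent = 0
        try:
            while sent < total_bytes:
                n = min(chunk_size, total_bytes - sent)
                view = memoryview(chunk)[:n]
                # os.write may write fewer bytes than requested, so loop.
                while view:
                    written = os.write(write_fd, view)
                    view = view[written:]
                sent += n
        finally:
            # Closing the write end lets the reader see end-of-file.
            os.close(write_fd)

    start = time.perf_counter()
    thread = threading.Thread(target=writer)
    thread.start()

    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:  # empty read means the writer closed its end
            break
        received += len(data)

    thread.join()
    os.close(read_fd)
    return received, time.perf_counter() - start


if __name__ == "__main__":
    received, elapsed = time_pipe_transfer()
    print(f"moved {received} bytes in {elapsed:.4f} s")
```

Running the same script on different kernels and comparing the elapsed times would be the rough idea; the absolute numbers mean little on their own.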
Dang! I was going for First Post, but my machine was stuck in some weird I/O wait state.
When our name is on the back of your car, we're behind you all the way!
That's because you're not transferring data between yourself and another thread.
bugzilla.kernel.org?
The real "Libtards" are the Libertarians!
Anyone else notice the article 404ing from the front page? I'd say /. needs to fix some bugs/user errors rather than speak about a Linux IO latency most users don't even notice. Just an observation, and if you can read this, they either fixed it or you doctored up a query string like I did :D.
I'm not sure if this is related, but has anyone else noticed KTorrent can really bog your system down without showing any excessive resource usage in KSysGuard? For all I know, it may be passing information between one thread and another, and it's disk I/O intensive.
Been waiting all of 2 years and change for your precious bug fix, 'ave you? You almost had my eyes tearing up there I tell ya: 25 Year Old BSD Bug.
OS not fast enough? Just upgrade your hardware components, preferably to a new, top-of-the-line system.
Oh wait... that's the Windows way of doing things.
Yeah, I saw it too
I'm not sure about anybody else here, but I was surprised to see that they mentioned that this will benefit 'Desktop' users.
I think that when it comes to the performance spectrum, Servers would be where this fix is the most needed. Admittedly if you are running a solid server, you should know to use older gen hardware and software that has been proven to be stable. However, some of this 'shiny new' tech coming out is appealing.
How about the Seagate 1500GB drive hang error? To my understanding Windows has been fixed, but the problem still persists in Linux. Could this potentially make a difference? I've been looking to build myself a nice NAS and those 1500GB drives are _cheap_. I can pick one up for about $160. I remember not too long ago that could only get me 80GB.
Heh, well, I just hit the little link and then hit the link at the top to go back to the main topic... then sent an e-mail to /.
I'm sure kernel.org appreciates these links. Now instead of fixing the bug they're putting out fires in the data center...great job slashdot.
If this gets resolved, is there any chance the fix could get ported to Windows? I just had my Dad's XP laptop completely freeze after I plugged in a bog-basic USB thumbdrive. The desktop sprang to life only after I unplugged it. I wish some of the AC Windows fanboys who were hassling me here last week were around to see it. "Ready for the desktop" my ass.
I also don't notice any of the horrible problems you keep harping about with Windows. Funny that.
Will this make it in to Ubuntu 9.04?
wow, not just badsummary, utterly worthless summary. Here's the relevant discussion from LKML. Yes, this is all of it.
Peter Zijlstra
Andrew Morton
In http://bugzilla.kernel.org/show_bug.cgi?id=12309 the reporters have
identified what appears to be a sched-related performance regression.
A fairly long-term one - post-2.6.18, perhaps.
Testcase code has been added today. Could someone please take a look
sometime?
There appear to be two different bug reports in there. One about iowait,
and one I'm not quite sure what it is about.
The second thing shows some numbers and a test case, but I fail to see
what the problem is with it.
This somewhat deflates the excitement evident in the OP. I mean, I know what he's talking about, these apparently random 1-2 second FREEZES while working, but if the guys on LKML aren't talking about it, it's probably not really being worked on.
For more info see Karl Denninger's blog
It must also affect servers, because none of the links is transferring data either.
Kevin Smith on Prince
Don't buy Seagate!
'nuff said.
Linky taking too long to respond... oh wells :-)
That's because you're not transferring data between yourself and another thread.
But he is transferring data between himself and another sockpuppet.
I trrrrrrrrrrrrrrranssssssssfer data betwwwwwwwwwwwwwwwwwwwwwween threads alllllll the time......
Don't blame me, I voted for Baltar.
Give us a way to get your test program you've attached to the bug...
Hmm, I use 2.6.28 on a dual core, and it kinda, and eventually completely, freezes up whenever I try to burn an audio CD (audio only). I've tried different cdrecord versions, to no avail; a lower kernel, to no avail... It showed up recently after a kernel upgrade.
I am overjoyed that my suspicions have finally been vindicated. I've been working 10+ hours a day on Linux for the last 13 years and you tend to get in tune with your environment (I can still recite my DOS bootup tune on my XT even though I haven't worked on it for 20 years :-). Some time ago, after installing a new flavour of Linux, I immediately started complaining to fellow workers that something had gone wrong in the kernel, but it was not annoying enough to really do something about it; you start living with it. It manifests sometimes when I compile - my system simply locks up for 20-30 seconds, which is something I never experienced before. I'd say it happens once out of every 50 compiles of the same program with gcc. During such occurrences, I can't access anything on my desktop, which annoys me because I typically switch to another kterm session to prepare to run the build whilst compiling (to keep up the productivity and all that). I have also seen strange ratios of I/O wait to CPU in 'top' nowadays, but can probably ascribe that to CPUs that just became ridiculously fast and the way top calculates its scores. Nevertheless, I've grumbled about and lambasted I/O wait in Linux ever since a very specific time in the past, and even though I haven't noted the exact date, I'm sure it's related to this. Anyway, I found this intriguing enough to create a Slashdot account after years, to share my joy that the bug's days are hopefully numbered now.
Could this be the mystery freeze in Linux Mint that forces a hard reset??
For what it is worth, the problem is real.
We have experienced massive negative effects with our MySQL server; downgrading to an earlier Linux kernel solves the problem. This has been very difficult to debug as we never guessed that the OS would be a factor... we figured it had to be something we were doing. Only by chance did we try another distro / kernel, only to find that everything starts working fine when you downgrade.
From your tone: I assume that you will be sending this guy something of great value ?
Why do you not have the courage to say your name when you post ?
and here I was thinking that those pauses were because I had firefox open with >5 tabs for >1 hour.
If we had paid for the software, then the bug finder would be receiving 100k a year, and not just "appreciation"!!
Paul Sheer
Users notice this a ***lot***: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094
...when you insist on doing development in the 'stable' kernel tree and expect vendors to stabilise it.
Genius!
but he would not be receiving 100k a year from individuals that were happy with his work.
Quite a few of the core kernel developers actually _are_ paid to work on the kernel these days.
Whether or not the end users pay for it has nothing to do with whether or not the developers get paid.
In addition, if this test case has come from the community, then it would _never have happened_ if the kernel was not an open source project.
Advanced users are users too!
Why was the date of your birth so essential to UNIX OS's that they couldn't have made one until you dropped out your momma's clacker?
I know that I, personally, will find a way to send the guy that found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!
Is this your plan?
Yes, that's right. I've also noticed that kernel 2.2 on an older PC is still much faster than 2.6 kernel on a new one. Especially when compiling.
Another source of trouble is the increasing bloatedness in GCC. 2.95.3 is still a neat and fast compiler, while GCC 3 and GCC 4 are so slow and big. And then this constant fiddling with the C++ standard. You can't even write a C++ program these days and be sure that it will compile in 3-4 years. That sucks.
oh is that behaviour due to this bug???? because that was happening on my dad's ubuntu computer
http://bugzilla.kernel.org/show_bug.cgi?id=12309
davecb5620@gmail.com
The economy is "fixed" just fine on a regular basis, but people don't seem to like that.
Ask yourself this: people have been spending money on worldwide flights, exotic holidays, fancy houses, big cars, cinema outings, fast computers... Now, let's say you work 40 hours a week, and get $40 for that. So that's essentially 40 tokens for work done. If your wage is average, then other people get similar tokens for a similar week of work. Now, how long did that flight take to build/arrange/fly/repair, in man-hours (or tokens)? What about the hotels, and excursions, and beauty treatments? And the big car? And that movie you saw? And the fast computer?
Ignoring what's the norm... do you REALLY think your 40 tokens per week can buy all this? That, if there were no tokens, and you simply had to contribute work on the plane to get a free flight, had to help build the car to get a free car... do you REALLY think you'd have time? Because, essentially, anything those tokens get you that you couldn't have gotten with man-hours is borrowed time.
Unfortunately many humans preferred to be greedy and irresponsible with their resource use, gradually spending more than they have to get more than they need over decades as they slowly forget reality. That throws off the economy, and soon everyone's up the creek. Eventually, they realise it, and the whole thing crashes like a rollercoaster, down to much less than it's worth and less than is needed. Finally people start to get a sense of normalcy, buying what they need, and everything is good. Until they start to forget where the line is, and then become irresponsible and greedy again.
It's a vicious cycle, that'll never change, until people start to be more responsible and share a little rather than grabbing a lot. BUT, none of this should matter much, to a frugal person who buys what he needs, and saves when he can. Not everyone will be affected by these boom/bust times -- only those who ride the rollercoaster.
I think way too many people are blaming their issues on this bug. Some of them may be valid but others probably have something misconfigured or maybe it only affects certain hardware. I don't experience this bug. My interactivity does not suffer when I do anything I/O intensive.
Time makes more converts than reason
Don't hold your breath. Ubuntu is always behind in kernels. The earliest would be Q3 2009.
Please someone fix the damn economy for crissakes.
Ah, okay. I'll start coding that right away.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
This I/O scheduler was introduced as the default in 2.6.18 and available since 2.6.13. I wonder if that has something to do with it. I'm going to test it out on my home machines later today and have a look-see.
Supposedly it can be disabled and the AS scheduler can be used if you change it at runtime in /sys/block/hda/queue/scheduler, or use the "elevator=as" boot option.
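For anyone who wants to check which scheduler is active before trying this, here is a minimal Python sketch. It assumes the sysfs file lists the available schedulers on one line with the active one in square brackets (for example `noop anticipatory deadline [cfq]`). The function names and the default device `hda` are just illustrative, and switching needs root. On kernels of this era the sysfs name of the AS scheduler should be `anticipatory` rather than `as`, but go by whatever your own file lists.

```python
from pathlib import Path


def parse_schedulers(text):
    """Parse the contents of a queue/scheduler file.

    Returns (active, available): the bracketed name, and every listed name
    in order. active is None if nothing is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(device="hda"):
    # "hda" matches the comment above; substitute your own block device.
    return Path("/sys/block") / device / "queue" / "scheduler"


def read_schedulers(device="hda"):
    """Read and parse the scheduler list for one block device."""
    return parse_schedulers(scheduler_path(device).read_text())


def set_scheduler(name, device="hda"):
    """Switch the scheduler at runtime (requires root)."""
    _, available = read_schedulers(device)
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    scheduler_path(device).write_text(name + "\n")


if __name__ == "__main__":
    active, available = read_schedulers()
    print("active:", active, "available:", available)
```

The change made through sysfs does not survive a reboot; the boot option is the way to make it stick.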
Just disrupt the deflector shield with a tachyon burst.
In all fairness, the 20+% annual inflation we're going to see starting in 2010 will bring housing values back to their peaks by 2013.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
I bet your computer's cupholder even works right!
Yeah, but yours is just a duplex issue, been known about for years, and easily fixed.
Yes, they do. It's out there in a lot of naming permutations, with a lot of different causes - video, browser, disk, X, general high I/O, etc. I have personally run into this bastard a number of times.
Long-term US bonds are going to be unsellable internationally a year or two from now
Yeaaaaaaaahhhh...'cause you know, all the other countries in the world that sell long-term bonds have perfect economies. The interest rate paid by the U.S. Gov't might go up a little to increase demand, but people will still be buying U.S. treasuries for the foreseeable future. As in the rest of your life.
Advice: on VPS providers
You got free testing. Keep adding hardware and submitting /. stories until the system remains responsive...
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
The #1 holder (China) is cutting back on US Treasuries, instead "strongly encouraging" that money be lent inside China. Nobody else can take up the slack.
Sure it can. If China were to dump all of their US Treasuries on the market tomorrow, you'd see the effective interest rate jump up a couple points and they'd be sold out to other investors, likely within the same day if not within hours. There is always a demand for high-quality bonds.
You've obviously not kept up with events. For a year now, the US has been under attack over its AAA credit rating. This was BEFORE the market meltdown, etc.
From January 10th, 2008: http://www.reuters.com/article/bondsNews/idUSN1017237120080110
Since then, we've had a deficit that's ballooning, revenues dropping like a stone, unemployment going up up and away despite a fed rate of zero%, the sub-prime crisis now is calculated to affect at least 17% of ALL mortgages in the US,
Interest rates will have to go up a LOT to compensate for the inflationary effect of printing up all that new deficit spending. Do you really want to return to the days of 20% prime rate interest, like in April, 1981? Or 21.5% in December of 1980?
From October 14th, 1978 to May 20th, 1985, even the most credit-worthy couldn't get loans below 10%. How many people with prime mortgages can afford a 15% mortgage? How many businesses are viable if they have to pay 18% interest on their loans and bonds?
No, iluvcapra was having a stroke.
Or, alternatively, it could be that the US dollar should be massively devalued relative to its current state and it's only the frozen credit markets and the need to build reserves that are keeping it propped up. The dollar would be the new peso by now if it wasn't for that.
I'll have a bajillion dollars thanks!
Initially with VMWare 5.0 workstation on a Linux host running 2.6.17.13, it was possible to suspend and resume the guest OS relatively quickly. I don't know what has become broken, but now with workstation 6.5 it can take up to 10 minutes to do the same job on the same architecture.
My experience shows that processors with frequency scaling are affected the most. This points to the bug being either related to how jobs are scheduled or how the kernel deals with large files (VM disk images are large files). However, I tested a lot of different combinations with the scheduler (settling on Anticipatory as best); therefore, the bug may really be with some IO block as the submitter has indicated.
With the latest wrong-headed bailouts (Merrill Lynch) diverting even more capital to propping up bad investments and bad actors, inflating its value away is inevitable.
The government should have allowed the failures, kept its powder dry, then moved in after the market correction to help pick up the pieces. It would have been cheaper and more effective.
Oh my god. I've been suffering so much with Linux lately. Performance really sucks right now. :-(
Meanwhile, I've taken my good 5-year-old PC out of storage. That one still runs with a 2.0 kernel. Man, it's so fast. It's a totally different experience than all this "heavy-iron" lately.
Linus, please save us! Linux has gotten as slow and bloated as Windows lately...