The State of Linux IO Scheduling For the Desktop?
pinkeen writes "I've used Linux as my work & play OS for 5+ years. The one thing that constantly drives me mad is its IO scheduling. When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%. The process which does the actual copying is highly prioritized in terms of I/O. This is completely unacceptable for a desktop OS. I've heard about the efforts of Con Kolivas and his Brainfuck Scheduler, but it's unsupported now and probably incompatible with the latest kernels. Is there any way to fix this? How do you deal with this? I have a feeling that if this issue were to be fixed, the whole desktop would become way snappier, even if you're not doing any heavy IO in the background."
Update: 10/23 22:06 GMT by T : As reader ehntoo points out in the discussion below, contrary to the submitter's impression, "Con Kolivas is still actively working on BFS, it's not unsupported. He's even got a patch for 2.6.36, which was only released on the 20th. He's also got a patchset out that I use on all my desktops which includes a bunch of tweaks for desktop use." Thanks to ehntoo, and hat tip to Bill Huey.
This issue got so bad for me I switched to FreeBSD.
Isn't this also relevant when using Linux on a server? I mean, if one process or thread is copying a large file, you don't want your server to come to a crawl.
It doesn't sound like just a "desktop" issue to me.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
have you tried ionice?
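For anyone who hasn't: a minimal sketch of wrapping a copy so it runs in the idle I/O class, which is what `ionice -c 3` selects. The helper names and the example paths are made up for illustration, and as far as I know the I/O class only has an effect when the CFQ scheduler is in use.

```python
import shutil
import subprocess

def idle_io_command(command):
    """Prefix a command so it runs in the idle I/O class (ionice -c 3)."""
    return ["ionice", "-c", "3", *command]

def run_idle(command):
    """Run the command under ionice if it is installed, else run it unchanged."""
    if shutil.which("ionice") is None:
        return subprocess.run(command, check=True)
    return subprocess.run(idle_io_command(command), check=True)

# Hypothetical paths, for illustration only:
# run_idle(["cp", "big.iso", "/mnt/backup/"])
```

An already running process can be moved to the idle class with `ionice -c 3 -p <pid>`.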
Do you even lift?
These aren't the 'roids you're looking for.
..download and compile the 2.6.36 kernel. A feature article on the changes can be found at http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-36-1103009.html
A very very easy to follow guide can be found at http://kernel.net/articles/how-to-compile-linux-kernel.html
Sidenote - What is up with not being able to paste links? That's annoying.
Con Kolivas is still actively working on BFS, it's not unsupported. He's even got a patch for 2.6.36, which was only released on the 20th. http://ck.kolivas.org/patches/bfs/ He's also got a patchset out that I use on all my desktops which includes a bunch of tweaks for desktop use. http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/
If the CPU utilization is that low, it's an I/O scheduling problem. See Linux I/O scheduling.
The CFQ scheduler is supposed to be a fair queuing system across processes, so you shouldn't have a starvation problem. Are you thrashing the virtual memory system? How much I/O is going into swapping? (Really, today you shouldn't have any swapping; RAM is too cheap and disk is too slow.)
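One rough way to answer the swapping question is to watch the `pswpin` and `pswpout` counters in /proc/vmstat while the copy runs. A minimal sketch, assuming a Linux /proc; the function names are my own:

```python
def swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)

def read_swap_counters(path="/proc/vmstat"):
    """Read the live counters; the path is the usual Linux location."""
    with open(path) as f:
        return swap_counters(f.read())
```

The counters are cumulative since boot, so take two readings a few seconds apart. If they climb during the copy, the box is swapping.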
I've wondered on occasion if this problem is really only due to scheduling. After all, most of us still write our file access code more or less as follows:
FILE *x = fopen("somefilename", "r");
char buf[1024];
while (fgets(buf, sizeof buf, x) != NULL) { /* ---- */
    fputs(buf, stdout);
}
fclose(x);
Point being, there's nothing that tells the marked line that the process should gracefully go to sleep while the drive is doing its thing, and there's no callback vector defined either: nothing that indicates we're dealing with non-blocking I/O. I'd like to think that our compilers have silently been improved to hide those implementation details from us, but I have no proof that this is the case. Unless the system functions use some dirty stack manipulation voodoo to extract the return address of the function and use that as callback vector?
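For comparison, here is a minimal sketch of what a read with an explicit callback could look like, using a worker thread so the caller is not held up. The function names are hypothetical. As far as I know, a plain blocking read already puts the calling thread to sleep in the kernel while the drive works; what it lacks is the callback.

```python
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    """Blocking read; runs on a worker thread, so only that thread waits."""
    with open(path, "rb") as f:
        return f.read()

def read_with_callback(pool, path, on_done):
    """Start the read and return at once; on_done(data) fires when it completes."""
    future = pool.submit(read_file, path)
    future.add_done_callback(lambda fut: on_done(fut.result()))
    return future
```

The caller creates a `ThreadPoolExecutor`, passes it in, and carries on with other work while the read is in flight.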
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Sidenote - What is up with this comment not showing up when I wasn't registered. That's stupid and annoying.
It did. Now who's stupid and annoying? I mean, besides me.
When you're afraid to download music illegally in your own home, then the terrorists have won!
I can remember that even as far back as 1999 I saw this issue with Linux. This is bad not only for the desktop, but also for the server. I also have experience with Solaris workstations and servers, and they usually don't behave this way.
I remember using OS/2 (IBM's desktop OS) and I was always amazed that you could format a floppy and do other tasks like nothing else was going on. I never did understand why that never seemed to make it into the mainstream.
That was a joke, right? You don't really think that all the millions of desktop Linux users just up and vanished because some idiot at PCWorld wanted a catchy headline?
This is not a case of Linux IO schedulers being unsuitable for the desktop, but more a case of desktop applications being written in a horrendous way in terms of data access. The general pattern is to open a file object, load a few hundred kilobytes, process them, then ask the operating system for more. This is a small inefficiency when the resource is doing nothing, but if the disk is actually busy, then it will probably be doing something else by the time you ask it to read a little bit more. Not to mention the habit of reading through a few hundred resource files one at a time in seemingly random order, and blocking every time it reads, because the application programmer is too lazy to think about what resources the app is using.
Linux has such a nice implementation of mmap, which works by letting Linux actually know ahead of time what files you are interested in and managing them itself, without the application programmer worrying his pretty little head over it. Other options are running multiple non-blocking reads at the same time and loading the right amount of data and the right files to begin with.
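A minimal sketch of the mmap approach: map the whole file once and let the kernel page it in, instead of looping over small read() calls. The function name is made up, and counting newlines is just a stand-in for whatever processing the app does.

```python
import mmap

def count_lines_mmap(path):
    """Count newline bytes by mapping the file instead of reading it in chunks."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count
```

The kernel sees the whole mapping up front, so it can decide for itself how much to read ahead.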
The best thing about a simple CSCAN algorithm is that it gives applications what they asked for and if the application doesn't know what it wants, well, that's hardly a system issue.
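For anyone who hasn't met it, C-SCAN services requests in one direction only: sweep upward from the current head position, then jump back and sweep upward again. A toy sketch, with request positions as plain integers (an assumption for illustration, not how the kernel represents them):

```python
def cscan_order(head, requests):
    """Service order under C-SCAN: sweep upward from head, then wrap to the lowest."""
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted(r for r in requests if r < head)
    return ahead + behind
```

With the head at 50 and requests at 82, 17, 43, 140, 24, 16 and 190, the order is 82, 140, 190, then 16, 17, 24, 43 after the wrap.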
When Argumentum ad Hominem falls short, try Argumentum ad Matrem
This is almost certainly not the IO scheduler's problem. IO scheduling priorities are orthogonal to CPU scheduling priorities.
What you are likely running into is the dirty_ratio limits. In Linux, there is a memory threshold for "dirty memory" (memory that is destined to be written out to disk), that once crossed, will cause symptoms like you've described. The dirty_ratio values can be tuned via /proc, but beware that the kernel will internally add its own heuristics to the values you've plugged in.
When the threshold is crossed, in an attempt to "slow down the dirtiers", the Linux kernel will penalize (in rate-limited fashion) any and every task on the system that tries to allocate a page. This allocation may be in response to userland needing a new page, but it can also occur if the kernel is allocating memory for internal data structures in response to a system call the process did. When this happens, the kernel will force that allocating thread (again, rate-limited) to take part in the flushing process, under the (misguided) assumption that whoever is allocating a lot of memory is the same thread that is dirtying a lot of memory.
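The thresholds in question can be inspected under /proc/sys/vm. A minimal sketch that reads `dirty_ratio` and `dirty_background_ratio` (both are percentages); the helper name is my own, and it simply skips any knob it can't read:

```python
import os

DIRTY_KNOBS = ("dirty_ratio", "dirty_background_ratio")

def read_dirty_knobs(base="/proc/sys/vm"):
    """Return {knob: int} for each knob file present under base."""
    values = {}
    for name in DIRTY_KNOBS:
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # knob missing or unreadable on this system
    return values
```

Changing the values requires root, and the kernel's own heuristics still apply on top of whatever is written there.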
There are a couple of ways to work around this problem (which is very typical when copying large amounts of data). For one, the copying process can be fixed to rate limit itself, and to synchronously flush data at some reasonable interval. Another way that a system administrator can manage this sort of task (if automated of course) is to use Linux's support for memory controllers, which essentially isolates the memory subsystem performance between tasks. Unfortunately, its support is still incomplete and I don't know of any popular distributions that automate this cgroup subsystem's use.
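A minimal sketch of the first workaround: a copy loop that flushes to disk at a fixed interval and optionally sleeps after each flush. The function name and the default sizes are illustrative assumptions, not tuned values.

```python
import os
import time

def throttled_copy(src, dst, chunk_size=1 << 20, sync_every=64 << 20, pause=0.0):
    """Copy src to dst, forcing a flush to disk every sync_every bytes."""
    unsynced = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            unsynced += len(chunk)
            if unsynced >= sync_every:
                fout.flush()             # push Python's buffer to the kernel
                os.fsync(fout.fileno())  # wait until the kernel has written it out
                unsynced = 0
                if pause:
                    time.sleep(pause)    # crude rate limit
        fout.flush()
        os.fsync(fout.fileno())
```

Because the copier waits on its own writes, the amount of dirty memory it can build up stays bounded by roughly `sync_every` bytes.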
Either way, it is very unlikely to be the IO scheduler.
I'm using Ubuntu 10.04 on an old Dell and big copies don't seem to slow it down any more than I'd expect on an old machine, either when copying to an external USB backup (with rsync) or over the net to my office systems (via scp). Serious slowdown would seem to indicate something deeper is wrong.
FYI, the IO scheduler and the CPU scheduler are two completely different beasts.
The IO scheduler lives in block/cfq-iosched.c and is maintained by Jens Axboe, while the CPU scheduler lives in kernel/sched*.c and is maintained by Peter Zijlstra and myself.
The CPU scheduler decides the order of how application code is executed on CPUs (and because a CPU can run only one app at a time the scheduler switches between apps back and forth quickly, giving the grand illusion of all apps running at once) - while the IO scheduler decides how IO requests (issued by apps) reading from (or writing to) disks are ordered.
The two schedulers are very different in nature, but both can indeed cause similar looking bad symptoms on the desktop though - which is one of the reasons why people keep mixing them up.
If you see problems while copying big files then there's a fair chance that it's an IO scheduler problem (ionice might help you there, or block cgroups).
I'd like to note for the sake of completeness that the two kinds of symptoms are not always totally separate: sometimes problems during IO workloads were caused by the CPU scheduler. It's relatively rare though.
Analysing (and fixing ;-) such problems is generally a difficult task. You should mail your bug description to linux-kernel@vger.kernel.org and you will probably be asked there to perform a trace so that we can see where the delays are coming from.
On a related note I think one could make a fairly strong argument that there should be more coupling between the IO scheduler and the CPU scheduler, to help common desktop use cases.
Incidentally there is a fairly recent feature submission by Mike Galbraith that extends the (CPU) scheduler with a new feature which adds the ability to group tasks more intelligently: see Mike's auto-group scheduler patch
This feature uses cgroups for block IO requests as well.
You might want to give it a try, it might improve your large-copy workload latencies significantly. Please mail bug (or success) reports to Mike, Peter or me.
You need to apply the above patch on top of Linus's very latest tree, or on top of the scheduler development tree (which includes Linus's latest), which can be found in the -tip tree
(Continuing this discussion over email is probably more efficient.)
Thanks,
Ingo
He tried that before. I think he's given up on getting his scheduler (though perhaps not a suspiciously similar one written by Inigo) in the kernel after what happened with CFQ.
I am trolling
I would definitely ditch an OS that fucked up a file copy because I used the computer for something else while I was waiting.
That's great that you post your experiences with server scheduling in a topic about desktop scheduling. It's so relevant. No wait, it's not.
-- Linux user #369862
That was a joke, right? You don't really think that all the millions of desktop Linux users just up and vanished because some idiot at PCWorld wanted a catchy headline?
StatCounter provides a global breakdown of OS market share by region and country.
It is something of a wake-up call when you look at these numbers and compare them to the endless stream of Linux success stories posted to Slashdot.
You are joking, right? (OP) I'm using Debian, and I routinely copy TB(s) of data from hard drive to hard drive via SATA and/or USB 2.0, and though the USB transfer speed is fairly slow, my system doesn't slow appreciably at all.
Weird, I've been using Linux for 10 years now, and one thing Linux does really well is move large amounts of data around without killing the system (usability).
Lotsa RAM is your friend, also make sure your / filesystem isn't the hdd that you're moving the data from/to or vice-versa. That does slow access down a bit.
jaz
Life is what happens to you while you are busy making other plans. No-one sees motorcycles
Cite? What exactly is the difference between "public" and "kernel"?
If all processes see the same 1G the distinction isn't meaningful, especially in this context.
I've encountered situations where I'm trying to do something online and a task starts up due to a cron job that builds some kind of index. The index building should be in the background but somehow takes priority over what I'm doing on the desktop. Those kinds of cron jobs should be default scheduled in the background, not take priority over what is happening on the desktop.
You're insane. If your computer ever silently drops "a few bits" while copying stuff, there's something seriously wrong with your OS or your hardware, and things will break whether or not you're using the computer while copying. You might as well sacrifice a chicken to make sure the data transfer works, it'll have about the same effect.
Switch back to Slashdot's D1 system.
That's great that you post your experiences with server scheduling in a topic about desktop scheduling. It's so relevant. No wait, it's not.
The boundary between the desktop space and the server space is rather fluid, and many of the problems visible on servers are also visible on desktops - and vice versa.
For example 'copying a large amount of data' on a server is similar to 'copying a big ISO on the desktop'. If the kernel sucks doing one then it will likely suck when doing the other as well.
So both cases should be handled by the kernel in an excellent fashion - with an optimization/tuning focus on desktop workloads, because they are almost always the more diverse ones, and hence are generally the technically more challenging cases as well.
Thanks,
Ingo
Generally Windows runs badly without a swap. Don't listen to people who tell you to disable it. You should have a swap file on Windows no matter how much memory you have.
Tweakers who don't really understand anything about Windows paging often conclude turning off the swap is a good idea, because they only run trivial applications and don't experience certain memory-backed I/O operations failing with it off. They do see an initial speed boost though. The reason is NT is very pessimistic about memory. Windows assumes you will need to page out to disk. It therefore flushes the set of static pages to disk almost right away. This is why there is so much more disk thrashing on Windows than say Linux when you start an application and plenty of memory is free. It will do its best to keep the working set out of the page file of course. This does give Windows a performance advantage under memory pressure however. When there is not enough memory to start a new application Windows can just drop the pages from memory of the application being paged out without the need to flush them to disk because they are already there; Linux will need to write those pages.
Windows boxes (desktops anyway) tend to have large numbers of processes running in the background, so they usually are under that memory pressure.
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
I would definitely not let a monkey like you get near my computers if some intense file copy was going on and they wanted to start doing other things while that was going on, sure you can do it but that does not make it a prudent thing to do, and the file may copy over just fine, and it may lose a few bits without even reporting any errors and that can happen on any OS, BSD, Linux, Winders & etc...etc...etc...
You sir, are a perfect specimen of a BOFH. You only have a dim notion of what actually goes on inside those mysterious boxes that are unfortunately left under your care. And yet, by some curious accident of nature, you've been entrusted with root passwords for said boxes. You use phrases like "intense file copy" like they mean anything. You place every idiotic restriction that you can think of on the users of said boxes (who, incidentally, are almost always smarter and more qualified than you in whatever field of work they're in) by using words like "prudent" and "safety"... or god forbid... "security". You actually think that because I run a second program along with your "intense" copy, it can result in loss of "a few bits without even reporting any errors" due to what ? The magical fairies that dance inside those little chips getting angry ? Tired ? Can you do everybody a favor and reduce the amount of utter nonsense emanating out of that tiny, befuddled brain ?
He tried that before. I think he's given up on getting his scheduler (though perhaps not a suspiciously similar one written by Inigo) in the kernel after what happened with CFQ.
One reason why the principle of CFS may seem to you so suspiciously similar to Con's SD scheduler is that I used Con's fair scheduling principle when writing the initial version of CFS. This is credited at the very top of today's kernel/sched.c [the scheduler code]:
* 2007-04-15 Work begun on replacing all interactivity tuning with a
* fair scheduling design by Con Kolivas.
It was added in this commit.
The implementations (and even the user-visible behavior) of the schedulers were and are very different, and that is where much of the disagreement and later flaming came from.
Note that this particular Slashdot article is about IO scheduling though - which is unrelated to CPU schedulers. Neither Con nor I wrote IO schedulers.
There are two main IO schedulers in Linux right now: CFQ and AS, written by Jens Axboe, Nick Piggin, et al.
What adds fuel to the confusion is that it is relatively easy to mix up 'CFQ' with 'CFS'.
Thanks,
Ingo
AFAIK there are only two I/O schedulers remaining in recent Linux (and if you squint you might say that RHEL 5's kernel could have been related to 2.6.34 at one point right? :) - CFQ and deadline (three if you count noop I guess). The anticipatory scheduler was removed in 2.6.33...
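To see which of these a given kernel offers, and which one is active, read /sys/block/<device>/queue/scheduler; the active scheduler is shown in square brackets. A minimal sketch of parsing that line, where the function names and the default device `sda` are assumptions:

```python
def parse_schedulers(line):
    """Split a sysfs scheduler line into (available, active)."""
    available, active = [], None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return available, active

def read_schedulers(device="sda"):
    """Read the scheduler line for one block device from sysfs."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_schedulers(f.read())
```

A line such as `noop deadline [cfq]` means all three are built in and CFQ is the one in use.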
I wanted to write a lengthy rebuttal here explaining how computers work, but my computer is busy with a seriously *intense* copy right now so I don't want to chance it.
Did you even read the summary? He specifically points out where desktop I/O has different requirements from server I/O: "When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%." So I think he's talking about things like video playback, web browsing, and general UI responsiveness--things that 100% do not matter on a server.
I've noticed this myself--start a complex task and all of a sudden the UI becomes really jerky. If I'm trying to multitask and some mundane task is making the whole UI slow, that's bad. If it takes me 10 seconds to do something with an unresponsive UI instead of 5 just so a bunch of files can copy in 1:00:00 instead of 1:00:01, that's bad.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
No, massive unfairness is just as bad on the server as it is on the desktop - in all but a few select batch processing situations.
Replace 'desktop' with 'database', 'Apache', 'Samba' or 'number crunching job' and you get the same kind of badness.
There's not much difference really. If it sucks on the desktop then it sucks on the server too: why would it be a good server if it slows down a DB/Apache/Samba/number-crunching-job while prioritizing some large copy operation?
I have 8GB of RAM, why do I need swap?
So you can use that 8 gigs for something else, or not buy 8 gigs in the first place. In particular, so when a program is using large amounts of memory for no good reason, it can be swapped out, maybe even just for disk cache.
Also, hibernate.
Don't thank God, thank a doctor!
There's lots of server workloads that involve large IO requests:
- backups
- DB startup/shutdown
- DB traffic that generates or reads a lot of new data (say report generation)
- HPC workloads that work with huge data sets
- animation farms that work with huge images/movies
- web servers streaming out big files
- fsck
- virtual desktop servers where the desktops are virtual instances running on the server. There any IO load within that 'desktop' runs on the server.
etc. There is also a fair number of server workloads that are IO heavy but which use small IO requests.
If you have those big files in networked storage or if you are backing them up to some network host then you've already transformed those kinds of IO requests into big IO requests on the server side as well: the big file you read or write on the desktop the network file/backup server will read/write from its own disks, etc.
Really, "interactivity sucks during big IO" kind of bugs can hurt servers just as much as they can hurt desktops. The boundary between desktops and servers is very fluid.
CowboyNeal came to my house yesterday and sat on the sofa and had a beer like it was just a basic day.
He even said I should work on my interior decoration - "Those empty white walls aren't very pretty." It sucked, so don't come telling me Slashdot hasn't violated my privacy. :)
I often note that multiple simultaneous low-priority file copies implemented as:
run faster than multiple simultaneous high-priority copies implemented as:
If the copies are run one at a time, the higher priority rsync runs faster. For multiple copies, often the lower priority rsyncs run faster. Also, desktop usability is much improved with the lower priority rsyncs.
I suspect a priority inversion occurs inside the file system's write-back cache. At regular priority levels, data is not written back to disk in a timely manner. The ionice -c 3 setting gives the disk caches a higher priority than the rsync I/O commands, preventing the I/O commands from filling the cache and creating a priority inversion.
The Gnome GUI in Ubuntu is particularly vulnerable to this priority inversion, as by default it does multiple copies simultaneously inside a separate window. Ubuntu usually performs better than Windows however. Between the A-V software in Windows, and the tendency to swap applications out of memory to maximize disk cache, Windows usually performs the same copy operations more slowly than Ubuntu and with less system responsiveness.