The State of Linux IO Scheduling For the Desktop?
pinkeen writes "I've used Linux as my work & play OS for 5+ years. The one thing that constantly drives me mad is its IO scheduling. When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%. The process which does the actual copying is highly prioritized in terms of I/O. This is completely unacceptable for a desktop OS. I've heard about the efforts of Con Kolivas and his Brainfuck Scheduler, but it's unsupported now and probably incompatible with latest kernels. Is there any way to fix this? How do you deal with this? I have a feeling that if this issue were fixed, the whole desktop would become much snappier, even when you're not doing any heavy IO in the background."
Update: 10/23 22:06 GMT by T : As reader ehntoo points out in the discussion below, contrary to the submitter's impression, "Con Kolivas is still actively working on BFS, it's not unsupported. He's even got a patch for 2.6.36, which was only released on the 20th. He's also got a patchset out that I use on all my desktops which includes a bunch of tweaks for desktop use." Thanks to ehntoo, and hat tip to Bill Huey.
use a real OS you cock-smoking faggot!
This issue got so bad for me I switched to FreeBSD.
Isn't this also relevant when using Linux on a server? I mean, if one process or thread is copying a large file, you don't want your server to come to a crawl.
It doesn't sound like just a "desktop" issue to me.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
Have you tried ionice?
Do you even lift?
These aren't the 'roids you're looking for.
...download and compile the 2.6.36 kernel. A rundown of the changes can be found at http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-36-1103009.html
A very easy-to-follow guide can be found at http://kernel.net/articles/how-to-compile-linux-kernel.html
Sidenote - What is up with not being able to paste links? That's annoying.
Con Kolivas is still actively working on BFS, it's not unsupported. He's even got a patch for 2.6.36, which was only released on the 20th. http://ck.kolivas.org/patches/bfs/ He's also got a patchset out that I use on all my desktops which includes a bunch of tweaks for desktop use. http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/
Perhaps if Con Kolivas named his scheduler something else, it might gain more traction...
After a long time of being frustrated with this on my Ubuntu laptop, I figured out a great solution: Installing Windows.
If the CPU utilization is that low, it's an I/O scheduling problem. See Linux I/O scheduling.
The CFQ scheduler is supposed to be a fair queuing system across processes, so you shouldn't have a starvation problem. Are you thrashing the virtual memory system? How much I/O is going into swapping? (Really, today you shouldn't have any swapping; RAM is too cheap and disk is too slow.)
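To put a number on "how much I/O is going into swapping", one option is to sample the pswpin and pswpout counters in /proc/vmstat (pages swapped in and out since boot) while the big copy runs. Below is a minimal Python sketch. It assumes a Linux /proc, and the function names are made up for illustration:

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout), the pages swapped in and out since boot,
    from /proc/vmstat-style "name value" lines."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rate(interval=1.0, path="/proc/vmstat"):
    """Sample the counters twice, interval seconds apart, and return
    (pages swapped in per second, pages swapped out per second)."""
    with open(path) as f:
        in_before, out_before = parse_swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        in_after, out_after = parse_swap_counters(f.read())
    return ((in_after - in_before) / interval,
            (out_after - out_before) / interval)
```

Call swap_rate() during a copy. If either number is much above zero, the slowdown is probably memory pressure rather than the I/O scheduler itself.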
I've been using Mandriva 2010.0 (and earlier builds before it, for that matter). I'm not sure whether their stock kernel uses scheduling patches, but the only time I've ever seen slowdowns on my wimpy P4 machine is with seriously oversubscribed memory, which obviously turns it into a dog. IO seems to have little to no effect, however.
So maybe you just need a better desktop distribution? A newer one, perhaps? Don't expect to get anything beyond garbage if you slap just any old distro on a machine and call it a workstation. I'd expect Suse and/or Fedora to work equally well. Ubuntu is probably doing OK, but I wouldn't know. Most of the smaller/less mainstream distros, however, are quite random, and running something like CentOS on a desktop is just asking for a crappy desktop.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
How is Linux supposed to know that this copying should be done in the background? Maybe I need the copying done before I can do anything else.
There is an ionice command you know. But telling the system what you want would be stupid, everything must be done by magical guesswork instead.
...download and compile the 2.6.36 kernel. A rundown of the changes can be found at http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-36-1103009.html A very easy-to-follow guide can be found at http://kernel.net/articles/how-to-compile-linux-kernel.html Sidenote: what is up with this comment not showing up when I wasn't registered? That's stupid and annoying.
The 2.6.36 kernel supposedly has a fix for this issue. I haven't been able to test it yet myself, but it sounds like they finally tracked it down. See here for more information.
I've occasionally wondered whether this problem is really only due to scheduling. After all, most of us still write our file access code more or less as follows:
FILE *x = fopen("somefilename", "r");
char line[1024];
while (x != NULL && fgets(line, sizeof line, x) != NULL) {
    fputs(line, stdout); /* ---- */
}
if (x != NULL) fclose(x);
Point being, there's nothing on the marked line that tells the process it should gracefully go to sleep while the drive is doing its thing, and there's no callback vector defined either; nothing indicates that we're dealing with non-blocking I/O. I'd like to think that our compilers have silently been improved to hide those implementation details from us, but I have no proof that this is the case. Unless the system functions use some dirty stack manipulation voodoo to extract the return address of the function and use that as a callback vector?
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
"I've heard about the efforts of Con Kolivas and his Brainfuck Scheduler, but it's unsupported now and probably incompatible with latest kernels."
I don't know what you're talking about: http://users.on.net/~ckolivas/kernel/
It's updated for the latest kernel which came out just yesterday.
It is being worked on: http://kernelnewbies.org/Linux_2_6_36#head-738bffb3415051b478ecdfd2eabb0294e35146a9 and http://lkml.org/lkml/2010/10/19/123
Did Con unleash some of his trolls on Slashdot?
Yeah, I think he just did ...
Supposedly the 2.6.36 kernel addresses this issue. I don't know if the problem has been completely fixed, or mostly fixed, or what, since I haven't tried that kernel yet (too bad there isn't an easy way to install kernels in a cross-distro fashion!).
Read the bullet points here, particularly the ones in the middle, as multiple things have been done in this kernel to improve performance:
http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-36-1103009.html?page=6
Promote true freedom - support standards and interoperability.
I ask this question with utmost sincerity. Folks over here believe it is indeed dead. I am afraid I agree with them. I hear so little about desktop Linux these days. It's all about iOS, Android and RIM. The future does not appear to be on track to change anytime soon. Now tell me I am wrong and why.
I just run it and let it own the computer for whatever time it takes, anywhere from 10 to 30 minutes, and just walk off, maybe to get a fresh cup of coffee or a cold beer depending on where I am and what time of day it is. One thing I don't want is a borked copy because I was too impatient to let it do its job.
Politics is Treachery, Religion is Brainwashing
I can remember seeing this issue with Linux as far back as 1999. It is bad not only for the desktop, but also for the server. I also have experience with Solaris workstations and servers, and they usually don't behave this way.
I ran into the same problems and ended up switching to the "deadline" scheduler. Haven't had a single problem since. I changed it via the "elevator=deadline" on the kernel boot prompt, but you can change it on the fly for individual devices. See Configuring and Optimizing Your I/O Scheduler to see how.
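For the on-the-fly route, each block device exposes its elevator in /sys/block/<device>/queue/scheduler, with the active one in brackets. On a 2.6 kernel that looks like "noop anticipatory deadline [cfq]", though the names offered depend on the kernel. Below is a rough Python sketch of reading and switching it. The helper names are hypothetical, and writing the file needs root:

```python
from pathlib import Path


def available_schedulers(line):
    """All elevators a device offers, e.g. 'noop deadline [cfq]' ->
    ['noop', 'deadline', 'cfq']."""
    return [token.strip("[]") for token in line.split()]


def active_scheduler(line):
    """The bracketed (currently active) elevator, or None if none is marked."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Switch one device's elevator on the fly (requires root).
    Raises ValueError if this kernel does not offer that elevator."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    if name not in available_schedulers(path.read_text()):
        raise ValueError(f"{name!r} is not available for {device}")
    path.write_text(name)
    return active_scheduler(path.read_text())
```

set_scheduler("sda", "deadline") does roughly what `echo deadline > /sys/block/sda/queue/scheduler` does from a root shell. Unlike elevator=deadline on the boot line, it affects only that one device and is lost on reboot.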
This is not a thread scheduling issue, it's a disk scheduling issue. If CPU utilization is only 1-2% and things aren't snappy, then the issue is that the foreground process's I/Os aren't given higher (high enough?) priority. Easy enough to believe, too: a whole lot of writes get cached and then queued up. With an elevator algorithm they'll likely all be performed before any reads required by the foreground process.
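Here is a toy model of that effect in Python. It is a simplified one-way (C-SCAN style) elevator, not the kernel's actual code. Requests are (sector, kind) pairs, and the head sweeps upward from its current position and then wraps around. The single foreground read happens to sit just behind the head:

```python
def cscan_order(requests, head):
    """Service order for a one-way sweep: every request at or above the
    head position in ascending sector order, then wrap to the lowest."""
    ahead = sorted((r for r in requests if r[0] >= head), key=lambda r: r[0])
    behind = sorted((r for r in requests if r[0] < head), key=lambda r: r[0])
    return ahead + behind


def position_of_first(order, kind):
    """0-based index at which the first request of this kind is serviced."""
    for index, (_, request_kind) in enumerate(order):
        if request_kind == kind:
            return index
    return None


# 200 cached writes being flushed to sectors 1000..20900, plus one
# foreground read (say, a program binary) just behind the head at 950.
writes = [(1000 + 100 * i, "write") for i in range(200)]
foreground_read = [(900, "read")]
order = cscan_order(writes + foreground_read, head=950)
print(position_of_first(order, "read"))  # -> 200
```

The read comes out at position 200, so it waits for every one of the 200 queued writes.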
"noop scheduler: just service next request in the queue without any algorithm to prefer this or that request."
I remember using OS/2 (IBM's desktop OS), and I was always amazed that you could format a floppy and do other tasks as if nothing else was going on. I never did understand why that never seemed to make it into the mainstream.
Use a desktop OS, such as Windows 7, not a server OS, such as Linux.
This is not a case of Linux IO schedulers being unsuitable for the desktop, but rather a case of desktop applications being written horrendously in terms of data access. The general pattern is to open a file object, load a few hundred kilobytes, process them, and then ask the operating system for more. This is a small inefficiency when the disk is otherwise idle, but if the disk is actually busy, it will probably be doing something else by the time you ask it to read a little bit more. Not to mention the habit of reading through a few hundred resource files one at a time in seemingly random order, blocking on every read, because the application programmer is too lazy to think about what resources the app is using.
Linux has such a nice implementation of mmap, which works by letting Linux actually know ahead of time what files you are interested in and managing them itself, without the application programmer worrying his pretty little head over it. Other options are running multiple non-blocking reads at the same time and loading the right amount of data and the right files to begin with.
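Here is a minimal sketch of the mmap approach in Python. It assumes Linux and Python 3.8+ for mmap.madvise. The MADV_SEQUENTIAL hint tells the kernel the mapping will be read front to back, so it can read ahead instead of waiting for each small request. The function name and the 1 MiB slice size are arbitrary choices for illustration:

```python
import mmap
import os
import zlib


def checksum_via_mmap(path, slice_size=1 << 20):
    """CRC32 of a file read through a read-only mapping. The
    MADV_SEQUENTIAL hint lets the kernel read ahead aggressively
    instead of faulting pages in one small piece at a time."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file; crc32(b"") is 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if hasattr(mmap, "MADV_SEQUENTIAL"):  # Python 3.8+ on Linux
                m.madvise(mmap.MADV_SEQUENTIAL)
            crc = 0
            for offset in range(0, size, slice_size):
                crc = zlib.crc32(m[offset:offset + slice_size], crc)
            return crc
```

The application still just walks the data in order. The hint moves the decision about how much to fetch, and when, into the kernel.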
The best thing about a simple CSCAN algorithm is that it gives applications what they asked for and if the application doesn't know what it wants, well, that's hardly a system issue.
When Argumentum ad Hominem falls short, try Argumentum ad Matrem
This is almost certainly not the IO scheduler's problem. IO scheduling priorities are orthogonal to CPU scheduling priorities.
What you are likely running into are the dirty_ratio limits. In Linux, there is a memory threshold for "dirty memory" (memory that is destined to be written out to disk) that, once crossed, will cause symptoms like the ones you've described. The dirty_ratio values can be tuned via /proc, but beware that the kernel will internally add its own heuristics to the values you've plugged in.
When the threshold is crossed, in an attempt to "slow down the dirtiers", the Linux kernel will penalize (in rate-limited fashion) any and every task on the system that tries to allocate a page. This allocation may be in response to userland needing a new page, but it can also occur if the kernel is allocating memory for internal data structures in response to a system call the process made. When this happens, the kernel will force that allocating thread (again, rate-limited) to take part in the flushing process, under the (misguided) assumption that whoever is allocating a lot of memory is the same thread that is dirtying a lot of memory.
There are a couple of ways to work around this problem (which is very typical when copying large amounts of data). For one, the copying process can be fixed to rate-limit itself and to synchronously flush data at some reasonable interval. Another way that a system administrator can manage this sort of task (if it is automated, of course) is to use Linux's support for memory controllers, which essentially isolates memory subsystem performance between tasks. Unfortunately, its support is still incomplete, and I don't know of any popular distributions that automate this cgroup subsystem's use.
Either way, it is very unlikely to be the IO scheduler.
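The first workaround, a copier that limits itself and flushes at intervals, could look something like this Python sketch, assuming Linux. Every flush_every bytes it calls fdatasync, which blocks until the disk has caught up. That keeps the copy from running far ahead of the disk and piling up dirty memory. The posix_fadvise(DONTNEED) call is an extra touch not mentioned above: it drops the now-clean pages so they don't push other programs' data out of the cache. The names and the 32 MiB default are illustrative assumptions, not tuned values:

```python
import os


def copy_with_periodic_flush(src, dst, chunk=1 << 20, flush_every=32 << 20):
    """Copy src to dst, forcing the written data to disk every
    flush_every bytes. fdatasync() blocks until the disk catches up,
    so the copy throttles itself instead of piling up dirty memory."""
    since_flush = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        out_fd = fout.fileno()
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            since_flush += len(buf)
            if since_flush >= flush_every:
                fout.flush()          # hand Python's buffer to the kernel
                os.fdatasync(out_fd)  # wait until it is on disk
                # The flushed pages are clean now; drop them from the page
                # cache so they don't crowd out other programs' data.
                os.posix_fadvise(out_fd, 0, 0, os.POSIX_FADV_DONTNEED)
                since_flush = 0
        fout.flush()
        os.fdatasync(out_fd)
```

A smaller flush_every means less dirty memory at any moment, at the cost of more frequent waits on the disk.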
Gee, most of us *nix people - what did that guy call us, something about smoking roosters over small pieces of wood - know that when you need to copy a few gigabytes in background, you use "nice" and crank the priority way down. This has been around since something like 1975 or so.
Don't take life too seriously; it isn't permanent.
Hey, I'm all for grabbing a beer any time of the day, but surely you don't think watching a YouTube video, sending emails, playing chess, or shopping online on your machine as it is copying a file in the background will "bork" the copy. I would toss any O/S that would do such a thing.
Is that a roll of dimes in your pocket or are you happy to see me?
There's another massive problem with I/O scheduling on Linux: all of the schedulers are designed for physical disks. With solid state drives, as opposed to physical spinning platters, a ladder algorithm is useless and only serves to reduce performance. With solid state drives, the best scheduler is currently noop, which doesn't implement priorities. I prototyped a lottery-based scheduler for a class that would allow ionice to be used in a sensible way on solid state drives, but never got it into a state where it didn't crash the kernel. The whole system does seem massively out of date.
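The prototype itself isn't shown, but the general idea of lottery scheduling can be sketched. Each queue gets tickets according to its ionice level, and the next request comes from whichever queue wins a weighted draw, with no sector ordering at all. This toy Python version is a hypothetical illustration, not a kernel scheduler, and the 8-down-to-1 ticket table is an assumption:

```python
import random
from collections import Counter

# Hypothetical ticket table: best-effort level 0 (highest priority)
# holds 8 tickets, level 7 (lowest) holds 1.
TICKETS = {level: 8 - level for level in range(8)}


def lottery_pick(queued_levels, rng):
    """Choose which queue (named by its ionice level) dispatches next,
    with odds proportional to its tickets; sector position is ignored."""
    weights = [TICKETS[level] for level in queued_levels]
    return rng.choices(queued_levels, weights=weights, k=1)[0]


def simulate(queued_levels, rounds, seed=0):
    """Count how many dispatch rounds each level wins."""
    rng = random.Random(seed)
    return Counter(lottery_pick(queued_levels, rng) for _ in range(rounds))
```

simulate([0, 7], 10000) should give level 0 roughly eight dispatches for every one at level 7. That way ionice still means something even though nothing is reordered by position on the disk.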
OP didn't state that he wants a finely tunable kernel scheduling algorithm. He stated a problem and is looking for a solution. So if the problem has already been fixed in another system (so you don't need to tune the kernel scheduling algorithm there; it just works), it is irrelevant whether you could actually tune the kernel scheduling algorithm if you wanted to.
I'm not saying that GP wasn't a troll or flamebait; he obviously was. I'm just noting that your answer didn't really refute his post in any way.
On my current openSUSE 11.3 install, I've only observed severe slowdowns whenever I read or write large amounts of data from or to NTFS partitions. Similar operations that only involve ext4 largely go unnoticed. My best guess would be that the NTFS-3G driver was written around a spec that was, for one thing, closed and, perhaps more importantly, not designed with the Linux kernel in mind.
If you are doing something non-interactive that uses a lot of I/O, use ionice. Experiment, but I find
ionice -p [pid] -c 2 -n 7
to produce reasonable results.
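Here is the same thing from a script, in Python. ionice_command() just builds the util-linux command line equivalent to the one above. ioprio_value() shows how the class and level are packed into the single integer the ioprio_set syscall takes: the class is shifted left by 13 bits, per the kernel's include/linux/ioprio.h. The helper names are hypothetical:

```python
import subprocess

# Constants as defined in the kernel's include/linux/ioprio.h.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_BE = 2  # "best-effort", which is what ionice -c 2 selects


def ioprio_value(io_class, level):
    """Pack class and level into the integer the ioprio_set syscall takes."""
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def ionice_command(pid, io_class=IOPRIO_CLASS_BE, level=7):
    """util-linux ionice invocation equivalent to ionice -p PID -c 2 -n 7."""
    if not 0 <= level <= 7:
        raise ValueError("priority levels run from 0 (highest) to 7 (lowest)")
    return ["ionice", "-c", str(io_class), "-n", str(level), "-p", str(pid)]


def demote_io(pid):
    """Lower the I/O priority of an already running process, e.g. a big cp."""
    subprocess.run(ionice_command(pid), check=True)
```

demote_io(pid), given the PID of a running cp, moves it to best-effort level 7. ioprio_value(IOPRIO_CLASS_BE, 7) is 16391.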
My absolutely puny hardware (all 5+ years old, or netbooks) does not experience this problem at all running different releases of Ubuntu. I did notice that Transmission sometimes chewed up too much processor when I had 10+ torrents going, but my bulk drive was NTFS. After I formatted it to ext4, even that went away. I routinely copy multiple GB files intra-drive, inter-drive, and intranetwork while browsing, youtubing, etc.
Maybe you're using an NTFS filesystem that isn't as efficient?
Again, my hardware is majorly obsolete. My only "multicore" setup is on a hyperthreading Atom.
--why?
FYI, the IO scheduler and the CPU scheduler are two completely different beasts.
The IO scheduler lives in block/cfq-iosched.c and is maintained by Jens Axboe, while the CPU scheduler lives in kernel/sched*.c and is maintained by Peter Zijlstra and myself.
The CPU scheduler decides the order of how application code is executed on CPUs (and because a CPU can run only one app at a time the scheduler switches between apps back and forth quickly, giving the grand illusion of all apps running at once) - while the IO scheduler decides how IO requests (issued by apps) reading from (or writing to) disks are ordered.
The two schedulers are very different in nature, but both can indeed cause similar looking bad symptoms on the desktop though - which is one of the reasons why people keep mixing them up.
If you see problems while copying big files then there's a fair chance that it's an IO scheduler problem (ionice might help you there, or block cgroups).
I'd like to note for the sake of completeness that the two kinds of symptoms are not always totally separate: sometimes problems during IO workloads were caused by the CPU scheduler. It's relatively rare though.
Analysing (and fixing ;-) such problems is generally a difficult task. You should mail your bug description to linux-kernel@vger.kernel.org and you will probably be asked there to perform a trace so that we can see where the delays are coming from.
On a related note, I think one could make a fairly strong argument that there should be more coupling between the IO scheduler and the CPU scheduler, to help common desktop use cases.
Incidentally there is a fairly recent feature submission by Mike Galbraith that extends the (CPU) scheduler with a new feature which adds the ability to group tasks more intelligently: see Mike's auto-group scheduler patch
This feature uses cgroups for block IO requests as well.
You might want to give it a try, it might improve your large-copy workload latencies significantly. Please mail bug (or success) reports to Mike, Peter or me.
You need to apply the above patch on top of Linus's very latest tree, or on top of the scheduler development tree (which includes Linus's latest), which can be found in the -tip tree.
(Continuing this discussion over email is probably more efficient.)
Thanks,
Ingo
Con Kolivas released a patch set against the 2.6.36 kernel just a few days ago. Check lkml.org.
C|N>K
I have never seen such a problem... I play music off a mounted drive, read/write to another mounted partition all the time, and move files from here to there, with absolutely no noticeable slowdowns because of that...
But hands down, Windows 7 beats the pants off any Linux distro (or even Mac, for that matter)... I love it completely! Very professional work...
Sometimes I see my system get bogged down doing copies and even lock up for a few seconds. And when I see this I always become very nervous because usually it means I have a hard disk failure of some sort (sometimes it can be just a bad connector, but still a hard failure).
I just copied about 5GB from my 5-disk RAID5 to my main partition WHILE I copied about the same amount of data back TO that same RAID, WHILE watching a video from that RAID, and didn't see much of an issue. I saw one little "stutter" for about 100 ms, and that was it.
I am using LVM. And the braindead way Ubuntu configures LVM, putting swap in the same LVM partition as your home and system directories, DOES cause all sorts of nastiness. This used to drive me nuts before I fixed it simply by disabling swap. I have 8GB of RAM; why do I need swap?
Edit: this "new and improved" page formatting SUCKS. Now the buttons that were right there are hidden behind DUMBASS popup menus. Fucking "engineers" thinking they know how to improve shit. How do I turn this bullshit off?
'nice' doesn't do much for I/O, though. ionice, a much newer program, does for I/O-intensive tasks something similar to what 'nice' does for CPU. It's pretty good, not as good as nice is for CPU-bound tasks, but eh.
I would definitely ditch an OS that fucked up a file copy because I used the computer for something else while I was waiting.
In Arch Linux I installed the 2.6.35 kernel with BFS enabled, and I found that it gave too much priority to user input. When playing a standard definition xvid file, it would literally pause for a second while opening Firefox. The default scheduler might have problems, but it will keep playing the same type of file just fine, while also opening something like a web browser.
Here's my experience with this issue:
I develop a camera surveillance system, so I see machines with constant I/O: 25fps (PAL) on several cameras. An average system configured for full recording will be saving 200fps to disk. All the cameras are shown on the screen through SDL on X11. When WD rolled out its new 4k-sector disks, I had to figure out how to make them work properly on GNU/Linux, since they came with a special format utility for XP but no docs on what was required on GNU/Linux. They report a 512-byte sector size to the machine, but I know they are 4k. So I managed to align the partitions with the cylinders properly, and disk performance spiked. If you didn't do this, partitions and cylinders would end up unaligned, and disk performance would suck. The whole machine would slow down badly, even the SDL windows showing video. That video comes straight from V4L, and disk output happens later in another thread (i.e., not the same one doing the SDL visuals). The SDL windows would show only 4-5 fps for several seconds after every disk flush, then go back to normal.
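The usual test for that alignment is whether each partition's starting byte offset is a multiple of the 4096-byte physical sector. Below is a small Python sketch. It assumes the start sector is given in 512-byte logical sectors, which is how fdisk and /sys/block/<disk>/<partition>/start report it for these drives. The helper names are hypothetical:

```python
from pathlib import Path


def partition_start(disk, partition, sysfs_root="/sys/block"):
    """Start of a partition in 512-byte sectors, e.g. ('sda', 'sda1')."""
    return int((Path(sysfs_root) / disk / partition / "start").read_text())


def misalignment_bytes(start_sector, logical=512, physical=4096):
    """How far past the previous physical-sector boundary the partition
    begins (0 means aligned)."""
    return (start_sector * logical) % physical


def is_4k_aligned(start_sector, logical=512, physical=4096):
    """True if the partition begins exactly on a physical-sector boundary."""
    return misalignment_bytes(start_sector, logical, physical) == 0
```

The old DOS-style default of starting the first partition at sector 63 fails the check: it sits 3584 bytes past a 4k boundary, so every 4k filesystem block straddles two physical sectors. A start at sector 2048 (1 MiB) passes.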
There are serious I/O issues. Actually, the IO model is ok for servers, but for anything realtime, it fucking sucks.
WTF am I doing replying to an AC at 5 A.M on a Friday night?
I see your mistake. Linux is not a "desktop OS", it is an advanced OS for production systems or people who like to tinker. Pretending it is otherwise will only cause grief in the long term.
What you need is http://code.google.com/p/pagecache-mangagement/ This tool allows the user to limit the amount of pagecache used by applications under Linux. This is similar to nice, ionice etc. in that it usually doesn't make an application go faster, but does reduce the impact of the application on other applications' performance. This is especially useful for applications that walk sequentially through data sets larger than memory, as discarding their pagecache does not reduce their performance (although this tool does add overhead of about 2%). See http://lwn.net/Articles/224653/ Although it is little more than a proof-of-concept, it seems to be fairly useful. When running pagecache-management.sh dd if=100-mb-file of=foo or pagecache-management.sh cp -a /usr/src/linux-2.6.20 /usr/src/foo
I would definitely not let a monkey like you get near my computers if some intense file copy was going on and you wanted to start doing other things while that was going on. Sure, you can do it, but that does not make it a prudent thing to do. The file may copy over just fine, or it may lose a few bits without even reporting any errors, and that can happen on any OS: BSD, Linux, Winders, etc., etc., etc.
Cooooonnnnnn!
My point was only that a system which is totally bogged down by disk IO is somehow not exactly optimum and that a lot of the less mainstream distros don't exactly seem to know a heck of a lot about tuning or what patches to use, which CAN be an issue. You want a desktop specific distribution from a reputable mainstream source.
People expect that some guy in his garage somewhere knows how to put together a well built OS, but that is largely a vain hope. The Linux kernel is a wonderful piece of software, but that doesn't mean you can't horribly misuse it and get bad results.
No doubt many older releases work fine too. I can't recall having this sort of problem in any version of Mandriva for instance, at least not in years. I don't think my particular system is special in that respect, I just use a reasonably high quality distro and things work!
Of course, the OP could be dealing with some horribly bad device driver or hardware, too. That can make a significant difference. I think the flaw in his thinking is generalizing this as a Linux desktop issue rather than a hardware- or OS-specific issue that HE has.
If you just change your I/O scheduler to anticipatory this should go away. I think the simplest in Ubuntu is to add "elevator=anticipatory" to your kernel command line arguments. This is done differently in GRUB and GRUB2, so fgi.
Error 404 - Sig Not Found
The problem is this: Let's say you do an action that reads 5 blocks on the disk. While the system is idle it has nothing else to do so your 5 blocks are read immediately, super fast.
While the system is doing some other I/O-intensive job, it might be doing 500 block reads at the same time. Everything goes in the same queue, so your task is only 1% of the requests that have to be done in a set time. Result: your task takes 100 times longer.
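Here is the arithmetic behind those figures as a tiny Python helper. It uses the same simplifying assumptions: one shared FIFO queue, every request costing the same, and your requests landing at the back of the queue:

```python
def fifo_slowdown(my_requests, other_requests):
    """How many times longer my requests take when they share one FIFO
    queue with other_requests, assuming every request costs the same
    and mine are queued behind all of the others."""
    alone = my_requests
    shared = my_requests + other_requests
    return shared / alone
```

fifo_slowdown(5, 500) gives 101.0. Your 5 requests are 5/505 of the queue, just under 1%, which is where the rounded "1%" and "100 times" come from.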
This is the problem that all the schedulers are trying to solve: being fair, so that every task gets a reasonable share of priority, while keeping performance at an optimum level.
For example, some O/S researchers have tried to implement a multi-tiered system where every I/O is tagged with flags that indicate whether the call came from an interactive user action or was generated by non-interactive jobs (daemons, lower-level layers, etc.), and then give higher priority to the user requests. Two problems with that approach are that it can be very hard to differentiate the two, and that any heavy user task may prevent system tasks from working in a timely fashion, while the user tasks may depend on those system tasks completing their jobs in order to proceed: vicious circles and race conditions.
I'm glad I'm not trying to code a kernel scheduler, they're very hard problems and figuring one out that can be fair for all types of uses is nigh impossible.
The great thing about the open source O/Ss is that everything's done in the open: there are intense discussions going on in the field, and there are multiple solutions being worked on and tested.
To me, Linux has always felt like it gave much higher priority to I/O than the "user experience". It's something I've come to expect. If I copy gigabytes from a disk set to another I gladly accept that my web browser's going to be sluggish for a time, all the while feeling content that at least it's going to be done so efficiently that it's going to last for the shortest amount of time possible.
Other O/Ss that I won't name may "feel" better, but have nowhere near the same I/O throughput that Linux has.
Funny, I do that constantly and have never seen any corruption even possibly attributable to it on any system besides Windows.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
I'll chime in too about not having this problem. I've used Linux since late 2004. I've used Fedora 3, 4, 6 and 12, Ubuntu (since Edgy) and Gentoo (my main Linux distro), with Ubuntu in VMs sometimes, all of these across six or seven machines. I've never had the desktop grind to a halt with heavy I/O. In fact, I was relieved that, unlike Windows, the hard drive can be grinding away and I can still actually use my computer and start programs in a reasonable amount of time. So when I first started hearing about this issue recently, I was quite surprised and thought maybe it was just a localized config problem or something specific to certain people's hardware. Now that everyone on Slashdot seems to be having this problem, I can't help but wonder how I managed to be so lucky all these years. WTF is going on?
My experience is that when IO affects "feel", it's mostly because of increased competition for memory. Someone else mentioned tuning the vm.dirty_* sysctls, which certainly helps, but what surprises me is that we don't use O_DIRECT and splice/vmsplice more. "cp" is still a loop of 32k read/writes, with only the obvious O_ modes on either descriptor (and needless to say, no fadvise either.)
In short, the kernel offers fine mechanisms to resolve these problems - it's user-space that isn't taking advantage of them.
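To illustrate what user space could do, here is a copy routine in Python. It assumes Linux 2.6.33 or later, where sendfile(2) can write to a regular file. It swaps in sendfile for the splice/vmsplice mentioned above, since Python's os.splice requires one end to be a pipe, and it adds the fadvise hints that cp doesn't give. It is a sketch, not a drop-in cp replacement: it handles no metadata and no sparse files.

```python
import os


def kernel_copy(src, dst):
    """Copy src to dst with sendfile(2) so the data never passes through
    user space, hinting to the kernel how the source will be used.
    Returns the number of bytes copied."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        in_fd, out_fd = fin.fileno(), fout.fileno()
        size = os.fstat(in_fd).st_size
        # We read the source front to back: ask for aggressive readahead.
        os.posix_fadvise(in_fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while offset < size:
            sent = os.sendfile(out_fd, in_fd, offset, size - offset)
            if sent == 0:  # source was truncated while we were copying
                break
            offset += sent
        # We won't read the source again, so let its cached pages go.
        os.posix_fadvise(in_fd, 0, 0, os.POSIX_FADV_DONTNEED)
    return offset
```

The kernel moves the data itself, in chunks as large as it likes, instead of cp's fixed 32k round trips through user memory.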
Lose a few bits? Are you being serious? Despite what you may believe, during the file copy you walked away from, several hundred processes continue to run in the background and do what they're supposed to, doing things way more complex than your file copy. You think X stops doing what it's doing just so the file can be copied? Or that the kernel drops everything it's doing except the file copy? The very simple movement of a mouse fires hundreds of IRQs each second, which are serviced by an interrupt service routine. Watching a YouTube video is no different.
You're looking at a bug that was fixed some time ago, but those patches haven't made it into a stable kernel yet. It should be fixed for everyone in 2.6.37 (about 3 months from now, as 2.6.36 came out 3 days ago). In the meantime, you can grab those patches and compile your own kernel.
When the kernel accesses the slow disk, it is aggressive in trying to cache the read. If there's free memory, this is obviously the correct thing to do, since the cache can be dropped if the memory is needed. But if memory is full, the kernel needs to decide whether to drop some file cache or swap out a process. The default settings tend to favor disk cache. So every time you try to access anything on the desktop, the application has been swapped out and has to wait for disk access to swap back in (often several seconds on my machine).
Setting /proc/sys/vm/swappiness to a low value, e.g. 0, tells the kernel to favor processes at the expense of caching disk reads. This helps a lot in keeping the desktop snappy. KernelTrap has a good summary of the issue and the developers' motivations.
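Reading and setting these knobs from a script is just file I/O on /proc/sys/vm. Below is a Python sketch with hypothetical helper names. dirty_ratio and dirty_background_ratio are the dirty-memory thresholds discussed elsewhere in the thread. Writing any of them needs root and lasts only until reboot:

```python
from pathlib import Path

KNOBS = ("swappiness", "dirty_ratio", "dirty_background_ratio")


def read_vm_knobs(proc_root="/proc/sys/vm", knobs=KNOBS):
    """Current integer value of each knob that exists under proc_root."""
    values = {}
    for name in knobs:
        path = Path(proc_root) / name
        if path.exists():
            values[name] = int(path.read_text().split()[0])
    return values


def write_vm_knob(name, value, proc_root="/proc/sys/vm"):
    """Set one knob until the next reboot (requires root); the same effect
    as `sysctl -w vm.<name>=<value>`."""
    (Path(proc_root) / name).write_text(f"{value}\n")
```

write_vm_knob("swappiness", 0) has the same effect as `sysctl -w vm.swappiness=0`. To make it persist, the setting would normally go in /etc/sysctl.conf.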
Swappiness doesn't help with applications that want to access a file repeatedly but rely on the disk cache instead of an internal cache. For example, an IDE might have 10 source files in tabs, but instead of keeping the files in memory, it could just reread them each time a tab is switched. As long as the file remains in cache, this works fine. But when you copy a huge file, the source file gets dropped from cache, and the tab takes forever to refresh.
I'm not sure if there's an easy way for the kernel to tell the difference between an application just copying a file and one actually reading it. But if there is, it would make sense to favor reads.
My blog
I just run it and let it own the computer for whatever time it takes, anywhere from 10 to 30 minutes, and just walk off, maybe to get a fresh cup of coffee or a cold beer depending on where I am and what time of day it is. One thing I don't want is a borked copy because I was too impatient to let it do its job.
I see you're a long-time Windows 95 user.
- Tell me about multitasking, daddy!
- Hold on, boy, the floppy didn't finish formatting yet.
I've encountered situations where I'm trying to do something online and a task starts up due to a cron job that builds some kind of index. The index building should be in the background but somehow takes priority over what I'm doing on the desktop. Those kinds of cron jobs should be scheduled in the background by default, not take priority over what is happening on the desktop.
I ditched Windows for this reason, among many others.
You're insane. If your computer ever silently drops "a few bits" while copying stuff, there's something seriously wrong with your OS or your hardware, and things will break whether or not you're using the computer while copying. You might as well sacrifice a chicken to make sure the data transfer works, it'll have about the same effect.
Switch back to Slashdot's D1 system.
The only way NTFS would "eat" a file is if you were writing a new file and the system crashed before it was completed. In that case, to make the FS consistent, the file will not be there, as having it there would be inconsistent. However, in the case of updating an existing file, that isn't what happens. If the system crashes while a file is being updated, the file will be rolled back to its state before the write.
As you say, this is how a journaled system works. Only once a write is complete, once things are consistent, is it applied in a permanent fashion. If a crash happens and the disk would be in an inconsistent state, the journal is used to roll things back so that everything is consistent.
My guess is he's confusing ext4 with NTFS in a case of wishful anti-MS thinking. Ext4 had a case of the "nom noms" with regard to files because it could delay writes for so long. Because of the way some programs choose to update things like bookmarks and config files, they could vanish in a crash. This has been fixed, of course, but it was a problem initially; you can search Slashdot for it. That is perhaps what he was thinking of.
A modern OS should be able to deal with being asked to do more than one thing with its disk. There will, of course, be a slowdown, as disks are not good at random access, but it should not bring the system to a halt. At work I copy large data files around all the time: 10-100GB videos and VMs. I copy them between local drives, to servers on the net, and so on. The system works fine while this is going on. Web browsing is fast and responsive, e-mail has no problems, and everything works as normal. The only time you notice a slowdown is if you try to do something else disk-intensive. Copy a VM on a drive and then boot another VM from that drive, and both the boot and the copy slow down as the drive jumps back and forth. However, it still works just fine, and the system is still responsive.
This is not too much to ask, this is how it should work.
But it fucked it up with highest possible bandwidth, and in a way that's O(1) scalable up to 1024 processors in a NUMA cluster. Don't you care about server fuckup performance?
I've been seeing something similar for some time too, but thought it might be an isolated case. Having just searched, I notice there is an existing report, "Slashdot comment paste issues in Chrome", that describes what I see (very slow pasting that doesn't necessarily succeed).
That's great that you post your experiences with server scheduling in a topic about desktop scheduling. It's so relevant. No wait, it's not.
If you are using the CFQ I/O scheduler on Linux a process' nice value also impacts its default I/O priority. From the ionice man page:
For kernels after 2.6.26 with CFQ io scheduler a process that has not asked for an io priority inherits CPU scheduling class. The io priority is derived from the cpu nice level of the process
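The same ionice man page gives the formula used when no explicit I/O priority has been set: io_priority = (cpu_nice + 20) / 5, using integer division. This maps nice values -20..19 onto best-effort levels 0..7, where 0 is the highest. The small Python function below just illustrates that arithmetic. The function name is our own:

```python
def nice_to_ioprio(nice: int) -> int:
    """Best-effort I/O priority level (0 = highest, 7 = lowest) derived
    from a CPU nice value, per io_priority = (cpu_nice + 20) / 5."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in the range -20..19")
    return (nice + 20) // 5


for n in (-20, -1, 0, 10, 19):
    print(n, "->", nice_to_ioprio(n))  # 0, 3, 4, 6, 7 respectively
```

So running a background copy under `nice -n 19` also drops it to the lowest best-effort I/O level. That is still not the idle class that `ionice -c 3` gives.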
I would definitely not let a monkey like you get near my computers if some intense file copy was going on and they wanted to start doing other things while that was going on. Sure, you can do it, but that does not make it a prudent thing to do. The file may copy over just fine, or it may lose a few bits without even reporting any errors, and that can happen on any OS: BSD, Linux, Winders, etc., etc., etc.
You, sir, are a perfect specimen of a BOFH. You have only a dim notion of what actually goes on inside those mysterious boxes that are unfortunately left under your care. And yet, by some curious accident of nature, you've been entrusted with root passwords for said boxes. You use phrases like "intense file copy" as if they mean anything. You place every idiotic restriction you can think of on the users of said boxes (who, incidentally, are almost always smarter and more qualified than you in whatever field of work they're in) by using words like "prudent" and "safety"... or, god forbid... "security". You actually think that because I run a second program along with your "intense" copy, it can result in loss of "a few bits without even reporting any errors" due to what? The magical fairies that dance inside those little chips getting angry? Tired? Can you do everybody a favor and reduce the amount of utter nonsense emanating from that tiny, befuddled brain?
The Linux block layer maintainer is proposing changes to improve SSD performance. Assuming your SSD is detected correctly, though, the idea of waiting for requests (so as to be able to reorder them in a ladder fashion) has not been used on SSDs since 2.6.28.
I would guess the Slashdot article about painful fsync behaviour on ext3 was "Kernel Hackers On Ext3/4 After 2.6.29 Release".
(And wow - a developer who still reads and posts to Slashdot! I've got to ask, which tech news site did you all migrate to in the end?)
I tried using a USB hard drive to back up about 30 gigs of data from my CentOS 4.8 server. The I/O wait on the system shot up to 80%, and I had to kill the process, since it brought the machine to a crawl and the other processes were taking forever to complete. Something as simple as copying a file to a USB drive should not cause the system to slow to a crawl and stop functioning.
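An "I/O wait" percentage like the one above is what tools such as top derive from /proc/stat. The sketch below shows how it is computed from two samples of the aggregate `cpu` line. It assumes the standard Linux field order: user, nice, system, idle, iowait, irq, softirq, steal, then guest fields. The helper names are hypothetical:

```python
from __future__ import annotations

import os
import time


def read_cpu_times(stat_text: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counts."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent idle while waiting on I/O between samples."""
    # guest/guest_nice are already included in user/nice, so only the
    # first 8 fields are summed to avoid double counting.
    total = sum(after[:8]) - sum(before[:8])
    iowait = after[4] - before[4]  # 5th field is iowait
    return 100.0 * iowait / total if total else 0.0


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

A high iowait with low CPU use means the CPUs are mostly waiting on the disk, which matches the symptom in the original question.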
AFAIK there are only two I/O schedulers remaining in recent Linux kernels: CFQ and deadline (three if you count noop, I guess). The anticipatory scheduler was removed in 2.6.33... (And if you squint, you might say that RHEL 5's kernel could have been related to 2.6.34 at one point, right? :)
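On these kernels, the available schedulers for a disk are listed in /sys/block/<dev>/queue/scheduler, with the active one in square brackets (e.g. `noop deadline [cfq]`). Root can switch schedulers by writing a name to that file. Here is a minimal Python sketch for reading it. `sda` is only an example device name, and the function names are hypothetical:

```python
from __future__ import annotations

import os


def parse_scheduler(text: str) -> tuple[str | None, list[str]]:
    """Split the contents of a queue/scheduler file into (active, available).
    The active scheduler is the one shown in square brackets."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def current_scheduler(device: str = "sda") -> tuple[str | None, list[str]]:
    """Read the scheduler file of a real block device (Linux only)."""
    with open(os.path.join("/sys/block", device, "queue", "scheduler")) as f:
        return parse_scheduler(f.read())


print(parse_scheduler("noop deadline [cfq]\n"))  # → ('cfq', ['noop', 'deadline', 'cfq'])
```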
Somewhere, a guy just won a bet, that he could get slashdot to print the word 'brainfuck' on its front page. And not, mind you, on April Fool's Day.
His elaborate scheme has paid off, and it might even slip under the radar, and remain undiscovered. I'm watching you. /* uses 2 finger gesture for 'watching you'
You've got it. Why are you fsync()ing so often for a userland app with trivial data? There are so many better ways to do this.
The right one is... why am I still bothering with crappy desktop Linux in the first place?
splice/vmsplice only work if the source or target is a pipe. Completely useless for file-to-file copy operations.
I wanted to write a lengthy rebuttal here explaining how computers work, but my computer is busy with a seriously *intense* copy right now so I don't want to chance it.
"When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%"
Maybe the network card is crap or the wiring is faulty? Maybe the HDD is near death?
"have you tried ionice?" - by larry bagina (561269) on Saturday October 23, @02:43PM (#33997928)
http://sourceforge.net/projects/ultradefrag/forums/forum/709672/topic/3690136
Find the analog to that in Linux's APIs (thread- and process-level priority control code) and integrate it into your own open source code for projects you think need it. (I imagine MOST of you around here don't operate fluently at the source code level; some do, I wager, but most don't, and even if you do, it takes time to study the flow of any project first.)
Still, per your idea? Well, that's how I'd go about it at this level, I suppose, and you open source crowd do have that much going for you, which could work out nicely at times.
APK
P.S.=> See, I figure it this way, especially for those of you who code: you've got threads now in Linux (since what, kernel 2.2 or so?? All I know is that around 1998/1999 the Linux kernel wasn't preemptible or re-entrant, and that meant no threads at the kernel level; usermode "round-robin to kernel" cooperative threads do NOT count, à la Windows 3.x), vs. forks only, thank goodness...
So, all that said and aside? Well, for "the industrious" and skilled amongst you, yes, this IS doable. If I can do it in Windows, along with others now, as shown above in the UltraDefrag64 project (and even show others how, as I have for a 64-bit defragger in Windows), then my guess is you can do it in Linux as well, even across languages (Object Pascal to C/C++ and/or Win32/64 API calls), and you additionally have your "Fresh Meat" sites etc. and open code to work with... apk
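One Linux analog of thread-level priority control, as a sketch: under Linux/NPTL the nice value is actually a per-thread attribute, even though POSIX describes it as per-process. That means a worker thread can lower its own CPU priority by calling setpriority() on its kernel thread ID. Under CFQ, the I/O priority derived from nice should drop along with it. This relies on that documented Linux behaviour. `bulk_copy_worker` and `start_bulk_copy` are hypothetical names:

```python
import os
import shutil
import threading
from typing import Optional


def bulk_copy_worker(src: str, dst: str, niceness: int = 10,
                     result: Optional[dict] = None) -> None:
    """Copy src to dst from a thread that first lowers its own priority."""
    tid = threading.get_native_id()  # kernel thread ID (Python 3.8+)
    current = os.getpriority(os.PRIO_PROCESS, tid)
    # Raising the nice value (lowering priority) needs no privileges.
    # On Linux this affects only the calling thread, not the whole process.
    os.setpriority(os.PRIO_PROCESS, tid, min(19, current + niceness))
    if result is not None:
        result["nice"] = os.getpriority(os.PRIO_PROCESS, tid)
    shutil.copyfile(src, dst)


def start_bulk_copy(src: str, dst: str) -> threading.Thread:
    """Run the copy in the background; the main thread keeps its priority."""
    t = threading.Thread(target=bulk_copy_worker, args=(src, dst), daemon=True)
    t.start()
    return t
```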
Well, I don't even recall something like that in the NT branch of Windows.
If I need to move around a huge amount of data, I use the command line. Whether it's Windows or Linux, it's easier on the system.
What do you use? Genuinely curious.
I'm using ext3 on a 1TB WD Green. I used to experience huge fsck times in Karmic (hours), but ext3 in Lucid seems fine, except that it takes 45 seconds to create a directory if I haven't created one in the last 5 minutes.
Are the "exotic" filesystems good for normal use and low fsck times? Is Reiser dead? Has btrfs reached a fork in the road?
Just to mention a very bad app: Evolution. If there is any heavy disk I/O, it becomes ultra sluggish, and you might as well take a coffee break, especially when you're switching from one mail folder to another or deleting messages. I was practically forced to use ionice for any heavy disk I/O to maintain acceptable performance with Evolution. (2.6.32-25-generic #45-Ubuntu SMP x86_64 GNU/Linux / GNOME evolution 2.28.3)
http://www.redhat.com/magazine/008jun05/features/schedulers/
"When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%."
I think what you describe is due to kernel bug 12309 and it looks to be fixed in 2.6.36. See https://bugzilla.kernel.org/show_bug.cgi?id=12309 and git commit http://git.kernel.org/linus/e31f3698cd3499e676f6b0ea12e3528f569c4fa3
Is this also an issue under FreeBSD? I'm thinking of switching to one of the BSD desktop OSes.
... I switched to FreeBSD years ago.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
I'm 48, but you sound to me like an old dude. Why would I NOT buy 8GB of RAM? I use a big chunk of it for my thumbnail directory and other temp data that would otherwise go to a real hard disk, and it's NOT expensive. Well, maybe it is NOW, but when I bought it, it was cheap. Funny to think I could actually have MADE money by investing in RAM back then, as the 4GB "kits" I use now cost about twice what they did when I bought them.
Anyway, who uses hibernate? Did you not see the comment about my five-disk RAID? This isn't a notebook. Who uses hibernate when they have torrents, freenet, and other p2p channels to make use of? You never get a good priority when you're constantly going on and off line.
I don't care if you can do oodles of HTML, if you can't paste the bloody link into the bloody white box, then all that HTML isn't worth diddly squat.
One of the 2.6.36 patches explicitly mentions addressing poor responsiveness when doing IO on slow (e.g. USB) devices. The CentOS 4 kernel seems to be a heavily patched 2.6.9 though...
I don't know how the default settings apply, but with CFQ you should learn how to use 'ionice'. When an I/O-bound process is assigned the "idle" class, it goes completely idle whenever any other process competitively wants I/O. So complaining about I/O processes swamping userland points to a configuration problem, not a problem with the scheduler.
What more do you want? This is the whole reason different I/O schedulers were added.
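For example, a background copy can be started in the idle I/O class and at the lowest CPU priority by prefixing it with the real `ionice -c 3` and `nice -n 19` commands. Per the ionice man page, ordinary users have been allowed to use the idle class since kernel 2.6.25. The Python sketch below just builds and launches that command line. `cp`, the paths, and the helper names are placeholders:

```python
from __future__ import annotations

import subprocess


def low_priority_cmd(cmd: list[str]) -> list[str]:
    """Prefix a command so it runs in the idle I/O class (ionice -c 3)
    at the lowest CPU priority (nice -n 19)."""
    return ["ionice", "-c", "3", "nice", "-n", "19", *cmd]


def start_background_copy(src: str, dst: str) -> subprocess.Popen:
    # Equivalent shell: ionice -c 3 nice -n 19 cp SRC DST
    return subprocess.Popen(low_priority_cmd(["cp", src, dst]))
```

The idle class is honoured by CFQ. With deadline or noop, the ionice setting has no effect.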
If you don't want the I/O to go into the cache, then make sure your I/O-heavy processes call posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
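The sketch below shows what that looks like in a copy loop, using Python's wrappers for posix_fadvise and fdatasync (this assumes Linux). On Linux, POSIX_FADV_DONTNEED does not discard pages that are still dirty, so each chunk is flushed first. That per-chunk flush makes the copy slower, and the 1 MiB chunk size is an arbitrary choice. The function name is hypothetical:

```python
import os

CHUNK = 1 << 20  # 1 MiB per read/write; an arbitrary choice


def copy_without_polluting_cache(src: str, dst: str) -> None:
    """Copy src to dst, telling the kernel after each chunk that the
    pages just read and written will not be needed again."""
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while True:
            buf = os.read(in_fd, CHUNK)
            if not buf:
                break  # end of source file
            written = 0
            while written < len(buf):
                written += os.write(out_fd, buf[written:])
            # Dirty pages cannot be dropped, so write this chunk back first.
            os.fdatasync(out_fd)
            os.posix_fadvise(in_fd, offset, len(buf), os.POSIX_FADV_DONTNEED)
            os.posix_fadvise(out_fd, offset, len(buf), os.POSIX_FADV_DONTNEED)
            offset += len(buf)
    finally:
        os.close(in_fd)
        os.close(out_fd)
```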
So what's the problem? You have a way to control priority as well as usage. What more do you want?
A refinement of possible benefit would be a parameter to limit the number of cache pages per process in memory at any point. That could be another way to address the issue. Hmmm...
I would definitely ditch an OS that fucked up a file copy because I used the computer for something else while I was waiting.
Really? +4 Insightful for what is obviously a trollish comment with a poor choice of curse words? Okay, let's run with your hypothesis and consider the following...
Then you have already ditched Windows, and if not, you can't help but notice that all new Windows OS development is using Linux development strategies, philosophies, etc.; heck, they even use the term kernel now too.
Not just Windows either, the Mac is built on Unix/Linux as well, they just call it OS X.
Pretty soon you will not have an operating system to use, based on your own definition, unless you step up and start helping to develop open source, and Linux specifically. That way all the great new stuff will eventually get ported over to whatever non-Linux operating system you are using, be it Windows 7 or OS X. Regardless, Linux will have it first!
Of course that would mean you could not just complain, but you would have to have a solution as well...now that would NOT be Troll-like...that would be my suggestion to you. Don't complain unless you are going to provide a solution. No solution, then you must be a troll.
How about switching to Solaris 10? It has no scheduling issues: it is lightning fast, even during very heavy I/O.
You all trust computers and have much more confidence in them than I do; I surely don't trust them, and I have very little confidence in them.
Sometimes poor I/O scheduling happens because the I/O scheduler can't see the requests at all. Look at /sys/block/sda/queue/nr_requests; by default it is some low number like 128.
Considering a few I/O requests and their associated read-aheads, the queue can quickly fill up.
At that point no further requests are even *seen* by the I/O scheduler until it empties some of the queue.
Did you try setting it to some higher number, like:
echo 4096 > /sys/block/sda/queue/nr_requests
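Here is the same tweak as a small Python sketch, for anyone scripting it. The sysfs root is a parameter only so the helpers can be tried against a scratch directory. Writing the real file requires root, and the value does not survive a reboot. The function names are hypothetical:

```python
import os


def queue_attr_path(device: str, attr: str, sysfs_root: str = "/sys") -> str:
    """Path of a block-queue attribute, e.g. /sys/block/sda/queue/nr_requests."""
    return os.path.join(sysfs_root, "block", device, "queue", attr)


def get_nr_requests(device: str = "sda", sysfs_root: str = "/sys") -> int:
    with open(queue_attr_path(device, "nr_requests", sysfs_root)) as f:
        return int(f.read().strip())


def set_nr_requests(value: int, device: str = "sda",
                    sysfs_root: str = "/sys") -> None:
    """Same effect as: echo VALUE > /sys/block/DEVICE/queue/nr_requests"""
    with open(queue_attr_path(device, "nr_requests", sysfs_root), "w") as f:
        f.write(f"{value}\n")
```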
http://linux.die.net/man/1/ionice
I often note that multiple simultaneous low-priority file copies implemented as:
run faster than multiple simultaneous high-priority copies implemented as:
If the copies are run one at a time, the higher priority rsync runs faster. For multiple copies, often the lower priority rsyncs run faster. Also, desktop usability is much improved with the lower priority rsyncs.
I suspect a priority inversion occurs inside the file system's write-back cache. At regular priority levels, data is not written back to disk in a timely manner. Using ionice -c 3 gives the disk caches a higher priority than the rsync I/O commands, preventing the I/O commands from filling the cache and creating a priority inversion.
The Gnome GUI in Ubuntu is particularly vulnerable to this priority inversion, as by default it does multiple copies simultaneously inside a separate window. Ubuntu usually performs better than Windows however. Between the A-V software in Windows, and the tendency to swap applications out of memory to maximize disk cache, Windows usually performs the same copy operations more slowly than Ubuntu and with less system responsiveness.
Didn't even know it was there. Thanks, works perfectly.
You actually think that because I run a second program along with your "intense" copy, it can result in loss of "a few bits without even reporting any errors" due to what? The magical fairies that dance inside those little chips getting angry?
Actually, he's not wrong. If you have broken hardware (whether broken-by-design or just ordinary faulty stuff), there are a lot of failure modes which might not show up under light load but will show up under heavy load. Examples of this kind of thing really do show up in the field. However, the solution is not to live in fear, but to identify hardware which is solid under any load condition and buy that hardware rather than cheap shitty junk which breaks if you look at it cross-eyed.
Here's just one example of how hardware might be able to handle light loads but not heavy: power distribution. As you load a chip more, it consumes more power. The supply voltage is ideally held constant, so more power implies greater current (since P = current * voltage). However, the regulator typically can only regulate the voltage at the chip's connections to the circuit board. Inside the chip, there are "wires" distributing power, and they have resistive losses like any other type of wire. The more current that's flowing, the greater the voltage drop. If there's too much drop, the supply voltage for some circuits inside the chip might dip below the point where they can operate reliably.
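The same argument as arithmetic. The numbers below are purely illustrative and are not measurements of any real chip: a part drawing 60 W under heavy load or 6 W under light load from a 1.2 V supply, through an assumed internal distribution resistance of 1 mΩ:

```latex
\begin{aligned}
I &= \frac{P}{V}, \qquad \Delta V = I R = \frac{P R}{V} \\
I_{\text{heavy}} &= \frac{60\,\text{W}}{1.2\,\text{V}} = 50\,\text{A},
  \qquad \Delta V_{\text{heavy}} = 50\,\text{A} \times 1\,\text{m}\Omega = 50\,\text{mV} \;(\approx 4.2\%\ \text{of}\ 1.2\,\text{V}) \\
I_{\text{light}} &= \frac{6\,\text{W}}{1.2\,\text{V}} = 5\,\text{A},
  \qquad \Delta V_{\text{light}} = 5\,\text{A} \times 1\,\text{m}\Omega = 5\,\text{mV}
\end{aligned}
```

The drop scales linearly with power, so a margin that is ample at light load can disappear at heavy load.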
So in other words, it's quite possible to design a chip which functions very reliably under light load, but under heavy load some of its circuits don't get sufficient voltage and start malfunctioning.
That doesn't seem to be true.
http://stackoverflow.com/questions/1580923/how-can-i-use-linuxs-splice-function-to-copy-a-file-to-another-file
This is something I was wondering about. As I understand it, splice/sendpage just marks a page belonging to a block of one filesystem as a page of a block of another filesystem. Thus there is no memory copying: the same RAM the data was read into from one disc is used to write that data to another disc. So I think markhahn has a point: the problem is userspace, not the kernel. The cp code could be faster if it used splice/sendpage zero-copy stuff (when dealing with filesystems where that is possible, i.e. non-FUSE).
BusyBox certainly doesn't: http://git.busybox.net/busybox/tree/libbb/copyfd.c
The GNU cp doesn't either:
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/copy.c
Could be an interesting test.
Because copying over a big file and doing other things is too taxing for a modern machine? Seriously?
Most of the shocking I/O behaviour like this that I've suffered from has been the result of crap RAID controllers: big cache, shoddy drivers/firmware/hardware, and Linux I/O scheduling doesn't get a look in. Potentially you find your one critical read stuck behind 512 Mbytes of poorly performing writes, all within the confines of the card.
jh