The State of Linux IO Scheduling For the Desktop?

It sucks I agree by Anonymous Coward · 2010-10-23 06:40 · Score: 4, Interesting

This issue got so bad for me I switched to FreeBSD.

Re:It sucks I agree by Anonymous Coward · 2010-10-23 07:06 · Score: 4, Insightful

We switched our dedicated web servers from Linux to FreeBSD and OpenSolaris. Whenever we uploaded videos (usually 10GB or larger) over our 100Mbps internet connection to the server, or a client was downloading the videos, people accessing the web server complained that it took seconds to serve each web page. The videos were on magnetic hard drives; the OS and web server were on SSDs (which were mirrored in RAM). Server logs were fine, CPU utilisation was low, and the servers have 1Gbps connections. We put it down to I/O scheduling. Switching the OS solved the problems.
Re:It sucks I agree by ObsessiveMathsFreak · 2010-10-23 07:13 · Score: 4, Interesting

This is the number one problem with all Linux installations I have ever used. The problem is most noticeable in Ubuntu where, any time one of the frequent update/tracker programs runs, the entire system will become all but unusable for several minutes.
I don't know if it's all that related, but swap slowdown is an appalling issue as well. If a single program spikes in RAM usage, I often have to reboot the whole system as it hangs indefinitely. As I work with Octave a lot, often a script will gobble up a few hundred megs of memory and push the system into swap. Once that happens, it's often too late to do anything about it as programs simply will not respond.

--
May the Maths Be with you!
Re:It sucks I agree by Lord+Byron+II · 2010-10-23 07:22 · Score: 3, Interesting

That's exactly why I stopped using swap a couple of years ago. On my main machine I have 3 GB and I feel that if I reach the limit on that, then whatever program is running is probably a lost cause anyway. The next malloc/new causes the program to crash, saving the system.
Re:It sucks I agree by lsllll · 2010-10-23 07:36 · Score: 3, Informative

Such drastic change! I have seen this happen on numerous systems and I just change the elevator to "deadline" and poof! The problem is gone. See this discussion for some details. The CFQ scheduler is great for a Linux server running a database, but it completely sucks for desktop or any server used to write large files to.
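
For anyone who wants to check which elevator a disk is using before switching it: on Linux the choice is exposed through sysfs at /sys/block/<device>/queue/scheduler, with the active scheduler shown in square brackets. Below is a minimal Python sketch under that assumption. The helper names and the device name `sda` are made up for illustration, and writing to the file needs root.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available); the bracketed entry is the active one.
    """
    available = []
    active = None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def scheduler_path(device):
    # 'device' is a block device name such as 'sda' (hypothetical example).
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Switch the elevator for one device. Needs root; lost at reboot."""
    path = scheduler_path(device)
    _, available = parse_scheduler_line(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not offered by this kernel: {available}")
    path.write_text(name)
```

Calling `set_scheduler("sda", "deadline")` as root would be the scripted equivalent of the change described above. The set of names on offer depends on the kernel, which is why the sketch checks before writing.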

--
Is that a roll of dimes in your pocket or are you happy to see me?
Re:It sucks I agree by Anonymous Coward · 2010-10-23 07:43 · Score: 4, Informative

swapoff -a && swapon -a
will force everything back into memory
Re:It sucks I agree by pegdhcp · 2010-10-23 08:05 · Score: 2, Insightful

swapoff -a && swapon -a
will force everything back into memory
Exactly, and it is a very good solution when things get out of hand. At least once or twice a month I recycle swap in order to prevent an imminent freeze. Maybe every 4 or 5 months I lose the system state to I/O scheduling problems on my desktop. On the other hand, my servers usually reboot only for power maintenance and/or fried hardware...
Re:It sucks I agree by dogsbreath · 2010-10-23 08:11 · Score: 2, Informative

Solaris is still a terrific operating system. Every version has made advances in stability and performance, particularly with disk and network i/o. SMP and threading are very mature as well.
I doubt that Oracle will be able to do anything with it except bury it. Too bad.
Re:It sucks I agree by Sir_Lewk · 2010-10-23 08:34 · Score: 2, Informative

[user@host] ~ % time rm Fedora-13-x86_64-Live.iso
rm Fedora-13-x86_64-Live.iso  0.00s user 0.00s system 40% cpu 0.002 total
Huh-wha?
Deleting a file is no more intensive than renaming it. Both should complete in constant time.

--
"linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
Re:It sucks I agree by ChipMonk · 2010-10-23 08:42 · Score: 4, Informative

It's more than that. Since most Linux systems use ext{2,3,4}, CFQ is designed to behave very well with them. However, XFS and JFS do better with deadline or no-op. In fact, on my Athlon 64 X2 w/ 4G RAM, using XFS and CFQ at 2.5GHz did worse than XFS and deadline at 1GHz. Yes, CFQ and XFS clash that badly.

(Site pimp: I did some of my own testing, and reported on it here. I also provide basic shell scripts, so others can do their own tests.)
Re:It sucks I agree by makomk · 2010-10-23 09:12 · Score: 4, Informative

For years I've wondered if it was just me; everyone I'd asked naturally denied any problems, when all I had to do was delete a 1GB file and I could kiss goodbye to my system for 20 seconds or so.
That's a very well known ext2/ext3 problem - they're really slow at deleting huge files, and the amount of disk access involved in doing so slows down any other application accessing the disk. ext4 should fix the issue. (There's also another subtle bug, finally fixed in 2.6.36, where heavy disk IO can cause processes that aren't doing any IO to become unresponsive.)
Re:It sucks I agree by Yokaze · 2010-10-23 09:45 · Score: 4, Insightful

Renaming is O(filename), usually a single table entry. Deleting is probably more along the lines of O(filesize).
You have to keep track of the free blocks, too.

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Re:It sucks I agree by julesh · 2010-10-23 09:47 · Score: 4, Informative

Depends on the paranoia of the user. FTFY.
Any "sane" filesystem will simply unlink that entry in the directory or table.
The only reason to be physically overwriting the entire space occupied by the 1GB file is some "super secret secure" filesystem used by people scared of having their porn browsing habits discovered by the FBI.
The problem isn't overwriting the data, it's adding the space previously used by the file to the free space bitmap/list. For a 1GB file on an FS with 1k blocks (not uncommon), you're going to be deallocating about a million blocks. Now, unless your system is fragmented horrendously, a lot of those are going to be hits to the same bitmap block (or similar), but you're still looking at writing about 5,000 or so blocks, probably scattered over several cylinders of your disk (=> more than one seek), so on a typical hard disk the process is going to take tens or hundreds of milliseconds at best. If badly fragmented, this could easily take over a second.
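
The figures above can be roughly reproduced with a back-of-the-envelope calculation. The Python sketch below assumes an ext2/ext3-style layout: 12 direct block slots in the inode, 4-byte block pointers in single, double and triple indirect blocks, and one bitmap block per block group. It also assumes an ideal, unfragmented file, so the bitmap count is a lower bound. The function name is hypothetical.

```python
import math


def ext2_style_metadata_blocks(file_bytes, block_bytes=1024,
                               pointer_bytes=4, direct_slots=12):
    """Estimate metadata blocks touched when freeing one file.

    Returns (data_blocks, indirect_blocks, bitmap_blocks).
    """
    pointers_per_block = block_bytes // pointer_bytes
    data_blocks = math.ceil(file_bytes / block_bytes)

    remaining = max(0, data_blocks - direct_slots)
    indirect_blocks = 0
    for level in (1, 2, 3):  # single, double, triple indirection
        if remaining == 0:
            break
        covered = min(remaining, pointers_per_block ** level)
        # One pointer block per group of children at every depth of the tree.
        for depth in range(1, level + 1):
            indirect_blocks += math.ceil(covered / pointers_per_block ** depth)
        remaining -= covered

    # Each bitmap block holds one bit per data block.
    bitmap_blocks = math.ceil(data_blocks / (block_bytes * 8))
    return data_blocks, indirect_blocks, bitmap_blocks


data, indirect, bitmaps = ext2_style_metadata_blocks(1 << 30)
print(data, indirect, bitmaps, indirect + bitmaps)
```

Under these assumptions a 1GB file with 1k blocks comes out at about a million data blocks and a little over four thousand metadata blocks to read and rewrite, which is in the same ballpark as the estimate in the post. Extent-based filesystems avoid most of the indirect blocks, which fits the remark elsewhere in the thread that ext4 should fix the issue.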
Re:It sucks I agree by hackstraw · 2010-10-23 11:21 · Score: 3, Informative

I'm actually reading a book about filesystems with a focus on the BFS from Be Inc. The author in it actually says that renaming a file is the most complicated operation on a file. Before the file is renamed, lots of validation must take place; in some implementations a rename locks the entire filesystem. The source and destination must be verified to be reachable and unused. The rename could go into another directory, so it must do the proper checks there as well. There are edge cases if the source or destination is a directory.
It still seems like O(1), maybe with a big 1, but this author spent a considerable amount of time on renaming.
Re:It sucks I agree by Waffle+Iron · 2010-10-23 13:18 · Score: 4, Interesting

MythTV added a feature a while back to work around this issue. IIRC, they now keep a handle open to video files while they delete them. This causes the kernel to not actually do the delete, then over a span of about 10 minutes MythTV repeatedly shaves chunks off the end using truncate() until the file reaches 0 bytes.
Prior to this, the system could get really bogged down right after deleting shows. I was careful not to delete too many shows at once; I had actually seen the back end lock up after telling it to delete a bunch of shows.
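
The workaround described above can be sketched in a few lines of Python. This is not MythTV's actual code, just an illustration of the idea under the assumption that unlinking a file while a descriptor is still open defers the real deletion, and that each truncate call then frees only the tail of the file. The function name, chunk size and pause are arbitrary choices.

```python
import os
import time


def delete_gradually(path, chunk_bytes=64 * 1024 * 1024, pause_seconds=1.0):
    """Unlink a file, then shrink it step by step through an open handle.

    Returns the number of truncate calls made.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # The name disappears now; the blocks stay allocated while fd is open.
        os.unlink(path)
        size = os.fstat(fd).st_size
        steps = 0
        while size > 0:
            size = max(0, size - chunk_bytes)
            os.ftruncate(fd, size)  # frees only the blocks past the new end
            steps += 1
            if size > 0:
                time.sleep(pause_seconds)  # give other I/O a chance to run
        return steps
    finally:
        os.close(fd)
```

The trade-off is that the disk space is released slowly rather than all at once, so the total work is the same but it is spread out enough that other disk users are not starved.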
Re:It sucks I agree by mehemiah · 2010-10-23 13:19 · Score: 2, Interesting

Not to be combative, I bet you're right, but which IO scheduler were you using? There are three. If you were using a desktop distro such as an Ubuntu desktop variant, it has been using cfq since 2007; on a server distro it would have been the deadline scheduler. If it was 2007, I don't know which you might have been using.
Re:It sucks I agree by dutchwhizzman · 2010-10-23 19:18 · Score: 2, Insightful

Now put ten million files on it and try again. This is about actually used filesystems, not your stash of 500 ISO DVD rips, your bookmarks and your resume.

--
I was promised a flying car. Where is my flying car?
Re:It sucks I agree by Ingo+Molnar · 2010-10-23 21:02 · Score: 4, Informative

Such drastic change! I have seen this happen on numerous systems and I just change the elevator to "deadline" and poof! The problem is gone. See this discussion for some details. The CFQ scheduler is great for a Linux server running a database, but it completely sucks for desktop or any server used to write large files to.
I see that the bug entry you referred to contains measurements from early 2010, at which point Ubuntu was using v2.6.31-ish kernels IIRC. (and that's the kernel version that is being referred to in the bug entry as well.)
A lot of work has gone into this area in the past 1-2 years, and v2.6.33 is the first kernel version where you should see the improvements. Slashdot reported on that controversy as well.
If you can still reproduce interactivity problems during large file copies with CFQ on v2.6.36 (and it goes away when you switch the IO scheduler to deadline), please report it to linux-kernel@vger.kernel.org so that it can be fixed ASAP. (You can also mail me directly, i'll forward it to the right list and people.)
Thanks,
Ingo
Re:It sucks I agree by Ingo+Molnar · 2010-10-23 22:50 · Score: 4, Interesting

There's also the VM fix from Wu Fengguang, included in v2.6.36, which addresses similar "slowdown while copying large amounts of data" bugs.
There were about a dozen kernel bugs causing similar symptoms, which we fixed over the course of several kernel releases. They were almost evenly spread out between filesystem code, the VM and the IO scheduler. And yes, i agree that it took too long to acknowledge and address them - these problems have been going on for several years. It's a serious kernel development process failure.
If anyone here still experiences bad desktop stalls while handling big files with v2.6.36 too then we'd appreciate a quick bug report sent to linux-kernel@vger.kernel.org.
Thanks,
Ingo
Re:It sucks I agree by shellbeach · 2010-10-24 00:48 · Score: 3, Informative

I don't know if it's all that related, but swap slowdown is an appalling issue as well. If a single program spikes in RAM usage, I often have to reboot the whole system as it hangs indefinitely. As I work with Octave a lot, often a script will gobble up a few hundred megs of memory and push the system into swap. Once that happens, it's often too late to do anything about it as programs simply will not respond.
I'm surprised you're seeing this with a process taking up a few hundred Mb -- that suggests to me that you have very little RAM in your system. But if it really is the kernel's fault for being more swappy than it should be, doing something like
echo 10 > /proc/sys/vm/swappiness
(as root) should fix your problems fine. (Swappiness can be a value between 0 and 100; 0 means never swap out, 100 means swap out all the time; 60 is generally the default value). See Ubuntu's swap FAQ for lots more info.
Of course, if you simply don't have enough memory in your system to support the RAM-intensive process and basic system functions, buying more memory might be the best solution ... :)
Re:It sucks I agree by swillden · 2010-10-24 02:21 · Score: 3, Informative

The malloc/new that fails causing a process to crash might not be the process that is consuming huge amounts of memory in the first place.
It might not, but it usually is. The probability that a given process will be the one that triggers the OOM killer is proportional to the amount of memory that process is allocating. If one process is allocating 99.9% of the memory, there's a 99.9% chance it will be the one that triggers the OOM killer.
But, actually, Linux doesn't just pick the process with the failed allocation to kill. Instead, when a process makes a memory request which cannot be fulfilled, the OS runs a quick calculation of the memory usage "badness" of all processes. The base of the badness score is the process's resident memory, plus the resident memory of child processes. Processes that have been "nice'd" get a score boost (on the theory they're likely to be less important), but long-running processes get a score decrease (on the theory they're likely to be more important). Superuser processes have their score decreased. Finally, processes have their scores decreased by a user-settable value in /proc/<pid>/oom_adj (default is no adjustment). Also, if /proc/<pid>/oom_adj is set to the constant OOM_DISABLE, then the process is not killable.
When memory runs out, Linux kills the process with the highest score. If a single ordinary user process, especially a short-lived desktop process, has consumed nearly all of the system RAM, and no one has messed with oom_adj for that process, then it WILL be the one that dies.
Here's a (probably excessively long and complex; I'm no shell guru) one-liner that will show the current OOM scores for all of your processes, sorted from lowest to highest:
find /proc -maxdepth 2 -name oom_score | while read i; do echo -n "$i "; cat $i; done | sort -n -k2
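
The same listing can be done in Python, which makes it easy to show the process name next to each score. This is only a sketch: it reads the kernel's own /proc/<pid>/oom_score value rather than recomputing the badness heuristic, and it assumes /proc/<pid>/comm is available for the name. The function name is made up.

```python
from pathlib import Path


def oom_scores(proc_root="/proc"):
    """Return [(score, pid, name)] sorted from lowest to highest score."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited or is not readable; skip it
        rows.append((score, int(entry.name), name))
    return sorted(rows)
```

Printing the last few entries of `oom_scores()` shows which processes the kernel would currently consider first when memory runs out.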

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:It sucks I agree by dogsbreath · 2010-10-24 05:19 · Score: 2, Insightful

Yeah, and I hope it continues but I don't know any of our critical vendors who develop for it. Forgive me for marginalizing the open community on this particular item but a lot of the steam behind Solaris (by reflection OpenSolaris) was the industrial use of the o/s. Not just web servers but all kinds of specialized applications tied to command and control of the internet. Companies like Siemens, Alcatel-Lucent, Cisco etc all have or had Solaris based platforms. Our company has not installed a Solaris application system in two years. Everything is Linux, Windows, AIX and any vendor who comes to us with Solaris might as well save their breath. As well, we have not bought any Sun h/w in two years. Our h/w criteria is basically: how cheaply and reliably can we run VMWare?
I fear that Solaris/OpenSolaris is becoming at best a niche operating system. Sun h/w and Solaris are the walking dead and I don't see Oracle being able to do anything about it.
Too bad IBM didn't buy Sun. Solaris really would have had a chance to grow if IBM wanted to push it.

what about servers? by StripedCow · 2010-10-23 06:41 · Score: 3, Insightful

Isn't this also relevant when using Linux on a server? I mean, if one process or thread is copying a large file, you don't want your server to come to a crawl.

It doesn't sound like just a "desktop" issue to me.

--
If Pandora's box is destined to be opened, *I* want to be the one to open it.

Re:what about servers? by Anonymous Coward · 2010-10-23 06:49 · Score: 2, Interesting

On IO intensive server: this is also a real issue. 20-30% of processors and cores stuck with a 99% iowait for hours, while the rest tries to cope. Total CPU load does not go above 20%. No solution yet after months of study and experimenting. Linux is indeed really bad at IO scheduling in general, it seems.
Now think of that situation with a heavy database system. A no-no solution.
Re:what about servers? by Anonymous Coward · 2010-10-23 06:55 · Score: 5, Informative

There are some interactive-response fixes queued up for 2.6.37 that may help (a lot!) with this stuff.
Start reading here: http://www.phoronix.com/scan.php?page=news_item&px=ODU0OQ
Re:what about servers? by man_of_mr_e · 2010-10-23 07:06 · Score: 2, Interesting

How does this happen? Every year it seems I read about how this problem has been fixed in the latest kernel, and then it's like those fixes mysteriously vanish?

--
If you need web hosting, you could do worse than here
Re:what about servers? by fishbowl · 2010-10-23 07:07 · Score: 2, Interesting

This problem is highly visible in VMs. When you have one VM doing write-heavy disk IO, the other VMs suffer.
I don't think it's a Linux problem as much as a general problem of the compromises that must be made by any scheduling algorithm.
What about you Linux mainframe guys? You have unbeatable IO subsystems. Do you see the same problems?

--
-fb Everything not expressly forbidden is now mandatory.
Re:what about servers? by Lord+Byron+II · 2010-10-23 07:23 · Score: 2, Interesting

It's been a big issue for me. Go to a directory with a couple of large files (say a dvd rip) and do a "cat * > newfile". Watch your system come to a crawl.
Re:what about servers? by joaosantos · 2010-10-23 07:34 · Score: 5, Informative

I just did it and didn't notice any slowdown.
Re:what about servers? by b4dc0d3r · 2010-10-23 08:16 · Score: 2, Informative

Vista's I/O priority is linked to the process priority. Requests for high-priority tasks are high priority i/o requests. Unfortunately this borks things like virus scan, which give themselves boosted priority thinking that the user wants a file on-access-scanned and ready to use. Background tasks run, open a file, get scanned on access, and suddenly you have a high-priority process reading the file. And then once it's scanned it's probably in the disk cache so the low priority process/thread reads it instantly. Now that everything is high priority, nothing is, and we're back where we started.
http://download.microsoft.com/download/a/f/7/af7777e5-7dcd-4800-8a0a-b18336565f5b/Priorityio.doc
SetPriorityClass() and SetThreadPriority() add a new option that says "I'm in the background now" and "I'm no longer in the background", but few apps use this. Certainly no XP apps did, because it didn't exist, so it would have to be Vista-onwards apps. SetFileInformationByHandle() I think is new, allowing you to specifically set i/o priority for each file handle. Who is going to voluntarily set themselves low priority? Not many apps. There are some other calls to reserve bandwidth, and driver-level calls, but not very much. Windows 7 does not make any significant changes to this model. And although you can set process priority in the task manager, there is no way without a third party tool (I still consider sysinternals to be third party) to change i/o priority. I think it uses SetFileInformationByHandle.
I first noticed this on Windows NT 4, probably on a machine without enough ram. I watched each control paint itself. Today, on a core 2 duo 2.5 ghz with 2GB of ram, Vista occasionally still paints individual controls at watchable speed. This is a work computer, so no torrents or large file copying, but enforced virus scan. I have two VBScripts to control this - one sets certain apps to low priority (setting their i/o priority accordingly). The other disables several services including virus scan. When I need to debug a .NET website, virus scan gets turned off. It's still not snappy enough, but it's a vast improvement. Still unacceptable.
Re:what about servers? by ksandom · 2010-10-23 08:21 · Score: 4, Interesting

Sorry dude, it looks like it's a hardware specific problem. I did that on nearly 700G of large files and then fired up the flight sim while it was still going. The only slow down was on file related activity, which is totally what you'd expect. I had it running full screen across two monitors without any drop in frame rate. AND I'm using economy hardware.

--
Funnyhacks - Wierd, unusual, and fun hacks
Re:what about servers? by Ingo+Molnar · 2010-10-23 08:33 · Score: 5, Informative

I think the Phoronix article you linked to is confusing the IO scheduler and the VM (both of which can cause many seconds of unwanted delays during GUI operations) with the CPU scheduler.
The CPU scheduler patch referenced in the Phoronix article deals with delays experienced during high CPU loads - a dozen or more tasks running at once and all burning CPU time actively. Delays of up to 45 milliseconds were reported and they were fixed to be as low as 29 milliseconds.
Also, that scheduler fix is not a v2.6.37 item: i have merged a slightly different version and sent it to Linus, so it's included in v2.6.36 already: you can see the commit here.
If you are seeing human-perceptible delays - especially in the 'several seconds' time scale, then they are quite likely not related to the CPU scheduler (unless you are running some extreme workload) but more likely to the CFQ IO scheduler or to the VM cache management policies.
In the CPU scheduler we usually deal with milliseconds-level delays and unfairnesses - which rarely raise up to the level of human perception.
Sometimes, if you are really sensitive to smooth scheduling, you can see those kinds of effects visually via 'game smoothness' or perhaps 'Firefox scrolling smoothness' - but anything on the 'several seconds' timescale on a typical Linux desktop has to have some connection with IO.
Thanks,
Ingo
Re:what about servers? by Ingo+Molnar · 2010-10-23 20:38 · Score: 4, Informative

Sorry dude, it looks like it's a hardware specific problem. I did that on nearly 700G of large files and then fired up the flight sim while it was still going. The only slow down was on file related activity, which is totally what you'd expect. I had it running full screen across two monitors without any drop in frame rate. AND I'm using economy hardware.
It may also be kernel version dependent - with older kernels still showing this bug.
A lot of work has gone into the Linux kernel in the past 2 years to improve this area - and yes, i think much of the criticism from those who have met this bug and were annoyed by it was fundamentally justified - this bug was real and it should have been fixed sooner.
Kernels post v2.6.33 ought to be much better - with v2.6.36 bringing another set of improvements in this area. The fixes were all over the place: IO scheduler, VM and filesystem code and few of them were simple.
This Slashdot article from 1.5 years ago shows when more attention was raised to this category of Linux interactivity bugs.
Thanks,
Ingo
Re:what about servers? by Ingo+Molnar · 2010-10-23 20:52 · Score: 3, Informative

Ingo, I find delays of 29-45ms to be pretty noticeable. To put it another way, if you had a delay of 10ms before, and you're now getting a delay of 50ms due to some background copy, all of your applications went from running at 100fps to 20fps, which I think even non-sensitive people can pick up on, even outside of games and smooth scrolling. VIM feels different over a 10ms LAN connection vs. a 45ms connection from my home.
Yes i agree with you that if a 45 msecs latency happens on every frame then that will snowball and will thoroughly ruin game interactivity - but note the specific context here:
you can see the commit referenced by Phoronix here
(hm, my first link above was broken, sorry about that.)
Those 45 msec delays were statistical-max outliers - with the average latency at 7.3 msecs. This got cut down to 25 msecs / 6.6 msecs respectively via the patch. Note that it's also a specific, CPU overloaded workload that was measured here, so not typical of the desktop unless you are a developer running make -j build jobs.
We care about optimizing maximum latencies because those are what can cause occasional hiccups on the desktop - a lagging mouse pointer - or some other non-smooth visual artifact.
Thanks,
Ingo
Re:what about servers? by Ingo+Molnar · 2010-10-24 01:17 · Score: 2, Informative

Yeah, that's what the discussion was about - we improved that particular case, see this commit (which can be found in v2.6.36), and Phoronix reported about that upstream fix.
Thanks,
Ingo

have you tried ionice? by larry+bagina · 2010-10-23 06:43 · Score: 5, Informative

have you tried ionice?

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:have you tried ionice? by atrimtab · 2010-10-23 07:01 · Score: 5, Informative

ionice works great in a terminal window, but isn't integrated into any of the Desktop GUIs.
I suppose you could prefix the various file transfer commands used by the GUI with an added "ionice -c 3", but I haven't bothered to look.
Using ionice to lower the i/o priority of various portions of MythTV like mythcommflag, mythtranscode, etc. can make it quite snappy.
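
If a launcher or script wanted to apply the "ionice -c 3" prefix automatically, it could look something like the Python sketch below. The helper names are hypothetical. Class 3 is the idle class, meaning the command only gets disk time when nothing else wants it; as far as I know this only takes effect when the I/O scheduler in use honours priorities, which at the time of this thread meant CFQ.

```python
import shutil
import subprocess


def with_idle_io(command):
    """Prefix a command (list of strings) so it runs in the idle I/O class."""
    return ["ionice", "-c", "3", *command]


def run_with_idle_io(command):
    """Run the command under ionice if it is installed, else run it as-is."""
    argv = with_idle_io(command) if shutil.which("ionice") else list(command)
    return subprocess.run(argv, check=False).returncode
```

For example, `run_with_idle_io(["cp", "big.iso", "/mnt/backup/"])` would start the copy in the idle class; the file names here are placeholders.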

--
Facebook is billions of individual "Skinner Boxes." And if you use it you are the pigeon!
Re:have you tried ionice? by daveime · 2010-10-23 08:58 · Score: 3, Funny

Unfortunately, due to the global recession, the only person who can currently afford to buy one is the Sultan of Brunei.
Re:have you tried ionice? by JohnFluxx · 2010-10-23 09:35 · Score: 4, Informative

Poor me! I added ionice integration into KDE since pretty much the dawn of time.
In KDE, just press ctrl+esc to bring up my System Activity, right click on a process, then choose renice. You get a really pretty (imho heh) dialog letting you change the CPU or hard disk priority, scheduler, and so on.

Perhaps you should.. by Anonymous Coward · 2010-10-23 06:44 · Score: 3, Insightful

..download and compile the 2.6.36 kernel. A feature article on the changes can be found at http://www.h-online.com/open/features/What-s-new-in-Linux-2-6-36-1103009.html

A very very easy to follow guide can be found at http://kernel.net/articles/how-to-compile-linux-kernel.html

Sidenote - What is up with not being able to paste links? That's annoying.

Re:Perhaps you should.. by Qzukk · 2010-10-23 06:58 · Score: 3, Interesting

There's a bug in chrome that causes it to usually be unable to paste into slashdot's comment box once you've placed an < character in the box. (Slashdot, specifically. It does fine on all sorts of other sites with even fancier ajaxy textareas like the stackoverflow sites)

--
If I have been able to see further than others, it is because I bought a pair of binoculars.

BFS Isn't Unsupported by ehntoo · 2010-10-23 06:44 · Score: 5, Informative

Con Kolivas is still actively working on BFS, it's not unsupported. He's even got a patch for 2.6.36, which was only released on the 20th. http://ck.kolivas.org/patches/bfs/ He's also got a patchset out that I use on all my desktops which includes a bunch of tweaks for desktop use. http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/

Re:BFS Isn't Unsupported by emergentessence · 2010-10-23 08:01 · Score: 2, Interesting

I had been wondering about this myself, for some reason I was under the impression that the BFS was no longer being maintained.
It turns out there is an up-to-date package for Ubuntu (I'm running 10.10) as well: http://launchpad.net/~chogydan/+archive/ppa
I thought I'd try it out as the installation was much more straightforward than I'd expected.
'uname -r' now reveals "2.6.35-22ck-generic" and, while this is just my subjective assessment, a few of the quirks I had noticed before on my own system where things would get sluggish when switching between apps / opening closing apps while running things that read/write to the disk, seem to have been ironed out.
I would love to test this in a more empirical manner, as I can now boot into either kernel to do comparisons, but I don't know of any software that would allow me to benchmark performance in a way that is sensitive to the optimizations the BFS allegedly implements.

Linux I/O scheduling by Animats · 2010-10-23 06:49 · Score: 5, Insightful

If the CPU utilization is that low, it's an I/O scheduling problem. See Linux I/O scheduling.

The CFQ scheduler is supposed to be a fair queuing system across processes, so you shouldn't have a starvation problem. Are you thrashing the virtual memory system? How much I/O is going into swapping? (Really, today you shouldn't have any swapping; RAM is too cheap and disk is too slow.)

Re:Linux I/O scheduling by julesh · 2010-10-23 10:07 · Score: 2, Insightful

Your disks might be too slow, but OUM-based MLC flash drives are so fast most current SSDs would look like 80's tech.
MLC flash has a life cycle of only around 10,000 writes, though, which for swap is way too small to be useful. Your lifespan on that SSD is likely to be only a couple of years, or even less for demanding applications, and the pricing on them is still high enough that DRAM isn't actually that much more expensive (I see about $4/GB for SSDs compared to about $16/GB for DRAM), at which point more RAM is probably the better way to go.
Re:Linux I/O scheduling by Khyber · 2010-10-23 10:32 · Score: 2, Informative

OUM MLC is FAR different than anything in typical use.
Try 10^8 read/write.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

Is it really only a matter of scheduling? by mrjb · 2010-10-23 06:56 · Score: 4, Interesting

I've wondered on occasion if this problem is really only due to scheduling. After all, most of us still write our file access code more or less as follows:
x=fopen('somefilename');
while ( !eof(x)) {
  print readln(x,1024); /* ---- */
}
fclose(x);
Point being, there's nothing that tells the marked line that the process should gracefully go to sleep while the drive is doing its thing, and there's no callback vector defined either- nothing that indicates we're dealing with non-blocking I/O. I'd like to think that our compilers have silently been improved to hide those implementation details from us, but I have no proof that this is the case. Unless the system functions use some dirty stack manipulation voodoo to extract the return address of the function and use that as callback vector?

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book

Re:Is it really only a matter of scheduling? by Anonymous Coward · 2010-10-23 07:06 · Score: 5, Informative

The kernel will preempt the process calling "readln", in other words putting it to sleep.
The kernel will make sure the I/O happens, allowing other processes to work at the same time.
You only need non-blocking code if your own process needs to do other things at the same time.
Re:Is it really only a matter of scheduling? by Anonymous Coward · 2010-10-23 07:07 · Score: 4, Informative

The process will go to sleep inside the read() system call (inside readln() somewhere presumably). Other processes will be able to run in the meantime. It works by interrupting into kernel code, and the kernel changes the stack pointer (and program counter, and lots of other registers) to that of another process. When the data comes back from the disk, the kernel will consult its tables and see that your process is runnable again, and when the scheduler decides it's its turn, in a timer interrupt, the stack pointer will be switched back to your stack. (So yes, dirty stack manipulation voodoo.) Every modern OS works this way.
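
The sleeping behaviour is easy to observe from user space. In the Python sketch below a pipe stands in for the slow disk and a thread stands in for the process (both are simplifications for illustration): the reader calls a plain blocking read with no callback, is put to sleep by the kernel, and wakes up only when data arrives, while the main thread keeps running in the meantime.

```python
import os
import threading
import time


def blocking_read_demo(delay_seconds=0.2):
    """Show that read() sleeps until data arrives, with no callback needed."""
    read_end, write_end = os.pipe()
    result = {}

    def reader():
        started = time.monotonic()
        # The thread is descheduled inside this call until data is available.
        result["data"] = os.read(read_end, 1024)
        result["waited"] = time.monotonic() - started

    thread = threading.Thread(target=reader)
    thread.start()
    time.sleep(delay_seconds)  # the "slow disk": nothing to read yet
    os.write(write_end, b"block from disk")
    thread.join()
    os.close(read_end)
    os.close(write_end)
    return result
```

The reader burns no CPU time while it waits; the `waited` value it reports is roughly the delay before the data was written.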
Re:Is it really only a matter of scheduling? by Ingo+Molnar · 2010-10-23 07:55 · Score: 5, Informative

Yes. Here there is another problem at play: cp reads in the whole (big) file and then writes it out. This brings the whole file into the Linux pagecache (file cache).
That, if the VM is not fully detecting that linear copy correctly, can blow a lot of useful app data (all cached) out of the pagecache. That in turn has to be read back once you click within Firefox, etc. - which generates IO and is a few orders of magnitude slower than reading the cached copy. That such data tends to be fragmented (all around on the disk in various small files) and that there is a large copy going on does not help either.
Catastrophic slowdowns on the desktop are typically such combined 'perfect storms' between multiple kernel subsystems. (for that reason they also tend to be the hardest ones to fix.)
It would be useful if /bin/cp explicitly dropped use-once data that it reads into the pagecache - there are syscalls for that.
And yes, we'd very much like to fix such slowdowns via heuristics as well (detecting large sequential IO and not letting it poison the existing cache), so good bugreports and reproducing testcases sent to linux-kernel@vger.kernel.org and people willing to try out experimental kernel patches would definitely be welcome.
Thanks,
Ingo
Re:Is it really only a matter of scheduling? by daveime · 2010-10-23 09:20 · Score: 2, Interesting

Wow, that's a new one ?
Perform a non-necessary fget on a file already known to be zero bytes, just so we can get a result "this fget failed because the file is zero bytes".
while (!eof()) {
readsomething();
}
is something I learnt perhaps 20 years ago, and it's never failed me yet. Why must people always try reinventing the wheel, just to end up with an octagon ?
Re:Is it really only a matter of scheduling? by Ingo+Molnar · 2010-10-23 09:48 · Score: 3, Informative

While certainly the whole file may end up cached, the source for cp does a simple read/write with a small buffer -- not read in the whole file and then write it out.
Many apps or DB engines will have a similar pattern: they read/write in a relatively small buffer, but then expect the exact opposite of what you'd expect /bin/cp to do: they expect the file to stay cached (because they will read it again in the future).
So the kernel cannot know why the files are being read and written: will it be needed in the future (Firefox sqlite DB) or not (cp of a big file).
(Unfortunately, the planned mind reading extension to the kernel is still a few years out.)
Even in the specific case of /bin/cp often the files might be needed shortly after they have been copied. If you have 4 GB of RAM and you are copying a 750 MB ISO, you'd expect that ISO to stay fully cached so that the CD-writer tool can access it faster (and without wasting laptop power), right?
So in 99% of the cases it is the best kernel policy to keep around cached data as much as possible.
What makes caching wrong in the "copy huge ISO around" case is that both files are too large to fit into the cache and that cp reads and writes to the totality of both files. Since /bin/cp does not declare this in advance the kernel has no way of knowing this for sure as the operation progresses - and by the time we hit limits it's too late.
It would all be easier for the kernel if cp and dd used fadvise/madvise to declare the read-once/write-once nature of big files. It would all just work out of box. The question is, how can cp figure out whether it's truly use-once ...
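A rough sketch of what "cp declaring use-once" could look like from userspace, assuming a platform where Python exposes `os.posix_fadvise` (Linux does). The helper name `copy_use_once` and the 1 MiB chunk size are illustrative choices, not anything cp actually does, and whether dropping the cache is the right policy is exactly the open question raised above.

```python
import os


def copy_use_once(src_path, dst_path, chunk_size=1 << 20):
    """Copy a file while hinting that the cached pages will not be reused."""
    has_fadvise = hasattr(os, "posix_fadvise")  # missing on some platforms
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        if has_fadvise:
            # A length of 0 means "from offset to end of file".
            os.posix_fadvise(src, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(src, chunk_size)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:  # os.write() may write less than requested
                written = os.write(dst, view)
                view = view[written:]
            if has_fadvise:
                # Tell the kernel the range just read will not be needed again.
                os.posix_fadvise(src, copied, len(chunk), os.POSIX_FADV_DONTNEED)
            copied += len(chunk)
        # Dirty pages cannot be dropped before writeback, so flush first.
        os.fsync(dst)
        if has_fadvise:
            os.posix_fadvise(dst, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(src)
        os.close(dst)
    return copied
```

The calls are only hints: the kernel is free to ignore them, and the copy is byte-for-byte the same either way.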
The other thing that can go wrong is that arguably other apps should not be affected by this negatively - and this was the point of the article as well. I.e. cp may fill up the pagecache, but those new pages should not throw out well-used pages on the LRU, plus other write activities by other apps should not be slowed down just because there's a giant copy going on.
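The eviction effect is easy to reproduce with a toy model. The sketch below is a plain single-list LRU, which is an assumption made for brevity; the real Linux page cache is more elaborate and has its own protections, so treat this only as an illustration of why one big sequential pass can push out a small hot working set. All names are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Toy page cache: a fixed number of slots, least recently used evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)  # mark as most recently used
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the oldest page
        return hit


def hot_pages_surviving(capacity, hot_pages, scan_length):
    """Warm the cache with app pages, run one big sequential scan, count survivors."""
    cache = LRUCache(capacity)
    for page in hot_pages:
        cache.access(("app", page))
    for block in range(scan_length):
        cache.access(("iso", block))  # every block of the big file is touched once
    return sum(1 for page in hot_pages if ("app", page) in cache.pages)


if __name__ == "__main__":
    for scan in (50, 90, 1000):
        print(scan, hot_pages_surviving(100, range(20), scan))
```

With 100 slots and 20 hot pages, a scan that fits leaves everything cached, while a scan larger than the cache leaves none of the hot pages behind.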
Those kinds of big file operations certainly work fine on my desktop boxes - so if you see such symptoms you should report it to linux-kernel@vger.kernel.org, where you will be pointed to the right tools to figure out where the bug is. (latencytop and powertop are both a good start.)
Note that i definitely could see similar problems two years ago, with older kernels - and a lot of work went into improving the kernel in this area. v2.6.35 or v2.6.36 based systems with ext3 or some other modern filesystem should work pretty well. (The interactivity break-through was somewhere around v2.6.32 - although a lot of incremental work went upstream after that, so you should try as new of a kernel as you can.)
Also, i certainly think that the Linux kernel was not desktop-centric enough for quite some time. We didn't ever ignore the desktop (it was always the primary focus for a simple reason: almost every kernel developer uses Linux as their desktop) - but the kernel community certainly under-estimated the desktop and somehow thought that the real technological challenge was on the server side. IMHO the exact opposite is true.
Fortunately, things have changed in the past few years, mostly because there's a lot of desktop Linux users now, either via some Linux distro or via Android or some of the other mobile platforms, and their voice is being heard.
Thanks,
Ingo
Re:Is it really only a matter of scheduling? by Ingo+Molnar · 2010-10-23 21:24 · Score: 2, Interesting

So I know some people may read this and think "haha, funny joke" but given that most users are extremely predictable regarding what programs they use and when and how they use them (same with web browsing), shouldn't it be possible to gather user activity over time and analyze it to help improve scheduling?

Yeah, that's certainly a possibility.
This is also the goal of most heuristics in the kernel: to figure out a hidden piece of information that the application (and user) has not passed to the kernel explicitly.
The problem comes when the kernel gets it wrong - the kernel and applications can easily get into a feedback loop / arms race of who knows how to trick the other one into doing what the app writer (or kernel writer) thinks is best. In such cases we get the worst of both worlds: we get the bad case and we get the cost of heuristics.
(Heuristic and predictive systems also tend to be complex and hard to analyze: you can rarely reproduce bugs without having the exact same filesystem layout and usage pattern as the user experienced, etc.)
What we found is that in terms of default behavior it's a bit better to keep things simple and predictable/deterministic and then give apps the way to inject extra information into the kernel. We have the fadvise/madvise calls which can be used with the POSIX_FADV_DONTNEED flag to drop cached content from the page cache.
Heuristics and predictive techniques are done when we can be reasonably sure that we get the decisions right: for example there's a piece of fairly advanced code in the Linux page cache trying to figure out whether to pre-fetch data or not.
The large file copy interactivity problems some have mentioned here were most likely real kernel bugs (in the filesystem, IO scheduling and VM subsystems) and were hopefully fixed in the v2.6.33 - v2.6.36 timeframe.
If you can still reproduce any such problems then please report them to linux-kernel@vger.kernel.org so we can fix it ASAP.
In any case, we could all be wrong about it, so if you have a good implementation of more aggressive predictive algorithms i'm sure a lot of people would try them out - me included. We kernel developers want a better desktop just as much as you want it.

Re:Perhaps you should... by biryokumaru · 2010-10-23 06:57 · Score: 2, Funny

Sidenote - What is up with this comment not showing up when I wasn't registered. That's stupid and annoying.

It did. Now who's stupid and annoying? I mean, besides me.

--
When you're afraid to download music illegally in your own home, then the terrorists have won!

It has always been like that by guacamole · 2010-10-23 07:07 · Score: 2, Informative

I can remember that even as far back as 1999 I saw this issue with Linux. This is not bad only for the desktop, but also for the server. I have also experience with Solaris workstations and servers, and it usually doesn't behave this way.

OS/2 by picross · 2010-10-23 07:08 · Score: 2, Interesting

I remember using OS/2 (IBM's desktop OS) and i was always amazed that you could format a floppy and do other tasks like nothing else was going on. I never did understand why that never seemed to make it into the mainstream.

Re:Is Desktop Linux [still] relevant? by bieber · 2010-10-23 07:09 · Score: 4, Informative

That was a joke, right? You don't really think that all the millions of desktop Linux users just up and vanished because some idiot at PCWorld wanted a catchy headline?

Wrong Question by donscarletti · 2010-10-23 07:12 · Score: 2, Interesting

This is not a case of Linux IO schedulers being unsuitable for the desktop, but more a case of desktop applications being written in a horrendous way in terms of data access. The general pattern is to open a file object, load a few hundred kilobytes, process that, and then ask the operating system for more. This is a small inefficiency when the resource is doing nothing, but if the disk is actually busy, then it will probably be doing something else by the time you ask it to read a little bit more. Not to mention the habit of reading through a few hundred resource files one at a time in seemingly random order, and blocking every time it reads, because the application programmer is too lazy to think about what resources the app is using.

Linux has such a nice implementation of mmap, which works by letting Linux actually know ahead of time what files you are interested in and managing them itself, without the application programmer worrying his pretty little head over it. Other options are running multiple non-blocking reads at the same time and loading the right amount of data and the right files to begin with.
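For readers who have not used mmap, a minimal sketch of the idea: map the whole file once and let the kernel page it in, rather than issuing many small reads. The function name `count_occurrences` is invented for this example, and the `madvise` hint is guarded because it only exists in newer Python versions on platforms that support it.

```python
import mmap


def count_occurrences(path, needle):
    """Count occurrences of a byte string by mapping the file instead of read()ing chunks.

    Note: mmap raises ValueError for an empty file, so callers should handle that case.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Optional hint that access will be sequential (Python 3.8+, where available).
            if hasattr(mapped, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mapped.madvise(mmap.MADV_SEQUENTIAL)
            count = 0
            position = mapped.find(needle)
            while position != -1:
                count += 1
                position = mapped.find(needle, position + 1)
            return count
```

The application never chooses a buffer size here; which pages are resident at any moment is left to the kernel.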

The best thing about a simple CSCAN algorithm is that it gives applications what they asked for and if the application doesn't know what it wants, well, that's hardly a system issue.

--
When Argumentum ad Hominem falls short, try Argumentum ad Matrem

Probably not the IO scheduler by crlf · 2010-10-23 07:13 · Score: 5, Informative

This is almost certainly not the IO scheduler's problem. IO scheduling priorities are orthogonal to CPU scheduling priorities.

What you are likely running into is the dirty_ratio limits. In Linux, there is a memory threshold for "dirty memory" (memory that is destined to be written out to disk), that once crossed, will cause symptoms like you've described. The dirty_ratio values can be tuned via /proc, but beware that the kernel will internally add its own heuristics to the values you've plugged in.
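To see what those limits amount to on a given box, something like the following sketch can help. It assumes the usual Linux files `/proc/sys/vm/dirty_background_ratio`, `/proc/sys/vm/dirty_ratio` and `/proc/meminfo`; the helper names are made up, and the byte figures are only an approximation, since (as noted above) the kernel applies its own heuristics and does not simply take a percentage of total RAM.

```python
def threshold_bytes(total_bytes, ratio_percent):
    """Approximate a percentage-of-RAM threshold in bytes (integer arithmetic)."""
    return total_bytes * ratio_percent // 100


def read_int(path):
    """Read a single integer from a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def mem_total_bytes(meminfo_path="/proc/meminfo"):
    """Return MemTotal in bytes, or None if /proc/meminfo cannot be parsed."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) * 1024  # value is reported in kB
    except (OSError, ValueError, IndexError):
        pass
    return None


def describe_dirty_limits():
    """Return the current ratios and rough byte thresholds, or None off Linux."""
    total = mem_total_bytes()
    background = read_int("/proc/sys/vm/dirty_background_ratio")
    hard = read_int("/proc/sys/vm/dirty_ratio")
    if None in (total, background, hard):
        return None
    return {
        "background_ratio": background,
        "dirty_ratio": hard,
        "background_bytes": threshold_bytes(total, background),
        "dirty_bytes": threshold_bytes(total, hard),
    }


if __name__ == "__main__":
    print(describe_dirty_limits())
```

A ratio that reads as 0 may simply mean the byte-based variant of the tunable is in use instead, so do not read too much into it.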

When the threshold is crossed, in an attempt to "slow down the dirtiers", the Linux kernel will penalize (in rate-limited fashion) any and every task on the system that tries to allocate a page. This allocation may be in response to userland needing a new page, but it can also occur if the kernel is allocating memory for internal data structures in response to a system call the process did. When this happens, the kernel will force that allocating thread (again, rate-limited) to take part in the flushing process, under the (misguided) assumption that whoever is allocating a lot of memory is the same thread that is dirtying a lot of memory.

There are a couple ways to work around this problem (which is very typical when copying large amounts of data). For one, the copying process can be fixed to rate limit itself, and to synchronously flush data at some reasonable interval. Another way that a system administrator can manage this sort of task (if automated of course) is to use Linux's support for memory controllers which essentially isolates the memory subsystem performance between tasks. Unfortunately, its support is still incomplete and I don't know of any popular distributions that automate this cgroup subsystem's use.
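The first workaround (a copier that paces itself and flushes at intervals) might look roughly like this. It is a sketch under assumptions: the function name `throttled_copy`, the 8 MiB sync interval and the optional byte-rate cap are all illustrative values, not recommendations from the post.

```python
import os
import time


def throttled_copy(src_path, dst_path, chunk_size=1 << 20,
                   sync_every=8 << 20, max_bytes_per_sec=None):
    """Copy a file, fsync()ing every `sync_every` bytes and optionally capping the rate.

    Returns (bytes_copied, number_of_intermediate_syncs).
    """
    copied = 0
    since_sync = 0
    syncs = 0
    start = time.monotonic()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            since_sync += len(chunk)
            if since_sync >= sync_every:
                # Push our own dirty data to disk before producing more of it.
                dst.flush()
                os.fsync(dst.fileno())
                syncs += 1
                since_sync = 0
            if max_bytes_per_sec:
                # Sleep just long enough to stay at or below the requested rate.
                expected_elapsed = copied / max_bytes_per_sec
                delay = expected_elapsed - (time.monotonic() - start)
                if delay > 0:
                    time.sleep(delay)
        dst.flush()
        os.fsync(dst.fileno())  # final flush so the whole copy is on disk
    return copied, syncs
```

The trade-off is deliberate: the copy itself gets a bit slower so that it never accumulates a large backlog of dirty pages that other tasks end up paying for.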

Either way, it is very unlikely to be the IO scheduler.

Re:Probably not the IO scheduler by 0123456 · 2010-10-23 07:27 · Score: 2, Insightful

Then there are programs like Firefox, which continually write to sqlite databases, which causes multiple fsync() calls, which will flush the disk cache each time if you're running on an ext3 filesystem. All because NTFS used to eat your bookmarks file if Windows crashed.

Re:Not had the slightest problems with this by frisket · 2010-10-23 07:13 · Score: 2, Informative

I'm using Ubuntu 10.04 on an old Dell and big copies don't seem to slow it down any more than I'd expect on an old machine, either when copying to an external USB backup (with rsync) or over the net to my office systems (via scp). Serious slowdown would seem to indicate something deeper is wrong.

IO scheduler != CPU scheduler by Ingo+Molnar · 2010-10-23 07:32 · Score: 5, Insightful

FYI, the IO scheduler and the CPU scheduler are two completely different beasts.

The IO scheduler lives in block/cfq-iosched.c and is maintained by Jens Axboe, while the CPU scheduler lives in kernel/sched*.c and is maintained by Peter Zijlstra and myself.

The CPU scheduler decides the order of how application code is executed on CPUs (and because a CPU can run only one app at a time the scheduler switches between apps back and forth quickly, giving the grand illusion of all apps running at once) - while the IO scheduler decides how IO requests (issued by apps) reading from (or writing to) disks are ordered.

The two schedulers are very different in nature, but both can indeed cause similar looking bad symptoms on the desktop though - which is one of the reasons why people keep mixing them up.

If you see problems while copying big files then there's a fair chance that it's an IO scheduler problem (ionice might help you there, or block cgroups).

I'd like to note for the sake of completeness that the two kinds of symptoms are not always totally separate: sometimes problems during IO workloads were caused by the CPU scheduler. It's relatively rare though.

Analysing (and fixing ;-) such problems is generally a difficult task. You should mail your bug description to linux-kernel@vger.kernel.org and you will probably be asked there to perform a trace so that we can see where the delays are coming from.

On a related note i think one could make a fairly strong argument that there should be more coupling between the IO scheduler and the CPU scheduler, to help common desktop usecases.

Incidentally there is a fairly recent feature submission by Mike Galbraith that extends the (CPU) scheduler with a new feature which adds the ability to group tasks more intelligently: see Mike's auto-group scheduler patch

This feature uses cgroups for block IO requests as well.

You might want to give it a try, it might improve your large-copy workload latencies significantly. Please mail bug (or success) reports to Mike, Peter or me.

You need to apply the above patch on top of Linus's very latest tree, or on top of the scheduler development tree (which includes Linus's latest), which can be found in the -tip tree

(Continuing this discussion over email is probably more efficient.)

Thanks,

Ingo

Re:IO scheduler != CPU scheduler by Ingo+Molnar · 2010-10-23 09:09 · Score: 3, Informative

(1) As soon as RAM is exhausted and the kernel starts swapping out to disk, the desktop experience is severely impacted (and immediately so). [...]
Right. If a desktop starts swapping seriously then it's usually game over, interactivity wise. Typical desktop apps produce so much new dirty data that it's not funny if even a small portion of it has to hit disk (and has to be read back from disk) periodically.
But please note that truly heavy swapping is actually a pretty rare event. The typical event for desktop slowdowns isn't deadly swap-thrashing per se, but two types of scenarios:
1) dirty threshold throttling: when an app fills up enough RAM with dirty data (which has to be written to disk sooner or later), then the kernel first starts a 'gentle' (background, async) writeback, and then, when a second limit is exceeded starts a less gentle (throttling, synchronous) writeback. The defaults are 10% and 20% of RAM - and you can set them via. To see whether you are affected by this phenomenon you can try much more aggressive values like:
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 90 > /proc/sys/vm/dirty_ratio
These set async writeback to kick in ASAP (the disk can write back in the background just fine), but sets the 'aggressive throttling' limit up really high. This tuning might make your desktop magically faster. It may also cause really long delays if you do hit the 90% limit via some excessively dirtying app (but that's rare).
2) fsync delays. A handful of key apps such as Firefox use periodic fsync() syscalls to ensure that data has been saved to disk - and rightfully so. Linux fsync() performance used to be pretty dismal (the fsync had to wait for a really long time on random writers to the disk, delaying Firefox all the time) and went through a number of improvements. If you have v2.6.36 and ext3 then it should be all pretty good.
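If you want a rough number for the fsync case in scenario 2), a tiny probe like the one below can be run with and without a big copy going on in the background. It is only a sketch: the function name `fsync_latencies`, the 4 KiB payload and the round count are arbitrary assumptions, and it is no substitute for proper tracing tools.

```python
import os
import tempfile
import time


def fsync_latencies(directory, rounds=5, payload=b"x" * 4096):
    """Write a small block and time each fsync(); returns a list of seconds."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # returns only once the data has been handed to the disk
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    samples = fsync_latencies(tempfile.gettempdir())
    print("worst fsync: %.1f ms" % (max(samples) * 1000))
```

Point `directory` at the filesystem you care about; a temp directory backed by RAM will tell you nothing about the disk.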
I think a fair chunk of the "/bin/cp /from/large.iso /to/large.iso" problem could be eliminated if cp (and dd) helped the kernel and dropped the page-cache on large copies via fadvise/madvise. Linux really defaults to the most optimistic assumption: that apps are good citizens and will dirty only as much RAM as they need. Thus the kernel will generally allow apps to dirty a fair amount of RAM, before it starts throttling them.
VM and caching heuristics are tricky here - an app or DB startup sequence can produce very similar patterns of file access and IO when it warms up its cache. In that case it would be absolutely lethal to performance to drop pagecache contents and to sync them out aggressively.
If the cp app did something as simple as explicitly dropping the page-cache via the fadvise/madvise system calls then a lot of user side grief could be avoided i suspect. DVD and CD burning apps are already rather careful about their pagecache footprint.
But, if you have a good testcase you should contact the VM and IO developers on linux-kernel@vger.kernel.org - we all want Linux desktops to perform well. (server workloads are much easier to handle in general and are secondary in that aspect.) We have various good tools that allow more than enough data to be captured to figure out where delays come from (blktrace, ftrace, perf, etc.) - we need more reporters and more testers.
Thanks,
Ingo
Re:IO scheduler != CPU scheduler by Ingo+Molnar · 2010-10-23 09:24 · Score: 4, Informative

I know some of the patches have made it back into the mainline kernel, any idea when they all will be merged?
The -tip tree contains development patches for the next kernel version for a number of kernel subsystems (scheduler, irqs, x86, tracing, perf, timers, etc.) - and i'm glad that you like it :-)
We typically send all patches from -tip into upstream in the merge window - except for a few select fixlets and utility patches that help our automated testing. We merge back Linus's tree on a daily basis and stabilize it on our x86 test-bed - so if you want some truly bleeding edge kernel but want proof that someone has at least built and booted it on a few boxes without crashing then you can certainly try -tip ;-)
Otherwise we try to avoid -tip specials. I.e. there are no significant out-of-tree patches that stay in -tip forever - there are only in-progress patches which we try to push to Linus ASAP. If we cannot get something upstream we drop it. This happens every now and then - not every new idea is a good idea. If we cannot convince upstream to pick up a particular change then we drop it or rework it - but we do not perpetuate out-of-tree patches.
So the number of extra commits/changes in -tip fluctuates, it typically ranges from up to a thousand down to a few dozen - depending on where we are in the development cycle.
Right now we are in the first few days of the v2.6.37 merge window and Linus pulled most of our pending trees already in the past two days, so -tip contains small fixes only. While v2.6.37 is being releasified in the next ~2.5 months, -tip will fill up again with development commits geared towards v2.6.38 - and we will also keep merging back Linus's latest tree - and so the cycle continues.
Thanks,
Ingo
Re:IO scheduler != CPU scheduler by Ingo+Molnar · 2010-10-23 10:20 · Score: 2, Informative

Ingo,
I believe most desktop users run into this problem when they complain about IO schedulers. Is there any immediate plan to address it?
Thanks,
Jason
Regarding plans you need to ask the VM and IO folks (Andrew Morton, Jens Axboe, Linus, et al).
Regarding that bugzilla entry, there's this suggestion in one of the comments:
echo 10 > /proc/sys/vm/vfs_cache_pressure
echo 4096 > /sys/block/sda/queue/nr_requests
echo 4096 > /sys/block/sda/queue/read_ahead_kb
echo 100 > /proc/sys/vm/swappiness
echo 0 > /proc/sys/vm/dirty_ratio
echo 0 > /proc/sys/vm/dirty_background_ratio
or use "sync" fs-mount option.
If you can reproduce that problem with a new kernel (v2.6.36 would be ideal) then please try to describe the symptoms in a mail to linux-kernel@vger.kernel.org, and also point out whether the tunings above improved things. Please Cc: Jens, Andrew, me and Linus as well.
To turn interactivity woes on the desktop into actual hard numbers you can use Arjan van de Ven's latencytop tool. It will measure your worst-case delays with and without big copies being done in the background, which numbers you can cite in your email.
Thanks,
Ingo

Re:Perhaps if Con Kolivas named his scheduler .. by m50d · 2010-10-23 07:38 · Score: 3, Informative

He tried that before. I think he's given up on getting his scheduler (though perhaps not a suspiciously similar one written by Inigo) in the kernel after what happened with CFQ.

--
I am trolling

Re:if i have many gigs of data to copy over somewh by eldepeche · 2010-10-23 07:44 · Score: 4, Insightful

I would definitely ditch an OS that fucked up a file copy because I used the computer for something else while I was waiting.

Re:easy solution: by JonJ · 2010-10-23 08:06 · Score: 2, Insightful

That's great that you post your experiences with server scheduling in a topic about desktop scheduling. It's so relevant. No wait, it's not.

--
-- Linux user #369862

Re:Is Desktop Linux [still] relevant? by westlake · 2010-10-23 08:49 · Score: 2, Informative

That was a joke, right? You don't really think that all the millions of desktop Linux users just up and vanished because some idiot at PCWorld wanted a catchy headline?

StatCounter provides a global breakdown of OS market share by region and country.

It is something of a wake-up call when you look at these numbers and compare them to the endless stream of Linux success stories posted to Slashdot.

Re:easy solution: by jazzmans · 2010-10-23 08:53 · Score: 2, Insightful

You are joking right?(OP) I'm using debian, and I routinely copy TB(s) of data from hard drive to hard drive via SATA and/or USB 2.0, and though the usb transfer speed is fairly slow, my system doesn't slow appreciably at all.

Weird, I've been using Linux for 10 years now, and one thing linux does really well is move large amounts of data around without killing the system (usability).

Lotsa ram is your friend, also make sure your / filesystem isn't the hdd that you're moving the data from/to or vice-versa. That does slow access down a bit.

jaz

--
Life is what happens to you while you are busy making other plans. No-one sees motorcycles

Re:easy solution: by Captain+Segfault · 2010-10-23 09:46 · Score: 2, Insightful

Cite? What exactly is the difference between "public" and "kernel"?

If all processes see the same 1G the distinction isn't meaningful, especially in this context.

It isn't only IO scheduling by grandpa-geek · 2010-10-23 09:47 · Score: 3, Interesting

I've encountered situations where I'm trying to do something online and a task starts up due to a cron job that builds some kind of index. The index building should be in the background but somehow takes priority over what I'm doing on the desktop. Those kinds of cron jobs should be default scheduled in the background, not take priority over what is happening on the desktop.

Re:if i have many gigs of data to copy over somewh by moonbender · 2010-10-23 09:50 · Score: 2, Insightful

You're insane. If your computer ever silently drops "a few bits" while copying stuff, there's something seriously wrong with your OS or your hardware, and things will break whether or not you're using the computer while copying. You might as well sacrifice a chicken to make sure the data transfer works, it'll have about the same effect.

--
Switch back to Slashdot's D1 system.

Re:easy solution: by Ingo+Molnar · 2010-10-23 10:10 · Score: 3, Interesting

That's great that you post your experiences with server scheduling in a topic about desktop scheduling. It's so relevant. No wait, it's not.

The boundary between the desktop space and the server space is rather fluid, and many of the problems visible on servers are also visible on desktops - and vice versa.

For example 'copying a large amount of data' on a server is similar to 'copying a big ISO on the desktop'. If the kernel sucks doing one then it will likely suck when doing the other as well.

So both cases should be handled by the kernel in an excellent fashion - with an optimization/tuning focus on desktop workloads, because they are almost always the more diverse ones, and hence are generally the technically more challenging cases as well.

Thanks,

Ingo

Re:easy solution: by DarkOx · 2010-10-23 10:41 · Score: 4, Insightful

Generally Windows runs badly without a swap. Don't listen to people who tell you to disable it. You should have a swap file on Windows no matter how much memory you have.

Tweakers who don't really understand anything about Windows paging often conclude turning off the swap is a good idea, because they only run trivial applications and don't experience certain memory backed I/O operations failing with it off. They do see an initial speed boost though. The reason is NT is very pessimistic about memory. Windows assumes you will need to page out to disk. It therefore flushes the set of static pages to disk almost right away. This is why there is so much more disk thrashing on Windows than say Linux when you start an application and plenty of memory is free. It will do its best to keep the working set out of the page file of course. This does give Windows a performance advantage under memory pressure however. When there is not enough memory to start a new application Windows can just drop the pages from memory of the application being paged out without the need to flush them to disk because they are already there; Linux will need to write those pages.

Windows boxes (desktops anyway) tend to have large numbers of processes running in the background, so they usually are under that memory pressure.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html

Re:if i have many gigs of data to copy over somewh by grepya · 2010-10-23 10:45 · Score: 4, Funny

I would definitely not let a monkey like you get near my computers if some intense file copy was going on and they wanted to start doing other things while that was going on, sure you can do it but that does not make it a prudent thing to do, and the file may copy over just fine, and it may lose a few bits without even reporting any errors and that can happen on any OS, BSD, Linux, Winders & etc...etc...etc...

You sir, are a perfect specimen of a BOFH. You only have a dim notion of what actually goes on inside those mysterious boxes that are unfortunately left under your care. And yet, by some curious accident of nature, you've been entrusted with root passwords for said boxes. You use phrases like "intense file copy" like they mean anything. You place every idiotic restriction that you can think of on the users of said boxes (who, incidentally, are almost always smarter and more qualified than you in whatever field of work they're in) by using words like "prudent" and "safety"... or god forbid... "security". You actually think that because I run a second program along with your "intense" copy, it can result in loss of "a few bits without even reporting any errors" due to what ? The magical fairies that dance inside those little chips getting angry ? Tired ? Can you do everybody a favor and reduce the amount of utter nonsense emanating out of that tiny, befuddled brain ?

Re:Perhaps if Con Kolivas named his scheduler .. by Ingo+Molnar · 2010-10-23 10:52 · Score: 4, Informative

He tried that before. I think he's given up on getting his scheduler (though perhaps not a suspiciously similar one written by Inigo) in the kernel after what happened with CFQ.

One reason for why the principle of CFS may seem to you so suspiciously similar to Con's SD scheduler is that i used Con's fair scheduling principle when writing the initial version of CFS. This is credited at the very top of today's kernel/sched.c [the scheduler code]:
 *  2007-04-15  Work begun on replacing all interactivity tuning with a
 *              fair scheduling design by Con Kolivas.

It was added in this commit.

The implementations (and even the user visible behavior) of the schedulers were and are very different - and that is where much of the disagreement and later flaming came from.

Note that this particular Slashdot article is about IO scheduling though - which is unrelated to CPU schedulers. Neither Con nor i wrote IO schedulers.

There are two main IO schedulers in Linux right now: CFQ and AS, written by Jens Axboe, Nick Piggin, et al.

What adds fuel to the confusion is that it is relatively easy to mix up 'CFQ' with 'CFS'.

Thanks,

Ingo

AS I/O scheduler was removed in 2.6.33 by Sits · 2010-10-23 11:22 · Score: 3, Informative

AFAIK there are only two I/O schedulers remaining in recent Linux (and if you squint you might say that RHEL 5's kernel could have been related to 2.6.34 at one point right? :) - CFQ and deadline (three if you count noop I guess). The anticipatory scheduler was removed in 2.6.33...
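To check which IO scheduler a given disk is actually using, the kernel exposes a per-device sysfs file, `/sys/block/<dev>/queue/scheduler`, that lists the available schedulers with the active one in square brackets. A small sketch for reading it follows; the function names are made up, and the sample strings in the usage note are just examples of that format.

```python
import glob


def parse_scheduler_line(line):
    """Parse text like 'noop deadline [cfq]' into (active, [all available])."""
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def list_block_schedulers(pattern="/sys/block/*/queue/scheduler"):
    """Return {device: (active, available)} for every readable block device."""
    result = {}
    for path in glob.glob(pattern):
        device = path.split("/")[3]  # /sys/block/<device>/queue/scheduler
        try:
            with open(path) as f:
                result[device] = parse_scheduler_line(f.read())
        except OSError:
            continue  # device vanished or file not readable
    return result


if __name__ == "__main__":
    for device, (active, available) in sorted(list_block_schedulers().items()):
        print(device, "active:", active, "available:", available)
```

For example, `parse_scheduler_line("noop deadline [cfq]")` yields `cfq` as the active scheduler; on systems without sysfs the listing is simply empty.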

Re:AS I/O scheduler was removed in 2.6.33 by Ingo+Molnar · 2010-10-23 12:41 · Score: 2, Informative

You are right, deadline is the other (much smaller/simpler) one - CFQ is the main IO scheduler remaining.
You can still test AS by going back to an older kernel - and as long as it's a performance regression that is reported (relative to that old kernel, running AS), it should not be ignored on lkml.
Thanks,
Ingo

Re:if i have many gigs of data to copy over somewh by BeanThere · 2010-10-23 12:51 · Score: 2, Funny

I wanted to write a lengthy rebuttal here explaining how computers work, but my computer is busy with a seriously *intense* copy right now so I don't want to chance it.

Re:easy solution: by sootman · 2010-10-23 13:08 · Score: 2, Informative

Did you even read the summary? He specifically points out where desktop I/O has different requirements from server I/O: "When I'm copying a large amount of data in the background, everything else slows down to a crawl while the CPU utilization stays at 1-2%." So I think he's talking about things like video playback, web browsing, and general UI responsiveness--things that 100% do not matter on a server.

I've noticed this myself--start a complex task and all of a sudden the UI becomes really jerky. If I'm trying to multitask and some mundane task is making the whole UI slow, that's bad. If it takes me 10 seconds to do something with an unresponsive UI instead of 5 just so a bunch of files can copy in 1:00:00 instead of 1:00:01, that's bad.

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.

Re:easy solution: by Ingo+Molnar · 2010-10-23 13:34 · Score: 2, Insightful

No, massive unfairness is just as bad on the server as it is on the desktop - in all but a few select batch processing situations.

Replace 'desktop' with 'database', 'Apache', 'Samba' or 'number crunching job' and you get the same kind of badness.

There's not much difference really. If it sucks on the desktop then it sucks on the server too: why would it be a good server if it slows down a DB/Apache/Samba/number-crunching-job while prioritizing some large copy operation?

Re:What am I doing wrong? by SanityInAnarchy · 2010-10-23 16:01 · Score: 2, Insightful

I have 8GB of RAM, why do I need swap?

So you can use that 8 gigs for something else, or not buy 8 gigs in the first place. In particular, so when a program is using large amounts of memory for no good reason, it can be swapped out, maybe even just for disk cache.

Also, hibernate.

--
Don't thank God, thank a doctor!

Re:easy solution: by Ingo+Molnar · 2010-10-24 01:32 · Score: 2, Informative

The trouble is that in server workloads you generally don't see ONE LARGE I/O operation - lots of small ones instead. There are very very few server workloads that involve transferring >100MB data at a time (even when it comes to DB snapshoting).

There's lots of server workloads that involve large IO requests:

- backups
- DB startup/shutdown
- DB traffic that generates or reads a lot of new data (say report generation)
- HPC workloads that work with huge data sets
- animation farms that work with huge images/movies
- web servers streaming out big files
- fsck
- virtual desktop servers where the desktops are virtual instances running on the server. There any IO load within that 'desktop' runs on the server.

etc. There is also a fair number of server workloads that are IO heavy but which use small IO requests.

On the desktop this is common (all your AVI files).

If you have those big files in networked storage or if you are backing them up to some network host then you've already transformed those kinds of IO requests into big IO requests on the server side as well: the big file you read or write on the desktop the network file/backup server will read/write from its own disks, etc.

Really, "interactivity sucks during big IO" kind of bugs can hurt servers just as much as they can hurt desktops. The boundary between desktops and servers is very fluid.

Re:What am I doing wrong? by Iskender · 2010-10-24 03:50 · Score: 2, Funny

CowboyNeal came to my house yesterday and sat on the sofa and had a beer like it was just a basic day.

He even said I should work on my interior decoration - "Those empty white walls aren't very pretty." It sucked, so don't come telling me Slashdot hasn't violated my privacy. :)

ionice -c 3 can improve performance by Cassini2 · 2010-10-24 06:30 · Score: 2, Interesting

I often note that multiple simultaneous low-priority file copies implemented as:

ionice -c 3 rsync bigfilein directoryout

run faster than multiple simultaneous high-priority copies implemented as:

rsync bigfilein directoryout

If the copies are run one at a time, the higher priority rsync runs faster. For multiple copies, often the lower priority rsyncs run faster. Also, desktop usability is much improved with the lower priority rsyncs.
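
For anyone who wants to try the comparison described above, here is a rough Python sketch. It is only an illustration, not a proper benchmark: the helper names `build_copy_command` and `time_parallel_copies` and the file paths are made up for this example, and it assumes `rsync` and `ionice` are installed. Results will also depend on the I/O scheduler in use and on whatever is already in the page cache, so repeated runs over the same files are not directly comparable.

```python
import shlex
import subprocess
import time


def build_copy_command(src, dest, idle=False):
    """Return the argv for one rsync copy, optionally wrapped in `ionice -c 3`."""
    cmd = ["rsync", src, dest]
    if idle:
        # -c 3 asks for the "idle" I/O scheduling class.
        cmd = ["ionice", "-c", "3"] + cmd
    return cmd


def time_parallel_copies(jobs, idle=False):
    """Start every copy at once, wait for all of them, return elapsed seconds.

    `jobs` is a list of (source, destination) pairs.
    """
    start = time.monotonic()
    procs = [subprocess.Popen(build_copy_command(s, d, idle)) for s, d in jobs]
    for proc in procs:
        proc.wait()
    return time.monotonic() - start


if __name__ == "__main__":
    # Only print the two command forms here; nothing is copied.
    print(shlex.join(build_copy_command("bigfilein", "directoryout")))
    print(shlex.join(build_copy_command("bigfilein", "directoryout", idle=True)))
```

To measure, call `time_parallel_copies` with a list of real (source, destination) pairs, once with `idle=False` and once with `idle=True`, ideally using different source files for each run so the second run does not simply read from cache.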

I suspect a priority inversion occurs inside the file system's write-back cache. At regular priority levels, data is not written back to disk in a timely manner. Using ionice -c 3 gives the disk caches a higher priority than the rsync I/O commands, preventing the I/O commands from filling the cache and creating a priority inversion.

The Gnome GUI in Ubuntu is particularly vulnerable to this priority inversion, as by default it does multiple copies simultaneously inside a separate window. Ubuntu usually performs better than Windows, however. Between the A-V software in Windows, and the tendency to swap applications out of memory to maximize disk cache, Windows usually performs the same copy operations more slowly than Ubuntu and with less system responsiveness.
