Linux Gets Completely Fair Scheduler
SchedFred writes "KernelTrap is reporting that CFS, Ingo Molnar's Completely Fair Scheduler, was just merged into the Linux kernel. The new CPU scheduler includes a pluggable framework that completely replaces Molnar's earlier O(1) scheduler, and is described to 'model an "ideal, precise multi-tasking CPU" on real hardware. CFS tries to run the task with the "gravest need" for more CPU time. So CFS always tries to split up CPU time between runnable tasks as close to "ideal multitasking hardware" as possible.' The new CPU scheduler should improve the desktop Linux experience, and will be part of the upcoming 2.6.23 kernel."
just finished make xconfig;make from 2.6.22!
For the really touchy-feely OS out there!
Engineering is the art of compromise.
What sort of gain can the typical linux user expect because of this?
I know enough about process scheduling to fill a ketchup cup at the nearest burger joint, but it struck me that this sounds like the debate about "network neutrality" vs "tiered service." The O(1) was supposed to be a very generic decision-making system that made a decision in a very agnostic way (to simplify the work down to a predictable consistent order of work). This CFS strikes me as a system which will have a much higher level of complexity and context awareness, which sounds like some processes will get more than others. The intention is to make it fair in the real world but not necessarily balanced, since not all processes are alike in their needs or expectations of task switching.
This is just rambling on, and admittedly it may be straining a metaphor too far, so don't go crazy biting my head off for not knowing all things about the kernel. See 'ketchup cup' above.
[
I've sort of gazed for a few seconds at the CFS articles and the following phrase caught my attention the most
But more importantly, I think the factor which'll probably sway me the most is /proc/sys/kernel/sched_granularity_ns. Except I've been salting my config options with one true test ... that kind of thing makes you paranoid about random tune-ups :)
Quidquid latine dictum sit, altum videtur
If you really want a rough time, see how long it takes to rebuild a different OS.
Engineering is the art of compromise.
Steal your insightful comments from http://linux.slashdot.org/article.pl?sid=07/04/22/ 1335255
Slashdot: news for Apple. Stuff that Apple.
Why does this sound like the title of a Monty Python Skit?
"Why isn't my process getting more CPU time?"
"Well, Sir, it's a Completely Fair Scheduler."
Computers are useless. They can only give you answers.
-- Pablo Picasso
really? and how it's suppose to do that wonderful thing?
ps: i'm just curious and noob, so please don't smash me...
Slashdot ya no es que lo era!
We saw crazy performance improvements implementing kqueue in bsd, would love to see something that great at handling many sockets standard linux.
This isn't really the same kind of component.
On the other hand, Linux has epoll, which fills the same role as kqueue.
In my experience, epoll is at least as good.
http://www.kegel.com/c10k.html#nb.epoll
Now MacOS X needs to fix their kqueue bugs, and the world will be a happy place.
Is there a default scheduler in the linux kernel? If so, which is it? Are there several schedulers to choose from? If so, which one(s) do the major distros use? Will the new CFS be the new default or just another choice?
CFS has been available for some time in Andrew Morton's -mm branch of the kernel. If you really want it now, just download his latest patch and there you go.
I've reen running with it for some time, and I really like it. I'm still not sure if it is better than Con Kolivas' SD scheduler in his patchset, but we'll see.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
I thought Linux used Cron as a scheduler ?
You know, there is a difference between trolling and pointing out the flaws in your reasoning. Just saying.
The only way to make it completely fair is to let one thread slice the time up, and let the other thread choose which slice it wants. ;-)
So does Linux reached the computer's communist's holy grail?
The new CPU scheduler should improve the desktop Linux experience, and will be part of the upcoming 2.6.23 kernel.
Could someone outline concrete problem the Linux desktop scheduling had right now that are visible resolved by CFS.
I'm not a heavy user of Linux desktop (just servers on the shell), but it was always my experience that Linux handles simultaneous multimedia tasks (for example) better on the same hardware than Windows.
While I contribute this more to architectural problem on the Windows side (such as.. it's quite easy an app to stall Explorer.exe or vice versa, no amount of scheduling helps there), I'm curious to see if there's tangible difference someone could describe with CFS running desktop software in Linux.
I've always been more impressed with the Solaris' scheduler, I've done a few ports recently of realtime applications to linux from solaris 8 and 10 that have a hard time running properly under linux...
Of course these applications have had years of tuning under SOlaris so it's not an entirely accurate example...
No.
Cron schedules tasks to execute at specified times. This article refers to the kernel's CPU scheduler which determines which running process gets to use the CPU at any given moment.
Ha, I was about to come in and say the same thing. I've always been disappointed in the Linux scheduler compared with my Solaris servers. I run an ISP and frequently get abnormally high load spikes -- my Linux servers handle the load poorly, degrading all of the sudden to gridlock. The Solaris servers, on the other hand, degrade gracefully, still serving up requests but getting slower as the load skyrockets.
What version of KDE are you running?
I fear the Y2038 bug
So little credit is given to Con Kolivas, whose Staircase Deadline scheduler (a more mature and refined design than CFS) spurred Ingo to finally improve his scheduler (which he wrote on the spot because, apparently, Con's scheduler wasn't good enough for him).
And all Con gets is a minor footnote.
grey wolf
LET FORTRAN DIE!
A complete fair scheduler for geeks? I can just see it:
Crumb's Corollary: Never bring a knife to a bun fight.
(disclaimer, i'm the main author of CFS.)
I'd like to point out that CFS is O(1) too.
With current PID limits the worst-case depth of the rbtree is ~15 [and O(15) == O(1), so execution time has a clear upper bound]. Even with a theoretical system that can have 4 million tasks running at once (!), the rbtree depth would have a maximum of ~20-21.
The "O(1) scheduler" that CFS replaces is O(140) [== O(1)] in theory. (in practice the "number of steps" it takes to schedule is much lower than that, on most platforms.)
So the new scheduler is O(1) too (with a worst-case "number of steps" of 15, if you happen to have 32 thousand tasks running at once(!)), and the main difference is not in the O(1)-ness but in the behavior of the scheduler.
"This update seems to have come relatively soon after the O(1) scheduler (about a year?) which is relatively quick for changes to really important low-level parts of an operating system. Does this mean that the Linux community was relatively unhappy with the O(1) scheduler which was touted as a very good thing at the time"
The Linux O(1) scheduler has been around since 2002.
It's pretty good, but there are corner cases where you can fool it. For example, if a process classified as interactive goes CPU-bound, it can cause starvation for other processes.
I rarely criticize things I don't care about.
Ingo Molnar is the worst kind of loser: an attention whore. His O(1) scheduler turned out to be a piece of crap and Con Kolivas spent a considerable amount of time implementing a better solution: Staircase Deadline (SD). The SD scheduler is a well tested, good performing scheduler and just when you think its going to be merged into Linus' kernel and replace Ingo's O(1) turd (and remove Ingo's name from some "important" files), Igno spends a couple of days reimplementing SD. I guess he wont be getting his name deleted after all!
This shows the black side of open source. Con developed SD in the open and Igno stole his ideas. It was only after people started pointing out that CFS looked _very_ similar to SD that Igno even admitted that the design was based on Con's SD work.
The only reason CFS is in the kernel and not SD is politics.
You can download it here. Screenshots here.
Weaselmancer
rediculous.
Hm, that seems to be more of a VM/IO-scheduling problem than a process scheduling problem.
Did you have a chance to try Peter Zijlstra's excellent per-bdi patches, as suggested in the bugzilla?
But in general, CFS ought to improve such workloads too (to a limited degree), in terms of not making any IO starvation worse by adding CPU starvation to the mix :-)
Too bad that the NIH syndrome hit Linux Kernel development too, and Ingo Molnar, after blocking all the attempts to merge SD into mainline because "it couldn't be done", uses the same idea, whips out his own scheduler calling it "Completely Fair", and woosh it gets merged (easily, given that Ingo Molnar himself is the maintainer of that part of the kernel).
Con Kolivas is (obviously and justifiably) disgruntled, to say the least, he stops working on the SD project, and Linux loses an excellent developer because of politics.
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
Then there's the American Dream sheduler where you get priority if you work hard at it. You can't just inheret your priority like some rich child process.
Engineering is the art of compromise.
That's correct in my experience. I have no experience with Solaris, but I have seen FreeBSD go through high spikes where Linux grinds to a halt. With smaller loads Linux feels a tad more responsive though.
I do hope this scheduler will make things even better: gracefull degrading and responsiveness in one. Might make it the ideal OS for my needs (I now have Linux on the desktop and FreeBSD on the servers).
Free beer is never free as in speech. Free speech is always free as in beer.
Well, given that he is the maintainer, Ingo Molnar's code is presumably more maintainable. It happens all the time in free software projects, someone submits a patch, the idea in the patch is good, but the section of the code is important enough that the maintainer must be certain he understands it. Rewriting it is a good way to gain such understanding.
Back when I was a maintainer, I guess I rewrote half the patches I got. Most submitters are just happy to see the functionality in there, but there was a few people with fragile egos take it as a personal insult That happens, life goes on, and usually the fragile egos grow more robust with time, and learn that developing what amounts to a prototype of the final code is also a valuable contribution.
His approach for years was to knock all contributions that improved on his earlier code so that they didn't get into the kernel. And now suddenly he sees the light and gathers up all those ideas of others into a new version as if it were novel, totally forgetting that he had rejected them before. And to compound the problem, his quick rehash then immediately gets merged in despite being effectively untested.
If this were business or politics, "corrupt" would describe this state of affairs quite adequately.
For all the talk of code merging based on merit, what we really have in the kernel is an old boys' club in which the inner circle get a free pass even when they deliver crap, as in this case.
Ingo's CFS code is utterly immature and bug-ridden, and should not have been merged. Since he accepts that Con's SD is a great idea, and as it has the benefit of a lot of testing and code maturity, Ingo should have backed its incorporation into the kernel, instead of opposing it.
But no, then it wouldn't bear Ingo's name on it, which would be total anathema to Ingo.
This is a bad state of affairs folks.
Lets say that the machine was running for 100s.
g )
50 seconds of that time, it spent running process A.
50 seconds of that time, it spent running process B.
The 50 seconds of A may be distributed differently by different algorithms.
In some algorithms, A will run for 50 seconds, and then B will run for 50 seconds.
Obviously, this is not the best when you want some interactivity...
In other algorithms, the running of A and B will be interspersed, for instance, A may run for 200ms, followed by B for 200ms, etc. until the 100 seconds is up.
This gives a more interactive system.
Note that both of these algorithms give a 'fair' amount of time to each process, but one is only fair when 'fairness' is computed at the end.
A "better" algorithm, e.g. Inigo's CFS, EDF, GRRR (GR3), VTRR, etc. will also attempt to be fair on -small- timescales with divergent (and possibly grossly divergent) weights.
Wikipedia has a fairly nice page with links to more information:
http://en.wikipedia.org/wiki/Scheduling_(computin
Sigh... I can't believe I'm giving tutorials on /. :-(
./loop
./loop" you will also see CPU usage go to 100% and the System Monitor reveals that the loop shell is running at a nice value of "-19". *And* your system will be performing like a dog. You will not even be able to get the mouse to move "reliably" (this is on a Pentium IV @ 2.8 GHz).
The man page is worthless (and if the universe had any sense of justice many of the Linux man pages would be rewritten).
If one has a shell command file, "loop" containing...
EXPR=1; while true; do EXPR=$[ $EXPR + 1 ]; done
and one says:
nice -19
then CPU usage goes to 100% and a glance at the nice column on the System Monitor reveals that a shell is running "loop" with a nice value of "19", i.e. the system is quite responsive.
If one (as root) says "nice --19
Negative "nices" are are a lower numeric value but a "higher" effective priority (i.e. they get greater CPU time slice allocations).
For those of you who want the history on this, this is because in UNIX version 6, the priority of a process as well as the nice value were kept as signed bytes. Priorities less than zero were negative were system priorities which could not be interrupted. Low value positive priorities were system priorities which could be interrupted. User priorities started at 100. They could be niced to -20 (100 + -20 = 80) or +19 (100+19 = 119) as "starting" points for the scheduler (lowest priority got the CPU). If I recall, the running process had its priority bumped with each clock tick -- so it would go 101, 102, 103, etc. If niced its effective value would go 119, 120, 121, etc. The scheduler did a complete scan of the process table every few clock ticks and reset the priorities so that the totals wouldn't get above 127. You have to remember in the "old" days (1974-5), memory (for storing priorities and nice values) and CPU time for scanning scheduler tables (which are cheaper than linked lists) was expensive and programs were written to get the job done using as few resources as possible.
The problem we now have is that too many system developers (be they Linux kernel developers or Firefox developers) think resources like CPU time and memory are in infinite supply. This of course leads to [1] & [2].
I run both my Gentoo Linux package "emerges" (which can take many hours depending on # of packages) and my Firefox builds (which generally take about an hour) at "nice -19" but it doesn't do any good because the scheduler isn't designed to handle high CPU loads resulting from a process collective (build) vs. low average CPU loads (but potentially high burst loads) associated with long running processes (e.g. X11, mplayer, etc.). It would be very nice if I could actually *use* my system for editing, browsing, etc. while I'm running background system maintenance or development tasks.
1. The "Oh, throw another core at the problem" mentality.
2. The "Oh, throw another GB at the problem" mentality.
"Hello. My name is Ingo Molnar. You killed my task. Prepare to die."
Yep, there are two things I don't like about the Linux scheduler:
1) No I/O awareness. When copying a bunch of big files around, I want that process to have lower possible priority, and not interfere with other system activities, like opening a new program, or doing small I/Os. Bottom line: give bulky transfers idle priority.
2) Lack of idle priority. I want to be able to run a process that only gets CPU time if there's nothing else to do. Even with the lowest possible priority, it will still eat some precious (~5% last time I checked) CPU time.
factor 966971: 966971