Linux Gets Completely Fair Scheduler
SchedFred writes "KernelTrap is reporting that CFS, Ingo Molnar's Completely Fair Scheduler, was just merged into the Linux kernel. The new CPU scheduler includes a pluggable framework that completely replaces Molnar's earlier O(1) scheduler, and is described to 'model an "ideal, precise multi-tasking CPU" on real hardware. CFS tries to run the task with the "gravest need" for more CPU time. So CFS always tries to split up CPU time between runnable tasks as close to "ideal multitasking hardware" as possible.' The new CPU scheduler should improve the desktop Linux experience, and will be part of the upcoming 2.6.23 kernel."
just finished make xconfig;make from 2.6.22!
For the really touchy-feely OS out there!
Engineering is the art of compromise.
What sort of gain can the typical linux user expect because of this?
I know enough about process scheduling to fill a ketchup cup at the nearest burger joint, but it struck me that this sounds like the debate about "network neutrality" vs "tiered service." The O(1) was supposed to be a very generic decision-making system that made a decision in a very agnostic way (to simplify the work down to a predictable consistent order of work). This CFS strikes me as a system which will have a much higher level of complexity and context awareness, which sounds like some processes will get more than others. The intention is to make it fair in the real world but not necessarily balanced, since not all processes are alike in their needs or expectations of task switching.
This is just rambling on, and admittedly it may be straining a metaphor too far, so don't go crazy biting my head off for not knowing all things about the kernel. See 'ketchup cup' above.
If you really want a rough time, see how long it takes to rebuild a different OS.
Steal your insightful comments from http://linux.slashdot.org/article.pl?sid=07/04/22/1335255
Slashdot: news for Apple. Stuff that Apple.
Why does this sound like the title of a Monty Python Skit?
"Why isn't my process getting more CPU time?"
"Well, Sir, it's a Completely Fair Scheduler."
Computers are useless. They can only give you answers.
-- Pablo Picasso
We saw crazy performance improvements implementing kqueue in BSD, and I would love to see something that good at handling many sockets become standard in Linux.
This isn't really the same kind of component.
On the other hand, Linux has epoll, which fills the same role as kqueue.
In my experience, epoll is at least as good.
http://www.kegel.com/c10k.html#nb.epoll
Now MacOS X needs to fix their kqueue bugs, and the world will be a happy place.
CFS has been available for some time in Andrew Morton's -mm branch of the kernel. If you really want it now, just download his latest patch and there you go.
I've been running with it for some time, and I really like it. I'm still not sure if it is better than Con Kolivas' SD scheduler in his patchset, but we'll see.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
I thought Linux used Cron as a scheduler ?
You know, there is a difference between trolling and pointing out the flaws in your reasoning. Just saying.
The only way to make it completely fair is to let one thread slice the time up, and let the other thread choose which slice it wants. ;-)
The CPU scheduler affects the latency of processes. Interactive applications are very latency sensitive - if they do not get scheduled frequently enough the system will feel sluggish. A good desktop scheduler will therefore schedule all of your interactive tasks frequently. I don't understand the details of the CFS, but if it claims to improve the desktop Linux experience then it must do this.
The tradeoff with short timeslices is that there's more overhead due to context switches and so the overall time spent doing useful work on the cpu will be lower. For non-latency sensitive applications it makes sense to keep the cpu residency time of processes high to maximize throughput. Hence the "desktop->server" tunable.
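The tradeoff above can be put in rough numbers. This is only a sketch under a plain round-robin model with a fixed timeslice and a fixed context-switch cost. The function name and the sample figures (10 microsecond switch cost, 5 runnable tasks) are made up for illustration and are not measurements of any kernel.

```python
def round_robin_tradeoff(slice_us, switch_us, runnable):
    """Return (useful CPU fraction, worst-case wait in microseconds).

    Assumes every runnable task gets one fixed slice per round and each
    slice is followed by one context switch of fixed cost (hypothetical model).
    """
    period = slice_us + switch_us
    # Fraction of CPU time spent on real work rather than switching.
    useful_fraction = slice_us / period
    # A task that just ran waits for every other runnable task's turn.
    worst_wait_us = (runnable - 1) * period
    return useful_fraction, worst_wait_us


for slice_us in (100, 1000, 10000):
    useful, wait = round_robin_tradeoff(slice_us, switch_us=10, runnable=5)
    print(f"slice={slice_us:>5} us  useful={useful:.3f}  worst wait={wait} us")
```

Shorter slices cut the wait but lower the useful fraction, which is the "desktop->server" knob in a nutshell.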
The blurb does mention that CFS has 'no notion of timeslices' which sounds like nonsense, but I trust Ingo knows what he's talking about so maybe we have different definitions for that term. Anyone care to explain?
So has Linux reached the computing communist's holy grail?
Ha, I was about to come in and say the same thing. I've always been disappointed in the Linux scheduler compared with my Solaris servers. I run an ISP and frequently get abnormally high load spikes -- my Linux servers handle the load poorly, degrading all of a sudden to gridlock. The Solaris servers, on the other hand, degrade gracefully, still serving up requests but getting slower as the load skyrockets.
What version of KDE are you running?
I fear the Y2038 bug
So little credit is given to Con Kolivas, whose Staircase Deadline scheduler (a more mature and refined design than CFS) spurred Ingo to finally improve his scheduler (which he wrote on the spot because, apparently, Con's scheduler wasn't good enough for him).
And all Con gets is a minor footnote.
grey wolf
LET FORTRAN DIE!
A complete fair scheduler for geeks? I can just see it:
Crumb's Corollary: Never bring a knife to a bun fight.
It's been said already, but I'll repeat it just for completeness.
Basically right now the scheduler is unbiased, giving ticks to all applications regardless of their need for processing time. An example of this would be in X windows when you have little taskbar icons that rarely do anything, vs having CD burning software running.
The scheduler will quickly learn that most of the time it asks the taskbar application if it needs to do anything, it doesn't, and that most of the time it asks the CD writing software to do anything, it needs CPU. So very quickly it will start checking on the CD writing process more frequently than the taskbar process. This will give you a very noticeable performance increase in your system.
With this in mind, there should be a very noticeable performance increase in all desktop and server systems. This scheduling change is a very big addition to the main branch of the kernel. It's been available for some time in various kernel patches, but the fact that it's making it to the main kernel branch means it has matured enough for prime time and it has been acknowledged as beneficial to the Linux kernel.
I for one am anxious to try this out on all our systems. From what I'm reading it has some fine tuning options which should be really nice to play with.
http://interserver.net/
CFS and Con Kolivas' SD both aim to improve interactivity of processes under high load - in particular, the goal was to reduce scheduling latency for applications which have realtime needs - like audio players. Con Kolivas has been maintaining variations on his low-latency Staircase design for several years with precisely that goal in mind.
On the desktop, it improves latencies for (for example) music players and 3D games, improving performance and eliminating jitter, lag, and general choppiness. Both SD and CFS achieved this under loads as high as 50.
On the server, it can have several benefits, including improved time-to-network latencies. They both want and need test cases for servers that show no detrimental effects. If you want to help, you can try out CFS on a server and report to Ingo if there are performance or latency issues.
Actually, no, Gnome and KDE aren't the troublemakers. It turns out that certain X drivers are poorly written and X preempts processes vying for CPU. CFS helps improve the situation - almost to the point where you don't notice it.
(disclaimer, i'm the main author of CFS.)
I'd like to point out that CFS is O(1) too.
With current PID limits the worst-case depth of the rbtree is ~15 [and O(15) == O(1), so execution time has a clear upper bound]. Even with a theoretical system that can have 4 million tasks running at once (!), the rbtree depth would have a maximum of ~20-21.
The "O(1) scheduler" that CFS replaces is O(140) [== O(1)] in theory. (in practice the "number of steps" it takes to schedule is much lower than that, on most platforms.)
So the new scheduler is O(1) too (with a worst-case "number of steps" of 15, if you happen to have 32 thousand tasks running at once(!)), and the main difference is not in the O(1)-ness but in the behavior of the scheduler.
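As a rough check on the depth figures, the sketch below assumes the PID limit in question is the common default of 32768 and treats the tree as perfectly balanced, so depth is about log2(n). The helper names are hypothetical. Note that the textbook worst-case height of a red-black tree is 2*log2(n+1), so the balanced figure is the typical depth rather than a hard ceiling; either way it is a small constant for a bounded number of tasks.

```python
import math


def balanced_depth(tasks):
    """Depth of a perfectly balanced binary tree holding `tasks` nodes (approx.)."""
    return math.log2(tasks)


def red_black_worst_height(tasks):
    """Textbook upper bound on red-black tree height: 2 * log2(n + 1)."""
    return 2 * math.log2(tasks + 1)


PID_LIMIT = 32768  # assumed default pid_max, not taken from the post itself

print("balanced depth:", balanced_depth(PID_LIMIT))
print("red-black worst case:", red_black_worst_height(PID_LIMIT))
```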
No, CFS does not do that, and that would be quite silly to do indeed :-)
CFS keeps tasks that woke up in the runqueue, and allows them to run immediately in the typical case - just like the old scheduler did.
Where CFS differs from the old scheduler is mainly the case when there are more tasks runnable than there are CPUs/cores available. In such cases, on any modern multitasking kernel, the scheduler has to decide which task to run, and in what order and weight to run those tasks, with the goal to provide to the user the happy illusion of multiple, snappy applications running at once.
The old O(1) scheduler decided the "order and weight" of runnable tasks based on a pretty elaborate set of heuristics. The rules are pretty complex, but it mostly boils down to 'sleepers get more CPU time than runners'.
(sidenote: CFS is an O(1) scheduler too for all practical purposes, with an upper limit of ~15 algorithmic steps worst-case)
Now those heuristics worked pretty well for 15 years (those sleep-heuristics were always part of Linux scheduling, the O(1) scheduler i wrote inherited them from the original O(N) scheduler), but good is never good enough in the land of Linux ;-)
How does CFS work? CFS follows an approach similar to Con Kolivas' SD project: a scheduler core that instead of heuristics uses "fair scheduling" to achieve interactivity. Runnable tasks are scheduled in a painstakingly fair way (and that seemingly simple concept alone is pretty hard to achieve in a general purpose kernel).
The simplest case is when there are only CPU-intense tasks running. For example, if there are 8 CPU-intense tasks running on the CPU, each task gets exactly 12.5% CPU time. If you watch how much CPU time the tasks get it will be 12.5% long-term too, with no deviations, with no skewing caused by other tasks running in between.
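The 8-task case can be illustrated with a toy model: always run whichever task has received the least CPU time so far. This is a simplification for illustration only, not the kernel's code. It uses a heap instead of an rbtree, a fixed 1 ms step, and a made-up `simulate` function.

```python
import heapq


def simulate(num_tasks, steps, step_ms=1):
    """Toy fair scheduler: each step runs the task with the least CPU time so far.

    Returns a dict mapping task id to total CPU time received in milliseconds.
    """
    # Heap entries are (cpu_time_so_far, task_id); ties go to the lowest id.
    heap = [(0, task_id) for task_id in range(num_tasks)]
    heapq.heapify(heap)
    for _ in range(steps):
        cpu_time, task_id = heapq.heappop(heap)
        heapq.heappush(heap, (cpu_time + step_ms, task_id))
    return {task_id: cpu_time for cpu_time, task_id in heap}


usage = simulate(num_tasks=8, steps=8000)
total = sum(usage.values())
for task_id in sorted(usage):
    print(f"task {task_id}: {100 * usage[task_id] / total:.1f}% of CPU")
```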
The more complex case is when applications schedule frequently (and that is the case on most desktops and servers), so CFS extends the concept of 'fairness' to sleeping tasks too. CFS accounts not only 'runners', but 'sleepers' too. Tasks that sleep/run frequently are still given their full 'fair share' of the CPU, up to the limit they could have gotten were they not sleeping at all.
So for example, if you have two tasks on a CPU, one a 100% CPU hog, the other one an application that sleeps/runs 50% of the time - both will get 50% of the CPU in CFS. Under the strict 'runner fairness' approach (which for example SD is following), the 100% CPU hog would get ~66% of CPU time, the sleeper would get ~33% of CPU time.
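The 50/50 versus 66/33 split can be reproduced with a little arithmetic. The sketch below rests on two assumptions of mine, not on anything in the CFS patches: "sleeper fairness" is modelled as max-min fair sharing (each task gets an equal share, capped at what it actually wants), and "runner fairness" is modelled as splitting the CPU evenly only while both tasks are runnable. All function names are hypothetical.

```python
def fair_shares(demands, capacity=1.0):
    """Max-min fair split: equal shares, capped at each task's demand."""
    shares = [0.0] * len(demands)
    remaining = capacity
    # Serve the smallest demands first so their leftover can be redistributed.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        equal = remaining / len(pending)
        i = pending[0]
        if demands[i] <= equal:
            shares[i] = demands[i]
            remaining -= demands[i]
            pending.pop(0)
        else:
            # Everyone left wants more than an equal share, so split evenly.
            for j in pending:
                shares[j] = equal
            break
    return shares


def runner_fair_shares(run, sleep):
    """CPU shares (hog, sleeper) when only currently runnable tasks are balanced.

    The sleeper needs `run` units of CPU, then sleeps `sleep` units of wall time.
    While both are runnable each gets half the CPU, so the sleeper's burst takes
    2 * run of wall time; the hog has the CPU to itself during the sleep.
    """
    cycle = 2 * run + sleep
    return (run + sleep) / cycle, run / cycle


print(fair_shares([1.0, 0.5]))       # hog wants 100%, sleeper wants 50%
print(fair_shares([1.0] * 8))        # eight CPU hogs
print(runner_fair_shares(1.0, 1.0))  # sleeper runs and sleeps in equal parts
```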
To achieve 'sleeper fairness', CFS runs the (ex-)sleeper task sooner, to offset its disadvantage of not hanging around on the CPU all the time. Or in other words: interactive tasks (tasks that sleep often) will get to the CPU with lower latencies. Which is the holy grail of good desktop scheduling :-)
(granted, CFS does a whole lot more than that, its patch-impact size is 3 times larger than SD's. CFS is not a single patch but a series of 50 patches, which also modularize kernel scheduling policy implementation (note, it does not modularize the scheduler itself à la PlugSched), offer "group scheduling" (nifty thing for containers/virtualization and large systems, written by Srivatsa Vaddagiri of IBM), offer precise CPU usage accounting to /proc (used by CPU/task monitoring tools), and much more. We decided to turn Linux scheduling upside down, which gave me the easy excuse^H^H^H opportunity to extend the scheduler's design a bit more ;-) )
Ingo Molnar is the worst kind of loser: an attention whore. His O(1) scheduler turned out to be a piece of crap and Con Kolivas spent a considerable amount of time implementing a better solution: Staircase Deadline (SD). The SD scheduler is a well tested, good performing scheduler and just when you think it's going to be merged into Linus' kernel and replace Ingo's O(1) turd (and remove Ingo's name from some "important" files), Ingo spends a couple of days reimplementing SD. I guess he won't be getting his name deleted after all!
This shows the black side of open source. Con developed SD in the open and Ingo stole his ideas. It was only after people started pointing out that CFS looked _very_ similar to SD that Ingo even admitted that the design was based on Con's SD work.
The only reason CFS is in the kernel and not SD is politics.
You can download it here. Screenshots here.
Weaselmancer
rediculous.
Too bad that the NIH syndrome hit Linux Kernel development too, and Ingo Molnar, after blocking all the attempts to merge SD into mainline because "it couldn't be done", uses the same idea, whips out his own scheduler calling it "Completely Fair", and woosh it gets merged (easily, given that Ingo Molnar himself is the maintainer of that part of the kernel).
Con Kolivas is (obviously and justifiably) disgruntled, to say the least, he stops working on the SD project, and Linux loses an excellent developer because of politics.
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
Then there's the American Dream scheduler where you get priority if you work hard at it. You can't just inherit your priority like some rich child process.
Well, given that he is the maintainer, Ingo Molnar's code is presumably more maintainable. It happens all the time in free software projects, someone submits a patch, the idea in the patch is good, but the section of the code is important enough that the maintainer must be certain he understands it. Rewriting it is a good way to gain such understanding.
Back when I was a maintainer, I guess I rewrote half the patches I got. Most submitters are just happy to see the functionality in there, but there were a few people with fragile egos who took it as a personal insult. That happens, life goes on, and usually the fragile egos grow more robust with time, and learn that developing what amounts to a prototype of the final code is also a valuable contribution.
"Hello. My name is Ingo Molnar. You killed my task. Prepare to die."