Linux Gets Completely Fair Scheduler
SchedFred writes "KernelTrap is reporting that CFS, Ingo Molnar's Completely Fair Scheduler, was just merged into the Linux kernel. The new CPU scheduler includes a pluggable framework that completely replaces Molnar's earlier O(1) scheduler, and is described to 'model an "ideal, precise multi-tasking CPU" on real hardware. CFS tries to run the task with the "gravest need" for more CPU time. So CFS always tries to split up CPU time between runnable tasks as close to "ideal multitasking hardware" as possible.' The new CPU scheduler should improve the desktop Linux experience, and will be part of the upcoming 2.6.23 kernel."
I've sort of gazed for a few seconds at the CFS articles and the following phrase caught my attention the most
But more importantly, I think the factor which'll probably sway me the most is /proc/sys/kernel/sched_granularity_ns. Except I've been salting my config options with one true test ... that kind of thing makes you paranoid about random tune-ups :)
Quidquid latine dictum sit, altum videtur
really? and how is it supposed to do that wonderful thing?
ps: i'm just curious and a noob, so please don't smash me...
Slashdot is no longer what it was!
Sort of. Scheduling algorithms are very important for routers too. So there is an analogy. But the analogy isn't with a tiered internet. It's with protocol based QoS. For instance, VoIP requires very low latency, but BitTorrent doesn't. So shaping traffic so that VoIP stuff gets handled by a router first (while minimally affecting BitTorrent) improves the quality of service. On the kernel scheduling side of the analogy, some software needs to have quick access to the processor, often, but for short periods of time. A GUI interface is an example. Real-time software is a more important example.
A tiered internet is something else entirely.
This isn't really the same kind of component.
On the other hand, Linux has epoll, which fills the same role as kqueue.
In my experience, epoll is at least as good.
http://www.kegel.com/c10k.html#nb.epoll
Now MacOS X needs to fix their kqueue bugs, and the world will be a happy place.
CFS has been available for some time in Andrew Morton's -mm branch of the kernel. If you really want it now, just download his latest patch and there you go.
I've been running with it for some time, and I really like it. I'm still not sure if it is better than Con Kolivas' SD scheduler in his patchset, but we'll see.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
That is for scheduling background tasks that run once a day (or whatever you set it to)
This is for scheduling CPU resources in real time, to decide whether Firefox or Apache is going to be executed in the following split second.
I think you have this TOTALLY backwards.
The old scheduler was filled with huge chunks of complex code to try to guess at which processes were interactive and such, and would then specially treat those processes differently when scheduling.
The CFS does none of that. It schedules all processes the same, in a completely fair manner, and doesn't have any special logic in it that tries to classify processes at all, other than nice levels.
The part yet to be merged is the process grouping, which again isn't anything like the interactivity guessing code. It's just a simple way to say "these processes belong together, so when you do the CPU scheduling, treat them as a single group." It's basically just a weighting mechanism with a logical container.
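A minimal sketch of the "weighting mechanism with a logical container" idea, assuming equal weights for every group and for every task inside a group. The group and task names are hypothetical, and this is only an illustration of the arithmetic, not the kernel's implementation.

```python
from fractions import Fraction

def cpu_shares(groups):
    """Split one CPU fairly between groups, then fairly inside each group.

    `groups` maps a (hypothetical) group name to a non-empty list of task
    names. Equal weights are assumed at both levels.
    """
    shares = {}
    group_share = Fraction(1, len(groups))
    for tasks in groups.values():
        for task in tasks:
            # Each task gets an equal slice of its own group's share.
            shares[task] = group_share / len(tasks)
    return shares

shares = cpu_shares({
    "alice": ["make-1", "make-2", "make-3", "make-4"],
    "bob": ["firefox"],
})
for task, share in shares.items():
    print(f"{task}: {share}")
```

Without grouping, five equal tasks would each get 1/5 of the CPU; with grouping, the single task in the second group gets 1/2 while the four tasks in the first group get 1/8 each.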
No.
Cron schedules tasks to execute at specified times. This article refers to the kernel's CPU scheduler which determines which running process gets to use the CPU at any given moment.
It's already been said, but I'll repeat it just for completeness.
Basically right now the scheduler is unbiased, giving ticks to all applications regardless of their need for processing time. An example of this would be in X windows when you have little taskbar icons that rarely do anything, vs having cd burning software running.
The scheduler will quickly learn that most of the time it asks the taskbar application if it needs to do anything, it doesn't, and that most of the time it asks the cd writing software to do anything, it needs CPU. So very quickly it will start checking on the cd writing process more frequently than the taskbar process. This will give you a very noticeable performance increase in your system.
With this in mind, there should be a very noticeable performance increase in all desktop and server systems. This scheduling change is a very big addition to the main branch of the kernel. It's been available for some time in various kernel patches, but the fact that it's making it to the main kernel branch means it has matured enough for prime time and has been acknowledged as beneficial to the Linux kernel.
I for one am anxious to try this out on all our systems. From what I'm reading it has some fine tuning options which should be really nice to play with.
http://interserver.net/
CFS and Con Kolivas' SD both aim to improve interactivity of processes under high load - in particular, the goal was to reduce scheduling latency for applications which have realtime needs - like audio players. Con Kolivas has been maintaining variations on his low-latency Staircase design for several years with precisely that goal in mind.
On the desktop, it improves latencies for (for example) music players and 3D games, improving performance and eliminating jitter, lag, and general choppiness. Both SD and CFS achieved this under loads as high as 50.
On the server, it can have several benefits, including improved time-to-network latencies. They both want and need test cases for servers that show no detrimental effects. If you want to help, you can try out CFS on a server and report to Ingo if there are performance or latency issues.
grey wolf
LET FORTRAN DIE!
It works quite well. I use Con Kolivas' SD scheduler (on which CFS is based), and in a similar situation (with heavy I/O and numerous power-hungry apps), it performs exceedingly well.
Ingo tests CFS with a kernel make -j50 - just to give you an idea of what we're shooting for here.
grey wolf
LET FORTRAN DIE!
[ck] It is the end of -ck
This is pretty sad for linux kernel development.
(disclaimer, i'm the main author of CFS.)
I'd like to point out that CFS is O(1) too.
With current PID limits the worst-case depth of the rbtree is ~15 [and O(15) == O(1), so execution time has a clear upper bound]. Even with a theoretical system that can have 4 million tasks running at once (!), the rbtree depth would have a maximum of ~20-21.
The "O(1) scheduler" that CFS replaces is O(140) [== O(1)] in theory. (in practice the "number of steps" it takes to schedule is much lower than that, on most platforms.)
So the new scheduler is O(1) too (with a worst-case "number of steps" of 15, if you happen to have 32 thousand tasks running at once(!)), and the main difference is not in the O(1)-ness but in the behavior of the scheduler.
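The depth figures above can be sanity-checked with a rough back-of-the-envelope calculation. This sketch assumes a well-balanced binary tree, whose depth grows as log2 of the number of nodes; a real red-black tree is only approximately balanced, so treat the numbers as ballpark figures. The helper name is made up for illustration.

```python
def balanced_depth(n_tasks):
    """floor(log2(n_tasks)): approximate depth of a well-balanced binary tree."""
    return n_tasks.bit_length() - 1

# 32 thousand tasks (the PID limit mentioned above) and a theoretical 4 million.
for n in (32_768, 4_000_000):
    print(n, balanced_depth(n))
```

The point is how slowly the depth grows: going from tens of thousands of tasks to millions adds only a handful of levels.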
"This update seems to have come relatively soon after the O(1) scheduler (about a year?) which is relatively quick for changes to really important low-level parts of an operating system. Does this mean that the Linux community was relatively unhappy with the O(1) scheduler which was touted as a very good thing at the time"
The Linux O(1) scheduler has been around since 2002.
It's pretty good, but there are corner cases where you can fool it. For example, if a process classified as interactive goes CPU-bound, it can cause starvation for other processes.
I rarely criticize things I don't care about.
Well, no offense, but I'm glad it isn't you that's in charge of making important decisions in that case. I realize that you were probably less than half-serious, but I would hate for the Linux community to ever be in the stage where "attract more masses" is a goal that diverts effort from interesting projects like this one.
With that said, what's wrong with Qt/KDE, particularly the new versions (the ones still in Alpha)? I'd say it is very much a "non-ugly GUI lib", and a "sane windowing environment".
Hm, that seems to be more of a VM/IO-scheduling problem than a process scheduling problem.
Did you have a chance to try Peter Zijlstra's excellent per-bdi patches, as suggested in the bugzilla?
But in general, CFS ought to improve such workloads too (to a limited degree), in terms of not making any IO starvation worse by adding CPU starvation to the mix :-)
> So little credit is given to Con Kolivas ...
> And all Con gets is a minor footnote.
I'm a kernel developer myself and quite surprised you see it that way.
Let's take a look at the kernel code:
1) Ingo credited Con for the "fair scheduling" approach right on the first page of kernel/sched.c. That's the most prominent place you can get credited for working on the Linux scheduler
* 2007-04-15 Work begun on replacing all interactivity tuning with a
* fair scheduling design by Con Kolivas.
2) He credited Con for a line of code that he added to CFS from SD, in kernel/sched.c
* This idea comes from the SD scheduler of Con Kolivas:
This is the only SD code in CFS - the two designs and approaches are quite different.
3) He credited Con in Documentation/sched-design-CFS.txt
I'd like to give credit to Con Kolivas for the general approach here:
he has proven via RSDL/SD that 'fair scheduling' is possible and that
it results in better desktop scheduling. Kudos Con!
4) Finally he credited Con in the CFS commit log as well:
commit c31f2e8a42c41efa46397732656ddf48cc77593e
Author: Ingo Molnar
Date: Mon Jul 9 18:52:01 2007 +0200
sched: add CFS credits
add credits for recent major scheduler contributions:
Con Kolivas, for pioneering the fair-scheduling approach
Peter Williams, for smpnice
Mike Galbraith, for interactivity tuning of CFS
Srivatsa Vaddagiri, for group scheduling enhancements
Signed-off-by: Ingo Molnar
I don't see many more places where credit could be documented.
tglx
I'm not a kernel developer but happened to be reading the mailing lists when the "CFS" originally hit the scene a few months ago.
Basically Ingo Molnar, the author of CFS, who is also the maintainer of the scheduler in the kernel, opposed the inclusion of the competing SD scheduler from Con Kolivas for years. Then he claimed that he was just suddenly inspired to whip up a new scheduler that addresses the exact same problems. He then did so in "62 hours".
If you start at this point and read the next 20 or so messages it gives a pretty clear flavor of how things went down. (The 62 hour comment is in there too.)
http://www.gossamer-threads.com/lists/linux/kernel/755787#755787
You'll note that Ingo's defense is to use smileys and to tell some guy that he's a BSD developer and therefore doesn't understand Linux and should therefore butt out. (I also enjoyed the comment about how having pluggable schedulers is not desirable because it would confuse people. Not like there's already io schedulers, for example.)
After 10 years of working with developers in corporate land, to me it reads like a clear power-play followed by some significant chest thumping. On technical merit the scheduler sounds fine, but on process it was clearly crap and resulted in an obviously skilled and motivated contributor being driven from the world of kernel development.
(some have already posted this link: http://bhhdoa.org.au/pipermail/ck/2007-June/007893.html )
i'll just post AC since i don't really want this to come back and haunt me in the future (yet i still feel compelled to say something on the topic)
> Ingo please comment on this because I have read similar stories elsewhere and would like to hear a response.
I'd understand if Ingo doesn't want to comment on this; it was a painful clash between two competent and strong characters, which expanded to other parties accusing Ingo of elitism and plagiarism.
For reference, this was archived on kerneltrap.org, and I believe it was covered in an earlier /. article. (Direct link to full KernelTrap article not provided, in the hope of saving the site from a slashdotting).
For what it's worth, here's the "facts" as I see them :
1/ It looks as though Ingo *and*Linus* refused Con's original patch on certain grounds which weren't clearly understood/communicated. Ingo, however, stated that in general he was "quite positive about the staircase scheduler." He proceeded to test it and gave Con feedback.
2/ Con's work was good enough that Ingo about-turned on his earlier, negative stance about fair schedulers and was inspired to go and develop something very similar (but which fitted better with the overall kernel architecture). It's clear that this was predominantly Ingo's own code (hence no plagiarism), and Ingo credits Con in the code comments for coming up with the general approach.
3/ Somewhere in the middle of the ensuing discussion on lkml there are complaints that Con wasn't kept in the loop. However, Ingo cites examples where he *did* communicate to Con; by Con's own admission he was very ill (hospitalised) during a critical period.
4/ Parent suggests that Con has since stopped contributing to the kernel. I don't see any indication of this in the kernel thread - in fact Con's post gives every indication that he'll continue to contribute.
My analysis :
I put the situation down to an applied case of "standing on the shoulders of giants". It's very rare that anyone creates something completely new, and in large projects this can occasionally generate friction.
Con was in a susceptible condition when the CFS code was released, had a grumble on the list, but generally acted pretty maturely. Ingo credited Con's contributions wherever feasible, clarified this in discussion, and stayed polite and friendly throughout. End of story.
What's pretty disgusting is the partisan name-calling that follows in the KernelTrap comments. "Shame on Ingo", "Con is acting like a baby", etc. I hope that this doesn't generate bad feeling between Molnar & Kolivas, because after Con's original complaint on lkml and Ingo's response things seemed to be settled.
No doubt in future Ingo will take an increased amount of care about vetting other people's code, not promoting his own to the exclusion of others, and crediting other people in his own work (note: I don't claim that he has been lacking in this respect in the past). Con, likewise, will doubtless be mollified when his contributions are more readily recognised as being of merit in future. In the meantime Linus has emphasised that competition between developers is a *good* thing to a reasonable extent, as it directly increases motivation.
Now, I suggest that everyone else with a ready opinion hold their breath a while, and let them all get on with coding.
Conrad
Surely you jest. The vast majority of GUI CPU time on a typical GNOME box is spent in Pango. I dare you to profile it and prove me wrong. Pango puts in a lot of hard work, and most of it goes to waste. Now parts of GNOME are actually written in C# using Mono and Gtk#, giving you a couple of extra layers of failware. The work X.Org does is extremely minimal these days, especially when it uses hardware acceleration for some render tasks.
Smart scheduling is no competition for fast code, and KDE wins hard by using Qt. Even Swing with the (apparently proprietary) Java2D backend is much faster than GTK, even when GTK uses Cairo.
Look at 'top' and sort by total CPU time. X.Org will be one of the highest, but you have to remember it takes a chunk of everyone's CPU time and that persists even after they die. I'm sure if you add up all your other graphical programs, even the ones that are running just now and not counting the old ones, it'll be much higher than X.Org.
Sam ty sig.
Learn Model View Controller programming. I have coded a similar thing myself in Qt without such problems.
No, Kolivas has definitely withdrawn from kernel development. From his -ck mailing list post:
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
> No, Kolivas has definitely withdrawn from kernel development. From his -ck
> mailing list post:
>
> "So, I've had enough. I'm out of here forever."
I stand corrected. What a shame that it spiralled this far.
Conrad
How did this rubbish get modded informative ? Is it someone's idea of a joke ? Or do people simply apply the "informative" mod on things they know nothing about ?
The scheduler doesn't "ask" the processes anything. It goes through the list of runnable tasks - the tasks which aren't currently blocked waiting for data to arrive from the network, the user to press a key, some time to elapse, or something else - and decides which one should run next, and for how long. After it has run, it picks the next task, and so on.
The "taskbar processes" are inactive because they are blocking on a socket (which connects to the X server), waiting for a message from the X server, which might carry user input or whatever. They aren't in the runqueue so the scheduler doesn't have anything to do with them. Only once they receive the message they've been waiting for do they become runnable again, and thus subject to scheduling.
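The distinction between runnable and blocked tasks can be shown with a toy model. This is only a sketch: plain round-robin stands in for the real scheduling policy (it is not CFS), and the `Task` class, the `blocked` flag and the task names are all made up for the example.

```python
from collections import deque

class Task:
    def __init__(self, name, blocked=False):
        self.name = name
        self.blocked = blocked  # True while waiting on a socket, a timer, etc.

def run_slices(tasks, n_slices):
    """Round-robin over runnable tasks only; blocked tasks are never considered."""
    runqueue = deque(t for t in tasks if not t.blocked)
    history = []
    for _ in range(n_slices):
        if not runqueue:
            break  # nothing runnable: the CPU would simply idle
        task = runqueue.popleft()
        history.append(task.name)
        runqueue.append(task)  # still runnable, so it goes to the back
    return history

tasks = [Task("cd-burner"), Task("taskbar-applet", blocked=True), Task("compiler")]
history = run_slices(tasks, 4)
print(history)
```

The blocked applet never shows up in `history`, and it costs the scheduler nothing, because it was never on the runqueue in the first place.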
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Let's say that the machine was running for 100s.
50 seconds of that time, it spent running process A.
50 seconds of that time, it spent running process B.
The 50 seconds of A may be distributed differently by different algorithms.
In some algorithms, A will run for 50 seconds, and then B will run for 50 seconds.
Obviously, this is not the best when you want some interactivity...
In other algorithms, the running of A and B will be interspersed, for instance, A may run for 200ms, followed by B for 200ms, etc. until the 100 seconds is up.
This gives a more interactive system.
Note that both of these algorithms give a 'fair' amount of time to each process, but one is only fair when 'fairness' is computed at the end.
A "better" algorithm, e.g. Ingo's CFS, EDF, GRRR (GR3), VTRR, etc. will also attempt to be fair on -small- timescales with divergent (and possibly grossly divergent) weights.
Wikipedia has a fairly nice page with links to more information:
http://en.wikipedia.org/wiki/Scheduling_(computing)
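The two schedules described above can be compared with a few lines of code. This sketch measures the largest gap between the CPU time A and B have received at any point during the run; the function name and the "imbalance" measure are chosen for illustration and are not taken from any of the schedulers mentioned.

```python
def max_imbalance(schedule):
    """Largest gap (in ms) between A's and B's accumulated CPU time at any point.

    `schedule` is a list of (task, milliseconds) slices, run in order.
    """
    received = {"A": 0, "B": 0}
    worst = 0
    for task, ms in schedule:
        received[task] += ms
        worst = max(worst, abs(received["A"] - received["B"]))
    return worst

batch = [("A", 50_000), ("B", 50_000)]         # A for 50 s, then B for 50 s
interleaved = [("A", 200), ("B", 200)] * 250   # 250 rounds of 400 ms = 100 s

print(max_imbalance(batch))        # 50000
print(max_imbalance(interleaved))  # 200
```

Both schedules end with each process having had exactly 50 seconds, but the interleaved one never lets either process fall more than 200 ms behind the other.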
For a requirement like that, you would have likely wanted to implement your own model backing a QTableView. You would have extended QAbstractItemModel and handled data updates there, and let notifications to the GUI flow when needed. You would probably have seen better performance. See http://doc.trolltech.com/4.3/qtableview.html , http://doc.trolltech.com/4.3/qabstracttablemodel.html , http://doc.trolltech.com/4.3/qabstractitemmodel.html , and http://doc.trolltech.com/4.3/sql-tablemodel.html for more information.
Sigh... I can't believe I'm giving tutorials on /. :-(
The man page is worthless (and if the universe had any sense of justice many of the Linux man pages would be rewritten).
If one has a shell command file, "loop", containing...
EXPR=1; while true; do EXPR=$[ $EXPR + 1 ]; done
and one says:
nice -19 ./loop
then CPU usage goes to 100% and a glance at the nice column on the System Monitor reveals that a shell is running "loop" with a nice value of "19", i.e. the system is quite responsive.
If one (as root) says "nice --19 ./loop" you will also see CPU usage go to 100% and the System Monitor reveals that the loop shell is running at a nice value of "-19". *And* your system will be performing like a dog. You will not even be able to get the mouse to move "reliably" (this is on a Pentium IV @ 2.8 GHz).
Negative "nices" are a lower numeric value but a "higher" effective priority (i.e. they get greater CPU time slice allocations).
For those of you who want the history on this, this is because in UNIX version 6, the priority of a process as well as the nice value were kept as signed bytes. Priorities less than zero were system priorities which could not be interrupted. Low value positive priorities were system priorities which could be interrupted. User priorities started at 100. They could be niced to -20 (100 + -20 = 80) or +19 (100+19 = 119) as "starting" points for the scheduler (lowest priority got the CPU). If I recall, the running process had its priority bumped with each clock tick -- so it would go 101, 102, 103, etc. If niced its effective value would go 119, 120, 121, etc. The scheduler did a complete scan of the process table every few clock ticks and reset the priorities so that the totals wouldn't get above 127. You have to remember in the "old" days (1974-5), memory (for storing priorities and nice values) and CPU time for scanning scheduler tables (which are cheaper than linked lists) was expensive and programs were written to get the job done using as few resources as possible.
The problem we now have is that too many system developers (be they Linux kernel developers or Firefox developers) think resources like CPU time and memory are in infinite supply. This of course leads to [1] & [2].
I run both my Gentoo Linux package "emerges" (which can take many hours depending on # of packages) and my Firefox builds (which generally take about an hour) at "nice -19" but it doesn't do any good because the scheduler isn't designed to handle high CPU loads resulting from a process collective (build) vs. low average CPU loads (but potentially high burst loads) associated with long running processes (e.g. X11, mplayer, etc.). It would be very nice if I could actually *use* my system for editing, browsing, etc. while I'm running background system maintenance or development tasks.
1. The "Oh, throw another core at the problem" mentality.
2. The "Oh, throw another GB at the problem" mentality.
Want to hear the voice of GOD? cat