The Really Fair Scheduler
derrida writes "During the many threads discussing Ingo Molnar's recently merged Completely Fair Scheduler, Roman Zippel has repeatedly questioned the complexity of the new process scheduler. In a recent posting to the Linux Kernel mailing list he offered a simpler scheduler named the 'Really Fair Scheduler' saying, 'As I already tried to explain previously CFS has a considerable algorithmic and computational complexity. This patch should now make it clearer, why I could so easily skip over Ingo's long explanation of all the tricks CFS uses to keep the computational overhead low — I simply don't need them.'"
Let's just go back to cooperative multitasking like Mac OS where everything was simple.
I don't think any scheduler will help you with that. The slowness is due to the swapping in and out from the disk, and that's going to be limited by the horribly slow speed of the disk.
/proc/sys/vm/overcommit_memory. No more OOM killer killing some random unrelated process. Memory allocations will fail and programs will be able to handle that correctly.
/etc/security/limits.conf
You could tweak things to make this a less likely ocurrence though.
Disable overcommit by echo 2 >
Set some memory limits in
Avoid having too much swap space. It's awfully slow, if you're using it too much all you'll manage is to run more things slower.
Get more RAM, it's cheap. If you're regularly swapping then you definitely should.
The scheduler is at the very heart of the kernel. It's relatively hard to make the logic for choicing what and when to context-switch modular, while keeping the actual context-switches fast enough. Diferent schedulers tend to have different ideas on what stats to keep, and you all want it with good memory locality. After all, we should remember that this is a piece of code that's relevant tens or hundreds of times per second, no matter what you do with your machine.
"A Lisp programmer knows the value of everything, but the cost of nothing." - Alan Perlis
Screw the CPU scheduler at this point. The kernel folks are missing the obvious and utter brokenness of the IO scheduling. These bugs have been outstanding about a year now!! And it's not just AMD64 anymore either. Quoth the kernel bug report:
o p-to-its-knees-(2.6.22.1-ck1)-t4192136.html- 500.html
"Now, as far as this bug being AMD64 only. We develop a portable data analysis
tool and we run it on Intel Core Mobile systems (Sony UX series, Panasonic
Toughbook series) and see this bug or one almost exactly like it on those
platforms as well.
"
http://bugzilla.kernel.org/show_bug.cgi?id=7372
http://bugzilla.kernel.org/show_bug.cgi?id=8636
http://www.nabble.com/IO-activity-brings-my-deskt
http://forums.gentoo.org/viewtopic-t-482731-start
At first, deadline IO was touted as an answer, but that doesn't completely fix things.
Some say Native Command Queueing is broken. One person claims deadline + NCQ disabled helps.
Some say the kernel's vfs_cache_pressure settings help, while others refute it (compare kernel bug report versus page 21 of the gentoo forum thread). But no one understands what's really broken in the kernel.
Can we please get Ingo working on IO scheduling? PLEASE?
"To retain respect for sausages and Linux schedulers, one must not watch them in the making."
-- Otto von Bismarck (paraphrased)
How about the Scarbrough Fair Scheduler, that allocates Parsley, Sage, Rosemary and Thymeslices.
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Oh my gosh, the Linux scheduler is on Slashdot. Again! :-)
Frankly, this amount of interest in the Linux scheduler is certainly flattering to all of us Linux scheduler hackers, but there are certainly more important areas that need improvement: 3D support, the MM / IO schedulers, stability, compatibility, etc. (There's also the FreeBSD scheduler that went through a total rewrite recently - and it got not a single Slashdot article that i remember.)
But i digress. A couple of quick high-level points (most of the details can be found in the discussions on lkml):
I find the RFS submission interesting and useful, and i have asked the author to split the patch up a bit better, to separate the core idea from optimizations and unrelated changes - to ease review and merging of the changes, and to make the changes bisectable during QA after they have been applied to the mainstream kernel. (That is how patches are typically submitted to the Linux-kernel mailing list - it's a basic requirement before anything can be merged. CFS for example was applied to the 2.6.23 development tree in form of a series of 50 (!) separate patches. (And the scheduler works at every patching/bisection point.))
I also pointed him to the latest "bleeding edge" scheduler tree, which already implements the same non-normalized form of math and makes some of the rounding and performance arguments moot i believe. (lkml mail).
There are some issues where i disagree with Roman at the moment: even when comparing to unmodified current upstream CFS, i think Roman makes too much out of rounding behavior and i have asked him to substantiate his claims with numbers (lkml mail).
The current precision/rounding of CFS is better than one part in a million. (in fact it's currently even better than that, but i'm saying 1:1000000 here because we could in the future consciously decrease precision, if performance or simplicity arguments justify it.)
I can understand his desire towards creating interest in his patch, but IMO it should not be done by unfairly (pun unintended ;) trash-talking other people's code. The math code in CFS that achieves precision has gone through more than 5 complete rewrites already in the 20-plus CFS versions, and the current variant was not written by me but was largely authored by Thomas Gleixner and Peter Zijlstra.
New, better approaches are possible of course and the math is relatively easy to replace, due to the internal modularity of CFS. So we are keeping an open mind towards further improvements. (which includes the possibility of total replacements as well. Dozens of times has my own kernel code been replaced with new, better implementations in the past - and that includes large parts of the scheduler too. In fact only ~30% of current kernel/sched.c was authored by me, the rest has been written by the other 90+ scheduler contributors, according to the git-annotate output that covers the past ~2.5 years of kernel history. Beyond that numerous other people have contributed to the scheduler in the past.)
About the submitted code: it was a bit hard to review it because the new code did not contain any comments - it only included raw code - which is very uncommon for patches of such type. The email gave the theoretical background but there was little implementational detail in the patch itself connecting the theory to practice.
So to drive this issue forward i have today posted a question to Roman in form of a tiny patch that extracts only his suggested new math from his patch and applies it to CFS. If it is indeed what Roman intended then we can analyze that in isolation and in more detail. The patch is as small as it gets: