The Really Fair Scheduler
derrida writes "During the many threads discussing Ingo Molnar's recently merged Completely Fair Scheduler, Roman Zippel has repeatedly questioned the complexity of the new process scheduler. In a recent posting to the Linux Kernel mailing list he offered a simpler scheduler named the 'Really Fair Scheduler' saying, 'As I already tried to explain previously CFS has a considerable algorithmic and computational complexity. This patch should now make it clearer, why I could so easily skip over Ingo's long explanation of all the tricks CFS uses to keep the computational overhead low — I simply don't need them.'"
The fancy fair scheduler.
Still waiting for Steve Jobs' "Insanely Fair Scheduler."
If you don't know where you are going, you will wind up somewhere else.
help in the case when a process goes nuts allocating memory, and stops the GUI dead in its tracks? No Alt-Ctrl-Backspace, no switching to console, unbearably slow remote login...
Tsunami -- You can't bring a good wave down!
I'd have to imagine doing so much work to prove a particular implementation's value mathematically is a good step toward depoliticizing the scheduler. That should help in what's been a contentious piece of the kernel of late.
Slashdot - where whining about luck is the new way to make the world you want.
In which no process gets any resources at all. I've also been considering a quantum scheduler, in which each CPU cycle is assigned to every process simultaneously.
Shit, I've just figured out why I'm a project manager.
I just hope it will work as-advertised with IFB
Xatrix Security - Computer Security news portal
Let's just go back to cooperative multitasking like Mac OS where everything was simple.
Nobody is stopping you from using Windows Me
I would love to use your unfair scheduler! A scheduler with liberal tendencies might block all this spyware. We know how fond 'W' is of spying...
Of course, there's the companion "pork barrel scheduler" which randomly spawns useless processes in order to take time from those that deserve it.
The higher the technology, the sharper that two-edged sword.
After all, isn't that the idea of open source software -- may the best code win?
Free Conference Call -- No Spam, High Quality
has been scheduled for use by the slashdot server farm on September 6, 2007 at 14:54:23. Please refresh this page at that time for fishthegeek's insightful comment.
Automatically generated by:
Slashdot Predictive Post Scheduler v 2.12.02-16
load "$",8,1
Completely rejecting both liberal and conservative ideals, it allocates time slices only to processes that already have them.
This is a "great" way to run things and if it ever goes to a vote, I hope lkml ops can be convinced to go the diebold route.
That just takes all the cycles and keeps them for itself?
The completely unfair scheduler, which takes all the time from processes that deserve it and gives it to processes that are blocked. Otherwise known as the liberal scheduler.
As opposed to the "REALLY completely unfair scheduler" (otherwise known as the conservative scheduler or "not nice" scheduler), which takes time from processes that need it desperately and gives it to the top one tenth of one percent of processes that are swimming in priority and don't need it.
I'm waiting for a true revolution : PFS, the Porn Fair Scheduler. All processes related to porn (playback, download, etc.) receive much larger time slices than everything else.
The lecturer is no native English speaker. So sometimes you have to replace the word 'base' with 'scheduler'. The clip shows deep insight into what Con Kolivas really feels is going on right now.
http://www.scene.org/redhound/AYB.swf/
I read the article in question. There is obviously much disagreement about the value of the Really Fair Scheduler, and so I must assume that "derrida" and the Slashdot editors are once again just trying to invite more people to the flame-fest as usual.
The comments on the article at the linked-to site suggest that there are potential flaws in the logic behind the Really Fair Scheduler, and that its author has ignored advancements in the CFS that make most (or all?) of its improvements irrelevant. Also there are many suggestions that the author of the Really Fair Scheduler, some guy named Roman something-or-other, is raging on the kernel lists rather than working cooperatively to improve the Linux scheduler.
Given what I have seen, I suspect that the Really Fair Scheduler is going nowhere, and that "derrida" knows that and is just trying to add more fuel to the flame-fire by posting about it on Slashdot.
Con has announced that he's really pissed off about this new development, and is divorcing Linux for a second time.
Ingo's reply can be found here. Roman's reply to that is here and here
You're more insightful than you think. I don't want a fair scheduler. I want a very unfair one, that favours my favourite processes. And I want one that has as little overhead as possible -- a scheduler so complex that it eats 20% of the available cycles just to figure out who to give the remaining 80% to, I have no use for.
How about I really fair scheduler you ?! Let us see how you like that, slimeball !
poor guy... :(
Screw the CPU scheduler at this point. The kernel folks are missing the obvious and utter brokenness of the IO scheduling. These bugs have been outstanding about a year now!! And it's not just AMD64 anymore either. Quoth the kernel bug report:
"Now, as far as this bug being AMD64 only. We develop a portable data analysis tool and we run it on Intel Core Mobile systems (Sony UX series, Panasonic Toughbook series) and see this bug or one almost exactly like it on those platforms as well."
http://bugzilla.kernel.org/show_bug.cgi?id=7372
http://bugzilla.kernel.org/show_bug.cgi?id=8636
http://www.nabble.com/IO-activity-brings-my-desktop-to-its-knees-(2.6.22.1-ck1)-t4192136.html
http://forums.gentoo.org/viewtopic-t-482731-start-500.html
At first, deadline IO was touted as an answer, but that doesn't completely fix things.
Some say Native Command Queueing is broken. One person claims deadline + NCQ disabled helps.
Some say the kernel's vfs_cache_pressure settings help, while others refute it (compare kernel bug report versus page 21 of the gentoo forum thread). But no one understands what's really broken in the kernel.
Can we please get Ingo working on IO scheduling? PLEASE?
Who's the fairest scheduler made?
He's right on. IO has a much bigger impact.
This sig does not contain any SCO code.
Hmmm, ever heard of nice?
Working in a DevOps shop is like playing in a band made up entirely of keytarists.
http://slashdot.org/comments.pl?sid=252305&cid=19910521
Everything I write is lies, read between the lines.
Unfortunately Ingo Molnar's opportunistic and unethical behavior in stonewalling the valuable contributions of Con Kolivas invites increased outside scrutiny of his handling of scheduler contributions. In my opinion he should be replaced.
an ill wind that blows no good
People don't you understand? Ingo Molnar is the favourite puppy of Linus, so completely forget anybody else getting a chance to make a better scheduler for the kernel...
It's the usual politics and corruption...
Not that maths isn't useful, but much of the time it can't give you definitive answers for the questions you really want answers to, only somewhat related, simpler ones.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
Why would we want a Windows kernel in Linux...?
Read other posts about I/O. nice on Linux only affects CPU cycles and is close to useless because I/O speeds haven't risen at the same rate as CPU speeds have. I mean, I have quad core CPUs that have a lot of spare cycles. Still, even if I renice some process to 19, it can still choke the machine because it is an I/O intensive process that takes .01 % of the CPU. So even at nice 19, one process can bring your machine down. We need a better I/O scheduler. Something similar to software raid (mdX_raidX processes), which only takes a small portion of the available I/O bandwidth when you add a new partition to an array.
Everything I write is lies, read between the lines.
Linus chose the scheduler written by the person that best interacted within the existing developer structure and responded to problem reports. The rejected scheduler may have been slightly better, but the developer was much less cooperative and responsive to bug reports. He killed his own project because of attitude.
Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
Ever heard of ionice?
You can recompile your kernel with a different scheduler if you wish.
Just disrupt the deflector shield with a tachyon burst.
"To retain respect for sausages and Linux schedulers, one must not watch them in the making."
-- Otto von Bismarck (paraphrased)
I experimented with it, but not in depth. As far as I remember, ionice didn't help a lot compared to real mainframe I/O scheduler. I have always felt that Linux was weak on I/O scheduling and other posts tend to confirm what I suspect.
Now, if you tell me that I can do real I/O scheduling with ionice and that you have managed to accomplish that, I might give it a second try, more in depth this time.
Also, please specify kernel tweaking parameters to cause ionice to act as a real I/O scheduler.
Again, I might not have experimented with ionice enough to possess an accurate picture but other posts on this thread seem to lead to what I assumed so far.
Everything I write is lies, read between the lines.
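For anyone wanting to give ionice that second try, here is a minimal sketch of driving it from a script. It assumes the ionice binary from schedutils or util-linux is on the PATH; the helper names ionice_argv and set_io_priority are made up for illustration. The -c (class: 1 realtime, 2 best-effort, 3 idle), -n (level 0 to 7) and -p (pid) options are the ones ionice documents. As far as I know, on 2.6 kernels the I/O priorities only take effect when the disk uses the CFQ I/O scheduler, which you can check in /sys/block/<device>/queue/scheduler, so treat this as a starting point rather than a guaranteed fix.

```python
import shutil
import subprocess

# I/O scheduling classes as numbered by the ionice -c option
IO_CLASSES = {"realtime": 1, "best-effort": 2, "idle": 3}


def ionice_argv(pid, io_class, level=None):
    """Build the ionice command line for an existing process (hypothetical helper)."""
    argv = ["ionice", "-c", str(IO_CLASSES[io_class])]
    if level is not None:
        if io_class == "idle":
            raise ValueError("the idle class takes no priority level")
        if not 0 <= level <= 7:
            raise ValueError("level must be between 0 and 7")
        argv += ["-n", str(level)]
    argv += ["-p", str(pid)]
    return argv


def set_io_priority(pid, io_class, level=None):
    """Run ionice on the given pid; raises if ionice is missing or fails."""
    if shutil.which("ionice") is None:
        raise RuntimeError("ionice not found on PATH")
    subprocess.run(ionice_argv(pid, io_class, level), check=True)
```

Something like set_io_priority(1234, "idle") would then push process 1234 (a made-up pid) into the idle class, so it should only get disk time when nothing else wants it.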
Changing scheduling after boot is not easy.
However, it should be a boot-time option. The compile scripts should let you add in as many schedulers as you like, and select the default scheduler.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Writing a fair scheduler is difficult. Why not let the user decide? I propose a popup message for each context switch: "Hello, it seems the CPU is doing a context switch. Which application do you want to allow to run this time?".
Open Source Alternatives
I think the two people in question are the primary "anonymous" users posting.
-anonymous
I've not heard of that and where do I get it? I've been looking for such a solution very long.
It's not in the debian repositories so I get the feeling there is something wrong with it. (?)
If you mod this up, your slashdot background will turn into a beautiful sunset!
TSNF.
It came up all the time. This was a G5 with 4Gb of RAM. It usually only made an appearance when I tried to get to a downed server through the finder. The other apps were usable, but Finder was out for about five minutes as it figured out what the problem was. This could also happen through a program's file menu dialogs, so if I was trying to open a file in Photoshop and misclicked on a toasted server in the sidebar, Photoshop became frozen.
ionice is a part of Debian's util-linux package in sid. ionice is not in lenny's util-linux version. It appears it is new to util-linux.
This should have been posted only on slashdot's mirror. Only it could tell us who's the fairest of them all
Does Linus like him? More than Ingo?
...I'm going to hold out for the Renaissance Faire Scheduler, so that I can finally get some use out of the Elizabethan hardware I've been hanging on to for so long.
Something wrong with Debian, you mean?
I want to delete my account but Slashdot doesn't allow it.
The meaning of "fair" in this case is that it equally allocates CPU time to programs run in the same priority. Mostly this is managed by allocating certain "timeslices". The reason you want the fairest scheduler you can get is so that your priorities are properly respected and processes at the same priority aren't discriminated against in terms of CPU time, something the old scheduler failed at.
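To make that notion of "fair" concrete, here is a toy simulation. It is not the CFS or RFS code, just a sketch under my own assumptions: each task has a weight standing in for its priority, every tick goes to the task with the least weighted runtime so far, and the name simulate_fair is invented for the example.

```python
import heapq


def simulate_fair(weights, ticks):
    """Hand out `ticks` time slices, always to the task with the least weighted runtime."""
    # Heap entries are (weighted_runtime, name); ties are broken by name.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    cpu_time = {name: 0 for name in weights}
    for _ in range(ticks):
        runtime, name = heapq.heappop(heap)
        cpu_time[name] += 1
        # A heavier weight makes a task's weighted runtime grow more slowly,
        # so it gets picked more often.
        heapq.heappush(heap, (runtime + 1.0 / weights[name], name))
    return cpu_time


print(simulate_fair({"a": 1, "b": 1, "c": 1}, 300))
print(simulate_fair({"hi": 2, "lo": 1}, 300))
```

With equal weights the three tasks end up with 100 ticks each, and a task with twice the weight gets twice the ticks. That is the property being argued about: same priority, same share.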
Instead of all the communist central planning nonsense trying to come up with ever cleverer politburo schemes, we should have a Market-Based Scheduler: CPU resources should be auctioned every 100ms. Let the market decide.
It's fairly well known that large writes to the filesystem can cause huge read delays.
This seems to be aggravated by a number of conditions listed in the links posted by the parent post, but it's also aggravated when using ext3 and ordered data journaling as well (which is the default on most systems).
There is some work being done to reduce the huge latency in reads that can occur during heavy write loads with the "per device dirty throttling" patchset. Initial results look very promising.
LWN article: Smarter write throttling
per device dirty throttling -v8
This patch set seems to hold a lot of promise in being able to fix this problem, but I'm not sure what the latest status is or what kernel it will make it into. It could make it into 2.6.24 at the earliest.
I am looking forward to the 'Fairly Fair Scheduler.'
Next week: a completely new scheduler, written by Ingo, in 05:12:43.33213, called the 'Astoundingly Fair Scheduler', which doesn't look at all like this new improvement, especially - hey look ! Something shiny ! And in two weeks time, a defence written by Linus Torvalds, detailing why the AFS is so much better than the RFS, and why Ingo can be trusted so much more when it comes to maintaining stuff like that.
Religion is what happens when nature strikes and groupthink goes wrong.
$ apt-cache show schedutils
Package: schedutils
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 88
Maintainer: Guus Sliepen
Architecture: i386
Version: 1.5.0-1
Depends: libc6 (>= 2.3.6-6)
Description: Linux scheduler utilities
These are the Linux scheduler utilities - schedutils for short. These programs
take advantage of the scheduler family of syscalls that Linux implements across
various kernels. These system calls implement interfaces for scheduler-related
parameters such as CPU affinity and real-time attributes. The standard UNIX
utilities do not provide support for these interfaces -- thus this package.
.
The programs that are included in this package are taskset, chrt and ionice.
Together with nice and renice (not included), they allow full control of
process scheduling parameters.
It's in the schedutils package in Debian stable. You need a 2.6 kernel.
that would be iNtarwebs
Bunch of school kids. Life is unfair, deal with it.
Suggestions for next iterations: Ass of a scheduler, bastard scheduler, unfair bully scheduler, depressed goth scheduler... (I will leave the exercise of figuring out the allocation semantics to reader)
You might be interested in the FreeDOS project. HTH.HAND.
Cut that out, or I will ship you to Norilsk in a box.
Oh my gosh, the Linux scheduler is on Slashdot. Again! :-)
Frankly, this amount of interest in the Linux scheduler is certainly flattering to all of us Linux scheduler hackers, but there are certainly more important areas that need improvement: 3D support, the MM / IO schedulers, stability, compatibility, etc. (There's also the FreeBSD scheduler that went through a total rewrite recently - and it got not a single Slashdot article that i remember.)
But i digress. A couple of quick high-level points (most of the details can be found in the discussions on lkml):
I find the RFS submission interesting and useful, and i have asked the author to split the patch up a bit better, to separate the core idea from optimizations and unrelated changes - to ease review and merging of the changes, and to make the changes bisectable during QA after they have been applied to the mainstream kernel. (That is how patches are typically submitted to the Linux-kernel mailing list - it's a basic requirement before anything can be merged. CFS for example was applied to the 2.6.23 development tree in form of a series of 50 (!) separate patches. (And the scheduler works at every patching/bisection point.))
I also pointed him to the latest "bleeding edge" scheduler tree, which already implements the same non-normalized form of math and makes some of the rounding and performance arguments moot i believe. (lkml mail).
There are some issues where i disagree with Roman at the moment: even when comparing to unmodified current upstream CFS, i think Roman makes too much out of rounding behavior and i have asked him to substantiate his claims with numbers (lkml mail).
The current precision/rounding of CFS is better than one part in a million. (in fact it's currently even better than that, but i'm saying 1:1000000 here because we could in the future consciously decrease precision, if performance or simplicity arguments justify it.)
I can understand his desire towards creating interest in his patch, but IMO it should not be done by unfairly (pun unintended ;) trash-talking other people's code. The math code in CFS that achieves precision has gone through more than 5 complete rewrites already in the 20-plus CFS versions, and the current variant was not written by me but was largely authored by Thomas Gleixner and Peter Zijlstra.
New, better approaches are possible of course and the math is relatively easy to replace, due to the internal modularity of CFS. So we are keeping an open mind towards further improvements. (which includes the possibility of total replacements as well. Dozens of times has my own kernel code been replaced with new, better implementations in the past - and that includes large parts of the scheduler too. In fact only ~30% of current kernel/sched.c was authored by me, the rest has been written by the other 90+ scheduler contributors, according to the git-annotate output that covers the past ~2.5 years of kernel history. Beyond that numerous other people have contributed to the scheduler in the past.)
About the submitted code: it was a bit hard to review it because the new code did not contain any comments - it only included raw code - which is very uncommon for patches of such type. The email gave the theoretical background but there was little implementational detail in the patch itself connecting the theory to practice.
So to drive this issue forward i have today posted a question to Roman in form of a tiny patch that extracts only his suggested new math from his patch and applies it to CFS. If it is indeed what Roman intended then we can analyze that in isolation and in more detail. The patch is as small as it gets:
At least in linux, and I presume FreeBSD's swap strategy is similar, you miss the point. Let's look at two scenarios, one with proactive swapping, one without, and a malloc comes in that exceeds system memory.
Non-proactive case:
-kernel sees malloc, knows it lacks physical memory to accommodate, malloc is blocked while kernel does housekeeping.
-kernel picks the appropriate amount of pages to write to swap, then writes those pages to swap space, taking a while since block storage IO is excruciatingly slow.
-After the extremely long previous step, the memory is freed
-the malloc is allowed to continue, after a number of milliseconds have passed to execute the drive write, aside from the drive write, everything was in the microsecond scale, so it was delayed by a factor of thousands.
Proactive case:
-The system has some idle time, with nothing immediately better to do, the kernel notes free swap space and flags some appropriate memory as what would be swapped out if and when the system was in need, and copies it to disk, but it *leaves it in memory*. The kernel remembers that while these pages are indeed in memory, it can zap them and be able to restore. This is the critical point, the data in memory has not been *moved* to swap, it has been *copied* to swap.
-Program using that data randomly kicks back to life. Its needed data is on disk, but it is a moot point because it is in physical memory too, so it isn't slowed down. The kernel might take this opportunity to re-evaluate things when idle in terms of what it thinks is mostly unwanted pages.
-Later on, a program needs to malloc and physical memory is exhausted, the kernel blocks to do housekeeping, finds pages that it knows it has copied to disc, frees them and uses them to satisfy the malloc, within microseconds.
Proactive swapping causes extra IO activity during idle, but does not, if implemented correctly, impact things proactively swapped unnecessarily negatively, and allows swap on actual demand to be nearly trivially fast. It may be wasteful to have gobs of swap, and certainly if the swap has the sole copy of tons of data then performance is hopeless, but don't think seeing the swap used count go up 'mysteriously' without significant mallocs going on that it will impact access to the data written to swap later on.
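The two scenarios above can be put into a toy model. This is only a sketch, not how any kernel actually implements it: the costs DISK_WRITE_US and RECLAIM_US are made-up round numbers, the Memory class and its methods are invented for illustration, and I am assuming that writing to a page makes its swap copy stale.

```python
DISK_WRITE_US = 5000  # assumed cost of writing one page to swap, in microseconds
RECLAIM_US = 1        # assumed cost of dropping a page that already has a swap copy


class Memory:
    def __init__(self, pages):
        # Maps each resident page to True if an up-to-date copy already sits in swap.
        self.resident = {p: False for p in pages}

    def idle_precopy(self, pages):
        """Proactive step: copy pages to swap during idle time but keep them in RAM."""
        for p in pages:
            if p in self.resident:
                self.resident[p] = True

    def touch(self, page, write=False):
        """Access a resident page; a write makes its swap copy stale."""
        if write:
            self.resident[page] = False
        return 0  # the page is still in RAM, so there is no disk delay

    def reclaim(self, count):
        """Free `count` pages for a new allocation; return the delay in microseconds."""
        # Prefer pages that already have a swap copy, since freeing them needs no write.
        victims = sorted(self.resident, key=lambda p: not self.resident[p])[:count]
        delay = 0
        for p in victims:
            delay += RECLAIM_US if self.resident[p] else DISK_WRITE_US
            del self.resident[p]
        return delay


lazy = Memory(range(8))
print(lazy.reclaim(4))       # non-proactive: four disk writes before malloc continues

eager = Memory(range(8))
eager.idle_precopy(range(4))
eager.touch(0)               # program wakes up, page is still in RAM
print(eager.reclaim(4))      # proactive: the copies exist, so pages are just dropped
```

In the model the non-proactive allocation waits 20000 microseconds and the proactive one waits 4, which is the "factor of thousands" difference described above.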
XML is like violence. If it doesn't solve the problem, use more.
Check it out.
IBM has a really good scheduler (called "dispatcher") in their Mainframes (z/OS et al), couldn't Linux try a version of that one for a change ?
After all, IBM is running Linux on their Mainframes.
Mundus Vult Decipi
The algorithm is straightforward, if a little labor-intensive.
I like cutting the steak into smaller pieces, so as to enjoy CFS as a finger food and increase the amount of breading. So computationally, it tends to leave grease and crumbs in my keyboard.
Does your system with four CPU cores have four or more fast disks? Lots of the problems I hear about with IO being slow is that people put two, four, or eight cores into a machine, knowing that their one disk will be the bottleneck. Sure, four or eight disks will still be a bottleneck, but not as much.
How much is your four-core CPU? You can buy a nice Seagate SATA 3.0Gbps PRT 320GB disk for less than $80 on NewEgg. Ideally you could buy twelve, and set up four RAID 5 arrays. Assign one array to be /usr and swap, one to be /home, one to be /var, and one to be /tmp. Alter this scheme as fits your special needs, of course. 12 * $80 = $960. You'd probably also need an extra SATA adapter card unless you're running one hell of a server board.
If that's too steep to go along with your underutilized $500 processor, maybe just get four for $80 and use single drives vs. arrays. That's $320 and is only a bit more than the cheapest Core 2 Quad.
Of course, you could go even cheaper and use 40GB for /usr and swap unless you're some kind of software collector. You could probably also do with that much for /var and for /tmp for that matter. So get 320 GB for /home, and 40 GB for each of the others, and save a bit more. If you want to stick with the 7200.10 series for the perpendicular recording, the 80GB drives are about $44. So 3 * 44 = 80 = 212.
$212 dollars worth of disks to speed your reads and writes significantly if you're using a single disk. Or even reuse the single disk for /home and make it a $132 investment.
Of course, this doesn't include shipping or power costs. But you get the idea. You can speed your machine's IO up quite a bit by having more than one disk. It's not just that you're talking about four times the transfer speed, either. You're cutting down on seeks, missed read and write opportunities, cache contention, and command queue length by going to more drives on separate interfaces. On lots of machine workloads, such as a heavily trafficked file server or mail server with local logging, you can, IME, get a machine to handle up to about four times the workload by separating the data and the logging onto separate disks.
Actually, in my case, I found that RAM was a cheaper bang for the buck than a fast array of drives ;-)
http://slashdot.org/comments.pl?sid=252305&cid=19910521
Of course, the best solution depends on the use case, what you suggest could be needed for some applications ;-)
Everything I write is lies, read between the lines.
"Liberals are evil, because they are branded that way. Facts don't matter, truth is what we say it is"
That equation, of course, should say '3 * 44 + 80 = 212'. I'm pretty sure I did a preview, too, but I didn't notice that until looking back at it later.
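For what it's worth, the totals in that disk shopping list can be re-checked in a few lines. The prices are the approximate ones quoted in the post ("less than $80" and "about $44"), so the results are rough upper bounds, and the variable names are just labels I picked.

```python
PRICE_320GB = 80  # dollars, the "less than $80" 320GB disk
PRICE_80GB = 44   # dollars, the "about $44" 80GB disk

twelve_disk_raid = 12 * PRICE_320GB       # four RAID 5 arrays of three disks each
four_single_disks = 4 * PRICE_320GB       # one disk per filesystem, no arrays
mixed_sizes = 3 * PRICE_80GB + PRICE_320GB  # small disks plus one big /home disk
reuse_old_disk = 3 * PRICE_80GB           # keep the existing disk for /home

print(twelve_disk_raid, four_single_disks, mixed_sizes, reuse_old_disk)
```

That gives 960, 320, 212 and 132, matching the figures in the post once the corrected '3 * 44 + 80' is used.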