The Completely Fair Scheduler
hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."
I thought Linux used Cron as a scheduler ?
If scheduling was completely fair, this would have been a frist ps0t.
Free software: because some processes are more equal than others.
You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks.
No I won't, because I don't know what the hell it means.
Hah! In your face, Taco!
Will the new scheduler be more efficient at scheduling single processes across a multi-core system?
On another not[e], when are we going to be able to build a default kernel capable of low latency I/O for A/V work?
Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse. CPU power is fairly abundant these days so managing its usage is less of an issue than it used to be, but I/O bandwidth is often in short supply and I/O-bound applications can choke a system and make interactive processes a pain in the ass to use. I'd like to see some way of reserving and limiting bandwidth to particular devices for particular processes. And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.
This new scheduler may have 'Ingo Molnar' written all over it but I'll be giving the credit to ck!
It's a very interesting phenomenon.
After enough number of iterations trying to optimize a software program to do everything very well compared to the base "naive" solution, you end up with an OS that does everything poorly.
It's counterintuitive, but we see it it every day around us.
Doing a constant time reevaluation for n processes is still O(n) whether you do it individually or at once. Granted you don't have a pause as all processes are reevaluated but the amount of work the scheduler does is NOT independent of the # of processes.
I would love to know how the algorithm actually works. Is this just a new data structure for organizing the processes, or is there other stuff involved?
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
On Multics, tasks that used up their CPU were put in a lower priority run queue and tasks that didn't use up their CPU time were put into a higher priority run queue.
So interactive tasks naturally floated to the top and compute bound tasks naturally sank.
(Anyone remember Multics?)
I want a nice-like utility that works for more than just CPU time. I want to be able to do hddnice -5 firefox and networknice -7 pr0ndownloader. Why can't the principles of nice work for various forms of I/O scheduling as well as CPU scheduling? Does anyone know of a Unix that has these sorts of things. I just hate it when my system is unresponsive because some process, which is niced into all get out, is swapping like crazy, and is allowed to do so because "it's not using CPU time". Bah!
This sort of scheduler is more suited for desktops, where you want everything to react responsively at the cost of lower throughput.
Would have modded up myself if I had the points to do it.
FTA:
>> [...] the core scheduler got smaller by more than 700 lines:
Someone who I respect tremendously once said
"Perfection is attained when you remove everything that is not essential"
I have to agree
I don't know the meaning of the word 'don't' - J
GP: Can't we just give the processes weapons and let them decide which follows?
P: That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!
Behold: The Mother of All Possible Comp Sci Flame Wars: The Darwinistically Selected Genetic Algorithms -vs- the Intelligently Designed Algorithms.
Bumper Stickers $4.95; T-Shirts $19.95:
It doesn't, it relates to the particular implementation used in Linux, which has a constant complexity. The new scheduler uses RB-trees, which would make it O(log(N)) for insertion.
- These characters were randomly selected.
No, I think I'll wait for the unbelievably fair scheduler, or perhaps the ridiculously fair scheduler.
The current O(1) schedule is not entirely fair to attain so named O(1) performance. That's how jackass.
Money is the root of all evil?
You are incorrect because the act of scheduling the next process to run requires constant time regardless of how many processes there are. The O(1) refers to this, correctly.
The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.
O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.
So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.
I rarely criticize things I don't care about.
Fairness has absolutely nothing to do with algorithmic complexity. O(1) means the scheduler runs in the same amount of time, every time, and says nothing about how fair the scheduling is. Jackass.
ResidntGeek
For this reason, I've been using Con Kolivas' patches to replace the scheduler. http://members.optusnet.com.au/ckolivas/kernel/ - very helpful especially if you don't have the fastest computer around. Also seems to help a bit with I/O - if my hard drive is trashing for whatever reason, interactive stuff still remains reasonably responsive. Or at least it doesn't make my mouse cursor skip...
Even so, I'd prefer to have IO better scheduled - ionice doesn't really seem to work at least for me.
There's a huge thread about broken disk IO in kernels > 2.6.17.
http://forums.gentoo.org/viewtopic-t-482731.html
How about fixing that first?
I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?
But at a higher level there's been such brokenness.
K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
Firewire performance (for me) started fluctuating after ~2.6.7. Although grabbing DV over firewire doesn't take much CPU time, for some reason having video preview enabled during capture may/may not cause dropped frames (on a modern amd64! pathetic!) depending on kernel version.
And the promise of 2.6 being responsive and feeling interactive under heavy load? Also ended after 2.6.7 for me.
Recent kernels, like 2.6.18, seem to take minutes to balance tasks out. Playing video during heavy CPU usage? For me it gives priority to everything else BUT the new video process for about 30 seconds. (CFM or anticipatory scheduler)
So I'm not sure if we should be welcoming all this new "choice" with scheduler-du-jour or try to give us a good default choice that actually works right first.
To be smarter the new Linux scheduler will use artificial neural networks and for speed-up it will be written in Lisp.
What's the vertical axis represent?
Good job sending all those /.ers over there, it will be the mother of all process fights on that poor server now. Sysadmins battling their way through hordes of zombies and monster processes, with ammo (ehm.. mem,cpu) running lower and lower until they're out, just as another wave of uglies comes out of nowhere...
http://linux.inet.hr/iostat_utility.html
Actually I find the Windows (XP, haven't tried Vista yet) scheduler to be quite horrible especially with regards to interactivity. One resource hog and things pretty much come to a standstill. IMO it's worse than O(1) and no comparison to Con's staircase.
Klingon multitasking systems do not support "time-sharing". When a Klingon program wants to run, it challenges the scheduler in hand-to-hand combat and owns the machine.
(from here)
You fail at reading comprehension. Let's rephrase: The current scheduler runs in O(1) time. To do this, it uses an algorithm that sometimes isn't optimal; that is, it isn't entirely fair.
The relation between fairness and algorithmic complexity, is that you may sacrifice fairness to get a lower complexity.
This is just rephrasing GP, I don't know shit enough about schedulers to comment on the actuall algorithms.
In a fair world, refrigerators would make electricity.
At the risk of being called a troll, I'm running a FreeBSD desktop, and the GUI is very responsive. Significantly more so than Windows on the same machine, and slightly more so than on Linux.
Don't blame me, I didn't vote for either of them!
O(1) just means that it doesn't lower in performance for the more processes that are running. In other words, scalable. This is good, and the current kernel does have modes for interactivity and server workloads, they're just not as good as this new scheduler. As I understand it, the scheduler in Linux has been better than the one in Windows for a good while now.
I'm impressed. I did some work on CPU dispatching on a mainframe system in the distant past, and we were never able to beat O(log n) on an ordered dispatch queue. The obvious implementation is O(n); getting to O(log n) requires a tree, and I want to see how they got to O(1).
This stuff really matters now. If we're going to have 80-CPU multiprocessors, all the main operating system algorithms have to be near O(1), or the scale-up is a lose.
Possible explanation: More people know about Linux now, so there's less need to Google to learn about it.
Alternative explanation: People have less problems now using Linux, so they google less for solutions on Linux problems.
Third explanation: Linux documentation got substantially better, so people have less need to use Google as a substitute.
Fourth explanation: The larger density of Linux installations comes with a larger density of Linux experts, so people are more likely to consult their local Linux guru than Google.
Pick your favorite choice or make up yet another explanation.
Yes, those explanations are all completely made up, but so was the explanation you had in mind.
The Tao of math: The numbers you can count are not the real numbers.
Well, a true Darwinian scheduler would have to measure user happyness. Linux kernels all around the net would share and try different scheduling policies. Then they measure the happyness of the user. The algorithms running on the machines with the most happy users get an evolutionary advantage.
The Tao of math: The numbers you can count are not the real numbers.
There is something like that to balance ethernet traffic: http://www.linux.com/article.pl?sid=07/04/06/15162 52
You can do the same with iptables and route2, see the advanced routing howto on tldp.org.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
said with his fine wit: 'A thing should be as simple as possible, but no simpler'.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Of course there's every possibility that your O(1) scheduling algorithm takes as long to run as the O(log(n)) algorithm, or even the O(n^2) algorithm for the typical number of processes running on your system... Ideally, you'd use the best scheduling algorithm for the job and the number of processes, and you'd change on the fly when the number of processes increased or decreased such that the current algorithm was no longer the most efficient.
We rarely see the charts that compare the O(1) scheduler with the O(whatever) schedulers that get proposed over a range of process counts though... Maybe that's because linux_kernel is an all text forum.
Honestly, if there weren't so many credit whores in the non-commercial linux kernel development world, you'd think there would be hundreds of schedulers in the kernel. There's really no reason not to have many, dynamically loaded schedulers. They're easy to write, easy to test, and every CS student writes one at some point.
This is communism!
This... is... SPARTA!
(sorry)
You can't buy a Big Linux hamburger.
O(log(N)) for insertion, but that only occurs when a new task is created. The scheduler's complexity would be calculated based on how long it takes to choose what process to allow to execute next.
Tux and I will drive over with large quantities of booze and herrings. You sneak round the back with the schedulers.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
There is OQO, but it is by no means the only such product. Poqet PC, an early PDA, had an 8088 processor. Bandai WonderSwan is a handheld video game system with an 8086 clone. Tablet PCs, including the compact XO laptop/tablet designed by OLPC, have newer CPUs using the x86 architecture. But even if handhelds do tend to prefer ARM CPUs over x86 CPUs, why aren't the same Windows Mobile apps available recompiled for traditional Windows OS?
You can choose the amount of bloat. Take Debian base, and add a lightweight Window Manger (like fluxbox, WindowMaker and many others) and just the applications you need.But can one choose the level of bloat on the Windows OS that came preinstalled on a machine? Or do you have to switch operating systems and run the risk of losing compatibility with paid-for hardware? I guess I know why Dell plans to reintroduce Windows XP Service Pack 2 as an option on newer machines, as it can be slimmed down further than Microsoft's latest offering.
I'm rather curious myself. It certainly doesn't look like a "stolen idea" in any straight forward manner. Judging from the article it looks more like a power-play.
/sys fs. Yes the process scheduler is a biger deal, but it would have no impact on most users, it would make things easier for others, and would make things harder for few, if any.
Mr. Molnar looks to be employing an approach of "establishing facts on the ground". That is, instead of asking people (the list) to comment on his IDEA for doing something he went ahead and did it and presented the patch set. This switches the discussion from "is this an interesting avenue for inquiry?" to "what specific thing can you point out that is wrong with this thing ?". I've seen this in the corporate world all the time and it's considered the mark of someone who "gets things done" and "moves the ball forward".
Often it actually IS a good thing and prevents teams of people from getting bogged down in endless discussion without ever producing anything of worth. On the flip side it annoys the heck out of people if it's used to short-circuit debate, one-up others etc.
In this case Mr. Molnar's argument that he's just putting something forth for discussion and this was the fastest way to do it rings kind of hollow to me. Clearly he had been opposing similar, and less desruptive, propositions for years and had been using his position of relative authority to keep them out of the kernel.
Further, the argument that Linux shouldn't have pluggable schedulers I also find curious. For example it already has pluggable io schedulers that you can choose with boot time parameters AND change on a live system through the
Mr. Trovalds chimes in with a "free market" argument and says that he likes competition and competitiveness.
In the end is Mr. Molnar pulling a power play or is Mr. Kolivas being "whiny" ? My suspicion is the former. As I mentioned above, i don't find compelling the distinctions he draws between his current work and the work he'd rejected previously. Then again i've been accused of too often siding with the underdog.
(note: i don't have anything to do with linux development though i've been a linux user for ~13 years and a software engineer for 10).
(On a grander note, this whole question of the balance between competition and cooperation I personally find very difficult to deal with. Both extremes are bad: no competition = no innovation. hyper-competition = some sort of dystopian Ferengi scenario. Guess i should go read Adam Smith... or something)
In Soviet Russia processes schedule YOU!
I will never live for sake of another man, nor ask another man to live for mine.
This reminds me of a movie called Tron.
You pesky young folk who think yer flash ram is better than good old reliable ferrite donuts have at with ye! We had disk drives that could pull yer filings out from two feet away on a max seek. Do ya get technology like that nowdays? Do ya? Nuuuuu.....
Mind ye what we had for an OS scheduler was a smiling polish gennelmun who wore too much after shave, and he was a bit much, but we knew which program ran after which we did! Off with ye now, come back when you have a *real* algorithm.
(Walks away muttering at his shoes)
Do not mock my vision of impractical footwear
Bah, it's so easy...
If at first you don't succeed, skydiving is not for you
Actually I find the Windows (XP, haven't tried Vista yet) scheduler to be quite horrible especially with regards to interactivity. One resource hog and things pretty much come to a standstill. IMO it's worse than O(1) and no comparison to Con's staircase.
Windows NT's scheduler is (and always has been) O(1).
(Can't say I've ever had a problem with its scheduling either - then again, I haven't used a single processor Windows machine in anger since about 1998, so I don't have a lot of experience with how it handles single-processor machines.)
In addition a lot of Ubuntu users do straight to the Ubuntu wiki or forums and search there first. The same probably happens with other distros that have good communities
maybe people just search by the name of the distro.
Badass Resumes
Why does this seem so much like the Fair Share Schedular in Solaris 10? I think I know where they are getting there clues from. http://www.princeton.edu/~unix/Solaris/troubleshoo t/schedule.html
Third explanation: Linux documentation got substantially better, so people have less need to use Google as a substitute.
Documentation? We don't need no stink'n documentation! Real men don't read documentation!
Have you ever met someone who actually read the documentation before asking questions about a software product? Rare, that is.
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
It's not a similar implementation, but it's very much an elegant instance of a very similar idea.
--dave
davecb@spamcop.net
Great comeback, but in all seriousness, O(1) translates to constant time for a varying number of processes. O(n) increases depending on n in a linear fashion. So on, etc.