The Completely Fair Scheduler
hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."
I thought Linux used Cron as a scheduler ?
If scheduling was completely fair, this would have been a frist ps0t.
Free software: because some processes are more equal than others.
You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks.
No I won't, because I don't know what the hell it means.
Hah! In your face, Taco!
Will the new scheduler be more efficient at scheduling single processes across a multi-core system?
On another not[e], when are we going to be able to build a default kernel capable of low latency I/O for A/V work?
Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse. CPU power is fairly abundant these days so managing its usage is less of an issue than it used to be, but I/O bandwidth is often in short supply and I/O-bound applications can choke a system and make interactive processes a pain in the ass to use. I'd like to see some way of reserving and limiting bandwidth to particular devices for particular processes. And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy... as far as I know, the system calls don't even exist in the kernel to do this yet.
This new scheduler may have 'Ingo Molnar' written all over it but I'll be giving the credit to ck!
It's a very interesting phenomenon.
After enough number of iterations trying to optimize a software program to do everything very well compared to the base "naive" solution, you end up with an OS that does everything poorly.
It's counterintuitive, but we see it it every day around us.
On Multics, tasks that used up their CPU were put in a lower priority run queue and tasks that didn't use up their CPU time were put into a higher priority run queue.
So interactive tasks naturally floated to the top and compute bound tasks naturally sank.
(Anyone remember Multics?)
Actually the Linux scheduler is O(1). Processes are placed on O(1) queues, and the oldest processes on each queue are compared to each other. The number of queues is fixed, so Linux only has 5 nice levels. This is a scheduler because newer processes on a queue do not deserve the time more then the processes ahead of them.
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
If you want a good description of the old way it works (which is what you're talking about), [PDF Warning] here it is. As this PDF describes, the 2.4.x kernel uses a O(n) fast algorithm for choosing the next task, going through the array of tasks to re-evaluate them. The the Linux 2.6.x scheduler recalculates timeslices as each task uses up its timeslice. In this fashion, the kernel should always know the most important thread, it is simply the next element. Granted, you are doing the same amount of work, but the pauses for re-evaluation are spread out so that you don't get long lapses where nothing important happens.
My UID is a prime number. Yeah, I planned that.
GP: Can't we just give the processes weapons and let them decide which follows?
P: That is actually the kind of question that my Operations Research professor (who also did a lot of work in CPU simulation and performance estimating) used to throw onto final exams as the "separate the B+ from the A" question. If your answer was interesting enough he would send you over to one of his Masters candidates to see if it could be taken any further. So I wouldn't count your suggestion out from the start!
Behold: The Mother of All Possible Comp Sci Flame Wars: The Darwinistically Selected Genetic Algorithms -vs- the Intelligently Designed Algorithms.
Bumper Stickers $4.95; T-Shirts $19.95:
No, I think I'll wait for the unbelievably fair scheduler, or perhaps the ridiculously fair scheduler.
The current O(1) schedule is not entirely fair to attain so named O(1) performance. That's how jackass.
Money is the root of all evil?
You are incorrect because the act of scheduling the next process to run requires constant time regardless of how many processes there are. The O(1) refers to this, correctly.
The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.
O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.
So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.
I rarely criticize things I don't care about.
No. You are wrong. It looks O(n) unless you notice that the processes back on the queue ran more recently then the ones at the front. Therefore they have lower priority then the ones at the front, so we only need to worry about the processes at the front of the queues. To evaluate these priorities we just store the last time they ran, and then subtract from the current time, then add that to the inherent priority.
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
Here's how Linux 2.6.x task scheduler works:
- A "runqueue" keeps track of all essential tasks assigned to a given CPU [Queue is O(1) efficient]
- Each runqueue contains two priority arrays, the active and the expired. As a task completes, it is moved (during the move, the new timeslice is calculated - the point of debate and largest fundamental change to the task scheduler) [Moving from static array to static array is O(1) efficient]
- When the active priority array is finished, the expired array becomes the active array [Swapping 2 pointers is O(1) efficient]
- The priority arrays are of fixed length 140, as that is the amount of priority levels Linux has [O(140) = O(1)]
The point of debate comes if two threads have the same priority, in which case they are put on a round robin in the priority array. However, the confusion comes in that the execution is O(n) (which makes sense if you think about it), but the scheduler itself handles these at O(1) efficiency.My UID is a prime number. Yeah, I planned that.
For this reason, I've been using Con Kolivas' patches to replace the scheduler. http://members.optusnet.com.au/ckolivas/kernel/ - very helpful especially if you don't have the fastest computer around. Also seems to help a bit with I/O - if my hard drive is trashing for whatever reason, interactive stuff still remains reasonably responsive. Or at least it doesn't make my mouse cursor skip...
Even so, I'd prefer to have IO better scheduled - ionice doesn't really seem to work at least for me.
There's a huge thread about broken disk IO in kernels > 2.6.17.
http://forums.gentoo.org/viewtopic-t-482731.html
How about fixing that first?
I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?
But at a higher level there's been such brokenness.
K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
Firewire performance (for me) started fluctuating after ~2.6.7. Although grabbing DV over firewire doesn't take much CPU time, for some reason having video preview enabled during capture may/may not cause dropped frames (on a modern amd64! pathetic!) depending on kernel version.
And the promise of 2.6 being responsive and feeling interactive under heavy load? Also ended after 2.6.7 for me.
Recent kernels, like 2.6.18, seem to take minutes to balance tasks out. Playing video during heavy CPU usage? For me it gives priority to everything else BUT the new video process for about 30 seconds. (CFM or anticipatory scheduler)
So I'm not sure if we should be welcoming all this new "choice" with scheduler-du-jour or try to give us a good default choice that actually works right first.
It seems like the difference is that people call it an O(1) scheduler to reflect the fact that all the work required to be done at the end of a timeslice (record keeping, picking a new process etc) is done in constant time, and people arguing for O(n) are referring to the cost to schedule all processes. Nobody's saying they can find a schedule for n processes without looking at all of them.
I Browse at +4 Flamebait
Open Source Sysadmin
Good job sending all those /.ers over there, it will be the mother of all process fights on that poor server now. Sysadmins battling their way through hordes of zombies and monster processes, with ammo (ehm.. mem,cpu) running lower and lower until they're out, just as another wave of uglies comes out of nowhere...
Actually I find the Windows (XP, haven't tried Vista yet) scheduler to be quite horrible especially with regards to interactivity. One resource hog and things pretty much come to a standstill. IMO it's worse than O(1) and no comparison to Con's staircase.
Klingon multitasking systems do not support "time-sharing". When a Klingon program wants to run, it challenges the scheduler in hand-to-hand combat and owns the machine.
(from here)
You fail at reading comprehension. Let's rephrase: The current scheduler runs in O(1) time. To do this, it uses an algorithm that sometimes isn't optimal; that is, it isn't entirely fair.
The relation between fairness and algorithmic complexity, is that you may sacrifice fairness to get a lower complexity.
This is just rephrasing GP, I don't know shit enough about schedulers to comment on the actuall algorithms.
In a fair world, refrigerators would make electricity.
I'm impressed. I did some work on CPU dispatching on a mainframe system in the distant past, and we were never able to beat O(log n) on an ordered dispatch queue. The obvious implementation is O(n); getting to O(log n) requires a tree, and I want to see how they got to O(1).
This stuff really matters now. If we're going to have 80-CPU multiprocessors, all the main operating system algorithms have to be near O(1), or the scale-up is a lose.
Possible explanation: More people know about Linux now, so there's less need to Google to learn about it.
Alternative explanation: People have less problems now using Linux, so they google less for solutions on Linux problems.
Third explanation: Linux documentation got substantially better, so people have less need to use Google as a substitute.
Fourth explanation: The larger density of Linux installations comes with a larger density of Linux experts, so people are more likely to consult their local Linux guru than Google.
Pick your favorite choice or make up yet another explanation.
Yes, those explanations are all completely made up, but so was the explanation you had in mind.
The Tao of math: The numbers you can count are not the real numbers.
This reminds me of a movie called Tron.
You pesky young folk who think yer flash ram is better than good old reliable ferrite donuts have at with ye! We had disk drives that could pull yer filings out from two feet away on a max seek. Do ya get technology like that nowdays? Do ya? Nuuuuu.....
Mind ye what we had for an OS scheduler was a smiling polish gennelmun who wore too much after shave, and he was a bit much, but we knew which program ran after which we did! Off with ye now, come back when you have a *real* algorithm.
(Walks away muttering at his shoes)
Do not mock my vision of impractical footwear
Bah, it's so easy...
If at first you don't succeed, skydiving is not for you