The Completely Fair Scheduler
hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."
That is for scheduling background tasks that run once a day (or whatever you set it to)
This is for scheduling CPU resouces in real time. To decide if Firefox or Apache is going to be executed the following split second.
In computer science, Big O notation is used for the complexity of a task.
Doing a constant time reevaluation for n processes is still O(n) whether you do it individually or at once. Granted you don't have a pause as all processes are reevaluated but the amount of work the scheduler does is NOT independent of the # of processes.
> What it could really do with is I/O prioritisation.
ionice. Available since 2005-08-28.
> And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy
I agree, that would be nice.
A scheduler is the piece of software that brings you the illusion of multi-tasking. Because a single processor (with a single core) can only run one process at the same time, the operating system switches the process currently running. And it does this very fast (IIRC up to 1000 times a second in the case of linux).
The scheduler decides which process runs when and has to make sure that no process has to wait in the queue forever without getting his share of CPU time (this is what is called "starving").
Since the scheduler is a program by itself, it has a specific runtime characteristic, usually dependent of the number of programs waiting for their CPU share. The special property of the current scheduler in linux is that its runtime is in fact independent of this number. That's expressed in CS by O(1).
Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse.
I dunno. Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing". Recently, the I/O block layer got a new scheduler (linky http://kerneltrap.org/node/7637). Regarding other I/O prioritization, I can't say with authority that this is needed or not.
Maybe all of these things are related, but in my selfish world, I want Linux to scale better. Scaling well beyond 64 CPUs or cores, but I believe this is a much greater task than ~60 hours of coding. It took a while before Linux got SMP clean, and today SMP is normal, not something that reserved for servers. The monolithic high clockrate CPU is dead, and this has been overtaken by multi-core processors, and more than one of them inside of one box. In my eye, this is where all OSes should be focusing their attention.
Linux has had I/O prioritization for a few years now. There are a couple of I/O schedulers adapted to different workloads (deadline, anticipatory, fair, no op) that can be set globally or per block device. The default scheduler since 2.6.18 is cfq (the "fair" one) and it supports manual prioritization too (using the "ionice" command).
In fact, I don't think Linux has ever suffered from the same I/O problems that people often complain about in Windows.
Now, page faults are indeed a form of I/O, but a page fault is technically seen just the fact that some memory required isn't in physical memory. I don't think the parent poster was talking about that. One of the most common reasons for page faults are simply that a block of memory has been swapped to disk, and then suddenly it is required, and as such the block of memory needs to be read into the physical memory.
I'd say: add some memory to that box of yours.
You can read up on it here
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
atop seems to have support for disk monitoring, but it requires a kernel patch:
http://www.atconsultancy.nl/atop/kernpatch.html
Yes, that's called psDoom
http://psdoom.sourceforge.net/
I don't know the meaning of the word 'don't' - J
You are incorrect because the act of scheduling the next process to run requires constant time regardless of how many processes there are. The O(1) refers to this, correctly.
The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.
O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.
So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.
I rarely criticize things I don't care about.
Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing".
Which, presumably, is why they're thinking of taking out its current O(1) process scheduler and replacing it with an O(log(n)) one?
iostat is probably an easier way to achieve similar output.
It still doesn't break it down by task though, so not really the iotop people are looking for.
-- Mike
There's a huge thread about broken disk IO in kernels > 2.6.17.
http://forums.gentoo.org/viewtopic-t-482731.html
How about fixing that first?
I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?
But at a higher level there's been such brokenness.
K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
Firewire performance (for me) started fluctuating after ~2.6.7. Although grabbing DV over firewire doesn't take much CPU time, for some reason having video preview enabled during capture may/may not cause dropped frames (on a modern amd64! pathetic!) depending on kernel version.
And the promise of 2.6 being responsive and feeling interactive under heavy load? Also ended after 2.6.7 for me.
Recent kernels, like 2.6.18, seem to take minutes to balance tasks out. Playing video during heavy CPU usage? For me it gives priority to everything else BUT the new video process for about 30 seconds. (CFM or anticipatory scheduler)
So I'm not sure if we should be welcoming all this new "choice" with scheduler-du-jour or try to give us a good default choice that actually works right first.
Well, yes & no. It really was Con's RSDL that got ppl looking seriously into changing the mainline scheduler (there already being several out-of-mainline alternatives like nicksched, etc.)
Con's scheduler seemed to work better at higher workloads than the mainline, by just trying to distribute load evenly and not trying pretty interactivity tricks. But several ppl did say it didn't perform well for certain X client workloads. That's when Ingo's CFS was posted.
There really is 2 alternatives Ingo's CFS & Con's SDL that's being simultaneously tested by the kernel developers now, and none is accepted into mainline.
So it wouldn't be fair to say that CFS is *the* next Linux scheduler. It could be SDL as well.
You fail at reading comprehension. Let's rephrase: The current scheduler runs in O(1) time. To do this, it uses an algorithm that sometimes isn't optimal; that is, it isn't entirely fair.
The relation between fairness and algorithmic complexity, is that you may sacrifice fairness to get a lower complexity.
This is just rephrasing GP, I don't know shit enough about schedulers to comment on the actuall algorithms.
In a fair world, refrigerators would make electricity.
A one line script does the trick, name it kplayer and put it in ~/bin (and put ~/bin at the top of your $PATH):
/usr/bin/kplayer
#!/bin/sh
nice -n -19 ionice -c 1
--
WHO ATE MY BREAKFAST PANTS?
I think the intended meaning is "Unsurprisingly, the O(1) behavior came at the expense of optimum scheduling."
What I'm listening to now on Pandora...
don't forget arguments :)
/usr/bin/kplayer $@
#!/bin/sh
nice -n -19 ionice -c 1
In fact, this functionality is pretty darn useless. (I should know, I got it into 2.6. :-) ) For read I/O it's okay, but for write I/O it sucks. The thing is, what's logged is the actual I/Os, which are generally done by pdflush (if the modifying process doesn't use fsync, which it usually doesn't), and kjournald if you're using ext3. The only thing that's logged otherwise is that a process dirties an inode, but that doesn't tell you how many pages it's modified. You get something like
process syslogd dirtied inode daemon.log
process Y dirtied inode some_other_file1
process Z dirtied inode some_other_file2
followed by 300 writes by pdflush, which only specify a device and a block number, not a file name. There's no way you can find out for each of the 300 writes whether it was caused by the "syslogd", X or Z process. So there's no way you can count the amount of write I/O that a process has done.
They cheated.
Internally, the kernel only has five priority levels. Each one is a queue, and it compares among all of them to determine which task to run, but it only compares the head of the queue. So it's O(5), which is of course O(1), but if it supported an arbitrary number of priority levels (which IMO it should) then it would become O(n) again.
They cheated. Instead of having a real priority queue Linux places each process into one of a bunch of smaller queues each one with a different priority. Then scheduling requires looking a the processes that are at the head of the fixed number of small queues.
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
nice -n -19 ionice -c 1 /usr/bin/kplayer "$@"
Quoting the $@ is necessary to ensure that file names with spaces are passed as single words.
It's a philosophical question, with no correct answer, but I don't think grades should evaluate the understanding of a course. Grades should evaluate the ability to *use* the material of a course. This is beyond understanding. A student who completely understands a course should get a B. An A-level student is one who grasps the course material so well that he builds on it to produce other conclusions.
If you lack skills needed to compound your understanding of the material, tough luck. A B is not a poor grade...
If at first you don't succeed, skydiving is not for you