Slashdot Mirror


The Completely Fair Scheduler

hichetu writes "Kernel trap has a nice summary of what is going on behind the scenes to change the Linux Scheduler. The O(1) Linux scheduler is going to be changed so that it is fair to interactive tasks. You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks."

30 of 292 comments (clear)

  1. Re:Isnt this called Cron ? by ozamosi · · Score: 5, Informative

    That is for scheduling background tasks that run once a day (or whatever you set it to)

    This is for scheduling CPU resouces in real time. To decide if Firefox or Apache is going to be executed the following split second.

  2. Re:Surprised? by Eevee · · Score: 5, Informative

    In computer science, Big O notation is used for the complexity of a task.

  3. O(1) - what a huge misnomer by Anonymous Coward · · Score: 1, Informative

    Doing a constant time reevaluation for n processes is still O(n) whether you do it individually or at once. Granted you don't have a pause as all processes are reevaluated but the amount of work the scheduler does is NOT independent of the # of processes.

    1. Re:O(1) - what a huge misnomer by Watson+Ladd · · Score: 3, Informative

      Actually the Linux scheduler is O(1). Processes are placed on O(1) queues, and the oldest processes on each queue are compared to each other. The number of queues is fixed, so Linux only has 5 nice levels. This is a scheduler because newer processes on a queue do not deserve the time more then the processes ahead of them.

      --
      Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    2. Re:O(1) - what a huge misnomer by Anonymous Coward · · Score: 1, Informative

      I think what it really means is that the time to select a job is O(1) time whenever the scheduler is called. And this doesn't seem to be unintuitive for me. Plus that's totally reasonable: in any time-sharing system where you have a time-slice, you have an unbounded amount of times that the scheduler is called. All you can possibly hope for is that each time the scheduler is called, it gives you a job in constant amount of time. (You cannot expect that to be sub-constant, really.)

    3. Re:O(1) - what a huge misnomer by ASBands · · Score: 4, Informative

      If you want a good description of the old way it works (which is what you're talking about), [PDF Warning] here it is. As this PDF describes, the 2.4.x kernel uses a O(n) fast algorithm for choosing the next task, going through the array of tasks to re-evaluate them. The the Linux 2.6.x scheduler recalculates timeslices as each task uses up its timeslice. In this fashion, the kernel should always know the most important thread, it is simply the next element. Granted, you are doing the same amount of work, but the pauses for re-evaluation are spread out so that you don't get long lapses where nothing important happens.

      --
      My UID is a prime number. Yeah, I planned that.
    4. Re:O(1) - what a huge misnomer by Watson+Ladd · · Score: 3, Informative

      No. You are wrong. It looks O(n) unless you notice that the processes back on the queue ran more recently then the ones at the front. Therefore they have lower priority then the ones at the front, so we only need to worry about the processes at the front of the queues. To evaluate these priorities we just store the last time they ran, and then subtract from the current time, then add that to the inherent priority.

      --
      Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    5. Re:O(1) - what a huge misnomer by ASBands · · Score: 5, Informative

      Here's how Linux 2.6.x task scheduler works:

      • A "runqueue" keeps track of all essential tasks assigned to a given CPU [Queue is O(1) efficient]
      • Each runqueue contains two priority arrays, the active and the expired. As a task completes, it is moved (during the move, the new timeslice is calculated - the point of debate and largest fundamental change to the task scheduler) [Moving from static array to static array is O(1) efficient]
      • When the active priority array is finished, the expired array becomes the active array [Swapping 2 pointers is O(1) efficient]
      • The priority arrays are of fixed length 140, as that is the amount of priority levels Linux has [O(140) = O(1)]
      The point of debate comes if two threads have the same priority, in which case they are put on a round robin in the priority array. However, the confusion comes in that the execution is O(n) (which makes sense if you think about it), but the scheduler itself handles these at O(1) efficiency.
      --
      My UID is a prime number. Yeah, I planned that.
  4. Re:I/O prioritisation by Anonymous Coward · · Score: 5, Informative

    > What it could really do with is I/O prioritisation.
    ionice. Available since 2005-08-28.

    > And an equivalent of "top" for monitoring processes' I/O activity would also be extremely handy

    I agree, that would be nice.

  5. Re:Surprised? by jrschulz · · Score: 5, Informative

    A scheduler is the piece of software that brings you the illusion of multi-tasking. Because a single processor (with a single core) can only run one process at the same time, the operating system switches the process currently running. And it does this very fast (IIRC up to 1000 times a second in the case of linux).

    The scheduler decides which process runs when and has to make sure that no process has to wait in the queue forever without getting his share of CPU time (this is what is called "starving").

    Since the scheduler is a program by itself, it has a specific runtime characteristic, usually dependent of the number of programs waiting for their CPU share. The special property of the current scheduler in linux is that its runtime is in fact independent of this number. That's expressed in CS by O(1).

  6. Re:I/O prioritisation by hackstraw · · Score: 2, Informative

    Linux really doesn't need a new process scheduler. What it could really do with is I/O prioritisation. Windows now has it, so there's no excuse.

    I dunno. Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing". Recently, the I/O block layer got a new scheduler (linky http://kerneltrap.org/node/7637). Regarding other I/O prioritization, I can't say with authority that this is needed or not.

    Maybe all of these things are related, but in my selfish world, I want Linux to scale better. Scaling well beyond 64 CPUs or cores, but I believe this is a much greater task than ~60 hours of coding. It took a while before Linux got SMP clean, and today SMP is normal, not something that reserved for servers. The monolithic high clockrate CPU is dead, and this has been overtaken by multi-core processors, and more than one of them inside of one box. In my eye, this is where all OSes should be focusing their attention.

  7. Re:I/O prioritisation by Anonymous Coward · · Score: 2, Informative

    Linux has had I/O prioritization for a few years now. There are a couple of I/O schedulers adapted to different workloads (deadline, anticipatory, fair, no op) that can be set globally or per block device. The default scheduler since 2.6.18 is cfq (the "fair" one) and it supports manual prioritization too (using the "ionice" command).

    In fact, I don't think Linux has ever suffered from the same I/O problems that people often complain about in Windows.

  8. Re:I/O prioritisation by jawtheshark · · Score: 4, Informative

    Now, page faults are indeed a form of I/O, but a page fault is technically seen just the fact that some memory required isn't in physical memory. I don't think the parent poster was talking about that. One of the most common reasons for page faults are simply that a block of memory has been swapped to disk, and then suddenly it is required, and as such the block of memory needs to be read into the physical memory.

    I'd say: add some memory to that box of yours.

    You can read up on it here

    --
    Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
  9. Re:I/O prioritisation by DaleGlass · · Score: 3, Informative

    atop seems to have support for disk monitoring, but it requires a kernel patch:

    http://www.atconsultancy.nl/atop/kernpatch.html

  10. Re:Isnt this called Cron ? by Progman3K · · Score: 4, Informative

    Yes, that's called psDoom
    http://psdoom.sourceforge.net/

    --
    I don't know the meaning of the word 'don't' - J
  11. Re:O(1) - what a huge misnomer - INCORRECT by ArbitraryConstant · · Score: 4, Informative

    You are incorrect because the act of scheduling the next process to run requires constant time regardless of how many processes there are. The O(1) refers to this, correctly.

    The reason this is important, and the reason they are worried about the act of scheduling the next process rather than time complexity over all N processes, is that if scheduling the next process were not constant time, the percentage of time spent scheduling the next process would grow larger as you added more processes. That's fine if you're on a desktop system and you go from 100 to 200, but as the number starts getting large on (say) big servers, you start running into a situation where all your CPUs are perpetually tied up trying to figure out what process to run next.

    O(n) time over N processes is not a problem; you've either got the CPUs or you don't. If you don't, then your performance will suck for reasons that are your own fault. If you do have enough CPUs, then the time spent scheduling will remain in step with the time spent running the processes, and this is fine. However, if the time spent scheduling grew, every time a process was scheduled, across all CPUs, then your $50000 server would be worthless because it wouldn't be able to handle the workloads you intended for it. All those expensive CPUs would sit there figuring out what process to run rather than running it.

    So, it is NOT a misnomer. It accurately describes the portion of the problem that the developers are concerned with. It's O(n) over n processes, and that's great because it means you can get to n without breaking down.

    --
    I rarely criticize things I don't care about.
  12. Re:I/O prioritisation by makomk · · Score: 2, Informative

    Linux has has some changes recently in the scheduling department, and an O(1) process scheduler can only be "a good thing".

    Which, presumably, is why they're thinking of taking out its current O(1) process scheduler and replacing it with an O(log(n)) one?

  13. Re:I/O prioritisation by Mike+McTernan · · Score: 2, Informative

    iostat is probably an easier way to achieve similar output.

    It still doesn't break it down by task though, so not really the iotop people are looking for.

    --
    -- Mike
  14. Wait wait wait! What about fixing disk IO first? by Anonymous Coward · · Score: 2, Informative

    There's a huge thread about broken disk IO in kernels > 2.6.17.
    http://forums.gentoo.org/viewtopic-t-482731.html

    How about fixing that first?

    I've noticed all kinds of new schedulers coming out of the woodwork in 2.6. Optional kernel preemption. And also changes to the default timer (250 instead of 100, etc.). These are choices like I've never had in Linux before. Should be good, right?

    But at a higher level there's been such brokenness.
    K3B CD burning broke in 2.6.8 and wasn't fixed until 2.6.10.
    Firewire performance (for me) started fluctuating after ~2.6.7. Although grabbing DV over firewire doesn't take much CPU time, for some reason having video preview enabled during capture may/may not cause dropped frames (on a modern amd64! pathetic!) depending on kernel version.
    And the promise of 2.6 being responsive and feeling interactive under heavy load? Also ended after 2.6.7 for me.
    Recent kernels, like 2.6.18, seem to take minutes to balance tasks out. Playing video during heavy CPU usage? For me it gives priority to everything else BUT the new video process for about 30 seconds. (CFM or anticipatory scheduler)

    So I'm not sure if we should be welcoming all this new "choice" with scheduler-du-jour or try to give us a good default choice that actually works right first.

  15. Re:credit goes to Con Kolivas by jb.cancer · · Score: 5, Informative

    Well, yes & no. It really was Con's RSDL that got ppl looking seriously into changing the mainline scheduler (there already being several out-of-mainline alternatives like nicksched, etc.)

    Con's scheduler seemed to work better at higher workloads than the mainline, by just trying to distribute load evenly and not trying pretty interactivity tricks. But several ppl did say it didn't perform well for certain X client workloads. That's when Ingo's CFS was posted.

    There really is 2 alternatives Ingo's CFS & Con's SDL that's being simultaneously tested by the kernel developers now, and none is accepted into mainline.

    So it wouldn't be fair to say that CFS is *the* next Linux scheduler. It could be SDL as well.

  16. O(1) too good. Surprise? by ozone_sniffer · · Score: 2, Informative

    You will be surprised to know that O(1) is really too good not to have any side-effects on fairness to all tasks. No I won't. AFAIK, O(1) is generally too good for any algorithm that does something useful, given the size of your input varies (n processes). With an input of size n, O(n^k) is generally (i.e. not considering a specific algorithm) acceptable, O(n) is generally good, O(log n) is generally very good and O(1) is generally suppa-duppa doubleplusgood.
  17. Re:And that relates to "fairness" how? by GrievousMistake · · Score: 3, Informative

    You fail at reading comprehension. Let's rephrase: The current scheduler runs in O(1) time. To do this, it uses an algorithm that sometimes isn't optimal; that is, it isn't entirely fair.
    The relation between fairness and algorithmic complexity, is that you may sacrifice fairness to get a lower complexity.
    This is just rephrasing GP, I don't know shit enough about schedulers to comment on the actuall algorithms.

    --
    In a fair world, refrigerators would make electricity.
  18. Re:I/O prioritisation by Breakfast+Pants · · Score: 4, Informative

    A one line script does the trick, name it kplayer and put it in ~/bin (and put ~/bin at the top of your $PATH):

    #!/bin/sh
    nice -n -19 ionice -c 1 /usr/bin/kplayer


    --

    --

    WHO ATE MY BREAKFAST PANTS?
  19. Re:Surprised? by Otter · · Score: 4, Informative

    I think the intended meaning is "Unsurprisingly, the O(1) behavior came at the expense of optimum scheduling."

  20. Re:I/O prioritisation by brezel · · Score: 2, Informative

    don't forget arguments :)

    #!/bin/sh
    nice -n -19 ionice -c 1 /usr/bin/kplayer $@

  21. Re:I/O prioritisation by Handyman · · Score: 4, Informative

    In fact, this functionality is pretty darn useless. (I should know, I got it into 2.6. :-) ) For read I/O it's okay, but for write I/O it sucks. The thing is, what's logged is the actual I/Os, which are generally done by pdflush (if the modifying process doesn't use fsync, which it usually doesn't), and kjournald if you're using ext3. The only thing that's logged otherwise is that a process dirties an inode, but that doesn't tell you how many pages it's modified. You get something like

    process syslogd dirtied inode daemon.log
    process Y dirtied inode some_other_file1
    process Z dirtied inode some_other_file2

    followed by 300 writes by pdflush, which only specify a device and a block number, not a file name. There's no way you can find out for each of the 300 writes whether it was caused by the "syslogd", X or Z process. So there's no way you can count the amount of write I/O that a process has done.

  22. Re:This is very impressive by Anonymous Coward · · Score: 3, Informative

    They cheated.

    Internally, the kernel only has five priority levels. Each one is a queue, and it compares among all of them to determine which task to run, but it only compares the head of the queue. So it's O(5), which is of course O(1), but if it supported an arbitrary number of priority levels (which IMO it should) then it would become O(n) again.

  23. Re:This is very impressive by Watson+Ladd · · Score: 2, Informative

    They cheated. Instead of having a real priority queue Linux places each process into one of a bunch of smaller queues each one with a different priority. Then scheduling requires looking a the processes that are at the head of the fixed number of small queues.

    --
    Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
  24. Re:I/O prioritisation by QuoteMstr · · Score: 2, Informative

    nice -n -19 ionice -c 1 /usr/bin/kplayer "$@"

    Quoting the $@ is necessary to ensure that file names with spaces are passed as single words.

  25. Re:Isnt this called Cron ? by Khazunga · · Score: 2, Informative

    Unfortunately, you could learn the material without being creative, quick thinking, or a good writer. If you are this unfortunate sole you just got a lower grade. Grades are supposed to demonstrate understanding of the course material, not how bright you are.

    It's a philosophical question, with no correct answer, but I don't think grades should evaluate the understanding of a course. Grades should evaluate the ability to *use* the material of a course. This is beyond understanding. A student who completely understands a course should get a B. An A-level student is one who grasps the course material so well that he builds on it to produce other conclusions.

    If you lack skills needed to compound your understanding of the material, tough luck. A B is not a poor grade...

    --
    If at first you don't succeed, skydiving is not for you