Slashdot Mirror


Con Kolivas Returns, With a Desktop-Oriented Linux Scheduler

myvirtualid writes "Con Kolivas has done what he swore never to do: returned to the Linux kernel and written a new — and, according to him — waaay better scheduler for the desktop environment. In fact, BFS appears to outperform existing schedulers right up until one hits a 16-CPU machine, at which point he guesses performance would degrade somewhat. According to Kolivas, BFS 'was designed to be forward looking only, make the most of lower spec machines, and not scale to massive hardware. i.e. [sic] it is a desktop orientated scheduler, with extremely low latencies for excellent interactivity by design rather than 'calculated,' with rigid fairness, nice priority distribution and extreme scalability within normal load levels.'"

68 of 333 comments

  1. BFS is the Brain Fuck Scheduler. by Anonymous Coward · · Score: 5, Interesting

    Why would the summary omit this precious bit of information?

    1. Re:BFS is the Brain Fuck Scheduler. by Fizzl · · Score: 4, Informative

      from the dare-not-speak-its-name dept.

    2. Re:BFS is the Brain Fuck Scheduler. by Swizec · · Score: 4, Funny

      Great, now when someone mentions BFS I won't be able to just assume Breadth First Search.

      Another one for the Geeks-are-great-at-naming-things wall.

    3. Re:BFS is the Brain Fuck Scheduler. by Jurily · · Score: 2, Interesting

      Another one for the Geeks-are-great-at-naming-things wall.

      All the TLAs are taken anyway. As always, you'll have to look at the context.

  2. great news by amn108 · · Score: 5, Interesting

    Great news :-) Now, will the kernel people, with Mr. Torvalds at their head, restart the whole debate on pluggable schedulers? Since his scheduler, as he says, degrades beyond 16 CPUs, better options already exist for servers, where I am guessing CFS is used. So, he may be back, but the road ahead is still as steep?

    1. Re:great news by s4m7 · · Score: 5, Insightful

      I think that's only going to be a good thing, because IMO the arguments against pluggable schedulers are weak. "We need the few people working on this to just make the core better for ALL CASES" is about the most valid I've heard, but Linux is too broadly applied to force it to meet all cases. Realtime, embedded, servers, desktop: I just don't think one scheduler can be shoehorned to maximize performance for all those. You wind up with a crippled scheduler that really only achieves maximum performance in at most one of those four domains. And the question of there being enough developer minds working on it? You can bet that more commercial enterprise will start throwing money at it when they can customize it for their domain.

      It's like the dynamic syscall argument in a way. Without dynamic syscalls, the argument goes, all the 'fringe functionality' people have to think harder and have to integrate their stuff into the current syscalls/drivers/subsystems. (Apologies, Ingo.) However, without dynamic syscalls, all the "middle of the road" functionality people, like hardware manufacturers, are unwilling to release drivers that they essentially have to ask customers to compile as a supported option.

      Both, IMO, are cases of cutting off your leg to spite your foot.

      --
      This comment is fully compliant with RFC 527.
    2. Re:great news by MrNaz · · Score: 5, Insightful

      I think anyone who cares and knows anything about this debate is hoping Linus sees the light and allows work to begin on pluggable schedulers. There are no definitive arguments against having pluggable schedulers, and plenty of formidable ones for them. I never really understood Linus' handling of Con in the past. I really hope that, this time round, the new BFS is given a fair assessment and, if it's found to be better under desktop use patterns, adopted for use in desktop distros.

      The idea that the Nokia N900 smartphone uses the same process scheduler as my now-dated laptop as well as my 8 core server is just silly.

      --
      I hate printers.
    3. Re:great news by smoker2 · · Score: 2, Interesting

      The hardware is not the point. The software is. I run a linux machine which I use both as a media and web server, and as my main desktop for web browsing, email, WP etc. A hard coded setup would not be useful there.

      While I'm here, why does the summary [sic] i.e. It is a contraction of 2 words and perfectly acceptable. And in case they were worried about repetition with the following words " it is ", i.e. means "that is" as in "that is to say" used with a pause in normal speech. You have to read the preceding sentence, not take terms in isolation.

    4. Re:great news by myvirtualid · · Score: 2, Informative

      ...why does the summary [sic] i.e

      Because the 'i' should have been capitalized since it was the beginning of a new sentence. Had Kolivas written "hardware, i.e." there would be no sic.

      --
      I'm here EdgeKeep Inc.
    5. Re:great news by TheRaven64 · · Score: 4, Interesting

      Why does Linux not have pluggable schedulers already? You can choose the scheduler in FreeBSD by changing a compile-time option and in OpenSolaris and Xen by changing a boot-time parameter. I think HURD can swap them out at run time, but I only know one person who actually runs HURD, and he also runs other systems for real work. If your system already has clean interfaces for the scheduler, then making them pluggable at compile time is trivial and making them pluggable at boot time is only a small amount of effort (although a bit more to make sure this has no performance side-effects). If it doesn't already have clean interfaces to the scheduler, then it probably has more serious problems than the lack of plug-support.

      --
      I am TheRaven on Soylent News
    6. Re:great news by gbjbaanb · · Score: 4, Interesting

      OOh. I've just seen the 'thought for the day' at the bottom of the page:

      "One size fits all": Doesn't fit anyone.

      Even the gods of slashdot are getting in on the debate.

    7. Re:great news by ta+bu+shi+da+yu · · Score: 2, Insightful

      The sic is in the wrong spot.

      It reads "it is a desktop orientated scheduler". Note the topic subject is "desktop oriented scheduler".

      It should read "it is a desktop orientated (sic.) scheduler".

      --
      XML is like violence. If it doesn't solve the problem, use more.
    8. Re:great news by macshit · · Score: 4, Interesting

      I never really understood Linus' handling of Con in the past

      Linux kernel development is all about "playing well with others": a very important part of the process is being able to handle criticism constructively and fix the problems it addresses, or show that it is wrong; that's the way progress is made. You need to do this again and again and again. Most criticism is very technical and can be quite insightful, but can also be strong and relentless -- people will point out every single little flaw, and possible flaws, and unclear points, and whitespace inconsistencies, and... To be a successful linux developer you need to be able to deal with this constructively, and the more important and core the area you're dealing with, the more important this becomes.

      The impression I've gotten from reading various past "Con threads", is that while he tries in the beginning, Con doesn't deal well with this process; he can't keep his ego submerged, gets frustrated, and everything (perhaps including Con himself last time I read one of these threads) ends up unravelling. The same thing has derailed other big projects too (e.g., reiser4, when Reiser himself was still involved).

      It's a shame when this happens, but basically the process is more important than specific pieces of technology -- technology can be replaced, but the process is what makes Linux as good as it is.

      --
      We live, as we dream -- alone....
    9. Re:great news by Anonymous Coward · · Score: 3, Funny

      The sic is forward looking...

      - Peder

    10. Re:great news by rtfa-troll · · Score: 2, Insightful

      Linus Torvalds has, for once, made pretty clear arguments against it. Various philosophical ones, etc., but also several solid technical ones:

      1. it's better to have one tested scheduler
      2. since the scheduler can be parametrised there's nothing to stop it scaling
      3. "nobody has come close" to providing a pluggable implementation which is efficient enough

      See this email and this one.

      The grandparent's statement that "there are no definitive arguments against having pluggable schedulers" glosses over the fact that Linus' arguments have to be proven wrong. I can believe that in this, like most things, Linus is wrong; however, it's experimental science, not philosophy. Someone has to write the code.

      The scheduler is probably the piece of kernel code which actually does something which gets called most (many times a second even if only user activity is ongoing). A level of inefficiency which would be okay in an IO scheduler which will normally have to wait for a slow disk access just can't be accepted in a process scheduler and even a single level of indirection might really be a killer. Possibly it would be better to have separate kernel builds for small and large installs than having pluggability in which case even CK's new scheduler may not prove the need for pluggable schedulers. Alternatively, maybe pluggability would have to be done with self modifying code which left no indirection in place?

      --
      =~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
    11. Re:great news by pthisis · · Score: 4, Informative

      The impression I've gotten from reading various past "Con threads", is that while he tries in the beginning, Con doesn't deal well with this process; he can't keep his ego submerged, gets frustrated, and everything (perhaps including Con himself last time I read one of these threads) ends up unravelling.

      Agreed; Con seems not to be able to work well in the process.

      e.g. Ingo ran a bunch of benchmarks on BFS and made a long post to LKML explaining his results, that, while critical of its performance on a series of benchmarks, bent over backwards to be very polite in tone, with things like:

      First and foremost, let me say that i'm happy that you are hacking the Linux scheduler again. It's perhaps proof that hacking the scheduler is one of the most addictive things on the planet ;-) ...

      General interactivity of BFS seemed good to me - except for the pipe test when there was significant lag over a minute. I think it's some starvation bug, not an inherent design property of BFS, so i'm looking forward to re-test it with the fix. ...
      I hope to be able to work with you on this, please dont hesitate sending patches if you wish - and we'll also be following BFS for good ideas and code to adopt to mainline.

      And Con responded with a very defensive and confrontational tone:
      I'm not interested in a long protracted discussion about this since I'm too busy to live linux the way full time developers do, so I'll keep it short, and perhaps you'll understand my intent better if the FAQ wasn't clear enough.

      Do you know what a normal desktop PC looks like? No, a more realistic question based on what you chose to benchmark to prove your point would be: Do you know what normal people actually do on them?

      Feel free to treat the question as rhetorical.

      Full exchange here:
      http://thread.gmane.org/gmane.linux.kernel/886319

      --
      rage, rage against the dying of the light
    12. Re:great news by smash · · Score: 2, Informative

      Well, he has a point.

      For desktop use, I doubt many users *care* whether or not they drop some percentage of throughput on interactive apps, if it means that processes actually run "properly" (e.g., video playback, gaming, audio processing, etc).

      Ingo benchmarking some abstract processes that no desktop user would actually run day to day merely reinforces Con's point.

      Yes, Con may have come off as a bit of an arse, but given his previous "do not contact me regarding kernel matters" posting to LKML, only to be e-mailed with benchmarks on non-desktop hardware, performing non-desktop tasks that show CFS to be "superior", I'm not surprised.

      I'm guessing he's pretty much "over" banging his head against the wall, trying to get people to "see the light" (or understand that the point is improving interactivity, rather than benchmark numbers).

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    13. Re:great news by BhaKi · · Score: 2, Informative

      Alternatively, maybe pluggability would have to be done with self modifying code which left no indirection in place?

      No. In most modern CPU architectures, schedulers are implemented by handling a timer interrupt. The address for the handling code is put in the interrupt vector table during kernel start-up. For implementing pluggable-scheduling, all you need to do is to change the contents of the interrupt vector table. Once that is done, scheduling happens the same way as when there's only a single scheduler. So no. It doesn't require self modifying code and it's not a performance overhead to have pluggable schedulers.

      --
      The largest prime factor of my UID is 263267.
    14. Re:great news by Abcd1234 · · Score: 2, Insightful

      Well, he has a point.

      So what? If you have a point, but you're being a dick about it, people are far less likely to notice. And given Ingo at least *tried* to be civil, the least Con could do is return the favour, rather than immediately becoming an offensive asshole. For example, he could've responded with:

      "Well, recall, the purpose of the scheduler is to enhance desktop performance. Thus, I've designed it to favour low latency over high throughput, and as a result, it's not really surprising that, in throughput-related tests, which I consider more of a server-style workload, BFS performs less well as compared to other schedulers."

      No no. He opted for the far more dickish:

      "Do you know what a normal desktop PC looks like? No, a more realistic question based on what you chose to benchmark to prove your point would be: Do you know what normal people actually do on them?"

      Honestly, WTF? And he's surprised when he attracts hostility? Please.

      Con: Step 1 to becoming a decent human being: try not being an asshole. No, really.

  3. Glory! by CAIMLAS · · Score: 5, Interesting

    May I be the first to say "amen"? I've been very dissatisfied with the 2.6 kernel and its schedulers on the desktop, CFS in particular. CFS seems entirely braindead for desktop use compared to the older schedulers in 2.4 and yes, even 2.2.

    A desktop machine needs to be, first and foremost, responsive. If it isn't, it's comparable to the cursor freezing and input taking several seconds to appear: on today's hardware, one might start to think "hey, did it freeze on me?" - completely unacceptable.

    Maybe it can be chalked up to the non-priority of X and video at the kernel level; I don't know. Whatever it is, it used to be better, on very pathetic (133MHz) hardware, while doing a lot more (and when such hardware was not all that powerful anymore, as well).

    My question is: is it in the kernel tree yet? Is this that 2.6.31 scheduler change I heard about earlier yesterday, or is it something Completely Different?

    Oh yeah, and which other schedulers, if any, did this guy write?

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    1. Re:Glory! by kav2k · · Score: 5, Interesting

      Citing the FAQ:

      Are you looking at getting this into mainline?

      LOL.

      No really, are you?

      LOL.

      Really really, are you?

      No. They would be crazy to use this scheduler anyway since it won't scale to
      their 4096 cpu machines. The only way is to rewrite it to work that way, or
      to have more than one scheduler in the kernel. I don't want to do the former,
      and mainline doesn't want to do the latter. Besides, apparently I'm a bad
      maintainer, which makes sense since for some reason I seem to want to have
      a career, a life, raise a family with kids and have hobbies, all of which
      have nothing to do with linux.

    2. Re:Glory! by Trepidity · · Score: 4, Informative

      My question is: is it in the kernel tree yet? Is this that 2.6.31 scheduler change I heard about earlier yesterday, or is it something Completely Different?

      No, and probably won't ever be, though perhaps some ideas will be borrowed.

      From his FAQ:

      Are you looking at getting this into mainline?

      LOL.

      No really, are you?

      LOL.

      Really really, are you?

      No. They would be crazy to use this scheduler anyway since it won't scale to
      their 4096 cpu machines. The only way is to rewrite it to work that way, or
      to have more than one scheduler in the kernel. I don't want to do the former,
      and mainline doesn't want to do the latter. Besides, apparently I'm a bad
      maintainer, which makes sense since for some reason I seem to want to have
      a career, a life, raise a family with kids and have hobbies, all of which
      have nothing to do with linux.

      Can it be made to scale to 4096 CPUs?

      Sure I guess you could run one runqueue per CPU package instead of a global
      one and so on, but I have no intention whatsoever at doing that because it
      will compromise the performance where *I* care.

      The "bad maintainer" part is referring to bad blood over the adoption of Ingo Molnar's CFS over Kolivas's own RSDL, in particular at least one LKML poster suggesting that, all else being equal, it'd be better to merge Molnar's code, as he was more likely to be a reliable maintainer (Molnar's more tied into the workings of the mainline kernel development/merging/etc.).

    3. Re:Glory! by kav2k · · Score: 5, Informative

      Oh yeah, and which other schedulers, if any, did this guy write?

      SD scheduler

    4. Re:Glory! by BiggerIsBetter · · Score: 5, Insightful

      No. They would be crazy to use this scheduler anyway since it won't scale to
      their 4096 cpu machines. The only way is to rewrite it to work that way, or
      to have more than one scheduler in the kernel. I don't want to do the former,
      and mainline doesn't want to do the latter. Besides, apparently I'm a bad
      maintainer, which makes sense since for some reason I seem to want to have
      a career, a life, raise a family with kids and have hobbies, all of which
      have nothing to do with linux.

      Which is not to say that it might not find its way into the Ubuntu Desktop mainline patchset, for example. Sure, it might not make sense for the mainline kernel, but it surely makes sense for a user-focused distro like Ubuntu - they already have patched base and server kernels, so why not a genuine desktop-targeted kernel?

      --
      Forget thrust, drag, lift and weight. Airplanes fly because of money.
    5. Re:Glory! by MichaelSmith · · Score: 2, Interesting

      The "bad maintainer" part is referring to bad blood over the adoption of Ingo Molnar's CFS over Kolivas's own RSDL

      Yeah but Con just didn't give the impression that he intended to be around to support his code. He is an anaesthetist. Software is a hobby which he could give up whenever he wants to. I think that is very different from somebody who is doing software for their career.

    6. Re:Glory! by blind+biker · · Score: 4, Insightful

      I wonder what BeOS had, that was so good. I mean, was it a scheduler thing? Or was it the pervasive multithreadedness that the OS almost forced upon the developers? Whatever it is, it worked like black magic: BeOS would always listen to the user input, no matter what the heck it was doing in the background, no matter what insane load was on the CPU - your mouseclicks were always reacted upon immediately, your drags were always reacted upon immediately, your typing, resizing, brushstrokes, midi-signals, whatever, always, under any circumstance, were immediately and smoothly followed by the correct response.

      I was hoping Windows 2000 would achieve that, then I was hoping Windows XP would achieve that, then I was hoping some of the newer 2.6 kernels in Linux coupled with innovations in X would achieve that - but I was always deeply, utterly disappointed. Then I kinda hoped Vista would get somewhat close to what BeOS did. Oh yeah, now that was a hope decisively smashed.

      --
      "The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
    7. Re:Glory! by Trepidity · · Score: 4, Informative

      Yeah, that makes sense, but he seems to have taken it personally. It sounds like part of it stems from his feeling that Molnar unnecessarily wrote a replacement using his ideas and got credit for it, instead of helping out to turn one of Kolivas's fair-scheduling proposals into something that could be merged. Though from what I can tell Molnar's replies are all pretty friendly, and he seemed keen to provide appropriate credit.

    8. Re:Glory! by kojot350 · · Score: 2, Informative

      "...While all that pervasive multithreading made for impressive technology demos and a great user experience, it could be extremely demanding on the programmer. BeOS was all about threads, going so far as to maintain a separate thread for each window. Whether you liked it or not, your BeOS program was going to be multithreaded."

      "GCD embodies a philosophy that is at the opposite end of the spectrum from BeOS's "pervasive multithreading" design. Rather than achieving responsiveness by getting every possible component of an application running concurrently on its own thread (and paying a heavy price in terms of complex data sharing and locking concerns), GCD encourages a much more limited, hierarchical approach: a main application thread where all the user events are processed and the interface is updated, and worker threads doing specific jobs as needed."
      Very good in-depth article btw. http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/1

      --
      [ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo *Click*
    9. Re:Glory! by mwvdlee · · Score: 4, Insightful

      The whole point is moot. Relying on a single maintainer is just plain stupid. "All things being equal" they should choose the code which OTHER people can maintain easier.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    10. Re:Glory! by Hurricane78 · · Score: 5, Informative

      What is that? You don't have the choice of scheduler in your kernel? I'm using the Zen sources, and I get to choose between at least half a dozen schedulers, including other settings. I am certain that this scheduler will make it into that patchset, and that I will enable it, as soon as zen-sources-2.6.31 get installed on my system.

      After all this is Linux! Not some one-company-one-kernel monoculture!

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    11. Re:Glory! by Anonymous Coward · · Score: 2, Insightful

      You realize most Linux and open-source developers in general are employed to do what they do? The image of the lone bedroom programmer cranking out awesome code is mostly a romanticised myth.

    12. Re:Glory! by Anonymous Coward · · Score: 2, Insightful

      The pervasive threading made it somewhat more difficult to actually write applications for, and considerably more difficult to write cross-platform applications that worked well on BeOS and other systems (Windows, Mac, Unix and so on). That didn't help with the fairly small number of applications available for BeOS. By all accounts, the rest of the OS provided a pretty decent API though.

      Using a multi-threaded UI isn't unique to BeOS though. It just happens to be the only platform that required a multi-threaded UI to do anything at all. At least two platforms come to mind where a multi-threaded UI is required, because the framework is just too slow and unresponsive if you don't.

      In Java, Swing UIs tend to perform abysmally badly if you do any non-trivial work inside the UI thread. The UI code isn't all that fast, and its design lends itself toward doing lots of work in the UI thread, which causes the UI to hang. Most Swing applications have terrible responsiveness as a result. However, you can use worker threads to actually do the work, and use the UI thread only for event handling - if you do that, a Swing application can be extremely responsive. It's slightly trickier to do, but once you get the hang of it, it's not too hard.
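      The split described above (UI thread only handles events, worker threads do the actual work) can be sketched without any GUI toolkit. This is a minimal illustration in Python, not Swing code: run_ui, the worker function and the event tuple layout are all hypothetical names made up for the example, and a plain queue stands in for the toolkit's event queue.

```python
import queue
import threading


def run_ui(jobs):
    """Simulated UI thread: it only dequeues and handles events.

    jobs is a list of (job_id, n) pairs; each job is handed to a worker
    thread, which posts a "done" event back when it has finished.
    """
    events = queue.Queue()   # stand-in for the toolkit's event queue
    handled = []

    def worker(job_id, n):
        # Stand-in for slow work that must not run on the UI thread.
        result = sum(i * i for i in range(n))
        events.put(("done", job_id, result))

    threads = [threading.Thread(target=worker, args=job) for job in jobs]
    for thread in threads:
        thread.start()

    # The "UI loop": blocks only while waiting for the next event,
    # never while computing a result.
    pending = len(threads)
    while pending:
        kind, job_id, result = events.get()
        if kind == "done":
            handled.append((job_id, result))
            pending -= 1

    for thread in threads:
        thread.join()
    return sorted(handled)


if __name__ == "__main__":
    print(run_ui([("a", 4), ("b", 3)]))
```

      In a real toolkit the loop would also be dispatching paint and input events between "done" events, which is where the responsiveness comes from.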

      The same is pretty much true of .Net's Windows.Forms. It's a bit faster than Swing, although not by much (some parts are actually slower - System.Drawing vs Java2D, for example), so it's a little more forgiving of doing work in the UI thread. It will still bite you in a non-trivial application. Of course, the framework provides absolutely no help in writing a multithreaded application, and all of the tools, examples and documentation make writing a multi-threaded application far more difficult than it should be. You *can* write a multi-threaded Windows.Forms application, but nearly nobody does. Which is a shame because, as with Swing, getting all the work off the UI thread makes a huge difference to the application's responsiveness.

      Most other frameworks are fast enough that most application developers don't feel the need to multi-thread the UI, because the UI isn't noticeably slow. While it might not actually be slow, it surely could be much faster.

      I kind of like Qt 4's approach. It's still optional, but it makes it pretty easy to create worker threads. The worker threads communicate using signals and slots, and Qt automatically handles dispatch between threads by mapping a cross-thread signal to an event on the target thread. It's pretty much the simplest approach I've ever seen - it works the same way as .Net's cross-thread delegate invocation, but it's completely transparent, and doesn't require anywhere near as much pointless boilerplate code.

    13. Re:Glory! by Blakey+Rat · · Score: 4, Insightful

      No normal user cares about their video encoding being 2 seconds slower (over a 3 hour process) because they wanted to answer their email. If that's really important to you, you are probably doing your video encode overnight or during some time when nobody's using the computer, anyway, and then it doesn't matter.

      Instant response is *always*, *always* more important than all other tasks. Always. One of the many, many things BeOS got right.

    14. Re:Glory! by Simetrical · · Score: 3, Interesting

      Anyway, Windows has had 2 schedulers for ages - you can select desktop or server style processing (and cache strategy) since NT4.

      That's not two schedulers, it's just some tunables. See pages 391 to 444 of Windows Internals, 5th Edition (or comparable pages in earlier editions). For instance, on Vista the default quantum is two clock intervals (a "clock interval" is usually about 10 to 15 ms), while on Windows Server it's twelve clock intervals. Similarly, on desktops an extra boost is given to the currently focused application. You can adjust this at runtime in the GUI on Vista under Advanced System Settings -> Advanced -> Performance -> Settings -> Advanced (yes, apparently scheduler adjustments are very advanced in Microsoft's view). It can be controlled with slightly more granularity with the registry key HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl\Win32PrioritySeparation (a six-bit bitfield).
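      To put the quantum figures above into milliseconds, here is a small worked calculation. It uses only the numbers quoted in this comment (2 vs. 12 clock intervals, a clock interval of roughly 10 to 15 ms); the function and constant names are made up for the illustration.

```python
# Figures as quoted in the comment: a clock interval is "usually about
# 10 to 15 ms"; Vista defaults to 2 intervals, Windows Server to 12.
CLOCK_INTERVAL_MS = (10, 15)
QUANTUM_INTERVALS = {"Vista": 2, "Windows Server": 12}


def quantum_range_ms(intervals, clock_interval_ms=CLOCK_INTERVAL_MS):
    """Return the (low, high) quantum length in ms for a number of intervals."""
    low, high = clock_interval_ms
    return (intervals * low, intervals * high)


if __name__ == "__main__":
    for system, intervals in QUANTUM_INTERVALS.items():
        low, high = quantum_range_ms(intervals)
        print(f"{system}: roughly {low} to {high} ms per quantum")
```

      So the desktop default works out to roughly 20 to 30 ms per quantum and the server default to roughly 120 to 180 ms, i.e. a factor of six between the two tunings.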

      Linux currently offers scheduler tunables both at compile-time and runtime. Try ls /proc/sys/kernel/sched_*. It has more than Windows, apparently. I expect there are some compile-time options too, but I'm not an expert in anything related to kernels or systems programming.
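      The same ls can be done from a script if you want the values as well as the names. A minimal sketch, assuming a Linux system with /proc mounted; which sched_* files exist depends on the kernel version and configuration, so the result may well be empty. read_sched_tunables is a hypothetical helper name.

```python
import glob
import os


def read_sched_tunables(pattern="/proc/sys/kernel/sched_*"):
    """Return {tunable name: current value} for every readable match."""
    tunables = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                tunables[os.path.basename(path)] = handle.read().strip()
        except OSError:
            # Directories and files we may not read are skipped.
            continue
    return tunables


if __name__ == "__main__":
    for name, value in read_sched_tunables().items():
        print(f"{name} = {value}")
```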

      --
      MediaWiki developer, Total War Center sysadmin
    15. Re:Glory! by shutdown+-p+now · · Score: 4, Informative

      The same is pretty much true of .Net's Windows.Forms. It's a bit faster than Swing, although not by much (some parts are actually slower - System.Drawing vs Java2D, for example), so it's a little more forgiving of doing work in the UI thread. It will still bite you in a non-trivial application. Of course, the framework provides absolutely no help in writing a multithreaded application, and all of the tools, examples and documentation make writing a multi-threaded application far more difficult than it should be.

      Yes, and things like Control.Invoke to marshal invocations from background threads to UI, and especially BackgroundWorker, which are there specifically to provide a high-level (i.e. without locks) API for worker threads, with progress reporting and cancellation, must be just figments of my imagination?

      Have you actually written any WinForms code in .NET 2.0+?

    16. Re:Glory! by ultranova · · Score: 2, Insightful

      ck's advocates are suffering from observer bias. They try his scheduler, and since they know they're trying his scheduler, and since we have a cognitive bias toward seeing what we expect to find, of course they claim it feels "snappier". Of course they can't bring up numbers to support this perception because there is no real effect.

      I haven't tested this scheduler. However, during Con's previous scheduler effort, sound skipped in ZSNES under mainline kernel, and didn't skip under Con's scheduler, in identically loaded machines. However, idle priority was not really idle-only, since a cpu-burning task running at idle priority could cause similar skipping (despite doing nothing but a simple "while(1);").

      That's why, in science, we use numbers.

      The numbers relevant here are the average, maximum and minimum latency, where latency is defined as the time between a sleeping task becoming eligible to run again and it actually starting to execute or a task exhausting its timeslice and the next time it starts to execute, in idle, lightly loaded and heavily loaded machines.

      The argument for a purely forward-looking scheduler that doesn't implement any heuristics is that the maximum latency is a function of the number of tasks running, their priorities, the priority of the current task and whether the task is waking from sleep or has been descheduled due to having used its timeslice. This means that maximum latency is bounded (and usually low), resulting in execution that feels snappy (low latency) and smooth (no great variations in latency). A heuristic-using scheduler, on the other hand, can easily end up in a situation where a task is unexpectedly scheduled a lot later than it would be in a nonheuristic scheduler; in other words, while the average latency can be low, the maximum is unbounded (or at least the bound is very high). These unexpected, seemingly random huge latencies (latency variation, to be exact) are what's perceived as "jerky" behaviour, or so the theory goes anyway.

      I agree that we need actual objective data to base decisions on. Does the kernel currently have capability of measuring these things (time when a task starts executing, time when a task stops executing and the reason for it, and a time when a task becomes eligible for execution and the reason for it) and if not, could one be added?
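      Given such timestamps, the numbers themselves are easy to compute. A minimal sketch, assuming each sample is a pair (time the task became eligible to run, time it actually started executing) in milliseconds; latency_stats and the sample data are made up for illustration and say nothing about how the kernel would collect them.

```python
def latency_stats(samples):
    """Summarise scheduling latency from (eligible_at, started_at) pairs.

    Latency is started_at - eligible_at, as defined above. "spread" is
    max - min, a crude stand-in for the latency variation that is
    supposed to be perceived as jerkiness.
    """
    latencies = [started - eligible for eligible, started in samples]
    if not latencies:
        raise ValueError("need at least one sample")
    return {
        "min": min(latencies),
        "avg": sum(latencies) / len(latencies),
        "max": max(latencies),
        "spread": max(latencies) - min(latencies),
    }


if __name__ == "__main__":
    # Hypothetical samples: two quick wakeups and one long stall.
    print(latency_stats([(0, 1), (10, 12), (20, 29)]))
```

      Note how the single 9 ms stall barely moves the average but dominates the maximum and the spread, which is exactly the case the average alone would hide.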

      --
      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    17. Re:Glory! by Chrisq · · Score: 2, Informative

      That's not two schedulers, it's just some tunables. See pages 391 to 444 of Windows Internals, 5th Edition (or comparable pages in earlier editions).

      I'd mod you informative. Given that this is Linus's preferred option, this is an important distinction.

  4. *sniff* by s4m7 · · Score: 4, Funny

    I smell another LKML flamewar coming....

    --
    This comment is fully compliant with RFC 527.
    1. Re:*sniff* by tpgp · · Score: 3, Funny

      I smell another LKML flamewar coming....

      A flamewar on the LKML? Pfffffffffffft. Impossible. Never happened, never will happen.

      --
      My pics.
  5. Linux on the Desktop/Linux on the Server by erroneus · · Score: 2, Insightful

    Clearly, Desktop Linux and Server Linux have some things in common, but they also have different needs. I'm not intimately familiar with any kernel programming, but I do have some basic understanding of how it all works, and even I find it relatively easy to understand that the needs of a good and snappy desktop and those of a reliable server are going to have some differences.

    I think it is past time for some sort of kernel operating-mode optimizations, like this scheduler for the desktop, to be available even if the defaults stay tuned for servers.

    1. Re:Linux on the Desktop/Linux on the Server by TheRaven64 · · Score: 4, Interesting

      From what I understood from the kernel discussion last time, this would probably have to be #ifdefs galore.

      No, it really wouldn't. Take a look at how Xen and FreeBSD implement pluggable schedulers. Each scheduler in Xen is identified by a struct which contains pointers to its state and all of the functions related to actions the scheduler needs to take. These are called from the rest of the code (most commonly the timer interrupt handler). The total extra cost is one extra load instruction per call, which is tiny compared to the amount of work that the scheduler does. In FreeBSD, it's even simpler. The functions that implement the scheduler are declared in a header and implemented once in each scheduler's .c file(s). At compile time, you simply compile in the scheduler you want. Total run time cost is zero. FreeBSD cares about stability, so they've retained the old 4BSD scheduler all through the transition to the ULE scheduler (which, by the way, was outperforming the CFS in the last set of benchmarks I saw, although not by as large a margin as it outperformed the old Linux scheduler). This allows people operating servers that would rather sacrifice a little performance than use relatively new code to select the old one. Xen is designed for a variety of workloads, and so it has several schedulers that you can choose between.

      Of course, these are only possible if the interface between the scheduler and the rest of the kernel is clean already. If it isn't, however, then you almost certainly have bigger problems than not being able to choose between two schedulers.
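
      To make the struct-of-function-pointers idea concrete, here is a toy model in Python rather than C. None of these names come from Xen or FreeBSD; SchedulerOps, the FIFO/LIFO policies and run() are invented for illustration, and the two policies are deliberately trivial.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SchedulerOps:
    """Per-scheduler table of hooks, standing in for a C struct of function pointers."""
    name: str
    state: Any
    enqueue: Callable[[Any, str], None]
    pick_next: Callable[[Any], Optional[str]]


def queue_enqueue(state, task):
    state.append(task)


def fifo_pick(state):
    return state.pop(0) if state else None


def lifo_pick(state):
    return state.pop() if state else None


# Registry of available schedulers; each entry builds a fresh ops table.
SCHEDULERS = {
    "fifo": lambda: SchedulerOps("fifo", [], queue_enqueue, fifo_pick),
    "lifo": lambda: SchedulerOps("lifo", [], queue_enqueue, lifo_pick),
}


def run(ops, tasks):
    """Generic 'kernel' code: it only ever calls through the ops table."""
    for task in tasks:
        ops.enqueue(ops.state, task)
    order = []
    task = ops.pick_next(ops.state)
    while task is not None:
        order.append(task)
        task = ops.pick_next(ops.state)
    return order


fifo_order = run(SCHEDULERS["fifo"](), ["a", "b", "c"])
lifo_order = run(SCHEDULERS["lifo"](), ["a", "b", "c"])
```

      The point is that run() never mentions a concrete policy, so swapping schedulers means choosing a different table, not sprinkling conditionals through the callers.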

      --
      I am TheRaven on Soylent News
  6. forward looking by Trepidity · · Score: 5, Informative

    Took me a while to figure out what "forward looking" means in this context, since "forward-looking scheduler" doesn't seem to be common terminology, and I assumed he wasn't talking about his grand forward-looking vision for schedulerdom.

    Based on some previous arguments he's had, it sounds like he opposes the common heuristic of upping interactive process priority by keeping track of how long processes sleep--- processes that sleep a lot are probably I/O bound, and should get a priority boost so they can run on the (less frequent than for CPU-bound processes) occasions when they're ready. Kolivas wants schedulers to be forward-looking in the sense that they decide how to schedule without looking at process run history, by looking purely at who's ready to run, available timeslices, priorities, etc.
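
    A toy contrast between the two approaches, as a Python sketch. This is not BFS's actual algorithm or any kernel's: the Task fields, the "lower number wins" priority convention, the deadline field and the bonus formula are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int               # lower number = more important (assumed convention)
    deadline: int               # tick by which the task should next run (assumed)
    recent_sleep: float = 0.0   # fraction of recent time spent asleep, 0..1


def pick_with_sleep_bonus(ready, max_bonus=5):
    """History-based: tasks that slept a lot get a priority boost."""
    def effective(task):
        return task.priority - round(max_bonus * task.recent_sleep)
    return min(ready, key=lambda t: (effective(t), t.deadline))


def pick_forward_looking(ready):
    """No history: decide only from priority and the upcoming deadline."""
    return min(ready, key=lambda t: (t.priority, t.deadline))


# "backup" was blocked on disk for a while and is now about to burn CPU,
# so its sleep history makes it look interactive.
ready = [
    Task("player", priority=10, deadline=3, recent_sleep=0.2),
    Task("backup", priority=10, deadline=9, recent_sleep=0.8),
]
boosted_choice = pick_with_sleep_bonus(ready).name
forward_choice = pick_forward_looking(ready).name
```

    Here the history-based picker runs "backup" first because of how it behaved in the past, while the forward-looking one runs "player" because its deadline is nearer.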

  7. Re:Cool, but what does that spec mean? by Trepidity · · Score: 5, Informative

    He means something different by it--- that the scheduler should only look forward, not look back to per-process history in making its scheduling decisions. A common hack/heuristic to improve interactive performance is to boost the priority of processes that sleep a lot, since CPU-bound jobs sleep rarely, while interactive processes sleep a lot. Kolivas thinks that's a hack that obscures the real problems with interactive performance, and leads to unpredictable performance since it doesn't fix the underlying issues. So he wants to design schedulers with good interactive performance that make decisions based purely on the current set of running processes and priorities, and the upcoming timeslices.

  8. Welcome back Kolivas by BlackSabbath · · Score: 2, Funny

    Haven't run Linux as my personal OS since 2003 but I had a lot of time (pun intended) for CK's schedulers. Now a whole new generation of youngsters can finally learn what a _REAL_ LKML flamewar looks like ;-)

  9. 4096 cpu machines by boldie · · Score: 4, Interesting

    Still some grudge towards Torvalds and Molnar? From the FAQ:
    Are you looking at getting this into mainline?
    LOL.

    No really, are you?
    LOL.

    Really really, are you?

    No. They would be crazy to use this scheduler anyway since it won't scale to their 4096 cpu machines. The only way is to rewrite it to work that way, or to have more than one scheduler in the kernel. I don't want to do the former, and mainline doesn't want to do the latter. Besides, apparently I'm a bad maintainer, which makes sense since for some reason I seem to want to have a career, a life, raise a family with kids and have hobbies, all of which have nothing to do with linux.


    Reminds me of this XKCD.

    I don't have 4096 CPUs, good job Con Kolivas!

    1. Re:4096 cpu machines by amn108 · · Score: 2, Interesting

      Well, who knows. Maybe instead of the elusive year of Linux on the desktop, we should be expecting and applauding years of downstream, personal, automated-install GNU/Linux distributions like LFS or diy-linux, which will let users choose schedulers and whatnot. It's not something I expect to happen soon, but my feeling is that GNU/Linux is being institutionalized. It is as if there is no trust in anything but mainline. People assume that the majority is right here, that the maintainers of the mainline kernel know best, and that every other hacker in the minority, like Con, is just experimenting. What if we could distribute this trust better and use "non-standard" schedulers etc.? Then the benchmarking would reach the users and the truth would be distilled eventually. If Con's new scheduler is as good as he tries to paint it, build the kernel and use it in the thousands. Currently all eyes are on mainline, which is what prevents choice, even though the choice is "potentially" there.

    2. Re:4096 cpu machines by petrus4 · · Score: 2

      Still some grudge towards Torvalds and Molnar? From the FAQ:

      Apparently Linus genuinely is growing a little more prickly in his old age. While he's still got a fair way to go to equal Theo, he apparently does have a tendency to snap and snarl at people, on occasion. You might want to look up how he treated Alan Cox in relation to the tty code in the kernel, as well.

    3. Re:4096 cpu machines by antimatter15 · · Score: 2, Insightful

      Last part: Thanks to the guys in irc.oftc.net #ck for inspiration to work on this and early testing! Many of them sat idle in that channel for years while nothing happened. The xkcd comic supported_features also helped motivate this work. Yes I know you probably still can't watch full screen videos on youtube, but that's not entirely the scheduler's fault.

    4. Re:4096 cpu machines by TheRaven64 · · Score: 4, Insightful

      Having read flames from both Theo and Linus, it's difficult to make a fair comparison. Linus is a bit more gentle than Theo, but he is much more likely to be wrong when he flames someone. Neither of them is any good at admitting when they are wrong, but Linus has had a lot more practice at being wrong and not admitting it.

      --
      I am TheRaven on Soylent News
    5. Re:4096 cpu machines by SL+Baur · · Score: 2, Informative

      If Con's new scheduler is as good as he tries to paint it, build the kernel and use it in the thousands.

      Ingo did some benchmarking. The following landed in my lkml mailbox about an hour ago:

      hi Con,

      I've read your BFS announcement/FAQ with great interest: ...

      As can be seen in the graph BFS performed very poorly in this test:
      at 8 pairs of tasks it had a runtime of 45.42 seconds - while
      sched-devel finished them in 3.8 seconds.

      I saw really bad interactivity in the BFS test here - the system
      was starved for as long as the test ran. I stopped the tests at 8
      loops - the system was unusable and i was getting IO timeouts due
      to the scheduling lag:

        sd 0:0:0:0: [sda] Unhandled error code
        sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
        end_request: I/O error, dev sda, sector 81949243
        Aborting journal on device sda2.
        ext3_abort called.
        EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
        Remounting filesystem read-only

      I measured interactivity during this test:

          $ time ssh aldebaran /bin/true
          real 2m17.968s
          user 0m0.009s
          sys 0m0.003s

      A single command took more than 2 minutes. ...

      (Lots of text elided) Apparently he did a lot of benchmarking and BFS didn't fare very well. Ah well.

      I hope this time Con takes him on. Competition Is Good and concentration on desktop interactivity is certainly high on my wishlist of desired optimizations.

    6. Re:4096 cpu machines by SL+Baur · · Score: 2, Interesting

      You might want to look up how he treated Alan Cox in relation to the tty code in the kernel, as well.

      I followed that. Linus wasn't wrong about anything and Alan was acting a tad obtuse. 2.6.32 has been delayed another week to pick another louse out of the pty code.

      There was more going on than was posted on lkml. Alan has always called Linus "pinhead" and gotten away with it.

      Although he has an abrasive personality with developers at times, Linus is pretty good with testers. He was very patient with me in the 1.3 cycle (including sending me patches to test) as I was debugging what would ultimately prove to be a bad cache chip.

      If that makes me a Linus fan boy, whatever. I'm amazed at what he's managed to accomplish. And for all of his "snapping and snarling" I've faced far worse from managers at work with no proven technical skills whatsoever.

  10. Re:Linux gets Yet Another Scheduler by tpgp · · Score: 2, Interesting

    I've yet to be impressed by any of them, for any use, with any hardware.

    I've yet to be impressed by your comment, which contains no reason for your opinion.

    Care to give us some examples of your uses & hardware?

    --
    My pics.
  11. He ain't kidding. by Ant+P. · · Score: 4, Insightful

    CFS can't even cope with a CPU-bound application.

    Who here runs Linux on anything with more than 16 cores? Why should everyone else get the shitty end of the stick just because of maybe a dozen institutes with deep pockets?

    1. Re:He ain't kidding. by dbIII · · Score: 2, Interesting

      I don't know about you, but I run 8 CPU linux cluster nodes at 100% on all CPUs for weeks at a time and I'm only at the very bottom end of "high performance computing". For about two minutes in total a day the nodes are dumping things to disk (snapshots) and are I/O bound. The rest of the time they are pegged at 100% until the job finishes (which takes days to weeks - geophysical stuff). There are several applications that behave this way on these nodes, but there are some that sit waiting doing nothing because they are badly written. That means I think your above statement is a strongly misleading pile of steaming rubbish.

    2. Re:He ain't kidding. by markdavis · · Score: 2, Insightful

      >Who here runs Linux on anything with more than 16 cores?

      Along the same lines... Who here runs their Linux *servers* with 16 or *fewer* cores? Probably 99.9%?

      And "server" doesn't really mean anything. At work, we use Linux thin clients, so the Linux "server" is really dealing with 150 desktops, except not managing X/kb/mouse. So should it be treated like a "server" or a "desktop" for scheduling?

    3. Re:He ain't kidding. by gbjbaanb · · Score: 4, Insightful

      I think what you want is not a single scheduler designed for the desktop, but one designed for server processes. That's probably the whole argument here - there isn't a single scheduler that can work efficiently for the 2 wildly different types of work a user puts a machine to, but currently you don't have a choice. This is all about giving users a choice of what kind of scheduler they'd like to run. You might even find that a scheduler designed for lots of CPUs (at the expense of interactivity, probably) would suit you much more than the current system, especially when you buy more cores.

    4. Re:He ain't kidding. by TheRaven64 · · Score: 2, Interesting

      And how does Linux handle the T2? The chip has some incredibly complex scheduling constraints; for good performance you need to track both the cycle counter and the wall clock (to balance memory and CPU-bound loads), you need to balance cache churn with workload in your processor affinity (sometimes having related threads on the same core is faster, sometimes it isn't). Somehow, I can't help feeling that the one-size-fits-all scheduler in Linux doesn't actually do all of this.

      --
      I am TheRaven on Soylent News
  12. 16... okay for the desktop for 12 months by MosesJones · · Score: 4, Interesting

    16 sounds like a ridiculously high number for a desktop but is it?

    Already we have 4-core processors with "soft" additional threads (Intel's HT, for instance), and some people already have dual-CPU desktop machines, meaning they are already at the 16-CPU limit.

    Roll on 12-18 months and we'll be seeing 8-core CPUs with 8 soft cores coming in on top-end desktops. Roll forward 3 years and you'll be seeing 32-core CPUs with 32 soft cores, which is where the scheduler breaks down.

    So the problem here is that this is a brilliant optimisation for today and for pieces like the netbook market but won't be good for the desktop market long term.

    With Linux looking to be strong in the netbook market however it does say that having a more efficient scheduler for that market would be a better idea than just optimising everything for the server side.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
    1. Re:16... okay for the desktop for 12 months by TheRaven64 · · Score: 2, Insightful

      16 probably isn't very far off. The ARM Cortex A9, which is starting to ship into handhelds and mobile phones, scales to 4 cores. The A10 will probably handle 16, so expect to see handheld computers with 16 cores in the next couple of years. Of course, when you're on battery power, you'll probably want to turn a few of these off, so the scheduler has to decide not just which jobs to run, but how many cores to enable at any given time. This is a really difficult problem (you can read some interesting papers on the subject, quite a few funded by Intel research grants, if you look) because running two cores at 500MHz can use less power than running one at 1GHz, but only if they are both loaded. Once you add in the ability to scale the clocks on each core independently it becomes even more tricky. Then you need to add in the requirements of asymmetric multiprocessing environments; deciding if it is worth turning the GPU core on to run this OpenCL kernel, or should you schedule it on the CPU, for example.
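
      The "two cores at 500MHz versus one at 1GHz" claim can be checked with back-of-the-envelope arithmetic. The Python sketch below uses the usual dynamic-power approximation P = C * V^2 * f; the voltages are assumed operating points (not from any datasheet), and static leakage is ignored entirely.

```python
def dynamic_power(freq_ghz, volts, capacitance=1.0):
    """Dynamic CMOS power, P = C * V^2 * f, in arbitrary units."""
    return capacitance * volts ** 2 * freq_ghz


# Assumed operating points: the slower clock lets the core run at a lower voltage.
one_fast = dynamic_power(freq_ghz=1.0, volts=1.0)       # one core at 1GHz
two_slow = 2 * dynamic_power(freq_ghz=0.5, volts=0.8)   # two cores at 500MHz

saving = 1 - two_slow / one_fast
```

      With these made-up numbers the pair of slow cores draws roughly a third less power for the same aggregate clock rate, but that only pays off when there is enough parallel work to keep both cores busy.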

      Any scheduler created today is likely to look horribly antiquated in five years. There are so many open research problems in the domain, before you even get down to implementing the algorithms.

      --
      I am TheRaven on Soylent News
    2. Re:16... okay for the desktop for 12 months by SpinyNorman · · Score: 3, Insightful

      I guess you didn't read TFA:

      Can it be made to scale to 4096 CPUs?

      Sure I guess you could run one runqueue per CPU package instead of a global
      one and so on, but I have no intention whatsoever at doing that because it
      will compromise the performance where *I* care.

      In the meantime if you care about CPU utilization and latency then use this. Tomorrow will take care of itself. It's not like if you buy one computer or graphics card, or build one kernel, that you're tied to it for the rest of your life. You use this year what's available and update when the situation warrants it.

  13. Re:Linux gets Yet Another Scheduler by deniable · · Score: 2, Funny

    Musical Schedulers? Let me guess, when the music starts to skip, a random process gets killed.

  14. Re:What about FreeBSD? by markdavis · · Score: 2, Informative

    Almost all runaway processes are due to bugs in the end applications, not some situation created by the kernel.

  15. Re:Who cares? by Mprx · · Score: 2, Informative

    Compiling with SSD vs. mechanical HD:
    http://anandtech.com/storage/showdoc.aspx?i=3631&p=25

    Compiling is CPU bound.

  16. Re:That has NEVER been the goal by TheRaven64 · · Score: 2, Interesting

    Hurd is not unsuccessful because it is a microkernel, it is unsuccessful because it is run by perfectionists. Every time they get something quite good, they realise that a complete rewrite could make it even better and they throw away a lot of good code.

    Xen seems to be doing quite well as a microkernel, but until everyone is using multiprocessor machines there is a performance penalty for using a microkernel. When everyone is using multicore, they still have the disadvantage that monolithic kernels have been under active development for the last thirty years (more in a few cases) while microkernels have been largely ignored.

    A modern OS kernel, however, often has a lot more in common with microkernel designs even if it's all running in a single address space. Take a look, for example, at the OpenSolaris network stack. Every component runs in a separate thread and communicates with those above and below via message passing. It would be trivial to separate these out into different userspace processes, but there's no real advantage to doing so.

    --
    I am TheRaven on Soylent News
  17. Re:Who cares? by internet-redstar · · Score: 2, Insightful
    This article talks about compiling pidgin, a 54MB source tree of which at least 15% is multimedia clutter.

    During testing (on the Windows platform!) I guess it's safe to assume that everything was handled by filesystem cache.

    The comparison with compiling the kernel on Linux on a machine with not too much RAM doesn't hold.

  18. Re:Who cares? by TheRaven64 · · Score: 2, Informative

    Also, compilation is not all that I/O bound, it is more CPU bound.

    Depends a lot on what you're compiling. A typical program on OS X, for example, begins with #import <Cocoa/Cocoa.h>. This includes a header which brings in around a hundred other headers for a total of about 3MB of preprocessed source. Most of the time you'll be using a precompiled header for this, but you still often get a spike of read activity at the start of a compilation, then a CPU-bound chunk, then a write-bound part as it generates the object code. This is why, when you use -j, you are recommended to use a few more processes than you have cores, so you can overlap the I/O-bound parts in one compile with the CPU-bound parts in the next.
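
    As a small illustration of the -j advice, this Python sketch derives a job count from the core count. The choice of two extra jobs is an assumption; "a few more" is all the rule of thumb says, and the right number depends on the build.

```python
import os


def make_jobs(cores=None, extra=2):
    """Job count for make -j: one per core plus a few to cover I/O waits."""
    if cores is None:
        cores = os.cpu_count() or 1
    return cores + extra


# Example for a 4-core machine; pass no argument to use the local core count.
command = ["make", f"-j{make_jobs(4)}"]
```

    The extra jobs exist so that while one compile is blocked reading headers or writing object files, another has work ready for the CPU.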

    --
    I am TheRaven on Soylent News
  19. Kudos Con by amightywind · · Score: 3, Interesting

    Welcome back Con! I wonder how long it is before Ingo "Kudos Con" Molnar rips off the new design? The kernel team has developed a very bad case of "not invented here." http://kerneltrap.org/node/8059

    --
    an ill wind that blows no good
  20. Re:Who cares? by TheRaven64 · · Score: 3, Informative

    If you're interested, the clang team have done a lot of profiling of exactly what takes time when compiling. It's particularly interesting how much of a bottleneck preprocessing is with gcc and, more importantly, distcc (which sends the preprocessed sources over the network for compilation). Most of the results are on the web site, with a few in the mailing list archives.

    --
    I am TheRaven on Soylent News