The Really Fair Scheduler

Coming soon to a linux kernel near you: by El_Muerte_TDS · 2007-09-01 08:58 · Score: 3, Funny

The fancy fair scheduler.

Re:Coming soon to a linux kernel near you: by Megane · 2007-09-01 11:32 · Score: 3, Funny

I'm waiting for the Science Fair Scheduler. And the ladies out there might want to try the Vanity Fair Scheduler.

--
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
Re:Coming soon to a linux kernel near you: by BronsCon · 2007-09-01 12:10 · Score: 3, Funny

"fancy fair scheduler"

Oh, FFS.

--
APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
Re:Coming soon to a linux kernel near you: by gowen · 2007-09-01 20:53 · Score: 5, Funny

How about the Scarborough Fair Scheduler, that allocates Parsley, Sage, Rosemary and Thymeslices.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.

Still waiting for the IFS by amliebsch · 2007-09-01 08:58 · Score: 4, Funny

Still waiting for Steve Jobs' "Insanely Fair Scheduler."

--
If you don't know where you are going, you will wind up somewhere else.

Re:Still waiting for the IFS by Xtravar · 2007-09-01 09:02 · Score: 4, Funny

Still waiting for Steve Jobs' "Insanely Fair Scheduler."
Wouldn't that be named something more like iFS or iSched?

God forbid we drop the lower-case I naming convention. It stands for "interwebs compatible".

--
Buckle your ROFL belt, we're in for some LOLs.
Re:Still waiting for the IFS by JoeCommodore · 2007-09-01 09:10 · Score: 4, Funny

Sure would be better than the "Multicolored Pinwheel of Wait" part of OS X now.

--
"Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield
Re:Still waiting for the IFS by LiquidCoooled · 2007-09-01 09:41 · Score: 3, Funny

Sorry, Apple already has designs on the iSched moniker.
Where else would you keep your iLawnmower?

--
liqbase :: faster than paper
Re:Still waiting for the IFS by Anti-Trend · 2007-09-01 09:59 · Score: 3, Interesting

Agreed. While I recognise and appreciate the humor in your comment, this is the main reason I use Debian on the desktop rather than OS X -- I multitask heavily. A Linux kernel with a Desktop preemption model and 1000Hz Timer frequency is a godsend for those who push their PCs a tad too hard on a regular basis. I would like to see a simplified version of the scheduler, but all said CFS isn't as bad as everybody makes it out to be.

--
Working in a DevOps shop is like playing in a band made up entirely of keytarists.

Does it... by markov_chain · 2007-09-01 09:03 · Score: 3, Interesting

help in the case when a process goes nuts allocating memory, and stops the GUI dead in its tracks? No Alt-Ctrl-Backspace, no switching to console, unbearably slow remote login...

--
Tsunami -- You can't bring a good wave down!

Re:Does it... by DaleGlass · 2007-09-01 09:12 · Score: 5, Informative

I don't think any scheduler will help you with that. The slowness is due to the swapping in and out from the disk, and that's going to be limited by the horribly slow speed of the disk.

You could tweak things to make this a less likely occurrence though.

Disable overcommit by echo 2 > /proc/sys/vm/overcommit_memory. No more OOM killer killing some random unrelated process. Memory allocations will fail and programs will be able to handle that correctly.

Set some memory limits in /etc/security/limits.conf

Avoid having too much swap space. It's awfully slow, if you're using it too much all you'll manage is to run more things slower.

Get more RAM, it's cheap. If you're regularly swapping then you definitely should.
Re:Does it... by Anonymous Coward · 2007-09-01 09:43 · Score: 3, Informative

(on a bash shell)
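( ulimit -v 4096; command_that_uses_memory )
This limits the virtual memory available to command_that_uses_memory to 4096 KB; the subshell keeps the limit from sticking to your interactive shell. Allocations beyond the limit fail, which in practice usually takes the program down. But do you really want firefox forcibly killed every time you visit youtube?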
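For anyone who would rather set the cap from a script than from the shell, here is a minimal Python sketch of the same idea. It assumes Linux, where `ulimit -v` corresponds to RLIMIT_AS; the 1 GiB cap and the `HOG` test program are made-up values for illustration, not anything from the posts above.

```python
import resource
import subprocess
import sys

LIMIT_BYTES = 1024 * 1024 * 1024  # assumed cap: 1 GiB of address space


def limit_address_space():
    # Runs in the child just before exec; RLIMIT_AS is what `ulimit -v` sets.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))


def run_limited(argv):
    """Run argv with a capped address space and return its exit status."""
    return subprocess.run(argv, preexec_fn=limit_address_space).returncode


# Hypothetical memory hog: asks for 2 GiB and exits with 42 if refused.
HOG = (
    "import sys\n"
    "try:\n"
    "    bytearray(2 * 1024**3)\n"
    "except MemoryError:\n"
    "    sys.exit(42)\n"
)

status = run_limited([sys.executable, "-c", HOG])
```

With the cap in place the 2 GiB request is refused up front, so the child leaves through its own error path instead of dragging the machine into swap.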
Re:Does it... by ForumTroll · 2007-09-01 09:47 · Score: 5, Funny

But do you really want firefox forcibly killed every time you visit youtube?
Yes.

--
"A Lisp programmer knows the value of everything, but the cost of nothing." - Alan Perlis
Re:Does it... by Just+Some+Guy · 2007-09-01 10:35 · Score: 4, Informative

Avoid having too much swap space. It's awfully slow, if you're using it too much all you'll manage is to run more things slower.
FreeBSD likes lots of extra swap space. An idle system will notice that some process hasn't run in a month and will push it to swap, proactively freeing RAM for something else that might want it. Note that it will only page out a process's data segment; its code segment uses the filesystem itself for paging (why copy "firefox" into swap when there's already a perfectly readable copy on the filesystem?).

Unless, of course, you unlink its executable file, in which case it allocates swap to hold the file first. Which also illustrates that while unnecessary computational complexity is bad, willingness to do complex things when the situation demands can lead to some pretty cool stuff.

--
Dewey, what part of this looks like authorities should be involved?

Interestingly rigorous by heinousjay · 2007-09-01 09:03 · Score: 3, Interesting

I'd have to imagine doing so much work to prove a particular implementation's value mathematically is a good step toward depoliticizing the scheduler. That should help in what's been a contentious piece of the kernel of late.

--
Slashdot - where whining about luck is the new way to make the world you want.

Re:Interestingly rigorous by ianare · 2007-09-01 09:46 · Score: 3, Informative

One would hope, but it doesn't look like it's going that way. If you look at Ingo's reply, then Roman's reply to that, you can see what could be the start of yet another flame fest:
Hi,

On Fri, 31 Aug 2007, Ingo Molnar wrote:

> So the most intrusive (math) aspects of your patch have been implemented
> already for CFS (almost a month ago), in a finegrained way.

Interesting claim, please substantiate.

> Peter's patches change the CFS calculations gradually over from
> 'normalized' to 'non-normalized' wait-runtime, to avoid the
> normalizing/denormalizing overhead and rounding error.

Actually it changes wait-runtime to a normalized value and it changes nothing about the rounding error I was talking about. It addresses the conversion error between the different units I was mentioning in an earlier mail, but the value is still rounded.

> > This model is far more accurate than CFS is and doesn't add an error
> > over time, thus there are no more underflow/overflow anymore within
> > the described limits.

> ( your characterisation errs in that it makes it appear to be a common
> problem, while in practice it's only a corner-case limited to extreme
> negative nice levels and even there it needs a very high rate of
> scheduling and an artificially constructed workload: several hundreds
> of thousand of context switches per second with a yield-ing loop to be
> even measurable with unmodified CFS. So this is not a 2.6.23 issue at
> all - unless there's some testcase that proves the opposite. )

> with Peter's queue there are no underflows/overflows either anymore in
> any synthetic corner-case we could come up with. Peter's queue works
> well but it's 2.6.24 material.

Did you even try to understand what I wrote? I didn't say that it's a "common problem", it's a conceptual problem. The rounding has been improved lately, so it's not as easy to trigger with some simple busy loops. Peter's patches don't remove limit_wait_runtime() and AFAICT they can't, so I'm really amazed how you can make such claims.

> All in one, we dont disagree, this is an incremental improvement we are
> thinking about for 2.6.24. We do disagree with this being positioned as
> something fundamentally different though - it's just the same thing
> mathematically, expressed without a "/weight" divisor, resulting in no
> change in scheduling behavior. (except for a small shift of CPU
> utilization for a synthetic corner-case)

Everytime I'm amazed how quickly you get to your judgements... :-( Especially interesting is that you don't need to ask a single question for that, which would mean you actually understood what I wrote, OTOH your wild claims tell me something completely different.

BTW who is "we" and how is it possible that this meta mind can come to such quick judgements?

The basic concept is quite different enough, one can e.g. see that I have to calculate some of the key CFS variables for the debug output. The concepts are related, but they are definitively not "the same thing mathematically", the method of resolution is quite different, if you think otherwise then please _prove_ it.

bye, Roman
Re:Interestingly rigorous by HeroreV · 2007-09-01 15:26 · Score: 3, Insightful

When will people learn that being rude doesn't help? If you want somebody to work with you, you need to play nice. It's not pleasant, and it's not easy to make yourself calm down and act like a pussy, but it's important if you ever want any collaboration.

Example:
Interesting, but I don't see this. Can you point it out?

I think you misunderstood me. It may not be a common problem, but it is a conceptual problem. The rounding has been improved lately, so it's not as easy to trigger with some simple busy loops. Peter's patches don't remove limit_wait_runtime() and AFAICT they can't, so I don't see how what you said can be correct.

I'm worried about how quickly you judged this issue, and that you haven't been more in contact with me discussing it. This issue is important to me, and I'd really like to work with you to get it resolved.

The Infinitely Fair Scheduler of Solomon by WombatDeath · 2007-09-01 09:04 · Score: 4, Funny

In which no process gets any resources at all. I've also been considering a quantum scheduler, in which each CPU cycle is assigned to every process simultaneously.

Shit, I've just figured out why I'm a project manager.

Fuck this. by Anonymous Coward · 2007-09-01 09:11 · Score: 5, Funny

Let's just go back to cooperative multitasking like Mac OS where everything was simple.

Re:Fuck this. by bcat24 · 2007-09-01 11:42 · Score: 3, Funny

Woosh!

Re:Coming soon by ScrewMaster · 2007-09-01 09:15 · Score: 4, Funny

Of course, there's the companion "pork barrel scheduler" which randomly spawns useless processes in order to take time from those that deserve it.

--
The higher the technology, the sharper that two-edged sword.

Why not swappable? by jimmyhat3939 · 2007-09-01 09:15 · Score: 3, Interesting

What I don't understand is why these schedulers can't just be swapped out by the users. I know there was some discussion of this, and it was vetoed by the kernel maintainers. It makes a lot of sense to me to just allow users to insert kernel modules with schedulers and just do something in the /proc filesystem to go between them. Then people could use whatever they like, and if they write their own, they wouldn't have to recompile the kernel.

After all, isn't that the idea of open source software -- may the best code win?

--
Free Conference Call -- No Spam, High Quality

Re:Why not swappable? by cnettel · 2007-09-01 09:32 · Score: 5, Informative

The scheduler is at the very heart of the kernel. It's relatively hard to make the logic for choosing what and when to context-switch modular, while keeping the actual context-switches fast enough. Different schedulers tend to have different ideas on what stats to keep, and you want all of it with good memory locality. After all, we should remember that this is a piece of code that's relevant tens or hundreds of times per second, no matter what you do with your machine.
Re:Why not swappable? by sonpal · 2007-09-02 01:46 · Score: 3, Interesting

One could say the same about filesystems - but we figured out how to abstract the filesystem API in UNIX a long time ago. This led to a lot of innovation in filesystems - ext2, ext3, ReiserFS, AFS, ZFS, etc. I think we might see similar innovation in schedulers if the scheduler was pluggable. At the very least, I suspect that Con Kolivas would still be a kernel developer had we supported pluggable schedulers, and that alone might justify making the scheduler pluggable.

I expect that there would be a performance impact if the scheduler were pluggable - modular and optimized do not generally go together. However, the worst case performance of any scheduler dominates the user experience, so IMHO, it is worth accepting a small performance penalty to enable competition and innovation toward reducing the worst case performance.
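To make the filesystem analogy concrete, here is a toy Python sketch of what a pluggable policy interface could look like. Everything in it (`Scheduler`, `RoundRobin`, `WeightedFair`, `run`) is hypothetical and says nothing about how the kernel is actually structured; it only shows a core loop that does not care which policy is plugged in.

```python
from abc import ABC, abstractmethod
from collections import deque
from fractions import Fraction


class Scheduler(ABC):
    """Hypothetical plug-in interface: the core only calls these two methods."""

    @abstractmethod
    def enqueue(self, task, weight=1): ...

    @abstractmethod
    def pick_next(self): ...


class RoundRobin(Scheduler):
    def __init__(self):
        self.queue = deque()

    def enqueue(self, task, weight=1):
        self.queue.append(task)  # weight is ignored: equal turns for everyone

    def pick_next(self):
        task = self.queue.popleft()
        self.queue.append(task)
        return task


class WeightedFair(Scheduler):
    """Runs whichever task has the least weighted runtime so far."""

    def __init__(self):
        self.vruntime = {}
        self.weight = {}

    def enqueue(self, task, weight=1):
        self.vruntime[task] = Fraction(0)
        self.weight[task] = weight

    def pick_next(self):
        task = min(self.vruntime, key=self.vruntime.get)
        self.vruntime[task] += Fraction(1, self.weight[task])  # one tick, scaled
        return task


def run(scheduler, ticks):
    """Stand-in for the core: it never looks inside the policy."""
    return [scheduler.pick_next() for _ in range(ticks)]
```

The indirection through `pick_next` is the cost being argued about above: cheap in a toy, but in a real kernel it sits on one of the hottest paths.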

This post by fishthegeek · 2007-09-01 09:18 · Score: 3, Funny

has been scheduled for use by the slashdot server farm on September 6, 2007 at 14:54:23. Please refresh this page at that time for fishthegeek's insightful comment.

Automatically generated by:
Slashdot Predictive Post Scheduler v 2.12.02-16

--
load "$",8,1

More flame bait? by Bryan+Ischo · 2007-09-01 09:29 · Score: 4, Insightful

I read the article in question. There is obviously much disagreement about the value of the Really Fair Scheduler, and so I must assume that "derrida" and the Slashdot editors are once again just trying to invite more people to the flame-fest as usual.

The comments on the article at the linked-to site suggest that there are potentially flaws in the logic behind the Really Fair Scheduler, and that its author has ignored advancements in the CFS that make most (or all?) of its improvements irrelevant. Also there are many suggestions that the author of the Really Fair Scheduler, some guy named Roman something-or-other, is raging on the kernel lists rather than working cooperatively to improve the Linux scheduler.

Given what I have seen, I suspect that the Really Fair Scheduler is going nowhere, and that "derrida" knows that and is just trying to add more fuel to the flame-fire by posting about it on Slashdot.

Re:More flame bait? by Dr.+Spork · 2007-09-01 10:36 · Score: 4, Insightful

You could be right, but Roman is in a tough position, because he's arguing for a change that he thinks is big, and Ingo seems to be trying to sap his enthusiasm by telling him to essentially "work on what we're doing" when Roman wants to have a debate about the best architecture for the scheduler.
In order to help give substance to the debate, Roman coded together some proof-of-concept stuff, but instead of his architectural ideas being looked at seriously and critically, Ingo instructs him to strip away most things and "we'll use it." That really should seem to everyone on the sidelines like Roman's ideas are being ignored without debate. Now, maybe Ingo is polite, Roman's work just sucks, and Ingo won't confront him on it. But if that's not the case, maybe there should be a (non-flamey) debate about the best architecture for the scheduler.

ingo's reply by ianare · 2007-09-01 09:32 · Score: 4, Informative

Ingo's reply can be found here. Roman's reply to that is here and here

Re:Coming soon by arth1 · 2007-09-01 09:34 · Score: 4, Insightful

You're more insightful than you think. I don't want a fair scheduler. I want a very unfair one, that favours my favourite processes. And I want one that has as little overhead as possible -- a scheduler so complex that it eats 20% of the available cycles just to figure out who to give the remaining 80% to, I have no use for.

But where is the Linux IO Scheduler? by Anonymous Coward · 2007-09-01 09:52 · Score: 5, Insightful

Screw the CPU scheduler at this point. The kernel folks are missing the obvious and utter brokenness of the IO scheduling. These bugs have been outstanding about a year now!! And it's not just AMD64 anymore either. Quoth the kernel bug report:

"Now, as far as this bug being AMD64 only. We develop a portable data analysis
tool and we run it on Intel Core Mobile systems (Sony UX series, Panasonic
Toughbook series) and see this bug or one almost exactly like it on those
platforms as well."

http://bugzilla.kernel.org/show_bug.cgi?id=7372
http://bugzilla.kernel.org/show_bug.cgi?id=8636
http://www.nabble.com/IO-activity-brings-my-desktop-to-its-knees-(2.6.22.1-ck1)-t4192136.html
http://forums.gentoo.org/viewtopic-t-482731-start-500.html

At first, deadline IO was touted as an answer, but that doesn't completely fix things.
Some say Native Command Queueing is broken. One person claims deadline + NCQ disabled helps.
Some say the kernel's vfs_cache_pressure settings help, while others refute it (compare kernel bug report versus page 21 of the gentoo forum thread). But no one understands what's really broken in the kernel.
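Before comparing notes on which setting helps, it is worth confirming which IO scheduler each disk is actually using. The sketch below reads the standard `/sys/block/<device>/queue/scheduler` files, where the active scheduler is the name in square brackets; the helper names `parse_active_scheduler` and `active_schedulers` are made up for this example.

```python
from pathlib import Path


def parse_active_scheduler(text):
    """Return the bracketed name from a line such as 'noop deadline [cfq]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None  # no bracketed entry, so nothing is marked as selected


def active_schedulers(sys_block=Path("/sys/block")):
    """Map each block device to its active IO scheduler, where readable."""
    result = {}
    if not sys_block.is_dir():
        return result  # not a Linux sysfs layout
    for dev in sorted(sys_block.iterdir()):
        try:
            text = (dev / "queue" / "scheduler").read_text()
        except OSError:
            continue  # device without a request queue, or not readable
        result[dev.name] = parse_active_scheduler(text)
    return result
```

Writing a scheduler name into the same file as root switches that device over, which makes it easy to test the deadline suggestion one disk at a time.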

Can we please get Ingo working on IO scheduling? PLEASE?

mirror, mirror, on the RAID by r00t · 2007-09-01 09:53 · Score: 3, Funny

Who's the fairest scheduler made?

Re:Coming soon by Anti-Trend · 2007-09-01 10:34 · Score: 3, Informative

Hmmm, ever heard of nice?
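A small Python sketch of the same idea, for launching something at lower priority from a script. `run_niced` is a hypothetical helper name, and the increment of 10 is an arbitrary choice; `os.nice(0)` simply reports the current niceness without changing it.

```python
import os
import subprocess
import sys


def run_niced(argv, increment=10):
    """Run argv with its niceness raised by `increment` (lower priority)."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),  # applied in the child only
        capture_output=True,
        text=True,
    )


# The child reports its own niceness so the effect is visible.
child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
child_nice = int(child.stdout)
```

Going the other way, towards negative nice values, needs root, which is roughly the "favour my favourite processes" knob asked for above.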

--
Working in a DevOps shop is like playing in a band made up entirely of keytarists.

Math is only reliable up to a point by Goonie · 2007-09-01 11:11 · Score: 3, Insightful

A fair proportion of the time, the mathematics applied in computer science (and, probably, most other disciplines) starts with simplifying and often unrealistic assumptions.

Not that maths isn't useful, but much of the time it can't give you definitive answers for the questions you really want answers to, only somewhat related, simpler ones.

--

Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)

Re:What about the really greedy scheduler... by ozmanjusri · 2007-09-01 11:19 · Score: 4, Funny

Microsoft has patented that for the Vista scheduler

--
"I've got more toys than Teruhisa Kitahara."

Sausages by chiok · 2007-09-01 12:10 · Score: 5, Funny

"To retain respect for sausages and Linux schedulers, one must not watch them in the making."
-- Otto von Bismarck (paraphrased)

Re:Not quite accurate by Antique+Geekmeister · 2007-09-01 12:10 · Score: 3, Funny

I guess he should pull a Theo de Raadt, and release an OpenLinux kernel now?

User Driven Scheduler by elmartinos · 2007-09-01 12:40 · Score: 3, Funny

Writing a fair scheduler is difficult. Why not let the user decide? I propose a popup message for each context switch: "Hello, it seems the CPU is doing a context switch. Which application do you want to allow to run this time?".

--
Open Source Alternatives

Smarter write throttling is the answer by Spoke · 2007-09-01 17:57 · Score: 4, Interesting

It's fairly well known that large writes to the filesystem can cause huge read delays.

This seems to be aggravated by a number of conditions listed in the links posted by the parent post, but it's also aggravated when using ext3 and ordered data journaling as well (which is the default on most systems).

There is some work being done to reduce the huge latency in reads that can occur during heavy write loads with the "per device dirty throttling" patchset. Initial results look very promising.

LWN article: Smarter write throttling
per device dirty throttling -v8

This patch set seems to hold a lot of promise in being able to fix this problem, but I'm not sure what the latest status is or what kernel it will make it into. It could make it into 2.6.24 at the earliest.

Next week: by bytesex · 2007-09-01 19:23 · Score: 3, Funny

Next week: a completely new scheduler, written by Ingo, in 05:12:43.33213, called the 'Astoundingly Fair Scheduler', which doesn't look at all like this new improvement, especially - hey look ! Something shiny ! And in two weeks time, a defence written by Linus Torvalds, detailing why the AFS is so much better than the RFS, and why Ingo can be trusted so much more when it comes to maintaining stuff like that.

--
Religion is what happens when nature strikes and groupthink goes wrong.

Review feedback by Ingo+Molnar · 2007-09-02 02:29 · Score: 5, Informative

Oh my gosh, the Linux scheduler is on Slashdot. Again! :-)

Frankly, this amount of interest in the Linux scheduler is certainly flattering to all of us Linux scheduler hackers, but there are certainly more important areas that need improvement: 3D support, the MM / IO schedulers, stability, compatibility, etc. (There's also the FreeBSD scheduler that went through a total rewrite recently - and it got not a single Slashdot article that i remember.)

But i digress. A couple of quick high-level points (most of the details can be found in the discussions on lkml):

I find the RFS submission interesting and useful, and i have asked the author to split the patch up a bit better, to separate the core idea from optimizations and unrelated changes - to ease review and merging of the changes, and to make the changes bisectable during QA after they have been applied to the mainstream kernel. (That is how patches are typically submitted to the Linux-kernel mailing list - it's a basic requirement before anything can be merged. CFS for example was applied to the 2.6.23 development tree in form of a series of 50 (!) separate patches. (And the scheduler works at every patching/bisection point.))

I also pointed him to the latest "bleeding edge" scheduler tree, which already implements the same non-normalized form of math and makes some of the rounding and performance arguments moot i believe. (lkml mail).

There are some issues where i disagree with Roman at the moment: even when comparing to unmodified current upstream CFS, i think Roman makes too much out of rounding behavior and i have asked him to substantiate his claims with numbers (lkml mail).

The current precision/rounding of CFS is better than one part in a million. (in fact it's currently even better than that, but i'm saying 1:1000000 here because we could in the future consciously decrease precision, if performance or simplicity arguments justify it.)
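The rounding argument is easier to follow with a toy. The Python sketch below is not CFS or RFS code, and the slice length, weight and step count are made-up numbers; it only contrasts dividing by a weight on every update (losing a remainder each time) with accumulating raw time and dividing once.

```python
def normalized_total(delta, weight, steps):
    """Divide by the weight on every update; each step truncates a remainder."""
    total = 0
    for _ in range(steps):
        total += delta // weight
    return total


def deferred_total(delta, weight, steps):
    """Accumulate raw time and divide once, when the value is read."""
    raw = 0
    for _ in range(steps):
        raw += delta
    return raw // weight


# Made-up numbers: 1000 ns slices, weight 3, one million updates.
per_step = normalized_total(1000, 3, 1_000_000)  # 333 per step
one_shot = deferred_total(1000, 3, 1_000_000)    # 1_000_000_000 // 3
drift = one_shot - per_step                      # truncation lost along the way
```

In this toy the per-step version drifts by 333,333 units out of about 333 million, roughly one part in a thousand, purely because of the deliberately coarse integers. For scale, the one-in-a-million figure quoted above corresponds to one microsecond per second of CPU time.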

I can understand his desire towards creating interest in his patch, but IMO it should not be done by unfairly (pun unintended ;) trash-talking other people's code. The math code in CFS that achieves precision has gone through more than 5 complete rewrites already in the 20-plus CFS versions, and the current variant was not written by me but was largely authored by Thomas Gleixner and Peter Zijlstra.

New, better approaches are possible of course and the math is relatively easy to replace, due to the internal modularity of CFS. So we are keeping an open mind towards further improvements. (which includes the possibility of total replacements as well. Dozens of times has my own kernel code been replaced with new, better implementations in the past - and that includes large parts of the scheduler too. In fact only ~30% of current kernel/sched.c was authored by me, the rest has been written by the other 90+ scheduler contributors, according to the git-annotate output that covers the past ~2.5 years of kernel history. Beyond that numerous other people have contributed to the scheduler in the past.)

About the submitted code: it was a bit hard to review it because the new code did not contain any comments - it only included raw code - which is very uncommon for patches of such type. The email gave the theoretical background but there was little implementational detail in the patch itself connecting the theory to practice.

So to drive this issue forward i have today posted a question to Roman in form of a tiny patch that extracts only his suggested new math from his patch and applies it to CFS. If it is indeed what Roman intended then we can analyze that in isolation and in more detail. The patch is as small as it gets:

include/linux/sched.h | 1 +
