The Really Fair Scheduler

Posted by kdawson on Saturday September 1, 2007 @08:54AM from the not-over-till-it's-over dept.

derrida writes "During the many threads discussing Ingo Molnar's recently merged Completely Fair Scheduler, Roman Zippel has repeatedly questioned the complexity of the new process scheduler. In a recent posting to the Linux Kernel mailing list he offered a simpler scheduler named the 'Really Fair Scheduler' saying, 'As I already tried to explain previously CFS has a considerable algorithmic and computational complexity. This patch should now make it clearer, why I could so easily skip over Ingo's long explanation of all the tricks CFS uses to keep the computational overhead low — I simply don't need them.'"

Does it... by markov_chain · 2007-09-01 09:03 · Score: 3, Interesting

help in the case when a process goes nuts allocating memory, and stops the GUI dead in its tracks? No Alt-Ctrl-Backspace, no switching to console, unbearably slow remote login...

--
Tsunami -- You can't bring a good wave down!
Interestingly rigorous by heinousjay · 2007-09-01 09:03 · Score: 3, Interesting

I'd have to imagine doing so much work to prove a particular implementation's value mathematically is a good step toward depoliticizing the scheduler. That should help in what's been a contentious piece of the kernel of late.

--
Slashdot - where whining about luck is the new way to make the world you want.
1. Re:Interestingly rigorous by try_anything · 2007-09-01 10:29 · Score: 2, Interesting
  
  Math is reliable, but it's slow going, even for very simple math.
  
  People prefer verbal reasoning, even though all kinds of logical errors can slip in undetected, for the simple fact that they can read it at the speed of speech -- even if they really shouldn't.
  
  This is PAINFULLY evident in the software world. I imagine even kernel developers tend to be lazy this way.
2. Re:Interestingly rigorous by ccp · 2007-09-04 22:38 · Score: 2, Interesting
  
  When will people learn that being rude doesn't help? If you want somebody to work with you, you need to play nice. It's not pleasant, and it's not easy to make yourself calm down and act like a pussy, but it's important if you ever want any collaboration.
  
  (emphasis mine)
  
  Very true, but I have this suspicion that some hackers' rudeness is intended to piss people off and keep the field, the spotlight, and the presumed "glory" to themselves.
  
  Sad thing is, it works a lot of the time, and you can always blame old trusty Asperger's.
  
  Cheers,
  CC
Why not swappable? by jimmyhat3939 · 2007-09-01 09:15 · Score: 3, Interesting

What I don't understand is why these schedulers can't just be swapped out by the users. I know there was some discussion of this, and it was vetoed by the kernel maintainers. It makes a lot of sense to me to just allow users to insert kernel modules with schedulers and just do something in the /proc filesystem to go between them. Then people could use whatever they like, and if they write their own, they wouldn't have to recompile the kernel.
After all, isn't that the idea of open source software -- may the best code win?

--
Free Conference Call -- No Spam, High Quality
1. Re:Why not swappable? by dhasenan · 2007-09-01 10:05 · Score: 2, Interesting
  
  Then don't allow them to compile schedulers as modules -- force each kernel to have a single scheduler built in. Then it's a matter of specifying the interface and then linking in a different object file.
  
  It's doable (easy, even), it doesn't require significant investment from a kernel maintenance perspective, and it cuts through a fair bit of politicking.
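To make "specifying the interface" concrete, here is a minimal sketch in Python rather than kernel C. Everything in it is an assumption made for illustration: the `Scheduler` interface, its `enqueue` and `pick_next` methods, and the two toy policies are hypothetical and are not the kernel's actual scheduler API or either of the schedulers under discussion.

```python
from abc import ABC, abstractmethod
from collections import deque
import heapq


class Scheduler(ABC):
    """Hypothetical interface; not the kernel's actual scheduler API."""

    @abstractmethod
    def enqueue(self, task, runtime=0):
        """Make a task runnable. `runtime` is CPU time it has already used."""

    @abstractmethod
    def pick_next(self):
        """Remove and return the task that should run next."""


class RoundRobin(Scheduler):
    """Toy policy: run tasks in arrival order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, task, runtime=0):
        self._queue.append(task)  # runtime is ignored by this policy

    def pick_next(self):
        return self._queue.popleft()


class LeastRuntimeFirst(Scheduler):
    """Toy policy: run the task that has used the least CPU time so far."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal runtimes keep arrival order

    def enqueue(self, task, runtime=0):
        heapq.heappush(self._heap, (runtime, self._seq, task))
        self._seq += 1

    def pick_next(self):
        return heapq.heappop(self._heap)[2]


def run_order(scheduler, tasks):
    """Feed (name, runtime) pairs to any Scheduler and drain it."""
    for name, runtime in tasks:
        scheduler.enqueue(name, runtime)
    return [scheduler.pick_next() for _ in tasks]


tasks = [("editor", 30), ("compiler", 90), ("shell", 10)]
rr_order = run_order(RoundRobin(), tasks)
lrf_order = run_order(LeastRuntimeFirst(), tasks)
```

The caller (`run_order`) only knows the interface, so swapping policies is a one-line change. In the kernel the equivalent would be choosing which object file gets linked in, as described above.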
2. Re:Why not swappable? by sonpal · 2007-09-02 01:46 · Score: 3, Interesting
  
  One could say the same about filesystems - but we figured out how to abstract the filesystem API in UNIX a long time ago. This led to a lot of innovation in filesystems - ext2, ext3, ReiserFS, AFS, ZFS, etc. I think we might see similar innovation in schedulers if the scheduler were pluggable. At the very least, I suspect that Con Kolivas would still be a kernel developer had we supported pluggable schedulers, and that alone might justify making the scheduler pluggable.
  
  I expect that there would be a performance impact if the scheduler were pluggable - modular and optimized do not generally go together. However, the worst case performance of any scheduler dominates the user experience, so IMHO, it is worth accepting a small performance penalty to enable competition and innovation toward improving the worst case performance.
Re:Still waiting for the IFS by Anti-Trend · 2007-09-01 09:59 · Score: 3, Interesting

Agreed. While I recognise and appreciate the humor in your comment, this is the main reason I use Debian on the desktop rather than OS X -- I multitask heavily. A Linux kernel with a Desktop preemption model and 1000Hz Timer frequency is a Godsend for those who push their PCs a tad too hard on a regular basis. I would like to see a simplified version of the scheduler, but all told CFS isn't as bad as everybody makes it out to be.

--
Working in a DevOps shop is like playing in a band made up entirely of keytarists.
Come on by Anonymous Coward · 2007-09-01 10:59 · Score: 1, Interesting

People, don't you understand? Ingo Molnar is Linus's favourite puppy, so completely forget about anybody else getting a chance to make a better scheduler for the kernel...
It's the usual politics and corruption...
Re:ingo's reply by budgenator · 2007-09-01 11:07 · Score: 2, Interesting

The problem is that Linux is used across a spectrum of 3 obvious types: servers, workstations and desktops. The developers tend to be very sensitive to the server and workstation areas, so at the end of the day it'll be test cases that favor servers vs. test cases that favor desktops. What makes me wonder is why they don't develop three schedulers, each one optimized for a particular usage pattern, and just let me select the kernel I want with GRUB? It should be possible to modify init to select the correct rc.conf for each pattern as well.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Not quite accurate by LinuxGeek · 2007-09-01 12:00 · Score: 2, Interesting

Linus chose the scheduler written by the person who best interacted within the existing developer structure and responded to problem reports. The rejected scheduler may have been slightly better, but the developer was much less cooperative and responsive to bug reports. He killed his own project because of his attitude.

--

Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
Re:Coming soon by ls671 · 2007-09-01 12:19 · Score: 2, Interesting

Ever heard of ionice?

I experimented with it, but not in depth. As far as I remember, ionice didn't help a lot compared to a real mainframe I/O scheduler. I have always felt that Linux was weak on I/O scheduling, and other posts tend to confirm what I suspect.
Now, if you tell me that I can do real I/O scheduling with ionice and that you have managed to accomplish that, I might give it a second try, more in depth this time.
Also, please specify kernel tweaking parameters to cause ionice to act as a real I/O scheduler.
Again, I might not have experimented with ionice enough to possess an accurate picture but other posts on this thread seem to lead to what I assumed so far.

--
Everything I write is lies, read between the lines.
Smarter write throttling is the answer by Spoke · 2007-09-01 17:57 · Score: 4, Interesting

It's fairly well known that large writes to the filesystem can cause huge read delays.

This seems to be aggravated by a number of conditions listed in the links posted by the parent post, but it's also aggravated when using ext3 and ordered data journaling as well (which is the default on most systems).

There is some work being done to reduce the huge latency in reads that can occur during heavy write loads with the "per device dirty throttling" patchset. Initial results look very promising.

LWN article: Smarter write throttling
per device dirty throttling -v8

This patch set seems to hold a lot of promise in being able to fix this problem, but I'm not sure what the latest status is or what kernel it will make it into. It could make it into 2.6.24 at the earliest.
Misunderstanding... by Junta · 2007-09-02 03:03 · Score: 2, Interesting

At least in Linux (and I presume FreeBSD's swap strategy is similar), you miss the point. Let's look at two scenarios, one with proactive swapping, one without, and a malloc comes in that exceeds system memory.

Non-proactive case:
-kernel sees malloc, knows it lacks physical memory to accommodate, malloc is blocked while kernel does housekeeping.
-kernel picks the appropriate amount of pages to write to swap, then writes those pages to swap space, taking a while since block storage IO is excruciatingly slow.
-After the extremely long previous step, the memory is freed
-the malloc is allowed to continue once the drive write has finished, a number of milliseconds later. Aside from the drive write, everything was on the microsecond scale, so the malloc was delayed by a factor of thousands.

Proactive case:
-The system has some idle time, with nothing immediately better to do, the kernel notes free swap space and flags some appropriate memory as what would be swapped out if and when the system was in need, and copies it to disk, but it *leaves it in memory*. The kernel remembers that while these pages are indeed in memory, it can zap them and be able to restore. This is the critical point, the data in memory has not been *moved* to swap, it has been *copied* to swap.
-Program using that data randomly kicks back to life. Its needed data is on disk, but it is a moot point because it is in physical memory too, so it isn't slowed down. The kernel might take this opportunity to re-evaluate things when idle in terms of what it thinks is mostly unwanted pages.
-Later on, a program needs to malloc and physical memory is exhausted, the kernel blocks to do housekeeping, finds pages that it knows it has copied to disk, frees them and uses them to satisfy the malloc, within microseconds.

Proactive swapping causes extra IO activity during idle time but, if implemented correctly, it does not hurt access to pages that were copied out unnecessarily, and it allows swap on actual demand to be nearly trivially fast. It may be wasteful to have gobs of swap, and certainly if the swap has the sole copy of tons of data then performance is hopeless, but if you see the swap used count go up 'mysteriously' without significant mallocs going on, don't assume it will slow down access to the data written to swap later on.
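
The two cases can be put into a toy model. This is only a sketch under stated assumptions: the `Page` class, the function names, and the two cost constants are made up for illustration and do not come from any real kernel. The only thing it tries to capture is that a page already copied to swap can be dropped without a disk write.

```python
# Toy model of the non-proactive and proactive cases described above.
# All names and costs are illustrative assumptions, not kernel values.
DISK_WRITE_US = 5000  # assumed cost of writing one page to swap, in microseconds
FREE_PAGE_US = 1      # assumed cost of dropping a page that is already on disk


class Page:
    def __init__(self, name):
        self.name = name
        self.on_disk = False  # True once a copy of the page exists in swap


def proactive_copy(pages):
    """Idle-time work: copy pages to swap but leave them in memory."""
    for page in pages:
        page.on_disk = True


def reclaim(pages, needed):
    """Free `needed` pages on demand and return the delay in microseconds."""
    delay = 0
    for page in pages[:needed]:
        if not page.on_disk:
            delay += DISK_WRITE_US  # must be written out before the frame is reused
            page.on_disk = True
        delay += FREE_PAGE_US
    return delay


cold = [Page(f"p{i}") for i in range(4)]  # never copied to swap
warm = [Page(f"p{i}") for i in range(4)]  # copied to swap while idle
proactive_copy(warm)

non_proactive_delay = reclaim(cold, 4)  # every page pays for a disk write
proactive_delay = reclaim(warm, 4)      # pages are simply dropped
```

With these assumed costs the on-demand delay is dominated entirely by the disk writes in the first case, which is the "factor of thousands" mentioned above.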

--
XML is like violence. If it doesn't solve the problem, use more.