The Really Fair Scheduler
derrida writes "During the many threads discussing Ingo Molnar's recently merged Completely Fair Scheduler, Roman Zippel has repeatedly questioned the complexity of the new process scheduler. In a recent posting to the Linux Kernel mailing list he offered a simpler scheduler named the 'Really Fair Scheduler' saying, 'As I already tried to explain previously CFS has a considerable algorithmic and computational complexity. This patch should now make it clearer, why I could so easily skip over Ingo's long explanation of all the tricks CFS uses to keep the computational overhead low — I simply don't need them.'"
help in the case when a process goes nuts allocating memory, and stops the GUI dead in its tracks? No Alt-Ctrl-Backspace, no switching to console, unbearably slow remote login...
Tsunami -- You can't bring a good wave down!
I'd have to imagine doing so much work to prove a particular implementation's value mathematically is a good step toward depoliticizing the scheduler. That should help in what's been a contentious piece of the kernel of late.
Slashdot - where whining about luck is the new way to make the world you want.
After all, isn't that the idea of open source software -- may the best code win?
Free Conference Call -- No Spam, High Quality
Agreed. While I recognise and appreciate the humor in your comment, this is the main reason I use Debian on the desktop rather than OS X -- I multitask heavily. A Linux kernel with a Desktop preemption model and 1000Hz Timer frequency is a Godsend for those who push their PCs a tad too hard on a regular basis. I would like to see a simplified version of the scheduler, but all that said, CFS isn't as bad as everybody makes it out to be.
Working in a DevOps shop is like playing in a band made up entirely of keytarists.
People, don't you understand? Ingo Molnar is Linus's favourite puppy, so completely forget about anybody else getting a chance to make a better scheduler for the kernel...
It's the usual politics and corruption...
The problem is that Linux is used across a spectrum of three obvious types of machine: servers, workstations and desktops. The developers tend to be very sensitive to the server and workstation areas, so at the end of the day it'll be test cases that favor servers vs. test cases that favor desktops. What makes me wonder is why they don't develop three schedulers, each one optimized for a particular usage pattern, and just let me select the kernel I want with GRUB. It should be possible to modify init to select the correct rc.conf for each pattern as well.
Apocalypse Cancelled, Sorry, No Ticket Refunds
Linus chose the scheduler written by the person that best interacted within the existing developer structure and responded to problem reports. The rejected scheduler may have been slightly better, but the developer was much less cooperative and responsive to bug reports. He killed his own project because of attitude.
Kindness is the language which the deaf can hear and the blind can see. - Mark Twain
I experimented with it, but not in depth. As far as I remember, ionice didn't help a lot compared to a real mainframe I/O scheduler. I have always felt that Linux was weak on I/O scheduling, and other posts tend to confirm what I suspect.
Now, if you tell me that I can do real I/O scheduling with ionice and that you have managed to accomplish that, I might give it a second try, more in depth this time.
Also, please specify kernel tweaking parameters to cause ionice to act as a real I/O scheduler.
Again, I might not have experimented with ionice enough to possess an accurate picture but other posts on this thread seem to lead to what I assumed so far.
Everything I write is lies, read between the lines.
It's fairly well known that large writes to the filesystem can cause huge read delays.
This seems to be aggravated by a number of conditions listed in the links posted by the parent post, and it's also aggravated when using ext3 with ordered data journaling (which is the default on most systems).
There is some work being done to reduce the huge latency in reads that can occur during heavy write loads with the "per device dirty throttling" patchset. Initial results look very promising.
LWN article: Smarter write throttling
per device dirty throttling -v8
This patch set seems to hold a lot of promise in being able to fix this problem, but I'm not sure what the latest status is or what kernel it will make it into. It could make it into 2.6.24 at the earliest.
You miss the point, at least for Linux, and I presume FreeBSD's swap strategy is similar. Let's look at two scenarios, one with proactive swapping and one without, in which a malloc comes in that exceeds free physical memory.
Non-proactive case:
-kernel sees malloc, knows it lacks the physical memory to accommodate it, and blocks the malloc while it does housekeeping.
-kernel picks the appropriate amount of pages to write to swap, then writes those pages to swap space, taking a while since block storage IO is excruciatingly slow.
-After the extremely long previous step, the memory is freed
-the malloc is allowed to continue after the milliseconds spent on the drive write. Everything apart from the drive write was on the microsecond scale, so the malloc was delayed by a factor of thousands.
Proactive case:
-The system has some idle time. With nothing immediately better to do, the kernel notes free swap space, flags some appropriate memory as what would be swapped out if and when the system was in need, and copies it to disk, but it *leaves it in memory*. The kernel remembers that while these pages are indeed in memory, it can zap them and still be able to restore them. This is the critical point: the data in memory has not been *moved* to swap, it has been *copied* to swap.
-Program using that data randomly kicks back to life. Its needed data is on disk, but that is a moot point because it is in physical memory too, so the program isn't slowed down. The kernel might take this opportunity to re-evaluate, when idle, which pages it thinks are mostly unwanted.
-Later on, a program needs to malloc and physical memory is exhausted. The kernel blocks to do housekeeping, finds pages that it knows it has already copied to disk, frees them and uses them to satisfy the malloc, within microseconds.
Proactive swapping causes extra IO activity during idle time but, if implemented correctly, does not slow access to pages that turned out to be copied unnecessarily, and it makes swapping on actual demand nearly trivially fast. It may be wasteful to have gobs of swap, and certainly if swap holds the sole copy of tons of data then performance is hopeless. But if you see the swap-used count go up 'mysteriously' without significant mallocs going on, don't assume that later access to the data written to swap will be slower.
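To put rough numbers on the two cases, here is a toy model in Python. It is only a sketch of the bookkeeping described above, not how any real kernel is written: the class `ToyMemory`, its method names, and the two cost constants are all made up for illustration, and the 8 ms disk write is an assumed figure, not a measurement.

```python
# Toy model of demand swapping vs. proactive (pre-copied) swapping.
# All names and numbers here are illustrative assumptions, not kernel internals.

DISK_WRITE_US = 8000  # assumed cost of writing one page to swap, in microseconds
BOOKKEEPING_US = 2    # assumed cost of freeing/reassigning one page in RAM


class ToyMemory:
    """Physical memory that is completely full of in-use pages."""

    def __init__(self, total_pages):
        self.dirty = total_pages  # in-use pages with no copy in swap
        self.clean = 0            # in-use pages that also have a copy in swap

    def idle_precopy(self, pages):
        """Proactive step: copy pages to swap while idle, but keep them in RAM.

        Returns the I/O time spent, which no allocation has to wait for.
        """
        n = min(pages, self.dirty)
        self.dirty -= n
        self.clean += n
        return n * DISK_WRITE_US

    def allocate(self, pages):
        """Return the latency in microseconds seen by an allocation."""
        if pages > self.clean + self.dirty:
            raise MemoryError("request exceeds physical memory in this toy model")
        # Pages already copied to swap can simply be dropped and reused.
        from_clean = min(pages, self.clean)
        # Anything else has to be written out to disk first.
        from_dirty = pages - from_clean
        self.clean -= from_clean
        # The freed pages now hold the caller's new data, which has no swap
        # copy, so they count as dirty again; net change to dirty is from_clean.
        self.dirty += from_clean
        return (from_clean * BOOKKEEPING_US
                + from_dirty * (DISK_WRITE_US + BOOKKEEPING_US))


# Non-proactive case: the allocation waits for the disk writes.
demand = ToyMemory(total_pages=100)
demand_latency = demand.allocate(10)

# Proactive case: the writes happened earlier, during idle time.
proactive = ToyMemory(total_pages=100)
idle_io = proactive.idle_precopy(10)
proactive_latency = proactive.allocate(10)

print("demand swap latency (us):   ", demand_latency)
print("proactive swap latency (us):", proactive_latency)
print("idle-time I/O paid up front (us):", idle_io)
```

The total amount of disk writing is the same in both runs. The only difference is whether the allocation is sitting there waiting for it, which is the whole argument.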
XML is like violence. If it doesn't solve the problem, use more.