ArsTechnica Explains O(1) Scheduler
geogeek writes "The recent release of Linux's 2.6 kernel introduces a number of sweeping improvements. These can be hard to understand without a background in programming. This week's Linux.ars examines the improved scheduler for an enthusiast audience, concisely explaining its nuances and the practical effects of an O(1) design."
I read this piece yesterday, and while it did "dumb down" the basics (as the first poster noted), I thought it did a very good job of putting it all into a nutshell that those of us not as familiar with Big-O and schedulers in general might easily understand. For Linux.Ars' format, I thought it was of appropriate length, and had enough detail to "belong." I'm sure there are more detailed writeups on the O(1) scheduler in place in 2.6. Does anyone have any links?
"Aye, and if my grandmother had wheels, she'd be a wagon!" -- Montgomery Scott, ST:III
I set up a small computer today with 2.6. It boots from an initrd, but I noticed that during bootup the kernel pauses for a few seconds with:
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
I couldn't find much info on it with Google. Does anybody know more about it, or how I can stop the kernel from checking for it?
I clicked on this article expecting to see an explanation of how it was that the O(1) scheduler worked, and by what tricks it was able to schedule in O(1) rather than having to spend extra time as extra processes are added, and what the real-life effects of such a situation are.
Instead the article was just "This is what O(1) is. This is what a scheduler is. This is what "preemptive" means. The 2.6 SMP can load-balance effectively across many processors. I used this really cool mp3 player this week." and just about nothing else.
Anyone want to explain exactly how it is that an O(1) scheduler is a difficult thing, and how exactly linux 2.6 achieves it?
-- Super Ugly Ultraman
While Ars definitely isn't targeted at the same audience as, say, KernelTrap, it's nice to see there are a few technology websites/publications that aren't dumbed down. I remember when Byte magazine used to publish articles detailing the PowerPC architecture, down to the level of registers and the types of pipelines in the first set of implementations. Compare this to the ZD rags, which are a hair away from calling the CPU the "brain" of the computer!
A deep unwavering belief is a sure sign you're missing something...
I'm not sure I understand how a hardware scheduler would work. It would need to keep track of the process's state, including virtual memory and registers, no? I don't know how the hardware would do that without the OS's support. You would need a software scheduler to manage these things; otherwise, it seems like you'd be limited to one process per CPU. Correct me if I'm wrong, though.
While O(1) certainly might sound impressive to someone who's taken a first semester computer science course in university, anyone beyond that would obviously realize you've got to worry about the average case analysis.
To achieve O(1) you usually have to do some tricks that will get you quite a large constant. With an appropriately small number of processes, which I assume is the case with most computers, even an O(n^n) scheduler could be faster if the constant is very, very small.
Have any sites done an average case analysis to show at what point the O(1) scheduler is faster than the old 2.4 kernel? Certainly it isn't always faster or else the old one really did suck.
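To put some made-up numbers on the constant-factor point, here is a rough sketch. It assumes a hypothetical O(1) scheduler that costs a flat 200 units per decision and a hypothetical O(n) one that costs 2 units per runnable process. Both constants are invented for illustration and are not measurements of 2.4 or 2.6.

```python
# Hypothetical cost models; the constants are invented for illustration
# and are not measurements of any real kernel.
CONSTANT_COST = 200      # assumed flat cost of an O(1) pick
PER_PROCESS_COST = 2     # assumed per-process cost of an O(n) scan


def o1_cost(n):
    """Cost of a constant-time pick: independent of n."""
    return CONSTANT_COST


def on_cost(n):
    """Cost of a linear scan over n runnable processes."""
    return PER_PROCESS_COST * n


def crossover():
    """Smallest n at which the linear scan becomes more expensive."""
    n = 1
    while on_cost(n) <= o1_cost(n):
        n += 1
    return n
```

With these assumed constants the linear scan is cheaper up to 100 runnable processes and more expensive from 101 onward. Where the real break-even sits would have to be measured.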
www.usenix.org/publications/library/proceedings/a
When it comes time to select a process, the scheduler finds the highest priority level containing available processes. It then selects a process from the desired priority level. Because the number of priority levels is fixed, the scheduler always takes a constant amount of time.
Yes, but it does need to inspect all the processes within a priority level, right??
As the total number of processes grows, the average number of processes within a priority level also grows, and so does scheduling time. So it's not O(1).
Unless choosing a process from a priority level is also constant time -- say, if it were a round-robin queue.
What am I missing?
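One way the pieces could fit together, as a minimal sketch rather than the kernel's actual code: keep a FIFO queue per priority level plus a bitmap of non-empty levels, so picking a process never inspects the other processes in that level. The names `RunQueue`, `enqueue` and `pick_next` are hypothetical, and the level count of 140 is an assumption for the example.

```python
from collections import deque

NUM_LEVELS = 140  # assumed fixed number of priority levels; 0 is highest


class RunQueue:
    def __init__(self):
        # One FIFO queue per priority level.
        self.queues = [deque() for _ in range(NUM_LEVELS)]
        # Bit i is set when level i has at least one runnable task.
        self.bitmap = 0

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit: the highest-priority non-empty level.
        # The work is bounded by NUM_LEVELS, not by the number of tasks.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        # Take the head of that level's queue without looking at the rest.
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task
```

In this sketch, tasks at the same level simply run in arrival order, which is the round-robin case raised above.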
The mechanism for recalculation of timeslices in previous Linux kernels was very simple. When every process had its timeslice completely depleted (they were all 0), the kernel would simply go through every process and recalculate its timeslice and start execution again at the highest priority runnable process. While this is the most obvious solution, it is also very inefficient, executing in O(n) time.
OK, it's easy to see why this is O(n).
The 2.6 scheduler uses a simple yet effective method for getting rid of this problem: it uses two priority arrays! One priority array is for processes that are runnable, and one priority array is for processes that are not runnable (they have depleted their timeslice). This way, when a process has depleted its timeslice, the scheduler simply recalculates its timeslice, removes it from the active array, and inserts it into the expired array.
How is this not O(n)? The time slice calculation still occurs for each process, just not all at once for all processes. Each process still gets its time slice calculated, it is removed from one queue, and inserted into another. Is there some other unmentioned trick that eliminates the calculations? Or was there something else that made the 2.4 scheduler O(n), such as finding the highest priority process?
So when all processes have depleted their timeslices there is no need to recalculate timeslices for every process, the two arrays are just switched (for the code oriented among us: they are accessed via pointers and the pointers are simply switched).
So the calculation is done per process as they finish their time slice, rather than at the end when all the processes are done. I still don't see why this would imply better efficiency. Am I missing something?
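A rough sketch of the difference, under stated assumptions: the total recalculation work over a full round is still proportional to the number of processes, but with two arrays no single scheduler call has to walk all of them. The names `Task`, `recalc_all`, `TwoArrayQueue` and `DEFAULT_SLICE` are hypothetical, and the sketch models only one priority level.

```python
from collections import deque

DEFAULT_SLICE = 100  # assumed timeslice in ticks; hypothetical value


class Task:
    def __init__(self, name):
        self.name = name
        self.timeslice = DEFAULT_SLICE


def recalc_all(tasks):
    """Older approach as described above: one O(n) pass over every task."""
    for t in tasks:
        t.timeslice = DEFAULT_SLICE


class TwoArrayQueue:
    """Single priority level only, to keep the sketch short."""

    def __init__(self, tasks):
        self.active = deque(tasks)
        self.expired = deque()

    def expire_current(self):
        """Called when the task at the head has used up its timeslice."""
        task = self.active.popleft()
        task.timeslice = DEFAULT_SLICE  # recalculated for this one task only
        self.expired.append(task)
        if not self.active:
            # Swap the two references; there is no loop over tasks here.
            self.active, self.expired = self.expired, self.active
```

Each call to `expire_current` does a fixed amount of work however many tasks exist, whereas `recalc_all` stalls for a time that grows with the task count.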
At any rate, thanks for the link, it was much more informative than the published article.
This is more of a virtual memory or disk scheduling issue than process scheduling. Even if a process has full access to the CPU, it still can't run if the code or data it needs is not paged into RAM, or has to be loaded from disk in the first place.
Especially in Windows, it seems like if a process is hammering the disk in the background, other processes can be starved from accessing the disk. Windows seems especially braindead at times when multiple processes fight over the disk, and the disk is wildly seeking all over the place and not accomplishing much - it would be much more efficient if it would just devote the disk for a longer time to each request.
I don't know what Windows has in it for disk I/O scheduling, but it's probably safe to say that Linux is a bit more intelligent in this regard.
The article was okay, I guess. Very, very elementary. I would assert that most schedulers are O(1), though. It's really nothing new. This article was probably good for someone who never heard of a scheduler or worked with one.
Although the article was about scheduling, I would have liked to see some more text about Big-O notation and why it is important to programmers. To me, any good programmer should have a good concept of Big-O notation, how it relates to traversing data structures (such as trees), et cetera.
The article did not make a point of why good schedulers use priority queues, which is simple, really. All priority queues are O(1) for getting the next item in the queue. It has nothing to do with scheduling; it is the nature of the priority queue itself as a data structure.
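A small sketch of what "getting the next item" costs, assuming Python's standard `heapq` binary heap as the priority queue. Looking at the next item is O(1), but removing it from a heap is O(log n). Constant-time removal needs extra structure, such as a fixed number of priority levels. The task names are made up.

```python
import heapq

# (priority, name) pairs; a lower number means a higher priority.
heap = []
for prio, name in [(5, "editor"), (1, "audio"), (3, "compiler")]:
    heapq.heappush(heap, (prio, name))  # O(log n) per insert

next_up = heap[0]              # peek at the minimum: O(1)
removed = heapq.heappop(heap)  # remove the minimum: O(log n), heap is re-sifted
```

Here `next_up` and `removed` are the same entry, and only the removal has a cost that depends on the size of the heap.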
I would have liked to have seen the article in its whole form. I wonder if it went into more depth about Big-O, et cetera.