ArsTechnica Explains O(1) Scheduler
geogeek writes "The recent release of Linux's 2.6 kernel introduces a number sweeping improvements. These can be hard to understand without a background in programming. This week's Linux.ars examines the improved scheduler for an enthusiast audience, concisely explaining its nuances and the practical effects of an O(1) design."
I'm the author of that article..
The original article was MUCH MUCH more in depth it was toned down because of the "glaze over" effect people were getting. I didn't have much say in the editorial process in the matter.
I think this way atleast it appeals to a broader audience, not just the CS audience who would know what a binary min heap is if I were to use it.
To each his own I suppose.
- Lean and mean (low overhead).
- Scales well with the number of tasks (O(1)).
- Scales well with the number of processors (O(1) for scheduling, O(N) for balancing).
- Strong affinity avoids tasks bouncing needlessly between CPUs.
- Initial affinity makes it likely that request/response-type tasks stay on the same CPU (i.e., good for LMbench lat_udp et al)
BTW, It's good to see that the starvation and affinity problems that plagued the early versions of the O(1) scheduler have been ironed out.Regarding "dumbing it down":
There is one exception to this load balancing rule -- some special processes may be fixed to a certain runqueue. This attribute is called "thread affinity;" a thread is simply another name for a process.
I think it over-simplified this a bit. Granted, Linux doesn't make all that much distinction between a "thread" and a "process" as compared to (say) Windows, but the distinction is still important.
Otherwise, it's a decent article, and gave me some insight to one of the major improvements in 2.6 (which I've yet to try out).
I remember when the "pre-empt" patch came out for 2.4, before it was integrated into the kernel. For a desktop (or otherwise interactive, like my media box) system, it really made a major perceived improvement in performance (while *actual* performance wasn't affected; it was just about priorities and scheduling, making it "feel" more responsive).
The O(1) scheduling algorithm likely improves this further, and is enough to tempt me to spend XMas evening upgrading... these things are important, especially on a desktop system.
Windows XP, for example, isn't all that great in this area, and often one process can too easily slow things down. This is even further emphasized when all the processes you're working with are actually the main "explorer.exe" process; eg, you do something in Windows Explorer that blocks (spin up a CD drive) and everything else (the "desktop", task bar, etc) all become unresponsive...
I enjoy seeing such improvements in the Linux kernel. The pre-empt patch (later integrated into the mainstream kernel) made a drastic usability improvement (especially when the box is also a web server, MySQL server, firewall/router, etc) and I couldn't see a Windows box handling similar tasks without becoming unusable (note: not unstable, just not practical to use when responsiveness is hindered).
I think the kernel team is on the right track with this, specifically for desktop applications (though it does help in many other applications as well; anything interactive, really, even over the network/Internet).
NGWave - Fast Sound Editor for Windows
http://67.163.90.104/~jim/scheduler.html Thats the completely unedited first draft its all I could find. That has zero editing done, and is even missing a quote up at the top. It goes further in depth, but as you can see...my unedited writing isn't so great ;)
O(1) doesn't mean the time is constant, but that in the limit it is bounded by a constant. It can get enourmous, then shrink for large inputs. It could go to 0 in fact.
Mainframe schedulers have been O(1) for a long time. The one for UNIVAC 1108 EXEC 8 was O(1) in 1967. (And you can still buy a Unisys ClearPath server with that code in it.)
The traditional implementation is a set of FIFO queues, one for each priority. For systems with only a few priorities, that's just fine. As the number of different priorities increases, you spend too much time checking the empty queues, so there are tree-like structures to get around that problem.
Scheduling today is more complex because of cache issues. If you context-switch too much, you thrash in the CPU caches. On the other hand, everybody has so much memory today that paging is mostly unnecessary.
Let's see if I can remember anything from my algorithms class.
How is this not O(n)?
Read on for an explanation. I do believe you gave the answer to your own question in your post. I will also assume that you understand "Big-O" notation well enough to follow the answer.
Each process still gets its time slice calcuated, it is removed from one queue, and inserted into another.
Correct. You've nailed it. Insertion into a Que operates in O(1) time. Removal from the Que is also a constant time algorithm. A Que inserts at the tail of the list (or the end of the array) and De-Ques from the Head of the list (or the beginning of the array). Ques do not permit searching, sorting or accessing the data after the first element.
If we De-Que a process, run it then calculate some information and Que it up that would run in O(c1*O(1)+c2*O(1)) = O(1).
This is still an oversimplification and does not take into account the possibility of a Priority Que in which insertion runs in O(lg n) and De-Queing runs in O(1). So a heap sort using a Priority Que runs in O(n*lg n).
Hope that helps.