ArsTechnica Explains O(1) Scheduler

← Back to Stories (view on slashdot.org)

ArsTechnica Explains O(1) Scheduler

Posted by michael on Thursday December 25, 2003 @11:00AM from the time-to-kill dept.

geogeek writes "The recent release of Linux's 2.6 kernel introduces a number sweeping improvements. These can be hard to understand without a background in programming. This week's Linux.ars examines the improved scheduler for an enthusiast audience, concisely explaining its nuances and the practical effects of an O(1) design."

11 of 295 comments (clear)

Ars' Piece by WarpFlyght · 2003-12-25 11:03 · Score: 5, Interesting

I read this piece yesterday, and while it did "dumb down" the basics (as the first poster noted), I thought it did a very good job of putting it all into a nutshell that those of us not as familiar with Big-O and schedulers in general might easily understand. For Linux.Ars' format, I thought it was of appropriate length, and had enough detail to "belong." I'm sure there are more detailed writeups on the O(1) scheduler in place in 2.6. Does anyone have any links?

--

"Aye, and if my grandmother had wheels, she'd be a wagon!" -- Montgomery Scott, ST:III
1. Re:Ars' Piece by sfe_software · 2003-12-25 11:50 · Score: 5, Informative
  
  Regarding "dumbing it down":
  
  There is one exception to this load balancing rule -- some special processes may be fixed to a certain runqueue. This attribute is called "thread affinity;" a thread is simply another name for a process.
  
  I think it over-simplified this a bit. Granted, Linux doesn't make all that much distinction between a "thread" and a "process" as compared to (say) Windows, but the distinction is still important.
  
  Otherwise, it's a decent article, and gave me some insight to one of the major improvements in 2.6 (which I've yet to try out).
  
  I remember when the "pre-empt" patch came out for 2.4, before it was integrated into the kernel. For a desktop (or otherwise interactive, like my media box) system, it really made a major perceived improvement in performance (while *actual* performance wasn't affected; it was just about priorities and scheduling, making it "feel" more responsive).
  
  The O(1) scheduling algorithm likely improves this further, and is enough to tempt me to spend XMas evening upgrading... these things are important, especially on a desktop system.
  
  Windows XP, for example, isn't all that great in this area, and often one process can too easily slow things down. This is even further emphasized when all the processes you're working with are actually the main "explorer.exe" process; eg, you do something in Windows Explorer that blocks (spin up a CD drive) and everything else (the "desktop", task bar, etc) all become unresponsive...
  
  I enjoy seeing such improvements in the Linux kernel. The pre-empt patch (later integrated into the mainstream kernel) made a drastic usability improvement (especially when the box is also a web server, MySQL server, firewall/router, etc) and I couldn't see a Windows box handling similar tasks without becoming unusable (note: not unstable, just not practical to use when responsiveness is hindered).
  
  I think the kernel team is on the right track with this, specifically for desktop applications (though it does help in many other applications as well; anything interactive, really, even over the network/Internet).
  
  --
  NGWave - Fast Sound Editor for Windows
2. Re:Ars' Piece by Sloppy · 2003-12-25 15:45 · Score: 5, Interesting
  
  Windows XP, for example, isn't all that great in this area, and often one process can too easily slow things down. This is even further emphasized when all the processes you're working with are actually the main "explorer.exe" process; eg, you do something in Windows Explorer that blocks (spin up a CD drive) and everything else (the "desktop", task bar, etc) all become unresponsive...
  
  I actually saw this behavior once on a customer's machine. I closed the CD door, and the machine went unresponsive for several seconds. I was shocked. Then I realized: millions of people think this is normal. And those millions of people live in 2003, not 1983. It was like some sort of Lovecraftian revelation.
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
3. Re:Ars' Piece by spectecjr · 2003-12-25 22:02 · Score: 5, Insightful
  
  Windows XP, for example, isn't all that great in this area, and often one process can too easily slow things down. This is even further emphasized when all the processes you're working with are actually the main "explorer.exe" process; eg, you do something in Windows Explorer that blocks (spin up a CD drive) and everything else (the "desktop", task bar, etc) all become unresponsive...
  
  The kernel hasn't slowed down in that case; it's the GUI layer which has slowed down. It hit a bottleneck.
  
  Interestingly, it would appear that Windows has used an O(1) scheduler in Windows NT and up since at least NT 4.0 (which came out in 1996).
  
  It's also interesting to note that it has the same priority-boosting for IO threads that Linux has.
  
  "Windows NT also uses additional priority adjustments based on other events, such as momentarily boosting thread priority when it returns from an I/O call"
  
  Detailed explanation of the NT4.0 scheduler
  
  Easier to read version of the above
  
  Part 1 of a slightly more indepth (but in some places inaccurate)* article
  
  Part 2 of the article
  
  *The article makes it sound like it's O(N) in places; it's not. The scheduler is using a set of priority lists, which are kept in-order at insertion time when a thread exits its context, by the thread inserting itself at the end of the list for its priority.
  
  --
  Coming soon - pyrogyra
Hmm. by Anonymous Coward · 2003-12-25 11:11 · Score: 5, Interesting

I clicked on this article expecting to see an explanation of how it was that the O(1) scheduler worked, and by what tricks it was able to schedule in O(1) rather than having to spend extra time as extra processes are added, and what the real-life effects of such a situation are.

Instead the article was just "This is what O(1) is. This is what a scheduler is. This is what "preemptive" means. The 2.6 SMP can load-balance effectively across many processors. I used this really cool mp3 player this week." and just about nothing else.

Anyone want to explain exactly how it is that an O(1) scheduler is a difficult thing, and how exactly linux 2.6 achieves it?

-- Super Ugly Ultraman
1. Re:Hmm. by AmirPC · 2003-12-25 11:21 · Score: 5, Informative
  
  I'm the author of that article..
  
  The original article was MUCH MUCH more in depth it was toned down because of the "glaze over" effect people were getting. I didn't have much say in the editorial process in the matter.
  
  I think this way atleast it appeals to a broader audience, not just the CS audience who would know what a binary min heap is if I were to use it.
  
  To each his own I suppose.
2. Re:Hmm. by AmirPC · 2003-12-25 12:01 · Score: 5, Informative
  
  http://67.163.90.104/~jim/scheduler.html Thats the completely unedited first draft its all I could find. That has zero editing done, and is even missing a quote up at the top. It goes further in depth, but as you can see...my unedited writing isn't so great ;)
Interesting by Gary+Whittles · 2003-12-25 11:31 · Score: 5, Informative
The scheduler is an important part of the kernel, so this is an important thing to understand. This article describes it well, but just to summarize a bit more technically, the benefits of the O(1) scheduler are:
- Lean and mean (low overhead).
- Scales well with the number of tasks (O(1)).
- Scales well with the number of processors (O(1) for scheduling, O(N) for balancing).
- Strong affinity avoids tasks bouncing needlessly between CPUs.
- Initial affinity makes it likely that request/response-type tasks stay on the same CPU (i.e., good for LMbench lat_udp et al)
BTW, It's good to see that the starvation and affinity problems that plagued the early versions of the O(1) scheduler have been ironed out.
--
- 3 replies beneath your current threshold
Ars got it slightly wrong about O(1)... by andy666 · 2003-12-25 12:06 · Score: 5, Informative

O(1) doesn't mean the time is constant, but that in the limit it is bounded by a constant. It can get enourmous, then shrink for large inputs. It could go to 0 in fact.
UNIX catches up with mainframes by Animats · 2003-12-25 13:13 · Score: 5, Informative

For historical reasons, revolving around the "process table" design in PDP-11 UNIX, the schedulers in UNIX-like OSs were originally very dumb. The one in V6 UNIX was so bad that if three processes were compute-bound, the scheduler would stall.
Mainframe schedulers have been O(1) for a long time. The one for UNIVAC 1108 EXEC 8 was O(1) in 1967. (And you can still buy a Unisys ClearPath server with that code in it.)
The traditional implementation is a set of FIFO queues, one for each priority. For systems with only a few priorities, that's just fine. As the number of different priorities increases, you spend too much time checking the empty queues, so there are tree-like structures to get around that problem.
Scheduling today is more complex because of cache issues. If you context-switch too much, you thrash in the CPU caches. On the other hand, everybody has so much memory today that paging is mostly unnecessary.
Re:Question by krakrjak · 2003-12-25 18:15 · Score: 5, Informative

Let's see if I can remember anything from my algorithms class.

How is this not O(n)?

Read on for an explanation. I do believe you gave the answer to your own question in your post. I will also assume that you understand "Big-O" notation well enough to follow the answer.

Each process still gets its time slice calcuated, it is removed from one queue, and inserted into another.

Correct. You've nailed it. Insertion into a Que operates in O(1) time. Removal from the Que is also a constant time algorithm. A Que inserts at the tail of the list (or the end of the array) and De-Ques from the Head of the list (or the beginning of the array). Ques do not permit searching, sorting or accessing the data after the first element.

If we De-Que a process, run it then calculate some information and Que it up that would run in O(c1*O(1)+c2*O(1)) = O(1).

This is still an oversimplification and does not take into account the possibility of a Priority Que in which insertion runs in O(lg n) and De-Queing runs in O(1). So a heap sort using a Priority Que runs in O(n*lg n).

Hope that helps.