Preemptible Linux Kernel: Interviews and Info

Public Service Announcement from Brokaw & Torv by waldoj · 2001-10-14 06:33 · Score: 5, Funny

We're sorry, but tonight's "Linux" will not be aired. Normally you would find 2.4.12 or 2.4.13-pre2 on Sunday nights, but not this evening. Now that Linux is fully preemptible, NBC will be airing a four-hour music-and-ice-skating tribute to Bill Gates.

We apologize for any inconvenience, and for the reduced uptime. Enjoy the show.

Re:Hmm by selectspec · 2001-10-14 06:53 · Score: 5, Informative

The interrupt handlers can't allow premption during the context switch of an interrupt because the registers are intransit. Basically, you can't have an interrupt while your in the process of any kind of context switch otherwise you're never sure what registers you were able to flush to and from the CPU to the stack.

Critical Sections (such as access to the IP stack or I/O queues) have to be protected. With the advent of multi-processor systems under the SMP scheme, there is already considerable locking within the kernel to synchronize access of critical resources between processors. Critical regions also need to be protected from interrupt concurrent access as well.

Bottom Half handlers generaly are fast track implementations to quickly deal with the interrupts. To avoid concurrency collisions of reasources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.

All in all, this is basic non-preemptive stuff. What I don't understand is that this strategy that he is defining is a textbook NON-premtive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully-preemptive here.

--

Someone you trust is one of us.

Background and a different patch by alewando · 2001-10-14 06:55 · Score: 5, Informative

If you're wondering what the heck a preemptive kernel entails, then here's some background.

Also, if you don't like Robert Love's implementation, then Andrew Morton maintains a patch with a similar low-latency goal.

Links on Spinlocks, etc by Alien54 · 2001-10-14 06:59 · Score: 5, Informative

There are these links:

Linux Devices Article detailed, very nice, on this issue from Sept 6
Kernel Hacking How-to Page on this again, very detailed.
The Kernel SpinLock metering FAQ

All around useful stuff, enough to get you started destro^H^H^H^H^H^H hacking your own kernel

--
"It is a greater offense to steal men's labor, than their clothes"

Re:So will that make Linux a superior audio platfo by Xoro · 2001-10-14 07:05 · Score: 5, Informative

I don't want to sound like I'm contradicting you, but did you happen to read this link from the article? It's specifically about realtime audio. Key paragraph:

*EXCITING* NEWS: things getting almost perfect ! Ingo's lowlatency-2.2.10-N6 patch with the shm.c part backed out and a modification of filemap.c (thanks to Roger Larsson) performs _REALLY_ well, using my usual latencytest parameters (4.3ms buffer), I got NO DROP-OUTS anymore, with sporadic maximum peaks of ONLY 2.9ms This is really exciting because it opens the doors to a whole new class of Realtime applications for Linux, simply using userspace processes scheduled SCHED_FIFO. I heard of comparable low-latencies only from BEOS, Windows can't simply guarantee these kind of latencies, not even using DirectX. Using a soft-synth on Win98 on my BOX I must use 15-20ms audio buffers to get _SOMEWHAT_ reliable audio. This is actually about more than 3-4times the buffer I used for testing under Linux ( 4.35ms).

I don't know much about the field, but the page seems to speak to several of the audio-related concerns mentioned above.

--
Kill, Tux, kill!

Re:Hmm by Anonymous Coward · 2001-10-14 07:09 · Score: 5, Funny

Only on slashdot would "IP stack", "I/O queues", "interrupt concurrent access" and "SMP" be considered laymans terms.

Re:Hmm by sagei · 2001-10-14 07:54 · Score: 5, Informative

I originally felt I should stay out of any discussion here, but I want to answer some of these questions and clear some of this stuff up. To be honest, it is a little embarrassing having everyone read and comment on the interview. :)

Bottom Half handlers generaly are fast track implementations to quickly deal with the interrupts. To avoid concurrency collisions of reasources used within the bottom half handlers, interrupts (for that particular handler) must be disabled during the handler's execution.

Interrupts, even just the in question, are not disabled during a bottom half, at least in general. The reason we can't preempt bottom halves is that they are guaranteed to be serialized w.r.t CPUs (ie a given BH runs on only one CPU at a time). Because of this, the BHs are designed without a regard reentrancy. So we can't preempt them.

All in all, this is basic non-preemptive stuff. What I don't understand is that this strategy that he is defining is a textbook NON-premtive approach to kernel design. I'm not too sure where he gets off claiming that the kernel is fully-preemptive here.

Hardly. Would you say an SMP system is not SMP if it is non-concurrent inside critical sections? No, you wouldn't, and this is the same situation we have here with preemption. We can't preempt inside critical regions. We have concurrency and reentrancy concerns, just like SMP does. We also can't preempt inside interrupt handlers or bh's because they aren't designed to be preempted (nor would you want to interrupt the top half of an interrupt, anyhow).

The current kernel is not preemptive _anywhere_. The only way, in fact, kernel code ever yields execution is if it explicitly does so or returns. Since with the preempt-kernel patch we can now preempt in 90% of the kernel, I think its safe to say we have a preemptible kernel now.

--

Robert Love

Re:I'm not sure... by sagei · 2001-10-14 09:09 · Score: 5, Informative

Disclaimer: It's my patch

I think this is a good short-term solution for the latency problems but I personally wouldn't include it in the main kernel releases. I believe that it *might* be a good idea to fork the kernel releases (temorarily) in two groups: One for servers and one for workstations until the problems have been solved.

I tend to look at this more of a long-term solution, and I think people who see it has a short-term solution or hack are missing the point. First, this is a feature. We aren't kludging kernel code so that we can lower latency by stopping it when needed. We are effectively using the SMP code to multitask better within the kernel.

Second, forking the kernel over this is a terrible idea. Since it is a config setting, this is a non-issue anyhow, but I really don't want to see this thing forked off. In fact, I think the ideal situation is where we can get a preemptible kernel that benefits throughput so that server processes benefit from it as well.

I think that (for now) using this patch on workstations is a pretty good idea

Agreed :)

And I think there should be a better solution for the problem witch should THEN be something along the lines of kernel 3.0

There isn't a better solution that is not a hack. There is a reason Solaris, NT, and all RTOS are preemptible inside the kernel: it is the only way to achieve real-time response. You just _have_ to be able to respond to events when needed.

The "better" solutions in this case are "simpler" -- if we can hack some conditional schedules into places, perhaps simplify some algorithms, etc. then we can perhaps reduce latency without preemption. This is what Andrew Morton's low-latency patches do. But we need more. The point is not that preempt-kernel is a hack, but that it is a whole new high-tech feature, and some people want to find a simpler solution.

Personally, I don't think a simpler solution exists, and I believe the preemptive kernel satisfies other problems (and it also a neat feature:>). Thus I work on it.

--

Robert Love

Re:yes, but why? by sagei · 2001-10-14 09:39 · Score: 5, Informative

Disclaimer: It is my patch

but why does anyone need better latency? afaikt, the latency here is strictly for people who want to do RT audio effects. this has nothing to do with audio playback , which has no latency sensitivity (because of buffering). this also has nothing to do with "feel", since humans are terribly slow, and cannot possibly feel the difference between 5 and 10ms.

You ever have an mp3 skip? Audio become out of sync in a game? That is caused by scheduling latencies becoming greater than the duration of the audio buffer. Ie, audio playback does not just need x units of CPU but it also needs it every y units of time. The preempt-kernel patch helps alleviate this.

I hope that Linus will look at whether these patches hurt the normal case. "normal" means things like kernel compilation, not just an arbitrary latency measure and dbench (one of the least realistic benchmarks possible!)

Not only does preempt not hurt a kernel compile, but it helps it. I and many users have benchmarks. One of my requests from users is to get a lot of benchmarks and "feelings" so I can substantiate the patch. I am _not_ an audio guy. I use my Linux machine to code, go on the net, etc. just like 90% of the people here. Preemption helps me. I don't want to hurt the common case either.

Even so, it is a configure item. Merging it into the kernel does not equate to you having to use it. But I bet you would want to!

there are good reasons to be skeptical of all-out premptiveness: it will unavoidably lower throughput in easy-to-define cases. any intro OS text will talk about optimal scheduling, where 'optimal' requires a definition of throughput or some other metric.

The cases in which we lower throughput are cases in which file I/O is favored since it runs until completition. In this case, you can extend that argument to be that I/O-intense tasks should just be cooperatively scheduled. An I/O task won't be preempted unless its timeslice has run out (ie, it should be preempted, and it would be if it were in userspace). If the I/O is so critical, run it at a higher priority. Hell, maybe we should look into a higher timeslice.

Note that a lot of this is a non-issue, since we don't affect throughput (or actually improve it!) In the cases throughput is decreased, it is just a couple of percent, which could be cost-benefited to the increase in response some other application gets.

we need to set a target (5ms would be fine imo) and meet it. going beyond such a goal will hurt the normal case.

This is very very true, and an insightful point. One of the problems with this whole latency quest is that eventually we are going to reach some point and have to decide if enough-is-enough. We can always keep doing more and eventually the work _is_ going to be detrimental to the common-case. I agree we need to set a threshold and celebrate when we reach it. The super-special situations needing much lower latency can apply super-special solutions.

--

Robert Love

Re:needed badly by chabotc · 2001-10-14 10:57 · Score: 5, Insightful

Actualy 9 out of 10 cases when that happens, and the hardware is locked up, it will have locked up the PCI bridge as well (they have to to communicatie), so this wont do anything.

Also if the systeem feels locked up, and its not a hardware lock, there's a good chance its the tty/console subsystem thats killed.

only in a few cases, where a run-away process would deal out so much of a beating to the system, then the better multithreading will help in the way you described.

(ps, telnetting in is always a good work around for a system with a dead keyboard/console :P)

Re:I'm not sure... by sagei · 2001-10-14 12:28 · Score: 5, Informative

I thought that what (certain) kernel hackers really objected to is preemption while locks are held. The complications (eg priority inversion) they talked about seem only to arise in that case.

There are a few reasons other hackers complain, although I didn't know this was one of them. Since MontaVista's original preemptive kernel work, I believe, we have never preempted inside of locks. Note that you can, but then you reach the issues with deadlocks and thus the need for priority-inversion that you spoke of.

So, first, does "fully-preemtive" traditionally mean with or without locks? Are Solaris, NT, and RTOS preemtible when locks are held?

I would say it means sans locks. None of the mentioned OS's are preemptive while holding a lock. You always have to respect the lock. Now, you can preempt during the lock and go do other things. If you do this, you are assuming the lock is going to be held long (or else it is favorable to just spin for a cycle or two). In this situation you want to use semaphores, which we _do_ preempt during.

When a process hits a semaphore that is in use, it goes to sleep and something else continues. The process awakes when the resource is available. Now we reach the problem you wrote of above: priority inversion. What if task A holds resource Y and sleeps waiting for resource X and task B holds resource X and sleeps waiting for resource Y? You deadlock.

Thus we need to use a type of semaphore called a priority-inheriting mutex, which inverts the priority of the task holding a resource so it will always complete and release the lock. I know Solaris has these. However, I would consider any kernel that can preempt itself in general a preemptible kernel.

Second, observed results aside, what reason do you have to believe that preempting the lock-less parts of the kernel is "good enough". All else equal, one would expect the latency distribution to be similar with and without locks, so you would expect plenty of "worst cases" to occur with locks. Of course, there is already a pressure to reduce the time that critical locks are held, but I wouldn't be surprised to see non-contended locks (especially outside the kernel core) held for long times. So is there a good reason that the important "worst cases" are happen without locks?

First, before I cast results aside, let me mention that observations show we are already lowering latency a great amount. But, you are right, periods in which locks are held are a problem. This is why I mentioned in the interview the use of things like Andrew Morton's low-latency patch, the preempt-stats patch (for finding the locks), etc.

Some of the problems still occur while locks are held, but thankfully the point of a spinlock is that they are held for a VERY short time. A solution to this may be to replace the spinlocks held for a long time with a priority-inhereting mutex.

--

Robert Love

Slashdot Mirror

Preemptible Linux Kernel: Interviews and Info

11 of 238 comments (clear)