Robert Love, Preemptible Kernel Maintainer Interviewed
Tom F writes: "LinuxDevices did an interesting interview with Robert Love, the maintainer of the Linux preemptible kernel along with MontaVista. It is an exciting read and has Robert's usual wit and insight."
I do believe that what Love is working on is in fact a patch, and not a fork.
Like all true geeks, Love doesn't forget to include in his comments that, despite being a computer nerd, he does, in fact, have a girlfriend.
RL: Approximately how much time per week do you spend working on your kernel patch for Linux?
Love: My girlfriend would probably say too much. Anywhere from a couple hours a week to many hours a day.
Obviously, you want someone that knows the kernel really well and can maintain every part of it.
That person doesn't exist; not even Linus knows every part of the kernel inside and out.
You have to trust the maintainers of their respective parts of the kernel, because as good as Linus, Marcelo, Alan, etc. are, they can't know all the gotchas of every driver and kernel subsystem.
Aren't you taking a good developer (who can maintain every part of the kernel) away from the newer versions of the kernel?
They don't have to do it if they don't want to or don't have the time. But Alan's recent wish to stop maintaining 2.4.x so he can work on other things says a lot about how much time maintaining a kernel tree really requires.
There are only so many developers, what happens if you run out?
I highly doubt there will ever be that many currently maintained kernel versions.
If you maintain different kernels, people say "OHMYGOD we are forking we will all DIE"
If you roll changes into a kernel and make it unstable, people say "OHMYGOD production kernel's not stable we will all DIE"
RTFA and lighten up. The patches are being considered for 2.5. They haven't been ruled out.
>the Linux operating system will soon end up like
>*BSD, with several mutually incompatible,
> infighting factions. We can't let this happen.
Where on Earth did you get this nonsense from? There are three open-source BSDs, each with a different focus. Admittedly a few OpenBSD and NetBSD developers aren't the best of chums, but most of the developers are happy to share code between projects (e.g. dirhash, SMP, USB, etc.).
--Jon
And right now Linus hasn't officially let this and several other patches into 2.5 because he's focusing on the bio changes, and it's much harder to debug when multiple subsystems are changing at once. Linus has also shown interest in Ingo's new O(1) scheduler, which the preempt patches have only recently become compatible with.
Remember when you had to compile your device drivers into the kernel yourself instead of using a module? The idea here is that the vital OS features are part of the kernel.
The open source movement is about modifying your software and sharing it. Anyone with the ability can modify a vital OS feature and share it. voila! Many, many kernels.
But the problem here is that real-time processing does not belong in a macrokernel architecture. Look at commercial RTOSes (real-time operating systems) like QNX and you will see that a microkernel architecture, a kernel that provides a minimal feature set, is favored. This is because if you depend on time constraints, all you want in your kernel is message passing and task synchronization.
I'm supposed to be working right now.
a quick nice -n -5 /usr/bin/X11/X helps, too.
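If X is already running, the same effect can be had by renicing the existing process rather than restarting it under nice. A minimal sketch using Python's os.setpriority; the wrapper name renice is hypothetical, and it assumes a Unix system where negative (higher-priority) values need root or CAP_SYS_NICE:

```python
import os

def renice(pid: int, niceness: int) -> bool:
    """Set the nice value of an already-running process, much like renice(8).

    Returns False instead of raising when the kernel refuses, which is what
    happens for negative (higher-priority) values without root/CAP_SYS_NICE.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
    except PermissionError:
        return False
    return True
```

Usage would be something like renice(x_server_pid, -5), where x_server_pid is a placeholder for the X server's PID (for example, as reported by pidof).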
I have been following these two patches a bit, because I want my desktop to be as snappy and responsive as possible.
The current low-latency patches work by finding "hot spots" in the kernel code where something is taking a long time, and then putting in a hack to make the code yield. The good part is that you can get a really low-latency kernel; the bad part is that you have to touch the kernel code in hundreds of places, and the kernel code gets really ugly. I remember reading that Ingo Molnar, who wrote a giant low-latency patch that worked this way, agreed with Linus that his low-latency patch was just too ugly and huge and should not be included in the main source tree.
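The "make the code yield" approach can be modeled as a toy: a long, otherwise non-preemptible loop gets a yield point every N units of work, so the longest stretch anyone else has to wait is bounded by N rather than by the whole loop. This is a simplified sketch, not code from any patch; a real patch would typically check whether a reschedule is actually pending instead of yielding unconditionally, and would need such a point in every hot spot, which is where the "hundreds of places" come from.

```python
from typing import List

def run_with_yield_points(n_items: int, yield_every: int) -> List[int]:
    """Return the lengths of the uninterrupted stretches of work
    between yield points in a loop over n_items units of work."""
    stretches: List[int] = []
    current = 0
    for i in range(1, n_items + 1):
        current += 1                      # one unit of non-preemptible work
        if i % yield_every == 0:
            stretches.append(current)     # yield point: scheduler may switch here
            current = 0
    if current:
        stretches.append(current)         # tail after the last yield point
    return stretches
```

With yield_every=64, a 1000-item loop never runs more than 64 units without a chance to reschedule; without the hack (yield_every larger than the loop) it runs all 1000 at once.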
The preemption patch is comparatively small and elegant. It leverages the work that has already been done to make SMP work correctly. I'm using it on my Linux desktops, and I like it.
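The way the preemption patch piggybacks on the SMP work can be illustrated with a toy user-space model loosely based on its per-task preemption counter: taking a spinlock (a region already marked unsafe for SMP) bumps the counter, and the kernel may only preempt when the counter is back at zero. Class and method names here are illustrative, not the patch's actual code.

```python
class Task:
    """Toy model: preemption is allowed only while preempt_count == 0,
    and taking a spinlock increments the count."""

    def __init__(self) -> None:
        self.preempt_count = 0
        self.need_resched = False
        self.preemptions = 0

    def preempt_disable(self) -> None:
        self.preempt_count += 1

    def preempt_enable(self) -> None:
        self.preempt_count -= 1
        # Leaving the outermost critical section is itself a preemption point.
        if self.preempt_count == 0 and self.need_resched:
            self._schedule()

    # On SMP these would also spin on a real lock; on UP only the count matters.
    def spin_lock(self) -> None:
        self.preempt_disable()

    def spin_unlock(self) -> None:
        self.preempt_enable()

    def wake_higher_priority_task(self) -> None:
        """E.g. an interrupt made a more important task runnable."""
        self.need_resched = True
        if self.preempt_count == 0:
            self._schedule()   # preempt immediately, even in kernel code

    def _schedule(self) -> None:
        self.need_resched = False
        self.preemptions += 1
```

The design point is that no new "safe spots" have to be found by hand: every place SMP already protects with a lock is automatically a no-preemption region.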
On one of the mailing lists, Linus said that he wants the Linux kernel to gain low latency the cleanest way: find all parts that are slow, and instead of hacking them to yield, rewrite them so they are faster (but still clean code that is easy to understand). This is of course the ideal, but when will it be finished? The preemption patch is available now, and works now, and I am using it now.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
> There are only so many developers, what happens if you run out?
If the supply of Linux kernel developers runs out (as unlikely as that seems), then kernel developers get hired away from different kernels.
Seriously... there are folks who, for commercial reasons, need a lot of this work done, and a developer with real-time experience from a different UNIX kernel can become familiar with Linux within a reasonable amount of time.
For instance, quite a few of the folks MontaVista has hired to do kernel work come not from a Linux background but from working on other Unices; one fellow we hired to work on preemptibility originally did the same kind of work for IRIX, IIRC.
Alan Bawden wrote a paper on it, and it's quite a good read. His web site has a compressed .gz version, but I found an HTML version of the PCLSR paper, and I quote from its abstract here:
There was also a way to put the system into a PCLSR test mode that exercised all these control points within the system calls, to help debug them. See the SYSDOC TEST documentation extracted from the now decommissioned AI PDP-10 that originally served it up as ftp://ftp.ai.mit.edu/pub/alan/its/sysdoc.tgz (yes, ITS was on the Arpanet and the Internet and ran TCP/IP as well).
> ...Alan Cox's own fork which is steadily separating from Linus' core...
Well, Alan isn't really in the business of doing this anymore. But even when he did, that's not the way things worked -- the Alan and Linus trees stayed fairly in sync with each other in many ways.
What in the heck do you think this IS? Many of the best students treat developing their coding and CS skills more like a musician or artist practicing and performing for many hours a day. It's a creative act, and can be very involving. Moreover, that level of involvement hones problem-solving and practical skills that the "just a job" students can never hope to achieve.
I always wondered about students who didn't have any passion for the field. From what I've seen in both academia and industry, that "just a job" mentality reduces one's skills to "programming fodder", and would seem to be a pretty unenjoyable career.
Studying comes after Kernel Hacking and right before Binge Drinking. If we're not lucky, then the Kernel Hacking would come after or during the Binge Drinking
That's complete nonsense. The behaviour you're describing is what any multitasking kernel does. It has nothing to do with the preemptive kernel patch, which ensures that when a task becomes runnable (for example, because a device interrupt tells the CPU that a condition the task was waiting for has been met), it can start running ASAP, rather than waiting until the next regular rescheduling point. ASAP here means within at most a millisecond or two. There will always be places where the kernel has to prevent this from happening; the goal is to minimize (and document) those places.
I think you'll find that Linux has preemptive multitasking too. What it can't do (without the preempt patch) is preempt a task that is currently running kernel code (e.g. through a syscall). I've no idea whether the Amiga's "kernel" (exec?) was preemptible in this sense or not.
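The quantity the patch targets, how long a task that has just become runnable waits before it actually runs, can be probed crudely from user space by sleeping for a fixed period and recording how late each wakeup is. A rough sketch, assuming Python's time.sleep and time.monotonic; interpreter overhead inflates the numbers, so a C program would give more faithful measurements.

```python
import time
from typing import List

def wakeup_latencies_us(period_s: float = 0.001, samples: int = 100) -> List[float]:
    """Sleep for period_s repeatedly and return how much later than
    requested each wakeup happened, in microseconds."""
    lat: List[float] = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(period_s)
        late = time.monotonic() - start - period_s
        lat.append(late * 1e6)
    return lat
```

Running it while the machine is under heavy disk or kernel load, with and without the patch, is one informal way to see the difference the worst-case values make.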
Special Relativity: The person in the other queue thinks yours is moving faster.
For those that don't really understand the importance of preemptive multitasking (and from reading some comments, there are a few of you out there :-O), let's explain this through an example.
:-)
Consider application (a) that wants to read 128MB of data from the disk and application (b) that wants to read 1KB of data.
Let's say that the disk transfers at 1 MB/s, and let's assume application (a) issues the read 1 second before application (b) is started.
The sequence of calls for each app will look something like this:
1) program calls read
2) read is handled by the top level file system and is handed down to the proper file system
3) the file system calls the block device
4) the block device driver breaks up the request into the maximum block size the device can handle per request (for example 256 sectors for IDE)
5) for every request, the block device sends request to physical drive
6) drive transfers data to host
7) drive indicates 'done'
8) goto 5 until done
9) file system returns
10) read returns
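Step 4) is where the sizes in this example come from. A minimal sketch of splitting one read into device-sized requests, assuming 512-byte sectors and the 256-sector per-request limit mentioned above; the function name is hypothetical.

```python
from typing import Iterator, Tuple

SECTOR_BYTES = 512

def split_request(start_sector: int, n_sectors: int,
                  max_sectors: int = 256) -> Iterator[Tuple[int, int]]:
    """Break one read into (start, count) chunks no larger than the
    device's per-request limit (256 sectors = 128 KB here)."""
    sector = start_sector
    remaining = n_sectors
    while remaining > 0:
        count = min(remaining, max_sectors)
        yield sector, count
        sector += count
        remaining -= count
```

For app (a), 128 MB is 262144 sectors, which becomes 1024 requests of 256 sectors each; app (b)'s 1 KB read is just 2 sectors, a single request.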
It's important to know that any stage may be limited in how many requests it can handle simultaneously. For example, an IDE drive can only handle one request at a time. Some operating systems, however, impose even tighter restrictions, for example because the block device driver was written assuming only one request at a time would be allowed.
Take for example an OS where the kernel ensures that no more than one request is pending between steps 3) and 9).
This would have the following effect on our apps: app (a) is allowed to enter step 4) because it is the first request. One second later, app (b) arrives at step 3) and is blocked. It is not allowed to enter step 4) until app (a) is done and passes step 9). Effectively this means that app (b) has to wait 127 seconds (128 MB at 1 MB/s takes 128 seconds, minus app (a)'s 1-second head start) before it gets access to step 4).
Now consider an OS where the file system and drivers can handle multiple requests. It still has to ensure that the physical drive receives only one request at a time, so it permits only one app at a time between steps 5) and 7). App (a) arrives at step 5) first, so it gets to start sending requests to the drive. One second later, app (b) arrives at step 5) and has to wait for app (a) to finish its current request to the drive. As soon as that request is done, though, app (a) gets to step 8). Now both app (a) and app (b) want access to step 5). Depending on the scheduler, either one will be granted access. There's a good chance that app (b) will get access, in which case it only had to wait for app (a) to finish its outstanding request, which takes at most 1/8th of a second (256 sectors = 128 KB) in this example.
By the way, you may notice that allowing app (b) to interrupt the transfer of app (a) makes an (expensive) seek more likely.
You can see how moving the 'lock' deeper into the OS improves responsiveness. I'm not going to start a flame war about which OS is better; all I will say is that MY OS locks steps 5) through 7).
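The two waiting times in this example reduce to simple arithmetic. A small sketch using the example's own numbers (1 MB/s disk, 512-byte sectors, 256 sectors per request, 1-second head start); the fine-grained case assumes, as above, that the scheduler picks app (b) next.

```python
DISK_BYTES_PER_S = 1024 * 1024      # 1 MB/s
SECTOR_BYTES = 512
MAX_SECTORS_PER_REQUEST = 256       # per-request limit, e.g. IDE

def wait_coarse_lock(a_bytes: int, head_start_s: float) -> float:
    """Lock over steps 3)-9): app (b) waits for all of app (a)'s read."""
    return a_bytes / DISK_BYTES_PER_S - head_start_s

def wait_fine_lock() -> float:
    """Lock over steps 5)-7): app (b) waits for at most one of app (a)'s
    requests, assuming the scheduler picks (b) next."""
    return MAX_SECTORS_PER_REQUEST * SECTOR_BYTES / DISK_BYTES_PER_S

print(wait_coarse_lock(128 * 1024 * 1024, 1.0))  # → 127.0
print(wait_fine_lock())                          # → 0.125
```

The gap between the two numbers (three orders of magnitude) comes entirely from where the lock sits, not from the disk speed.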
> (Aside from the fact that all of the Linux kernel, drivers, etc. are in 'kernel' mode, while a microkernel has only the message passing and task scheduling in 'kernel' mode and everything else (drivers, etc.) runs in 'user' mode.) :) The preemptive bit is orthogonal to micro/macro kernel. There are non-preemptible microkernels (like MINIX, I think) and preemptible monolithic kernels (Solaris).
That's basically everything that distinguishes a microkernel from a monolithic kernel.
A deep unwavering belief is a sure sign you're missing something...
If a user gets an oops and submits a bug report to linux-kernel while running preempt, the bug report is a lot harder to decipher.
True, but the SMP support introduced these to begin with. The preemptive patches just bring that danger to UP machines.
Not true. preempt introduces new hangs. Read the threads on linux-kernel, especially "Re: [2.4.17/18pre] VM and swap - it's really unusable".
Some things that are broken by preempt:
* Network drivers which disable IRQs to avoid spinlocking on uniprocs (major performance win)
* Drivers which use per-CPU data to avoid spinlocking at all
* Drivers which disable individual interrupts for long periods of time
* Drivers which depend on consecutive lines of code executing near each other in time, especially serial drivers
That thread has details on all of these.
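The per-CPU data item is easy to see in a toy model: a driver does an unlocked read-modify-write on "its" CPU's counter, which is safe only if nothing else can run on that CPU in the middle. With preemption, another task can run at that point and its update is lost. This is an illustrative sketch, not driver code; the usual remedy is to disable preemption around the access.

```python
from typing import Callable, List

def percpu_increment(counters: List[int], cpu: int,
                     preempt_point: Callable[[], None]) -> None:
    """Unlocked read-modify-write of a per-CPU counter."""
    value = counters[cpu]
    preempt_point()            # with preempt, another task may run here
    counters[cpu] = value + 1

# Without preemption nothing runs at the preemption point: two increments give 2.
counters = [0, 0]
percpu_increment(counters, 0, lambda: None)
percpu_increment(counters, 0, lambda: None)

# With preemption, a second task on CPU 0 runs in the middle and one update is lost.
racy = [0, 0]
percpu_increment(racy, 0, lambda: percpu_increment(racy, 0, lambda: None))
```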
And there's the priority inversion scenario:
SCHED_OTHER process 1 acquires a semaphore in kernel mode
SCHED_FIFO realtime process needs the semaphore, blocks on it
Now the rt process is stuck pending progress from the SCHED_OTHER process. Without preemption, the SCHED_OTHER process would have done whatever required the semaphore and released it. Now, SCHED_OTHER process 2, 3,
[Note that I said semaphore, not spinlock, so the lock-break code won't help.]
Do you not remember all the problems with priority inversion and the SCHED_IDLE patches? This is exactly the same problem; it's not something new and mysterious that people are making up as FUD to stop preempt from getting into the kernel. Any introductory OS textbook discusses it, and priority inheritance is the only robust way to eliminate it--with all the problems that come along with priority inheritance.
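A toy tick-by-tick simulation shows the inversion and what priority inheritance buys. Assumptions: strict priorities (low = the lock holder, medium = other runnable work the scheduler prefers over it, high = the real-time task blocked on the semaphore), one unit of progress per tick, and no fair sharing, which real SCHED_OTHER scheduling would add.

```python
def ticks_until_high_gets_lock(low_cs: int, med_work: int, inherit: bool) -> int:
    """Return how many ticks pass before the high-priority task gets the lock.

    At tick 0 the low-priority task holds the lock with low_cs ticks of
    critical section left, the high-priority task is blocked on the lock,
    and a medium-priority task with med_work ticks of CPU work is runnable.
    """
    low_prio, med_prio, high_prio = 1, 2, 3
    t = 0
    while low_cs > 0:
        # With inheritance the lock holder runs at the blocked waiter's priority.
        effective_low = high_prio if inherit else low_prio
        if med_work > 0 and med_prio > effective_low:
            med_work -= 1       # medium preempts the lock holder
        else:
            low_cs -= 1         # lock holder makes progress
        t += 1
    return t
```

Without inheritance, the real-time task's wait grows with however much unrelated medium-priority work exists (12 ticks for 2 ticks of critical section plus 10 of other work); with inheritance it is just the remaining critical section (2 ticks).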
> 4. It doesn't improve the worst-case latency.
It's not designed to. Love has already started work on a lock-breaking patch to get rid of long-held locks.
It won't help. The fundamental problem is that preempt in interrupts is impossible with this scheme (and AFAIK nobody has proposed a scheme that could even theoretically work), and the worst-case latencies are in interrupts. The secondary problem is that a lot of these issues are highly hardware sensitive; different hardware has different timing requirements, and the only way to find them all is to audit every driver and deal with it appropriately (as, e.g., the LL patches do).
As Alan Cox wrote,
Sumner
rage, rage against the dying of the light