Removing the Big Kernel Lock
Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"
Why did they remove the preemptable BKL?
I'm not a kernel developer, but I'd say it's because there's widespread belief that the preemptable BKL is "the wrong way forward". Statements like these lead me to believe this:
In any large software project there's always a path to get from where you are, to where you want to be. It sounds like any version of BKL is considered ugly and causes problems, and patching it just won't work. In other words, fixing this part of the kernel isn't really possible, so they need to start over and change any code that relies on it to rely on something different entirely.
Anytime you have more than one application running, they could get into an argument about who gets to use the serial port, the video display, memory, or drive storage. This is especially critical in multi-processor systems.
The answer is to allow sections of code to "lock" access for a brief duration -- "I'm working with this right now, don't anyone else touch it." Simple in theory, very difficult in practice.
Note that I'm speaking generically; I'm not an expert on the Linux kernel. Ideally, though, you want locks to be "granular" -- in other words, you only lock the specific hardware and/or portion of memory that you need exclusive access to. Apparently, the "big kernel lock" takes a brick wall and hammer approach: one lock covers everything, so whoever holds it has exclusive access and nothing else that needs the lock can run in the meantime. It's not granular.
If I'm wrong, someone else here can correct me. Like I said, I'm not an expert on the Linux kernel.
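To make the "granular" point concrete, here is a minimal sketch in Python. It is a generic illustration, not how the Linux kernel implements the BKL; the resource names and the classes `BigLock` and `GranularLocks` are made up for the example.

```python
import threading

# Hypothetical resources, named only for illustration.
RESOURCES = ("serial_port", "display", "disk")


class BigLock:
    """Coarse-grained scheme: a single lock guards every resource."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, resource):
        # Every resource maps to the same lock.
        return self._lock


class GranularLocks:
    """Fine-grained scheme: each resource has its own lock."""

    def __init__(self):
        self._locks = {name: threading.Lock() for name in RESOURCES}

    def lock_for(self, resource):
        return self._locks[resource]


def can_work_concurrently(scheme):
    """Hold the serial port, then check whether the disk is still free."""
    with scheme.lock_for("serial_port"):
        disk = scheme.lock_for("disk")
        got_it = disk.acquire(blocking=False)  # try-lock, never blocks
        if got_it:
            disk.release()
        return got_it
```

With `BigLock`, holding the serial port also makes the disk unavailable, so `can_work_concurrently` returns `False`; with `GranularLocks` the two resources are independent and it returns `True`.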
Comment removed based on user account deletion
In other words:
TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.
This is one approach to deadlocks; it would fall generally under "avoidance".
The problem is that if you have any serious contention over the resources, it is entirely possible that the process will _never_ get the resources (because one of them always gets snatched up before another gets released, so all n resources are never available at once to the requesting process). This leads to starvation and general sadness.
If the system has minimal contention (so the normal case is that all three resources are unclaimed) and resources are held very briefly (so if a resource is taken it is likely to be released before another is taken, anyway) then it may work. In real systems these are hard properties to guarantee.
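A rough sketch of the all-or-nothing scheme being discussed, in Python. I'm assuming the scheme means "take every lock at once or take none"; the helper names `try_acquire_all`, `release_all`, and `acquire_all_with_retries` are mine, not from any library.

```python
import threading
import time


def try_acquire_all(locks):
    """Take every lock or none of them. Returns True on success."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            # One lock is busy: back out completely so we never
            # hold some locks while waiting for others.
            for held in reversed(taken):
                held.release()
            return False
    return True


def release_all(locks):
    """Release locks in the reverse of the order they were taken."""
    for lock in reversed(locks):
        lock.release()


def acquire_all_with_retries(locks, max_attempts):
    """Return the attempt number that succeeded, or None if starved."""
    for attempt in range(1, max_attempts + 1):
        if try_acquire_all(locks):
            return attempt
        time.sleep(0)  # yield so other threads get a chance to release
    return None
```

Nothing here can deadlock, because a caller never waits while holding a lock. The `None` return is the starvation case: if some other party always holds at least one of the locks whenever we look, we never get the full set.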
Also, the scheme requires a process to know in advance which locks it will need. A lot of algorithms may discover this on the fly (e.g. if you are traversing a data structure), which becomes a problem. The best you could hope to do is to lock aggressively--taking everything you might need--but this is ugly, and would tend to violate the conditions above (locking everything will lead to contention, locking everything in advance will lead to holding locks for a long time). Alternatively, if you discover that you need a new lock that you don't yet have, you could give up all the locks you do have and then try to lock again (with the new lock added to the set). This is also ugly and increases the chance of starvation (since now you need to lock a bunch of resources several times). Additionally, since you have to unlock in the middle, the algorithm becomes much more complicated. For example, when you discover you need a new lock, you must put the world in a consistent state before unlocking. And when you re-lock, you must check to make sure that the world hasn't been modified under your feet (which is entirely possible, and may very well cause you to need a still-different lock).
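Here is a sketch of the "give up everything and re-lock with the bigger set" variant, again in Python and again with made-up names (`Node`, `lock_chain`). It assumes an acyclic chain of nodes in which a node's `next` pointer is only changed by someone holding that node's lock.

```python
import threading


class Node:
    """Hypothetical list node; `next` may only change under `lock`."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.next = None


def try_acquire_all(locks):
    """Take every lock or none of them."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            for held in reversed(taken):
                held.release()
            return False
    return True


def lock_chain(start, max_attempts=100):
    """Lock every node reachable from `start`, or return None."""
    needed = [start]
    for _ in range(max_attempts):
        if not try_acquire_all([n.lock for n in needed]):
            continue  # contention: try again with the same set
        held = {id(n) for n in needed}
        # Re-walk the chain under the locks we hold, because it may
        # have changed while we held nothing.
        walked, node = [], start
        while node is not None and id(node) in held:
            walked.append(node)
            node = node.next
        if node is None and len(walked) == len(needed):
            return walked  # the held set matches the chain exactly
        # Wrong set (a new node was discovered, or the chain shrank):
        # drop everything and retry with the corrected set.
        for n in reversed(needed):
            n.lock.release()
        needed = walked + ([node] if node is not None else [])
    return None
```

Even in this toy version, a chain of three nodes takes three full lock/unlock rounds to acquire, and every round is another chance to lose the race, which is the starvation risk described above.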
Basically it doesn't work that well.
I'm brave enough to want per-CPU microkernels (with a messaging master?). I envisage all multi-CPU systems addressing memory in a non-unified manner soon enough - it'll be like the jump from segmented addressing to protected mode, but for CPUs.
The monolithic design is slowly becoming a performance bottleneck: something has to do a lot of locked switching. If SMP machines could do what they do best and handle IRQs and threads concurrently without waiting for a lock (that time is better spent sending and receiving lockless messages), life would be easier for the scalability gurus.
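A toy sketch of the message-passing idea in Python, with threads standing in for CPUs. The names `Worker` and `run_demo` are invented for the example. One caveat: Python's `queue.Queue` takes a lock internally, so this only shows the ownership pattern (each worker touches only its own state), not a truly lockless implementation.

```python
import queue
import threading


class Worker(threading.Thread):
    """Stands in for one CPU: owns its state and receives messages."""

    def __init__(self, worker_id):
        super().__init__()
        self.worker_id = worker_id
        self.inbox = queue.Queue()
        self.counts = {}  # touched only by this worker's own thread

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # sentinel: shut down
                break
            self.counts[msg] = self.counts.get(msg, 0) + 1


def run_demo(num_workers=2, messages=("irq", "timer", "irq", "disk")):
    """Deal messages round-robin to the workers and collect their tallies."""
    workers = [Worker(i) for i in range(num_workers)]
    for w in workers:
        w.start()
    for i, msg in enumerate(messages):
        workers[i % num_workers].inbox.put(msg)
    for w in workers:
        w.inbox.put(None)
    for w in workers:
        w.join()  # after join it is safe to read each worker's state
    return [w.counts for w in workers]
```

No worker ever takes a lock on another worker's data; the only shared objects are the inboxes. Whether that scales better than fine-grained locking in a real kernel is exactly the open question here.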