Removing the Big Kernel Lock
Corrado writes "There is a big discussion going on over removing a bit of non-preemptable code from the Linux kernel. 'As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. "This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real time kernel efforts, adding that Linux creator Linus Torvalds indicated the only acceptable way forward was to completely remove the BKL.'"
Why did they remove the preemptable BKL?
RTFAing says that temporarily forking the kernel with a branch dedicated to experimenting with the BKL is being considered. Maybe they can call it 2.7...
Hail Eris, full of mischief...
E pluribus sanguinem
What's linux?
That's fine, but once you reach maturity you should be trying to do the "right thing" (the exact opposite.) And the Linux kernel has reached maturity for quite a while now.
I think Linus is right on this.
Let's be sure to get every thread from the Linux kernel mailing list on the front page.
Is there any chance someone who understands this can translate it a bit? I may be a nerd but I don't do much with kernels or much coding, and would really appreciate it if someone could simplify this a bit so I could understand it.
Since the summary doesn't cut to the chase, and the article was starting to get a little boring and watered-down, I read Ingo's post and here's what I got from it: the BKL is released in the scheduler, so a lot of code is written that grabs the lock and assumes it will be released later, which is bad. Giving it the usual lock behavior of having explicit release will break lots of code. Ingo created a new branch that does this necessary breakage so that the broken code can be detected and fixed. He wants people to test this "highly experimental" branch and report error messages and/or fixes.
Assuming everything is stable and correct, the next step is to break the BKL into locks with finer granularity so that the BKL can go the way of the dodo.
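To make the implicit-release behaviour described above concrete, here is a toy model in Python (not kernel code). The names lock_kernel() and unlock_kernel() mirror the kernel's, but the BigLock class, its schedule() helper, and the per-thread depth counter are assumptions made up for illustration. It only shows why code that sleeps while "holding" the lock cannot assume its data stayed protected.

```python
import threading


class BigLock:
    """Toy model of a big lock that the scheduler drops when the holder sleeps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._depth = threading.local()  # per-thread nesting count (assumed)

    def _get_depth(self):
        return getattr(self._depth, "n", 0)

    def lock_kernel(self):
        # Only the outermost call really takes the lock; nested calls just count.
        if self._get_depth() == 0:
            self._lock.acquire()
        self._depth.n = self._get_depth() + 1

    def unlock_kernel(self):
        assert self._get_depth() > 0, "unlock without matching lock"
        self._depth.n -= 1
        if self._depth.n == 0:
            self._lock.release()

    def schedule(self, sleep_fn):
        """Model of sleeping: silently drop the lock, sleep, then retake it."""
        held = self._get_depth() > 0
        if held:
            self._lock.release()
        try:
            sleep_fn()
        finally:
            if held:
                self._lock.acquire()

    def locked(self):
        return self._lock.locked()


bkl = BigLock()
observed = []

bkl.lock_kernel()
bkl.lock_kernel()                                     # nested acquisition
bkl.schedule(lambda: observed.append(bkl.locked()))   # lock is dropped here
observed.append(bkl.locked())                         # and retaken afterwards
bkl.unlock_kernel()
bkl.unlock_kernel()
observed.append(bkl.locked())
```

The caller never wrote a release around the sleep, yet the lock was free while it slept. Converting such code to an ordinary lock with explicit release changes that behaviour, which is the breakage the experimental branch is meant to expose.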
You obviously don't know anything about Linux kernel development. So why bother giving your useless opinions on it? Seriously, do you think they are worth anything at all?
here (for subscribers. I dare not post a free link here :)
you had me at #!
For years Linux users have bashed *BSD over the Giant Lock, stating that Linux had removed its own years ago. It appears that parts of Linux's lock are still present. The point here is that people in glass houses shouldn't throw stones.
PS: I am sure I will be marked as a troll. For the record; this is a point to stop the flame wars. Yes, netcraft has confirmed the Giant Lock.
Will this affect anything I do if I am eventually given an option to install this kernel version? (Or am presented with a distro that has this kernel as the default?)
I know (or think I know) low latency is important for audio work, and I know people who do a lot of audio work under Linux. Should I be giving them a heads up to avoid upgrading their kernel until this gets fixed, or should I start looking for unofficial, special low latency versions of the kernel to recommend to them?
Matthew Wilcox replaced the per platform semaphore code with a generic implementation because it was likely to be less buggy, reduced code size and most places that are performance critical should be using mutexes now.
Unfortunately this caused a 40% regression in the AIM7 benchmark. The BKL was now a (slower) semaphore and the high lock contention on it was made worse by its ability to be preempted. As the ability to build a kernel without BKL preemption had been removed, Linus decided that the BKL preemption would go. Ingo suggested semaphore gymnastics to try and recover performance but Linus didn't like this idea.
As the BKL is no longer preemptible, it is now a big source of latency (since code holding it can no longer be interrupted). People still want low latencies (that's why they made the BKL preemptible in the first place) so they took the only option left and started work to get rid of the BKL.
(Bah, half a dozen other people have replied in the time it's taken me to edit and re-edit this. Oh well...)
Keep these on Front Page...
This is the type of stuff that needs to be kept in the news: the people who post here often have no understanding of it, and the ones who do have the opportunity to explain it, bringing everyone up to a better level of understanding.
Maybe if we did this, the designs and benefits of all technologies could be debated properly and referenced accurately. Or even, dare I say, people won't go ape when someone refers to a good aspect of NT's kernel design.
Yeah fixing this is sooooo easy, as a slashdot reader ofc I know how to do it better than those kernel mailing list noobs.
In fact I've got the code right here.
what you want to see it?
Oh look over there a flying car.
IranAir Flight 655 never forget!
No really. Is it? BK-Whatsit?
The recent semaphore consolidation assumed that semaphores are not timing critical. Also it made semaphores fair. This interacted badly with the BKL (see [1]) which is a semaphore.
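For anyone wondering what "fair" means here: a fair lock hands itself to waiters strictly in arrival order, even when some other thread could have grabbed it sooner. The sketch below is a generic ticket lock in Python, offered only to illustrate that idea. It is not the kernel's semaphore code, and TicketLock and its methods are hypothetical names.

```python
import threading


class TicketLock:
    """FIFO lock: waiters are served strictly in the order they took a ticket."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0
        self._now_serving = 0

    def take_ticket(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def wait_turn(self, ticket):
        with self._cond:
            while self._now_serving != ticket:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()


lock = TicketLock()
order = []


def worker(name, ticket):
    lock.wait_turn(ticket)   # blocks until every earlier ticket has released
    order.append(name)
    lock.release()


# Tickets are handed out in the order A, B, C, D ...
tickets = [(name, lock.take_ticket()) for name in "ABCD"]
# ... but the threads are started in reverse, so D is ready to run first.
threads = [threading.Thread(target=worker, args=nt) for nt in reversed(tickets)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Thread D is runnable first but still has to wait for A, B and C. On a heavily contended lock that forced hand-off is one plausible source of the slowdown, though the linked thread [1] is the place to check what actually happened with the BKL.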
The consensus was to not revert the generic semaphore patch, but to fix it another way. Linus decided on a path that will make people focus on removing the BKL rather than a workaround in the generic semaphore code. Also, Linus doesn't think that the latency of the non-preemptable BKL is too bad [2].
[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/msg03526.html
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3e076c5a78519a9f64cd384e8f18bc21882ce0
I imagine a kernel will come out that just uses the BKL far less (I don't think it will be a compilation option). There is a risk of instability (especially if you are using SMP/preemption) while overlooked code that needs locking is sorted out (this could lead to deadlocks or in an extreme case memory corruption). Over time this risk should decrease.
This work won't go into 2.6.26 (it's too late). It may not even go into 2.6.27 (it's being done outside of the mainline tree). This may mean that kernels from 2.6.25 have better "worst case" latencies than the following kernels until this work goes in. Once it does go in, that kernel may have even better "worst case" latencies than 2.6.25.
For folks not doing audio recording (listening to your MP3s doesn't count) and without the need for hard realtime the worse latencies in 2.6.26 are too small to matter.
However, if you can afford to risk your machine, testing this experimental work will result in issues being found and fixed quicker, and a better end result for people without old hardware that few people still have.
no, finer grained locking has been evolving in Linux for over five years (and similar efforts are in the BSD). it will take years more work, nothing simple or obvious about it except to slashdot posters talking out of their ass.
Wow. It sounds like it's about time someone on the kernel team reads Working Effectively With Legacy Code by Michael Feathers.
I'm a software developer myself on a very large project, and this book has absolutely revolutionized what I do. Having things break silently in the kernel is a sure sign that dependency problems in the code exist, and most of this book is about ways to break dependencies effectively and get code under test. And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easily checked for correctness when a change is made, can fail silently easily, and can't be understood as easily.
That's what this book is about, and if things in the kernel have deteriorated to such a state then they need to swallow their pride and take a look at resources designed to cope with this. I know they are all uber-coders in many respects, but everyone has something they can improve on, and from the description they give of their own code, this is their area for improving.
Beware of bugs in the above code; I have only proved it correct, not tried it.
It was ALL LOCKED.
Now, we're trying to UNLOCK it. See? Locking semantics are tricky.
That's a terrible excuse. There are many applications where a real-time Linux kernel is highly desired. Besides, it is important to note that real time systems do not focus on speed. This is a subtle difference from "performance", which usually carries speed as a connotation; it doesn't for a real time system. The real time system's focus is on completing tasks by the time the system promised to get them done (meeting scheduling contracts). It's all about deadlines, not speed. So from this point of view, the preemptible BKL, even with the degraded speed, could still be viewed as successful for a real time kernel.
In other words:
TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.
Contrary to the popular belief, there indeed is no God.
You do wonder if they need some proper test strategy to test for regressions etc.
Also, I wonder if the Linux kernel can carry on expanding or if it's time for the form of the kernel to change.
I know people like the monolithic kernel, but lack of change does not promote new techniques. Doesn't have to be a microkernel or have to fit in any existing box.
Whatever your large project is, I'm willing to bet it's nowhere near as complex as the kernel. Whenever you get the feeling that they must have missed something that seems obvious, you're probably the one that's wrong. No offense, but they have a lot more experience dealing with unique kernel issues than you do.
You talk about unit testing, but how exactly are you going to unit test multi-threading issues? This is not some simple problem that you can run a pass/fail test against. These kinds of things can really only be tested by analysis to prove it can't fail, or extensive fuzz testing to get it "good enough".
This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch.
This is where microkernels win. When almost everything is in a user process, you don't have this problem.
Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs. All these operations either have a hard upper bound in how long they can take (a few microseconds), or are preemptable. Real time engineers run tests where interrupts are triggered at some huge rate from an external oscillator, and when the high priority process handling the interrupt gets control, it sends a signal to an output port. The time delay between the events is recorded with a logic analyzer. You can do this with QNX while running a background load, and you won't see unexpected delays. Preemption really works. I've seen complaints because one in a billion interrupts was delayed 12 microseconds, and that problem was quickly fixed.
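The measurement loop described above (trigger an event, timestamp the response, keep the worst case) can be imitated in software. This is only a rough user-space sketch in Python with threads standing in for the interrupt source and the handler. It has nothing like the accuracy of an oscillator and a logic analyzer, and measure_latencies is a hypothetical helper.

```python
import threading
import time


def measure_latencies(n_events):
    """Fire n_events 'interrupts' and record how long the handler took to wake."""
    fired = threading.Event()
    handled = threading.Event()
    stamp = {"t": 0.0}
    latencies = []

    def handler():
        for _ in range(n_events):
            fired.wait()
            fired.clear()
            # Delay between the trigger and the handler getting control.
            latencies.append(time.perf_counter() - stamp["t"])
            handled.set()

    thread = threading.Thread(target=handler)
    thread.start()
    for _ in range(n_events):
        stamp["t"] = time.perf_counter()
        fired.set()          # the "interrupt"
        handled.wait()       # wait for the response before firing again
        handled.clear()
    thread.join()
    return latencies


samples = measure_latencies(100)
worst = max(samples)
average = sum(samples) / len(samples)
print(f"average {average * 1e6:.1f} us, worst case {worst * 1e6:.1f} us")
```

For real time work the number that matters is the worst case, not the average, which is why a single delayed interrupt in a billion is worth a complaint.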
As the number of CPUs increases, microkernels may win out. Locking contention becomes more of a problem for spinlock-based systems as the number of CPUs increases. You have to work really hard to fix this in monolithic kernels, and any badly coded driver can make overall system latency worse.
It's hard to test whether you've broken a driver when you don't have the hardware to test with. Perhaps the future will be Qemu emulation of all the different hardware in your system : )
This is not to say that there need to be tests for things that can be caught at compile time or run time regardless of hardware but there is only so far you can take it.
It's not like the kernel doesn't have any testing done on it though. There's the Linux Test Project, which seems to test new kernels nightly. If you ever look in the kernel hacking menu of the kernel configuration you will see tests ranging from Ingo Molnar's lock dependency tester (which checks to see locks are taken in the right order at run time), memory poisoning, spurious IRQ at un/registration time, rcu torture testing, softlockup testing, stack overflow checking, marking parts of the kernel readonly, changing page attributes every 30 seconds... Couple that with people like Coverity reporting static analysis checks on the code. Tools like sparse have been developed to try and do some of the static checks on kernel developer machines while they are building the code.
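The idea behind a run-time lock-order check can be sketched in a few lines: remember which locks were ever taken while another was held, and complain as soon as a new acquisition would close a cycle, even if no deadlock actually happened on this run. The Python below is a simplified single-threaded illustration of that idea only. It is not how the kernel's checker is implemented, and LockOrderChecker and the lock names are made up.

```python
from collections import defaultdict


class LockOrderChecker:
    """Records 'B was taken while A was held' edges and rejects cycles."""

    def __init__(self):
        self.taken_after = defaultdict(set)  # lock -> locks acquired under it
        self.held = []                       # locks currently held, in order

    def _reachable(self, src, dst):
        # Depth-first search through the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.taken_after[node])
        return False

    def acquire(self, name):
        for holder in self.held:
            # If 'holder' was ever taken under 'name', this order is inverted.
            if self._reachable(name, holder):
                raise RuntimeError(f"lock order inversion: {holder} -> {name}")
            self.taken_after[holder].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


checker = LockOrderChecker()

# First code path: inode_lock, then tty_lock (hypothetical names).
checker.acquire("inode_lock")
checker.acquire("tty_lock")
checker.release("tty_lock")
checker.release("inode_lock")

# Second code path takes them the other way round.
checker.acquire("tty_lock")
try:
    checker.acquire("inode_lock")
    inversion = None
except RuntimeError as exc:
    inversion = str(exc)
```

The second path is flagged although nothing deadlocked, which is the appeal of this kind of checker: it catches an ordering bug from two harmless runs instead of waiting for the unlucky interleaving.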
But this is not enough. Bugs STILL get through and there are still no go areas of code. If you've got the skills to write tests for the Linux kernel PLEASE do! Even having more people testing and reporting issues with the latest releases of the kernel would also help. It's only going to get more buggy without help...
Linux is something like nearly half the servers in existence and most of the top supercomputers. Desktop is a slower road of course, but it is still chugging along slowly but surely. Look at apple, originally a big percentage of desktops, then dropped to almost nothing, now inching its way back up because it got good. Stuff changes. The linux desktop market is big enough for there to be a lot of credible choices just within "linux" itself, there are half a dozen or so really good desktops and dozens of pretty good desktop linuxes out there now. And word gets around. It will be like FF, 0% to now upwards of one quarter to one half depending on where you look around the planet. There's some magic number that is hard to pinpoint but once anything reaches a certain level of use/adoption it really takes off then, usually near as I can see around 10%, then it makes huge jumps. Bad car analogy time, toyota prius is now more than one million cars sold from zero cars ten years ago, and the first with a mass market hybrid system that they really tried to make and sell in decent numbers (compared to honda for example who only fooled around with their insight). Now look, all the major manufacturers either have their own hybrids or will have them shortly. Ten years, that's all it takes once some threshold hits and it looks "real" to joe consumer to go from exotic to normal. I think this year the asus eeePC made linux "real" to a lot of people, so I am expecting ubiquitous linux as a choice to be along shortly with most computer makers as an option. And that is leaving out all the gadgets people use day to day running some smallish embedded linux, gps systems, cellphones, etc.
I'll agree that this should have been sorted out long since. But it wasn't, and very few people thought that it was reasonable to expend time on something so obviously unreasonable. (Multiprocessors were things like Illiac IV, huge monsters that were utterly impractical.)
Time passes, technology changes, and now it's become urgent to deal with this, so now it's being dealt with.
One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!! The things I can think of divide neatly into two camps. 1) We don't know enough to even get started, and 2) It really seems utterly implausible, even given this example to work from. Unfortunately, somewhere in there is something that's being overlooked, and I don't know what. Kernel support for Actors? Kernel security to control Actors? Kernel support for Language parsing? They all seem implausible.
What is clearly needed soon is software that facilitates the use of multi-processor environments. Dataflow languages have promise, but there may be other reasonable choices. Possibly some interface that would easily allow different computer languages to work together, but that may be a real impossibility. Or even a language basically like C or C++, but extended with a "foreach" operator that allowed parallel execution of the loop body...but the language would need to be smart enough to tell what needed to be read locked and what needed to be write locked, and what could just be ignored. This implies that use of pointers is *severely* circumscribed! And if you're going to do that, you probably ought to have garbage collection. It might sound like I'm talking about Java, but that would be wrong. This language would need to be close to the metal, so it could adapt itself (at run time!!) to the local machine. And since we want as much efficiency as possible, virtual machines, interpreters, etc. are probably out.
I don't know of any language that meets the specs I've outlined, but I know of many languages that meet large parts of them. Of the languages I know, D (Digital Mars D) comes the closest, but it's totally missing even the parallelization that C/C++ have (as an add-on).
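As a shape for the "parallel foreach" idea only: the easy case is a loop body that touches no shared state, so nothing needs read or write locking at all. Below is a minimal sketch using Python's standard thread pool. Python is obviously not the close-to-the-metal language being asked for, and parallel_foreach is a hypothetical helper, not an existing language feature.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_foreach(body, items, workers=4):
    """Run body(item) for every item on a pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, items))


def square(x):
    # Pure function: reads its argument, writes nothing shared, needs no lock.
    return x * x


squares = parallel_foreach(square, range(8))
```

The hard part the comment points at is everything this sketch avoids: deciding automatically which data a loop body shares and how it must be locked.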
But that doesn't really say where the kernel should be going...except that possibly C isn't the best language to use for a multiprocessor environment. (But C is still the most efficient in most places, and it DOES have add-ons for parallelization...though whether you can use those add-ons in kernel programming isn't something I've investigated.)
I think we've pushed this "anyone can grow up to be president" thing too far.
He has a point. All of this stuff in Solaris, for example, was sorted out in Solaris 2.7 which came out well over a decade ago.
Linux is great, but its development is weird. Remember all the problems in 2.4 that didn't get sorted out until about 2.4.23? Then there's 2.6 which didn't become usable until 2.6.13 or so.
In my very humble opinion, there should be a 2.7.x development branch for these sorts of experiments. But, I'm not Linus, and I suppose I should write my own damned kernel instead of complaining.
Stick Men
Mod parent up
...is removing?
If so then perhaps what DragonFlyBSD (the BSD could be dropped at this point, as it's only relevant to its history) is doing to remove it can be helpful for removing it in Linux.
It's called system testing, and I agree that writing unit tests is never enough.
As for your comment about them "knowing better", I've worked on a multi-million line project. When your code reaches that sort of size, the issues faced by someone on a ten million line project are pretty much the same as for people on twenty million LOC projects. If you RTFA you'll see it was a series of system tests which demonstrated the problem in the first place. Although the fact the kernel doesn't seem to have a standardised set of system/unit tests does concern me. (correct me if I'm wrong)
To just add another few wild guesses:
The Tao of math: The numbers you can count are not the real numbers.
While a 2.7 branch would make sense if stability were important, it's not. With 2.6 Linus decided he couldn't be bothered with the boring part of stability and so told distros to do it themselves. This happens (with varying degrees of success), and has allowed the kernel to develop at a much faster pace. Unfortunately this particular change is too much even for the 'unstable' kernel, so a temporary testing branch to get other people's code sorted is being formed (a virtual 2.7 in effect, only it's so experimental it will never get released).
Would one of you kind folks please put this into non-kernel-programmer terms that explain what this does for software/hardware in terms of the user experience and how the proposed outcomes will affect said experience?
The best course of action would be to redesign the Linux kernel from scratch and this time integrate all possible drivers. Hardware support would be a lot easier!
I would even go so far as to suggest integrating the most important server tools into the kernel to decrease latency. Why not integrate Apache? You could even integrate the shell for added responsiveness!
Linus has demonstrated that micro kernels are a footnote in history. Nowadays memory is cheap and we can afford to have a large (or very large) kernel.
Which reminds me of Alan Cox's trollish remark saying that Sun should drop work on Solaris and support Linux instead - might be easier to add the nice stuff from Linux to Solaris than to clean up the mess with the BKL. IIRC, Sun had been supporting SMP machines from before the time that Linus started on Linux. In addition, getting SMP support done right has been a much higher priority with the Solaris developers than the Linux developers.
I appreciate the work the kernel devs do (I'm using their work right now), but the mere act of working on the kernel doesn't make them inherently more intelligent or smarter than anybody else.
The OP has a point. This is a pretty big design issue (witness all the things it's screwing up), and should've been addressed a long time ago.
Maybe not
Yes, and since the kernel can be and is branched, they can decline to apply this patch and keep the 2.6.7-2.6.22 or whatever style BKL... or they can help everyone and rewrite various BKL-using code to not use it. I'd rather have a kernel that has low latency AND behaves correctly, but if I have to choose I prefer correct behavior every time.
A while ago I read this article: LWN article about ext4 fs
and your comment reminded me of it. I think it's very impressive that some people can think far ahead.
To be, or not to be: isn't that quite logical, Slashdot Beta?
It doesn't make them smarter but it does mean they know a lot more about kernel coding.
It is a design issue, but it's not that big. Most of the kernel has granular locks, and this only really affects the goal of having an entirely realtime kernel, so while it is important it's not that important. And I'm not sure what you mean by a long time ago, as having an RT kernel is a fairly recent goal.
So when can we run QNX-Ubuntu?
Just asking . . .
It's worth pointing out here that the kind of races (bugs) introduced by faulty locking in general suffer from a very important problem: YOU CANNOT TEST FOR THEM.
Race conditions are mostly eliminated by design, not by testing. Testing will find the most egregious ones but the rest cause bizarre and hard-to-trace symptoms that usually end up with someone fixing them by reasoning about them. "Hmm" you think to yourself "that sounds like a race problem. Wonder where it might be?" and thinking about it, looking at the code, inventing scenarios that might trigger a race; that's how you find them.
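A tiny model of why testing misses these: the same two "threads" give the right answer under one interleaving and the wrong one under another, and an ordinary test run only ever sees whichever interleaving the scheduler happened to pick. The Python sketch below forces the interleavings by hand using generators. It is a toy model, not real threads, and increment and run are hypothetical names.

```python
def increment(shared):
    """Unlocked read-modify-write, with a preemption point in the middle."""
    value = shared["count"]        # read
    yield                          # another task may run here
    shared["count"] = value + 1    # write back a possibly stale value


def run(schedule):
    """Run two incrementing tasks, stepping them in the given order."""
    shared = {"count": 0}
    tasks = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(tasks[name], None)    # advance that task by one step
    return shared["count"]


serial = run(["A", "A", "B", "B"])   # A finishes before B starts
racy = run(["A", "B", "A", "B"])     # both read before either writes
```

With the serial schedule the count ends at 2, and with the racy one an update is lost and it ends at 1. A test that only ever hits the first schedule passes forever, so you find the second by reasoning about the code, not by running it.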
Still, the OS kernel is, by definition, one of the most complex pieces of software in a system. There's only three other ones I can think of that would even come close: The compiler, the system libraries (libc), and device firmware.
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
The proposed outcome is for there to be increased opportunities to switch between programs/kernel or to run multiple things at the same time.
For those who enable the option this should reduce the chance of a hardware's buffer not being filled in time (so audio is less likely to skip in demanding environments). If you are an audio recording person or need VERY (less than hundredths of seconds) fast responses all the time your experience should improve. If you run VERY big workloads that have lots of pieces that can happen simultaneously on computers with 2-1024 CPUs, your experience (increase in work finished per second say) should improve. Typical desktop performance may improve a little if you have multiple CPUs/cores but one would guess not enough to be noticeable without careful measurement.
The trade off is increased risk of system hangs / data corruption due to the programming being trickier although instances of this happening should fall over time with popular hardware.
YOU FUCKING ASSHOLE.
That, my dim-witted friend, is what the Linux is.
Lack of documentation in code in legacy sections is the main reason why much of the code (i.e. users of BKL) remains untouched when it should be improved. Senior programmers should document the code to help others maintain it, instead of fixing it themselves.
http://michaelsmith.id.au
Yep. Most bugs are in drivers and architecture-specific code (that's where most of the code is!), and unfortunately it's unrealistic to expect everyone who changes a piece of code to retest with all of the (possibly obscure) hardware the change might theoretically have affected.
I'm not going to claim that Linux's, or any other kernel's, architecture is perfect; I'm not qualified to judge that. I do suspect you're missing one of the key differences between kernels and most types of software when you make that statement: much of the kernel's behavior is specified by external forces. Hardware, processor, and system specs. Often the weird details of these specs are based on details of their hardware design, which is often based on making the final product cheaply mass-producible rather than on having a nice clean interface. The kernel doesn't have a choice: it has to deal with many of these idiosyncratic devices at once.
Additionally there are performance concerns: even beyond the application-specific cases where kernels have to perform, and the levels of performance they have to meet to comply with hardware specs, generally a kernel has to perform well against competing kernels to be successful in the market.
There might be more complex systems in number of lines of code, or in number of tasks that need to be performed, but there are lots of choices that kernel writers don't get to make, because of the types of tasks that the kernel has to perform and how efficiently it has to perform them.
It seems to me that switching to a microkernel just because people have been abusing the Big Kernel Spinlock is like using a nuclear bomb to drive in a nail: sure, it is a technically pure solution, but it creates more problems than it solves. It seems to me that the Linux kernel is a monolithic kernel only in the fact that everything shares the same address space. Linus seems to be a smart guy, and considering how Linux scales in real world scenarios and its general popularity (no disrespect for QNX), it seems that the only reason to bash its monolithic design is that it is conceptually inferior. You have to admire how well the kernel was able to grow from being 386+ uniprocessor only to what it can do now, and if real time performance (something it was not designed for in the first place) is suffering because so much code is improperly using the deprecated Big Kernel Lock, then I don't see why there should be this animosity. And furthermore, the solution is fairly straightforward and has been done before; it just requires a lot of code to be reworked. And besides, the BKL had been preemptible for a good 15 stable kernel releases anyway. These patches are only against an experimental fork to remove the BKL.
Organic code writing FTW!
Yes.
Even when the BKL is pre-emptible, however, linux can only provide soft real time support.
The best way to achieve HARD real time support in linux is via Xenomai - and it gives you real time in user mode too.
--jeffk++
ipv6 is my vpn
Each test is supposed to be small, and easily comprehensible. You can have a large collection of tests, but they are all unique.
I can throw myself at the ground, and miss.
But you CAN test a design, and you CAN validate that a given block of code is functionally identical to a formal design for that same block, ergo for some cases you CAN use testing methods to (indirectly) validate locking even when a direct test would not be possible.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
If you are on the latest release of your distro and the problem is reproducible, please report it to your distro's bug database. People only fix things that they know are broken...
Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs.
Your comment is incomplete and not completely correct.
One topic you touched is the choice of locking versus message passing for communication between threads. With a monolithic kernel like Linux you have the choice between locking and message passing. With a microkernel like QNX you don't have a choice -- message passing is the only option. This matters because the more messages that are passed, the more context switches that are needed. And each context switch takes some time. Which means that microkernels have a throughput disadvantage.
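To put the two styles side by side, here is a small user-space sketch in Python: the same shared counter updated once under a lock and once by sending messages to a single owner thread. It only illustrates the trade-off being argued about. It says nothing about how QNX or Linux implement either one, and both function names are made up.

```python
import queue
import threading


def count_with_lock(n_workers, n_incs):
    """Shared state, protected by a lock that every worker takes."""
    lock = threading.Lock()
    state = {"total": 0}

    def worker():
        for _ in range(n_incs):
            with lock:
                state["total"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["total"]


def count_with_messages(n_workers, n_incs):
    """State owned by one server thread; everyone else sends it messages."""
    inbox = queue.Queue()
    state = {"total": 0, "messages": 0}

    def server():
        while True:
            msg = inbox.get()
            if msg is None:          # shutdown message
                return
            state["total"] += msg    # only this thread touches the state
            state["messages"] += 1

    def client():
        for _ in range(n_incs):
            inbox.put(1)

    srv = threading.Thread(target=server)
    srv.start()
    clients = [threading.Thread(target=client) for _ in range(n_workers)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    inbox.put(None)
    srv.join()
    return state["total"], state["messages"]


locked_total = count_with_lock(4, 1000)
message_total, message_count = count_with_messages(4, 1000)
```

Both versions arrive at the same total. The message version needs no lock around the state, but every single update has to be handed over to the server thread, which is the per-message switching cost the parent is talking about.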
Your post seems to suggest that the option of passing messages between subsystems in the Linux kernel is not available. This is very well possible.
Regarding multiple CPUs and spinlock-based systems: aren't you aware that with Linux-rt all spinlocks are converted into preemptible mutexes?
Or: the latency issue you touched is real. But stating that the latency of a microkernel is better than the latency of a monolithic kernel is wrong. And you seem to be unaware of Linux-rt.
Are you perhaps a QNX salesman?
Having to spell out I'm not a kernel developer on a supposed geek site, while having IANAL in constant use, just seems wrong. So I would like to submit the following abbreviations for immediate use: IA[N]AKD/H.
I figure that when MS have a genuine 60% or less market share they'll either go to work on being INTEROPERABLE (especially on services: always annoyed me that MS servers weren't: they only served MS products) or they'll fall to the minority and effectively die.
Which is taken depends who's in charge at the time and how they think. Ballmer and Gates wouldn't change now and they'd rather see the company die (or rather would not concede that their actions would cause its death) than change outlook and start working with its competitors.
All lines of code are not equal.
There's a huge difference between typical application code and system code. Very little application code is as performance-sensitive as system code, because the goal of the system code is to use as little time as possible (consistent with some other goals), to make as much time as possible available to run application code.
OS code is performance-tuned to a degree that application code almost never is, and that focus on performance results in complexity that isn't well-represented by counting lines of code. Further, most application code isn't nearly as multiprocessor-aware as modern OS code, which introduces another huge complexity factor. Finally, the role of OS code is to interact directly with hardware, and if you've ever written on-the-metal code you know how much complexity that adds. Linux, of course, takes that even further by trying to work on a wide variety of hardware platforms, abstracting commonality where possible, but only when it doesn't interfere with performance.
No, I don't think there are many, if any, unit tests.
Altogether, I estimate that system code is an order of magnitude more complex, per line, than application code.
If you RTFA you'll see it was a series of system tests which demonstrated the problem in the first place. Although the fact the kernel doesn't seem to have a standardised set of system/unit tests does concern me. (correct me if I'm wrong)
There are a bunch of sets of system tests in place for Linux. They're created and executed by multiple groups of people around the world and the results are made available to the developers (some of whom are the same people executing the tests).
This is a different approach than is common in the normal, centralized development model, but it's one that's very effective for the sort of decentralized development model used by the Linux kernel team. People who are interested in different aspects of Linux create tests designed to evaluate the kernel according to those aspects. When they see problems, or opportunities for improvement, they post their results to LKML, often with patches to address the issue they identified.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
With the sizes of modern CPU caches, one would assume that the cache is big enough to hold the most frequently used parts of the kernel (because we're talking about context-switching between kernel components here).
Silly question: If you don't bother with memory-protection for in-kernel context switches, then how is it a microkernel?
And the trouble is that it crippled Solaris performance on systems with less than four CPUs. Debian SPARC was famously much faster than Solaris 8 on single-processor machines running the exact same software.
http://rocknerd.co.uk
Loading and unloading kernel modules, enabling and disabling big subsystems, and other such heavyweight operations may need to lock out the rest of the kernel completely.
I propose that the BKL be strictly audited from this point forward, much like softirqs are. Softirqs are a strictly limited resource, and the BKL has huge latency problems if misused.
The BKL is a nice thing to have for when nothing else will work. However, it should be used strictly on an as-needed basis only.
It happens to be one thing that will work if nothing else will. So, it's good as a lock of last resort, but should be avoided at all costs if there's anything better.
But no, it should not be completely removed.
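To make the latency cost concrete, here is a minimal sketch in Python of what one big lock does compared with per-subsystem locks. The subsystem names ("fs", "net") and the helper functions are made up for illustration, and this does not model the real BKL's semantics; it only shows that with a single global lock, unrelated work has to wait.

```python
import threading


def can_enter(lock):
    """Return True if the lock is free right now (take it and release it again)."""
    if lock.acquire(blocking=False):
        lock.release()
        return True
    return False


# Hypothetical setup: one global lock versus one lock per subsystem.
big_lock = threading.Lock()
subsystem_locks = {"fs": threading.Lock(), "net": threading.Lock()}


def fs_blocked_with_big_lock():
    """'net' holds the single global lock; can 'fs' get in?"""
    with big_lock:
        # 'fs' needs the very same lock, so it is shut out.
        return not can_enter(big_lock)


def fs_blocked_with_fine_locks():
    """'net' holds only its own lock; can 'fs' get in?"""
    with subsystem_locks["net"]:
        # 'fs' uses a different lock, so it is unaffected.
        return not can_enter(subsystem_locks["fs"])


if __name__ == "__main__":
    print("fs blocked under big lock:  ", fs_blocked_with_big_lock())
    print("fs blocked under fine locks:", fs_blocked_with_fine_locks())
```

The flip side is that the big lock is trivially correct for anything it covers, which is why it is tempting as a last resort.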
It does not matter -- the problem is, tests only check if particular criteria are satisfied or not. However, if you really knew all the criteria, you would trivially derive a program from them, and there would be nothing to test.
In reality tests are only for things you believe are important, or possible to get wrong, and the more of them you have, the harder it is to find out what they don't cover. Tests may have the advantage that they don't have to be optimized, but that merely makes them slightly less likely to be wrong.
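A small Python sketch of what I mean by tests only checking particular criteria. All the names here are invented for the example. The deliberately broken sort passes a check that looks reasonable, simply because nobody thought to write down the second criterion.

```python
from collections import Counter


def is_sorted(xs):
    """True if every element is <= the one after it."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def broken_sort(xs):
    """Deliberately wrong: silently drops duplicate elements."""
    return sorted(set(xs))


CASES = ([3, 1, 2], [], [5, 4, 4, 1])


def check_sorted_output(sort_fn):
    """Criterion 1: the output is in ascending order."""
    return all(is_sorted(sort_fn(list(case))) for case in CASES)


def check_same_elements(sort_fn):
    """Criterion 2: the output holds exactly the input elements."""
    return all(Counter(sort_fn(list(case))) == Counter(case) for case in CASES)


if __name__ == "__main__":
    print("broken_sort, ordering check:", check_sorted_output(broken_sort))
    print("broken_sort, elements check:", check_same_elements(broken_sort))
```

With only the first check in the suite, the broken version ships. And the second check is still not the whole specification; it is just the next criterion somebody happened to think of.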
Contrary to the popular belief, there indeed is no God.
And the trouble is that it crippled Solaris performance on systems with less than four CPUs. Debian SPARC was famously much faster than Solaris 8 on single-processor machines running the exact same software.
And Solaris is much faster than Linux (scales better) on multi-CPU systems. Sun saw the future and is ahead in that one respect.
Now, Solaris networking was much slower than Linux...
Stick Men
Oh yeah. Note that I'm speaking as someone who works as a Solaris admin for a living. I'm very pleased that with Solaris 10, they realised the competition was Linux. Competition is good.
http://rocknerd.co.uk
Interesting idea, a time-t based (or n-instruction based) interrupt disable. Still a bit open for abuse if it was called too often, but at least a single user task couldn't hang the system for long.
It doesn't typically improve on a basic spin-lock, unless a significant amount of time is being spent in locked code. And turning off interrupts would normally be seen as a heavy-handed way of protecting code, because it affects every other process, even if that process will never access that code (and adds latency to interrupts). But arguably it could reduce the number of process switches where tasks of equal priority are competing for a resource.
Inefficiencies in multitasking settings are often not simple to fix. For example, in a situation where one task is producing an item and another is consuming the item, and both run at equal priority, then you can get the situation where a task switch occurs on every item produced (instead of batching them). If the processing of the item is fast, then the task switch can dominate the time taken. There are hacks that most operating systems use, but it isn't as simple as just making the semaphores go faster.
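The batching point can be sketched in Python with two threads and a queue. This is an illustration under assumptions, not a measurement: `run_pipeline` and `batch_size` are names I made up, and the "handoff" count (one queue put per batch) is only a stand-in for the places where a task switch could happen. The real number of switches depends on the scheduler.

```python
import queue
import threading


def run_pipeline(n_items, batch_size):
    """Pass n_items from a producer thread to a consumer thread.

    Returns (items_consumed, handoffs). A handoff is one queue put of a
    batch, i.e. one point at which the consumer may have to be woken up.
    """
    q = queue.Queue()
    consumed = []
    handoffs = 0

    def producer():
        nonlocal handoffs
        batch = []
        for i in range(n_items):
            batch.append(i)
            if len(batch) == batch_size:
                q.put(batch)
                handoffs += 1
                batch = []
        if batch:  # flush a final partial batch
            q.put(batch)
            handoffs += 1
        q.put(None)  # sentinel: no more work (not counted as a handoff)

    def consumer():
        while True:
            batch = q.get()
            if batch is None:
                break
            consumed.extend(batch)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(consumed), handoffs


if __name__ == "__main__":
    print("one item per handoff:", run_pipeline(1000, 1))
    print("100 items per handoff:", run_pipeline(1000, 100))
```

Both runs do the same work, but the batched one crosses the thread boundary far less often. The catch is that batching adds latency for the first item in each batch, which is part of why this isn't a simple fix.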
It comes down to a mismatch between the heaviness of a thread (each having a significant amount of state information) compared to the efficiency of processing code fragments. Current architectures are oriented around processing a single task, not around fine-grained parallelism (and rightly so, since that is how we write our programs). But it does mean that we try to find ways to avoid parallelism (and struggle with how to structure it).
The issue for me is how code fragments should communicate in a highly parallel environment. If, for example, they use queues, then sure, it makes sense to develop hardware that optimizes these queues. But at the moment we don't even know how to write highly parallel programs (except for the problems where we model fields of elements, where the parallelism is obvious and highly regular, such as in fluid flow analysis).
Once we know how we want to write these parallel systems (and kernels are one of the complex and heterogeneous examples of what we want to be able to do), then we can start thinking about hardware optimizations. Unfortunately any change in paradigm has major ramifications; for example, moving from stacks to queues would have implications for memory caching strategies (or even whether memory should be distributed rather than global).