Removing the Big Kernel Lock

I don't understand by Nimey · 2008-05-17 04:19 · Score: 1

Why did they remove the preemptable BKL?

RTFAing says that temporarily forking the kernel with a branch dedicated to experimenting with the BKL is being considered. Maybe they can call it 2.7...

--
Hail Eris, full of mischief...

E pluribus sanguinem

Re:I don't understand by Vellmont · 2008-05-17 04:44 · Score: 4, Interesting

Why did they remove the preemptable BKL?

I'm not a kernel developer, but I'd say it's because there's widespread belief that the preemtable BKL is "the wrong way forward". Statements like these lead me to believe this:
"all this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong."

In any large software project there's always a path to get from where you are, to where you want to be. It sounds like any version of BKL is considered ugly and causes problems, and patching it just won't work. In other words, fixing this part of the kernel isn't really possible, so they need to start over and change any code that relies on it to rely on something different entirely.

--
AccountKiller
Re:I don't understand by QX-Mat · 2008-05-17 04:52 · Score: 4, Informative

new semaphore code was introduced that simplified locking. Unfortunately in many kernel situations it's proven to affect performance at around something like 40% - which isn't just considerable its disastrous.

rather than merge the old locking code back in, and reintroduce the many different locking primitives they had, someone decided to simply reenable the BKL - the downside of which is they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove instances of where the semaphore code is likely to be called (the BKL).

Matt
Re:I don't understand by diegocgteleline.es · 2008-05-17 04:53 · Score: 5, Informative

Because these days the BKL is barely used in the kernel core, or so Linus says: the core kernel, VM and networking already don't really do BKL. And it's seldom the case that subsystems interact with other unrelated subsystems outside of the core areas. IOW, it's rare to hit BKL contention - and in those cases, you want the contention period to be as short as possible. And spinlocks are the faster locking primitive, so making the BKL a spinlock (which is not preemptable) makes the BKL contention periods faster. A mutex/spinlock brings you "preemptability" and hides a bit the fact that there's a global lock being used sometimes at the expense of performance, which may be a good thing for RT/lowlatency users, but apparently Linus prefers to choose the solution that is faster and doesn't hid the real problem.
Re:I don't understand by diegocgteleline.es · 2008-05-17 05:06 · Score: 3, Informative

A mutex/spinlock brings you "preemptability"

Duh, I meant mutex/semaphore. And Linux semaphores have become slower, meanwhile mutexes still are fast as old semaphores were, as #23446368 says. The options were to move from a semaphore to mutexes or spinlocks, but Linus chose spinlocks because the RT/low-latency crow will notice it and will try to remove the remaining BKL users.
Re:I don't understand by SpinyNorman · 2008-05-17 05:20 · Score: 4, Insightful

If that is true then it sounds like a bad decision.

If the BKL code is rarely used then the general usage performance impact is minimal and the efficiency of a spinlock vs mutex is irrelevant. If this is not true then saying it is rarely used is misleading.

However for real-time use you either do or don't meet a given worst case latency spec - the fact that a glitch only rarely happens is of little comfort.

It seems like it should have been a no-brainer to leave the pre-emptable code in for the time being. If there's a clean way to redesign the lock out altogether then great, but that should be a seperate issue.
Re:I don't understand by kestasjk · 2008-05-17 05:48 · Score: 2, Interesting

new semaphore code was introduced that simplified locking. Unfortunately in many kernel situations it's proven to affect performance at around something like 40% - which isn't just considerable its disastrous. rather than merge the old locking code back in, and reintroduce the many different locking primitives they had, someone decided to simply reenable the BKL - the downside of which is they have to either fix the regression caused by the simpler semaphore code (not likely, it's very simple and clean - everyone's favourite pet/child) or remove instances of where the semaphore code is likely to be called (the BKL). Matt Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on? Why try to make a non-preemptible kernel preemptible for the sake of real-time, if it affects non-real-time performance?

--
// MD_Update(&m,buf,j);
Re:I don't understand by Anonymous Coward · 2008-05-17 05:57 · Score: 0

> being backward compatible is the road to madness that hindered the windows kernel.

Do you have any specific instances to cite of this happening in the NT kernel?
Re:I don't understand by Anonymous Coward · 2008-05-17 05:59 · Score: 0

If the situation is really as simple as you describe it, isn't that very naive? There is a reason there are lots of different locking algorithms out there. Sometimes you want a fair lock, sometimes you don't. Sometimes you want to reschedule while waiting for the lock, sometimes you don't. If you take out all the lock algorithms and replace them with one, it's easy to see a loss in performance if the code is especially tuned for a particular type of lock.
Re:I don't understand by Anonymous Coward · 2008-05-17 06:17 · Score: 0

Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on?

Sure. We'll stop kernel development while we're at it too.
Re:I don't understand by johannesg · 2008-05-17 06:48 · Score: 4, Funny

Are you arguing for a microkernel style solution, sir? If so, I salute your bravery! ;-)
Re:I don't understand by NoSCO · 2008-05-17 06:52 · Score: 1

I understood considerably less than you it seems, and I consider myself a reasonably proficient linux junkie. "As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptable BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptable code again. *blink*
Re:I don't understand by Anonymous Coward · 2008-05-17 07:01 · Score: 0

The problem is obviously that they have a Crow working on the RT kernel. They should replace the Crow with an Owl as they can see further ahead.
Re:I don't understand by siride · 2008-05-17 07:03 · Score: 2

You sure that's all in the kernel? I have a feeling that's mostly, if not entirely, in userland APIs, which is not uncommon to happen on Linux either. Witness X and the toolkits.
Re:I don't understand by Elladan · 2008-05-17 07:26 · Score: 1

Or better yet, replace it with Tom Servo. What sort of a lunatic would have Crow do kernel development?
Re:I don't understand by QX-Mat · 2008-05-17 07:46 · Score: 2, Interesting

yes I agree. the BKL wasn't fair - which worked. happiness. but hey my views are pointless because I'm pro desktop/nvidia/life (and I have a soul).

The new semaphore implementation gives way in a very fair manor proving that sometimes you can be too fair and Kon was right from the get go (a desktop user's opinion, i don't mean to bait). if something didn't explicitly ask for a time slice, ie: a kernel section was preempted by an interrupt and then queued back in normally, then it doesn't get one until the pope said yes. the BKL as we knew it would allow a section enclosed in [un]lock_kernel to queue jump - at the expense of some extra code trickery.

Sadly the BKL was one of the more demanding locks - if it didn't get what it wanted fast enough, subtle differences in character device operations, which all seem to be strategically linked to the tty and other vfs layers from what I can make of KernelTrap, would work against the overall throughput of the kernel and into usermode (archaic ioctls suffer)

the best "fix" would be to implement the old semaphore types for depreciated BKL-only use (don't export them?) - but the old semaphore code was quite substantial, and required some arch specific implementations which have now... I don't see that much work being reintroduced in it's old form any time soon.

Matt
Re:I don't understand by QX-Mat · 2008-05-17 08:12 · Score: 5, Informative

There are a few added benefits from a fully real-time OS that most people gloss over.

For example, you will *know* your PC will never become utterly unusable to the point it's unsafe with your data. ie: while it's handling many IO operations (say you're being ddosed whilst transcoding a dvd and flossing with a sata cable) unless you run completely out of system memory. Nothing should run away wasn't design to. This stems from the predictability of code execution times that pre-empting offers.

The predictably allows devices to make guarantees, for example, if your mouse is aware it's going to get a time slice, at worst, every 100ms, at least it'll be doing something every 100ms and you gain a visually responsive mouse (aye, it's not as great as it could be). The non-preempt side of life has your CPU tied up doing work that was sits inside a BKL - ie: dealing with a character device or ioctl - your mouse could be waiting 500ms or 1000ms before updating it's position: giving you the impression your PC is dying.

Code that is stuck inside the BKL isn't pre-emptable (you *must* wait for it to finish.) - there's a lot of it that does a lot of regular stuff. Often this will hold up other cores if you've got a cooperative multi-threaded program. The net effect is a slow PC.

RT systems have a different use: they want a guarentee that something is unable to ever delay the system, in particular interrupts. The BKL allows code to take a time slice and run away with it, because you thought it was very important and wrapped it in [un]lock_kernel. This then delays IRQs (IRQs cant run until the lock has finish - at least, I'd like to believe second level IRQs can't run, I'm unsure of the specifics) which will delay data coming in and out of your PC - hard file, disk and display buffers suffer: they fill and you start to loose data because there's nothing dealing with it.

Preempt kernels are good :)

(viva the Desktop)

Matt
Re:I don't understand by HeroreV · 2008-05-17 09:26 · Score: 1

someone decided to simply reenable the BKL My understanding is they didn't reenable the BKL, they just made in non-preemptable again.
Re:I don't understand by Anonymous Coward · 2008-05-17 09:27 · Score: 0

For the second one have a look there :
http://web.archive.org/web/*/http://www.nasconf.com/pres05/allison.pdf

specifically the "wrong write times" slide, but there are other similar examples in there.

Samba also has to work around this stuff to keep ms office happy.
Re:I don't understand by AftanGustur · 2008-05-17 10:01 · Score: 1

Couldn't they just ask the real-time developers to kindly find a real-time kernel to work on? Why try to make a non-preemptible kernel preemptible for the sake of real-time, if it affects non-real-time performance?
Running a non-real-time kernel, is like asking Eddie the shipboard computer for tea.
If you ever want to put the linux kernet into anything other than your standard IT server, a airplane, a missile or any weapons system for example, it is going to HAVE to be real-time-kernel.

--
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
Re:I don't understand by Anonymous Coward · 2008-05-17 10:23 · Score: 0

Yeah and that's what makes Windows so gross. You don't leave these nastyhacks in the libraries, you use library wrappers. Hardly anything actually makes kernel calls, it all goes through libc and other libraries.. I don't know if Windows even supports these, but Linux-style, buggy binary apps, you (for some value of "you", maybe the distro-maker or some guy...) writes a small library wrapper, writes a shell script that's like:
#!/bin/sh
LD_PRELOAD=nasty-hax.so /usr/local/bin/buggyapp

and.. buggyapp can expect to use free()d memory or whatever, the nasty-hax wrapper will mangle library calls as needed to make the app work while the actual library doesn't have to maintain buggy behavior to keep buggyapp working.
Re:I don't understand by MichaelSmith · 2008-05-17 10:54 · Score: 0, Redundant

My laptop is low on ram (256 MB) and it stops responding under load all the time. The reason seems to be disk saturation caused by firefox, but that shouldn't make it not be able to scroll windows.

I run netbsd on some systems and have noticed that performance degrades much more gracefully than linux.

--
http://michaelsmith.id.au
Re:I don't understand by QX-Mat · 2008-05-17 11:06 · Score: 5, Interesting

I'm brave enough to want per CPU microkernels (with a messaging master?). I envisage all multi-CPU systems addressing memory in an non-unified manor soon enough - it'll be like the jump from segmented addressing to protected mode, but for CPUs.

The monolithic design is slowly forming a focal point in performance: something has to do a lot of locked switching - if SMP machines could do what they do best and handle IRQs and threads concurrently without waiting for a lock (they're better spent sending/receiving lockless messages), life would be easier on the scalability gurus.
Re:I don't understand by QX-Mat · 2008-05-17 11:33 · Score: 1

thats right. I guess I should have said that - the BKL is the preempt sense wasn't really a lock, although it was a mutual exclusion mechanism... tbh, I don't know what you call a lockless lock nowadays.

IM's patch, amongst a host of other things, adds down(&kernel_sem) to the lock_kernel (and up() to unlock) to making it a true lock.

It's interesting to note that copy and save (CAS/CAS2) instructions have been around in many architectures for nearly a decade (longer still for some). It's surprising to see lockless programming (vital to pre-empting) launching a come back, to make better use of CPUs, instead of faster CPUs with more clock cycles. Will the paradigm shift to concurrent programming models slow Moore's law?

Matt
Re:I don't understand by Anonymous Coward · 2008-05-17 14:58 · Score: 0

And I like hamburgers... what's your point?
Re:I don't understand by x2A · 2008-05-17 19:32 · Score: 1

I'm pretty sure Windows has something like this, I recall seeing something in howto's about running different versions of IE along side each other, by creating an empty .local file with the same name as the executable (so in this case would've been iexplore.exe.local) and that affects the shared library loading path, so it will load dlls that are in the same directory before using others elsewhere in the system. Not quite the same, but could be used to achieve the same ends.

--
The revolution will not be televised... but it will have a page on Wikipedia
Re:I don't understand by IamTheRealMike · 2008-05-17 21:17 · Score: 1

Windows does this as well. In fact it has a vastly more sophisticated system than LD_PRELOAD in the form of an API shimming framework. Windows ships with a database of hacks (search your registry for AppGoo) which tells it things like, if you run a program called FOOBAR.EXE then make GetVersionEx report itself as Windows 95 instead of Windows XP, or change the heap behavior, etc. It's more automatic, more precise and better supported than LD_PRELOAD.

There are some things though which can't be fixed with shims. One example that comes to mind from Chens blog is a shell extension that plugged into Explorer. Apparently it was quite popular .... perhaps some common utility or part of a driver for a common device. Given the choice of doing things the right way or the stupid way, they opted for the latter.

The extension took a parameter to a callback, added 0x60 to the value and then dereferenced a pointer it found at that location. In other words it reached several frames up the stack and directly poked at crap inside Windows private code. Unsurprisingly, as the Explorer codebase evolved, computers with this extension installed began crashing (you crash explorer, it restarts just fine but you see some annoying flickering on screen as it reloads). Their solution was to place a fake equivalent of that structure in the "right place" on the stack to make the app happy. And another bajillion people got a smooth upgrade.

I used to think like you, incidentally, but over time I changed my mind. A lot of that was reading the rational, well thought out explanations of various weird Windows behaviors on Raymond Chens blog, I definitely recommend it to anybody interested in systems programming. The fact is, the value of backwards compatibility is vast compared to the effort that it requires.
Re:I don't understand by Anonymous Coward · 2008-05-17 21:23 · Score: 0

On the windows platform (and many other environments) free() may not release memory to the OS instantly to optimize allocation of heap allocated memory. You can use _heapmin() if you really want to return free'd memory back to the OS if its that important to you.

The second comment is total nonsense. Try deleting a file in windows NT that has an open handle and let me know how that goes.

Yes, MS does put a lot of effort into maintaining backwards compatibility but with proper versioning and application compatibility personalities it is something that can be managed and mitigated going forward. There is no technical reason you can't have your cake and eat it too.
Re:I don't understand by Anonymous Coward · 2008-05-18 21:02 · Score: 0

hint: if it's not clear for YOU, it's not a TOTAL nonsense. And if you close your eyes, the world doesn't disappear.

Linux? by Anonymous Coward · 2008-05-17 04:31 · Score: 1, Funny

What's linux?

Re:Linux? by Anonymous Coward · 2008-05-17 04:44 · Score: 0, Funny

it's part of the GNU operating system.
Re:Linux? by Dunbal · 2008-05-17 04:47 · Score: 4, Informative

What's linux?

The future.

--
Seven puppies were harmed during the making of this post.
Re:Linux? by MobileTatsu-NJG · 2008-05-17 05:04 · Score: 3, Funny

What's linux? oh!! You saw that IBM ad a few years ago, too!

--

"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
Re:Linux? by ichigo+2.0 · 2008-05-17 05:42 · Score: 2, Insightful

Perhaps its growth is exponential.
Re:Linux? by RiotingPacifist · 2008-05-17 05:44 · Score: 1

Why is this funny^H^H^H troll?

--
IranAir Flight 655 never forget!
Re:Linux? by RiotingPacifist · 2008-05-17 05:47 · Score: 3, Insightful

Its gone from 0% to 100% on my pcs, everything is relative.

--
IranAir Flight 655 never forget!
Re:Linux? by Anonymous Coward · 2008-05-17 06:50 · Score: 0

You might be retarded. You're talking about "on the desktop".
Re:Linux? by Anonymous Coward · 2008-05-17 07:17 · Score: 2, Funny

Specifically its just the bootloader for GNU Emacs, the finest most complete operating system known to mankind.
Re:Linux? by BotnetZombie · 2008-05-17 08:48 · Score: 1

I would have modded this as funny if it wasn't an AC. No need to label it troll though, even if the submitter doesn't have the guts to post it with name attached.
Re:Linux? by Dogtanian · 2008-05-17 12:08 · Score: 3, Funny

Specifically its just the bootloader for GNU Emacs, the finest most complete operating system known to mankind. Albeit one without a decent text editor.

On second thoughts, disregard that... I just remembered that Emacs has a Vi emultation mode!

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:Linux? by Drinking+Bleach · 2008-05-17 14:07 · Score: 1

> On second thoughts, disregard that... I just remembered that Emacs has a Vi emultation mode!

I thought you said decent, not shoddy.
Re:Linux? by mrscorpio · 2008-05-18 00:35 · Score: 1

We're talking about Linux, not the Hurd here.
Re:Linux? by Anonymous Coward · 2008-05-18 06:55 · Score: 0

It's like Windows, but it works.

Looks like "Worse is Better" all over by paratiritis · 2008-05-17 04:33 · Score: 5, Insightful

Worse is Better (also here) basically says that fast (and crappy) approaches dominate in fast-moving software, because they may produce crappy results, but they allow you to ship products first.

That's fine, but once you reach maturity you should be trying to do the "right thing" (the exact opposite.) And the Linux kernel has reached maturity for quite a while now.

I think Linus is right on this.

Re:Looks like "Worse is Better" all over by 93+Escort+Wagon · 2008-05-17 06:48 · Score: 0

And the Linux kernel has reached maturity for quite a while now. Trying...not...to...morph...into...Grammar...Nazi...

(hey, it's the only way I could participate in this discussion).

--
#DeleteChrome
Re:Looks like "Worse is Better" all over by Anonymous Coward · 2008-05-17 08:39 · Score: 1, Insightful

"Worse is better" has nothing to do with the BKL. RTOS users need a non-interruptible lock. Desktop and server users' need conflict with the RTOS users' needs. The "right thing" solution would be to abstract away the differences between the locks and use a plugin architecture. Unfortunately, the "right thing" is undesirable, because the added indirection will substantially reduce performance for everybody.

Think fast hot shot. What do you do. What do you do.
Re:Looks like "Worse is Better" all over by paratiritis · 2008-05-17 11:43 · Score: 2, Insightful

RTOS users need a non-interruptible lock. Nope they need a guaranteed number time slices for their RT application(s) to run whithin constraints. Which they can't get if the kernel not the applications has a lock for arbitrary amounts of time.
Desktop and server users' need conflict with the RTOS users' needs. No they are orthogonal. In fact no BKL means more robust systems under heavy loads.
Unfortunately, the "right thing" is undesirable, because the added indirection will substantially reduce performance for everybody.
In fact none of the people in the discussion seem to think that. They just think it is a huge amount of work (and they should know far better than me, but for what it's worth I agree)
The problem? Changing legacy code used under heavy multitasking in N(-> infinity) configurations.
Torvalds' solution? Start doing it and offer it as an experimental config option. By supporting both options, with BKL as the default, they (a) move the architecture in this direction and (b) allow heavy users and distro creators to experiment with it and give feedback (and help). Open source at its best.
Think fast hot shot. What do you do. What do you do. Nice. The "worse is better" mantra. Do we need it at this stage of the game though?

Fascinating. by Anonymous Coward · 2008-05-17 04:33 · Score: 0, Funny

Lets be sure to get every thread from the Linux kernel mailing list on the front page.

Re:Fascinating. by ResidntGeek · 2008-05-17 04:37 · Score: 4, Funny

If this bores you, every lkml thread would cause your head to explode.

--
ResidntGeek
Re:Fascinating. by Anonymous Coward · 2008-05-17 04:40 · Score: 0

Perhaps thats why I don't subscribe then?
Re:Fascinating. by pla · 2008-05-17 04:56 · Score: 3, Insightful

If this bores you, every lkml thread would cause your head to explode.

Hey, I consider myself a code junky (and yes, even consider the issue of the BKL somewhat interesting), but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn.
Re:Fascinating. by hostyle · 2008-05-17 05:12 · Score: 3, Funny

Hey, its not easy keep every blade of grass within 0.3mm in length and maintain cross-colour length rules while keeping a close watch on weather-judged per-species expected length margins, you insensitive clod!

Keep off my lawn too, you pesky kernel hackers^WWkids ...

--
Caesar si viveret, ad remum dareris.
Re:Fascinating. by ResidntGeek · 2008-05-17 06:07 · Score: 5, Insightful

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else.

--
ResidntGeek
Re:Fascinating. by joeman3429 · 2008-05-17 06:13 · Score: 1

it's called Digg
Re:Fascinating. by tomhudson · 2008-05-17 06:19 · Score: 1

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else.

s/somewhere else/Faux News/gi;
Fixed it for ya ;-)
Re:Fascinating. by Anonymous Coward · 2008-05-17 07:33 · Score: 0

This is only to weed out the true geeks from the fake geeks.

Geek wannabees: Move along, nothing interesting for you to see here.
Re:Fascinating. by IntlHarvester · 2008-05-17 07:37 · Score: 2, Insightful

Hey, I consider myself a code junky (and yes, even consider the issue of the BKL somewhat interesting),
but I realize that this topic has about as much appeal to the average Slashdotter as mowing the lawn. This topic is probably mainly of historical interest. (BKL used to be one of those bread-n-butter slashdot stories in the early days)

The funny thing is that the reply quality here is quite high for technical topics, but over time slashdot management has found that retarded political threads are much more popular.

--
Business. Numbers. Money. People. Computer World.
Re:Fascinating. by VGPowerlord · 2008-05-17 08:55 · Score: 1

s/somewhere else/Faux News/gi;

s/Faux News/Fox News/gi;

It's English! ;)

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
Re:Fascinating. by Anonymous Coward · 2008-05-17 09:20 · Score: 1, Funny

Come on, that's just not tr- *BANG*

Seriously, that stuff really complex, why don't they just do it in python, duh:

#!/bin/python
from kernels import Linux
Re:Fascinating. by Anonymous Coward · 2008-05-17 11:45 · Score: 0

That's called Digg.
Re:Fascinating. by jarden_from_cerberus · 2008-05-17 20:35 · Score: 1

If you want someone to cater to a least common denominator, you'd be better off somewhere else. Let's leave Fox News out of this discussion, if at all possible.
Re:Fascinating. by Anonymous Coward · 2008-05-17 22:28 · Score: 0

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else. Thats strange i thought this was cheapshot or slapshot
Re:Fascinating. by Anonymous Coward · 2008-05-18 02:11 · Score: 0

Typical elitist nerd. "Slashdot's for smart people, if you don't like it go hang out with the idiots."

Seems like a lot of people are "somewhere else" lately, I wonder why? Pseudo-intellectual assholes like yourself, perhaps?
Re:Fascinating. by ResidntGeek · 2008-05-18 08:35 · Score: 1

Oh, shut up. Elitism isn't a bad thing, it's the expectation of intelligence from others. It's a tall order, but not an unreasonable one.

--
ResidntGeek
Re:Fascinating. by Anonymous Coward · 2008-05-19 01:26 · Score: 0

Slashdot's not supposed to be interesting to every reader all the time. If you want someone to cater to a least common denominator, you'd be better off somewhere else. Yeah. www.fark.com

Translation? by Pazy · 2008-05-17 04:36 · Score: 1

Is there any chance someone who understands this can translate it a bit? I may be a nerd but I dont do much with Kernel's or much coding and would really appreciate if someone could simplify this a bit so I could understand it.

Re:Translation? by kcbanner · 2008-05-17 04:41 · Score: 4, Funny

Its like rubbing cheetah blood on the engine of your car to make it go faster.

--
Obligatory blog plug: http://www.caseybanner.ca/
Re:Translation? by maxume · 2008-05-17 04:44 · Score: 1

So it makes things cooler?

--
Nerd rage is the funniest rage.
Re:Translation? by Burdell · 2008-05-17 04:48 · Score: 5, Informative

When the Linux kernel first supported multiprocessor systems, it was done with a single lock protecting access to all the kernel (the Big Kernel Lock); the kernel could still only do one thing at a time. Over time, most sections of the kernel have introduced their own fine-grained locking and moved out from under the BKL, allowing many parts of the kernel to be running at the same time on multiple processors. The BKL has shrunk over time, but it still exists over a chunk of the kernel. The kernel hackers recently tried to replace the hard lock with a preemptable lock, but that had some bad interactions with the scheduler (which determines what process/kernel thread runs when), so Linus switched back to the old-style BKL.

Now, a group is trying to see if it is possible to weed out all the remaining uses of the BKL and replace them with localized locking for specific sections of the kernel. This is tricky, as there are side-effects of the BKL that are not always obvious.
Re:Translation? by Anonymous Coward · 2008-05-17 04:49 · Score: 3, Interesting

Anytime you have more than one application running, they could get into an argument about who gets to use the serial port, the video display, memory, or drive storage. This is especially critical in multi-processor systems.

The answer is to allow sections of code to "lock" access for a brief duration -- "I'm working with this right now, don't anyone else touch it." Simple in theory, very difficult in concept.

Note that I'm speaking generically; I'm not an expert on the Linux kernel. Ideally, though, you want locks to be "granular" -- in other words, you only lock that specific hardware and/or portion of memory that you need exclusive access to. Apparently, the "big kernel lock" takes a brick wall and hammer approach, locking access (and claiming exclusive access during the lock, preventing anything from running). It's not granular.

If I'm wrong, someone else here can correct me. Like I said, I'm not an expert on the Linux kernel.
Re:Translation? by Pazy · 2008-05-17 04:52 · Score: 1

Thanks so much, looks live ive learned something today :D
Re:Translation? by Anonymous Coward · 2008-05-17 04:52 · Score: 5, Informative

Ok so here's the deal:
Linux is a preemptive multi-tasking kernel. What this means is that a hardware interrupt like a keyboard click or the system timer will interrupt whatever is currently running on the CPU, and an interrupt handler in the kernel starts running code. In order to make sure that all the states of the kernel are consistent (ie: not corrupt), the different parts of the kernel are supposed to lock the data that they are using or modifying (ie, readlock or writelock) in case another code path gets run at the same time trying to modify the same data. It becomes even more important in a multi-cpu environment where locks have to be atomic (happen at the same time on all CPUs). So what you are supposed to do is only lock the resources you currently need (a file system drivers would only lock parts of the filesystem, not a character device). Because some programmers are lazy, or not sure what they are doing, they just use the big kernel lock which locks pretty much everything in the kernel. This is bad for multi-tasking and multi-processing because it means you can only have one codepath using the lock at a time.

Note: it's been a while since I've done kernel work, so I'm sure this is not 100% true, but hope it helps you understand.
Re:Translation? by Pazy · 2008-05-17 04:53 · Score: 1

Thanks for the help, always looking to learn new things.
Re:Translation? by Pazy · 2008-05-17 04:55 · Score: 1

Woah, ive learned more today than in college since August lol
Re:Translation? by Anonymous Coward · 2008-05-17 05:05 · Score: 0

Wouldn't it be easier and mainly better to start all over? You mean, hop in a time machine and go back to 1992?
Re:Translation? by Anonymous Coward · 2008-05-17 05:25 · Score: 5, Informative

"That part of the code" is the difficult part. The BKL assumption is present in thousands of place all around the kernel, and nobody really know where. You can have two pieces of code, that looks totally unrelated, that happen to work because in all the code path leading to them the BKL is taken. Removing the BKL and "code it all over again" will create this new race condition.

There would be thousands of such, and you'll probably never succeed in debugging it.

The approach suggested in the article is to replace the BKL by a true lock, then "pushing it down", which means understanding WHY that code want the BKL, and get smaller locks instead in subroutines.

For instance, one piece of code could take the BKL because it will change 3 data structure. You could then remove the BKL and use, in the 3 part of code that changes those 3 structure, and use a finer grained lock for each of those.

By iterating this way, you should always get a somewhat working kernel, and slowly kill the BKL.
Re:Translation? by mikael · 2008-05-17 05:26 · Score: 1

The Big Kernel Lock was designed to only allow one CPU in a multiprocessor system to access the kernel at a time. This wasn't too bad for a two core system, but it becomes a very big problem when multiple CPU's are in use. In a large SMP system(say a rack of 4 core Intel Xeon's), all processes/threads on all 16+ CPU's could end up halting while waiting for one system call on one CPU to complete.

And the reasons for each CPU wishing access to the kernel might be for a completely different reason. One CPU might be wanting to access the hard disk drive, while another is making a shared-memory request, and yet another is sending data to the network card.

If the BKL is broken up and replaced by a lock for each subsystem, then the latency problem can be eliminated. Though, there is the risk of deadlock where any CPU held more than one lock at any time.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Translation? by Anonymous Coward · 2008-05-17 05:26 · Score: 0

Scheduler huh? Then it seems to me that this particular problem will go away when schedulers die.

In around 10 years we will have more processors than processes and threads, so each process will have its own private processor and no scheduling will be necessary (actually it will seldom be used, like HDD swap today with 4GB+ RAM). Think 100 to 1000 processors per machine.
Re:Translation? by larien · 2008-05-17 05:29 · Score: 1

Getting the locks right can make a huge difference in performance. The problem for the developer is that he has to lock enough to prevent race conditions and data corruption, but not so much so as to destroy performance. As you say, developers have taken the "easy" route of locking everything because they can.
There should be very few things which lock up an entire kernel, even for a few nanoseconds.
Re:Translation? by LordNimon · 2008-05-17 05:39 · Score: 5, Informative

Wouldn't it be easier and mainly better to start all over?
No.
You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?
It's not that simple. When it comes to locking, there is no "part of the code" that can be replaced. Locking governs interaction between two pieces of code, sometimes two pieces that are very different but have some small thing in common.
Besides, the kernel is too big to just start throwing parts of it out and redoing them from scratch. It's much better to make incremental improvements, because then the people working on them will actual learn how to solve the problem. The BKL is not just a coding problem, but also a people and project management problem.

--
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
Re:Translation? by DarkOx · 2008-05-17 05:39 · Score: 4, Informative

Your not wrong, and like you I am going to continue in an over simplified style so the non programs can understand. The part you are leaving out is why you want your locking to be granular.

The granularity is important because you want other threads(jobs) to beable to get something done. At some point there is this thing called a scheduler that assigns your thread to execute, because every job needs a CPU. You get to work until your time allotment is expired or you have to stop because something you need to continue is not availible, because its say "locked".

Think of this like working in a shop along side someone else. You have one set of tools, you need a little screw driver, and a big one to do your work. The other guy needs the little scew driver and a pair of pliers. You want him to put the screw driver down while he is using the pliers so that you can use it if you need to. If he instead puts it in his breast pocket you are going to have to wait to finish your job until he finishes his. Even though its your turn at the work bench(CPU) you can't do anything with it because you don't have what you need. So all you can do is yeild the rest of the time to the other task, and hope he finished up soon.

In the kernel world this really short circuits the work of the scheduler. It might want to give time to other threads and it will but they are going to just turn around and give that up because whichever thread is holding the BKL is likely the only one who can actually do any work. As an end user this means something like data gets read from your network card ok but your sound keeps skipping.

The tricky part with more granular locks though is avoiding circular conditions, these can crash the system. Imagine: Job One needs resource A and B and has A locked, its waiting for B. Job Two needs B and C and has B locked and is waiting on C. Job Three needs A and C and has C locked and is waiting on A. Unless the system can detect this condition which is hard to do in many cases none of these threads will ever be able to run. The kernel contributors likely have some work ahead to eliminate the BKL and not cause these types of problems.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Translation? by weicco · 2008-05-17 05:50 · Score: 2, Funny

And you learn more when you write your own (virtual) device driver which crashes your kernel and renders it to unbootable state :)

Not that I know anyone who has done so... or at least I wont admit it!

--
You don't know what you don't know.
Re:Translation? by Hal_Porter · 2008-05-17 05:51 · Score: 0, Troll

Windows NT was around before 1992 and it doesn't have a big kernel lock - individual resources are protected with spinlocks.

But that's because it was designed from the ground up to be SMP friendly, as opposed to being a clone of a 1970's operating system.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:Translation? by sticks_us · 2008-05-17 06:03 · Score: 1

I didn't read all of the kerneltrap stuff, but isn't it true that the CONFIG_PREEMPT_RT folks have made some good headway here?

I don't believe the -rt patch is expected to perform well as a drop-in replacement everywhere, or on systems that have management interrupts, but I think for people interested in real-time programming, the rt patch is a good starting point.

--
"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth
Re:Translation? by tomhudson · 2008-05-17 06:26 · Score: 3, Insightful

Scheduler huh? Then it seems to me that this particular problem will go away when schedulers die.
In around 10 years we will have more processors than processes and threads, so each process will have its own private processor and no scheduling will be necessary (actually it will seldom be used, like HDD swap today with 4GB+ RAM). Think 100 to 1000 processors per machine.

Keep dreaming ...
With all those processors, you'll want to be saving energy, so you'll be aiming to turn off individual processors until needed, and run the remaining processors at full load, so you'll still need a scheduler, locks, etc.
And yes, it's possible even today to use up more than 4 gig of ram and have to hit swap.
Re:Translation? by 93+Escort+Wagon · 2008-05-17 06:52 · Score: 3, Funny

Ok so here's the deal:
Linux is a preemptive multi-tasking kernel. What this means is that a hardware interrupt like a keyboard click or the system timer will interrupt whatever is currently running on the CPU, and an interrupt handler in the kernel starts running code. In order to make sure that all the states of the kernel are consistent (ie: not corrupt), the different parts of the kernel are supposed to lock the data that they are using or modifying (ie, readlock or writelock) in case another code path gets run at the same time trying to modify the same data. It becomes even more important in a multi-cpu environment where locks have to be atomic (happen at the same time on all CPUs). So what you are supposed to do is only lock the resources you currently need (a file system drivers would only lock parts of the filesystem, not a character device). Because some programmers are lazy, or not sure what they are doing, they just use the big kernel lock which locks pretty much everything in the kernel. This is bad for multi-tasking and multi-processing because it means you can only have one codepath using the lock at a time. Like putting too much air in a balloon!

--
#DeleteChrome
Re:Translation? by maxwell+demon · 2008-05-17 07:02 · Score: 1

But couldn't the deadlock risk be removed by adding a strategy where you have to acquire all your locks at the same time, with a single call?

That is, have one call, say
lock(resource1|resource2|resource3);
to lock the three resources at once, and disallow any further call to lock before unlock was called by the process. The lock call would wait until all three resources are unlocked by other threads, and then lock all three of them atomically. This would prevent a deadlock, because you could only lock a resource when you don't currently hold a lock.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Translation? by Anonymous Coward · 2008-05-17 07:06 · Score: 0

Which decade did VMS come from?
Re:Translation? by suck_burners_rice · 2008-05-17 07:25 · Score: 2, Interesting

Since removing the BKL will cause deadlock situations like the one you describe, perhaps a solution to this problem is to re-think the way locking is implemented. If a program knows that it will need access to resources A, B, and C, it could put in a request to reserve all three of those resources simultaneously. If the three resources are available at that moment, they will all be locked simultaneously, the task will execute, and then they will be unlocked simultaneously. But if one or more of the resources are not available at that moment, that task will simply stop executing (it won't be scheduled) until the first instance that all three become available. This way, a resource doesn't become locked until it is actually going to be used.

--
McCain/Palin '08. Now THAT's hope and change!
Re:Translation? by Anonymous Coward · 2008-05-17 08:33 · Score: 0

Out of curiousity, why do you come here?
Re:Translation? by Anonymous Coward · 2008-05-17 08:57 · Score: 0

hrm, the only way I can see that would get around that issue would be to have the scheduler keep a resource dependency graph around and some sort of static analysis which requires processes to have consistent intermediate states where they can temporarily unlock a resource or some such.

or so I htink
Re:Translation? by Anonymous Coward · 2008-05-17 09:05 · Score: 0

What if the future processor uses 1/1000th of the power that the current bloated ones do?

And yes, it's possible even today to use up more than 4 gig of ram and have to hit swap.

Yes, that's what 'seldom' means.
Re:Translation? by tomhudson · 2008-05-17 09:23 · Score: 1

What if the future processor uses 1/1000th of the power that the current bloated ones do?

In 10 years? (the timespan you posited). Not going to happen.
Re:Translation? by twizmer · 2008-05-17 09:42 · Score: 4, Interesting

This is one approach to deadlocks; it would fall generally under "avoidance".

The problem is that if you have any serious contention over the resources, it is entirely possible that the process will _never_ get the resources (because one of them always gets snatched up before another gets released, so all n resources are never available at once to the requesting process). This leads to starvation and general sadness.

If the system has minimal contention (so the normal case is that all three resources are unclaimed) and resources are held very briefly (so if a resource is taken it is likely to be released before another is taken, anyway) then it may work. In real systems these are hard properties to guarantee.

Also, the scheme requires a process to know in advance which locks it will need. A lot of algorithms may discover this on the fly (e.g. if you are traversing a data structure), which becomes a problem. The best you could hope to do is to lock aggressively--taking everything you might need--but this is ugly, and would tend to violate the conditions above (locking everything will lead to contention, locking everything in advance will lead to holding locks for a long time). Alternatively, if you discover that you need a new lock that you don't yet have, you could give up all the locks you do have and then try to lock again (with the new lock added to the set). This is also ugly and increases the chance of starvation (since now you need to lock a bunch of resources several times). Additionally, since you have to unlock in the middle, the algorithm becomes much more complicated. For example, when you discover you need a new lock, you must put the world in a consistent state before unlocking. And when you re-lock, you must check to make sure that the world hasn't been modified under your feet (which is entirely possible, and may very well cause you to need a still-different lock).
Basically it doesn't work that well.
Re:Translation? by neomunk · 2008-05-17 11:14 · Score: 1

Futurama reference + Star Trek reference + relevant observation in linux kernel discussion = Win.

Wish I had some mod points today. Hilarious.
Re:Translation? by mikael · 2008-05-17 11:58 · Score: 1

Traditionally, if a process wishes to access a single resource, it is placed in a suspended state on a queue until that resource becomes available (monitors)

When another process releases the lock on that process, the first process in that queue is restarted.

For a single atomic multi-resource lock to work, the process would have to be placed on multiple queues. Whenever one resource was released, the process would be unable to continue because it did not have the other processes. It would be permanently asleep until all resources were released at exactly the same time.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re:Translation? by Pazy · 2008-05-17 13:37 · Score: 1

Because I am a geek/nerd and want my tech news. I just dont happen to delve into the coding of Kernel's very often. In fact I dont even code, but does that somehow bar me from Slashdot?
Re:Translation? by twizmer · 2008-05-17 13:56 · Score: 1

See comment above http://tech.slashdot.org/comments.pl?sid=556184&cid=23448010
Re:Translation? by Anonymous Coward · 2008-05-17 14:15 · Score: 0

original vms was bad at SMP too, just like Unix.

NT was actually more of a clone of the nextgen version of VMS, called "Mica" IIRC.
Re:Translation? by Workaphobia · 2008-05-17 15:07 · Score: 1

More like the engine on a spaceship. To find a torpedo-robot.

--
Evidently, the key to understanding recursion is to begin by understanding recursion. The rest is easy.
Re:Translation? by kcbanner · 2008-05-17 15:08 · Score: 1

Someone gets the reference woooo!

--
Obligatory blog plug: http://www.caseybanner.ca/
Re:Translation? by suck_burners_rice · 2008-05-17 15:14 · Score: 2, Interesting

What would happen if there were two distinct ways of handling the BKL problem, which could be implemented simultaneously: Method number 1, the existing method. This would remain in place to support all the program that use it. Method number 2, a newfangled method that new programs will need to be made aware of. The idea is that "well behaved" programs would, over time, begin to use the second method. The first method would actually use the second method in its implementation, so really there's only one method, although it appears that there are two.
Each resource in the kernel that might potentially be locked receives its own small lock. The idea is that the more finely grained the locking, the less likely it is that more than one program will need the same resource at the same time. The deadlock and resource contention problems will still exist, of course, but they will occur less frequently. Programs that know exactly which locks they'll need over the course of an operation would put in a request for all of those resources. At this moment, the kernel checks if all requested resources are available, and if so, the resources are locked immediately (and simultaneously) and program execution continues into the portion that uses those resources. However, if one or more of the resources are not available, the kernel will queue the request and one of two things will happen at the program's choice: Either the call will block until the resources are available, at which point program execution continues normally (we'll call this "synchronous"); or the program can be notified that "I'll get back to you on that" (we'll call this "asynchronous") in which case it can continue executing some other, unrelated part of its code, such as the part that updates the screen, outputs sound, or whatever. Obviously, this will be a bit trickier for application programmers to implement, but the reward will be programs that appear responsive to a user while they "wait" for locks. There should be an additional option in the locking request: a timeout value (with an option to never timeout). If the request times out, the program can do something appropriate. The kernel function that releases a lock will be modified to check the queue of lock requests, and immediately transfer control of the lock to a waiting program (according to some prioritization method). This means unblocking something that is blocked synchronously, or sending a signal to the process that queued the asynchronous lock request. The BKL will be implemented by requesting all locks, synchronous, never time out. Simple. The lock request function will return a value telling you whether you received the lock or why you didn't receive it. The scheduler "knows" if there are two or more processes in its list that are blocked waiting for a lock. If at least one of them has provided a timeout value other than "never," the scheduler does nothing since the problem will fix itself eventually, a la Office Space. Otherwise, it can examine the lock relationships. If a deadlock situation exists, it will choose one of the processes and return it an answer that it didn't receive the lock for deadlock prevention. That process can then do something appropriate. Code that doesn't know how to handle this situation will crash, but the system won't block indefinitely.

--
McCain/Palin '08. Now THAT's hope and change!
Re:Translation? by neonsignal · 2008-05-17 15:46 · Score: 3, Insightful

I'll respond because it is fantastic to see new people thinking about these issues. But I must agree with twizmer on this - grabbing multiple resources might solve the problem, but it is very clumsy. Some resources (eg storage) may take milliseconds to complete, whereas others (eg graphics) might take only microseconds. Holding up the fast ones while the slow ones complete is very undesirable (for all the reasons twizmer gives).
There are techniques that are used for problems like deadlocks and starvation: changing priorities on the fly; or enforcing mutex ordering; or even 'prodding' deadlocked tasks, but they are somewhat ugly. You'll find chapters in any book on OS design.
The essential problem is that the use of semaphores (and mutexes etc) is a low level way to control multiple processes; it is analogous to using the goto for flow control. There are languages that have attempted to address this (eg Occam or Modula I) with slightly higher level constructs, but they have not become popular, and are not totally radical.
I believe that we will need new programming languages to achieve safer parallelism. My bet would be on a language with message passing primitives (since they fit well with our object oriented models), and perhaps the use of Petri net formalism to prevent deadlocks. I gather that Nokia's phone OS uses this message passing model.
It should be noted that current processor design does not suit efficient message passing (the emphasis is more on an efficient stack, since that corresponds to the procedural flow of control - an exception may be the old Transputer architecture). However, I think the languages need to be developed first, even if they are not efficient to compile; processor development will support the most popular languages (as it has grown to support the use of C and other procedural languages).
Re:Translation? by Anonymous Coward · 2008-05-17 16:04 · Score: 0

"You're" is not the same word as "your."
Re:Translation? by Anonymous Coward · 2008-05-17 16:17 · Score: 0

Wouldn't it be easier and mainly better to start all over?
No.

You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?

It's not that simple. When it comes to locking, there is no "part of the code" that can be replaced. Locking governs interaction between two pieces of code, sometimes two pieces that are very different but have some small thing in common.

This is innacurate, not correct and misleading at the same time.

Most access to shared resources can and should be locked on a resource by resource basis, so that users of resources are not aware that the resources are being locked.

Also, using AOP it becomes much easier, the problem is that C compilers and C programmers are so 70's and 80's and have not learned anything about the techniques invented in the 90's and the early 2000's.

AOP is great and would let you code it as simply as configuration. Also Dynamic Proxies in Java could do the trick.

The wonderful thing about using such techiniques is that you just mention what you want to protect and then can install different alternatives, like spinning locks, non locking synchronization and the like.

It is a shame Linux uses such old tools and he is not aware of new tools. I'm waiting for JNode to be finished to install in my computer.
Re:Translation? by Gabest · 2008-05-17 17:03 · Score: 1

Oh, just build more power plants, we still have 10 years left :)
Re:Translation? by maxwell+demon · 2008-05-17 20:06 · Score: 1

If the system has minimal contention (so the normal case is that all three resources are unclaimed) and resources are held very briefly (so if a resource is taken it is likely to be released before another is taken, anyway) then it may work. In real systems these are hard properties to guarantee.

However, note that we are talking about a very specific system here: The kernel. I admit I'm not a kernel expert, but I see just two different things which might need locks: Hardware resources and internal kernel structures. Now I can't imagine any reason why any code would want to lock hardware and kernel structures at the same time: Either you are communicating with the hardware, in which case you lock that hardware (and the corresponding driver's data structures). I doubt there's ever the need for any piece of code to lock more than one hardware device at the same time. Therefore hardware locking should be the easy part.

Now, for locking internal kernel data structures, I'd say if the kernel spends significant amounts of time updating those, you have major problems anyway (except for waiting for hardware, the system shouldn't spend significant time in the kernel). Therefore I'd guess your conditions should always be fulfilled for the kernel (if not, the system probably isn't actually usable anyway, even without locking problems).

Now, as I said, I'm no kernel expert, so I may have overlooked something very fundamental.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Translation? by Hal_Porter · 2008-05-17 21:07 · Score: 2, Informative

Yup and Mica designed in the late 80's as a Microkernel replacement for VMS.

http://en.wikipedia.org/wiki/DEC_PRISM

It was canned by DEC and Cutler and most of the engineers were hired by Microsoft where they wrote Windows NT.

Interesting Mica was supposesed to have multiple personalities - posix and VMS. Similarly Windows NT had Win32, Win16/Dos, Posix and originally OS/2.

The NT model is to packetize IO as IRPs and pipeline requests across multiple CPUs. E.g. on Windows NT under very heavy load a disk driver can be starting a request on one CPU, in an ISR on another and in an APC on yet another. That makes the rules for writing drivers quite complex of course, but that's a price worth paying. That's an extreme case though, normally DPCs have an affinity for the CPU that generated the interrupt because that CPU is likely to have the driver code in its cache. But the fact it's possible illustrates how far Mica/NT is from the tradional Unix model of treating device drivers as synchronous subroutines that only one kernel thread can be in.

And it seems pretty obvious to me that Mica/NT works like this because they were designed from the start to run on a lot of small Risc processors wired up as an SMP machine. Unix wasn't - back when Unix was designed the dominant paradigm was one big CISC CPU. In that world, drivers as a synchronous library is actually better since you don't have the overhead of building packets.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:Translation? by flnca · 2008-05-17 21:17 · Score: 1

The best solution would be a message-based kernel in the style of AmigaOS' "exec.library". Every kernel thread sleeps until events that are interesting to it occur. An interrupt handler would defer handling the interrupt to a kernel thread, and would just signal the kernel thread that something happened. The kernel thread would then interrogate the hardware and generate an event that it would send to subscribers interested in that event. The kernel threads would sleep most of the time, waiting for signals. The message system would be constructed such that a signal is delivered to the kernel thread when a message arrives at its message port. The kernel thread would then scoop the messages from the message port and store them in an internal queue for subsequent processing. This would minimize the time the message queue has to be locked. When a kernel thread has finished processing its messages, it would wait on the port again. This concept was successfully implemented in AmigaOS, which was very fast even on old hardware (7.16 MHz 68000 CPU). The BKL would be similar to 'exec.library's Disable() and Enable() calls, which disabled and enabled all interrupt handling. These were only rarely used on AmigaOS, mainly to make message queue operations atomic in the kernel. In most CPUs, there are test-and-set (TAS) instructions for multiprocessor situations (on x86, there's XCHG for instance). One could even go so far as to keep a backlog of outgoing messages for a kernel thread. If it can't acquire a TAS lock, it would just store the notification request in the kernel thread's pending output queue. The scheduler would automatically wake the kernel thread and would have it test whether it can deliver pending output messages. Of course this requires that when a kernel message port is deallocated, that pending messages have to be removed from all kernel thread queues.

Code like this is not difficult to implement, but it would greatly enhance the speed and reliability of the kernel. Kernel locks would not be required.
Re:Translation? by x2A · 2008-05-17 21:40 · Score: 2, Insightful

You've gotta know that's complete nonsense, right? You wanna create an abstraction layer between every single bit of code in the kernel and the hardware it's meant to be running on to handle locking dynamically, making decisions at runtime, rather than expecting developers to be able to design a system and make those decisions for the code outside of its runtime and having the best way compiled to run directly on the hardware at full speed?

Managed languages have their place (and in fact all my work now is with managed languages)... but they also have to run on something. You think your java apps will just run without an operating system underneith it with these kinds of features to support the running of the virtual machine? What language do you think your java machine is written in? Do you think that's air you're breathing now? Hmm...

"This is innacurate, not correct and misleading at the same time"

But hey at least you prefixed what you had to say with this so we did know.

--
The revolution will not be televised... but it will have a page on Wikipedia
Re:Translation? by flnca · 2008-05-17 21:51 · Score: 1

BTW, Helios, which was created by Tim King from MetaComCo, after he ported and incorporated TRIPOS into AmigaOS as the 'dos.library' (another fine piece of software, BTW), Helios used a "fire-and-forget" messaging scheme, with message delivery not guaranteed. In the scenario that I described in the parent article, a failure of a TAS instruction to lock a message queue would simply result in a message not being delivered. The disadvantage of that is obviously that information is potentially lost. To alleviate that, there would have to be message-based conversations between kernel threads, so a kernel thread can ask another for a new copy of the information. When a message would not be able to be delivered, it could flag the target queue that incoming data could not be delivered by using another TAS instruction. Then the receiver would know that something has been lost in transit. Critical information, like signals flagged by interrupts, could be signalled using TAS instructions.
Re:Translation? by flnca · 2008-05-17 22:19 · Score: 1

Sample implementation in C for a message system that is multiprocessor safe:
/* simple doubly-linked list: */ typedef struct _node_t { struct _node_t* next; struct _node_t* prev; } node_t; typedef struct _list_t { node_t head; node_t tail; int taslock; } list_t; bool addnode( list_t* list, node_t* node ) { if ( TAS( &list->taslock, 1 ) ) { return false; } /* ... add node to list ... */ return true; } /* message and message port structure */ typedef struct _msg_t { node_t node; int type; /* for automatic targetting */ struct _msgport_t* target; /* ... */ } msg_t; typedef struct _msgport_t { list_t msglist; struct _task_t* sigtask; int sigmask; } msgport_t; bool addmsg( msgport_t* port, msg_t* msg ) { if ( !addnode( &port->msglist, &msg->node ) ) { return false; } if ( port->sigtask ) { sig2task( port->sigtask, port->sigmask ); } return true; } /* sample task structure for kernel threads */ typedef struct _task_t { msgport_t mainport; msgport_t outqueue; /* optional */ msgport_t inqueue; /* optional */ } task_t; Of course, additional work would be required for automatic inbox/outbox handling that would retry failed queuing attempts.

(BTW, how do you indent code on Slashdot? Is the code tag the right thing (I used it), or do I have to use CSS formatting?)
Re:Translation? by Anonymous Coward · 2008-05-17 22:35 · Score: 0

I'm not sure if 'clone' is really the right word here. NT was written by the same development team who wrote Mica at DEC (before DEC's incompetent management cancelled PRISM, only to revive it as Alpha after their best OS architects had already walked out), so in a sense NT is Mica, even if the developers had to rewrite all of the source code they had previously developed at DEC. (There are unconfirmed claims that some Mica code was actually ported from PRISM to MIPS, and used in NT, which led DEC to sue Microsoft, but they settled out of court, so only the people involved know whether or not it happened.)
All I can say is I'm ever so pleased Cutler and his team were able to develop a next-generation OS, thanks to sensible management at Microsoft, in contrast to the raving lunatics running DEC. It's such a pity that a company with so many brilliant people, who designed such superb hardware and software, was done in by the incompetence of senior management. At the same time, I hate to imagine what would have happened on the PC if Cutler and his team hadn't gone to Microsoft, but I'll wager most of us who use Windows would instead be using either some hopelessly inferior version of Microsoft-IBM OS/2, or even a mutant zombie OS derived from MS-DOS (i.e. a descendant of Windows 3.x/9x instead of NT).
Re:Translation? by Anonymous Coward · 2008-05-17 23:12 · Score: 0

Wouldn't it be easier and mainly better to start all over? You know, like, remove that part of the code and code it all over again, see what is broken, and continue this way?

Interestingly, that's basically what some of the big Unix vendors did in the 1980s and early 90s. After a spat with AT&T/Sun about Unix licensing, DEC, IBM and HP started the Open Software Foundation, which aimed to develop a replacement for Unix, using Mach and a kernel-mode BSD server (i.e. a hybrid/monolithic Mach kernel) in lieu of the traditional monolithic Unix kernel. As far as I know, only DEC completed the full implementation of the system (DEC OSF/1, which later became Digital UNIX, then Tru64 UNIX), but I think IBM also used Mach for at least AIX/ESA (not sure about the mainline version of AIX). The end of the Unix wars led to a shift from the BSD API to SVR4, but the underlying kernel in OSF/1 is still Mach. Mac OS X of course also uses a hybrid/monolithic version of Mach (i.e. Mach with a kernel-mode BSD server).

Microsoft did the same thing with NT (but of course with the NT kernel instead of Mach), which was designed to replace MS-DOS/Win16 and OS/2. It is a slightly more microkernel-like hybrid, in the sense that the OS servers (Win32, Unix, etc.) run in user mode instead of kernel mode, but then the entire graphics and windowing subsystems were moved into kernel mode in NT 4.0 (1996), which is usually not even the case on monolithic Unixes. At any rate, NT represented a complete restart at the kernel level, rather than an attempt to build something on OS/2 or MS-DOS/Win16 (NT was designed to include subsystems to emulate MS-DOS/Windows and OS/2, as well as Unix, but the underlying NTOS is competely different).

The Linux and BSD guys stuck with the old design, and instead tried to evolve uniprocessor systems into multiprocessor ones. It seems to have worked out reasonably well in the case of Linux (although not perfectly, as this article demonstrates), but I think the free BSDs still have some SMP issues compared to systems without the uniprocessor Unix/BSD legacy.
Re:Translation? by Anonymous Coward · 2008-05-18 03:04 · Score: 0

Does this sound vaguely familiar ...

http://en.wikipedia.org/wiki/Dining_philosophers_problem
Re:Translation? by x2A · 2008-05-18 13:06 · Score: 1

I believe locks are already fairly fine grained in most of the system. Problems occur when you have multi level locks (eg, you want to change the value in a structure/object, so you get a lock on the structure, and then copy (duplicate) it, which could involve needing a lock on the table of structures before you can insert a new one. The lock on the table could be harder to get; you may have to wait for certain types of locks on the structures within to be released, but you can't release the lock on the object you're holding until you've created the copy, as you want the copy to be of the object in that state, and something else might change it before you get the lock on the table.

You can minimise the effects, but locking something in always implies the chance you are locking something else out. Where possible, lockless algorithyms can be switched to. One example is creating per-cpu versions of things. For example, a counter which goes up everytime something happens. A process on a CPU could get a lock, read the value, perhaps check it against an upper limit, increment and write it back, then release the lock, to make sure other processors don't change the value in between reading it and writing the new value back, which would lose the value the other process had updated it to. Instead, give each cpu its own counter, and when you want to know the value, you just add up all the values of all the counters.

Another method is using copy-on-write and reference counting. If you want to change a structure which something else has a read lock on, the structure is copied, and you change that. Pointers are updated so that anything that reads from it in the future reads from the new copy, only processes with the lock on the old copy read from that, and when there are no more left, the old copy is released and memory freed.

Both of these methods require that you can have more than one of what you'd want to lock, but obviously in some cases this just isn't true, because there is only one of the resource and it can't be copied.

--
The revolution will not be televised... but it will have a page on Wikipedia
Re:Translation? by kl76 · 2008-05-19 00:04 · Score: 1

Wouldn't it be easier and mainly better to start all over? You mean, hop in a time machine and go back to 1992? And this time, maybe design a microkernel system instead? :-)
Re:Translation? by twizmer · 2008-05-20 03:36 · Score: 1

By "real systems", I had real kernels in mind in particular. I'm not a kernel expert (certainly not a Linux kernel expert), but I have written a [small, toy, but functional] UNIX-y flavor kernel for class, so I can claim to know what I'm talking about generally.

Realize that "internal kernel structures" reaches basically everything living anywhere in memory. Realize also that modern operating systems are horifically concurrent. Lots of things are happening at once, and at any moment, someone can (and probably will) pull the rug out from under you. Concurrency is far uglier than in user land—even if a thread in a seperate process totally unrelated to the current thread starts to run, you still have to context switch, and that's still messy.

If you are reading data from hardware (or writing to hardware) you may need to lock things in memory.

If you believe in supporting mulitple threads/processes/runnables in the same memory space, locking the VM can be hilarious.

You certainly need some sort of data structure for understanding the memory layout across processes, which requires locking (god forbid you accidentally give out the same physical frame for two different things, or the same spot on disk, or whatever).

Filesytems live in the kernel, and certainly entail locking (and may combine kernel structures and hardware when you do disk I/O).
Big issue: Even though you don't want to spend much time in the kernel total, you have to go through kernel a lot (every time you take an interrupt). The scheduler should be fast, so shouldn't hold on to resources very long, but it has to run an awful bloody lot, and it certainly brings in locking issues. Any operations which would affect your runqueue (or whatever datastructure keeps track of your runnables) is thus complicated by locking. (This means yield, this means exit, etc). Also, exiting and waiting/jointing/etc., whether for threads or processes, brings their own forms of locking pain. And please don't even thinking about killing a multi-threaded process! (But, of course, you really want to be able to do that).

In sum, the locking issues in a kernel are exceedingly complicated and do not easily produce nice conditions.
Re:Translation? by twizmer · 2008-05-20 04:01 · Score: 1

I think you have a misconception about the locking problem we're discussing. Understand that the BKL issue has nothing to do with userland or application programmers; it is a lock used internally in the kernel, and is only visible or relevant to kernel code.
Re:Translation? by agbinfo · 2008-05-21 06:03 · Score: 1

Since you seem to respond to new people thinking about these issues and you seem to know what you're talking about I'll ask you this.
There seems to be a lot of times when you need to do something which should only take a couple of cycles but because you might get interrupted, you need to use a lock to do the work.
Do you know why CPUs don't offer a user space equivalent to the interrupt disabling instruction? This could be time based. So if I needed to update a queue's head pointer after testing the tail pointer, I could use this instruction which would guarantee that I would either get preempted immediately or ensure that I have a minimum of say 20 cycles available. I might still have to worry about multiple cores accessing the data but there are other solutions for that.
Such an instruction would make it possible to avoid locks with very little (no) penalty. If your time slice is nearly over trying to lock would have caused a preemption anyways and guaranteeing a few cycles wouldn't cause any problems.
A smart assembler editor could even highlight the instructions that would have time to get executed during the reserved time.

Comment removed by account_deleted · 2008-05-17 04:56 · Score: 1

Comment removed based on user account deletion

Punchline by Anonymous Coward · 2008-05-17 05:03 · Score: 5, Informative

Since the summary doesn't cut to the chase, and the article was starting to get a little boring and watered-down, I read Ingo's post and here's what I got from it: the BKL is released in the scheduler, so a lot of code is written that grabs the lock and assumes it will be released later, which is bad. Giving it the usual lock behavior of having explicit release will break lots of code. Ingo created a new branch that does this necessary breakage so that the broken code can be detected and fixed. He wants people to test this "highly experimental" branch and report error messages and/or fixes.

Assuming everything is stable and correct, the next step is to break the BKL into locks with finer granularity so that the BKL can go the way of the dodo.

Re:Punchline by Anonymous Coward · 2008-05-17 19:25 · Score: 1, Interesting

Correct, this is what I have understood as well. Interesting to add might be that Linus has stated that he would prefer to have this in mainline as soon as possible, so that there will be more testing coverage.

So the approach they seem to be taking, is:
- add a WARN_ON in the scheduler when it releases the BKL, so that users get notified via syslog and can report such cases (in mainline)
- identify all cases where BKL is locked and not released, and fix the code (might be done in mainline)
- rewrite the BKL as a recursive spinlock (basically, a lock that can double-lock itself and still un-lock correctly) - might also be done in mainline once they are confident that all call sites have been fixed
- merge it, if needed
- move the locks down into subsystems (as you said)

Re:Fundamental kernel structures such as this... by Anonymous Coward · 2008-05-17 05:07 · Score: 0

You obviously don't know anything about Linux kernel development. So why bother giving your useless opinions on it? Seriously, do you think they are worth anything at all?

Comment removed by account_deleted · 2008-05-17 05:07 · Score: 5, Interesting

Comment removed based on user account deletion

LWN has a summary of the issue by toby · 2008-05-17 05:10 · Score: 3, Informative

here (for subscribers. I dare not post a free link here :)

--
you had me at #!

Interesting by Anonymous Coward · 2008-05-17 05:17 · Score: 0, Flamebait

For years Linux users have bashed *BSD with the Giant Lock stating that Linux had it removed years ago. It appears that Linux still has parts of their lock still present. The point here is that you shouldn't throw stones in glass houses.

PS: I am sure I will be marked as a troll. For the record; this is a point to stop the flame wars. Yes, netcraft has confirmed the Giant Lock.

Re:Interesting by RiotingPacifist · 2008-05-17 05:31 · Score: 4, Funny

Yeah we got rid of the Giant lock, this is just the big lock, totally different things.

--
IranAir Flight 655 never forget!
Re:Interesting by Anonymous Coward · 2008-05-17 07:51 · Score: 0

How are they different ?
Just like select(2) is "different" from poll(2) ?
If you have a point, please give some details.
Re:Interesting by maxwell+demon · 2008-05-17 08:00 · Score: 1

I think you want to read about the "whoosh" system call.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Interesting by VGPowerlord · 2008-05-17 09:28 · Score: 1

After this, the goal is to get rid of the Jumbo lock. After that, the Large and Substantial locks.

Maybe eventually the Medium lock.

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011

What does this mean for us mere mortals? by analog_line · 2008-05-17 05:18 · Score: 1

Will this affect anything I do if I am eventually given an option to install this kernel version? (Or am presented with a distro that has this kernel as the default?)

I know (or think I know) low latency is important for audio work, and I know people who do a lot of audio work under Linux, should I be giving them aheads up to avoid upgrading their kernel until this gets fixed, or should I start looking for unofficial, special low latency versions of the kernel to recommend to them?

Re:What does this mean for us mere mortals? by dziman · 2008-05-17 05:32 · Score: 0

Test it out and let others know your results. Useful to you, your friends, and to the developing community.
Re:What does this mean for us mere mortals? by Anonymous Coward · 2008-05-17 05:41 · Score: 0

If Ingo's post is any indication, this new branch is going to be very unstable at first. That said, you can test it out and report problems. Of course, be prepared for it to hang a lot. And you should back up your data, etc. (So probably not on a production system...)

I would expect that if this branch is successful, we should be seeing it merged back eventually.
Re:What does this mean for us mere mortals? by ttldkns · 2008-05-17 05:46 · Score: 1

Unless you are in a position where you compile your own kernel every few weeks then this wont affect you in any noticible way.

Im sure all the major distros will be watching whats happening here and making sure it doesnt affect their end users. From what i've read the changes mean better performance at the expense of latency but they're working to elimintate the BKL all together.

By the time your distros next kernel upgrade comes around im sure none of this will matter. :)

--
How many computers are too many?
Re:What does this mean for us mere mortals? by eh2o · 2008-05-17 05:55 · Score: 1

AFAIK you already need a patched or at least properly configured kernel to get really good latency response. Planet CCRMA and the Debian Multimedia Project are two distros that come with LL kernels. Obviously these distros will stay on older versions until the BKL situation is resolved.
Re:What does this mean for us mere mortals? by Alex+Belits · 2008-05-17 06:02 · Score: 1

Unless they run real-time audio processing applications, it won't matter.

If they DO run real-time audio processing applications, they most likely have their kernel specifically configured for it, and won't get updates until developers will make sure that there is no additional latency introduced.

--
Contrary to the popular belief, there indeed is no God.
Re:What does this mean for us mere mortals? by LVSlushdat · 2008-05-17 06:28 · Score: 1

Don't forget the UbuntuStudio derivative.. Been doing a RT kernel option since 7.04, to my knowledge..

--
THANK YOU, Edward Snowden!! Americans owe you a debt of gratitude (whether they know it or not..)

BKL is again a big source of latency by Sits · 2008-05-17 05:25 · Score: 4, Informative

Matthew Wilcox replaced the per platform semaphore code with a generic implementation because it was likely to be less buggy, reduced code size and most places that are performance critical should be using mutexes now.

Unfortunately this caused a 40% regression in the AIM7 benchmark. The BKL was now a (slower) semaphore and the high lock contention on it was made worse by its ability to be preempted. As the ability to build a kernel without BKL preemption had been removed Linus decided that the BKL preemption would go. Ingo suggested semaphore gymnastics to try and recover performance but Linus didn't like this idea.

As the the BKL is no longer be preemptible it is now a big source of latency (since it could no longer be interrupted). People still want low latencies (that's why they made the BKL preemptible in the first place) so they took the only option left and started work to get rid of the BKL.

(Bah half a dozen other people have replied in the time it's taken me to edit and redit this. Oh well...)

Keep these on Front Page... by TheNetAvenger · 2008-05-17 05:28 · Score: 5, Insightful

Keep these on Front Page...

This is the type of stuff that needs to be kept in the news, as the people who post here often have no understanding of, and the ones that do, have the opportunity to explain this stuff, bringing everyone up to a better level of understanding.

Maybe if we did this, real discussions about the designs and benefits of all technologies could be debated and referenced accruately.. Or even dare say, NT won't have people go ape when someone refers to a good aspect of its kernel design.

Re:Keep these on Front Page... by Anonymous Coward · 2008-05-18 12:34 · Score: 0

Keep these on Front Page... Just put up a link to lwn.net

http://lwn.net/Kernel/Index/
Re:Keep these on Front Page... by TheNetAvenger · 2008-05-18 14:16 · Score: 1

Just put up a link to lwn.net

Great for Linux, but what about the other 20 floating kernel technologies in use.

Linux is NOT the end all be all for kernels. It does what it does well or OK, but had very little long term design as it was basically going from past concepts already in use.

This is why we are seeing problems with granularity, getting to realtime, and other things that are not only working well but easy for some kernel architectures to deal with and adapt to, like the NT kernel for example.

If we were all discussing this stuff, then this isn't stuff people would be ignorant of, and less OS worship would occur. Especially in the OSS world, where semi-mainstream variants become the flavors used and many aspects could be improved on. The hard lock in Linux and the legacy code that is buried in Linux is a very big example of this and why even SlashDot readers should be somewhat knowledgeable on the subject. Linux is not well structure code, nor agile, and this is OSS and it should be more agile than NT, and sadly has a more restrictive endpoint of progession because so few people know so little about the deep layers of Linux and the actual code or methods it uses.

There are virtually no comprehensive sites dedicated to exploring kernel architecture past basic features or concepts, we need more people paying attention to the code and internal structuring as well. And this can include Solaris, and NT even, as any of us can obtain academic source access to the base NT kernel without any subsystems. (Just like MinWin or Win7 as some idiots tried to call it.)

Re:Fundamental kernel structures such as this... by RiotingPacifist · 2008-05-17 05:29 · Score: 1

Yeah fixing this is sooooo easy, as a slashdot reader ofc I know how to do it better than those kernel mailing list noobs.

In fact I've got the code right here.
what you want to see it?
Oh look over there a flying car.

--
IranAir Flight 655 never forget!

Is this story in English? by Evildonald · 2008-05-17 05:34 · Score: 0, Redundant

No really. Is it? BK-Whatsit?

Re:Is this story in English? by davolfman · 2008-05-17 06:07 · Score: 1

I think it has something to do with kernel multi-threading. Beyond that my head starts to hurt.
Re:Is this story in English? by Anonymous Coward · 2008-05-17 09:13 · Score: 0

Go back to digg.
Re:Is this story in English? by VGPowerlord · 2008-05-17 09:33 · Score: 1

Evilronald: They want Burger King to stop serving Lunch.

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011

Bad interaction with the generic semaphores. by Anonymous Coward · 2008-05-17 05:37 · Score: 5, Informative

The recent semaphore consolidation assumed that semaphores are not timing critical. Also it made semaphores fair. This interacted badly with the BKL (see [1]) which is a semaphore.

The consensus was to not revert the generic semaphore patch, but to fix it another way. Linus decided on a path that will make people focus on removing the BKL rather than a workaround in the generic semaphore code. Also, Linus doesn't think that the latency of the non-preemptable BKL is too bad [2].

[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/msg03526.html
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3e076c5a78519a9f64cd384e8f18bc21882ce0

It means pain for long term gain by Sits · 2008-05-17 05:37 · Score: 1

I imagine a kernel will come out that just has uses the BKL far less (I don't think it will be a compilation option). There is a risk of instability (especially if you are using SMP/preemption) while overlooked code that need locking is sorted out (this could lead to deadlocks or in an extreme case memory corruption). Over time this risk should decrease.

This work won't go into 2.6.26 (it's too late). It may not even go into 2.6.27 (it's been done outside of the mainline tree). This may mean that until it is done kernel from 2.6.25 may have better "worse case" latencies than following kernels until this work goes in. Once it does go in that kernel may have even better "worse case" latencies than 2.6.25.

For folks not doing audio recording (listening to your MP3s doesn't count) and without the need for hard realtime the worse latencies in 2.6.26 are too small to matter.

However if you can afford to risk your machine testing this experimental work will result in issues being found and fixed quicker and a better end result for people without old hardware that few people still have.

Re:Fundamental kernel structures such as this... by rubycodez · 2008-05-17 05:39 · Score: 1

no, finer grained locking has been evolving in Linux for over five years (and similar efforts are in the BSD). it will take years more work, nothing simple or obvious about it except to slashdot posters talking out of their ass.

Sounds like the Linux kernel needs some tests... by Crazy+Taco · 2008-05-17 05:44 · Score: 2, Informative

He noted that the various dependencies of the lock are lost in the haze of 15 years of code changes, "all this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong."

Wow. It sounds like it's about time someone on the kernel team reads Working Effectively With Legacy Code by Michael Feathers.

I'm a software developer myself on a very large project myself, and this book has absolutely revolutionized what I do. Having things break silently in the kernel is a sure sign that dependency problems in the code exist, and most of this book is about ways to break dependencies effectively and get code under test. And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easilly checked for correctness when a change is made, can fail silently easilly, and can't be understood as easilly.

That's what this book is about, and if things in the kernel have deteriorated to such a state then they need to swallow their pride and take a look at resources designed to cope with this. I know they are all uber-coders in many respects, but everyone has something they can improve on, and from the description they give of their own code, this is their area for improving.

--
Beware of bugs in the above code; I have only proved it correct, not tried it.

Re:Fundamental kernel structures such as this... by hxnwix · 2008-05-17 05:51 · Score: 1

Fundamental kernel structures such as this... should have been... locked down a long time ago Yeah, well that's the whole problem, isn't it? A long time ago, we had the Big Kernel Lock.

It was ALL LOCKED.

Now, we're trying to UNLOCK it. See? Locking semantics are tricky.

(Performance != Speed) // in an RT system by Arakageeta · 2008-05-17 06:14 · Score: 5, Insightful

That's a terrible excuse. There are many applications where a real-time Linux kernel is highly desired. Besides, it is important to note that real time systems do not focus on speed. This is a subtle difference from "performance" which usually caries speed as a connotation; it doesn't for a real time system. The real time system's focus is on completing tasks by the time the system promised to get them done (meeting scheduling contracts). It's all about deadlines, not speed. So from this point of view, the preemptible BKL, even with the degraded speed, could still be viewed as successful for a real time kernel.

Re:Sounds like the Linux kernel needs some tests.. by Alex+Belits · 2008-05-17 06:14 · Score: 4, Interesting

And that's the other thing... if they aren't writing tests for everything they do, then even the code they write today is legacy code. Code without tests can't be easilly checked for correctness when a change is made, can fail silently easilly, and can't be understood as easilly. On the other hand, code WITH tests also can't be easily checked for correctness when a change is made. There is only very small scope of possible mistakes that a test can detect, and if you will try to make test verify everything, test will grow larger (and buggier, and more incomprehensible) than your code. It's also possible that intended behavior of the code and expected behavior that the test checks for, diverge because of some changed interface. Tests help with detection of obvious breakage, but you can never rely on anything just because it passed them.

In other words:

TESTS DON'T VERIFY THAT YOUR CODE IS NOT BUGGY. YOU VERIFY THAT YOUR CODE ISN'T BUGGY.

--
Contrary to the popular belief, there indeed is no God.

Comment removed by account_deleted · 2008-05-17 06:17 · Score: 5, Interesting

Comment removed based on user account deletion

Re:Sounds like the Linux kernel needs some tests.. by gilesjuk · 2008-05-17 06:28 · Score: 1

You do wonder if they need some proper test strategy to test regression etc..

Also, I wonder if the Linux kernel can carry on expanding or if it's time for the form of the kernel to change.

I know people like the monolithic kernel, but lack of change does not promote new techniques. Doesn't have to be a microkernel or have to fit in any existing box.

Re:Sounds like the Linux kernel needs some tests.. by pherthyl · 2008-05-17 06:28 · Score: 5, Insightful

Whatever your large project is, I'm willing to bet it's nowhere near as complex as the kernel. Whenever you get the feeling that they must have missed something that seems obvious, you're probably the one that's wrong. No offense, but they have a lot more experience dealing with unique kernel issues than you do.

You talk about unit testing, but how exactly are you going to unit test multi-threading issues? This is not some simple problem that you can run a test/fail test against. These kinds of things can really only be tested by analysis to prove it can't fail, or extensive fuzz testing to get it "good enough"..

This is why monolithic kernels do real-time badly by Animats · 2008-05-17 06:30 · Score: 5, Insightful

This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch.

This is where microkernels win. When almost everything is in a user process, you don't have this problem.

Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs. All these operations either have a hard upper bound in how long they can take (a few microseconds), or are preemptable. Real time engineers run tests where interrupts are triggered at some huge rate from an external oscillator, and when the high priority process handling the interrupt gets control, it sends a signal to an output port. The time delay between the events is recorded with a logic analyzer. You can do this with QNX while running a background load, and you won't see unexpected delays. Preemption really works. I've seen complaints because one in a billion interrupts was delayed 12 microseconds, and that problem was quickly fixed.

As the number of CPUs increases, microkernels may win out. Locking contention becomes more of a problem for spinlock-based systems as the number of CPUs increases. You have to work really hard to fix this in monolithic kernels, and any badly coded driver can make overall system latency worse.

Tough to test drivers for hardware you don't have by Sits · 2008-05-17 06:30 · Score: 4, Informative

It's hard to test whether you've broken a driver when you don't have the hardware to test with. Perhaps the future will be Qemu emulation of all the different hardware in your system : )

This is not to say that there need to be tests for things that can be caught at compile time or run time regardless of hardware but there is only so far you can take it.

It's not like the kernel doesn't have any testing done on it though. There's the Linux Test Project which seems to test new kernel's nightly. If you ever look in the kernel hacking menu of the kernel configuration you will see tests ranging from Ingo Molnar's lock dependency tester (which checks to see locks are taken in the right order at run time), memory poisoning, spurious IRQ at un/registration time, rcu torture testing, softlockup testing, stack overflow checking, marking parts of the kernel readonly, changing page attributes every 30 seconds... Couple that with people like Coverity reporting static analysis checks on the code. Tools like sparse have been developed to try and so some of the static checks on kernel developer machines while they are building the code.

But this is not enough. Bugs STILL get through and there are still no go areas of code. If you've got the skills to write tests for the Linux kernel PLEASE do! Even having more people testing and reporting issues with the latest releases of the kernel would also help. It's only going to get more buggy without help...

1%? by zogger · 2008-05-17 06:45 · Score: 2, Informative

Linux is something like nearly half the servers in existence and most of the top supercomputers. Desktop is a slower road of course, but it is still chugging along slowly but surely. Look at apple, originally a big percentage of desktops, then dropped to almost nothing, now inching its way back up because it got good. Stuff changes. The linux desktop market is big enough for there to be a lot of credible choices just within "linux" itself, there are half a dozen or so really good desktops and dozens of pretty good desktop linuxes out there now. And word gets around. It will be like FF, 0% to now upwards of one quarter to one half depending on where you look around the planet. There's some magic number that is hard to pinpoint but once anything reaches a certain level of use/adoption it really takes off then, usually near as I can see around 10%, then it makes huge jumps. Bad car analogy time, toyota prius is now more than one million cars sold from zero cars ten years ago, and the first with a mass market hybrid system that they really tried to make and sell in decent numbers (compared to honda for example who only fooled around with their insight). Now look, all the major manufacturers either have their own hybrids or will have them shortly. Ten years, that's all it takes once some threshold hits and it looks "real" to joe consumer to go from exotic to normal. I think this year the asus eeePC made linux "real" to a lot of people, so I am expecting ubiquitous linux as a choice to be along shortly with most computer makers as an option. And that is leaving out all the gadgets people use day to day running some smallish embedded linux, gps systems, cellphones, etc.

Re:Fundamental kernel structures such as this... by HiThere · 2008-05-17 06:45 · Score: 2, Interesting

I'll agree that this should have been shorted out long since. But it wasn't, and very few people though that it was reasonable to expend time on something so obviously unreasonable. (Multiprocessors were things like Illiac IV, huge monsters that were utterly impractical.)

Time passes, technology changes, and now it's become urgent to deal with this, so now it's being dealt with.

One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!! The things I can think of divide neatly into two camps. 1) We don't know enough to even get started, and 2) It really seems utterly implausible, even given this example to work from. Unfortunately, somewhere in there is something that's being overlooked, and I don't know what. Kernel support for Actors? Kernel security to control Actors? Kernel support for Language parsing? They all seem implausible.

What is clearly needed soon is software that facilitates the use of multi-processor environments. Dataflow languages have promise, but there may be other reasonable choices. Possibly some interface that would easily allow different computer languages to work together, but that may be a real impossibility. Or even a language basically like C or C++. but extended with a "foreach" operator that allowed parallel execution of the loop body...but the language would need to be smart enough to tell what needed to be read locked and what needed to be write locked, and what could just be ignored. This implies that use of pointers is *severely* circumscribed! And if you're going to do that, you probably ought to have garbage collection. It might sound like I'm talking about Java, but that would be wrong. This language would need to be close to the metal, so it could adapt itself (at run time!!) to the local machine. And since we want as much efficiency as possible, virtual machines, interpreters, etc. are probably out.

I don't know of any language that meets the specs I've outlined, but I know of many languages that meet large parts of them. Of the languages I know, D (Digital Mars D) comes the closest, but its totally missing on even the parallelization that C/C++ have (as an add-on).

But that doesn't really say where the kernel should be going...except that possibly C isn't the best language to use for a multiprocessor environment. (But C is still the most efficient in most places, and it DOES have add-ons for parallelization...though whether you can use those add-ons in kernel programming isn't something I've investigated.)

--

I think we've pushed this "anyone can grow up to be president" thing too far.

Re:Fundamental kernel structures such as this... by turgid · 2008-05-17 06:51 · Score: 1

He has a point. All of this stuff in Solaris, for example, was sorted out in Solaris 2.7 which came out well over a decade ago.

Linux is great, but its development is weird. Remember all the problems in 2.4 that didn't get sorted out until about 2.4.23? Then there's 2.6 which didn't become usable until 2.6.13 or so.

In my very humble opinion, there should be a 2.7.x development branch for these sorts of experiments. But, I'm not Linus, and I suppose I should write my own damned kernel instead of complaining.

--
Stick Men

Mod parent up by HiThere · 2008-05-17 07:10 · Score: 0, Redundant

Mod parent up

--

I think we've pushed this "anyone can grow up to be president" thing too far.

This same type Big Giant Lock DragonFlyBSD .... by 3seas · 2008-05-17 07:38 · Score: 1

...is removing?

If so then perhaps what DragonFlyBSD (the BSD could be dropped at this time - as its only relevant to history line) is doing to remove it can be helpful to removing it in Linux.

Re:This same type Big Giant Lock DragonFlyBSD .... by Moridineas · 2008-05-17 08:13 · Score: 1

(the BSD could be dropped at this time - as its only relevant to history line) Oh really? Dragonfly isn't BSD licensed anymore? Dragonfly doesn't regularly sync parts of the source tree from Free/Net/OpenBSDs? Matt Dillon himself didn't go to school at the B in BSD?

I think Dragonflys links to the BSD world are slightly more than you seem to!
Re:This same type Big Giant Lock DragonFlyBSD .... by 3seas · 2008-05-17 08:26 · Score: 1

The break from BSD is in regards to code resemblance. The code of DragonFly has changed enough that it doesn't resemble BSD so much anymore.

In regards to teh Big Giant Lock...

DragonFly Projects

http://wiki.dragonflybsd.org/index.cgi/GoogleSoC2008

Extend Multi-Processing (MP) support

* Robert Luciani, mentored by Simon Schubert
* Back in 2003 when DragonFly was born, the first subsystem to be implemented was the LWKT. The reduction in complexity achieved by using message passing (as opposed to a shared memory environment using locks) was undeniable. What was also "unlocked" though, was the potential for near linear performance scaling on multiple CPU systems. Unfortunately many kernel systems, such as the network stack, need to be modified to take advantage of this potential, since they are still encumbered by a legacy "Big Giant Lock". In this project I will remove the MP lock in important areas of the kernel that have a direct affect on the performance of popular programs such as PostgreSQL.
Re:This same type Big Giant Lock DragonFlyBSD .... by Moridineas · 2008-05-17 08:33 · Score: 2, Informative

The break from BSD is in regards to code resemblance. The code of DragonFly has changed enough that it doesn't resemble BSD so much anymore. You might have been able to say something was "BSD" even 5 years ago, but I think you would have a lot more trouble saying that now. The family is more philosophical than architectural now. How close are the kernels between OpenBSD and FreeBSD for instance? My guess is if you looked back to FreeBSD4, they would be far more similar than now. Likewise, if you just look at the Dragonfly change logs, they frequently import code directly from the other BSDs--I believe the ATA code is one recent example.

Dragonfly is just the latest branch of development of a codebase with decades of history!

On the other point (which I wasn't disputing or answer)--about the giant lock, I don't know much about the linux internals, but it sounds like the bsd GIANT is the same as the linux BKL. FreeBSD is in something like its 6th year of removing giant!

Re:Sounds like the Linux kernel needs some tests.. by Stevecrox · 2008-05-17 07:43 · Score: 1

Its called System testing and I agree writing Unit tests are never enough.

As for your comment about them "knowing better", I've worked on a multi-million line project. When your lines of code reaches that sort of size the issues faced for someone on ten million are pretty much the same for people on twenty million loc projects. If you RTFA you'll see it was a series of system tests which demonstrated the problem in the first place. Although the fact the kernel doesn't seem to have a standardised set of system/unit tests does concern me. (correct me if I'm wrong)

Re:Fundamental kernel structures such as this... by maxwell+demon · 2008-05-17 07:52 · Score: 1

One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!! The things I can think of divide neatly into two camps. 1) We don't know enough to even get started, and 2) It really seems utterly implausible, even given this example to work from. Unfortunately, somewhere in there is something that's being overlooked, and I don't know what. Kernel support for Actors? Kernel security to control Actors? Kernel support for Language parsing? They all seem implausible.

To just add another few wild guesses:

Managing GPUs (if there are several GPUs, one of them might be used by some graphics program, while the other might be used by a different program to do GPGPU style processing; maybe we will even need multitasking for GPUs, so several programs can use the same GPU at different times)
FPGA support (maybe in the future FPGAs will be common computer components, possibly even integrated into the processor; the kernel will have to manage access to FPGAs; maybe even allow two processes to use different parts of the same FPGA at the same time)
Management of processor-specific memory (making sure that the data a certain process needs is in the memory belonging to that processor; it probably isn't a given that in massive multicore processors, all cores will have access to all of the memory; and even if they have, there might be considerable performance issues if the data is in the wrong memory page)

--
The Tao of math: The numbers you can count are not the real numbers.

Re:Fundamental kernel structures such as this... by RiotingPacifist · 2008-05-17 07:53 · Score: 2, Interesting

while a 2.7 branch would make sense if stability was important, its not, with 2.6 linus decided he couldn't be bothered with the boring part of stability and so told distros to do it themselves. This happens (with varying degrees of success), and has allows the kernel to develop at a much faster pace. Unfortunately this particular change is too much even for the 'unstable' kernel, so a temporary testing branch to get other peoples code sorted is being formed (a virtual 2.7 in effect only its so experimental it will never get released)

--
IranAir Flight 655 never forget!

Comment removed by account_deleted · 2008-05-17 08:11 · Score: 2, Informative

Comment removed based on user account deletion

Layman's translation? by Loopy · 2008-05-17 08:13 · Score: 1

Would one of you kind folks please put this into non-kernel-programmer terms that explain what this does for software/hardware in terms of the user experience and how the proposed outcomes will affect said experience?

Re:Layman's translation? by klapaucjusz · 2008-05-17 12:09 · Score: 1

Would one of you kind folks please put this into non-kernel-programmer terms that explain what this does for software/hardware in terms of the user experience and how the proposed outcomes will affect said experience?

They're trying to make your mouse pointer move smoothly, your audio never skip, your FPS game never miss a beat and your Linux-driven coffee machine never burn a bean, even when there is some process opening a random device in the background.

We need MACRO kernels! by Anonymous Coward · 2008-05-17 08:13 · Score: 2, Funny

Micro kernels are cute to play with but can't cope with the heavy demands that todays computing requires.

The best course of action would be to redesign the Linux kernel from scratch and this time integrate all possible drivers. Hardware support would be a lot easier!

I would even go so far as to suggest integrating the most important server tools into the kernel to decrease latency. Why not integrate Apache? You could even integrate the shell for added responsiveness!

Linus has demonstrated that micro kernels are a footnote in history. Nowadays memory is cheap and we can afford the have a large (or very large) kernel.

Re:We need MACRO kernels! by VGPowerlord · 2008-05-17 08:52 · Score: 3, Funny

Didn't you read that Code Quality In Open and Closed Source Kernels article yesterday? The Linux kernel already has 703,940 macros. It doesn't need any more!

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
Re:We need MACRO kernels! by H3g3m0n · 2008-05-17 19:52 · Score: 1

Memory isn't the main reason for Micro kernels, in fact a microkernel (and the servers required to run it) will likely use more memory imho since every process that replaces part of the kernel will require its own crap, unless you strip down the amount of stuff running, but you can do that on a monolithic modularized kernel anyway. I don't feel that Linus has demonstrated the death of microkernels, just that systems aren't ready for them yet, specifically that they weren't in 1983 (when Hurd was announced, or 1991 when Linux was). Besides we don't know what would have happened if Hurd had seen the same amount of development as Linux, Hurd had some recent activities to change to a newer better kernels (such as L4) at least twice but the development community seems to be stalled. Also don't forget OSX uses a hybrid Microkernel. I think the future of microkernels will be for things like cloud computing and clusters providing the communication between each node is fast enough, in this case they will run in virtual machines ontop of regular systems. This probably won't be for a while though, ie when cloud computers are made from actual clouds or nanoprocessors or such.

--
cat /dev/urandom > .sig

Re:Fundamental kernel structures such as this... by IvyKing · 2008-05-17 08:38 · Score: 1

All of this stuff in Solaris, for example, was sorted out in Solaris 2.7 which came out well over a decade ago.

Which reminds me of Alan Cox's trollish remark saying that Sun should drop work on Solaris and support Linux instead - might be easier to add the nice stuff from Linux to Solaris than to clean up the mess with the BKL. IIRC, Sun had been supporting SMP machines from before the time that Linus started on Linux. In addition, getting SMP support done right has been a much higher priority with the Solaris developers than the Linux developers.

Re:Fundamental kernel structures such as this... by jlarocco · 2008-05-17 08:40 · Score: 1

I appreciate the work the kernel devs do (I'm using their work right now), but the mere act of working on the kernel doesn't make them inherently more intelligent or smarter than anybody else.

The OP has a point. This is a pretty big design issue (witness all the things it's screwing up), and should've been addressed a long time ago.

--
Maybe not

Re:(Performance != Speed) // in an RT system by Anonymous Coward · 2008-05-17 09:03 · Score: 2, Interesting

Yes, and since the kernel can and is branched, they can decline to apply this patch and keep the 2.6.7-2.6.22 or whatever style BKL... or they can help everyone and rewrite various BKL-using code to not use it. I'd rather have a kernel that has low latency AND behaves correctly, but if I have to chose I prefer correct behavior every time.

ext4 by fritsd · 2008-05-17 09:10 · Score: 1

One should, perhaps, wonder what currently unreasonable problem should actually start being addressed RIGHT NOW!!

A while ago I read this article: LWN article about ext4 fs

and your comment reminded me of it. I think it's very impressive that some people can think far ahead.

--
To be, or not to be: isn't that quite logical, Slashdot Beta?

Re:Fundamental kernel structures such as this... by RiotingPacifist · 2008-05-17 09:16 · Score: 1

It doesn't make them smarted but it does mean they know a lot more about kernel coding.

It is a deign issue, but its not that big, most of the kernel has granular locks, this only really affects the goal of having an entirely realtime kernel, so while it is important it's not that important. And im not sure what you mean by a long time ago as having a RT kernel is a fairly recent goal.

--
IranAir Flight 655 never forget!

So when can we run QNX-Ubntu? by Anonymous Coward · 2008-05-17 09:28 · Score: 2, Funny

So when can we run QNX-Ubntu?

Just asking . . .

Re:So when can we run QNX-Ubntu? by Animats · 2008-05-17 09:40 · Score: 2, Interesting

You can actually run X-windows on QNX, although nobody does. QNX has its own GUI system called Photon, which is oriented towards things one might want in an industrial control panel or an entertainment system, like meters, dials, and graphs.
QNX the OS is fine; it's QNX the company that's difficult to deal with. On the other hand, who else has gone up against Microsoft on x86 for 20 years and is still alive?
Re:So when can we run QNX-Ubntu? by Anonymous Coward · 2008-05-17 19:36 · Score: 0

Probably when they reach Queery Quagga.
Re:So when can we run QNX-Ubntu? by slcdb · 2008-05-19 04:58 · Score: 1

You mean, when can we run "Ubuntu GNU/QNX". And how many users will realize this is the name of some software, and not some strange foreign language?

--
Despite what EULAs say, most software is sold, not licensed.

Why testing isn't enough by mikeb · 2008-05-17 09:33 · Score: 5, Informative

It's worth pointing out here that the kind of races (bugs) introduced by faulty locking in general suffer from a very important problem: YOU CANNOT TEST FOR THEM.

Race conditions are mostly eliminated by design, not by testing. Testing will find the most egregious ones but the rest cause bizarre and hard-to-trace symptoms that usually end up with someone fixing them by reasoning about them. "Hmm" you think to yourself "that sounds like a race problem. Wonder where it might be?" and thinking about it, looking at the code, inventing scenarios that might trigger a race; that's how you find them.

Re:Why testing isn't enough by jamesh · 2008-05-17 13:12 · Score: 1

With a sophisticated enough code analyser, all things are possible. I don't know of any, but i'm sure they exist. I've been doing driver development under windows recently (PV drivers for Xen), and yes, every time I come across a bug that looks like a race (mostly because it works under UP but not SMP), I end up trying to think up with situations where the race could come about. My drivers are pretty straight forward so it doesn't normally take long, but for something really complicated it would be a nightmare.
Re:Why testing isn't enough by Rich0 · 2008-05-17 16:54 · Score: 1

I dunno - sometimes I wonder if things are tested at all.

I just had to reboot my server because a USB drive was disconnected while it was mounted. I'm not sure what caused the disconnect (it wasn't a cable detach), but it froze the whole usb subsystem. Anything accessing the mounted device or even just running lsusb ended up in the dreaded D state. I couldn't even forcibly unload the usb_storage module (as risky as that would be). Now, otherwise the computer was just fine (ignoring the loads pegged in the 10s-20s - with no noticeable loss in responsiveness), but I couldn't get the usb drive to show up again until after the reboot.

I absolutely love linux, but I have to be honest and say that it has one really big design weakness - it fails badly. The emphasis seems to be on preventing failures, and not on dealing with failures. I think this is where a mircokernel approach would help - more containment means less mess when something goes wrong. Sure, do all you can to prevent failure, but plan for everything...
Re:Why testing isn't enough by Abcd1234 · 2008-05-20 01:59 · Score: 1

Yeah, but no one even mentioned race conditions, here, and my understanding of the problem is that breaking down the BKL isn't an issue of finding race conditions. It's an issue of finding subsystems that take the BKL, but never release it... finding those should be a piece of cake (assuming you can exercise the codepaths).

Re:Sounds like the Linux kernel needs some tests.. by VGPowerlord · 2008-05-17 09:38 · Score: 1

Still, the OS kernel is, by definition, one of the most complex pieces of software in a system. There's only three other ones I can think of that would even come close: The compiler, the system libraries (libc), and device firmware.

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011

Layman's summary in terms of user experience by Sits · 2008-05-17 09:45 · Score: 2, Informative

The proposed outcome is for there to be increased opportunities to switch between programs/kernel or to run multiple things at the same time.

For those who enable the option this should reduce the chance of a hardware's buffer not being filled in time (so audio is less likely to skip in demanding environments). If you are an audio recording person or need VERY (less than hundredths of seconds) fast responses all the time your experience should improve. If you run VERY big workloads that have lots of pieces that can happen simultaneously on computers with 2-1024 CPUs, your experience (increase in work finished per second say) should improve. Typical desktop performance may improve a little if you have multiple CPUs/cores but one would guess not enough to be noticeable without careful measurement.

The trade off is increased risk of system hangs / data corruption due to the programming being trickier although instances of this happening should fall over time with popular hardware.

Re:FIRST! by Anonymous Coward · 2008-05-17 10:29 · Score: 0

YOU FUCKING ASSHOLE.

Always has been. Always will be. by Anonymous Coward · 2008-05-17 11:30 · Score: 0

That, my dim-witted friend, is what the Linux is.

senior programmers should document rather than fix by Anonymous Coward · 2008-05-17 11:33 · Score: 0

Lack of documentation in code in legacy sections is the main reason why much of the code (i.e. users of BKL) remains untouched while they should be improved. Senior programmers should document the code to help others to maintain it, instead of fixing themselves.

Re:Sounds like the Linux kernel needs some tests.. by MichaelSmith · 2008-05-17 12:52 · Score: 2, Funny

Its called System testing Thats what users are for.

--
http://michaelsmith.id.au

Re:Sounds like the Linux kernel needs some tests.. by MichaelSmith · 2008-05-17 12:54 · Score: 1

Still, the OS kernel is, by definition, one of the most complex pieces of software in a system. Only because of its architecture. I know of many more complex systems but they are modularised into different processes with middleware for interfacing.

--
http://michaelsmith.id.au

Re:Tough to test drivers for hardware you don't ha by bfields · 2008-05-17 15:22 · Score: 1

Yep. Most bugs are in drivers and architecture-specific code (that's where most of the code is!), and unfortunately it's unrealistic to expect everyone who changes a piece of code to retest with all of the (possibly obscure) hardware the change might theoretically have affected.

Re:Sounds like the Linux kernel needs some tests.. by Al+Dimond · 2008-05-17 15:26 · Score: 1

I'm not going to claim that Linux's, or any other kernel's, architecture is perfect; I'm not qualified to judge that. I do suspect you're missing one of the key differences between kernels and most types of software when you make that statement: much of the kernel's behavior is specified by external forces. Hardware, processor, and system specs. Often the weird details of these specs are based on details of their hardware design, which is often based on making the final product cheaply mass-producible rather than on having a nice clean interface. The kernel doesn't have a choice: it has to deal with many of these idiosyncratic devices at once.

Additionally there are performance concerns: even beyond the application-specific cases where kernels have to perform, and the levels of performance they have to meet to comply with hardware specs, generally a kernel has to perform well against competing kernels to be successful in the market.

There might be more complex systems in number of lines of code, or in number of tasks that need to be performed, but there are lots of choices that kernel writers don't get to make, because of the types of tasks that the kernel has to perform and how efficiently it has to perform them.

Re:This is why monolithic kernels do real-time bad by Josh+Booth · 2008-05-17 15:56 · Score: 1

It seems to me that switching to a microkernel just because people have been abusing the Big Kernel Spinlock is like using a nuclear bomb to drive in a nail: sure, it is a technically pure solution, but the problems involved exceed the number of solutions. It seems to me that the Linux kernel is a monolithic kernel only in the fact that everything shares the same address space. Linus seems to be a smart guy, and considering how Linux scales in real world scenarios and its general popularity (no disrespect for QNX), it seems that the only reason to bash its monolithic design is that it is conceptually inferior. You have to admire how well the kernel was able to grow from being 386+ uniprocessor only to what it can do now, and if real time performance (something it was not designed for in the first place) is suffering because so much code is improperly using the deprecated Big Kernel Lock, then I don't see why there should be this animosity. And furthermore, the solution is fairly straightforward and has been done before; it just requires a lot of code to be reworked. And besides, the BKL had been made preemptible anyway for a good 15 stable kernel releases anyway. These patches are only against an experimental fork to remove the BKL.

Organic code writing FTW!

Re:(Performance != Speed) // in an RT system by statusbar · 2008-05-17 19:57 · Score: 1

Yes.

Even when the BKL is pre-emptible, however, linux can only provide soft real time support.

The best way to achieve HARD real time support in linux is via Xenomai - and it gives you real time in user mode too.

--jeffk++

--
ipv6 is my vpn

Re:Sounds like the Linux kernel needs some tests.. by dodobh · 2008-05-17 20:23 · Score: 1

Each test is supposed to be small, and easily comprehensible. You can have a large collection of tests, but they are all unique.

--
I can throw myself at the ground, and miss.

Hmmmm. by jd · 2008-05-17 20:48 · Score: 1

But you CAN test a design, and you CAN validate that a given block of code is functionally identical to a formal design for that same block, ergo for some cases you CAN use testing methods to (indirectly) validate locking even when a direct test would not be possible.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Please report if you are on a modern distro by Sits · 2008-05-17 20:53 · Score: 1

If you are on the latest release of your distro and the problem is reproducible please report it to your distros bug database. People only fix things that they know are broken...

Re:This is why monolithic kernels do real-time bad by abacus1 · 2008-05-17 21:12 · Score: 1

Within QNX, which really is a microkernel, almost everything is preemptable. All the kernel does is pass messages, manage memory, and dispatch the CPUs.

Your comment is incomplete and not completely correct.

One topic you touched is the choice of locking versus message passing for communication between threads. With a monolithic kernel like Linux you have the choice between locking and message passing. With a microkernel like QNX you don't have a choice -- message passing is the only option. This matters because the more messages that are passed, the more context switches that are needed. And each context switch takes some time. Which means that microkernels have a throughput disadvantage.

Your post seems to suggest that the option of passing messages between subsystems in the Linux kernel is not available. This is very well possible.

Regarding multiple CPU's and spinlock-based systems: aren't you aware that with Linux-rt all spinlocks are converted into preemptible mutexes ?

Or: the latency issue you touched is real. But stating that the latency of a microkernel is better than the latency of a monolithic kernel is wrong. And you seem to be unaware of Linux-rt.

Are you perhaps a QNX-salesman ?

IANAKD? by Anonymous Coward · 2008-05-17 21:37 · Score: 1, Funny

Having to spell out I'm not a kernel developer on a supposed geek site, while having IANAL in constant use, just seems wrong. So I would like to submit the following abbreviations for immediate use: IA[N]AKD/H.

Re:This is why monolithic kernels do real-time bad by flnca · 2008-05-17 22:30 · Score: 1

This matters because the more messages that are passed, the more context switches that are needed. And each context switch takes some time. Which means that microkernels have a throughput disadvantage. Context switches can be as cheap as two machine instructions (loading a task structure pointer into a register, and jumping to a code address). No-one says you have to use the heavy-duty task switching in a kernel. Only when kernel context has been left and user context is entered and vice versa, you have to set up the MMU and other stuff.

Desktop turning point: 40% not MS by Anonymous Coward · 2008-05-17 22:55 · Score: 0

I figure that when MS have a genuine 60% or less market share they'll either go to work on being INTEROPERABLE (especially on services: always annoyed me that MS servers weren't: they only served MS products) or they'll fall to the minority and effectively die.

Which is taken depends who's in charge at the time and how they think. Ballmer and Gates wouldn't change now and they'd rather see the company die (or rather would not concede that their actions would cause its death) than change outlook and start working with its competitors.

Re:This is why monolithic kernels do real-time bad by abacus1 · 2008-05-18 00:29 · Score: 1

Context switches can be as cheap as two machine instructions (loading a task structure pointer into a register, and jumping to a code address). Modern systems have caches, and context switches typically cause many cache misses. This has an important negative impact on performance.

Re:Sounds like the Linux kernel needs some tests.. by swillden · 2008-05-18 01:19 · Score: 2, Insightful

As for your comment about them "knowing better", I've worked on a multi-million line project. When your lines of code reaches that sort of size the issues faced for someone on ten million are pretty much the same for people on twenty million loc projects.

All lines of code are not equal.

There's a huge difference between typical application code and system code. Very little application code is as performance-sensitive as system code, because the goal of the system code is to use as little time as possible (consistent with some other goals), to make as much time as possible available to run application code.

OS code is performance-tuned to a degree that application code almost never is, and that focus on performance results in complexity that isn't well-represented by counting lines of code. Further, most application code isn't nearly as multiprocessor-aware as modern OS code, which introduces another huge complexity factor. Finally, the role of OS code is to interact directly with hardware, and if you've ever written on-the-metal code you know how much complexity that adds. Linux, of course, takes that even further by trying to work on a wide variety of hardware platforms, abstracting commonality where possible, but only when it doesn't interfere with performance.

No, I don't think there are many, if any, unit tests.

Altogether, I estimate that system code is an order of magnitude more complex, per line, than application code.

If you RTFA you'll see it was a series of system tests which demonstrated the problem in the first place. Although the fact the kernel doesn't seem to have a standardised set of system/unit tests does concern me. (correct me if I'm wrong)

There are a bunch of sets of system tests in place for Linux. They're created and executed by multiple groups of people around the world and the results are made available to the developers (some of whom are the same people executing the tests).

This is a different approach than is common in the normal, centralized development model, but it's one that's very effective for the sort of decentralized development model used by the Linux kernel team. People who are interested in different aspects of Linux create tests designed to evaluate the kernel according to those aspects. When they see problems, or opportunities for improvement, they post their results to LKML, often with patches to address the issue they identified.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Re:This is why monolithic kernels do real-time bad by flnca · 2008-05-18 04:42 · Score: 1

With the sizes of modern CPU caches, one would assume that the cache is big enough to hold the most frequently used parts of the kernel (because we're talking about context-switching between kernel components here).

Re:This is why monolithic kernels do real-time bad by mike260 · 2008-05-18 09:54 · Score: 1

Silly question: If you don't bother with memory-protection for in-kernel context switches, then how is it a microkernel?

Re:Fundamental kernel structures such as this... by David+Gerard · 2008-05-19 03:35 · Score: 1

And the trouble is that it crippled Solaris performance on systems with less than four CPUs. Debian SPARC was famously much faster than Solaris 8 on single-processor machines running the exact same software.

--
http://rocknerd.co.uk

BKL should be for emergencies only. by shentino · 2008-05-19 03:50 · Score: 1

Loading and unloading kernel modules, enabling and disabling big subsystems, and other such heavyweight operations may need to lock out the rest of the kernel completely.

I propose that the BKL be strictly audited from this point forward, much like softirq's are. Softirq's are a strictly limited resource, and the BKL has huge latency problems if misused.

The BKL is a nice thing to have for when nothing else will work. However, it should be used strictly on an as needed basis only.

It happens to be one thing that will work if nothing else will. So, it's good as a lock of last-resort, but should be avoided at all costs if there's anything better.

But no, it should not be completely removed.

Re:Sounds like the Linux kernel needs some tests.. by Alex+Belits · 2008-05-19 04:56 · Score: 1

It does not matter -- the problem is, tests only check if particular criteria are satisfied or not. However if you really knew all criteria, you would trivially derive a program from them, and there would be nothing to test.

In reality tests are only for things you believe are important, or possible to get wrong, and the more of them you have, the harder is to find out what they don't cover. Tests may have an advantage that they don't have to be optimized, but that merely makes them slightly less likely to be wrong.

--
Contrary to the popular belief, there indeed is no God.

Re:Fundamental kernel structures such as this... by turgid · 2008-05-19 07:13 · Score: 1

And the trouble is that it crippled Solaris performance on systems with less than four CPUs. Debian SPARC was famously much faster than Solaris 8 on single-processor machines running the exact same software.

And Solaris is much faster than Linux (scales better) on multi-cpu systems. Sun saw the future and is ahead in that one respect.

Now, Solaris networking was much slower than Linux...

--
Stick Men

Re:Fundamental kernel structures such as this... by David+Gerard · 2008-05-19 07:37 · Score: 1

Oh yeah. Note that I'm speaking as someone who works as a Solaris admin for a living. I'm very pleased that with Solaris 10, they realised the competition was Linux. Competition is good.

--
http://rocknerd.co.uk

Re: finite disable by neonsignal · 2008-05-21 18:37 · Score: 1

Interesting idea, a time-t based (or n-instruction based) interrupt disable. Still a bit open for abuse if it was called too often, but at least a single user task couldn't hang the system for long.

It doesn't typically improve on a basic spin-lock, unless a significant amount of time is being spent in locked code. And turning off interrupts would normally be seen as a heavy-handed way of protecting code, because it affects every other process, even if that process will never access that code (and adds latency to interrupts). But arguably it could reduce the number of process switches where tasks of equal priority are competing for a resources.

Inefficiencies in multitasking settings are often not simple to fix. For example, in a situation where one task is producing an item and another is consuming the item, and both run at equal priority, then you can get the situation where a task switch occurs on every item produced (instead of batching them). If the processing of the item is fast, then the task switch can dominate the time taken. There are hacks that most operating systems use, but it isn't as simple as just making the semaphores go faster.

It comes down to a mismatch between the heaviness of a thread (each having a significant amount of state information) compared to the efficiency of processing code fragments. Current architectures are oriented around processing a single task, not around fine-grained parallelism (and rightly so, since that is how we write our programs). But it does mean that we try to find ways to avoid parallelism (and struggle with how to structure it).

The issue for me is how code fragments should communicate in a highly parallel environment. If, for example, they use queues, then sure, it makes sense to develop hardware that optimizes these queues. But at the moment we don't even know how to write highly parallel programs (except for the problems where we model fields of elements, where the parallelism is obvious and highly regular, such as in fluid flow analysis).

Once we know how we want to write these parallel systems (and kernels are one of the complex and heteregenous examples of what we want to be able to do), then we can start thinking about hardware optimizations. Unfortunately any change in paradigm has major ramifications; for example, moving from stacks to queues would have implications for memory caching strategies (or even whether memory should be distributed rather than global).

222 comments