Behind the Scenes in Kernel Development

Posted by michael on Thursday February 19, 2004 @02:50AM from the knit-one-perl-two dept.

An anonymous reader writes "Some interesting changes took place in the way the Linux kernel is developed and tested. In many ways, the methods used to develop the Linux kernel are much the same today as they were 3 years ago. However, several key changes have improved overall stability as well as quality. This article takes a look behind the scenes at the tools, tests, and techniques -- from revision control and regression testing to bugtracking and list keeping -- that helped make 2.6 a better kernel than any that have come before it." We might as well mention here (again) that a couple of new kernels are out: leif.singer writes "2.6.3 and 2.4.25 are out, fixing another vulnerability in do_mremap()."

Kernel development interests me terribly by ObviousGuy · 2004-02-19 02:53 · Score: 5, Interesting

I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.

Would these tests be a good starting place?

--
I have been pwned because my /. password was too easy to guess.
1. Re:Kernel development interests me terribly by slamb · 2004-02-19 05:52 · Score: 4, Interesting
  I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.
  Very recently, I've been writing some low-level code. There was a long while I'd thought this was out of my league. Then I realized several things:
  
  - I was not happy with several characteristics of the low-level code other people had written and I was depending on.
  - I had done some more low-level stuff long ago - like a couple simple but legitimately useful assembly programs in DOS, and even a patch that added a sort of capability system to the OpenBSD kernel. (I never polished up the patch enough to send it in to them or anything, but the point is that it essentially worked, and I wasn't afraid to take it on.)
  - When I'd done those things back in the day, I wasn't anywhere near as good a coder as I am now.
  - The only reason I'd been unable to do these things more recently is an attitude that I'm not good enough, not a reality. (It's an attitude a lot of people in low-level code promote, I think. They so much don't want to waste their time with people who really are bad that they probably don't mind scaring off a few people who are in fact good but don't realize it. Also, I think there's ego involved - it's an exclusive club, why not let it stay that way.)
  So I think the moral of the story is to just be fearless/persistent. If you're not confident, there are plenty of ways you can improve without even involving anyone else:
  
  - Read the code. It sounds obvious, but there's a lot of code I'd stayed away from even looking at because of intimidation.
  - Try experiments. Make a change, set a hypothesis about what it will do, and run it. Then see why you were wrong, if you were. Then try it again. Even just getting in the habit of running the build system will help, and setting up experiments like this will help your debugging.
  - Find something lacking and try to fix it.
  And then, if you're still not comfortable talking on the linux-kernel list, I think you have at least another couple choices:
  
  - If you're lucky, you're friendly with someone more skilled and can use him/her to screen questions.
  - There's a couple lists like kernel-janitors and kernel-newbies to dip your feet in the water.
  - Sometimes in the process of writing an eloquent question through email you'll figure out the answer yourself. (Did you see the teddy bear anecdote in the debugging link above?)
  As for myself, I'm taking my own advice to make sigsafe - an alternate set of system call wrappers (libc level) that eliminate a couple race conditions involving signals, without a performance penalty. It's going well - the code works, and I have a race condition checker and microbenchmark to prove it. I just released my first version. Now I'm working on the documentation; it still needs a lot of work. (I could use plenty of help with this project! If you want to try low-level programming, it's a great way. It requires writing assembly for each combination of operating system and architecture. I've only written it for two systems. There are plenty left, and public systems to do it on if you don't have access to exotic machines of your own. Plus, you can hopefully gain some low-level understanding by proof-reading and helping me write the documentation.)
  Once I have that polished, I've got a couple projects I might try in the Linux kernel (and/or other kernels):
  
  - implementing a couple of system calls - the nonblocking_read(2) and nonblocking_write(2) that djb mentions.
  - implementing SO_RCVTIMEO and SO_SNDTIMEO under Linux. Assuming no one has yet; I haven't checked, so the manpage could just be out of date. Which brings m
Automatic Testing by 4of12 · 2004-02-19 03:01 · Score: 4, Interesting

I can't say how much I appreciate the automatic tests. This is applying computers to a thankless task that they're suited for.

Now if they only had a web dashboard portal showing the latest results in an easily-assimilated color coded HTML table....
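
A minimal sketch of the kind of dashboard wished for here, in Python. The test names, statuses, and colors are made up for illustration; nothing in it reflects the actual kernel test infrastructure.

```python
from html import escape

# Hypothetical results as (test name, status) pairs. Not real kernel test data.
RESULTS = [
    ("boot", "pass"),
    ("compile", "warn"),
    ("fs-stress", "fail"),
]

# Background color per status; anything unrecognized falls back to grey.
COLORS = {"pass": "#9f9", "warn": "#ff9", "fail": "#f99"}


def render_dashboard(results):
    """Return an HTML table with one color-coded row per test result."""
    rows = ["<tr><th>Test</th><th>Status</th></tr>"]
    for name, status in results:
        color = COLORS.get(status, "#ccc")
        rows.append(
            f'<tr style="background:{color}">'
            f"<td>{escape(name)}</td><td>{escape(status)}</td></tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(render_dashboard(RESULTS))
```

Redirecting the output to a file after each nightly run would be enough for an at-a-glance page; a real setup would presumably pull the results from the test harness instead of a hard-coded list.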

--
"Provided by the management for your protection."
1. Re:Automatic Testing by tcopeland · 2004-02-19 03:31 · Score: 1, Interesting
  
  > a web dashboard portal showing the latest
  > results in an easily-assimilated color
  > coded HTML table
  
  So true. I've set up one of these for a project I work on, and it's really helpful to be able to see at a glance the status of a bunch of projects.
  
  Seems like some lintish tools could be incorporated into a kernel daily build, and maybe something like CPD as well...
  
  --
  The Army reading list
Kernel quality by Rosco+P.+Coltrane · 2004-02-19 03:04 · Score: 5, Interesting

However, several key changes have improved overall stability as well as quality.

I have a suggestion : how about not calling development kernels with an even version number?

- 2.6.0-beta-something kernels were bad (okay fair enough, it was beta, and Linus admitted having called a 2.5.x kernel 2.6 in order to lure early adopters and get them to test it).

- 2.6.0, 2.6.1 and 2.6.2 were unstable for me, with doozies such as oopses while rmmoding and random crashes using ide-scsi (yes I know it's deprecated, but some of us need it).

I now run 2.6.3-rc3 and it's the first time it seems stable enough to be called a 2.6 kernel. There are some problems left, but overall it's getting decent. But then why are the other "2.6" kernels called 2.6 at all? They were really 2.5 kernels imho.

This has happened before, with the beginning of the 2.4 series. I only felt it was getting good enough at version 2.4.6 and above (I'm not counting the failed 2.4.11 release). When 2.4.0 went out, I thought it meant it was ready for prime time, like 2.2.0 was, or at least more so, but no, it was crap. I was slightly annoyed with Linus then, but I thought he had been pressured by commercial Linux shops and that he wouldn't do it again. But no, he did it again with 2.6.

It's really quite annoying, because those who follow Linux know the first "stable" kernels aren't stable at all, so they avoid them, which defeats the point of testing them for Linus, while beginners think "cool, a new stable kernel", try it and are disappointed, giving a bad name to an otherwise great kernel. Too bad ...
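
For readers unfamiliar with the convention being argued about: in the 2.x era, an even minor number (2.4, 2.6) marked a stable series and an odd one (2.5) a development series. A minimal Python sketch of that rule follows; `series_kind` is a hypothetical helper name, not part of any kernel tooling.

```python
def series_kind(version):
    """Classify a 2.x-era kernel version string by the parity of its minor number.

    Suffixes such as "-rc3" are ignored; only the second numeric field matters.
    """
    numeric = version.split("-")[0]      # "2.6.3-rc3" -> "2.6.3"
    minor = int(numeric.split(".")[1])   # "2.6.3"     -> 6
    return "stable" if minor % 2 == 0 else "development"


if __name__ == "__main__":
    for v in ("2.4.25", "2.5.75", "2.6.3-rc3"):
        print(v, series_kind(v))
```

The function only encodes the label, which is the point of the complaint above: the number says which series a release belongs to, not how well it actually behaves.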

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
1. Re:Kernel quality by Psiren · 2004-02-19 03:10 · Score: 5, Interesting
  
  This is a chicken and egg situation. Unless there is widespread testing of a kernel, some bugs won't be found. But not everyone wants to risk running a development kernel, so the only way to get them to test is to bend the truth slightly, and call a beta version the new stable kernel. At the end of the day, the number just reflects the developers' opinion on the stability of the thing as a whole. They could make no changes to 2.6.3 and release it as 2.7.0, but that wouldn't make it any less unstable.
2. Re:Kernel quality by Rosco+P.+Coltrane · 2004-02-19 03:20 · Score: 4, Interesting
  
  the only way to get them to test is to bend the truth slightly, and call a beta version the new stable kernel
  
  I realize that, but what I'm saying is that those in the know get burnt a couple of times, then see through the bullcrap and silently renumber the kernel versions. In the end, early adopters are pissed off because they've been lied to a little, and swear never to try newer stable versions again, newcomers get disgusted by the quality of early stable releases, and Linus doesn't get the testing he wanted that made him bend the truth in the first place, therefore everybody loses.
  
  I'd much rather see Linus say : "here, there's this 2.5, we call it 2.5.xx-RC-something. It's close to 2.6, but not quite. *PLEASE* test it for us *PLEASE*!, that'll allow 2.6.0 to be good". He could even have a "best testers" or "more devoted QA volunteers" list prominently displayed on the main page at kernel.org, to appeal to people's sense of ego.
  
  At least that would be a more honest approach to testing new kernels than lying to people.
  
  --
  "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
3. Re:Kernel quality by Rosco+P.+Coltrane · 2004-02-19 03:40 · Score: 2, Interesting
  
  Man, come on dog, it is after all a free Operating System kernel. I bet you bitch about rain being wet... that's my word, holla...
  
  My friend, not everybody uses Linux just for shits and giggles, and for the heck of saying "Windoze sux0rs" to buddies. I actually need to get work done with my Linux boxes, and so when a stable kernel isn't stable, it pisses me off.
  
  Also, FYI, I reckon I have a certain right to be annoyed because I contribute code back to the free software community, in the form of userland software projects and specialized Linux drivers. It's not much perhaps, but I'm not just a freeloader who should be happy with what he gets for free.
  
  --
  "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
2.2.0? by autechre · 2004-02-19 03:42 · Score: 4, Interesting

2.2.0 had a bug where the system would instantly reboot when any user ran "ldd". I wouldn't call that "ready for prime time" :)

(I remember this because we were waiting for 2.2.x to come out, having just gotten a dual P-II 350 server [2.0.x didn't have SMP support]. Fortunately, we managed to hold off for the first few revisions.)

It's not as if this problem is unique to the Linux kernel. "Never use a Red Hat .0 release" is pretty sage advice, and of course we know Microsoft's track record. You're not going to be able to catch all of the bugs before something gets truly widespread testing, no matter what you call it or how long you work on it.

--
WMBC freeform/independent online radio.
Bitkeeper by jurgen · 2004-02-19 03:45 · Score: 5, Interesting

I sent this to the author of the article...
[the author] wrote:
The lack of formal revision control and source code management led many to suggest the use of a product called BitKeeper.
I grant that sometimes you have to simplify history to avoid digressing in an article, but this is a bit too inaccurate to let stand.
Bitkeeper wasn't suggested by anyone; it didn't have to be. It was developed from the ground up to Linus' requirements. Larry McVoy had a discussion about source control with Linus years ago, in which Linus said "none of the products are good enough" and Larry said, "ok, I'm going to write one that is". Apparently he had this on his mind anyway, and so he started Bitmover Co. As Bitkeeper became a usable product Larry continued to take Linus' feedback and improve it until it was good enough for Linus to use... at which point Linus started using it.
This is still a simplification of course, but it's closer... and as you can see, there were no third party suggestions involved.
But why is it in so few distros? by Bob+Bitchen · 2004-02-19 03:57 · Score: 4, Interesting

That's what bothers me. How long will the distros wait until they use the 2.6 kernel? I hear the scheduler is improved amongst many other things. So what's the hold up? Is it just that there's no one willing to be the guinea pig?

--
http://tinyurl.com/3t236
Re:Interesting read by FePe · 2004-02-19 04:14 · Score: 3, Interesting

It's still amazing to me that a project as large as Linux was able to be so successful BEFORE the changes that were made to the development process. It lacked a centralized CVS, coherent bug tracking, automated testing...
Without being too sure, I believe that Linux developers in the beginning focused more on fixing bugs than keeping things clean and structured. That's mainly because the base of the system needed to be developed, and little attention was drawn to factors like speed and optimization. "First make it work, then make it fast."
And most bugs were indeed caught. Linus's law states that "given enough eyeballs, all bugs are shallow". More formally: "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone." (From Wikipedia)

--
"Until you do what you believe in, how do you know whether you believe in it or not?" -- Leo Tolstoy
2.4.x and 2.6.1 and external USB devices by Spoing · 2004-02-19 05:26 · Score: 2, Interesting

This isn't the right story to mention this on, though it's somewhat related.
I've encountered many problems with external hard drives using USB 1 and 2 interfaces. Locking up the entire system on large file copies was the main issue. (Copying small numbers of files was never an issue. Lockups occurred on different drives, different external chipsets, and different 2.4.x kernels, though supposedly fixed in the latest 2.4.x releases.)
I've finally gotten the nerve to run a few days of tests on 2.6.1 to see if this has been really resolved, and I'm happy to report that this now works like a charm.
If you've encountered similar problems with 2.4.x, give 2.6.x a try.

--
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Re:Interesting read by raxx7 · 2004-02-19 06:17 · Score: 3, Interesting

I'm just speculating here, but I think the issue would be speed. CVS isn't very efficient in terms of speed or disk space. Handling something as large as the kernel might be a problem and duplicating trees with cp -rl an interesting alternative.
SVN is much more efficient though. I'm not sure whether that comment applied only to CVS or to available versioning systems in general.
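
To make the `cp -rl` idea above concrete: the command recreates the directory structure but hard-links the files, so a second copy of a large tree costs almost no extra disk space or time. A rough Python sketch of the same idea follows; `link_tree` is a hypothetical name, and the sketch ignores symlinks, permissions, and special files.

```python
import os


def link_tree(src, dst):
    """Mirror the directories of src under dst, hard-linking each regular file.

    Approximates `cp -rl src dst` for the case where dst does not exist yet.
    Directories are created fresh; files share their data with the originals.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            # A hard link is a second directory entry for the same inode.
            os.link(os.path.join(root, name), os.path.join(target, name))
```

The caveat with this approach is that both trees point at the same file data, so it only works if changes are made by writing a new file and renaming it over the old one, not by editing in place.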
Re:BitKeeper? by Anonymous Coward · 2004-02-19 09:46 · Score: 1, Interesting

Bitkeeper is proprietary. You can contribute (all you like) by posting patches to the linux kernel mailing list (LKML). Linus didn't like CVS or RCS. There was a lot of (angry) debate on LKML about using bitkeeper. I know (at least one) kernel developer using subversion. You can call bitkeeper proprietary all you like (and be correct, since it is). Anticompetitive? Maybe. It's what Linus wants to use because it works (there is only one Linus, and he would pull his hair out applying patches all day without it). It provides excellent, decentralized source control (better than CVS or RCS or SCCS or any of the others). Smaller projects can use these (and they work great). The kernel is too large to use them.