Behind the Scenes in Kernel Development

Posted by michael on Thursday February 19, 2004 @02:50AM from the knit-one-perl-two dept.

An anonymous reader writes "Some interesting changes took place in the way the Linux kernel is developed and tested. In many ways, the methods used to develop the Linux kernel are much the same today as they were 3 years ago. However, several key changes have improved overall stability as well as quality. This article takes a look behind the scenes at the tools, tests, and techniques -- from revision control and regression testing to bugtracking and list keeping -- that helped make 2.6 a better kernel than any that have come before it." We might as well mention here (again) that a couple of new kernels are out: leif.singer writes "2.6.3 and 2.4.25 are out, fixing another vulnerability in do_mremap()."

27 of 139 comments

Kernel development interests me terribly by ObviousGuy · 2004-02-19 02:53 · Score: 5, Interesting

I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.

Would these tests be a good starting place?

--
I have been pwned because my /. password was too easy to guess.
1. Re:Kernel development interests me terribly by deadlinegrunt · 2004-02-19 02:59 · Score: 5, Informative
  
  Find a particular functionality of the kernel that really interests you; read any documentation you can find about it; then grep the src till you see the relevant sections of code and start perusing with your $(EDITOR).
  
  Much time is spent teaching people how to write code but never really reading it. This is a perfect example of how to do it and why you would.
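
A minimal sketch of the "grep the src" step described above, written in Python so it can run anywhere. The function name `grep_tree`, the file extensions, and the toy file in the demo are all assumptions made for illustration; on a real tree, plain `grep -rn` does the same job.

```python
import os
import re
import tempfile


def grep_tree(root, pattern, extensions=(".c", ".h")):
    """Yield (path, line_number, line) for each matching line under root."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at C sources and headers (an assumption of this sketch).
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Source files are not guaranteed to be valid UTF-8, so be lenient.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Stand-in for a real source tree: one toy file in a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "mm"))
        with open(os.path.join(root, "mm", "mremap.c"), "w") as handle:
            handle.write("/* toy file */\nunsigned long do_mremap(void)\n{\n\treturn 0;\n}\n")
        for path, number, line in grep_tree(root, r"\bdo_mremap\b"):
            print(f"{os.path.relpath(path, root)}:{number}: {line}")
```

Pointing `root` at an unpacked kernel tree and changing the pattern gives a list of files and line numbers to open in your editor.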
  
  --
  BSD is designed. Linux is grown. C++ libs
2. Re:Kernel development interests me terribly by millahtime · 2004-02-19 03:03 · Score: 5, Insightful
  
  "Much time is spent teaching people how to write code but never really reading it. This is a perfect example of how to do it and why you would."
  
  Reading code can be a huge help in becoming a better coder. You see how other coders do things, learning from their mistakes what not to do and seeing good new methods you may not have come up with on your own.
  
  --
  Evolution or ID?
3. Re:Kernel development interests me terribly by Rosco+P.+Coltrane · 2004-02-19 03:08 · Score: 5, Funny
  
  I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.
  
  You could contribute some work to SCO: I hear they're very interested in having someone sprinkle several "printk("(c) SCO\n");" lines here and there in init/main.c, since they can't do it themselves, having no technical department, being a law firm and all...
  
  --
  "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
4. Re:Kernel development interests me terribly by tcopeland · 2004-02-19 03:35 · Score: 5, Informative
  
  Code Reading by Diomidis Spinellis contains a bunch of ideas on ways to comprehend large codebases more easily.
  
  He talks about browsing code, package structures, adding features or fixing bugs in a large codebase, and so on. It's a good read - well worth the money.
  
  --
  The Army reading list
5. Re:Kernel development interests me terribly by po8 · 2004-02-19 04:37 · Score: 5, Funny
  
  Does the fact that Diomidis Spinellis has repeatedly won the International Obfuscated C Code Contest (IOCCC) make him more or less qualified to write such a book :-)? Check out his "best abuse of the rules" entry from 1988 that is my all-time favorite. BTW, the contest is currently open.
6. Re:Kernel development interests me terribly by slamb · 2004-02-19 05:52 · Score: 4, Interesting
  I wish I could wrap my head around even the smallest part of the kernel. There is so much code in there and aside from main(), it is hard to find a good place to start studying.
  
  Very recently, I've been writing some low-level code. There was a long while I'd thought this was out of my league. Then I realized several things:
  
  - I was not happy with several characteristics of the low-level code other people had written and I was depending on.
  - I had done some more low-level stuff long ago - like a couple simple but legitimately useful assembly programs in DOS, and even a patch that added a sort of capability system to the OpenBSD kernel. (I never polished up the patch enough to send it in to them or anything, but the point is that it essentially worked, and I wasn't afraid to take it on.)
  - When I'd done those things back in the day, I wasn't anywhere near as good a coder as I am now.
  - The only reason I'd been unable to do these things more recently is an attitude that I'm not good enough, not a reality. (It's an attitude a lot of people in low-level code promote, I think. They so much don't want to waste their time with people who really are bad that they probably don't mind scaring off a few people who are in fact good but don't realize it. Also, I think there's ego involved - it's an exclusive club, why not let it stay that way.)
  So I think the moral of the story is to just be fearless/persistent. If you're not confident, there are plenty of ways you can improve without even involving anyone else:
  
  - Read the code. It sounds obvious, but there's a lot of code I'd stayed away from even looking at because of intimidation.
  - Try experiments. Make a change, set a hypothesis about what it will do, and run it. Then see why you were wrong, if you were. Then try it again. Even just getting in the habit of running the build system will help, and setting up experiments like this will help your debugging.
  - Find something lacking and try to fix it.
  And then, if you're still not comfortable talking on the linux-kernel list, I think you have at least another couple choices:
  
  - If you're lucky, you're friendly with someone more skilled and can use him/her to screen questions.
  - There's a couple lists like kernel-janitors and kernel-newbies to dip your feet in the water.
  - Sometimes in the process of writing an eloquent question through email you'll figure out the answer yourself. (Did you see the teddy bear anecdote in the debugging link above?)
  As for myself, I'm taking my own advice to make sigsafe - an alternate set of system call wrappers (libc level) that eliminate a couple race conditions involving signals, without a performance penalty. It's going well - the code works, and I have a race condition checker and microbenchmark to prove it. I just released my first version. Now I'm working on the documentation; it still needs a lot of work. (I could use plenty of help with this project! If you want to try low-level programming, it's a great way. It requires writing assembly for each combination of operating system and architecture. I've only written it for two systems. There are plenty left, and public systems to do it on if you don't have access to exotic machines of your own. Plus, you can hopefully gain some low-level understanding by proof-reading and helping me write the documentation.)
  Once I have that polished, I've got a couple projects I might try in the Linux kernel (and/or other kernels):
  
  - implementing a couple of system calls - the nonblocking_read(2) and nonblocking_write(2) that djb mentions.
  - implementing SO_RCVTIMEO and SO_SNDTIMEO under Linux. Assuming no one has yet; I haven't checked, so the manpage could just be out of date. Which brings m
Automatic Testing by 4of12 · 2004-02-19 03:01 · Score: 4, Interesting

I can't say how much I appreciate the automatic tests. This is applying computers to a thankless task that they're suited for.

Now if they only had a web dashboard portal showing the latest results in an easily-assimilated color coded HTML table....
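
A rough sketch of what such a color-coded table could look like, in Python. Everything here is an assumption for illustration: the function name `render_dashboard`, the status names, the colors, and the sample rows are made up and do not come from any real test run.

```python
import html

# Hypothetical status-to-color mapping; names and colors are illustrative only.
COLORS = {"pass": "#c8f7c5", "fail": "#f7c5c5", "skip": "#eeeeee"}


def render_dashboard(results):
    """Return an HTML table with one color-coded row per (kernel, test, status)."""
    rows = ["<tr><th>Kernel</th><th>Test</th><th>Status</th></tr>"]
    for kernel, test, status in results:
        # Unknown statuses fall back to a plain white row.
        color = COLORS.get(status, "#ffffff")
        rows.append(
            '<tr style="background:{}"><td>{}</td><td>{}</td><td>{}</td></tr>'.format(
                color, html.escape(kernel), html.escape(test), html.escape(status)
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = [  # made-up results, not real test data
        ("2.6.3", "boot", "pass"),
        ("2.6.3", "rmmod-stress", "fail"),
    ]
    print(render_dashboard(sample))
```

Feeding it the output of a nightly test run and writing the string to a file would give a static page that needs no server-side code.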

--
"Provided by the management for your protection."
Kernel quality by Rosco+P.+Coltrane · 2004-02-19 03:04 · Score: 5, Interesting

However, several key changes have improved overall stability as well as quality.

I have a suggestion : how about not calling development kernels with an even version number?

- 2.6.0-beta-something kernels were bad (okay fair enough, it was beta, and Linus admitted having called a 2.5.x kernel 2.6 in order to lure early adopters and get them to test it).

- 2.6.0, 2.6.1 and 2.6.2 were unstable for me, with doozies such as oopses while rmmoding and random crashes using ide-scsi (yes I know it's deprecated, but some of us need it).

I now run 2.6.3-rc3 and it's the first time it seems stable enough to be called a 2.6 kernel. There are some problems left, but overall it's getting decent. But then why are the other "2.6" kernels called 2.6 at all? They were really 2.5 kernels imho.

This has happened before, with the beginning of the 2.4 series. I only felt it was getting good enough at version 2.4.6 and above (I'm not counting the failed 2.4.11 release). When 2.4.0 went out, I thought it meant it was ready for prime time, like 2.2.0 was, or at least was more, but no, it was crap. I was slightly annoyed with Linus then, but I thought he had been pressured by commercial Linux shops and that he wouldn't do it again. But no, he did it again with 2.6.

It's really quite annoying, because those who follow Linux know the first "stable" kernels aren't stable at all, therefore avoid it, therefore defeat the point of testing it for Linus, but beginners think "cool, a new stable kernel", try it and are disappointed, giving a bad name to an otherwise great kernel. Too bad ...

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
1. Re:Kernel quality by Psiren · 2004-02-19 03:10 · Score: 5, Interesting
  
  This is a chicken and egg situation. Unless there is widespread testing of a kernel, some bugs won't be found. But not everyone wants to risk running a development kernel, so the only way to get them to test is to bend the truth slightly, and call a beta version the new stable kernel. At the end of the day, the number just reflects the developers' opinion on the stability of the thing as a whole. They could make no changes to 2.6.3 and release it as 2.7.0, but that wouldn't make it any less unstable.
2. Re:Kernel quality by Rosco+P.+Coltrane · 2004-02-19 03:20 · Score: 4, Interesting
  
  the only way to get them to test is to bend the truth slightly, and call a beta version the new stable kernel
  
  I realize that, but what I'm saying is that those in the know get burnt a couple of times, then see through the bullcrap and silently renumber the kernel versions. In the end, early adopters are pissed off because they've been lied to a little, and swear never to try newer stable versions again, newcomers get disgusted by the quality of early stable releases, and Linus doesn't get the testing he wanted that made him bend the truth in the first place, therefore everybody loses.
  
  I'd much rather see Linus say : "here, there's this 2.5, we call it 2.5.xx-RC-something. It's close to 2.6, but not quite. *PLEASE* test it for us *PLEASE*!, that'll allow 2.6.0 to be good". He could even have a "best testers" or "more devoted QA volunteers" list prominently displayed on the main page at kernel.org, to appeal to people's sense of ego.
  
  At least that would be a more honest approach to testing new kernels than lying to people.
  
  --
  "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
3. Re:Kernel quality by Anonymous Coward · 2004-02-19 03:31 · Score: 5, Informative
  
  and random crashes using ide-scsi (yes I know it's deprecated, but some of us need it).
  
  disable ide-scsi and use latest cdrecord with
  
  dev=ATAPI:x,x,x
  
  instead of
  
  dev=x,x,x
  
  and everything is cool. Don't forget to check lilo.conf. Stop using cdrdao. Ignore xcdroast's "This would be faster if you had ide-scsi enabled" dialogs.
  
  This should apply to almost every kind of ide-scsi use.
4. Re:Kernel quality by adrianbaugh · 2004-02-19 03:44 · Score: 5, Insightful
  
  That's exactly what Linus does: there is such a series of release candidates (first introduced prior to 2.4). You can argue that it isn't long enough, but there's an obvious counterargument that if you wait forever nothing will ever get released.
  I can't think of an x.y.0 release of any software project that's been properly stable. It's not just Linux, it's the way the world is. You could argue that software never ever becomes perfectly stable: marking a series as "stable" is really just shorthand for "good enough that further development is largely maintenance, therefore we expect the structure and codebase to remain stable", not some guarantee that they'll never go wrong. It's more a development term than a performance or reliability term, though the stability of development generally arises from the performance and reliability being sufficient to obviate the need for large changes to the code.
  
  Even quite late in stable kernel release cycles there's occasionally a shocker - anyone remember 2.0.33?
  
  If you don't like those kernels, just stick with 2.4 until a distribution ships with 2.6.8 or so. For what it's worth 2.6.(1+) has been fine for me.
  
  Nobody's lying to you - there has to be some cut-off where a kernel series is declared stable, and by and large I think Linus judges it pretty well.
  
  --
  "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
  - JRR Tolkien.
5. Re:Kernel quality by b17bmbr · 2004-02-19 04:00 · Score: 4, Funny
  
  because waiting for a final release to be stable, secure, and thoroughly tested has worked for microsoft, and we wouldn't want to do things their way.
  
  --
  My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
6. Re:Kernel quality by pohl · 2004-02-19 04:38 · Score: 4, Insightful
  
  For what it's worth, 2.6.0 and the subsequent minor revisions have worked flawlessly for me and many associates. You seem quick to assume that they named it a "stable release" prematurely. Have you considered the alternate hypothesis that you are in the minority of users who have encountered problems? If so, that's not a dishonorable place to be...somebody has to be the poor soul who encounters the bugs. (Thank you, by the way, I appreciate your hardship.)
  
  --
  The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
Post: -1, Redundant by HoldmyCauls · 2004-02-19 03:06 · Score: 5, Informative

"2.6.3 and 2.4.25 are out, fixing another vulnerability in do_munmap()."

The announcement for 2.6.3 and 2.4.25 was yesterday, and the vulnerability to which the link in the text above refers was with mremap, not munmap; there's also another vulnerability with mremap mentioned yesterday as an *update* to the kernel announcement.

--
Emacs: for people who just never know when to :q!
SCO by rauhest · 2004-02-19 03:08 · Score: 5, Funny

Without RTFA (of course), I tried to find any reference to "sco".

The only match was "a misconfigured system". :)
ACPI cure for 2.4.25 HOW-TO by Quietti · 2004-02-19 03:11 · Score: 5, Informative

Was here in yesterday's thread about 2.4.25 and 2.6.3 releases.

--
Software is not supposed to be about how to work around a useability issue. - Ken Barber
do_mremap by suckamc_0x90 · 2004-02-19 03:36 · Score: 4, Funny

fixing another vulnerability in do_mremap()

ah, good old Mr. Emap.
2.2.0? by autechre · 2004-02-19 03:42 · Score: 4, Interesting

2.2.0 had a bug where the system would instantly reboot when any user ran "ldd". I wouldn't call that "ready for prime time" :)

(I remember this because we were waiting for 2.2.x to come out, having just gotten a dual P-II 350 server [2.0.x didn't have SMP support]. Fortunately, we managed to hold off for the first few revisions.)

It's not as if this problem is unique to the Linux kernel. "Never use a Red Hat .0 release" is pretty sage advice, and of course we know Microsoft's track record. You're not going to be able to catch all of the bugs before something gets truly widespread testing, no matter what you call it or how long you work on it.

--
WMBC freeform/independent online radio.
Framebuffer? by bruthasj · 2004-02-19 03:44 · Score: 4, Funny

I think they forgot to test the framebuffer in 2.6.x kernels. If I can't see Tux, then I ain't booting it! (radeon)
Bitkeeper by jurgen · 2004-02-19 03:45 · Score: 5, Interesting

I sent this to the author of the article...
[the author] wrote:
The lack of formal revision control and source code management led many to suggest the use of a product called BitKeeper.
I grant that sometimes you have to simplify history to avoid digressing in an article, but this is a bit too inaccurate to let stand.
Bitkeeper wasn't suggested by anyone; it didn't have to be. It was developed from the ground up to Linus' requirements. Larry McVoy had a discussion about source control with Linus years ago, in which Linus said "none of the products are good enough" and Larry said, "ok, I'm going to write one that is". Apparently he had this on his mind anyway, and so he started Bitmover Co. As Bitkeeper became a usable product Larry continued to take Linus' feedback and improve it until it was good enough for Linus to use... at which point Linus started using it.
This is still a simplification of course, but it's closer... and as you can see, there were no third party suggestions involved.
1. Re:Bitkeeper by pangloss · 2004-02-19 05:02 · Score: 4, Informative
  
  Some more on that history: Larry McVoy on Bitkeeper, kernel development, Linus Torvalds & Bruce Perens
Interesting read by Derkec · 2004-02-19 03:55 · Score: 4, Insightful

It's still amazing to me that a project as large as Linux was able to be so successful BEFORE the changes that were made to the development process. It lacked a centralized CVS, coherent bug tracking, automated testing... These are all things I use in the smallest of professional projects. Many eyes goes a long way towards compensating for having many hands in a big project, but some structure seems like it's helped.
But why is it in so few distros? by Bob+Bitchen · 2004-02-19 03:57 · Score: 4, Interesting

That's what bothers me. How long will the distros wait until they use the 2.6 kernel? I hear the scheduler is improved amongst many other things. So what's the hold up? Is it just that there's no one willing to be the guinea pig?

--
http://tinyurl.com/3t236
Beautiful by maximilln · 2004-02-19 04:30 · Score: 5, Insightful

I don't like to start new threads but I didn't see this: A general "Thank you for your time, effort, and a job well done" to all of the kernel hackers out there. They're fixing kernel level bugs that are almost at the hardware level while M$ is still patching their web browser. I don't think there's any doubt which system is ultimately more secure.

Can anyone take a guess how many low-level memory exploits are in Windows XP, 2k, or others? Perhaps it's irrelevant. Who needs to crack the low mem when there are so many ways into the system at the document level?

--
+++ATHZ 99:5:80
Re:BitKeeper? by raxx7 · 2004-02-19 06:47 · Score: 4, Informative

First, you can contribute to both Linux and Subversion. You just have to either:
a) not use BitKeeper, or
b) buy a BitKeeper licence

Second, RCS doesn't support concurrent development. That's why we have CVS.

Third, why BitKeeper?
Though CVS has lots of shortcomings (and that's why Subversion exists) and Subversion (SVN) is still labeled "alpha" by its developers (though in practice it's stable enough to be self hosted and widely used), the real reason has to do with the basic model of CVS and SVN. Two main issues, in my opinion:
a) In CVS/SVN you need write access to the central repository or you can't make proper use of versioning control. Giving write access is a problem for Linux's contribute based development model. BitKeeper doesn't need it.
b) CVS/SVN know about branches but they don't know about merges from one branch into the other. Their view of the repository is a pure spanning tree. Subversion has a "merge" command, but a merge is committed as any other change into the repository. BitKeeper knows about previous merges and where they were merged from and uses that information to be smarter at resolving conflicts when you do a merge.
In contribute based development every change to the project has to go through one of few maintainers who can write into the main repository (in Linux's case, there's only one), so proper merging support becomes very important. At some point before BitKeeper, Linus was having trouble keeping up with all the patches people were sending him and people were getting angry with that.
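
To make point b) concrete, here is a small Python sketch of why recording merges helps. It is not BitKeeper's algorithm, only the general idea of finding a common ancestor in a history graph; the commit names (`base`, `m1`, `f1`, `merge1`, `f2`) and both helper functions are invented for illustration.

```python
def ancestors(history, commit):
    """Return the set of commits reachable from commit, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(history[current])
    return seen


def merge_bases(history, left, right):
    """Return the common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(history, left) & ancestors(history, right)
    return {
        c for c in common
        if not any(c != other and c in ancestors(history, other) for other in common)
    }


# Each commit maps to its list of parents. "merge1" merged f1 into the main line.
tracked = {
    "base": [],
    "m1": ["base"],
    "f1": ["base"],
    "merge1": ["m1", "f1"],  # merge recorded with both parents
    "f2": ["f1"],
}
# Same history, but the merge was committed as an ordinary change (one parent).
untracked = dict(tracked, merge1=["m1"])

if __name__ == "__main__":
    print(merge_bases(tracked, "merge1", "f2"))
    print(merge_bases(untracked, "merge1", "f2"))
```

With the merge recorded, the second merge of the feature line starts from `f1`, so only the changes made since the last merge need resolving. Without it, the tool falls back to `base` and has to reconsider changes that were already merged once.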

If you don't believe me, you can check the GNU Arch website: http://wiki.gnuarch.org/
They're developing a Free versioning control system very similar to BitKeeper.