Slashdot Mirror


2.6 Linux Kernel in Need of an Overhaul?

toadlife writes "ZDNet UK reports that Andrew Morton, the head maintainer of the Linux production kernel, is concerned about the number of bugs in the 2.6 kernel. He is considering the possibility of dedicating an entire release cycle to fixing long-standing bugs." From the article: "One problem is that few developers are motivated to work on bugs, according to Morton. This is particularly a problem for bugs that affect old computers or peripherals, as kernel developers working for corporations don't tend to care about out-of-date hardware, he said. Nowadays, many kernel developers are employed by IT companies, such as hardware manufacturers, which can cause problems as they can mainly be motivated by self-interest."

30 of 512 comments

  1. Important for the Old Debate by eldavojohn · · Score: 5, Insightful

    A lot of times, the old debate of Windows Vs Linux covers how often the OS fails miserably. Yes, we all know the famous "blue screen of death" and I think that that single concept connected with Windows makes it unappealing. I believe that Linux has the ability to handle internal errors more elegantly but that's only because I've only seen it fail from hardware errors. Granted, I don't know enough about the inner workings of Windows or Linux but let's face it, Win95 & Win98 first editions would crash if you looked at them wrong.

    Here's a possible horror story:

    While the debate rages on, Linux gets more complex. Linux gains more bugs. Linux begins to aim for more end-user features. Developers get sick of maintaining other developers' code and focus on making new features (asked for or un-asked for) because it gives them pride to make something new. The Linux kernel hits the same pitfalls as the Windows kernel.

    If it takes an entire development cycle to simply improve the current version's bugs, I'd gladly accept and encourage that.

    --
    My work here is dung.
    1. Re:Important for the Old Debate by MOtisBeard · · Score: 5, Insightful
      "If it takes an entire developement cycle to simply improve the current version's bugs, I'd gladly accept and encourage that."

      Hear, hear. The real pitfall for any technical production process, from software to space shuttles, is the ascendancy of a businesslike concern with the product's image to the point that it begins to dictate release deadlines. It's all well and good to worry about image, but when that worry becomes such a focus that it dictates the way that technical work gets handled, suddenly your product or process has become an example of form over function... and unless your product is tuxedoes for corpses or something similar, SCREW form over function!

    2. Re:Important for the Old Debate by gmack · · Score: 4, Interesting

      As someone who is responsible for many of those bug reports I can tell you it's not the features that break things. It's things like driver API cleanups that don't get all of the older drivers.

      The result is that if you have reasonably common hardware the kernel is getting much more stable, but for things like my non-PCI sparc (compile problem with some options) or my 21 ethernet port firewall (needs special options to boot or it crashes) it has gotten more buggy.

      I'm not sure a freeze will do much to fix it as a large part of the problem is that all these somewhat rare things need testing.

      I still find these things get fixed rather quickly when I report them even without the freeze.

    3. Re:Important for the Old Debate by Rosco+P.+Coltrane · · Score: 5, Interesting

      Yes, we all know the famous "blue screen of death" and I think that that single concept connected with Windows makes it unappealing. [...] Win95 & Win98 first editions would crash if you looked at them wrong.

      Er.. I hate Windows as much as the next guy, but really, when was the last time you saw Windows bluescreen? Perhaps you could make your point by comparing Windows and Linux versions that aren't 11 years apart.

      I believe that Linux has the ability to handle internal errors more elegantly but that's only because I've only seen it fail from hardware errors.

      Yes but it handles hardware errors gracefully too: for example, the hard disk in one of my 24/7 machines died last week. I came back and found out that I couldn't write anything to it at all. A quick look at the console showed a message saying "root filesystem, too many errors, remounting read-only" or something like that. The result is that data corruption was minimal *AND* the machine didn't hang. How's that for graceful? You wouldn't dream of having that in Windows.

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    4. Re:Important for the Old Debate by TheRaven64 · · Score: 5, Informative
      Linux got me using *NIX. BSD showed me how *NIX is meant to work. I currently use OpenBSD and FreeBSD, and this is exactly the kind of reason why I switched.

      In FreeBSD, there are three branches, -STABLE, -CURRENT, and -RELEASE. Any new features are put into -CURRENT. Here, they undergo testing. The only people who should be running -CURRENT are those who are developing or actively bug-hunting. Once a feature is stabilised, it migrates into -STABLE. Here, it receives more general testing. A lot of people use -STABLE, and file bug reports. Finally, a -RELEASE branch is created from -STABLE. This undergoes even more testing and is then shipped (usually after several betas and RCs). The -RELEASE branch is maintained in the tree, but only bug fixes are allowed to go in it. If you want a stable system, you stick with a -RELEASE branch. For a slightly less-critical system, you might want -STABLE for the features (my ThinkPad runs -STABLE, and I have never yet had it crash).

      The direction of the OS development is driven by the core team. These are elected annually by the developers.

      In the OpenBSD world, there is a code review process. Every piece of code in the base system is audited on a regular basis. When a new category of bug is discovered (e.g. the multiply overflow that caused a security hole in OpenSSH), the entire source tree is searched for occurrences of that bug. These are then fixed.
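
      For readers who have not met this category of bug, here is a minimal sketch of a multiply overflow in a size calculation. It is not the OpenSSH code. Python integers never overflow, so 32-bit unsigned arithmetic is simulated with a mask, and every name below is hypothetical.

```python
# Illustrative only: simulates 32-bit unsigned multiplication with a mask.
# Function names and the example values are assumptions for this sketch.
UINT32_MAX = 0xFFFFFFFF


def unchecked_alloc_size(count: int, item_size: int) -> int:
    """Size a C program would compute using 32-bit unsigned multiplication."""
    return (count * item_size) & UINT32_MAX


def checked_alloc_size(count: int, item_size: int) -> int:
    """Same computation, but reject inputs whose product would wrap."""
    if count < 0 or item_size < 0:
        raise ValueError("sizes must be non-negative")
    if item_size != 0 and count > UINT32_MAX // item_size:
        raise OverflowError("count * item_size does not fit in 32 bits")
    return count * item_size


if __name__ == "__main__":
    count, item_size = 0x40000001, 4  # a count chosen so the product wraps
    print("unchecked size:", unchecked_alloc_size(count, item_size))
    try:
        checked_alloc_size(count, item_size)
    except OverflowError as exc:
        print("rejected:", exc)
```

      The unchecked version returns a tiny size for a huge element count, which is what makes the pattern dangerous when the result sizes a buffer. Searching a tree for every unguarded `count * size` expression is the kind of sweep the comment describes.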

      Both of these development processes give high-quality, stable systems.

      --
      I am TheRaven on Soylent News
    5. Re:Important for the Old Debate by trewornan · · Score: 4, Funny

      Hurd is currently due for release shortly after Duke Nukem Forever.

    6. Re:Important for the Old Debate by ozmanjusri · · Score: 5, Informative

      Failing that, you could just install the Debian distro of it from here: http://www.debian.org/ports/hurd/hurd-cd

      --
      "I've got more toys than Teruhisa Kitahara."
    7. Re:Important for the Old Debate by Tim+C · · Score: 4, Informative

      In Windows there are NO logs, no clues, NOTHING to indicate what the problem might be.

      When Windows crashes, it writes an error log and a memory dump to the disk. Under XP, check the Startup and Recovery settings (My Computer -> Properties -> Advanced -> Startup and Recovery settings)

      Also, Windows logs a lot of information to the event logs. The event log viewer is in the Administrative Tools. By default, when Windows crashes, it logs information about the crash there (in the System log).

      No offence, but the fact that you appear not to know about the crash dump or event logs suggests that either you don't know enough to correctly diagnose the problem, or you're running one of the 9x series of Windows, in which case frankly your only sensible option is to throw it away and install something based on NT; 9x is a joke.

    8. Re:Important for the Old Debate by *SECADM · · Score: 4, Informative

      Some completely offtopicness:

      XP by default will reboot when it encounters a bugcheck (or BSOD as people call it). However typically a mini dump or a full memory dump is created in your %windows% directory. This is pretty much like a core file on unix, with the stack information necessary for debugging.

      So, if you ever feel brave enough to do some windoze hacking, you can grab this file by booting into a safeOS on the system or a linux live CD, and then head to http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx to grab the MS debugger. Once inside the debugger, usually only 1 or 2 steps are needed to figure out who the culprit is that caused the bugcheck. (by seeing what's on the stack at the crash for example). The other thing you can do, if you can boot into safemode, is to open up the event log viewer and typically there are messages that explain who caused the last bugcheck.

      --
      sure I'll have a sig.
  2. Duh Factor by Spazmania · · Score: 5, Insightful

    One problem is that few developers are motivated to work on bugs

    Yeah, this is one for the "no shit sherlock" column. What did you expect to happen when you eliminated the stable/unstable cycle? At a minimum the individual parts of the kernel would achieve stability at different times so that the kernel as a whole was never stable.

    This frustrates me immensely at work. I hung on to 2.4 as long as I could. Hardware compatibility pushed me to 2.6 and it just isn't as reliable.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  3. Rewrite it as a microkernel!! by borgheron · · Score: 5, Interesting

    This may look like flamebait, but I'm actually serious. Microkernels are more reliable because drivers run in userspace. If a driver crashes, it can't take down the whole system. Also, given that some microkernels are only about 3500-6000 lines of code (as opposed to Linux's million or so) it's relatively easy to make certain that the code is bug free (given that the average number of bugs is 16 bugs per 1000 lines of code according to some recent studies).
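
    The arithmetic behind that claim can be worked through in a few lines. This is only a sketch using the figures quoted in the comment; the defect rate is the commenter's number, not a measurement, and the function name is made up for illustration.

```python
# Figures come from the comment above; the 16 bugs/KLOC rate is the
# commenter's claim, and expected_bugs is a hypothetical helper name.
BUGS_PER_KLOC = 16


def expected_bugs(lines_of_code: int, bugs_per_kloc: int = BUGS_PER_KLOC) -> float:
    """Estimate the bug count as lines of code times defects per 1000 lines."""
    return lines_of_code * bugs_per_kloc / 1000


if __name__ == "__main__":
    cases = [
        ("microkernel, 3500 lines", 3_500),
        ("microkernel, 6000 lines", 6_000),
        ("monolithic kernel, ~1M lines", 1_000_000),
    ]
    for label, loc in cases:
        print(f"{label}: about {expected_bugs(loc):.0f} expected bugs")
```

    Note that the estimate covers only code running in kernel mode; drivers moved out to userspace still contain their own bugs, they just fail in a more contained way.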

    So, if the kernel needs an overhaul, then why not do it right this time? Now some may say that microkernels have a performance hit, but today's machines are certainly fast enough to render any performance hit negligible.

    GJC

    --
    Gregory Casamento
    ## Chief Maintainer for GNUstep
  4. Re:About time by Blue+Booger · · Score: 4, Interesting

    Agreed. I have been forced to upgrade to 2.6 on a few computers for features that I needed that are only in the 2.6 series, but every time it has been a problem. All of our production machines are still built with 2.4 and we purposely use hardware that is supported by the 2.4 series.

    Linux has caused Microsoft to improve their products, and I have found myself removing Linux servers to replace them with Windows 2003 Server of late. On the desktop, it is not even close. I sit next to a guy who runs 2.6 on his Ubuntu machine and I laugh every time he has to reboot. My Windows XP box only goes down rarely for updates and it does it at night when I am not there. Last time, I had over 100 days of uptime (this is a desktop machine). I rarely ever see the BSOD anymore and if I do it is almost always caused by a hardware problem. That is what I *USED* to be able to count on with Linux - if it crashed, there was a hardware issue. Now, with 2.6, I've lost that.

    There are coworkers of mine who would have fainted three years ago if they heard me say something like this, but Linux just isn't the lean, reliable operating system it used to be.

    --
    --If you don't test it, it won't work. Guaranteed.
  5. Fantastic and Overdue by udippel · · Score: 5, Insightful

    So, there are two relevant aspects to it. Probably more.
    The 2.6 Kernel has been plagued by bad bugs. On the other hand, one way or another you need it for a multimedia-enabled desktop on more modern hardware (compared to 2.4). From that point of view, the proposal is fantastic. Otherwise we see the quality of the kernel of our beloved OS going down.
    2.6 has never seen a phase of consolidation, really. Therefore, the proposal is almost overdue.

    It would be badly short-sighted to think of quick ROI (as the IT companies usually aspire), since the troubles only multiply with further advances.

    Yes, please, Andrew, get stability back into 2.6 - Though I have no single word of say in this, I thrust up both hands in favour !

    Maybe there are some thumb-screws needed for the contributors: As long as the bug level stands above a certain threshold, no enhancements will be accepted.

    There is also a political aspect to it: we have always argued about re-use of legacy hardware. This becomes even more important with Vista on the horizon. The kernel must not lose the 'caring' attitude. It must be trustworthy and trusted by the general public to care for more than greedy hardware manufacturers and their sick quest to replace functional hardware with most recent hardware.

  6. The Ability to Lead? by eldavojohn · · Score: 5, Insightful

    Man, it's crazy but we have this thing where I work. Uh, what do you call those things again?

    They are very good at convincing people to do things regardless of what they get out of it ... I think they're called 'leaders.'

    If Andrew Morton doesn't have leadership skills, I suggest he step down and let another manager step up.

    If I were in his position, I'd get everyone who's even mildly important in a room (or, failing that, an e-mail) and:

    "Guys, remember back to the reason you first joined in the contribution to develop a free operating system. Now, think of all the hard work you've put into it and other people have put into it. Now, that's all in jeopardy and here's why..."

    Spend some time reasoning with them and pointing out the bugs that are really really hurting the kernel. In the end, wrap up with:

    "Look, I know this sucks and you're going to have to tangle with a lot of bugs that aren't even your own. But what have we got if we haven't got a stable operating system? We've got another Windows, that's what. You just don't have to pay for our piece of malware. Just see this one development cycle through, I promise we'll make it as quick and painless as possible and after all is said and done, we'll have another meeting like this where anyone can suggest any crazy-ass feature they want to add. Once we pick out what we want, we'll spend the next development cycle letting our imaginations run wild. We'll make a kernel so unstable that the user'll have to re-flash their BIOS when it crashes! Then maybe we'll work on solidifying that. Right now, we just owe it to ourselves and our fans to give them something that's 100% stable and reliable."

    If you can't reason with them like that, maybe you just have to accept they can't be persuaded and let them do what they want but prune their work if it detracts from your goal end system.

    --
    My work here is dung.
    1. Re:The Ability to Lead? by chrismear · · Score: 5, Insightful

      Man, it's crazy but we have this thing where I work. Uh, what do you call those things again?

      Paychecks?

  7. so here we are ..... by nblender · · Score: 4, Insightful
    Linux got off the ground and started incorporating everything anyone contributed... grabbing features and drivers like there was no tomorrow. NetBSD was rejecting stuff because it wasn't written right. So it took ages for NetBSD to get audio until someone did it right; while everyone else went with OSS. Over and Over this happened. NetBSD was criticized for being useless because it didn't support all the stuff Linux/FreeBSD did.

    Nice house. Did you build it yourself?

  8. It takes a change of mindset to get it done by Opportunist · · Score: 5, Insightful

    Of course it's more rewarding to create a new feature. First of all, no coder enjoys working on foreign code. It just doesn't "look right", doesn't "feel right", simply because everyone has his own style.

    And don't forget bragging rights. Hey, I invented some feature. Sure, some guy debugged it, but I get to slap the label to it. I might even name it after me (Hello Mr. Reiser, if you should read this...). The guy who debugs it gets ... zip.

    This has to change first if we want people to put in time to hack through other people's code. Appreciate the work done to get it fixed. After all, appreciation, bragging rights and "making a name" is everything you get from writing free software.

    Few people do it out of generosity or because it "feels good". They want to be known. Linus might not have gotten much out of writing that Kernel, but he sure as hell has a killer paying job now. I doubt the people who wrote the original implementation of iptables/ipchains are worse off. But the debuggers? Lot of work, no name.

    Pull the debuggers in front of the curtain, and you'll see people debug. If we only appreciate the people who wrote a feature in the first place, even if that feature doesn't work 100%, we won't see people debug.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    1. Re:It takes a change of mindset to get it done by carlislematthew · · Score: 4, Funny
      a cool function that works can be better than sex

      Promise me you'll never EVER say that out loud. Please.

  9. Blaming corporate developers is a dodge by Shivetya · · Score: 5, Insightful

    The painful truth is that very few developers, in open source or otherwise, like fixing old code or old bugs. This is very true if the bug fix isn't going to be noticed by a great number of people. Face it, most of us like to write new code or improve on something that isn't working the way we want it even if it is working right.

    This is what separates professional developers from the rest. We work on it regardless of how much it benefits us. We might gripe a bit but in the end we do what is asked. Sure that backend has flaws and is going to be replaced down the road but it does not excuse us from making it work now.

    When you go look at some of the bugs listed in even current applications you start to see the age some have accrued. Some are rightly passed over as 1 in a million occurrences but too many are skipped because they just don't have any allure. Note, I am not singling out people who work on Open Source, I am pointing out that the article fails to touch an area that exists but most don't want to acknowledge.

    --
    * Winners compare their achievements to their goals, losers compare theirs to that of others.
  10. Yes, fix the bugs, BUT ... by njdj · · Score: 5, Insightful
    entire release cycle to fixing long standing bugs

    Yes, it's a good idea.

    But don't waste time on bugs that only affect legacy hardware.

    It would also be a good idea for some effort to be spent on consolidating, correcting, and updating the various lists of "Hardware supported by Linux". There are lots of such lists on the web, for example:

    - not to mention the distro-specific compatible hardware lists maintained for Redhat, Mandriva, Suse, and others.

    We need one correct, maintained list, not dozens of nearly-correct, usually out-of-date lists. And it seems to me that the list should depend only on the kernel version, not on the distro.

  11. 2.6 'stable' no longer stable by prestwich · · Score: 4, Insightful

    My experience is that stability is dropping, even on modern hardware. You can no longer take the latest '2.6' stable kernel and expect it to keep your server running stably.

    Now, you can take a Redhat or SuSE packaged kernel and find those are pretty stable.
    But there is a problem; if you report a bug in a Redhat/SuSE kernel on the lk.ml you get a
    'that's a Redhat/SuSE problem - speak to them'.

    As the 2.6.x stable tree becomes less stable, fewer people use them on production servers and instead
    use packaged kernels. As fewer people use them, they get tested less - and fewer bugs are actually reported for them.

    It is also not just a case of old hardware; in the last few kernels I've had leaks that make
    a simple firewall die repeatedly after a few months, I've got a machine with a slow RAM kernel leak
    that makes a simple DHCP server fall over every few months, and I've had a 2.6.1x kernel that couldn't
    run an NFS server for 24 hours without falling over.

    It ain't nice - but these are my experiences.

    Dave

  12. But it runs Faster!! by giorgosts · · Score: 5, Interesting

    I follow Ubuntu with the latest kernel updates and I tell you with every update performance increases. When I booted Windows I used to feel the difference, but not anymore. I think the quality of the kernel is fine. There are other people that need to improve in quality, e.g. the rest of the free apps, esp. packagers who have to make the thing just work. What will I do with stability if nothing works? Am I going to just look at the computer while it's all stable doing nothing?

  13. Outstanding bugs does not always mean instability by vijayiyer · · Score: 4, Interesting

    Some of the above posts say "I don't notice any problems". I'm guessing some of the bugs nobody has fixed are somewhat obscure. There is a well known bug when Linux mounts large XFS file systems via NFS that bothered me regularly - large directories could not be searched, deleted, etc. Now I have a Mac working with that flawlessly. These are the types of bugs - annoying, but non-fatal - that few people want to fix.

  14. BFOD and Bragging Rights by martyb · · Score: 5, Insightful
    Pull the debuggers in front of the curtain, and you'll see people debug. If we only appreciate the people who wrote a feature in the first place, even if that feature doesn't work 100%, we won't see people debug.

    Hear, hear! SETI@home had a gazillion(tm) people contributing cycles to the effort (many times in teams) to see who could place highest on the list of contributors.

    How about a BFoD - Best Fix Of the Day? Each day, post the name of the submitter and some details about the item debugged and fixed:

    1. Name Recognition: Not just to see your name in "lights", but also gain something you could add to your resume.
    2. New Code - preference to bug fixers: Make a policy that you will give top priority to bug fixes... if you attach your new feature to a bug fix, it will get preferential treatment. Those without a bug fix fall to the bottom of the queue.
    3. Share / Educate: Share debugging techniques and tools. Make it easier to fix bugs by sharing best practices with the community.
    4. Scratch an Itch: It may not be fun, but if you develop new code, you also get to spend time debugging... learning from the preceding item will speed the development process and you'll be able to complete your Next New Thing(tm) even faster and better!
    5. Competition: Have contests for the Best Fix of the Month (BFoM) and Best Fix Of the Year (BFoY). To be chosen from the winners of the BFoDs.
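
    The queueing rule in item 2 could be sketched roughly as follows. Nothing here reflects an actual kernel patch workflow; the `Submission` type and `review_order` function are hypothetical names used only to show the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """A hypothetical patch submission; not modelled on any real tracker."""
    title: str
    includes_bug_fix: bool


def review_order(submissions):
    """Put submissions that include a bug fix ahead of those that do not.

    sorted() is stable, so arrival order is preserved within each group.
    """
    return sorted(submissions, key=lambda s: not s.includes_bug_fix)


if __name__ == "__main__":
    incoming = [
        Submission("shiny new scheduler", False),
        Submission("new driver + fix for old driver leak", True),
        Submission("experimental filesystem", False),
        Submission("NFS crash fix", True),
    ]
    for submission in review_order(incoming):
        print(submission.title)
```

    A stable sort is used so that nobody is reordered relative to others in the same group; feature-only work simply waits behind anything that carries a fix.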

    This could be further improved by posting a Bug Of the Day (BoD) where there is a daily bug that is to be fixed. The first fixer gets recognized as well as anyone who provides an especially elegant solution. Award bonus points for fixing related bugs in the area so as to promote more complete fixing in that area.

    Post these prominently for all to see and I'd be willing to bet that there would be a groundswell of support.

    This is just off the top of my head - please post any suggestions for enhancements or (gasp) any problems you see in it!

    1. Re:BFOD and Bragging Rights by Opportunist · · Score: 5, Interesting

      Add a "highscore list" and it's already hitting home.

      No, don't mod me funny. I mean it. Make it a page every halfway important person in the OS-community wants to read, make it the place to go looking if you're headhunting for a person with fixing skills.

      Today, you rarely if ever get to start a new project. Most of the time, you're hired for a project that's been running for ages. And there, you don't need a coder who can pull fast algos out of his rear, you need people who can deal with alien code, understand it quickly and debug it. And there you'd have those people, listed. The top debuggers of the world.

      Just make sure HR gets to read it and they know their applicant list.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  15. Whew! by cciRRus · · Score: 5, Funny

    So, there are lots of bugs in Linux! Good thing I'm using Windows.

    --
    w00t
  16. No, just use nooks by meese · · Score: 4, Insightful

    Or you could use nooks. Nooks will protect the OS from driver crashes and restart failed drivers transparently.

  17. Old hardware users should use older kernels by Wolfier · · Score: 4, Insightful

    If you put resources into making the newest kernel compatible with old peripherals that resource could not be used for bugfixes and new features.

    The new kernel probably will not bring anything new to the old hardware, either. So why not just use the stable 2.4 kernel with security patches?

  18. Re:Complexity and Abstraction by TheRaven64 · · Score: 4, Interesting
    While I agree in general (C is not a good language if you are programming something that's not a PDP-11), it is possible to get a lot of abstraction using C. Compare, for example, Linux and DragonFly BSD. In Linux, they use an explicit threading / locking model. The developer has to make sure that they acquire and release all of the locks they need, and in the right order. In DragonFly, they use a message-passing model. Once they have the message-passing semantics working once, they can keep re-applying it. It is then much easier to reason about the code (using tools such as CSP) and to prove that it is deadlock-free. Simple things, like non-branching sequences of processes that don't send messages backwards can be implemented in a way that guarantees that there are no corner cases that will break it. Consider the following operations involved with sending some data over a TCP link (a rather contrived example):
    1. Split the buffer into segments that can fit in a packet.
    2. Add TCP headers.
    3. Add IP headers.
    4. Add Ethernet headers
    5. Send to the hardware's output buffer
    On DragonFly, each of these steps could be performed by a separate thread, passing the processed buffer between them, and if it worked at all then you could guarantee that it was correct. On Linux, the amount of locking required would mean that a system this complex would be likely to create locking bugs. Looking through the Linux bug database, it seems that a significant number of bugs are currently lock-related.
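
    The five steps above can be sketched as a one-way message-passing pipeline. This is only an illustration of the model, not DragonFly's API: Python threads and `queue.Queue` stand in for kernel threads and message ports, the "headers" are plain text tags, and all function names are assumptions.

```python
import queue
import threading

STOP = object()  # sentinel message telling a stage to shut down


def stage(transform, inbox, outbox):
    """Run one stage: receive a message, transform it, pass results forward."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream, then exit
            return
        for result in transform(item):
            outbox.put(result)


def segment(buffer, size=4):
    """Step 1: split the buffer into pieces that fit in a (toy) packet."""
    return [buffer[i:i + size] for i in range(0, len(buffer), size)]


def add_header(name):
    """Steps 2-4: build a transform that prepends a toy header tag."""
    return lambda payload: [name.encode() + b"|" + payload]


def send(buffer):
    """Push a buffer through the stages; the last queue plays the output buffer."""
    transforms = [segment, add_header("TCP"), add_header("IP"), add_header("ETH")]
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(transforms)
    ]
    for worker in workers:
        worker.start()
    queues[0].put(buffer)
    queues[0].put(STOP)
    frames = []
    while True:  # step 5: drain the output buffer
        frame = queues[-1].get()
        if frame is STOP:
            break
        frames.append(frame)
    for worker in workers:
        worker.join()
    return frames


if __name__ == "__main__":
    for frame in send(b"abcdefgh"):
        print(frame)
```

    No stage takes a lock explicitly, and messages only ever move forward, so a stage never waits on one that comes after it. That is the property the comment relies on when it says such a sequence is easy to reason about.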
    --
    I am TheRaven on Soylent News
  19. like I've been saying... by Malor · · Score: 4, Insightful

    I've said this, here and elsewhere, over and over and over. Quality is something that has to be in software FROM THE START. It's not something you can retrofit.

    As soon as the kernel dev team decided that Linus' kernel didn't need to be stable anymore, as soon as they started waving their hands in the air and expecting 'the distros' to magically fix their problems, OF COURSE quality took a dive. One of the kernel devs said that it was okay for only one out of three 'stable' kernels to actually be stable! Stability takes a long time... they now refuse to support a given kernel for more than a couple of months. The 2.4 kernel still has a few problems, and it's been around for, what, six years now? Supporting a given kernel release for only a couple of months is impossibly stupid from a stability perspective.

    They're doing it this way because they're tired of doing the painful, annoying, tedious task of making sure the kernel always works. And the 2.6 kernel has, as a result, been a steaming pile of crap. Features don't matter if the fucking kernel doesn't stay up. No kernel since about 2.6.8 has worked in APIC mode on my ASUS KT333 board. 2.6.15 crashes my Intel 865 chipset servers randomly; they rarely stay up more than an hour or so. 2.6.14 broke traceroute. And with the constant stream of patches to their security fuckups, my system uptimes rarely exceed two weeks. Remember being proud of your kernel uptimes?

    The social contract with Linux for many years was essentially: "The official kernel tree is as stable as we know how to make it. You can trust this code." And that is what got Linux as far as it has gotten... the fact that you could TRUST IT. It NEVER fell over. The 2.2 kernel was one of the finest pieces of software I've ever run. 2.4 took a huge dive in terms of stability, and was a total mess until Linus branched off to 2.5 and let the poor harried 2.4 maintainer, Marcelo Tosatti, take it over. He finally whipped it into shape. He has done an outstanding job.

    What Linus et al need to do is GO PLAY IN THEIR SANDBOX IN 2.7. Let 2.6 fucking stabilize. They're shoving new features down our throats so fast that it's a part-time job just keeping up with the new stuff... and obviously NOBODY understands the security implications of moving this fast, or we wouldn't have so many goddamn security patches. We're gonna be having those security patches for YEARS because of this bullshit. The number of possible interactions in a system goes up exponentially with the number of features... so adding features should slow down over time, not speed up.
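
    To put rough numbers on the interaction claim, here is a small sketch. It rests on an assumption about what counts as an "interaction": counting only pairs of features gives quadratic growth, while counting every combination of two or more features gives exponential growth. The function names are made up for illustration.

```python
def pairwise_interactions(features: int) -> int:
    """Number of distinct feature pairs: n * (n - 1) / 2."""
    return features * (features - 1) // 2


def multiway_interactions(features: int) -> int:
    """Number of subsets with two or more features: 2**n - n - 1."""
    return 2 ** features - features - 1


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, pairwise_interactions(n), multiway_interactions(n))
```

    Either way the count grows faster than the feature list itself, which is the point being made: each added feature brings more to test than the one before it.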

    Go BACK TO THE OLD SYSTEM. People crying about 'too slow release schedules' is a HELL of a lot better than people crying about Linux being unstable. Linux *owned* the word stability for many years, and it's in very real danger of losing it, right at its height of popularity. The old system worked. It got Linux where it is today.

    A simple 'bugfix release' won't do shit... it's the process that's broken. It'll fix some of today's bugs, but what about next week?