2.6 Linux Kernel in Need of an Overhaul?
toadlife writes "ZDNet UK reports that Andrew Morton, the head maintainer of the Linux production kernel, is concerned about the amount of bugs in the 2.6 kernel. He is considering the possibility of dedicating an entire release cycle to fixing long standing bugs." From the article: "One problem is that few developers are motivated to work on bugs, according to Morton. This is particularly a problem for bugs that affect old computers or peripherals, as kernel developers working for corporations don't tend to care about out-of-date hardware, he said. Nowadays, many kernel developers are employed by IT companies, such as hardware manufacturers, which can cause problems as they can mainly be motivated by self-interest."
A lot of times, the old debate of Windows Vs Linux covers how often the OS fails miserably. Yes, we all know the famous "blue screen of death" and I think that that single concept connected with Windows makes it unappealing. I believe that Linux has the ability to handle internal errors more elegantly but that's only because I've only seen it fail from hardware errors. Granted, I don't know enough about the inner workings of Windows or Linux but let's face it, Win95 & Win98 first editions would crash if you looked at them wrong.
Here's a possible horror story:
While the debate rages on, Linux gets more complex. Linux gains more bugs. Linux begins to aim for more end-user features. Developers get sick of maintaining other developers' code and focus on making new features (asked for or un-asked for) because it gives them pride to make something new. The Linux kernel hits the same pitfalls as the Windows kernel.
If it takes an entire development cycle to simply fix the current version's bugs, I'd gladly accept and encourage that.
My work here is dung.
One problem is that few developers are motivated to work on bugs
Yeah, this is one for the "no shit sherlock" column. What did you expect to happen when you eliminated the stable/unstable cycle? At a minimum the individual parts of the kernel would achieve stability at different times so that the kernel as a whole was never stable.
This frustrates me immensely at work. I hung on to 2.4 as long as I could. Hardware compatibility pushed me to 2.6 and it just isn't as reliable.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
This may look like flamebait, but I'm actually serious. Microkernels are more reliable because drivers run in userspace. If a driver crashes, it can't take down the whole system. Also, given that some microkernels are only about 3500-6000 lines of code (as opposed to Linux's million or so), it's relatively easy to make certain that the code is bug-free (given that the average is 16 bugs per 1000 lines of code according to some recent studies).
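To put the numbers in that comment side by side, here is a rough back-of-the-envelope sketch. It assumes a uniform defect density, using the 16 bugs per 1000 lines that the comment cites (the commenter's figure, not a measured one) and the line counts quoted above; the function name is made up for illustration.

```python
# Figures quoted in the comment above; the 16 bugs/KLOC rate is the
# commenter's claim, not a measured value.
BUGS_PER_KLOC = 16


def expected_bugs(lines_of_code, bugs_per_kloc=BUGS_PER_KLOC):
    """Estimate latent bugs, assuming a uniform defect density."""
    return lines_of_code * bugs_per_kloc / 1000


for name, loc in [
    ("microkernel (low estimate)", 3_500),
    ("microkernel (high estimate)", 6_000),
    ("Linux, per the comment", 1_000_000),
]:
    print(f"{name}: ~{expected_bugs(loc):.0f} expected bugs")
```

At that rate even the small kernel would start out with dozens of expected bugs, so the argument rests on the smaller code base being easier to audit, not on it being bug-free by default.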
So, if the kernel needs an overhaul, then why not do it right this time? Now some may say that microkernels have a performance hit, but today's machines are certainly fast enough to render any performance hit negligible.
GJC
Gregory Casamento
Chief Maintainer for GNUstep
I think at some point you need to draw the line regarding support for older hardware and peripherals. I mean, excessive backwards compatibility has retarded advancement of the industry IMHO.
End of Line.
Agreed. I have been forced to upgrade to 2.6 on a few computers for features that I needed that are only in the 2.6 series, but every time it has been a problem. All of our production machines are still built with 2.4 and we purposely use hardware that is supported by the 2.4 series.
Linux has caused Microsoft to improve their products, and I have found myself removing Linux servers to replace them with Windows 2003 Server of late. On the desktop, it is not even close. I sit next to a guy who runs 2.6 on his Ubuntu machine and I laugh every time he has to reboot. My Windows XP box only goes down rarely for updates and it does it at night when I am not there. Last time, I had over 100 days of uptime (this is a desktop machine). I rarely ever see the BSOD anymore and if I do it is almost always caused by a hardware problem. That is what I *USED* to be able to count on with Linux - if it crashed, there was a hardware issue. Now, with 2.6, I've lost that.
There are coworkers of mine who would have fainted three years ago if they heard me say something like this, but Linux just isn't the lean, reliable operating system it used to be.
--If you don't test it, it won't work. Guaranteed.
So, there are two relevant aspects to it. Probably more.
The 2.6 Kernel has been plagued by bad bugs. On the other hand, one way or another you need it for a multimedia-enabled desktop on more modern hardware (compared to 2.4). From that point of view, the proposal is fantastic. Otherwise we see the quality of the kernel of our beloved OS going down.
2.6 has never seen a phase of consolidation, really. Therefore, the proposal is almost overdue.
It would be badly short-sighted to think of quick ROI (as IT companies usually do), since the troubles only multiply with further advances.
Yes, please, Andrew, get stability back into 2.6. Though I have no say in this, I thrust up both hands in favour!
Maybe there are some thumb-screws needed for the contributors: As long as the bug level stands above a certain threshold, no enhancements will be accepted.
There is also a political aspect to it: we have always argued about re-use of legacy hardware. This becomes even more important with Vista on the horizon. The kernel must not lose the 'caring' attitude. It must be trustworthy and trusted by the general public to care for more than greedy hardware manufacturers and their sick quest to replace functional hardware with most recent hardware.
Man, it's crazy but we have this thing where I work. Uh, what do you call those things again?
... I think they're called 'leaders.'
They are very good at convincing people to do things regardless of what they get out of it.
If Andrew Morton doesn't have leadership skills, I suggest he step down and let another manager step up.
If I were in his position, I'd get everyone who's even mildly important in a room (or, failing that, an e-mail) and:
"Guys, remember back to the reason you first joined in the contribution to develop a free operating system. Now, think of all the hard work you've put into it and other people have put into it. Now, that's all in jeopardy and here's why..."
Spend some time reasoning with them and pointing out the bugs that are really really hurting the kernel. In the end, wrap up with:
"Look, I know this sucks and you're going to have to tangle with a lot of bugs that aren't even your own. But what have we got if we haven't got a stable operating system? We've got another Windows, that's what. You just don't have to pay for our piece of malware. Just see this one development cycle through, I promise we'll make it as quick and painless as possible and after all is said and done, we'll have another meeting like this where anyone can suggest any crazy-ass feature they want to add. Once we pick out what we want, we'll spend the next development cycle letting our imaginations run wild. We'll make a kernel so unstable that the user'll have to re-flash their BIOS when it crashes! Then maybe we'll work on solidifying that. Right now, we just owe it to ourselves and our fans to give them something that's 100% stable and reliable."
If you can't reason with them like that, maybe you just have to accept they can't be persuaded and let them do what they want but prune their work if it detracts from your goal end system.
My work here is dung.
Nice house. Did you build it yourself?
Of course it's more rewarding to create a new feature. First of all, no coder enjoys working on foreign code. It just doesn't "look right", doesn't "feel right", simply because everyone has his own style.
And don't forget bragging rights. Hey, I invented some feature. Sure, some guy debugged it, but I get to slap the label to it. I might even name it after me (Hello Mr. Reiser, if you should read this...). The guy who debugs it gets ... zip.
This has to change first if we want people to put in time to hack through other people's code. Appreciate the work done to get it fixed. After all, appreciation, bragging rights and "making a name" is everything you get from writing free software.
Few people do it out of generosity or because it "feels good". They want to be known. Linus might not have gotten much out of writing that Kernel, but he sure as hell has a killer paying job now. I doubt the people who wrote the original implementation of iptables/ipchains are worse off. But the debuggers? Lot of work, no name.
Pull the debuggers in front of the curtain, and you'll see people debug. If we only appreciate the people who wrote a feature in the first place, even if that feature doesn't work 100%, we won't see people debug.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
The painful truth is that very few developers, in open source or otherwise, like fixing old code or old bugs. This is very true if the bug fix isn't going to be noticed by a great number of people. Face it, most of us like to write new code or improve on something that isn't working the way we want it even if it is working right.
This is what separates professional developers from the rest. We work on it regardless of how much it benefits us. We might gripe a bit but in the end we do what is asked. Sure that backend has flaws and is going to be replaced down the road but it does not excuse us from making it work now.
When you go look at some of the bugs listed in even current applications you start to see the age some have accrued. Some are rightly passed over as one-in-a-million occurrences but too many are skipped because they just don't have any allure. Note, I am not singling out people who work on Open Source, I am pointing out that the article fails to touch an area that exists but most don't want to acknowledge.
* Winners compare their achievements to their goals, losers compare theirs to that of others.
Yes, it's a good idea.
But don't waste time on bugs that only affect legacy hardware.
It would also be a good idea for some effort to be spent on consolidating, correcting, and updating the various lists of "Hardware supported by Linux". There are lots of such lists on the web, for example:
- not to mention the distro-specific compatible hardware lists maintained for Redhat, Mandriva, Suse, and others.
We need one correct, maintained list, not dozens of nearly-correct, usually out-of-date lists. And it seems to me that the list should depend only on the kernel version, not on the distro.
My experience is that stability is dropping, even on modern hardware. You can no longer take the latest '2.6' stable kernel and expect it to keep your server running stably.
Now, you can take a Redhat or SuSE packaged kernel and find those are pretty stable.
But there is a problem; if you report a bug in a Redhat/SuSE kernel on the lk.ml you get a
'that's Redhat/SuSE problem - speak to them'.
As the 2.6.x stable tree becomes less stable, fewer people use it on production servers and instead
use packaged kernels. As fewer people use it, it gets tested less - and fewer bugs are actually reported for it.
It is also not just a case of old hardware; in the last few kernels I've had leaks that make
a simple firewall die repeatedly after a few months, I've got a machine with a slow RAM kernel leak
that makes a simple DHCP server fall over every few months, and I've had a 2.6.1x kernel that couldn't
run an NFS server for 24 hours without falling over.
It ain't nice - but these are my experiences.
Dave
I follow Ubuntu with the latest kernel updates and I tell you, with every update performance increases. When I booted Windows I used to feel the difference, but not anymore. I think the quality of the kernel is fine. There are other people who need to improve in quality, e.g. the rest of the free apps, esp. packagers who have to make the thing just work. What will I do with stability if nothing works? Am I going to just look at the computer while it's all stable doing nothing?
Some of the above posts say "I don't notice any problems". I'm guessing some of the bugs nobody has fixed are somewhat obscure. There is a well known bug when Linux mounts large XFS file systems via NFS that bothered me regularly - large directories could not be searched, deleted, etc. Now I have a Mac working with that flawlessly. These are the types of bugs - annoying, but non-fatal - that few people want to fix.
Hear, hear! SETI@home had a gazillion(tm) people contributing cycles to the effort (often in teams) to see who could place highest on the list of contributors.
How about a BFoD - Best Fix Of the Day? Each day, post the name of the submitter and some details about the item debugged and fixed:
This could be further improved by posting a Bug Of the Day (BoD) where there is a daily bug that is to be fixed. The first fixer gets recognized as well as anyone who provides an especially elegant solution. Award bonus points for fixing related bugs in the area so as to promote more complete fixing in that area.
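As a rough sketch of how such a scoreboard could be tallied: the point values, field names, and contributor names below are all invented for illustration, since nothing in the proposal above fixes them.

```python
from collections import Counter

# Hypothetical scoring; these point values are made up for illustration.
POINTS_FIRST_FIX = 10    # first person to fix the Bug Of the Day
POINTS_ELEGANT = 5       # an especially elegant solution
POINTS_RELATED_FIX = 3   # bonus per related bug fixed in the same area


def score_day(fixes):
    """Tally points for one day's fixes.

    Each fix is a dict with keys 'author', 'first' (bool),
    'elegant' (bool) and 'related' (number of related bugs fixed).
    """
    board = Counter()
    for fix in fixes:
        points = 0
        if fix["first"]:
            points += POINTS_FIRST_FIX
        if fix["elegant"]:
            points += POINTS_ELEGANT
        points += POINTS_RELATED_FIX * fix["related"]
        board[fix["author"]] += points
    return board


fixes = [
    {"author": "alice", "first": True, "elegant": False, "related": 2},
    {"author": "bob", "first": False, "elegant": True, "related": 0},
]
board = score_day(fixes)
for author, points in board.most_common():
    print(author, points)
```

Summing the daily boards would give the kind of running contributor ranking that SETI@home teams competed on.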
Post these prominently for all to see and I'd be willing to bet that there would be a groundswell of support.
This is just off the top of my head - please post any suggestions for enhancements or (gasp) any problems you see in it!
So, there are lots of bugs in Linux! Good thing I'm using Windows.
w00t
As an application developer, it really irks me that I have to release software that I *know* has bugs, choosing instead to complete whatever features were supposed to be in the release. As a consumer of applications, sometimes I wish that instead of adding all the new whiz-bang stuff, someone would devote an entire release to fixing *all* known bugs and improving performance. Maybe this will finally happen w/ the kernel.
Or you could use nooks. Nooks will protect the OS from driver crashes and restart failed drivers transparently.
C doesn't offer enough abstraction to deal with new levels of complexity. It is now far from the best language available for systems programming: bitc and prescheme are especially worth looking at if you haven't heard of them.
If you put resources into making the newest kernel compatible with old peripherals that resource could not be used for bugfixes and new features.
The new kernel probably will not bring anything new to the old hardware, either. So why not just use the stable 2.4 kernel with security patches?
I see a lot of hand waving about how buggy 2.6 is but I do not see any references to bug databases or particular reproducible bugs. How about some data?
So far 2.6 has been just as solid for me as previous kernel versions but I try really hard to avoid using bizarro hardware and drivers that probably do not get much testing, and rightly so.
I think we need to distinguish between bugs in the core kernel (code that everyone runs) and bugs in drivers. The vast majority of the Linux kernel code is drivers.
I've said this, here and elsewhere, over and over and over. Quality is something that has to be in software FROM THE START. It's not something you can retrofit.
As soon as the kernel dev team decided that Linus' kernel didn't need to be stable anymore, as soon as they started waving their hands in the air and expecting 'the distros' to magically fix their problems, OF COURSE quality took a dive. One of the kernel devs said that it was okay for only one out of three 'stable' kernels to actually be stable! Stability takes a long time... they now refuse to support a given kernel for more than a couple of months. The 2.4 kernel still has a few problems, and it's been around for, what, six years now? Supporting a given kernel release for only a couple of months is impossibly stupid from a stability perspective.
They're doing it this way because they're tired of doing the painful, annoying, tedious task of making sure the kernel always works. And the 2.6 kernel has, as a result, been a steaming pile of crap. Features don't matter if the fucking kernel doesn't stay up. No kernel since about 2.6.8 has worked in APIC mode on my ASUS KT333 board. 2.6.15 crashes my Intel 865 chipset servers randomly; they rarely stay up more than an hour or so. 2.6.14 broke traceroute. And with the constant stream of patches to their security fuckups, my system uptimes rarely exceed two weeks. Remember being proud of your kernel uptimes?
The social contract with Linux for many years was essentially: "The official kernel tree is as stable as we know how to make it. You can trust this code." And that is what got Linux as far as it has gotten... the fact that you could TRUST IT. It NEVER fell over. The 2.2 kernel was one of the finest pieces of software I've ever run. 2.4 took a huge dive in terms of stability, and was a total mess until Linus branched off to 2.5 and let the poor harried 2.4 maintainer, Marcelo Tosatti, take it over. He finally whipped it into shape. He has done an outstanding job.
What Linus et al need to do is GO PLAY IN THEIR SANDBOX IN 2.7. Let 2.6 fucking stabilize. They're shoving new features down our throats so fast that it's a part-time job just keeping up with the new stuff... and obviously NOBODY understands the security implications of moving this fast, or we wouldn't have so many goddamn security patches. We're gonna be having those security patches for YEARS because of this bullshit. The number of possible interactions in a system goes up exponentially with the number of features... so adding features should slow down over time, not speed up.
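How fast "possible interactions" grow depends on what you count. A small sketch under two assumptions of my own, not the poster's: counting only pairs of features gives quadratic growth, while counting every combination of two or more features gives the exponential growth claimed above.

```python
from math import comb


def pairwise_interactions(n):
    """Number of distinct feature pairs: n * (n - 1) / 2."""
    return comb(n, 2)


def subset_interactions(n):
    """Number of feature subsets of size two or more: 2^n - n - 1."""
    return 2**n - n - 1


for n in (5, 10, 20):
    print(n, pairwise_interactions(n), subset_interactions(n))
```

Either way the count grows faster than the feature list itself, which is the point being made about slowing feature additions over time.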
Go BACK TO THE OLD SYSTEM. People crying about 'too slow release schedules' is a HELL of a lot better than people crying about Linux being unstable. Linux *owned* the word stability for many years, and it's in very real danger of losing it, right at its height of popularity. The old system worked. It got Linux where it is today.
A simple 'bugfix release' won't do shit... it's the process that's broken. It'll fix some of today's bugs, but what about next week?
If I have old hardware that doesn't run 2.6, I can and do drop back to an older kernel. Hell, 2.0.40 came out in 2004. And note the size! That kernel boots as fast on my 133MHz machine as 2.6 does on my 1GHz frankenstein. New features on a new kernel mean nothing on hardware that can't use them. If you want to keep running a new kernel on old hardware, obviously you're going to suffer plenty of bloat, as evidenced in the Windows world. And speaking of that, if MS had kept their old versions on the market, they could have slimmed down the new versions considerably. Of course, most of us know that older versions are MS's biggest competition, so that's why the lockdown, all made possible by our gracious IP overlords. So be it. I don't need them anymore. Even Apple put up their old, old versions for free. But they don't run on new hardware. And their new software doesn't run on old hardware. And furthermore, wouldn't it be easier to troubleshoot and fix bugs in the older, smaller kernels? My general rule is to use a kernel that is approximately 6 months to a year newer than the hardware it's running on. We shouldn't try to make a single kernel to run on all hardware. We have lots of them, one for each specific time period. This also applies to the distros. The older ones are still available for your old hardware.
FTA: Nowadays, many kernel developers are employed by IT companies, such as hardware manufacturers, which can cause problems as they can mainly be motivated by self-interest.
Am I supposed to be surprised by this? Even the most altruistic of us are generally motivated by self interest. We all want some kind of return for our efforts....even if it's a simple "Thank you".
What?