2.4, The Kernel of Pain

← Back to Stories (view on slashdot.org)

Posted by jamie on Wednesday January 16, 2002 @07:30PM from the my-destiny-to-be dept.

Joshua Drake has written an article for LinuxWorld.com called The Kernel of Pain. He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?

10 of 730 comments (clear)

Min score:

Reason:

Sort:

2.4 is hit and miss. by aussersterne · 2002-01-16 19:48 · Score: 5, Interesting

We're running the Red Hat 2.4.9-13 kernel on several SMP database servers and they have been perfect (not rebooted since 'rpm -U' of the new kernel) for several weeks. Before that, we were running 2.4.7-something from Red Hat and they were the same -- ran straight from the day we installed the kernel to the day we updated without needing to be restarted.

On my desktop machine, I've taken more risks (installed pretty much every official 2.4.x-linus release as they have come out) and some have been good, while others have been total dogs.

I'm running 2.4.17 right now. It seems okay; I've only had a freeze-up once over the last couple of weeks, though it was a total hard freeze (i.e. no ping, no magic SysRq, no nothing), which I haven't had in Linux for several years.

The obvious issue is VM; if you keep lots of memory (768M, or preferably 1.0G+) in your system, things to much more smoothly, though MP3 playback still skips a little.

Right now, I'd prefer some work on the RAID and IDE performance issues. One or two of the 2.4 series have had disk performance 100%+ better than the current 2.4 kernels. Why? I'd like to get the disk I/O back to reasonable levels.

--
STOP . AMERICA . NOW
Re:Au contraire by ElOttoGrande · 2002-01-16 19:49 · Score: 5, Interesting

it just doesn't not show me that it is rendering a window... and that's something that Gnome was doing even on 450 mhz machine.

The preemptive patches have made my system a lot more responsive under use. Most notably the mouse cursor doesn't slow down during heavy compiles and audio latency is good enough to play with some of the more interesting sound software projects out for linux.
But it really sounds like your problem isn't with linux but with XFree86. X has its share of problems but if you have a good video card that's supported well under it, you should get more than acceptible 2d drawing performance. I use a 3dfx voodoo3 here and its about as good as win2k running KDE (sometimes you can see it rendering when resizing or moving windows quickly but i like to think of it as a cool effect ;) and its way faster with lighter WM's like blackbox.
Worked for me. by roystgnr · 2002-01-16 20:03 · Score: 5, Interesting

There was a bad period where the Soundblaster Live driver (particularly mixer settings) was broken. That lasted through at least three kernel releases. There was a worse period where the VM had fits, and where performance degraded way too rapidly if the system had to swap. That lasted at least six kernel releases. There were at least one or two releases where I discovered that Alan Cox's (usually more bleeding edge) tree was being better behaved.

Of course, whenever I'm playing around with this stuff I don't delete my "last known good" kernel, so if after a couple hours or a couple days I noticed a problem, I just booted back to what worked. The default (albeit heavily patched) Red Hat kernels were good, so "last known good" always existed for me.

To summarize: this hasn't been a source of inconvenience for me, but it has been one of vicarious embarrassment. I've only been using Linux since 2.0.somehighnumber, but this is the worst mess I've seen the "stable" kernel tree go through in that time. Don't get me wrong, I've experienced system-crashing bugs (a tulip driver that freaked at some tulip chipset clones, some really bad OOM behavior a couple years ago) before, and pragmatically I guess that's worse... but those problems were always fixed fast enough that the patches predated my bug reports. Watching even the top kernel developers seem to flounder for months over bugs in a core part of the OS like the virtual memory system just sucked.
Unfortunately I have to agree by JeffL · 2002-01-16 20:11 · Score: 5, Interesting

I'll start by saying that I find 2.4 to be very stable, and to perform mostly ok on 1 and 2 way machines. My laptop, desktop, and 2-way server stay up until I decide to reboot them. Actually, I brought my 2-way server down for a disk upgrade today for the first time since early November when I installed 2.4.14.
Having said that, there are some serious issues with 2.4 on some 8-way 8GB machines that I manage. They have been running 2.4.13-ac7 since November, because that is the last kernel that is usable for me (-ac11 would probably be ok). Newer kernels have terrible behavior under the intense IO load these machines go through. They get 14-30 days of uptime, and then hang or get resource starved or something and have to be rebooted.
I think part of the issue is that there simply aren't that many people running 8-way boxes, so bugs aren't found as easy, this is of course on top of having 8-way SMP being much more complex than a defacto single user, single processor desktop machine. To make it even worse, the machines are pushed hard. They move around GBs of data every day, and often will run for extended periods with loads over 25.
Of course, it is still mostly ok. While the machines are working they mostly work fine. Of course 20 days of uptime is totally unacceptable. I have an alpha running Tru64 pushing 300 days of uptime, and the last time it was down was due to a drive failure, not an OS problem.
My only remaining issue with Linux on "small" machines is an oscillation problem in IO. Data will fill up all available memory before being written to disk, and then everything from memory will be written out, and then memory fills up again before anything new is written to disk. This is a bit inefficient, and the machine's responsiveness at the memory-full part of the cycle is poor.
What are my options though? I guess I could try FreeBSD, but a bit of lurking on their lists and forums reveals plenty of problems there, too. Do I switch and hope things get better, or wait out 2.4 and hope it comes around soon? Aside from a few nasty bugs in some releases, pretty much each successive 2.4 kernel has been better than the previous one, at least on small systems.
Several years ago I was having a hard lockup problem with Tru64 (Digital Unix, at the time) and that was very scary. It took time to get the problem escalated to the OS engineers, instead of just sending an e-mail to lkm. Even then I could only hope that the issue was being addressed, but I had no way to know if anybody was doing anything about it or not. (Turned out to be an bug in the NFS server that would cause the machine to lockup when serving to AIX.) For all of its problems though, it is extremely reassuring for me to be able to monitor the development process of Linux through the linux-kernel-mailing list, and other specialized lists. If I feel that people aren't aware of some problem I am experiencing, I can raise the issue. I am not in the dark about what is happening, and what fixes are being made. I know what changes have gone into each kernel update, so I know if there is a chance of it fixing my problems.
Let's call it a curiosity by Oestergaard · 2002-01-16 20:28 · Score: 4, Interesting

jakob@unthought ~> uptime
9:21am up 181 days, 13:25, 3 users, load average: 3.57, 3.33, 2.79

jakob@unthought ~> uname -a
Linux unthought.net 2.4.0-test4 #1 SMP Fri Jul 14 01:56:30 CEST 2000 i686 unknown

I suppose that ain't too bad. Other than that, with real 2.4 kernels, on UP and SMP systems, I've been fairly satisfied.

There was a RAID bug (RAID-1) in 2.4.9 or there about, which they forgot in the article. I think, except for the fs/raid corruption problems (which are horrible when they happen), that the 2.4 kernel has been a nice experience.

Think back for a moment: How would you like *not* to have iptables, reiser, proper software RAID, etc. etc. etc.

I think I would miss 2.4 if I went back, although the fs/raid corruption bugs made me "almost" do that.
We've been using it... by pellaeon · 2002-01-17 01:27 · Score: 4, Interesting

almost from the start (2.4.3 if I remember correctly). We patched XFS into the kernels and some other stuff we needed and the transition to 2.4 was relatively painless using a fresh install (for XFS).

We ran into some trouble with a number of Athlon systems but that was due to the 'Athlon bug' and was soon fixed. More worrisome was the performance of pre-2.4.9 kernels on the desktop: sometimes they slowed down to a crawl (and i'm talking about lightly loaded ~750MHz machines here).

We got over that with the -ac kernels however, and it's been a breeze ever since. We currently use 2.4.14 with XFS patched in (although we're ditching it in favor of ext3 now that it's been integrated and the RH installer supports it) and we're looking at 2.4.17 now.

Why use 2.4 on servers (as some have asked)? Well, iptables is a good reason, for one. Other security-related things count heavily too. And XFS seemed a good reason to do it at the time too. It can deliver very good performance.

Some stats:
zuse [1] > uname -a
Linux zuse 2.4.14-xfs_MI10 #1 Tue Nov 6 17:34:04 MET 2001 i686 unknown
zuse [2] > uptime
2:25pm up 61 days, 21:21, 1 user, load average: 1.07, 1.02, 0.93

--
-- /bin/coffee missing. universe halted.
Re:Why Linux? by Ded+Bob · 2002-01-17 02:09 · Score: 5, Interesting

I know BSD can run many Linux binaries, but what about kernel modules?

At least for nVidia, it is being worked on: FreeBSD NVIDIA Driver Initiative
STABLE vs STABLE by ajs · 2002-01-17 02:27 · Score: 5, Interesting

I'm beginning to feel like a broken record, and maybe Linus should just change the terminology so that people stop making the same assumption over and over, but people: wake up and smell the bogomips!

"Stable", in the context of a kernel release refers to the interfaces. When Linus releases 2.&ltleven>.0, he is saying that this kernel is one that has reached some arbitrary plateu of development stability, and it's now ready for others to begin actuall release engineering on.

You have to understand that the Linux Kernel is released by Linus in a state that is very reasonable for a development team, but that will never be "production quality". Debian puts a lot of realease engineering work into a Kernel. As does Red Hat. As does SuSe, etc, etc.

If you just grab 2.4.x and install it, you're acting as Linux Q/A, and I applaud your effort, but when it breaks in your environment, you should not be stunned.

Once again, production release != stable release. A stable release is just one the developers are happy with (and I've yet to see a 2.4 kernel that I can say developers should not be happy with).

So, maybe next time, 2.6.0 should be called the "post-development" release so that people don't go off half-cocked installing it on production systems.
FreeBSD VM, sync(3) OK for 10 YEARS. by aphor · 2002-01-17 02:53 · Score: 4, Interesting

If the code is unreadable for most advanced users (people with enough C background to follow execution flow in readable code), then you really can't get the benefit of having a large community debugging the software.

What you have here is a LARGE body of (l)users, and a small cadre of kernel hackers who are separated and out of communication. The three times I've found a problem in FreeBSD-STABLE (to be honest, I think one was the 2.2.6, and two were 3.x branches), I sent my bug report in with instructions on how to repeat the problem, and a patch for the ugly hack I made to make the problem go away. I never beat the committers to the bug. Now, before I do all the work, I watch the WebCVS for commits for a couple of days and email the committer who touched the effective files last with a quick question "hey, I see this problem.. have you?"

I tried to read some Linux (the kernel) code a long while ago. There were some funny comments, error messages, and variable identifiers, but otherwise it gave me a headache. I just felt being a Linux participant was beyond my tolerance. I browsed the FreeBSD source code though, and even in the midst of reorganization, both organising schemes were apparent, well documented, and there is a clean style to all FreeBSD code I've seen that makes for (relatively) easy reading.

I guess that the difference is culture, but in this case it seems to be a serious problem. I have to wonder why the Linux kernel people haven't broken the Linux VM code out for modularity and borrowed the FreeBSD VM as an option? I mean: FreeBSD is free-as-in-beer and also free-as-in-software. I'm not volunteering, but after all these years wouldn't it make sense to hack out some Linux wrappers for the FreeBSD VM system? I think I remember reading a Matthew Dillon interview where he talks about all the good stuff he's done to FreeBSD VM recently...

--
--- Nothing clever here: move along now...
He Almost Had Me by bwt · 2002-01-17 03:32 · Score: 5, Interesting

I almost took this guy seriously until this part:

The kernel seemed to show more stability. Then we hit kernel 2.4.15.

Linux version 2.4.15 contained a bug that was arguably worse than the VM bug. Essentially, if you unmounted a file system via reboot -- or any another common method -- you would get filesystem corruption. A fix, called kernel 2.4.16, was released 24 hours later.

Look, anybody who is deploying a kernel on the day it is released on a production server deserves what they get. One day turnaround on a bug fix is phenomenal. Even if these are marked as "stable" kernels, trying to track the new versions in real time is a dumb thing to do.

This guy has written a moan and groan article based on a small set of bugs, some of which he only could have experienced if he is experimenting on his production system. He obviously requires extreme stability and says he needs this over the new 2.4 features (SMP, 2G memory, 2G files), which makes me ask: why was he putting new kernels on his production system before emperical evidence was there about high stability?

Open source will fix bugs faster the proprietary. It doesn't change reality to make bugs impossible. This is true even in "stable" releases, especially if you are talking about highly stressful production environments.