2.4, The Kernel of Pain
Joshua Drake has written an article for LinuxWorld.com called
The Kernel of Pain.
He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?
And this guy appears to be talking about only x86 machines. My lab has had a horrible time with 2.4 on Alphas. In fact, we've moved back to 2.2.18 on some macines. (2.2.20 for Alpha didn't compile properly, and I didn't want to mess with it -- anyone know if which 2.2 kernel is best for number-crunching Alphas right now?). Oh, the pain. The lost time. "Kernel of Pain" is a fine description of our 2.4 experience on Alphas.
-Paul Komarek
What have your experiences been?
;-)
Well:
8:33pm up 45 days, 5:49,
Shameful I know, but I had to move city before that I had 6 months. Should had a UPS
This is pretty much a desktop/development box running postgres, JBoss, tomcat, apache, JBuilder and (occasionally) kylix. No problems so far, touch wood.
I also used to work at the comp-sci department of a university were we had 40 boxes in the linux lab, no real problems except they were running ext2 so only the occasional manual fsck. Now the maclab, that is another story (OS9 not OSX).
He who defends everything, defends nothing. -- Fredrick The Great
Oh yeah, and the machine would crash randomly and lose data. We were using ext3, so the file system was (supposedly) still consistant, but whatever was being worked on would be lost.
Ultimately, we upgraded the kernel to 2.4.17, and the problems have been fixed. But the "even number == stable reliable" rule failed us that time.
Since then, I've read that "the entire VM system in 2.4 was replaced around 2.4.10". This really scares me. I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else will have to pick up the slack (maybe RedHat) and manage a stable kernel.
Cryptnotic
My other first post is car post.
Try the low-latency patches to 2.4 tree. They have much better impact than those call "preemptive".
/usr/bin/X11/X
Also
nice -n -10
helps quite a lot on an average desktop linux
What are you smoking?!? High end box DOES NOT mean your 1.2 GHz Athlon!! We're talking about machines with >8 processors here. Machines which need to use the PPro PAE so that over 4gig of memory can be addressed.
There are serious VM stability issues with these systems. Ever wonder why Redhat hasn't released a >2.4.9 kernel? It's because 2.4.10 is where the new VM system went in. Redhat is busily porting Rick van Riel's 2.4.9 VM up to the later kernels so that they can use it.
Um... 1.3.x is, indeed, the stable version. From the website:
2.0.x is the unstable tree at the moment.
Comment removed based on user account deletion
I had been running 2.4 on our router for many months. That's not to say those months were consecutive running. :) I had so many problems. I reached the point where I had to reboot the box at least once a week (usually twice) or else it would suddenly become unresponsive. If I had an uptime over 10 days I was doing REALLY well. I tried about 10 different 2.4 kernels (up to 2.4.13), as well as RedHat's 2.4.7 kernel. (I was forced to use 2.4 because of features I required.) At any rate, after about 6-8 months of this, I was resigned to putting either freeBSD on the router or recommending we buy a hardware solution next fiscal year (i.e. cisco router).
Well, I put 2.4.14 on the box and I haven't rebooted since. I have 61 days of uptime and that's the most I've seen on that box ever. It is finally stable. The only thing I can conclude is that it's AA's VM that is doing the trick. And in hindsight, it makes sense. The behaviour of the box was that it was thrashing, but at the time it didn't seem that way because I hadn't noticed the HDD light was disconnected from the box and I couldn't hear the disk in the noisy server room.
So, Linux 2.4 is (knock on wood) stable for my servers, now.
Jason.
There were some lingering problems with NFS (even v2 using UDP) in the 2.2.x kernel series until 2.2.19.
I recommend that you upgrade the machine that's running 2.2.17, or else apply the NFS patches. If you're using NFS v3 or TCP, you definitely want to upgrade to the latest version, and get the latest NFS utils.
As the author of a window manager and big hunks of GTK, I don't think your analysis is quite right.
The primary problem is synchronization, not delay. GTK 1.2 is very fast, its geometry code is not causing any slowness. You are confusing slow with flicker. Flicker looks slow but slow is not the problem; no matter how fast code is, if it flickers, you will see it, and it will look slow.
Similarly when opaque resizing a window; it has nothing to do with quantization or speed, the problem is that the window manager frame and the client are not resized/drawn at the same time resulting in a "tearing" effect. This would be visible no matter how fast you make things.
As you say, putting the toolkit in the server or putting the WM in the toolkit are overradical ways to fix this. It's not even necessary to backing store all X windows. It could be done with an extension allowing us to push a backing store on a single X window during a resize, for example. However fixing it 100% pretty clearly requires server changes, and that's why you haven't seen a fix yet.
... you are absolutely correct in observing that the 2.4 debacle has used up a great deal of Linux's reputation for being stable. I use 2.4.x with SGI's xfs patches both in production systems at work, and at home (like others, we need various features of 2.4.x not available in 2.2.x), and while it has never been anything close to as flakey as the most stable of Microsoft systems, it has in comparison to 2.2.x (and FreeBSD for that matter) been pretty damn unreliable. In comparison to just about everything else it is still quite stable, so happiness is indeed to some degree relative.
And now for some arm chair quarterbacking, all that having been said, I really think Linus needs to excersize some self discipline and stay away from maintaining even-numbered kernel releases (x.0.x, x.2.x, x.4.x, etc.). By his own admission he isn't good at being a stable kernel maintainer and prefers the more interesting work done in development kernels, and his track record in 2.2 wasn't fantastic (particularly in comparison to 2.0, where he did a fantastic job) and was pretty abysmal in 2.4. As someone who's been using GNU/Linux since the early pre 1.0 days I hope he'll put his efforts where his talents are (managing changes in odd numbered development releases) and leave stable maintenance to Cox and Marcelo (who are very good at maintaining and improving stable releases). But enough commentary from the peanut gallery...
The Future of Human Evolution: Autonomy
Same here.
At about the time the 2.4 kernel was first released, we were bulding a server for serving out large media files for encoding. We were on a limited budget, so we put together a PC with about 256 MB RAM running on a K6-2/500. Set it up with a combination of RAID 1 and RAID 5 with 2x40GB and 2x80 GB IDE drives. While running with the stock RH 6.2 kernel we had no problems. But we needed the 2.4 kernel for large files, so we waited until we couldn't wait any longer.
This turned out to be problematic to say the least. While we had 7 servers running RH 6.2 and never had a crash, the machine serving up the media files would lock up whenever copying large files, or whenever many files were being copied. Kept me working through a few weekends trying the latest kernel and then stress testing the server with large file copies. We wound up reverting back to a 2.2 kernel because the crashes were too frequent.
I haven't tried the RH kernels for 2.4 on anything other than desktop systems. I can say that, on RH 7.1 at least, the 2.4 kernel in use is rock solid and has never crashed for me at home or on desktop systems at work. I never got the chance to try the kernels on RH 7.1, but I suspect Redhat kernels would probably be more stable. They've got the resources to stress test and modify kernels for specific needs.
I liked the article. He's not a kernel hacker and writes from his experience of the 2.4 kernel with clients. Only problem I see is WTH was he thinking using Mandrake 8.0 for a server? That version of Mandrake, more than any other I've used, I've found to be very unstable on 2.4.
I saved this one for last:
You see, that's what we call not in the linus kernel. Your impressions of importance and maturity of the patch are really something you should take up with Linus himself. I, for one, wish Ingo's TUX subsystem makes it into the linus tree sometime soon. But you have no basis to say that just b/c a kernel patch is out, and linus hasn't integrated it into his stable tree, the linux process is flawed. Get a clue! Independent patches come out much faster than anyone can pull them into the core; they are usually conflictive and compete with other patches to solve the same problem. So it takes a while. If you want it in the linus tree sooner, help out. Welcome to open source.