Slashdot Mirror


2.4, The Kernel of Pain

Joshua Drake has written an article for LinuxWorld.com called The Kernel of Pain. He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?

15 of 730 comments

  1. Alphas by Paul Komarek · · Score: 5, Informative

    And this guy appears to be talking about only x86 machines. My lab has had a horrible time with 2.4 on Alphas. In fact, we've moved back to 2.2.18 on some machines. (2.2.20 for Alpha didn't compile properly, and I didn't want to mess with it -- anyone know which 2.2 kernel is best for number-crunching Alphas right now?) Oh, the pain. The lost time. "Kernel of Pain" is a fine description of our 2.4 experience on Alphas.

    -Paul Komarek

    1. Re:Alphas by Dr. Tom · · Score: 4, Informative

      I'm running 2.4.17 and it works great on a DP264. I followed the whole 2.4 series and there were some rough spots in the first dozen or so but it's good to go now.

  2. My experience by nzhavok · · Score: 5, Informative

    What have your experiences been?

    Well:
    8:33pm up 45 days, 5:49,

    Shameful, I know, but I had to move city; before that I had 6 months. Should have had a UPS ;-)

    This is pretty much a desktop/development box running postgres, JBoss, tomcat, apache, JBuilder and (occasionally) kylix. No problems so far, touch wood.

    I also used to work at the comp-sci department of a university where we had 40 boxes in the linux lab. No real problems, except that they were running ext2, so there was the occasional manual fsck. Now the maclab, that is another story (OS9 not OSX).

    --

    He who defends everything, defends nothing. -- Frederick the Great

  3. Similar problem here... by Cryptnotic · · Score: 3, Informative

    The article is a little short on the details, but we had a similar problem here at work with a new Redhat 7.2 server (kernel 2.4.9) we were setting up. The machine was to be a CVS/file server, running a cvs pserver and Samba. It had 1GB of main memory, and a 180 GB RAID5 array (external via a Mylex RAID card w/ LVD SCSI U160). The machine would seem to run fine, but then in testing, it would block on processes for seemingly no reason. Something in the [kswapd] kernel process was blocking things. If you logged in at a terminal or over a network, you'd get extreme "stuttering" in responsiveness. Basically, it was unresponsive under load with several running processes, and the load wasn't even excessive.

    Oh yeah, and the machine would crash randomly and lose data. We were using ext3, so the file system was (supposedly) still consistent, but whatever was being worked on would be lost.

    Ultimately, we upgraded the kernel to 2.4.17, and the problems have been fixed. But the "even number == stable reliable" rule failed us that time.

    Since then, I've read that "the entire VM system in 2.4 was replaced around 2.4.10". This really scares me. I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else will have to pick up the slack (maybe RedHat) and manage a stable kernel.

    Cryptnotic

    --
    My other first post is car post.

    1. Re:Similar problem here... by Rentar · · Score: 3, Informative

      I hope that Linus and Alan Cox have learned to manage things better now. If not, someone else will have to pick up the slack (maybe RedHat) and manage a stable kernel.

      Neither Linus nor Alan Cox maintains 2.4 at the moment. Marcelo Tosatti does, and from what I read on LKML some ppl thought that to be a bad move at the beginning, but I think it is working out just great (the first release he made was 2.4.17 IIRC).

  4. Re:Au contraire by Ace Rimmer · · Score: 4, Informative

    Try the low-latency patches to the 2.4 tree. They have a much better impact than the ones called "preemptive".

    Also
    nice -n -10 /usr/bin/X11/X
    helps quite a lot on an average desktop linux

    --

    :wq

  5. Re:Au contraire by hansendc · · Score: 5, Informative

    What are you smoking?!? High end box DOES NOT mean your 1.2 GHz Athlon!! We're talking about machines with >8 processors here. Machines which need to use the PPro PAE so that over 4gig of memory can be addressed.
    There are serious VM stability issues with these systems. Ever wonder why Redhat hasn't released a >2.4.9 kernel? It's because 2.4.10 is where the new VM system went in. Redhat is busily porting Rik van Riel's 2.4.9 VM up to the later kernels so that they can use it.
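
To put a number on the "over 4gig" point above, here is a minimal sketch of the address-space arithmetic. It assumes plain 32-bit physical addressing without PAE and 36-bit physical addresses with PAE; the function name `addressable_gib` is made up for illustration.

```python
# Address-space arithmetic for 32-bit x86, with and without PAE.
# Assumption: PAE widens physical addresses from 32 to 36 bits.
GIB = 1024 ** 3


def addressable_gib(address_bits: int) -> int:
    """Return how many GiB a given number of address bits can reach."""
    return (1 << address_bits) // GIB


without_pae = addressable_gib(32)  # classic 32-bit physical addressing
with_pae = addressable_gib(36)     # PAE: 36-bit physical addresses

print(f"32-bit: {without_pae} GiB, PAE (36-bit): {with_pae} GiB")
```

A single process still sees a 32-bit virtual address space; the extra bits only let the kernel manage more physical memory in total.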

  6. Re:Observations & Experiences by schwap · · Score: 4, Informative

    How many people are running a version of Apache in the 1.3.x tree? Well if you are that's a development tree and not necessarily stable. Yes there are stable versions, but you must test!

    Um... 1.3.x is, indeed, the stable version. From the website:

    The Apache Group is pleased to announce the release of the 1.3.22 version of the Apache HTTP server. Apache 1.3.22 is the best version of Apache currently available.

    2.0.x is the unstable tree at the moment.

  7. Comment removed by account_deleted · · Score: 3, Informative

    Comment removed based on user account deletion

  8. Linux 2.4 on our router by Tack · · Score: 4, Informative

    I had been running 2.4 on our router for many months. That's not to say those months were consecutive running. :) I had so many problems. I reached the point where I had to reboot the box at least once a week (usually twice) or else it would suddenly become unresponsive. If I had an uptime over 10 days I was doing REALLY well. I tried about 10 different 2.4 kernels (up to 2.4.13), as well as RedHat's 2.4.7 kernel. (I was forced to use 2.4 because of features I required.) At any rate, after about 6-8 months of this, I was resigned to either putting FreeBSD on the router or recommending we buy a hardware solution next fiscal year (i.e. a Cisco router).

    Well, I put 2.4.14 on the box and I haven't rebooted since. I have 61 days of uptime and that's the most I've seen on that box ever. It is finally stable. The only thing I can conclude is that it's AA's VM that is doing the trick. And in hindsight, it makes sense. The behaviour of the box was that it was thrashing, but at the time it didn't seem that way because I hadn't noticed the HDD light was disconnected from the box and I couldn't hear the disk in the noisy server room.

    So, Linux 2.4 is (knock on wood) stable for my servers, now.

    Jason.

  9. NFS and 2.2 by ansible · · Score: 3, Informative

    There were some lingering problems with NFS (even v2 using UDP) in the 2.2.x kernel series until 2.2.19.

    I recommend that you upgrade the machine that's running 2.2.17, or else apply the NFS patches. If you're using NFS v3 or TCP, you definitely want to upgrade to the latest version, and get the latest NFS utils.

  10. Re:Au contraire by Havoc Pennington · · Score: 5, Informative

    As the author of a window manager and big hunks of GTK, I don't think your analysis is quite right.

    The primary problem is synchronization, not delay. GTK 1.2 is very fast, its geometry code is not causing any slowness. You are confusing slow with flicker. Flicker looks slow but slow is not the problem; no matter how fast code is, if it flickers, you will see it, and it will look slow.

    Similarly when opaque resizing a window; it has nothing to do with quantization or speed, the problem is that the window manager frame and the client are not resized/drawn at the same time resulting in a "tearing" effect. This would be visible no matter how fast you make things.

    As you say, putting the toolkit in the server or putting the WM in the toolkit are overradical ways to fix this. It's not even necessary to backing store all X windows. It could be done with an extension allowing us to push a backing store on a single X window during a resize, for example. However fixing it 100% pretty clearly requires server changes, and that's why you haven't seen a fix yet.

  11. While Linux remains superior to Windows by FreeUser · · Score: 5, Informative

    ... you are absolutely correct in observing that the 2.4 debacle has used up a great deal of Linux's reputation for being stable. I use 2.4.x with SGI's xfs patches both in production systems at work, and at home (like others, we need various features of 2.4.x not available in 2.2.x), and while it has never been anything close to as flakey as the most stable of Microsoft systems, it has in comparison to 2.2.x (and FreeBSD for that matter) been pretty damn unreliable. In comparison to just about everything else it is still quite stable, so happiness is indeed to some degree relative.

    And now for some armchair quarterbacking. All that having been said, I really think Linus needs to exercise some self-discipline and stay away from maintaining even-numbered kernel releases (x.0.x, x.2.x, x.4.x, etc.). By his own admission he isn't good at being a stable kernel maintainer and prefers the more interesting work done in development kernels, and his track record in 2.2 wasn't fantastic (particularly in comparison to 2.0, where he did a fantastic job) and was pretty abysmal in 2.4. As someone who's been using GNU/Linux since the early pre-1.0 days, I hope he'll put his efforts where his talents are (managing changes in odd-numbered development releases) and leave stable maintenance to Cox and Marcelo (who are very good at maintaining and improving stable releases). But enough commentary from the peanut gallery...

    --
    The Future of Human Evolution: Autonomy

  12. Re:Unfortunately I have to agree by _johnnyc · · Score: 4, Informative

    Same here.

    At about the time the 2.4 kernel was first released, we were building a server for serving out large media files for encoding. We were on a limited budget, so we put together a PC with about 256 MB RAM running on a K6-2/500. Set it up with a combination of RAID 1 and RAID 5 with 2x40GB and 2x80 GB IDE drives. While running with the stock RH 6.2 kernel we had no problems. But we needed the 2.4 kernel for large files, so we waited until we couldn't wait any longer.

    This turned out to be problematic to say the least. While we had 7 servers running RH 6.2 and never had a crash, the machine serving up the media files would lock up whenever copying large files, or whenever many files were being copied. Kept me working through a few weekends trying the latest kernel and then stress testing the server with large file copies. We wound up reverting back to a 2.2 kernel because the crashes were too frequent.

    I haven't tried the RH kernels for 2.4 on anything other than desktop systems. I can say that, on RH 7.1 at least, the 2.4 kernel in use is rock solid and has never crashed for me at home or on desktop systems at work. I never got the chance to try the RH 7.1 kernels on that server, but I suspect Redhat kernels would probably be more stable. They've got the resources to stress test and modify kernels for specific needs.

    I liked the article. He's not a kernel hacker and writes from his experience of the 2.4 kernel with clients. Only problem I see is WTH was he thinking using Mandrake 8.0 for a server? That version of Mandrake, more than any other I've used, I've found to be very unstable on 2.4.

  13. the above message may be a troll by denshi · · Score: 3, Informative

    I think your message is highly misinformed and borders on trolling. Maybe you're just new.

    Many hardware setups require recompiling the kernel and experimenting endlessly.
    This is true. On machines with really exotic hardware, I have had to recompile a great many kernel configurations. Usually, however, I can just rmmod & insmod to test the new configurations without rebooting, so the experimenting phase is not overlong.

    Every time you recompile the kernel, you need to recompile some kernel modules.
    You are in no way forced to compile anything as a module -- the kernel will live quite happily as a solitary ELF executable. So don't tell me 'every time'.

    Dependencies and recompilation aren't working correctly--some things don't recompile when they should, and lots of things recompile over and over and over again.
    That's possible anywhere, and I have seen little evidence for your recompilation loop. It has been some time since I last saw an incorrect dependency in the kernel build. And on an average uniprocessor machine, my full builds complete in under two minutes. So I'm not crying for time.

    The kernel itself is a 30Mbyte download.
    Cry me a river. Get DSL. Or learn to use the patch command -- that's why all those patch files are on the kernel mirrors. I've been pulling kernel sources off a 33k modem link for the last 6 months, and I'm not hurting for the speed.

    And the list of problems goes on and on.
    All of which are apparently handwaving. Let's watch.

    The kernel hackers keep telling us that C and make are just great tools for building kernels.
    I agree with you that make sucks. Unfortunately, it still sucks less than almost everything on the field. Please suggest an alternative. I also agree that C sucks. OTOH, C++ sucks even harder, and for its extra demands of space and time and its ability to obfuscate, C++ doesn't deliver any of the benefits that a real language (like LISP) does. C++ has been out for 20 years, and it still hasn't superseded C in close-to-the-metal progging. Figure it out.

    This is not a system I can recommend to non-technical users--commercial distributions can't cover all the possible kernel configurations (even with fully modularized kernels), and recompilation is out of the question for many users.
    I have to agree with you on that, but recent kernels are pretty complete -- most users won't need to recompile.

    It must be possible to write drivers and other kernel modules that can be compiled separately from the kernel and work across many versions. Binary modules really should keep working across minor version number changes (2.2 to 2.4, for example).
    You can do that. Say yes to 'attach version information to modules' in the kernel config.

    It must be possible to write kernel modules with more safety in mind. There should also be some way to apply some memory protection to kernel modules when desired.
    I agree with you, but that's pretty far off. The MIT exokernel is, I think, the shining example of what you are looking for. In the meantime, most people get the same effect by running your theoretical modules outside of the kernel, in daemons or shared libs or something. The user/kernel protections are usually enough.

    The build system needs to get fixed. There is no reason why adding or removing a module should result in a recompilation of the whole kernel. Maybe it's time to get rid of "make" altogether for the kernel.
    There *is* no reason to recompile the whole kernel to add a module. What are you smoking? "make modules", "cd to blah", "cp blah.o /lib/modules/x.y.z/", "depmod". Or just "make modules; make modules_install". As for 'getting rid of make', what would you use to replace it?

    I saved this one for last:

    Important and mature packages like MOSIX require patching the kernel and aren't integrated into the kernel.
    You see, that's what we call not in the Linus kernel. Your impressions of the importance and maturity of the patch are really something you should take up with Linus himself. I, for one, hope Ingo's TUX subsystem makes it into the Linus tree sometime soon. But you have no basis to say that just b/c a kernel patch is out, and Linus hasn't integrated it into his stable tree, the Linux process is flawed. Get a clue! Independent patches come out much faster than anyone can pull them into the core; they usually conflict and compete with other patches to solve the same problem. So it takes a while. If you want it in the Linus tree sooner, help out. Welcome to open source.