Slashdot Mirror


2.4, The Kernel of Pain

Joshua Drake has written an article for LinuxWorld.com called The Kernel of Pain. He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use. Slashdot has had its own issues with 2.4, so I know where he's coming from. What have your experiences been? Is it still too soon for 2.4?

20 of 730 comments

  1. Alphas by Paul+Komarek · · Score: 5, Informative

    And this guy appears to be talking about only x86 machines. My lab has had a horrible time with 2.4 on Alphas. In fact, we've moved back to 2.2.18 on some machines. (2.2.20 for Alpha didn't compile properly, and I didn't want to mess with it -- anyone know which 2.2 kernel is best for number-crunching Alphas right now?). Oh, the pain. The lost time. "Kernel of Pain" is a fine description of our 2.4 experience on Alphas.

    -Paul Komarek

  2. My experience by nzhavok · · Score: 5, Informative

    What have your experiences been?

    Well:
    8:33pm up 45 days, 5:49,

    Shameful, I know, but I had to move city; before that I had 6 months. Should have had a UPS ;-)

    This is pretty much a desktop/development box running postgres, JBoss, tomcat, apache, JBuilder and (occasionally) kylix. No problems so far, touch wood.

    I also used to work at the comp-sci department of a university where we had 40 boxes in the linux lab, no real problems except they were running ext2 so only the occasional manual fsck. Now the maclab, that is another story (OS9 not OSX).

    --

    He who defends everything, defends nothing. -- Frederick the Great
  3. Mandrake8.1 ships with both 2.4 and 2.2 by renoX · · Score: 5, Insightful

    This guy is complaining that he had troubles on a production server with Mandrake8.1 and its kernel 2.4.

    But Mandrake 8.1 ships with both kernel 2.4 and 2.2.
    The idea behind it is: if you need all the fancy stuff use 2.4 but if you want stability use 2.2.

    So using 2.4 on a server and then complaining that it isn't stable enough is silly IMHO.

    That said, I agree that 2.4 has been slow to stabilize (VM mess apparently caused by communication problems between Linus and Rik van Riel).

  4. 2.4 is hit and miss. by aussersterne · · Score: 5, Interesting

    We're running the Red Hat 2.4.9-13 kernel on several SMP database servers and they have been perfect (not rebooted since 'rpm -U' of the new kernel) for several weeks. Before that, we were running 2.4.7-something from Red Hat and they were the same -- ran straight from the day we installed the kernel to the day we updated without needing to be restarted.

    On my desktop machine, I've taken more risks (installed pretty much every official 2.4.x-linus release as they have come out) and some have been good, while others have been total dogs.

    I'm running 2.4.17 right now. It seems okay; I've only had a freeze-up once over the last couple of weeks, though it was a total hard freeze (i.e. no ping, no magic SysRq, no nothing), which I haven't had in Linux for several years.

    The obvious issue is VM; if you keep lots of memory (768M, or preferably 1.0G+) in your system, things go much more smoothly, though MP3 playback still skips a little.

    Right now, I'd prefer some work on the RAID and IDE performance issues. One or two of the 2.4 series have had disk performance 100%+ better than the current 2.4 kernels. Why? I'd like to get the disk I/O back to reasonable levels.

    --
    STOP . AMERICA . NOW
  5. Re:Au contraire by ElOttoGrande · · Score: 5, Interesting

    it just does not show me that it is rendering a window... and that's something that Gnome was doing even on a 450 MHz machine.

    The preemptive patches have made my system a lot more responsive under use. Most notably the mouse cursor doesn't slow down during heavy compiles and audio latency is good enough to play with some of the more interesting sound software projects out for linux.

    But it really sounds like your problem isn't with linux but with XFree86. X has its share of problems but if you have a good video card that's supported well under it, you should get more than acceptable 2d drawing performance. I use a 3dfx voodoo3 here and it's about as good as win2k running KDE (sometimes you can see it rendering when resizing or moving windows quickly but i like to think of it as a cool effect ;) and it's way faster with lighter WM's like blackbox.

  6. Worked for me. by roystgnr · · Score: 5, Interesting

    There was a bad period where the Soundblaster Live driver (particularly mixer settings) was broken. That lasted through at least three kernel releases. There was a worse period where the VM had fits, and where performance degraded way too rapidly if the system had to swap. That lasted at least six kernel releases. There were at least one or two releases where I discovered that Alan Cox's (usually more bleeding edge) tree was being better behaved.

    Of course, whenever I'm playing around with this stuff I don't delete my "last known good" kernel, so if after a couple hours or a couple days I noticed a problem, I just booted back to what worked. The default (albeit heavily patched) Red Hat kernels were good, so "last known good" always existed for me.

    To summarize: this hasn't been a source of inconvenience for me, but it has been one of vicarious embarrassment. I've only been using Linux since 2.0.somehighnumber, but this is the worst mess I've seen the "stable" kernel tree go through in that time. Don't get me wrong, I've experienced system-crashing bugs (a tulip driver that freaked at some tulip chipset clones, some really bad OOM behavior a couple years ago) before, and pragmatically I guess that's worse... but those problems were always fixed fast enough that the patches predated my bug reports. Watching even the top kernel developers seem to flounder for months over bugs in a core part of the OS like the virtual memory system just sucked.

  7. We are worse off with 2.2 by oingoboingo · · Score: 5, Insightful

    Interesting what the author was saying about 2.2 versus 2.4 in terms of stability. We have 3 Linux machines which are used quite heavily here at the moment:

    1) A dual PIII-800/Intel 440GX/512MB ECC RAM based server, with a Mylex AcceleRAID 170 adapter, an Adaptec AIC-7896 SCSI adapter, Intel EtherExpress Pro 10/100, and an external 450GB SCSI RAID-5. This box is used for NFS/Samba file serving and an e-mail server for around 100 users.
    It runs kernel 2.2.17

    2) A dual PIII-800/VIA 133 server/1GB PC-133 RAM server, with an Initio A100U2W SCSI adapter, Intel EtherExpress 10/100 and 70GB of external SCSI RAID 1/0. It runs MySQL, Apache, and a collection of internally developed Perl, C and Java server apps, on kernel 2.4.3

    3) A dual PIII-450/Intel 440BX/512MB PC-100 RAM server, with an Adaptec 2940UW adapter, Intel EtherExpress 10/100 and 170GB of external SCSI RAID-5. It is used as a development system, and runs MySQL, Apache, and assorted Perl, C and Java apps, on kernel 2.4.1.

    Systems 2 and 3 have both been up for 197 days as I type this, and would have been up for over 250 days had we not needed to power them down to move them to a new server room.

    System 1 (with the 2.2.17 kernel) has never stayed up for more than 55 days. It hard crashes without anything informative being written to the logs, and obviously requires the reset button to be pressed.

    Has anyone got any ideas, given the hardware configs and software running on these machines, why 2.2 is so horrendous, yet 2.4 so stable?

    1. Re:We are worse off with 2.2 by WasterDave · · Score: 5, Insightful

      I know that there certainly were some... ahhh... issues with the Intel 8255x driver for Linux. There was a bunfight a while back when FreeBSD wasn't compatible with the embedded version of the 82559 (82559er), and the suggestion was made that someone look at the Linux driver to see what the command we were missing was. This led to a big stream of mails about how bad Linux's 8255x driver was, see.

      Or something like that.

      Anyway, I'd look at the changelogs for the network driver between 2.2.17 and 2.4.1, you may learn something.

      Dave

      --
      I write a blog now, you should be afraid.
  8. Unfortunately I have to agree by JeffL · · Score: 5, Interesting

    I'll start by saying that I find 2.4 to be very stable, and to perform mostly ok on 1 and 2 way machines. My laptop, desktop, and 2-way server stay up until I decide to reboot them. Actually, I brought my 2-way server down for a disk upgrade today for the first time since early November when I installed 2.4.14.

    Having said that, there are some serious issues with 2.4 on some 8-way 8GB machines that I manage. They have been running 2.4.13-ac7 since November, because that is the last kernel that is usable for me (-ac11 would probably be ok). Newer kernels have terrible behavior under the intense IO load these machines go through. They get 14-30 days of uptime, and then hang or get resource starved or something and have to be rebooted.

    I think part of the issue is that there simply aren't that many people running 8-way boxes, so bugs aren't found as easily. This is of course on top of 8-way SMP being much more complex than a de facto single-user, single-processor desktop machine. To make it even worse, the machines are pushed hard. They move around GBs of data every day, and often will run for extended periods with loads over 25.

    Of course, it is still mostly ok. While the machines are working they mostly work fine. Of course 20 days of uptime is totally unacceptable. I have an alpha running Tru64 pushing 300 days of uptime, and the last time it was down was due to a drive failure, not an OS problem.

    My only remaining issue with Linux on "small" machines is an oscillation problem in IO. Data will fill up all available memory before being written to disk, and then everything from memory will be written out, and then memory fills up again before anything new is written to disk. This is a bit inefficient, and the machine's responsiveness at the memory-full part of the cycle is poor.
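    The fill-then-dump cycle described above can be illustrated with a toy model. This is only a sketch: it is not how the 2.4 VM actually decides when to write back, and every name and number in it (`simulate`, `start_flush_at`, the rates and capacity) is a made-up assumption for illustration. It compares waiting until memory is full before flushing with starting writeback at a low threshold.

```python
def simulate(ticks, write_rate, disk_rate, capacity, start_flush_at):
    """Toy model of dirty data held in memory (hypothetical, not kernel code).

    Each tick the application dirties `write_rate` units. Once the dirty
    total reaches `start_flush_at`, the disk drains `disk_rate` units per
    tick until nothing dirty is left. Returns the dirty level seen each tick
    (measured before that tick's flush).
    """
    dirty = 0
    flushing = False
    levels = []
    for _ in range(ticks):
        # Writers are assumed to stall once memory is full, so cap at capacity.
        dirty = min(capacity, dirty + write_rate)
        levels.append(dirty)
        if dirty >= start_flush_at:
            flushing = True
        if flushing:
            dirty = max(0, dirty - disk_rate)
            flushing = dirty > 0  # keep draining until memory is clean
    return levels


# Flush only when memory is completely full (the oscillation in the comment).
fill_then_dump = simulate(50, 10, 30, 100, start_flush_at=100)
# Start writing back early, at a low threshold.
early_writeback = simulate(50, 10, 30, 100, start_flush_at=20)

print("peak dirty data, flush only when full:", max(fill_then_dump))
print("peak dirty data, flush early:", max(early_writeback))
```

    In this model the total written to disk is the same either way; what changes is how close memory gets to full, which is the part of the cycle where responsiveness suffers.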

    What are my options though? I guess I could try FreeBSD, but a bit of lurking on their lists and forums reveals plenty of problems there, too. Do I switch and hope things get better, or wait out 2.4 and hope it comes around soon? Aside from a few nasty bugs in some releases, pretty much each successive 2.4 kernel has been better than the previous one, at least on small systems.

    Several years ago I was having a hard lockup problem with Tru64 (Digital Unix, at the time) and that was very scary. It took time to get the problem escalated to the OS engineers, instead of just sending an e-mail to lkml. Even then I could only hope that the issue was being addressed, but I had no way to know if anybody was doing anything about it or not. (Turned out to be a bug in the NFS server that would cause the machine to lock up when serving to AIX.) For all of its problems though, it is extremely reassuring for me to be able to monitor the development process of Linux through the linux-kernel mailing list, and other specialized lists. If I feel that people aren't aware of some problem I am experiencing, I can raise the issue. I am not in the dark about what is happening, and what fixes are being made. I know what changes have gone into each kernel update, so I know if there is a chance of it fixing my problems.

  9. Desktop Myth by ImaLamer · · Score: 5, Insightful

    He seems to think 2.4 is fine for desktop systems but is only now, after a year of release, approaching stability for high-end use.

    I don't get it. I use Linux on the desktop. I have to admit that I don't run linux on my main machine. This is only because I've taken my second hard drive out, and put it back into an older machine. [sorry, wine doesn't like Red Alert 2]

    Before I did this though, I ran 2.4 kernels on my desktop. None of the problems I may have had were with the kernel. Problems I had were mainly with certain applications and when I pushed them to their limits. Pan, for instance, crashed a lot on me, but that was because I was downloading gigs per day. A simple Pan upgrade fixed that.

    In my humble opinion, 2.4 is prime for the desktop. Linux is more than ready for the desktop. I know he says it's ready for the desktop, but not ready for high end systems. To me 'high-end' is what you ask of a computer. I've got a 333MHz machine running Red Hat 7.2. The computer is running webmin, proftpd, apache, and many mail daemons. I must also mention that SETI runs 24/7, and it only has 64 MB of RAM. It never goes down, it never 'crashes', and is up as long as there is power running to it.

    So... it's ready for the desktop? Sure, 2.4.x is prime. All the drivers I've needed supported are there. Even my >$50 webcam.

    The question of 'desktop' use isn't with the kernel though. Desktop users don't patch or compile the kernel... how many times do they do it in *indows or MacOS X? They install complete distributions. IMHO, again, the only thing that keeps Linux off the desktop is easy program install. RPM has killed itself with dependencies, and apt-get is console based. Apt-get is waaay better, and it has worked wonders on my Red Hat machine [apt-rpm]. The problem is not being able to download an app and install it like *indows.

    Solve this and I will sit outside my local computer store and hand out CDs. I don't know about high end systems, but dammit!, desktop users are ready... format that *indows crap and get a real OS!

    Gimme a good apt-get gui... or have the system run apt-get in the background solving dependencies when needed... my g'ma will have it.

    BTW, I just saw a guy on TV and his name is... get this: Joe Householder

  10. Press Release by edibleplastic · · Score: 5, Funny

    And in other news, the Associated Press is reporting that Linus Torvalds has sent out a memo to the core Linux development team telling them to make stability their "highest priority". In his memo he called this strategy "Trustworthy Computing", saying that it should not be the case that people have to use previous versions of the OS in order to find a stable working environment.

  11. Re:Au contraire by hansendc · · Score: 5, Informative

    What are you smoking?!? High end box DOES NOT mean your 1.2 GHz Athlon!! We're talking about machines with >8 processors here. Machines which need to use the PPro PAE so that over 4gig of memory can be addressed.
    There are serious VM stability issues with these systems. Ever wonder why Redhat hasn't released a >2.4.9 kernel? It's because 2.4.10 is where the new VM system went in. Redhat is busily porting Rik van Riel's 2.4.9 VM up to the later kernels so that they can use it.
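    For anyone wondering where the 4 gig limit comes from, here is the arithmetic as a quick sketch. The helper name `addressable_gib` is made up for illustration; the only facts assumed are that plain 32-bit x86 paging uses 32-bit physical addresses and that PAE widens them to 36 bits.

```python
GIB = 2**30  # bytes in one GiB


def addressable_gib(address_bits: int) -> int:
    """GiB of physical memory reachable with addresses of this width."""
    return 2**address_bits // GIB


plain_32bit = addressable_gib(32)  # classic 32-bit x86 paging
with_pae = addressable_gib(36)     # PAE: 36-bit physical addresses

print(f"32-bit physical addresses: {plain_32bit} GiB")
print(f"36-bit (PAE) physical addresses: {with_pae} GiB")
```

    Each process still sees a 32-bit virtual address space; PAE only raises how much physical RAM the kernel can manage, which is part of why the VM has more to juggle on these boxes.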

  12. Re:Why didn't he downgrade immediately? by bakreule · · Score: 5, Insightful

    "Hopelessly incompetent"??? Are you kidding? You think he has shortcomings because he was doing what every single rational person does when encountering a software problem? When a program that I buy/download doesn't work, I immediately search for a patch. VERY reasonable behaviour.

    Far be it from me to criticize Linus et al., as I could never do what they do, but the shortcomings are not with this guy, but with the buggy kernels. These are release kernels, they are not beta kernels. I think, considering the reputation of Linux, that a release kernel should be stable. Yes, bugs happen, and when they do, you would expect a patch to fix these problems.

    If everyone did as you suggested and rolled back to 2.2.x at the first whiff of trouble, who would be out using these "bleeding edge kernels"??

    I think you should cut the boy some slack.....

    --

    Buses stop at a bus station
    Trains stop at a train station
    On my desk there's a workstation....

  13. Re:Au contraire by captaineo · · Score: 5, Insightful

    I definitely see this too... In fact I'm about to start a full crusade against shitty windowing performance. (long visible lags between exposure and repaint, very jerky moving/resizing, etc). These are very real issues on Linux/XFree86. I plan to go as far as shooting my screen with a 100Hz camera to really see what's going on, retrace by retrace.

    There are many links in the GUI chain, any of which can cause a problem. Roughly from top to bottom-

    1. Widget toolkit (GTK, QT, etc)
    2. Client painting library (GDK, QT, etc)
    3. Window manager
    4. X protocol
    5. context switches
    6. X server
    7. 2D video card driver

    The folklore seems to be that 4, 5, and 7 are the major problems - "the X protocol is badly designed, switching between client, server, and window manager processes is too expensive, and XFree86's video drivers are no good."

    In reality though, the problems aren't where most people expect. The X protocol is not generally a bottleneck, especially if the client programmer knows what he's doing (wait until the input queue empties before repainting anything, avoid synchronous behavior, double-buffer windows using server-side pixmaps, etc). The copy-and-context-switch overhead isn't too bad either (keep in mind that context switches are much more expensive on Windows, and Windows is the platform to beat for 2D smoothness!). And finally, many of the 2D drivers really do take advantage of all the hardware offers.
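    To make the "wait until the input queue empties before repainting" advice concrete, here is a toy sketch. It is not Xlib code and does not talk to a real X server; the event tuples, the batch structure, and the function name `run_event_loop` are all assumptions made up for illustration.

```python
from collections import deque


def run_event_loop(batches):
    """Drain every pending event before painting (toy model, not Xlib).

    `batches` is a list of lists; each inner list stands for the events
    already waiting in the input queue when the client wakes up. Events are
    (kind, size) tuples. Returns the window sizes that were actually painted.
    """
    painted = []
    for batch in batches:
        queue = deque(batch)
        pending_size = None
        while queue:  # queue not empty yet: keep reading, do not paint
            kind, size = queue.popleft()
            if kind == "resize":
                pending_size = size  # a newer resize supersedes older ones
        if pending_size is not None:
            painted.append(pending_size)  # one repaint for the whole batch
    return painted


# Three resize events arrive back to back, then one more later on.
events = [
    [("resize", (100, 100)), ("resize", (110, 100)), ("resize", (120, 100))],
    [("resize", (130, 100))],
]
print(run_event_loop(events))
```

    A client that painted on every event would draw four times here and spend most of that effort on sizes the user never sees; draining first means two repaints, both at sizes that are actually current.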

    The real culprits are turning out to be 1 and 3 - the toolkits and window managers. Many of the Linux toolkits (especially GTK) have very advanced widget alignment/constraint systems that bog down when windows are resized. Some toolkits are doing naughty things with the event loop (painting while events are still in the input queue, or trying to "optimize" by pausing for new events), and most of them aren't fully double-buffered yet (though GTK 2.0 and recent KDE/QT are most of the way there). Window managers are some of the most horrific perpetrators of 2D crappiness. Some of them try too hard to snap or quantize window sizes and positions, resulting in jerky motion. Kwin seems to prolong expose/repaint cycles much more than necessary. And finally, I will make one criticism of X's overall architecture - I don't think separating the window manager from the X server was a good choice. The asynchronous relationship between X and the wm can cause nasty delays in window moving and resizing. (plus, all widely-used wm's have basically the same features these days; what's the use of having a choice? ;]).

    I used to think that the only way to get perfectly smooth 2D would be to embed the widget toolkit in the X server, so that it could handle repainting all on its own. Now I don't think one needs to go that far; it may just take a well-written window manager, and a similarly carefully-designed widget toolkit. (though it may be helpful for the server to mandatorily double-buffer every window - hey, video RAM is plentiful these days =)

    There are lots of issues I haven't investigated yet - for instance, I think Windows may be doing something interesting with vblank; dragging windows around seems to show a lot less tearing compared to X... Also, 3D OpenGL windows seem to cause much worse artifacting on both X and MS Windows. It's almost possible to bring an animating OpenGL program to a complete halt just by resizing the window, or dragging another window in front of it.

    It's an interesting problem, and I'm glad to see I'm not the only one who cares about it. I find it appalling that (to my knowledge) not one major 2D GUI system has been able to produce 100% correct results - i.e. every window correctly drawn on every single monitor retrace, even while dragging or resizing. Why should we settle for less than 100%?

  14. Re:Why didn't he downgrade immediately? by Lumpy · · Score: 5, Insightful

    EXACTLY!!!! Everyone that has been bitching about the 2.4 series could never give a real reason why they switched their servers to it. I switched to 2.4 and kept using every version because of my firewire video editing projects, and firewire is just now getting stable and usable. A server does not need this, nor does it need usb, or anything else added to the 2.4 series. All of my linux servers at work are running 2.2 and will continue to do so until they NEED a 2.4 kernel. It's this insane constant "tinkering" that many linux zealots do that makes management not even consider linux as an option. My boss mentioned that another division asked him how we kept the linux boxes running well; I told him that we installed it, configured it, and then LEFT IT ALONE except for security patches. And that, kiddies, is the key to any server..

    --
    Do not look at laser with remaining good eye.
  15. Re:Why Linux? by Ded+Bob · · Score: 5, Interesting

    I know BSD can run many Linux binaries, but what about kernel modules?

    At least for nVidia, it is being worked on: FreeBSD NVIDIA Driver Initiative

  16. STABLE vs STABLE by ajs · · Score: 5, Interesting

    I'm beginning to feel like a broken record, and maybe Linus should just change the terminology so that people stop making the same assumption over and over, but people: wake up and smell the bogomips!

    "Stable", in the context of a kernel release refers to the interfaces. When Linus releases 2.<even>.0, he is saying that this kernel is one that has reached some arbitrary plateau of development stability, and it's now ready for others to begin actual release engineering on.

    You have to understand that the Linux Kernel is released by Linus in a state that is very reasonable for a development team, but that will never be "production quality". Debian puts a lot of release engineering work into a Kernel. As does Red Hat. As does SuSE, etc, etc.

    If you just grab 2.4.x and install it, you're acting as Linux Q/A, and I applaud your effort, but when it breaks in your environment, you should not be stunned.

    Once again, production release != stable release. A stable release is just one the developers are happy with (and I've yet to see a 2.4 kernel that I can say developers should not be happy with).

    So, maybe next time, 2.6.0 should be called the "post-development" release so that people don't go off half-cocked installing it on production systems.
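    For reference, the numbering convention under discussion (even minor number for a "stable" series, odd for a development series) can be written down as a small sketch. The function name `kernel_series` is hypothetical, and the label it returns is only the naming convention; it says nothing about whether a given release is fit for production.

```python
def kernel_series(version: str) -> str:
    """Classify a kernel version by the even/odd minor-number convention.

    Accepts strings such as "2.4.17" or "2.4.13-ac7"; only the second
    component (the minor number) is inspected.
    """
    minor = int(version.split(".")[1])
    return "stable" if minor % 2 == 0 else "development"


for v in ("2.2.18", "2.4.17", "2.4.13-ac7", "2.5.1"):
    print(v, "->", kernel_series(v))
```

    Vendor suffixes like "-ac7" or Red Hat's "-13" do not change the series, which fits the point above: the release engineering happens on top of the series, not in its name.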

  17. He Almost Had Me by bwt · · Score: 5, Interesting

    I almost took this guy seriously until this part:

    The kernel seemed to show more stability. Then we hit kernel 2.4.15.

    Linux version 2.4.15 contained a bug that was arguably worse than the VM bug. Essentially, if you unmounted a file system via reboot -- or any other common method -- you would get filesystem corruption. A fix, called kernel 2.4.16, was released 24 hours later.


    Look, anybody who is deploying a kernel on the day it is released on a production server deserves what they get. One day turnaround on a bug fix is phenomenal. Even if these are marked as "stable" kernels, trying to track the new versions in real time is a dumb thing to do.

    This guy has written a moan and groan article based on a small set of bugs, some of which he only could have experienced if he is experimenting on his production system. He obviously requires extreme stability and says he needs this over the new 2.4 features (SMP, 2G memory, 2G files), which makes me ask: why was he putting new kernels on his production system before empirical evidence was there about high stability?

    Open source will fix bugs faster than proprietary. It doesn't change reality to make bugs impossible. This is true even in "stable" releases, especially if you are talking about highly stressful production environments.

  18. Re:Au contraire by Havoc+Pennington · · Score: 5, Informative

    As the author of a window manager and big hunks of GTK, I don't think your analysis is quite right.

    The primary problem is synchronization, not delay. GTK 1.2 is very fast, its geometry code is not causing any slowness. You are confusing slow with flicker. Flicker looks slow but slow is not the problem; no matter how fast code is, if it flickers, you will see it, and it will look slow.

    Similarly when opaque resizing a window; it has nothing to do with quantization or speed, the problem is that the window manager frame and the client are not resized/drawn at the same time resulting in a "tearing" effect. This would be visible no matter how fast you make things.

    As you say, putting the toolkit in the server or putting the WM in the toolkit are overradical ways to fix this. It's not even necessary to backing store all X windows. It could be done with an extension allowing us to push a backing store on a single X window during a resize, for example. However fixing it 100% pretty clearly requires server changes, and that's why you haven't seen a fix yet.

  19. While Linux remains superior to Windows by FreeUser · · Score: 5, Informative

    ... you are absolutely correct in observing that the 2.4 debacle has used up a great deal of Linux's reputation for being stable. I use 2.4.x with SGI's xfs patches both in production systems at work, and at home (like others, we need various features of 2.4.x not available in 2.2.x), and while it has never been anything close to as flakey as the most stable of Microsoft systems, it has in comparison to 2.2.x (and FreeBSD for that matter) been pretty damn unreliable. In comparison to just about everything else it is still quite stable, so happiness is indeed to some degree relative.

    And now for some armchair quarterbacking: all that having been said, I really think Linus needs to exercise some self discipline and stay away from maintaining even-numbered kernel releases (x.0.x, x.2.x, x.4.x, etc.). By his own admission he isn't good at being a stable kernel maintainer and prefers the more interesting work done in development kernels, and his track record in 2.2 wasn't fantastic (particularly in comparison to 2.0, where he did a fantastic job) and was pretty abysmal in 2.4. As someone who's been using GNU/Linux since the early pre 1.0 days I hope he'll put his efforts where his talents are (managing changes in odd numbered development releases) and leave stable maintenance to Cox and Marcelo (who are very good at maintaining and improving stable releases). But enough commentary from the peanut gallery...

    --
    The Future of Human Evolution: Autonomy