Operational Testing of Linux Kernel 2.5.x
G3ckoG33k writes "The Open Source Development Lab has begun operational testing of the 2.5.x Kernel: "The staff at OSDL has been involved with development and testing of 2.5 since the beginning and we've noticed that it seems to be very stable for a development tree. So good, in fact, that we think it is ready to be tested in a production environment. We have planned and begun execution of a project to test the 2.5 kernel in our data center using our production environment. The project includes lots of testing and lots of escape hatches so we don't run recklessly off the edge. We began with some of the simpler, less critical servers and, as we build confidence, are moving to the more complex servers. Today we have several servers running 2.5 and within a month we'll have most of the data center migrated to 2.5." Can anyone say daredevils?"
I've been trying out 2.5 for quite a while now with varying degrees of success.
It would be great to hear from more people like OSDL that it's working well.
Unfortunately, unless RH9 comes with module-init-tools, it will still be a pain to try out the 2.5 kernel.
The reason it's not for production use isn't because it is necessarily crash prone... it's because it can break drastically between minor versions as features are added/changed.
There's a reason people don't use 2.5. It's the DEVELOPMENT kernel. You SHOULD NOT be using it for production use. Often things will break. Sometimes it will cause hard disk corruption. It wouldn't be the first time.
Please, fellow slashdotters, don't be tempted to use 2.5 for your important systems. It's good that it's tested more, but if you do use it, please don't bitch and whine about how it destroyed all your data.
This is however still a DEVELOPMENT kernel. I put that in big letters because it's very, very true. Lots of kernel modules won't compile still. Documentation for what has changed is somewhat spotty, and it took me some time to get everything working decently. And getting a system that can boot into 2.4 or 2.5 seems quite difficult with the new modutils package (or at least I haven't gotten it working yet - have to reinstall modutils RPM if I want to boot into 2.4).
Also there's a major bug with ext3 right now in 2.5.66 - if your computer doesn't shut down cleanly, the journal recovery in 2.5 seems completely broken - I have to reboot into 2.4, let the 2.4 kernel do the journal recovery, do a clean shutdown, and THEN boot back into 2.5. Pain in the ass, especially since I've had two hard crashes since I upgraded to 2.5. Also 2.5.66 doesn't compile out of the box with the default config. Had to patch one file with a patch from LKML.
So in short, 2.5 may be more stable than usual devel branches, but don't delude yourself about what you are getting into. If you want the latest and greatest in performance for your desktop machine, give it a try. But I wouldn't run even a low uptime-requirement server with it yet.
I've been running 2.5.x on both my station at work and at home. For the most part it's been pretty stable.
I've run 1.3, 2.1, 2.3 and now 2.5 kernels as they came out, and 2.5.5x and on have been a pleasure. I had a 2.1.x kernel eat my file system; I've had nothing like that so far with 2.5.
Now the caveat: don't run a 2.5.x kernel unless you're willing to lose everything. Back up regularly! And most important, even though I don't think anything bad will happen, be prepared to write bug reports correctly! READ THIS AFTER DOWNLOAD! linux-2.5.x/Documentation/BUG-HUNTING
"think of it as evolution in action"
Having gone through high cpu/disk load crashes over multiple kernels, I would suggest a good test plan before embarking on any new kernel.
Our most recent experience with 'stable' kernels (specifically drivers in our case) was the default kernel in RH 8. It had some very subtle issues with Intel's GigPHY/MAC chipset that caused crashes only under specific high load, every three to four days. The crashes were not repeatable on any fixed schedule but would eventually happen. I suggest finding a characteristic set of applications that load disk, memory, and CPU the way your real workload does, and then testing your favorite kernel under all of those circumstances. Programs that run huge FFTs or other number crunching are often too specific to cause failures. In this case we used a program that calculated huge FFTs while looping network file transfers, and that test ran without issues... nothing beats the real thing!
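For anyone who wants a starting point for that kind of combined test, here is a minimal sketch in Python, not a real test suite. It runs CPU-bound threads next to a thread that loops writing, re-reading, and checksumming a file. Every name in it (run_stress, cpu_load, io_load) is made up for this example, and the loads are stand-ins: a real plan would replace them with your own applications and network transfers.

```python
import hashlib
import os
import tempfile
import threading
import time


def cpu_load(stop, counter):
    """Do floating-point busywork until asked to stop."""
    x = 0.0001
    while not stop.is_set():
        for _ in range(10_000):
            x = (x * 1.000001 + 0.5) % 1000.0
        counter[0] += 1  # one finished round of busywork


def io_load(stop, workdir, errors, counter, size=1 << 20):
    """Write, re-read, and checksum a file in a loop to catch corruption."""
    path = os.path.join(workdir, "stress.bin")
    while not stop.is_set():
        data = os.urandom(size)
        expected = hashlib.sha256(data).hexdigest()
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out of user space
        # Note: the re-read may be served from the page cache, not the disk.
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            errors.append((expected, actual))
        counter[0] += 1


def run_stress(seconds, cpu_threads=2):
    """Run CPU and I/O load side by side for roughly `seconds` seconds."""
    stop = threading.Event()
    errors = []
    cpu_counters = [[0] for _ in range(cpu_threads)]  # one counter per thread
    io_counter = [0]
    with tempfile.TemporaryDirectory() as workdir:
        # Python threads share the GIL, so this is not true multi-core load;
        # the multiprocessing module would be the next step for that.
        threads = [threading.Thread(target=cpu_load, args=(stop, c))
                   for c in cpu_counters]
        threads.append(threading.Thread(
            target=io_load, args=(stop, workdir, errors, io_counter)))
        for t in threads:
            t.start()
        time.sleep(seconds)
        stop.set()
        for t in threads:
            t.join()
    return {
        "cpu_rounds": sum(c[0] for c in cpu_counters),
        "io_rounds": io_counter[0],
        "errors": errors,
    }


if __name__ == "__main__":
    print(run_stress(5))
```

An empty "errors" list after a few seconds proves very little. The point of the comment above is that this kind of loop has to run for days, under the real workload, before it tells you anything.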
Also don't think that even 2.4.x series kernels are above this... as I stated earlier even a heavily patched 2.4.18 kernel could be your downfall... so maybe a 2.5.x kernel is okay but beat the crap out of it before putting both feet in.
-Ho
This reminds me - one problem I've always had is that new stuff that gets thrown into the kernel isn't clearly explained in the most basic ways. I.e., what the heck is it? I remember lots of versions of 2.4 had features and options with no help text to explain what they did. Google searches don't always turn up anything handy - often they turn up lots of hits on patches or posts talking about the feature, but not describing what it actually is.
Anyway, for those wondering what the heck cpufreq is... From a kerneltrap interview:
JA: You also mentioned working on the x86 side of Russell King's cpufreq code. We spoke with Russell King in an earlier interview, but we didn't talk about cpufreq. What is it?
Dave Jones: Quite a few CPUs these days allow changing of the voltage/multiplier/bus speed through software. Russell and Erik Mouw did a bunch of work on the ARM CPUs that support this feature, and started writing a generic framework for this type of technology so that he wouldn't have to duplicate code that, e.g., recalculates loops_per_sec in every speed scaling.
etc.
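To make the loops_per_sec remark a bit more concrete: a delay-loop calibration value is only valid for the clock speed it was measured at, so it has to be rescaled whenever the frequency changes. The snippet below is a sketch of that arithmetic only. The function name scale_loops and the kHz units are assumptions for illustration; this is not the actual cpufreq API.

```python
def scale_loops(old_loops, old_khz, new_khz):
    """Rescale a delay-loop calibration value linearly with CPU frequency.

    Hypothetical helper for illustration: the calibration was measured at
    old_khz, and the CPU is now switching to new_khz.
    """
    if old_khz <= 0 or new_khz <= 0:
        raise ValueError("frequencies must be positive")
    # Integer math only, since kernel code generally avoids floating point.
    return old_loops * new_khz // old_khz


if __name__ == "__main__":
    # Calibrated at 800 MHz, then the CPU drops to 400 MHz.
    print(scale_loops(1_600_000, 800_000, 400_000))
```

Having one shared helper like this in a generic framework is what saves every per-CPU driver from carrying its own copy of the recalculation.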
Please help metamoderate.
You should only use a development kernel in a production environment if you've already tested it extensively and found it to have no problems with your particular load on your particular hardware with the options you're using. Of course, if you're OSDL, you can actually do this sort of testing, but practically everyone else doesn't have the spare hardware and test suites necessary.
I'm running 2.5.65-mm4 on my home box because I wanted to find out what all the excitement and nice numbers about the new scheduler are about. After I got all the modules right, I did some tests ... and was a bit disappointed. You see, it's not all that much faster ... it just feels different. Yes, programs do load somewhat faster, but at the same time doing an ls -l in my home dir was kinda slower than with the excellent WOLK patchset for 2.4.18. On the other hand, I was completely able to browse my large inbox (~20k mails in maildir) while checking the md5 of the latest knoppix iso on the same disk.
... I just can't wait to test the 'fixed up' promise driver and ide tcq code! Right now ide tcq on promise is somewhat borken. If ide tcq shows some numbers, that would be the last argument down for scsi vs. ide in our servers...
I have a lot of expectations of the Alan/Andre team with their ide work.
Most desktop apps at least support going through arts, esd or some other software mixer, so while it's kind of a crappy solution it's not that much of an issue.
I'm not sure about most of the cards available these days, but I do know that at least the SBLive (and the linux drivers, both alsa and the old oss ones) allows for hardware mixing of multiple channels (not sure how many but it's way more than just 2).
The ALSA drivers ARE of a much higher quality though. Has anyone else noticed that if you put the PCM volume all the way up on pretty much any sound card with the old OSS drivers you start getting nasty distortion? (Well, not really nasty, but for anyone that's picky about audio it's pretty nasty.) Main volume is fine all the way up, and it's definitely not a speaker issue. At about the same overall volume level, PCM down with Main all the way up vs. PCM all the way up with Main down, there is most certainly a difference in the quality of sound.
I've seen this with SBPCI 128, SBLive, and on my 800MHz iBook as well, so it's not even limited to one platform.
Interactive performance - Pretty sharp. I/O background load really doesn't put much of a burden on foreground stuff, but then, 2.4 + preempt patches didn't either. Resizing is weird. Resize slowly, and the effect is like kernel 2.4 (canvas lags behind window frame). Resize fast, and the effect is like OS X, the window frame lags while the canvas catches up. Both kinda suck. CPU background load (MP3 compression) causes the machine to feel like an XP machine -- big 10-15 pauses.
CD drivers - They suck. Certain CDs (Evanescence's Fallen) will cause the CD drive to go into spasms. This doesn't happen under 2.4.
I/O scheduler - Gimpy. Under heavy CPU load (the aforementioned MP3 compression) starting an app that isn't in cache will take tens of seconds.
Compile performance - awesome. I use Gentoo, and I've noticed big improvements.
Power management - Mediocre. APM is alright. ACPI sucks. Causes weird beeping noises when I try to load the "processor" module. It's probably a fault of my Inspiron 8200's fsck'ed DSDT, so I won't bitch, but WinXP has no problem with it.
Stability - Surprisingly good, for development code. A far cry from 2.4, crashes maybe once a week, but much better than the 2.5.20-something releases, which once hosed my entire partition when I burned a bad CD...
A deep unwavering belief is a sure sign you're missing something...