Linux Kernel Gets Fully Automated Test
An anonymous reader writes "The Linux Kernel is now getting automatically tested within 15 minutes of a new version being released, across a variety of hardware and the results are being published for all to see. Martin Bligh
announced this yesterday, running on top of IBM's internal test automation system. Maybe this will enable the kernel developers to keep up with the 2.6 kernel's rapid pace of change. Looks like it caught one new problem with last night's build already ..."
code generation...
How were the previous kernels being tested? Were sources for improvement/change/modification, bugs and areas requiring refactoring being discovered by chance?
Most projects of any complexity use automated continuous build and testing as a standard development practise.
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe
But it can't catch everything - the 1394 bus was screwed in 2.6.11. There are a lot of regressions that show up - and even that healthy cluster of systems will not show every problem.
Sound issues? Older network and SCSI cards? There are a lot of drivers that break, and no one notices it because there is nobody with the hardware testing the -rc or -mm kernels.
Wouldn't it make more sense to package these tools for someone to install on their collection of oddball equipment, and assist in the debugging/testing?
Where's the ARM, MIPS, and SH?
Why can't I mod "-1 Idiot"?
ARM Linux has had something similar in Kautobuild for some time.
Although the testing and building is limited to the ARM platform.
The site also has a whos who thats worh looking at ;-)
Sounds like the solution to this problem is clear. Always use the second to latest kernel released. Stay away from the new one untill it's fully tested to your satisfaction.
Life is not for the lazy.
...the cross-platform, cross-hardware part? Setting up one machine to build automatically is easy. Setting up a whole bunch of them (and all unique, read administration nightmare) and tie them together to a system, that's quite a bit of work.
Kjella
Live today, because you never know what tomorrow brings
"Release" in the open source world has a broader sense than in commercial software. In open source not all "released" versions are meant for general public consumption; they include unstable versions targeted mostly at developers, so that severe isues can be detected and patched quickly.
Taking this into account, I believe this is meant to catch bugs mainly in nightly (unstable) builds and release candidates, not in "final" versions (those should, at least in theory, have no serious bugs left around as the latter have already been eradicated from release candidates).
Score: i, Imaginary
I automatically test every nightly -git snapshot release, so it's fairly well tied in anyway. This also means my heaviest usage of our machines is at night, when most of the (US) developers are asleep.
... and the whole -rc cycle should enable us to catch a lot of stuff.
So it's fairly well tied in already
Red Hat (and probably Novell/SuSe, since they use over one thousand kernel patches) runs a myriad of tests on each of its own kernel builds nightly - and has been doing so for years. On more than just the 3 architectures covered by this test.
That said, pushing tests upstream is a great idea. Just not revolutionary or anything.
I hope they are using code from the Linux testing suite. That piece of work has already formed a nice set of tests. Also, I hope that the kernel is automatically built with many different combinations of options. And with time, I hope this will become better. The more tests, with the more hardware configurations, with the more kernel configurations, with the more types of input data (including many imaginative forms of incorrect input data to test that the kernel handles it gracefully and thwarts attacks based on such methods), the better quality we will have in the kernel, and it is likely that Linux will be unmatched in quality, stability, efficiency (well, maybe not efficiency necessarily), and long uptimes.
Compiles, boots, runs dbench, tbench, kernbench, reaim, fsx. If one test fails, it'll highlight it
... LTP is a pain because some tests always fail, and I have to work out the differential against baseline. Will come later.
in yellow, rather than green or red. I have a few of those in the internal tests, but not the external set.
This is only the tip of the iceberg as to what can be done. We're already running LTP, etc internally, and several other tests. Some have licensing restrictions on results release (SPEC)
With an automated test suite, what happens when a class of bug is discovered to be untested-for? Presumably, the suite is modified to detect it. Then, is the resulting new suite itself subjected to an automated test suite? And, then...[divide-by-zero error...]
Seeing bad movies only encourages them. Watch responsibly
because they are nightly builds, that is - versions with applied patch, but untested yet.
The results are all there if anyone wants to play with them. Go to the results matrix, and click on the numerical part of the green box. Pick a test, and drill down to the results directory.
;-) Preferably python or perl into gnuplot.
The numbers are there, it's just a question of drawing graphs, etc. I have some for kernbench already, but I'm not finished automating them. If anyone wants to email me code to generate them from the directory structure published there, feel free
Martin Bligh announced this yesterday, running on top of IBM's internal test automation system.
Hope he doesn't fall off and hurt himself.
mod me troll, but (free)bsd had this for years and not only for the kernel, but for world, too.
Reliable, repeatable testing is a great way to prevent fixes in one area from causing bugs in another. When I fix A, I generally only test A manually. I don't test every other conceivable code path, even though my fix for A might well impact them.
An automated test for B will catch regressions caused by my fix in A, making it harder to backslide. Backsliding is very expensive because bugs are far removed from their cause. If an automated test sees that changes in A caused a regression in B, the cause is immediately obvious.
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Ummm...
If everyone did this, the newest kernels would never get tested. I think it is important that we have a diverse range of users using new, almost new, and older but well tested kernels.
You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.