Linux Kernel Gets Fully Automated Test

Posted by CmdrTaco on Sunday June 5, 2005 @03:45AM from the just-like-a-real-project dept.

An anonymous reader writes "The Linux Kernel is now getting automatically tested within 15 minutes of a new version being released, across a variety of hardware and the results are being published for all to see. Martin Bligh announced this yesterday, running on top of IBM's internal test automation system. Maybe this will enable the kernel developers to keep up with the 2.6 kernel's rapid pace of change. Looks like it caught one new problem with last night's build already ..."

now all we need is automated.... by 3seas · 2005-06-05 03:48 · Score: 4, Funny

code generation...
1. Re:now all we need is automated.... by Baal+Sebub · 2005-06-05 04:20 · Score: 3, Funny
  
  I already got 1 million monkeys in my basement working on it.
  
  --
  120 chars are not enough for a signature. I have discovered a truly remarkable proof which this margin is too small to c
2. Re:now all we need is automated.... by maxwell+demon · 2005-06-05 04:24 · Score: 2, Insightful
  
  No problem. The following is an automated code generator. It generates a hello world program in C and writes it to stdout. (untested)
  #include <stdio.h>

  int main()
  {
    char const* program_pattern = "%s%s";
    char const* include_pattern = "#include <%s>\n";
    char const* function_declaration_pattern = "int %s(%s)";
    char const* function_definition_pattern = "%s\n{\n %s;\n}\n";
    char const* print_pattern = "printf(%s)";
    char const* string_pattern = "\"%s\"";
    char const* stdio_header_name = "stdio.h";
    char const* main_function_name = "main";
    char const* main_arguments = ""; // we don't read command line arguments
    char const* output_string = "hello world!";

    /* Each buffer holds its formatted text plus the terminating NUL. */
    char string[15];    /* quoted output string: 14 chars */
    char print[23];     /* printf call: 22 chars */
    char main_decl[11]; /* "int main()": 10 chars */
    char include[20];   /* include line with newline: 19 chars */
    char main_func[42]; /* declaration, braces and body: 40 chars */

    sprintf(string, string_pattern, output_string);
    sprintf(print, print_pattern, string);
    sprintf(main_decl, function_declaration_pattern, main_function_name, main_arguments);
    sprintf(main_func, function_definition_pattern, main_decl, print);
    sprintf(include, include_pattern, stdio_header_name);
    printf(program_pattern, include, main_func);
    return 0;
  }
  
  --
  The Tao of math: The numbers you can count are not the real numbers.
3. Re:now all we need is automated.... by Curtman · 2005-06-05 04:26 · Score: 4, Interesting
  
  Actually, that could be done, could it not?
  
  Apparently it works for Samba. :)
4. Re:now all we need is automated.... by jrockway · 2005-06-05 09:31 · Score: 2, Insightful
  
  It was a joke, dumbass.
  
  If you're going to use fixed-length buffers, though, at least use sNprintf!
  
  --
  My other car is first.
Question: by bogaboga · 2005-06-05 03:50 · Score: 4, Interesting

How were the previous kernels being tested? Were sources for improvement/change/modification, bugs and areas requiring refactoring being discovered by chance?
1. Re:Question: by Anonymous Coward · 2005-06-05 03:55 · Score: 3, Informative
  
  " How were the previous kernels being tested?"
  Hey guys, new kernel is out, bang away at it and let me know what you think.
2. Re:Question: by steve_l · 2005-06-05 08:21 · Score: 2, Funny
  
  I thought it was "hello, here is a new release of fedora for you to install..."
How much testing? by anthony_dipierro · 2005-06-05 03:51 · Score: 2, Interesting

This is good, and long overdue (I'm surprised it hasn't been around for years), but just how much testing is being done? Compiling? Booting? Or are there actual functional and reliability tests which are being performed?
1. Re:How much testing? by oxfletch · 2005-06-05 04:06 · Score: 5, Informative
  
  Compiles, boots, runs dbench, tbench, kernbench, reaim, fsx. If one test fails, it'll highlight it in yellow, rather than green or red. I have a few of those in the internal tests, but not the external set.
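
A minimal sketch of how a results-matrix cell color could be chosen from the stages listed above. The post only says that a failed test shows up yellow; treating red as "did not build or boot" is an assumption, and the function name is hypothetical.

```python
def matrix_color(build_ok, boot_ok, test_results):
    """Pick a cell color for one kernel/machine pair.

    test_results maps a test name to True (passed) or False (failed).
    Assumption: red means the kernel did not build or boot at all.
    """
    if not (build_ok and boot_ok):
        return "red"
    if not all(test_results.values()):
        return "yellow"  # built and booted, but at least one test failed
    return "green"


print(matrix_color(True, True, {"dbench": True, "fsx": False}))
```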
  
  This is only the tip of the iceberg as to what can be done. We're already running LTP, etc internally, and several other tests. Some have licensing restrictions on results release (SPEC) ... LTP is a pain because some tests always fail, and I have to work out the differential against baseline. Will come later.
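
The "differential against baseline" idea can be sketched roughly as follows. This is only an illustration, assuming results are available as a plain mapping from test name to "PASS" or "FAIL"; the test names are made up and this is not how the harness actually stores LTP output.

```python
def diff_against_baseline(baseline, current):
    """Compare two {test_name: "PASS" | "FAIL"} mappings.

    Returns (regressions, fixes, missing): tests that newly fail, tests that
    newly pass, and tests in the baseline that did not run this time.
    """
    regressions, fixes, missing = [], [], []
    for name in sorted(baseline):
        if name not in current:
            missing.append(name)
        elif baseline[name] == "PASS" and current[name] == "FAIL":
            regressions.append(name)
        elif baseline[name] == "FAIL" and current[name] == "PASS":
            fixes.append(name)
    return regressions, fixes, missing


# Hypothetical results: "always_fails" is a known failure in the baseline,
# so it is not reported again.
baseline = {"always_fails": "FAIL", "fs_basic": "PASS", "mm_basic": "PASS"}
current = {"always_fails": "FAIL", "fs_basic": "FAIL", "mm_basic": "PASS"}

regressions, fixes, missing = diff_against_baseline(baseline, current)
print("new failures:", regressions)
```

Only tests whose status changed are reported, so the tests that always fail stop drowning out real regressions.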
What took so long by Timesprout · 2005-06-05 03:53 · Score: 3, Interesting

Most projects of any complexity use automated continuous build and testing as a standard development practice.

--
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Maybe... by ratta · 2005-06-05 03:53 · Score: 2, Interesting

automated performance regression tests may be useful too.

--
Wondering why I am making such strange posts? I am trying to get a "+5,Flamebait" or "-1,Insightful" rating.
1. Re:Maybe... by oxfletch · 2005-06-05 04:19 · Score: 5, Informative
  
  The results are all there if anyone wants to play with them. Go to the results matrix, and click on the numerical part of the green box. Pick a test, and drill down to the results directory.
  
  The numbers are there, it's just a question of drawing graphs, etc. I have some for kernbench already, but I'm not finished automating them. If anyone wants to email me code to generate them from the directory structure published there, feel free ;-) Preferably python or perl into gnuplot.
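
A rough Python-into-gnuplot sketch of the kind of thing being asked for. The layout of the published results directories is not described here, so reading the numbers from disk is left out; the function name, file names, and sample values below are all assumptions for illustration, not real benchmark data.

```python
def gnuplot_inputs(results, data_path="kernbench.dat", png_path="kernbench.png"):
    """Build the text of a gnuplot data file and of a script that plots it.

    results: list of (kernel_label, elapsed_seconds) in release order.
    """
    data_lines = ["# kernel elapsed_seconds"]
    for label, seconds in results:
        data_lines.append(f"{label} {seconds:.2f}")
    data = "\n".join(data_lines) + "\n"

    script = "\n".join([
        "set terminal png",
        f'set output "{png_path}"',
        'set xlabel "kernel"',
        'set ylabel "elapsed time (s)"',
        "set xtics rotate",
        # Column 0 is the row number; xtic(1) labels each point with column 1.
        f'plot "{data_path}" using 0:2:xtic(1) with linespoints title "kernbench"',
    ]) + "\n"
    return data, script


# Made-up numbers purely for illustration.
sample = [("build-1", 101.5), ("build-2", 99.25), ("build-3", 104.0)]
data, script = gnuplot_inputs(sample)
print(data, end="")
print(script, end="")
```

Writing the two strings to kernbench.dat and a script file, then running gnuplot on the script, should produce the graph.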
This is awesome by jnelson4765 · 2005-06-05 03:54 · Score: 5, Insightful

But it can't catch everything - the 1394 bus was screwed in 2.6.11. There are a lot of regressions that show up - and even that healthy cluster of systems will not show every problem.

Sound issues? Older network and SCSI cards? There are a lot of drivers that break, and no one notices it because there is nobody with the hardware testing the -rc or -mm kernels.

Wouldn't it make more sense to package these tools for someone to install on their collection of oddball equipment, and assist in the debugging/testing?

Where's the ARM, MIPS, and SH?

--
Why can't I mod "-1 Idiot"?
1. Re:This is awesome by Meshach · 2005-06-05 04:12 · Score: 5, Insightful
  
  But it can't catch everything...
  But that is not the point of automated testing. As a member of a QA team that is developing automated tests, I get comments like that every day.
  
  Automated tests are not intended to catch everything or test strange permutations of pre-conditions. Their purpose is to provide a mechanism for verifying that a build satisfies the basic requirements of the project.
  
  More exotic configs need to be tested manually as usual, but automated tests can provide a "failsafe" just in case a basic part of the build is broken.
  
  --
  "Maybe this world is another planet's hell"
  Aldous Huxley
ARM Linux has something similar by kyllikki · 2005-06-05 03:54 · Score: 5, Informative

ARM Linux has had something similar in Kautobuild for some time.

Although the testing and building is limited to the ARM platform.

The site also has a who's who that's worth looking at ;-)
Re:Why has it taken so long? by Anonymous Coward · 2005-06-05 03:54 · Score: 2, Insightful

Bitkeeper.
Re:Within 15 Minutes? WTF by DigiShaman · 2005-06-05 03:57 · Score: 3, Insightful

Sounds like the solution to this problem is clear. Always use the second-to-latest kernel released. Stay away from the new one until it's fully tested to your satisfaction.

--
Life is not for the lazy.
Presumably... by Kjella · 2005-06-05 03:57 · Score: 4, Insightful

...the cross-platform, cross-hardware part? Setting up one machine to build automatically is easy. Setting up a whole bunch of them (all unique, read: administration nightmare) and tying them together into one system, that's quite a bit of work.

Kjella

--
Live today, because you never know what tomorrow brings
1. Re:Presumably... by oxfletch · 2005-06-05 04:10 · Score: 5, Informative
  
  Indeed. The automation system I wrote is just a wrapper around an internal harness called ABAT that has a massive amount of work behind it. If systems crash it can detect that, power cycle them, etc.
  
  Going from 90% working to 99.9% working is frigging hard. I had all this working 3-6 months ago, but the results weren't good enough quality to be published. Several people internally put a massive amount of work into improving the quality and stability of the harness.
2. Re:Presumably... by Bob_Robertson · 2005-06-05 08:22 · Score: 2, Insightful
  
  I don't remember who said it first:
  
  The first 90% takes 10% of the time.
  
  The last 10% takes 90% of the time.
  
  I expect one could substitute "money", "labor", "effort" for "time" in the above.
  
  Bob-
  
  --
  The Ludwig von Mises Institute. The reasoning individual's economics
Related projects at OSDL by anandpur · 2005-06-05 04:00 · Score: 2, Informative

Related projects at OSDL
http://osdl.org/projects/26lnxstblztn/results/
http://developer.osdl.org/cherry/compile/
Re:Within 15 Minutes? WTF by doshell · 2005-06-05 04:01 · Score: 3, Insightful

"Release" in the open source world has a broader sense than in commercial software. In open source not all "released" versions are meant for general public consumption; they include unstable versions targeted mostly at developers, so that severe issues can be detected and patched quickly.

Taking this into account, I believe this is meant to catch bugs mainly in nightly (unstable) builds and release candidates, not in "final" versions (those should, at least in theory, have no serious bugs left around as the latter have already been eradicated from release candidates).

--
Score: i, Imaginary
Re:Within 15 Minutes? WTF by oxfletch · 2005-06-05 04:02 · Score: 5, Informative

I automatically test every nightly -git snapshot release, so it's fairly well tied in anyway. This also means my heaviest usage of our machines is at night, when most of the (US) developers are asleep.

So it's fairly well tied in already ... and the whole -rc cycle should enable us to catch a lot of stuff.
News Flash by sirReal.83. · 2005-06-05 04:02 · Score: 4, Informative

Red Hat (and probably Novell/SuSe, since they use over one thousand kernel patches) runs a myriad of tests on each of its own kernel builds nightly - and has been doing so for years. On more than just the 3 architectures covered by this test.

That said, pushing tests upstream is a great idea. Just not revolutionary or anything.
Long uptimes by rice_burners_suck · 2005-06-05 04:02 · Score: 4, Interesting

This is a very smart system. The Samba team uses something very similar. The key to finding regressions with this method is to create tests for every piece of functionality, and to integrate it with the rest of the testing suite, so that each function of the kernel will be continuously tested. For new features, it is preferable to create these tests as the features are being coded. For existing millions of lines of code, it is necessary for some brave souls to go in and create these tests.
I hope they are using code from the Linux testing suite. That piece of work has already formed a nice set of tests. Also, I hope that the kernel is automatically built with many different combinations of options. And with time, I hope this will become better. The more tests, with the more hardware configurations, with the more kernel configurations, with the more types of input data (including many imaginative forms of incorrect input data to test that the kernel handles it gracefully and thwarts attacks based on such methods), the better quality we will have in the kernel, and it is likely that Linux will be unmatched in quality, stability, efficiency (well, maybe not efficiency necessarily), and long uptimes.
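
To get a feel for how quickly "many different combinations of options" grows, here is a small sketch that enumerates a build matrix. The machine names are hypothetical, and the two options are picked only as examples; nothing here reflects how the actual test system chooses its configurations.

```python
from itertools import product


def config_matrix(machines, options):
    """Yield one job per machine and per on/off combination of the options."""
    for machine in machines:
        for values in product(("y", "n"), repeat=len(options)):
            yield {"machine": machine, "config": dict(zip(options, values))}


machines = ["x86-box", "ppc64-box"]          # hypothetical machine names
options = ["CONFIG_SMP", "CONFIG_PREEMPT"]   # example kernel options

jobs = list(config_matrix(machines, options))
print(len(jobs), "jobs")
```

Every extra on/off option doubles the number of builds per machine, so an exhaustive matrix is only practical for a handful of options; beyond that, some sampling of configurations is needed.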
through the looking glass... by moviepig.com · 2005-06-05 04:06 · Score: 3, Funny

With an automated test suite, what happens when a class of bug is discovered to be untested-for? Presumably, the suite is modified to detect it. Then, is the resulting new suite itself subjected to an automated test suite? And, then...[divide-by-zero error...]

--
Seeing bad movies only encourages them. Watch responsibly
1. Re:through the looking glass... by oxfletch · 2005-06-05 04:15 · Score: 4, Informative
  
  There is indeed an internal self-test suite on the harness. It's not desperately sophisticated, and I wouldn't dare show it to anyone ;-) However, it does catch a lot of stupid bugs. It requires some manual intervention/inspection to work.
  
  Plus, there's a separate development grid where we test new test-harness code before it's put onto the production grid.
2. Re:through the looking glass... by moviepig.com · 2005-06-05 05:34 · Score: 2, Funny
  
  You're not looking at a divide-by-zero error, but a stack overflow from the infinite recursion.
  You're right, I made a mistake. I shall modify my test suite forthwith... [divide-by-zero error]
  
  --
  Seeing bad movies only encourages them. Watch responsibly
Re:Within 15 Minutes? WTF by Metteyya · 2005-06-05 04:06 · Score: 5, Informative

Because they are nightly builds, that is, versions with patches applied but not yet tested.
Does this mean... by blixel · 2005-06-05 04:09 · Score: 2

Does this mean we'll get back to 2.6.x releases? Instead of a new version of 2.6.x being released as 2.6.x.x every third day?
Safety issues by DruggedBunny · 2005-06-05 04:23 · Score: 5, Funny

Martin Bligh announced this yesterday, running on top of IBM's internal test automation system.

Hope he doesn't fall off and hurt himself.
Re:Why has it taken so long? by teh_cn · 2005-06-05 04:26 · Score: 3, Informative

Mod me troll, but (Free)BSD has had this for years, and not only for the kernel but for world, too.
Wait a minute... by RoLi · 2005-06-05 05:53 · Score: 2

So let me check whether I understood it right:
You say it's "completely useless" because you have to wait 15 minutes when a kernel is released.

And this is modded "insightful".
Re:Why has it taken so long? by VStrider · 2005-06-05 06:44 · Score: 2, Funny

They had to. There isn't anyone left to do the testing.

--
VStrider.
Furthermore, it prevents regressions by xant · 2005-06-05 07:01 · Score: 3, Insightful

Reliable, repeatable testing is a great way to prevent fixes in one area from causing bugs in another. When I fix A, I generally only test A manually. I don't test every other conceivable code path, even though my fix for A might well impact them.

An automated test for B will catch regressions caused by my fix in A, making it harder to backslide. Backsliding is very expensive because bugs are far removed from their cause. If an automated test sees that changes in A caused a regression in B, the cause is immediately obvious.

--
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Re:Within 15 Minutes? WTF by digitalunity · 2005-06-05 07:10 · Score: 3, Insightful

Ummm...

If everyone did this, the newest kernels would never get tested. I think it is important that we have a diverse range of users using new, almost new, and older but well tested kernels.

--
You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
That is what aegis does by nietsch · 2005-06-05 08:43 · Score: 2, Interesting

http://aegis.sf.net/
and it can do a lot of other things too, like making sure that each change has an accompanying test and that all tests pass before anybody else is bothered with that change.

The biggest downside of aegis (as I see it) is that it needs to run on a central development server; it is not server-based like CVS or the others (though it has a CVS-like interface for reading). But OTOH, would it be so hard to have the kernel developers log into a central compile farm where the Linux kernel is developed?

--
This space is intentionally staring blankly at you
Re:Well, this time I am really unhappy! by posternutbaguk · 2005-06-05 10:48 · Score: 2, Insightful

Current 2.6.x kernels very unstable? Linux does not have any stable version? Obviously you haven't even used Linux in the last year or so.

Testing a product to make it better doesn't mean the product is bad to start with. Some code has higher aspirations than that.