Test Coverage Leading You Astray?
An anonymous reader writes "Are your test coverage measurements leading you astray? Test coverage tools bring valuable depth to unit testing, but they're often misused. This article takes a closer look at what the numbers on the coverage report really mean, as well as what they don't. It then suggests three ways you can use your coverage to ensure code quality early and often."
Who needs testing? Doesn't everyone's code work perfectly on the first ru
Segmentation fault
you could stab the CTO who suggests that you code the next project with java
Code flow is just as important as code coverage. If code in section 1 is executed in unit test 1, and code in section 2 is executed in unit test 3, there needs to be a unit test which executes both. If sections of code have side effects on other sections, every combination has to be tested.
The idea that you can input some values and expect useful output from a function is nice in theory. Perhaps in some very limited mathematics oriented programs where the inputs must lead to a nice answer, but real world applications may end up manipulating more than just the input data.
Can you test that the LCD has refreshed at the inputted rate? Can you verify that the input data was correctly injected into the database just by checking the output of the function?
Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.
It's a pity the submitter didn't provide a short paragraph review of the article rather than just copy-paste the abstract.
Anyway, having had a quick look, it is all about Java.
I'd love to hear from anyone who can recommend test coverage tools for C (i.e. non-object-oriented). I think that just about all of the articles I've ever read about testing methodologies have been exclusively about object-oriented patterns, and pretty much only Java or .NET.
Object-oriented techniques are a good tool, but not the right tool for every job...
Bloody cricket.
Real Daleks don't climb stairs - they level the building.
Three types of code coverage are required for safety critical airline applications:
1) Line Coverage - Has every line been tested
2) Branch Coverage - Has every branch been tested
3) Boolean Coverage - Is EVERY possibility on a truth table for each logical operator explicitly defined
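To make the three levels concrete, here is a minimal Java sketch. The method `fee` and its parameters are invented for illustration and come from no certification document; the only point is how many inputs each level demands for one two-condition decision.

```java
// Hypothetical example: fee(), member and weekend are made up for illustration.
final class CoverageLevels {

    static int fee(boolean member, boolean weekend) {
        int fee = 10;
        if (member || weekend) {
            fee = 5;
        }
        return fee;
    }

    public static void main(String[] args) {
        // Line coverage: this single call executes every line of fee().
        System.out.println(fee(true, false));

        // Branch coverage: one more call makes the decision come out false.
        System.out.println(fee(false, false));

        // Boolean (truth-table) coverage: the remaining two rows.
        // (false, true) is the only row where `weekend` alone decides the result.
        System.out.println(fee(false, true));
        System.out.println(fee(true, true));
    }
}
```

One call is enough for line coverage, two for branch coverage, and four for the full truth table. If `weekend` were accidentally dropped from the condition, only the (false, true) row would notice.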
These tests alone don't certify that the code is ready for an airplane and that it is indeed "bug free." My software engineering professor said it best when he stated that you can only prove the existence of bugs; you cannot prove the non-existence of bugs. These guidelines, as adopted by the FAA for the certification of safety-critical code, don't prove the non-existence of bugs, but they do go a long way towards proving the existence of many bugs and provide a MINIMUM standard to which code must be exercised before being allowed into an airplane.
Software engineering is a science; methodology has been pioneered to help us ENGINEER the software we develop to be as defect-free as we know how to make it. As in other disciplines of engineering, there will always be things not yet quantified. Take architecture, for example: an architect would design a bridge to withstand an earthquake of a specific magnitude and winds of a specific speed. Does that mean the bridge is safe? What if the materials used weren't rated for the temperature range needed for the locale, etc...?
As much as we do to ensure quality, there is no silver bullet. The company I interned at, which will remain nameless, made a multi-function navigational display for Air Force One. It rebooted during a touch and go at 40 degrees Fahrenheit. Wasn't it tested, you ask? Of course it was. It was tested at -40 degrees and 140 degrees, but the timing on one of the buses was off at 40 and the hardware watchdog took it into a reboot at a very critical time. It was DO-178B Level A certified and had 100% code coverage, of course, but there will always be bugs. Don't trust tools to tell you otherwise, because you can never prove the non-existence of bugs.
(For those who don't know, a touch and go is where the plane starts landing and takes off again)
Bit of a strange subject for slashdot, eh?
The technique of unit testing is good and catches many errors, and code coverage is a very good companion for finding out what you haven't tested. Unlike what some posters above have indicated, this is generic; it has nothing to do with the programming paradigm or the programming language. There are two major problems, however.
1. With unit testing, you're only testing that the unit does what you expect it to, given its interfaces (the API, global variables, whatever...). If a bug is a misunderstanding of the specs, you won't catch it unless the person who wrote the unit test is the one who wrote the specs.
2. You won't discover errors in situations you haven't tested for, and if the code is written poorly enough, it will still give you very good coverage numbers. Example: code that has no error handling whatsoever, and a test suite that doesn't subject it to error situations.
These problems don't make unit testing and code coverage analysis bad. It's far better than not even trying. But you have to be aware of them and scrutinise the test suite to see what it *doesn't* test, especially if code coverage numbers are really high.
Just executing a line of code or a branch (whilst running a test) does not imply that you are testing that code.
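A small Java sketch of that point, with everything in it invented for illustration: `max` is deliberately buggy, the first "test" touches both branches without checking anything, and only the second one would catch the bug.

```java
// Hypothetical example: a buggy method plus two styles of "test".
final class ExecutedNotTested {

    // Deliberately wrong: should return the larger of a and b.
    static int max(int a, int b) {
        if (a > b) {
            return b; // bug: should be `return a`
        }
        return b;
    }

    // Runs both branches, so a coverage tool reports every line and branch
    // as covered, yet nothing about the results is ever checked.
    static void coverageOnlyTest() {
        max(2, 1);
        max(1, 2);
    }

    // Same inputs, but the results are actually compared to expectations.
    static boolean assertingTest() {
        return max(2, 1) == 2 && max(1, 2) == 2;
    }

    public static void main(String[] args) {
        coverageOnlyTest();
        System.out.println("asserting test passed: " + assertingTest());
    }
}
```

Both tests produce identical coverage numbers; only the second one fails on the broken branch.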
Paid Q&A/Research
Note that DO-178B requires MCDC (Modified Condition Decision Coverage) for level A software (check DO-178B page 74).
MCDC requires that "every point of entry and exit in the program has been invoked at least once, every condition in a decision in the program has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions" (Miller and Chilenski).
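As a rough illustration of the "independently affect" clause only (not of entry/exit coverage, and not of how any certified tool works), the sketch below checks whether a set of test vectors contains, for each condition, a pair that differs in just that condition and flips the outcome. The decision `a && (b || c)` and all names are assumptions made for the example.

```java
// Hypothetical sketch of the independence check in the quoted MCDC definition.
final class McdcSketch {

    // Example decision with three conditions: a && (b || c).
    static boolean decision(boolean[] c) {
        return c[0] && (c[1] || c[2]);
    }

    // True if two vectors differ only in condition i and give different outcomes.
    static boolean showsIndependence(boolean[][] vectors, int i) {
        for (boolean[] x : vectors) {
            for (boolean[] y : vectors) {
                if (x[i] == y[i]) {
                    continue;
                }
                boolean othersEqual = true;
                for (int k = 0; k < x.length; k++) {
                    if (k != i && x[k] != y[k]) {
                        othersEqual = false;
                    }
                }
                if (othersEqual && decision(x) != decision(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    // True if every condition has such a pair somewhere in the vector set.
    static boolean coversAllConditions(boolean[][] vectors) {
        for (int i = 0; i < vectors[0].length; i++) {
            if (!showsIndependence(vectors, i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[][] four = {
            {true, true, false}, {false, true, false},
            {true, false, false}, {true, false, true}
        };
        boolean[][] two = {{true, true, true}, {false, false, false}};
        System.out.println("four vectors: " + coversAllConditions(four));
        System.out.println("two vectors:  " + coversAllConditions(two));
    }
}
```

The two-vector set takes the decision both ways, which satisfies branch coverage, but it fails the independence check because all three conditions change at once. The four-vector set passes with far fewer cases than the eight rows of the full truth table.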
I was simplifying the process to make a point; they did that too. I actually felt sorry for the verification people who had to sign off on a screen capture that every pixel was correct. That's brute force testing of every possible combination of inputs, for over a dozen analog and digital inputs, which results in thousands, if not tens of thousands, of screenshots to be hand verified as correct for those inputs. They also had to verify that the failure conditions were all displayed accurately, since old data can be much worse than no data while flying a plane, and so on. The point being that even with all that process, there are still bugs which have not been tested for.
If a database has a recent data caching mechanism, all bets are off as to whether the recently added data was truly added to the data file or whether the data is just hanging around in the cache for quick retrieval. If the system goes down before the cache flushing thread comes around, that data is long gone.
Unfortunately, the unit test shows success on the initial data insertion call, and it shows that the data is correct on the verification call. Two correctly behaving tests, but a fundamental bug is lurking.
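A toy Java model of that scenario, offered only as a sketch: `WriteBehindStore` is a made-up stand-in for a database with a write-behind cache, and `crash()` simulates the system going down before the flush.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a database with a write-behind cache.
final class WriteBehindStore {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> dataFile = new HashMap<>();

    // Reports success as soon as the value is in the cache.
    boolean insert(String key, String value) {
        cache.put(key, value);
        return true;
    }

    // Serves recent data from the cache, older data from the "file".
    String read(String key) {
        String cached = cache.get(key);
        return cached != null ? cached : dataFile.get(key);
    }

    // What the background flushing thread would eventually do.
    void flush() {
        dataFile.putAll(cache);
    }

    // Simulated power loss: everything not yet flushed is gone.
    void crash() {
        cache.clear();
    }

    public static void main(String[] args) {
        WriteBehindStore store = new WriteBehindStore();
        System.out.println("insert ok:   " + store.insert("id-1", "row"));
        System.out.println("verify ok:   " + "row".equals(store.read("id-1")));
        store.crash();
        System.out.println("after crash: " + store.read("id-1"));
    }
}
```

The insert check and the read-back check both pass, and neither says anything about durability. A test has to force the failure (here, `crash()` before `flush()`) to see the lurking bug.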
Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.
And the solution is... "holistic" unit tests.
While it's true that unit tests have a hard time making that last little yard (mostly in the form of hardware output, like graphics on the screen or your example), you're not writing your unit tests correctly. It's a rare unit test for me that is the equivalent of checking that adding two numbers works correctly, and while those are useful in development, they very, very rarely ever break later. Pure arithmetic functions are the easiest to write, in general, and they correspondingly have the smallest need for continuous automated testing. (Not zero, of course, just the smallest. And when they do break, boy howdy...!)
In your other example, you ask:
Can you verify that the input data was correctly injected into the database...
(and I cut the rest of this question off as it posits an incorrect approach.)
The answer to this is yes, although you need a good database and a good understanding of how they work. (Not "great", just good.) I have thousands of tests that verify that certain code correctly manipulates the database, and that verify that calling certain webpages correctly manipulates the database. It's only marginally harder than testing a traditional function. The key here is to do everything inside a transaction: perform the task, do your verification, then roll the entire transaction back. Then it doesn't affect your database (which should normally be the "test" database, of course), and as a side effect, under all but the "READ-UNCOMMITTED" transaction level, it allows you to have any number of copies of the same test(s) running against the exact same database.
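The shape of such a test can be sketched in Java as follows. This is not the poster's code: `TxTable` is a made-up in-memory stand-in for a table with transactions (it only undoes appended rows), and `registerUser` is a hypothetical piece of code under test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database table with transactions.
final class TxTable {
    private final List<String> rows = new ArrayList<>();
    private int startOfTx = -1;

    void begin() {
        startOfTx = rows.size();
    }

    void insert(String row) {
        rows.add(row);
    }

    boolean contains(String row) {
        return rows.contains(row);
    }

    int count() {
        return rows.size();
    }

    // Discards every row added since begin().
    void rollback() {
        if (startOfTx < 0) {
            throw new IllegalStateException("no transaction in progress");
        }
        rows.subList(startOfTx, rows.size()).clear();
        startOfTx = -1;
    }
}

final class RollbackTestSketch {

    // Hypothetical code under test: stores a trimmed user name.
    static void registerUser(TxTable users, String name) {
        users.insert(name.trim());
    }

    // Perform the task, verify, then roll back no matter what happened.
    static boolean testRegisterUser(TxTable users) {
        users.begin();
        try {
            registerUser(users, "  alice ");
            return users.contains("alice");
        } finally {
            users.rollback();
        }
    }

    public static void main(String[] args) {
        TxTable users = new TxTable();
        users.insert("bob");
        System.out.println("test passed: " + testRegisterUser(users));
        System.out.println("rows left:   " + users.count());
    }
}
```

Against a real database over JDBC the same shape would use `Connection.setAutoCommit(false)` before the work and `Connection.rollback()` in the `finally` block, which is what lets the test be rerun without cleaning up.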
I can't imagine writing a distributed database-based application without such tests. Well, I can, but it's no fun.
In a lot of database-based applications, since the database is the application, this goes a long way toward testing the entire app.
Your unit tests ought to cover everything but the hardware output, which is more the exception than the rule.
Part of the problem is the number of APIs that exist with no thought for testing, making it seem as if unit testing them is impossible. For example, a lot of GUI toolkits are a major pain in the ass because it's difficult or impossible to fully simulate pressing a key in them and then processing the event loop exactly once, after which you will see what happened. This is a limitation of the toolkit, though, not unit testing, one I fervently hope will someday be eliminated after my whining on Slashdot catches the eye of one of the GTK developers or something.
In other cases, you have to do a little work, but it can be done. We use Apache::ASP, and it ships with a little Perl script that can run an ASP page outside of the webserver via a command line. Still not terribly useful, but I was able to take that script and turn it into something that accepts multiple requests over a pipe, and wrap another Perl module around it that manages the connection to make it easy to use. Now, in my unit tests, calling a web page looks just like calling a function. Unfortunately, the rollback idea doesn't trivially work here, but I have some other things in place to help with this. The upshot is my unit tests include whether entire web pages work. This is some damned fine testing, and it's caught plenty of bugs long before they get out to the user.
Sure, the very periphery of some systems is hard to reach, but the vast majority of any system is perfectly manageable.
She was a fast machine
She kept her processor clean
She was the best damn computer I had ever seen
She had Bugzilla eyes
Telling me no lies
Knockin' me out with those APIs
Taking more than her share
Had me fighting for air
She told me to com(pil)e but I was already there
'Cause the walls start shaking
The game was Quaking
My mind was aching
And we were make-ing it and you -
Test me all night long
Yeah you test me all night long
..cuz I read "Test Coverage Leading You to Ashtray".
Test coverage efforts are more likely to drive people to drink, IMO.
--- The American Way of Life is not a birthright. Hell, it's not even sustainable.
Obviously your co-worker was a dork when it came to handling environmental issues (file locking, permissions, etc.) but I can see where his attitude would be helpful to some of the programmers I've met. It is far too common in this day of virtual machine environments and structured exception handling for folks to write in an error handler that doesn't do ANYTHING with the error, including propagate it up if it's not mitigated. In other words, many programmers write exception handling code that simply EATS the error and does nothing useful with it. They figure that showing errors to the user makes them look less than competent, so they would rather hide problems. This is a pet peeve of mine because it causes so much extra debugging.
In business applications, I would rather have no error handling code at all than incorrect error handling. An outright crash is far more useful for troubleshooting than an app that quietly forgoes saving data or even corrupts it because of improper exception handling.
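A minimal Java sketch of the difference, with all names invented for the example: `write` is a hypothetical operation that always fails, `saveQuietly` eats the error, and `saveLoudly` propagates it with its cause attached.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example of eating an exception versus propagating it.
final class ExceptionEating {

    // Stand-in for a write that fails, e.g. because the disk is full.
    static void write(String record) throws IOException {
        throw new IOException("disk full while writing " + record);
    }

    // The anti-pattern: the caller is told nothing and the data is silently lost.
    static void saveQuietly(String record) {
        try {
            write(record);
        } catch (IOException e) {
            // swallowed: no log, no rethrow
        }
    }

    // If the error cannot be handled here, pass it up with the cause intact.
    static void saveLoudly(String record) {
        try {
            write(record);
        } catch (IOException e) {
            throw new UncheckedIOException("could not save " + record, e);
        }
    }

    public static void main(String[] args) {
        saveQuietly("order-1"); // returns normally although nothing was saved
        try {
            saveLoudly("order-1");
        } catch (UncheckedIOException e) {
            System.out.println("failure surfaced: " + e.getCause().getMessage());
        }
    }
}
```

A test suite that only exercises the happy path reports both methods as working. Only a test that makes `write` fail shows that the quiet version hides the lost record.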
So your coworker, while less than enlightened, would at least avoid my wrath on that count. It's easy to demonstrate the error of his ways. Demonstrating the error in exception eating is much more difficult because you often won't find those instances until that person is off the project or until it's too late to prevent impact to the development schedule.
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
package com.vanward.coverage.example01;

public class PathCoverage {
    public String pathExample(boolean condition) {
        String value = null;
        if (condition) {
            value = " " + condition + " ";
        }
        return value.trim();
    }
}
and the code was executed once with condition equal to TRUE. It then reported 100% coverage!
How is that 100% coverage? If condition was FALSE then a completely different path through the instructions would have been executed!
I would think it should have reported it as 50%. There are 2 different paths through the code and only one was executed.
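The likely explanation is that the tool counts lines, not paths. The `if` has no `else`, so the call with `true` already executes every line, and the `false` path adds no new ones. The sketch below reuses the method body from the post inside a demo class (the class name and the static modifier are changes made for the example) and runs both paths.

```java
// Same logic as pathExample in the post, wrapped in a hypothetical demo class.
final class PathCoverageDemo {

    static String pathExample(boolean condition) {
        String value = null;
        if (condition) {
            value = " " + condition + " ";
        }
        return value.trim(); // value is still null when condition is false
    }

    public static void main(String[] args) {
        // Path 1: executes every line, so line coverage is already complete.
        System.out.println(pathExample(true));

        // Path 2: executes no additional lines, but fails at value.trim().
        try {
            pathExample(false);
        } catch (NullPointerException e) {
            System.out.println("second path throws NullPointerException");
        }
    }
}
```

So a line-based report can show 100% while half of the paths, including the one that crashes, have never run.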