Slashdot Mirror


Test Coverage Leading You Astray?

An anonymous reader writes "Are your test coverage measurements leading you astray? Test coverage tools bring valuable depth to unit testing, but they're often misused. This article takes a closer look at what the numbers on the coverage report really mean, as well as what they don't. It then suggests three ways you can use your coverage to ensure code quality early and often."

48 comments

  1. Testing? by B4D BE4T · · Score: 5, Funny

    Who needs testing? Doesn't everyone's code work perfectly on the first ru
    Segmentation fault

    1. Re:Testing? by big ben bullet · · Score: 1

      Reminds me of a coworker's job on file access in VB6. I think it had something to do with parsing and writing config files...

      When I asked him where his error-handling routines were, he replied: "I don't program errors!"

      I know, nothing to do with unit testing, but still worth a mention.

  2. or.. by Anonymous Coward · · Score: 0

    You could stab the CTO who suggests that you code the next project in Java.

  3. code flow is also important by joebebel · · Score: 2, Informative

    Code flow is just as important as code coverage. If code in section 1 is executed in unit test 1, and code in section 2 is executed in unit test 3, there needs to be a unit test that executes both. If sections of code have side effects on other sections, all combinations have to be handled.
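
    A minimal Java sketch of the point above, using a hypothetical compute() method (not from the article): two sections share one variable, and two tests each cover one section. A coverage tool would report 100% line and branch coverage, yet the untested combination divides by zero.

```java
// Hypothetical example: two sections, each fully covered by a separate test,
// that fail only when a single call runs both of them.
class CodeFlowExample {
    static int compute(boolean applyOverride, boolean divide) {
        int divisor = 2;
        if (applyOverride) {
            divisor = 0;                // section 1: side effect on shared state
        }
        int result = 10;
        if (divide) {
            result = result / divisor;  // section 2: depends on that state
        }
        return result;
    }

    public static void main(String[] args) {
        // Test A runs section 1 only, test B runs section 2 only.
        // Together they give 100% line and branch coverage of compute().
        System.out.println(compute(true, false)); // 10
        System.out.println(compute(false, true)); // 5

        // The combination neither test exercised.
        try {
            compute(true, true);
        } catch (ArithmeticException e) {
            System.out.println("both sections: " + e.getMessage());
        }
    }
}
```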

    1. Re:code flow is also important by LardBrattish · · Score: 1
      Not just that, but also code that is called from different places in subtly different ways. You end up saying "Yep, covered that routine" only to have it go bang when the user reaches the code by some obscure route...

      That would be embarrassing ;)

      --
      What are you listening to? (http://megamanic.blogetery.com/)
    2. Re:code flow is also important by yermoungder · · Score: 2, Insightful

      But for any reasonably sized program the number of combinations is so high that you can't sit down and write unit tests to execute each and every possibility, either physically or financially.

      What you can do is use tools like PolySpace (www.polyspace.com) to ensure you won't have any array overruns, out-of-range errors, accesses through dangling pointers, etc. You can then run unit tests on the 'working' code in working scenarios to ensure it does what it should.

  4. Unit tests are too simplistic for many apps by BadAnalogyGuy · · Score: 2, Informative

    The idea that you can input some values and expect useful output from a function is nice in theory. Perhaps that works in some very limited, mathematics-oriented programs where the inputs must lead to a nice answer, but real-world applications may end up manipulating more than just the input data.

    Can you test that the LCD has refreshed at the inputted rate? Can you verify that the input data was correctly injected into the database just by checking the output of the function?

    Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.

    1. Re:Unit tests are too simplistic for many apps by Anonymous Coward · · Score: 1, Funny

      by BadAnalogyGuy (945258): Functions lie like dogs

      YHBT

    2. Re:Unit tests are too simplistic for many apps by dangermouse · · Score: 1

      The idea that you can input some values and expect useful output from a function is nice in theory. Perhaps that works in some very limited, mathematics-oriented programs where the inputs must lead to a nice answer, but real-world applications may end up manipulating more than just the input data.


      You're right. And in such instances, if all you're doing is checking output, you don't really understand unit testing, a key tenet of which is testing the code unit in isolation.


      Can you test that the LCD has refreshed at the inputted rate? Can you verify that the input data was correctly injected into the database just by checking the output of the function?


      Depends. Most of the time you're not writing the actual driver for the LCD or for the DB. You want to test your code, not code somebody else wrote; that's their job. So you mock up the LCD driver or the DB driver, and you verify with some unit tests that your code is making the correct calls to those drivers with the correct parameters.


      This works whenever you need to verify that a function is interacting with other components in the way that it should. You prove that if all other components behave as expected, your function works for the tested input. Then if your function works with the mock components but not with the production components, you know that either those components are broken or your understanding of them is incorrect.


      You still have to do integration tests and functional system tests, because nobody cares if your function works if the system does not, but proper unit testing can drastically reduce the time and effort involved in such integration and system testing because you don't have to try to hit every code path from the system boundaries.
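
      A rough Java sketch of the mocking approach described above. LcdDriver, DisplayController and RecordingLcdDriver are hypothetical names invented for illustration, the 120 Hz clamp is an assumed rule, and the mock is hand-written rather than taken from any mocking library. The check is on which calls reached the driver, not on what the function returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver interface standing in for a real LCD driver.
interface LcdDriver {
    void setRefreshRate(int hz);
}

// Code under test: validates the requested rate before passing it on.
class DisplayController {
    private final LcdDriver driver;

    DisplayController(LcdDriver driver) {
        this.driver = driver;
    }

    void configure(int requestedHz) {
        if (requestedHz <= 0) {
            throw new IllegalArgumentException("refresh rate must be positive");
        }
        // Clamp to an assumed hardware maximum of 120 Hz.
        driver.setRefreshRate(Math.min(requestedHz, 120));
    }
}

// Hand-written mock: records every call instead of touching hardware.
class RecordingLcdDriver implements LcdDriver {
    final List<Integer> rates = new ArrayList<>();

    @Override
    public void setRefreshRate(int hz) {
        rates.add(hz);
    }
}

class MockDriverDemo {
    public static void main(String[] args) {
        RecordingLcdDriver mock = new RecordingLcdDriver();
        DisplayController controller = new DisplayController(mock);
        controller.configure(60);
        controller.configure(240);
        // The unit test asserts on the calls the driver received.
        System.out.println(mock.rates); // [60, 120]
    }
}
```

      If the same controller misbehaves against the production driver, the mismatch is between the driver and your understanding of it, which is exactly the diagnosis described above.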

  5. Non-object oriented test tools? by iangoldby · · Score: 4, Interesting

    It's a pity the submitter didn't provide a short paragraph review of the article rather than just copy-paste the abstract.

    Anyway, having had a quick look, it is all about Java.

    I'd love to hear from anyone who can recommend test coverage tools for C (i.e. non-object-oriented). I think that just about all of the articles I've ever read about testing methodologies have been exclusively about object-oriented patterns, and pretty much only Java or .NET.

    Object-oriented techniques are a good tool, but not the right tool for every job...

    1. Re:Non-object oriented test tools? by yermoungder · · Score: 3, Informative

      "I'd love to hear from anyone who can recommend test coverage tools for C..."

      See http://www.polyspace.com/

    2. Re:Non-object oriented test tools? by GejTOO · · Score: 3, Informative

      Here is a list of testing frameworks for several languages.

      http://c2.com/cgi/wiki?TestingFramework

      G.

    3. Re:Non-object oriented test tools? by iangoldby · · Score: 1

      Thanks for that excellent link. On a quick first glance, http://check.sourceforge.net seems like it could fit the bill.

      Cheers

    4. Re:Non-object oriented test tools? by Anonymous Coward · · Score: 0

      All I've got to say is "gcov". Hell, it's mentioned in the GCC manual even. Try it. Or perish!

    5. Re:Non-object oriented test tools? by Nevyn · · Score: 1
      I'd love to hear from anyone who can recommend test coverage tools for C

      You want gcc with "-fprofile-arcs -ftest-coverage", then you can use gcov/ggcov/lcov to produce usable output. See Vstr and And-httpd for examples.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    6. Re:Non-object oriented test tools? by chromatic · · Score: 1

      Stig Brautaset recently wrote about Testing C with Libtap.

    7. Re:Non-object oriented test tools? by iangoldby · · Score: 1

      Thanks.

      Of course, I should have mentioned that I'm using Microsoft Developer Studio, but regardless, I'm looking for a completely cross-platform compiler-independent and build-system-independent test framework.

    8. Re:Non-object oriented test tools? by Nevyn · · Score: 1
      but regardless, I'm looking for a completely cross-platform compiler-independent and build-system-independent test framework.

      For C?! ... yeh, I'm looking for a flying car too. So if you let me know when you find what you want that'll help, as I have a feeling they'll be in the same place.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    9. Re:Non-object oriented test tools? by iangoldby · · Score: 1

      OK, maybe they don't exist, but I was rather hoping for something written using just the standard ANSI C library. Such a framework would be truly cross-platform.

      As far as the build system is concerned, most unit test frameworks I've seen to date rely on autoconf and automake, or Ant, etc. It would be nice to find a framework that doesn't specify the build system at all. Would that be so very difficult?

      Maybe I'm missing something - I've never actually used a unit test framework.

  6. Test coverage is just annoying... by meringuoid · · Score: 5, Funny
    ... they always cancel the stuff I want to watch to make way for it.

    Bloody cricket.

    --
    Real Daleks don't climb stairs - they level the building.
    1. Re:Test coverage is just annoying... by Anonymous Coward · · Score: 1, Funny

      Yeah, it led me astray: bloody Poms winning the Ashes.

    2. Re:Test coverage is just annoying... by Anonymous Coward · · Score: 0

      I agree

    3. Re:Test coverage is just annoying... by caluml · · Score: 2, Insightful

      What can be better than cricket?!

  7. DO-178B by nonsequitor · · Score: 5, Interesting

    Three types of code coverage are required for safety-critical avionics applications:

    1) Line Coverage - Has every line been tested?
    2) Branch Coverage - Has every branch been tested?
    3) Boolean Coverage - Has EVERY row of the truth table for each logical operator been exercised?
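
    To make the three criteria concrete, here is an illustrative Java sketch around a hypothetical gearWarning() decision (invented for this example, not real avionics code). It lists the inputs each criterion would demand for a single if with two conditions.

```java
// Illustrative only: one decision with two conditions, and the test inputs
// each of the three listed criteria would demand for it.
class CoverageCriteria {
    static String gearWarning(boolean gearUp, boolean lowAltitude) {
        String message = "OK";
        if (gearUp && lowAltitude) {
            message = "GEAR";
        }
        return message;
    }

    // 1) Line coverage: one test that enters the if-body runs every line.
    static final boolean[][] LINE_TESTS = { {true, true} };

    // 2) Branch coverage: the decision must also evaluate to false once.
    static final boolean[][] BRANCH_TESTS = { {true, true}, {false, true} };

    // 3) Truth-table coverage: every combination of the two conditions.
    static final boolean[][] TRUTH_TABLE_TESTS = {
        {false, false}, {false, true}, {true, false}, {true, true}
    };

    public static void main(String[] args) {
        for (boolean[] t : TRUTH_TABLE_TESTS) {
            System.out.println(t[0] + " && " + t[1] + " -> " + gearWarning(t[0], t[1]));
        }
    }
}
```

    Note that with Java's short-circuiting &&, the second condition is never evaluated when the first is false, so the {false, false} and {false, true} rows drive the same code path.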

    These tests alone don't certify that the code is ready for an airplane and that it is indeed "bug free." My software engineering professor said it best when he stated: you can only prove the existence of bugs, you cannot prove the non-existence of bugs. These guidelines, as adopted by the FAA for the certification of safety-critical code, don't prove the non-existence of bugs, but they do go a long way towards proving the existence of many bugs, and they provide a MINIMUM standard to which code must be exercised before being allowed into an airplane.

    Software Engineering is a science: methodology has been pioneered to help us ENGINEER the software we develop to be as defect-free as we know how to make it. As in other engineering disciplines, there will always be things not yet quantified. Take architecture, for example: an architect would design a bridge to withstand an earthquake of a specific magnitude and winds of a specific speed. Does that mean the bridge is safe? What if the materials used weren't rated for the temperature range needed for the locale, etc.?

    As much as we do to ensure quality, there is no silver bullet. The company I interned at, which will remain nameless, made a multi-function navigational display for Air Force One. It rebooted during a touch-and-go at 40 degrees Fahrenheit. Wasn't it tested, you ask? Of course it was: it was tested at -40 degrees and 140 degrees, but the timing on one of the buses was off at 40 degrees, and the hardware watchdog forced a reboot at a very critical time. It was DO-178B Level A certified and had 100% code coverage, of course, but there will always be bugs. Don't trust tools to tell you otherwise, because you can never prove the non-existence of bugs.

    (For those who don't know, a touch-and-go is where the plane starts landing and takes off again.)

    1. Re:DO-178B by OneManCongaLine · · Score: 3, Interesting

      There are also path coverage (extremely complex and not practical to use in most cases, but for critical subsystems it might come in handy) and Linear Code Sequence and Jump (LCSAJ). There are more, but these two, off the top of my head, are worthy of inclusion in any discussion of coverage. There are a lot of industry-specific standards out there that specify the use of coverage: aerospace has one, vehicle control systems have one, and pharmaceutical and nuclear systems have yet others. Guess which one of these has the _least_ strict standard for coverage? (Hint: Homer's workplace =)

      --
      -Queen of the Kung-Fu fairies
    2. Re:DO-178B by msobkow · · Score: 3, Insightful

      The code was designed, exercised, tested, and executed properly from what you're saying. The display failed due to hardware problems.

      In what way is that hardware failure related to code coverage or any other form of software testing or QA metric?

      --
      I do not fail; I succeed at finding out what does not work.
    3. Re:DO-178B by nonsequitor · · Score: 2, Interesting

      You're right, that was technically a hardware bug, but they fixed it with a software patch to change the bus timings. Not the optimal solution, of course, but a board respin costs much more than a software update. It may not have been the best example, but the point remains: you can't prove the non-existence of bugs. Anyone that's trying to tell you otherwise is either a genius or, more likely, an idiot.

      To answer your question, I was trying to illustrate that the metrics, while a good starting point, are merely metrics and only look for specific types of bugs which occur frequently and have been well quantified. Since that specific problem ultimately ended up categorized as a software bug, the metric being applied didn't have a chance in hell of finding it. And while new metrics could be put in place to find that sort of rare bug, the expense of development would increase accordingly. There then comes a point where a decision has to be made about which metrics to use and how much they cost to implement across the board. This of course cuts into profits, which is why I said regulations like DO-178B are a starting point, but not proof that a device will operate flawlessly under all conditions it may be rated for. It's also the reason not all avionics equipment is certified at Level A: the effort to obtain that certification is much more costly than the effort required to obtain a Level C certification.

      The goal, of course, is to find faster, cheaper ways to meet these minimum requirements, or, if your field is not regulated, to implement best practices which can improve overall quality at a minimum of cost. I made my initial post because, while a lot of people who read Slashdot are very technically adept, there may be some who don't realize that 100% code coverage is not the same as bug-free code or, as in my example, a defect-free product.

    4. Re:DO-178B by Anonymous Coward · · Score: 0
      Parent wrote: Software Engineering is a science


      No wonder your planes had the reboot anecdote you described. Let's try this more slowly:

      • "Computer Science" is a science - with all the research and non-practicality that goes with it - (example: natural language processing)
      • "Programming" is very much a manufacturing discipline, and can be managed very much like any other - (example: coding another DVD player from the spec)
      • "Software Engineering" is (except at parent company's workplace) an engineering discipline which involves dimensions of real-world problem solving not in either of the other two.

      If you confuse these and try to treat Software Engineering as an academic research project or as a cookbook of test procedures, I really don't want to get in your planes.
    5. Re:DO-178B by nonsequitor · · Score: 1

      Too late: ever flown in something made by Boeing or Airbus? I've worked for subcontractors for both. Also, you captured my intent when you corrected me; I should know better than to post before I've had coffee in the morning.

    6. Re:DO-178B by nonsequitor · · Score: 1

      Note to self: don't write code for airplanes or post to slashdot before morning coffee. ^_^

    7. Re: DO-178B by gidds · · Score: 1
      you can only prove the existence of bugs, you cannot prove the non-existence of bugs

      You sure that's what he said? Program proofs can indeed prove the correctness of programs (i.e. the non-existence of bugs). It's just that they're hard to do for any significant amount of code.

      The way I heard the quote, it's about testing: "Program testing can at best show the presence of errors but never their absence." (Edsger W. Dijkstra)

      --

      Ceterum censeo subscriptionem esse delendam.

  8. Yes by peterpi · · Score: 2, Funny
    The test coverage led me astray last summer. My boss had a TV near his desk, and most afternoons we'd find ourselves gathered around it following the action. Thankfully he was as much into the game as I was, so it didn't really matter.

    Bit of a strange subject for slashdot, eh?

  9. know what to test for by dresseduptoday · · Score: 4, Insightful

    The technique of unit testing is good and catches many errors, and code coverage is a very good companion for finding out what you haven't tested. Contrary to what some posters above have indicated, this is generic and has nothing to do with the programming paradigm or language used. There are two major problems, however:

    1) With unit testing, you're only testing that the unit does what you expect it to, given its interfaces (the API, global variables, whatever...). If a bug is a misunderstanding of the specs, you won't catch it, unless the person who wrote the unit test is the one who wrote the specs.
    2) You won't discover errors in situations you haven't tested for, and if the code is written poorly enough, it'll give you very good coverage numbers. For example: code that has no error handling whatsoever, and a test suite that doesn't subject it to error situations.

    These problems don't make unit testing and code coverage analysis bad. It's far better than not even trying. But you have to be aware of them and scrutinise the test suite to see what it *doesn't* test, especially if code coverage numbers are really high.
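
    A small Java illustration of the second problem, using a hypothetical parsePort() helper: one happy-path test executes every line, so line coverage is 100%, yet malformed or out-of-range input is never tried.

```java
// Hypothetical unit with no error handling at all.
class HappyPathOnly {
    static int parsePort(String text) {
        int port = Integer.parseInt(text.trim());
        return port;
    }

    public static void main(String[] args) {
        // The whole "suite": one happy-path call that runs every line above.
        System.out.println(parsePort("8080")); // 8080

        // Inputs the suite never tries.
        for (String bad : new String[] {"http", "", "99999"}) {
            try {
                // "99999" parses without complaint even though it is not a valid TCP port.
                System.out.println(bad + " -> " + parsePort(bad));
            } catch (NumberFormatException e) {
                System.out.println(bad + " -> NumberFormatException");
            }
        }
    }
}
```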

  10. The Emperor has very few clothes by ribuck · · Score: 3, Informative
    Test coverage measurement is a really dilute quality assurance tool. It can show you parts of your code that are untested, but it doesn't say anything about whether the other parts of your code are tested.

    Just executing a line of code or a branch (whilst running a test) does not imply that you are testing that code.
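
    A hypothetical Java example of that distinction: the first call below executes every line of average() and would count toward coverage, but only the second call, which checks the result, exposes the integer-division bug.

```java
// Hypothetical unit with a deliberate bug.
class ExecutedNotTested {
    static double average(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length; // bug: integer division happens before widening to double
    }

    public static void main(String[] args) {
        // "Test" 1: runs every line of average() but checks nothing.
        // A coverage tool counts these lines as covered, and this can never fail.
        average(new int[] {1, 2});

        // Test 2: the same call, with an actual check on the result.
        double result = average(new int[] {1, 2});
        if (result != 1.5) {
            System.out.println("FAIL: expected 1.5 but got " + result);
        }
    }
}
```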

    1. Re:The Emperor has very few clothes by jgrahn · · Score: 1
      Test coverage measurement is a really dilute quality assurance tool. It can show you parts of your code that are untested, but it doesn't say anything about whether the other parts of your code are tested.

      Which is fine if you understand that. My fear, if I should use such tools, is that they would produce semi-meaningful figures (say, a percentage) and Management would learn about it, and start measuring progress and performance based on them. Once that happened, these semi-meaningful figures would control me ...

  11. Re:DO-178B - MCDC by Anonymous Coward · · Score: 3, Informative

    Note that DO-178B requires MCDC (Modified Condition/Decision Coverage) for Level A software (check DO-178B page 74).
    MCDC requires that "every point of entry and exit in the program has been invoked at least once, every condition in a decision in the program has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions" (Miller and Chilenski).
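
    A sketch in Java of how the independence clause of that definition can be checked mechanically for a two-condition decision a && b. The class and method names are made up for illustration, and it checks only the "independently affect" requirement, not the entry/exit clauses. The three-test set meets it; the two-test set {TT, FF} gives full branch coverage but does not.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical MC/DC independence checker for a decision with exactly two conditions.
class McdcCheck {
    // The decision under test: condition a (index 0) AND condition b (index 1).
    static final BiPredicate<Boolean, Boolean> DECISION = (a, b) -> a && b;

    static boolean outcome(boolean[] test) {
        return DECISION.test(test[0], test[1]);
    }

    // True if some pair of tests differs only in condition 'index'
    // and produces different decision outcomes.
    static boolean showsIndependence(List<boolean[]> tests, int index) {
        int other = 1 - index; // valid only because there are two conditions
        for (boolean[] t1 : tests) {
            for (boolean[] t2 : tests) {
                boolean onlyTargetDiffers = t1[index] != t2[index] && t1[other] == t2[other];
                if (onlyTargetDiffers && outcome(t1) != outcome(t2)) {
                    return true;
                }
            }
        }
        return false;
    }

    static boolean satisfiesMcdc(List<boolean[]> tests) {
        return showsIndependence(tests, 0) && showsIndependence(tests, 1);
    }

    public static void main(String[] args) {
        List<boolean[]> minimal = List.of(
            new boolean[] {true, true},   // decision true
            new boolean[] {false, true},  // flips a only: decision false
            new boolean[] {true, false}); // flips b only: decision false
        List<boolean[]> branchOnly = List.of(
            new boolean[] {true, true},
            new boolean[] {false, false});
        System.out.println(satisfiesMcdc(minimal));    // true
        System.out.println(satisfiesMcdc(branchOnly)); // false
    }
}
```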

  12. Re:DO-178B - MCDC by nonsequitor · · Score: 1

    I was simplifying the process to make a point; they did that too. I actually felt sorry for the verification people who had to sign off on screen captures showing that every pixel was correct. That's brute-force testing of every possible combination of inputs, for over a dozen analog and digital inputs, which results in thousands, if not tens of thousands, of screenshots to be hand-verified as correct for those inputs. They also had to verify that the failure conditions were all displayed accurately, since old data can be much worse than no data while flying a plane, and so on. The point being that even with all that process, there are still bugs which have not been tested for.

  13. Case in point by Anonymous Coward · · Score: 0

    If a database has a caching mechanism for recently written data, all bets are off as to whether the recently added data was truly added to the data file or is just hanging around in the cache for quick retrieval. If the system goes down before the cache-flushing thread comes around, that data is long gone.

    Unfortunately, the unit test shows success on the initial data insertion call, and it shows that the data is correct on the verification call. Two correctly behaving tests, but a fundamental bug is lurking.
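
    A toy Java model of that scenario, assuming a hypothetical WriteBackStore (not any real database API) whose insert() only writes to an in-memory cache until flush() runs. Both "passing" checks succeed, and a simulated restart shows the data was never persisted.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a store with a write-back cache (hypothetical, not a real database).
class WriteBackStore {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> disk; // stands in for the data file

    WriteBackStore(Map<String, String> disk) {
        this.disk = disk;
    }

    void insert(String key, String value) {
        cache.put(key, value); // lands in the cache only
    }

    String read(String key) {
        String cached = cache.get(key);
        return cached != null ? cached : disk.get(key);
    }

    void flush() { // normally run later by a background thread
        disk.putAll(cache);
        cache.clear();
    }
}

class CacheDemo {
    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        WriteBackStore store = new WriteBackStore(disk);

        store.insert("order-1", "paid");               // "test" 1: the insert succeeds
        System.out.println(store.read("order-1"));     // "test" 2: prints paid

        // Simulate a crash before flush(): the cache is lost, only 'disk' survives.
        WriteBackStore afterRestart = new WriteBackStore(disk);
        System.out.println(afterRestart.read("order-1")); // prints null: the data is gone
    }
}
```

    A test that targets durability has to read through a fresh instance (or after a simulated restart), not through the same cache that served the write.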

  14. Your unit tests are too simplistic; mine aren't by Jerf · · Score: 4, Insightful

    Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.

    And the solution is... "holistic" unit tests.

    While it's true that unit tests have a hard time making that last little yard (mostly in the form of hardware output, like graphics on the screen or your example), you're not writing your unit tests correctly. It's a rare unit test of mine that is the equivalent of checking that adding two numbers works correctly, and while those are useful in development, they very, very rarely break later. Pure arithmetic functions are the easiest to write, in general, and they correspondingly have the smallest need for continuous automated testing. (Not zero, of course, just the smallest. And when they do break, boy howdy...!)

    In your other example, you ask:

    Can you verify that the input data was correctly injected into the database...

    (and I cut the rest of this question off as it posits an incorrect approach.)

    The answer to this is yes, although you need a good database and a good understanding of how it works. (Not "great", just good.) I have thousands of tests that verify that certain code correctly manipulates the database, and that calling certain web pages correctly manipulates the database. It's only marginally harder than testing a traditional function. The key here is to do everything inside a transaction: perform the task, do your verification, then roll the entire transaction back. Then it doesn't affect your database (which should normally be the "test" database, of course), and, as a side effect, under any isolation level other than READ UNCOMMITTED, it allows you to have any number of copies of the same test(s) running against the exact same database.

    I can't imagine writing a distributed database-based application without such tests. Well, I can, but it's no fun.

    In a lot of database-based applications, since the database is the application, this goes a long way toward testing the entire app.

    Your unit tests ought to cover everything but the hardware output, which is more the exception than the rule.

    Part of the problem is the number of APIs that exist with no thought for testing, making it seem as if unit testing them is impossible. For example, a lot of GUI toolkits are a major pain in the ass because it's difficult or impossible to fully simulate pressing a key in them and then processing the event loop exactly once, after which you will see what happened. This is a limitation of the toolkit, though, not unit testing, one I fervently hope will someday be eliminated after my whining on Slashdot catches the eye of one of the GTK developers or something.

    In other cases, you have to do a little work, but it can be done. We use Apache::ASP, and it ships with a little Perl script that can run an ASP page outside of the webserver via a command line. Still not terribly useful, but I was able to take that script and turn it into something that accepts multiple requests over a pipe, and wrap another Perl module around it that manages the connection to make it easy to use. Now, in my unit tests, calling a web page looks just like calling a function. Unfortunately, the rollback idea doesn't trivially work here, but I have some other things in place to help with this. The upshot is my unit tests include whether entire web pages work. This is some damned fine testing, and it's caught plenty of bugs long before they get out to the user.

    Sure, the periphery of some systems is hard to reach, but the vast majority of any system is perfectly manageable.

    1. Re:Your unit tests are too simplistic; mine aren't by BovineSpirit · · Score: 1

      Just out of interest, have you checked out Ruby on Rails' testing? It comes 'out of the box' with all the bits you need to create a test database and break it in all kinds of interesting ways. It automatically rolls the database back to a sane state ready for your next unit test, so you can test your transactions all you like. It also allows you to call web pages as functions and test them, and there are add-ons that will automatically validate your pages using the W3C's validators. It does seem to answer a lot of your issues with web testing, although if you're already knee-deep in ASP code it's too late... Obviously it still relies somewhat on the developer taking testing seriously, but the ease with which you can do this stuff encourages them to 'do the right thing'.

    2. Re:Your unit tests are too simplistic; mine aren't by Jerf · · Score: 1

      Even if I could choose my language, and I chose Ruby instead of Python, I'm still not ready to commit to Ruby on Rails for the size of application I'm talking about. I wouldn't have chosen Apache::ASP, either, but at least I've got it harnessed.

      (Honestly, my problem hasn't been the frameworks, my problem has been people proving that you can write tightly-coupled spaghetti code in any environment if you don't watch them like a hawk. Ruby's no more the answer to that than what I've already got in place.)

  15. Re:DO-178B - MCDC by Dachannien · · Score: 2, Funny

    She was a fast machine
    She kept her processor clean
    She was the best damn computer I had ever seen
    She had Bugzilla eyes
    Telling me no lies
    Knockin' me out with those APIs
    Taking more than her share
    Had me fighting for air
    She told me to com(pil)e but I was already there

    'Cause the walls start shaking
    The game was Quaking
    My mind was aching
    And we were make-ing it and you -

    Test me all night long
    Yeah you test me all night long

  16. Thought this was about smoking.. by Anonymous Meoward · · Score: 1

    ..cuz I read "Test Coverage Leading You to Ashtray".

    Test coverage efforts are more likely to drive people to drink, IMO.

    --
    --- The American Way of Life is not a birthright. Hell, it's not even sustainable.
  17. Pet peeve: Exception "eating" by Da VinMan · · Score: 1

    Obviously your co-worker was a dork when it came to handling environmental issues (file locking, permissions, etc.) but I can see where his attitude would be helpful to some of the programmers I've met. It is far too common, in this day of virtual machine environments and structured exception handling, for folks to write an error handler that doesn't do ANYTHING with the error, not even propagate it up if it's not mitigated. In other words, many programmers write exception handling code that simply EATS the error and does nothing useful with it. They figure that showing errors to the user makes them look less than competent, so they would rather hide problems. This is a pet peeve of mine because it causes so much extra debugging.

    In business applications, I would rather have no error-handling code at all than incorrect error handling. An outright crash is far more useful for troubleshooting than an app that quietly forgoes saving data or even corrupts it because of improper exception handling.

    So your coworker, while less than enlightened, would at least avoid my wrath on that count. It's easy to demonstrate the error of his ways. Demonstrating the error in exception eating is much more difficult because you often won't find those instances until that person is off the project or until it's too late to prevent impact to the development schedule.
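
    A short Java sketch contrasting the two styles; writeToDisk() is a hypothetical stand-in that always fails, as a locked file might. The "eating" version reports success, while the propagating version surfaces the failure with context and keeps the original cause.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class ExceptionEating {
    // Hypothetical persistence call that always fails, e.g. because the file is locked.
    static void writeToDisk(String data) throws IOException {
        throw new IOException("file is locked");
    }

    // Anti-pattern: the error is caught and discarded, so the caller believes the save worked.
    static boolean saveEating(String data) {
        try {
            writeToDisk(data);
        } catch (IOException e) {
            // swallowed: nothing logged, nothing rethrown
        }
        return true;
    }

    // Alternative: nothing can be mitigated here, so propagate with context and the original cause.
    static void savePropagating(String data) {
        try {
            writeToDisk(data);
        } catch (IOException e) {
            throw new UncheckedIOException("could not save document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eating: reported success = " + saveEating("draft"));
        try {
            savePropagating("draft");
        } catch (UncheckedIOException e) {
            System.out.println("propagating: " + e.getMessage() + " (" + e.getCause().getMessage() + ")");
        }
    }
}
```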

    --
    Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
    1. Re:Pet peeve: Exception "eating" by big ben bullet · · Score: 1

      right on the dot

      and that reminds me of something I recently saw on The Daily WTF: The Perils of Error-free Code

      it all comes down to being competent or not

  18. Bug in the tool itself? by jathan · · Score: 1
    The testing tool supposedly saw this code:

    package com.vanward.coverage.example01;

    public class PathCoverage {

        public String pathExample(boolean condition) {
            String value = null;
            if (condition) {
                value = " " + condition + " ";
            }
            // When condition is false, value is still null here.
            return value.trim();
        }
    }

    and the code was executed once with condition equal to TRUE. It then reported 100% coverage!

    How is that 100% coverage? If condition was FALSE then a completely different path through the instructions would have been executed!

    I would think it should have reported it as 50%. There are 2 different paths through the code and only one was executed.
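
    A self-contained Java copy of the method above (renamed class, made static for the demo, same logic) showing why the single test with condition == true is not enough: the other path reaches value.trim() with value still null.

```java
class PathCoverageFalsePath {
    // Same logic as PathCoverage.pathExample above.
    static String pathExample(boolean condition) {
        String value = null;
        if (condition) {
            value = " " + condition + " ";
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // The single test from the article: every line runs, so line coverage is 100%.
        System.out.println(pathExample(true)); // true

        // The untested path: the if-body is skipped and value is still null.
        try {
            pathExample(false);
        } catch (NullPointerException e) {
            System.out.println("false path: NullPointerException");
        }
    }
}
```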

    1. Re:Bug in the tool itself? by Profound · · Score: 1

      There was no else clause, so 100% of the code was covered.

      The bug occurs when one branch is NOT executed (that run will have 75% coverage)

    2. Re:Bug in the tool itself? by Anonymous Coward · · Score: 0

      Cobertura, at least, reports 100% branch coverage on this example as well, since there are no statements in the non-existent "else" clause.

      Of course, "branch coverage" means at least two things when used casually (you know, like when you're hanging out with your friends this weekend).

      [See http://www.javaranch.com/newsletter/200401/IntroToCodeCoverage.html for one non-proprietary intro to the lingo].

      It's at most 50% path coverage, or 50% decision coverage, but 100% "basic branch coverage" and 100% code coverage.

      What we really want is something akin to path coverage. Do any of the free tools calculate that for us?