The Illusion Of High Test Coverage
by Lidor Wyssocky

There are many tools out there for checking your test coverage (in terms of tested code). Unit test coverage is becoming as popular as unit tests themselves, and that is of course a step in the right direction.

Test coverage data helps developers identify missing test cases. It also helps development managers to get a clearer picture of the functional quality of the code.

However, the test coverage metric can also create an illusion.

The problem is that this number is always related to what’s in the code. It can never tell you what is missing from the code. Unfortunately, many bugs are the result of unwritten code.

Here’s the simplest example. You write a method that receives a string representing a person’s id. You forget to write the code that validates the input (a person’s id cannot contain just any character, for example). Because you didn’t think of that, you also didn’t think of writing a test case in which a bad id is passed to this method. The result: 100% coverage, but less than satisfactory quality.
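
A minimal sketch of that scenario in Python. The function name `person_greeting` and the rule that an id may contain only digits are assumptions made up for illustration; the post does not specify either.

```python
def person_greeting(person_id: str) -> str:
    """Hypothetical method from the example. Note what is NOT here:
    no check that person_id is well formed (assumed rule: digits only)."""
    return "Hello, person #" + person_id


def test_person_greeting_valid_id():
    # The only test anyone thought of: a well-formed id.
    assert person_greeting("12345") == "Hello, person #12345"


# The test passes and executes every line of person_greeting,
# so a line-coverage tool would report 100% for this function.
test_person_greeting_valid_id()

# The unwritten validation is invisible to the tool: a malformed id
# is silently accepted instead of being rejected.
print(person_greeting("12#4!"))
```

The coverage report can only count lines that exist. The check that should reject `"12#4!"` was never written, so there is nothing for the tool to flag as untested.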

Now, this might seem obvious to some of you, but I see many cases of high test coverage and missing test cases. Does this mean that coverage data is meaningless? No. It just means you have to go beyond the numbers.

Constantly reviewing unit tests and trying to locate missing cases is a complementary method for ensuring the effectiveness of the tests.

12 Responses to “The Illusion Of High Test Coverage”

  1. Ed Gibbs Says:

    Agreed, unit test coverage reports don’t guarantee high quality code. Still, I’ve found them to be a good supporting practice as I bring a team over to the idea of TDD. They start to get excited about the green bar, and they catch some unseen bugs.

  2. Lidor Wyssocky Says:

    No doubt about it. I just wanted to point out that there is a need to review the tests from time to time to improve their logical coverage (in addition to their code coverage).

    Lidor

  3. Zuzzu Says:

    You are right, and that’s why there are other kinds of tests. Taking your example, a FIT test written by a tester would have spotted the validation issue at the first run!

  4. Tom Harris Says:

    Lidor,

    I recently reviewed a co-worker’s training session on test coverage, and commented on just this topic. But I would go further, and say it more forcefully.

    Test coverage measures that mark 100% coverage when there’s one test for every requirement are sorely misleading. Accepting their numbers as any measure of test set correctness and completeness is just wrong.

    Writing the right number of correct tests for a requirement is the activity of test design, which requires knowledge and experience. Determining whether you’ve got true test coverage cannot be done by a tool (FIT or otherwise); only by a person. An automated test coverage tool does the following, and no more: it identifies requirements (or code) with zero test coverage so that you can design tests for them.

    - Tom

  5. Tom Harris Says:

    Accordingly, a response to Zuzzu as well:

    FIT (for those who haven’t seen it, see http://www.artima.com/weblogs/viewpost.jsp?thread=67373) can indeed “spot the … issue.” What it can’t do is tell you what to do about it. Yes, you’ll know you missed a test, but neither FIT nor any other tool will tell you how to write the test, or whether you really need to write two or three tests.

  6. shmarya Says:

    Test coverage can also help to identify dead code, which can be removed during a refactoring cycle…

  7. Lidor Wyssocky Says:

    Hi Shmarya,

    Again, there is no argument about the benefits of code-based test coverage. The point is that trying to get a sense of the quality of your testing suite based solely on test coverage is misleading. You have to analyze the tests and see what’s missing (logically).

    Lidor

  8. Harry Nieboer Says:

    This reminds me of a fine compiler we used at the Nijmegen university some 20 years ago.
    If a program compiled fine, the program would show the message:

    “None of the errors found” …

  9. Phill Says:

    Test coverage seems to come up as some sort of unit test valhalla-builder every once in a while, and I’ve gotten my response down pat:

    Imagine a function that divides two numbers and returns the result. You test it with the arguments 4 and 2, and you’ve got 100% coverage. Are you comfortable with that?

    That’s why 100% coverage is a _good start_.

    That said, I do believe it’s a good tool for investigating the code to find out where the best bang for your buck is in new unit tests. Unfortunately, we don’t always write our tests first… if we did, code coverage tools would be useless.
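
The division example from comment 9 can be sketched in a few lines of Python. The function name `divide` and the choice to probe a zero divisor are illustrative assumptions; the comment itself only names the arguments 4 and 2.

```python
def divide(a: float, b: float) -> float:
    """Divide a by b. The single line below is all there is to cover."""
    return a / b


def test_divide():
    # One test with the arguments 4 and 2 executes every line of divide.
    assert divide(4, 2) == 2


test_divide()  # passes; line coverage for divide is now 100%

# An input nobody tested: a zero divisor raises instead of returning a value.
try:
    divide(4, 0)
except ZeroDivisionError:
    print("divide(4, 0) raises ZeroDivisionError, and no test says whether that is intended")
```

Whether raising is the right behaviour here is a requirements question. The coverage number has no way to raise it, which is why a person still has to review the test cases.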

  10. How I jump to my conclusions » None of the errors found Says:

    […] Lidor warns about using test coverage metrics as they may create the illusion you are doing well, when you are not. […]

  11. engtech Says:

    Coverage metrics are all misleading, but they are better than the alternative.

    In terms of hardware verification (where having a bug can mean a $1 million respin) there are a couple of approaches used.

    - code coverage, but that doesn’t catch combinations of sequences
    - separate team writing black box tests to the spec, and gray box tests to the design
    - constrained random with functional coverage (track all of the requirements, and run random combinations of data to hit the majority and then aim for the smaller remaining percentage)
    - formal verification where you have software that formally asserts that a condition will always hold true for all possible combinations to a certain depth

  12. Dave Nicolette Says:

    It’s easy to be fooled into thinking you’re testing everything that needs to be tested when a coverage tool tells you you’ve hit 100% coverage. All it really means is that you’re exercising every line of code at least once. There may be many conditions that cause execution of a given section of code. I used to think it was okay to write unit tests after the fact, but since then I’ve become convinced that writing tests first helps avoid a multitude of problems. One problem it helps avoid is the false sense of security you can get from a coverage tool. If you’ve thought through the tests that need to be done before you write the code, you can’t be lulled into writing tests merely to nudge the coverage toward 100%. You have to think about how to tell whether the requirements have been met. Once you have a set of tests that tells you the truth about that, you’re in good shape. The "red" phase of the TDD cycle doesn’t just mean that a test fails; it has to fail for the right reason. Otherwise, it won’t pass for the right reason, either, and then where will you be? Seems to me the test suite doesn’t tell you anything useful, then.

    Lidor’s example about the id string is a case in point. In the example, you write the code first and then you forget to write all the relevant test cases. If you wrote the tests first, you’d be compelled to think through all the necessary test cases before you could be distracted by the behavior of any actual code. There’s a natural tendency to write tests that prove the code "does what it does." Writing the tests first is a way to short-circuit that natural tendency and force you to think things through. Maybe that’s more about psychology than methodology, but I’ve seen it happen enough times to accept it as reality.

    Another issue is that people who aren’t very mature in their use of TDD sometimes worry a lot about seeing red. They assume red means "bad." They may take shortcuts in order to be sure they see green. Sometimes they end up coding to satisfy the coverage tool, because it’s an easy way to block out the red. That can lead to an inadequate test suite. Someone can make a change that breaks something, but the test suite doesn’t fail. The thing is, red is your friend when it shows up for the right reasons. You want to have tests fail when the code starts generating different results, because it’s the easiest mechanism you have for detecting defects and notifying you of them. Assuming you’re working with TDD and continuous integration, a failed build doesn’t mean you’ve done bad work, it’s just an indication that you need to investigate what changed. If you’ve just written the minimal tests necessary to satisfy a coverage tool, you could easily miss a case like that. A defect could go undetected for some time, and we all know that the longer a defect goes undetected, the more expensive it’s likely to be to fix it.
