The Illusion Of High Test Coverage
There are many tools out there for measuring test coverage (that is, how much of your code the tests actually execute). Unit test coverage is becoming as popular as unit tests themselves, and that is of course a step in the right direction.
Test coverage data helps developers identify missing test cases. It also helps development managers get a clearer picture of the functional quality of the code.
However, the test coverage metric can also create an illusion.
The problem is that this number is always measured against what’s in the code. It can never tell you what is missing from the code. Unfortunately, many bugs are the result of unwritten code.
Here’s the simplest example. You write a method that receives a string representing a person’s id. You forget to write the code that validates the input (a person’s id cannot contain just any character, for example). Because you didn’t think of that, you also didn’t think of writing a test case in which a bad id is passed to this method. The result: 100% coverage, but less than satisfactory quality.
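To make the id example concrete, here is a minimal Python sketch. The function names and the digits-only rule for a valid id are hypothetical assumptions; the post does not say what a valid id looks like. The single happy-path test executes every line of the unvalidated version, so a line-coverage tool would report 100% for it, while a bad id sails straight through.

```python
def parse_person_id_naive(raw: str) -> str:
    # No validation: whatever string arrives is accepted as-is.
    return raw.strip()


def parse_person_id(raw: str) -> str:
    # Hypothetical rule (assumption): an id is a non-empty run of ASCII digits.
    cleaned = raw.strip()
    if not cleaned.isascii() or not cleaned.isdigit():
        raise ValueError(f"invalid person id: {raw!r}")
    return cleaned


def test_happy_path() -> None:
    # This one test executes every line of parse_person_id_naive,
    # so line coverage for it is already 100%.
    assert parse_person_id_naive(" 123456789 ") == "123456789"


def test_rejects_bad_id() -> None:
    # The test case nobody wrote, because nobody wrote the validation either.
    try:
        parse_person_id("12a$5")
    except ValueError:
        return
    raise AssertionError("bad id was accepted")
```

Note that the naive version happily returns "12a$5", and no coverage report will point that out.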
Now, this might seem obvious to some of you, but I see many cases where high test coverage goes hand in hand with missing test cases. Does this mean that coverage data is meaningless? No. It just means you have to look beyond the numbers.
Constantly reviewing unit tests and looking for missing cases complements coverage data as a way of ensuring that the tests are effective.

May 13th, 2006 at 5:58 am
Agreed, unit test coverage reports don’t guarantee high-quality code. Still, I’ve found them to be a good supporting practice as I bring a team over to the idea of TDD. They start to get excited about the green bar, and they catch some unseen bugs.
May 13th, 2006 at 8:12 am
No doubt about it. I just wanted to point out that the tests need to be reviewed from time to time to improve their logical coverage (in addition to their code coverage).
Lidor
May 13th, 2006 at 9:56 am
You are right, and that’s why there are other kinds of tests. Taking your example, a FIT test written by a tester would have spotted the validation issue on the first run!
May 13th, 2006 at 10:59 am
Lidor,
I recently reviewed a co-worker’s training session on test coverage, and commented on just this topic. But I would go further, and say it more forcefully.
Test coverage measures that mark 100% coverage when there’s one test for every requirement are sorely misleading. Accepting their numbers as any measure of test set correctness and completeness is just wrong.
Writing the right number of correct tests for a requirement is the activity of test design, which requires knowledge and experience. Determining whether you’ve got true test coverage cannot be done by a tool (FIT or otherwise); only by a person. An automated test coverage tool does the following, and no more: it identifies requirements (or code) with zero test coverage so that you can design tests for them.
- Tom
May 13th, 2006 at 11:06 am
Accordingly, a response to Zuzzu as well:
FIT (for those who haven’t come across it, see http://www.artima.com/weblogs/viewpost.jsp?thread=67373) can indeed “spot the … issue.” What it can’t do is tell you what to do about it. Yes, you’ll know you missed a test, but neither FIT nor any other tool will tell you how to write the test, or whether you really need to write two or three tests.
May 13th, 2006 at 10:09 pm
Test coverage can also help identify dead code, which can be removed during a refactoring cycle…
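As a rough illustration of how a line-coverage tool can point at dead code, the Python sketch below records executed lines with the standard sys.settrace hook. The helper executed_lines and the classify example are hypothetical; real tools such as coverage.py are far more thorough. A line that no test reaches is only a candidate for removal: deciding that it is truly dead still takes a person.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional


def executed_lines(func: Callable[..., Any], *args: Any) -> set:
    """Line offsets (relative to the def line) executed while running func."""
    code = func.__code__
    hits = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        # Only trace frames running func itself; ignore everything else.
        if frame.f_code is code:
            if event == "line":
                hits.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hits


def classify(n: int) -> str:
    if n >= 0:
        return "non-negative"
    if n < 0:
        return "negative"
    return "unreachable"  # dead: the two branches above cover every int


if __name__ == "__main__":
    covered = executed_lines(classify, 5) | executed_lines(classify, -5)
    # Body lines of classify are offsets 1..5; report the ones never run.
    print(sorted(set(range(1, 6)) - covered))  # [5]
```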
May 14th, 2006 at 12:28 am
Hi Shmarmeant
Again, there is no argument about the benefits of code-based test coverage. The point is that trying to get a sense of the quality of your testing suite based solely on test coverage is misleading. You have to analyze the tests and see what’s missing (logically).
Lidor
May 14th, 2006 at 1:16 pm
This reminds me of a fine compiler we used at the university in Nijmegen some 20 years ago.
If a program compiled without errors, the compiler would show the message:
“None of the errors found” …
May 15th, 2006 at 8:09 am
Test coverage seems to come up as some sort of unit test valhalla-builder every once in a while, and I’ve gotten my response down pat:
Imagine a function that divides two numbers and returns the result. You test it with the arguments 4 and 2, and you’ve got 100% coverage. Are you comfortable with that?
That’s why 100% coverage is a _good start_.
That said, I do believe it’s a good tool for investigating the code to find out where new unit tests will give you the best bang for your buck. Unfortunately, we don’t always write our tests first… if we did, code coverage tools would be useless.
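The division example fits in a few lines of Python. The function and test names are hypothetical, and treating ZeroDivisionError as the desired behavior for a zero divisor is an assumption; the point is that the coverage tool is already satisfied before anyone has decided what should happen.

```python
def divide(a: float, b: float) -> float:
    # One line, one path: a single call executes 100% of it.
    return a / b


def test_divide_happy_path() -> None:
    # The 4 and 2 test from the comment: full line coverage already.
    assert divide(4, 2) == 2


def test_divide_by_zero() -> None:
    # Coverage does not ask for this case; the requirement does.
    # Assumption: raising ZeroDivisionError is the agreed contract.
    try:
        divide(4, 0)
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")
```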
June 2nd, 2006 at 11:41 pm
[…] Lidor warns about using test coverage metrics as they may create the illusion you are doing well, when you are not. […]
July 13th, 2006 at 6:12 pm
Coverage metrics are all misleading, but they are better than the alternative.
In hardware verification (where a bug can mean a $1 million respin), several approaches are used:
- code coverage, but that doesn’t catch combinations of sequences
- separate team writing black box tests to the spec, and gray box tests to the design
- constrained random with functional coverage (track all of the requirements, run random combinations of data to hit the majority, and then aim for the smaller remaining percentage)
- formal verification where you have software that formally asserts that a condition will always hold true for all possible combinations to a certain depth
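The constrained-random approach with functional coverage can be sketched in Python, under heavy assumptions: the design is a hypothetical 8-bit adder, the four coverage bins are invented requirement-level scenarios, and only stimulus generation and coverage bookkeeping are shown (checking the adder's output against a reference model is left out). Random stimulus fills the common bins quickly; a rare bin such as both operands being zero usually needs a directed test.

```python
import random
from typing import Callable, Dict, List, Tuple

# (a, b) operands for a hypothetical 8-bit adder under verification.
Stimulus = Tuple[int, int]

# Functional coverage bins: each names a scenario from the spec,
# not a line of the implementation.
BINS: Dict[str, Callable[[Stimulus], bool]] = {
    "no_carry": lambda s: s[0] + s[1] <= 0xFF,
    "carry_out": lambda s: s[0] + s[1] > 0xFF,
    "a_is_max": lambda s: s[0] == 0xFF,
    "both_zero": lambda s: s == (0, 0),
}

# Hand-written stimulus known to land in each bin.
DIRECTED: Dict[str, Stimulus] = {
    "no_carry": (1, 2),
    "carry_out": (0xFF, 0x01),
    "a_is_max": (0xFF, 0x00),
    "both_zero": (0, 0),
}


def sample(hits: Dict[str, int], s: Stimulus) -> None:
    # Record every bin this stimulus falls into.
    for name, predicate in BINS.items():
        if predicate(s):
            hits[name] += 1


def random_phase(n: int, seed: int) -> Dict[str, int]:
    # The constraint: operands are always legal 8-bit values.
    rng = random.Random(seed)
    hits = {name: 0 for name in BINS}
    for _ in range(n):
        sample(hits, (rng.randint(0, 0xFF), rng.randint(0, 0xFF)))
    return hits


def directed_phase(hits: Dict[str, int]) -> List[str]:
    # Aim directly at whatever the random phase left uncovered.
    missed = [name for name, count in hits.items() if count == 0]
    for name in missed:
        sample(hits, DIRECTED[name])
    return missed


if __name__ == "__main__":
    hits = random_phase(1000, seed=2006)
    print("closed by directed tests:", directed_phase(hits))
    print(hits)
```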
August 11th, 2006 at 6:37 am
It’s easy to be fooled into thinking you’re testing everything that needs to be tested when a coverage tool tells you you’ve hit 100% coverage. All it really means is that you’re exercising every line of code at least once. There may be many conditions that cause execution of a given section of code. I used to think it was okay to write unit tests after the fact, but since then I’ve become convinced that writing tests first helps avoid a multitude of problems. One problem it helps avoid is the false sense of security you can get from a coverage tool. If you’ve thought through the tests that need to be done before you write the code, you can’t be lulled into writing tests merely to nudge the coverage toward 100%. You have to think about how to tell whether the requirements have been met. Once you have a set of tests that tells you the truth about that, you’re in good shape. The "red" phase of the TDD cycle doesn’t just mean that a test fails; it has to fail for the right reason. Otherwise, it won’t pass for the right reason, either, and then where will you be? Seems to me the test suite doesn’t tell you anything useful, then.
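Here is a small Python illustration of that point, using a hypothetical shipping-fee rule. Two tests execute every line and take both branches of the if, yet an off-by-one change at the 50 boundary still passes them; only a test designed around the requirement's boundary tells the two versions apart.

```python
from typing import Callable, List, Tuple

Case = Tuple[Tuple[float, bool], float]


def shipping_fee(total: float, is_member: bool) -> float:
    # Hypothetical rule: members, or orders of 50 or more, ship free.
    if is_member or total >= 50:
        return 0.0
    return 4.99


def shipping_fee_buggy(total: float, is_member: bool) -> float:
    # Off by one at the boundary: same lines, same branches.
    if is_member or total > 50:
        return 0.0
    return 4.99


# 100% line coverage and both branches taken, yet total >= 50 is never true.
COVERAGE_TESTS: List[Case] = [((10.0, True), 0.0), ((10.0, False), 4.99)]
# The case that comes from thinking about the requirement.
BOUNDARY_TESTS: List[Case] = [((50.0, False), 0.0)]


def passes(fn: Callable[[float, bool], float], cases: List[Case]) -> bool:
    return all(fn(*args) == expected for args, expected in cases)


if __name__ == "__main__":
    print(passes(shipping_fee_buggy, COVERAGE_TESTS))  # True
    print(passes(shipping_fee_buggy, BOUNDARY_TESTS))  # False
```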
Lidor’s example about the id string is a case in point. In the example, you write the code first and then you forget to write all the relevant test cases. If you wrote the tests first, you’d be compelled to think through all the necessary test cases before you could be distracted by the behavior of any actual code. There’s a natural tendency to write tests that prove the code "does what it does." Writing the tests first is a way to short-circuit that natural tendency and force you to think things through. Maybe that’s more about psychology than methodology, but I’ve seen it happen enough times to accept it as reality.
Another issue is that people who aren’t very mature in their use of TDD sometimes worry a lot about seeing red. They assume red means "bad." They may take shortcuts in order to be sure they see green. Sometimes they end up coding to satisfy the coverage tool, because it’s an easy way to block out the red. That can lead to an inadequate test suite. Someone can make a change that breaks something, but the test suite doesn’t fail. The thing is, red is your friend when it shows up for the right reasons. You want to have tests fail when the code starts generating different results, because it’s the easiest mechanism you have for detecting defects and notifying you of them. Assuming you’re working with TDD and continuous integration, a failed build doesn’t mean you’ve done bad work, it’s just an indication that you need to investigate what changed. If you’ve just written the minimal tests necessary to satisfy a coverage tool, you could easily miss a case like that. A defect could go undetected for some time, and we all know that the longer a defect goes undetected, the more expensive it’s likely to be to fix it.
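A minimal Python sketch of the "coding to satisfy the coverage tool" trap, with a hypothetical apply_discount function: a test that merely calls the function produces the same line coverage as one that checks the result, but only the checking test turns red when someone later breaks the code.

```python
from typing import Callable

Discount = Callable[[float, float], float]


def apply_discount(price: float, percent: float) -> float:
    # Hypothetical function under test: take percent off the price.
    return round(price * (1 - percent / 100), 2)


def broken_apply_discount(price: float, percent: float) -> float:
    # A later change that breaks something: the sign has flipped.
    return round(price * (1 + percent / 100), 2)


def coverage_only_test(fn: Discount) -> None:
    # Runs every line of fn, so coverage reads 100%, but checks nothing.
    fn(100.0, 10.0)


def real_test(fn: Discount) -> None:
    # Same coverage, but it states what the requirement says should happen.
    assert fn(100.0, 10.0) == 90.0


def turns_red(test: Callable[[Discount], None]) -> bool:
    # Does this test fail (go red) against the broken version?
    try:
        test(broken_apply_discount)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print(turns_red(coverage_only_test))  # False
    print(turns_red(real_test))           # True
```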