
Striving for 100% code coverage is actively harmful #297

Closed
victorb opened this issue Apr 18, 2018 · 5 comments

@victorb
Member

victorb commented Apr 18, 2018

cc @ipfs/javascript-team

One of this quarter's OKRs is "Bring code coverage up to 100% everywhere!"

I argue that this OKR does not address our actual testing needs, and that a hard target of "100%" is actively harmful and misses the point.

Code coverage is a useful tool for finding parts of a program that are untested. It does not, however, tell us anything about the quality of a test, or whether the test actually tests what we want. A function counts as covered as long as it is called somewhere and the test it was called from passes.

Example: https://github.com/ipfs/js-ipfs/blob/64c3bfb45cda27e268655cc9a2bce28eebd09454/test/http-api/block.js#L54-L59

it('returns error for request with invalid argument', (done) => {
  ipfs.block.get('invalid', (err, result) => {
    expect(err).to.exist()
    done()
  })
})

The example shows that ipfs.block.get returns an error when we provide invalid arguments. The check expect(err).to.exist() only verifies that err is not null or undefined. Rather than checking that the error is the right one, we just check that there is an error. Literally any error would make this test pass, which doesn't match what the test is supposed to guarantee: that "invalid arguments" produce the expected error.
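For comparison, a stricter version of the same test might pin the error down. This is only a sketch: the message asserted here is hypothetical and would need to match whatever the HTTP API actually returns for an invalid CID.

it('returns error for request with invalid argument', (done) => {
  ipfs.block.get('invalid', (err, result) => {
    expect(err).to.exist()
    // Hypothetical message; assert on whatever the API really returns,
    // so that an unrelated error can no longer make the test pass.
    expect(err.message).to.contain('invalid')
    expect(result).to.not.exist()
    done()
  })
})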

This means that, according to code coverage, ipfs.block.get is covered, but it's a false positive. It's covered in the sense of "called at some point", but because the test is of such low quality, coverage no longer flags the function as something to focus on or refactor. I would argue that leaving a function visibly untested is more helpful than having tests that are incorrect or of low quality. A test like this is basically only useful as a syntax and smoke check.

These sorts of tests are popular and even have their own name, "AssertionFreeTests": there is a test, but its assertions are of such low quality that the test can't really be relied on. A goal of 100% code coverage will only produce more of these.
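For illustration, a coverage tool would count every line the following (hypothetical) test touches as covered, even though it asserts nothing at all; someValidCid is a placeholder, not real fixture code.

it('gets a block', (done) => {
  // No assertions at all: any behaviour of ipfs.block.get, correct or broken,
  // makes this test pass, yet every line it executes shows up as covered.
  ipfs.block.get(someValidCid, (err, block) => {
    done()
  })
})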

Code coverage is not a useful metric for judging how useful a test is, and it should not be used as a measure of how well we're doing at testing. Rather, it's a tool for finding the least-tested functions.

With an explicit focus on code coverage as a metric, people will aim to hit it by whatever means are available, including low-quality tests. This is why aiming for "100% code coverage" is actively harmful: we'll get more low-quality tests that we have to maintain and eventually refactor, while the current situation is already bad for other reasons. Our main problem with our tests is not code coverage. It's low-quality tests.

Ending up with code coverage around 80% is reasonable, while seeing a project with 100% coverage immediately makes me suspicious that people are throwing in tests, without much thought, just to raise a metric.

This ties into the general idea that you can test too much, even though that idea is hard to pin down precisely. Basically, if the tests are slowing you down rather than helping you move faster, you probably have duplicate tests, low-quality tests, and the like.

@ghost

ghost commented Apr 18, 2018

What you might want is something like mutation testing, whose results map directly onto actual code operations. Mutation testing makes sure that your code does exactly what your tests expect, and not a single operation more or less. It's much more effort than increasing line-based coverage (which I agree is a relatively naive metric), but it's definitely worth it for building-block libraries like ours. (I've developed promise.rb in a mutation-test-driven way in the past.)
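As a rough illustration of the idea (not any particular tool, though Stryker is one option for JavaScript): a mutation-testing run generates small changes to the production code and checks that at least one test fails for each change.

// Illustrative sketch only, not a real mutation-testing tool.
// Production code:
function isPinned (refCount) {
  return refCount > 0
}

// One mutant the tool could generate: ">" flipped to ">=".
function isPinnedMutant (refCount) {
  return refCount >= 0
}

// This test "kills" that mutant, because the two versions disagree at 0.
// A test that only asserted "no error was thrown" would let it survive.
it('reports a block with zero references as not pinned', () => {
  expect(isPinned(0)).to.equal(false)
})

A suite of assertion-free tests would let most mutants survive, which is exactly the gap that line coverage cannot see.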

Another useful technique is fuzzing, especially for things like multiaddr, multihash, and IPLD formats.
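A minimal sketch of what that could look like; decodeMultihash here is a stand-in for whichever parser is under test, not the real multihash API.

const crypto = require('crypto')

it('survives arbitrary input', () => {
  for (let i = 0; i < 10000; i++) {
    const input = crypto.randomBytes(Math.floor(Math.random() * 64))
    try {
      // Any successful decode is acceptable here...
      decodeMultihash(input)
    } catch (err) {
      // ...but failures must surface as thrown Error values,
      // never as crashes or non-Error throws.
      expect(err).to.be.an('error')
    }
  }
})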

@victorb
Member Author

victorb commented Apr 22, 2018

I came across an information-packed blog post today listing testing anti-patterns, where "Paying excessive attention to test coverage" is one of the points. The entire post is worth reading, but scroll to "Anti-Pattern 6" for more reasoning behind this specific issue.

http://blog.codepipes.com/testing/software-testing-antipatterns.html

@hugomrdias
Member

I agree completely with @victorbjelkholm. My rule of thumb is high-quality unit tests for high-complexity code paths, error propagation, and specific domain logic; beyond that, integration tests are where to spend time building confidence!

One of the best references I've found on this:
https://tinyletter.com/kentcdodds/letters/write-tests-not-too-many-mostly-integration

@daviddias
Member

@victorbjelkholm you are missing the main point and assuming a couple of things, which then leads you to accept wholesale the testing write-ups of teams with poor execution and very different goals and incentives.

That KR is aimed at achieving Sustainable Modules: reducing dev time spent in review and encouraging contributors to respect the contract of maintaining the level of coverage already present in the module.
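One way such a contract can be enforced, rather than chasing 100%, is a coverage floor in CI. A minimal sketch, assuming nyc (the Istanbul CLI) is the coverage runner and using placeholder thresholds; the modules' actual tooling (e.g. aegir) may expose this differently:

nyc --check-coverage --lines 85 --functions 85 --branches 80 npm test

If a pull request drops any of these numbers below the floor, the check fails and the review can focus on why.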

To your point that "people can fake tests to make coverage pass": yes, this is true, but it is equally true of untested code and of bad code in general. This is why we have pull request reviews and module captains. The real problem is that things go bad when someone naively thumbs up a PR without a proper review, misleading the contributor into thinking the code is perfect and not supporting them to grow and improve. I've seen this happen multiple times across our modules.

I do wish to find a way to encourage everyone to apply the same level of rigor, both in their code and in their reviews, so that we don't have to have the discussion about how "people learn how to fake tests to pass coverage". In the end, code coverage gives people a way to reason about code paths they haven't considered before, and it encourages new contributors to learn the codebase in depth.
