Striving for 100% code coverage is actively harmful #297
What you might want is something like mutation testing, whose results actually map logically to code operations. Mutation testing makes sure that your code does exactly what your tests expect, and not a single operation more or less. It's much more effort than increasing line-based coverage (which I agree is a relatively naive metric), but it's definitely worth it for building-block libraries like ours. (I've developed promise.rb in a mutation-test-driven way in the past.) Another useful technique is fuzzing, especially for things like multiaddr, multihash, and ipld formats.
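As a rough illustration of the idea, here is a hand-rolled sketch, not the output of an actual mutation-testing tool (tools such as Stryker automate this for JavaScript). The `clamp` function and its mutant are made-up examples: a mutation tool rewrites one operation at a time and re-runs the suite, and any mutant that survives points at an assertion that is too weak.

```js
// Hand-rolled sketch of what a mutation-testing tool automates.
const { expect } = require('chai')

// Original implementation.
const clamp = (value, min, max) => Math.min(Math.max(value, min), max)

// One possible mutant: the two bounds swapped.
const clampMutant = (value, min, max) => Math.max(Math.min(value, min), max)

describe('clamp', () => {
  it('weak test: the mutant survives, yet line coverage is 100%', () => {
    // Both calls return something non-null, so an existence check cannot tell them apart.
    expect(clamp(5, 0, 10)).to.exist
    expect(clampMutant(5, 0, 10)).to.exist
  })

  it('strong test: kills the mutant by asserting exact behaviour', () => {
    expect(clamp(5, 0, 10)).to.equal(5)
    expect(clamp(-3, 0, 10)).to.equal(0)   // the mutant would return 10 here
    expect(clamp(42, 0, 10)).to.equal(10)
  })
})
```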
I came across an information-packed blog post today listing testing anti-patterns, where "Paying excessive attention to test coverage" is one of the points. While the entire post is worth reading, scroll to "Anti-Pattern 6" for more reasoning behind this specific issue. http://blog.codepipes.com/testing/software-testing-antipatterns.html
I agree completely with @victorbjelkholm. My rule of thumb is high-quality unit tests for high-complexity code paths, error propagation, and specific domain logic; beyond that, integration tests are where I spend time building confidence. One of the best references I've found about this stuff:
@victorbjelkholm you are missing the main point and assuming a couple of things, which then leads you to accept wholesale testing reports from poor executions by teams with very different goals and incentives. That KR is aimed at achieving Sustainable Modules, reducing dev time spent in review, and encouraging contributors to respect the contract of maintaining the level of coverage already present in the module.

To your point that "people can fake tests to make coverage pass": yes, this is true, but it is also true of non-test code and bad code in general. This is why we have pull request reviews and module captains. The real problem is that things do get bad when someone naively thumbs up a PR without proper review, misleading the contributor into thinking the code is perfect and not supporting them to grow and improve. I've seen this happen multiple times across our modules. I do wish to find a way to encourage everyone to apply the same level of rigor, both in their code and in their reviews, so that we don't have to have the discussion about how "people learn how to fake tests to pass coverage".

In the end, code coverage gives people a way to reason about code paths they haven't considered before, and it encourages new contributors to learn the codebase in depth.
cc @ipfs/javascript-team
One of this quarter's OKRs is "Bring code coverage up to 100% everywhere!"
I argue that this OKR does not address our needs for testing, and that a hard metric of "100%" is actively harmful and misses the point.
Code coverage is a useful tool for finding parts of a program that are untested. It does not, however, tell us anything about the quality of a test, or whether the test actually tests what we want. A function is counted as covered as long as it is called at some point and the test it was called from passes.
Example: https://github.com/ipfs/js-ipfs/blob/64c3bfb45cda27e268655cc9a2bce28eebd09454/test/http-api/block.js#L54-L59
The example shows that `ipfs.block.get` returns an error when we provide invalid arguments. The check `expect(err).to.exist()` only verifies that `err` is not `null` or `undefined`. Rather than checking that the error is the correct one, we just check that there is an error at all. Literally any error would be treated as a success in this test, which doesn't match what the test is supposed to verify: that invalid arguments produce the expected error.

This means that, according to code coverage, `ipfs.block.get` is covered, but it's a false positive. It's covered in the sense of "called at some point", but because the test is of such low quality, we can't rely on it when focusing on or refactoring that code. I would argue that having functions that aren't tested at all is more helpful than having tests that are incorrect or of low quality; a test like this is basically only useful as a syntax and smoke check. These sorts of tests are common enough to have their own name, "AssertionFreeTests": there is a test, but the assertion is of such low quality that the test can't really be relied on. Having a goal of 100% code coverage will only bring more of these.
Code coverage is not a useful metric for determining how useful a test is, and it should not be used as a measure of how well we're doing at testing. Rather, it's a tool for finding the least tested functions.
With an explicit focus on code coverage as a metric, people will aim to hit it by whatever means are available, including low-quality tests. This is why aiming for "100% code coverage" is actively harmful: we'll get more low-quality tests that we have to maintain and eventually refactor, while the current situation is already bad for other reasons. Our main problem with our tests is not code coverage; it's low-quality tests.
Ending up with code coverage of around 80% is reasonable, while seeing a project with 100% test coverage makes me immediately suspicious that people are throwing in tests without thinking, just to raise a metric.
This ties into the general idea that you can test too much, although it's hard to pin that idea down fully. Basically, if the tests are slowing you down rather than helping you move faster, you may have duplicate tests, low-quality tests, and the like.