To paraphrase Swarm Testing (Groce et al., 2012):

Swarm testing is a way to improve the diversity of generated test cases. Instead of potentially including all features in every test case, a large "swarm" of randomly generated configurations is used, each of which omits some features. ... First, some features actively prevent the system from executing interesting behaviors; e.g., pop calls may prevent an overflow bug from executing. Second, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically.
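The paper's push/pop example can be sketched concretely. Below is a minimal, purely illustrative demo (all names are hypothetical, not from any real library): a bounded stack with an unchecked-overflow bug, exercised by random tests drawn from a feature configuration. A configuration that omits "pop" reaches the overflow almost immediately, while the all-features configuration tends to cancel pushes with pops.

```python
import random

# Hypothetical system under test: a fixed-capacity stack whose push()
# has an overflow bug. Purely illustrative.
class BoundedStack:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        self.items.append(x)  # bug: no capacity check before appending
        if len(self.items) > self.capacity:
            raise OverflowError("stack overflowed")

    def pop(self):
        if self.items:
            self.items.pop()

def run_config(features, steps=50, seed=0):
    """Run one random test restricted to the enabled features; return
    True iff the overflow bug was triggered."""
    rng = random.Random(seed)
    stack = BoundedStack()
    for _ in range(steps):
        op = rng.choice(features)
        try:
            if op == "push":
                stack.push(rng.random())
            else:
                stack.pop()
        except OverflowError:
            return True
    return False

# A swarm configuration omitting "pop" finds the bug trivially:
# every operation is a push, so the stack overflows within 9 steps.
```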
I first proposed that Hypothesis should use this trick in #1637, and a more advanced and shrinker-friendly variant was implemented in #2238 - but only used in rule-based stateful tests (where it has been very useful). In this issue I propose adding swarm testing logic in three more areas, though still without a public API.
st.one_of()
This is perhaps the most obvious place to add swarm testing: just disable a random subset of the strategies being combined. one_of() is also used widely enough that this might have performance implications, but "measure, don't guess"; and improved example quality may justify a slight slowdown anyway.
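The idea can be sketched outside of Hypothesis's internals. The following is a hypothetical stand-alone model (none of these names are Hypothesis APIs): "strategies" are plain draw functions, and each test case first selects a nonempty random subset of alternatives, then draws only from that subset.

```python
import random

# Illustrative stand-ins for strategies: plain draw functions keyed by
# name. These names are hypothetical, not Hypothesis APIs.
strategies = {
    "ints": lambda rng: rng.randint(-100, 100),
    "floats": lambda rng: rng.random(),
    "text": lambda rng: "".join(rng.choice("abc") for _ in range(3)),
}

def swarm_one_of(rng, strategies):
    """Pick a nonempty random subset of alternatives for this test case,
    and return it along with a draw function restricted to that subset."""
    names = list(strategies)
    enabled = [n for n in names if rng.random() < 0.5]
    if not enabled:  # always keep at least one alternative enabled
        enabled = [rng.choice(names)]

    def draw():
        name = rng.choice(enabled)
        return name, strategies[name](rng)

    return enabled, draw
```

Because the subset is fixed per test case rather than per draw, a single example can explore one or two alternatives deeply instead of spreading its budget thinly across all of them.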
In conversation with @Stranger6667 we estimated that this would cover most downstream use-cases, which makes me inclined to keep swarm testing as an implementation detail with no public API at least for now.
Unicode strings (i.e. st.characters())
AKA #1401. This is a little trickier, as we'd be making many swarm decisions (hence a high ratio of metadata overhead to actual generated data), and the "shrink open" trick would need several layers. Performance is more likely to be a problem here. I can imagine memoizing our way out of that with chained lookups and the "make your own luck" trick, but we'll see.
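One way to picture swarm testing for characters is at the granularity of Unicode general categories: enable a random subset of categories per test case, then only generate characters from those. This is a rough sketch of the idea only, not how st.characters() is implemented.

```python
import random
import unicodedata

def swarm_characters(rng, n=20, max_codepoint=0x2FFF):
    """Sketch: enable a random nonempty subset of the Unicode general
    categories present in the codepoint range, then draw characters
    only from the enabled categories."""
    all_cats = sorted({unicodedata.category(chr(c))
                       for c in range(max_codepoint + 1)})
    enabled = {c for c in all_cats if rng.random() < 0.5}
    if not enabled:  # keep at least one category enabled
        enabled = {rng.choice(all_cats)}
    # Precompute the eligible characters so drawing is cheap.
    pool = [chr(c) for c in range(max_codepoint + 1)
            if unicodedata.category(chr(c)) in enabled]
    return enabled, [rng.choice(pool) for _ in range(n)]
```

Even this toy version shows where the overhead comes from: the per-test-case category subset is pure metadata, and scanning or caching the eligible-character pool is exactly the kind of cost that memoization would need to absorb.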
from_lark() and grammar-based strategies
This is the original use-case for swarm testing, in CSmith, and I'd really like it to work for hypothesmith.
The complexity here is that we would want to analyse the grammar to decide the order in which to consider disabling production rules, and also ensure that the logic is aware of dependencies between productions. I'm pretty sure that I've seen John Regehr write about this somewhere, but can't find the paper or post now.
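The dependency constraint can be made concrete with a productivity check: disabling a production alternative is only safe if every remaining nonterminal can still derive a finite terminal string. The sketch below uses a hypothetical dict-based grammar representation (not the from_lark() internals) and the standard fixed-point computation of productive nonterminals.

```python
# A tiny grammar as {nonterminal: [alternative, ...]}, where each
# alternative is a list of symbols and any symbol not in the dict is a
# terminal. Hypothetical representation, not from_lark() internals.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["num"], ["(", "expr", ")"]],
    "num": [["digit"], ["digit", "num"]],
    "digit": [["0"], ["1"]],
}

def productive(grammar):
    """Fixed point: a nonterminal is productive if some alternative
    consists only of terminals and already-productive nonterminals."""
    prod = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            if nt in prod:
                continue
            for alt in alts:
                if all(s in prod or s not in grammar for s in alt):
                    prod.add(nt)
                    changed = True
                    break
    return prod

def safe_to_disable(grammar, nt, index):
    """A swarm config may drop alternative `index` of `nt` only if every
    nonterminal stays productive afterwards."""
    trimmed = {k: [a for i, a in enumerate(v) if (k, i) != (nt, index)]
               for k, v in grammar.items()}
    return set(trimmed) <= productive(trimmed)
```

For example, dropping the recursive expr alternative is safe, but dropping term's num alternative is not: it would leave expr and term mutually recursive with no base case. An ordering for considering which rules to disable could then prefer alternatives whose removal passes this check.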