To paraphrase Swarm Testing (Groce et al., 2012):

Swarm testing is a way to improve the diversity of generated test cases. Instead of potentially including all features in every test case, a large "swarm" of randomly generated configurations is used, each of which omits some features. ... First, some features actively prevent the system from executing interesting behaviors; e.g., pop calls may prevent an overflow bug from executing. Second, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically.
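The paper's push/pop example can be sketched concretely. Below is a minimal, purely illustrative demo (all names are hypothetical, not from any real library): a bounded stack with an unchecked-overflow bug, exercised by random tests drawn from a feature configuration. A configuration that omits "pop" reaches the overflow almost immediately, while the all-features configuration tends to cancel pushes with pops.

```python
import random

# Hypothetical system under test: a fixed-capacity stack whose push()
# has an overflow bug. Purely illustrative.
class BoundedStack:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        self.items.append(x)  # bug: no capacity check before appending
        if len(self.items) > self.capacity:
            raise OverflowError("stack overflowed")

    def pop(self):
        if self.items:
            self.items.pop()

def run_config(features, steps=50, seed=0):
    """Run one random test restricted to the enabled features; return
    True iff the overflow bug was triggered."""
    rng = random.Random(seed)
    stack = BoundedStack()
    for _ in range(steps):
        op = rng.choice(features)
        try:
            if op == "push":
                stack.push(rng.random())
            else:
                stack.pop()
        except OverflowError:
            return True
    return False

# A swarm configuration omitting "pop" finds the bug trivially:
# every operation is a push, so the stack overflows within 9 steps.
```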
I first proposed that Hypothesis should use this trick in #1637, and a more advanced and shrinker-friendly variant was implemented in #2238 - but only used in rule-based stateful tests (where it has been very useful). In this issue I propose adding swarm testing logic in three more areas, though still without a public API.
st.one_of()
This is perhaps the most obvious place to add swarm testing: just disable a random subset of the strategies being combined. one_of() is also used widely enough that this might have performance implications, but "measure, don't guess"; and improved example quality may justify a slight slowdown anyway.
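The idea can be sketched outside of Hypothesis's internals. The following is a hypothetical stand-alone model (none of these names are Hypothesis APIs): "strategies" are plain draw functions, and each test case first selects a nonempty random subset of alternatives, then draws only from that subset.

```python
import random

# Illustrative stand-ins for strategies: plain draw functions keyed by
# name. These names are hypothetical, not Hypothesis APIs.
strategies = {
    "ints": lambda rng: rng.randint(-100, 100),
    "floats": lambda rng: rng.random(),
    "text": lambda rng: "".join(rng.choice("abc") for _ in range(3)),
}

def swarm_one_of(rng, strategies):
    """Pick a nonempty random subset of alternatives for this test case,
    and return it along with a draw function restricted to that subset."""
    names = list(strategies)
    enabled = [n for n in names if rng.random() < 0.5]
    if not enabled:  # always keep at least one alternative enabled
        enabled = [rng.choice(names)]

    def draw():
        name = rng.choice(enabled)
        return name, strategies[name](rng)

    return enabled, draw
```

Because the subset is fixed per test case rather than per draw, a single example can explore one or two alternatives deeply instead of spreading its budget thinly across all of them.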
In conversation with @Stranger6667 we estimated that this would cover most downstream use-cases, which makes me inclined to keep swarm testing as an implementation detail with no public API at least for now.
Unicode strings (i.e. st.characters())
AKA #1401. This is a little trickier, as we'd be making many swarm decisions (hence a high ratio of metadata overhead to actual generated data), and the "shrink open" trick would need several layers. Performance is more likely to be a problem here. I can imagine memoizing our way out of that with chained lookups and the "make your own luck" trick, but we'll see.
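One way to picture swarm testing for characters is at the granularity of Unicode general categories: enable a random subset of categories per test case, then only generate characters from those. This is a rough sketch of the idea only, not how st.characters() is implemented.

```python
import random
import unicodedata

def swarm_characters(rng, n=20, max_codepoint=0x2FFF):
    """Sketch: enable a random nonempty subset of the Unicode general
    categories present in the codepoint range, then draw characters
    only from the enabled categories."""
    all_cats = sorted({unicodedata.category(chr(c))
                       for c in range(max_codepoint + 1)})
    enabled = {c for c in all_cats if rng.random() < 0.5}
    if not enabled:  # keep at least one category enabled
        enabled = {rng.choice(all_cats)}
    # Precompute the eligible characters so drawing is cheap.
    pool = [chr(c) for c in range(max_codepoint + 1)
            if unicodedata.category(chr(c)) in enabled]
    return enabled, [rng.choice(pool) for _ in range(n)]
```

Even this toy version shows where the overhead comes from: the per-test-case category subset is pure metadata, and scanning or caching the eligible-character pool is exactly the kind of cost that memoization would need to absorb.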
from_lark() and grammar-based strategies
This is the original use-case for swarm testing, in CSmith, and I'd really like it to work for hypothesmith.
The complexity here is that we would want to analyse the grammar to decide the order in which to consider disabling production rules, and also ensure that the logic is aware of dependencies between productions. I'm pretty sure that I've seen John Regehr write about this somewhere, but can't find the paper or post now.
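The dependency constraint can be made concrete with a productivity check: disabling a production alternative is only safe if every remaining nonterminal can still derive a finite terminal string. The sketch below uses a hypothetical dict-based grammar representation (not the from_lark() internals) and the standard fixed-point computation of productive nonterminals.

```python
# A tiny grammar as {nonterminal: [alternative, ...]}, where each
# alternative is a list of symbols and any symbol not in the dict is a
# terminal. Hypothetical representation, not from_lark() internals.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["num"], ["(", "expr", ")"]],
    "num": [["digit"], ["digit", "num"]],
    "digit": [["0"], ["1"]],
}

def productive(grammar):
    """Fixed point: a nonterminal is productive if some alternative
    consists only of terminals and already-productive nonterminals."""
    prod = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            if nt in prod:
                continue
            for alt in alts:
                if all(s in prod or s not in grammar for s in alt):
                    prod.add(nt)
                    changed = True
                    break
    return prod

def safe_to_disable(grammar, nt, index):
    """A swarm config may drop alternative `index` of `nt` only if every
    nonterminal stays productive afterwards."""
    trimmed = {k: [a for i, a in enumerate(v) if (k, i) != (nt, index)]
               for k, v in grammar.items()}
    return set(trimmed) <= productive(trimmed)
```

For example, dropping the recursive expr alternative is safe, but dropping term's num alternative is not: it would leave expr and term mutually recursive with no base case. An ordering for considering which rules to disable could then prefer alternatives whose removal passes this check.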