Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The shrinker cannot sort most objects in lists #1094

Closed
DRMacIver opened this issue Jan 30, 2018 · 0 comments
Closed

The shrinker cannot sort most objects in lists #1094

DRMacIver opened this issue Jan 30, 2018 · 0 comments
Labels
test-case-reduction about efficiently finding smaller failing examples

Comments

@DRMacIver
Copy link
Member

Note: This is part of #1093, where I encourage people to work on the shrinker. As such I will not be working on it myself and encourage you to take a look at it if it takes your fancy.

Due to the underlying representation, Hypothesis usually wants to return lists of things in sorted order of increasing complexity (this is strictly speaking not always true, but it is true for "well behaved" element strategies, including anything in hypothesis.strategies).

Sometimes this works currently:

>>> find(st.lists(st.integers(), min_size=5, unique=True), lambda x: True)
[0, 1, -1, 2, -2]

(simplicity for integers is size then sign, so that's the correct order)

However for more complex objects it doesn't:

>>> find(st.lists(st.text(), min_size=5, unique_by=len), lambda x: True)
['0000', '0', '00', '', '000']

These should be in order of ascending size, so that the correct answer is ['', '0', '00', '000', '0000'].

The problem is that our current sorting logic is focused entirely on blocks. These basically correspond to individual low-level primitives. This does not allow the sorting to have any sort of higher level knowledge of how to move things around, which is required here.

I believe that we have all of the correct information under the hood to do this relatively easily. One possible approach here would be:

  • Partition all drawn examples by their depth and label (this ensures you get non-overlapping examples that are "roughly similar")
  • Where those partitions have more than one element, attempt an insertion sort on them to reorder the byte stream to put them in order.

Some care may be needed when examples are of difference size because swapping them might not be valid even if it looks it from the individual examples because of what bytes come after it in the underlying representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-case-reduction about efficiently finding smaller failing examples
Projects
None yet
Development

No branches or pull requests

2 participants