Note: This is part of #1093, where I encourage people to work on the shrinker. As such I will not be working on it myself and encourage you to take a look at it if it takes your fancy. Note: This one is a bit fiddly and requires experimentation. I do not recommend it as a first attempt on the shrinker, but it's probably a decent second attempt.
Hypothesis has a special case of sorts (actually, two) for things that look like the following:
Basically it will try to lower n while simultaneously deleting some data afterwards. It does this by noticing that lowering n reduces the total amount of data drawn and using this to trigger some heuristics that say that it's worth trying to simultaneously lower n and delete data.
Currently it can only delete data in two ways:
Deleting the data that immediately comes after n
Deleting one or two adjacent draw calls anywhere in the byte sequence after n.
This works surprisingly well for a lot of different things, especially in combination with the way that the reordering passes tend to move the interesting stuff to the end, so deleting right after n is often exactly the right thing to do.
Where it doesn't work is when the decisions we make after n become non-local, so that the things that need to be deleted don't line up neatly. We can sometimes get stuck and fail to reduce this well. For example, imagine the following:
Here we should always get ([1], [1], [1]) as an outcome, but (rarely) we don't.
This is actually quite difficult to reliably demonstrate with natural examples because we have some backtracking logic (escape_local_minimum) which helps get us out of messes like this. It usually works, but it's non-deterministic and thus sometimes fails.
The easiest way to demonstrate this is with a unit test. The following follows the conventions of tests/cover/test_conjecture_engine.py and uses monkeypatching to put the shrinker into the right state to run into problems:
Here we create a scenario like this and turn off the backtracking logic, so the shrinker is forced to do this the hard way. The right thing for it to do here is to lower n to 1 and delete all the zero bytes, but there is currently no way for it to discover this.
Note: The start/stop example calls allow us to wrap a boundary around each drawn section so we know what goes where. If you want to play this on hard mode it would be cool if this also worked with those removed, but I suspect it will end up much harder.
I do not know exactly what the correct solution is going to be, but I suspect something like the following:
Add an expensive shrink pass that looks for this scenario by trying each shrinking block and lowering it to its predecessor.
Do a comparison of the draws in the current shrink target to look for things that "go missing" - places where in the current shrink target some draw contains a child with some particular label and byte pattern, and in the shrunk version it doesn't.
Try to delete some values prior to those points until they no longer go missing. e.g. if it lost k draw calls of a particular label, try deleting both the first k and last k draw calls that ended up in that label in the child.
Note that this might not work and you might have to try something else! But I do think the general idea of looking for what went missing and trying to repair the draw is the way to go.
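The comparison-and-repair steps above could be sketched as plain helpers, independent of the real shrinker internals (all names and the `(label, bytes)` representation here are hypothetical, for illustration only):

```python
from collections import Counter

def missing_labels(current_draws, attempted_draws):
    """Count, per label, how many draw calls 'went missing' between
    the current shrink target and an attempted shrink.

    Each argument is a list of (label, bytes) pairs, one per draw call.
    """
    current = Counter(label for label, _ in current_draws)
    attempted = Counter(label for label, _ in attempted_draws)
    return {
        label: current[label] - attempted[label]
        for label in current
        if current[label] > attempted[label]
    }

def deletion_candidates(draws, label, k):
    """Indices of the first k and last k draws with the given label -
    the two deletion attempts suggested in the step above."""
    indices = [i for i, (lbl, _) in enumerate(draws) if lbl == label]
    return indices[:k], indices[-k:]
```

For instance, if an attempted shrink loses one "elem" draw, `missing_labels` reports `{"elem": 1}` and `deletion_candidates` points at the first and last "elem" draws as the places to try deleting.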
I'm closing this because after four years I don't expect a volunteer to take it on soon - and as we said in #1099:
At present there is literally no good reason to improve the Hypothesis shrinker except enjoyment and weird aesthetic obsessions: due to the combination of better underlying model and a truly disproportionate amount of invested effort, it is currently so good that everyone else's shrinking looks comedically bad in comparison.