-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for crosshair
backend
#3806
Conversation
d9bd2c9
to
6f06db7
Compare
No problem - this works; updated on my side.
Sounds good. IIUC, I would then, at context manager exit, blast all of my realized primitives into the run's conjecturedata via the forced operations on a regular PrimitiveProvider. Is that right? We might need to be more surgical about what the context manager covers; I noticed last night that ConjectureRunner.test_function seems to want to operate on the buffer pretty immediately after executing the user code. |
I'm aiming to have a new attribute which is just the list of primitive values provided, so we'd want to materialize the contents of that list (working title Converting to a buffer would I think force realization, so we'd want to delay that until the end of the test, which means we'd actually have to re-run the test... and at that point we might as well do the buffer construction only for failing examples which we pass off to the shrinker or replay of the minimal failure. If we currently realize earlier, we can probably make some changes to defer that; at worst waiting for some more work on the original issue. |
6f06db7
to
b1735ea
Compare
@tybug - I've now exceeded my timebox for this, and won't be able to come back to it until mid-January at earliest. If you want to use it as a starting point, you'd be welcome to take it over 🙂 |
@Zac-HD roger, thanks! I'll likely end up working on this after the datatree ir migration. (which is slower than I'd hoped, but I am chugging along.) |
Makes sense! Good chance this one will be easier once the datatree has native IR support too, I got a bit bogged down in adding special cases where an alternative backend means we have to ignore the usual semantics of an empty buffer. |
I'm taking a look at getting this working. I've got a branch with full shrinking + database playback support for I took a different approach from this branch, where primitives are forced to their buffer representation immediately when drawn. I tried forcing them only at the end in Even though "normal" backends work with this, I don't know whether this approach is OK for crosshair? It seems like "only convert to buffer at the end" may be a requirement for crosshair given the above conversation. I.e., is it the responsibility of hypothesis (convert at end) or crosshair (use per_test_case context manager) to respect the lifetime of the crosshair primitives? I should disclaim that I have no knowledge of crosshair internals and was only vaguely following the above conversation. If only converting at the end is a requirement, it'll be a bit more work to change things in hypothesis to account for that (unsure how much yet.) @pschanely, on a related note, I tried running the crosshair plugin with my above branch, but got: @settings(backend="crosshair", suppress_health_check=HealthCheck.all())
@given(st.integers())
def test(value):
print("called", value)
assert value != 150
test()
Traceback (most recent call last):
File "/Users/tybug/Desktop/Liam/coding/hypothesis/sandbox.py", line 20, in <module>
test()
File "/Users/tybug/Desktop/Liam/coding/hypothesis/sandbox.py", line 15, in test
@given(st.integers())
^^^
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1620, in wrapped_test
raise the_error_hypothesis_found
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1587, in wrapped_test
state.run_engine()
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1115, in run_engine
runner.run()
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 500, in run
self._run()
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 917, in _run
self.generate_new_examples()
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 638, in generate_new_examples
zero_data = self.cached_test_function(bytes(BUFFER_SIZE))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 1089, in cached_test_function
self.tree.simulate_test_function(dummy_data)
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/datatree.py", line 727, in simulate_test_function
v = draw(
^^^^^
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/datatree.py", line 716, in draw
value = draw_func(**kwargs, forced=forced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/data.py", line 1768, in draw_integer
value = self.provider.draw_integer(**kwargs, forced=forced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/hypothesis_crosshair_provider/crosshair_provider.py", line 104, in draw_integer
symbolic = proxy_for_type(int, self._next_name("int"), allow_subtypes=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/crosshair/core.py", line 586, in proxy_for_type
space = context_statespace()
^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/crosshair/statespace.py", line 243, in context_statespace
raise CrosshairInternal
File "/opt/homebrew/lib/python3.12/site-packages/crosshair/util.py", line 594, in __init__
debug("CrosshairInternal", str(self))
^^^^^^^^^
RecursionError: maximum recursion depth exceeded while getting the str of an object (note the, I believe, real error of |
Unfortunately yeah, for crosshair we have to convert only at the end. Converting while the test is in progress forces crosshair to choose a concrete value for that element, and thus give up the main benefit of concolic execution, earlier than would otherwise be the case. I think we can get around this reasonably well with only one additional epicycle: add a Happily that'll go away again when we convert the shrinker and database format to IR-native. I guess the places where we try to access an unfinished buffer might turn out to be a problem them, but let's not borrow trouble - they might go away in the refactoring. |
Yup, Zac's on the ball; draw-time will be too early for CrossHair. And @tybug thanks - it's not you. Looks like I aleady broke my plugin already with mainline changes. I've got a suspicion about that crash; will be able to investigate this weekend and report back. |
@tybug I hooked up tybug/provider-plugins-2 to the latest versions of CrossHair and crosshair-hypothesis; and made some additional changes to both. (please pull latest) Some of those updates will hopefully reduce some of the confusion around error reporting. (though expect this to be challenging in general - whenever either Hypothesis or Pytest tries to do some exception handling involving symbolics, things will generally go haywire) Going forward, I'd call out some likely kinds of exceptions:
In particular, right now, I see the following immediate issues, though I expect they might just go away with the additional changes you're planning:
At any rate, I'm happy to continue to diagnose issues as you encounter them. Don't be shy! |
48fe127
to
6587f3b
Compare
Thanks @pschanely! Especially for the elaboration on the exceptions. Hypothesis' usage of I'm thinking about how this pull will play with other backends going forward. Crosshair seems like roughly the most complicated backend we would ever want to support in terms of the requirements it places on how we interact with its provider. Crosshair values can't be reified until the end of a test run, so it seems we have to avoid using But other backends may not have this restriction and would benefit from I know the stated goal is to get the crosshair backend working, and I don't mean to create extra work here 🙂. I'm personally invested in this because I have ideas for other backends ( |
I've called out It seems to me that the basic split is between backends which do their own tracking of what's happened - whether fuzzing or solving - and those which don't. Having an attribute on Seems like we're pretty close to something which kinda-works though, if we thread through the context manager and manage to defer reification I think that might actually work? If you think of a subset that would make sense to ship sooner, I'd also be happy to release an explicitly-unstable feature with e.g. Atheris integration... although that involves arguing over who controls the main loop, so maybe not. |
I'm also hopeful we're almost there. I'll continue working here and try to get something which works for both crosshair and simpler backends. (thanks for your fixes above, by the way — merge gone wrong on my end). |
87d035e
to
e514d73
Compare
This comment was marked as outdated.
This comment was marked as outdated.
OK, sorry, my print is at fault above; that's triggering reification outside the context manager. There is an issue here somewhere, because I only added that print to debug the buffer being empty when it shouldn't be, but what I posted above isn't it. |
Here's the real problem: crosshair raises
I think crosshair is correct to raise By the way, reproducing this requires some work, because hypothesis swallows the NotDeterministic exception and produces its own Flaky. Details in case it's useful. |
I think the problem is that we shouldn't track symbolic values in our DataTree - if we just appended the symbolic values to a list, then asked crosshair to reify them at the end of the test, and then added them to the tree... would that work? |
what would "ask crosshair to reify them" look like? We basically are appending the symbolic values to a list by tracking them in If there was a magic function from crosshair we could call, where we give it a symbolic value and it returns the most-recently-reified (?) value for that symbolic, then I think that may work? But that's quite a tight coupling for what is supposedly a backend. |
I've finally got some time! Investigating now; please hold. |
Ok! I made another change on top of @tybug's changes (there was some other state on the provider that needed to be reset now that's it's persistent). And, things are looking good. I've even gotten a little saucy and ventured into non-primitives. Lists work. It's ineffective at dictionaries - something about how hypothesis is ensuring uniqueness is confounding crosshair, but I'm confident this is resolvable. |
aha, nice! |
Oh wow, I am so excited right now. Does it sound plausible that we could squash/rebase for a clean history, write some more tests, and release our clearly-marked-experimental version over the weekend? |
That sounds good to me. I think if we want to explicitly test against |
I hadn't. But just published something. All that said, I'm not sure we should run crosshair in hypothesis's tests yet. It still does some destructive things to the interpreter, and I am worried they'll cause you trouble ... if not now, sometime down the line. I don't know how feasible it would be to write a dummy provider that would attempt to ensure hypothesis continues to meet the expectations we've placed on it. Particularly tricky might be to ensure hypothesis doesn't access values without using the reification hook. At any rate, why don't I add the real integration tests on the plugin side ... and I'll set up a nightly build against hypothesis or something. |
We can run your integration tests as a not-required-to-pass CI job too, so we'll find out before we ship a breaking change. To be clear we might well do so anyway, I just dislike accidentally breaking things. |
Well, now I'm getting excited too. (though this weekend sounds ... ambitious to me!) I just cut CrossHair v0.0.50, with which we can generate hypothesis sets and dictionaries ~reasonably well. Though, I am noticing an assertion failure for |
thanks, also reproduces with: @settings(backend="crosshair")
@given(st.sets(st.integers()))
def test(d):
assert d != {42} Cause is hypothesis/hypothesis-python/src/hypothesis/strategies/_internal/strategies.py Lines 1015 to 1018 in 059357d
this is actually a bug in non-hypothesis backends that is surfacing in the mentioned assertion. We're checking Will attempt a fix tomorrow. (if you want a temporary hacky local fix, commenting that line should work unless you try some truly weird strategies.) |
I'm not sure this was doing anything since the buffer should be empty always
I don't have a clean resolution that preserves the condition, but I'm relatively confident we can safely remove it. Blame points to a revision where the loop was infinite, whereas we now limit to 3 iterations. Tested with: @st.composite
def nones(draw):
return draw(st.just(None))
b = False
@given(nones().filter(lambda x: b))
def f(n):
pass
f() (intentionally convoluted to trick our various intelligent rewriting rules). |
Well... the implementation looks good, and I've just tried it out:
so: anything left to do before we ship? |
Read through the diff again, and I'm happy with it! I'm afraid I'm not comfortable/efficient enough with squashing commits yet for that to be an efficient use of time, but no offense taken if someone else wants to rewrite history. I'll try to squash commits as I go in the future instead of pushing silly fixup commits. Also, very excited! Setting crosshair loose on tests and watching it find the needle in the haystack counterexample is super satisfying. |
OK - let's ship! I'm pretty sure the journey isn't over yet, but thanks so much to @tybug and @pschanely - I think this is one of the coolest things we've shipped in literally years, and I couldn't have done it without you 🥰 |
Well, even if I'm only starting the journey, I'm glad it's this one. Thank you both SO much. |
crosshair
backendcrosshair
backend
See #3086 for design and previous discussion.
Ping @pschanely - fyi there are some interface changes from the previous draft, both in how you register the backend and in that we now expect a global context manager which yields a per-test-case-ctx function.
I'm inclined to require #3801; it might slow us down a bit but I think will make converting the failing example to bytes practical, and from there it's a small step to database and shrinking support 😁