Add note to TypeError when each element to sampled_from is a strategy #3820
Conversation
(force-pushed from fb88761 to f647beb)
sampled_from suggests that one_of was intended
(force-pushed from 0de010a to 3bbcf9b)
Hi Vince - thanks so much for contributing! Unfortunately I don't think that a warning is suitable here, because the user might well want to sample from a list of strategies (e.g. as part of a …). However, we can still do something useful, by providing a note or suggestion if and only if the test failed. The implementation gets a bit more complicated, but in return we can reduce the false-alarm rate and avoid annoying users whose tests are actually OK. The basic idea: …
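As a rough sketch of that note-only-on-failure idea (the hook names, flag attribute, and message text below are illustrative assumptions, not Hypothesis's actual internals, which the rest of this thread converges on):

```python
from hypothesis.strategies import SearchStrategy


def note_suspicious_sample(data, elements) -> None:
    # Draw-time hook (hypothetical): remember that every element was itself
    # a strategy, but stay silent so passing tests are never bothered.
    if elements and all(isinstance(x, SearchStrategy) for x in elements):
        data.sampled_from_all_strategies = True


def annotate_failure(data, exc: BaseException) -> None:
    # Failure-path hook (hypothetical): only now surface the hint, so the
    # false-alarm rate for passing tests is zero by construction.
    if getattr(data, "sampled_from_all_strategies", False):
        # BaseException.add_note is Python 3.11+; older code would append
        # to exc.__notes__ directly.
        exc.add_note("sampled_from was given strategies; did you mean one_of()?")
```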
…ypothesisWorks#3819 Revised per suggestion here: HypothesisWorks#3820 (comment)
(force-pushed from 81232a2 to a8a5635)
(force-pushed from a8a5635 to 7706606)
(force-pushed from 5472452 to 6993160)
💯 agree @Zac-HD I've moved the implementation more in line with this description (though setting a flag on the strategy instance rather than on the data object), narrowing the context in which this API-use suggestion is given, and lessening the severity (a note on the exception rather than a warning). WDYT?
Nice work! Setting a flag on the data instance is going to be more annoying, but I think it's also necessary; with the current state, we'll only add the note if our …
@Zac-HD I've updated the implementation to apply to the indirect and direct sampling cases, rather than just … I thought an even safer way would be to concatenate, or take an order-preserving union of, the potentially two collections of errors (from the original run and from the reproduction), but I wasn't sure if there's already a code path that would result in duplication, or if you envision one.
Looking good! I particularly like the new tests 😁
See comments below on the message details; I think this will be the last round of review before we merge 🤞
```python
try:
    func_to_call()
except BaseException as e:
    pytest.fail(f"Expected call to succeed but got error: {e}")
else:
    assert True
```
Suggested change:

```diff
-try:
-    func_to_call()
-except BaseException as e:
-    pytest.fail(f"Expected call to succeed but got error: {e}")
-else:
-    assert True
+func_to_call()
```
OK to fail the test with whatever unexpected exception we got, I think.
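For illustration, the simplified test body is just the bare call (a sketch assuming `func_to_call` is whatever callable the surrounding test already has in scope, e.g. as a fixture or parameter):

```python
def test_call_succeeds(func_to_call):
    # No wrapper needed: if this raises, pytest fails the test and shows
    # the original exception with its own traceback.
    func_to_call()
```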
Makes sense, I simplified here: ffe03c4
```python
if (
    isinstance(result, SearchStrategy)
    and not hasattr(data, "sampling_is_from_a_collection_of_strategies")
    and all(isinstance(x, SearchStrategy) for x in self.elements)
):
    data.sampling_is_from_a_collection_of_strategies = True
```
Suggested change:

```diff
-if (
-    isinstance(result, SearchStrategy)
-    and not hasattr(data, "sampling_is_from_a_collection_of_strategies")
-    and all(isinstance(x, SearchStrategy) for x in self.elements)
-):
-    data.sampling_is_from_a_collection_of_strategies = True
+if (
+    isinstance(result, SearchStrategy)
+    and all(isinstance(x, SearchStrategy) for x in self.elements)
+):
+    data._sampled_from_strategy_reprs.append(repr(self))
```
If we collect this list (and initialize it in the `__init__` method), we can show the exact repr of the problematic strategy(s), and I think that would make the note even more helpful.
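A hedged sketch of that suggestion, with the class name and message text as illustrative stand-ins for the real internals:

```python
class ConjectureDataSketch:
    """Illustrative stand-in for the real data object."""

    def __init__(self) -> None:
        # Initialize eagerly so draw-time code can append unconditionally.
        self._sampled_from_strategy_reprs: list[str] = []


def build_note(data: ConjectureDataSketch) -> str:
    # Deduplicate while preserving first-seen order, then show exact reprs.
    unique = dict.fromkeys(data._sampled_from_strategy_reprs)
    return (
        "sampled_from was given a collection of strategies: "
        + ", ".join(unique)
        + ". Was one_of(...) intended?"
    )
```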
I insert a repr of the strategies here: 4948d08
I just did this as a single `str` value rather than a list. Is there a reason to append here (and create a list if it doesn't already exist)?
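One scenario where a list pays off (a sketch, not taken from the PR): a single test can misuse `sampled_from` more than once, and appending collects a repr for each occurrence:

```python
from hypothesis import given, strategies as st


@given(
    a=st.sampled_from([st.integers(), st.floats()]),  # strategies, not values
    b=st.sampled_from([st.text(), st.binary()]),      # a second misuse
)
def test_two_misuses(a, b):
    # a and b are strategy objects here, so this raises TypeError
    # (unsupported operand types); a list lets the note name both
    # offending sampled_from strategies rather than only the last one.
    a + b
```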
```python
try:
    notes = info._expected_exception.__notes__
except AttributeError:
    pass
else:
    e.__notes__ = notes
```
I think it's probably better to leave this out. While seeing the notes might be helpful, I'm concerned that moving the notes from the old exception might be confusing (especially if the new exception already has notes!).
Did you have a particular case in mind where it would be useful?
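If note propagation were kept, a gentler variant (a sketch continuing the quoted snippet above, with the same `info` and `e`) would extend rather than overwrite, so notes already on the new exception survive:

```python
# Merge notes instead of clobbering them; tolerate either attribute
# being absent (__notes__ only exists once a note has been added).
old_notes = getattr(info._expected_exception, "__notes__", [])
e.__notes__ = [*getattr(e, "__notes__", []), *old_notes]
```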
Happy to update this another way, but with a removal of this and no other change, the "positive" tests (where the note should be available) don't pass. It seems to be because the code path where the note's currently added is not walked by the reproduction (which directly calls `execute_once` rather than doing so through `_execute_once_for_engine`). Is this acceptable? Am I missing something that should be evident?
That suggests to me that we might be adding the note in the wrong place... but I'm not sure exactly what the solution should be. I'll take a look soon.
I had this sense but wasn't sure how you'd prefer to alter the attachment strategy/placement, sorry about that. Looks good from your recent commit!
@Zac-HD any idea what that memory error, which looks specific to Windows 11, may be? I thought it might be ephemeral, but it seems to be persistent.
It looks like the …
👌 I'm trying to limit the number of displayed elements, TY @jobh
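A sketch of one way to cap the reprs shown (the helper name and the cutoff of 3 are assumptions for illustration, not the merged code):

```python
def truncated_reprs(elements, limit: int = 3) -> str:
    # Show at most `limit` element reprs, eliding the rest, so a large
    # sampled_from collection cannot flood the exception note.
    shown = ", ".join(map(repr, elements[:limit]))
    suffix = ", ..." if len(elements) > limit else ""
    return f"[{shown}{suffix}]"


print(truncated_reprs([1, 2, 3, 4, 5]))  # -> [1, 2, 3, ...]
```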
(force-pushed from 4543005 to 699850f)
The test currently failing is doing so because it's bailing out of the last 2 attempts of the 100 test runs to generate random text, as 47 + (100 - 98) = 49 < 50 required successes per the default spec of 0.5. Why is this ("> 1/2 of draws from …") …? Regardless, this test failure reproduces a very low fraction of the time for me locally. Could this contract either be relaxed, or the test rewritten in such a way that it's more stably reproduced? @Zac-HD @jobh
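To make the threshold arithmetic concrete (a rough sketch under a plain binomial model, which ignores the test's actual draw distribution):

```python
from math import comb


def prob_below(k: int, n: int, p: float) -> float:
    # P(X < k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# With a true per-run success rate of exactly 0.5, the ">= 50 of 100"
# contract would fail about 46% of the time; observed stability therefore
# relies on the real success rate being well above the 0.5 spec.
print(round(prob_below(50, 100, 0.5), 3))  # ~0.46
```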
This failing test (check-quality) does not reproduce locally (macOS), running … For the record, here's the output of the local run:

```
============================================ test session starts =============================================
platform darwin -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
cachedir: .tox/quality/.pytest_cache
rootdir: ~/code/hypothesis
configfile: pytest.ini
plugins: hypothesis-6.92.1, xdist-3.5.0
12 workers [254 items]
...................................................................................................... [ 40%]
...................................................................................................... [ 80%]
.................................................. [100%]
============================================ slowest 20 durations ============================================
88.20s call hypothesis-python/tests/quality/test_float_shrinking.py::test_shrinks_downwards_to_integers
7.38s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_multiple_elements_in_silly_large_int_range
5.82s call hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_multi_line_strings
4.28s call hypothesis-python/tests/quality/test_float_shrinking.py::test_shrinks_downwards_to_integers_when_fractional
3.43s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_multiple_elements_in_silly_large_int_range_min_is_not_dupe
2.95s call hypothesis-python/tests/quality/test_shrink_quality.py::test_find_large_union_list
2.93s call hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_large_binary_strings
2.62s call hypothesis-python/tests/quality/test_deferred_strategies.py::test_non_trivial_json
2.57s call hypothesis-python/tests/quality/test_shrink_quality.py::test_can_ignore_left_hand_side_of_flatmap
2.26s call hypothesis-python/tests/quality/test_shrink_quality.py::test_dictionary[OrderedDict]
2.22s call hypothesis-python/tests/quality/test_shrink_quality.py::test_dictionary[dict]
1.93s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_long_list
1.84s call hypothesis-python/tests/quality/test_discovery_ability.py::test_ints_can_occasionally_be_really_large
1.83s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_sets_of_sets
1.76s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_mixed_list
1.67s call hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_unstripped_strings
1.58s call hypothesis-python/tests/quality/test_poisoned_trees.py::test_can_reduce_poison_from_any_subtree[15993493061449915028-10]
1.52s call hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_longer_list_of_strings
1.18s call hypothesis-python/tests/quality/test_shrink_quality.py::test_duplicate_containment
(1 durations < 1s hidden. Use -vv to show these durations.)
======================================= 254 passed in 95.14s (0:01:35) =======================================
```
This is indeed a flaky test and was separately reported in #3829. Let's track it in that issue; you can ignore that test failure for now. I suspect its flakiness is a result of recent changes (not from this PR) and may be indicative of a deeper issue.
Much appreciated @tybug, sorry I'd not noticed the other report of that test flakiness. Sounds good, thanks!
(force-pushed from 9e67f68 to 82408d9)
Thanks again @vreuter!
…version 6.92.6

CI on behalf of the Hypothesis team (1):
- Bump hypothesis-python version to 6.92.6 and update changelog

Vince Reuter (16):
- add warning all elements to sampled_from are strategies, suggestive of one_of; close #3819
- try catching expected warning in test contexts
- add explicit stacklevel argument to pass linting
- generate patch release description file
- remove warning-compensatory changes to test suite
- narrow the severity and cases for alternate API use suggestion; close #3819
- remove unused helper method, add param ids, work better with staticmethod
- make implementation valid for more use cases
- add indirect tests
- ensure that err reprod doesn't squash error notes
- add direct draw tests for #3819; condense and narrow assertion logic
- simplify no-error check in test; https://github.com/HypothesisWorks/hypothesis/pull/3820#discussion_r1438466523
- more informative note for #3819
- ignore mypy check for new attr
- limit the number of elements shown for API use suggestion; #3819
- locally ignore undefined attribute for mypy

Zac Hatfield-Dodds (2):
- tweak formatting + move note
- remove unused ignore
Close #3819