Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add note to TypeError when each element to sampled_from is a strategy #3820

Merged

Conversation

vreuter
Copy link
Contributor

@vreuter vreuter commented Dec 22, 2023

Close #3819

@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch 2 times, most recently from fb88761 to f647beb Compare December 22, 2023 14:22
@vreuter vreuter changed the title Warn when argument to sampled_from suggests one_of was intended Warn when argument to sampled_from suggests that one_of was intended Dec 22, 2023
@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch 4 times, most recently from 0de010a to 3bbcf9b Compare December 22, 2023 19:32
@Zac-HD
Copy link
Member

Zac-HD commented Dec 22, 2023

Hi Vince - thanks so much for contributing!

Unfortunately I don't think that a warning is suitable here, because the user might well want to sample from a list of strategies (e.g. as part of a .flatmap() strategy).

However, we can still do something useful, by providing a note or suggestion if and only if the test failed - the implementation gets a bit more complicated, but in return we can reduce the false-alarm rate and avoid annoying users' whose tests are actually OK. The basic idea:

  • Instead of emitting a warning, set a boolean flag on the data object (somewhat like this)
  • If the test fails with a TypeError and the message contains the substring SearchStrategy, then add a note to the exception object. (we have a backport helper in core.py)

vreuter added a commit to vreuter/hypothesis that referenced this pull request Dec 26, 2023
@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch from 81232a2 to a8a5635 Compare December 26, 2023 17:32
vreuter added a commit to vreuter/hypothesis that referenced this pull request Dec 26, 2023
@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch from a8a5635 to 7706606 Compare December 26, 2023 17:33
vreuter added a commit to vreuter/hypothesis that referenced this pull request Dec 26, 2023
@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch from 5472452 to 6993160 Compare December 27, 2023 00:35
@vreuter
Copy link
Contributor Author

vreuter commented Dec 27, 2023

However, we can still do something useful, by providing a note or suggestion if and only if the test failed - the implementation gets a bit more complicated, but in return we can reduce the false-alarm rate and avoid annoying users' whose tests are actually OK.

💯 agree

@Zac-HD I've moved the implementation more in line with this description (though setting a flag on the strategy instance rather than on the data object), narrowing the context in which this API use suggestion would be given, and lessening the severity (note on exception rather than a warning). WDYT?

@Zac-HD
Copy link
Member

Zac-HD commented Dec 27, 2023

Nice work!

Setting a flag on the data instance is going to be more annoying, but I think it's also necessary - with the current state, we'll only add the note if our sampled_from strategy is passed directly to @given(), but will miss e.g. st.lists(st.sampled_from(...)) or data.draw(st.sampled_from(...)). Implementation notes:

  • Instead of iterating over the list in SampledFromStrategy.__init__, let's add the check to SampledFromStrategy.do_draw - if isinstance(result, SearchStrategy) and <flag not yet set> and <all options are strategies>:
  • We'll then need to add the note somewhere where we have access to the data object - I'd suggest adding the note to e here
  • Let's test with indirect strategies and with st.data() as noted above

@vreuter
Copy link
Contributor Author

vreuter commented Dec 29, 2023

@Zac-HD I've updated the implementation to apply to the indirect and direct sampling cases, rather than just given and added some corresponding tests. I wound up needing to do a check in StateForActualGivenExecution.run_engine for loss of __notes__ when an error is reproduced, as I didn't see a better way/place to do it. WDYT?

I thought an even safer way would be to concatenate or take an order-preserving union of the potentially two collections of errors (from the original and from the reprod), but I wasn't sure if there's already a code path that would result in duplication, or if you envision one.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I particularly like the new tests 😁

See comments below on the message details; I think this will be the last round of review before we merge 🤞

Comment on lines 292 to 297
try:
func_to_call()
except BaseException as e:
pytest.fail(f"Expected call to succeed but got error: {e}")
else:
assert True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
func_to_call()
except BaseException as e:
pytest.fail(f"Expected call to succeed but got error: {e}")
else:
assert True
func_to_call()

OK to fail the test with whatever unexpected exception we got, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I simplified here: ffe03c4

Comment on lines 531 to 536
if (
isinstance(result, SearchStrategy)
and not hasattr(data, "sampling_is_from_a_collection_of_strategies")
and all(isinstance(x, SearchStrategy) for x in self.elements)
):
data.sampling_is_from_a_collection_of_strategies = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (
isinstance(result, SearchStrategy)
and not hasattr(data, "sampling_is_from_a_collection_of_strategies")
and all(isinstance(x, SearchStrategy) for x in self.elements)
):
data.sampling_is_from_a_collection_of_strategies = True
if (
isinstance(result, SearchStrategy)
and all(isinstance(x, SearchStrategy) for x in self.elements)
):
data._sampled_from_strategy_reprs.append(repr(self))

If we collect this list (and initialize it in the __init__ method), we can show the exact repr of the problematic strategy(s), and I think that would make the note even more helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I insert a repr of the strategies here: 4948d08
I just did this as a single str value rather than a list. Is there a reason to append here (and create a list if doesn't already exist)?

Comment on lines 1167 to 1172
try:
notes = info._expected_exception.__notes__
except AttributeError:
pass
else:
e.__notes__ = notes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better to leave this out. While seeing the notes might be helpful, I'm concerned that moving the notes from the old exception might be confusing (especially if the new exception already has notes!).

Did you have a particular case in mind where it would be useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to update this another way, but with a removal of this and no other change, the "positive" tests (where the note should be available) don't pass. It seems to be because the code path where the note's currently added is not walked by the reproduction (which directly calls execute_one rather than doing so through _execute_once_for_engine). Is this acceptable? Am I missing something that should be evident?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That suggests to me that we might be adding the note in the wrong place... but I'm not sure exactly what the solution should be. I'll take a look soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this sense but wasn't sure how you'd prefer to alter the attachment strategy/placement, sorry about that. Looks good from your recent commit!

@vreuter
Copy link
Contributor Author

vreuter commented Jan 3, 2024

@Zac-HD any idea what that memory error that looks specific to Windows 11 may be? I thought it may be ephemeral, but it seems to be persistent.

@jobh
Copy link
Contributor

jobh commented Jan 3, 2024

@Zac-HD any idea what that memory error that looks specific to Windows 11 may be? I thought it may be ephemeral, but it seems to be persistent.

It looks like the repr has just become extremely long, for some reason, and exhausts available memory.

@vreuter
Copy link
Contributor Author

vreuter commented Jan 3, 2024

It looks like the repr has just become extremely long, for some reason, and exhausts available memory.

👌 I'm trying limiting the number of displayed elements, TY @jobh

@vreuter vreuter force-pushed the vr/warn-sampling-stategies-issue3819 branch from 4543005 to 699850f Compare January 3, 2024 15:25
@vreuter
Copy link
Contributor Author

vreuter commented Jan 3, 2024

The test currently failing is doing so because it's bailing out of the last 2 attempts of the 100 test runs to generate random text, as 47 + (100 - 98) = 49 < 50 required successes per the default spec of 0.5.

Why is this ( "> 1/2 of draws from text() should be multiline") being enforced as part of the contract of the strategy? Is this intended and/or reasonable?

Regardless, this test failure reproduces a very low fraction of the time for me locally. Could this contract either be relaxed, or the test rewritten in such a way that it's more stably reproduced? @Zac-HD @jobh

@vreuter
Copy link
Contributor Author

vreuter commented Jan 3, 2024

This failing test (check-quality) does not reproduce locally (macOS), running build.sh check-quality even after clearing the cache.

For the record, here's the output of the local run:

============================================ test session starts =============================================
platform darwin -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
cachedir: .tox/quality/.pytest_cache
rootdir: ~/code/hypothesis
configfile: pytest.ini
plugins: hypothesis-6.92.1, xdist-3.5.0
12 workers [254 items]
...................................................................................................... [ 40%]
...................................................................................................... [ 80%]
..................................................                                                     [100%]
============================================ slowest 20 durations ============================================
88.20s call     hypothesis-python/tests/quality/test_float_shrinking.py::test_shrinks_downwards_to_integers
7.38s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_multiple_elements_in_silly_large_int_range
5.82s call     hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_multi_line_strings
4.28s call     hypothesis-python/tests/quality/test_float_shrinking.py::test_shrinks_downwards_to_integers_when_fractional
3.43s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_multiple_elements_in_silly_large_int_range_min_is_not_dupe
2.95s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_find_large_union_list
2.93s call     hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_large_binary_strings
2.62s call     hypothesis-python/tests/quality/test_deferred_strategies.py::test_non_trivial_json
2.57s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_can_ignore_left_hand_side_of_flatmap
2.26s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_dictionary[OrderedDict]
2.22s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_dictionary[dict]
1.93s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_long_list
1.84s call     hypothesis-python/tests/quality/test_discovery_ability.py::test_ints_can_occasionally_be_really_large
1.83s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_sets_of_sets
1.76s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_mixed_list
1.67s call     hypothesis-python/tests/quality/test_discovery_ability.py::test_can_produce_unstripped_strings
1.58s call     hypothesis-python/tests/quality/test_poisoned_trees.py::test_can_reduce_poison_from_any_subtree[15993493061449915028-10]
1.52s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_minimize_longer_list_of_strings
1.18s call     hypothesis-python/tests/quality/test_shrink_quality.py::test_duplicate_containment

(1 durations < 1s hidden.  Use -vv to show these durations.)
======================================= 254 passed in 95.14s (0:01:35) =======================================

@vreuter vreuter changed the title Warn when argument to sampled_from suggests that one_of was intended Add note to TypeError when each element to sampled_from is a strategy Jan 4, 2024
@tybug
Copy link
Member

tybug commented Jan 6, 2024

Regardless, this test failure reproduces a very low fraction of the time for me locally. Could this contract either be relaxed, or the test rewritten in such a way that it's more stably reproduced?

This is indeed a flaky test and was separately reported in #3829. Let's track it in that issue; you can ignore that test failure for now. I suspect its flakiness is a result of recent changes (not from this PR) and may be indicative of a deeper issue.

@vreuter
Copy link
Contributor Author

vreuter commented Jan 7, 2024

Regardless, this test failure reproduces a very low fraction of the time for me locally. Could this contract either be relaxed, or the test rewritten in such a way that it's more stably reproduced?

This is indeed a flaky test and was separately reported in #3829. Let's track it in that issue; you can ignore that test failure for now. I suspect its flakiness is a result of recent changes (not from this PR) and may be indicative of a deeper issue.

Much appreciated @tybug , sorry I'd not noticed the other report of that test flakiness. Sounds good, thanks!

@Zac-HD Zac-HD force-pushed the vr/warn-sampling-stategies-issue3819 branch from 9e67f68 to 82408d9 Compare January 8, 2024 12:05
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @vreuter!

@Zac-HD Zac-HD merged commit e2e5b4c into HypothesisWorks:master Jan 8, 2024
47 checks passed
clrpackages referenced this pull request in clearlinux-pkgs/pypi-hypothesis Jan 12, 2024
…version 6.92.6

CI on behalf of the Hypothesis team (1):
      Bump hypothesis-python version to 6.92.6 and update changelog

Vince Reuter (16):
      add warning all elements to sampled_from are strategies, suggestive of one_of; close #3819
      try catching expected warning in test contexts
      add explicit stacklevel argument to pass linting
      generate patch release description file
      remove warning-compensatory changes to test suite
      narrow the severity and cases for alternate API use suggestion; close #3819
      remove unused helper method, add param ids, work better with staticmethod
      make implementation valid for more use cases
      add indirect tests
      ensure that err reprod doesn't squash error notes
      add direct draw tests for #3819; condense and narrow assertion logic
      simplify no-error check in test; https://github.com/HypothesisWorks/hypothesis/pull/3820\#discussion_r1438466523
      more informative note for #3819
      ignore mypy check for new attr
      limit the number of elements shown for API use suggestion; #3819
      locally ignore undefined attribute for mypy

Zac Hatfield-Dodds (2):
      tweak formatting + move note
      remove unused ignore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn when passing *strategies* as elements to sampled_from (rather than to one_of)?
4 participants