6.91.2: flaky tests/quality/test_discovery_ability.py::test_can_produce_multi_line_strings
#3829
Thanks for the report! This seems to have regressed in 6.91.2 (#3801). Checked by running the following until failure:
We could just accept that the distribution has changed slightly and lower the probability required for a pass here. Regardless, I'll take a look at the distribution of
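A minimal sketch of a "run until failure" check, plus a way to estimate pass rates like the ~0.75 and ~0.55 figures quoted later in this thread. It assumes pytest is installed and that the script runs from the directory that contains tests/. The helpers run_pytest_once, runs_until_failure and estimate_pass_rate are hypothetical names and are not part of Hypothesis or pytest.

```python
import subprocess
import sys
from typing import Callable, Optional

# Assumed pytest node ID; adjust the path to wherever the Hypothesis tests live.
TEST_ID = "tests/quality/test_discovery_ability.py::test_can_produce_multi_line_strings"


def run_pytest_once(test_id: str = TEST_ID) -> bool:
    """Run one pytest node in a fresh interpreter; return True if it passed."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", test_id],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def runs_until_failure(
    run_once: Callable[[], bool], max_runs: int = 1000
) -> Optional[int]:
    """Return the 1-based attempt on which run_once first failed, or None."""
    for attempt in range(1, max_runs + 1):
        if not run_once():
            return attempt
    return None


def estimate_pass_rate(run_once: Callable[[], bool], n_runs: int) -> float:
    """Fraction of n_runs independent runs that passed."""
    passes = sum(1 for _ in range(n_runs) if run_once())
    return passes / n_runs
```

For example, estimate_pass_rate(run_pytest_once, 50) gives a rough pass rate for the flaky test, and runs_until_failure(run_pytest_once) reproduces the "run until it fails" loop. Running each attempt in a separate interpreter keeps in-process state from leaking between runs. The rest of the design is deliberately simple: the runner is passed in as a callable so the counting logic can be checked without invoking pytest.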
Thanks Liam! I think lowering the probability (substantially) is a fine solution here; the important thing is that we're unlikely to miss bugs which only trigger on multi-line strings.
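Here is a rough model of why lowering the required probability reduces flakiness. It is assumed for illustration and not taken from Hypothesis's test_discovery_ability.py. Suppose the check passes if at least min_successes out of n_runs independent trials produce a multi-line string, and each trial succeeds with probability q. The chance of a spurious failure is then the lower tail of a binomial distribution. The function name failure_probability and the numbers below are hypothetical.

```python
from math import comb


def failure_probability(n_runs: int, q: float, min_successes: int) -> float:
    """P(X < min_successes) for X ~ Binomial(n_runs, q).

    Under the assumed model, this is the chance that the quality check fails
    even though the strategy can produce multi-line strings.
    """
    return sum(
        comb(n_runs, k) * q**k * (1 - q) ** (n_runs - k)
        for k in range(min_successes)
    )


# Hypothetical numbers: 100 trials, each succeeding with probability 0.6.
for threshold in (60, 50, 30):
    p_fail = failure_probability(100, 0.6, threshold)
    print(f"require >= {threshold}/100: P(fail) = {p_fail:.4f}")
```

A threshold close to the true per-trial rate q fails often. A threshold well below q pushes the failure probability toward zero, which is why lowering it substantially makes the test stable while still catching a strategy that never produces multi-line strings.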
This was caused by 9283da3 specifically. 6.91.1 has a pass rate of ~0.75, and 6.91.2 has a pass rate of ~0.55. Given the turnover rate of our distributions, I'm not going to look much deeper as long as our distributions aren't obviously incorrect. Our distributions are going to change again with #3818, for instance; indeed, the pass rate on that branch is back up to ~0.68.

A brief distribution investigation:

```python
from hypothesis import given, settings
from hypothesis.strategies import text
import matplotlib.pyplot as plot

small_ords = []
large_ords = []
max_ord = 1000


# Record the code point of every generated character, split at max_ord
# so the histogram of common characters stays readable.
@given(text())
@settings(max_examples=20_000)
def f(s):
    for c in s:
        o = ord(c)
        if o < max_ord:
            small_ords.append(o)
        else:
            large_ords.append(o)


f()
print(f"small ords: {len(small_ords)}")
print(f"large ords: {len(large_ords)}")
plot.hist(small_ords, bins=max_ord // 2)
plot.show()
plot.hist(large_ords, bins=100)
plot.show()

# settings
# --------
# max_examples = 20_000
# max_ord = 1_000
#
# 6.91.2
# ------
# small ords: 122077
# large ords: 25508
#
# 6.91.1
# ------
# small ords: 83599
# large ords: 32362
```

This shows that 6.91.2 is actually more likely to generate small ords, at least in the long run. This contradicts the above. I also looked at graphs for string length and ord distribution, and while they were slightly changed, nothing was obviously wrong. I'm going to decrease the probability and call it a day. Any time here is probably better spent working on the IR and obsoleting this distribution!
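Normalizing the counts by the total number of characters makes the comparison explicit, since the two versions generated different totals. This is only arithmetic on the figures reported above. It does not directly measure how often a newline (ord 10, which falls in the small bucket) appears, and that is what the multi-line test checks. The helper name small_fraction is hypothetical.

```python
# Character counts reported above for max_examples = 20_000, max_ord = 1_000.
counts = {
    "6.91.2": {"small": 122_077, "large": 25_508},
    "6.91.1": {"small": 83_599, "large": 32_362},
}


def small_fraction(small: int, large: int) -> float:
    """Share of generated characters whose code point is below max_ord."""
    return small / (small + large)


fractions = {
    version: small_fraction(c["small"], c["large"]) for version, c in counts.items()
}
for version, frac in fractions.items():
    total = counts[version]["small"] + counts[version]["large"]
    print(f"{version}: {total} chars, {frac:.3f} with ord < 1000")
# 6.91.2: 147585 chars, 0.827 with ord < 1000
# 6.91.1: 115961 chars, 0.721 with ord < 1000
```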
Recently, I started to see intermittent test failures for test_can_produce_multi_line_strings in Debian package builds. I don't know if this is some sort of regression in version 6.92.2 or if this test has always been a bit flaky and merely became more likely to fail because Debian runs the test suite twice at the moment (Python 3.11 and Python 3.12).
Example failure for Python 3.11:
Full build log
Example failure for Python 3.12:
Full build log