-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add places argument to decimals strategy #508
Conversation
d3da646
to
e969a4e
Compare
src/hypothesis/strategies.py
Outdated
if max_value is None: | ||
# Cap values at 999/1000 of the validity threshold | ||
max_value = 10 ** (precision - places) - 10 ** (-places - 3) | ||
elif max_value >= 10 ** (precision - places): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this should be an error, just because precision
with places
make a tighter bound than max_value
. I think it would be more interesting to worry if min_value
is so high that its interaction with precision
or places
makes it impossible for strategy to generate any values or, similarly, if max_value
is too low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you get an exception when drawing an impossible value, and suffer from a potentially severe filtering effect if I catch that for you. I though it was more efficient to ensure up front that the strategy wouldn't come pre-filtered.
min_value
being high actually can't make it impossible to draw values - you just end up with lots of trailing zeros (before and/or after the point). And if max_value
is really low, you just get lots of decimals rounded to zero - but that's what your combination of places and max_value asked for!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If min_value
is 300 and precision
is 2, what values do I get from the strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
300, 310, 320, ..., 1000, 1100, ..., etc. Add trailing zeros to fill out places.
That's the plan anyway, it appears that the C-style signal handling and context management is worse than I expected so mostly it's just throwing exceptions.
src/hypothesis/strategies.py
Outdated
min_value=None, max_value=None, allow_nan=None, allow_infinity=None | ||
): | ||
def decimals(min_value=None, max_value=None, allow_nan=None, | ||
allow_infinity=None, places=None, precision=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more general to have bounds on the places and precision, rather than single numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to construct an equivalent strategy either way (as I've written you'd use flatmap, otherwise set bounds to the same value), and I did consider it. The deciding factor for me was that I think users will want a specific level of precision more often than a bounded range, and three interacting boundary conditions felt like a really clumsy interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you go with this interface now, it will be impossible to add bounds later, so this is a permanent decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But likewise difficult to move from bounds to a single argument, I think - how about we defer this bit of design to @DRMacIver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An analogous case where we have bounds is in the collection types, where we have min_size and max_size rather than allowing for a single bound. I've been thinking of adding a size parameter which has the same effect as min_size=size, max_size=size and is an error to mix with also providing the other two, but it's in the long tail of things I haven't quite got around to. Something like this would work here, but it would also work in the other direction, so I don't think it's actually that hard to move from one to the other.
Another point in favour of bounds is that we can potentially arrange things to shrink better than we'd get if the user did it themselves with flatmap - doing it with flatmap will probably shrink OK but not great. Even if we just use flatmap internally for now it opens up the option to change that later if and when it becomes a problem in practice.
So my inclination is marginally in favour of bounds with a possible optional fixed size parameter as a convenience feature, but I haven't thought about the problem domain in any great depth and this should definitely be taken as inclination and not an official decree.
0440d07
to
cc53545
Compare
OK, this really, really needs some more tests - signal handling has been much harder to get right than I expected. Will revisit in a while once I've had more time to think it through. |
I'm kinda confused by this comment, but that's probably just because I don't understand the problem domain. Where do signals come into it? And are these literal signals (and if so what happens on windows?) or just signal style. |
They're not unix signals, but called that in the IBM spec that Python's decimal module follows. Effects are appropriately (painfully) stateful, configurable as exceptions or to set a global or local variable, etc. Working out how to do the right thing instead of just hiding symptoms is will take some time for careful reading and thought. |
Having taken some time away from this and thought about the implementation, I think there are basically three things this strategy should cover:
The commits I've just pushed try to handle these cases. Comments appreciated - I'll write up covering tests and a changelog if the idea looks sound. |
718f9f5
to
80bb410
Compare
@DRMacIver, IMO this is ready to merge after review. |
Cool. I'm quite busy this week, so probably won't get a chance to look at it properly until Saturday. Will let you know when I do! |
b6f2496
to
da26d04
Compare
@DRMacIver ping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core computations look good to me, but I'm not a Python numerical expert.
It might be that a strategy that allows infinity becomes weighted too heavily toward infinity, but if that turns out to be true it can be dealt with later.
That's all I've got.
src/hypothesis/strategies.py
Outdated
): | ||
"""Generates instances of decimals.Decimal. | ||
def decimals(min_value=None, max_value=None, | ||
allow_nan=True, allow_infinity=None, places=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess default value for allow_infinity
is None to fake a Maybe type, and the other possibilities are True and False, meaning Some(True) and Some(False). Maybe a further comment saying default is None, meaning accept the meaning of the other parameters would be helpful.
src/hypothesis/strategies.py
Outdated
check_valid_integer(places) | ||
if places is not None and places < 0: | ||
raise InvalidArgument('cannot have negative number of places (%r)') | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why pass
instead of raise an exception in these two. Maybe it would make sense to document what causes an exception to be raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Decimal(None)
is a likely input, and we don't want to raise a TypeError. Anything else should either be invalid for another reason, caught in the integers() validation, or strange but legal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, None
. It makes sense now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like these would be more idiomatic as
if min_value is not None:
min_value = Decimal(min_value)
and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm also kinda surprised flake8 doesn't make bare excepts into an error)
src/hypothesis/strategies.py
Outdated
check_valid_bound(min_value, 'min_value') | ||
check_valid_bound(max_value, 'max_value') | ||
check_valid_interval(min_value, max_value, 'min_value', 'max_value') | ||
if allow_infinity is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better:
if allow_infinity is None:
allow_negative_infinity = min_value is not None
allow_positive_infinity = max_value is not None
elif ....
and below, line 901:
if allow_positive_infinity:
special.append(Decimal('Infinity'))
if allow_negative_infinity:
special.append(Decimal('-Infinity'))
src/hypothesis/strategies.py
Outdated
# Compose with sampled_from for infinities and NaNs as appropriate | ||
special = [] | ||
if allow_nan: | ||
special.extend(map(Decimal, 'NaN -NaN sNaN -sNaN'.split())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes a bit more programmatic sense to map over a tuple then to do string manipulations, as:
map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN'))
|
||
@pytest.mark.parametrize('places', range(10)) | ||
def test_decimals_have_correct_places(places): | ||
@given(decimals(0, 10, allow_nan=False, places=places)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do this without range, like:
@given(decimals(allow_inf=False, allow_nan=False, places=places))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - you'd get a strategy over any finite decimal. In practice, that's probably confined to the 64bit integer range due to the implementation of integers()
drawing eight bytes.
src/hypothesis/strategies.py
Outdated
if isinstance(min_value, Decimal) and not min_value.is_finite(): | ||
raise InvalidArgument('min_value=%r is not a finite bound' % min_value) | ||
if isinstance(max_value, Decimal) and not max_value.is_finite(): | ||
raise InvalidArgument('min_value=%r is not a finite bound' % max_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste error right here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally 👍 from me. A few small requests for changes.
src/hypothesis/strategies.py
Outdated
check_valid_integer(places) | ||
if places is not None and places < 0: | ||
raise InvalidArgument('cannot have negative number of places (%r)') | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like these would be more idiomatic as
if min_value is not None:
min_value = Decimal(min_value)
and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.
src/hypothesis/strategies.py
Outdated
check_valid_integer(places) | ||
if places is not None and places < 0: | ||
raise InvalidArgument('cannot have negative number of places (%r)') | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm also kinda surprised flake8 doesn't make bare excepts into an error)
src/hypothesis/strategies.py
Outdated
elif allow_infinity and min_value is not None and max_value is not None: | ||
raise InvalidArgument('Cannot allow infinity between finite bounds') | ||
# Set up a strategy for finite decimals | ||
if places is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a request for changes, but I'm curious if you tried just quantizing after the value was generated and if so what problems that caused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty much what it does - draw an integer, multiply it by the decimal factor, quantize and emit None on error, filter out None.
src/hypothesis/strategies.py
Outdated
check_valid_bound(max_value, 'max_value') | ||
check_valid_interval(min_value, max_value, 'min_value', 'max_value') | ||
if allow_infinity is None: | ||
allow_infinity = min_value is not None or max_value is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of a sudden, this looks backward to me, like it should be:
allow_infinity = min_value is None or max_value is None
src/hypothesis/strategies.py
Outdated
max_value = Decimal(max_value) | ||
if math.isinf(max_value) and max_value > 0: | ||
max_value = None | ||
if min_value and not min_value.is_finite(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can get rid of these four lines, since, now that infinity is a valid bound, check_valid_bound
does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually still want to exclude min_value='infinity'
, because we won't be able to draw any finite values (and the infinity with an allowed sign was converted to None above). And check_valid_bound
uses math.isnan
, which is lovely but doesn't work for Decimal('sNaN')
, which doesn't have a floating-point equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that mean that check_valid_bound
is a bit broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather, check_valid_bound
is a bit broken, because it can not check the valid bounds for Decimals and check_valid_range
is a bit broken as well, because it doesn't detect that ('Infiity', ) or (, '-Infinity') is a bad range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. After thinking some more, I've just made check_valid_range
test for decimal NaN first, and if you can't draw any finite values that's what you asked for!
Edit: nope, because it breaks the underlying strategies. Back to this form.
13ab786
to
6d956b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick skim.
src/hypothesis/strategies.py
Outdated
- Not a Number, if ``allow_nan`` is truthy. | ||
- Positive or negative infinity, if ``max_value`` and ``min_value`` | ||
respectively are None, and ``allow_infinity`` is not False. (None means | ||
"allow infinity, unless excluded by the min and max values") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing full stop
src/hypothesis/strategies.py
Outdated
# Convert min_value and max_value to Decimal values, and validate args | ||
check_valid_integer(places) | ||
if places is not None and places < 0: | ||
raise InvalidArgument('cannot have negative number of places (%r)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever you meant to substitute into this string, you haven't.
src/hypothesis/strategies.py
Outdated
check_valid_interval(min_value, max_value, 'min_value', 'max_value') | ||
if allow_infinity is None: | ||
allow_infinity = min_value is None or max_value is None | ||
elif allow_infinity and min_value is not None and max_value is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add some parentheses for readability?
More generally, some line breaks in this block of validation code would make it slightly easier to read.
9f770b5
to
11447a7
Compare
src/hypothesis/strategies.py
Outdated
- If both min_value and max_value are not None, it is an error to enable | ||
allow_infinity. | ||
- A finite rational number, between ``min_value`` and ``max_value``. | ||
- Not a Number, if ``allow_nan`` is truthy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've talked about this before, but can you remind me why the behaviour for allow_nan and bounds is different for decimals than it is for floats? It still seems wrong to me that you can return NaN if you have a bound, because it breaks the invariant that the generated value is in those bounds, but I remember there being some sort of complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was a matter of principle rather than implementation: NaN isn't outside bounds either. As NaN-handling is an easy source of bugs, and to minimise the interactions between settings, I thought it should be separate.
Easy to change if you disagree, since my scientific background makes me think of NaN as a significant value which maybe should be ignored for most cases?
Edit: consistency is a virtue, so I've reverted to the same default as floats. However, I'm not banning allow_nan=True
with bounds - IMO this is a crucial data type that we should allow users to generate, and the floats validation should be loosened.
src/hypothesis/strategies.py
Outdated
check_valid_interval(min_value, max_value, 'min_value', 'max_value') | ||
|
||
if allow_infinity is None: | ||
allow_infinity = (min_value is None) or (max_value is None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still puzzled why you summarize here, only to deconstruct further on.
New suggestion, have just:
if allow_infinity and (None not in (min_value, max_value)):
raise ...
here, and at line 907:
if (allow_infinity is None and min_value is None) or allow_infinity:
special.append(Decimal(-Infinity))
if (allow_infinity is None and max_value is None) or allow_infinity:
special.append(Decimal(Infinity))
Also, there might be an error condition for allow_infinity
= False
which is missing, if, e.g., the max_value
or min_value
are specified as infinity or -infinity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a more elegant way of expressing it, thanks! And, sigh, I'll add that extra bit of validation.
|
||
if min_value is not None: | ||
min_value = Decimal(min_value) | ||
if min_value.is_infinite() and min_value < 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks funny if is_infinite()
and is_finite()
are contradictory.
In that case, what you've got is:
if A and B:
// do stuff
elif A:
// error
which could be:
if A:
if B:
// do stuff
else:
// error
If they are not contradictory, then it is is probably worth a comment.
But maybe conditional should be:
if min_value.is_infinite():
if min_value == Decimal('-Infinity'):
min_value = None
else:
raise ...
in that case, for greater precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment - the not value.isfinite()
could be infinity with the wrong sign, a "quiet" (float-like) NaN, or "signalling" NaN (any operation raises an exception).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation.
if min_value == Decimal('-Infinity'):
min_value = None
elif min_value == Decimal('Infinity') or min_value.is_nan():
raise...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing Decimal('sNaN')
to anything is an error, so this won't work 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops!
cc2ae06
to
294fcd7
Compare
Plus lots more validation and a change to default allow_nan and allow_infinity arguments.
Ping @DRMacIver; I think this is ready to merge now! I've commented some of the validation, and made |
if min_value is not None: | ||
min_value = Decimal(min_value) | ||
if min_value.is_infinite() and min_value < 0: | ||
if not (allow_infinity or allow_infinity is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying DeMorgan's law might make this clearer.
special = [] | ||
if allow_nan or (allow_nan is None and (None in (min_value, max_value))): | ||
special.extend(map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN'))) | ||
if allow_infinity or (allow_infinity is max_value is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that
allow_infinity is max_value is None
is always False, as is
operator is left associative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked again, and None is None is None
returns True
, on both Python 2 and Python 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of Python's slightly silly comparison chaining rules. x is y is z
means something completely different from (x is y) is z
or x is (y is z)
.
It's a supposedly natural generalisation of the thing that makes x <= y <= z
work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to change it? Because I'll do a lot to get this approved and off my list!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm fine with it, sorry, that was just an explanatory comment.
I need to go now but I'll do a hopefully final review of this in an hour or two. Sorry it's taken so long!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DRMacIver, I guess now I know why I never use that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tired, too. Expect no more comments from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 Thanks for all the hard work on this!
Thank you! |
Why is this called places and not precision (
|
They're used interchangeably in the standard library docs, and it's intuitive that |
This patch to the
decimals
strategy recognises that Decimal objects can be used in three distinct ways:NaN
(of which only non-negative quiet NaN was previously generated)It therefore uses an underlying integers strategy if places is specified, or fractions if not, and makes special values an orthogonal concern.
Closes #420.
I dropped the planned
precision
argument, as it was hard to implement correctly and didn't have a compelling use - we can always come back later, and users can already use contexts to control the precision when drawing.