Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add places argument to decimals strategy #508

Merged
merged 3 commits into from
May 19, 2017

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented Apr 12, 2017

This patch to the decimals strategy recognises that Decimal objects can be used in three distinct ways:

  • as essentially integer quantities, with a fixed decimal place (e.g. accounting in units of cents)
  • as rational numbers, where the divisor is a power of ten (i.e. variable number of places)
  • as a floating-point type supporting infinities and four kinds of NaN (of which only non-negative quiet NaN was previously generated)

It therefore uses an underlying integers strategy if places is specified, or fractions if not, and makes special values an orthogonal concern.

Closes #420.

I dropped the planned precision argument, as it was hard to implement correctly and didn't have a compelling use - we can always come back later, and users can already use contexts to control the precision when drawing.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch from d3da646 to e969a4e Compare April 12, 2017 11:18
if max_value is None:
# Cap values at 999/1000 of the validity threshold
max_value = 10 ** (precision - places) - 10 ** (-places - 3)
elif max_value >= 10 ** (precision - places):
Copy link
Contributor

@mulkieran mulkieran Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should be an error, just because precision with places make a tighter bound than max_value. I think it would be more interesting to worry if min_value is so high that its interaction with precision or places makes it impossible for strategy to generate any values or, similarly, if max_value is too low.

Copy link
Member Author

@Zac-HD Zac-HD Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you get an exception when drawing an impossible value, and suffer from a potentially severe filtering effect if I catch that for you. I though it was more efficient to ensure up front that the strategy wouldn't come pre-filtered.

min_value being high actually can't make it impossible to draw values - you just end up with lots of trailing zeros (before and/or after the point). And if max_value is really low, you just get lots of decimals rounded to zero - but that's what your combination of places and max_value asked for!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If min_value is 300 and precision is 2, what values do I get from the strategy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

300, 310, 320, ..., 1000, 1100, ..., etc. Add trailing zeros to fill out places.

That's the plan anyway, it appears that the C-style signal handling and context management is worse than I expected so mostly it's just throwing exceptions.

min_value=None, max_value=None, allow_nan=None, allow_infinity=None
):
def decimals(min_value=None, max_value=None, allow_nan=None,
allow_infinity=None, places=None, precision=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more general to have bounds on the places and precision, rather than single numbers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to construct an equivalent strategy either way (as I've written you'd use flatmap, otherwise set bounds to the same value), and I did consider it. The deciding factor for me was that I think users will want a specific level of precision more often than a bounded range, and three interacting boundary conditions felt like a really clumsy interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you go with this interface now, it will be impossible to add bounds later, so this is a permanent decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But likewise difficult to move from bounds to a single argument, I think - how about we defer this bit of design to @DRMacIver?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An analogous case where we have bounds is in the collection types, where we have min_size and max_size rather than allowing for a single bound. I've been thinking of adding a size parameter which has the same effect as min_size=size, max_size=size and is an error to mix with also providing the other two, but it's in the long tail of things I haven't quite got around to. Something like this would work here, but it would also work in the other direction, so I don't think it's actually that hard to move from one to the other.

Another point in favour of bounds is that we can potentially arrange things to shrink better than we'd get if the user did it themselves with flatmap - doing it with flatmap will probably shrink OK but not great. Even if we just use flatmap internally for now it opens up the option to change that later if and when it becomes a problem in practice.

So my inclination is marginally in favour of bounds with a possible optional fixed size parameter as a convenience feature, but I haven't thought about the problem domain in any great depth and this should definitely be taken as inclination and not an official decree.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from 0440d07 to cc53545 Compare April 12, 2017 13:46
@Zac-HD
Copy link
Member Author

Zac-HD commented Apr 12, 2017

OK, this really, really needs some more tests - signal handling has been much harder to get right than I expected. Will revisit in a while once I've had more time to think it through.

@DRMacIver
Copy link
Member

signal handling has been much harder to get right than I expected.

I'm kinda confused by this comment, but that's probably just because I don't understand the problem domain. Where do signals come into it? And are these literal signals (and if so what happens on windows?) or just signal style.

@Zac-HD
Copy link
Member Author

Zac-HD commented Apr 13, 2017

They're not unix signals, but called that in the IBM spec that Python's decimal module follows. Effects are appropriately (painfully) stateful, configurable as exceptions or to set a global or local variable, etc. Working out how to do the right thing instead of just hiding symptoms is will take some time for careful reading and thought.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch from 9a291b5 to 77ef118 Compare May 3, 2017 14:05
@Zac-HD
Copy link
Member Author

Zac-HD commented May 3, 2017

Having taken some time away from this and thought about the implementation, I think there are basically three things this strategy should cover:

  • users of fixed-point decimals, eg simple financial things where it's always in cents
  • more general floating-point rational numbers in decimal form
  • special values, ie infinities and nans

The commits I've just pushed try to handle these cases. Comments appreciated - I'll write up covering tests and a changelog if the idea looks sound.

@Zac-HD Zac-HD changed the title Add places, precision arguments to decimals strategy Add places argument to decimals strategy May 4, 2017
@Zac-HD Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from 718f9f5 to 80bb410 Compare May 4, 2017 11:30
@Zac-HD
Copy link
Member Author

Zac-HD commented May 4, 2017

@DRMacIver, IMO this is ready to merge after review.

@DRMacIver
Copy link
Member

this is ready to merge after review.

Cool. I'm quite busy this week, so probably won't get a chance to look at it properly until Saturday. Will let you know when I do!

@Zac-HD Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from b6f2496 to da26d04 Compare May 10, 2017 04:24
@Zac-HD
Copy link
Member Author

Zac-HD commented May 10, 2017

@DRMacIver ping?

Copy link
Contributor

@mulkieran mulkieran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core computations look good to me, but I'm not a Python numerical expert.

It might be that a strategy that allows infinity becomes weighted too heavily toward infinity, but if that turns out to be true it can be dealt with later.

That's all I've got.

):
"""Generates instances of decimals.Decimal.
def decimals(min_value=None, max_value=None,
allow_nan=True, allow_infinity=None, places=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess default value for allow_infinity is None to fake a Maybe type, and the other possibilities are True and False, meaning Some(True) and Some(False). Maybe a further comment saying default is None, meaning accept the meaning of the other parameters would be helpful.

check_valid_integer(places)
if places is not None and places < 0:
raise InvalidArgument('cannot have negative number of places (%r)')
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why pass instead of raise an exception in these two. Maybe it would make sense to document what causes an exception to be raised.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Decimal(None) is a likely input, and we don't want to raise a TypeError. Anything else should either be invalid for another reason, caught in the integers() validation, or strange but legal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, None. It makes sense now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these would be more idiomatic as

if min_value is not None:
    min_value = Decimal(min_value)

and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm also kinda surprised flake8 doesn't make bare excepts into an error)

check_valid_bound(min_value, 'min_value')
check_valid_bound(max_value, 'max_value')
check_valid_interval(min_value, max_value, 'min_value', 'max_value')
if allow_infinity is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better:

if allow_infinity is None:
    allow_negative_infinity = min_value is not None
    allow_positive_infinity = max_value is not None
elif ....

and below, line 901:

if allow_positive_infinity:
    special.append(Decimal('Infinity'))
if allow_negative_infinity:
    special.append(Decimal('-Infinity'))

# Compose with sampled_from for infinities and NaNs as appropriate
special = []
if allow_nan:
special.extend(map(Decimal, 'NaN -NaN sNaN -sNaN'.split()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a bit more programmatic sense to map over a tuple then to do string manipulations, as:
map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN'))


@pytest.mark.parametrize('places', range(10))
def test_decimals_have_correct_places(places):
@given(decimals(0, 10, allow_nan=False, places=places))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do this without range, like:

@given(decimals(allow_inf=False, allow_nan=False, places=places))

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - you'd get a strategy over any finite decimal. In practice, that's probably confined to the 64bit integer range due to the implementation of integers() drawing eight bytes.

if isinstance(min_value, Decimal) and not min_value.is_finite():
raise InvalidArgument('min_value=%r is not a finite bound' % min_value)
if isinstance(max_value, Decimal) and not max_value.is_finite():
raise InvalidArgument('min_value=%r is not a finite bound' % max_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste error right here.

Copy link
Member

@DRMacIver DRMacIver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally 👍 from me. A few small requests for changes.

check_valid_integer(places)
if places is not None and places < 0:
raise InvalidArgument('cannot have negative number of places (%r)')
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these would be more idiomatic as

if min_value is not None:
    min_value = Decimal(min_value)

and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.

check_valid_integer(places)
if places is not None and places < 0:
raise InvalidArgument('cannot have negative number of places (%r)')
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm also kinda surprised flake8 doesn't make bare excepts into an error)

elif allow_infinity and min_value is not None and max_value is not None:
raise InvalidArgument('Cannot allow infinity between finite bounds')
# Set up a strategy for finite decimals
if places is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a request for changes, but I'm curious if you tried just quantizing after the value was generated and if so what problems that caused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty much what it does - draw an integer, multiply it by the decimal factor, quantize and emit None on error, filter out None.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch from d9b0f34 to 6969896 Compare May 15, 2017 13:02
check_valid_bound(max_value, 'max_value')
check_valid_interval(min_value, max_value, 'min_value', 'max_value')
if allow_infinity is None:
allow_infinity = min_value is not None or max_value is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of a sudden, this looks backward to me, like it should be:
allow_infinity = min_value is None or max_value is None

max_value = Decimal(max_value)
if math.isinf(max_value) and max_value > 0:
max_value = None
if min_value and not min_value.is_finite():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get rid of these four lines, since, now that infinity is a valid bound, check_valid_bound does the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually still want to exclude min_value='infinity', because we won't be able to draw any finite values (and the infinity with an allowed sign was converted to None above). And check_valid_bound uses math.isnan, which is lovely but doesn't work for Decimal('sNaN'), which doesn't have a floating-point equivalent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that mean that check_valid_bound is a bit broken?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, check_valid_bound is a bit broken, because it can not check the valid bounds for Decimals and check_valid_range is a bit broken as well, because it doesn't detect that ('Infiity', ) or (, '-Infinity') is a bad range.

Copy link
Member Author

@Zac-HD Zac-HD May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. After thinking some more, I've just made check_valid_range test for decimal NaN first, and if you can't draw any finite values that's what you asked for!

Edit: nope, because it breaks the underlying strategies. Back to this form.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from 13ab786 to 6d956b3 Compare May 15, 2017 14:03
Copy link
Contributor

@alexwlchan alexwlchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick skim.

- Not a Number, if ``allow_nan`` is truthy.
- Positive or negative infinity, if ``max_value`` and ``min_value``
respectively are None, and ``allow_infinity`` is not False. (None means
"allow infinity, unless excluded by the min and max values")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing full stop

# Convert min_value and max_value to Decimal values, and validate args
check_valid_integer(places)
if places is not None and places < 0:
raise InvalidArgument('cannot have negative number of places (%r)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you meant to substitute into this string, you haven't.

check_valid_interval(min_value, max_value, 'min_value', 'max_value')
if allow_infinity is None:
allow_infinity = min_value is None or max_value is None
elif allow_infinity and min_value is not None and max_value is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add some parentheses for readability?

More generally, some line breaks in this block of validation code would make it slightly easier to read.

@Zac-HD Zac-HD force-pushed the decimals-strategy branch from 6d956b3 to 7ca7635 Compare May 16, 2017 05:12
@Zac-HD Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from 9f770b5 to 11447a7 Compare May 16, 2017 09:39
- If both min_value and max_value are not None, it is an error to enable
allow_infinity.
- A finite rational number, between ``min_value`` and ``max_value``.
- Not a Number, if ``allow_nan`` is truthy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've talked about this before, but can you remind me why the behaviour for allow_nan and bounds is different for decimals than it is for floats? It still seems wrong to me that you can return NaN if you have a bound, because it breaks the invariant that the generated value is in those bounds, but I remember there being some sort of complication.

Copy link
Member Author

@Zac-HD Zac-HD May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was a matter of principle rather than implementation: NaN isn't outside bounds either. As NaN-handling is an easy source of bugs, and to minimise the interactions between settings, I thought it should be separate.

Easy to change if you disagree, since my scientific background makes me think of NaN as a significant value which maybe should be ignored for most cases?

Edit: consistency is a virtue, so I've reverted to the same default as floats. However, I'm not banning allow_nan=True with bounds - IMO this is a crucial data type that we should allow users to generate, and the floats validation should be loosened.

check_valid_interval(min_value, max_value, 'min_value', 'max_value')

if allow_infinity is None:
allow_infinity = (min_value is None) or (max_value is None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still puzzled why you summarize here, only to deconstruct further on.

New suggestion, have just:

if allow_infinity and (None not in (min_value, max_value)):
   raise ...

here, and at line 907:

if (allow_infinity is None and min_value is None) or allow_infinity:
    special.append(Decimal(-Infinity))
if (allow_infinity is None and max_value is None) or allow_infinity:
    special.append(Decimal(Infinity))

Also, there might be an error condition for allow_infinity = False which is missing, if, e.g., the max_value or min_value are specified as infinity or -infinity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a more elegant way of expressing it, thanks! And, sigh, I'll add that extra bit of validation.


if min_value is not None:
min_value = Decimal(min_value)
if min_value.is_infinite() and min_value < 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks funny if is_infinite() and is_finite() are contradictory.

In that case, what you've got is:

if A and B:
  // do stuff
elif A:
  // error

which could be:

if A:
  if B:
    // do stuff
  else:
    // error

If they are not contradictory, then it is is probably worth a comment.

But maybe conditional should be:

if min_value.is_infinite():
  if min_value == Decimal('-Infinity'):
    min_value = None
  else:
    raise ...

in that case, for greater precision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment - the not value.isfinite() could be infinity with the wrong sign, a "quiet" (float-like) NaN, or "signalling" NaN (any operation raises an exception).

Copy link
Contributor

@mulkieran mulkieran May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

if min_value == Decimal('-Infinity'):
  min_value = None
elif min_value == Decimal('Infinity') or min_value.is_nan():
  raise...

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing Decimal('sNaN') to anything is an error, so this won't work 😢

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops!

@Zac-HD Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from cc2ae06 to 294fcd7 Compare May 19, 2017 01:01
Zac-HD added 3 commits May 19, 2017 12:40
Plus lots more validation and a change to default allow_nan and
allow_infinity arguments.
@Zac-HD Zac-HD force-pushed the decimals-strategy branch from 294fcd7 to c0206da Compare May 19, 2017 02:40
@Zac-HD
Copy link
Member Author

Zac-HD commented May 19, 2017

Ping @DRMacIver; I think this is ready to merge now!

I've commented some of the validation, and made allow_nan default to False if bounds are given.

if min_value is not None:
min_value = Decimal(min_value)
if min_value.is_infinite() and min_value < 0:
if not (allow_infinity or allow_infinity is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying DeMorgan's law might make this clearer.

special = []
if allow_nan or (allow_nan is None and (None in (min_value, max_value))):
special.extend(map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN')))
if allow_infinity or (allow_infinity is max_value is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that

allow_infinity is max_value is None

is always False, as is operator is left associative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked again, and None is None is None returns True, on both Python 2 and Python 3.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of Python's slightly silly comparison chaining rules. x is y is z means something completely different from (x is y) is z or x is (y is z).

It's a supposedly natural generalisation of the thing that makes x <= y <= z work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to change it? Because I'll do a lot to get this approved and off my list!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm fine with it, sorry, that was just an explanatory comment.

I need to go now but I'll do a hopefully final review of this in an hour or two. Sorry it's taken so long!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DRMacIver, I guess now I know why I never use that...

Copy link
Contributor

@mulkieran mulkieran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tired, too. Expect no more comments from me.

Copy link
Member

@DRMacIver DRMacIver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 Thanks for all the hard work on this!

@mulkieran
Copy link
Contributor

Thank you!

@Tjorriemorrie
Copy link

Why is this called places and not precision (prec param)?

>>> from decimal import *
>>> getcontext().prec = 6

@Zac-HD
Copy link
Member Author

Zac-HD commented Aug 4, 2017

Why is this called places and not precision?

They're used interchangeably in the standard library docs, and it's intuitive that places measures digits after the decimal place while precision is less clear. Since merging we also have backwards-compatibility to think about, so it's unlikely to change now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate fixed point decimals
5 participants