Add places argument to decimals strategy #508

Zac-HD · 2017-04-12T10:49:32Z

This patch to the decimals strategy recognises that Decimal objects can be used in three distinct ways:

as essentially integer quantities, with a fixed decimal place (e.g. accounting in units of cents)
as rational numbers, where the divisor is a power of ten (i.e. variable number of places)
as a floating-point type supporting infinities and four kinds of NaN (of which only non-negative quiet NaN was previously generated)

It therefore uses an underlying integers strategy if places is specified, or fractions if not, and makes special values an orthogonal concern.

Closes #420.

I dropped the planned precision argument, as it was hard to implement correctly and didn't have a compelling use - we can always come back later, and users can already use contexts to control the precision when drawing.

mulkieran · 2017-04-12T12:51:44Z

src/hypothesis/strategies.py

+        if max_value is None:
+            # Cap values at 999/1000 of the validity threshold
+            max_value = 10 ** (precision - places) - 10 ** (-places - 3)
+        elif max_value >= 10 ** (precision - places):


I don't see why this should be an error, just because precision with places make a tighter bound than max_value. I think it would be more interesting to worry if min_value is so high that its interaction with precision or places makes it impossible for strategy to generate any values or, similarly, if max_value is too low.

Well, you get an exception when drawing an impossible value, and suffer from a potentially severe filtering effect if I catch that for you. I though it was more efficient to ensure up front that the strategy wouldn't come pre-filtered.

min_value being high actually can't make it impossible to draw values - you just end up with lots of trailing zeros (before and/or after the point). And if max_value is really low, you just get lots of decimals rounded to zero - but that's what your combination of places and max_value asked for!

If min_value is 300 and precision is 2, what values do I get from the strategy?

300, 310, 320, ..., 1000, 1100, ..., etc. Add trailing zeros to fill out places.

That's the plan anyway, it appears that the C-style signal handling and context management is worse than I expected so mostly it's just throwing exceptions.

mulkieran · 2017-04-12T12:54:48Z

src/hypothesis/strategies.py

-    min_value=None, max_value=None, allow_nan=None, allow_infinity=None
-):
+def decimals(min_value=None, max_value=None, allow_nan=None,
+             allow_infinity=None, places=None, precision=None):


It would be more general to have bounds on the places and precision, rather than single numbers.

It's possible to construct an equivalent strategy either way (as I've written you'd use flatmap, otherwise set bounds to the same value), and I did consider it. The deciding factor for me was that I think users will want a specific level of precision more often than a bounded range, and three interacting boundary conditions felt like a really clumsy interface.

I think if you go with this interface now, it will be impossible to add bounds later, so this is a permanent decision.

But likewise difficult to move from bounds to a single argument, I think - how about we defer this bit of design to @DRMacIver?

An analogous case where we have bounds is in the collection types, where we have min_size and max_size rather than allowing for a single bound. I've been thinking of adding a size parameter which has the same effect as min_size=size, max_size=size and is an error to mix with also providing the other two, but it's in the long tail of things I haven't quite got around to. Something like this would work here, but it would also work in the other direction, so I don't think it's actually that hard to move from one to the other.

Another point in favour of bounds is that we can potentially arrange things to shrink better than we'd get if the user did it themselves with flatmap - doing it with flatmap will probably shrink OK but not great. Even if we just use flatmap internally for now it opens up the option to change that later if and when it becomes a problem in practice.

So my inclination is marginally in favour of bounds with a possible optional fixed size parameter as a convenience feature, but I haven't thought about the problem domain in any great depth and this should definitely be taken as inclination and not an official decree.

Zac-HD · 2017-04-12T16:11:28Z

OK, this really, really needs some more tests - signal handling has been much harder to get right than I expected. Will revisit in a while once I've had more time to think it through.

DRMacIver · 2017-04-12T16:29:03Z

signal handling has been much harder to get right than I expected.

I'm kinda confused by this comment, but that's probably just because I don't understand the problem domain. Where do signals come into it? And are these literal signals (and if so what happens on windows?) or just signal style.

Zac-HD · 2017-04-13T13:55:22Z

They're not unix signals, but called that in the IBM spec that Python's decimal module follows. Effects are appropriately (painfully) stateful, configurable as exceptions or to set a global or local variable, etc. Working out how to do the right thing instead of just hiding symptoms is will take some time for careful reading and thought.

Zac-HD · 2017-05-03T14:08:56Z

Having taken some time away from this and thought about the implementation, I think there are basically three things this strategy should cover:

users of fixed-point decimals, eg simple financial things where it's always in cents
more general floating-point rational numbers in decimal form
special values, ie infinities and nans

The commits I've just pushed try to handle these cases. Comments appreciated - I'll write up covering tests and a changelog if the idea looks sound.

Zac-HD · 2017-05-04T12:32:34Z

@DRMacIver, IMO this is ready to merge after review.

DRMacIver · 2017-05-04T12:33:46Z

this is ready to merge after review.

Cool. I'm quite busy this week, so probably won't get a chance to look at it properly until Saturday. Will let you know when I do!

Zac-HD · 2017-05-10T04:55:41Z

@DRMacIver ping?

mulkieran

The core computations look good to me, but I'm not a Python numerical expert.

It might be that a strategy that allows infinity becomes weighted too heavily toward infinity, but if that turns out to be true it can be dealt with later.

That's all I've got.

mulkieran · 2017-05-10T12:21:19Z

src/hypothesis/strategies.py

-):
-    """Generates instances of decimals.Decimal.
+def decimals(min_value=None, max_value=None,
+             allow_nan=True, allow_infinity=None, places=None):


I guess default value for allow_infinity is None to fake a Maybe type, and the other possibilities are True and False, meaning Some(True) and Some(False). Maybe a further comment saying default is None, meaning accept the meaning of the other parameters would be helpful.

mulkieran · 2017-05-10T12:24:35Z

src/hypothesis/strategies.py

+    check_valid_integer(places)
+    if places is not None and places < 0:
+        raise InvalidArgument('cannot have negative number of places (%r)')
+    try:


I'm not sure why pass instead of raise an exception in these two. Maybe it would make sense to document what causes an exception to be raised.

Because Decimal(None) is a likely input, and we don't want to raise a TypeError. Anything else should either be invalid for another reason, caught in the integers() validation, or strange but legal.

Right, None. It makes sense now.

I feel like these would be more idiomatic as

if min_value is not None: min_value = Decimal(min_value)

and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.

(I'm also kinda surprised flake8 doesn't make bare excepts into an error)

mulkieran · 2017-05-10T12:36:43Z

src/hypothesis/strategies.py

+    check_valid_bound(min_value, 'min_value')
+    check_valid_bound(max_value, 'max_value')
+    check_valid_interval(min_value, max_value, 'min_value', 'max_value')
+    if allow_infinity is None:


Maybe better:

if allow_infinity is None: allow_negative_infinity = min_value is not None allow_positive_infinity = max_value is not None elif ....

and below, line 901:

if allow_positive_infinity: special.append(Decimal('Infinity')) if allow_negative_infinity: special.append(Decimal('-Infinity'))

mulkieran · 2017-05-10T12:40:21Z

src/hypothesis/strategies.py

+    # Compose with sampled_from for infinities and NaNs as appropriate
+    special = []
+    if allow_nan:
+        special.extend(map(Decimal, 'NaN -NaN sNaN -sNaN'.split()))


It makes a bit more programmatic sense to map over a tuple then to do string manipulations, as:
map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN'))

mulkieran · 2017-05-10T13:07:46Z

tests/cover/test_numerics.py

+
+@pytest.mark.parametrize('places', range(10))
+def test_decimals_have_correct_places(places):
+    @given(decimals(0, 10, allow_nan=False, places=places))


Can you do this without range, like:

@given(decimals(allow_inf=False, allow_nan=False, places=places))

?

Yep - you'd get a strategy over any finite decimal. In practice, that's probably confined to the 64bit integer range due to the implementation of integers() drawing eight bytes.

mulkieran · 2017-05-10T13:50:04Z

src/hypothesis/strategies.py

+    if isinstance(min_value, Decimal) and not min_value.is_finite():
+        raise InvalidArgument('min_value=%r is not a finite bound' % min_value)
+    if isinstance(max_value, Decimal) and not max_value.is_finite():
+        raise InvalidArgument('min_value=%r is not a finite bound' % max_value)


Copy-paste error right here.

DRMacIver

Generally 👍 from me. A few small requests for changes.

DRMacIver · 2017-05-15T07:52:51Z

src/hypothesis/strategies.py

+    check_valid_integer(places)
+    if places is not None and places < 0:
+        raise InvalidArgument('cannot have negative number of places (%r)')
+    try:


I feel like these would be more idiomatic as

if min_value is not None: min_value = Decimal(min_value)

and similar. I don't think it's unreasonable to raise a TypeError if given something that's not convertable to a decimal.

DRMacIver · 2017-05-15T07:53:12Z

src/hypothesis/strategies.py

+    check_valid_integer(places)
+    if places is not None and places < 0:
+        raise InvalidArgument('cannot have negative number of places (%r)')
+    try:


(I'm also kinda surprised flake8 doesn't make bare excepts into an error)

DRMacIver · 2017-05-15T08:12:30Z

src/hypothesis/strategies.py

+    elif allow_infinity and min_value is not None and max_value is not None:
+        raise InvalidArgument('Cannot allow infinity between finite bounds')
+    # Set up a strategy for finite decimals
+    if places is not None:


This is not a request for changes, but I'm curious if you tried just quantizing after the value was generated and if so what problems that caused.

That's pretty much what it does - draw an integer, multiply it by the decimal factor, quantize and emit None on error, filter out None.

mulkieran · 2017-05-15T12:47:00Z

src/hypothesis/strategies.py

+    check_valid_bound(max_value, 'max_value')
+    check_valid_interval(min_value, max_value, 'min_value', 'max_value')
+    if allow_infinity is None:
+        allow_infinity = min_value is not None or max_value is not None


All of a sudden, this looks backward to me, like it should be:
allow_infinity = min_value is None or max_value is None

mulkieran · 2017-05-15T13:05:43Z

src/hypothesis/strategies.py

+        max_value = Decimal(max_value)
+        if math.isinf(max_value) and max_value > 0:
+            max_value = None
+    if min_value and not min_value.is_finite():


I think you can get rid of these four lines, since, now that infinity is a valid bound, check_valid_bound does the right thing.

We actually still want to exclude min_value='infinity', because we won't be able to draw any finite values (and the infinity with an allowed sign was converted to None above). And check_valid_bound uses math.isnan, which is lovely but doesn't work for Decimal('sNaN'), which doesn't have a floating-point equivalent.

Doesn't that mean that check_valid_bound is a bit broken?

Or rather, check_valid_bound is a bit broken, because it can not check the valid bounds for Decimals and check_valid_range is a bit broken as well, because it doesn't detect that ('Infiity', ) or (, '-Infinity') is a bad range.

Hmm. After thinking some more, I've just made check_valid_range test for decimal NaN first, and if you can't draw any finite values that's what you asked for!

Edit: nope, because it breaks the underlying strategies. Back to this form.

alexwlchan

Quick skim.

alexwlchan · 2017-05-15T21:04:09Z

src/hypothesis/strategies.py

+    - Not a Number, if ``allow_nan`` is truthy.
+    - Positive or negative infinity, if ``max_value`` and ``min_value``
+      respectively are None, and ``allow_infinity`` is not False.  (None means
+      "allow infinity, unless excluded by the min and max values")


nit: missing full stop

alexwlchan · 2017-05-15T21:04:40Z

src/hypothesis/strategies.py

+    # Convert min_value and max_value to Decimal values, and validate args
+    check_valid_integer(places)
+    if places is not None and places < 0:
+        raise InvalidArgument('cannot have negative number of places (%r)')


Whatever you meant to substitute into this string, you haven't.

alexwlchan · 2017-05-15T21:05:44Z

src/hypothesis/strategies.py

+    check_valid_interval(min_value, max_value, 'min_value', 'max_value')
+    if allow_infinity is None:
+        allow_infinity = min_value is None or max_value is None
+    elif allow_infinity and min_value is not None and max_value is not None:


Perhaps add some parentheses for readability?

More generally, some line breaks in this block of validation code would make it slightly easier to read.

DRMacIver · 2017-05-17T14:03:43Z

src/hypothesis/strategies.py

-    - If both min_value and max_value are not None, it is an error to enable
-      allow_infinity.
+    - A finite rational number, between ``min_value`` and ``max_value``.
+    - Not a Number, if ``allow_nan`` is truthy.


I think we've talked about this before, but can you remind me why the behaviour for allow_nan and bounds is different for decimals than it is for floats? It still seems wrong to me that you can return NaN if you have a bound, because it breaks the invariant that the generated value is in those bounds, but I remember there being some sort of complication.

I think it was a matter of principle rather than implementation: NaN isn't outside bounds either. As NaN-handling is an easy source of bugs, and to minimise the interactions between settings, I thought it should be separate.

Easy to change if you disagree, since my scientific background makes me think of NaN as a significant value which maybe should be ignored for most cases?

Edit: consistency is a virtue, so I've reverted to the same default as floats. However, I'm not banning allow_nan=True with bounds - IMO this is a crucial data type that we should allow users to generate, and the floats validation should be loosened.

mulkieran · 2017-05-18T12:28:17Z

src/hypothesis/strategies.py

+    check_valid_interval(min_value, max_value, 'min_value', 'max_value')
+
+    if allow_infinity is None:
+        allow_infinity = (min_value is None) or (max_value is None)


I'm still puzzled why you summarize here, only to deconstruct further on.

New suggestion, have just:

if allow_infinity and (None not in (min_value, max_value)): raise ...

here, and at line 907:

if (allow_infinity is None and min_value is None) or allow_infinity: special.append(Decimal(-Infinity)) if (allow_infinity is None and max_value is None) or allow_infinity: special.append(Decimal(Infinity))

Also, there might be an error condition for allow_infinity = False which is missing, if, e.g., the max_value or min_value are specified as infinity or -infinity.

That is a more elegant way of expressing it, thanks! And, sigh, I'll add that extra bit of validation.

mulkieran · 2017-05-18T12:28:21Z

src/hypothesis/strategies.py

+
+    if min_value is not None:
+        min_value = Decimal(min_value)
+        if min_value.is_infinite() and min_value < 0:


This looks funny if is_infinite() and is_finite() are contradictory.

In that case, what you've got is:

if A and B: // do stuff elif A: // error

which could be:

if A: if B: // do stuff else: // error

If they are not contradictory, then it is is probably worth a comment.

But maybe conditional should be:

if min_value.is_infinite(): if min_value == Decimal('-Infinity'): min_value = None else: raise ...

in that case, for greater precision.

I'll add a comment - the not value.isfinite() could be infinity with the wrong sign, a "quiet" (float-like) NaN, or "signalling" NaN (any operation raises an exception).

Thanks for the explanation.

if min_value == Decimal('-Infinity'): min_value = None elif min_value == Decimal('Infinity') or min_value.is_nan(): raise...

?

Comparing Decimal('sNaN') to anything is an error, so this won't work 😢

Plus lots more validation and a change to default allow_nan and allow_infinity arguments.

Zac-HD · 2017-05-19T02:56:52Z

Ping @DRMacIver; I think this is ready to merge now!

I've commented some of the validation, and made allow_nan default to False if bounds are given.

mulkieran · 2017-05-19T12:31:36Z

src/hypothesis/strategies.py

+    if min_value is not None:
+        min_value = Decimal(min_value)
+        if min_value.is_infinite() and min_value < 0:
+            if not (allow_infinity or allow_infinity is None):


Applying DeMorgan's law might make this clearer.

mulkieran · 2017-05-19T12:37:13Z

src/hypothesis/strategies.py

+    special = []
+    if allow_nan or (allow_nan is None and (None in (min_value, max_value))):
+        special.extend(map(Decimal, ('NaN', '-NaN', 'sNaN', '-sNaN')))
+    if allow_infinity or (allow_infinity is max_value is None):


I'm pretty sure that

allow_infinity is max_value is None

is always False, as is operator is left associative.

I just checked again, and None is None is None returns True, on both Python 2 and Python 3.

This is because of Python's slightly silly comparison chaining rules. x is y is z means something completely different from (x is y) is z or x is (y is z).

It's a supposedly natural generalisation of the thing that makes x <= y <= z work.

Do you want me to change it? Because I'll do a lot to get this approved and off my list!

No, I'm fine with it, sorry, that was just an explanatory comment.

I need to go now but I'll do a hopefully final review of this in an hour or two. Sorry it's taken so long!

Thanks @DRMacIver, I guess now I know why I never use that...

mulkieran

I'm tired, too. Expect no more comments from me.

DRMacIver

🎉 Thanks for all the hard work on this!

mulkieran · 2017-05-19T15:03:52Z

Thank you!

Tjorriemorrie · 2017-08-04T00:09:38Z

Why is this called places and not precision (prec param)?

>>> from decimal import *
>>> getcontext().prec = 6

Zac-HD · 2017-08-04T04:31:49Z

Why is this called places and not precision?

They're used interchangeably in the standard library docs, and it's intuitive that places measures digits after the decimal place while precision is less clear. Since merging we also have backwards-compatibility to think about, so it's unlikely to change now.

Zac-HD force-pushed the decimals-strategy branch from d3da646 to e969a4e Compare April 12, 2017 11:18

mulkieran reviewed Apr 12, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from 0440d07 to cc53545 Compare April 12, 2017 13:46

Zac-HD force-pushed the decimals-strategy branch from 9a291b5 to 77ef118 Compare May 3, 2017 14:05

Zac-HD changed the title ~~Add places, precision arguments to decimals strategy~~ Add places argument to decimals strategy May 4, 2017

Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from 718f9f5 to 80bb410 Compare May 4, 2017 11:30

Zac-HD force-pushed the decimals-strategy branch 2 times, most recently from b6f2496 to da26d04 Compare May 10, 2017 04:24

Zac-HD mentioned this pull request May 10, 2017

Ensure that changelog entries are ordered #613

Merged

mulkieran reviewed May 10, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch from da26d04 to d9b0f34 Compare May 10, 2017 14:04

Zac-HD mentioned this pull request May 13, 2017

False positive HypothesisDeprecationWarning #595

Closed

DRMacIver requested changes May 15, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch from d9b0f34 to 6969896 Compare May 15, 2017 13:02

mulkieran reviewed May 15, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from 13ab786 to 6d956b3 Compare May 15, 2017 14:03

alexwlchan reviewed May 15, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch from 6d956b3 to 7ca7635 Compare May 16, 2017 05:12

Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from 9f770b5 to 11447a7 Compare May 16, 2017 09:39

DRMacIver reviewed May 17, 2017

View reviewed changes

mulkieran reviewed May 18, 2017

View reviewed changes

Zac-HD force-pushed the decimals-strategy branch 3 times, most recently from cc2ae06 to 294fcd7 Compare May 19, 2017 01:01

Zac-HD added 3 commits May 19, 2017 12:40

Add "places" arg, for fixed point decimals

56ce0c8

Plus lots more validation and a change to default allow_nan and allow_infinity arguments.

Fix tests for new decimals strategy

c5a8435

Bump version, changelog

c0206da

Zac-HD force-pushed the decimals-strategy branch from 294fcd7 to c0206da Compare May 19, 2017 02:40

mulkieran reviewed May 19, 2017

View reviewed changes

DRMacIver approved these changes May 19, 2017

View reviewed changes

DRMacIver merged commit 63522cf into HypothesisWorks:master May 19, 2017

DRMacIver mentioned this pull request Jul 11, 2017

FailedHealthCheck for decimals(places=X, allow_nan=False, allow_infinity=False) #725

Closed

Add places argument to decimals strategy #508

Add places argument to decimals strategy #508

Conversation

Zac-HD commented Apr 12, 2017 • edited Loading

mulkieran Apr 12, 2017 • edited Loading

Choose a reason for hiding this comment

Zac-HD Apr 12, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zac-HD commented Apr 12, 2017

DRMacIver commented Apr 12, 2017

Zac-HD commented Apr 13, 2017

Zac-HD commented May 3, 2017

Zac-HD commented May 4, 2017

DRMacIver commented May 4, 2017

Zac-HD commented May 10, 2017

mulkieran left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DRMacIver left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zac-HD May 15, 2017 • edited Loading

Choose a reason for hiding this comment

alexwlchan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zac-HD May 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mulkieran May 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zac-HD commented May 19, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mulkieran left a comment

Choose a reason for hiding this comment

DRMacIver left a comment

Choose a reason for hiding this comment

mulkieran commented May 19, 2017

Tjorriemorrie commented Aug 4, 2017

Zac-HD commented Aug 4, 2017

Zac-HD commented Apr 12, 2017 •

edited

Loading

mulkieran Apr 12, 2017 •

edited

Loading

Zac-HD Apr 12, 2017 •

edited

Loading

Zac-HD May 15, 2017 •

edited

Loading

Zac-HD May 18, 2017 •

edited

Loading

mulkieran May 18, 2017 •

edited

Loading