Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New core strategies: datetimes, dates, times, timedeltas #621

Merged
merged 9 commits into from
May 23, 2017

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented May 12, 2017

This time with feeling! Successor to #556 and #520. Closes #333, closes #352, closes #441, closes #599. (Apparently we've wanted to do something about time for a while)

599 outlines most of the design lessons I learned from 556, and has a few explanatory comments I won't repeat here.

I will note that I see the timezones() strategy largely as a placeholder: there are several ways I could reorder timezones, or allow them to be grouped and selected. Until I've seen it used in the wild though, I'd prefer to keep it forward-compatible and let user code shape how it evolves - especially since it's trivial to copy and modify.

@Zac-HD Zac-HD force-pushed the core-time-strats branch 5 times, most recently from cf8bfba to 78011ca Compare May 12, 2017 14:39
Copy link
Contributor

@alexwlchan alexwlchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments from an initial skim (sorry, I haven’t had time to review it properly yet).

docs/changes.rst Outdated
3.9.0 - 2017-05-12
------------------

This is a feature release, adding datetime-related strategies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it clear these are part of the core lib.

docs/changes.rst Outdated

- ``times`` and ``datetimes`` take an optional ``timezones=`` argument, which
defaults to ``none()`` for naive times. You can use our extra strategy
based on pytz, or roll your timezones strategy with dateutil or even the stdlib.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"roll your own timezones strategies"

docs/changes.rst Outdated

- The old ``dates``, ``times``, and ``datetimes`` strategies in
``hypothesis.extra.datetimes`` are deprecated in favor of the new
strategies, which are more flexible and have no dependencies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be presented as bullets? It's all part of the same concept – a new set of date-related strategies – let’s have it as top-level paragraphs instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to give a deprecation timeline?

Q for @DRMacIver – how long do you want to keep around these deprecated strategies?

timezones = list(pytz.all_timezones)
timezones.remove(u'UTC')
timezones.insert(0, u'UTC')
return sampled_from(list(map(pytz.timezone, timezones)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: I prefer list comprehensions to maps, as I think they’re more readable.

def timezones():
"""A strategy for pytz tzinfo objects.

Essentially ``sampled_from(map(pytz.timezone, pytz.all_timezones))``,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s not put implementation details in the docstring, particularly because this isn’t accurate. Would be better to highlight the key difference – this strategy shrinks to UTC.


There are a number of ways you can create a timezones strategy, depending
mostly on your timezone objects (e.g. from the standard library,
``datetutil``, or ``pytz``). In each case, you should probably use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"probably"?

When would I not want to use sampled_from?

working purely in UTC to avoid such issues.

There are a number of ways you can create a timezones strategy, depending
mostly on your timezone objects (e.g. from the standard library,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop "mostly", unless we want to give pointers to alternative reasons

if not isinstance(timezones, SearchStrategy):
raise InvalidArgument(
'timezones=%r must be a SearchStrategy that can provide tzinfo '
'for datetimes (either None or dt.tzinfo objects)' % (timezones,))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we suddenly wrapped a single format string argument in a tuple when you didn't above?

Copy link
Member Author

@Zac-HD Zac-HD May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that one obvious mistake would be to supply e.g. timezones=[None, pytz.UTC] instead of a strategy. Above, we know that it's either None or a datetime so there's no need to handle the sequence case.

return just(min_date)
return datetimes(min_datetime=dt.datetime.combine(min_date, dt.time.min),
max_datetime=dt.datetime.combine(max_date, dt.time.max)
).map(dt.datetime.date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code makes me uncomfortable – what if we try this on a day that didn’t start at midnight, or doesn’t finish at 23:59? The former is definitely possible.

I’d be tempted to implement this in a way that avoids letting times get involved at all:

interval_length = (max_date - min_date).days
return integers(min=0, max=interval_length).map(
    lambda d: min_date + dt.timedelta(days=d)
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those scenarios are only representable with tz-aware datetimes, luckily. I agree that the delta-based strategy is nicer, but the advantage of using the datetimes strat is it minimises towards 2000-01-01 rather than min_date (usually 0001-01-01).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be solved relatively easily by using the center argument to integer_range

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp, of course I can just calculate the timedelta in days to the millennium. Yeah, that's much nicer.

from hypothesis.strategytests import strategy_test_suite
from hypothesis.internal.compat import hrange

# TIMEDELTAS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is useful.

@Zac-HD Zac-HD force-pushed the core-time-strats branch 2 times, most recently from 3474dad to 65be6dd Compare May 14, 2017 03:11
Copy link
Member

@DRMacIver DRMacIver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thanks! Happy with the API except for possible the one comment about None vs none(). Specific comments inline.

with settings(strict=False):
try:
func(*args, **kwargs)
except HypothesisDeprecationWarning:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. Surely the correct thing to do here is to never raise the error when strict=False?

Perhaps you wanted a capture_warnings call here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but as it happens strategy_test_suite(...) overrides it somehow. Should I just delete those tests for the deprecated strategies, and remove the try/except?

assert datetimes(val, val).example() is val


TestStandardDescriptorFeatures_dates1 = strategy_test_suite(dates())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a request for changes, but I'm vaguely thinking we should just get rid of these. I'm not convinced they are testing anything useful any more - they're kinda a legacy of the old much more complicated way of writing strategies, and I'm not sure I've ever seen them catch an error in the new implementation that wasn't caught by another test.



@checks_deprecated_behaviour
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decorator was a really good idea, thanks. It's much nicer just being able to slap it on all the old tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge this first, it would also be useful in #580 for a few places where you've put settings(strict=False).

lambda d: assume(d.tzinfo) and d.tzinfo.zone != u'UTC')


@given(just('min_time') | just('max_time'), times(timezones=timezones()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just(a) | just(b) would be better written sampled_from((a, b)) I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not totally convinced this benefits from the given and might suggest just hard-coding a time value and using a parametrize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about future changes - if UTC or a fixed offset is later allowed, I want that change to make a test fail.

self.tz_strat = timezones_strat

def do_draw(self, data):
# I've never seen this take eight iterations, but let's be safe...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can break in weird ways when things are being mutated and/or shrunk. It's important to remember that a ConjectureData object is not actually a random number generator and can, by design, fuck around arbitrarily with the data it generates.

The comparable logic in filter is to try 3 times and then fail which might be better to emulate here (though three is an arbitrary number and it could be 8 just as easily - filter might be expensive so it makes sense to keep it smaller).

Failing (by calling mark_invalid() on the data object) is definitely the correct thing to do at the end of the loop rather than raising an exception.

Regardless of the number you end up settling on, here's how you might test this:

  1. Factor out the loop body into its own method.
  2. Use strategies.binary() + find() to find a buffer such that when do_draw is called on ConjectureData.for_buffer the loop body fails.
  3. Call data.draw(datetimes()) on a ConjectureData.for_buffer(failure_inducing * 100).

docs/changes.rst Outdated
core strategies, which are more flexible and have no dependencies.

The default field mapping for DateTimeField in the Django extra now respects
the ``USE_TZ`` setting when choosing a strategy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth referencing #439 here?

Also, I feel like I should ask you to separate this out into its own patch release, but I won't actually insist on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current convention seems to be no issue links in the changelog; I'm happy to change that but it should happen via an entry in the review guide (or, better, part of a more detailed "making a pull" guide).

If it's a patch before, I then repatch it as part of this pull. If it's a patch after, why not include it? (I'm willing to split it out but inclined not to, given the build issues I've had lately)

@@ -15,111 +15,120 @@
#
# END HEADER

"""This module provides time and date related strategies.

It depends on the ``pytz`` package, which is stable enough that almost any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be better to put the timezones strategy into a hypothesis.extra.pytz package and let htis one become entirely deprecated.

due to daylight savings, leap seconds, timezone and calendar
adjustments, etc. This is intentional, as malformed timestamps are a
common source of bugs. Consider validating all datetime values, or
working purely in UTC to avoid such issues.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does working purely in UTC actually fully avoid such issues? It's not like UTC doesn't have leap seconds.

@@ -910,6 +912,108 @@ def do_draw(self, data):
return PermutationStrategy()


@defines_strategy
def datetimes(min_datetime=dt.datetime.min, max_datetime=dt.datetime.max,
timezones=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have timezones default to none() and not allow None as a value here? That way it's always consistently a strategy object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was trying to allow a just(min_datetime) case, but consistency of API is probably better than implementation details.

check_type(dt.datetime, max_datetime, 'max_datetime')
if min_datetime.tzinfo is not None:
raise InvalidArgument('min_datetime=%r must not have tzinfo'
% min_datetime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is fairly strongly to always wrap % arguments in a tuple even when it's valid not to FWIW. I know it can't go wrong here, but it's better to be obviously correct than non-obviously correct.


"""
all_timezones = [pytz.timezone(tz) for tz in pytz.all_timezones]
static = [pytz.UTC] + sorted(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one final thing: I don't quite understand this logic and would appreciate a comment here explaining what the significance of a a StaticTzInfo is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. A StaticTzInfo represents a timezone that has always had a constant offset from UTC - no daylight savings, and no other tricks like moving across the date line. IMO that makes them simpler than the others, and smaller absolute offsets simpler within that.

lambda b: strat._attempt_one_draw(
ConjectureData.for_buffer(b)) is None
)
# Should raise some kind of error?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely should. I wonder why it isn't. I think the most likely explanation is that datetimes() is for some reason not exactly the same strategy as strat. Try passing strat instead of datetimes?

The other thing to check if that doesn't work is to make sure it's actually consuming the whole buffer. e.g. do a data = ConjectureData.for_buffer(failure_inducing), pass that to strat._attempt_one_draw, and assert afterwards that data.buffer == failure_inducing and the result

docs/changes.rst Outdated

This is a feature release, adding datetime-related strategies to the core strategies.

``extra.datetime.timezones`` allows you to sample pytz timezones from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra.pytz.timezones rather than extra.datetime.timezones I think?

@Zac-HD Zac-HD force-pushed the core-time-strats branch 3 times, most recently from 251dcd4 to b20d42f Compare May 22, 2017 13:15
@Zac-HD
Copy link
Member Author

Zac-HD commented May 22, 2017

OK, rebased and generally fixed up.

However, the "maybe a test" is passing when it shouldn't be, and this is the cause of failing coverage. No idea why, but it does produce the minimal datetime(2000, 1, 1, 0, 0).

@DRMacIver
Copy link
Member

However, the "maybe a test" is passing when it shouldn't be

Something weird is going on here. I'm going to have a bit of a poke into it.

@DRMacIver
Copy link
Member

OK. I understand the problem now.

The min_size parameter you're adding to binary() is breaking things: You're ending up with a bunch of extra zeros at the end which don't result in a failed draw when it retries there.

I think you want to do something like:

  def test_DatetimeStrategy_draw_may_fail():
      def is_failure_inducing(b):
          try:
              return strat._attempt_one_draw(
                  ConjectureData.for_buffer(b)) is None
          except StopTest:
              return False

      strat = DatetimeStrategy(dt.datetime.min, dt.datetime.max, none())
      failure_inducing = find(binary(), is_failure_inducing)

      data = ConjectureData.for_buffer(failure_inducing * 100)
      with pytest.raises(StopTest):
          data.draw(strat)
      assert data.status == Status.INVALID

(Status and StopTest can both be imported from hypothesis.internal.conjecture.data)

@Zac-HD
Copy link
Member Author

Zac-HD commented May 22, 2017

Ah, that makes sense - I got a StopTest error in the previous find because it raised, silenced that by adding the min_size, and then it couldn't raise later.

@DRMacIver
Copy link
Member

I got a StopTest error in the previous find because it raised,

Yeah, StopTest in this context basically gets raised to, err, stop the test when something has prevented data from being correctly drawn. It mostly happens when you run out of bytes, or when mark_invalid is called. Here you're getting the former. The problem is that having too much data here is bad as well as too little, because the goal of the test is to get things to line up perfectly so that the strategy keeps getting given the wrong data.

(As a side note this test would have very occasionally failed with a StopTest before hand: The amount of data drawn by integer_range and similar is slightly variable, so sometimes it needed more than 30 bytes. I saw this happen about once in ten runs)

Zac-HD added 4 commits May 23, 2017 17:31
To include the type of the argument that was invalid, and optionally the
name too.
Allowing them to exist unchanged.
And wow, is it nice not to have compatibility constraints.
@Zac-HD Zac-HD force-pushed the core-time-strats branch 2 times, most recently from 7e8f4ba to 0f23443 Compare May 23, 2017 07:35
Zac-HD added 2 commits May 23, 2017 18:56
Largely taken from the tests for deprecated versions,
because why mess with a working implementation?
@Zac-HD Zac-HD force-pushed the core-time-strats branch from 0f23443 to 441f92c Compare May 23, 2017 08:56
@Zac-HD
Copy link
Member Author

Zac-HD commented May 23, 2017

Ping @DRMacIver for final review. It looks like the appveyor build simply timed out, so if you rerun that job I'm confident it will pass.

Copy link
Member

@DRMacIver DRMacIver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 ❤️

@DRMacIver
Copy link
Member

@alexwlchan you still have an open review of this but I think @Zac-HD addressed all the issues. Do you want to do a more in depth final review or should we dismiss your review?

@Zac-HD Zac-HD dismissed alexwlchan’s stale review May 23, 2017 10:33

Addressed comments on an early version

@Zac-HD
Copy link
Member Author

Zac-HD commented May 23, 2017

@alexwlchan - if you have any issues I haven't addressed, let me know; otherwise I'll merge this late tonight (UTC).

[hmm, the comment above hadn't loaded. I'll wait!]

@DRMacIver
Copy link
Member

[hmm, the comment above hadn't loaded. I'll wait!]

I think it's fine to just merge if you're keen to get it through FWIW.

@Zac-HD
Copy link
Member Author

Zac-HD commented May 23, 2017

While I'm very keen to get this merged, I'd rather err on the side of more feedback. Since midnight UTC is 10am local, it's no trouble for me to merge in the morning 😄

@Zac-HD Zac-HD merged commit 67d6922 into HypothesisWorks:master May 23, 2017
@Zac-HD
Copy link
Member Author

Zac-HD commented May 23, 2017

Released! 🥂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding datetimes() to the core strategies Add timedeltas strategy to datetime extra
3 participants