Folds and imaginary datetimes in the datetime strategy #2273
Comments
Fantastic writeup, thank you! I'd be delighted to accept a PR adding Also very interested in setting the fold attribute (on 3.6+, or is_dst on any version), and the associated machinery. Just fiddly to stay compatible with Python 3.5 in the meantime... |
The documented contract of this function is that it is the inverse of `datetime.isoformat()`. See GH HypothesisWorks/hypothesis#1401, HypothesisWorks/hypothesis#69 and to a lesser extent HypothesisWorks/hypothesis#2273 for background on why I have set max_examples so high.
@pganssle - how does master...Zac-HD:better-datetimes look to you? It adds the |
@Zac-HD Sorry I haven't had time to make a proper PR for this. Few things:
from datetime import timezone

def _draw_naive(min_dt, max_dt):
    ...  # Assume that this draws a naive datetime in the range [min_dt, max_dt]
         # and sets `fold` randomly in Python 3.6+

def _draw_real(min_dt, max_dt, tz):
    def to_naive_utc(dt):
        if dt.tzinfo is None:
            # replace() returns a new object, so the result has to be rebound
            dt = dt.replace(tzinfo=tz)
        return dt.astimezone(timezone.utc).replace(tzinfo=None)

    min_dt_utc, max_dt_utc = map(to_naive_utc, (min_dt, max_dt))
    # Draw between the UTC-converted bounds, then convert back to the target zone
    dt_utc = _draw_naive(min_dt_utc, max_dt_utc)
    return dt_utc.replace(tzinfo=timezone.utc).astimezone(tz)

def _draw_any(min_dt, max_dt, tz):
    dt = _draw_naive(min_dt, max_dt)
    if is_pytz_timezone(tz):
        # fold and is_dst aren't necessarily the exact same thing
        # but they're both booleans and it's randomly chosen anyway
        dt = tz.localize(dt, is_dst=not dt.fold)
    else:
        dt = dt.replace(tzinfo=tz)
    return dt

def do_draw(self, data):
    tz = data.draw(self.tz_strat)
    if self.allow_imaginary:
        return _draw_any(self.min_dt, self.max_dt, tz)
    else:
        return _draw_real(self.min_dt, self.max_dt, tz)

I think you'll need to refactor what I have here to get it to work correctly with the
For
I think the biggest downside with the "ensure that it's a real time zone" thing is that it requires invoking |
It's true-by-default in the public API (function); false in the class because we reuse the machinery for
Because the bounds are currently expressed in naive time, and as I understand it interpreting as UTC then converting can change the naive part of the datetime to possibly-out-of-bounds. On reflection it's probably a lot safer just to check for and reject imaginary times! (interpret as timezone, convert to UTC and back, reject if unequal) Supporting tz-aware bounds is definitely on my list of nice-to-haves, but there's a fair bit of refactoring to do first!
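The "interpret as timezone, convert to UTC and back, reject if unequal" check could look roughly like the sketch below. This is only an illustration under stated assumptions: `GapZone` is a made-up zone with a single forward jump, and `is_imaginary` is a hypothetical helper name, not part of Hypothesis. It uses only the standard library.

```python
from datetime import datetime, timedelta, timezone, tzinfo

ZERO, HOUR = timedelta(0), timedelta(hours=1)


class GapZone(tzinfo):
    """Made-up zone for illustration: UTC+0 until 2020-03-08 02:00 UTC, UTC+1 after.

    Wall-clock times in [02:00, 03:00) on that day never occur.
    """

    SWITCH_UTC = datetime(2020, 3, 8, 2)  # naive UTC instant of the jump
    GAP_START = datetime(2020, 3, 8, 2)   # first skipped wall-clock time

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.GAP_START:
            return ZERO
        if wall >= self.GAP_START + HOUR:
            return HOUR
        # Inside the gap, PEP 495 asks for the pre-transition offset when
        # fold=0 and the post-transition offset when fold=1.
        return HOUR if dt.fold else ZERO

    def dst(self, dt):
        return ZERO

    def tzname(self, dt):
        return "TOY"

    def fromutc(self, dt):
        # `dt` carries UTC field values with tzinfo=self.
        utc = dt.replace(tzinfo=None)
        offset = HOUR if utc >= self.SWITCH_UTC else ZERO
        return (utc + offset).replace(tzinfo=self)


def is_imaginary(naive, tz):
    """Return True if the wall-clock time `naive` does not exist in `tz`."""
    aware = naive.replace(tzinfo=tz)
    # Go through UTC explicitly: astimezone() is a no-op when the target
    # tzinfo is the same object as the source tzinfo.
    round_tripped = aware.astimezone(timezone.utc).astimezone(tz)
    return round_tripped.replace(tzinfo=None) != naive


tz = GapZone()
print(is_imaginary(datetime(2020, 3, 8, 2, 30), tz))  # inside the gap
print(is_imaginary(datetime(2020, 3, 8, 3, 0), tz))   # first real time after it
```

A strategy could use a predicate like this to filter out draws, at the cost of rejected examples whenever the bounds sit close to a transition.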
Eh, this sounds pretty normal to me really - everything breaks when I point Hypothesis at it. So long as the bugs get fixed I'm happy!
Hm.. So I did consider this, though obviously didn't put enough thought into actually dealing with the edge cases. I think you may have the same problem with the "convert to UTC and back" problem if it's a valid local time but it can't be represented in UTC because of the boundary conditions. You'll still fail to generate the UTC times for something like I think the options are to either:
I'm not sure if either is better than the other. The non-restricted version is more likely to turn up bugs, but they are likely to have no real effect on "business logic", since time zone offsets are very close to meaningless before ~1900 and decreasingly accurate the further into the future you go, so correctly handling time zones at 0 CE or 9999 CE is very much a YAGNI situation. I'm inclined to say restrict the domain of valid datetimes to the overlap between the two and just do the draw in UTC, because I think that's the domain that people who would opt in to
One other point to consider about the "draw a naive and reject if it can't round trip" option is that people may accidentally set the boundaries in such a way that imaginary datetimes are a large fraction of the valid search space, which I assume would be very slow.
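The boundary problem raised in this comment (a valid local time that cannot be represented in UTC) can be reproduced with the standard library alone. The sketch below is an illustration, not code from Hypothesis: `can_convert_to_utc` is a hypothetical helper name, and the fixed-offset zones stand in for any zone with a non-zero offset.

```python
from datetime import datetime, timedelta, timezone


def can_convert_to_utc(naive, tz):
    """Return True if `naive`, read as local time in `tz`, has a UTC equivalent
    that still fits inside the range supported by `datetime`."""
    try:
        naive.replace(tzinfo=tz).astimezone(timezone.utc)
    except OverflowError:
        # The UTC value would fall before datetime.min or after datetime.max.
        return False
    return True


east = timezone(timedelta(hours=1))   # UTC+1: local time is ahead of UTC
west = timezone(timedelta(hours=-1))  # UTC-1: local time is behind UTC

# datetime.min in a UTC+1 zone corresponds to an instant before year 1 in UTC.
print(can_convert_to_utc(datetime.min, east))
# datetime.max in a UTC-1 zone corresponds to an instant after year 9999 in UTC.
print(can_convert_to_utc(datetime.max, west))
# Anything away from the extremes converts without trouble.
print(can_convert_to_utc(datetime(2020, 1, 1), east))
```

Restricting the drawable range to values for which this conversion succeeds is one way to read the "overlap between the two" option above.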
OK, I think I've been trying to fix too many issues with datetime support at once. Here they are:
After the refactoring, some feature improvements:
which should be much easier to both write and review as two PRs! And finally some options which would be nice-to-have but are not on the roadmap:
I was just testing a function using hypothesis and I realized that hypothesis never generates a datetime with the `fold` value set. When I went to fix this, I realized that there are two ways to handle ambiguous and imaginary datetimes with aware datetimes, and the current strategy is actually inconsistent in this regard.
The problem is whether you think of `strategies.datetimes` as generating arbitrary `datetime` objects or whether it is intended to generate valid datetimes in a given zone. Currently, if you generate an aware `datetime` and pass a strategy that generates `pytz` zones, you'll always get valid datetimes, because the library actively detects `pytz` zones and calls `normalize(localize(dt))` on the datetimes generated. If it draws any other kind of zone, `hypothesis` may generate an imaginary datetime, but if `hypothesis` picks an ambiguous datetime, it will only ever generate one side of the transition (whichever one happens chronologically first, for `dateutil`).
If you want it to generate arbitrary datetimes, we can drop the `normalize` call, and set `fold` randomly for every datetime generated. If you draw from `st.booleans()`, which shrinks towards `False`, `fold` will almost always shrink away because it usually doesn't affect the behavior of the datetime. You will need to special-case `pytz` here, and instead of setting `fold` set the `is_dst` flag in the `localize` call.
If you always want the aware datetimes generated by `hypothesis` to be valid (non-imaginary), then you can avoid hard-coding any awareness of `pytz`'s unusual interface and just draw the datetimes as UTC, then convert. This will make the behavior uniform between `dateutil` and `pytz` and will set `fold` only for ambiguous datetimes, but it will never generate imaginary datetimes.
I think these are both reasonable behaviors for `datetimes`. My inclination is to add a flag like `allow_imaginary` that switches between them and defaults to `True`. An alternative to this would be to stick to one or the other and have people who want the behavior not chosen create it themselves by generating a naive datetime and a timezone separately, then combining them themselves, but I think that will make it much harder for #69 to be effective.
I'm happy to implement whatever decision we come to.