-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: nonexistent Timestamp pre-summer/winter DST w/dateutil timezone #31155
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green.
cee72ed
to
c9a87bd
Compare
@@ -1092,3 +1092,15 @@ def test_constructor_ambigous_dst(): | |||
expected = ts.value | |||
result = Timestamp(ts).value | |||
assert result == expected | |||
|
|||
|
|||
def test_constructor_before_dst_switch(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test the example in the issue here? (i.e. that Timestamp(Timestamp(epoch_time, tz=..))
doesn't change value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Sure!
But the value breaks only 1 nanosecond before the switch. Currently, I'm struggling to make everything else work on every azure pipeline (see comment below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Added to the test. Pinging you on change.
Remove exception from test_dti_construction_onexistent_endpoint
@jreback
They had this for Python 2.6 datetime support and dropped it in this PR. I set up a python 3.6 environment and tested this hypothesis. The tests fail with dateutil 2.6.1 and pass with dateutil 2.7.0. Basically, I want to know how I should proceed. |
so just change the test have different cases for >=2,7 and <2,7 |
@@ -2,10 +2,12 @@ | |||
Tests for DatetimeIndex timezone-related methods | |||
""" | |||
from datetime import date, datetime, time, timedelta, tzinfo | |||
from distutils.version import LooseVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to parse version properly, so have to import this.
ts = Timestamp(epoch, tz="dateutil/US/Pacific") | ||
result = ts.tz.dst(ts) | ||
expected = timedelta(seconds=0) | ||
assert Timestamp(ts).value == epoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
This tests that the value no longer shifts when we call the constructor again. In the issue, this failed for epoch = 1552211999999999999
Because |
@jreback |
thanks @AlexKirko very nice |
# Make sure that calling Timestamp constructor | ||
# on time just before DST switch doesn't lead to | ||
# nonexistent time or value change | ||
# Works only with dateutil >= 2.7.0 as dateutil overrid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/overrid/overrode
# nonexistent time or value change | ||
# Works only with dateutil >= 2.7.0 as dateutil overrid | ||
# pandas.Timedelta.total_seconds with | ||
# datetime.timedelta.total_seconds before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not make sense to me - datetime.timedelta.total_seconds
should succeed, because it's equivalent to:
def total_seconds(td):
useconds = td.days * 86400
useconds += td.seconds
useconds *= 1000000
useconds += td.microseconds
return useconds / 1e6
I think there's actually a deeper issue here, which is that td.microseconds
and td.seconds
are rounded rather than truncated. Consider this:
def to_ns(td):
ns = td.days * 86400
ns += td.seconds
ns *= 1000000
ns += td.microseconds
ns *= 1000
ns += td.nanoseconds
return ns
td = timedelta(1552211999999999872, unit="ns")
print(td.value) # 1552211999999999872
print(to_ns(td)) # 1552212000000000872
That seems to be the actual root cause of this issue and should probably be fixed.
# Works only with dateutil >= 2.7.0 as dateutil overrid | ||
# pandas.Timedelta.total_seconds with | ||
# datetime.timedelta.total_seconds before | ||
ts = Timestamp(epoch, tz="dateutil/US/Pacific") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention earlier: you should probably use dateutil/America/Los_Angeles
, as that is the canonical name for this zone. The US/...
zones are symlinks for backwards compatibility.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This implements rounding down to microseconds into Timedelta.total_seconds(). Lack of this rounding led to
dateutil.tz.tzinfo.utcoffset
anddst
breaking less than 128 nanoseconds before winter/summer DST switch. This happened due to an unintended cast to float inTimedelta.total_seconds()
compounded withTimedelta.value
being np.int64 type which doesn't support long arithmetic. The loss of precision led to rounding up, which meant thattotal_seconds
since epoch time implied DST time while theTimedelta.value
hasn't yet reached DST.Details and code examples below.
This was quite a journey, but I found out what's going on. Let's say we have a
dateutil.tz.tzinfo
object nameddu_tz
and want to find out the DST-aware UTC offset.du_tz.utcoffset(dt)
which callsdu_tz._find_ttinfo(dt)
which callsdu_tz._resolve_ambiguous_time(dt)
to find the index of the last transition time that it uses to return the correct offset.du_tz._resolve_ambiguous_time(dt)
callsdu_tz._find_last_transition(dt)
which calls the_datetime_to_timestamp(dt)
dateutil function.The problem is dateutil's reliance on
Timedelta.total_seconds
which casts to float:Demonstration:
The same thing happens with a
pytz
timezone, only pytz relies on something else to check for DST transitions:So
timedelta
value is okay, buttimedelta.total_seconds
hits the precision limit of floats, and this leads to rounding and an incorrect DST offset.