BUG: nonexistent Timestamp pre-summer/winter DST w/dateutil timezone #31155

Merged: 26 commits into pandas-dev:master, Jan 24, 2020

Conversation

AlexKirko (Member) opened this pull request:

This PR adds rounding down to microsecond precision in Timedelta.total_seconds(). Without that rounding, dateutil.tz.tzinfo.utcoffset and dst broke for timestamps less than 128 nanoseconds before a winter/summer DST switch. The cause was an unintended cast to float in Timedelta.total_seconds(), compounded by Timedelta.value being an np.int64, which doesn't support arbitrary-precision arithmetic. The resulting loss of precision rounded the value up, so the seconds-since-epoch figure implied a time on the DST side of the transition even though Timedelta.value had not yet reached it.

Details and code examples below.

This was quite a journey, but I found out what's going on. Let's say we have a dateutil.tz.tzinfo object named du_tz and want to find out the DST-aware UTC offset.

  1. We call du_tz.utcoffset(dt), which calls du_tz._find_ttinfo(dt), which in turn calls du_tz._resolve_ambiguous_time(dt) to find the index of the last transition; that index is then used to return the correct offset.
  2. du_tz._resolve_ambiguous_time(dt) calls du_tz._find_last_transition(dt), which calls dateutil's _datetime_to_timestamp(dt) function.
  3. Here is what that function does:
def _datetime_to_timestamp(dt):
    """
    Convert a :class:`datetime.datetime` object to an epoch timestamp in
    seconds since January 1, 1970, ignoring the time zone.
    """
    return (dt.replace(tzinfo=None) - EPOCH).total_seconds()

The problem is dateutil's reliance on Timedelta.total_seconds, which casts to float:

    def total_seconds(self):
        """
        Total duration of timedelta in seconds (to ns precision).
        """
        return self.value / 1e9
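
To see the precision limit concretely, here is a small, self-contained check (plain Python, no pandas; the constant is the delta.value from the demonstration below). Near 1.55e18, adjacent float64 values are 256 apart, so an integer nanosecond count up to 128 ns below a whole second can round up onto it:

    # naive wall-clock nanoseconds since epoch, 128 ns before the
    # 2019-03-10 02:00 US/Pacific spring-forward transition
    value = 1552183199999999872
    # the int -> float conversion rounds to the nearest representable float64
    # (spacing 256 here), and the division then lands exactly on the transition
    print(value / 1e9)                  # 1552183200.0
    print(value / 1e9 == 1552183200.0)  # True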

Demonstration:

IN:
import datetime
import pandas as pd

epoch =  1552211999999999872
ts = pd.Timestamp(epoch, tz='dateutil/US/Pacific')

EPOCH = datetime.datetime.utcfromtimestamp(0)

delta = ts.replace(tzinfo=None) - EPOCH
print(delta.value)
OUT:
1552183199999999872
IN:
print(delta.total_seconds())
OUT:
1552183200.0
IN:
print(ts.tz.dst(ts))
OUT:
datetime.timedelta(seconds=3600)

The same precision loss happens with a pytz timezone, but pytz uses a different mechanism to look up DST transitions, so it still returns the correct offset:

IN:
import datetime
import pandas as pd

epoch =  1552211999999999872
ts = pd.Timestamp(epoch, tz='US/Pacific')

EPOCH = datetime.datetime.utcfromtimestamp(0)

delta = ts.replace(tzinfo=None) - EPOCH
print(delta.value)
OUT:
1552183199999999872
IN:
print(delta.total_seconds())
OUT:
1552183200.0
IN:
print(ts.tz.dst(ts))
OUT:
datetime.timedelta(0)

So the Timedelta value is fine, but Timedelta.total_seconds() hits the precision limit of float64, and with a dateutil timezone that rounding yields an incorrect DST offset.
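
For reference, a minimal sketch of the rounding-down approach described at the top of this PR (illustrative only; the exact merged implementation may differ):

    def total_seconds(self):
        """
        Total duration of timedelta in seconds (to us precision).
        """
        # Truncate to whole microseconds before the float division. The
        # truncated value sits at least 1000 ns below any whole second it has
        # not yet reached, which is more than the worst-case 128 ns error of
        # the int64 -> float64 conversion at these magnitudes, so the result
        # can no longer round up past a DST transition second.
        return (self.value - self.value % 1000) / 1e9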

@jreback added the Datetime, Timezones, and Bug labels on Jan 20, 2020
@jreback added this to the 1.1 milestone on Jan 20, 2020

jreback (Contributor) left a comment:
lgtm. ping on green.

@AlexKirko force-pushed the fix-nonexistent-time branch from cee72ed to c9a87bd on January 20, 2020.
@@ -1092,3 +1092,15 @@ def test_constructor_ambigous_dst():
    expected = ts.value
    result = Timestamp(ts).value
    assert result == expected


def test_constructor_before_dst_switch():
@mroeschke (Member) commented on the diff:
Can we test the example in the issue here? (i.e. that Timestamp(Timestamp(epoch_time, tz=..)) doesn't change value)

AlexKirko (Member, Author) replied:
@mroeschke Sure!
But the value breaks only 1 nanosecond before the switch. Currently, I'm struggling to make everything else work on every Azure pipeline (see the comment below).

AlexKirko (Member, Author) replied:
@mroeschke
Added to the test. Pinging you on change.

AlexKirko (Member, Author) commented on Jan 21, 2020:

@jreback
Okay, I found why this breaks on minimum versions and macOS. Both specify python-dateutil == 2.6.1.
Before version 2.7.0, dateutil didn't call total_seconds() on the timedelta instance itself. Instead, it used this:

def _total_seconds(td):
    # Python 2.6 doesn't have a total_seconds() method on timedelta objects
    return ((td.seconds + td.days * 86400) * 1000000 +
            td.microseconds) // 1000000


_total_seconds = getattr(timedelta, 'total_seconds', _total_seconds)

They had this for Python 2.6 datetime support and dropped it in a later dateutil PR.
This implementation grabs datetime.timedelta.total_seconds() with getattr, and we can't influence this from the pandas side.
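
A toy illustration (not pandas or dateutil code) of why this can't be worked around from the pandas side: once the method is captured from the datetime.timedelta class itself, calling it bypasses any override defined on a subclass such as pandas.Timedelta:

    import datetime

    # what dateutil < 2.7.0 effectively does at import time
    _total_seconds = getattr(datetime.timedelta, "total_seconds")

    class PatchedTimedelta(datetime.timedelta):
        def total_seconds(self):
            # stand-in for a fixed, higher-precision override
            return 42.0

    td = PatchedTimedelta(seconds=1)
    print(td.total_seconds())   # 42.0 -- the subclass override is used
    print(_total_seconds(td))   # 1.0  -- the captured base-class method ignores it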

I set up a Python 3.6 environment and tested this hypothesis. The tests fail with dateutil 2.6.1 and pass with dateutil 2.7.0.

Basically, I want to know how I should proceed.
I don't see a way to work around this without overriding dateutil's tzinfo methods, and this seems like a horrible idea.
Or we could change our minimum supported version of dateutil to 2.7.0 if that's acceptable. We don't support Python 2 in pandas, so there is really no need to support a version of dateutil this low.
I wonder why we specify version 2.6.1 for macOS though.

jreback (Contributor) commented on Jan 21, 2020:

So just change the test to have different cases for >= 2.7 and < 2.7; we already moved to 2.6.1 as the min (for 1.0.0).

@@ -2,10 +2,12 @@
Tests for DatetimeIndex timezone-related methods
"""
from datetime import date, datetime, time, timedelta, tzinfo
from distutils.version import LooseVersion
AlexKirko (Member, Author) commented on Jan 21, 2020:
Need to parse version properly, so have to import this.

@AlexKirko requested a review from jreback on January 21, 2020.
    ts = Timestamp(epoch, tz="dateutil/US/Pacific")
    result = ts.tz.dst(ts)
    expected = timedelta(seconds=0)
    assert Timestamp(ts).value == epoch
AlexKirko (Member, Author) commented on Jan 22, 2020:
@mroeschke
This tests that the value no longer shifts when we call the constructor again. In the issue, this failed for epoch = 1552211999999999999

AlexKirko (Member, Author) commented on Jan 22, 2020:

Because dateutil.__version__ breaks the mypy check, I'm using pandas.compat._optional._get_version. Seems like it was written precisely for something like this, and it's certainly a better idea than importing pkg_resources.get_distribution.
Once the numpy dev PR is merged, I'll merge master into my branch, and we should be green.
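
A sketch of the kind of version gate being discussed in the last two comments (illustrative only, not the merged test code; it assumes _get_version accepts the imported module and returns its version string):

    from distutils.version import LooseVersion

    import dateutil

    from pandas.compat._optional import _get_version

    # dateutil < 2.7.0 calls datetime.timedelta.total_seconds directly, so the
    # rounding fix in pandas' Timedelta.total_seconds has no effect there; the
    # test asserts the pre-fix or post-fix expectation accordingly
    dateutil_has_fix = LooseVersion(_get_version(dateutil)) >= LooseVersion("2.7.0")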

AlexKirko (Member, Author):
@jreback
All green, please review.

@jreback merged commit 3b2c8f6 into pandas-dev:master on Jan 24, 2020.
jreback (Contributor) commented on Jan 24, 2020:

thanks @AlexKirko very nice

@AlexKirko deleted the fix-nonexistent-time branch on January 24, 2020.
    # Make sure that calling Timestamp constructor
    # on time just before DST switch doesn't lead to
    # nonexistent time or value change
    # Works only with dateutil >= 2.7.0 as dateutil overrid
A reviewer (Contributor) commented:
s/overrid/overrode

    # nonexistent time or value change
    # Works only with dateutil >= 2.7.0 as dateutil overrid
    # pandas.Timedelta.total_seconds with
    # datetime.timedelta.total_seconds before
A reviewer (Contributor) commented:
This does not make sense to me - datetime.timedelta.total_seconds should succeed, because it's equivalent to:

def total_seconds(td):
  useconds = td.days * 86400
  useconds += td.seconds 
  useconds *= 1000000
  useconds += td.microseconds
  return useconds / 1e6

I think there's actually a deeper issue here, which is that td.microseconds and td.seconds are rounded rather than truncated. Consider this:

def to_ns(td):
  ns = td.days * 86400
  ns += td.seconds
  ns *= 1000000
  ns += td.microseconds
  ns *= 1000
  ns += td.nanoseconds
  return ns

td = Timedelta(1552211999999999872, unit="ns")
print(td.value)  # 1552211999999999872
print(to_ns(td))  # 1552212000000000872

That seems to be the actual root cause of this issue and should probably be fixed.
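
For comparison, a sketch of extracting the components by pure integer truncation (illustrative; to_ns above reassembles them in exactly this order), which round-trips the nanosecond value without error:

    value = 1552211999999999872          # nanoseconds
    secs, ns_rem = divmod(value, 1_000_000_000)
    us, ns = divmod(ns_rem, 1_000)
    days, secs = divmod(secs, 86_400)
    # days=17965, secs=35999, us=999999, ns=872
    assert ((days * 86_400 + secs) * 1_000_000 + us) * 1_000 + ns == value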

    # Works only with dateutil >= 2.7.0 as dateutil overrid
    # pandas.Timedelta.total_seconds with
    # datetime.timedelta.total_seconds before
    ts = Timestamp(epoch, tz="dateutil/US/Pacific")
A reviewer (Contributor) commented:
Forgot to mention earlier: you should probably use dateutil/America/Los_Angeles, as that is the canonical name for this zone. The US/... zones are symlinks for backwards compatibility.
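
For example, the construction in the test would become (illustrative only):

    ts = Timestamp(epoch, tz="dateutil/America/Los_Angeles")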

Linked issue (may be closed by merging): BUG: nonexistent Timestamp pre-summer/winter DST change with dateutil timezone