
BUG: nonexistent Timestamp pre-summer/winter DST change with dateutil timezone #31043

Closed
AlexKirko opened this issue Jan 15, 2020 · 4 comments · Fixed by #31155
Labels: Bug, Timezones

@AlexKirko (Member)

Code Sample, a copy-pastable example if possible

This is fine:

>>> pd.__version__
'0.26.0.dev0+1790.gdd94e0db9'
>>> epoch =  1552211999999999871
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999871-0800', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999871
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999871-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552211999999999871

This is also fine:

>>> epoch =  1552212000000000000
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 03:00:00-0700', tz='dateutil/US/Pacific')
>>> t.value
1552212000000000000
>>> pd.Timestamp(t)
Timestamp('2019-03-10 03:00:00-0700', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552212000000000000

Meanwhile, this breaks the representation and gives us a nonexistent time:

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999872-0700', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999872
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552208399999999872

And right on the cusp, the value breaks too:

>>> epoch =  1552211999999999999
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999999-0700', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999999
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999999-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552208399999999999

Problem description

When we use dateutil timezones and create a Timestamp right on the cusp of the winter-to-summer DST transition, we can get nonexistent times (the clock is supposed to jump from 2 A.M. straight to 3 A.M., and yet we get 2:59:59).

I've investigated this, and it appears that starting 128 nanoseconds before the clock jump, the DST offset and utcoffset that dateutil reports already change to their post-jump values. We end up in a situation where the offsets are what they are supposed to be after the jump, but the wall time hasn't jumped yet, so the constructor returns a nonexistent time. Calling the constructor again on the result moves the clock back 1 hour.

This can be checked out with:

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t.tz.dst(t)
datetime.timedelta(seconds=3600)
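
For completeness, the utcoffset reported for the same instant should have flipped to the post-transition value as well (expected output under the same environment; PDT is UTC-7, i.e. -25200 seconds):

>>> t.tz.utcoffset(t)
datetime.timedelta(days=-1, seconds=61200)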

My assumption is that when we determine the UTC offset, rounding happens at some point: we round to epoch=1552212000000000000, fetch the offset for that instant, and then apply it to a time that is still pre-jump.
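
This hypothesis is easy to check in plain Python, without pandas involved: converting the integer epoch to a float (as an intermediate double conversion would) rounds it up across the second boundary, while one nanosecond earlier rounds down:

>>> epoch = 1552211999999999872
>>> int(float(epoch))        # rounds up to the transition instant
1552212000000000000
>>> int(float(epoch - 1))    # rounds down to the previous representable double
1552211999999999744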

I'd like to try to fix this one.

Expected Output

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999872
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552211999999999872

Notes

This was thought to be part of #24329 but turned out to be a separate bug, as I discovered while working on closing that issue in PR #30995.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : dd94e0d
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : ru_RU.UTF-8
LOCALE : None.None

pandas : 0.26.0.dev0+1790.gdd94e0db9
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.5
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0

@jreback (Contributor) commented Jan 15, 2020

cc @pganssle

@pganssle (Contributor)

I am not sure why the same thing is not happening with pytz - it's probably just going through a different code path. I believe what's happening here is that at some point in the process pandas (or, less likely, dateutil) is converting the integer timestamp into a float before truncating to the nearest microsecond, which is why the transition point happens at 1552211999999999872. Note:

>>> print(f"{float(1552211999999999872):0.0f}")
1552212000000000000
>>> print(f"{float(1552211999999999871):0.0f}") 
1552211999999999744

It seems that in that region of the number line, doubles are spaced 256 apart, and the behavior for converting an integer that is not exactly representable into a float is to round to the nearest representable value (which in this case crosses the second boundary).

Very interesting situation. I think that if you track down where the int-to-double conversion is happening and prevent it, or otherwise make sure it doesn't round up, you should be fine. Presumably it's in one of the dateutil-specific code branches. The current version of dateutil.tz.tz does not contain any instances of float or any division operations (plus, it wouldn't have any use for a datetime specified in nanoseconds-since-epoch anyway), so the conversion is almost certainly happening in pandas.
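
The 256 spacing is easy to confirm directly (an illustrative check, not part of the original comment; math.ulp requires Python 3.9+):

>>> import math
>>> math.ulp(1552212000000000000.0)   # gap to the next representable double
256.0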

@AlexKirko (Member, Author)

@pganssle Thanks, this narrows it down a lot. I'll investigate this issue further and try to find where the rounding up happens.

@AlexKirko (Member, Author)

take
