When we use dateutil timezones and try to create a Timestamp that is right on the cusp of the change from winter to summer time, we can get nonexistent times (the clock is supposed to jump from 2 A.M. to 3 A.M., and yet we get 2:59:59).
I've investigated this, and it appears that at 128 nanoseconds before the clock jump, the DST offset and utcoffset in dateutil change. We end up in a situation where the offsets are what they are supposed to be after the jump, but the time hasn't jumped yet, so the constructor returns a nonexistent time. Calling the constructor again moves the clock 1 hour back.
My assumption is that when we need to determine the UTC offset, rounding happens at some point: we round to epoch=1552212000000000000, get the offset, and then apply it to a time before the clock jump.
I am not sure why the same thing is not happening with pytz; it's probably just going through a different code path.
I believe what's happening here is that at some point in the process pandas (or, less likely, dateutil) is converting the integer timestamp into a float before truncating to the nearest microsecond, which is why the transition point happens at 1552211999999999872. Note that in that region of the number line, floats are spaced 256 apart, and an integer that is not exactly representable as a float is rounded to the nearest representable float (which in this case crosses a second boundary).
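The rounding described above can be checked with plain Python integers and floats. This is a minimal sketch and not the pandas code path: the helper name `round_trip_through_float` is hypothetical, and the boundary value is the epoch quoted in this report.

```python
# Nanosecond epoch of the clock jump, as quoted in this report.
BOUNDARY_NS = 1552212000000000000


def round_trip_through_float(ns: int) -> int:
    """Convert an integer to a 64-bit float and back (hypothetical helper)."""
    return int(float(ns))


# A double has 52 fraction bits, so between 2**60 and 2**61
# neighbouring floats are 2**(60 - 52) = 256 apart.
assert 2**60 <= BOUNDARY_NS < 2**61
spacing = 2 ** (60 - 52)

# Print how far each value lands from the boundary after the round trip.
for delta in (0, 127, 128, 129, 256):
    ns = BOUNDARY_NS - delta
    print(delta, round_trip_through_float(ns) - BOUNDARY_NS)
```

Values up to 128 ns before the boundary come back as the boundary itself, while 129 ns before it lands on the next float down, which matches the 128 ns threshold seen in the report.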
Very interesting situation. I think if you track down where the int to double conversion is happening and prevent it or otherwise make sure it doesn't round up you should be fine. Presumably it's in one of the dateutil-specific code branches. The current version of dateutil.tz.tz does not contain any instances of float or any division operations (plus, it wouldn't have any use for a datetime specified in nanoseconds-since-epoch anyway), so it's almost certainly in pandas.
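One way to keep the conversion from rounding up, as suggested above, is to stay in integer arithmetic when reducing nanoseconds to microseconds. The sketch below contrasts the two approaches. Both function names are hypothetical, and neither is taken from pandas or dateutil; where the actual conversion happens is still an open question here.

```python
NS_PER_US = 1000


def to_microseconds_via_float(ns: int) -> int:
    """Lossy variant: the integer is rounded to the nearest double first."""
    return int(float(ns) / NS_PER_US)


def to_microseconds_exact(ns: int) -> int:
    """Integer-only variant: floor division never rounds up."""
    return ns // NS_PER_US


# The value from the report, 128 ns before the clock jump.
ns = 1552211999999999872

print(to_microseconds_via_float(ns))  # lands on the transition second
print(to_microseconds_exact(ns))      # stays just before the transition
```

Floor division also rounds toward earlier times for negative epochs, so the integer variant never moves a timestamp forward across a boundary.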
Code Sample, a copy-pastable example if possible
This is fine:
This is also fine:
Meanwhile, this breaks representation and gets us nonexistent times:
And right on the cusp, the value breaks too:
Problem description
When we use dateutil timezones and try to create a Timestamp that is right on the cusp of the change from winter to summer time, we can get nonexistent times (the clock is supposed to jump from 2 A.M. to 3 A.M., and yet we get 2:59:59).
I've investigated this, and it appears that at 128 nanoseconds before the clock jump, the DST offset and utcoffset in dateutil change. We end up in a situation where the offsets are what they are supposed to be after the jump, but the time hasn't jumped yet, so the constructor returns a nonexistent time. Calling the constructor again moves the clock 1 hour back.
This can be checked out with:
My assumption is that when we need to determine the UTC offset, rounding happens at some point: we round to epoch=1552212000000000000, get the offset, and then apply it to a time before the clock jump.
I'd like to try to fix this one.
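The mismatch between the instant and the offset can be illustrated with the standard library alone. This is a sketch under assumptions: the fixed offsets UTC-8 and UTC-7 are inferred from a 2 A.M. jump at the epoch above, and the names `OFFSET_BEFORE` and `OFFSET_AFTER` are hypothetical. It does not reproduce the pandas or dateutil behaviour itself.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets on either side of the jump (assumed: UTC-8 before, UTC-7 after).
OFFSET_BEFORE = timezone(timedelta(hours=-8))
OFFSET_AFTER = timezone(timedelta(hours=-7))

# The transition instant from the report, in whole seconds since the epoch.
transition_utc = datetime.fromtimestamp(1552212000, tz=timezone.utc)

# An instant just before the jump (datetime resolves microseconds, not nanoseconds).
instant = transition_utc - timedelta(microseconds=1)

# Correct: the pre-jump offset applied to a pre-jump instant.
correct = instant.astimezone(OFFSET_BEFORE)

# Buggy combination: the post-jump offset applied to a pre-jump instant,
# which produces a wall time inside the skipped 2 A.M. to 3 A.M. hour.
wrong = instant.astimezone(OFFSET_AFTER)

print(correct.time())
print(wrong.time())
```

The one-hour difference between the two results is consistent with the observation that calling the constructor again moves the clock 1 hour back.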
Expected Output
Notes
This was thought to be part of #24329 but turned out to be a separate bug as I worked on closing that issue in PR #30995.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : dd94e0d
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : ru_RU.UTF-8
LOCALE : None.None
pandas : 0.26.0.dev0+1790.gdd94e0db9
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.5
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0