ENH: infer Timestamp(iso8601string) resolution #49737

Merged (21 commits) on Dec 27, 2022
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v2.0.0.rst
@@ -465,7 +465,8 @@ Other API changes
- :meth:`Index.astype` now allows casting from ``float64`` dtype to datetime-like dtypes, matching :class:`Series` behavior (:issue:`49660`)
- Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
- Passing ``dtype`` of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
- Passing a string in ISO-8601 format to :class:`Timestamp` will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49737`)
- The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
- Changed behavior of :meth:`Series.quantile` and :meth:`DataFrame.quantile` with :class:`SparseDtype` to retain sparse dtype (:issue:`49583`)
- When creating a :class:`Series` with an object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
@@ -798,6 +799,7 @@ Datetimelike
- Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how it is done for ISO8601 formats (:issue:`50251`)
- Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
- Bug in :func:`to_datetime` was showing misleading ``ValueError`` when parsing dates with format containing ISO week directive and ISO weekday directive (:issue:`50308`)
- Bug in :meth:`Timestamp.round` when the ``freq`` argument has zero duration (e.g. "0ns") returning incorrect results instead of raising (:issue:`49737`)
- Bug in :func:`to_datetime` was not raising ``ValueError`` when invalid format was passed and ``errors`` was ``'ignore'`` or ``'coerce'`` (:issue:`50266`)
- Bug in :class:`DateOffset` was throwing ``TypeError`` when constructing with milliseconds and another super-daily argument (:issue:`49897`)
-
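The resolution inference described in the whatsnew entries above can be sketched in plain Python. `infer_iso_resolution` is a hypothetical helper for illustration only — pandas performs this inference in C while parsing, not via a function with this name or signature:

```python
def infer_iso_resolution(ts: str) -> str:
    """Map an ISO-8601 string to the coarsest supported resolution
    ("s", "ms", "us", or "ns") that represents it losslessly.

    Illustrative sketch of the behavior described above, not the
    actual pandas parser.
    """
    if "." not in ts:
        # no fractional seconds -> second resolution
        return "s"
    frac = ts.split(".", 1)[1]
    # drop a trailing UTC offset or "Z" designator, if any
    for sep in ("+", "-", "Z", "z"):
        frac = frac.split(sep, 1)[0]
    ndigits = len(frac)
    if ndigits <= 3:
        return "ms"
    if ndigits <= 6:
        return "us"
    return "ns"
```

With pandas 2.0, a construction like `Timestamp("2020-01-01")` accordingly keeps second resolution instead of always being cast to nanoseconds.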
51 changes: 33 additions & 18 deletions pandas/_libs/tslibs/conversion.pyx
@@ -405,7 +405,8 @@ cdef _TSObject convert_datetime_to_tsobject(


cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
int tzoffset, tzinfo tz=None):
int tzoffset, tzinfo tz=None,
NPY_DATETIMEUNIT reso=NPY_FR_ns):
"""
Convert a datetimestruct `dts`, along with initial timezone offset
`tzoffset` to a _TSObject (with timezone object `tz` - optional).
@@ -416,6 +417,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
tzoffset: int
tz : tzinfo or None
timezone for the timezone-aware output.
reso : NPY_DATETIMEUNIT, default NPY_FR_ns

Returns
-------
@@ -427,16 +429,19 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
datetime dt
Py_ssize_t pos

value = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
value = npy_datetimestruct_to_datetime(reso, &dts)
obj.dts = dts
obj.tzinfo = timezone(timedelta(minutes=tzoffset))
obj.value = tz_localize_to_utc_single(value, obj.tzinfo)
obj.value = tz_localize_to_utc_single(
value, obj.tzinfo, ambiguous=None, nonexistent=None, creso=reso
)
obj.creso = reso
Review comment (Member): this is for use by ensure_reso?

if tz is None:
check_overflows(obj, NPY_FR_ns)
check_overflows(obj, reso)
return obj

cdef:
Localizer info = Localizer(tz, NPY_FR_ns)
Localizer info = Localizer(tz, reso)

# Infer fold from offset-adjusted obj.value
# see PEP 495 https://www.python.org/dev/peps/pep-0495/#the-fold-attribute
@@ -454,6 +459,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
obj.dts.us, obj.tzinfo, fold=obj.fold)
obj = convert_datetime_to_tsobject(
dt, tz, nanos=obj.dts.ps // 1000)
obj.ensure_reso(reso) # TODO: more performant to get reso right up front?
return obj


@@ -490,7 +496,7 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
int out_local = 0, out_tzoffset = 0, string_to_dts_failed
datetime dt
int64_t ival
NPY_DATETIMEUNIT out_bestunit
NPY_DATETIMEUNIT out_bestunit, reso

if len(ts) == 0 or ts in nat_strings:
ts = NaT
@@ -513,19 +519,26 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
&out_tzoffset, False
)
if not string_to_dts_failed:
reso = get_supported_reso(out_bestunit)
try:
check_dts_bounds(&dts, NPY_FR_ns)
check_dts_bounds(&dts, reso)
if out_local == 1:
return _create_tsobject_tz_using_offset(dts,
out_tzoffset, tz)
return _create_tsobject_tz_using_offset(
dts, out_tzoffset, tz, reso
)
else:
ival = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
ival = npy_datetimestruct_to_datetime(reso, &dts)
if tz is not None:
# shift for _localize_tso
ival = tz_localize_to_utc_single(ival, tz,
ambiguous="raise")

return convert_to_tsobject(ival, tz, None, False, False)
ival = tz_localize_to_utc_single(
ival, tz, ambiguous="raise", nonexistent=None, creso=reso
)
obj = _TSObject()
obj.dts = dts
obj.value = ival
obj.creso = reso
maybe_localize_tso(obj, tz, obj.creso)
return obj
Comment on lines -528 to +541
Review comment (Member): is this because convert_to_tsobject doesn't (yet?) allow passing a reso to set before calling maybe_localize_tso?

Reply (Member, author): some combination of that and me not wanting to recurse. Honestly I don't remember how much of each off the top of my head.


except OutOfBoundsDatetime:
# GH#19382 for just-barely-OutOfBounds falling back to dateutil
@@ -538,10 +551,12 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
pass

try:
dt = parse_datetime_string(ts, dayfirst=dayfirst,
yearfirst=yearfirst)
except (ValueError, OverflowError):
raise ValueError("could not convert string to Timestamp")
# TODO: use the one that returns reso
dt = parse_datetime_string(
ts, dayfirst=dayfirst, yearfirst=yearfirst
)
except (ValueError, OverflowError) as err:
raise ValueError("could not convert string to Timestamp") from err

return convert_datetime_to_tsobject(dt, tz)

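The `get_supported_reso` call added above clamps the parser's best unit to one pandas supports. A rough stand-alone sketch of that clamping (assumed behavior — the real function operates on `NPY_DATETIMEUNIT` enum values in Cython, not unit strings):

```python
# numpy datetime64 unit codes, ordered coarse to fine
UNITS = ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"]
SUPPORTED = {"s", "ms", "us", "ns"}

def get_supported_reso(unit: str) -> str:
    """Clamp an arbitrary numpy unit to the closest supported resolution:
    anything coarser than "s" becomes "s", anything finer than "ns"
    becomes "ns".  Sketch only; pandas works with enum values in C."""
    if unit in SUPPORTED:
        return unit
    return "s" if UNITS.index(unit) < UNITS.index("s") else "ns"
```

This is why, per the whatsnew entries, a day-resolution input ends up as "s" while a picosecond-resolution input ends up as "ns".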
6 changes: 5 additions & 1 deletion pandas/_libs/tslibs/offsets.pyx
@@ -162,7 +162,11 @@ def apply_wraps(func):

result = func(self, other)

result = (<_Timestamp>Timestamp(result))._as_creso(other._creso)
result2 = Timestamp(result).as_unit(other.unit)
if result == result2:
# i.e. the conversion is non-lossy, not the case for e.g.
# test_milliseconds_combination
result = result2

if self._adjust_dst:
result = result.tz_localize(tz)
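The `result == result2` comparison in the offsets diff above keeps a unit cast only when it loses no information. A minimal sketch of that idea, using integer nanosecond values instead of Timestamps (names are illustrative):

```python
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

def cast_if_lossless(value_ns: int, unit: str) -> tuple:
    """Adopt `unit` only when the cast round-trips exactly, mirroring
    the `result == result2` check in the diff above.  Sketch only --
    pandas compares two Timestamp objects, not raw integers."""
    factor = NS_PER_UNIT[unit]
    converted = value_ns // factor
    if converted * factor == value_ns:
        return converted, unit   # lossless: use the other operand's unit
    return value_ns, "ns"        # lossy: keep full nanosecond precision
```

A millisecond component landing in a second-resolution unit is exactly the lossy case the comment about `test_milliseconds_combination` refers to.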
14 changes: 12 additions & 2 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -448,6 +448,7 @@ cdef class _Timestamp(ABCTimestamp):
# cython semantics, args have been switched and this is __radd__
# TODO(cython3): remove this it moved to __radd__
return other.__add__(self)

return NotImplemented

def __radd__(self, other):
@@ -1560,8 +1561,17 @@ class Timestamp(_Timestamp):
cdef:
int64_t nanos

to_offset(freq).nanos # raises on non-fixed freq
nanos = delta_to_nanoseconds(to_offset(freq), self._creso)
freq = to_offset(freq)
freq.nanos # raises on non-fixed freq
nanos = delta_to_nanoseconds(freq, self._creso)
if nanos == 0:
if freq.nanos == 0:
raise ValueError("Division by zero in rounding")
MarcoGorelli marked this conversation as resolved.

# e.g. self.unit == "s" and sub-second freq
return self

# TODO: problem if nanos==0

if self.tz is not None:
value = self.tz_localize(None).value
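The rounding guard added above distinguishes two cases where `nanos` ends up zero: a genuinely zero-duration freq (now an error) and a freq finer than the Timestamp's own resolution (a no-op). A sketch of that logic with illustrative names — the real code lives in `Timestamp.round` and works on `_creso` enum values:

```python
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

def round_timestamp_value(value: int, unit: str, freq_ns: int) -> int:
    """`value` is a Timestamp's integer value in its own `unit`;
    `freq_ns` is the rounding frequency in nanoseconds.  Sketch of the
    zero-duration guard in the diff above, not the pandas API."""
    if freq_ns == 0:
        # e.g. freq="0ns": previously returned incorrect results, now raises
        raise ValueError("Division by zero in rounding")
    # freq expressed in the timestamp's own unit
    nanos = freq_ns // NS_PER_UNIT[unit]
    if nanos == 0:
        # e.g. unit == "s" with a sub-second freq: rounding is a no-op
        return value
    return round(value / nanos) * nanos
```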
10 changes: 4 additions & 6 deletions pandas/core/computation/pytables.py
@@ -11,7 +11,6 @@
import numpy as np

from pandas._libs.tslibs import (
NaT,
Timedelta,
Timestamp,
)
@@ -216,17 +215,16 @@ def stringify(value):
if isinstance(v, (int, float)):
v = stringify(v)
v = ensure_decoded(v)
v = Timestamp(v)
if v is not NaT:
MarcoGorelli marked this conversation as resolved.
v = v.as_unit("ns") # pyright: ignore[reportGeneralTypeIssues]
v = Timestamp(v).as_unit("ns")
if v.tz is not None:
v = v.tz_convert("UTC")
return TermValue(v, v.value, kind)
elif kind in ("timedelta64", "timedelta"):
if isinstance(v, str):
v = Timedelta(v).value
v = Timedelta(v)
else:
v = Timedelta(v, unit="s").value
v = Timedelta(v, unit="s")
v = v.as_unit("ns").value
return TermValue(int(v), v, kind)
elif meta == "category":
metadata = extract_array(self.metadata, extract_numpy=True)
2 changes: 1 addition & 1 deletion pandas/core/resample.py
@@ -2085,7 +2085,7 @@ def _adjust_dates_anchored(
elif origin == "start":
origin_nanos = first.value
elif isinstance(origin, Timestamp):
origin_nanos = origin.value
origin_nanos = origin.as_unit("ns").value
elif origin in ["end", "end_day"]:
origin_last = last if origin == "end" else last.ceil("D")
sub_freq_times = (origin_last.value - first.value) // freq.nanos
4 changes: 2 additions & 2 deletions pandas/tests/arithmetic/test_datetime64.py
@@ -1699,15 +1699,15 @@ def test_datetimeindex_sub_timestamp_overflow(self):
dtimax = pd.to_datetime(["2021-12-28 17:19", Timestamp.max])
dtimin = pd.to_datetime(["2021-12-28 17:19", Timestamp.min])

tsneg = Timestamp("1950-01-01")
tsneg = Timestamp("1950-01-01").as_unit("ns")
ts_neg_variants = [
tsneg,
tsneg.to_pydatetime(),
tsneg.to_datetime64().astype("datetime64[ns]"),
tsneg.to_datetime64().astype("datetime64[D]"),
]

tspos = Timestamp("1980-01-01")
tspos = Timestamp("1980-01-01").as_unit("ns")
ts_pos_variants = [
tspos,
tspos.to_pydatetime(),
2 changes: 1 addition & 1 deletion pandas/tests/arrays/test_timedeltas.py
@@ -102,7 +102,7 @@ def test_add_pdnat(self, tda):
# TODO: 2022-07-11 this is the only test that gets to DTA.tz_convert
# or tz_localize with non-nano; implement tests specific to that.
def test_add_datetimelike_scalar(self, tda, tz_naive_fixture):
ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture)
ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture).as_unit("ns")

expected = tda.as_unit("ns") + ts
res = tda + ts
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/methods/test_astype.py
@@ -276,7 +276,7 @@ def _check_rng(rng):
)
def test_integer_index_astype_datetime(self, tz, dtype):
# GH 20997, 20964, 24559
val = [Timestamp("2018-01-01", tz=tz).value]
val = [Timestamp("2018-01-01", tz=tz).as_unit("ns").value]
result = Index(val, name="idx").astype(dtype)
expected = DatetimeIndex(["2018-01-01"], tz=tz, name="idx")
tm.assert_index_equal(result, expected)
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/test_constructors.py
@@ -804,7 +804,7 @@ def test_constructor_timestamp_near_dst(self):
)
def test_constructor_with_int_tz(self, klass, box, tz, dtype):
# GH 20997, 20964
ts = Timestamp("2018-01-01", tz=tz)
ts = Timestamp("2018-01-01", tz=tz).as_unit("ns")
result = klass(box([ts.value]), dtype=dtype)
expected = klass([ts])
assert result == expected
4 changes: 3 additions & 1 deletion pandas/tests/io/json/test_pandas.py
@@ -973,7 +973,9 @@ def test_mixed_timedelta_datetime(self):
ts = Timestamp("20130101")
frame = DataFrame({"a": [td, ts]}, dtype=object)

expected = DataFrame({"a": [pd.Timedelta(td).as_unit("ns").value, ts.value]})
expected = DataFrame(
{"a": [pd.Timedelta(td).as_unit("ns").value, ts.as_unit("ns").value]}
)
result = read_json(frame.to_json(date_unit="ns"), dtype={"a": "int64"})
tm.assert_frame_equal(result, expected, check_index_type=False)

5 changes: 3 additions & 2 deletions pandas/tests/scalar/timedelta/test_arithmetic.py
@@ -99,13 +99,14 @@ def test_td_add_datetimelike_scalar(self, op):
assert result is NaT

def test_td_add_timestamp_overflow(self):
ts = Timestamp("1700-01-01").as_unit("ns")
msg = "Cannot cast 259987 from D to 'ns' without overflow."
with pytest.raises(OutOfBoundsTimedelta, match=msg):
Timestamp("1700-01-01") + Timedelta(13 * 19999, unit="D")
ts + Timedelta(13 * 19999, unit="D")

msg = "Cannot cast 259987 days 00:00:00 to unit='ns' without overflow"
with pytest.raises(OutOfBoundsTimedelta, match=msg):
Timestamp("1700-01-01") + timedelta(days=13 * 19999)
ts + timedelta(days=13 * 19999)

@pytest.mark.parametrize("op", [operator.add, ops.radd])
def test_td_add_td(self, op):
10 changes: 5 additions & 5 deletions pandas/tests/scalar/timestamp/test_arithmetic.py
@@ -38,7 +38,7 @@ def test_overflow_offset_raises(self):
# xref https://github.com/statsmodels/statsmodels/issues/3374
# ends up multiplying really large numbers which overflow

stamp = Timestamp("2017-01-13 00:00:00")
stamp = Timestamp("2017-01-13 00:00:00").as_unit("ns")
offset_overflow = 20169940 * offsets.Day(1)
msg = (
"the add operation between "
@@ -59,7 +59,7 @@ def test_overflow_offset_raises(self):
# xref https://github.com/pandas-dev/pandas/issues/14080
# used to crash, so check for proper overflow exception

stamp = Timestamp("2000/1/1")
stamp = Timestamp("2000/1/1").as_unit("ns")
offset_overflow = to_offset("D") * 100**5

lmsg3 = (
@@ -77,8 +77,8 @@ def test_overflow_offset_raises(self):
def test_overflow_timestamp_raises(self):
# https://github.com/pandas-dev/pandas/issues/31774
msg = "Result is too large"
a = Timestamp("2101-01-01 00:00:00")
b = Timestamp("1688-01-01 00:00:00")
a = Timestamp("2101-01-01 00:00:00").as_unit("ns")
b = Timestamp("1688-01-01 00:00:00").as_unit("ns")

with pytest.raises(OutOfBoundsDatetime, match=msg):
a - b
@@ -239,7 +239,7 @@ def test_add_int_with_freq(self, ts, other):
@pytest.mark.parametrize("shape", [(6,), (2, 3)])
def test_addsub_m8ndarray(self, shape):
# GH#33296
ts = Timestamp("2020-04-04 15:45")
ts = Timestamp("2020-04-04 15:45").as_unit("ns")
other = np.arange(6).astype("m8[h]").reshape(shape)

result = ts + other