Continue to use nanosecond-precision Timestamps in precision-sensitive areas #7731

Merged on Apr 13, 2023 (12 commits)
2 changes: 1 addition & 1 deletion ci/requirements/all-but-dask.yml
@@ -25,7 +25,7 @@ dependencies:
- numbagg
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pseudonetcdf
2 changes: 1 addition & 1 deletion ci/requirements/doc.yml
@@ -19,7 +19,7 @@ dependencies:
- numba
- numpy>=1.21,<1.24
- packaging>=21.3
- pandas>=1.4,<2
- pandas>=1.4
- pooch
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment-py311.yml
@@ -27,7 +27,7 @@ dependencies:
- numexpr
- numpy
- packaging
- pandas<2
- pandas
- pint
- pip
- pooch
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows-py311.yml
@@ -24,7 +24,7 @@ dependencies:
# - numbagg
- numpy
- packaging
- pandas<2
- pandas
- pint
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows.yml
@@ -24,7 +24,7 @@ dependencies:
- numbagg
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment.yml
@@ -27,7 +27,7 @@ dependencies:
- numexpr
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pooch
16 changes: 11 additions & 5 deletions doc/user-guide/weather-climate.rst
@@ -57,14 +57,14 @@ CF-compliant coordinate variables

.. _CFTimeIndex:

Non-standard calendars and dates outside the Timestamp-valid range
------------------------------------------------------------------
Non-standard calendars and dates outside the nanosecond-precision range
-----------------------------------------------------------------------

Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `Timestamp-valid range`_
using a standard calendar, but outside the `nanosecond-precision range`_
(approximately between years 1678 and 2262).

.. note::
@@ -75,13 +75,19 @@ using a standard calendar, but outside the `Timestamp-valid range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range.
- Any dates are outside the nanosecond-precision range.

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.

As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
values. For the time being, xarray still automatically casts datetime values
to nanosecond-precision for backwards compatibility with older pandas
versions; however, this is something we would like to relax going forward.
See :issue:`7493` for more discussion.

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:
@@ -235,6 +241,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

da.resample(time="81T", closed="right", label="right", offset="3T").mean()

.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
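
As a rough check on the "approximately between years 1678 and 2262" figure quoted in the documentation change above, the sketch below derives the bounds from the signed 64-bit nanosecond counter behind `datetime64[ns]`. It uses only NumPy; the variable names are illustrative and not part of this PR.

```python
import numpy as np

# datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01.
# The smallest int64 value is reserved for NaT, so the lower bound is min + 1.
int64_info = np.iinfo(np.int64)
lower = np.datetime64(int(int64_info.min) + 1, "ns")
upper = np.datetime64(int(int64_info.max), "ns")

# Truncate to year resolution to compare with the range quoted in the docs.
lower_year = str(lower.astype("datetime64[Y]"))
upper_year = str(upper.astype("datetime64[Y]"))

# Half-width of the range in (Gregorian mean) years on either side of 1970.
half_span_years = 2**63 / 1e9 / (365.2425 * 86400)

print(lower_year, upper_year, round(half_span_years, 1))
```

The lower bound falls late in 1677, which is why the documentation gives 1678 as the first full year that is safely inside the range.
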
2 changes: 1 addition & 1 deletion setup.cfg
@@ -76,7 +76,7 @@ include_package_data = True
python_requires = >=3.9
install_requires =
numpy >= 1.21 # recommended to use >= 1.22 for full quantile method support
pandas >= 1.4, <2
pandas >= 1.4
packaging >= 21.3

[options.extras_require]
13 changes: 10 additions & 3 deletions xarray/coding/cftime_offsets.py
@@ -57,7 +57,12 @@
format_cftime_datetime,
)
from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like
from xarray.core.pdcompat import NoDefault, count_not_none, no_default
from xarray.core.pdcompat import (
NoDefault,
count_not_none,
nanosecond_precision_timestamp,
no_default,
)
from xarray.core.utils import emit_user_level_warning

try:
@@ -1286,8 +1291,10 @@ def date_range_like(source, calendar, use_cftime=None):
if is_np_datetime_like(source.dtype):
# We want to use datetime fields (datetime64 objects don't have them)
source_calendar = "standard"
source_start = pd.Timestamp(source_start)
source_end = pd.Timestamp(source_end)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
source_start = nanosecond_precision_timestamp(source_start)
source_end = nanosecond_precision_timestamp(source_end)
else:
if isinstance(source, CFTimeIndex):
source_calendar = source.calendar
2 changes: 1 addition & 1 deletion xarray/coding/cftimeindex.py
@@ -613,7 +613,7 @@ def to_datetimeindex(self, unsafe=False):
------
ValueError
If the CFTimeIndex contains dates that are not possible in the
standard calendar or outside the pandas.Timestamp-valid range.
standard calendar or outside the nanosecond-precision range.

Warns
-----
20 changes: 16 additions & 4 deletions xarray/coding/times.py
@@ -23,6 +23,7 @@
from xarray.core import indexing
from xarray.core.common import contains_cftime_datetimes, is_np_datetime_like
from xarray.core.formatting import first_n_items, format_timestamp, last_item
from xarray.core.pdcompat import nanosecond_precision_timestamp
from xarray.core.pycompat import is_duck_dask_array
from xarray.core.variable import Variable

@@ -224,7 +225,9 @@ def _decode_datetime_with_pandas(
delta, ref_date = _unpack_netcdf_time_units(units)
delta = _netcdf_to_numpy_timeunit(delta)
try:
ref_date = pd.Timestamp(ref_date)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
ref_date = nanosecond_precision_timestamp(ref_date)
except ValueError:
# ValueError is raised by pd.Timestamp for non-ISO timestamp
# strings, in which case we fall back to using cftime
@@ -391,7 +394,9 @@ def infer_datetime_units(dates) -> str:
dates = to_datetime_unboxed(dates)
dates = dates[pd.notnull(dates)]
reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
reference_date = pd.Timestamp(reference_date)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
reference_date = nanosecond_precision_timestamp(reference_date)
else:
reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
reference_date = format_cftime_datetime(reference_date)
@@ -432,14 +437,16 @@ def cftime_to_nptime(times, raise_on_invalid: bool = True) -> np.ndarray:
If raise_on_invalid is True (default), invalid dates trigger a ValueError.
Otherwise, the invalid element is replaced by np.NaT."""
times = np.asarray(times)
# TODO: the strict enforcement of nanosecond precision datetime values can
# be relaxed when addressing GitHub issue #7493.
new = np.empty(times.shape, dtype="M8[ns]")
for i, t in np.ndenumerate(times):
try:
# Use pandas.Timestamp in place of datetime.datetime, because
# NumPy casts it safely to np.datetime64[ns] for dates outside
# 1678 to 2262 (this is not currently the case for
# datetime.datetime).
dt = pd.Timestamp(
dt = nanosecond_precision_timestamp(
t.year, t.month, t.day, t.hour, t.minute, t.second, t.microsecond
)
except ValueError as e:
@@ -498,6 +505,10 @@ def convert_time_or_go_back(date, date_type):

This is meant to convert end-of-month dates into a new calendar.
"""
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
if date_type == pd.Timestamp:
date_type = nanosecond_precision_timestamp
try:
return date_type(
date.year,
@@ -641,7 +652,8 @@ def encode_cf_datetime(

delta_units = _netcdf_to_numpy_timeunit(delta)
time_delta = np.timedelta64(1, delta_units).astype("timedelta64[ns]")
ref_date = pd.Timestamp(_ref_date)

ref_date = nanosecond_precision_timestamp(_ref_date)

# If the ref_date Timestamp is timezone-aware, convert to UTC and
# make it timezone-naive (GH 2649).
13 changes: 13 additions & 0 deletions xarray/core/pdcompat.py
@@ -39,6 +39,7 @@
from typing import Literal

import pandas as pd
from packaging.version import Version

from xarray.coding import cftime_offsets

@@ -91,3 +92,15 @@ def _convert_base_to_offset(base, freq, index):
return base * freq.as_timedelta() // freq.n
else:
raise ValueError("Can only resample using a DatetimeIndex or CFTimeIndex.")


def nanosecond_precision_timestamp(*args, **kwargs):
    """Return a nanosecond-precision Timestamp object.

    Note this function should no longer be needed after addressing GitHub issue
    #7493.
    """
    if Version(pd.__version__) >= Version("2.0.0"):
        return pd.Timestamp(*args, **kwargs).as_unit("ns")
    else:
        return pd.Timestamp(*args, **kwargs)
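
To illustrate what the helper above is meant to guarantee, here is a minimal, self-contained sketch. The name `to_ns_timestamp` is hypothetical, and it checks for the `as_unit` method rather than comparing version strings as the PR does, purely to avoid the `packaging` import; the intended behavior is the same. It shows that an in-range date comes back with nanosecond resolution, and that an out-of-range date still raises a `ValueError` subclass, which is what the `except ValueError` fallbacks in `xarray/coding/times.py` rely on.

```python
import pandas as pd


def to_ns_timestamp(*args, **kwargs):
    """Hypothetical stand-in for nanosecond_precision_timestamp (illustration only)."""
    ts = pd.Timestamp(*args, **kwargs)
    # Assumption: as_unit is only available on pandas >= 2.0; older versions
    # always produce nanosecond-precision Timestamps, so no cast is needed.
    return ts.as_unit("ns") if hasattr(ts, "as_unit") else ts


# An in-range date is returned with nanosecond resolution on any pandas version.
in_range = to_ns_timestamp("2000-01-01")
in_range_dtype = str(in_range.to_datetime64().dtype)

# A date outside the nanosecond-precision range cannot be cast and raises
# OutOfBoundsDatetime, which is a subclass of ValueError.
try:
    to_ns_timestamp(1000, 1, 1)
    out_of_range_raised = False
except ValueError:
    out_of_range_raised = True

print(in_range_dtype, out_of_range_raised)
```

Without the cast, pandas 2.0 and later may construct the year-1000 Timestamp at a coarser resolution instead of raising, so the cftime fallback paths would no longer be triggered.
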