
Can't select time with str after converting the calendar to noleap #9138

Closed
5 tasks done
juliettelavoie opened this issue Jun 19, 2024 · 8 comments · Fixed by #9192

Comments

@juliettelavoie

What happened?

I am trying to select a time period after converting the calendar to noleap and I receive the following error:

TypeError: '<' not supported between instances of 'cftime._cftime.DatetimeNoLeap' and 'str'

What did you expect to happen?

I expected the slicing to work as normal.

Minimal Complete Verifiable Example

import xarray as xr
import pandas as pd
import numpy as np

time = xr.cftime_range('2000-01-01', '2006-01-01', freq='D', calendar='noleap')
temperature = np.ones(len(time))
da = xr.DataArray(
    data=temperature,
    dims=["time"],
    coords=dict(time=time),
)
out = da.sel(time=slice('2001', '2002'))  # works if the array is built with cftime
da1 = da.convert_calendar('noleap')
out1 = da1.sel(time=slice('2001', '2002'))  # works if the array is built with cftime and converted

time = pd.date_range('2000-01-01', '2006-01-01', freq='D')
temperature = np.ones(len(time))
da2 = xr.DataArray(
    data=temperature,
    dims=["time"],
    coords=dict(time=time),
)
out2 = da2.sel(time=slice('2001', '2002'))  # works if the array is built with pandas time
da3 = da2.convert_calendar('noleap')
out3 = da3.sel(time=slice('2001', '2002'))  # fails if the array is built with pandas time and converted to noleap

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '2001'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:6798, in Index.get_slice_bound(self, label, side)
   6797 try:
-> 6798     slc = self.get_loc(label)
   6799 except KeyError as err:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.

KeyError: '2001'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[1], line 24
     22 out2=da2.sel(time=slice('2001','2002')) #  # works if array is built with pandas time
     23 da3=da2.convert_calendar('noleap')
---> 24 out3=da3.sel(time=slice('2001','2002')) # fails if array is built with pandas time and convert to noleap

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1644, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1528 def sel(
   1529     self,
   1530     indexers: Mapping[Any, Any] | None = None,
   (...)
   1534     **indexers_kwargs: Any,
   1535 ) -> Self:
   1536     """Return a new DataArray whose data is given by selecting index
   1537     labels along the specified dimension(s).
   1538 
   (...)
   1642     Dimensions without coordinates: points
   1643     """
-> 1644     ds = self._to_temp_dataset().sel(
   1645         indexers=indexers,
   1646         drop=drop,
   1647         method=method,
   1648         tolerance=tolerance,
   1649         **indexers_kwargs,
   1650     )
   1651     return self._from_temp_dataset(ds)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:3126, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   3058 """Returns a new dataset with each array indexed by tick labels
   3059 along the specified dimension(s).
   3060 
   (...)
   3123 
   3124 """
   3125 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3126 query_results = map_index_queries(
   3127     self, indexers=indexers, method=method, tolerance=tolerance
   3128 )
   3130 if drop:
   3131     no_scalar_variables = {}

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/indexing.py:192, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    190         results.append(IndexSelResult(labels))
    191     else:
--> 192         results.append(index.sel(labels, **options))
    194 merged = merge_sel_results(results)
    196 # drop dimension coordinates found in dimension indexers
    197 # (also drop multi-index if any)
    198 # (.sel() already ensures alignment)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/indexes.py:758, in PandasIndex.sel(self, labels, method, tolerance)
    755 coord_name, label = next(iter(labels.items()))
    757 if isinstance(label, slice):
--> 758     indexer = _query_slice(self.index, label, coord_name, method, tolerance)
    759 elif is_dict_like(label):
    760     raise ValueError(
    761         "cannot use a dict-like object for selection on "
    762         "a dimension that does not have a MultiIndex"
    763     )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/indexes.py:497, in _query_slice(index, label, coord_name, method, tolerance)
    493 if method is not None or tolerance is not None:
    494     raise NotImplementedError(
    495         "cannot use ``method`` argument if any indexers are slice objects"
    496     )
--> 497 indexer = index.slice_indexer(
    498     _sanitize_slice_element(label.start),
    499     _sanitize_slice_element(label.stop),
    500     _sanitize_slice_element(label.step),
    501 )
    502 if not isinstance(indexer, slice):
    503     # unlike pandas, in xarray we never want to silently convert a
    504     # slice indexer into an array indexer
    505     raise KeyError(
    506         "cannot represent labeled-based slice indexer for coordinate "
    507         f"{coord_name!r} with a slice over integer positions; the index is "
    508         "unsorted or non-unique"
    509     )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:6662, in Index.slice_indexer(self, start, end, step)
   6618 def slice_indexer(
   6619     self,
   6620     start: Hashable | None = None,
   6621     end: Hashable | None = None,
   6622     step: int | None = None,
   6623 ) -> slice:
   6624     """
   6625     Compute the slice indexer for input labels and step.
   6626 
   (...)
   6660     slice(1, 3, None)
   6661     """
-> 6662     start_slice, end_slice = self.slice_locs(start, end, step=step)
   6664     # return a slice
   6665     if not is_scalar(start_slice):

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:6879, in Index.slice_locs(self, start, end, step)
   6877 start_slice = None
   6878 if start is not None:
-> 6879     start_slice = self.get_slice_bound(start, "left")
   6880 if start_slice is None:
   6881     start_slice = 0

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:6801, in Index.get_slice_bound(self, label, side)
   6799 except KeyError as err:
   6800     try:
-> 6801         return self._searchsorted_monotonic(label, side)
   6802     except ValueError:
   6803         # raise the original KeyError
   6804         raise err

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/indexes/base.py:6733, in Index._searchsorted_monotonic(self, label, side)
   6731 def _searchsorted_monotonic(self, label, side: Literal["left", "right"] = "left"):
   6732     if self.is_monotonic_increasing:
-> 6733         return self.searchsorted(label, side=side)
   6734     elif self.is_monotonic_decreasing:
   6735         # np.searchsorted expects ascending sort order, have to reverse
   6736         # everything for it to work (element ordering, search side and
   6737         # resulting value).
   6738         pos = self[::-1].searchsorted(
   6739             label, side="right" if side == "left" else "left"
   6740         )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/base.py:1352, in IndexOpsMixin.searchsorted(self, value, side, sorter)
   1348 if not isinstance(values, np.ndarray):
   1349     # Going through EA.searchsorted directly improves performance GH#38083
   1350     return values.searchsorted(value, side=side, sorter=sorter)
-> 1352 return algorithms.searchsorted(
   1353     values,
   1354     value,
   1355     side=side,
   1356     sorter=sorter,
   1357 )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pandas/core/algorithms.py:1329, in searchsorted(arr, value, side, sorter)
   1325     arr = ensure_wrapped_if_datetimelike(arr)
   1327 # Argument 1 to "searchsorted" of "ndarray" has incompatible type
   1328 # "Union[NumpyValueArrayLike, ExtensionArray]"; expected "NumpyValueArrayLike"
-> 1329 return arr.searchsorted(value, side=side, sorter=sorter)

TypeError: '<' not supported between instances of 'cftime._cftime.DatetimeNoLeap' and 'str'

Anything else we need to know?

It was working with version 2024.05.0

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.105.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2024.6.0
pandas: 2.2.1
numpy: 1.24.4
scipy: 1.11.2
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2023.9.2
distributed: 2023.9.2
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: 0.13.1
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.1
setuptools: 68.2.2
pip: 23.2.1
conda: installed
pytest: 7.4.2
mypy: None
IPython: 8.15.0
sphinx: 7.2.6

@juliettelavoie juliettelavoie added bug needs triage Issue that has not been reviewed by xarray team member labels Jun 19, 2024

welcome bot commented Jun 19, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@spencerkclark
Member

Thanks for the well-written report @juliettelavoie—a git bisect suggests this issue was introduced in #9002. I'll try and dig into it more when I get a chance.
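
For reference, a regression like this can be bisected by driving `git bisect run` with a small reproducer script. A minimal sketch of such a script follows (the file name `repro_9138.py` and the exact checks are assumptions for illustration, not the script actually used for the bisect):

```
# repro_9138.py -- exits nonzero when the bug is present, so that
# `git bisect run python repro_9138.py` can locate the offending commit.
import sys

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2006-01-01", freq="D")
da = xr.DataArray(np.ones(len(time)), dims=["time"], coords={"time": time})
da_noleap = da.convert_calendar("noleap")

try:
    da_noleap.sel(time=slice("2001", "2002"))
except TypeError:
    sys.exit(1)  # bad commit: string slicing fails after convert_calendar
sys.exit(0)  # good commit: string slicing works
```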

@spencerkclark spencerkclark added topic-indexing topic-cftime and removed needs triage Issue that has not been reviewed by xarray team member labels Jun 19, 2024
@dcherian
Contributor

@hmaarrfk do you have time to take a look here?

@spencerkclark
Member

spencerkclark commented Jun 29, 2024

I think this is maybe the crux of the issue: convert_calendar initially assigns an array of times that includes NaNs to the time coordinate, which cannot be converted to a CFTimeIndex, and in a subsequent step it drops those values. Previously, after the drop, the index would automatically be converted to a CFTimeIndex, but now it is not:

>>> import cftime; import numpy as np; import xarray as xr
>>> times_with_nan = [cftime.DatetimeNoLeap(2000, 1, 1), np.nan]
>>> da = xr.DataArray(range(2), dims=["time"], coords=[times_with_nan])
>>> da.indexes
Indexes:
    time     Index([2000-01-01 00:00:00, nan], dtype='object', name='time')
>>> da = da.where(da.time.notnull(), drop=True)
>>> da.indexes
Indexes:
    time     Index([2000-01-01 00:00:00], dtype='object', name='time')

It does seem maybe a bit surprising that this worked before. A workaround within convert_calendar would be to reassign the time coordinate after dropping the NaN values, which causes xarray to re-determine whether it can be converted to a CFTimeIndex:

>>> da["time"] = da.time
>>> da.indexes
Indexes:
    time     CFTimeIndex([2000-01-01 00:00:00],
            dtype='object', length=1, calendar='noleap', freq=None)

Or, maybe clearer in this context, we could explicitly cast the index to a CFTimeIndex after the drop, which would raise instead of potentially letting a non-convertible index silently pass through:

>>> da["time"] = xr.CFTimeIndex(da.indexes["time"])
>>> da.indexes
Indexes:
    time     CFTimeIndex([2000-01-01 00:00:00],
            dtype='object', length=1, calendar='noleap', freq=None)
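
Applied to the reproducer from the issue description, the same reassignment trick can serve as a user-side workaround until a fix lands. The following is a self-contained sketch based on the suggestion above, not the merged fix:

```
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2006-01-01", freq="D")
da = xr.DataArray(np.ones(len(time)), dims=["time"], coords={"time": time})
da_noleap = da.convert_calendar("noleap")

# Reassigning the coordinate makes xarray rebuild the index, promoting it
# to a CFTimeIndex so that string-based slicing works again.
da_noleap["time"] = da_noleap.time
out = da_noleap.sel(time=slice("2001", "2002"))  # no longer raises TypeError
```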

@hmaarrfk
Contributor

Sorry, I didn't reply; I'll try to take a look (sometimes GitHub mentions explode).

@hmaarrfk
Contributor

My first feeling is that, given the very high quality of the reproducer, one could conceive of including it as a test and removing the offending optimization in #9002. While it does hurt me, I understand that this is likely the right thing to do to avoid larger breakages.
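
A sketch of what such a test could look like, based on the reproducer in this issue (the merged test is referenced below as `xarray/tests/test_cftimeindex.py::test_cftime_noleap_with_str`; the body here is an illustration, not the merged code):

```
import numpy as np
import pandas as pd
import xarray as xr


def test_cftime_noleap_with_str():
    # Build a standard-calendar array, convert it to noleap, and check that
    # selecting with partial datetime strings still works.
    time = pd.date_range("2000-01-01", "2006-01-01", freq="D")
    da = xr.DataArray(np.ones(len(time)), dims=["time"], coords={"time": time})
    da_noleap = da.convert_calendar("noleap")

    out = da_noleap.sel(time=slice("2001", "2002"))

    # Two full noleap years of daily data.
    assert out.sizes["time"] == 2 * 365
```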

@dcherian
Contributor

Thanks @hmaarrfk, it looks like your PR may have only uncovered some implicit assumptions in the code base.

@spencerkclark
Member

spencerkclark commented Jun 29, 2024

No worries @hmaarrfk. It is somewhat odd that previously xarray would magically convert an Index to a CFTimeIndex once all NaNs were removed. I agree with @dcherian—this might just be something we need to fix within convert_calendar itself.

hmaarrfk added a commit to hmaarrfk/xarray that referenced this issue Jun 29, 2024
See discussion in pydata#9138

This commit and pull request mostly serves as a staging group for a
potential fix.

Test with:

```
pytest xarray/tests/test_cftimeindex.py::test_cftime_noleap_with_str
```
dcherian pushed a commit that referenced this issue Jul 11, 2024
* MRC -- Selecting with string for cftime

See discussion in #9138

This commit and pull request mostly serves as a staging group for a
potential fix.

Test with:

```
pytest xarray/tests/test_cftimeindex.py::test_cftime_noleap_with_str
```

* effectively remove fastpath

* Add docstring

* Revert "effectively remove fastpath"

This reverts commit 0f1a5a2.

* Fix by reassigning coordinate

* Update what's new entry

* Simplify if condition

---------

Co-authored-by: Spencer Clark <[email protected]>