Difficulties with selecting from numpy.datetime64[ns] dimensions #7207

Closed
grahamfindlay opened this issue Oct 24, 2022 · 3 comments

@grahamfindlay

What is your issue?

I have a DataArray ("spgs") containing time-frequency data, with a time dimension of dtype numpy.datetime64[ns]. I used to be able to select using:

# Select using datetime strings
spgs.sel(time=slice("2022-10-13T09:00:00", "2022-10-13T21:00:00"))
# Select using Timestamp objects
rng = tuple(pd.to_datetime(x) for x in ["2022-10-13T09:00:00", "2022-10-13T21:00:00"])
spgs.sel(time=slice(*rng))
# Select using numpy.datetime64[ns] objects, such that rng[0].dtype == spgs.time.values.dtype
rng = tuple(pd.to_datetime(["2022-10-13T09:00:00", "2022-10-13T21:00:00"]).values)
spgs.sel(time=slice(*rng))

None of these work after upgrading to v2022.10.0. The first method yields:

Traceback (most recent call last):
 File "<string>", line 1, in <module>
 File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1523, in sel
   ds = self._to_temp_dataset().sel(
 File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataset.py", line 2550, in sel
   query_results = map_index_queries(
 File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexing.py", line 183, in map_index_queries
   results.append(index.sel(labels, **options))  # type: ignore[call-arg]
 File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 434, in sel
   indexer = _query_slice(self.index, label, coord_name, method, tolerance)
 File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 210, in _query_slice
   raise KeyError(
KeyError: "cannot represent labeled-based slice indexer for coordinate 'time' with a slice over integer positions; the index is unsorted or non-unique"

The second and third methods yield:

Traceback (most recent call last):
 File "pandas/_libs/index.pyx", line 545, in pandas._libs.index.DatetimeEngine.get_loc
 File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
 File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1665651600000000000
...
KeyError: Timestamp('2022-10-13 09:00:00')

Interestingly, this works:

start = spgs.time.values.min()
stop = spgs.time.values.max()
spgs.sel(time=slice(start, stop))

This does not:

start = spgs.time.values.min()
stop = start + pd.to_timedelta('10s')
spgs.sel(time=slice(start, stop))

I filed this as an issue and not a bug, because from reading other issues here and over at pandas, it seems like this may be an unintended consequence of changes to Datetime/Timestamp handling, especially within pandas, rather than a bug with xarray per se. This is supported by the fact that downgrading xarray to 2022.9.0, without touching other dependencies (e.g. pandas), does not restore the old behavior.

grahamfindlay added the "needs triage" label on Oct 24, 2022
@grahamfindlay

Update: I was mistaken about the nature of the issue. I can load someone else's data with a datetime64[ns] dimension and select from it just fine. Meanwhile, I cannot select from my DataArray, even when I have replaced the datetime64[ns] dimension time with float64 values.

>>> spgs = xr.open_dataarray("mydata.nc")
>>> print(spgs)

<xarray.DataArray (channel: 5, frequency: 2049, time: 52549)>
[538364505 values with dtype=float32]
Coordinates:
  * frequency  (frequency) float32 0.0 0.1526 0.3052 ... 312.2 312.3 312.5
  * time       (time) float64 6.438 9.714 12.99 ... 1.729e+05 1.729e+05
  * channel    (channel) object 'lmws' 'spws' 'rips' 'spin' 'dgws'
    x          (channel, time) float64 ...
    y          (channel, time) float64 ...
Attributes:
    units:    uV
    fs:       625.0008026193788

>>> spgs.sel(time=slice(20, 30))
KeyError: 20

@grahamfindlay

Okay, it gets even weirder. This does not work:

>>> dat = np.arange(spgs.time.values.size).astype('float')
>>> foo = xr.DataArray(dat, dims=("time",), coords={"time": spgs.time.values})
>>> print(foo)

<xarray.DataArray (time: 52549)>
array([0.0000e+00, 1.0000e+00, 2.0000e+00, ..., 5.2546e+04, 5.2547e+04,
       5.2548e+04])
Coordinates:
  * time     (time) float64 6.438 9.714 12.99 ... 1.729e+05 1.729e+05 1.729e+05

>>> foo.sel(time=slice(20, 30)) # KeyError

This does:

>>> foo = xr.DataArray(dat, dims=("time",), coords={"time": dat})
>>> print(foo)

<xarray.DataArray (time: 52549)>
array([0.0000e+00, 1.0000e+00, 2.0000e+00, ..., 5.2546e+04, 5.2547e+04,
       5.2548e+04])
Coordinates:
  * time     (time) float64 0.0 1.0 2.0 3.0 ... 5.255e+04 5.255e+04 5.255e+04
  
>>> foo.sel(time=slice(20, 30))
  
<xarray.DataArray (time: 11)>
array([20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.])
Coordinates:
  * time     (time) float64 20.0 21.0 22.0 23.0 24.0 ... 27.0 28.0 29.0 30.0

I am baffled.

@grahamfindlay

Figured it out. Somehow I got my hands on a piece of data where the timestamps are ever-so-slightly non-monotonic. 2 of the 50k data points are out of order. That makes this a duplicate of #5012 and pandas #42331. Closing this, since those are still open. I would never have figured out what the problem was from the error message 😢, but that's a pandas issue. Sorry for the confusion.
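
For anyone who lands here with the same KeyError, the diagnosis above can be sketched roughly as follows. This is a minimal illustration, not the original data: the toy `time` array, the variable names, and the choice to repair the coordinate with `sortby` are all assumptions made for the example. Sorting is only one possible remedy; if out-of-order timestamps point to an acquisition problem, it may be better to inspect or drop the offending samples instead.

```python
import numpy as np
import xarray as xr

# Hypothetical toy coordinate: one pair of values (3.0, 2.5) is out of order,
# standing in for the "2 of the 50k data points" described above.
time = np.array([0.0, 1.5, 3.0, 2.5, 4.0, 5.5])
da = xr.DataArray(np.arange(time.size), dims=("time",), coords={"time": time})

# 1. Diagnose: ask the underlying pandas index whether it is sorted.
index = da.indexes["time"]
print("monotonic increasing:", index.is_monotonic_increasing)

# Positions i where time[i + 1] < time[i], i.e. where the order breaks.
bad = np.flatnonzero(np.diff(time) < 0)
print("order breaks after positions:", bad)

# 2. Reproduce: slicing with labels that are not present in an unsorted
#    index is expected to fail with a KeyError like the ones in this thread.
try:
    da.sel(time=slice(2.0, 5.0))
except KeyError as err:
    print("KeyError:", err)

# 3. One possible fix: sort along the coordinate, then slice by label.
da_sorted = da.sortby("time")
subset = da_sorted.sel(time=slice(2.0, 5.0))
print(subset.time.values)
```

The check on `is_monotonic_increasing` is cheap, so it can be worth running before any label-based slice on data whose timestamps come from an external source.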

kmuehlbauer added the "upstream issue" label and removed the "needs triage" label on Nov 10, 2024