Difference in time coordinate values in xarray tutorial dataset loaded with numpy v2 #9179

Closed
5 tasks done
scottyhq opened this issue Jun 26, 2024 · 2 comments · Fixed by #9182
@scottyhq
Contributor

What happened?

Time coordinate values for the xarray "air temperature" tutorial dataset differ significantly (by seconds to minutes) when numpy v2 is installed in the environment. This leads to discrepancies and errors when selecting by date strings.

Numpy v2.0.0

array(['2013-01-01T00:02:06.757437440', '2013-01-01T05:59:27.234179072',
       '2013-01-01T11:56:47.710920704', ...,
       '2014-12-31T05:58:10.831327232', '2014-12-31T11:55:31.308068864',
       '2014-12-31T18:02:01.540624384'], dtype='datetime64[ns]')

Numpy v1.26.4

array(['2013-01-01T00:00:00.000000000', '2013-01-01T06:00:00.000000000',
       '2013-01-01T12:00:00.000000000', ...,
       '2014-12-31T06:00:00.000000000', '2014-12-31T12:00:00.000000000',
       '2014-12-31T18:00:00.000000000'], dtype='datetime64[ns]')
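For a sense of scale, the offsets between the two outputs can be computed directly from the six timestamps printed above. This is only a sketch using the standard library; the helper name `to_ns` is made up for illustration and is not part of numpy or xarray.

```python
from datetime import datetime


def to_ns(stamp: str) -> int:
    """Convert 'YYYY-MM-DDTHH:MM:SS.nnnnnnnnn' to integer nanoseconds since 1970-01-01."""
    whole, frac = stamp.split(".")
    delta = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S") - datetime(1970, 1, 1)
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + int(frac)


# Values copied from the numpy v2.0.0 output in this report.
numpy2 = [
    "2013-01-01T00:02:06.757437440", "2013-01-01T05:59:27.234179072",
    "2013-01-01T11:56:47.710920704", "2014-12-31T05:58:10.831327232",
    "2014-12-31T11:55:31.308068864", "2014-12-31T18:02:01.540624384",
]
# Values copied from the numpy v1.26.4 output in this report.
numpy1 = [
    "2013-01-01T00:00:00.000000000", "2013-01-01T06:00:00.000000000",
    "2013-01-01T12:00:00.000000000", "2014-12-31T06:00:00.000000000",
    "2014-12-31T12:00:00.000000000", "2014-12-31T18:00:00.000000000",
]

# Signed offset of each numpy v2 timestamp from its numpy v1 counterpart, in seconds.
offsets_s = [(to_ns(a) - to_ns(b)) / 1e9 for a, b in zip(numpy2, numpy1)]
for stamp, offset in zip(numpy1, offsets_s):
    print(f"{stamp}: {offset:+.3f} s")
```

For the printed values the offsets range from roughly half a minute to about four and a half minutes, in both directions.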

What did you expect to happen?

I expect time coordinates to be identical across numpy versions.

Minimal Complete Verifiable Example

# mamba create -n xarray2024.6.0 xarray ipython pooch netCDF4 "numpy>=2"
import xarray as xr
ds = xr.tutorial.load_dataset("air_temperature")
print(ds.time.values)
dates = ['2013-07-09', '2013-10-11', '2013-12-24']
ds.sel(time=dates)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[21], line 2
      1 dates = ['2013-07-09', '2013-10-11', '2013-12-24']
----> 2 ds.sel(time=dates)

File ~/miniforge3/envs/xarray2024.6.0/lib/python3.12/site-packages/xarray/core/dataset.py:3126, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   3058 """Returns a new dataset with each array indexed by tick labels
   3059 along the specified dimension(s).
   3060 
   (...)
   3123 
   3124 """
   3125 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3126 query_results = map_index_queries(
   3127     self, indexers=indexers, method=method, tolerance=tolerance
   3128 )
   3130 if drop:
   3131     no_scalar_variables = {}

File ~/miniforge3/envs/xarray2024.6.0/lib/python3.12/site-packages/xarray/core/indexing.py:192, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    190         results.append(IndexSelResult(labels))
    191     else:
--> 192         results.append(index.sel(labels, **options))
    194 merged = merge_sel_results(results)
    196 # drop dimension coordinates found in dimension indexers
    197 # (also drop multi-index if any)
    198 # (.sel() already ensures alignment)

File ~/miniforge3/envs/xarray2024.6.0/lib/python3.12/site-packages/xarray/core/indexes.py:801, in PandasIndex.sel(self, labels, method, tolerance)
    799     indexer = get_indexer_nd(self.index, label_array, method, tolerance)
    800     if np.any(indexer < 0):
--> 801         raise KeyError(f"not all values found in index {coord_name!r}")
    803 # attach dimension names and/or coordinates to positional indexer
    804 if isinstance(label, Variable):

KeyError: "not all values found in index 'time'"
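The KeyError follows from exact label matching: a bare date string resolves to midnight, and the drifted axis no longer contains any midnight value. A small numpy-only sketch of that effect, using the first three timestamps printed in this report (it does not go through xarray's or pandas' indexing code):

```python
import numpy as np

# First three timestamps as printed in this report for each numpy version.
drifted = np.array(
    [
        "2013-01-01T00:02:06.757437440",
        "2013-01-01T05:59:27.234179072",
        "2013-01-01T11:56:47.710920704",
    ],
    dtype="datetime64[ns]",
)
clean = np.array(
    ["2013-01-01T00:00", "2013-01-01T06:00", "2013-01-01T12:00"],
    dtype="datetime64[ns]",
)

# A bare date string resolves to midnight of that day.
label = np.datetime64("2013-01-01", "ns")

found_clean = bool(np.isin(label, clean))      # exact match exists
found_drifted = bool(np.isin(label, drifted))  # no exact match on the drifted axis

# A nearest-neighbour lookup still lands on the intended entry.
nearest = int(np.abs(drifted - label).argmin())

print(found_clean, found_drifted, nearest)
```

In xarray terms, passing `method="nearest"` to `sel` would sidestep the error in the same way, but that only masks the wrong coordinate values rather than fixing them.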

Anything else we need to know?

xarray-contrib/xarray-tutorial#271

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.0
pandas: 2.2.2
numpy: 2.0.0
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 70.1.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.25.0
sphinx: None

scottyhq added the bug and needs triage labels on Jun 26, 2024
@keewis
Collaborator

keewis commented Jun 26, 2024

this is most likely due to the changed casting rules in numpy>=2: adding

    elif flat_num_dates.dtype.kind in "f":
        flat_num_dates = flat_num_dates.astype(np.float64)

just after

    if flat_num_dates.dtype.kind in "iu":
        flat_num_dates = flat_num_dates.astype(np.int64)

results in even hours.
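To illustrate why widening to float64 matters, here is a minimal sketch, not xarray's actual decoding code. It assumes the tutorial file stores time as float32 values in "hours since 1800-01-01" (an assumption, not stated in this thread), and the helpers `widen` and `decode` are hypothetical names. Scaling the hour offsets to nanoseconds in float32 leaves a grid spacing of 2**39 ns (about 9 minutes) at this magnitude, so results can be off by several minutes, whereas float64 represents them exactly.

```python
import numpy as np

NS_PER_HOUR = 3_600_000_000_000
EPOCH = np.datetime64("1800-01-01", "ns")  # assumed reference date of the encoding


def widen(num_dates: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the two dtype branches quoted above."""
    if num_dates.dtype.kind in "iu":
        return num_dates.astype(np.int64)
    elif num_dates.dtype.kind in "f":
        return num_dates.astype(np.float64)
    return num_dates


def decode(num_dates: np.ndarray) -> np.ndarray:
    """Scale hour offsets to nanoseconds in the array's own dtype, then add the epoch."""
    scaled = num_dates * num_dates.dtype.type(NS_PER_HOUR)
    return EPOCH + scaled.astype(np.int64).astype("timedelta64[ns]")


# Hour offsets of the first four 6-hourly steps of 2013, stored as float32.
first = (np.datetime64("2013-01-01", "h") - np.datetime64("1800-01-01", "h")).astype(np.int64)
hours_f32 = (first + 6 * np.arange(4)).astype(np.float32)

print(decode(hours_f32))         # float32 arithmetic: off by up to a few minutes
print(decode(widen(hours_f32)))  # float64 arithmetic: exact 6-hourly steps
```

Under these assumptions the float32 path starts at 2013-01-01T00:02:06.757437440, which coincides with the first value reported for numpy v2.0.0, while the widened path starts exactly at midnight.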

cc @kmuehlbauer, @spencerkclark

keewis removed the needs triage label on Jun 26, 2024
@spencerkclark
Member

Thanks for jumping on this quickly @keewis. I think I agree with your suggested solution.
