Not able to slice dataset using its own coordinate value #1932

Closed
rafa-guedes opened this issue Feb 21, 2018 · 2 comments

rafa-guedes commented Feb 21, 2018

Code Sample, a copy-pastable example if possible

In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('test.nc')
In [3]: ds.sel(time=ds.time[0])  # works
In [4]: ds.sel(time=ds.time[1], method='nearest')  # works
In [5]: ds.sel(time=ds.time[1])  # does not work: raises KeyError
In [6]: ds.time[0]
Out[6]: 
<xarray.DataArray 'time' ()>
array('2018-02-12T06:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2018-02-12T06:00:00
    site     float64 ...
Attributes:
    standard_name:  time

In [7]: ds.time[1]
Out[7]: 
<xarray.DataArray 'time' ()>
array('2018-02-12T06:59:59.999986000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2018-02-12T06:59:59.999986
    site     float64 ...
Attributes:
    standard_name:  time

Problem description

xarray sometimes fails to select data using its own coordinate values: ds.sel(time=ds.time[1]) raises a KeyError, while the same call with method='nearest' succeeds. This looks like a precision problem. Traceback below, test file attached.

In [7]: ds.sel(time=ds.time[1])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-371d2f896b4a> in <module>()
----> 1 ds.sel(time=ds.time[1])

/usr/lib/python2.7/site-packages/xarray/core/dataset.pyc in sel(self, method, tolerance, drop, **indexers)
   1444 
   1445         pos_indexers, new_indexes = indexing.remap_label_indexers(
-> 1446             self, v_indexers, method=method, tolerance=tolerance
   1447         )
   1448         # attach indexer's coordinate to pos_indexers

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in remap_label_indexers(data_obj, indexers, method, tolerance)
    234         else:
    235             idxr, new_idx = convert_label_indexer(index, label,
--> 236                                                   dim, method, tolerance)
    237             pos_indexers[dim] = idxr
    238             if new_idx is not None:

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in convert_label_indexer(index, label, index_name, method, tolerance)
    163                 indexer, new_index = index.get_loc_level(label.item(), level=0)
    164             else:
--> 165                 indexer = get_loc(index, label.item(), method, tolerance)
    166         elif label.dtype.kind == 'b':
    167             indexer = label

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in get_loc(index, label, method, tolerance)
     93 def get_loc(index, label, method=None, tolerance=None):
     94     kwargs = _index_method_kwargs(method, tolerance)
---> 95     return index.get_loc(label, **kwargs)
     96 
     97 

/usr/lib/python2.7/site-packages/pandas/core/indexes/datetimes.pyc in get_loc(self, key, method, tolerance)
   1444                 return Index.get_loc(self, stamp, method, tolerance)
   1445             except KeyError:
-> 1446                 raise KeyError(key)
   1447             except ValueError as e:
   1448                 # list-like tolerance size must match target index size

KeyError: 1518418799999986000L
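
As a quick check on the key in that error, here is a minimal sketch that decodes the integer as nanoseconds since the Unix epoch, using only NumPy. The variable names are illustrative, and the trailing "L" in the message is just the Python 2 long suffix, so it is dropped here.

```python
import numpy as np

# Integer taken from the KeyError message (Python 2 "L" suffix dropped).
key_ns = 1518418799999986000

# Interpret it as nanoseconds since 1970-01-01T00:00:00.
key_time = np.datetime64(key_ns, "ns")
print(key_time)

# Value displayed for ds.time[1] in the session above.
shown = np.datetime64("2018-02-12T06:59:59.999986000")
print(key_time == shown)
```

So the label passed down to pandas is the same value that ds.time[1] displays; the lookup fails because the index does not contain exactly that value.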

Expected Output

Output of xr.show_versions()

In [9]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.15-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8

xarray: 0.10.0
pandas: 0.22.0
numpy: 1.14.0
scipy: 0.17.1
netCDF4: 1.2.9
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.11.1
matplotlib: 2.1.0
cartopy: 0.14.2
seaborn: None
setuptools: 34.2.0
pip: 9.0.1
conda: None
pytest: 3.3.1
IPython: 5.2.2
sphinx: None

test.zip

@fujiisoup
Member

@rafa-guedes, thank you for reporting this.
I can reproduce this in my environment too.

Since we use a pandas index under the hood, I first suspected a pandas issue.
However, I confirmed that the equivalent lookup works in pandas:

import xarray as xr
ds = xr.open_dataset('test.nc')
se = ds['wdir'].to_series()
se.loc[se.index[0]]  # works
se.loc[se.index[1]]  # also works

So this looks like an xarray issue.

fujiisoup added the bug label Feb 21, 2018

fujiisoup commented Feb 21, 2018

I found a discrepancy between ds['time'][1] and ds.get_index('time')[1]:

In [13]: ds.get_index('time')[1]
Out[13]: Timestamp('2018-02-12 06:59:59.999986560')

In [14]: ds['time'][1].values
Out[14]: numpy.datetime64('2018-02-12T06:59:59.999986000')

The last three digits differ: the index holds ...986560, while the DataArray value is ...986000.
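
A 560 ns mismatch like this would be enough to explain the KeyError, since an exact-match lookup needs the label to equal an index entry to the nanosecond. Here is a minimal sketch with plain pandas, assuming a two-element index built by hand from the values shown above rather than read from test.nc (all variable names are illustrative). The nearest-match lookup uses get_indexer, because recent pandas versions no longer accept a method argument in get_loc.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for ds.get_index('time'); NOT read from test.nc.
values = np.array(
    ["2018-02-12T06:00:00", "2018-02-12T06:59:59.999986560"],
    dtype="datetime64[ns]",
)
index = pd.DatetimeIndex(values)

exact = index[1]                                           # label as stored in the index
truncated = pd.Timestamp("2018-02-12 06:59:59.999986000")  # label as seen via ds['time'][1]

# Exact lookup with the index's own label succeeds.
exact_pos = index.get_loc(exact)

# Exact lookup with the truncated label fails, as in the traceback.
try:
    index.get_loc(truncated)
    truncated_found = True
except KeyError:
    truncated_found = False

# A nearest-match lookup tolerates the 560 ns difference.
nearest_pos = index.get_indexer([truncated], method="nearest")[0]

print(exact_pos, truncated_found, nearest_pos)
print(exact - truncated)
```

This is consistent with the report: selecting with method='nearest' works, while exact selection fails for any label that does not match the index to the nanosecond.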
