Should indexing be possible on 1D coords, even if not dims? #934

max-sixty · 2016-08-02T14:33:43Z

In [1]: arr = xr.DataArray(np.random.rand(4, 3),
   ...:                    [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                     ('space', ['IA', 'IL', 'IN'])])

In [17]: arr.coords['space2'] = ('space', ['A','B','C'])

In [18]: arr
Out[18]: 
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.05187049,  0.04743067,  0.90329666],
       [ 0.59482538,  0.71014366,  0.86588207],
       [ 0.51893157,  0.49442107,  0.10697737],
       [ 0.16068189,  0.60756757,  0.31935279]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
    space2   (space) |S1 'A' 'B' 'C'

Now try to select on the space2 coord:

In [19]: arr.sel(space2='A')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-eae5e4b64758> in <module>()
----> 1 arr.sel(space2='A')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in sel(self, method, tolerance, **indexers)
    601         """
    602         return self.isel(**indexing.remap_label_indexers(
--> 603             self, indexers, method=method, tolerance=tolerance))
    604 
    605     def isel_points(self, dim='points', **indexers):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataarray.pyc in isel(self, **indexers)
    588         DataArray.sel
    589         """
--> 590         ds = self._to_temp_dataset().isel(**indexers)
    591         return self._from_temp_dataset(ds)
    592 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/xarray/core/dataset.pyc in isel(self, **indexers)
    908         invalid = [k for k in indexers if k not in self.dims]
    909         if invalid:
--> 910             raise ValueError("dimensions %r do not exist" % invalid)
    911 
    912         # all indexers should be int, slice or np.ndarrays

ValueError: dimensions ['space2'] do not exist

Is there an easier way to do this? I couldn't think of anything...

CC @justinkuosixty

fmaussion · 2016-08-02T14:39:15Z

I tried to raise interest in this kind of indexing on the mailing list, without success so far:

https://groups.google.com/forum/#!topic/xarray/KTlG2snZabg

fmaussion · 2016-08-02T14:40:56Z

In your case:

arr.isel(space=(arr.space2=='A'))
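For reference, a self-contained version of this workaround might look like the sketch below. It rebuilds the array from the original post and passes the mask as a plain NumPy boolean array (the `.values` call is an addition here, not part of the one-liner above); the variable names `mask` and `subset` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the array from the original post.
arr = xr.DataArray(
    np.random.rand(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
arr.coords["space2"] = ("space", ["A", "B", "C"])

# Compare the non-dimension coordinate against the label to get a
# boolean mask along 'space', then index positionally with it.
mask = (arr.space2 == "A").values
subset = arr.isel(space=mask)
```

Note that a boolean mask keeps the `space` dimension (here with length 1), whereas a scalar label passed to `.sel` would drop it.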

shoyer · 2016-08-02T16:27:16Z

Yes, this would be nice to support automatically.

Indexing requires constructing a hash table (in the form of a pandas.Index), which we currently cache on xarray.Coordinate variables. Coordinate is a Variable subclass used only for dimension coordinates (maybe we should rename it DimCoordinate or Coordinate1D).

The only material difference between Coordinate and Variable is that Coordinate caches values in the form of a pandas.Index, whereas Variable caches values in the form of a numpy array. This means that Coordinate is currently immutable (because Index is immutable), and that there are some subtle distinctions in how different types of data are stored due to Index vs ndarray differences (basically, keeping things as an index is more efficient for handling native pandas types like Period, but a little less efficient if you don't need indexing).

So there are a few approaches we could take here:

1. Convert 1D coordinates that are not dimensions into a pandas.Index via .to_index() when indexing happens with .sel. This approach would be non-ideal, because we would need to recreate the hash table every time indexing happens.
2. Switch all 1D coordinates (even non-dimensions) to use the Coordinate class. This would be the preferred approach, except it would be a breaking change because it would make them immutable.
3. Cache the result of .to_index() on Variable objects, too, and invalidate it when they are changed with __setitem__. The downside is that it makes Variable a little more complex.
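To make the third option concrete, here is a minimal sketch of the caching idea using only NumPy and pandas. `CachedIndexVariable` is a hypothetical stand-in written for this illustration; it is not xarray's `Variable` class and does not reflect how xarray actually implements anything.

```python
import numpy as np
import pandas as pd


class CachedIndexVariable:
    """Hypothetical 1D variable that lazily caches a pandas.Index."""

    def __init__(self, values):
        self._values = np.asarray(values)
        self._index = None  # the cached hash table, built on demand

    def to_index(self):
        # Build the pandas.Index only on first use, then reuse it.
        if self._index is None:
            self._index = pd.Index(self._values)
        return self._index

    def __setitem__(self, key, value):
        self._values[key] = value
        self._index = None  # mutation invalidates the cached index

    def get_loc(self, label):
        # Label lookup goes through the (possibly cached) index.
        return self.to_index().get_loc(label)


space2 = CachedIndexVariable(["A", "B", "C"])
pos_b = space2.get_loc("B")                # builds and caches the index
cached = space2.to_index()
same_object = space2.to_index() is cached  # repeated calls reuse the cache
space2[1] = "Z"                            # drops the cached index
pos_z = space2.get_loc("Z")                # index is rebuilt from new values
```

The extra state (`_index`) and the invalidation in `__setitem__` are the added complexity the third option refers to; the first option corresponds to calling `pd.Index(...)` on every lookup instead of caching it.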

max-sixty · 2016-08-02T17:48:39Z

That's very clear @shoyer.

I know you've discussed in the past whether indexes are really that different from arrays (they are treated very differently in pandas, for example). To reiterate the above, the only real difference is that one is designed for lookups (and so uses a hash table), and the other is designed for data access (and so mutation is easier).

We try to never use mutation, but our data is not that big, so making a copy is generally OK. But that's probably not the main use case.

Another option (potentially 1b in your list) is to slice the array rather than select from an index, i.e. sugar over @fmaussion's solution above. Not as fast to do multiple times, but simple and probably as fast to do a single time.
Or to add that to the docs.
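As a rough idea of what such sugar could look like, the sketch below wraps the mask-based approach in a helper. `sel_by_coord` is a hypothetical name invented for this example, not an xarray function, and it only handles exact matches on 1D coordinates.

```python
import numpy as np
import xarray as xr


def sel_by_coord(obj, **labels):
    """Hypothetical helper: select by label on any 1D coordinate."""
    indexers = {}
    for name, label in labels.items():
        coord = obj.coords[name]
        if coord.ndim != 1:
            raise ValueError("coordinate %r is not 1D" % name)
        (dim,) = coord.dims
        # Linear scan: no hash table is built, so each call is O(n).
        mask = (coord == label).values
        # Combine masks if several coordinates share a dimension.
        indexers[dim] = indexers[dim] & mask if dim in indexers else mask
    return obj.isel(**indexers)


arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[("time", [0, 1]), ("space", ["IA", "IL", "IN"])],
)
arr.coords["space2"] = ("space", ["A", "B", "C"])

result = sel_by_coord(arr, space2="B")
```

Because the mask is recomputed on every call, this trades repeated-lookup speed for simplicity, which matches the trade-off described above.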

stale · 2019-01-27T04:43:49Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically

shoyer · 2019-01-27T06:49:52Z

This will be part of the explicit indexes refactor (#1603)

shoyer added topic-indexing enhancement labels Aug 2, 2016

shoyer mentioned this issue Mar 29, 2018

slice using non-index coordinates #2028

Closed

stale bot added the stale label Jan 27, 2019

shoyer closed this as completed Jan 27, 2019

aldanor mentioned this issue May 10, 2019

Explicit indexes in xarray's data-model (Future of MultiIndex) #1603

Closed

TomNicholas mentioned this issue Apr 2, 2020

sel along 1D non-index coordinates #3925

Closed

4 tasks
