Add pointwise indexing via isel_points method #481
Conversation
This provides behavior equivalent to numpy slicing with multiple lists.

Example
-------

>>> da = xray.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
>>> da
<xray.DataArray (x: 7, y: 8)>
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6
  * y        (y) int64 0 1 2 3 4 5 6 7
>>> da.isel_points(x=[0, 1, 6], y=[0, 1, 0])
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 1 0
    x        (points) int64 0 1 6
  * points   (points) int64 0 1 2

related: #475
# all the indexers should be iterables
keys = indexers.keys()
indexers = [(k, ([v] if not isinstance(v, Sequence) else v))
Probably better to raise an error if someone tries to pass in something that isn't a sequence? I would probably coerce with np.asarray and then raise if v.dtype.kind != 'i' or v.ndim != 1.
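For illustration, a minimal sketch of that coercion and validation (the helper name is hypothetical, not code from this PR):

import numpy as np

def _as_pointwise_indexer(k, v):
    # hypothetical helper: coerce to an array and reject anything that is
    # not a 1d array of integers (slices and scalars get rejected too)
    v = np.asarray(v)
    if v.dtype.kind != 'i':
        raise TypeError('indexer for dimension %r must be integers' % k)
    if v.ndim != 1:
        raise ValueError('indexer for dimension %r must be 1-dimensional' % k)
    return v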
My thought was that you may not know the shape of x and y a priori but am happy to remove this. After thinking about how numpy treats scalar vs array indices (example below), I think it is best to remove this and raise an error.
In [1]: x = np.arange(12).reshape((3, 4))
In [2]: x[2, 3]
Out[2]: 11
In [3]: x[[2], [3]]
Out[3]: array([11])
I've been trying to implement this and I'm not getting it to work. Specifically, how do we want to handle slice objects? As I think about it more, I'm not sure how my first commit was working with slice objects as indexers.
Any thoughts on the best way to handle this?
I would raise an error with slice objects, too. 1d arrays of integers is enough functionality.
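A minimal sketch of such a check (hypothetical helper, shown only to make the suggestion concrete):

def _reject_slice_indexers(indexers):
    # hypothetical helper: fail fast on slices, since np.asarray(slice(...))
    # would otherwise produce a 0-d object array rather than a clear error
    for k, v in indexers.items():
        if isinstance(v, slice):
            raise TypeError('isel_points received a slice for dimension %r; '
                            'only 1d arrays of integers are supported' % k)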
Okay. That is done in my last commit so no further change needed here.
actual = da.isel_points(y=y, x=x, dim='test_coord')
assert 'test_coord' in actual.coords
assert actual.coords['test_coord'].shape == (len(y), )
I would also verify that x and y are still coordinates, along the test_coord dimension. Probably easier just to construct the expected data-array and then compare them with self.assertDataArrayIdentical.
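Roughly, such a check could look like the following (a sketch with plain asserts; inside the test suite the comparison would go through self.assertDataArrayIdentical):

import numpy as np
import xray  # the library later renamed to xarray

da = xray.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
x = [0, 1, 6]
y = [0, 1, 0]
actual = da.isel_points(x=x, y=y, dim='test_coord')

# x and y should survive as coordinates along the new dimension
assert actual.coords['x'].dims == ('test_coord',)
assert actual.coords['y'].dims == ('test_coord',)

# and the values should match plain numpy pointwise indexing
np.testing.assert_array_equal(actual.values, da.values[x, y])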
done.
Very nice! Really looking forward to this one 👍.
return concat([self.isel(**d) for d in
               [dict(zip(keys, inds)) for inds in
                zip(*[v for k, v in indexers])]],
              dim=dim)
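To make the nested comprehension easier to follow, here is what it expands to for the indexers from the example above (plain Python, shown only for illustration):

keys = ['x', 'y']
indexers = [('x', [0, 1, 6]), ('y', [0, 1, 0])]
per_point = [dict(zip(keys, inds)) for inds in zip(*[v for k, v in indexers])]
print(per_point)
# [{'x': 0, 'y': 0}, {'x': 1, 'y': 1}, {'x': 6, 'y': 0}]
# i.e. one self.isel(**d) call per point, concatenated along `dim`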
Let's explicitly provide the coords and data_vars arguments to concat:
indexer_dims = set(indexers)

def relevant_keys(mapping):
    return [k for k, v in mapping.items()
            if any(d in indexer_dims for d in v.dims)]

data_vars = relevant_keys(self.data_vars)
coords = relevant_keys(self.coords)
This means concat doesn't need to look at any data to figure out which variables should be concatenated (vs. variables which were not indexed).
The test case here would be a dataset that has a constant variable, e.g.,
ds = xray.Dataset({'x': range(5), 'y': 0})
If you do ds.isel_points(x=[0, 1, 2]), y should still be a scalar.
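In other words, roughly (a sketch based on the example above; the assertions are illustrative, not the exact test that was added):

import xray  # the library later renamed to xarray

ds = xray.Dataset({'x': range(5), 'y': 0})
result = ds.isel_points(x=[0, 1, 2])

# y was never indexed along x, so it should stay a scalar rather than be
# broadcast along the new 'points' dimension
assert result['y'].dims == ()
assert result['x'].dims == ('points',)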
test added in test_dataset.py
----------
dim : str or DataArray or pandas.Index, optinal
    Name of the dimension to concatenate along. This can either be a
    new dimension name, in which case it is added along axis=0, or an
Existing dimension names are not valid choices for sel_points, I think. Actually, that's probably an edge case worth writing a test for.
done.
You still need to update the docstring here.
raise ValueError('All indexers must be the same length')

# Existing dimensions are not valid choices for the dim argument
if dim in self.dims:
This will fail if you pass in an array for the dimension argument -- which also still needs a test. concat might actually already do reasonable error handling here. If not, you'll need to make use of _calc_concat_dim_coord: https://github.com/xray/xray/blob/cb51b359d213e711208f2747ffc5ab4acb25dc4d/xray/core/combine.py#L118-L137
Indeed. Happy to write that test. However, I'm actually having trouble picturing the use case here. Do you have an example?
I have an xray Dataset ds_stations with station data with dimensions (station, time). Now I pointwise index into a 2D grid with something like: ds_grid.sel_points(latitude=ds_stations.latitude, longitude=ds_stations.longitude, dim=ds_stations.station, method='nearest').
Now the station dimension in the result is labeled by station IDs, not sequential integers.
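A sketch of that workflow with made-up data (the station names, coordinates, and variable names are all illustrative, and it assumes the label-based sel_points counterpart discussed in this thread):

import numpy as np
import xray  # the library later renamed to xarray

# made-up station metadata along a labeled 'station' dimension
ds_stations = xray.Dataset(
    {'latitude': ('station', [43.6, 39.7, 44.9]),
     'longitude': ('station', [-116.2, -105.0, -123.0])},
    coords={'station': ['boise', 'denver', 'salem']})

# a regular latitude/longitude grid to sample at the station locations
ds_grid = xray.Dataset(
    {'t_model': (('latitude', 'longitude'), np.random.rand(5, 7))},
    coords={'latitude': np.linspace(35.0, 50.0, 5),
            'longitude': np.linspace(-125.0, -100.0, 7)})

# pointwise, nearest-neighbor selection; the result's new dimension is
# labeled by the station IDs instead of 0..n-1
ds_points = ds_grid.sel_points(
    latitude=ds_stations.latitude, longitude=ds_stations.longitude,
    dim=ds_stations.station, method='nearest')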
On the other hand, I might not even need the dim argument in this case, given that the metadata is already contained in the ds_stations.latitude and ds_stations.longitude DataArray objects.
@shoyer - This last commit should handle the …
@@ -134,6 +135,21 @@ __ http://legacy.python.org/dev/peps/pep-0472/

# this is safe
arr[dict(space=0)] = 0

Pointwise indexing
--------------------------------
you need to have the same number of dashes as the length of the line above.
Parameters
----------
dim : str or DataArray or pandas.Index or other list-like object, optinal
spelling: optinal -> optional
This needs a couple of small fixes to the docs but otherwise looks ready to merge! Something to consider: maybe indexing like … We could certainly add that later, and it makes more sense once we have …
Okay, thanks. I made some minor revisions to the docs and docstrings. I think we should wait on the automatic discovery of the dimension name. I think adding …
Add pointwise indexing via isel_points method
woohoo! thanks again for putting this together :)