-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix in vectorized item assignment #1746
Conversation
ind = Variable(['a'], [0, 1]) | ||
da[dict(x=ind, y=ind)] = 0 | ||
expected = DataArray([[0, 1], [1, 0], [1, 1]], dims=['x', 'y']) | ||
self.assertDataArrayIdentical(expected, da) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future (no need to change this time), you can use assert_identical
, from the newer testing tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always forget this, maybe because I usually make tests based on the existing ones...
(I think it is what many contributors do).
I think we should update entire test cases at some point (similar to #1741), rather than gradual migration.
xarray/core/nputils.py
Outdated
mixed_positions, vindex_positions = _advanced_indexer_subspaces(key) | ||
self._array[key] = np.moveaxis(value, vindex_positions, | ||
mixed_positions) | ||
if np.isscalar(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using utils.is_scalar()
and utils.to_0d_array
for coercing a scalar value
to a numpy array, then using the original code path here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that here we should support broadcasting, rather than some special care of scalars.
xarray/core/variable.py
Outdated
value = Variable(dims[-value.ndim:], value).set_dims(dims).data | ||
value = Variable(dims[-value.ndim:], value) | ||
# broadcast to become assignable | ||
value = value.set_dims(dims).data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to revert nputils.NumpyVindexAdapter
to the master version and to make a broadcasting here.
Tests are failing due to #1738. |
xarray/core/dataarray.py
Outdated
@@ -484,7 +484,9 @@ def __setitem__(self, key, value): | |||
if isinstance(key, basestring): | |||
self.coords[key] = value | |||
else: | |||
# xarray-style array indexing | |||
# DataArray key -> Variable key | |||
key = {k: v.variable if isinstance(v, DataArray) else v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to enforce consistency for coordinates here? My inclination would be that we should support exactly the same keys in setitem as are valid in getitem. Ideally we should also reuse the same code. That means we should raise errors if there are multiple indexers with inconsistent alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My inclination would be that we should support exactly the same keys in setitem as are valid in getitem.
Reasonable. I will add a validation.
if new_order: | ||
value = duck_array_ops.asarray(value) | ||
if not isinstance(value, Variable): | ||
value = as_compatible_data(value) | ||
if value.ndim > len(dims): | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this special case error message now that we call set_dims below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think value = Variable(dims[-value.ndim:], value)
fails if value.ndim > len(dims)
.
xarray/core/dataarray.py
Outdated
# Coordinates in key, value and self[key] should be consistent. | ||
obj = self[key] | ||
if isinstance(value, DataArray): | ||
assert_coordinate_consistent(value, obj.coords) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking of checking the consistency of coords on each DataArray argument in key
instead. I guess we should probably check both!
xarray/core/dataarray.py
Outdated
# Coordinates in key, value and self[key] should be consistent. | ||
obj = self[key] | ||
if isinstance(value, DataArray): | ||
assert_coordinate_consistent(value, obj.coords) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you use obj.coords.variables
here we can skip the awkward getattr(coords[k], 'variable', coords[k])
above.
xarray/core/coordinates.py
Outdated
for k in obj.dims: | ||
# make sure there are no conflict in dimension coordinates | ||
if (k in coords and k in obj.coords): | ||
coord = getattr(coords[k], 'variable', coords[k]) # Variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to insist that coords
always has the same type (e.g., a dict of with Variable values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's really reasonable. Updated.
I will update docs as this is a backward-incompatible change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to update assignment for .loc
, or does this already handle it? It would be good to at least add tests. We can leave that for another PR if you prefer.
doc/indexing.rst
Outdated
.. note:: | ||
|
||
If an indexer is a :py:meth:`~xarray.DataArray`, its coordinates should not | ||
conflict with the selected subpart of the target array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably note the exception for explicitly indexed dimensions with .loc
/.sel
?
xarray/core/coordinates.py
Outdated
""" | ||
for k in obj.dims: | ||
# make sure there are no conflict in dimension coordinates | ||
if (k in coords and k in obj.coords): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can drop the extra parentheses here inside if
assert_coordinate_consistent(value, obj.coords.variables) | ||
# DataArray key -> Variable key | ||
key = {k: v.variable if isinstance(v, DataArray) else v | ||
for k, v in self._item_key_to_dict(key).items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check that coordinates are consistent on the key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done a few lines above, obj = self[key]
, where in .isel
we check the coordinates in the key
.
But I am wondering this unnecessary indexing, though I think this implementation is the simplest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This could indeed be a significant performance hit. That said, I'm OK leaving this for now, with a TODO note to optimize it later.
@shoyer Thanks for the review.
Done. As I commented above, coordinate validation requires an (practically) unused indexing. |
xarray/core/dataset.py
Outdated
pos_indexers[k] = DataArray(pos_indexers[k], | ||
coords=coords, dims=v.dims) | ||
pos_indexers, new_indexes = remap_label_indexers( | ||
self, method, tolerance, **indexers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only indent by one additional tab for continuation lines.
assert_coordinate_consistent(value, obj.coords.variables) | ||
# DataArray key -> Variable key | ||
key = {k: v.variable if isinstance(v, DataArray) else v | ||
for k, v in self._item_key_to_dict(key).items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This could indeed be a significant performance hit. That said, I'm OK leaving this for now, with a TODO note to optimize it later.
Looks good to me. @fujiisoup please hit the big green button when you're ready! |
git diff upstream/master **/*py | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new APIFound bugs in
nputils.NumpyVindexAdapter.__setitem__
andDataArray.__setitem__
.I will add more tests later.
Test case suggestions would be appreciated.