-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DataTree.isel and DataTree.sel #9588
Conversation
xarray/core/datatree.py
Outdated
for k in node_indexers: | ||
if k not in node._node_coord_variables and k in node_result.coords: | ||
del node_result.coords[k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is basically saying "if the result of indexing would cause an inherited coordinate to be duplicated, then remove that duplicated coordinate from the result"?
Doesn't this imply an inefficiency, in that we are potentially doing the work to index coordinates that we're only going to drop afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to explain. It is indeed a small source of inefficiency.
xarray/core/datatree.py
Outdated
# TODO: reimplement in terms of map_index_queries(), to avoid | ||
# redundant index look-ups on child nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to my comment above or a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a separate issue.
expected = DataTree.from_dict( | ||
{ | ||
"/": xr.Dataset(coords={"x": 2}), | ||
"/child": xr.Dataset({"foo": 4}), | ||
} | ||
) | ||
actual = tree.isel(x=-1) | ||
assert_equal(actual, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this passes the x
coordinate gets duplicated doesn't it? As the result is a non-indexed scalar. Does that mean this would have failed if you had used the definition of assert_identical
from #9473 to check for duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this should also pass
actual = tree.isel(x=[0])
assert_identical(actual, expected)
because the result will have inherited instead of duplicated coordinates instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote special logic for removing duplicated scalar coordinates to handle just this!
There is no longer a need to implement a different version of inheritance for assert_identical
than for assert_equal
, because we no longer allow any cases in the DataTree data model where inheritance is ambiguous:
- Deduplication of coordinates with an index removes them from child nodes.
- Other coordinates are no longer inherited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations! The comments definitely make it easier to understand the subtleties.
These methods handle dimensions that are not missing on some nodes properly, as discussed in #8949
Adding additional indexing methods (e.g.,
head
,tail
,thin
,interp
,reindex
), should be pretty easy to do in this style.api.rst