Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop inheriting non-indexed coordinates for DataTree #9555

Merged
merged 8 commits into from
Oct 6, 2024

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Sep 28, 2024

This is option (4) from #9475 (comment)

@shoyer shoyer requested a review from TomNicholas September 28, 2024 22:13
@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Sep 29, 2024
@shoyer shoyer marked this pull request as ready for review October 1, 2024 00:11
@@ -438,6 +438,7 @@ class DataTree(
_cache: dict[str, Any] # used by _CachedAccessor
_data_variables: dict[Hashable, Variable]
_node_coord_variables: dict[Hashable, Variable]
_node_indexed_coord_variables: dict[Hashable, Variable]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are indexed, should the type here be IndexVariable?

Copy link
Member

@TomNicholas TomNicholas Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we have just _node_coord_variables and which ones are indexed is determined by examining _node_indexes? (So _node_indexed_coord_variables becomes a derived property)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we have

_node_non_indexed_coord_variables
_node_indexed_coord_variables
_node_indexes

The current naming is kind of unclear whether _node_coord_variables includes _node_indexed_coord_variables or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main thinking with this implementation was to make accessing the _coord_variables property linear in the depth of the tree, not linear in the number of coordinate variables (I'll add a comment about this in the code).

I'll play around with an alternative version here that avoids the need to store the extra state variable.

Comment on lines +1285 to +1286
"/": xr.Dataset(coords={"x": [1], "y": 2}),
"/b": xr.Dataset(coords={"z": 3}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're planning to eventually change the API of the Dataset constructor to make it more explicit which coordinates have indexes, it might be nice to at least add comments here to describe the intention.

(sort of related to #8959)

@keewis
Copy link
Collaborator

keewis commented Oct 4, 2024

I guess I should have asked this in the meeting earlier (or maybe you discussed this and I was too tired): do we still need this PR if we have the explicit APIs of map_over_nodes, map_over_datasets(..., inherit_coords="all"), and to_dict(inherit_coords="all")?

When explicitly opting into coordinate inheritance when converting to dataset objects I'd expect people to either be fine with duplication or manually deduplicate in user code, and with map_over_nodes we know which coordinate is inherited from the DataTree objects.

I guess this is somewhat related to inherited coordinates working somewhat like default values right now?

@TomNicholas
Copy link
Member

@keewis this PR eliminates confusing and unnecessary duplication of indexed coordinates, but introduces the annoying asymmetry that indexed coordinates are inherited but non-indexed ones are not. The API ideas you mention were intended to solve the latter problem that this PR would cause, by giving users a way to access even non-indexed coordinates if they really need to.

Without this PR we still would get the weird behaviours in #9475, which would require users to do nasty things such as de-duplicate every coordinate manually every time they do an arithmetic operation.

Your proposal wouldn't really work for arithmetic, where duplication is definitely annoying, and there is no place for the user to alter the inheritance behaviour.

So merging this PR and also adding the API you mention was proposed as the overall best option.

@keewis
Copy link
Collaborator

keewis commented Oct 5, 2024

I was thinking we'd redefine those operations on top of map_over_nodes, so at least the internal operations would know which coordinate is inherited (not sure how much work that would be). To control the behavior of arithmetic we'd need a new option (there's precedent in arithmetic_join), but since that's global state or a context manager this is definitely harder to read than a function kwarg.

In any case we're restricting new behavior (and are free to relax it later), so at least from that perspective this should be fine.

Comment on lines +519 to +529
@property
def _node_coord_variables_with_index(self) -> Mapping[Hashable, Variable]:
return FilteredMapping(
keys=self._node_indexes, mapping=self._node_coord_variables
)

@property
def _coord_variables(self) -> ChainMap[Hashable, Variable]:
return ChainMap(
self._node_coord_variables, *(p._node_coord_variables for p in self.parents)
self._node_coord_variables,
*(p._node_coord_variables_with_index for p in self.parents),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a much neater implementation, nice!!

Copy link
Member Author

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for another look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants