Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider renaming DataTree.map_over_subtree (or revising its API) #9573

Closed
shoyer opened this issue Oct 3, 2024 · 3 comments
Closed

Consider renaming DataTree.map_over_subtree (or revising its API) #9573

shoyer opened this issue Oct 3, 2024 · 3 comments
Labels
topic-DataTree Related to the implementation of a DataTree class

Comments

@shoyer
Copy link
Member

shoyer commented Oct 3, 2024

What is your issue?

We have a bit of a naming inconsistency in DataTree:

  • The subtrees property is an iterator over all sub-tree nodes as a DataTree objects
  • map_over_subtree maps a function over all sub-tree nodes as Dataset objects

I think it makes sense for "subtree" to refer strictly to DataTree objects. In that case, perhaps we should rename map_over_subtree to map_over_datasets?

Alternatively, could also change the interface to iterate over DataTree objects instead, which are easy to convert into datasets (via .dataset) and additionally provide the full context of a node's path and parents.

@shoyer shoyer added needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class and removed needs triage Issue that has not been reviewed by xarray team member labels Oct 3, 2024
@TomNicholas
Copy link
Member

I think it makes sense for "subtree" to refer strictly to DataTree objects.

👍

In that case, perhaps we should rename map_over_subtree to map_over_datasets?

"Subtree" as a term basically means "this node and all descendants of this node", and is almost redundant with .descendants (.subtree includes self, while .descendants doesn't). map_over_subtree_datasets would have been the most technically correct name I guess.

Alternatively, could also change the interface to iterate over DataTree objects instead, which are easy to convert into datasets (via .dataset) and additionally provide the full context of a node's path and parents.

Providing access to the path somehow has been requested before (), but the reason I didn't do this originally was because:

  1. I liked the simplicity of mapping a func(dataset: Dataset) -> Dataset over the tree, and knowing that would work,
  2. Mapping over .dataset helps clarify expectations by delineating between functions that act on the local node (i.e. everything in Dataset's API) and functions which will after the rest of the subtree,
  3. I was worried that allowing access to the whole tree from within that mapping function would permit people to alter parts of the tree outside of the node the function was currently being applied to, which could get very messy very quickly. Perhaps we are better guarded against that now with the new copying behaviour?

@TomNicholas
Copy link
Member

In one of the meetings we proposed renaming map_over_subtree -> map_over_datasets, and potentially also adding another function which actually does map DataTree objects. That latter function should logically be called map_over_subtree, but then we're effectively changing the behaviour of map_over_subtree in a confusing breaking way...

@TomNicholas
Copy link
Member

This was closed by #9622. If we want to add a new function that actually does map over DataTree objects we should track that separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

No branches or pull requests

2 participants