Skip to content
This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

cleanup subtree assignment in io open functions #96

Closed
wants to merge 1 commit into from
Closed

cleanup subtree assignment in io open functions #96

wants to merge 1 commit into from

Conversation

jhamman
Copy link

@jhamman jhamman commented May 19, 2022

@jhamman jhamman requested a review from TomNicholas May 19, 2022 05:35
@@ -68,16 +68,8 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree:
tree_root = DataTree.from_dict({"/": ds})
for path in _iter_nc_groups(ncds):
subgroup_ds = open_dataset(filename, group=path, **kwargs)
tree_root[path] = DataTree(name=path, data=subgroup_ds)
Copy link
Member

@TomNicholas TomNicholas May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using __setitem__ instead of _set_item here can behave differently, for instance if a group appeared twice one method would allow overwriting by default and the other wouldn't.

Another difference is that __setitem__ creates intermediate nodes, i.e. if dt only has one node, then calling dt['/root/a/b/c/d/e/foo'] = DataTree() would create nodes /root/a, /root/a/b/, /root/a/b/c ... etc. whereas _set_item would raise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I made the _set_item_ method was for internal developer API so that this kind of behaviour could be controlled more explicitly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jhamman for trying to tidy this up.

However I'm actually not sure that these changes make the code better - they might make some behaviour less explicit, for cases that I don't think are currently tested 😕

Comment on lines -73 to -74
node_name = NodePath(path).name
new_node: DataTree = DataTree(name=node_name, data=subgroup_ds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure your new code behaves the same way as this old code? Because previously we were naming new nodes as NodePath(path).name, which would take the last part of a long unix-like path (e.g. /a/b/c -> c), whereas with your changes it looks like it would name the node /a/b/c?


# TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was actually referring to the difference between

dt['folder'] = DataTree(data=ds)
and
dt['folder'] = ds,

the latter of which currently doesn't work, but also perhaps doesn't need to.

@jhamman
Copy link
Author

jhamman commented May 20, 2022

Thanks for the explanation @TomNicholas. Let's close this as a no-fix.

@jhamman jhamman closed this May 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants