-
Notifications
You must be signed in to change notification settings - Fork 41
cleanup subtree assignment in io open functions #96
Conversation
@@ -68,16 +68,8 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree: | |||
tree_root = DataTree.from_dict({"/": ds}) | |||
for path in _iter_nc_groups(ncds): | |||
subgroup_ds = open_dataset(filename, group=path, **kwargs) | |||
tree_root[path] = DataTree(name=path, data=subgroup_ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using __setitem__
instead of _set_item
here can behave differently, for instance if a group appeared twice one method would allow overwriting by default and the other wouldn't.
Another difference is that __setitem__
creates intermediate nodes, i.e. if dt
only has one node, then calling dt['/root/a/b/c/d/e/foo'] = DataTree()
would create nodes /root/a
, /root/a/b/
, /root/a/b/c
... etc. whereas _set_item
would raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I made the _set_item_
method was for internal developer API so that this kind of behaviour could be controlled more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jhamman for trying to tidy this up.
However I'm actually not sure that these changes make the code better - they might make some behaviour less explicit, for cases that I don't think are currently tested 😕
node_name = NodePath(path).name | ||
new_node: DataTree = DataTree(name=node_name, data=subgroup_ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure your new code behaves the same way as this old code? Because previously we were naming new nodes as NodePath(path).name
, which would take the last part of a long unix-like path (e.g. /a/b/c
-> c
), whereas with your changes it looks like it would name the node /a/b/c
?
|
||
# TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was actually referring to the difference between
dt['folder'] = DataTree(data=ds)
and
dt['folder'] = ds
,
the latter of which currently doesn't work, but also perhaps doesn't need to.
Thanks for the explanation @TomNicholas. Let's close this as a no-fix. |
pre-commit run --all-files