cleanup subtree assignment in io open functions #96

jhamman · 2022-05-19T05:32:10Z

Follow-up from a comment on #90 (Do not call __exit__ on Zarr store when opening).
Tests added
Passes pre-commit run --all-files

TomNicholas · 2022-05-19T16:39:34Z

datatree/io.py

@@ -68,16 +68,8 @@ def _open_datatree_netcdf(filename: str, **kwargs) -> DataTree:
        tree_root = DataTree.from_dict({"/": ds})
        for path in _iter_nc_groups(ncds):
            subgroup_ds = open_dataset(filename, group=path, **kwargs)
+            tree_root[path] = DataTree(name=path, data=subgroup_ds)


Using __setitem__ instead of _set_item here can change behaviour. For instance, if a group appeared twice, one method would allow overwriting by default and the other wouldn't.

Another difference is that __setitem__ creates intermediate nodes: if dt only has one node, then calling dt['/root/a/b/c/d/e/foo'] = DataTree() would create the nodes /root/a, /root/a/b, /root/a/b/c, etc., whereas _set_item would raise.

I made the _set_item method as an internal developer API so that this kind of behaviour could be controlled more explicitly.
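
To make the two differences above concrete, here is a minimal toy sketch. It is not datatree's implementation: the Node class, the set_item method and its new_nodes_along_path / allow_overwrite flags are hypothetical names chosen for illustration, and the defaults picked for __setitem__ are an assumption (the comments above only say that the two methods differ).

```python
from __future__ import annotations


class Node:
    """Toy tree node for illustration only; NOT datatree's DataTree."""

    def __init__(self, name: str | None = None) -> None:
        self.name = name
        self.children: dict[str, Node] = {}

    def set_item(
        self,
        path: str,
        item: Node,
        new_nodes_along_path: bool = False,
        allow_overwrite: bool = True,
    ) -> None:
        """Explicit setter (hypothetical): the caller decides both behaviours."""
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("path must contain at least one name")
        *parents, leaf = parts
        node = self
        for part in parents:
            if part not in node.children:
                if not new_nodes_along_path:
                    # Strict mode: refuse to invent missing parents.
                    raise KeyError(f"missing intermediate node {part!r} in {path!r}")
                node.children[part] = Node(part)
            node = node.children[part]
        if leaf in node.children and not allow_overwrite:
            raise KeyError(f"node {path!r} already exists")
        item.name = leaf
        node.children[leaf] = item

    def __setitem__(self, path: str, item: Node) -> None:
        # Convenience setter: always creates intermediate nodes.
        # Overwriting by default is an assumption made for this sketch.
        self.set_item(path, item, new_nodes_along_path=True, allow_overwrite=True)


root = Node("root")
root["a/b/c"] = Node()           # silently creates "a" and "a/b"
try:
    root.set_item("x/y", Node())  # strict: "x" does not exist
except KeyError as err:
    print("strict setter raised:", err)
```

In a loader that walks groups one by one, the strict form surfaces a duplicated or orphaned group as an error instead of quietly reshaping the tree.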

Thanks for the explanation.

TomNicholas

Thanks @jhamman for trying to tidy this up.

However, I'm actually not sure that these changes make the code better. They might make some behaviour less explicit, in cases that I don't think are currently tested 😕

TomNicholas · 2022-05-19T16:42:04Z

datatree/io.py

-            node_name = NodePath(path).name
-            new_node: DataTree = DataTree(name=node_name, data=subgroup_ds)


Are you sure your new code behaves the same way as this old code? Previously we named new nodes NodePath(path).name, which takes the last part of a long unix-like path (e.g. /a/b/c -> c), whereas with your changes it looks like the node would be named /a/b/c.
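
The naming difference can be sketched with the standard library alone. This assumes NodePath follows pathlib.PurePosixPath semantics for .name, which matches the /a/b/c -> c example above but is not verified here; leaf_name is a hypothetical helper.

```python
from pathlib import PurePosixPath


def leaf_name(path: str) -> str:
    """Return the last component of a unix-like group path."""
    return PurePosixPath(path).name


for group in ["/a/b/c", "/a", "c"]:
    # Old code: node named after the leaf; new code: node named after the full path.
    print(f"path={group!r}  leaf name={leaf_name(group)!r}  full-path name={group!r}")
```

A test that opens a file with nested groups and checks each node's name would catch this kind of regression.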

TomNicholas · 2022-05-19T16:44:53Z

datatree/io.py


-            # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again


This comment was actually referring to the difference between

dt['folder'] = DataTree(data=ds)

and

dt['folder'] = ds

The latter currently doesn't work, but perhaps doesn't need to.

jhamman · 2022-05-20T16:36:12Z

Thanks for the explanation @TomNicholas. Let's close this as a no-fix.

cleanup subtree assignment in io open functions

97e7220

jhamman requested a review from TomNicholas May 19, 2022 05:35

TomNicholas reviewed May 19, 2022

jhamman closed this May 20, 2022
