From adcfa87c019a0b15f2139ee0bd0a096080587055 Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Sun, 1 Jan 2023 15:39:23 -0500
Subject: [PATCH 01/16] why hierarchical data

---
 docs/source/hierarchical-data.rst | 139 ++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/source/hierarchical-data.rst

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
new file mode 100644
index 00000000..d20aa576
--- /dev/null
+++ b/docs/source/hierarchical-data.rst
@@ -0,0 +1,139 @@
+.. _hierarchical data:
+
+Working With Hierarchical Data
+==============================
+
+Why Hierarchical Data?
+----------------------
+
+Many real-world datasets are composed of multiple differing components,
+and it can often be useful to think of these in terms of a hierarchy of related groups of data.
+Examples of data which one might want to organise in a grouped or hierarchical manner include:
+
+- Simulation data at multiple resolutions,
+- Observational data about the same system but from multiple different types of sensors,
+- Mixed experimental and theoretical data,
+- A systematic study recording the same experiment but with different parameters,
+- Heterogeneous data, such as demographic and meteorological data,
+
+or even any combination of the above.
+
+Often datasets like this cannot easily fit into a single ``xarray.Dataset`` object,
+or are more usefully thought of as groups of related ``xarray.Dataset`` objects.
+For this purpose we provide the ``DataTree`` class.
+
+This page explains in detail how to understand and use the different features of the ``DataTree`` class for your own hierarchical data needs.
+
+.. _creating a family tree:
+
+Creating a Family Tree
+----------------------
+
+The three main ways of creating a ``DataTree`` object are described briefly in :ref:`creating a datatree`.
+Here we go into more detail about how to create a tree node-by-node, using a family tree as an example.
+
+This could perhaps go in a tutorial?
+
+(i.e. how to create and manipulate a tree structure from scratch node-by-node, with no data in it).
+
+Create Simpson's family tree
+
+Start with Homer, Bart and Lisa
+
+Add Maggie by setting children on homer
+
+check that this also set's Maggie's parent
+
+Add long-lost relations
+
+add Abe by setting
+
+(Abe's father, Homer's cousin?)
+
+add Herbert by setting
+
+.. _navigating trees:
+
+Navigating Trees
+----------------
+
+Node Relationships
+~~~~~~~~~~~~~~~~~~
+
+Root, ancestors, parent, children, leaves
+
+Tree of life?
+
+leaves are either currently living or died out with no descendants
+Root is beginning of life
+ancestors are evolutionary history
+
+find common ancestor
+
+Alien life not in same tree?
+
+Filesystem-like Paths
+~~~~~~~~~~~~~~~~~~~~~
+
+file-like access via paths
+
+
+.. _manipulating trees:
+
+Manipulating Trees
+------------------
+
+Altering Tree Branches
+~~~~~~~~~~~~~~~~~~~~~~
+
+pruning, grafting
+
+Tree of life?
+
+Graft new discoveries onto the tree?
+
+Prune when we realise something is in the wrong place?
+
+Save our updated tree out with ``to_dict``
+
+Subsetting Tree Nodes
+~~~~~~~~~~~~~~~~~~~~~
+
+subset, filter
+
+Filter the Simpsons by age?
+
+Subset only the living leaves of the evolutionary tree?
+
+
+.. _tree computation:
+
+Computation
+-----------
+
+Operations on Trees
+~~~~~~~~~~~~~~~~~~~
+
+Mapping of methods
+
+
+Mapping Custom Functions Over Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.subtree, map_over_subtree
+
+
+.. _multiple trees:
+
+Operating on Multiple Trees
+---------------------------
+
+Comparing trees
+~~~~~~~~~~~~~~~
+
+isomorphism
+
+Mapping over Multiple Trees
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+map_over_subtree with binary function
From 54907757926baedfd824407cc3b01a056bf8d68d Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Sun, 1 Jan 2023 15:42:54 -0500
Subject: [PATCH 02/16] add hierarchical data page to index

---
 docs/source/hierarchical-data.rst | 2 +-
 docs/source/index.rst             | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index d20aa576..019834f8 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -1,4 +1,4 @@
-.. _hierarchical data:
+.. _hierarchical-data:

 Working With Hierarchical Data
 ==============================
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 9448e232..e0e39de7 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -12,6 +12,7 @@ Datatree
    Quick Overview
    Tutorial
    Data Model
+   Hierarchical Data
    Reading and Writing Files
    API Reference
    Terminology
From be81f7872e107269c944a978ed45245cd4f079a6 Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Sun, 1 Jan 2023 18:10:11 -0500
Subject: [PATCH 03/16] Simpsons family tree

---
 docs/source/hierarchical-data.rst | 115 +++++++++++++++++++++++++++---
 1 file changed, 104 insertions(+), 11 deletions(-)

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index 019834f8..318ca310 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -3,6 +3,17 @@
 Working With Hierarchical Data
 ==============================

+.. ipython:: python
+    :suppress:
+
+    import numpy as np
+    import pandas as pd
+    import xarray as xr
+    from datatree import DataTree
+
+    np.random.seed(123456)
+    np.set_printoptions(threshold=10)
+
 Why Hierarchical Data?
 ----------------------

@@ -30,27 +41,101 @@ Creating a Family Tree
 ----------------------

 The three main ways of creating a ``DataTree`` object are described briefly in :ref:`creating a datatree`.
-Here we go into more detail about how to create a tree node-by-node, using a family tree as an example.
+Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example.
+
+Let's start by defining nodes representing the two siblings, Bart and Lisa Simpson:
+
+.. ipython:: python
+
+    bart = DataTree(name="Bart")
+    lisa = DataTree(name="Lisa")
+
+Each of these node objects knows its own ``.name``, but they currently have no relationship to one another.
+We can connect them by creating another node representing a common parent, Homer Simpson:
+
+.. ipython:: python
+
+    homer = DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa})
+
+Here we set the children of Homer in the node's constructor.
+We now have a small family tree
+
+.. ipython:: python
+
+    homer
+
+where we can see how these individual Simpson family members are related to one another.
+The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the ``.siblings`` property:
+
+.. ipython:: python
+
+    list(bart.siblings)
+
+But oops, we forgot Homer's third child, Maggie! Let's add her by updating Homer's ``.children`` property to include her:
+
+.. ipython:: python
+
+    maggie = DataTree(name="Maggie")
+    homer.children = {"Bart": bart, "Lisa": lisa, "Maggie": maggie}
+    homer

-This could perhaps go in a tutorial?
+Let's check that Maggie knows who her Dad is:

-(i.e. how to create and manipulate a tree structure from scratch node-by-node, with no data in it).
+.. ipython:: python

-Create Simpson's family tree
+    maggie.parent.name

-Start with Homer, Bart and Lisa
+That's good - updating the properties of our nodes does not break the internal consistency of our tree, as changes of parentage are automatically reflected on both nodes.

-Add Maggie by setting children on homer
+    These children obviously have another parent, Marge Simpson, but ``DataTree`` nodes can only have a maximum of one parent.
+    Genealogical `family trees are not even technically trees `_ in the mathematical sense -
+    the fact that distant relatives can mate makes it a directed acyclic graph.
+    Trees of ``DataTree`` objects cannot represent this.

-check that this also set's Maggie's parent
+Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his ``.parent`` property:

-Add long-lost relations
+.. ipython:: python

-add Abe by setting
+    abe = DataTree(name="Abe")
+    homer.parent = abe

-(Abe's father, Homer's cousin?)
+Abe is now the "root" of this tree, which we can see by examining the ``.root`` property of any node in the tree:
+
+.. ipython:: python
+
+    maggie.root.name
+
+We can see the whole tree by printing Abe's node or just part of the tree by printing Homer's node:
+
+.. ipython:: python
+
+    abe
+    homer
+
+We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree.
+
+In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson.
+We can add Herbert to the family tree without displacing Homer by ``.assign``-ing another child to Abe:
+
+# TODO write the ``assign`` or ``assign_nodes`` method on ``DataTree`` so that this example works
+
+.. ipython:: python
+    :okexcept:
+
+    herb = DataTree(name="Herb")
+    abe.assign({"Herbert": herb})
+
+# TODO Name permanence of herb versus herbert (or abe versus abraham)
+
+Certain manipulations of our tree are forbidden if they would create an inconsistent result.
+In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own grandfather.
+If we try similar time-travelling hijinks with Homer, we get an ``InvalidTreeError`` raised:
+
+.. ipython:: python
+    :okexcept:
+
+    abe.parent = homer

-add Herbert by setting

 .. _navigating trees:

@@ -77,6 +162,8 @@ Filesystem-like Paths

 file-like access via paths

+see relative to of bart to herbert
+

 .. _manipulating trees:

@@ -103,6 +190,8 @@ subset, filter

 Filter the Simpsons by age?

+Need to first recreate tree with age data in it
+
 Subset only the living leaves of the evolutionary tree?


@@ -116,6 +205,10 @@ Operations on Trees

 Mapping of methods

+Arithmetic
+
+cause all Simpsons to age simultaneously
+

 Mapping Custom Functions Over Trees
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From e05fe6d1ff2b083e08de10bd9a3f80e96d1e681f Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Sun, 1 Jan 2023 19:21:01 -0500
Subject: [PATCH 04/16] evolutionary tree

---
 docs/source/hierarchical-data.rst | 92 ++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index 318ca310..9c5d15cf 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -35,10 +35,15 @@ For this purpose we provide the ``DataTree`` class.

 This page explains in detail how to understand and use the different features of the ``DataTree`` class for your own hierarchical data needs.

+.. _node relationships:
+
+Node Relationships
+------------------
+
 .. _creating a family tree:

 Creating a Family Tree
------------------------
+~~~~~~~~~~~~~~~~~~~~~~

 The three main ways of creating a ``DataTree`` object are described briefly in :ref:`creating a datatree`.
 Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example.
@@ -136,26 +141,78 @@ If we try similar time-travelling hijinks with Homer, we get a ``InvalidTreeErro

     abe.parent = homer

+.. _evolutionary tree:

-.. _navigating trees:
+Ancestry in an Evolutionary Tree
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Navigating Trees
------------------
+Let's use a different example of a tree to discuss more complex relationships between nodes - the phylogenetic tree, or tree of life.

-Node Relationships
-~~~~~~~~~~~~~~~~~~
+.. ipython:: python

-Root, ancestors, parent, children, leaves
+    vertebrates = DataTree.from_dict(
+        name="Vertebrae",
+        d={
+            "/Sharks": None,
+            "/Bony Skeleton/Ray-finned Fish": None,
+            "/Bony Skeleton/Four Limbs/Amphibians": None,
+            "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None,
+            "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": None,
+            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Crocodiles": None,
+            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None,
+            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None,
+        },
+    )
+
+    primates = vertebrates["/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"]
+    dinosaurs = vertebrates[
+        "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs"
+    ]
+
+We have used the ``.from_dict`` constructor method as an alternate way to quickly create a whole tree,
+and file-like syntax (to be explained shortly) to select two nodes of interest.
+
+This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_,
+rather than an evolutionary tree).
+
+Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node.
+We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes".
+We can check if a node is a leaf with ``.is_leaf``, and get a list of all leaves with the ``.leaves`` property:

-Tree of life?
+.. ipython:: python
+    :okexcept

-leaves are either currently living or died out with no descendants
-Root is beginning of life
-ancestors are evolutionary history
+    primates.is_leaf
+    [node.name for node in vertebrates.leaves]
+
+Pretending that this is a true evolutionary tree for a moment, we can find the features of the evolutionary ancestors (so-called "ancestor" nodes),
+the distinguishing feature of the common ancestor of all vertebrate life (the root node),
+and even the distinguishing feature of the common ancestor of any two species (the common ancestor of two nodes):
+
+.. ipython:: python

-find common ancestor
+    [node.name for node in primates.ancestors]
+    primates.root.name
+    primates.find_common_ancestor(dinosaurs).name
+
+We can only find a common ancestor between two nodes that lie in the same tree.
+If we try to find the common evolutionary ancestor between primates and an Alien species that has no relationship to Earth's evolutionary tree,
+an error will be raised.
+
+.. ipython:: python
+    :okexcept:
+
+    alien = DataTree(name="Xenomorph")
+    primates.find_common_ancestor(alien)
+
+
+.. _navigating trees:
+
+Navigating Trees
+----------------
+
+Can move around trees using properties, but there are also neater ways to access nodes.

-Alien life not in same tree?

 Filesystem-like Paths
 ~~~~~~~~~~~~~~~~~~~~~
@@ -165,6 +222,12 @@ file-like access via paths
 see relative to of bart to herbert


+Attribute-like access
+~~~~~~~~~~~~~~~~~~~~~
+
+# TODO attribute-like access is not yet implemented, see issue #98
+
+
 .. _manipulating trees:

 Manipulating Trees
@@ -192,6 +255,7 @@ Filter the Simpsons by age?

 Need to first recreate tree with age data in it

+leaves are either currently living or died out with no descendants
 Subset only the living leaves of the evolutionary tree?
@@ -209,6 +273,8 @@ Arithmetic

 cause all Simpsons to age simultaneously

+Find total number of species
+Find total biomass

 Mapping Custom Functions Over Trees
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From f9ae6fded52c47865e6bf1527acb3ec949be7136 Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Mon, 2 Jan 2023 13:19:51 -0500
Subject: [PATCH 05/16] WIP rearrangement of creating trees

---
 docs/source/data-structures.rst   | 38 ++++-----------
 docs/source/hierarchical-data.rst | 92 ++++++++++++++++++++++++++++---
 2 files changed, 95 insertions(+), 35 deletions(-)

diff --git a/docs/source/data-structures.rst b/docs/source/data-structures.rst
index 67e0e608..7a0cca60 100644
--- a/docs/source/data-structures.rst
+++ b/docs/source/data-structures.rst
@@ -71,7 +71,10 @@ Again these are not normally used unless explicitly accessed by the user.
 Creating a DataTree
 ~~~~~~~~~~~~~~~~~~~

-There are three ways to create a ``DataTree`` from scratch. The first is to create each node individually,
+There are three ways to create a ``DataTree`` from scratch.
+
+
+One way to create a ``DataTree`` from scratch is to create each node individually,
 specifying the nodes' relationship to one another as you create each one.

 The ``DataTree`` constructor takes:
@@ -122,37 +125,18 @@ Is is at tree construction time that consistency checks are enforced. For instan

     node0.parent = node2

-The second way is to build the tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects.
-
-This relies on a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence,
-separated by forward slashes. The root node is referred to by ``"/"``, so the path from our current root node to its grand-child would be ``"/Oak/Bonsai"``.
-A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a
-`"fully qualified name" `_.
+Alternatively you can also create a ``DataTree`` object from

-If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``,
-we can construct a complex tree quickly using the alternative constructor ``:py:func::DataTree.from_dict``:
-
-.. ipython:: python
-
-    d = {
-        "/": xr.Dataset({"foo": "orange"}),
-        "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}),
-        "/a/b": xr.Dataset({"zed": np.NaN}),
-        "a/c/d": None,
-    }
-    dt = DataTree.from_dict(d)
-    dt
-
-Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path
-(i.e. the node labelled `"c"` in this case.)
-
-Finally the third way is from a file. if you have a file containing data on disk (such as a netCDF file or a Zarr Store), you can also create a datatree by opening the
-file using ``:py:func::~datatree.open_datatree``. See the page on :ref:`reading and writing files ` for more details.
+- An ``xarray.Dataset`` using ``Dataset.to_node()`` (not yet implemented),
+- A dictionary mapping directory-like paths to either ``DataTree`` nodes or data, using ``DataTree.from_dict()``,
+- A netCDF or Zarr file on disk with ``open_datatree()``. See :ref:`reading and writing files `.


 DataTree Contents
 ~~~~~~~~~~~~~~~~~

+TODO create this example datatree but without using ``from_dict``
+
 Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface, but with values given by either ``xarray.DataArray`` objects or other ``DataTree`` objects.

 .. ipython:: python
@@ -187,8 +171,6 @@ Like with ``Dataset``, you can access the data and coordinate variables of a nod
 Dictionary-like methods
 ~~~~~~~~~~~~~~~~~~~~~~~

-We can update the contents of the tree in-place using a dictionary-like syntax.
-
 We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects.
 For example, to create this example datatree from scratch, we could have written:

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index 9c5d15cf..c1648006 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -170,7 +170,7 @@ Let's use a different example of a tree to discuss more complex relationships be
     ]

 We have used the ``.from_dict`` constructor method as an alternate way to quickly create a whole tree,
-and file-like syntax (to be explained shortly) to select two nodes of interest.
+and :ref:`filesystem-like syntax `_ (to be explained shortly) to select two nodes of interest.

 This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_,
 rather than an evolutionary tree).
@@ -180,7 +180,7 @@ We can however get a list of only the nodes we used to represent species by usin
 We can check if a node is a leaf with ``.is_leaf``, and get a list of all leaves with the ``.leaves`` property:

 .. ipython:: python
-    :okexcept
+    :okexcept:

     primates.is_leaf
     [node.name for node in vertebrates.leaves]
@@ -211,22 +211,100 @@ an error will be raised.
 Navigating Trees
 ----------------

-Can move around trees using properties, but there are also neater ways to access nodes.
+There are various ways to access the different nodes in a tree.

+Properties
+~~~~~~~~~~

-Filesystem-like Paths
-~~~~~~~~~~~~~~~~~~~~~
+We can navigate trees using the ``.parent`` and ``.children`` properties of each node, for example:

-file-like access via paths
+.. ipython:: python

-see relative to of bart to herbert
+    lisa.parent.children["Bart"].name

+but there are also more convenient ways to access nodes.
+
+Dictionary-like interface
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Children are stored on each node as a key-value mapping from name to child node or variable.
+They can be accessed and altered via the ``__getitem__`` and ``__setitem__`` syntax.
+In general ``DataTree`` objects support almost the entire set of dict-like methods,
+including ``keys``, ``values``, ``items``, ``__delitem__`` and ``update``.
+
+Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArray``s,
+so if we have a node that contains both children and data, calling ``.keys()`` will list both names of child nodes and
+names of data variables:
+
+.. ipython:: python
+
+    dt = DataTree.from_dict(
+        {"/": xr.Dataset({"foo": 0, "bar": 1}), "/a": None, "/b": None}
+    )
+    print(dt)
+    list(dt.keys())
+
+This means that the names of variables and of child nodes must be different to one another.

 Attribute-like access
 ~~~~~~~~~~~~~~~~~~~~~

 # TODO attribute-like access is not yet implemented, see issue #98

+.. _filesystem paths:
+
+Filesystem-like Paths
+~~~~~~~~~~~~~~~~~~~~~
+
+Hierarchical trees can be thought of as analogous to file systems.
+Each node is like a directory, and each directory can contain both more sub-directories and data.
+
+Datatree objects support a syntax inspired by unix-like filesystems,
+where the "path" to a node is specified by the keys of each intermediate node in sequence,
+separated by forward slashes.
+
+.. ipython:: python
+
+    abe["Homer/Bart"].name
+
+The root node is referred to by ``"/"``, so the path from the root node to its grand-child would be ``"/child/grandchild"``, e.g.
+
+EXAMPLE of path from root
+
+A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a
+`"fully qualified name" `_.
+
+file-like access via paths
+
+set something using a relative path
+
+example of finding relative path, from bart to herbert?
+
+
+Create a node with intermediates via ``__setitem__``
+
+You can use this feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step.
+If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``,
+we can construct a complex tree quickly using the alternative constructor ``:py:func::DataTree.from_dict``:
+
+.. ipython:: python
+
+    d = {
+        "/": xr.Dataset({"foo": "orange"}),
+        "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}),
+        "/a/b": xr.Dataset({"zed": np.nan}),
+        "a/c/d": None,
+    }
+    dt = DataTree.from_dict(d)
+    dt
+
+Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path
+(i.e. the node labelled `"c"` in this case.)
+
+.. note::
+
+    You can even make the filesystem analogy concrete by using ``open_mfdatatree`` or ``save_mfdatatree`` # TODO not yet implemented - see GH issue 51
+

 .. _manipulating trees:

From f625b9528f718f488c48de9876451007eba15c7d Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Tue, 3 Jan 2023 21:19:42 -0500
Subject: [PATCH 06/16] fixed examples in data structures page

---
 docs/source/data-structures.rst | 39 +++++++++++++++------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/docs/source/data-structures.rst b/docs/source/data-structures.rst
index 7a0cca60..98ad5af2 100644
--- a/docs/source/data-structures.rst
+++ b/docs/source/data-structures.rst
@@ -71,9 +71,6 @@ Again these are not normally used unless explicitly accessed by the user.
 Creating a DataTree
 ~~~~~~~~~~~~~~~~~~~

-There are three ways to create a ``DataTree`` from scratch.
-
-
 One way to create a ``DataTree`` from scratch is to create each node individually,
 specifying the nodes' relationship to one another as you create each one.

@@ -84,16 +81,16 @@ The ``DataTree`` constructor takes:
 - ``children``: The various child nodes (if there are any), given as a mapping from string keys to ``DataTree`` objects.
 - ``name``: A string to use as the name of this node.

-Let's make a datatree node without anything in it:
+Let's make a single datatree node with some example data in it:

 .. ipython:: python

     from datatree import DataTree

-    # create root node
-    node1 = DataTree(name="Oak")
+    ds1 = xr.Dataset({"foo": "orange"})
+    dt = DataTree(name="root", data=ds1)  # create root node

-    node1
+    dt

 At this point our node is also the root node, as every tree has a root node.

@@ -101,29 +98,32 @@ We can add a second node to this tree either by referring to the first node in t

 .. ipython:: python

+    ds2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})
     # add a child by referring to the parent node
-    node2 = DataTree(name="Bonsai", parent=node1)
+    node2 = DataTree(name="a", parent=dt, data=ds2)

 or by dynamically updating the attributes of one node to refer to another:

 .. ipython:: python

-    # add a grandparent by updating the .parent property of an existing node
-    node0 = DataTree(name="General Sherman")
-    node1.parent = node0
+    # add a second child by first creating a new node ...
+    ds3 = xr.Dataset({"zed": np.nan})
+    node3 = DataTree(name="b", data=ds3)
+    # ... then updating its .parent property
+    node3.parent = dt

-Our tree now has three nodes within it, and one of the two new nodes has become the new root:
+Our tree now has three nodes within it:

 .. ipython:: python

-    node0
+    dt

 Is is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error:

 .. ipython:: python
     :okexcept:

-    node0.parent = node2
+    dt.parent = node3

 Alternatively you can also create a ``DataTree`` object from

@@ -135,8 +135,6 @@ Alternatively you can also create a ``DataTree`` object from
 DataTree Contents
 ~~~~~~~~~~~~~~~~~

-TODO create this example datatree but without using ``from_dict``
-
 Like ``xarray.Dataset``, ``DataTree`` implements the python mapping interface, but with values given by either ``xarray.DataArray`` objects or other ``DataTree`` objects.

 .. ipython:: python
@@ -178,11 +176,10 @@ For example, to create this example datatree from scratch, we could have written

 .. ipython:: python

-    dt = DataTree()
+    dt = DataTree(name="root")
     dt["foo"] = "orange"
     dt["a"] = DataTree(data=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}))
     dt["a/b/zed"] = np.NaN
-    dt["a/c/d"] = DataTree()
     dt

 To change the variables in a node of a ``DataTree``, you can use all the standard dictionary
@@ -191,6 +188,6 @@ methods, including ``values``, ``items``, ``__delitem__``, ``get`` and

 Note that assigning a ``DataArray`` object to a ``DataTree`` variable using ``__setitem__`` or ``update`` will :ref:`automatically align` the array(s) to the original node's indexes.

-If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` it will copy the entire tree,
-including all parents and children.
-Like for ``Dataset``, this copy is shallow by default, but you can copy all the data by calling ``dt.copy(deep=True)``.
+If you copy a ``DataTree`` using the ``:py:func::copy`` function or the :py:meth:`~xarray.DataTree.copy` method, it will copy the subtree,
+meaning the node itself and all of its descendants, but none of its ancestors.
+Like for ``Dataset``, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``.
From c0ea814ce652cf6b93b64bd99212c7012023148e Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Tue, 3 Jan 2023 21:34:04 -0500
Subject: [PATCH 07/16] dict-like navigation

---
 docs/source/hierarchical-data.rst | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index c1648006..e0651388 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -172,6 +172,10 @@ Let's use a different example of a tree to discuss more complex relationships be
 We have used the ``.from_dict`` constructor method as an alternate way to quickly create a whole tree,
 and :ref:`filesystem-like syntax `_ (to be explained shortly) to select two nodes of interest.

+.. ipython:: python
+
+    vertebrates
+
 This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_,
 rather than an evolutionary tree).

@@ -180,7 +184,6 @@ We can however get a list of only the nodes we used to represent species by usin
 We can check if a node is a leaf with ``.is_leaf``, and get a list of all leaves with the ``.leaves`` property:

 .. ipython:: python
-    :okexcept:

     primates.is_leaf
     [node.name for node in vertebrates.leaves]
@@ -227,24 +230,29 @@ but there are also more convenient ways to access nodes.
 Dictionary-like interface
 ~~~~~~~~~~~~~~~~~~~~~~~~~

-Children are stored on each node as a key-value mapping from name to child node or variable.
+Children are stored on each node as a key-value mapping from name to child node.
 They can be accessed and altered via the ``__getitem__`` and ``__setitem__`` syntax.
 In general ``DataTree`` objects support almost the entire set of dict-like methods,
 including ``keys``, ``values``, ``items``, ``__delitem__`` and ``update``.

-Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArray``s,
+.. ipython:: python
+
+    vertebrates["Bony Skeleton"]["Ray-finned Fish"]
+
+Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArray`` objects,
 so if we have a node that contains both children and data, calling ``.keys()`` will list both names of child nodes and
 names of data variables:

 .. ipython:: python

-    dt = DataTree.from_dict(
-        {"/": xr.Dataset({"foo": 0, "bar": 1}), "/a": None, "/b": None}
+    dt = DataTree(
+        data=xr.Dataset({"foo": 0, "bar": 1}),
+        children={"a": DataTree(), "b": DataTree()},
     )
     print(dt)
     list(dt.keys())

-This means that the names of variables and of child nodes must be different to one another.
+This also means that the names of variables and of child nodes must be different from one another.

 Attribute-like access
 ~~~~~~~~~~~~~~~~~~~~~
From 016531251b3e4e611d8dff029b7bbe04aeb3c83c Mon Sep 17 00:00:00 2001
From: Thomas Nicholas
Date: Tue, 3 Jan 2023 22:12:35 -0500
Subject: [PATCH 08/16] filesystem-like paths explained

---
 docs/source/hierarchical-data.rst | 45 ++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst
index e0651388..95f35519 100644
--- a/docs/source/hierarchical-data.rst
+++ b/docs/source/hierarchical-data.rst
@@ -158,7 +158,6 @@ Let's use a different example of a tree to discuss more complex relationships be
             "/Bony Skeleton/Four Limbs/Amphibians": None,
             "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None,
             "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": None,
-            "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Crocodiles": None,
             "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None,
             "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None,
         },
@@ -267,29 +266,47 @@ Filesystem-like Paths
 Hierarchical trees can be thought of as analogous to file systems.
 Each node is like a directory, and each directory can contain both more sub-directories and data.

+.. note::
+
+    You can even make the filesystem analogy concrete by using ``open_mfdatatree`` or ``save_mfdatatree`` # TODO not yet implemented - see GH issue 51
+
 Datatree objects support a syntax inspired by unix-like filesystems,
 where the "path" to a node is specified by the keys of each intermediate node in sequence,
 separated by forward slashes.
+This is an extension of the conventional dictionary ``__getitem__`` syntax to allow navigation across multiple levels of the tree.
+
+Like with filepaths, paths within the tree can either be relative to the current node, e.g.

 .. ipython:: python

     abe["Homer/Bart"].name
+    abe["./Homer/Bart"].name  # alternative syntax

+or relative to the root node.
+A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a
+`"fully qualified name" `_,
+or as an "absolute path".
 The root node is referred to by ``"/"``, so the path from the root node to its grand-child would be ``"/child/grandchild"``, e.g.

-EXAMPLE of path from root
+.. ipython:: python

-A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a
-`"fully qualified name" `_.
+    # absolute path will start from root node
+    lisa["/Homer/Bart"].name

-file-like access via paths
+Relative paths between nodes also support the ``"../"`` syntax to mean the parent of the current node.
+We can use this with ``__setitem__`` to add a missing entry to our evolutionary tree, but add it relative to a more familiar node of interest:

-set something using a relative path
+.. ipython:: python

-example of finding relative path, from bart to herbert?
+ primates["../../Two Fenestrae/Crocodiles"] = DataTree() + print(vertebrates) +Given two nodes in a tree, we can find their relative path: -Create a node with intermediates via ``__setitem__`` +.. ipython:: python + :okexcept: + + bart.find_relative_path(herbert) You can use this feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, @@ -306,13 +323,11 @@ we can construct a complex tree quickly using the alternative constructor ``:py: dt = DataTree.from_dict(d) dt -Notice that this method will also create any intermediate empty node necessary to reach the end of the specified path -(i.e. the node labelled `"c"` in this case.) - .. note:: - You can even make the filesystem analogy concrete by using ``open_mfdatatree`` or ``save_mfdatatree`` # TODO not yet implemented - see GH issue 51 - + Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path + (i.e. the node labelled `"c"` in this case.) + This is to help avoid lots of redundant entries when creating deeply-nested trees using ``.from_dict``. .. _manipulating trees: @@ -341,6 +356,10 @@ Filter the Simpsons by age? Need to first recreate tree with age data in it +.. ipython:: + + simpsons.filter(node.age > 18) + leaves are either currently living or died out with no descendants Subset only the living leaves of the evolutionary tree? 
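The ``"../"`` syntax introduced in this patch resolves paths the same way a POSIX filesystem does. The sketch below is standalone and rests on one assumption: each node is modelled only by its absolute path string. The helper ``resolve`` is hypothetical and is not part of the DataTree API. It uses the standard-library ``posixpath`` module to show where ``primates["../../Two Fenestrae/Crocodiles"]`` ends up in the vertebrates tree.

```python
import posixpath

# Absolute path of the Primates node in the vertebrates example tree.
primates_path = "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"


def resolve(start: str, relative: str) -> str:
    """Resolve a node path against the absolute path of a starting node.

    Hypothetical helper: the tree is modelled purely as POSIX-style path
    strings, so ".." means "parent node" and "." means "this node".
    """
    if relative.startswith("/"):
        # An absolute path starts from the root and ignores the starting node.
        return posixpath.normpath(relative)
    # Join onto the starting node's path, then collapse "." and ".." segments.
    return posixpath.normpath(posixpath.join(start, relative))


crocodiles = resolve(primates_path, "../../Two Fenestrae/Crocodiles")
print(crocodiles)
```

The first ``".."`` climbs from Primates to Hair, and the second climbs to Amniotic Egg, which is where the new node is attached. Modelling nodes as strings skips error handling. For example, ``posixpath.normpath`` silently clamps a path that climbs above the root, and DataTree's own behaviour in that case may differ.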
From 2de37ecb03c01dddd361d36abd8be0f490da6b82 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 11:13:06 -0500 Subject: [PATCH 09/16] split PR into parts --- docs/source/hierarchical-data.rst | 73 ------------------------------- 1 file changed, 73 deletions(-) diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index 95f35519..ac29cae1 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -328,76 +328,3 @@ we can construct a complex tree quickly using the alternative constructor ``:py: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) This is to help avoid lots of redundant entries when creating deeply-nested trees using ``.from_dict``. - -.. _manipulating trees: - -Manipulating Trees ------------------- - -Altering Tree Branches -~~~~~~~~~~~~~~~~~~~~~~ - -pruning, grafting - -Tree of life? - -Graft new discoveries onto the tree? - -Prune when we realise something is in the wrong place? - -Save our updated tree out with ``to_dict`` - -Subsetting Tree Nodes -~~~~~~~~~~~~~~~~~~~~~ - -subset, filter - -Filter the Simpsons by age? - -Need to first recreate tree with age data in it - -.. ipython:: - - simpsons.filter(node.age > 18) - -leaves are either currently living or died out with no descendants -Subset only the living leaves of the evolutionary tree? - - -.. _tree computation: - -Computation ------------ - -Operations on Trees -~~~~~~~~~~~~~~~~~~~ - -Mapping of methods - -Arithmetic - -cause all Simpsons to age simultaneously - -Find total number of species -Find total biomass - -Mapping Custom Functions Over Trees -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.subtree, map_over_subtree - - -.. 
_multiple trees: - -Operating on Multiple Trees ---------------------------- - -Comparing trees -~~~~~~~~~~~~~~~ - -isomorphism - -Mapping over Multiple Trees -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -map_over_subtree with binary function From 1376f08945296b3ad8900a0dc5d626d661651258 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 4 Jan 2023 16:15:00 +0000 Subject: [PATCH 10/16] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- docs/source/data-structures.rst | 2 +- docs/source/hierarchical-data.rst | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/data-structures.rst b/docs/source/data-structures.rst index 98ad5af2..6bf1fe1a 100644 --- a/docs/source/data-structures.rst +++ b/docs/source/data-structures.rst @@ -108,7 +108,7 @@ or by dynamically updating the attributes of one node to refer to another: # add a second child by first creating a new node ... ds3 = xr.Dataset({"zed": np.NaN}) - node3 = DataTree(name='b', data=ds3) + node3 = DataTree(name="b", data=ds3) # ... then updating its .parent property node3.parent = dt diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index ac29cae1..6bdccf36 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -236,7 +236,7 @@ including ``keys``, ``values``, ``items``, ``__delitem__`` and ``update``. .. 
ipython:: python - vertebrates["Bony Skeleton"]["Ray-finned Fish"] + vertebrates["Bony Skeleton"]["Ray-finned Fish"] Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, so if we have a node that contains both children and data, calling ``.keys()`` will list both names of child nodes and @@ -246,7 +246,7 @@ names of data variables: dt = DataTree( data=xr.Dataset({"foo": 0, "bar": 1}), - children={"a": DataTree(), "b": DataTree()} + children={"a": DataTree(), "b": DataTree()}, ) print(dt) list(dt.keys()) From 1c200ff475951e85cb39bd570bfaf3910c9f9626 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Wed, 4 Jan 2023 12:02:51 -0500 Subject: [PATCH 11/16] Update docs/source/data-structures.rst Co-authored-by: Justus Magin --- docs/source/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/data-structures.rst b/docs/source/data-structures.rst index 6bf1fe1a..4417e099 100644 --- a/docs/source/data-structures.rst +++ b/docs/source/data-structures.rst @@ -118,7 +118,7 @@ Our tree now has three nodes within it: dt -Is is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: +It is at tree construction time that consistency checks are enforced. For instance, if we try to create a `cycle` the constructor will raise an error: .. 
ipython:: python :okexcept: From 6b7e430c98155545c25d6580ca50e689049689de Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 12:29:17 -0500 Subject: [PATCH 12/16] black --- docs/source/data-structures.rst | 2 +- docs/source/hierarchical-data.rst | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/data-structures.rst b/docs/source/data-structures.rst index 98ad5af2..6bf1fe1a 100644 --- a/docs/source/data-structures.rst +++ b/docs/source/data-structures.rst @@ -108,7 +108,7 @@ or by dynamically updating the attributes of one node to refer to another: # add a second child by first creating a new node ... ds3 = xr.Dataset({"zed": np.NaN}) - node3 = DataTree(name='b', data=ds3) + node3 = DataTree(name="b", data=ds3) # ... then updating its .parent property node3.parent = dt diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index ac29cae1..6bdccf36 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -236,7 +236,7 @@ including ``keys``, ``values``, ``items``, ``__delitem__`` and ``update``. .. 
ipython:: python - vertebrates["Bony Skeleton"]["Ray-finned Fish"] + vertebrates["Bony Skeleton"]["Ray-finned Fish"] Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, so if we have a node that contains both children and data, calling ``.keys()`` will list both names of child nodes and @@ -246,7 +246,7 @@ names of data variables: dt = DataTree( data=xr.Dataset({"foo": 0, "bar": 1}), - children={"a": DataTree(), "b": DataTree()} + children={"a": DataTree(), "b": DataTree()}, ) print(dt) list(dt.keys()) From d1880cfa5b007a5d96bb4b4ca8623aa6fbeb2f43 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 12:29:26 -0500 Subject: [PATCH 13/16] whatsnew --- docs/source/whats-new.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/whats-new.rst b/docs/source/whats-new.rst index 3fcf6905..8641a4e9 100644 --- a/docs/source/whats-new.rst +++ b/docs/source/whats-new.rst @@ -56,6 +56,8 @@ Documentation By `Tom Nicholas `_. - Added ``Terminology`` page. (:pull:`174`) By `Tom Nicholas `_. +- Added page on ``Working with Hierarchical Data`` (:pull:`179`) + By `Tom Nicholas `_. Internal Changes ~~~~~~~~~~~~~~~~ From 57d5b5dca27762ff256dbddc7ba6c9a016db99d0 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 17:28:06 -0500 Subject: [PATCH 14/16] get assign example working --- docs/source/hierarchical-data.rst | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index 6bdccf36..af084e71 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -122,15 +122,17 @@ We can see that Homer is aware of his parentage, and we say that Homer and his c In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. 
We can add Herbert to the family tree without displacing Homer by ``.assign``-ing another child to Abe: -# TODO write the ``assign`` or ``assign_nodes`` method on ``DataTree`` so that this example works - .. ipython:: python - :okexcept: herb = DataTree(name="Herb") abe.assign({"Herbert": herb}) -# TODO Name permanence of herb versus herbert (or abe versus abraham) +.. note:: + This example shows a minor subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, + but the original node was named "Herbert". Not only are names overriden when stored as keys like this, + but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herb.name == "Herb"`` still). + In other words, nodes are copied into trees, not inserted into them. + This is intentional, and mirrors the behaviour when storing named ``xarray.DataArray`` objects inside datasets. Certain manipulations of our tree are forbidden, if they would create an inconsistent result. In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own Grandfather. From 08f87499475efb08b61eab550901205525826fc4 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 17:41:07 -0500 Subject: [PATCH 15/16] fix some links to methods --- docs/source/hierarchical-data.rst | 39 ++++++++++++++++--------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index af084e71..27a67e4a 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -31,9 +31,9 @@ or even any combination of the above. Often datasets like this cannot easily fit into a single ``xarray.Dataset`` object, or are more usefully thought of as groups of related ``xarray.Dataset`` objects. -For this purpose we provide the ``DataTree`` class. +For this purpose we provide the :py:class:`DataTree` class. 
-This page explains in detail how to understand and use the different features of the ``DataTree`` class for your own heirarchical data needs. +This page explains in detail how to understand and use the different features of the :py:class:`DataTree` class for your own heirarchical data needs. .. _node relationships: @@ -55,7 +55,7 @@ Let's start by defining nodes representing the two siblings, Bart and Lisa Simps bart = DataTree(name="Bart") lisa = DataTree(name="Lisa") -Each of these node objects knows their own ``.name``, but they currently have no relationship to one another. +Each of these node objects knows their own :py:class:`~DataTree.name`, but they currently have no relationship to one another. We can connect them by creating another node representing a common parent, Homer Simpson: .. ipython:: python @@ -70,13 +70,13 @@ We now have a small family tree homer where we can see how these individual Simpson family members are related to one another. -The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the ``.siblings`` property: +The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~DataTree.siblings` property: .. ipython:: python list(bart.siblings) -But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's ``.children`` property to include her: +But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~DataTree.children` property to include her: .. ipython:: python @@ -97,14 +97,14 @@ That's good - updating the properties of our nodes does not break the internal c the fact that distant relatives can mate makes it a directed acyclic graph. Trees of ``DataTree`` objects cannot represent this. 
-Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his ``.parent`` property: +Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~DataTree.parent` property: .. ipython:: python abe = DataTree(name="Abe") homer.parent = abe -Abe is now the "root" of this tree, which we can see by examining the ``.root`` property of any node in the tree +Abe is now the "root" of this tree, which we can see by examining the :py:class:`~DataTree.root` property of any node in the tree .. ipython:: python @@ -120,7 +120,7 @@ We can see the whole tree by printing Abe's node or just part of the tree by pri We can see that Homer is aware of his parentage, and we say that Homer and his children form a "subtree" of the larger Simpson family tree. In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. -We can add Herbert to the family tree without displacing Homer by ``.assign``-ing another child to Abe: +We can add Herbert to the family tree without displacing Homer by :py:meth:`~DataTree.assign`-ing another child to Abe: .. ipython:: python @@ -136,7 +136,7 @@ We can add Herbert to the family tree without displacing Homer by ``.assign``-in Certain manipulations of our tree are forbidden, if they would create an inconsistent result. In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own Grandfather. -If we try similar time-travelling hijinks with Homer, we get a ``InvalidTreeError`` raised: +If we try similar time-travelling hijinks with Homer, we get a :py:class:`InvalidTreeError` raised: .. 
ipython:: python :okexcept: @@ -170,7 +170,7 @@ Let's use a different example of a tree to discuss more complex relationships be "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs" ] -We have used the ``.from_dict`` constructor method as an alternate way to quickly create a whole tree, +We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree, and :ref:`filesystem-like syntax `_ (to be explained shortly) to select two nodes of interest. .. ipython:: python @@ -182,7 +182,7 @@ rather than an evolutionary tree). Here both the species and the features used to group them are represented by ``DataTree`` node objects - there is no distinction in types of node. We can however get a list of only the nodes we used to represent species by using the fact that all those nodes have no children - they are "leaf nodes". -We can check if a node is a leaf with ``.is_leaf``, and get a list of all leaves with the ``.leaves`` property: +We can check if a node is a leaf with :py:meth:`~DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~DataTree.leaves` property: .. ipython:: python @@ -220,7 +220,7 @@ There are various ways to access the different nodes in a tree. Properties ~~~~~~~~~~ -We can navigate trees using the ``.parent`` and ``.children`` properties of each node, for example: +We can navigate trees using the :py:class:`~DataTree.parent` and :py:class:`~DataTree.children` properties of each node, for example: .. ipython:: python @@ -232,16 +232,17 @@ Dictionary-like interface ~~~~~~~~~~~~~~~~~~~~~~~~~ Children are stored on each node as a key-value mapping from name to child node. -They can be accessed and altered via the ``__getitem__`` and ``__setitem__`` syntax. -In general ``DataTree`` objects support almost the entire set of dict-like methods, -including ``keys``, ``values``, ``items``, ``__delitem__`` and ``update``. 
+They can be accessed and altered via the :py:class:`~DataTree.__getitem__` and :py:class:`~DataTree.__setitem__` syntax. +In general :py:class:`~DataTree.DataTree` objects support almost the entire set of dict-like methods, +including :py:meth:`~DataTree.keys`, :py:class:`~DataTree.values`, :py:class:`~DataTree.items`, +:py:meth:`~DataTree.__delitem__` and :py:meth:`~DataTree.update`. .. ipython:: python vertebrates["Bony Skeleton"]["Ray-finned Fish"] Note that the dict-like interface combines access to child ``DataTree`` nodes and stored ``DataArrays``, -so if we have a node that contains both children and data, calling ``.keys()`` will list both names of child nodes and +so if we have a node that contains both children and data, calling :py:meth:`~DataTree.keys` will list both names of child nodes and names of data variables: .. ipython:: python @@ -270,7 +271,7 @@ Each node is like a directory, and each directory can contain both more sub-dire .. note:: - You can even make the filesystem analogy concrete by using ``open_mfdatatree`` or ``save_mfdatatree`` # TODO not yet implemented - see GH issue 51 + You can even make the filesystem analogy concrete by using :py:func:`~DataTree.open_mfdatatree` or :py:func:`~DataTree.save_mfdatatree` # TODO not yet implemented - see GH issue 51 Datatree objects support a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence, @@ -312,7 +313,7 @@ Given two nodes in a tree, we can find their relative path: You can use this feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. 
If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, -we can construct a complex tree quickly using the alternative constructor ``:py:func::DataTree.from_dict``: +we can construct a complex tree quickly using the alternative constructor :py:meth:`DataTree.from_dict`: .. ipython:: python @@ -329,4 +330,4 @@ we can construct a complex tree quickly using the alternative constructor ``:py: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled `"c"` in this case.) - This is to help avoid lots of redundant entries when creating deeply-nested trees using ``.from_dict``. + This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`. From f5004472ad487d01c056e67c6030008ac3a99f65 Mon Sep 17 00:00:00 2001 From: Thomas Nicholas Date: Wed, 4 Jan 2023 17:50:55 -0500 Subject: [PATCH 16/16] relative_to example --- docs/source/hierarchical-data.rst | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/source/hierarchical-data.rst b/docs/source/hierarchical-data.rst index 27a67e4a..daf4b10f 100644 --- a/docs/source/hierarchical-data.rst +++ b/docs/source/hierarchical-data.rst @@ -124,13 +124,13 @@ We can add Herbert to the family tree without displacing Homer by :py:meth:`~Dat .. ipython:: python - herb = DataTree(name="Herb") - abe.assign({"Herbert": herb}) + herbert = DataTree(name="Herb") + abe.assign({"Herbert": herbert}) .. note:: This example shows a minor subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, but the original node was named "Herbert". Not only are names overriden when stored as keys like this, - but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herb.name == "Herb"`` still). 
+ but the new node is a copy, so that the original node that was reference is unchanged (i.e. ``herbert.name == "Herb"`` still). In other words, nodes are copied into trees, not inserted into them. This is intentional, and mirrors the behaviour when storing named ``xarray.DataArray`` objects inside datasets. @@ -304,16 +304,15 @@ We can use this with ``__setitem__`` to add a missing entry to our evolutionary primates["../../Two Fenestrae/Crocodiles"] = DataTree() print(vertebrates) -Given two nodes in a tree, we can find their relative path: +Given two nodes in a tree, we can also find their relative path: .. ipython:: python - :okexcept: - bart.find_relative_path(herbert) + bart.relative_to(lisa) -You can use this feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. +You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding ``xarray.Dataset`` objects in a single step. If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, -we can construct a complex tree quickly using the alternative constructor :py:meth:`DataTree.from_dict`: +we can construct a complex tree quickly using the alternative constructor :py:meth:`DataTree.from_dict()`: .. ipython:: python
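The final version of the page covers two related ideas. It finds the relative path between two nodes with ``relative_to``, and it notes that ``from_dict`` creates any intermediate nodes implied by its path keys. The sketch below approximates both ideas with plain path strings. The helpers ``relative_path`` and ``implied_nodes`` are hypothetical, and the example paths are illustrative only. The sketch does not claim to match the argument order, return format or error handling of the real DataTree methods.

```python
import posixpath


def relative_path(node_path: str, from_path: str) -> str:
    """Path to the node at ``node_path`` as seen from the node at ``from_path``.

    Hypothetical helper: assumes both arguments are absolute paths in the same tree.
    """
    return posixpath.relpath(node_path, start=from_path)


def implied_nodes(paths):
    """Return every node path implied by ``paths``, including intermediate nodes.

    Hypothetical helper illustrating the idea that building a tree from path
    keys also creates the empty intermediate nodes needed to reach each path.
    """
    nodes = {"/"}  # the root node always exists
    for path in paths:
        parts = [p for p in path.strip("/").split("/") if p]
        # Add every prefix of the path, e.g. /a, /a/b, /a/b/c
        for i in range(1, len(parts) + 1):
            nodes.add("/" + "/".join(parts[:i]))
    return sorted(nodes)


print(relative_path("/Homer/Bart", "/Homer/Lisa"))
print(implied_nodes(["/a/b/c", "/a/d"]))
```

Here ``implied_nodes`` reports ``"/a"`` and ``"/a/b"`` even though neither was listed explicitly. These are the empty intermediate nodes that the note on ``from_dict`` refers to.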