This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Method to match node paths via glob #267

Merged on Oct 24, 2023 (12 commits)
50 changes: 50 additions & 0 deletions datatree/datatree.py
@@ -1175,8 +1175,13 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree:
        filterfunc: function
            A function which accepts only one DataTree - the node on which filterfunc will be called.

        Returns
        -------
        DataTree

        See Also
        --------
        match
        pipe
        map_over_subtree
        """
@@ -1185,6 +1190,51 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree:
        }
        return DataTree.from_dict(filtered_nodes, name=self.root.name)

    def match(self, pattern: str) -> DataTree:
        """
        Return nodes with paths matching pattern.

        Uses unix glob-like syntax for pattern-matching.

        Parameters
        ----------
        pattern: str
            A pattern to match each node path against.

        Returns
        -------
        DataTree

        See Also
        --------
        filter
        pipe
        map_over_subtree

        Examples
        --------
        >>> dt = DataTree.from_dict(
        >>>     {
Member Author:

@keewis blackdoc raised an error here:

error: cannot format /code/datatree/datatree.py: Cannot parse: 1217:0: EOF in multi-line statement

Contributor:

that's not proper doctest, you need to use ... for continuation lines.

I guess blackdoc could be extended to detect this, but that would mean that it would have to parse python code itself (which it does not so far, and would make it a fair bit more complicated): it detects code blocks using the doctest syntax and then applies black to each of them separately.

Member Author:

Oh - I copied this from syntax used in the docstring of xarray's Dataset.map. So I guess it should look like this example from pint-xarray. Thanks!

Member Author:

Actually no I'm just being an idiot
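
To make the continuation-line point from this thread concrete, here is a minimal sketch that uses only the standard-library `doctest` parser. The sample strings and the helper names `example_sources` and `compiles` are illustrative and not taken from the PR. The sketch suggests why the parse error appears: every `>>>` line starts a new example, so a multi-line statement only stays in one piece when its continuation lines use `...`.

```python
import doctest

# Illustrative docstring bodies (not from the PR): the same dict literal
# written with ">>>" on every line versus "..." on continuation lines.
WRONG = '''
>>> d = {
>>>     "a": 1,
>>> }
>>> d["a"]
1
'''

RIGHT = '''
>>> d = {
...     "a": 1,
... }
>>> d["a"]
1
'''


def example_sources(text):
    """Return the source of each example that the doctest parser finds."""
    parser = doctest.DocTestParser()
    return [example.source for example in parser.get_examples(text)]


def compiles(source):
    """Report whether a single example is a complete Python statement."""
    try:
        compile(source, "<doctest>", "single")
        return True
    except SyntaxError:
        return False


wrong_sources = example_sources(WRONG)
right_sources = example_sources(RIGHT)

print(len(wrong_sources), compiles(wrong_sources[0]))
print(len(right_sources), all(compiles(s) for s in right_sources))
```

With `>>>` on every line the parser sees four separate examples, and the first one (`d = {`) is an unterminated statement. With `...` it sees two complete examples.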

        >>>         "/a/A": None,
        >>>         "/a/B": None,
        >>>         "/b/A": None,
        >>>         "/b/B": None,
        >>>     }
        >>> )
        >>> dt.match("*/B")
        DataTree('None', parent=None)
        ├── DataTree('a')
        │   └── DataTree('B')
        └── DataTree('b')
            └── DataTree('B')
        """
        matching_nodes = {
            node.path: node.ds
            for node in self.subtree
            if NodePath(node.path).match(pattern)
        }
        return DataTree.from_dict(matching_nodes, name=self.root.name)

    def map_over_subtree(
        self,
        func: Callable,
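
The comprehension in `match` delegates the pattern test to `NodePath(node.path).match(pattern)`. As a rough illustration of that glob-style test, the sketch below uses the standard library's `pathlib.PurePosixPath` as a stand-in, on the assumption that `NodePath` follows the same matching rules. `match_paths` is a hypothetical helper, not part of datatree.

```python
from pathlib import PurePosixPath


def match_paths(paths, pattern):
    """Keep the paths whose PurePosixPath.match(pattern) test succeeds."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


# Node paths for the tree in the docstring example, including the root
# and the intermediate nodes.
paths = ["/", "/a", "/a/A", "/a/B", "/b", "/b/A", "/b/B"]

# A relative pattern such as "*/B" is matched from the right-hand end
# of each path, so the leading "/" does not need to appear in the pattern.
selected = match_paths(paths, "*/B")
print(selected)
```

Under that assumption only `/a/B` and `/b/B` are selected, which lines up with the two leaf nodes in the docstring's expected output. The parents `a` and `b` appear in the result tree because they are needed to hold the matched leaves.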
19 changes: 19 additions & 0 deletions datatree/tests/test_datatree.py
@@ -678,6 +678,25 @@ def f(x, tree, y):


class TestSubset:
    def test_match(self):
        # TODO is this example going to cause problems with case sensitivity?
        dt = DataTree.from_dict(
            {
                "/a/A": None,
                "/a/B": None,
                "/b/A": None,
                "/b/B": None,
            }
        )
        result = dt.match("*/B")
        expected = DataTree.from_dict(
            {
                "/a/B": None,
                "/b/B": None,
            }
        )
        dtt.assert_identical(result, expected)

    def test_filter(self):
        simpsons = DataTree.from_dict(
            d={
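
On the case-sensitivity TODO in `test_match`: assuming `NodePath` matches the way `pathlib.PurePosixPath` does (an assumption, not something this diff confirms), the comparison would be case-sensitive, so `/a/B` and `/b/B` stay distinct from `/a` and `/b`. A small standard-library sketch of the difference between the POSIX and Windows path flavours:

```python
from pathlib import PurePosixPath, PureWindowsPath

# POSIX-flavoured paths compare case-sensitively, so "*/b" does not
# match a path whose last component is "B".
posix_upper = PurePosixPath("/a/B").match("*/B")
posix_lower = PurePosixPath("/a/B").match("*/b")

# Windows-flavoured paths fold case before comparing, so the same
# lower-case pattern does match.
windows_lower = PureWindowsPath("/a/B").match("*/b")

print(posix_upper, posix_lower, windows_lower)
```

If the stand-in assumption holds, the test's mix of `a`/`A` and `b`/`B` names is safe regardless of the operating system the tests run on, because the pure POSIX flavour does not depend on the host platform.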
1 change: 1 addition & 0 deletions docs/source/api.rst
@@ -102,6 +102,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure.
   DataTree.find_common_ancestor
   map_over_subtree
   DataTree.pipe
   DataTree.match
   DataTree.filter

DataTree Contents
18 changes: 17 additions & 1 deletion docs/source/hierarchical-data.rst
@@ -379,7 +379,23 @@ Subsetting Tree Nodes

We can subset our tree to select only nodes of interest in various ways.

The :py:meth:`DataTree.filter` method can be used to retain only the nodes of a tree that meet a certain condition.
As on a real filesystem, it is often useful to match nodes by common patterns in their paths.
We can use :py:meth:`DataTree.match` for this:

.. ipython:: python

    dt = DataTree.from_dict(
        {
            "/a/A": None,
            "/a/B": None,
            "/b/A": None,
            "/b/B": None,
        }
    )
    result = dt.match("*/B")

We can also subset trees by the contents of the nodes.
:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition.
For example, we could recreate the Simpsons' family tree with the age of each individual, then filter for only the adults.
First let's recreate the tree, but with an `age` data variable in every node:

2 changes: 2 additions & 0 deletions docs/source/whats-new.rst
@@ -23,6 +23,8 @@ v0.0.13 (unreleased)
New Features
~~~~~~~~~~~~

- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`)
  By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Indicate which node caused the problem if an error is encountered while applying a user function using :py:func:`map_over_subtree`
  (:issue:`190`, :pull:`264`). Only works when using Python 3.11 or later.
  By `Tom Nicholas <https://github.com/TomNicholas>`_.