Skip to content
This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Error in datatree.DataTree.sel when attributes are set #262

Closed
m-albert opened this issue Oct 23, 2023 · 5 comments · Fixed by #263
Closed

Error in datatree.DataTree.sel when attributes are set #262

m-albert opened this issue Oct 23, 2023 · 5 comments · Fixed by #263

Comments

@m-albert
Copy link

Hi, thank you for your work on datatree.

I've been using datatree in the context of https://github.com/spatial-image/multiscale-spatial-image and stumbled upon an error: If I try to call datatree.DataTree.sel on a datatree without attributes, the function works as expected. However if the tree has an attribute set, I get the following error:

site-packages/xarray/core/indexing.py:139, in group_indexers_by_index(obj, indexers, options)
    137     raise KeyError(f"no index found for coordinate {key!r}")
    138 elif key not in obj.dims:
--> 139     raise KeyError(f"{key!r} is not a valid dimension or coordinate")
    140 elif len(options):
    141     raise ValueError(
    142         f"cannot supply selection options {options!r} for dimension {key!r}"
    143         "that has no associated coordinate or index"
    144     )

KeyError: "'dim_0' is not a valid dimension or coordinate"

Here's a reproducible example:

import datatree
import xarray as xr
import numpy as np

xs = xr.Dataset({'testvar': xr.DataArray(np.ones((2,3)))})
dt = datatree.DataTree.from_dict({'node1': xs1, 'node2': xs1})
dt.attrs['test_key'] = 1 # sel works fine without this line
dt.sel({'dim_0': 0})

I'm using datatree==0.0.12.

@m-albert m-albert changed the title Error in Datatree.sel when attributes are set Error in datatree.DataTree.sel when attributes are set Oct 23, 2023
@TomNicholas
Copy link
Member

TomNicholas commented Oct 23, 2023

Hi @m-albert , thanks for reporting this, and for the reproducible example!

This error confused me initially, but it's actually a consequence of dt.is_empty returning False if a node contains any attrs. map_over_subtree (which is being used internally to apply .sel() to every node) first checks to see if nodes are empty before applying the mapped function. Adding the attribute changes it from empty, so skipped, to non-empty, so it tries to call .sel() on the root node, which fails as it should.

The error message could obviously be much clearer (see also #190) - I think you would have realised what had happened if the error message told you it had failed when applying to the root node.

It's a bit debatable whether map_over_subtree should map over nodes with anything inside them, or only over nodes that have actual data in them.

@m-albert
Copy link
Author

Hi @TomNicholas, thank you very much for looking into this!

This error confused me initially, but it's actually a consequence of dt.is_empty returning False if a node contains any attrs. map_over_subtree (which is being used internally to apply .sel() to every node) first checks to see if nodes are empty before applying the mapped function. Adding the attribute changes it from empty, so skipped, to non-empty, so it tries to call .sel() on the root node, which fails as it should.

This makes perfect sense, thank you.

The error message could obviously be much clearer (see also #190) - I think you would have realised what had happened if the error message told you it had failed when applying to the root node.

Probably 😁

It's a bit debatable whether map_over_subtree should map over nodes with anything inside them, or only over nodes that have actual data in them.

True. In my use case, attaching attributes to a (data-)empty root node serves the function of keeping attributes that are equally valid for all nodes in the tree. I see that this could be problematic though because of introducing two types of nodes (if I'm not misunderstanding sth here).

Maybe slightly related to that. I'm relatively new to xarray, but several times found myself in the situation of wishing that .sel on an object would simply ignore coordinates / dimensions that are unavailable (at least optionally). With such a behaviour, applying .sel to all nodes including (data-)empty ones would prevent the problem reported above.
could be applied to empty nodes in the case discussed above.

@keewis
Copy link
Contributor

keewis commented Oct 24, 2023

note that in recent versions of xarray, drop_sel has a errors parameter. I wonder if sel should get something similar?

@m-albert
Copy link
Author

Oh nice, didn't know that about drop_sel. Sounds like a very useful option for sel to me, which probably wouldn't introduce breaking changes if errors='raise' by default.

@TomNicholas
Copy link
Member

I wonder if sel should get something similar?

Sounds like a similar suggestion to #67

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants