Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assigning to DataTree.coords should support full paths #9485

Open
shoyer opened this issue Sep 12, 2024 · 3 comments
Open

Assigning to DataTree.coords should support full paths #9485

shoyer opened this issue Sep 12, 2024 · 3 comments
Labels
bug topic-DataTree Related to the implementation of a DataTree class

Comments

@shoyer
Copy link
Member

shoyer commented Sep 12, 2024

What is your issue?

When assigning to a full path with DataTree.coords, I expected to assign a coordinate to a child, not the root.

Here's the behavior on main:

In [36]: tree = DataTree(Dataset(coords={'x': 0}), children={'child': DataTree()})

In [37]: tree.coords['/child/y'] = 2

In [38]: tree
Out[38]:
<xarray.DataTree>
Group: /
│   Dimensions:   ()
│   Coordinates:
│       x         int64 8B 0
│       /child/y  int64 8B 2
└── Group: /child

Instead, I expected the result of assigning directly to the child node (without the duiplicate inherited coordinate, of course):

In [40]: tree['/child'].coords['y'] = 2

In [41]: tree
Out[41]:
<xarray.DataTree>
Group: /
│   Dimensions:  ()
│   Coordinates:
│       x        int64 8B 0
└── Group: /child
        Dimensions:  ()
        Coordinates:
            x        int64 8B 0
            y        int64 8B 2
@shoyer shoyer added bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class labels Sep 12, 2024
@TomNicholas TomNicholas removed the needs triage Issue that has not been reviewed by xarray team member label Sep 12, 2024
@TomNicholas
Copy link
Member

Oh dear. The problem is that DataTreeCoordinates.update() assumes you only want to modify the local node. In fact the same bug exists for DataTree.update, for the same reason:

In [11]: dt = DataTree(Dataset(coords={'x': 0}), children={'child': DataTree()})

In [12]: dt.update({'child/y': Variable(data=2, dims=())})

In [13]: dt
Out[13]: 
<xarray.DataTree>
Group: /Dimensions:  ()
│   Coordinates:
│       x        int64 8B 0Data variables:
│       child/y  int64 8B 2
└── Group: /child

DataTree.__setitem__ is correct though:

In [15]: dt['child/y'] = Variable(data=2, dims=())

In [16]: dt
Out[16]: 
<xarray.DataTree>
Group: /Dimensions:  ()
│   Coordinates:
│       x        int64 8B 0
└── Group: /child
        Dimensions:  ()
        Data variables:
            y        int64 8B 2

I think some refactoring to direct all these through the same codepath would help. Perhaps the general pattern should be more like:

class DataTree:
    def update(self, other: Mapping[str, Any]):
        for k, v in other.items():
            path = NodePath(k)
            node_to_update = self._walk_to(path)
            node_to_update._local_update({k: v})

@TomNicholas
Copy link
Member

Note that merging #9378 would have prevented the variable being assigned, instead raising a ValueError.

TomNicholas added a commit to TomNicholas/xarray that referenced this issue Sep 13, 2024
@TomNicholas
Copy link
Member

Note that merging #9378 would have prevented the variable being assigned, instead raising a ValueError.

This wasn't actually true - due to different code paths #9492 was also necessary to prevent this problem. I will refactor to make all updates go through the same code path in a later PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

No branches or pull requests

2 participants