Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset.copy(deep=True) does not deepcopy .attrs #2835

Closed
kefirbandi opened this issue Mar 21, 2019 · 12 comments · Fixed by #7089
Closed

Dataset.copy(deep=True) does not deepcopy .attrs #2835

kefirbandi opened this issue Mar 21, 2019 · 12 comments · Fixed by #7089
Labels
bug needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports regression

Comments

@kefirbandi
Copy link
Contributor

But it would be expected (at least by me) that it does.

@dcherian
Copy link
Contributor

Thanks @kefirbandi, can you send in a PR?

@DerPlankton13
Copy link

Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.

@Ch-Meier
Copy link

Ch-Meier commented Jul 25, 2022

Jumping from 2022.3.0 to 2022.6.0 this issue has re-emerged for me.

I am seeing the same issue as well.

@headtr1ck
Copy link
Collaborator

Cannot reproduce on current master using

import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1})
ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
print(ds2.attrs)  # returns: {'t': 1}

@keewis
Copy link
Collaborator

keewis commented Sep 4, 2022

even with

In [1]: import xarray as xr
   ...: 
   ...: ds = xr.Dataset({"a": ("x", [1, 2, 3], {"t": 0})}, attrs={"t": 1})
   ...: ds2 = ds.copy(deep=True)
   ...: ds.attrs["t"] = 5
   ...: ds.a.attrs["t"] = 6
   ...: 
   ...: display(ds2, ds2.a)
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 1 2 3
Attributes:
    t:        1
<xarray.DataArray 'a' (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
    t:        0

I cannot reproduce. @Ch-Meier, @DerPlankton13, can either of you post a minimal example demonstrating the issue?

@dcherian dcherian added bug needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports regression and removed regression bug labels Sep 21, 2022
@phockett
Copy link
Contributor

phockett commented Sep 23, 2022

Just for the record, I just ran into this for the specific case of nested dictionary attrs in Dataarray.attrs.

It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...

# MINIMAL EXAMPLE

import xarray as xr
import numpy as np

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data.attrs['flat']='0'
data.attrs['nested']={'level1':'1'}

data2 = data.copy(deep=True)
data2.attrs['flat']='2'  # OK
# data2.attrs['nested']={'level1':'2'}  # OK
# data2.attrs['nested']['level1'] = '2'  # Fails - overwrites data
data2.attrs['nested'].update({'level1':'2'})  # Fails - overwrites data

print(data.attrs)
print(data2.attrs)

Outputs

In XR 2022.3.0 and 2022.6.0 this gives (incorrect):

{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}

As a work-around, safe attrs copy with deepcopy works:

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)

With correct results after modification:

{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}

EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.

@headtr1ck
Copy link
Collaborator

I see, so it was a problem of recursively copying the attrs.
@phockett would you be open to add your example as a test in a PR such that we can avoid this in the future?

@phockett
Copy link
Contributor

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.

@headtr1ck
Copy link
Collaborator

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.

Great! Just ask if you need help :)

Basically: fork repo, make new branch on your forked repo, add test, create PR (when you push GitHub should show a popup asking if you want to create a PR anyway).

@phockett
Copy link
Contributor

phockett commented Sep 26, 2022

OK, new test now pushed as #7086. (Hopefully added in the right place and style!)

A couple of additional notes:

  • Revision to my comment above: this actually fails in 2022.3 and 2022.6 for nested attribs.
  • I took a look at the source code in dataarray.py, but couldn't see an obvious way to fix this and/or didn't understand the attrs copying process generally.
  • I tested the equivalent case for DataSet attrs too (see below), and this seems fine as per your previous comments above, so I think Fixing deepcopy for Dataset.attrs #2839 (which includes a ds level test) still applies to ds.attrs, however the issue does affect the individual arrays within the dataset still (as expected).
import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1, "nested":{"t2": 1}})
ds.a.attrs = {"t": 'a1', "nested":{"t2": 'a1'}}

ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
ds.attrs["nested"]["t2"] = 10

ds2.a.attrs["t"] = 'a2'
ds2.a.attrs["nested"]["t2"] = 'a2'


print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)

Results in:

{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}

@headtr1ck
Copy link
Collaborator

Ok, I thought that copying attrs was fixed. Seems like it did not...
I will create a PR to fix this first, then we can use your test :)

@phockett
Copy link
Contributor

Ok, I thought that copying attrs was fixed. Seems like it did not...

Sorry for the mix-up there - think I initially tested in 2022.6 with the extra data2.attrs = copy.deepcopy(data.attrs) implemented. Caught it with the new test routine 😄

headtr1ck pushed a commit that referenced this issue Sep 28, 2022
* Added test_deepcopy_nested_attrs() (#2835).
* mark xfail
Co-authored-by: Michael Niklas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants