assign_coords' behavior depends on input DataArrays #8180
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Thanks for the issue. What would you expect the output to be? It does seem surprising that passing two arguments succeeds while passing each of them alone fails...
A partial look: adding
data.sel(d1='m')==0
<xarray.DataArray (d2: 3)>
array([ True, False, False])
Coordinates:
    d1       <U1 'm'
  * d2       (d2) <U1 'a' 'b' 'c'

[nav] In [36]: data.assign_coords(
          ...:     {
          ...:         'mask_d1_m': (data.sel(d1='n')==0).drop_vars('d1')
          ...:         # 'mask_d1_n': data.sel(d1='n')==0
          ...:     }
          ...: )

...though I'm not sure why it succeeds when there are two arguments.
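To make the observation above concrete, here is a minimal sketch (not the maintainers' diagnosis) of what the selection carries along. It rebuilds the 2x3 array from this thread, passing `dims` explicitly as a precaution, and assumes only that `sel` on a single label keeps that label as a scalar coordinate, as the repr above shows. The names `mask`, `clean` and `result` are illustrative.

```python
import xarray as xr

# Same data as in the thread; dims are given explicitly to avoid relying on inference.
data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# Selecting one label along d1 removes the d1 dimension,
# but d1 stays attached to the result as a scalar (0-dimensional) coordinate.
mask = data.sel(d1="m") == 0
print(sorted(mask.coords))      # coordinate names carried by the mask
print(mask.coords["d1"].ndim)   # number of dimensions of the leftover d1 coordinate

# Dropping that scalar coordinate leaves only the d2 labels,
# so the mask can be attached without clashing with the d1 dimension of data.
clean = mask.drop_vars("d1")
result = data.assign_coords({"mask_d1_m": clean})
print(result)
```

The point of the sketch is only that the leftover scalar `d1` shares its name with a dimension of `data`; why the two-argument call still succeeds is not explained by it.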
Hi, thanks for the reply and testing! I guess I would expect my examples to either all fail or all succeed. Sorry I didn't include what I would expect; I didn't really know what to expect since I hadn't thought about why it failed. (But it seems like it should fail, as it has now become clearer to me that my
Edit:
import xarray as xr
data = xr.DataArray(
    data=[
        [0, 1, 2],
        [0, 1, 2]
    ],
    coords={
        'd1': ['m', 'n'],
        'd2': ['a', 'b', 'c']
    }
)
# so this will FAIL
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        #                        ^^^
        'mask_d1_n': data.sel(d1='m')==1,
        #                        ^^^
    }
)
# but this will SUCCEED
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        #                        ^^^
        'mask_d1_n': data.sel(d1='n')==1,
        #                        ^^^
    }
)
Just wanted to give an update: for my own application (computing masks for a DataArray from its own data, then attaching the resulting masks to the DataArray as new coords), I should make sure from the start that my masks contain only labels for their own dimensions. So, something like
# compute mask based on my data
# mask = data.sel(d1=...) > whatever
# remove labels for extra dimensions
mask = mask.drop_vars([v for v in mask.coords if v not in mask.dims])
# assign mask as coords for the original DataArray
data = data.assign_coords({'my_mask': mask})
But the inconsistent behavior still seems like a bug, or there is some magic happening while multiple coord assignments are combined that is not explicitly documented. It is not too much of an issue, as
Therefore, I'll leave this issue open, but please feel free to close it if this isn't something to be fixed. From my point of view, the inconsistency is surprising, but not a major issue. Thank you for taking the time to test!
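The workaround described in this comment can be wrapped in a small function. The following is a sketch under stated assumptions: `drop_non_dim_coords` is a hypothetical helper name, not part of xarray, and the coordinate names `mask_d1_m_eq_0` and `mask_d1_m_eq_1` are made up for illustration. Both masks are taken from `d1='m'`, the combination reported as failing earlier in the thread, and are stripped before being assigned.

```python
import xarray as xr


def drop_non_dim_coords(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: keep only coordinates named after one of da's dimensions."""
    extra = [name for name in da.coords if name not in da.dims]
    return da.drop_vars(extra)


data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# Both masks come from the same d1 label; the scalar d1 coordinate
# is removed from each before they are attached to the original array.
masks = {
    "mask_d1_m_eq_0": drop_non_dim_coords(data.sel(d1="m") == 0),
    "mask_d1_m_eq_1": drop_non_dim_coords(data.sel(d1="m") == 1),
}
result = data.assign_coords(masks)
print(result.coords)
```

Note that this filter also discards any non-dimension coordinate that happens to lie along a kept dimension, which may or may not be what a given application wants.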
As far as I can tell, the cause for the surprising/inconsistent behavior is that
In your case, I think the easiest way to work around this is to use
Hi @keewis - Thank you for the explanation! Yeah, I was digging into the codebase a little bit, but unfortunately, as is probably evident, I'm not that familiar with xarray's internals, so I was a bit lost. Not that I completely understand now, but I am grateful for an explanation so I know I'm not crazy 🤣 Also, thank you for the recommended route with the
Really appreciate all of your help : )
Since we are relaxing the constraints that are related to dimension coordinates (e.g., #7989), I'm wondering if we couldn't also relax the case where a scalar coordinate has the same name as a dimension. I don't think that this would help much here, though. Using data.assign_coords({'mask_d1_m': (data.sel(d1='m')==0).variable}) |
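As a sketch of the `.variable` call quoted in the comment above: `DataArray.variable` exposes the underlying `xarray.Variable`, which holds dimensions and values but no coordinates, so there is no scalar `d1` to conflict with. The setup repeats the array from this thread with `dims` passed explicitly; `mask` and `result` are illustrative names.

```python
import xarray as xr

data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

mask = data.sel(d1="m") == 0

# The Variable carries only dims and values; the d1 and d2 coordinates are left behind.
print(type(mask.variable).__name__, mask.variable.dims)

# Assigning the bare Variable attaches the mask along d2 of the original array.
result = data.assign_coords({"mask_d1_m": mask.variable})
print(result.coords["mask_d1_m"])
```

Because no labels travel with the Variable, nothing is aligned against `d2`; this relies on the mask having the same length and order along `d2` as `data`, which holds here since the mask was computed from `data` itself.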
What happened?
I'm trying to compute masks (from the DataArray's own data) and assign them as coordinates, but it appears that, depending on the combination of coords/dims of the computed masks, .assign_coords will sometimes fail.
It seems like
It's a bit hard to describe as I don't know the xarray internals, but my self-contained minimal example below should demonstrate the issue much more clearly.
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:11:32)
[Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.24.0
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.15.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: 0.21
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.1
mypy: None
IPython: 8.12.2
sphinx: 4.5.0