groupby on array with multiindex renames indices #6313

Closed
headtr1ck opened this issue Mar 1, 2022 · 1 comment · Fixed by #5692

Comments

headtr1ck (Collaborator) commented Mar 1, 2022

What happened?

When grouping and reducing an array or dataset over a multi-index, the coordinates that make up the multi-index get renamed to "{name_of_multiindex}_level_{i}".

It only works correctly when the multi-index is a "homogeneous grid", i.e. as obtained by stacking.
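
For reference, a multi-index of the "homogeneous grid" kind can be produced by stacking a small 2-D array. This is only a sketch: the array `grid` and its values are made up for illustration and are chosen to mirror the first case of the example in this report.

```python
import xarray as xr

# Hypothetical 2x2 array, chosen so that stacking reproduces the "works" case.
grid = xr.DataArray(
    [[0, 1], [2, 3]],
    dims=("x", "y"),
    coords={"x": [0, 1], "y": [0, 1]},
)

# Stacking x and y yields a multi-index "t" that contains every (x, y) pair once.
stacked = grid.stack(t=("x", "y"))

pairs = list(stacked.indexes["t"])
print(pairs)
print(stacked.indexes["t"].is_unique)
```

Every (x, y) pair appears exactly once in such an index. The failing case in this report differs in that some pairs repeat, so the grouped reduction has to combine several elements per group.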

What did you expect to happen?

I expect that all coordinates keep their initial names.

Minimal Complete Verifiable Example

import xarray as xr

# this works:

d = xr.DataArray(range(4), dims="t", coords={"x": ("t", [0, 0, 1, 1]), "y": ("t", [0, 1, 0, 1])})
dd = d.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1

dd.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([0., 1., 2., 3.])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1


# this does not work
d2 = xr.DataArray(range(6), dims="t", coords={"x": ("t", [0, 0, 1, 1, 0, 1]), "y": ("t", [0, 1, 0, 1, 0, 0])})
dd2 = d2.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 6)>
# array([0, 1, 2, 3, 4, 5])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1 0 1
#   - y        (t) int64 0 1 0 1 0 0

dd2.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([2. , 1. , 3.5, 3. ])
# Coordinates:
#   * t          (t) MultiIndex
#   - t_level_0  (t) int64 0 0 1 1
#   - t_level_1  (t) int64 0 1 0 1
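
The expectation can also be written as a small check that compares the multi-index level names before and after the grouped reduction. This is a sketch rather than part of the original report: `level_names` is a hypothetical helper, and it assumes that `.indexes[dim]` returns the underlying pandas MultiIndex.

```python
import xarray as xr


def level_names(obj, dim):
    """Hypothetical helper: level names of the pandas MultiIndex behind `dim`."""
    return list(obj.indexes[dim].names)


# Same data as the failing case in the example: some (x, y) pairs repeat.
d2 = xr.DataArray(
    [0, 1, 2, 3, 4, 5],
    dims="t",
    coords={"x": ("t", [0, 0, 1, 1, 0, 1]), "y": ("t", [0, 1, 0, 1, 0, 0])},
)
dd2 = d2.set_index({"t": ["x", "y"]})

before = level_names(dd2, "t")
after = level_names(dd2.groupby("t").mean(...), "t")

# The report expects these to be equal; on affected versions they differ.
print("before:", before)
print("after: ", after)
print("names preserved:", before == after)
```

Going by the output shown in this report, on xarray 0.21.1 the levels come back as t_level_0 and t_level_1, so the last line would be expected to report that the names were not preserved there.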

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.1 (default, Jan 13 2021, 15:21:08)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.21.1
pandas: 1.4.0
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 49.2.1
pip: 22.0.3
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: None

headtr1ck added the bug and needs triage labels Mar 1, 2022

benbovy (Member) commented Mar 1, 2022

This is addressed in #5692:

dd2.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([2. , 1. , 3.5, 3. ])
# Coordinates:
#   * t        (t) object MultiIndex
#   * x        (t) int64 0 0 1 1
#   * y        (t) int64 0 1 0 1

benbovy mentioned this issue Mar 1, 2022
dcherian removed the needs triage label Mar 1, 2022