Scalar slice of MultiIndex is turned to tuples #3432
Do you have a reproducible example, as per the issue instructions?
@max-sixty here you go:
import xarray as xr
print(xr.__version__)
ds = xr.Dataset({
"test": xr.DataArray(
[[[1,2],[3,4]], [[1,2],[3,4]]],
dims=("genes", "individuals", "subtissues"),
coords={
"genes": ["a", "b"],
"individuals": ["c", "d"],
"subtissues": ["e", "f"],
}
)
})
print(ds)
stacked = ds.stack(observations=["individuals", "subtissues"])
print(stacked)
print(stacked.isel(observations=1))
result:
Not a regression. I've gone back as far as xarray 0.12 and pandas 0.19, and it has always been like this. The issue is inherited straight from pandas:
>>> df = stacked.test.to_pandas()
>>> df
individuals  c     d
subtissues   e  f  e  f
genes
a            1  2  3  4
b            1  2  3  4
>>> df.iloc[:, 1]
genes
a    2
b    2
Name: (c, f), dtype: int64
I'm not sure whether we should write an ad-hoc object in xarray for scalar multiindices. The alternative is to think of a more systematic solution in pandas, which likely implies creating an ad-hoc subclass of tuple that is basically a pickle-able namedtuple. In both cases, the change is very large.
The third and significantly easier option is that, on sel/isel, xarray automatically unstacks any scalar slice of a multiindex. The 'observations' coord would then simply disappear, leaving only 'individuals' and 'subtissues'.
@shoyer what's your opinion?
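Until something like the third option exists in xarray itself, a user-side approximation is to drop the MultiIndex before taking the scalar slice. The following is a minimal sketch, assuming the toy dataset from the example above; it uses `reset_index` to turn the levels into ordinary coordinates first, and it is a workaround, not the behavior proposed for xarray.

```python
import xarray as xr

# Toy dataset from the reproducible example in this thread.
ds = xr.Dataset({
    "test": xr.DataArray(
        [[[1, 2], [3, 4]], [[1, 2], [3, 4]]],
        dims=("genes", "individuals", "subtissues"),
        coords={
            "genes": ["a", "b"],
            "individuals": ["c", "d"],
            "subtissues": ["e", "f"],
        },
    )
})
stacked = ds.stack(observations=["individuals", "subtissues"])

# Drop the MultiIndex so that the levels become plain coordinates
# along the "observations" dimension, then take the scalar slice.
flat = stacked.reset_index("observations")
selected = flat.isel(observations=1)

# Each level survives as its own scalar coordinate.
print(selected.coords["individuals"].item())
print(selected.coords["subtissues"].item())
```

The trade-off is that label-based selection on `observations` is no longer available on `flat`, because the index has been removed.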
I think the right long-term solution for xarray is to always store separate This looks like @crusaderky's third option. We'll need to finish up the big "explicit indexes" refactor first to make this viable.
@Hoeze this is now implemented in #5692:
>>> stacked.isel(observations=1)
<xarray.Dataset>
Dimensions:       (genes: 2)
Coordinates:
  * genes         (genes) <U1 'a' 'b'
    observations  object ('c', 'f')
    individuals   <U1 'c'
    subtissues    <U1 'f'
Data variables:
    test          (genes) int64 2 2
Today I updated to v0.14 of xarray and it broke some of my code.
I tried to select one observation of the following dataset:
ds.isel(observations=1):
As you can see, observations is now a tuple of ('GTEX-1122O', 'Whole_Blood').
However, the individual and the subtissue should be kept as coordinates.
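On versions that return only the tuple, the level values can still be recovered by pairing the tuple with the level names of the original MultiIndex. The following is a minimal sketch, assuming the small toy dataset from the reproducible example in the comments rather than the original data; the variable names `level_names`, `level_values` and `labels` are illustrative only.

```python
import xarray as xr

# Toy stand-in for the real dataset (same layout as the example in the comments).
ds = xr.Dataset({
    "test": xr.DataArray(
        [[[1, 2], [3, 4]], [[1, 2], [3, 4]]],
        dims=("genes", "individuals", "subtissues"),
        coords={
            "genes": ["a", "b"],
            "individuals": ["c", "d"],
            "subtissues": ["e", "f"],
        },
    )
})
stacked = ds.stack(observations=["individuals", "subtissues"])

# Level names come from the pandas MultiIndex behind "observations".
level_names = list(stacked.indexes["observations"].names)

# After a scalar selection, "observations" holds a single tuple of level values.
selected = stacked.isel(observations=1)
level_values = selected["observations"].item()

# Pair each level name with its value for the selected observation.
labels = dict(zip(level_names, level_values))
print(labels)
```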
Output of xr.show_versions()
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None