GroupBy of stacked dim with strings renames underlying dims #3287
Comments
I just bumped into this problem as well. xarray 0.15.0. Expected behavior? Bug?
Same or different problem as #1483?
Here's a quick and dirty workaround that works at least for my use case.

def fix_unstacked_dims(arr_unstacked_bad, arr_orig, dim_of_stack, dims_stacked):
    """Workaround for xarray bug involving stacking str-based coords.

    C.f. https://github.com/pydata/xarray/issues/3287
    """
    # Dims of the original array that were not part of the stack.
    dims_not_stacked = [dim for dim in arr_orig.dims if dim not in dims_stacked]
    # Any other dims left after unstacking are the renamed stacked dims.
    stacked_dims_after_unstack = [dim for dim in arr_unstacked_bad.dims
                                  if dim not in dims_not_stacked]
    # Map the renamed dims back to their original names, by position.
    dims_mapping = {d1: d2 for d1, d2 in zip(stacked_dims_after_unstack, dims_stacked)}
    arr_unstacked_bad = arr_unstacked_bad.rename(dims_mapping)
    # Copy the values into an array that has the original dims and coords.
    arr_out = arr_orig.copy(deep=True)
    arr_out.values = arr_unstacked_bad.transpose(*arr_orig.dims).values
    return arr_out.assign_coords(arr_orig.coords)
This does look weird. A PR would be great.
Notice that the string coordinate also gets reordered alphabetically: in @chrisroat's example above, the coord goes from ['R', 'G'] to ['G', 'R'].
@max-sixty I can't promise a PR anytime soon, but if/when I do manage, where would be a good starting point? Perhaps here where the Lines 251 to 256 in 009aa66
Edit: actually maybe here: xarray/xarray/core/variable.py Lines 2237 to 2249 in 9eec56c
Re the reordering: that's the case, though it does reorder the dimension itself, not just the coord (i.e. the data is still correctly aligned). Slight change to the original example to demonstrate.
Yes, that second reference looks like the place @spencerahill!
Thanks @max-sixty. Contrary to my warning about not doing a PR, I couldn't help myself and dug in a bit. It turns out that string coordinates aren't the problem; the bug appears when the coordinate isn't in sorted order. For example, @chrisroat's original example doesn't error if the coordinate is already sorted:

import pandas as pd
import xarray as xr


def test_stack_groupby_unsorted_coord():
    data = [[0, 1], [2, 3]]
    data_flat = [0, 1, 2, 3]
    dims = ["y", "x"]
    y_vals = [2, 3]

    # "y" coord is in sorted order, and everything works
    arr = xr.DataArray(data, dims=dims, coords={"y": y_vals})
    actual1 = arr.stack(z=["y", "x"]).groupby("z").first()
    midx = pd.MultiIndex.from_product([[2, 3], [0, 1]], names=dims)
    expected1 = xr.DataArray(data_flat, dims=["z"], coords={"z": midx})
    xr.testing.assert_equal(actual1, expected1)

    # Now "y" coord is NOT in sorted order, and the bug appears
    arr = xr.DataArray(data, dims=dims, coords={"y": y_vals[::-1]})
    actual2 = arr.stack(z=["y", "x"]).groupby("z").first()
    midx = pd.MultiIndex.from_product([[3, 2], [0, 1]], names=dims)
    expected2 = xr.DataArray(data_flat, dims=["z"], coords={"z": midx})
    xr.testing.assert_equal(actual2, expected2)
test_stack_groupby_unsorted_coord() yields

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
[...]
AssertionError: Left and right DataArray objects are not equal

Differing values:
L
    array([2, 3, 0, 1])
R
    array([0, 1, 2, 3])
Differing coordinates:
L * z        (z) MultiIndex
  - z_leve...(z) int64 2 2 3 3
  - z_leve...(z) int64 0 1 0 1
R * z        (z) MultiIndex
  - y        (z) int64 3 3 2 2
  - x        (z) int64 0 1 0 1

I'll return to this tomorrow. In the meantime, if this triggers any thoughts about the best path forward, that would be much appreciated!
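To make the two sides of that traceback easier to compare, here is a pandas-only sketch of the index used for `expected2` in the test above. It only illustrates how the `MultiIndex` itself orders its values and is not a claim about where xarray goes wrong; the variable names are chosen for illustration:

```python
import pandas as pd

# Same index as `expected2` in the test above: "y" values supplied as [3, 2].
midx = pd.MultiIndex.from_product([[3, 2], [0, 1]], names=["y", "x"])

# The index keeps the order in which the values were supplied (the R side)...
y_as_built = midx.get_level_values("y").tolist()
print(y_as_built)  # [3, 3, 2, 2]

# ...while its internal levels are stored sorted, with codes pointing into them.
y_levels = midx.levels[0].tolist()
print(y_levels)  # [2, 3]

# Sorting the index gives the "y" order seen on the L side of the traceback.
y_sorted = midx.sort_values().get_level_values("y").tolist()
print(y_sorted)  # [2, 2, 3, 3]
```

The L side of the traceback matches the sorted order, with its data reordered the same way and the level names replaced by generic ones.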
Names for dimensions are lost (renamed) when they are stacked and grouped, if one of the dimensions has string coordinates.
Output
It is expected that 'f_level_0' and 'f_level_1' be 'c' and 'x', respectively, in the second part below.
Output of xr.show_versions()
xarray: 0.12.3
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None