concat changes variable order #2811
Xref: Gitter Chat |
This also has implications for the output using … Now, as the … I did not find any hints on that topic in the docs. I need to preserve the original dimension ordering as declared in the source dataset. How can I achieve this using xarray? |
Your system might print dataset dimensions like … When we drop support for Python 3.5, xarray might switch to dimensions matching order of insertion, since we'll get that for free with Python dictionaries. But I still doubt we would make any guarantees about preserving dimension order in xarray operations, just like we don't guarantee variable order as part of xarray's API. It should be deterministic (with fixed versions of xarray and dependencies), but you shouldn't write your code in a way that breaks if … changes. What's your actual use-case here? What are you trying to do that needs preserving of dimension order? |
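One way to follow that advice is to look dimension sizes up by name and impose an explicit order only where one is actually needed, instead of relying on the position of a dimension in `.dims`. The sketch below is plain Python: a dict stands in for a name-to-length mapping such as `Dataset.sizes`, and the helper name `shape_in_order` is hypothetical, not part of xarray.

```python
from typing import Mapping, Sequence, Tuple


def shape_in_order(sizes: Mapping[str, int], order: Sequence[str]) -> Tuple[int, ...]:
    """Return the sizes of the dimensions listed in `order`, in that order.

    `sizes` stands in for a name -> length mapping (e.g. Dataset.sizes);
    its own iteration order is deliberately ignored.
    """
    # Require every dimension to be named exactly once, so a missing or
    # renamed dimension fails loudly instead of being silently skipped.
    if len(order) != len(sizes) or set(order) != set(sizes):
        raise ValueError(f"order {list(order)!r} does not match dimensions {list(sizes)!r}")
    return tuple(sizes[dim] for dim in order)


# The same dimensions, iterated in two different orders
# (for example before and after an operation that reorders them).
before = {"c": 2, "b": 3}
after = {"b": 3, "c": 4}

print(list(before)[0], list(after)[0])     # c b  (positional access changes meaning)
print(shape_in_order(before, ["c", "b"]))  # (2, 3)
print(shape_in_order(after, ["c", "b"]))   # (4, 3)
```

With xarray itself, the equivalent habit is indexing `ds.sizes` by name (for example `ds.sizes['c']`) rather than taking the first entry of `ds.dims`.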
Thanks for looking into this @shoyer.
This isn't true for my system. Consider this example:

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))
ds = xr.Dataset({'test': (['c', 'b'], data)},
                coords={'c': (['c'], np.arange(data.shape[0])),
                        'b': (['b'], np.arange(data.shape[1]))})
ds.to_netcdf('test_dims.nc')
# concatenate along the first dimension and write the result
ds2 = xr.concat([ds, ds], dim='c')
ds2.to_netcdf('test_dims2.nc')
```

Dumping the created files gives the following:
My use case is, well, I have to use some legacy code. Concerning my own code: yes, I'm trying to write it as robustly as possible. Eventually I want to replace the legacy code with an implementation relying completely on xarray, but that's a long way off. |
Dimensions are written to netCDF files in the order in which they appear on variables in the Dataset (xarray/xarray/backends/common.py, lines 325 to 329 in f382fd8).
It sounds like your use-case is writing netCDF files to disk with a desired dimension order? We could conceivably add an "encoding" option to datasets for specifying dimension order, like how we support controlling unlimited dimensions. |
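The rule described here (a dimension's position in the file is set by the first variable, in dataset order, that uses it) can be modelled in a few lines of plain Python. This is a simplified illustration, not the actual code in `xarray/backends/common.py`; representing each variable as a tuple of dimension names and the function name `dims_in_file_order` are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple


def dims_in_file_order(variables: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return dimension names in order of first appearance across variables.

    `variables` maps each variable name to its dimension names. Python dicts
    preserve insertion order (guaranteed since 3.7), so the loop follows the
    order in which variables were added.
    """
    order: List[str] = []
    for dims in variables.values():
        for dim in dims:
            if dim not in order:
                order.append(dim)
    return order


# The variables from the example above, listed in two different orders.
data_var_first = {"test": ("c", "b"), "c": ("c",), "b": ("b",)}
coord_b_first = {"b": ("b",), "test": ("c", "b"), "c": ("c",)}

print(dims_in_file_order(data_var_first))  # ['c', 'b']
print(dims_in_file_order(coord_b_first))   # ['b', 'c']
```

Under this rule, moving the 1-D coordinate `b` ahead of `test` is enough to change the dimension order in the written file, even though no variable's own dimensions changed.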
I was assuming something along those lines. But in my variable … My use case is that the dimensions should appear in the same order as in the source files. |
The order of dimensions in the netCDF file matches the order of their appearance on variables in the netCDF files. In your first file, it's
Sorry, xarray is not going to satisfy this use case. If you want this guarantee in all cases, you should pick a different tool. |
@shoyer I'm sorry if I did not explain well enough and if my intentions were vague. So let me first clarify: I really appreciate all your hard work to make xarray better. I've adapted many of my workflows to use xarray and I'm happy that such a library exists. Let's consider just one more example that hopefully gets closer to the point I don't understand. Two files are created with the same dimensions and the same data, but one without coordinates and the other with coordinates:

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))
# same variable, no coordinates
src_dim0 = xr.Dataset({'test': (['c', 'b'], data)})
src_dim0.to_netcdf('src_dim0.nc')
# same variable, with 1-D coordinates for both dimensions
src_dim1 = xr.Dataset({'test': (['c', 'b'], data)},
                      coords={'c': (['c'], np.arange(data.shape[0])),
                              'b': (['b'], np.arange(data.shape[1]))})
src_dim1.to_netcdf('src_dim1.nc')
```

The dump of both:
Now, from the dump, the 'c' dimension is first in both. Let's read those files again and concat them along the 'c' dimension:

```python
dst_dim0 = xr.open_dataset('src_dim0.nc')
dst_dim0 = xr.concat([dst_dim0, dst_dim0], dim='c')
dst_dim0.to_netcdf('dst_dim0.nc')
dst_dim1 = xr.open_dataset('src_dim1.nc')
dst_dim1 = xr.concat([dst_dim1, dst_dim1], dim='c')
dst_dim1.to_netcdf('dst_dim1.nc')
```

Now, and this is what confuses me, the file without coordinates has the 'c' dimension first and the file with coordinates has the 'b' dimension first:
I would really like to understand why there is this difference. Thanks for your patience! |
This is due to the internal implementation of You are welcome to take a look at improving this, though I doubt this would be particularly easy to fix. Certainly the code in |
@shoyer Yes, that was what I was assuming. But I was a bit confused too, as the concat docs say that dimension order is not affected. But maybe I got this wrong and the order of dimensions is only unaffected for DataArrays. IIUC xarray creates a new dataset during concat, because the dimensions cannot be expanded (due to netCDF4 limitations). So I would need to look at the specific part where this creation process takes place. I would also not speak of a "bug" here, but if such reordering happens only under certain conditions, users (I mean at least me) can get confused. I'll try to find out under what conditions this happens and try to come up with a workaround. I will also try to find my way through the concat mechanism. Again, I really appreciate your help with this issue. |
Sorry, fat fingers... |
@shoyer I'm working on a notebook with all the tests inside. I just found that if I have 3 dimensions ('c', 'd', 'b'), the ordering is preserved in … Update: I need to be more thorough; with coordinates, it also reorders with 3 dims. |
Just as a note to myself, so I don't have to reiterate:
Example (dst concat over 'x'):
It seems that the two coordinates (z and y) are written first, then the variables, and then the changed coordinate. Now I'm trying to find where this happens. If the two coordinates were written in the same way as the variables (and after them), the ordering would be x, y, z as in the source. |
@shoyer I think I found the relevant lines of code in … In the docs there is a warning: … Does that mean that this also affects internal machinery (like in concat)? If so, could you point me to some code where this is taken care of, or give some explanation or links where this is discussed? Update: I'm working with the latest 0.12.0 release. |
That warning should be removed — we already finished that deprecation cycle! |
see #2818 for removing that warning |
@shoyer Here is a description of the source of the issue and a kind of workaround. During … (lines 244 to 246 in a5ca64a)
After several checks, the affected variables are processed and added to … (lines 301 to 306 in a5ca64a)
The comment indicates what you already mentioned: that the reordering might be unintentional. But due to the handling in two separate iterations over … This can be worked around by changing the second iteration to:

```python
# re-initialize result_vars to write in correct order
result_vars = OrderedDict()
# stack up each variable to fill-out the dataset (in order)
for k in datasets[0].variables:
    if k in concat_over:
        vars = ensure_common_dims([ds.variables[k] for ds in datasets])
        combined = concat_vars(vars, dim, positions)
        insert_result_variable(k, combined)
    else:
        insert_result_variable(k, datasets[0].variables[k])
```

With this workaround applied, the |
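The effect of the two separate iterations, compared with the single-pass workaround above, can be reproduced with a toy model that only tracks insertion order. This is a simplified sketch, not xarray code: variables are tuples of dimension names, no data is actually concatenated, the function names are hypothetical, and it assumes the data variable comes first in the source and that `concat_over` holds the data variable plus the coordinate of the concatenation dimension (the x/y/z case from the note above).

```python
from typing import Dict, List, Set, Tuple

Variables = Dict[str, Tuple[str, ...]]


def two_pass_order(variables: Variables, concat_over: Set[str]) -> List[str]:
    """Variables that are copied unchanged are inserted first, concatenated ones afterwards."""
    result: List[str] = []
    for name in variables:  # pass 1: variables not being concatenated
        if name not in concat_over:
            result.append(name)
    for name in variables:  # pass 2: concatenated variables
        if name in concat_over:
            result.append(name)
    return result


def single_pass_order(variables: Variables) -> List[str]:
    """Each variable, concatenated or copied, is inserted at its original position."""
    return [name for name in variables]


def dims_in_first_appearance(variables: Variables, order: List[str]) -> List[str]:
    """Dimension order obtained by walking the variables in the given order."""
    dims: List[str] = []
    for name in order:
        for dim in variables[name]:
            if dim not in dims:
                dims.append(dim)
    return dims


source = {"data": ("x", "y", "z"), "x": ("x",), "y": ("y",), "z": ("z",)}
concat_over = {"data", "x"}

two = two_pass_order(source, concat_over)
one = single_pass_order(source)
print(two, dims_in_first_appearance(source, two))  # ['y', 'z', 'data', 'x'] ['y', 'z', 'x']
print(one, dims_in_first_appearance(source, one))  # ['data', 'x', 'y', 'z'] ['x', 'y', 'z']
```

Applied to the earlier 'c'/'b' example (variable `test` plus coordinates `c` and `b`, concatenating over `c`), the same model gives `['b', 'c']` for the two-pass order and `['c', 'b']` for the single pass, which matches the reported difference between the files with and without coordinates.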
After checking a bit more in older issues, this seems related: #1049, ping @fmaussion. Also, @shoyer's comment suggests that those two iterations/loops I mentioned above need to be addressed correctly. |
Code Sample, a copy-pastable example if possible
yields (assumed correct) output of:
yields (assumed false) output of:
Problem description
xr.concat changes the dimension order for .dims as well as .sizes to an alphanumerically sorted representation.
Expected Output
xr.concat should not change the dimension order in any case.
Output of xr.show_versions()
xarray: 0.11.3
pandas: 0.24.1
numpy: 1.16.1
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.6.2
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: None
setuptools: 40.8.0
pip: 19.0.2
conda: None
pytest: 4.2.0
IPython: 7.2.0
sphinx: None