
DataArray.encoding['chunksizes'] not respected in to_netcdf #2198

Closed
Karel-van-de-Plassche opened this issue May 30, 2018 · 2 comments · Fixed by #2207

Comments

@Karel-van-de-Plassche
Contributor

Karel-van-de-Plassche commented May 30, 2018

This might just be a documentation issue, so apologies if this is not actually a problem with xarray.

I'm trying to save an intermediate result of a calculation with xarray + dask to disk, but I'd like to preserve the on-disk chunking. Setting the encoding of a Dataset.data_var or DataArray via the encoding attribute seems to work for (at least) some encoding keys, but not for chunksizes. For example:

import xarray as xr
import dask.array as dask_array

# First generate an array of random numbers
rng = dask_array.random.RandomState()
shape = (10, 10000)
chunks = [10, 10]
dims = ['x', 'y']
z = rng.standard_normal(shape, chunks=chunks)
arr = xr.DataArray(z, dims=dims, name='z')

# Set encoding of the DataArray
arr.encoding['chunksizes'] = chunks  # Not conserved
arr.encoding['zlib'] = True  # Conserved
ds = arr.to_dataset()
print(ds['z'].encoding)  # out: {'chunksizes': [10, 10], 'zlib': True}
# This one is chunked and compressed correctly
ds.to_netcdf('test1.nc', encoding={'z': {'chunksizes': chunks}})
# While this one is only compressed
ds.to_netcdf('test2.nc')
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 0.19.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: None
pytest: 3.2.2
IPython: 6.3.1
sphinx: None

@Karel-van-de-Plassche
Contributor Author

Karel-van-de-Plassche commented May 30, 2018

Might be related to:
#1225 (comment)
#628

@shoyer
Member

shoyer commented May 31, 2018

Indeed, I think my fix in #1707 got this wrong.

In particular this logic is wrong:

changed_shape = encoding.get('original_shape') != variable.shape
if chunks_too_big or changed_shape:
    del encoding['chunksizes']

chunksizes should not be deleted if original_shape is not found in encoding, but only if original_shape exists and is different.

Any interest in putting together a fix here? :)
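The corrected condition could be sketched as a standalone helper (hypothetical function and parameter names; only the condition mirrors the snippet above):

```python
def drop_stale_chunksizes(encoding, var_shape, chunks_too_big=False):
    """Drop 'chunksizes' only when it can no longer apply.

    If 'original_shape' is absent from the encoding, the variable was
    never read from disk, so user-supplied chunksizes must be kept.
    """
    encoding = dict(encoding)
    has_original_shape = 'original_shape' in encoding
    changed_shape = (
        has_original_shape and encoding['original_shape'] != var_shape
    )
    if chunks_too_big or changed_shape:
        encoding.pop('chunksizes', None)
    return encoding
```

The key difference from the buggy version: `encoding.get('original_shape')` returns `None` when the key is missing, which always compares unequal to the variable's shape, so user-set chunksizes were deleted on every write of a freshly created variable.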

Karel-van-de-Plassche added a commit to Karel-van-de-Plassche/xarray that referenced this issue Jun 1, 2018
Before this fix chunksizes was dropped even when
original_shape was not found in encoding
dcherian pushed a commit that referenced this issue Jun 6, 2019
…nt, not when it isn't found (#2207)

* Fixes #2198: Drop chunksizes when original_shape is different

Before this fix chunksizes was dropped even when
original_shape was not found in encoding

* More direct has_original_shape check

* Fixed typo

* Added test if chunksizes is kept when no original shape

* Fix typo in test name

Co-Authored-By: Deepak Cherian <[email protected]>

* Fix keep_chunksizes_if_no_orignal_shape test by using native open_dataset

* Added entry in whats-new

* Use roundtrip mechanism in chunksizes conservation test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment