
DataArray.encoding['chunksizes'] not respected in to_netcdf #2198

Closed
Karel-van-de-Plassche opened this issue May 30, 2018 · 2 comments · Fixed by #2207

Comments

@Karel-van-de-Plassche
Contributor

Karel-van-de-Plassche commented May 30, 2018

This might just be a documentation issue, so apologies if this is not actually a problem with xarray.

I'm trying to save an intermediate result of a calculation with xarray + dask to disk, but I'd like to preserve the on-disk chunking. Setting the encoding of a Dataset.data_var or DataArray via the encoding attribute seems to work for (at least) some encoding keys, but not for chunksizes. For example:

import xarray as xr
import dask.array as dask_array

# First generate an array of random numbers
rng = dask_array.random.RandomState()
shape = (10, 10000)
chunks = [10, 10]
dims = ['x', 'y']
z = rng.standard_normal(shape, chunks=chunks)
arr = xr.DataArray(z, dims=dims, name='z')

# Set encoding of the DataArray
arr.encoding['chunksizes'] = chunks  # Not conserved
arr.encoding['zlib'] = True  # Conserved
ds = arr.to_dataset()
print(ds['z'].encoding)  # out: {'chunksizes': [10, 10], 'zlib': True}
# This one is chunked and compressed correctly
ds.to_netcdf('test1.nc', encoding={'z': {'chunksizes': chunks}})
# While this one is only compressed
ds.to_netcdf('test2.nc')
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 0.19.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: None
pytest: 3.2.2
IPython: 6.3.1
sphinx: None

@Karel-van-de-Plassche
Contributor Author

Karel-van-de-Plassche commented May 30, 2018

Might be related to:
#1225 (comment)
#628

@shoyer
Member

shoyer commented May 31, 2018

Indeed, I think my fix in #1707 got this wrong.

In particular this logic is wrong:

changed_shape = encoding.get('original_shape') != variable.shape
if chunks_too_big or changed_shape:
    del encoding['chunksizes']

chunksizes should not be deleted if original_shape is not found in encoding, but only if original_shape exists and is different.

Any interest in putting together a fix here? :)
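The corrected condition could be sketched as a standalone helper (hypothetical function and parameter names; only the condition mirrors the snippet above):

```python
def drop_stale_chunksizes(encoding, var_shape, chunks_too_big=False):
    """Drop 'chunksizes' only when it can no longer apply.

    If 'original_shape' is absent from the encoding, the variable was
    never read from disk, so user-supplied chunksizes must be kept.
    """
    encoding = dict(encoding)
    has_original_shape = 'original_shape' in encoding
    changed_shape = (
        has_original_shape and encoding['original_shape'] != var_shape
    )
    if chunks_too_big or changed_shape:
        encoding.pop('chunksizes', None)
    return encoding
```

The key difference from the buggy version: `encoding.get('original_shape')` returns `None` when the key is missing, which always compares unequal to the variable's shape, so user-set chunksizes were deleted on every write of a freshly created variable.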

Karel-van-de-Plassche added a commit to Karel-van-de-Plassche/xarray that referenced this issue Jun 1, 2018
Before this fix chunksizes was dropped even when
original_shape was not found in encoding
dcherian pushed a commit that referenced this issue Jun 6, 2019
…nt, not when it isn't found (#2207)

* Fixes #2198: Drop chunksizes when original_shape is different

Before this fix chunksizes was dropped even when
original_shape was not found in encoding

* More direct has_original_shape check

* Fixed typo

* Added test if chunksizes is kept when no original shape

* Fix typo in test name

Co-Authored-By: Deepak Cherian <[email protected]>

* Fix keep_chunksizes_if_no_orignal_shape test by using native open_dataset

* Added entry in whats-new

* Use roundtrip mechanism in chunksizes conservation test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment