Variable.chunking() is not always a valid argument to chunksizes #740

Closed
shoyer opened this issue Nov 10, 2017 · 3 comments

Comments

@shoyer
Contributor

shoyer commented Nov 10, 2017

For example, consider the netCDF attached to this comment:
pydata/xarray#1225 (comment)

It has an unlimited dimension, 'time', with a current length of 5:

In [10]: ds.dimensions
Out[10]:
OrderedDict([('veg_class',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'veg_class', size = 19),
             ('lat',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 160),
             ('lon',
              <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 160),
             ('time',
              <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5)])

However, Variable.chunking() for the 'time' variable reports a chunk size of 2 ** 20 (1048576):

In [7]: ds.variables['time']
Out[7]:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: days since 2000-01-01 00:00:00.0
unlimited dimensions: time
current shape = (5,)
filling on, default _FillValue of -2147483647 used

In [8]: ds.variables['time'].chunking()
Out[8]: [1048576]

This results in the error "ValueError: chunksize cannot exceed dimension size" when attempting to write a new Variable with chunksizes equal to its chunking.

It would be nice if netCDF4-Python offered the guarantee that all read chunksizes were valid chunksizes for writing, perhaps by truncating larger chunksizes.
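
As a rough illustration of the truncation idea, here is a minimal sketch of a caller-side workaround. The helper name `clip_chunksizes` and its parameters are hypothetical and not part of netCDF4-Python; it only assumes that `Variable.chunking()` returns either the string 'contiguous' or a list of chunk sizes.

```python
from typing import List, Optional, Sequence, Union


def clip_chunksizes(
    chunking: Union[str, Sequence[int]],
    sizes: Sequence[int],
    unlimited: Sequence[bool],
) -> Optional[List[int]]:
    """Hypothetical helper: make read chunk sizes safe to reuse for writing.

    chunking  -- value returned by Variable.chunking()
    sizes     -- current size of each dimension in the target variable
    unlimited -- whether each target dimension is unlimited
    """
    # Contiguous storage has no chunk sizes; return None so the caller
    # can leave the choice to the library.
    if chunking == "contiguous":
        return None
    # Only fixed-size dimensions need clipping; unlimited dimensions may
    # keep chunk sizes larger than their current length.
    return [
        chunk if is_unlimited else min(chunk, size)
        for chunk, size, is_unlimited in zip(chunking, sizes, unlimited)
    ]
```

With netCDF4-Python, the inputs could be collected from `var.chunking()`, `var.shape`, and `ds.dimensions[name].isunlimited()` for each name in `var.dimensions`, and the result passed as the `chunksizes` argument of `createVariable`.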

@jswhit
Collaborator

jswhit commented Nov 10, 2017

The code that raises that error is here:

    if not dims[n].isunlimited() and \
        chunksizes[n] > dims[n].size:
        msg = 'chunksize cannot exceed dimension size'
        raise ValueError(msg)

Since time is an unlimited dimension, there must be a flaw in this logic. I will investigate.

@jswhit
Collaborator

jswhit commented Nov 10, 2017

I guess you are trying to write a new variable in which the time dimension is not unlimited? Apparently only unlimited dimensions can have chunksizes larger than their (current) size.
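
To make the difference concrete, here is a small self-contained sketch that mirrors the check quoted earlier in this thread. `Dim` and `check_chunksizes` are hypothetical stand-ins written for illustration, not the library's actual classes; only the condition and the error message are taken from the quoted snippet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Dim:
    """Hypothetical stand-in for a netCDF dimension."""
    size: int
    unlimited: bool = False

    def isunlimited(self) -> bool:
        return self.unlimited


def check_chunksizes(dims: Sequence[Dim], chunksizes: Sequence[int]) -> None:
    """Apply the same condition as the quoted check to every dimension."""
    for n, dim in enumerate(dims):
        # Unlimited dimensions are skipped, so their chunk size may
        # exceed the current length.
        if not dim.isunlimited() and chunksizes[n] > dim.size:
            raise ValueError('chunksize cannot exceed dimension size')


# Unlimited 'time' of current length 5: a chunk size of 2 ** 20 is accepted.
check_chunksizes([Dim(size=5, unlimited=True)], [2 ** 20])

# Fixed-size 'time' of length 5: the same chunk size is rejected.
try:
    check_chunksizes([Dim(size=5)], [2 ** 20])
except ValueError as err:
    print(err)
```

Under these assumptions, copying the chunking of an unlimited dimension onto a fixed-size dimension of the same length is what triggers the error.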

@shoyer
Contributor Author

shoyer commented Nov 11, 2017

> I guess you are trying to write a new variable in which the time dimension is not unlimited?

Hmm, yes, that would explain it. So I guess this is probably legitimate for netCDF4-Python after all.

jswhit closed this as completed Nov 20, 2017