Getting DataArrays from netCDF4 files correctly and without hassle #1888

Open
equaeghe opened this issue Feb 5, 2018 · 4 comments


equaeghe commented Feb 5, 2018

Context

Consider a netCDF4 file with a group structure. For example, the following toy:

import netCDF4 as nc
# netCDF4 file
f = nc.Dataset('simple_hierarchy.nc', 'w')
# coordinates in root
f.createDimension('x', 3)
f.createVariable('x', 'f4', ('x',), fill_value=False)
f['x'][:] = [1.1, 2.2, 3.3]
f.createDimension('y', 2)
f.createVariable('y', 'f4', ('y',), fill_value=False)
f['y'][:] = [-0.9, -1.8]
# variables in root
f.createVariable('u', 'i1', (), fill_value=False)
f.createVariable('v', 'u1', ('x','y'), fill_value=False)
# group
f.createGroup('g')
g = f['g']
# new/modified coordinates in g
g.createDimension('y', 3)
g.createVariable('y', 'f4', ('y',), fill_value=False)
g['y'][:] = [-0.9, -1.8, -2.7]
# variable in g
g.createVariable('w', 'u1', ('x', 'y'), fill_value=False)
f.close()

Current behavior

  1. It is currently a hassle to get a DataArray from a variable in a group that contains multiple non-coordinate variables:

    >>> xr.open_dataarray('simple_hierarchy.nc')
    …
    ValueError: Given file dataset contains more than one data variable. 
    Please read with xarray.open_dataset and then select the variable you want.
    >>> xr.open_dataarray('simple_hierarchy.nc', group='v')
    …
    OSError: [Errno group not found: v] 'v'
    >>> xr.open_dataarray('simple_hierarchy.nc', drop_variables='u')
    <xarray.DataArray 'v' (x: 3, y: 2)>
    array([[120, 219],
           [178, 172],
           [  9, 127]], dtype=uint8)
    Coordinates:
      * x        (x) float32 1.1 2.2 3.3
      * y        (y) float32 -0.9 -1.8
  2. Also, coordinates defined at a group level closer to the root are not taken into account:

    >>> xr.open_dataarray('simple_hierarchy.nc', group='g')
    <xarray.DataArray 'w' (x: 3, y: 3)>
    array([[216, 219, 178],
           [172,   9, 127],
           [  0,   0,  64]], dtype=uint8)
    Coordinates:
      * y        (y) float32 -0.9 -1.8 -2.7
    Dimensions without coordinates: x

    So the DataArray is not loaded correctly, because some of its defining coordinates are missing.

Suggested behavior

  1. Add a variable kwarg to the open_dataarray function:

    >>> xr.open_dataarray('simple_hierarchy.nc', variable='v')
    <xarray.DataArray 'v' (x: 3, y: 2)>
    array([[120, 219],
           [178, 172],
           [  9, 127]], dtype=uint8)
    Coordinates:
      * x        (x) float32 1.1 2.2 3.3
      * y        (y) float32 -0.9 -1.8
  2. Have the function that loads variables go up the group hierarchy to see if some coordinate arrays can be found for dimensions lacking them within this group:

    >>> xr.open_dataarray('simple_hierarchy.nc', group='g')
    <xarray.DataArray 'w' (x: 3, y: 3)>
    array([[216, 219, 178],
           [172,   9, 127],
           [  0,   0,  64]], dtype=uint8)
    Coordinates:
      * x        (x) float32 1.1 2.2 3.3
      * y        (y) float32 -0.9 -1.8 -2.7

    I guess care also needs to be taken when writing to netCDF, to make sure no spurious dimension/coordinate definitions are added.
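The lookup in point 2 could be sketched roughly as below. This is a library-independent illustration, not xarray or netCDF4 code: the `Group` class and `resolve_coords` function are hypothetical stand-ins. It assumes that a group which redeclares a dimension without providing a coordinate variable should stop the search, since the parent's coordinate then refers to a different dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Group:
    """Hypothetical stand-in for a netCDF4 group."""
    dims: Dict[str, int] = field(default_factory=dict)
    coords: Dict[str, List[float]] = field(default_factory=dict)
    parent: Optional["Group"] = None


def resolve_coords(group: Group, dims: Sequence[str]) -> Dict[str, List[float]]:
    """Find a coordinate array for each dimension by walking towards the root."""
    resolved = {}
    for dim in dims:
        node = group
        while node is not None:
            if dim in node.coords:
                # Nearest group defining the coordinate wins.
                resolved[dim] = node.coords[dim]
                break
            if dim in node.dims:
                # Dimension redeclared here without a coordinate: stop,
                # coordinates further up belong to a shadowed dimension.
                break
            node = node.parent
    return resolved


# Mirror of the toy file: 'g' shadows 'y' and inherits 'x' from the root.
root = Group(dims={"x": 3, "y": 2},
             coords={"x": [1.1, 2.2, 3.3], "y": [-0.9, -1.8]})
g = Group(dims={"y": 3}, coords={"y": [-0.9, -1.8, -2.7]}, parent=root)
w_coords = resolve_coords(g, ("x", "y"))
```

Here `w_coords` contains the root's `x` and the group's own `y`, which matches the coordinates shown in the suggested output for `w`.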

Version

xarray 0.9.6


stale bot commented Jan 7, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

stale bot added the stale label Jan 7, 2020

equaeghe commented Jan 7, 2020

> If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

I think this is still relevant.

stale bot removed the stale label Jan 7, 2020
shoyer (Member) commented Jan 10, 2020

These both sound like reasonable feature additions to me.

daviguima commented

Just wanted to let you guys know that I ended up here after running into the error mentioned above:
"ValueError: Given file dataset contains more than one data variable. Please read with xarray.open_dataset and then select the variable you want."
I hit it while following this answer:
https://gis.stackexchange.com/questions/354782/how-to-mask-netcdf-time-series-data-from-a-shapefile-in-python/354798#354798

And if it is of any help, I find it relevant...
