Open h5netcdf files from bytes #4523

Closed
WesleyTheGeolien opened this issue Oct 19, 2020 · 3 comments

@WesleyTheGeolien

Is your feature request related to a problem? Please describe.
I am creating a small web-based GUI to edit netCDF files. The user uploads a file, so the file arrives as bytes, but I am unable to use open_dataset on it. From reading the code, I believe the engine is being set to h5netcdf (which is installed) at line 137 of xarray/backends/api.py, and an error is thrown:

..../xarray/xarray/backends/h5netcdf_.py", line 125, in open
    raise ValueError(
ValueError: can't open netCDF4/HDF5 as bytes try passing a path or file-like object
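
As a side note on why the h5netcdf engine may be chosen for in-memory input: a netCDF4 file is an HDF5 file and starts with the 8-byte HDF5 signature, while a classic netCDF3 file starts with `CDF`. The sketch below assumes that engine selection keys off these leading bytes; `sniff_netcdf_format` is a hypothetical helper, not part of xarray, and it only inspects offset 0 of the data.

```python
# Leading bytes that identify the two on-disk formats.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of an HDF5 (netCDF4) file
NETCDF3_PREFIX = b"CDF"                # first 3 bytes of a classic netCDF3 file


def sniff_netcdf_format(data: bytes) -> str:
    """Hypothetical helper: classify uploaded bytes by their leading bytes.

    Returns "netcdf4", "netcdf3" or "unknown". Only offset 0 is checked,
    which is an assumption of this sketch.
    """
    if data.startswith(HDF5_SIGNATURE):
        return "netcdf4"
    if data.startswith(NETCDF3_PREFIX):
        return "netcdf3"
    return "unknown"
```

A check like this could also be used in the upload handler to reject files that are clearly not netCDF before trying to open them.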

Describe the solution you'd like
I believe h5netcdf is a wrapper around netcdf4 (I am new to the ecosystem so sorry if this is wrong) and according to this issue: Unidata/netcdf4-python#406 I think the engine should be able to support reading from bytes.

Describe alternatives you've considered
Dump the file to disk and read from disk
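
A minimal sketch of this alternative, using only the standard library. The names `with_temp_copy` and `opener` are hypothetical; `opener` is any callable that takes a path and returns fully loaded data, because the temporary file is deleted before the function returns.

```python
import os
import tempfile


def with_temp_copy(data: bytes, opener, suffix: str = ".nc"):
    """Hypothetical helper: write bytes to a temporary file and open it by path.

    `opener` must read everything it needs before returning, since the
    temporary file is removed afterwards.
    """
    # delete=False so that the file can be reopened by name after closing it.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush to disk before handing the path to the opener
        return opener(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        os.remove(tmp.name)


# Possible usage with xarray (assumes xarray is installed):
#     import xarray as xr
#     ds = with_temp_copy(uploaded_bytes, xr.load_dataset)
```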

I'd be happy to look further into making the changes, but would be grateful for any help.

Best regards,

@mathause
Collaborator

h5netcdf bypasses the netCDF library and interfaces directly with h5py (https://github.com/shoyer/h5netcdf), so you'd need to look there to see whether this can be achieved.

Just a wild guess - could you mock a file-like object with io.BytesIO (https://docs.python.org/3/library/io.html)?

@keewis
Collaborator

keewis commented Oct 19, 2020

> Just a wild guess - could you mock a file-like object with io.BytesIO (https://docs.python.org/3/library/io.html)?

I can confirm that this does work:

In [1]: import xarray as xr

In [2]: ds = xr.tutorial.open_dataset("rasm")
   ...: ds.to_netcdf("file.nc")

In [3]: with open("file.nc", "rb") as f:
   ...:     display(xr.open_dataset(f).load())
   ...: 
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan ... 29.8 28.66 28.19 28.21
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  [1]
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [4]: import io
   ...: 
   ...: with open("file.nc", "rb") as f:
   ...:     bytes_ = f.read()
   ...:     display(xr.open_dataset(io.BytesIO(bytes_)).load())
   ...: 
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan ... 29.8 28.66 28.19 28.21
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  [1]
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

However, I'm not sure how this performs with big files.
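
For the upload use case from the original post, the `io.BytesIO` approach could be wrapped in a small helper. This is only a sketch: `open_uploaded_dataset` and its `opener` parameter are hypothetical names, and the default opener simply repeats the `xr.open_dataset(...).load()` call shown in the session above. Note that the whole file is held in memory as bytes.

```python
import io


def open_uploaded_dataset(data: bytes, opener=None):
    """Hypothetical helper: wrap uploaded bytes in a file-like object and open it."""
    if opener is None:
        # Imported lazily so the helper can be defined without xarray installed.
        import xarray as xr

        def opener(file_obj):
            # Same call as in the session above: open, then load into memory.
            return xr.open_dataset(file_obj).load()

    # io.BytesIO provides the read/seek interface expected of a file-like object.
    buffer = io.BytesIO(data)
    return opener(buffer)


# Possible usage in an upload handler (uploaded_bytes is a placeholder name):
#     ds = open_uploaded_dataset(uploaded_bytes)
```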

@WesleyTheGeolien
Author

Thanks for the replies, @mathause and @keewis. I could have sworn that I tried that before dumping to a tempfile, but that totally works!

Thanks:)
