Non-kerchunk backend for HDF5/netcdf4 files. #87
This is looking great so far, @sharkinsspatial!
> kerchunk backend's specialized encoding translation logic

This part I would really like to either factor out, or at least really understand what it's doing. See #68.
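As a rough illustration of what "encoding translation" means here: HDF5 stores each chunked dataset's filter pipeline as a list of (filter ID, client data values) pairs, and a Zarr-side reader needs equivalent numcodecs-style codec configurations. The sketch below is hypothetical, not kerchunk's code. It handles only HDF5's built-in deflate (ID 1) and shuffle (ID 2) filters, takes the shuffle element size from the dtype's item size, and returns configs in HDF5's write order, so a decoder would apply them in reverse.

```python
from typing import Dict, List, Sequence, Tuple

# Identifiers HDF5 reserves for its built-in deflate (zlib) and shuffle filters.
H5Z_FILTER_DEFLATE = 1
H5Z_FILTER_SHUFFLE = 2


def hdf5_filters_to_codecs(
    pipeline: Sequence[Tuple[int, Tuple[int, ...]]],
    itemsize: int,
) -> List[Dict[str, object]]:
    """Translate an HDF5 filter pipeline into numcodecs-style codec configs.

    `pipeline` holds (filter_id, client_values) pairs in the order HDF5 applies
    them when writing; the returned list keeps that order, so decoding must
    walk it in reverse.
    """
    codecs: List[Dict[str, object]] = []
    for filter_id, client_values in pipeline:
        if filter_id == H5Z_FILTER_SHUFFLE:
            # Byte shuffle regroups bytes by position within each element,
            # so the element size is the dtype's item size.
            codecs.append({"id": "shuffle", "elementsize": itemsize})
        elif filter_id == H5Z_FILTER_DEFLATE:
            # HDF5 stores the deflate compression level as the first client value.
            codecs.append({"id": "zlib", "level": client_values[0]})
        else:
            raise NotImplementedError(f"HDF5 filter {filter_id} is not handled in this sketch")
    return codecs
```

Each dict follows the `{"id": ...}` config convention numcodecs uses (for example, `numcodecs.get_codec({"id": "zlib", "level": 6})`). Real files also use registered third-party filters such as Blosc or Zstandard, which a complete translation would need to cover as well.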
virtualizarr/readers/hdf.py
@@ -0,0 +1,206 @@
from typing import List, Mapping, Optional

import fsspec
Does one need fsspec if reading a local file? Is there any other way to read from S3 without fsspec at all?
Not with a filesystem-like API. You would have to use boto3 or aiobotocore directly.
This is one of the great virtues of fsspec and is not to be undervalued.
virtualizarr/readers/hdf.py
def virtual_vars_from_hdf(
    path: str,
    drop_variables: Optional[List[str]] = None,
) -> Mapping[str, xr.Variable]:
I like this as a way to interface with the code in open_virtual_dataset.
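A minimal sketch of the contract this signature implies, assuming (hypothetically) that the reader has already listed each HDF5 dataset's name and dimension names: return a name-to-variable mapping and skip anything named in `drop_variables`. `VirtualVar` and `virtual_vars_from_listing` are hypothetical; `VirtualVar` stands in for an `xr.Variable` wrapping chunk references so the example needs no third-party packages.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


@dataclass
class VirtualVar:
    """Hypothetical stand-in for an xr.Variable backed by chunk references."""

    dims: Tuple[str, ...]
    attrs: Dict[str, object] = field(default_factory=dict)


def virtual_vars_from_listing(
    datasets: Iterable[Tuple[str, Tuple[str, ...]]],
    drop_variables: Optional[List[str]] = None,
) -> Mapping[str, VirtualVar]:
    """Build a name -> variable mapping, leaving out names in drop_variables."""
    # Treat None as "drop nothing"; a set makes the membership checks cheap.
    dropped = set(drop_variables or [])
    return {
        name: VirtualVar(dims=dims)
        for name, dims in datasets
        if name not in dropped
    }
```

Returning a plain mapping keeps the reader independent of how the caller (here, `open_virtual_dataset`) assembles the final dataset.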
This looks cool, @sharkinsspatial! My opinion is that it doesn't make sense to just forklift the kerchunk code into virtualizarr. What I would love to see is an extremely tight, strictly typed, unit-tested total refactor of the parsing logic. I think you're headed down the right path, but I encourage you to push as far as you can in that direction.
@rabernat Fully agree with your take above 👆 👍. I'm trying to work through this incrementally whenever I can find some spare time. In the spirit of thorough test coverage 🎊, looking through your issue pydata/xarray#7388 and the corresponding PR, I'm not sure what the proper incantation of variable encoding configuration is to use
@TomNicholas This should be ready for final review now.
Looks good! Only one question about warnings in the test suite when dependencies aren't installed.
Also let's add a release note saying this is available for experimental use.
import warnings

try:
    import hdf5plugin  # type: ignore
except ModuleNotFoundError:
    hdf5plugin = None  # type: ignore
    warnings.warn("hdf5plugin is required for HDF reader")
Why wouldn't this either error or just not be a problem?
I added a release note in #307, but I'm still curious about this @sharkinsspatial
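One common way to avoid both an import-time warning and a hard failure is to make the dependency optional silently and raise an informative error only when the feature that needs it is actually used. A minimal sketch of that pattern follows; `optional_import`, `require`, and `read_hdf` are hypothetical names, not functions from this PR.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None (no warning)."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


def require(module: Optional[ModuleType], name: str, feature: str) -> ModuleType:
    """Raise a clear ImportError at the point of use if `module` is missing."""
    if module is None:
        raise ImportError(f"{name} is required for {feature}; please install it")
    return module


# Module level: importing the reader never warns or fails.
hdf5plugin = optional_import("hdf5plugin")


def read_hdf(path: str) -> None:
    # Fails loudly here, and only here, when hdf5plugin is not installed.
    require(hdf5plugin, "hdf5plugin", "the HDF reader")
    ...
```

With this split, users who never touch the HDF reader see nothing, and users who do get an error naming the missing package rather than a warning that is easy to miss.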
This is a rudimentary initial implementation for #78. The core code is ported directly from kerchunk's hdf backend. I have not ported the bulk of the kerchunk backend's specialized encoding translation logic, but I'll try to do so incrementally so that we can build complete test coverage for the many edge cases it currently covers.
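The core of a kerchunk-style HDF5 reader is recording, for every stored chunk, where its bytes live in the file. h5py exposes this per chunk through `DatasetID.get_num_chunks()` and `DatasetID.get_chunk_info(i)` (these rely on HDF5 1.10.5 or newer), whose records carry `chunk_offset`, `byte_offset`, and `size` fields. The sketch below is a hypothetical helper, not the PR's code: it converts such records into Zarr-style chunk keys. `ChunkInfo` mimics h5py's record so the example runs without h5py, and the `path`/`offset`/`length` entry layout is an assumption about the manifest format.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class ChunkInfo(NamedTuple):
    """Mimics the fields of h5py's per-chunk StoreInfo record."""

    chunk_offset: Tuple[int, ...]  # array index of the chunk's first element
    byte_offset: int  # where the stored chunk starts in the file
    size: int  # stored (possibly compressed) size in bytes


def chunk_manifest(
    path: str,
    chunks: Tuple[int, ...],
    infos: Iterable[ChunkInfo],
) -> Dict[str, Dict[str, object]]:
    """Map Zarr-style chunk keys ("i.j") to byte ranges inside an HDF5 file."""
    manifest: Dict[str, Dict[str, object]] = {}
    for info in infos:
        # Element offsets are multiples of the chunk shape; dividing gives grid indices.
        key = ".".join(str(off // c) for off, c in zip(info.chunk_offset, chunks))
        manifest[key] = {"path": path, "offset": info.byte_offset, "length": info.size}
    return manifest
```

With a real file, one could pass `(dsid.get_chunk_info(i) for i in range(dsid.get_num_chunks()))` with `dsid = dset.id`. h5py's record also has a `filter_mask` field, ignored here, which flags chunks stored with some filters skipped; handling it is part of the edge-case coverage mentioned above.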