Allow open_virtual_dataset to read existing Kerchunk references #251
@keewis thanks for the rec. Having some docs already is nice.
…ctored _fsspec_open... to class
Co-authored-by: Justus Magin <[email protected]>
Great work @norlandrhagen !
Convert inlined vars into numpy arrays. Note: After talking with @sharkinsspatial, it seems like the HDF reader he is working on won't inline at all. Should we:
- Raise an error if any inline data exists or
- Add logic to convert any inlined bytes to numpy arrays to maintain more compatibility with other Kerchunk readers?
I feel like this mixes a few issues together. We do need to be able to read back inlined kerchunk references (though it's fine to add that feature in a follow-up PR), partly because even with Sean's PR we're still going to want to use the other kerchunk readers sometimes. I think it makes sense for @sharkinsspatial's HDF reader not to create inlined refs, so long as we have another way to create inlined refs (i.e. using the normal xarray backend for that filetype).
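For reference, a minimal sketch of what "reading back inlined references" could involve, not the code in this PR. It assumes the Kerchunk reference layout in which a value is either a pointer list ([path, offset, length]) or a literal string, where a "base64:" prefix marks base64-encoded bytes. The helper names, the refs dict, and the s3 path are hypothetical and only for illustration.

```python
import base64
import struct

BASE64_PREFIX = "base64:"


def is_inlined(ref) -> bool:
    """Literal (inlined) references are stored as str/bytes, pointers as lists."""
    return isinstance(ref, (str, bytes))


def decode_inlined(ref) -> bytes:
    """Turn an inlined reference value into the raw chunk bytes it carries."""
    if isinstance(ref, bytes):
        return ref
    if ref.startswith(BASE64_PREFIX):
        return base64.b64decode(ref[len(BASE64_PREFIX):])
    # Plain text (e.g. JSON metadata) is taken literally.
    return ref.encode("utf-8")


# Hypothetical reference set: two little-endian int64 values inlined under "time/0".
payload = struct.pack("<2q", 0, 1)
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "time/0": BASE64_PREFIX + base64.b64encode(payload).decode("ascii"),
    "air/0.0.0": ["s3://bucket/file.nc", 1024, 4096],  # pointer, not inlined
}

inlined_keys = sorted(key for key, value in refs.items() if is_inlined(value))
time_values = struct.unpack("<2q", decode_inlined(refs["time/0"]))
```

The decoded bytes are still an encoded chunk, so any compressor or filters listed in the variable's .zarray would have to be applied before handing the buffer to something like numpy.frombuffer with the dtype and shape from that metadata.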
Mypy checks in CI are currently disabled!
Should be fixed by #252
The utility function _fsspec_openfile_from_filepath forced the fsspec filesystem to open a filepath
Not totally sure I understand this but abstracting away fsspec details sounds good.
Co-authored-by: Tom Nicholas <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #251      +/-   ##
==========================================
+ Coverage   91.20%   91.37%   +0.17%
==========================================
  Files          32       32
  Lines        2057     2098      +41
==========================================
+ Hits         1876     1917      +41
  Misses        181      181

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
Thanks for the feedback @keewis and @TomNicholas. I think everything is addressed. The auto-detection is pretty rough, but hopefully if you're using this, you know the difference between parquet and json 🤷
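To illustrate what a "rough" auto-detection might look like, here is a sketch that guesses the reference format from the path suffix alone. This is an assumption about the general approach, not the logic merged in this PR; the function name and the accepted suffixes (.json, .parquet, .parq) are hypothetical.

```python
from pathlib import PurePosixPath


def guess_kerchunk_format(filepath: str) -> str:
    """Guess 'json' or 'parquet' from the suffix of a reference path."""
    # Parquet references are usually a directory, so tolerate a trailing slash.
    suffix = PurePosixPath(filepath.rstrip("/")).suffix.lower()
    if suffix == ".json":
        return "json"
    if suffix in {".parquet", ".parq"}:
        return "parquet"
    raise ValueError(
        f"Cannot tell whether {filepath!r} holds JSON or parquet references"
    )
```

A suffix check never opens the file, which keeps it cheap for remote stores, but it fails on paths without a recognisable extension, so raising an explicit error is preferable to silently guessing.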
Thanks @norlandrhagen - happy to merge!
Allow open_virtual_dataset to read existing Kerchunk references as virtual datasets
docs/releases.rst
To Do:
- \\ in variable names. Fixed.
- .json was adding trailing slashes on ref write.
- .json write.
- .json write.
- Convert inlined vars into numpy arrays. Note: After talking with @sharkinsspatial, it seems like the HDF reader he is working on won't inline at all. Should we:
  1. Raise an error if any inline data exists or
  2. Add logic to convert any inlined bytes to numpy arrays to maintain more compatibility with other Kerchunk readers?

Notes:
- _fsspec_openfile_from_filepath forced the fsspec filesystem to open a filepath (the fault of past me). I replaced it with a simple class that has a .open_file method that can be used as needed. The KerchunkLazyReferenceMapper required a fsspec filesystem.
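A rough sketch of the shape such a class could take, offered as an illustration rather than the class added in this PR. The class name, its fields, and the split between local and remote paths are assumptions; in particular, opening local paths with the builtin open is only there so the example runs without fsspec installed.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, BinaryIO


@dataclass
class FileOpener:
    """Hypothetical wrapper: holds a path and opens it only when asked."""

    filepath: str
    storage_options: dict = field(default_factory=dict)

    def _is_remote(self) -> bool:
        return "://" in self.filepath and not self.filepath.startswith("file://")

    def get_filesystem(self) -> Any:
        """Build the fsspec filesystem for callers that need one directly."""
        from fsspec.core import url_to_fs  # third-party, imported only on demand

        fs, _ = url_to_fs(self.filepath, **self.storage_options)
        return fs

    def open_file(self) -> BinaryIO:
        """Return a binary file object for the path."""
        if self._is_remote():
            return self.get_filesystem().open(self.filepath, "rb")
        # Assumption for this sketch: local paths bypass fsspec entirely.
        return open(self.filepath, "rb")


# Usage: write a small local file and read it back through the wrapper.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "refs.json")
    with open(path, "wb") as f:
        f.write(b'{"version": 1}')
    with FileOpener(path).open_file() as f:
        data = f.read()
```

Keeping get_filesystem separate from open_file means a consumer that needs the filesystem object itself can still obtain it, while other callers only pay for a file handle.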