-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
open_virtual_dataset with and without indexes #52
Conversation
vds_refs = kerchunk.read_kerchunk_references_from_file( | ||
filepath=filepath, | ||
filetype=filetype, | ||
) | ||
|
||
ds = dataset_from_kerchunk_refs( | ||
ds_refs, | ||
if indexes is None: | ||
# add default indexes by reading data from file | ||
# TODO we are reading a bunch of stuff we know we won't need here, e.g. all of the data variables... | ||
# TODO it would also be nice if we could somehow consolidate this with the reading of the kerchunk references | ||
ds = xr.open_dataset(filepath) | ||
indexes = ds.xindexes | ||
ds.close() | ||
|
||
vds = dataset_from_kerchunk_refs( | ||
vds_refs, | ||
drop_variables=drop_variables, | ||
virtual_array_class=virtual_array_class, | ||
indexes=indexes, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're literally opening the file twice here - once with kerchunk to read all the byte ranges, and then optionally once again to read in the values to use to create defaut pandas indexes with xarray.
Wondering if you have any thoughts on how hard it might be to consolidate these @jhamman ?
for more information, see https://pre-commit.ci
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #52 +/- ##
==========================================
+ Coverage 88.12% 90.04% +1.91%
==========================================
Files 13 13
Lines 893 944 +51
==========================================
+ Hits 787 850 +63
+ Misses 106 94 -12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Closes #18
This PR ensures that you can either create a virtual dataset with real in-memory pandas indexes using
open_virtual_dataset(..., indexes=None)
, or avoid creating indexes entirely usingopen_virtual_dataset(..., indexes={})
. The kwarg signature here was chosen to match what is planned forxr.open_dataset
, see pydata/xarray#6633.