open_virtual_dataset with and without indexes #52

TomNicholas · 2024-03-26T19:57:15Z

Closes #18

This PR ensures that you can either create a virtual dataset with real in-memory pandas indexes using open_virtual_dataset(..., indexes=None), or avoid creating indexes entirely using open_virtual_dataset(..., indexes={}). The kwarg signature was chosen to match what is planned for xr.open_dataset; see pydata/xarray#6633.
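
The difference between `indexes=None` and `indexes={}` rests on an identity check rather than a truthiness check. Below is a minimal sketch of that convention. Both `resolve_indexes` and `fake_defaults` are hypothetical names invented for illustration; neither is part of VirtualiZarr or xarray.

```python
from typing import Any, Callable, Mapping, Optional


def resolve_indexes(
    indexes: Optional[Mapping[str, Any]],
    build_default_indexes: Callable[[], Mapping[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper illustrating the None-vs-empty-mapping convention."""
    if indexes is None:
        # None means "not specified": fall back to default in-memory indexes.
        return dict(build_default_indexes())
    # Any mapping, including an empty one, is taken literally.
    # Note that `if not indexes:` would wrongly treat {} the same as None.
    return dict(indexes)


def fake_defaults() -> dict[str, Any]:
    # Stand-in for reading coordinate values from a file (assumption).
    return {"time": "PandasIndex(time)"}


print(resolve_indexes(None, fake_defaults))
print(resolve_indexes({}, fake_defaults))
```

Using `is None` keeps the empty mapping available as an explicit "no indexes" request, which a plain truthiness test would not.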

TomNicholas · 2024-03-26T19:58:35Z

virtualizarr/xarray.py

+    vds_refs = kerchunk.read_kerchunk_references_from_file(
        filepath=filepath,
        filetype=filetype,
    )

-    ds = dataset_from_kerchunk_refs(
-        ds_refs,
+    if indexes is None:
+        # add default indexes by reading data from file
+        # TODO we are reading a bunch of stuff we know we won't need here, e.g. all of the data variables...
+        # TODO it would also be nice if we could somehow consolidate this with the reading of the kerchunk references
+        ds = xr.open_dataset(filepath)
+        indexes = ds.xindexes
+        ds.close()
+
+    vds = dataset_from_kerchunk_refs(
+        vds_refs,
        drop_variables=drop_variables,
        virtual_array_class=virtual_array_class,
        indexes=indexes,
    )


We're literally opening the file twice here: once with kerchunk to read all the byte ranges, and then optionally once again with xarray to read in the values used to create default pandas indexes.

Wondering if you have any thoughts on how hard it might be to consolidate these, @jhamman?
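
One partial mitigation for the first TODO in the diff might be to skip every variable that cannot become a default index when the file is reopened. xarray builds default indexes only for dimension coordinates, that is, one-dimensional variables whose name matches their single dimension. The sketch below assumes the variable names and dimensions are already known (for example from the kerchunk references). `variables_to_drop` is a hypothetical helper, not part of VirtualiZarr, and the variable layout is illustrative only.

```python
from typing import Mapping, Sequence


def variables_to_drop(var_dims: Mapping[str, Sequence[str]]) -> list[str]:
    """Return names of variables that are not dimension coordinates."""
    return [
        name
        for name, dims in var_dims.items()
        # A dimension coordinate is 1D and named after its own dimension.
        if not (len(dims) == 1 and dims[0] == name)
    ]


# Illustrative layout only; these names are not taken from the PR.
var_dims = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "air": ("time", "lat", "lon"),
}
print(variables_to_drop(var_dims))  # ['air']
```

The resulting list could be passed to the `drop_variables` argument of `xr.open_dataset`. Whether this saves meaningful work depends on the backend, since xarray typically loads data variables lazily, and it does not remove the second open of the file.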

codecov · 2024-03-26T20:20:38Z

Codecov Report

Attention: Patch coverage is 96.92308%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.04%. Comparing base (970d354) to head (aceba53).

Files	Patch %	Lines
virtualizarr/tests/test_xarray.py	95.74%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   88.12%   90.04%   +1.91%     
==========================================
  Files          13       13              
  Lines         893      944      +51     
==========================================
+ Hits          787      850      +63     
+ Misses        106       94      -12

Flag	Coverage Δ
unittests	`90.04% <96.92%> (+1.91%)`	⬆️

TomNicholas added 3 commits March 26, 2024 14:49

test passing indexes={}

370c651

test creating default indexes by passing indexes=None

bc32a82

implementation of creating default indexes

6d63446

TomNicholas added the xarray label (Requires changes to xarray upstream) Mar 26, 2024

TomNicholas and others added 7 commits March 26, 2024 15:59

typo

3cc5cab

clarify docstring

8b929cb

[pre-commit.ci] auto fixes from pre-commit.com hooks

cb5f2e1

for more information, see https://pre-commit.ci

clarify docstring further

cb7397a

Merge branch 'open_indexes' of https://github.com/TomNicholas/Virtual…

5621a99

…iZarr into open_indexes

add pooch to test dependencies

809c83e

remove | character

11cc55d

TomNicholas added 3 commits March 26, 2024 16:26

remove rogue print statements

2f654ed

test using combine_by_coords

5ffd401

document how to create virtual datasets with in-memory indexes

6dc8d8d

TomNicholas mentioned this pull request Mar 27, 2024

Better documentation #8

Open

TomNicholas added 3 commits March 27, 2024 10:40

correct .indexes example

bb21854

note about work on writing to Zarr v3

709eee1

Merge branch 'main' into open_indexes

aceba53

TomNicholas merged commit 7927fe3 into main Mar 27, 2024
8 checks passed

TomNicholas deleted the open_indexes branch March 27, 2024 18:22

TomNicholas mentioned this pull request Mar 27, 2024

Inferring concatenation order from coordinate data values #18

Closed
