Skip to content
This repository has been archived by the owner on Nov 12, 2024. It is now read-only.

Matt/icechunk encoding #2

Merged
merged 6 commits into from
Oct 4, 2024
Merged

Matt/icechunk encoding #2

merged 6 commits into from
Oct 4, 2024

Conversation

mpiannucci
Copy link

Not ready for primetime, but first time wroking in this repo want to make sure this approach is in the right direction of what needs to be done.

I think this also has to be done for zarr writer maybe

@mpiannucci mpiannucci marked this pull request as draft October 2, 2024 16:53
@@ -130,6 +131,11 @@ async def write_virtual_variable_to_icechunk(
# TODO we should probably be doing some encoding of those attributes?
arr.attrs["DIMENSION_NAMES"] = var.dims

_encoding_keys = {"_FillValue", "missing_value", "scale_factor", "add_offset"}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filters and compressors are packed in the codec pipeline, but CF encoding params are not

@mpiannucci mpiannucci requested a review from TomNicholas October 2, 2024 16:58

# check chunk references
# TODO we can't explicitly check that the path/offset/length is correct because icechunk doesn't yet expose any get_virtual_refs method

expected_ds = open_dataset(netcdf4_file)
expected_air_array = expected_ds["air"].to_numpy()
expected_air_array = expected_ds["air"].to_numpy() / scale_factor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we are basically replicating the step that xarray would do lazily if we opened the icechunk store using xarray?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya... I just switched it around to be more clear

Copy link

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thiiink this is right? It would be nice to get Tom A / Sean H to look at it.

@mpiannucci mpiannucci marked this pull request as ready for review October 2, 2024 17:18
@mpiannucci
Copy link
Author

mpiannucci commented Oct 3, 2024

OK leaving this here for a short while.

  • I added attribute encoding.
  • I tried to pull in variable encoding as a whole, as well as the cf encoding from xarray but it failed with a NotImplemented error when calling astype on ManifestArray
  • The tests pass doing things manually
  • I tried to run all of this with xarray from the zarr 3 pr and failed because kerchunk is not updated for zarr 3 api changes

All of these can be overcome i think, up to you if you want to sync this with your branch

@TomNicholas
Copy link

NotImplementedError when calling .astype on ManifestArray

Is there a simple solution to this? You can't change the dtype of a ManifestArray because that would involve loading data...

I tried to run all of this with xarray from the zarr 3 pr and failed because kerchunk is not updated for zarr 3 api changes

Right, annoying isn't it 😅 I don't actually think there is an upstream kerchunk issue for that - we should raise one.

@TomNicholas
Copy link

I would like to merge this into my branch but I'm not sure about this

NotImplementedError when calling .astype on ManifestArray

Shouldn't this step become part of the codec somehow?

@mpiannucci
Copy link
Author

Currently as implanted this doesn't happen. It happens when I switched to using arrays internal encoding functions.

@TomNicholas
Copy link

Okay so basically this problem only crops up when you try to integrate the code path here with the codepath in xarray's to_zarr more closely?

@mpiannucci
Copy link
Author

yupp correct

@TomNicholas TomNicholas merged commit 9676485 into icechunk Oct 4, 2024
@mpiannucci mpiannucci deleted the matt/icechunk-encoding branch October 4, 2024 17:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants