I'm building VirtualiZarr, an evolution of kerchunk, that allows you to determine byte ranges of chunks in netCDF files, but then concatenate the virtual representation of those chunks using xarray's API.
This works by creating a ChunkManifest object in-memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.
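To make the manifest idea concrete, here is a minimal sketch of what such an object and one merge operation could look like. The names `ChunkEntry`, `Manifest`, and `concat_manifests` are hypothetical and chosen for illustration only; this is not VirtualiZarr's actual API. It assumes each manifest maps a chunk-grid index to a (path, offset, length) triple, and that concatenation only needs to shift chunk indices along one axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    """Where one chunk lives: a file path plus a byte range (hypothetical type)."""
    path: str
    offset: int
    length: int


# A manifest maps a chunk-grid index, e.g. (0, 1), to the bytes holding that chunk.
Manifest = dict[tuple[int, ...], ChunkEntry]


def concat_manifests(manifests, chunk_grid_shapes, axis=0):
    """Concatenate manifests along `axis` by shifting chunk indices.

    `chunk_grid_shapes[i]` is the number of chunks per dimension in manifests[i].
    No bytes are read or copied; only the index keys change.
    """
    merged: Manifest = {}
    shift = 0
    for manifest, grid_shape in zip(manifests, chunk_grid_shapes):
        for key, entry in manifest.items():
            new_key = key[:axis] + (key[axis] + shift,) + key[axis + 1:]
            merged[new_key] = entry
        shift += grid_shape[axis]
    return merged
```

Because the merge only rewrites keys, it is cheap regardless of how large the underlying files are, which is the point of working with a virtual representation.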
What I'm wondering is if cog3pio's code could be useful to me as a way to generate the ChunkManifest for a netCDF file without using kerchunk/fsspec (see this issue). In other words, I would use cog3pio only to determine the byte ranges, not for actually reading the data. (I plan to actually read the bytes later as if it were zarr using the Rust object-store crate, see zarr-developers/zarr-python#1661).
Q's:
Is this idea dumb?
Does cog3pio expose the byte range information currently?
Oh hi Tom, I was just chatting to @norlandrhagen at the Pangeo weekly meeting 😆. I will say that yes, it should be possible to work out the byte ranges from just the GeoTIFF's header, and we can expose that via an API function somehow. I just need to figure out how GDAL does this, and re-implement it here (easier said than done).
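As a rough illustration of why the header alone is enough, here is a minimal sketch that reads the `TileOffsets` (tag 324) and `TileByteCounts` (tag 325) entries from the first IFD of a classic TIFF. It is an assumption-laden sketch, not how cog3pio or GDAL does it: the function name `read_tile_byte_ranges` is hypothetical, BigTIFF is not handled, only SHORT and LONG value types are supported, only the first IFD is inspected, and stripped (non-tiled) files, which use tags 273 and 279 instead, are ignored.

```python
import struct

TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325
TYPE_FORMATS = {3: "H", 4: "I"}  # TIFF field types: 3 = SHORT, 4 = LONG


def read_tile_byte_ranges(buf: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each tile listed in the first IFD."""
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    order = {b"II": "<", b"MM": ">"}[buf[:2]]
    magic, ifd_offset = struct.unpack_from(order + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled in this sketch)")

    (n_entries,) = struct.unpack_from(order + "H", buf, ifd_offset)
    values = {}
    for i in range(n_entries):
        # Each IFD entry is 12 bytes: tag, type, count, then a value or an offset.
        entry_pos = ifd_offset + 2 + 12 * i
        tag, typ, count = struct.unpack_from(order + "HHI", buf, entry_pos)
        if tag not in (TILE_OFFSETS, TILE_BYTE_COUNTS):
            continue
        fmt = TYPE_FORMATS[typ]
        size = struct.calcsize(order + fmt) * count
        if size <= 4:
            # Small values are stored inline in the entry's last 4 bytes.
            data_pos = entry_pos + 8
        else:
            # Otherwise those 4 bytes hold the offset of the value array.
            (data_pos,) = struct.unpack_from(order + "I", buf, entry_pos + 8)
        values[tag] = struct.unpack_from(f"{order}{count}{fmt}", buf, data_pos)

    return list(zip(values[TILE_OFFSETS], values[TILE_BYTE_COUNTS]))
```

For a remote file, `buf` would only need to cover the header, the IFD, and the two value arrays, so in principle a few ranged reads could produce every tile's byte range without touching the pixel data.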
Note that I'm already using object-store in cog3pio (#5), and passing an HTTP URL to a GeoTIFF should already work. Reading from S3 (or Azure, GCP, etc.) will work too if I enable the relevant feature flag here and recompile:
object_store = { version = "0.9.0", features = ["http"] }
I'm aware that the Zarr v3 implementation is using object-store, and keeping an eye on progress at https://github.com/roeap/object-store-python. Would definitely be keen to standardize on object-store as the 'fsspec-for-Rust'.
I would add that there's no reason kerchunk needs to use fsspec; it's really a package of ways to make reference files. Therefore, if you come up with a way to get COG offsets, it can easily live there together with the other reference makers. Does TIFFFile do the required work too?
Basically same question as gauteh/hidefix#38 (comment) but for this library 😁
cc @norlandrhagen