Using cog3pio to determine byte ranges in COG files? #16

Open
TomNicholas opened this issue Apr 10, 2024 · 3 comments
Labels
feature (New feature or request), help wanted (Extra attention is needed)

Comments

@TomNicholas

TomNicholas commented Apr 10, 2024

Basically same question as gauteh/hidefix#38 (comment) but for this library 😁

I'm building VirtualiZarr, an evolution of kerchunk that lets you determine the byte ranges of chunks in netCDF files and then concatenate the virtual representations of those chunks using xarray's API.

This works by creating a ChunkManifest object in-memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.
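To make the manifest idea concrete, here is a minimal sketch of a chunk manifest and one way to merge two of them. The names `ChunkEntry`, `Manifest`, and `concat_manifests` are hypothetical and chosen for illustration only; this is not VirtualiZarr's actual ChunkManifest API, and the file names and byte ranges are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    """Where one chunk's bytes live (hypothetical structure)."""
    path: str
    offset: int
    length: int


# A manifest maps a chunk index (one integer per dimension) to a byte range.
Manifest = dict[tuple[int, ...], ChunkEntry]


def concat_manifests(first: Manifest, second: Manifest, axis: int) -> Manifest:
    """Concatenate along `axis` by shifting the second manifest's chunk indices."""
    # Number of chunks the first manifest spans along the concatenation axis.
    shift = max(idx[axis] for idx in first) + 1
    merged = dict(first)
    for idx, entry in second.items():
        shifted = idx[:axis] + (idx[axis] + shift,) + idx[axis + 1:]
        merged[shifted] = entry
    return merged


# Two files, each holding one row of two chunks (illustrative values).
a = {(0, 0): ChunkEntry("a.nc", 100, 50), (0, 1): ChunkEntry("a.nc", 150, 50)}
b = {(0, 0): ChunkEntry("b.nc", 100, 50), (0, 1): ChunkEntry("b.nc", 150, 50)}
combined = concat_manifests(a, b, axis=0)
# combined now has keys (0, 0), (0, 1), (1, 0), (1, 1)
```

No data is read at any point: merging only rewrites chunk indices, which is why the byte ranges are all that is needed from each file.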

What I'm wondering is whether cog3pio's code could be useful to me as a way to generate the ChunkManifest for a COG file without using kerchunk/fsspec (see this issue). In other words, I would use cog3pio only to determine the byte ranges, not to actually read the data. (I plan to read the bytes later as if they were zarr, using the Rust object-store crate; see zarr-developers/zarr-python#1661.)
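For illustration, once an (offset, length) pair is known, reading the chunk is just a ranged read. The sketch below uses plain Python as a stand-in for the object-store crate; the helper names `http_range_header` and `read_range` are hypothetical.

```python
import io


def http_range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header; the end position is inclusive."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(fileobj, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from a seekable file object."""
    fileobj.seek(offset)
    return fileobj.read(length)


# Works on any seekable binary file object, here an in-memory buffer.
chunk = read_range(io.BytesIO(b"0123456789"), offset=2, length=3)
```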

Q's:

  • Is this idea dumb?
  • Does cog3pio expose the byte range information currently?
  • Can cog3pio read over S3?

cc @norlandrhagen

@weiji14
Owner

weiji14 commented Apr 10, 2024

Oh hi Tom, I was just chatting to @norlandrhagen at the Pangeo weekly meeting 😆. I will say that yes, it should be possible to work out the byte ranges from just the GeoTIFF's header, and we can expose that via an API function somehow. I just need to figure out how GDAL does this, and re-implement it here (easier said than done).
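As a rough illustration of the header-only approach, the sketch below reads the TileOffsets (tag 324) and TileByteCounts (tag 325) entries from the first image file directory (IFD) of a classic TIFF. It is not cog3pio's API and not how GDAL does it; `tile_byte_ranges` is a hypothetical name. It assumes a tiled, classic (non-BigTIFF) file, looks only at the first IFD (so any overview IFDs are ignored), and does not handle stripped files, which use the StripOffsets and StripByteCounts tags instead.

```python
import struct

TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325
TYPE_FORMATS = {3: "H", 4: "I"}  # TIFF field types SHORT and LONG


def tile_byte_ranges(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) per tile from the first IFD of a classic TIFF."""
    # Bytes 0-1 give the byte order: "II" is little-endian, "MM" is big-endian.
    order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (n_entries,) = struct.unpack_from(order + "H", data, ifd_offset)
    values = {}
    for i in range(n_entries):
        # Each IFD entry is 12 bytes: tag, field type, count, value-or-offset.
        entry_pos = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack_from(order + "HHI", data, entry_pos)
        if tag not in (TILE_OFFSETS, TILE_BYTE_COUNTS):
            continue
        fmt = TYPE_FORMATS[ftype]
        size = struct.calcsize(order + fmt) * count
        if size <= 4:
            # Values that fit in 4 bytes are stored inline in the entry.
            pos = entry_pos + 8
        else:
            # Otherwise the last 4 bytes point to where the values are stored.
            (pos,) = struct.unpack_from(order + "I", data, entry_pos + 8)
        values[tag] = struct.unpack_from(f"{order}{count}{fmt}", data, pos)
    return list(zip(values[TILE_OFFSETS], values[TILE_BYTE_COUNTS]))
```

Only the header and the two tag arrays are touched, so for a remote file this needs the first few kilobytes rather than the pixel data.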

Note that I'm already using object-store in cog3pio (#5), and passing an HTTP URL to a GeoTIFF should already work. Reading from S3 (or Azure, GCP, etc.) will work too if I enable the corresponding feature flag here and recompile:

object_store = { version = "0.9.0", features = ["http"] }

I'm aware that the Zarr v3 implementation is using object-store, and keeping an eye on progress at https://github.com/roeap/object-store-python. Would definitely be keen to standardize on object-store as the 'fsspec-for-Rust'.

@TomNicholas
Author

Awesome! Thanks @weiji14

"Would definitely be keen to standardize on object-store as the 'fsspec-for-Rust'."

Yeah this would be great, and I like the way you've described the aim there.

weiji14 added the help wanted and feature labels Apr 10, 2024
@martindurant

I would add that there's no reason kerchunk needs to use fsspec; it's really a package of ways to make reference files. Therefore, if you come up with a way to get COG offsets, it can easily live there alongside the other reference makers. Does TIFFFile do the required work too?
