chunking schema for analysis-ready? #12

Closed
Lkruitwagen opened this issue Oct 21, 2022 · 3 comments

Comments

@Lkruitwagen

Hey guys! Love to see this effort.

Quick question about the chunking schema you're planning for an analysis-ready corpus. Will you keep the native ERA5 chunking, i.e. {'time':1}? Are you chunking variables together?

With h2ox we've chunked hourly ERA5-Land in blocks of 4 years with some moderate spatial aggregation, e.g. 5 degrees. Our main use case is quick and easy retrieval of timeseries. It would be good to know what you guys are thinking for the chunking schema here!

@alxmrs
Collaborator

alxmrs commented Aug 29, 2023

Sorry for the very late reply. We're releasing data chunked at 1 hour and don't have plans to provide spatial chunks, though that is a good idea for the problem you have.

In the medium term, the time-series use case is an important one that we want to address. We've just updated our roadmap with respect to this goal (#48). To address it, we plan on mirroring our ERA5 data into Google BigQuery (focusing on the analysis-ready (AR) corpus).

We're planning on using this piece of infrastructure for the data ingestion: https://github.com/google/weather-tools/tree/main/weather_mv#weather-mv-bigquery

@alxmrs
Collaborator

alxmrs commented Oct 3, 2023

We did do an internal chunking experiment where we prioritized querying by time. This revealed the inherent Zarr-specific tradeoff: you have to prioritize either space or time (or, more generally, one set of dimensions over another). Given the other use cases we want to support for this dataset, our plan is to keep the existing chunking scheme (the whole globe at every hour) and to support timeseries-like analysis with BigQuery.
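
To make the space-versus-time tradeoff concrete, here is a minimal back-of-the-envelope sketch that counts how many chunks a read has to touch under two layouts. It is not the project's code: the `chunks_touched` helper, the one-year time span, and the 20 x 20 spatial tile are hypothetical choices made for illustration, and the grid is assumed to be the 0.25 degree ERA5 grid (721 x 1440 points).

```python
from math import ceil


def chunks_touched(query_shape, chunk_shape):
    """Count chunks covered by a read of `query_shape`.

    Assumes the read starts on a chunk boundary, so each dimension
    needs ceil(query / chunk) chunks.
    """
    count = 1
    for q, c in zip(query_shape, chunk_shape):
        count *= ceil(q / c)
    return count


# Assumed grid: hourly data for one non-leap year on a 0.25 degree global grid.
N_TIME, N_LAT, N_LON = 8760, 721, 1440

# Dimension order is (time, latitude, longitude).
schemes = {
    "whole globe per hour": (1, N_LAT, N_LON),
    "full year per 20x20 tile (hypothetical)": (N_TIME, 20, 20),
}
queries = {
    "one-year timeseries at one point": (N_TIME, 1, 1),
    "global map at one hour": (1, N_LAT, N_LON),
}

for scheme_name, chunk_shape in schemes.items():
    for query_name, query_shape in queries.items():
        n = chunks_touched(query_shape, chunk_shape)
        print(f"{scheme_name:42s} | {query_name:34s} | {n} chunks")
```

Under these assumptions, the globe-per-hour layout reads 1 chunk for a global map but 8760 chunks for a one-year point timeseries, while the tiled layout reads 1 chunk for the timeseries but 2664 chunks for a single global map. Whichever access pattern the chunking favors, the other one pays for it.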

alxmrs closed this as completed Oct 3, 2023
@Lkruitwagen
Author

Thanks for the update. Glad to see this project going somewhere!
