Workaround for converting data to Parquet #81

Closed
Tracked by #90
lewfish opened this issue Jul 26, 2022 · 0 comments

Comments


lewfish commented Jul 26, 2022

We have had difficulty converting the NWM subset from Zarr to Parquet in parallel using xarray and Dask. See pydata/xarray#6811 and https://dask.discourse.group/t/workers-dont-have-promised-key-error-and-delayed-computation/936. If reading the documentation and forum posts does not deepen our understanding of Dask enough to resolve these problems, we should find a workaround. One potential workaround is to use AWS Batch to run a script in parallel (using the job array functionality in Batch), where each job selects a specific piece of the dataset and saves it as a Parquet file.
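A minimal sketch of that workaround, assuming a Batch job array: each job reads the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable (which Batch sets automatically), computes its own index range over the dataset, and writes one Parquet file. The job count, step count, paths, and the xarray calls in the comments are all illustrative assumptions, not the actual pipeline.

```python
import os

N_JOBS = 100         # size of the Batch job array (illustrative)
TOTAL_STEPS = 10000  # total time steps in the dataset (illustrative)

def job_slice(job_index: int, n_jobs: int, total: int) -> slice:
    """Return the half-open index range assigned to one array job."""
    per_job = -(-total // n_jobs)  # ceiling division
    start = job_index * per_job
    return slice(start, min(start + per_job, total))

if __name__ == "__main__":
    # Batch sets AWS_BATCH_JOB_ARRAY_INDEX for each job in the array.
    idx = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    sel = job_slice(idx, N_JOBS, TOTAL_STEPS)
    # Inside a real job, something along these lines (hypothetical store/paths):
    # ds = xr.open_zarr("s3://bucket/nwm-subset.zarr")
    # ds.isel(time=sel).to_dataframe().to_parquet(
    #     f"s3://bucket/out/part-{idx:05d}.parquet")
    print(sel.start, sel.stop)
```

Because each job writes an independent Parquet file keyed by its array index, there is no shared Dask scheduler state to go wrong; the cost is managing the output as a directory of part files rather than a single dataset.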
