Workaround for converting data to Parquet #81

Closed
Tracked by #90
lewfish opened this issue Jul 26, 2022 · 0 comments

Comments


lewfish commented Jul 26, 2022

We have had difficulty converting the NWM subset from Zarr to Parquet in parallel using xarray and Dask. See pydata/xarray#6811 and https://dask.discourse.group/t/workers-dont-have-promised-key-error-and-delayed-computation/936. If reading the documentation and forum posts does not deepen our understanding of Dask enough to resolve these problems, we should find a workaround. One potential workaround is to use AWS Batch to run a script in parallel (using the job array functionality in Batch), where each job selects a specific piece of the dataset and saves it as a Parquet file.
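A minimal sketch of that workaround, assuming a Batch job array: each job reads the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable (which Batch sets automatically), computes its own index range over the dataset, and writes one Parquet file. The job count, step count, paths, and the xarray calls in the comments are all illustrative assumptions, not the actual pipeline.

```python
import os

N_JOBS = 100         # size of the Batch job array (illustrative)
TOTAL_STEPS = 10000  # total time steps in the dataset (illustrative)

def job_slice(job_index: int, n_jobs: int, total: int) -> slice:
    """Return the half-open index range assigned to one array job."""
    per_job = -(-total // n_jobs)  # ceiling division
    start = job_index * per_job
    return slice(start, min(start + per_job, total))

if __name__ == "__main__":
    # Batch sets AWS_BATCH_JOB_ARRAY_INDEX for each job in the array.
    idx = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    sel = job_slice(idx, N_JOBS, TOTAL_STEPS)
    # Inside a real job, something along these lines (hypothetical store/paths):
    # ds = xr.open_zarr("s3://bucket/nwm-subset.zarr")
    # ds.isel(time=sel).to_dataframe().to_parquet(
    #     f"s3://bucket/out/part-{idx:05d}.parquet")
    print(sel.start, sel.stop)
```

Because each job writes an independent Parquet file keyed by its array index, there is no shared Dask scheduler state to go wrong; the cost is managing the output as a directory of part files rather than a single dataset.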
