Commit
This PR enables "disk"-based shuffling of `cudf`-backed Dask-DataFrame collections, but does **not** yet add the `shuffle="disk"` option to the `dask_cudf.DataFrame.shuffle/sort_values` APIs. We now use basic (slow) `pickle` logic to convert `cudf.DataFrame` objects to/from `bytes` here, so I'd like to consider further optimizations before making the `shuffle="disk"` option "official".

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Ray Douglass (https://github.com/raydouglass)
  - gpuCI (https://github.com/GPUtester)
  - Mike Wendt (https://github.com/mike-wendt)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #14287
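
For readers unfamiliar with the idea, here is a minimal sketch of a pickle-based disk shuffle. It is not the code in this PR and not how Dask implements disk shuffling. To stay runnable without a GPU, it uses a plain dict of column lists as a stand-in for a `cudf.DataFrame`, and the helpers `to_bytes`, `from_bytes`, and `disk_shuffle`, along with the file naming scheme, are hypothetical names chosen for illustration.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def to_bytes(frame):
    # Serialize a frame to bytes with pickle (simple, but not fast).
    return pickle.dumps(frame, protocol=pickle.HIGHEST_PROTOCOL)


def from_bytes(data):
    # Inverse of to_bytes.
    return pickle.loads(data)


def disk_shuffle(partitions, key, npartitions, directory):
    # `partitions` is a list of stand-in frames: dicts mapping a column
    # name to a list of values. Keys are assumed to be integers so that
    # hash() is stable across processes.

    # Stage 1: split every input partition by hash of the key column and
    # write each piece to its own file as pickled bytes.
    for i, frame in enumerate(partitions):
        buckets = defaultdict(lambda: {name: [] for name in frame})
        for row in range(len(frame[key])):
            out = hash(frame[key][row]) % npartitions
            for name, column in frame.items():
                buckets[out][name].append(column[row])
        for out, piece in buckets.items():
            path = os.path.join(directory, f"in{i}-out{out}.pkl")
            with open(path, "wb") as f:
                f.write(to_bytes(piece))

    # Stage 2: for each output partition, read back and concatenate the
    # pieces written for it by every input partition.
    outputs = []
    for out in range(npartitions):
        merged = None
        for i in range(len(partitions)):
            path = os.path.join(directory, f"in{i}-out{out}.pkl")
            if not os.path.exists(path):
                continue
            with open(path, "rb") as f:
                piece = from_bytes(f.read())
            if merged is None:
                merged = piece
            else:
                for name in merged:
                    merged[name].extend(piece[name])
        outputs.append(merged if merged is not None else {})
    return outputs


if __name__ == "__main__":
    parts = [
        {"id": [0, 1, 2, 3], "x": [10, 11, 12, 13]},
        {"id": [4, 5, 6, 7], "x": [14, 15, 16, 17]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(disk_shuffle(parts, key="id", npartitions=2, directory=tmp))
```

The serialization step is what the message above calls "basic (slow)": every piece is converted to `bytes` and back through `pickle`, which is the cost a faster conversion would target.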