Commit
Dask-CuDF: use default Dask Dataframe optimizer (#8581)
In order to use the new HighLevelGraph optimization work in Dask/Distributed, this PR makes `dask_cudf.Dataframes` use the default Dask optimizer. Previously, we explicitly materialized the HighLevelGraphs when calling `submit()` and `compute()` on `dask_cudf.Dataframes`.

Overall, this should improve performance, but low-level task optimizations are disabled by default, which _might_ have a negative impact. High-level optimizations are done in any case. We are working on moving all low-level optimizations to the high level, but some of them, such as array slicing, are currently only supported at the low level. I don't think we will be missing any low-level optimizations related to Dataframes, so I think we should follow Dask on this one and disable low-level optimizations by default.

It is possible to enable low-level optimizations explicitly by setting the Dask config like:

```python
dask.config.set({"optimization.fuse.active": True})
```

cc. @jakirkham, @quasiben, @beckernick, @VibhuJawa

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- https://github.com/jakirkham

URL: #8581
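If low-level fusion is only wanted for part of a workload, the same config key can be set for a limited scope by using `dask.config.set` as a context manager. This is a minimal sketch and not part of the PR: the names `FUSE_KEY` and `fusion_setting` are hypothetical helpers introduced for illustration, and the commented-out `ddf.compute()` stands in for whatever Dask-cuDF work would run inside the block.

```python
import dask

# Config key named in the PR description above.
FUSE_KEY = "optimization.fuse.active"


def fusion_setting():
    """Return the current value of the low-level fusion switch (hypothetical helper)."""
    return dask.config.get(FUSE_KEY, default=None)


before = fusion_setting()

# Enable low-level task fusion only inside this block.
with dask.config.set({FUSE_KEY: True}):
    inside = fusion_setting()
    # A call such as ddf.compute() placed here would run with fusion enabled.

# On exit, the previous value of the key is restored.
after = fusion_setting()

print(before, inside, after)
```

Scoping the setting this way keeps the default behaviour for the rest of the program, which makes it easier to compare runs with and without low-level optimizations.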