[BUG] Overhead in sorting single partition frames with dask_cudf
#3873
Comments
dask_cudf
This may be a duplicate of #2272, depending on how that issue is resolved.
Thanks for raising this @VibhuJawa. I can definitely add a simple fix here to avoid unnecessary work when there is only a single partition. You are correct that there is no reason for the […]

Other context/info: I have spent some time today working on an experimental version of […]
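To illustrate why a single-partition sort can be slow, consider a generic sample-based distributed sort. It samples keys to choose partition boundaries, routes every row to its target partition, and then sorts each partition locally. With only one partition, the sampling and routing steps produce nothing useful. The stdlib-only sketch below models a frame as a list of plain Python lists, which is an assumption. It does not use dask_cudf or cudf, and `sort_partitions`, `general_sort` and `_compute_divisions` are hypothetical names rather than dask_cudf's actual implementation. It shows the kind of single-partition short-circuit described in the comment above.

```python
import bisect
import random
from typing import List

# Hypothetical model: a "frame" is a list of partitions, each a list of sortable keys.
Partition = List[int]


def _compute_divisions(partitions: List[Partition], nparts: int,
                       sample_size: int = 20, seed: int = 0) -> List[int]:
    """Sample keys from every partition and pick nparts - 1 split points.

    This is an extra pass over the data that a general sort pays up front.
    """
    rng = random.Random(seed)
    sample: List[int] = []
    for part in partitions:
        sample.extend(rng.sample(part, min(sample_size, len(part))))
    sample.sort()
    if not sample:
        return []
    return [sample[(i * len(sample)) // nparts] for i in range(1, nparts)]


def general_sort(partitions: List[Partition]) -> List[Partition]:
    """Sample-based sort: compute divisions, route keys, then sort locally."""
    nparts = len(partitions)
    divisions = _compute_divisions(partitions, nparts)
    # "Shuffle": send each key to the output partition whose range contains it.
    out: List[Partition] = [[] for _ in range(nparts)]
    for part in partitions:
        for key in part:
            out[bisect.bisect_right(divisions, key)].append(key)
    # Each output partition covers a disjoint, ordered key range.
    return [sorted(p) for p in out]


def sort_partitions(partitions: List[Partition]) -> List[Partition]:
    # Fast path: a local sort of a single partition is already a global sort,
    # so the sampling and routing steps can be skipped entirely.
    if len(partitions) == 1:
        return [sorted(partitions[0])]
    return general_sort(partitions)
```

Both paths return the same result for one partition, for example `sort_partitions([[5, 3, 9, 1]])` returns `[[1, 3, 5, 9]]`. Only `general_sort` spends time sampling and routing keys that never leave the partition.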
I think for […] For multi-column sort, I don't think we do these on very large frames, so we can experiment with repartitioning to see whether that helps performance there. I will start a thread on our internal Slack because it might be easier to discuss our workflows there.
Describe the bug
Sorting a single-partition frame with native dask_cudf sorting appears to carry noticeable overhead compared with sorting via `map_partitions` or sorting the cudf DataFrame directly.
Steps/Code to reproduce bug
Create Helper
Native Dask
Map Partition
Cudf
Expected behavior
I would expect all three approaches to perform similarly on a single-partition frame.
Environment details
Gist Link: https://gist.github.com/VibhuJawa/236b073b8e1b1243ad33a099503cdc36