Use cudf::distinct in Python binding #11230

Closed
wants to merge 7 commits into from


ttnghia
Contributor

@ttnghia ttnghia commented Jul 8, 2022

This changes the internal implementation of cudf.stream_compaction.drop_duplicates from cudf::unique(cudf::stable_sort(input)) to a direct call to cudf::distinct. Doing so avoids sorting the input and reduces the time complexity from O(n log n) to O(n).
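
To illustrate the complexity difference described above, here is a minimal pure-Python sketch of the two strategies. It is not the libcudf or cuDF implementation: the function names `dedup_sort_based` and `dedup_hash_based` are hypothetical, and a Python `set` merely stands in for a hash-based approach that needs no sort.

```python
def dedup_sort_based(values):
    """Sort, then drop adjacent equal elements: O(n log n) overall."""
    ordered = sorted(values)  # the sort dominates the cost
    result = []
    for v in ordered:
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != v:
            result.append(v)
    return result


def dedup_hash_based(values):
    """Single pass with a hash set: expected O(n), no sort required."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


data = [3, 1, 3, 2, 1]
print(dedup_sort_based(data))  # [1, 2, 3]
print(dedup_hash_based(data))  # [3, 1, 2]
```

Both functions return the same set of values. The hash-based version happens to preserve first-occurrence order in this sketch; that is a property of the sketch, not a claim about the ordering of cudf::distinct.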

The new option keep = 'any', taken from cudf::duplicate_keep_option, is also adopted.
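
As a rough illustration of what the keep policies mean, the pure-Python sketch below returns the row indices that survive deduplication under each policy. The helper `kept_indices` and the string spellings "first", "last", and "none" are assumptions made for this example; only keep = 'any' is taken from the description above, and none of this is the actual cuDF binding code.

```python
def kept_indices(values, keep="any"):
    """Return sorted indices of the rows kept under a policy (hypothetical helper)."""
    first, last, count = {}, {}, {}
    for i, v in enumerate(values):
        first.setdefault(v, i)  # index of the first occurrence
        last[v] = i             # index of the last occurrence
        count[v] = count.get(v, 0) + 1

    if keep == "first":
        kept = first.values()
    elif keep == "last":
        kept = last.values()
    elif keep == "none":
        # Drop every value that occurs more than once.
        kept = (i for v, i in first.items() if count[v] == 1)
    elif keep == "any":
        # Any one representative per value is acceptable. This sketch picks
        # the first occurrence, but callers must not rely on which one.
        kept = first.values()
    else:
        raise ValueError(f"unknown keep policy: {keep!r}")
    return sorted(kept)


rows = ["a", "b", "a", "c", "b"]
print(kept_indices(rows, "first"))  # [0, 1, 3]
print(kept_indices(rows, "last"))   # [2, 3, 4]
print(kept_indices(rows, "none"))   # [3]
```

Because 'any' places no constraint on which duplicate survives, an implementation is free to skip the bookkeeping needed to identify the first or last occurrence.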

@ttnghia ttnghia added the 2 - In Progress (Currently a work in progress), Python (Affects Python cuDF API), Cython, Performance (Performance related issue), and non-breaking (Non-breaking change) labels Jul 8, 2022
@ttnghia ttnghia self-assigned this Jul 8, 2022
@ttnghia ttnghia changed the title Use cudf::distinct in Python Use cudf::distinct in Python binding Jul 8, 2022
@ttnghia ttnghia added the improvement (Improvement / enhancement to an existing function) label Jul 8, 2022
@github-actions github-actions bot added the CMake (CMake build issue), conda, Java (Affects Java cuDF API), and libcudf (Affects libcudf C++/CUDA code) labels Aug 9, 2022
@ttnghia ttnghia changed the base branch from branch-22.08 to branch-22.10 August 9, 2022 22:03
@github-actions github-actions bot removed the libcudf (Affects libcudf C++/CUDA code), CMake (CMake build issue), conda, and Java (Affects Java cuDF API) labels Aug 19, 2022
@rapidsai rapidsai deleted a comment from github-actions bot Aug 19, 2022
@ttnghia
Contributor Author

ttnghia commented Aug 19, 2022

Rerun tests.

@ttnghia
Contributor Author

ttnghia commented Sep 23, 2022

This PR has been superseded by #11656. I will keep it open until that PR is merged.

@github-actions

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

Labels
2 - In Progress (Currently a work in progress), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Performance (Performance related issue), Python (Affects Python cuDF API)