
Get rid of std::move when using cuco::make_pair #138

Merged: 1 commit, Jan 28, 2022


PointKernel
Member

This PR fixes multiple improper uses of cuco::make_pair where the template argument types were explicitly specified and std::move was used. Accordingly, it adds a cuco::pair_converter constructor taking universal references and a copy constructor to cuco::pair.
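A minimal host-only C++17 sketch of the pattern being removed. It assumes a `make_pair` that takes forwarding (universal) references, which would explain why explicitly specifying the types forced callers into `std::move`. `simple_pair`, `make_simple_pair`, and `demo_pair` are hypothetical stand-ins, not cuco's actual definitions.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a pair type such as cuco::pair (NOT the real cuco definition).
template <typename First, typename Second>
struct simple_pair {
  First first;
  Second second;
};

// Hypothetical make_pair taking forwarding references; the stored types are decayed,
// so passing lvalues yields a pair of values rather than a pair of references.
template <typename F, typename S>
constexpr simple_pair<std::decay_t<F>, std::decay_t<S>> make_simple_pair(F&& f, S&& s)
{
  return {std::forward<F>(f), std::forward<S>(s)};
}

inline simple_pair<int, int> demo_pair()
{
  int key   = 42;
  int value = 7;
  // With explicit template arguments the parameters become `int&&`, which cannot bind
  // to the lvalues `key` and `value`, so this line would not compile:
  //   make_simple_pair<int, int>(key, value);
  // The workaround was make_simple_pair<int, int>(std::move(key), std::move(value)).
  // Letting deduction work instead gives F = int&, S = int&, and the pair stores ints:
  auto p = make_simple_pair(key, value);
  static_assert(std::is_same_v<decltype(p), simple_pair<int, int>>);
  return p;
}
```

The design point is that deduction, not the caller, should pick the reference category; explicit template arguments defeat forwarding references.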

The related cudf implementation should be updated as well:
https://github.com/rapidsai/cudf/blob/57ff6f55b9fd44e8a8e10282d3f95d5f38e299ef/cpp/src/join/hash_join.cuh#L71

@PointKernel PointKernel added helps: rapids Helps or needed by RAPIDS improvement labels Jan 27, 2022
@vyasr
Collaborator

vyasr commented Jan 27, 2022

This looks good. I'm assuming that CI won't pass until #130 is merged though.

@PointKernel
Member Author

pre-commit.ci run

@PointKernel PointKernel merged commit 6ec8b6d into NVIDIA:dev Jan 28, 2022
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Feb 2, 2022
Related to #9413.

This PR adds `unordered_drop_duplicates`/`unordered_distinct_count` APIs by using hash-based algorithms. It doesn't close the original issue since adding `std::unique`-like `drop_duplicates` is not addressed in this PR. It involves several changes:

- [x] Change the behavior of the existing `distinct_count`: it now counts the number of consecutive groups of equivalent rows instead of the total number of unique rows.
- [x] Add hash-based `unordered_distinct_count`: this new API counts unique rows across the whole table by using a hash map. It requires a newer version of `cuco` that includes the bug fixes from NVIDIA/cuCollections#132 and NVIDIA/cuCollections#138.
- [x] Add hash-based `unordered_drop_duplicates`: similar to `drop_duplicates`, but this API doesn't support the `keep` option and the output is in an unspecified order.
- [x] Replace all the cpp-side `drop_duplicates`/`distinct_count` use cases with `unordered_` versions. 
- [x] Update and replace the existing compaction benchmark with `nvbench`.
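To make the difference between the two counting semantics concrete, here is a host-side C++ sketch over a single column of values. The function names are hypothetical and this is not cudf's GPU implementation, which works on table rows and uses cuco hash maps; `std::unordered_set` stands in for the hash map.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Counts runs of consecutive equal values (the new `distinct_count` semantics,
// analogous to what std::unique would keep).
template <typename T>
std::size_t consecutive_distinct_count(std::vector<T> const& rows)
{
  if (rows.empty()) { return 0; }
  std::size_t count = 1;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (!(rows[i] == rows[i - 1])) { ++count; }  // a new group starts here
  }
  return count;
}

// Hash-based count of unique values across the whole input
// (analogous to `unordered_distinct_count`).
template <typename T>
std::size_t hash_distinct_count(std::vector<T> const& rows)
{
  std::unordered_set<T> seen(rows.begin(), rows.end());
  return seen.size();
}

// Hash-based deduplication (analogous to `unordered_drop_duplicates`):
// there is no `keep` option, and the output order is unspecified.
template <typename T>
std::vector<T> hash_drop_duplicates(std::vector<T> const& rows)
{
  std::unordered_set<T> seen(rows.begin(), rows.end());
  return std::vector<T>(seen.begin(), seen.end());
}
```

For the input `{1, 1, 2, 1, 3, 3}` there are four consecutive groups (`[1, 1]`, `[2]`, `[1]`, `[3, 3]`) but only three unique values, which is exactly the gap between the two counting APIs.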

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - https://github.com/brandon-b-miller
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #10030
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Feb 3, 2022
This PR fixes the `cuCollections` pair issue via NVIDIA/cuCollections#138, so we no longer have to pass rvalues to `cuco::make_pair`.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10195
rapids-bot bot pushed a commit to rapidsai/raft that referenced this pull request Mar 25, 2022
This PR updates the commit hash for cuCollections to include the changes in NVIDIA/cuCollections#138. cudf depends on those changes in 22.04, and some of our CI builds of cudf are finding the version of cuco installed by raft and then failing, so I'm making this change to 22.04 even though we're in code freeze. I'm happy to work with ops on an alternate solution if there are concerns about the update, though.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #592