Update distinct/unique_count to experimental::row hasher/comparator #12776

Merged

Member

@divyegala divyegala commented Feb 14, 2023

This PR is a part of #11844

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Compilation Times

distinct_count.cu
This PR: 3m8.392s
main: 1m37.576s

unique_count.cu
This PR: 13m8.858s
main: 8m21.900s

@divyegala divyegala requested a review from a team as a code owner February 14, 2023 21:01
@github-actions github-actions bot added the libcudf label Feb 14, 2023
@divyegala divyegala added the feature request, 5 - DO NOT MERGE, and non-breaking labels and removed the libcudf label Feb 14, 2023
@divyegala divyegala marked this pull request as draft February 14, 2023 21:02
@github-actions github-actions bot added the libcudf label Feb 14, 2023
@divyegala divyegala removed the 5 - DO NOT MERGE label Feb 16, 2023
@divyegala divyegala marked this pull request as ready for review February 16, 2023 17:00
Contributor

@vyasr vyasr left a comment

Couple of minor questions.

cpp/src/stream_compaction/distinct_count.cu (outdated, resolved)
cpp/src/stream_compaction/distinct_count.cu (resolved)
cpp/src/stream_compaction/distinct_count.cu (outdated, resolved)
cpp/src/stream_compaction/unique_count.cu (outdated, resolved)
cpp/src/stream_compaction/unique_count.cu (resolved)
Co-authored-by: Vyas Ramasubramani <[email protected]>
@divyegala divyegala requested a review from vyasr February 21, 2023 21:16
Contributor

@karthikeyann karthikeyann left a comment

How are benchmarks impacted by this PR?

@divyegala
Member Author

@karthikeyann

> How are benchmarks impacted by this PR?

There are no benchmarks for these algorithms.

Contributor

@vyasr vyasr left a comment

One minor question, otherwise LGTM. Is the plan to keep moving forward with these for now and investigate compile times when we can?

rmm::exec_policy(stream),
thrust::counting_iterator<cudf::size_type>(0),
thrust::counting_iterator<cudf::size_type>(keys.num_rows()),
[comp] __device__(cudf::size_type i) { return (i == 0 or not comp(i, i - 1)); });
Contributor

I assume that we can't capture the comparator by reference because it's a host object that needs to be copied to device for the lambda?

Member Author

Yes, that's correct
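
To illustrate the capture question in this thread, here is a minimal host-only C++ sketch of the same counting pattern. It is not the libcudf implementation: `row_equal` and `unique_count` are hypothetical stand-ins for the device row comparator and the `thrust::count_if` call in the diff. The point it shows is that the lambda captures `comp` by value, so the callable carries its own copy of the comparator. On the GPU that copy is what allows the lambda to be transferred to the device, whereas a reference would still point at host memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a row equality comparator: a small, cheaply
// copyable object that only holds a pointer to the data it compares.
struct row_equal {
  const int* data;
  std::size_t size;

  bool operator()(std::size_t lhs, std::size_t rhs) const
  {
    assert(lhs < size && rhs < size);
    return data[lhs] == data[rhs];
  }
};

// Counts runs of consecutive equal values: row i starts a new run when it is
// the first row or when it differs from row i - 1.
std::size_t unique_count(const std::vector<int>& keys)
{
  row_equal comp{keys.data(), keys.size()};

  // Host analogue of a counting iterator over [0, num_rows).
  std::vector<std::size_t> indices(keys.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  // `comp` is captured by value, so the lambda owns a copy of the comparator.
  auto const count = std::count_if(
    indices.begin(), indices.end(),
    [comp](std::size_t i) { return i == 0 || !comp(i, i - 1); });

  return static_cast<std::size_t>(count);
}
```

For example, `unique_count({1, 1, 2, 2, 2, 3, 1})` sees four runs of consecutive equal values, and an empty input yields zero.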

@divyegala
Member Author

@vyasr and I followed up offline on his compile time question.

@divyegala
Member Author

/merge

Labels
feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)