-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance for cudf::contains
when searching for a scalar
#11202
Conversation
Here is the benchmark results, comparing the performance after vs before this PR:
|
Here is the original benchmark results: Before:
After:
|
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few suggestions.
cpp/benchmarks/search/search.cpp
Outdated
BINARY_SEARCH_BENCHMARK_DEFINE(Column_AllValid, false) | ||
BINARY_SEARCH_BENCHMARK_DEFINE(Column_HasNulls, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if intentional, but AFAICT we don't need separate definitions here. Validity can be another parameter, something like (if CreateRange
is available in the version that we use)
->ArgsProduct({
benchmark::CreateRange(100000, 100000000, 10),
{0,1}
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't know that there is such way. I'm trying that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current Google benchmark version used in cudf doesn't support CreateRange
. I've created another PR to update it: #11209
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vuule It seems that we can't upgrade Google benchmark, so unfortunately your suggestion here can't be worked on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should look at this as an opportunity to convert this benchmark to nvbench. The reason not to allow something like #11209 is to encourage us to switch (which is also important for adjacent initiatives like building dashboards around better benchmark tracking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion, otherwise LGTM!
@gpucibot merge |
The current implementation of
cudf::contains(column_view, scalar)
usesthrust::find
andthrust::any_of
(which also callsthrust::find_if
under the hood). These thrust APIs were known to have performance regression (NVIDIA/cccl#720).This PR replaces
thrust::find
andthrust::any_of
incudf::contains
bythrust::count_if
, which improves performance significantly.Benchmarks show that the run time can be reduced as much as 80% after modification, or up to 5X speedup.
Closes #3806.