-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update contains_table
to experimental row hasher and equality comparator
#13119
Update contains_table
to experimental row hasher and equality comparator
#13119
Conversation
benchmarks
|
cpp/src/search/contains_table.cu
Outdated
// | ||
// TODO: We should unify these code paths in the future when performance regression is no longer | ||
// happening. | ||
return contains_impl(haystack, needles, compare_nulls, compare_nans, stream, mr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that function is called just once, please move its content here and remove it completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I don't see a need for a separate impl function here.
Thank you @divyegala for suggesting this change. We can just barely see a performance difference in the automated microbenchmarks - about 1.5% slower in the worst case. The difference is close to the noise level and acceptable in my opinion. |
@divyegala can the 1.5% worst-case regression that @GregoryKimball found be attributed to any of the code changes? |
@abellina Yes, I would say that if it's not just noise and we consistently see a 1.5% worst-case regression then it's because of the code changes. That's where your help with the NDS query would be a good insight to have. |
@divyegala my question was if you could explain why there would be a regression. e.g. would you expect a regression if there are nulls, certain data types, specific cuDF expressions. |
@divyegala I do not see regressions with this patch vs the 23.06 nightly. I ran it 5 times each, and then compared both sets of results:
|
Based on what I see above, this PR should be ready for review then? No real performance concerns here IIUC. |
@vyasr yes, it's ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving assuming Nghia's comment about combining functions is addressed.
cpp/src/search/contains_table.cu
Outdated
// | ||
// TODO: We should unify these code paths in the future when performance regression is no longer | ||
// happening. | ||
return contains_impl(haystack, needles, compare_nulls, compare_nans, stream, mr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I don't see a need for a separate impl function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since some code was removed, there are some headers such as struct utilities are no longer needed. Please clean them up too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any need for additional unit tests for this change?
@karthikeyann this functionality already existed, this PR is just essentially a refactor |
/merge |
This is a part of #11844
Benchmarks: #13119 (comment)
Checklist