-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] performance regression after semi_anti_join
refactor
#11299
Comments
abellina
added
bug
Something isn't working
Needs Triage
Need team to review and classify
labels
Jul 19, 2022
bdice
added
libcudf
Affects libcudf (C++/CUDA) code.
and removed
Needs Triage
Need team to review and classify
labels
Jul 20, 2022
This was referenced Jul 20, 2022
rapids-bot bot
pushed a commit
that referenced
this issue
Jul 22, 2022
…ins` (#11330) The current implementation of `cudf::detail::contains` can process input with arbitrary nested types. However, it was reported to have severe performance issue when the input tables have many duplicate rows (#11299). In order to fix the issue, #11310 and #11325 was created. Unfortunately, #11310 is separating semi-anti-join from `cudf::detail::contains`, causing duplicate implementation. On the other hand, #11325 can address the issue #11299 but semi-anti-join using it still performs worse than the previous semi-anti-join implementation. The changes in this PR include the following: * Fix the performance issue reported in #11299 for the current `cudf::detail::contains` implementation that support nested types. * Add a separate code path into `cudf::detail::contains` such that: * Input without having lists column (at any nested level) will be processed by the code path that is the same as the old implementation of semi-anti-join. This is to make sure the performance of semi-anti-join will remain the same as before. * Input with nested lists column, or NaNs compared as unequal, will be processed by another code path that supports nested types and different NaNs behavior. This will make sure support for nested types will not be dropped. Closes #11299. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) - MithunR (https://github.com/mythrocks) - Vyas Ramasubramani (https://github.com/vyasr) - Mike Wilson (https://github.com/hyperbolic2346) - Alessandro Bellina (https://github.com/abellina) URL: #11330
3 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
We are seeing an issue in NDS query14_part1 where we see it seemingly never finishes with cuDF versions after: #11100. Reverting this PR fixes the issue. I believe it's making slow progress, but other timeouts are kicking in (in one case after 2 hours). I can see the issue is the
left_semi_join
cuDF call, specificallycuco::detail::insert
taking all of the time.In scala, a pretty easy repro is:
This is 3 seconds with the new code, and a few millis (I've seen as low as 3 ms) after reverting the PR.
regerssion_cudf_samples.zip
The text was updated successfully, but these errors were encountered: