Adds launch bounds hints to mixed join kernels to address regression seen in NDS q72 in Spark #10534
@abellina you'll need to run clang-format locally. If we're going to make these changes we should probably also make them in |
@vyasr, thanks. The change looked good to me, so I thought "how could the style be wrong"... well, it was; I'll fix it shortly. Happy to add the check to the semi kernels. I can look for a query that uses them. q72 is special because it is dominated by kernel time, especially the mixed join, so it is very sensitive.
CI is now failing here because the black fix in #10523 was not backported to 22.04 (because we were in code freeze and didn't want to push the fix if we didn't have to). I think a decision on whether or not to backport that is probably dependent on whether or not to push forward with this change in 22.04 or 22.06. |
This PR is blocked by #10535. |
Looks good to me once the lint issues are resolved. Thanks for your help there, @vyasr
@vyasr, regarding this:
I made the same change in mixed_join_kernels_semi.cu and mixed_join_size_kernels_semi.cu, and the effect is pretty minimal. Given the above, and apologies as my test isn't that useful, do you still want the semi change in this PR? |
…upancy_in_mixed_join
Discussed with @nvdbaranec offline about the below, I'll add the patch to semi as a separate commit, and if people want to back it out let me know. But at least this way we are consistent.
Codecov Report
@@            Coverage Diff             @@
##           branch-22.04   #10534   +/-   ##
===========================================
  Coverage         86.17%   86.17%
===========================================
  Files               141      141
  Lines             22510    22510
===========================================
  Hits              19398    19398
  Misses             3112     3112
@abellina I think we may as well include it. The semi kernels are intrinsically less complicated so I'm not surprised that they weren't as sensitive in this case, but you never know what future changes might have an effect here and the launch bounds are accurate so we may as well be consistent as @nvdbaranec says. |
This also needs admin merge.
The following change addresses a performance degradation we noticed in the `mixed_join` and `compute_mixed_join_output_size` kernels. It looks to be tied to the theoretical occupancy of these kernels, as limited by the number of registers they use.
The regression is triggered by this patch: #9727, which improves handling of unreachable code paths. That change somehow alters the number of registers these kernels need. Both `mixed_join` and `compute_mixed_join_output_size` are very sensitive to the register count, per Nsight Compute. With the patch, the registers required changed from 92 to 102 and from 118 to 141, respectively.
The fix here tells the compiler what our block size is (128 threads) through launch bounds. In our testing, this allows the compiler to reduce the number of registers required to 128 for `compute_mixed_join_output_size` and 96 for `mixed_join`. This led to better occupancy (I think @nvdbaranec measured it going from 30% to 50%), and I saw the wall clock time of q72 (which started all this) go from 133s to 121s, which is within the ballpark I'd expect.
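For readers unfamiliar with launch bounds, here is a minimal sketch of the general technique, not the actual cudf kernels. The kernel `scale_kernel`, the helper `run_scale_example`, and the constant `kBlockSize` are hypothetical names made up for illustration; only the 128-thread block size comes from the description above.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Block size used for every launch. 128 matches the PR description;
// the name kBlockSize is an assumption for this sketch.
constexpr int kBlockSize = 128;

// __launch_bounds__(block_size) promises the compiler that this kernel is
// never launched with more than block_size threads per block. The compiler
// can use that bound as an input to its register allocation heuristics.
template <int block_size>
__global__ void __launch_bounds__(block_size)
scale_kernel(const float* in, float* out, int n, float factor)
{
  int const idx = blockIdx.x * block_size + threadIdx.x;
  if (idx < n) { out[idx] = in[idx] * factor; }
}

// Launches the hypothetical kernel with exactly kBlockSize threads per block
// and verifies the result on the host. Returns true on success.
bool run_scale_example(int n)
{
  if (n <= 0) { return false; }
  std::vector<float> h_in(n), h_out(n, 0.0f);
  for (int i = 0; i < n; ++i) { h_in[i] = static_cast<float>(i); }

  size_t const bytes = static_cast<size_t>(n) * sizeof(float);
  float* d_in  = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) { return false; }
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  bool ok = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

  // The launch configuration must respect the bound declared on the kernel.
  int const grid = (n + kBlockSize - 1) / kBlockSize;
  scale_kernel<kBlockSize><<<grid, kBlockSize>>>(d_in, d_out, n, 2.0f);
  ok = ok && cudaDeviceSynchronize() == cudaSuccess;
  ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

  cudaFree(d_in);
  cudaFree(d_out);

  for (int i = 0; ok && i < n; ++i) { ok = (h_out[i] == 2.0f * h_in[i]); }
  return ok;
}
```

A toy kernel like this uses too few registers for the hint to matter. To see the per-kernel register count with and without the hint on a real kernel, compiling with `nvcc -Xptxas -v` prints it. Launching a kernel with more threads per block than its declared bound fails, so the bound has to match the block size actually used.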