[BUG] Improve performance of mixed joins on H100 #13662
Labels: 0 - Backlog, bug, libcudf, Performance
When comparing x86-H100 versus x86-V100 microbenchmark performance in libcudf, we found that the mixed join benchmarks showed slower runtimes on H100 than on V100, whereas the rest of the libcudf microbenchmarks tend to be around 2-3x faster on H100. Perhaps we need to adjust `DEFAULT_JOIN_BLOCK_SIZE` (code pointer), or some other performance hinting on the mixed join kernels. Also see #10534, which added launch bounds to the mixed join kernels.
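To make the block-size and launch-bounds idea concrete, here is a minimal sketch of how one might check how many blocks of a kernel can be resident per SM for several candidate block sizes. It is not libcudf code: `toy_probe_kernel`, `active_blocks_per_sm`, and `print_occupancy_sweep` are hypothetical names invented for illustration, and the toy kernel is far simpler than the real mixed join kernels. Only `__launch_bounds__` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` are actual CUDA features.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for a join kernel (NOT the libcudf mixed join kernel).
// __launch_bounds__(block_size) promises the compiler that the kernel is never
// launched with more than block_size threads per block, which influences how
// many registers it may assign to each thread.
template <int block_size>
__global__ void __launch_bounds__(block_size)
toy_probe_kernel(int const* left, int const* right, int n, int* match_count)
{
  int const idx = blockIdx.x * block_size + threadIdx.x;
  if (idx < n && left[idx] == right[idx]) { atomicAdd(match_count, 1); }
}

// Returns how many blocks of toy_probe_kernel<block_size> can be resident on
// one SM of the current device, or -1 if the query fails (e.g. no GPU).
template <int block_size>
int active_blocks_per_sm()
{
  int num_blocks = 0;
  cudaError_t const err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
    &num_blocks, toy_probe_kernel<block_size>, block_size, 0 /* dynamic smem */);
  return err == cudaSuccess ? num_blocks : -1;
}

// Prints the per-SM residency for a few candidate block sizes so that the
// same binary can be compared on different GPUs.
void print_occupancy_sweep()
{
  std::printf("block_size=%d blocks_per_sm=%d\n", 64, active_blocks_per_sm<64>());
  std::printf("block_size=%d blocks_per_sm=%d\n", 128, active_blocks_per_sm<128>());
  std::printf("block_size=%d blocks_per_sm=%d\n", 256, active_blocks_per_sm<256>());
  std::printf("block_size=%d blocks_per_sm=%d\n", 512, active_blocks_per_sm<512>());
}
```

Calling `print_occupancy_sweep()` from a host program on both a V100 and an H100 would show whether a given block size leaves either GPU under-occupied. Occupancy is only one possible factor, so any change to the block size or launch bounds would still need to be confirmed with the mixed join benchmarks themselves.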
Figure showing H100 vs V100 speedup results:
Figure zooming in on JOIN benchmarks:
[Based on libcudf 23.08 commit `aed7174eae6c6`]