Improve performance of expression evaluation (#9210)
This PR reworks some of the internals of expression evaluation to improve performance. The largest gains come from passing device data references down the call stack by reference rather than by value.

The kernel template for nullable data has significantly higher register pressure, and on the benchmarks with null data these changes do not seem to increase occupancy as effectively. In general, however, performance improves across the board for non-null data and in some cases for nullable data, with improvements of up to 40%.

This PR also does some minor cleanup: it removes some unused functions, replaces `__device__` with `CUDA_DEVICE_CALLABLE` to ensure compatibility with host compilers, and fixes the templating of various functions to ensure proper usage of CRTP. These changes are intended to facilitate a future redesign of the internals of the `device_data_references` that reduces the depth of these call stacks, simplifies the code, and reduces register pressure.
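The following plain C++ sketch illustrates the three techniques the description refers to: a `CUDA_DEVICE_CALLABLE`-style macro that still compiles under a host compiler, CRTP-based static dispatch, and passing data references by `const&` instead of by value. Every name in it (`SKETCH_DEVICE_CALLABLE`, `data_reference`, `evaluator_base`, `non_null_evaluator`, `evaluate_add`) is hypothetical, and the structure is heavily simplified; it is not the actual cuDF implementation. Whether a by-value copy really costs registers also depends on inlining and on the code nvcc generates.

```cpp
#include <cassert>  // for assert() in the usage checks
#include <cstdint>

// Hypothetical stand-in for a CUDA_DEVICE_CALLABLE-style macro (assumed
// definition, not copied from cuDF). Under nvcc it marks a function as
// device-callable; under a plain host compiler it degrades to `inline`, so
// the same header still compiles there, which a bare `__device__` would not.
#ifdef __CUDACC__
#define SKETCH_DEVICE_CALLABLE __device__ inline
#else
#define SKETCH_DEVICE_CALLABLE inline
#endif

// Hypothetical, simplified operand descriptor. The extra payload makes each
// by-value copy larger than a pointer-sized reference.
struct data_reference {
  int source_index;         // which input the operand reads from
  std::int64_t payload[4];  // unused here; stands in for additional metadata
};

// CRTP base: resolve() dispatches statically to Derived::resolve_impl with no
// virtual call. The operand is taken by const reference, so no copy of
// data_reference is made at this level of the call stack.
template <typename Derived>
struct evaluator_base {
  SKETCH_DEVICE_CALLABLE std::int64_t resolve(data_reference const& ref,
                                             std::int64_t const* inputs) const
  {
    return static_cast<Derived const*>(this)->resolve_impl(ref, inputs);
  }
};

// Hypothetical evaluator for non-null data: reads the referenced input value.
struct non_null_evaluator : evaluator_base<non_null_evaluator> {
  SKETCH_DEVICE_CALLABLE std::int64_t resolve_impl(data_reference const& ref,
                                                  std::int64_t const* inputs) const
  {
    return inputs[ref.source_index];
  }
};

// Templated on the derived evaluator type and taking the CRTP base by
// reference, so each resolve() call is bound at compile time to the concrete
// evaluator. Both operands are passed by const reference as well.
template <typename Evaluator>
SKETCH_DEVICE_CALLABLE std::int64_t evaluate_add(evaluator_base<Evaluator> const& eval,
                                               data_reference const& lhs,
                                               data_reference const& rhs,
                                               std::int64_t const* inputs)
{
  return eval.resolve(lhs, inputs) + eval.resolve(rhs, inputs);
}
```

With a host compiler, `evaluate_add(non_null_evaluator{}, data_reference{0, {}}, data_reference{1, {}}, inputs)` returns `inputs[0] + inputs[1]`. A nullable evaluator in this style would additionally track validity alongside each value, which is one reason its call stack carries more state.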

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)

URL: #9210
vyasr authored Sep 16, 2021
1 parent 40a3b03 commit 6128476
Showing 5 changed files with 195 additions and 179 deletions.