Commit
Improve performance of expression evaluation (#9210)
This PR does some minor reworking of the internals of expression evaluation to improve performance. The largest performance improvements come from passing device data references down the call stack by reference rather than by value. The nullable kernel template experiences significantly higher register pressure, and these changes do not seem to be as effective at increasing occupancy on the benchmarks with null data. In general, though, we see performance improvements across the board for non-null data and in some cases for nullable data, with improvements ranging up to 40%.

This PR also does some minor cleanup:
- removing some unused functions,
- replacing `__device__` with `CUDA_DEVICE_CALLABLE` to ensure compatibility with host compilers, and
- fixing the templating of various functions to ensure proper usage of CRTP.

These changes are intended to facilitate a future redesign of the internals of the `device_data_references` to reduce the depth of these call stacks, simplify the code, and reduce register pressure.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)

URL: #9210
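The two techniques named in the message, passing a bundle of data references by const reference and dispatching through CRTP, can be illustrated with a minimal host-side sketch. This is not the code from the PR: `data_refs`, `binary_op`, `add_op`, `mul_op`, `evaluate_all`, and the `SKETCH_DEVICE_CALLABLE` macro are all hypothetical names invented for illustration, and the macro only approximates what a host/device portability macro such as `CUDA_DEVICE_CALLABLE` might expand to.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro (an assumption, not the library's definition):
// adds CUDA qualifiers under a CUDA compiler and is plain `inline` for host compilers.
#ifdef __CUDACC__
#define SKETCH_DEVICE_CALLABLE __host__ __device__ inline
#else
#define SKETCH_DEVICE_CALLABLE inline
#endif

// Hypothetical bundle of non-owning pointers, standing in for the
// "device data references" mentioned in the commit message.
struct data_refs {
  double const* lhs;
  double const* rhs;
  double* out;
  std::size_t size;
};

// CRTP base: resolves Derived::apply at compile time, with no virtual calls.
template <typename Derived>
struct binary_op {
  // `refs` is taken by const reference, so the struct is not copied
  // at every level of the call stack.
  SKETCH_DEVICE_CALLABLE void evaluate(data_refs const& refs, std::size_t row) const
  {
    assert(row < refs.size);
    refs.out[row] = static_cast<Derived const&>(*this).apply(refs.lhs[row], refs.rhs[row]);
  }
};

struct add_op : binary_op<add_op> {
  SKETCH_DEVICE_CALLABLE double apply(double a, double b) const { return a + b; }
};

struct mul_op : binary_op<mul_op> {
  SKETCH_DEVICE_CALLABLE double apply(double a, double b) const { return a * b; }
};

// Accepting binary_op<Op> rather than an unconstrained template parameter
// restricts the function to types that actually derive from the CRTP base.
template <typename Op>
SKETCH_DEVICE_CALLABLE void evaluate_all(binary_op<Op> const& op, data_refs const& refs)
{
  for (std::size_t row = 0; row < refs.size; ++row) { op.evaluate(refs, row); }
}
```

Calling `evaluate_all(add_op{}, refs)` on a `data_refs` that points at two input arrays and one output array writes the element-wise sums into the output. Whether by-reference passing helps in a real kernel depends on inlining and register allocation, which is why the message reports measured benchmark results rather than a guaranteed gain.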