Reduce shared memory usage in gpuComputePageSizes by 50% (#13047)
In a multithreaded, multi-stream environment (Spark), we were seeing a performance regression on some benchmark queries. The culprit was GPU scheduling behavior related to the `gpuComputePageSizes` kernel: because `gpuComputePageSizes` did not run well alongside work on other streams, dependent kernels (`gpuDecodePages`) were getting serialized.

The fix is to reduce shared memory usage in `gpuComputePageSizes`. Shared memory is allocated per resident block, so a kernel with a smaller footprint leaves more room on each SM for blocks from kernels running on other streams. The kernel shares a lot of code and data structures with `gpuDecodePages`, but it does not actually use several of the large buffers stored in shared memory. This PR refactors those buffers out so that they are declared only in the `gpuDecodePages` kernel, reducing the shared memory usage of `gpuComputePageSizes` by 50% (3 KB). This clears up the performance issue on Spark.

I am currently experiencing build issues with the cudf benchmarks, so I'm marking this as do-not-merge until I can verify with them.

Authors:
  - https://github.com/nvdbaranec
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #13047