Reduce shared memory usage in gpuComputePageSizes by 50% (#13047)
In a multithreaded, multi-stream environment (Spark), we were seeing a performance regression on some benchmark queries. The culprit was GPU scheduling issues related to the `gpuComputePageSizes` kernel. Dependent kernels (`gpuDecodePages`) were getting serialized because `gpuComputePageSizes` did not run well concurrently with work on other streams.

The fix is to reduce shared memory usage in `gpuComputePageSizes`. The kernel shares a lot of code and data structures with `gpuDecodePages` but doesn't actually use several of the large buffers that are stored in shared memory. This PR refactors those buffers out so that they are declared only in the `gpuDecodePages` kernel, reducing the shared memory usage of `gpuComputePageSizes` by 50% (3 KB).
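
As a rough illustration of the refactor described above, here is a minimal host-side C++ sketch, not the actual cudf code. All struct names, field names, and sizes (`PageState`, `PageStateBuffers`, `CombinedState`, `kBufferEntries`, the 3072-byte core) are hypothetical and chosen only so the arithmetic matches the 50% (3 KB) figure. In the real kernels these would be `__shared__` declarations. The 48 KiB per-SM shared memory figure used in the usage note is also an assumption.

```cpp
#include <cassert>   // for the runtime checks shown in the usage note
#include <cstddef>
#include <cstdint>

// Hypothetical entry count for each scratch buffer (assumption).
constexpr std::size_t kBufferEntries = 256;

// State that both kernels need. The 3072-byte size is a placeholder,
// not the size of the real struct.
struct PageState {
  std::uint8_t core[3072];
};

// Large scratch buffers that only the decode kernel uses (hypothetical fields).
struct PageStateBuffers {
  std::uint32_t value_idx[kBufferEntries];
  std::uint32_t dict_idx[kBufferEntries];
  std::uint32_t str_len[kBufferEntries];
};

// Before the refactor: one struct holding everything, declared by both kernels.
struct CombinedState {
  PageState state;
  PageStateBuffers buffers;
};

// Shared memory footprint of the page-size kernel before and after the split.
constexpr std::size_t size_kernel_shared_before() { return sizeof(CombinedState); }
constexpr std::size_t size_kernel_shared_after() { return sizeof(PageState); }

// The decode kernel still declares both pieces, so its footprint is unchanged.
constexpr std::size_t decode_kernel_shared() {
  return sizeof(PageState) + sizeof(PageStateBuffers);
}

// Upper bound on resident blocks per SM from shared memory alone.
// Registers, thread counts, and hardware block limits can lower it further.
constexpr std::size_t max_resident_blocks(std::size_t shared_per_sm,
                                          std::size_t shared_per_block) {
  return shared_per_sm / shared_per_block;
}

static_assert(size_kernel_shared_after() * 2 == size_kernel_shared_before(),
              "the split halves the page-size kernel's footprint");
```

With these illustrative numbers, `size_kernel_shared_before()` is 6144 bytes and `size_kernel_shared_after()` is 3072 bytes. Assuming 48 KiB of shared memory per SM, `max_resident_blocks(48 * 1024, 6144)` is 8 and `max_resident_blocks(48 * 1024, 3072)` is 16, which is one way a smaller footprint could leave more room for kernels from other streams.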

This clears up the performance issue on Spark. I am currently experiencing build issues with the cudf benchmarks, so I'm marking this as do-not-merge until I can verify with them.

Authors:
  - https://github.com/nvdbaranec
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #13047
nvdbaranec authored Apr 7, 2023
1 parent f328b64 commit c4a34eb
Showing 1 changed file with 105 additions and 58 deletions.