Reduce shared memory usage in gpuComputePageSizes by 50% (#13047) · rapidsai/cudf@c4a34eb


In a multithreaded, multi-stream environment (Spark), we were seeing a performance regression on some benchmark queries. The culprit was a GPU scheduling issue involving the `gpuComputePageSizes` kernel: dependent kernels (`gpuDecodePages`) were being serialized because `gpuComputePageSizes` did not run well concurrently with work on other streams.

The fix was to reduce shared memory usage in `gpuComputePageSizes`. The kernel shares a lot of code and data structures with `gpuDecodePages`, but it does not actually use several of the large buffers stored in shared memory. This PR refactors those buffers out so that they are declared only in the `gpuDecodePages` kernel, reducing the shared memory usage of `gpuComputePageSizes` by 50% (3 KB).

This clears up the performance issue on Spark. I am currently experiencing build issues with the cudf benchmarks, so I'm marking this as do-not-merge until I can verify the change with them.

Authors:
  - https://github.com/nvdbaranec
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #13047

nvdbaranec authored Apr 7, 2023

1 parent f328b64 commit c4a34eb
