-
Notifications
You must be signed in to change notification settings - Fork 912
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Parquet reader optimization to address V100 regression. (#12577)
Addresses #12316 Some recent changes caused a performance regression in the parquet reader benchmarks for lists. The culprit ended up being slightly different code generation happening for arch 70. In several memory hotspots, the code was reading values from global, modifying them and then storing them. Previously it had done a better job of loading and keeping them in registers and the L2 cache was helping keep things fast. But the extra store was causing twice as many L2 access in these places and causing many long scoreboard stalls. Ultimately the issue is that these values shouldn't be kept in global memory. The initial implementation did it this way because the data was variable in size (based on depth of column nesting). But in practice, we never see more than 2 or 3 levels of nesting. So the solution is: - Keep these values (in a struct called `PageNestingDecodeInfo`) that is kept in shared memory for up to N nesting levels. N is currently 10. - If the nesting information for the incoming column fits in the cache, use it. Otherwise fall back to the arrays in global memory. In practice, it is exceedingly rare to see columns nested >= 10 deep. This addresses the performance regression and actually gives some performance increases. Some comparisons for LIST benchmarks. ``` cudf 22.10 (prior to regression) | data_type | cardinality | run_length | bytes_per_second | |-----------|-------------|------------|------------------| | LIST | 0 | 1 | 892901208 | | LIST | 1000 | 1 | 952863876 | | LIST | 0 | 32 | 1246033395 | | LIST | 1000 | 32 | 1232884866 | ``` ``` cudf 22.12 (where the regression occurred) | data_type | cardinality | run_length | bytes_per_second | |-----------|-------------|------------|------------------| | LIST | 0 | 1 | 747758436 | | LIST | 1000 | 1 | 827763260 | | LIST | 0 | 32 | 1026048576 | | LIST | 1000 | 32 | 1022928119 | ``` ``` This PR | data_type | cardinality | run_length | bytes_per_second | |-----------|-------------|------------|------------------| | LIST | 0 | 1 | 927347737 | | LIST | 1000 | 1 | 1024566150 | | LIST | 0 | 32 | 1315972881 | | LIST | 1000 | 32 | 1303995168 | ``` Authors: - https://github.com/nvdbaranec Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) URL: #12577
- Loading branch information
1 parent
ed6daad
commit 2784f58
Showing
5 changed files
with
245 additions
and
115 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.