
Forward-merge branch-23.02 to branch-23.04 #12612

Merged
merged 1 commit into branch-23.04 from branch-23.02
Jan 25, 2023
Conversation

GPUtester
Collaborator

Forward-merge triggered by a push to branch-23.02, creating a PR to keep branch-23.04 up to date. If this PR cannot be merged immediately due to conflicts, it will remain open for the team to merge manually.

Addresses #12316

Some recent changes caused a performance regression in the parquet reader benchmarks for lists. The culprit turned out to be slightly different code generation for arch 70. In several memory hotspots, the code was reading values from global memory, modifying them, and then storing them back. Previously it had done a better job of loading the values once and keeping them in registers, with the L2 cache helping keep things fast. The extra store caused twice as many L2 accesses in these places, leading to many long-scoreboard stalls.
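The access-pattern difference described above can be illustrated with a minimal host-side C++ sketch (these are hypothetical functions, not the actual cudf kernels): one version performs a read-modify-write against memory on every update, while the other keeps the running value in a local variable (a register on the GPU) and stores it back once.

```cpp
#include <cassert>

// Hypothetical illustration, not cudf code: updating an accumulator
// that lives in (global) memory forces a load + store per iteration,
// roughly doubling memory traffic in a hot loop.
void accumulate_in_memory(int* global_count, const int* vals, int n) {
    for (int i = 0; i < n; ++i) {
        *global_count += vals[i];  // read-modify-write touches memory each time
    }
}

// Keeping the value in a local variable (a register, when the compiler
// cooperates) and writing it back once avoids the extra stores.
void accumulate_in_register(int* global_count, const int* vals, int n) {
    int count = *global_count;  // load once
    for (int i = 0; i < n; ++i) {
        count += vals[i];       // updates stay in the register
    }
    *global_count = count;      // single store at the end
}
```

Both versions compute the same result; only the memory traffic differs, which is exactly the kind of difference the changed code generation for arch 70 exposed.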

Ultimately the issue is that these values shouldn't be kept in global memory. The initial implementation did it this way because the data is variable in size (based on the depth of column nesting). But in practice we rarely see more than 2 or 3 levels of nesting. So the solution is:

- Store these per-level values in a struct called `PageNestingDecodeInfo` that is kept in shared memory, sized for up to N nesting levels. N is currently 10.
- If the nesting information for the incoming column fits in this cache, use it; otherwise fall back to the arrays in global memory. In practice it is exceedingly rare to see columns nested 10 or more levels deep.
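The cache-or-fallback scheme above can be sketched in host-side C++ (field names and the selection helper are illustrative assumptions, not the exact cudf definitions; on the GPU the cache array would live in shared memory):

```cpp
#include <cassert>

// "N" from the description above: maximum nesting depth the cache holds.
constexpr int max_cacheable_depth = 10;

// Illustrative per-nesting-level decode state; the real
// PageNestingDecodeInfo in cudf has different fields.
struct PageNestingDecodeInfo {
    int value_count;
    int valid_count;
};

// Hypothetical helper: returns the state the decode loop should use.
// If the column's nesting depth fits, copy the global-memory state into
// the fast cache and use it; otherwise fall back to global memory.
PageNestingDecodeInfo* select_nesting_info(PageNestingDecodeInfo* global_info,
                                           PageNestingDecodeInfo* cache,
                                           int depth) {
    if (depth <= max_cacheable_depth) {
        for (int i = 0; i < depth; ++i) {
            cache[i] = global_info[i];  // populate the shared-memory cache
        }
        return cache;
    }
    return global_info;  // rare case: nesting deeper than the cache
}
```

The fallback path preserves correctness for arbitrarily deep nesting, while the common case (2-3 levels) always hits the fast path.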

This fixes the performance regression and actually yields some performance increases. Some comparisons for LIST benchmarks:
```
cudf 22.10 (prior to regression)
| data_type | cardinality | run_length | bytes_per_second | 
|-----------|-------------|------------|------------------|
|      LIST |           0 |          1 |     892901208    | 
|      LIST |        1000 |          1 |     952863876    |  
|      LIST |           0 |         32 |    1246033395    |  
|      LIST |        1000 |         32 |    1232884866    |  
```

```
cudf 22.12 (where the regression occurred)
| data_type | cardinality | run_length | bytes_per_second | 
|-----------|-------------|------------|------------------|
|      LIST |           0 |          1 |     747758436    | 
|      LIST |        1000 |          1 |     827763260    |  
|      LIST |           0 |         32 |    1026048576    |  
|      LIST |        1000 |         32 |    1022928119    |  
```

```
This PR
| data_type | cardinality | run_length | bytes_per_second | 
|-----------|-------------|------------|------------------|
|      LIST |           0 |          1 |     927347737    | 
|      LIST |        1000 |          1 |     1024566150   |  
|      LIST |           0 |         32 |    1315972881    |  
|      LIST |        1000 |         32 |    1303995168    |  
```

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

URL: #12577
@GPUtester GPUtester requested a review from a team as a code owner January 25, 2023 20:22
@GPUtester GPUtester requested review from bdice and ttnghia January 25, 2023 20:22
@GPUtester GPUtester merged commit e3d3207 into branch-23.04 Jan 25, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 25, 2023
@GPUtester
Collaborator Author

SUCCESS - forward-merge complete.
