Parquet reader optimization to address V100 regression. #12577

Merged

Contributor

@nvdbaranec nvdbaranec commented Jan 19, 2023

Addresses #12316

Some recent changes caused a performance regression in the parquet reader benchmarks for lists. The culprit turned out to be slightly different code generation for arch 70. In several memory hotspots, the code was reading values from global memory, modifying them, and then storing them back. Previously the compiler had done a better job of loading these values and keeping them in registers, and the L2 cache was helping keep things fast. The extra stores caused twice as many L2 accesses in these places and many long scoreboard stalls.

Ultimately the issue is that these values shouldn't be kept in global memory. The initial implementation did it this way because the data was variable in size (based on depth of column nesting). But in practice, we never see more than 2 or 3 levels of nesting. So the solution is:

  • Keep these values in a struct called PageNestingDecodeInfo, which is stored in shared memory for up to N nesting levels. N is currently 10.
  • If the nesting information for the incoming column fits in this shared memory cache, use it. Otherwise, fall back to the arrays in global memory. In practice, it is exceedingly rare to see columns nested >= 10 deep.
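
The cache-or-fallback scheme above can be sketched in host-side C++. This is a minimal illustration, not the code from this PR: `NestingDecodeInfoSketch`, `process_page`, and `max_cacheable_nesting_depth` are hypothetical names, a local `std::array` stands in for the `__shared__` cache, and both the `<=` fit test and the final write-back are assumptions about how such a cache would typically be used.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the per-level decode state.
struct NestingDecodeInfoSketch {
  std::int32_t max_def_level;
  std::int32_t valid_count;
};

// "N" from the description; the name is an assumption.
constexpr std::size_t max_cacheable_nesting_depth = 10;

// Simulates decoding one page. `global_info` plays the role of the
// variable-length array in global memory. Returns true if the fixed-size
// cache was used, false if the code fell back to the global array.
bool process_page(std::vector<NestingDecodeInfoSketch>& global_info, int values_per_level)
{
  assert(values_per_level >= 0);

  // Stands in for a per-block __shared__ array in a real kernel.
  std::array<NestingDecodeInfoSketch, max_cacheable_nesting_depth> cache{};

  bool const use_cache       = global_info.size() <= cache.size();
  NestingDecodeInfoSketch* info = global_info.data();
  if (use_cache) {
    // Copy the nesting state into the cache once, up front.
    std::copy(global_info.begin(), global_info.end(), cache.begin());
    info = cache.data();
  }

  // Hot loop: every read-modify-write goes through `info`, so in the common
  // case it touches only the cache rather than the global array.
  for (std::size_t level = 0; level < global_info.size(); ++level) {
    for (int i = 0; i < values_per_level; ++i) {
      info[level].valid_count += 1;
    }
  }

  if (use_cache) {
    // Publish the final state back to the global array in one pass.
    auto const depth = static_cast<std::ptrdiff_t>(global_info.size());
    std::copy(cache.begin(), cache.begin() + depth, global_info.begin());
  }
  return use_cache;
}
```

The hot loop is identical on both paths; only the pointer it works through changes, which keeps the rare deep-nesting case correct without a second code path.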

This addresses the performance regression and actually gives some performance increases over the pre-regression numbers. Some comparisons for the LIST benchmarks follow.

cudf 22.10 (prior to regression)

| data_type | cardinality | run_length | bytes_per_second |
|-----------|-------------|------------|------------------|
| LIST      |           0 |          1 |        892901208 |
| LIST      |        1000 |          1 |        952863876 |
| LIST      |           0 |         32 |       1246033395 |
| LIST      |        1000 |         32 |       1232884866 |

cudf 22.12 (where the regression occurred)

| data_type | cardinality | run_length | bytes_per_second |
|-----------|-------------|------------|------------------|
| LIST      |           0 |          1 |        747758436 |
| LIST      |        1000 |          1 |        827763260 |
| LIST      |           0 |         32 |       1026048576 |
| LIST      |        1000 |         32 |       1022928119 |

This PR

| data_type | cardinality | run_length | bytes_per_second |
|-----------|-------------|------------|------------------|
| LIST      |           0 |          1 |        927347737 |
| LIST      |        1000 |          1 |       1024566150 |
| LIST      |           0 |         32 |       1315972881 |
| LIST      |        1000 |         32 |       1303995168 |
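
For readers who want the relative changes rather than raw throughput, the following sketch computes them from the numbers in the tables above. The struct and function names (`ListBenchRow`, `relative_change`) are made up for this illustration; only the throughput values come from the benchmarks.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One row of the LIST benchmark, with bytes_per_second for each build.
struct ListBenchRow {
  int cardinality;
  int run_length;
  std::int64_t bytes_22_10;  // cudf 22.10 (prior to regression)
  std::int64_t bytes_22_12;  // cudf 22.12 (where the regression occurred)
  std::int64_t bytes_pr;     // this PR
};

// Values copied from the three tables above.
constexpr std::array<ListBenchRow, 4> list_rows{{
  {0, 1, 892901208, 747758436, 927347737},
  {1000, 1, 952863876, 827763260, 1024566150},
  {0, 32, 1246033395, 1026048576, 1315972881},
  {1000, 32, 1232884866, 1022928119, 1303995168},
}};

// Fractional change going from `before` to `after`; 0.10 means 10% faster.
double relative_change(std::int64_t before, std::int64_t after)
{
  assert(before > 0);
  return static_cast<double>(after - before) / static_cast<double>(before);
}
```

Calling `relative_change(row.bytes_22_12, row.bytes_pr)` gives the gain over the regressed build, and `relative_change(row.bytes_22_10, row.bytes_pr)` gives the gain over the pre-regression baseline.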

@nvdbaranec nvdbaranec added the libcudf, cuIO, improvement, and non-breaking labels Jan 19, 2023
@nvdbaranec nvdbaranec requested a review from a team as a code owner January 19, 2023 00:12
@nvdbaranec nvdbaranec marked this pull request as draft January 19, 2023 00:13
Contributor

@vuule vuule left a comment

Really cool optimization. Just have a few (potential) suggestions.

cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/page_data.cu (resolved)
@nvdbaranec nvdbaranec added the 5 - DO NOT MERGE label Jan 19, 2023
@nvdbaranec nvdbaranec marked this pull request as ready for review January 19, 2023 19:55
@nvdbaranec nvdbaranec removed the 5 - DO NOT MERGE label Jan 19, 2023
@nvdbaranec
Contributor Author

Tested against the Spark integration tests.

@nvdbaranec
Contributor Author

rerun tests

@nvdbaranec nvdbaranec requested a review from vuule January 20, 2023 16:01
@codecov

codecov bot commented Jan 20, 2023

Codecov Report

Base: 85.71% // Head: 85.70% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (6d269a9) compared to base (ed6daad).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12577      +/-   ##
================================================
- Coverage         85.71%   85.70%   -0.01%     
================================================
  Files               155      155              
  Lines             24873    24873              
================================================
- Hits              21319    21318       -1     
- Misses             3554     3555       +1     
| Impacted Files | Coverage Δ |
|----------------|------------|
| python/cudf/cudf/core/column/categorical.py | 89.46% <0.00%> (-0.22%) ⬇️ |

Contributor

@bdice bdice left a comment

The core of the changes looks good. I have minor questions about types and a few variable name suggestions. I can approve after that.

cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/page_data.cu (outdated, resolved)
 */
struct PageNestingDecodeInfo {
  // set up prior to decoding
  int32_t max_def_level;
Contributor

@bdice bdice Jan 23, 2023

Verifying our intended spellings: should all these types be written as int32_t, or should some be int or cudf::size_type?

valid_count seems like it should be cudf::size_type, for example.

Contributor Author

How far do we want to take this here? Broadly, cuIO tends to work with native types in the kernels rather than cudf typedefs (mostly for historical reasons, not really philosophical ones). My preference would be to keep int32_t for now (although there are some errant int fields worth fixing up) and then maybe file an issue for a more detailed audit and cleanup.

Contributor

@bdice bdice Jan 24, 2023

I’d defer to @vuule on scope and prioritization but I do think this deserves an issue if we choose not to address it here. We really shouldn’t be redefining core types.

I hoped this PR might be able to make a dent in this broader problem.

Contributor

It is my understanding that size_type should be used for row counts and values derived from row counts. I suspect that the value range of max_def_level is dictated by the Parquet spec, in which case int32_t is the right type to use.

Given that we are nearly at code freeze, I'm against increasing the scope of this PR.

We can open an issue to audit integral types in cuIO. Taking suggestions on how we could make this less tedious 😬
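
To make the distinction in this thread concrete, here is a small sketch of how the two kinds of fields could be spelled. It is purely illustrative and does not reflect what was merged: `sketch::size_type` is a local stand-in that, to my understanding, mirrors how `cudf::size_type` is defined (an alias of `int32_t`), and `NestingTypesSketch` and `fits_in_size_type` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch {
// Stand-in alias; assumed to mirror cudf::size_type.
using size_type = std::int32_t;

struct NestingTypesSketch {
  std::int32_t max_def_level;  // range set by the file format, so a fixed-width type
  size_type valid_count;       // derived from a row count, so the row-count alias
};
}  // namespace sketch

// Both spellings name the same underlying type, so switching between them
// changes intent and readability, not layout or code generation.
static_assert(std::is_same<sketch::size_type, std::int32_t>::value,
              "alias and fixed-width spelling are the same type");

// Checks whether a 64-bit row count is representable in the row-count alias.
bool fits_in_size_type(std::int64_t row_count)
{
  assert(row_count >= 0);
  return row_count <= std::numeric_limits<sketch::size_type>::max();
}
```

Because the alias and `int32_t` are the same type here, an audit like the one proposed would be a readability change rather than a behavioral one.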

cpp/src/io/parquet/parquet_gpu.hpp (outdated, resolved)
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

A couple of questions, but very nice find and a solid fix.

cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/parquet_gpu.hpp (resolved)
…Renamed PageNestingDecodeInfo::pndi to PageNestingDecodeInfo::nesting_info for clarity. Improved some variable names.
Contributor

@bdice bdice left a comment

A few minor comments on naming. Otherwise looks good.

cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/page_data.cu (outdated, resolved)
cpp/src/io/parquet/reader_impl_preprocess.cu (outdated, resolved)
@nvdbaranec
Contributor Author

/merge

5 participants