Parquet reader optimization to address V100 regression. #12577
Conversation
Really cool optimization. Just have a few (potential) suggestions.
Tested against Spark integration tests.
rerun tests
Codecov Report: Base: 85.71% // Head: 85.70% // Decreases project coverage by -0.01%.

@@            Coverage Diff             @@
##           branch-23.02   #12577   +/-  ##
================================================
- Coverage        85.71%   85.70%   -0.01%
================================================
  Files              155      155
  Lines            24873    24873
================================================
- Hits             21319    21318       -1
- Misses            3554     3555       +1
The core of the changes looks good. I have minor questions about types and some variable-name suggestions. I can approve after that.
struct PageNestingDecodeInfo {
  // set up prior to decoding
  int32_t max_def_level;
Verifying our intended spellings: should all these types be written as int32_t, or should some be int or cudf::size_type? valid_count seems like it should be cudf::size_type, for example.
How far do we want to take this here? Broadly, cuIO tends to work with native types in the kernels rather than cudf typedefs (mostly for historical reasons, not really philosophical ones). I think my preference would be to keep int32_t for now (although there are some errant int fields worth fixing up) and then maybe file an issue to do a more detailed audit and cleanup.
I'd defer to @vuule on scope and prioritization, but I do think this deserves an issue if we choose not to address it here. We really shouldn't be redefining core types.
I had hoped this PR might be able to make a dent in this broader problem.
It is my understanding that size_type should be used for row counts and for values derived from row counts. I suspect the max_def_level value range is dictated by the Parquet spec, in which case int32_t is the right type to use.
Given that we are nearly at code freeze, I'm against increasing the scope of this PR.
We can open an issue to audit integral types in cuIO. Taking suggestions on how we could make this less tedious 😬
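(For reference, a minimal sketch of the convention being discussed. The typedef matches how cudf defines size_type in cudf/types.hpp; the struct and its field grouping are purely illustrative, not the actual cudf layout.)

  #include <cstdint>

  namespace cudf {
  using size_type = int32_t;  // matches cudf's definition in cudf/types.hpp
  }

  // Illustrative grouping only, not a real cudf struct:
  struct example_decode_state {
    cudf::size_type valid_count;  // derived from row counts, so size_type
    int32_t max_def_level;        // range fixed by the Parquet spec, so int32_t
  };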
Couple of questions, but very nice find and a solid fix.
…Renamed PageNestingDecodeInfo::pndi to PageNestingDecodeInfo::nesting_info for clarity. Improved some variable names.
A few minor comments on naming. Otherwise looks good.
…e nesting decode cache limit.
…est. Cleaned up some more variable names.
/merge
Addresses #12316
Some recent changes caused a performance regression in the Parquet reader benchmarks for lists. The culprit ended up being slightly different code generation for arch 70. In several memory hotspots, the code was reading values from global memory, modifying them, and then storing them back. Previously it had done a better job of loading them once and keeping them in registers, with the L2 cache helping keep things fast. The extra store was causing twice as many L2 accesses in these places, leading to many long-scoreboard stalls.
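A minimal sketch of the pattern being described, using hypothetical kernels rather than the actual cudf code: repeated read-modify-write of a global value versus caching it in a register.

  #include <cstdint>

  __global__ void count_values_slow(int32_t* value_count,
                                    int32_t const* def_levels,
                                    int n,
                                    int32_t max_def_level)
  {
    // Illustrative single-thread loop: each update is a global-memory
    // load + store, producing the doubled L2 traffic described above.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
      for (int i = 0; i < n; i++) {
        if (def_levels[i] == max_def_level) { *value_count += 1; }
      }
    }
  }

  __global__ void count_values_fast(int32_t* value_count,
                                    int32_t const* def_levels,
                                    int n,
                                    int32_t max_def_level)
  {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
      int32_t count = *value_count;  // single global load
      for (int i = 0; i < n; i++) {
        if (def_levels[i] == max_def_level) { count += 1; }  // register-only
      }
      *value_count = count;  // single global store
    }
  }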
Ultimately, the issue is that these values shouldn't be kept in global memory. The initial implementation did it this way because the data is variable in size (based on the depth of column nesting), but in practice we never see more than 2 or 3 levels of nesting. So the solution is to keep a small decode-time struct (PageNestingDecodeInfo) in shared memory for up to N nesting levels, where N is currently 10 (see the sketch below).
This addresses the performance regression and actually gives some performance increases. Some comparisons for LIST benchmarks.
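Below is a minimal sketch of the shared-memory cache idea, under assumed names and fields; the real PageNestingDecodeInfo carries more decode state, and the real kernel presumably handles nesting deeper than the cache limit separately.

  #include <cstdint>

  constexpr int max_cached_nesting_levels = 10;  // "N is currently 10"

  // Illustrative fields only; the real struct has more members.
  struct PageNestingDecodeInfo {
    int32_t max_def_level;  // set up prior to decoding
    int32_t valid_count;    // mutated repeatedly during decoding
  };

  __global__ void decode_page(PageNestingDecodeInfo* global_nesting_info,
                              int num_levels)
  {
    // Per-block cache: the mutable decode state lives in fast shared memory
    // instead of global memory. Assumes num_levels <= max_cached_nesting_levels.
    __shared__ PageNestingDecodeInfo nesting_info[max_cached_nesting_levels];

    // Cooperatively stage the (small) per-level state into shared memory.
    for (int i = threadIdx.x; i < num_levels; i += blockDim.x) {
      nesting_info[i] = global_nesting_info[i];
    }
    __syncthreads();

    // ... decoding reads and writes nesting_info[] here, not global memory ...

    __syncthreads();
    // Write the final per-level state back out once at the end.
    for (int i = threadIdx.x; i < num_levels; i += blockDim.x) {
      global_nesting_info[i] = nesting_info[i];
    }
  }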