Parquet reader list microkernel #16538
Co-authored-by: nvdbaranec <[email protected]>
Co-authored-by: Vukasin Milovanovic <[email protected]>
Seems like this is adding list support to the split page path as well. Am I reading this right?
One thing I've been thinking about is maybe splitting this file into two or three pieces.
Definitely not for this PR, but something to think about down the road. I think it might help make the volume of code that has built up here more tractable.
Skimmed through it. Will get more in-depth tomorrow.
Yes.
Few minor questions/suggestions
Not sure I could find issues with the decode algorithm :D
Looks great. Mostly just more small stuff.
Please also run compute-sanitizer on the unit tests to make sure everything is good.
Tests pass.
/merge
This PR refactors fixed-width Parquet list decoding into its own set of micro-kernels by templatizing the existing fixed-width micro-kernels. When skipping rows for lists, it also skips ahead the decoding of the definition, repetition, and dictionary rle_streams. The list kernel uses 128 threads per block and 71 registers per thread, so I've changed the `__launch_bounds__` to enforce a minimum of 8 blocks per SM. This causes a small register spill, but the benchmarks are still faster, as seen below:
DEVICE_BUFFER list benchmarks (decompress + decode, not bound by IO):
run_length 1, cardinality 0, no byte_limit: 24.7% faster
run_length 32, cardinality 1000, no byte_limit: 18.3% faster
run_length 1, cardinality 0, 500kb byte_limit: 57% faster
run_length 32, cardinality 1000, 500kb byte_limit: 53% faster
Compressed list of ints on hard drive: 5.5% faster
Sample real data on hard drive (many columns not lists): 0.5% faster
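For readers wondering why a minimum of 8 blocks per SM leads to a spill at 71 registers per thread, here is a rough back-of-the-envelope sketch. It assumes 65,536 32-bit registers per SM, which is not stated in this PR and should be checked against the target architecture, and it ignores register allocation granularity. The function names are hypothetical helpers for illustration, not cuDF or CUDA APIs.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: 65,536 32-bit registers per SM. This figure does not come from
// the PR; verify it for the GPU architecture you are targeting.
constexpr std::uint32_t kAssumedRegistersPerSM = 65536;

// Upper bound on registers per thread if `min_blocks_per_sm` blocks of
// `threads_per_block` threads must all be resident on one SM at once.
// (Hypothetical helper; ignores register allocation granularity.)
constexpr std::uint32_t register_budget_per_thread(
  std::uint32_t threads_per_block,
  std::uint32_t min_blocks_per_sm,
  std::uint32_t registers_per_sm = kAssumedRegistersPerSM)
{
  return registers_per_sm / (threads_per_block * min_blocks_per_sm);
}

// Registers per thread that exceed the budget and so become spill candidates.
constexpr std::uint32_t registers_over_budget(std::uint32_t registers_needed,
                                              std::uint32_t budget)
{
  return registers_needed > budget ? registers_needed - budget : 0;
}

// 128 threads * 8 blocks = 1024 resident threads; 65536 / 1024 = 64 registers each.
static_assert(register_budget_per_thread(128, 8) == 64, "budget under the stated assumption");
// The kernel wants 71 registers, so roughly 7 per thread no longer fit.
static_assert(registers_over_budget(71, 64) == 7, "registers over budget");
```

Under that assumption the compiler has to fit the kernel into about 64 registers per thread instead of 71, which is consistent with the small spill described above. The benchmarks suggest the higher occupancy more than pays for it.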
Checklist