-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] skip parquet decode steps for nullable all-valid pages #15266
Labels
cuIO
cuIO issue
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
Milestone
Comments
3 tasks
GregoryKimball
added
libcudf
Affects libcudf (C++/CUDA) code.
cuIO
cuIO issue
Performance
Performance related issue
labels
Mar 11, 2024
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
May 8, 2024
…ls (#15332) Closes #15266. Some block and warp operations and some atomics can be avoided during Parquet page decoding when nullable columns are known to not contain any null values. One issue, however, is that since the columns are nullable, the null mask has to be set. My approach in this PR was to initialize nullable column buffers with an `ALL_VALID` mask rather than `ALL_NULL`. This will work when nulls are present because `store_validity()` sets the bitmask to all zeros before ORing in the passed `valid_mask`. Benchmarks modified to not emit nulls showed a good improvement in decoding time for fixed-width data types. ``` ## [0] NVIDIA RTX A6000 | data_type | io_type | cardinality | run_length | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |-------------|---------------|---------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------| | INTEGRAL | DEVICE_BUFFER | 0 | 1 | 9.955 ms | 2.31% | 9.361 ms | 3.55% | -594.395 us | -5.97% | FAIL | | INTEGRAL | DEVICE_BUFFER | 1000 | 1 | 9.964 ms | 3.01% | 8.981 ms | 3.98% | -982.965 us | -9.87% | FAIL | | INTEGRAL | DEVICE_BUFFER | 0 | 32 | 10.222 ms | 3.32% | 9.207 ms | 5.47% | -1014.597 us | -9.93% | FAIL | | INTEGRAL | DEVICE_BUFFER | 1000 | 32 | 9.930 ms | 3.83% | 8.607 ms | 3.37% | -1323.326 us | -13.33% | FAIL | | FLOAT | DEVICE_BUFFER | 0 | 1 | 5.999 ms | 3.59% | 5.752 ms | 3.87% | -246.635 us | -4.11% | FAIL | | FLOAT | DEVICE_BUFFER | 1000 | 1 | 6.576 ms | 4.40% | 5.839 ms | 4.43% | -737.338 us | -11.21% | FAIL | | FLOAT | DEVICE_BUFFER | 0 | 32 | 5.828 ms | 4.59% | 4.940 ms | 4.41% | -887.375 us | -15.23% | FAIL | | FLOAT | DEVICE_BUFFER | 1000 | 32 | 6.198 ms | 3.91% | 5.271 ms | 3.54% | -927.602 us | -14.97% | FAIL | | DECIMAL | DEVICE_BUFFER | 0 | 1 | 20.199 ms | 1.66% | 20.014 ms | 2.23% | -184.710 us | -0.91% | PASS | | DECIMAL | DEVICE_BUFFER | 1000 | 1 | 7.068 ms | 3.99% | 6.479 ms | 4.08% | -588.856 us | -8.33% | FAIL | | DECIMAL | DEVICE_BUFFER | 0 | 32 | 9.287 ms | 3.45% | 8.656 ms | 2.94% | -631.348 us | -6.80% | FAIL | | DECIMAL | DEVICE_BUFFER | 1000 | 32 | 5.641 ms | 4.39% | 5.021 ms | 3.31% | -620.122 us | -10.99% | FAIL | | TIMESTAMP | DEVICE_BUFFER | 0 | 1 | 27.488 ms | 1.57% | 27.235 ms | 1.74% | -253.277 us | -0.92% | PASS | | TIMESTAMP | DEVICE_BUFFER | 1000 | 1 | 6.656 ms | 4.73% | 6.049 ms | 4.61% | -607.760 us | -9.13% | FAIL | | TIMESTAMP | DEVICE_BUFFER | 0 | 32 | 9.974 ms | 3.22% | 9.204 ms | 2.84% | -770.247 us | -7.72% | FAIL | | TIMESTAMP | DEVICE_BUFFER | 1000 | 32 | 5.998 ms | 5.17% | 5.203 ms | 3.06% | -794.943 us | -13.25% | FAIL | | DURATION | DEVICE_BUFFER | 0 | 1 | 8.816 ms | 3.61% | 8.538 ms | 3.26% | -278.877 us | -3.16% | PASS | | DURATION | DEVICE_BUFFER | 1000 | 1 | 5.989 ms | 4.76% | 5.446 ms | 4.57% | -542.636 us | -9.06% | FAIL | | DURATION | DEVICE_BUFFER | 0 | 32 | 6.822 ms | 4.96% | 6.042 ms | 3.74% | -779.786 us | -11.43% | FAIL | | DURATION | DEVICE_BUFFER | 1000 | 32 | 5.706 ms | 5.40% | 4.930 ms | 3.39% | -775.607 us | -13.59% | FAIL | | STRING | DEVICE_BUFFER | 0 | 1 | 36.616 ms | 1.74% | 36.483 ms | 1.31% | -132.191 us | -0.36% | PASS | | STRING | DEVICE_BUFFER | 1000 | 1 | 12.006 ms | 4.15% | 11.989 ms | 3.53% | -16.278 us | -0.14% | PASS | | STRING | DEVICE_BUFFER | 0 | 32 | 36.587 ms | 1.99% | 36.514 ms | 1.38% | -73.737 us | -0.20% | PASS | | STRING | DEVICE_BUFFER | 1000 | 32 | 11.235 ms | 4.25% | 11.228 ms | 3.62% | -7.041 us | -0.06% | PASS | | LIST | DEVICE_BUFFER | 0 | 1 | 36.929 ms | 1.88% | 36.988 ms | 1.42% | 59.350 us | 0.16% | PASS | | LIST | DEVICE_BUFFER | 1000 | 1 | 36.510 ms | 1.91% | 36.558 ms | 1.66% | 48.536 us | 0.13% | PASS | | LIST | DEVICE_BUFFER | 0 | 32 | 35.513 ms | 2.00% | 35.490 ms | 1.77% | -23.411 us | -0.07% | PASS | | LIST | DEVICE_BUFFER | 1000 | 32 | 35.755 ms | 1.99% | 35.728 ms | 1.64% | -27.564 us | -0.08% | PASS | | STRUCT | DEVICE_BUFFER | 0 | 1 | 43.456 ms | 1.35% | 43.537 ms | 1.16% | 81.405 us | 0.19% | PASS | | STRUCT | DEVICE_BUFFER | 1000 | 1 | 25.549 ms | 2.54% | 25.698 ms | 1.90% | 149.295 us | 0.58% | PASS | | STRUCT | DEVICE_BUFFER | 0 | 32 | 43.103 ms | 1.87% | 43.019 ms | 1.59% | -84.825 us | -0.20% | PASS | | STRUCT | DEVICE_BUFFER | 1000 | 32 | 23.462 ms | 2.81% | 23.432 ms | 1.88% | -30.434 us | -0.13% | PASS | ``` Authors: - Ed Seidl (https://github.com/etseidl) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Vukasin Milovanovic (https://github.com/vuule) URL: #15332
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
cuIO
cuIO issue
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
In this PR #15159 @etseidl helped point out a scenario where parquet columns could be marked nullable but contain only valid rows, as some systems like to do (Spark looks to be in this category as adding some of the optimizations Ed suggested helped speed up our benchmark).
In the PR, we skipped initializing and decoding the definition RLE stream for the nullable pages that had only valid rows. We'll also like to skip some likely expensive
BlockScan
operations performed to obtain valid counts when generating null masks ingpuUpdateValidityOffsetsAndRowIndicesFlat
. I didn't include this in the PR because I didn't want to add more changes to an already large PR.Please see this conversation #15159 (comment) and this one #15159 (comment) for context.
The text was updated successfully, but these errors were encountered: