[BUG] Reduce peak memory usage for STRUCT decoding in parquet reader #14965
Labels: 0 - Backlog, bug, cuIO, libcudf
Describe the bug
In the libcudf benchmark `PARQUET_READER_NVBENCH`, the STRUCT data type shows a surprisingly high `peak_memory_usage`. For a 536 MB table, the INTEGRAL data type shows a 597 MiB peak memory usage, while the STRUCT data type shows 996 MiB for the same table size:

| data_type | table size | peak_memory_usage |
|-----------|------------|-------------------|
| INTEGRAL  | 536 MB     | 597 MiB           |
| STRUCT    | 536 MB     | 996 MiB           |

If there are good reasons for this difference, we can close the issue. Otherwise, we should reduce the extra memory overhead.

Steps/Code to reproduce bug
Here is an nvbench CLI command you can run to reproduce the above table:
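A command along the following lines should reproduce it; the benchmark name and axis names/values here are assumptions based on the cudf nvbench layout and may differ on your branch:

```shell
# Run the parquet reader decode benchmark for the two data types being compared.
# `parquet_read_decode`, `data_type`, and `io_type` are assumed names; list the
# available benchmarks and axes with `./PARQUET_READER_NVBENCH --list`.
./PARQUET_READER_NVBENCH -d 0 -b parquet_read_decode \
  -a data_type=[INTEGRAL,STRUCT] -a io_type=DEVICE_BUFFER
```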
Expected behavior
Decoding INTEGRAL and STRUCT data in the parquet reader should have similar peak memory footprints.
Environment overview (please complete the following information)
rapidsai/ci-conda:cuda12.1.1-ubuntu22.04-py3.11 pulled on 2024-02-03, `branch-24.02` at SHA `6cebf2294ff`
Additional context
The chunked parquet reader seems to reduce the memory footprint of STRUCT decode, but the footprint still trends higher than for other data types.
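For comparison, here is a minimal sketch of bounding decode memory with the chunked reader; the 512 MiB chunk limit and the input path are illustrative, not taken from the benchmark:

```cpp
#include <cudf/io/parquet.hpp>
#include <cudf/table/table.hpp>

#include <memory>
#include <vector>

int main()
{
  // Hypothetical input file standing in for the benchmark's STRUCT table.
  auto const opts = cudf::io::parquet_reader_options::builder(
                      cudf::io::source_info{"struct_table.parquet"})
                      .build();

  // Limit each output chunk to ~512 MiB so peak decode memory stays bounded
  // instead of materializing the whole table at once.
  auto reader = cudf::io::chunked_parquet_reader(512 * 1024 * 1024, opts);

  std::vector<std::unique_ptr<cudf::table>> chunks;
  while (reader.has_next()) {
    chunks.push_back(reader.read_chunk().tbl);
  }
  return 0;
}
```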