Reduce peak memory use when writing compressed ORC files. (#12963)
This PR changes how the buffer for encoded data is allocated in the ORC writer. Instead of a single buffer for the whole table, each stream of each stripe is allocated separately. Since the size of the encoded data is not known in advance, buffers are over-sized in most cases (decimal types and dictionary-encoded data being the exceptions). Resizing these buffers to the exact encoded data size before compression reduces peak memory usage. The resizing is done in the step where row groups are gathered to form a contiguous encoded stripe in memory, so we don't incur additional copies (compared to the previous `gather_stripes` approach).

Other changes:
- Removed `compute_offsets`, since it is not needed with separate buffers for each stripe/stream.
- Refactored parts of `encode_columns` to initialize data buffers and stream descriptors one stripe at a time, allowing future separation into per-stripe processing (e.g. for pipelining).
- `cub::DeviceMemcpy::Batched` can now be used in the ORC writer.

Impact: internal benchmarks show an average 14% reduction in peak memory use when SNAPPY compression is enabled, with minimal impact on performance.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #12963