Use per-page max compressed size estimate for compression (#11066)
Closes #10857

The Parquet writer currently estimates the maximum compressed page size by finding the largest page and calling nvcomp's `nvcompBatchedSnappyCompressGetMaxOutputChunkSize` API once for that page. The output buffer is then allocated as max_compressed_page_size * num_pages. This approach is pessimistic and over-allocates the output buffer for batched compression.

This PR instead calls `nvcompBatchedSnappyCompressGetMaxOutputChunkSize` for each page and sums the results to get the output buffer size. This greatly reduces the peak memory consumption of the Parquet writer, so the compression step is no longer the bottleneck.

Authors:
- Devavret Makkar (https://github.com/devavret)

Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #11066
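The difference between the two allocation strategies described in the commit message can be illustrated with a small host-side sketch. This is not the code from the PR. The function `max_compressed_chunk_size` is a hypothetical stand-in for the nvcomp query and uses the bound published for Google's Snappy library (32 + n + n/6); the value nvcomp actually returns may differ. The names `pessimistic_total`, `per_page_total`, and `bytes_saved` are likewise assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the nvcomp max-output-size query.
// Uses Google Snappy's worst-case bound; nvcomp's real bound may differ.
std::size_t max_compressed_chunk_size(std::size_t uncompressed_size)
{
  return 32 + uncompressed_size + uncompressed_size / 6;
}

// Old strategy: one query for the largest page, multiplied by the page count.
std::size_t pessimistic_total(const std::vector<std::size_t>& page_sizes)
{
  if (page_sizes.empty()) { return 0; }
  const std::size_t largest = *std::max_element(page_sizes.begin(), page_sizes.end());
  return max_compressed_chunk_size(largest) * page_sizes.size();
}

// New strategy: one query per page, summed.
std::size_t per_page_total(const std::vector<std::size_t>& page_sizes)
{
  std::size_t total = 0;
  for (const std::size_t size : page_sizes) {
    total += max_compressed_chunk_size(size);
  }
  return total;
}

// Bytes of output buffer saved by switching strategies.
std::size_t bytes_saved(const std::vector<std::size_t>& page_sizes)
{
  const std::size_t old_total = pessimistic_total(page_sizes);
  const std::size_t new_total = per_page_total(page_sizes);
  // Holds because the stand-in bound never decreases as the input grows.
  assert(old_total >= new_total);
  return old_total - new_total;
}
```

With this stand-in bound, two 1 KiB pages and one 64 KiB page need 229470 bytes under the old strategy and 78942 bytes under the new one. The saving grows with the spread between the largest page and the typical page, which is why a single oversized page could previously dominate the allocation.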
Showing 4 changed files with 89 additions and 40 deletions.