-
Notifications
You must be signed in to change notification settings - Fork 915
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix page size estimation in Parquet writer (#13364)
Fixes #13250. The page size estimation for dictionary encoded pages adds a term to estimate overhead bytes for the `bit-packed-header` used when encoding bit-packed literal runs. This term originally used a value of `256`, but it's hard to see where that value comes from. This PR change the value to `8`, with a possible justification being the minimum length of a literal run is 8 values. Worst case would be multiple runs of 8, with required overhead bytes then being `num_values/8`. This also adds a test that has been verified to fail for values larger than 16 in the problematic term. Authors: - Ed Seidl (https://github.com/etseidl) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) - Nghia Truong (https://github.com/ttnghia) URL: #13364
- Loading branch information
Showing
2 changed files
with
51 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters