Commit
Fix page size calculation in Parquet writer (#12182)
When calculating page boundaries, the current Parquet writer does not take into account storage needed per page for repetition and definition level data. As a consequence pages may sometimes exceed the specified limit, which in turn impacts the ability to compress these pages with codecs that have a maximum buffer size. This PR fixes the page size calculation to take repetition and definition levels into account.

~~This also incorporates the fragment size reduction from 5000 to 1000 that was suggested in #12130~~

Authors:
- Ed Seidl (https://github.com/etseidl)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)

URL: #12182
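The effect described above can be illustrated with a small sketch. This is not the cuDF implementation. It is a simplified Python model, and the names `estimated_level_bytes` and `split_into_pages` are hypothetical. It assumes levels are bit-packed at the minimum bit width with a 4-byte length prefix, and it ignores run headers and any savings from run-length encoding, so the level sizes are rough estimates only.

```python
def level_bit_width(max_level: int) -> int:
    """Bits needed to store any level value in the range 0..max_level."""
    return max_level.bit_length()


def estimated_level_bytes(num_values: int, max_level: int) -> int:
    """Rough size of one level stream (assumption: plain bit-packing
    plus a 4-byte length prefix; run headers are ignored)."""
    if max_level == 0:
        return 0  # no level data is stored for this stream
    bits = num_values * level_bit_width(max_level)
    return 4 + (bits + 7) // 8


def split_into_pages(fragments, max_page_bytes, max_def_level, max_rep_level):
    """Group fragments, given as (num_values, data_bytes), into pages.

    A page is closed before adding a fragment that would push the
    estimated size (value data plus both level streams) over the limit.
    Returns a list of pages, each a list of fragment indices.
    """
    pages, current = [], []
    cur_values = cur_data = 0
    for index, (num_values, data_bytes) in enumerate(fragments):
        new_values = cur_values + num_values
        new_data = cur_data + data_bytes
        size = (new_data
                + estimated_level_bytes(new_values, max_def_level)
                + estimated_level_bytes(new_values, max_rep_level))
        if current and size > max_page_bytes:
            pages.append(current)  # close the page; fragment starts a new one
            current = []
            new_values, new_data = num_values, data_bytes
        current.append(index)
        cur_values, cur_data = new_values, new_data
    if current:
        pages.append(current)
    return pages


fragments = [(1000, 400)] * 3
# Level data ignored (both max levels 0): two fragments share a page.
print(split_into_pages(fragments, 1000, 0, 0))
# Definition levels counted: each fragment needs its own page.
print(split_into_pages(fragments, 1000, 1, 0))
```

In this toy input, two fragments hold 800 bytes of value data, which fits a 1000-byte limit on its own. Adding the estimated 254 bytes of definition levels for 2000 values brings the total to 1054 bytes, so a size check that skips level data would produce an oversized page.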