Describe the bug
When using V2 page headers, the repetition and definition level data is not compressed. The Parquet writer currently increments the data pointer past the level data before passing it to the compression function. Zstandard compression, however, requires its input to be aligned on a 4-byte boundary, so simply incrementing the pointer past the level data often results in misaligned access errors.
Fixes #14781
This PR makes changes to the Parquet writer to ensure that data to be compressed is properly aligned. Changes have also been made to the `EncPage` struct to make it easier to keep fields in that struct aligned, and to reduce confusing reuse of fields. In particular, the `max_data_size` field could previously hold any of a) the maximum possible size for the page data, b) the actual size of the page data after encoding, or c) the actual size of the compressed page data. The latter two now have their own fields, `data_size` and `comp_data_size`.
Authors:
- Ed Seidl (https://github.com/etseidl)
- Mike Wilson (https://github.com/hyperbolic2346)
Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #14841
Additional context
Current workaround: PR #14772