This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
parquet -> arrow utf-8 column conversion yields too many values #790
We're seeing an issue decoding parquet row groups containing UTF-8 string columns with many null values. Occasionally these conversions yield more values than specified in the parquet file metadata. I traced this as far as
arrow2::io::parquet::read::binary::iter_to_array
with the problematic data set. After modifying that function to add some sanity-check asserts, multiple test cases in the arrow2 test suite fail.
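For context, here is a minimal sketch of the invariant in question, assuming a flat (non-nested) optional column whose max definition level is 1, so each definition level is either 1 (value present) or 0 (null). Every slot, null or not, gets one definition level, but only non-null slots consume a stored string, so the decoded array should have exactly as many slots as the metadata's value count. `Utf8Column`, `decode_optional_utf8`, and `check_invariants` are hypothetical names for illustration only. They are not arrow2's API, and they are not the exact asserts I added.

```rust
// Hypothetical, simplified model of a decoded Arrow UTF-8 array (not arrow2's types).
#[derive(Debug)]
struct Utf8Column {
    offsets: Vec<i32>,   // len + 1 entries; slot i spans offsets[i]..offsets[i + 1]
    values: Vec<u8>,     // concatenated UTF-8 bytes of the non-null slots
    validity: Vec<bool>, // true = non-null
}

/// Decode a flat optional UTF-8 column: one definition level per slot
/// (1 = present, 0 = null), plus the strings for the non-null slots only.
fn decode_optional_utf8(def_levels: &[u8], non_null: &[&str]) -> Utf8Column {
    let mut col = Utf8Column {
        offsets: vec![0],
        values: Vec::new(),
        validity: Vec::with_capacity(def_levels.len()),
    };
    let mut next = non_null.iter();
    for &level in def_levels {
        if level == 1 {
            let s = next.next().expect("fewer values than non-null definition levels");
            col.values.extend_from_slice(s.as_bytes());
            col.validity.push(true);
        } else {
            // A null occupies a slot but consumes no stored value.
            col.validity.push(false);
        }
        col.offsets.push(col.values.len() as i32);
    }
    assert!(next.next().is_none(), "more values than non-null definition levels");
    col
}

/// Sanity checks: the array must have exactly `num_values` slots
/// (the count recorded in the parquet metadata) and consistent buffers.
fn check_invariants(col: &Utf8Column, num_values: usize) -> Result<(), String> {
    let len = col.offsets.len().checked_sub(1).ok_or("offsets must not be empty")?;
    if len != num_values {
        return Err(format!("decoded {} slots, metadata says {}", len, num_values));
    }
    if col.validity.len() != len {
        return Err(format!("validity has {} entries, expected {}", col.validity.len(), len));
    }
    if col.offsets.windows(2).any(|w| w[0] > w[1]) {
        return Err("offsets are not monotonically non-decreasing".into());
    }
    if *col.offsets.last().unwrap() as usize != col.values.len() {
        return Err("last offset does not match values buffer length".into());
    }
    std::str::from_utf8(&col.values).map_err(|e| e.to_string())?;
    Ok(())
}

fn main() {
    // Five slots, three of them null: only two strings are physically stored.
    let defs: [u8; 5] = [1, 0, 0, 1, 0];
    let col = decode_optional_utf8(&defs, &["foo", "bar"]);
    assert_eq!(col.offsets, vec![0, 3, 3, 3, 6, 6]);
    check_invariants(&col, defs.len()).unwrap();
    println!("{:?}", col.validity); // prints [true, false, false, true, false]
}
```

With many nulls, a decoder that pushes a slot per stored value instead of per definition level (or that counts a null twice) would end up with more slots than `num_values`, which is the kind of mismatch the length check is meant to catch.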
I'm not entirely sure these debug asserts are correct, but I suspect they are. I'm going to keep digging and see if I can isolate the issue further, but figured I'd file a bug report sooner rather than later. Thanks!