RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024 #3029

Closed
wolfv opened this issue Nov 6, 2022 · 2 comments · Fixed by #3036
Labels
bug parquet Changes to the parquet crate

Comments


wolfv commented Nov 6, 2022

Describe the bug

Reading a specific parquet file triggers: thread 'main' panicked at 'index out of bounds: the len is 1024 but the index is 1024', /Users/wolfvollprecht/Programs/arrow-rs/parquet/src/encodings/rle.rs:492:25

The max-index size computation is wrong.

@wolfv wolfv added the bug label Nov 6, 2022
@tustvold tustvold changed the title from "The num_values number computation in rle.rs is wrong" to "RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024" Nov 7, 2022
Contributor

tustvold commented Nov 7, 2022

I've found the underlying cause: it is an accounting bug in RLEDecoder::get_batch_with_dict.

In particular, if a bit-packed run is longer than 1024 values, the decoder may try to read more values from the underlying bit reader than there is capacity for. If the actual number of values is not a multiple of 8, the bit reader will return those extra values, because the encoded length of a bit-packed run is ambiguous (it is stored as a count of groups of 8 values). The decoder then panics when it tries to copy these values across.
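
To illustrate the accounting described above, here is a minimal sketch. It is not the arrow-rs implementation: ToyBitReader, decode_with_dict, and SCRATCH_LEN are hypothetical names invented for this example, and the reader just hands out made-up dictionary indices. The sketch assumes a fixed 1024-entry scratch buffer and a run whose encoded length is padded up to a multiple of 8, and shows one way to re-clamp each chunk against the remaining output capacity so the padding values are never copied.

```rust
const SCRATCH_LEN: usize = 1024;

/// Hypothetical stand-in for a bit reader holding a single bit-packed run.
/// `padded_len` is the run length as encoded, i.e. rounded up to a multiple of 8.
struct ToyBitReader {
    padded_len: usize,
    pos: usize,
}

impl ToyBitReader {
    /// Fills `out` with as many dictionary indices as are left; returns the count.
    fn get_batch(&mut self, out: &mut [usize]) -> usize {
        let n = out.len().min(self.padded_len - self.pos);
        for slot in &mut out[..n] {
            *slot = self.pos % 4; // arbitrary index into a 4-entry dictionary
            self.pos += 1;
        }
        n
    }
}

/// Decodes up to `buffer.len()` values through the dictionary.
/// The chunk size is recomputed on every iteration, so a request never
/// exceeds either the scratch buffer or the space left in `buffer`.
fn decode_with_dict(reader: &mut ToyBitReader, dict: &[i32], buffer: &mut [i32]) -> usize {
    let mut scratch = [0usize; SCRATCH_LEN];
    let mut values_read = 0;
    while values_read < buffer.len() {
        // Clamp against both the scratch capacity and the remaining output space.
        let want = (buffer.len() - values_read).min(SCRATCH_LEN);
        let got = reader.get_batch(&mut scratch[..want]);
        if got == 0 {
            break; // run exhausted
        }
        for i in 0..got {
            buffer[values_read + i] = dict[scratch[i]];
        }
        values_read += got;
    }
    values_read
}
```

With a run of 1030 real values (encoded as 1032) and an output buffer of length 1030, the second iteration asks for only 6 values rather than another full 1024, so the two padding values stay in the reader and no out-of-bounds copy can occur.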

I will post a PR with a fix shortly.

Contributor

alamb commented Nov 11, 2022

label_issue.py automatically added labels {'parquet'} from #3036

@alamb alamb added the parquet Changes to the parquet crate label Nov 11, 2022