This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Potential bug in reading lists from avro? #1252

Closed
shaeqahmed opened this issue Sep 18, 2022 · 1 comment · Fixed by #1253
Labels
bug Something isn't working

Comments

@shaeqahmed
Contributor

shaeqahmed commented Sep 18, 2022

Line of code:

array.try_push_valid()?;

Feel free to close this issue if my understanding is incorrect here. Going by the Java Avro implementation, array values are stored and read as a series of blocks, which are read consecutively until a block with an item count of 0 is encountered. (https://github.com/apache/avro/blob/42822886c28ea74a744abb7e7a80a942c540faa5/lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java#L203)

In the Avro deserialisation logic for reading lists, it looks like we call try_push_valid once for each block in a list item, rather than once for the whole item (outside of the loop). This looks incorrect: blocks are an Avro encoding detail, whereas our validity bitmap tracks Arrow values (e.g. one entry per list item), so a list item stored as several blocks would get several validity entries instead of one.

I wasn't able to easily create an Avro list that is stored as multiple blocks to validate my assumptions, but I wanted to open this issue in case it is an obvious bug. Thank you.
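
To make the block layout discussed above concrete, here is a minimal sketch, not the library's actual code. It assumes the encoding from the Avro specification: an array is a sequence of blocks, each starting with a zigzag varint `long` item count; a negative count is followed by a `long` byte size; a count of 0 ends the array. `ListBuilder`, `read_long` and `read_list_item` are hypothetical names used only for illustration. The point is that the validity entry is pushed once per list item, after the block loop.

```rust
/// Hypothetical stand-in for a mutable list array of `i64` (not a real library type).
#[derive(Default, Debug)]
struct ListBuilder {
    values: Vec<i64>,    // flattened child values
    offsets: Vec<usize>, // end offset of each list item
    validity: Vec<bool>, // one entry per list item
}

/// Decode one zigzag varint-encoded Avro `long` from the front of `buf`,
/// advancing the slice. Returns `None` on truncated or over-long input.
fn read_long(buf: &mut &[u8]) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let (&byte, rest) = (*buf).split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return None; // more than 10 bytes cannot encode a 64-bit long
        }
    }
    // Undo the zigzag mapping: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Read one Avro array of longs (possibly split across several blocks)
/// into a single list item of `out`.
fn read_list_item(buf: &mut &[u8], out: &mut ListBuilder) -> Option<()> {
    loop {
        let count = read_long(buf)?;
        if count == 0 {
            break; // a zero-count block terminates the array
        }
        let count = if count < 0 {
            // Negative count: the block's byte size follows; ignored in this sketch.
            let _byte_size = read_long(buf)?;
            count.unsigned_abs()
        } else {
            count as u64
        };
        for _ in 0..count {
            out.values.push(read_long(buf)?);
        }
    }
    // Exactly one offset and one validity entry per list item,
    // regardless of how many blocks the item was split into.
    out.offsets.push(out.values.len());
    out.validity.push(true);
    Some(())
}
```

With this shape, the bytes `04 02 04 02 06 00` (a block of two items, a block of one item, then the terminator) produce one list item with three values and a single validity entry. Pushing validity inside the loop would instead produce two entries for the same item.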

@jorgecarleitao
Owner

jorgecarleitao commented Sep 19, 2022

Hey!

This is an interesting observation. Indeed this is incorrect.

I was also unable to create a file that exercises this code branch, so I could only rely on the spec for this. Filed #1253 to address it. Let me know what you think :)
