This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
In the Avro deserialisation logic for reading lists, it looks like we call `try_push_valid` once for each block in a list item, rather than once for the whole item (outside of the loop). Intuitively, this looks incorrect, since blocks are an Avro encoding detail while our validity bitmap tracks Arrow values (e.g. a list item).
I wasn't able to easily create an Avro list that is stored as multiple blocks to validate my assumption, but wanted to open this issue in case it is an obvious bug. Thank you.
This is an interesting observation. Indeed this is incorrect.
I was also unable to create a file that exercises this code branch, so I could only rely on the spec for this. Filed #1253 to address it. Let me know what you think :)
Line of code: `arrow2/src/io/avro/read/deserialize.rs`, line 136 in c615095
Feel free to close this issue if my understanding is incorrect. Referencing the Java Avro implementation, it looks like array values are stored and read as blocks, which can be read consecutively until a block count of 0 is encountered (https://github.com/apache/avro/blob/42822886c28ea74a744abb7e7a80a942c540faa5/lang/java/avro/src/main/java/org/apache/avro/io/Decoder.java#L203).
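To make the concern concrete, here is a minimal sketch of how one list item could be read when it is split across several blocks. It is not the arrow2 code: `ListBuilder`, `read_long` and `read_item` are hypothetical names, the values are assumed to be Avro `long`s, and the validity bitmap is modelled as a plain `Vec<bool>`. It only follows the block layout from the Avro spec (a count, then that many items, repeated until a count of 0; a negative count is followed by a byte size).

```rust
/// Decode one zigzag, variable-length Avro `long` from the front of `buf`.
/// Returns `None` if the input is truncated or the varint is too long.
fn read_long(buf: &mut &[u8]) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo the zigzag encoding.
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Hypothetical stand-in for a mutable Arrow list array of `i64`.
struct ListBuilder {
    values: Vec<i64>,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl ListBuilder {
    fn new() -> Self {
        Self { values: vec![], offsets: vec![0], validity: vec![] }
    }

    /// Read one Avro array (one Arrow list item), however many blocks it spans.
    fn read_item(&mut self, buf: &mut &[u8]) -> Option<()> {
        loop {
            let mut count = read_long(buf)?;
            if count == 0 {
                break; // a zero count terminates the array
            }
            if count < 0 {
                // Negative count: the absolute value is the item count,
                // and a block size in bytes follows (ignored here).
                count = count.checked_neg()?;
                let _block_bytes = read_long(buf)?;
            }
            for _ in 0..count {
                self.values.push(read_long(buf)?);
            }
        }
        // Offsets and validity are updated once per list item,
        // outside the block loop, not once per block.
        self.offsets.push(self.values.len());
        self.validity.push(true);
        Some(())
    }
}
```

With this shape, an item encoded as two blocks (`[1, 2]` then `[3]`) still contributes a single offset and a single validity entry, which is the behaviour the issue argues for.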