ClassCastException possible in DeltaByteArrayReader after PARQUET-2431 #3013
Comments
Thanks for reporting the bug! Is it possible to provide a file that can reproduce this issue? cc @gszadovszky this issue seems to be caused by a recent refactoring commit.
Thanks, @bwjoh. It seems I've overlooked how this part worked. The code is not super clear, unfortunately. Also, it seems we are lacking a unit test for this scenario.
gszadovszky added a commit to gszadovszky/parquet-mr that referenced this issue on Sep 27, 2024
wgtmac pushed a commit that referenced this issue on Sep 30, 2024
wgtmac pushed a commit that referenced this issue on Sep 30, 2024
dongjoon-hyun pushed a commit to apache/spark that referenced this issue on Oct 8, 2024
### What changes were proposed in this pull request?
The pr aims to upgrade `Parquet` from `1.14.2` to `1.14.3`.

### Why are the changes needed?
The full release notes: https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.14.3
apache/parquet-java#3007: Ensure version specific Jackson classes are shaded
apache/parquet-java#3013: Fix potential ClassCastException at reading DELTA_BYTE_ARRAY encoding

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48378 from panbingkun/SPARK-49903.
Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
himadripal pushed a commit to himadripal/spark that referenced this issue on Oct 19, 2024
Describe the bug, including details regarding any error messages, version, and platform.
Noticed when upgrading from 1.13.1 to 1.14.1.

This appears to be due to PARQUET-2431 - https://github.com/apache/parquet-java/pull/1274/files#diff-362b7d44b24283c1bb1f6ca3e124cb72706a33ed96d86b58bf3339f20aafb4e9R732

Looking into how my code hit this, it seems that `CorruptDeltaByteArrays.requiresSequentialReads` was essentially doing the `dataColumn instanceof RequiresPreviousReader` check previously (`CorruptDeltaByteArrays.requiresSequentialReads` can only return true when `encoding == Encoding.DELTA_BYTE_ARRAY`, and `org.apache.parquet.column.values.RequiresPreviousReader` is only implemented by `*DeltaByteArrayReader` classes).

With no check on `previousReader instanceof RequiresPreviousReader`, the ClassCastException is possible above.

This is more likely to happen when using `org.apache.parquet.io.ColumnIOFactory#ColumnIOFactory()` to read files without `createdBy`. In my case I was able to fix this by adding `createdBy`, knowing that all Parquet files I have were written after PARQUET-246, which prevents `CorruptDeltaByteArrays.requiresSequentialReads` from returning true.

Component(s)
No response
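
To make the failure mode described above easier to follow, here is a minimal, self-contained sketch. It does not use the parquet-java classes: `ValuesReader`, `RequiresPreviousReader`, `DeltaReader`, `PlainReader`, `linkUnguarded` and `linkGuarded` are hypothetical stand-ins written for this illustration, and the guarded variant is only one possible shape of a check, not necessarily the fix that was merged.

```java
class PreviousReaderGuardSketch {
    // Hypothetical stand-ins for illustration only; NOT the parquet-java types.
    interface ValuesReader {}

    interface RequiresPreviousReader {
        void setPreviousReader(ValuesReader reader);
    }

    // A reader that keeps no state from the previous page.
    static class PlainReader implements ValuesReader {}

    // A reader that needs state from the previous page's reader.
    static class DeltaReader implements ValuesReader, RequiresPreviousReader {
        String previous = "";

        @Override
        public void setPreviousReader(ValuesReader reader) {
            // Unchecked downcast: throws ClassCastException for any other reader type.
            this.previous = ((DeltaReader) reader).previous;
        }
    }

    // Only checks the current reader, so a mismatched previous reader reaches the cast.
    static boolean linkUnguarded(ValuesReader dataColumn, ValuesReader previousReader) {
        if (dataColumn instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
            return true;
        }
        return false;
    }

    // Also checks the previous reader. In this sketch DeltaReader is the only
    // implementer of RequiresPreviousReader, so the downcast above is safe.
    // A null previousReader is skipped too, because `null instanceof X` is false.
    static boolean linkGuarded(ValuesReader dataColumn, ValuesReader previousReader) {
        if (dataColumn instanceof RequiresPreviousReader
                && previousReader instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ValuesReader current = new DeltaReader();
        ValuesReader previous = new PlainReader();

        // The guarded call skips the link instead of throwing.
        System.out.println("guarded linked: " + linkGuarded(current, previous));

        try {
            linkUnguarded(current, previous);
        } catch (ClassCastException e) {
            System.out.println("unguarded threw ClassCastException");
        }
    }
}
```

The sketch mirrors the reasoning in the report: once the only remaining condition is on the current reader, a previous page read with a different encoding can reach the downcast.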