Failure when reading ZSTD-compressed Parquet file: NoSuchMethodError: 'void com.github.luben.zstd.ZstdInputStream.<init>(java.io.InputStream, com.github.luben.zstd.BufferPool) #3621
Comments
Iceberg 0.12 ( Since
i tried using newer spark versions, 3.1.2 and 3.2.0 (both seem to bundle
(i am not sure whether i was testing this with Iceberg 0.12 or 0.11 though) |
Iceberg 0.12.1 + Spark 3.1.1 seem to work fine. |
Ah yeah we hit this and changed the Spark jni version. @kbendick may remember more? |
@RussellSpitzer thanks for looking into this.
you mean replacing |
I believe @RussellSpitzer is referring to upgrading the zstd-jni version with a later version. So your question is correct @findepi. Ideally, it should come from Spark (at least for spark3-runtime) which was the point of #3058
You are right, it does appear that #3058 was never included in 0.12.1 or the 0.12.x branch in general. 0.12.0 was released prior to that PR being merged, and then with the repo layout changes and only grabbing for bugs it seems we might have missed it when preparing 0.12.1. I'll be sure to add that to the upcoming 0.13.0 release. Not sure if #3058 itself merits a patch release. Would it be possible to exclude the dependency from the trino side momentarily? What do others think? |
It might be possible to disable the buffer pool (which would not be great from a performance standpoint, but might be helpful in working around the original issue): apache/parquet-java#903 The relevant parquet config is |
I wonder if Trino can hold on trinodb/trino#10045 to wait for 0.13.0 for the fix given it's just around the corner. But from correctness perspective I think it's good to have a 0.12.2 for this later. |
Trino doesn't use the Parquet reader bundled with Iceberg, so it is not affected. I faced this problem in Trino's compatibility tests against Spark, where the Iceberg+Spark setup we have was failing. I think I solved this (#3621 (comment), trinodb/docker-images#119). Will close this issue once I can confirm with the CI.
For posterity -- my cursory reading of the Parquet code suggests that this flag controls what kind of BufferPool is passed to the problematic constructor, not whether the constructor is used. Thus, it does not seem to be an effective way to run with an older zstd-jni version.
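A quick way to tell whether the zstd-jni on a given classpath is new enough is to look for the constructor named in the error via reflection. This is only a minimal sketch: the class name `ZstdCtorCheck` and the helper `hasConstructor` are made up for illustration, and the only real names used are the ones from the `NoSuchMethodError` message. Run it with the same classpath as the failing Spark job.

```java
public class ZstdCtorCheck {

    /**
     * Returns true if the named class is loadable and has a public constructor
     * with exactly the given parameter types (hypothetical helper).
     */
    static boolean hasConstructor(String className, String... paramTypeNames) {
        ClassLoader loader = ZstdCtorCheck.class.getClassLoader();
        try {
            // initialize=false: inspect the class without running static initializers
            Class<?> target = Class.forName(className, false, loader);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < paramTypeNames.length; i++) {
                params[i] = Class.forName(paramTypeNames[i], false, loader);
            }
            target.getConstructor(params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class or parameter type missing, or no such public constructor
            return false;
        }
    }

    public static void main(String[] args) {
        // Signature taken from the NoSuchMethodError in the issue title
        boolean present = hasConstructor(
                "com.github.luben.zstd.ZstdInputStream",
                "java.io.InputStream",
                "com.github.luben.zstd.BufferPool");
        System.out.println("ZstdInputStream(InputStream, BufferPool) present: " + present);
    }
}
```

If this prints `false`, the zstd-jni that wins on the classpath is either missing or predates the constructor that the Parquet reader calls, which matches the failure reported here.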
Yes the compatibility test is what I am talking about on Trino side. If that's fixed then it should be fine. |
thanks for all the comments! |
I am using iceberg-spark3-runtime-0.12.jar (https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.12.0/iceberg-spark3-runtime-0.12.0.jar) with Spark 3.0.0 (https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz).
When attempting to read a ZSTD-compressed Parquet file, the query fails.