[BUG] Databricks parquetFilters build failure in db 8.2 runtime #3191

Closed
pxLi opened this issue Aug 11, 2021 · 5 comments
Assignees: tgravescs
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@pxLi
Collaborator

pxLi commented Aug 11, 2021

Describe the bug
It looks like Databricks just applied the API change to the 8.2 runtime (related to #3098).
The 311db shim layer still uses the old ParquetFilters call, which it inherits from Spark301Shims,
so we will need to update the 311db shim layer.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7813.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7813.0 (TID 27101) (ip-10-59-168-175.us-west-2.compute.internal executor driver): 
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.parquet.ParquetFilters.<init>(Lorg/apache/parquet/schema/MessageType;ZZZZIZ)V
[2021-08-10T21:50:01.141Z] at com.nvidia.spark.rapids.shims.spark301.SparkBaseShims.getParquetFilters(SparkBaseShims.scala:95)
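For readers unfamiliar with JVM descriptors, the `NoSuchMethodError` above names the exact constructor signature the caller was compiled against. The following is a minimal sketch, not part of the spark-rapids code base, that decodes such a descriptor into readable parameter types. The class name `DescriptorDecoder` and its methods are hypothetical helpers introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from spark-rapids): decodes the parameter list
// of a JVM method descriptor such as the one in the NoSuchMethodError.
public class DescriptorDecoder {

    /** Returns the readable parameter type names of a method descriptor. */
    public static List<String> parameterTypes(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> types = new ArrayList<>();
        int i = 1;
        while (i < close) {
            // Each leading '[' adds one array dimension.
            int dims = 0;
            while (descriptor.charAt(i) == '[') {
                dims++;
                i++;
            }
            StringBuilder name = new StringBuilder();
            char c = descriptor.charAt(i);
            if (c == 'L') {
                // Object type: L<internal/class/Name>;
                int semi = descriptor.indexOf(';', i);
                if (semi < 0 || semi > close) {
                    throw new IllegalArgumentException("unterminated class name: " + descriptor);
                }
                name.append(descriptor.substring(i + 1, semi).replace('/', '.'));
                i = semi + 1;
            } else {
                name.append(primitiveName(c));
                i++;
            }
            for (int d = 0; d < dims; d++) {
                name.append("[]");
            }
            types.add(name.toString());
        }
        return types;
    }

    private static String primitiveName(char c) {
        switch (c) {
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            default: throw new IllegalArgumentException("unknown type code: " + c);
        }
    }

    public static void main(String[] args) {
        // Descriptor copied from the stack trace in this issue.
        String d = "(Lorg/apache/parquet/schema/MessageType;ZZZZIZ)V";
        System.out.println(parameterTypes(d));
    }
}
```

Applied to the descriptor in the trace, this lists seven constructor parameters: a `MessageType`, four booleans, an int, and a final boolean. That is the signature the inherited shim code expects and that the 8.2 runtime evidently no longer provides.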

Databricks does not add a patch version to its runtime versions, which causes us a lot of headaches.

@pxLi added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Aug 11, 2021
@pxLi
Collaborator Author

pxLi commented Aug 11, 2021

NEEDS CONFIRMATION: we may need to target this at 21.06 and release 21.06.2, since it is the long-term support version.

Since we have auto-merge set up, we would like the fix to target the older-version branch.

@tgravescs added the P0 (Must have for release) label Aug 11, 2021
@tgravescs self-assigned this Aug 11, 2021
@tgravescs
Collaborator

I tried the Databricks AWS 8.2 runtime with 21.08 and 21.06.01, and both write parquet fine. I'm not sure that is the issue here; investigating.

@tgravescs
Collaborator

So the couple of parquet writes I did worked, but the unit tests fail on 8.2 for both 21.10 and 21.08, and likely for 21.06.01 as well. It must be that on 8.2 ParquetFilters is only hit in certain cases, which seems odd because I think only the fallback tests pass. I thought that on 7.3 any parquet write hit the issue.

@tgravescs
Collaborator

So I was wrong: reading parquet on 8.2 with 21.06.01 does fail in the normal read case. I missed the failure.

@tgravescs
Collaborator

PR merged.
