[FEA] Spark 3.1 orc nested predicate pushdown support #576

Closed
tgravescs opened this issue Aug 18, 2020 · 6 comments
Labels: cudf_dependency (An issue or PR with this label depends on a new feature in cudf); feature request (New feature or request); P0 (Must have for release); Spark 3.1+ (Bugs only related to Spark 3.1 or higher)

Comments

@tgravescs
Collaborator

Describe the bug
Spark 3.1 added nested predicate pushdown support in ORC - apache/spark@7b6e1d5

This removed a function we were using:
E Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.orc.OrcFiltersBase.isSearchableType$(Lorg/apache/spark/sql/execution/datasources/orc/OrcFiltersBase;Lorg/apache/spark/sql/types/DataType;)Z

We should go through and add the same support or fix it to use the new function.

Steps/Code to reproduce bug
Run the Spark 3.1.0 integration tests:
FAILED integration_tests/src/main/python/orc_test.py::test_input_meta - py4j....

Expected behavior
The tests pass.


tgravescs added the labels "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) on Aug 18, 2020.
tgravescs added "P0" (Must have for release) and removed "? - Needs Triage" on Aug 18, 2020.
@tgravescs
Collaborator Author

This was backported to Spark 3.0.1, so we need to fix it for the 0.2 release.

@tgravescs
Collaborator Author

It looks like the changes in 3.0.1 and 3.1.0 also differ from each other, so we would need a shim layer. As a temporary fix, I'm going to put in a change that just copies the function whose visibility changed to private:

  - protected[sql] def isSearchableType(dataType: DataType) = dataType match {
  + private def isSearchableType(dataType: DataType) = dataType match {

And then we can look at this in more detail to support it fully.
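
The temporary workaround described above amounts to keeping a plugin-local copy of the type check, so the plugin no longer links against a Spark method whose visibility differs between releases. A minimal sketch follows. The type hierarchy inside `LocalOrcFilterSupport` is a simplified stand-in (not Spark's real `org.apache.spark.sql.types` classes), the object name is hypothetical, and the match arms reflect my reading of the Spark 3.0 helper, so treat all of these as assumptions rather than the plugin's actual code.

```scala
object LocalOrcFilterSupport {
  // Simplified stand-ins for Spark's type hierarchy (hypothetical, illustration only).
  sealed trait DataType
  sealed trait AtomicType extends DataType
  case object IntegerType extends AtomicType
  case object StringType extends AtomicType
  case object BinaryType extends AtomicType
  final case class ArrayType(elementType: DataType) extends DataType

  // Plugin-local copy of the helper. Nothing here calls into OrcFiltersBase,
  // so a visibility change on the Spark side cannot surface as a
  // NoSuchMethodError at runtime.
  def isSearchableType(dataType: DataType): Boolean = dataType match {
    case BinaryType    => false // assumed: binary columns are not pushed down
    case _: AtomicType => true  // other atomic types are searchable
    case _             => false // complex types are not searchable
  }
}
```

The trade-off is that the copy can drift from upstream, which is why the comment treats it as a stopgap until proper per-version shims exist.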

sameerz added "P1" (Nice to have for release) and "Spark 3.1+" (Bugs only related to Spark 3.1 or higher), and removed "P0" (Must have for release), on Oct 23, 2020.
tgravescs added "P0" and "P1", and removed "P1" and "P0", on Feb 9, 2021.
tgravescs changed the title from "[BUG] Spark 3.1 orc nested predicate pushdown support (breaks our test)" to "[FEATURE] Spark 3.1 orc nested predicate pushdown support" on Feb 9, 2021.
tgravescs added "feature request" (New feature or request) and removed "bug" (Something isn't working) on Feb 9, 2021.
@tgravescs
Collaborator Author

OK, so I'm leaving this as a P2 because we fixed the function and it works. This is only needed if we want to support nested predicate pushdown.
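
For context, nested predicate pushdown means a filter on a struct member, such as `person.age > 30`, can be handed to the ORC reader instead of being evaluated after the scan. One ingredient is resolving dotted column names to searchable leaf fields. The sketch below illustrates that idea only; the schema model and the names `Field`, `StructType`, and `searchableLeaves` are hypothetical stand-ins, not Spark's or the plugin's API, and it ignores details such as quoting field names that themselves contain dots.

```scala
object NestedPushdownSketch {
  // Simplified stand-in schema model (hypothetical; not Spark's classes).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case object BinaryType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class StructType(fields: Seq[Field]) extends DataType

  // Walk the schema and collect every searchable leaf under its dotted path,
  // e.g. a struct column `person` with member `age` yields "person.age".
  def searchableLeaves(
      schema: StructType,
      prefix: Seq[String] = Nil): Map[String, DataType] =
    schema.fields.flatMap { f =>
      val path = prefix :+ f.name
      f.dataType match {
        case nested: StructType => searchableLeaves(nested, path).toSeq
        case BinaryType         => Seq.empty // assumed not searchable
        case leaf               => Seq(path.mkString(".") -> leaf)
      }
    }.toMap
}
```

Under this model, a filter would be a pushdown candidate only if its column name appears as a key in the resulting map.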

tgravescs removed the "P1" (Nice to have for release) label on Feb 10, 2021.
tgravescs added the "P2" (Not required for release) label on Feb 10, 2021.
sameerz added the "cudf_dependency" label (An issue or PR with this label depends on a new feature in cudf) on Apr 20, 2021.
@sameerz
Collaborator

sameerz commented Apr 20, 2021

Depends on rapidsai/cudf#7640 and rapidsai/cudf#7830

@tgravescs
Collaborator Author

Our ORC filter code was updated with #1982, and it looks like this should just work, so perhaps we only need to test it.

Salonijain27 added "P0" (Must have for release) and removed "P2" (Not required for release) on Jul 20, 2021.
@jlowe
Contributor

jlowe commented Jul 20, 2021

Closing as a duplicate of #1481

jlowe closed this as completed on Jul 20, 2021.
sameerz changed the title from "[FEATURE] Spark 3.1 orc nested predicate pushdown support" to "[FEA] Spark 3.1 orc nested predicate pushdown support" on Sep 27, 2021.
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.10 to branch-22.12 [skip ci] [bot]