[FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown #3985

Closed
ywang529 opened this issue Nov 1, 2021 · 2 comments · Fixed by #5315
Labels
feature request New feature or request

Comments

ywang529 commented Nov 1, 2021

What is your question?
Hi, this is a great open-source project, thanks for sharing. We have been evaluating the udf-compiler, and it works as advertised, but we ran into a problem when testing filter pushdown. We wrote the simple UDF below, but PushedFilters is empty when the udf-compiler is used:

    spark.udf.register("isOlderThan20", (i: Int) => { i > 20 })
    val udfResult = spark.sql("SELECT * FROM people_with_schema WHERE isOlderThan20(age)")

    +- FileScan json [name#0,age#1] Batched: false, DataFilters: [if ((age#1 > 20)) true else false], ..... PushedFilters: [],

What we expected is to have a physical plan like this:

    val udfResult = spark.sql("SELECT * FROM people_with_schema WHERE age > 20")

    !+- Filter (isnotnull(age#1) AND (age#1 > 20))
    +- FileScan ...PushedFilters: [IsNotNull(age), GreaterThan(age,20)],

With tracing turned on, it looks like the udf-compiler translated this UDF into an "if-else" expression instead of just "age#1 > 20":

    Filter if ((age#1 > 20)) true else false

Is this by design? If not, could you suggest how we might fix it?

Thank you very much!
-Yong Wang

ywang529 added the "? - Needs Triage" and "question" labels on Nov 1, 2021
Salonijain27 removed the "? - Needs Triage" label on Nov 2, 2021
jlowe added the "feature request" label and removed the "question" label on Nov 3, 2021
jlowe changed the title from "[QST] UDF-Compiler: Predicate pushdown is not working in this case" to "[FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown" on Nov 3, 2021
Contributor

jlowe commented Nov 3, 2021

Thanks for the report! I verified with a vanilla Spark session that specifying an expression as an IF statement is sufficient to defeat the predicate pushdown into the load.

Migrating this into a feature request to add support for predicate pushdown for Catalyst operations resulting from the UDF compiler.
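
To illustrate the kind of rewrite being requested, here is a minimal sketch of collapsing "if (p) true else false" back to "p" so that the predicate is again in a shape a data source can accept. It is not the fix that was eventually merged, and it does not use Catalyst: Expr, Col, IntLit, BoolLit, GreaterThan, If, and SimplifyBooleanIf are all hypothetical names for a toy expression tree. The toy model also ignores SQL NULL; in a real engine the two forms differ when the condition evaluates to NULL (the if-else yields false, the bare condition yields NULL), so the rewrite is only equivalent for non-nullable conditions or directly inside a filter, where NULL and false both drop the row.

```scala
// Toy expression tree (hypothetical; these are NOT Catalyst's classes).
sealed trait Expr
case class Col(name: String) extends Expr
case class IntLit(value: Int) extends Expr
case class BoolLit(value: Boolean) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr
case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr

object SimplifyBooleanIf {
  // Recursively rewrite `if (p) true else false` to `p`.
  // Assumption: conditions are never NULL in this toy model.
  def apply(e: Expr): Expr = e match {
    case If(cond, BoolLit(true), BoolLit(false)) => apply(cond)
    case If(c, t, f)                             => If(apply(c), apply(t), apply(f))
    case GreaterThan(l, r)                       => GreaterThan(apply(l), apply(r))
    case leaf                                    => leaf
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Shape reported in the trace above: if ((age > 20)) true else false
    val compiled = If(GreaterThan(Col("age"), IntLit(20)), BoolLit(true), BoolLit(false))
    println(SimplifyBooleanIf(compiled))
  }
}
```

After the rewrite, the filter condition is the bare comparison on age, which is the form shown in the expected plan with GreaterThan(age,20) in PushedFilters.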

Author

ywang529 commented Nov 3, 2021

Thank you very much for the help!

5 participants