-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Substrait: Combine join on and filter expressions in a single Substrait JoinRel's field #7611
Comments
There is a few things in my description that I need to correct here.
According to the ANSI join syntax (explained by IBM), there are two types of join filters: filter and post join filter.
In the case of an Since the current version of |
@Dandandan @metesynnada @berkaysynnada Do you guys have any thoughts on this? |
Thanks for the explanation :) @metesynnada may give you more accurate answers since has more experience with joins., but let me give you my thoughts. "If the join on field is empty, the producer does not put Literal(True) in the field, it will just produce None" "The reason this issue should be labeled as bug is because the current use of post_join_filter is incorrect." From your first explanation, I can understand why you need this but I'm unsure of the best solution. |
This is from the original code. There is actually nothing wrong with this part. I just described it incorrectly in the issue description so I wanted to correct it.
You're exactly right. datafusion does not have a post-join filter. It only contains join The Join struct has only two fields for condition and according to the comments, neither of them are to be applied post-join. /// Equijoin clause expressed as pairs of (left, right) join expressions
pub on: Vec<(Expr, Expr)>,
/// Filters applied during join (non-equi conditions)
pub filter: Option<Expr>, |
Is your feature request related to a problem or challenge?
As of version 31.0.0,
datafusion
has a join on field which is used for equi-join conditions and a join filter field for non-equi-join conditions.Currently, the producer puts the non-equi-join conditions in the Substrait post_join_filter. However, we can also combine all conditions and put it in the expression field of the JoinRel in Substrait.
Please refer to the motivation in the following comment.
The motivation behind this change request is that this will let other DB systems decide what to do with the entire condition, as opposed to having to process them separately. Right now, if there is no equal condition, the producer will output justLiteral(True)
as the join expression, and put the rest of the condition in thepost_join_filter
. Having a redundantLiteral
expression adds unnecessary overhead of evaluating this condition. It also implies that you’re performing a cartesian product THEN a filter, as opposed to just a non-equi-join, which does not completely align with the original plan intent.Describe the solution you'd like
All valid join conditions to be encoded in the Substrait
JoinRel
's expression field.Describe alternatives you've considered
None.
The current approach works correctly semantically, but it can make downstream query execution inefficient.Additional context
No response
The text was updated successfully, but these errors were encountered: