-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support null safe equals in ExtractEquijoinPredicate #12458
feat: Support null safe equals in ExtractEquijoinPredicate #12458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @eejbyfeldt, this looks like a reasonable change.
I've left some comments, and it's probably also worth to check other logical optimizer rules, to ensure that if they somehow modify joins, it won't result in incorrect operator.
if join.filter.is_none() { | ||
return Ok(Transformed::no(LogicalPlan::Join(join))); | ||
} | ||
let expr = join.filter.as_ref().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will let-else work here to avoid unwrap?
) -> Result<(Vec<EquijoinPredicate>, Option<Expr>)> { | ||
let exprs = split_conjunction_owned(filter); | ||
let exprs = split_conjunction(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: this part (along with cloning exprs) in else
s of this function was changed in #10165 as a part of "reducing expr cloning in optimizer" initiative, so, probably, in this place it's preferred to avoid cloning, unless it's required.
@@ -96,6 +96,7 @@ impl OptimizerRule for EliminateCrossJoin { | |||
filter.input.as_ref(), | |||
LogicalPlan::Join(Join { | |||
join_type: JoinType::Inner, | |||
null_equals_null: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule also has it's own logic for identifying join keys in extract_possible_join_keys
function. Shouldn't is not distinct from
operator be supported there (perhaps by reusing logic from extract_equijoin_predicated_rule
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, seems like we should probably unify that logic before making this change.
@@ -96,6 +96,7 @@ impl OptimizerRule for EliminateCrossJoin { | |||
filter.input.as_ref(), | |||
LogicalPlan::Join(Join { | |||
join_type: JoinType::Inner, | |||
null_equals_null: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, not sure that this rule shouldn't have any interactions with null_equals_null = true
joins -- what is the current behavior of such queries and won't this additional filter change it drastically?
|
||
# IS NOT DISTRINCT can be transformed into equijoin | ||
query TT | ||
EXPLAIN SELECT t1_id, t1_int, t2_int FROM t1 JOIN t2 ON t1_id IS NOT DISTINCT from t2_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be useful to also have the following test cases
- demonstrating that equality predicates have priority over "is not distinct from" when picking ON conditions from the filter
- how these joins will work with
eliminate_cross_join
rule (which if I'm not mistaken also works for filter over inner joins event with ON conditions) - how these join will be treated by
push_down_filter
-- in some cases it seems to be able to push filters into ON conditions -- if so, resulting query won't be correct in case of null_equals_null joins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions for:
how these join will be treated by push_down_filter -- in some cases it seems to be able to push filters into ON conditions -- if so, resulting query won't be correct in case of null_equals_null joins
I thought these would be pushed into the filter
and not the on
(https://github.com/eejbyfeldt/datafusion/blob/1d4cf53169fd9a611a7e24e9ac4880d3e24b131e/datafusion/expr/src/logical_plan/plan.rs#L2965-L2968) and therefore still be correct.
Converted to draft until I have time to address the feedback and look more into if the requires changes in other rules. |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #12299.
Rationale for this change
Allow more joins to be executed efficiently.
What changes are included in this PR?
Previously ExtractEquijoinPredicate did not inspect the value of
nulls_equals_null
for the join before extracting equi join predicates, which was likely a possible source of bugs. The new code will respectnull_equals_null
if set (and there is a equi join predicate) and if not it will first try to extraOperator::Eq
predicates and if none is found it will try to extractOperator::IsNotDistinctFrom
equi predicates.Prev
Are these changes tested?
Yes, using existing and some new test cases.
Are there any user-facing changes?
For queries using
IS NOT DISTINCT
might have improved query plans.