-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Incorporate RewriteDisjunctivePredicate rule into SimplifyExpressions #13032
refactor: Incorporate RewriteDisjunctivePredicate rule into SimplifyExpressions #13032
Conversation
This adds a rewrite rule that elliminates common factors in OR. This is already implmented in RewriteDisjunctivePredicate but this implementation is simpler and will apply in more cases.
Looks like a nice refactor. The changed plan looks slightly less optimal (one expression repeated where it was executed only a single time before). |
pub fn iter_conjunction_owned(expr: Expr) -> impl Iterator<Item = Expr> { | ||
let mut stack = vec![expr]; | ||
std::iter::from_fn(move || { | ||
while let Some(expr) = stack.pop() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a clever formulation of an iterator that does not require buffering all the conjuncts in a Vec
@@ -255,7 +254,6 @@ impl Optimizer { | |||
// run it again after running the optimizations that potentially converted | |||
// subqueries to joins | |||
Arc::new(SimplifyExpressions::new()), | |||
Arc::new(RewriteDisjunctivePredicate::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳 -- thank you @eejbyfeldt
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
@@ -199,18 +199,18 @@ physical_plan | |||
# Surely only once but also conditionally evaluated subexpressions | |||
query TT | |||
EXPLAIN SELECT | |||
(a = 1 OR random() = 0) AND (a = 1 OR random() = 1) AS c1, | |||
(a = 2 AND random() = 0) OR (a = 2 AND random() = 1) AS c2, | |||
(a = 1 OR random() = 0) AND (a = 2 OR random() = 1) AS c1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
I ran the test locally for the original and here it is
query TT
EXPLAIN SELECT
(a = 1 OR random() = 0) AND (a = 1 OR random() = 1) AS c1,
(a = 2 AND random() = 0) OR (a = 2 AND random() = 1) AS c2,
CASE WHEN a + 3 = 0 THEN a + 3 + random() ELSE 0 END AS c3,
CASE WHEN a + 4 = 0 THEN 0 ELSE a + 4 + random() END AS c4
FROM t1
----
logical_plan
01)Projection: (__common_expr_1 OR random() = Float64(0)) AND (__common_expr_1 OR random() = Float64(1)) AS c1, t1.a = Float64(2) AND (random() = Float64(0) OR random() = Float64(1)) AS c2, CASE WHEN __common_expr_2 = Float64(0) THEN __common_expr_2 + random() ELSE Float64(0) END AS c3, CASE WHEN __common_expr_3 = Float64(0) THEN Float64(0) ELSE __common_expr_3 + random() END AS c4
02)--Projection: t1.a = Float64(1) AS __common_expr_1, t1.a + Float64(3) AS __common_expr_2, t1.a + Float64(4) AS __common_expr_3, t1.a
03)----TableScan: t1 projection=[a]
physical_plan
01)ProjectionExec: expr=[(__common_expr_1@0 OR random() = 0) AND (__common_expr_1@0 OR random() = 1) as c1, a@3 = 2 AND (random() = 0 OR random() = 1) as c2, CASE WHEN __common_expr_2@1 = 0 THEN __common_expr_2@1 + random() ELSE 0 END as c3, CASE WHEN __common_expr_3@2 = 0 THEN 0 ELSE __common_expr_3@2 + random() END as c4]
02)--ProjectionExec: expr=[a@0 = 1 as __common_expr_1, a@0 + 3 as __common_expr_2, a@0 + 4 as __common_expr_3, a@0 as a]
03)----MemoryExec: partitions=1, partition_sizes=[0]
@peter-toth does this look right to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this change looks good to me to avoid simplification in this CSE test.
(Actually the same trick is applied in the previous test called # Surely only once but also conditionally evaluated expressions
a bit above.)
Out of curiosity, volatile common expressions don't get simplified, do they?
E.g. (a = 1 AND random() = 0) OR (a = 2 AND random() = 0)
is not rewritten to random() = 0 AND (a = 1 OR a = 2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Actually the same trick is applied in the previous test called # Surely only once but also conditionally evaluated expressions a bit above.)
That was also changed by me in #12746 for basically the same reason for another rule.
Out of curiosity, volatile common expressions don't get simplified, do they?
E.g.(a = 1 AND random() = 0) OR (a = 2 AND random() = 0)
is not rewritten torandom() = 0 AND (a = 1 OR a = 2)
?
It would end up being rewritten. I agree that does not sound correct. It should probably be addressed separatly as the issue applies to many of the other simplification rules. Created #13060
(thank you @eejbyfeldt and @Dandandan ) |
Co-authored-by: Andrew Lamb <[email protected]>
🚀 -- thank you @eejbyfeldt @Dandandan and @peter-toth |
Which issue does this PR close?
Closes #13031
Rationale for this change
The rational for moving the logic in to the SimplifyExpression is that is allows the rule to better interact with other simplifications and also be applied to all logical nodes, not just filters as the current rule is.
The reason for changing the logic is to only only rewrite the Expr when the rule acutally applies. The current code will always change the current structure of the expression which makes our optimizer harded to predict and more sensitive to changes in the order the rules are applied.
The new code is also significantly shorter and at least to me easier to follow.
What changes are included in this PR?
Adding a rule in SimplifyExpressions that will extract common conjuntive predicates in disjuntive predicates and removal of the existing rule RewriteDisjuntivePredicate.
Are these changes tested?
Yes, existing and new tests.
Are there any user-facing changes?
Yes, the rule RewriteDisjuntivePredicate no longer exists.