Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Incorporate RewriteDisjunctivePredicate rule into SimplifyExpressions #13032

Conversation

eejbyfeldt
Copy link
Contributor

Which issue does this PR close?

Closes #13031

Rationale for this change

The rational for moving the logic in to the SimplifyExpression is that is allows the rule to better interact with other simplifications and also be applied to all logical nodes, not just filters as the current rule is.

The reason for changing the logic is to only only rewrite the Expr when the rule acutally applies. The current code will always change the current structure of the expression which makes our optimizer harded to predict and more sensitive to changes in the order the rules are applied.

The new code is also significantly shorter and at least to me easier to follow.

What changes are included in this PR?

Adding a rule in SimplifyExpressions that will extract common conjuntive predicates in disjuntive predicates and removal of the existing rule RewriteDisjuntivePredicate.

Are these changes tested?

Yes, existing and new tests.

Are there any user-facing changes?

Yes, the rule RewriteDisjuntivePredicate no longer exists.

This adds a rewrite rule that elliminates common factors in OR. This is
already implmented in RewriteDisjunctivePredicate but this
implementation is simpler and will apply in more cases.
@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Oct 21, 2024
@Dandandan
Copy link
Contributor

Looks like a nice refactor. The changed plan looks slightly less optimal (one expression repeated where it was executed only a single time before).

pub fn iter_conjunction_owned(expr: Expr) -> impl Iterator<Item = Expr> {
let mut stack = vec![expr];
std::iter::from_fn(move || {
while let Some(expr) = stack.pop() {
Copy link
Contributor

@alamb alamb Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a clever formulation of an iterator that does not require buffering all the conjuncts in a Vec

@@ -255,7 +254,6 @@ impl Optimizer {
// run it again after running the optimizations that potentially converted
// subqueries to joins
Arc::new(SimplifyExpressions::new()),
Arc::new(RewriteDisjunctivePredicate::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳 -- thank you @eejbyfeldt

@@ -199,18 +199,18 @@ physical_plan
# Surely only once but also conditionally evaluated subexpressions
query TT
EXPLAIN SELECT
(a = 1 OR random() = 0) AND (a = 1 OR random() = 1) AS c1,
(a = 2 AND random() = 0) OR (a = 2 AND random() = 1) AS c2,
(a = 1 OR random() = 0) AND (a = 2 OR random() = 1) AS c1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

I ran the test locally for the original and here it is

query TT
EXPLAIN SELECT
    (a = 1 OR random() = 0) AND (a = 1 OR random() = 1) AS c1,
    (a = 2 AND random() = 0) OR (a = 2 AND random() = 1) AS c2,
    CASE WHEN a + 3 = 0 THEN a + 3 + random() ELSE 0 END AS c3,
    CASE WHEN a + 4 = 0 THEN 0 ELSE a + 4 + random() END AS c4
FROM t1
----
logical_plan
01)Projection: (__common_expr_1 OR random() = Float64(0)) AND (__common_expr_1 OR random() = Float64(1)) AS c1, t1.a = Float64(2) AND (random() = Float64(0) OR random() = Float64(1)) AS c2, CASE WHEN __common_expr_2 = Float64(0) THEN __common_expr_2 + random() ELSE Float64(0) END AS c3, CASE WHEN __common_expr_3 = Float64(0) THEN Float64(0) ELSE __common_expr_3 + random() END AS c4
02)--Projection: t1.a = Float64(1) AS __common_expr_1, t1.a + Float64(3) AS __common_expr_2, t1.a + Float64(4) AS __common_expr_3, t1.a
03)----TableScan: t1 projection=[a]
physical_plan
01)ProjectionExec: expr=[(__common_expr_1@0 OR random() = 0) AND (__common_expr_1@0 OR random() = 1) as c1, a@3 = 2 AND (random() = 0 OR random() = 1) as c2, CASE WHEN __common_expr_2@1 = 0 THEN __common_expr_2@1 + random() ELSE 0 END as c3, CASE WHEN __common_expr_3@2 = 0 THEN 0 ELSE __common_expr_3@2 + random() END as c4]
02)--ProjectionExec: expr=[a@0 = 1 as __common_expr_1, a@0 + 3 as __common_expr_2, a@0 + 4 as __common_expr_3, a@0 as a]
03)----MemoryExec: partitions=1, partition_sizes=[0]

@peter-toth does this look right to you?

Copy link
Contributor

@peter-toth peter-toth Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this change looks good to me to avoid simplification in this CSE test.
(Actually the same trick is applied in the previous test called # Surely only once but also conditionally evaluated expressions a bit above.)

Out of curiosity, volatile common expressions don't get simplified, do they?
E.g. (a = 1 AND random() = 0) OR (a = 2 AND random() = 0) is not rewritten to random() = 0 AND (a = 1 OR a = 2)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Actually the same trick is applied in the previous test called # Surely only once but also conditionally evaluated expressions a bit above.)

That was also changed by me in #12746 for basically the same reason for another rule.

Out of curiosity, volatile common expressions don't get simplified, do they?
E.g. (a = 1 AND random() = 0) OR (a = 2 AND random() = 0) is not rewritten to random() = 0 AND (a = 1 OR a = 2)?

It would end up being rewritten. I agree that does not sound correct. It should probably be addressed separatly as the issue applies to many of the other simplification rules. Created #13060

@alamb
Copy link
Contributor

alamb commented Oct 21, 2024

(thank you @eejbyfeldt and @Dandandan )

@alamb alamb merged commit 818ce3f into apache:main Oct 22, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 22, 2024

🚀 -- thank you @eejbyfeldt @Dandandan and @peter-toth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RewriteDisjuntivePredicate should only change Exprs that it simplifies
4 participants