Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize OR expressions referencing the same symbol as IN #2698

Closed

Conversation

Praveen2112
Copy link
Member

@Praveen2112 Praveen2112 commented Jan 31, 2020

Supersedes #2374

Fixes #1118

@cla-bot cla-bot bot added the cla-signed label Jan 31, 2020
@Praveen2112 Praveen2112 force-pushed the or_expression_normalizer branch 7 times, most recently from 8e3f40f to 53e09e8 Compare February 2, 2020 03:05
private OrExpressionNormalizer()
{}

public static Expression rewrite(Metadata metadata, Expression expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/rewrite/normalizeOrExpressions


import static io.prestosql.sql.planner.OrExpressionNormalizer.rewrite;

public class NormalizeOrExpressions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating this class maybe you could introduce a factory method ExpressionRewriteRuleSet.of(OrExpressionNormalizer::rewrite)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

@Test
public void testSimpleOrExpressions()
{
assertExpression("x = 0 or x = 1 or x = 2 or x = 3", "x in (0, 1, 2, 3)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would format this as:

assertExpression(
    "x = 0 or x = 1 or x = 2 or x = 3", 
    "x in (0, 1, 2, 3)");

it is much easier to compare expressions that way

public void testSimpleOrExpressions()
{
assertExpression("x = 0 or x = 1 or x = 2 or x = 3", "x in (0, 1, 2, 3)");
assertExpression("(x = 0 or x = 1) or (x = 2 or x = 3)", "x in (0, 1, 2, 3)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capitalize OR and AND

assertExpression("a = 0 or a = 1 or x = 2 or x = 3", "a in (0, 1) or x in (2, 3)");
assertExpression("(a = 0 or x = 1) or (a = 2 or x = 3)", "a in (0, 2) or x in (1, 3)");
assertExpression("a = 0 or (a = 1 or (x = 2 or x = 3))", "a in (0, 1) or x in (2, 3)");
assertExpression("(a = 0 or a = 1) and (x = 2 or x = 3)", "a in (0, 1) and x in (2, 3)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add tests with AND and OR mixed in single expression?

assertExpression("(a = 0 or x = 1) or (a = 2 or x = 3)", "a in (0, 2) or x in (1, 3)");
assertExpression("a = 0 or (a = 1 or (x = 2 or x = 3))", "a in (0, 1) or x in (2, 3)");
assertExpression("(a = 0 or a = 1) and (x = 2 or x = 3)", "a in (0, 1) and x in (2, 3)");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a test with x = a or x = b or x IN (2,3,4) or x IN (SELECT 1)

Also can you please add such tests with actual query using tpch connector to make sure that results are correct.

@kasiafi
Copy link
Member

kasiafi commented Feb 3, 2020

Some general comments:

  1. the overriden visitor methods aren’t recursive. If you hit a ComparisonExpression or InPredicate or AND-expression, you don’t explore the subexpressions.
    This could be fixed by returning treeRewriter.defaultRewrite() or returning null, which would also trigger default rewrite instead of returning the original expression.

  2. please add tests for nested rewrite.

@kasiafi
Copy link
Member

kasiafi commented Feb 3, 2020

  1. How do you take care of a case when the normalized expression is equal to the original expression? I’m not sure, but if you return the rewritten equivalent instead of the original instance, it may cause optimizer loop.
    Please add a query test with something like x IN (1, 2) OR y IN (3, 4).

@kasiafi
Copy link
Member

kasiafi commented Feb 3, 2020

  1. This change could be extended to optimize a tree of nested OR-expressions. To achieve that, you should collect the context over the whole OR-tree and finally rewrite. You could check the class EliminateCrossJoins for inspiration.

@Praveen2112 Praveen2112 force-pushed the or_expression_normalizer branch from 53e09e8 to a9cc6a5 Compare February 4, 2020 06:34

public Visitor(Metadata metadata)
{
this.metadata = metadata;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNonNull

}

@Override
public Expression rewriteExpression(Expression expression, Optional<Context> context, ExpressionTreeRewriter<Optional<Context>> treeRewriter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this method to the bottom (or to the top), after (or before) any other more specialized method


import static io.prestosql.sql.planner.OrExpressionNormalizer.rewrite;

public class NormalizeOrExpressions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

ComparisonExpression rewrittenExpression = treeRewriter.defaultRewrite(comparisonExpression, Optional.empty());
if (rewrittenExpression.getOperator().equals(EQUAL) && context.isPresent()) {
context.get().addMapping(comparisonExpression.getLeft(), comparisonExpression.getRight());
return FALSE_LITERAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this return FALSE_LITERAL (and any other)


public static Expression normalizeOrExpressions(Metadata metadata, Expression expression)
{
return ExpressionTreeRewriter.rewriteWith(new Visitor(metadata), expression, Optional.empty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it is a bit misleading that rewrite sometimes does not rewrite but only collect information in context. Maybe you could do it in two phases. For every logical binary expression you could collect with simple visitor (not rewriter) expressions. So rewriting visitor would implement only rewriteLogicalBinaryExpression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
public NormalizeOrExpressions(Metadata metadata)
{
super(((expression, context) -> normalizeOrExpressions(metadata, expression)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this rewrite should be part of SimplifyExpressions

@sopel39
Copy link
Member

sopel39 commented Oct 31, 2020

@Praveen2112 what's status of this?

@Praveen2112
Copy link
Member Author

I'm working on a revamp for the same PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Normalize OR expressions referencing the same symbol as IN
5 participants