-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize queries where lhs and rhs of predicate are equal #10444
optimize queries where lhs and rhs of predicate are equal #10444
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10444 +/- ##
============================================
+ Coverage 63.14% 70.31% +7.16%
- Complexity 5914 6112 +198
============================================
Files 2045 2072 +27
Lines 111202 112021 +819
Branches 16963 17068 +105
============================================
+ Hits 70217 78762 +8545
+ Misses 35818 27717 -8101
- Partials 5167 5542 +375
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 312 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, great job!
@@ -170,6 +170,8 @@ private void validateQueries() { | |||
for (String queryString : _queryWeightMap.keySet()) { | |||
try { | |||
PinotQuery pinotQuery = CalciteSqlParser.compileToPinotQuery(queryString); | |||
// TODO: should we catch and ignore any errors here. If we error on query optimization, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Ignoring the error is more robust, while failing the query can help catch the bug in the optimizer and prevent certain unexpected performance degradation. Currently optimize logic is applied in-place (there is no return value), so I personally prefer directly failing the query since the query might already be modified and messed up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, I was thinking the same about the fact that it might be only semi optimized when this fails. I've updated the comment to reflect this state
.../src/main/java/org/apache/pinot/core/query/optimizer/filter/FlattenAndOrFilterOptimizer.java
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/query/optimizer/filter/FlattenAndOrFilterOptimizer.java
Show resolved
Hide resolved
@@ -44,7 +45,7 @@ public class QueryOptimizer { | |||
// values to the proper format so that they can be properly parsed | |||
private static final List<FilterOptimizer> FILTER_OPTIMIZERS = | |||
Arrays.asList(new FlattenAndOrFilterOptimizer(), new MergeEqInFilterOptimizer(), new NumericalFilterOptimizer(), | |||
new TimePredicateFilterOptimizer(), new MergeRangeFilterOptimizer()); | |||
new TimePredicateFilterOptimizer(), new MergeRangeFilterOptimizer(), new IdenticalPredicateFilterOptimizer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we apply this optimizer in the end? If it doesn't rely on other optimizers, we can put it next to the flatten optimizer to avoid other optimizer to optimize on identical predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call. it actually caught another null point exception moving it earlier
*/ | ||
public abstract class BaseAndOrBooleanFilterOptimizer implements FilterOptimizer { | ||
|
||
protected static final Expression TRUE = RequestUtils.getLiteralExpression(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file doesn't follow the Pinot Style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good eye. I was working on a new laptop and hadn't set that up.
protected static final Expression FALSE = RequestUtils.getLiteralExpression(false); | ||
|
||
@Override | ||
public abstract Expression optimize(Expression filterExpression, @Nullable Schema schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) No need to override this API to an abstract method
} | ||
|
||
@Override | ||
protected boolean isAlwaysFalse(Expression operand) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to override this method? After the DFS, all the children should already be optimized
return expression; | ||
} | ||
|
||
protected boolean isAlwaysFalse(Expression operand) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't need to override this method (see comment below), we can change optimizeCurrent
into a util method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked the interface to be a little clearer. The base class handles the DFS, and users just implement the base case. Let me know if this looks better.
…o other PR comments
thank you! i see all checks passed. let me know if you have further comments, though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
} | ||
|
||
/** Change the expression value to boolean literal with given value. */ | ||
protected static void setExpressionToBoolean(Expression expression, boolean value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced in this PR, but let's remove this method since we should avoid mutating an expression. We can use the constant TRUE
and FALSE
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. There's still some more mutation in NumericalFilterOptimizer, but this gets rid of a big part
private boolean isAlwaysFalse(Expression operand) { | ||
return operand.equals(FALSE); | ||
} | ||
|
||
private boolean isAlwaysTrue(Expression operand) { | ||
return operand.equals(TRUE); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Slightly more readable if we just inline them or rename to isTrue()
and isFalse()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. it made more sense in the initial PR
case OR: | ||
case NOT: | ||
// Recursively traverse the expression tree to find an operator node that can be rewritten. | ||
operands.forEach(operand -> optimize(operand, schema)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use replaceAll()
here so that it still works when the optimize is not applied inplace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
switch (kind) { | ||
case EQUALS: | ||
if (hasIdenticalLhsAndRhs(filterExpression)) { | ||
setExpressionToBoolean(filterExpression, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly return TRUE
, same for the following
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, even better
FilterKind kind = FilterKind.valueOf(function.getOperator()); | ||
switch (kind) { | ||
case EQUALS: | ||
if (hasIdenticalLhsAndRhs(filterExpression)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Directly pass the operands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty, no need to recompute it all
return false; | ||
} | ||
List<Expression> children = function.getOperands(); | ||
boolean hasTwoChildren = children.size() == 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Is there any case EQ
or NEQ
can have other than 2 children? We can probably make a precondition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a precondition
would fail the query, no? even if it is possible, this function is really only supposed to optimize this one case.
|
||
/** | ||
* Pinot queries of the WHERE 1 != 1 AND "col1" = "col2" variety are rewritten as | ||
* 1-1 != 0 AND "col1"-"col2" = 0. Therefore, we check specifically for the case where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewrite is already happening in PredicateComparisonRewriter.updateFunctionExpression()
, so we might just compare the lhs and rhs there.
Since we already get this implementation, we can add a TODO here and revisit later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see that class until now. But even then, I think I slightly prefer this more as an optimization than a rewrite. But it's probably easier to do there before it gets converted just for us to convert it back.
This is a minor
performance
bugfix
and closes #10383WHERE 1=1
queries. These would fail because the filter expression had no function callWHERE 1=1
was no simplified, butWHERE col1>0 AND 1=1
was actually being simplified in theNumericalFilterOptimizer
. So I put that part in a separate class to be used more generally for future cases like thisIdenticalPredicateFilterOptimizer
class that convertsWHERE 1=1
orWHERE "colA"!="colA"
to TRUE/FALSE respectivelyI've added a bunch more test cases, and I've tested manually in the Quickstart app. This is my first contribution to the query parsing part of the code base, so I don't have a great sense what test coverage looks like. But I imagine between unit and integration tests, this should catch any glaring breaks?