-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for LEFT/RIGHT/FULL/INNER lateral join #390
Conversation
e5fb60e
to
6437010
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! I have some initial comments and a question about filterExpressionTranslationMap.
{ | ||
PlanNode subqueryNode = subqueryPlan.getRoot(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formatting change is unrelated to the core of this commit, so please remove it or pull it out as a separate commit.
} | ||
|
||
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type) | ||
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type, Expression filterCondition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joins don't have "filter" conditions. Rather, it's a "join" condition or criteria. The distinction is subtle but important. A filter determines whether a row is preserved or removed, but the join condition determines whether the rows are considered as candidates to be joined. Depending on the type of join, rows that don't satisfy the join condition may be dropped or joined with a synthetic row containing nulls (e.g., for outer joins). So I'd rename this argument to just criteria
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware of the naming bias.
However, JoinNode
has got both members: criteria
and filter
.
I chose the name filter
for LateralJoinNode
member, because it is actually the counterpart of JoinNode
's filter
and it becomes JoinNode
's filter
later in the course of planning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Ok, let's leave it for now. We can clean this up in the future for both LateralJoinNode
and JoinNode
@@ -499,7 +501,7 @@ private PlanBuilder createPlanBuilder(Node node) | |||
private Set<Expression> extractOuterColumnReferences(PlanNode planNode) | |||
{ | |||
// at this point all the column references are already rewritten to SymbolReference | |||
// when reference expression is not rewritten that means it cannot be satisfied within given PlaNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this fix to a separate commit, since it's unrelated to this change.
@@ -63,6 +85,7 @@ | |||
*/ | |||
private final List<Symbol> correlation; | |||
private final Type type; | |||
private final Expression filter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be called criteria
(see my other comment in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maintain it as Optional<Expression>
like we maintain in JoinNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maintain it as Optional like we maintain in JoinNode ?
I would prefer the opposite. To remove Optional
from JoinNode
. To me it is nested optional-like structure, two levels where you need to check for emptiness.
.addAll(rightPlan.getFieldMappings()) | ||
.build(); | ||
|
||
PlanNode root = new JoinNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. Planning a join + lateral should not involve a join node. After all, we're modeling it as a specialized "apply + join" operation. Can you explain why you think this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a egg-chicken-egg problem :). We do the same for join. We create join node to rewrite expression to create join. As I understand it is required to create a TranslationMap
, because it is needed by RelationPlan
. However, PlanNode
from RelationPlan
is not used in TranslationMap
. I think we should refactor that and remove a need of PlanNode
for TranslationMap
.
Also I think this code should belong SubqueriesPlanner
. SubqueriesPlanner
should do proper expression rewrites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing you're right. It's just a dummy used to create a TranslationMap
and not used in the resulting plan. It was inspired by RelationPlanner.visitJoin()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should refactor that and remove a need of PlanNode for TranslationMap.
Let's not make structural changes quite yet. I have work in progress to rewrite RelationPlanner and friends so we can split the AST from the IR entirely, and it touches all that you're talking about. @kasiafi's addition is fine, though -- I can adapt it easily since it's an incremental change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kasiafi, I'd rename this field to dummy
to make it clear it's inconsequential and add a comment explaining that it's not actually used in the plan but just for seeding the TranslationMap with schema and scope.
BTW, it might be sufficient to use a Values node for this instead of a Join node. If I recall correctly, all that matters is that the symbols are in the right positions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, apparently ValuesNode
works fine and even with empty rows
. That looks much better as a dummy, thanks.
@@ -546,6 +556,52 @@ private RelationPlan planLateralJoin(Join join, RelationPlan leftPlan, Lateral l | |||
return new RelationPlan(planBuilder.getRoot(), analysis.getScope(join), outputSymbols); | |||
} | |||
|
|||
private TranslationMap filterExpressionTranslationMap(RelationPlan leftPlan, RelationPlan rightPlan, PlanBuilder leftPlanBuilder, PlanBuilder rightPlanBuilder, Join join) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much value in having this as a separate method. There's no reuse across various operations and it's tied to the specific implementation of planning join + lateral. I would inline it in the method that uses is and add some comments in the appropriate sections if there's any logical separation worth highlighting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would be better to graphically separate it, because it constructs the JoinNode
which is not a part of the resulting plan (as mentioned above). Inlining the construction could be misleading at the first view.
Surely inlining + comment would be OK.
return translationMap; | ||
} | ||
|
||
private Expression extractFilterExpression(Join joinNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I'd inline this where it's being used. It's very specific to the planning of that type of join (i.e., join + lateral).
@@ -61,10 +64,23 @@ public Result apply(LateralJoinNode lateralJoinNode, Captures captures, Context | |||
decorrelatedNode.getNode(), | |||
ImmutableList.of(), | |||
lateralJoinNode.getOutputSymbols(), | |||
decorrelatedNode.getCorrelatedPredicates(), | |||
conjunctFilters(decorrelatedNode.getCorrelatedPredicates(), lateralJoinNode.getFilter()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ExpressionUtils.combineConjuncts()
:
ExpressionUtils.combineConjuncts(
decorrelatedNode.getCorrelatedPredicates().orElse(TRUE_LITERAL),
lateralJoinNode.getFilter());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe decorrelatedNode.getCorrelatedPredicates()
should return TRUE_LITERAL
instead of Optional.empty()
. I think having nested optional structure is not needed, because first we need to check if value is empty() and if it equal to TRUE_LITERAL
.
@@ -201,7 +201,6 @@ public String getJoinLabel() | |||
|
|||
public static Type typeConvert(Join.Type joinType) | |||
{ | |||
// Omit SEMI join types because they must be inferred by the planner and not part of the SQL parse tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
@@ -59,9 +59,11 @@ | |||
import io.prestosql.sql.tree.InPredicate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could be hard, but can divide this commit to first introduce the expression into LateralJoinNode
and then we could have a commit which would add RIGHT
and FULL
to LateralJoinType
.
"VALUES (1, NULL), (2, 3), (2, 1), (3, 2), (3, 1), (NULL, 5)"); | ||
assertQuery( | ||
"SELECT * FROM (VALUES 1, 2) a(x) JOIN LATERAL(SELECT y FROM (VALUES 2, 3) b(y) WHERE y > x) c(z) ON z > 2*x", | ||
"VALUES (1, 3)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these tests to io.prestosql.sql.query.TestSubqueries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests exactly do you mean?
.addAll(rightPlan.getFieldMappings()) | ||
.build(); | ||
|
||
PlanNode root = new JoinNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a egg-chicken-egg problem :). We do the same for join. We create join node to rewrite expression to create join. As I understand it is required to create a TranslationMap
, because it is needed by RelationPlan
. However, PlanNode
from RelationPlan
is not used in TranslationMap
. I think we should refactor that and remove a need of PlanNode
for TranslationMap
.
Also I think this code should belong SubqueriesPlanner
. SubqueriesPlanner
should do proper expression rewrites.
@@ -279,7 +281,7 @@ private PlanBuilder appendExistSubqueryApplyNode(PlanBuilder subPlan, ExistsPred | |||
|
|||
PlanNode subqueryPlanRoot = subqueryPlan.getRoot(); | |||
if (isAggregationWithEmptyGroupBy(subqueryPlanRoot)) { | |||
subPlan.getTranslations().put(existsPredicate, BooleanLiteral.TRUE_LITERAL); | |||
subPlan.getTranslations().put(existsPredicate, TRUE_LITERAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separete commit
@@ -61,10 +64,23 @@ public Result apply(LateralJoinNode lateralJoinNode, Captures captures, Context | |||
decorrelatedNode.getNode(), | |||
ImmutableList.of(), | |||
lateralJoinNode.getOutputSymbols(), | |||
decorrelatedNode.getCorrelatedPredicates(), | |||
conjunctFilters(decorrelatedNode.getCorrelatedPredicates(), lateralJoinNode.getFilter()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe decorrelatedNode.getCorrelatedPredicates()
should return TRUE_LITERAL
instead of Optional.empty()
. I think having nested optional structure is not needed, because first we need to check if value is empty() and if it equal to TRUE_LITERAL
.
return context.rewrite(node.getInput(), context.get()); | ||
if (intersection(ImmutableSet.copyOf(subquery.getOutputSymbols()), context.get()).isEmpty()) { | ||
// remove unused lateral subquery of inner join | ||
if (node.getType() == INNER && isScalar(subquery) && node.getFilter().equals(TRUE_LITERAL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also check if subquery cardinality is 0, then we could optimize it to empty ValuesNode
.
Also I think we should extract this to optimizer Rule and remove this code from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be a separate change after this PR, though.
// remove unused input nodes | ||
if (intersection(ImmutableSet.copyOf(input.getOutputSymbols()), expectedCorrelationAndContextSymbols).isEmpty()) { | ||
// remove unused input of inner join | ||
if (node.getType() == INNER && isScalar(input) && node.getFilter().equals(TRUE_LITERAL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding test cases for that.
@@ -453,8 +453,9 @@ public PlanNode visitLateralJoin(LateralJoinNode node, RewriteContext<Void> cont | |||
PlanNode source = context.rewrite(node.getInput()); | |||
PlanNode subquery = context.rewrite(node.getSubquery()); | |||
List<Symbol> canonicalCorrelation = canonicalizeAndDistinct(node.getCorrelation()); | |||
Expression canonicalFilter = canonicalize(node.getFilter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline
@@ -63,6 +85,7 @@ | |||
*/ | |||
private final List<Symbol> correlation; | |||
private final Type type; | |||
private final Expression filter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maintain it as Optional like we maintain in JoinNode ?
I would prefer the opposite. To remove Optional
from JoinNode
. To me it is nested optional-like structure, two levels where you need to check for emptiness.
public static Property<LateralJoinNode, Lookup, Expression> filter() | ||
{ | ||
return property("filter", LateralJoinNode::getFilter); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also add pattern for empty critieria (equal to TRUE_LITERAL
), as it looks to be used quite frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show me some example I could follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we have to create something like this
public static Property<LateralJoinNode, Lookup, Boolean> isEmpty()
{
return property("isEmpty", lateralJoinNode -> lateralJoinNode.getFilter().equals(TRUE_LITERAL);
}
4fc5d46
to
ba142db
Compare
.addAll(rightPlan.getFieldMappings()) | ||
.build(); | ||
|
||
PlanNode root = new JoinNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kasiafi, I'd rename this field to dummy
to make it clear it's inconsequential and add a comment explaining that it's not actually used in the plan but just for seeding the TranslationMap with schema and scope.
BTW, it might be sufficient to use a Values node for this instead of a Join node. If I recall correctly, all that matters is that the symbols are in the right positions.
} | ||
|
||
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type) | ||
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type, Expression filterCondition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Ok, let's leave it for now. We can clean this up in the future for both LateralJoinNode
and JoinNode
Optional.empty(), | ||
Optional.empty(), | ||
Optional.empty(), | ||
Optional.empty())); | ||
} | ||
|
||
private Optional<Expression> joinFilter(Expression lateralJoinFilter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument could be renamed to just filter
with no loss of clarity.
@@ -812,11 +817,25 @@ public PlanNode visitAssignUniqueId(AssignUniqueId node, RewriteContext<Set<Symb | |||
@Override | |||
public PlanNode visitLateralJoin(LateralJoinNode node, RewriteContext<Set<Symbol>> context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, context
is something more complex than just the dependency symbols. It contains the dependency symbols, but it's truly a context object for the rewrite action.
return context.rewrite(node.getInput(), context.get()); | ||
if (intersection(ImmutableSet.copyOf(subquery.getOutputSymbols()), context.get()).isEmpty()) { | ||
// remove unused lateral subquery of inner join | ||
if (node.getType() == INNER && isScalar(subquery) && node.getFilter().equals(TRUE_LITERAL)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be a separate change after this PR, though.
@martint I applied your comments. Is there anything else to be done? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. There's some cleanup do later for code this PR touched (move tests out of AbstractTestQueries, migrate out of PruneUnreferencedOutputs, etc), but it's pre-existing and unrelated to the core of this change.
Merged. Thanks for your contribution, @kasiafi! |
relates to #9