Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for LEFT/RIGHT/FULL/INNER lateral join #390

Merged
merged 4 commits into from
Apr 1, 2019

Conversation

kasiafi
Copy link
Member

@kasiafi kasiafi commented Mar 5, 2019

relates to #9

@cla-bot cla-bot bot added the cla-signed label Mar 5, 2019
@kasiafi kasiafi force-pushed the Lateral branch 3 times, most recently from e5fb60e to 6437010 Compare March 6, 2019 13:27
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I have some initial comments and a question about filterExpressionTranslationMap.

{
PlanNode subqueryNode = subqueryPlan.getRoot();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting change is unrelated to the core of this commit, so please remove it or pull it out as a separate commit.

}

public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type)
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type, Expression filterCondition)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joins don't have "filter" conditions. Rather, it's a "join" condition or criteria. The distinction is subtle but important. A filter determines whether a row is preserved or removed, but the join condition determines whether the rows are considered as candidates to be joined. Depending on the type of join, rows that don't satisfy the join condition may be dropped or joined with a synthetic row containing nulls (e.g., for outer joins). So I'd rename this argument to just criteria.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of the naming bias.
However, JoinNode has got both members: criteria and filter.
I chose the name filter for LateralJoinNode member, because it is actually the counterpart of JoinNode's filter and it becomes JoinNode's filter later in the course of planning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Ok, let's leave it for now. We can clean this up in the future for both LateralJoinNode and JoinNode

@@ -499,7 +501,7 @@ private PlanBuilder createPlanBuilder(Node node)
private Set<Expression> extractOuterColumnReferences(PlanNode planNode)
{
// at this point all the column references are already rewritten to SymbolReference
// when reference expression is not rewritten that means it cannot be satisfied within given PlaNode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this fix to a separate commit, since it's unrelated to this change.

@@ -63,6 +85,7 @@
*/
private final List<Symbol> correlation;
private final Type type;
private final Expression filter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called criteria (see my other comment in this PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maintain it as Optional<Expression> like we maintain in JoinNode ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maintain it as Optional like we maintain in JoinNode ?

I would prefer the opposite. To remove Optional from JoinNode. To me it is nested optional-like structure, two levels where you need to check for emptiness.

.addAll(rightPlan.getFieldMappings())
.build();

PlanNode root = new JoinNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. Planning a join + lateral should not involve a join node. After all, we're modeling it as a specialized "apply + join" operation. Can you explain why you think this is needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a egg-chicken-egg problem :). We do the same for join. We create join node to rewrite expression to create join. As I understand it is required to create a TranslationMap, because it is needed by RelationPlan. However, PlanNode from RelationPlan is not used in TranslationMap. I think we should refactor that and remove a need of PlanNode for TranslationMap.

Also I think this code should belong SubqueriesPlanner. SubqueriesPlanner should do proper expression rewrites.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing you're right. It's just a dummy used to create a TranslationMap and not used in the resulting plan. It was inspired by RelationPlanner.visitJoin()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should refactor that and remove a need of PlanNode for TranslationMap.

Let's not make structural changes quite yet. I have work in progress to rewrite RelationPlanner and friends so we can split the AST from the IR entirely, and it touches all that you're talking about. @kasiafi's addition is fine, though -- I can adapt it easily since it's an incremental change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasiafi, I'd rename this field to dummy to make it clear it's inconsequential and add a comment explaining that it's not actually used in the plan but just for seeding the TranslationMap with schema and scope.

BTW, it might be sufficient to use a Values node for this instead of a Join node. If I recall correctly, all that matters is that the symbols are in the right positions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, apparently ValuesNode works fine and even with empty rows. That looks much better as a dummy, thanks.

@@ -546,6 +556,52 @@ private RelationPlan planLateralJoin(Join join, RelationPlan leftPlan, Lateral l
return new RelationPlan(planBuilder.getRoot(), analysis.getScope(join), outputSymbols);
}

private TranslationMap filterExpressionTranslationMap(RelationPlan leftPlan, RelationPlan rightPlan, PlanBuilder leftPlanBuilder, PlanBuilder rightPlanBuilder, Join join)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value in having this as a separate method. There's no reuse across various operations and it's tied to the specific implementation of planning join + lateral. I would inline it in the method that uses is and add some comments in the appropriate sections if there's any logical separation worth highlighting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be better to graphically separate it, because it constructs the JoinNode which is not a part of the resulting plan (as mentioned above). Inlining the construction could be misleading at the first view.
Surely inlining + comment would be OK.

return translationMap;
}

private Expression extractFilterExpression(Join joinNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'd inline this where it's being used. It's very specific to the planning of that type of join (i.e., join + lateral).

@@ -61,10 +64,23 @@ public Result apply(LateralJoinNode lateralJoinNode, Captures captures, Context
decorrelatedNode.getNode(),
ImmutableList.of(),
lateralJoinNode.getOutputSymbols(),
decorrelatedNode.getCorrelatedPredicates(),
conjunctFilters(decorrelatedNode.getCorrelatedPredicates(), lateralJoinNode.getFilter()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ExpressionUtils.combineConjuncts():

ExpressionUtils.combineConjuncts(
    decorrelatedNode.getCorrelatedPredicates().orElse(TRUE_LITERAL), 
    lateralJoinNode.getFilter());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe decorrelatedNode.getCorrelatedPredicates() should return TRUE_LITERAL instead of Optional.empty(). I think having nested optional structure is not needed, because first we need to check if value is empty() and if it equal to TRUE_LITERAL.

@@ -201,7 +201,6 @@ public String getJoinLabel()

public static Type typeConvert(Join.Type joinType)
{
// Omit SEMI join types because they must be inferred by the planner and not part of the SQL parse tree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

@@ -59,9 +59,11 @@
import io.prestosql.sql.tree.InPredicate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could be hard, but can divide this commit to first introduce the expression into LateralJoinNode and then we could have a commit which would add RIGHT and FULL to LateralJoinType.

"VALUES (1, NULL), (2, 3), (2, 1), (3, 2), (3, 1), (NULL, 5)");
assertQuery(
"SELECT * FROM (VALUES 1, 2) a(x) JOIN LATERAL(SELECT y FROM (VALUES 2, 3) b(y) WHERE y > x) c(z) ON z > 2*x",
"VALUES (1, 3)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these tests to io.prestosql.sql.query.TestSubqueries

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests exactly do you mean?

.addAll(rightPlan.getFieldMappings())
.build();

PlanNode root = new JoinNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a egg-chicken-egg problem :). We do the same for join. We create join node to rewrite expression to create join. As I understand it is required to create a TranslationMap, because it is needed by RelationPlan. However, PlanNode from RelationPlan is not used in TranslationMap. I think we should refactor that and remove a need of PlanNode for TranslationMap.

Also I think this code should belong SubqueriesPlanner. SubqueriesPlanner should do proper expression rewrites.

@@ -279,7 +281,7 @@ private PlanBuilder appendExistSubqueryApplyNode(PlanBuilder subPlan, ExistsPred

PlanNode subqueryPlanRoot = subqueryPlan.getRoot();
if (isAggregationWithEmptyGroupBy(subqueryPlanRoot)) {
subPlan.getTranslations().put(existsPredicate, BooleanLiteral.TRUE_LITERAL);
subPlan.getTranslations().put(existsPredicate, TRUE_LITERAL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separete commit

@@ -61,10 +64,23 @@ public Result apply(LateralJoinNode lateralJoinNode, Captures captures, Context
decorrelatedNode.getNode(),
ImmutableList.of(),
lateralJoinNode.getOutputSymbols(),
decorrelatedNode.getCorrelatedPredicates(),
conjunctFilters(decorrelatedNode.getCorrelatedPredicates(), lateralJoinNode.getFilter()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe decorrelatedNode.getCorrelatedPredicates() should return TRUE_LITERAL instead of Optional.empty(). I think having nested optional structure is not needed, because first we need to check if value is empty() and if it equal to TRUE_LITERAL.

return context.rewrite(node.getInput(), context.get());
if (intersection(ImmutableSet.copyOf(subquery.getOutputSymbols()), context.get()).isEmpty()) {
// remove unused lateral subquery of inner join
if (node.getType() == INNER && isScalar(subquery) && node.getFilter().equals(TRUE_LITERAL)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also check if subquery cardinality is 0, then we could optimize it to empty ValuesNode.

Also I think we should extract this to optimizer Rule and remove this code from here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a separate change after this PR, though.

// remove unused input nodes
if (intersection(ImmutableSet.copyOf(input.getOutputSymbols()), expectedCorrelationAndContextSymbols).isEmpty()) {
// remove unused input of inner join
if (node.getType() == INNER && isScalar(input) && node.getFilter().equals(TRUE_LITERAL)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding test cases for that.

@@ -453,8 +453,9 @@ public PlanNode visitLateralJoin(LateralJoinNode node, RewriteContext<Void> cont
PlanNode source = context.rewrite(node.getInput());
PlanNode subquery = context.rewrite(node.getSubquery());
List<Symbol> canonicalCorrelation = canonicalizeAndDistinct(node.getCorrelation());
Expression canonicalFilter = canonicalize(node.getFilter());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline

@@ -63,6 +85,7 @@
*/
private final List<Symbol> correlation;
private final Type type;
private final Expression filter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maintain it as Optional like we maintain in JoinNode ?

I would prefer the opposite. To remove Optional from JoinNode. To me it is nested optional-like structure, two levels where you need to check for emptiness.

public static Property<LateralJoinNode, Lookup, Expression> filter()
{
return property("filter", LateralJoinNode::getFilter);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also add pattern for empty critieria (equal to TRUE_LITERAL), as it looks to be used quite frequently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show me some example I could follow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to create something like this

        public static Property<LateralJoinNode, Lookup, Boolean> isEmpty()
        {
            return property("isEmpty", lateralJoinNode -> lateralJoinNode.getFilter().equals(TRUE_LITERAL);
        }

@kasiafi kasiafi force-pushed the Lateral branch 2 times, most recently from 4fc5d46 to ba142db Compare March 30, 2019 00:18
.addAll(rightPlan.getFieldMappings())
.build();

PlanNode root = new JoinNode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasiafi, I'd rename this field to dummy to make it clear it's inconsequential and add a comment explaining that it's not actually used in the plan but just for seeding the TranslationMap with schema and scope.

BTW, it might be sufficient to use a Values node for this instead of a Join node. If I recall correctly, all that matters is that the symbols are in the right positions.

}

public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type)
public PlanBuilder appendLateralJoin(PlanBuilder subPlan, PlanBuilder subqueryPlan, Query query, boolean correlationAllowed, LateralJoinNode.Type type, Expression filterCondition)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Ok, let's leave it for now. We can clean this up in the future for both LateralJoinNode and JoinNode

Optional.empty(),
Optional.empty(),
Optional.empty(),
Optional.empty()));
}

private Optional<Expression> joinFilter(Expression lateralJoinFilter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument could be renamed to just filter with no loss of clarity.

@@ -812,11 +817,25 @@ public PlanNode visitAssignUniqueId(AssignUniqueId node, RewriteContext<Set<Symb
@Override
public PlanNode visitLateralJoin(LateralJoinNode node, RewriteContext<Set<Symbol>> context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, context is something more complex than just the dependency symbols. It contains the dependency symbols, but it's truly a context object for the rewrite action.

return context.rewrite(node.getInput(), context.get());
if (intersection(ImmutableSet.copyOf(subquery.getOutputSymbols()), context.get()).isEmpty()) {
// remove unused lateral subquery of inner join
if (node.getType() == INNER && isScalar(subquery) && node.getFilter().equals(TRUE_LITERAL)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a separate change after this PR, though.

@kasiafi
Copy link
Member Author

kasiafi commented Apr 1, 2019

@martint I applied your comments. Is there anything else to be done?

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There's some cleanup do later for code this PR touched (move tests out of AbstractTestQueries, migrate out of PruneUnreferencedOutputs, etc), but it's pre-existing and unrelated to the core of this change.

@martint martint merged commit 79a70d1 into trinodb:master Apr 1, 2019
@martint
Copy link
Member

martint commented Apr 1, 2019

Merged. Thanks for your contribution, @kasiafi!

@martint martint mentioned this pull request Apr 1, 2019
6 tasks
@electrum electrum added this to the 307 milestone Apr 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants