-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: implement apply join #34546
sql: implement apply join #34546
Conversation
f99d38c
to
43ab74c
Compare
This is ready for a review, though still not ready to merge. The main thing that's missing at this point is that I need way more test cases, though I'm not sure the best way to generate them. |
Maybe we want to add some testing knobs to disable opt rules that remove ApplyJoins. In addition, we could add support for the LATERAL syntax. |
I thought LATERAL was a noise word. How would we plan differently in the presence of LATERAL? |
LATERAL allows you to write queries where the second relation refers to columns of the first, e.g.: SELECT * FROM ab JOIN LATERAL (SELECT x, y + b FROM xy) AS t ON a = x; We don't support |
(also see #24560) |
a906cba
to
6abc840
Compare
I've rebased on top of #35171 to get a fix for incorrect outer column computation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @asubiotto, @jordanlewis, and @RaduBerinde)
pkg/sql/apply_join.go, line 33 at r1 (raw file):
// applyJoinNode implements apply join: the execution component of correlated // subqueries. The node reads rows from the left planDataSource, and for each
[nit] correlated subqueries (which couldn't be decorrelated by the optimizer's transformations)
pkg/sql/apply_join.go, line 61 at r1 (raw file):
rightCols sqlbase.ResultColumns rightProps *physical.Required
rightProps could use a comment
pkg/sql/apply_join.go, line 64 at r1 (raw file):
right memo.RelExpr run struct {
Nice comments for the fields here!
pkg/sql/apply_join.go, line 150 at r1 (raw file):
if predMatched { a.run.leftRowFoundAMatch = true if a.joinType == sqlbase.JoinType_LEFT_ANTI {
[nit] anti and semi could be handled more similarly - the logic to emit the left row is inside different loops. We could just a.run.rightRows.Clear()
and break
for both here, and move the semi emitting logic below
pkg/sql/apply_join.go, line 160 at r1 (raw file):
a.run.rightRows.Clear(params.ctx) return true, nil }
[nit] inconsistent use of else
(we're using it above but not here).
pkg/sql/apply_join.go, line 166 at r1 (raw file):
} // Didn't match? Try with the next right-side row. continue
this unnecessary continue
is a little odd. I'd remove it and keep the comment, or move it to a if !predMatched { }
above
pkg/sql/apply_join.go, line 210 at r1 (raw file):
}) a.optimizer.Init(params.p.EvalContext())
[nit] Add some explanation about what's happening here at a high level - we're setting up an optimizer to replan the right-side
pkg/sql/apply_join.go, line 228 at r1 (raw file):
return false, err } plan := p.(*planTop)
What happens when the right side has subqueries?
And in particular - what happens when there is a subquery that depends on an outer column which becomes a regular subquery after this re-optimization?
Would be good to have tests like this.
pkg/sql/apply_join.go, line 230 at r1 (raw file):
plan := p.(*planTop) if err := func() error {
Can this be a separate method? (Next is pretty large)
pkg/sql/apply_join.go, line 268 at r1 (raw file):
// We've got fresh right rows. Continue along in the loop, which will deal // with joining the right plan's output with our left row. continue
[nit] unnecessary continue
pkg/sql/apply_join.go, line 276 at r1 (raw file):
} func (a *applyJoinNode) Close(ctx context.Context) {
Do we need to free the rightRows
container?
pkg/sql/opt/exec/factory.go, line 113 at r1 (raw file):
// // fakeRight is a pre-planned node that is the right side of the join with // all outer columns replaced by NULL.
Maybe mention what it's used for (logical/physical properties etc)
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 521 at r1 (raw file):
ReplaceVars(&f, rightExpr, rightExpr.RequiredPhysical(), fakeBindings) // Create a fake version of the right-side plan that contains NULL for all
[nit] move this comment above the fakeBindings
line
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 553 at r1 (raw file):
var leftBoundColMap opt.ColMap var bindErr error leftBoundCols.ForEach(func(k int) {
[nit] you can do for i, ok := leftBoundCols.Next(0); ok; i, ok = leftBoundCols.Next(i+1)
and then you can just return the error directly
This commit adds the implementation of the apply join operator, which is built by the optimizer in some cases for correlated subqueries. Apply join is like an ordinary join operator in that it joins the output of 2 other operators. The special thing about apply join is that its right side operator has outer columns (columns that don't yet have values), and at runtime, the values of each row from the left operator are used to fill in the outer columns of the right side operator. Once the outer columns of the right side operator are filled, apply join recursively re-plans the right side operator and runs it, joining the resultant rows with the left side row that was used to fill the outer columns. Apply join isn't distributable because it requires passing around optimizer memo and plan data structures, which are not marshallable to bytes that could be distributed at this time. Release note (sql change): many more correlated subqueries are now runnable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @asubiotto, @jordanlewis, and @RaduBerinde)
pkg/sql/apply_join.go, line 210 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] Add some explanation about what's happening here at a high level - we're setting up an optimizer to replan the right-side
Good idea, done.
pkg/sql/apply_join.go, line 228 at r1 (raw file):
Previously, RaduBerinde wrote…
What happens when the right side has subqueries?
And in particular - what happens when there is a subquery that depends on an outer column which becomes a regular subquery after this re-optimization?
Would be good to have tests like this.
As discussed offline, I've added an error case if there are non-zero subqueries as this seems to be an exceptional case that we don't except to happen.
pkg/sql/apply_join.go, line 230 at r1 (raw file):
Previously, RaduBerinde wrote…
Can this be a separate method? (Next is pretty large)
Done.
pkg/sql/apply_join.go, line 268 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] unnecessary
continue
Done.
pkg/sql/apply_join.go, line 276 at r1 (raw file):
Previously, RaduBerinde wrote…
Do we need to free the
rightRows
container?
That's done right here, isn't it? .Close frees it.
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 553 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] you can do
for i, ok := leftBoundCols.Next(0); ok; i, ok = leftBoundCols.Next(i+1)
and then you can just return the error directly
Good idea, done.
bors r+ |
Build failed |
TestVet OOM'd... bors r+ |
34546: sql: implement apply join r=jordanlewis a=jordanlewis This commit adds the implementation of the apply join operator, which is built by the optimizer in some cases for correlated subqueries. Apply join is like an ordinary join operator in that it joins the output of 2 other operators. The special thing about apply join is that its right side operator has outer columns (columns that don't yet have values), and at runtime, the values of each row from the left operator are used to fill in the outer columns of the right side operator. Once the outer columns of the right side operator are filled, apply join recursively re-plans the right side operator and runs it, joining the resultant rows with the left side row that was used to fill the outer columns. Apply join isn't distributable because it requires passing around optimizer memo and plan data structures, which are not marshallable to bytes that could be distributed at this time. Closes #32786. Closes #25678. Release note (sql change): many more correlated subqueries are now runnable. Co-authored-by: Jordan Lewis <[email protected]>
Build succeeded |
This commit adds the implementation of the apply join operator, which is
built by the optimizer in some cases for correlated subqueries.
Apply join is like an ordinary join operator in that it joins the output
of 2 other operators. The special thing about apply join is that its
right side operator has outer columns (columns that don't yet have
values), and at runtime, the values of each row from the left operator
are used to fill in the outer columns of the right side operator. Once
the outer columns of the right side operator are filled, apply join
recursively re-plans the right side operator and runs it, joining the
resultant rows with the left side row that was used to fill the outer
columns.
Apply join isn't distributable because it requires passing around
optimizer memo and plan data structures, which are not marshallable to
bytes that could be distributed at this time.
Closes #32786.
Closes #25678.
Release note (sql change): many more correlated subqueries are now
runnable.