Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement apply join #34546

Merged
merged 1 commit into from
Mar 1, 2019
Merged

sql: implement apply join #34546

merged 1 commit into from
Mar 1, 2019

Conversation

jordanlewis
Copy link
Member

@jordanlewis jordanlewis commented Feb 4, 2019

This commit adds the implementation of the apply join operator, which is
built by the optimizer in some cases for correlated subqueries.

Apply join is like an ordinary join operator in that it joins the output
of 2 other operators. The special thing about apply join is that its
right side operator has outer columns (columns that don't yet have
values), and at runtime, the values of each row from the left operator
are used to fill in the outer columns of the right side operator. Once
the outer columns of the right side operator are filled, apply join
recursively re-plans the right side operator and runs it, joining the
resultant rows with the left side row that was used to fill the outer
columns.

Apply join isn't distributable because it requires passing around
optimizer memo and plan data structures, which are not marshallable to
bytes that could be distributed at this time.

Closes #32786.
Closes #25678.

Release note (sql change): many more correlated subqueries are now
runnable.

@jordanlewis jordanlewis requested review from andy-kimball and a team February 4, 2019 21:46
@jordanlewis jordanlewis requested a review from a team as a code owner February 4, 2019 21:46
@jordanlewis jordanlewis requested a review from a team February 4, 2019 21:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis jordanlewis requested a review from a team February 12, 2019 17:30
@jordanlewis jordanlewis force-pushed the aj branch 6 times, most recently from f99d38c to 43ab74c Compare February 17, 2019 21:54
@jordanlewis jordanlewis changed the title sql: wip: apply join sql: implement apply join Feb 17, 2019
@jordanlewis
Copy link
Member Author

This is ready for a review, though still not ready to merge.

The main thing that's missing at this point is that I need way more test cases, though I'm not sure the best way to generate them.

@RaduBerinde
Copy link
Member

Maybe we want to add some testing knobs to disable opt rules that remove ApplyJoins. In addition, we could add support for the LATERAL syntax.

@jordanlewis
Copy link
Member Author

I thought LATERAL was a noise word. How would we plan differently in the presence of LATERAL?

@RaduBerinde
Copy link
Member

LATERAL allows you to write queries where the second relation refers to columns of the first, e.g.:

SELECT * FROM ab JOIN LATERAL (SELECT x, y + b FROM xy) AS t ON a = x;

We don't support LATERAL and without it you get an error about column b. This would be built as a JoinApply rather than a Join. I believe it would be relatively easy to support.

@RaduBerinde
Copy link
Member

(also see #24560)

@jordanlewis jordanlewis force-pushed the aj branch 4 times, most recently from a906cba to 6abc840 Compare February 25, 2019 15:12
@jordanlewis
Copy link
Member Author

I've rebased on top of #35171 to get a fix for incorrect outer column computation.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @asubiotto, @jordanlewis, and @RaduBerinde)


pkg/sql/apply_join.go, line 33 at r1 (raw file):

// applyJoinNode implements apply join: the execution component of correlated
// subqueries. The node reads rows from the left planDataSource, and for each

[nit] correlated subqueries (which couldn't be decorrelated by the optimizer's transformations)


pkg/sql/apply_join.go, line 61 at r1 (raw file):

	rightCols  sqlbase.ResultColumns
	rightProps *physical.Required

rightProps could use a comment


pkg/sql/apply_join.go, line 64 at r1 (raw file):

	right      memo.RelExpr

	run struct {

Nice comments for the fields here!


pkg/sql/apply_join.go, line 150 at r1 (raw file):

			if predMatched {
				a.run.leftRowFoundAMatch = true
				if a.joinType == sqlbase.JoinType_LEFT_ANTI {

[nit] anti and semi could be handled more similarly - the logic to emit the left row is inside different loops. We could just a.run.rightRows.Clear() and break for both here, and move the semi emitting logic below


pkg/sql/apply_join.go, line 160 at r1 (raw file):

					a.run.rightRows.Clear(params.ctx)
					return true, nil
				}

[nit] inconsistent use of else (we're using it above but not here).


pkg/sql/apply_join.go, line 166 at r1 (raw file):

			}
			// Didn't match? Try with the next right-side row.
			continue

this unnecessary continue is a little odd. I'd remove it and keep the comment, or move it to a if !predMatched { } above


pkg/sql/apply_join.go, line 210 at r1 (raw file):

		})

		a.optimizer.Init(params.p.EvalContext())

[nit] Add some explanation about what's happening here at a high level - we're setting up an optimizer to replan the right-side


pkg/sql/apply_join.go, line 228 at r1 (raw file):

			return false, err
		}
		plan := p.(*planTop)

What happens when the right side has subqueries?

And in particular - what happens when there is a subquery that depends on an outer column which becomes a regular subquery after this re-optimization?

Would be good to have tests like this.


pkg/sql/apply_join.go, line 230 at r1 (raw file):

		plan := p.(*planTop)

		if err := func() error {

Can this be a separate method? (Next is pretty large)


pkg/sql/apply_join.go, line 268 at r1 (raw file):

		// We've got fresh right rows. Continue along in the loop, which will deal
		// with joining the right plan's output with our left row.
		continue

[nit] unnecessary continue


pkg/sql/apply_join.go, line 276 at r1 (raw file):

}

func (a *applyJoinNode) Close(ctx context.Context) {

Do we need to free the rightRows container?


pkg/sql/opt/exec/factory.go, line 113 at r1 (raw file):

	//
	// fakeRight is a pre-planned node that is the right side of the join with
	// all outer columns replaced by NULL.

Maybe mention what it's used for (logical/physical properties etc)


pkg/sql/opt/exec/execbuilder/relational_builder.go, line 521 at r1 (raw file):

	ReplaceVars(&f, rightExpr, rightExpr.RequiredPhysical(), fakeBindings)

	// Create a fake version of the right-side plan that contains NULL for all

[nit] move this comment above the fakeBindings line


pkg/sql/opt/exec/execbuilder/relational_builder.go, line 553 at r1 (raw file):

	var leftBoundColMap opt.ColMap
	var bindErr error
	leftBoundCols.ForEach(func(k int) {

[nit] you can do for i, ok := leftBoundCols.Next(0); ok; i, ok = leftBoundCols.Next(i+1) and then you can just return the error directly

This commit adds the implementation of the apply join operator, which is
built by the optimizer in some cases for correlated subqueries.

Apply join is like an ordinary join operator in that it joins the output
of 2 other operators. The special thing about apply join is that its
right side operator has outer columns (columns that don't yet have
values), and at runtime, the values of each row from the left operator
are used to fill in the outer columns of the right side operator. Once
the outer columns of the right side operator are filled, apply join
recursively re-plans the right side operator and runs it, joining the
resultant rows with the left side row that was used to fill the outer
columns.

Apply join isn't distributable because it requires passing around
optimizer memo and plan data structures, which are not marshallable to
bytes that could be distributed at this time.

Release note (sql change): many more correlated subqueries are now
runnable.
Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @asubiotto, @jordanlewis, and @RaduBerinde)


pkg/sql/apply_join.go, line 210 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] Add some explanation about what's happening here at a high level - we're setting up an optimizer to replan the right-side

Good idea, done.


pkg/sql/apply_join.go, line 228 at r1 (raw file):

Previously, RaduBerinde wrote…

What happens when the right side has subqueries?

And in particular - what happens when there is a subquery that depends on an outer column which becomes a regular subquery after this re-optimization?

Would be good to have tests like this.

As discussed offline, I've added an error case if there are non-zero subqueries as this seems to be an exceptional case that we don't except to happen.


pkg/sql/apply_join.go, line 230 at r1 (raw file):

Previously, RaduBerinde wrote…

Can this be a separate method? (Next is pretty large)

Done.


pkg/sql/apply_join.go, line 268 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] unnecessary continue

Done.


pkg/sql/apply_join.go, line 276 at r1 (raw file):

Previously, RaduBerinde wrote…

Do we need to free the rightRows container?

That's done right here, isn't it? .Close frees it.


pkg/sql/opt/exec/execbuilder/relational_builder.go, line 553 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] you can do for i, ok := leftBoundCols.Next(0); ok; i, ok = leftBoundCols.Next(i+1) and then you can just return the error directly

Good idea, done.

@jordanlewis
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 1, 2019

Build failed

@jordanlewis
Copy link
Member Author

TestVet OOM'd...

bors r+

craig bot pushed a commit that referenced this pull request Mar 1, 2019
34546: sql: implement apply join r=jordanlewis a=jordanlewis

This commit adds the implementation of the apply join operator, which is
built by the optimizer in some cases for correlated subqueries.

Apply join is like an ordinary join operator in that it joins the output
of 2 other operators. The special thing about apply join is that its
right side operator has outer columns (columns that don't yet have
values), and at runtime, the values of each row from the left operator
are used to fill in the outer columns of the right side operator. Once
the outer columns of the right side operator are filled, apply join
recursively re-plans the right side operator and runs it, joining the
resultant rows with the left side row that was used to fill the outer
columns.

Apply join isn't distributable because it requires passing around
optimizer memo and plan data structures, which are not marshallable to
bytes that could be distributed at this time.

Closes #32786.
Closes #25678.

Release note (sql change): many more correlated subqueries are now
runnable.

Co-authored-by: Jordan Lewis <[email protected]>
@craig
Copy link
Contributor

craig bot commented Mar 1, 2019

Build succeeded

@craig craig bot merged commit ca05993 into cockroachdb:master Mar 1, 2019
@jordanlewis jordanlewis deleted the aj branch March 15, 2019 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants