Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support range based lookup join spans #66002

Merged
merged 2 commits into from
Jul 20, 2021

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Jun 2, 2021

sql: support range based lookup join spans

Informs #51576

If filters exist on a lookup join that match columns that we are doing
the lookup against add them to the lookupExpr in the join reader spec
and build those filters into the multispan generator.

If we have inequality conditions we need to be able to lookup of the
prefix for keys found against range spans and not just point spans so
build a sorted slice of span+inputRowIndices we can binary search on.

Issue #51576 also encompasses allowing inequalities on columns from the
index to reference columns from the input, that will come in a later
commit.

Release note (sql change): Improve performance of lookup joins in some
cases. If join inequality conditions can be matched to index columns
include the conditions in the index lookup spans and remove them from
the runtime filters.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach marked this pull request as draft June 3, 2021 15:58
@cucaroach cucaroach force-pushed the lookup-join-spans branch 11 times, most recently from f3b6117 to c06a335 Compare June 9, 2021 08:52
@cucaroach cucaroach changed the title sql: WIP on range based lookup join spans sql: support range based lookup join spans Jun 9, 2021
@cucaroach cucaroach force-pushed the lookup-join-spans branch from c06a335 to 825920c Compare June 9, 2021 11:09
@cucaroach cucaroach requested a review from rytaft June 9, 2021 11:53
@cucaroach cucaroach linked an issue Jun 9, 2021 that may be closed by this pull request
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!! I have a bunch of nits, but no major changes. I do agree that we need some benchmarking, though. Other than the place I pointed out in a comment below, most of the plan changes look fine to me.

nit: In the PR message, I wouldn't wrap the whole thing in a code block. It makes it harder to read and also means the auto-close won't work for issue #51576. (Although I don't think we actually want to close that issue since the other use case isn't addressed by this PR. You could say "Informs #51576" instead)

Reviewed 17 of 17 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1470 at r1 (raw file):

│   └── • error if rows
│       │
│       └── • lookup join (semi)

These new plans aren't as good as the old one. We might need to fix costing.


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 3 at r1 (raw file):

statement ok
CREATE TABLE metrics (
	id   SERIAL PRIMARY KEY,

nit: get rid of tabs here and below


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 41 at r1 (raw file):

 (2,'2020-01-01 00:00:01+00:00',4)

query ITIIT

would be good to add some tests for left/semi/anti joins too


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 80 at r1 (raw file):

2  2020-01-01 00:00:01 +0000 UTC  4  2  cpu

query ITIIT

add rowsort


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 90 at r1 (raw file):

----

query ITIIT

ditto


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 108 at r1 (raw file):

	time <= '2020-01-01 00:00:00+00:00' AND
	name='cpu'
ORDER BY value

this test looks like a duplicate (might want to remove the first case to keep the order here)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 134 at r1 (raw file):

	time > '2020-01-01 00:00:00+00:00' AND
	name='cpu'
ORDER BY value

another dup


pkg/sql/opt/constraint/constraint_set.go, line 336 at r1 (raw file):

}

// HasRangeConstraint returns column id if it has at least one span.

This function name is a bit misleading. But I also think we might not need it at all -- I think it would be better for the caller of this function to first check if s.IsUnconstrained(), and if not, call s.Constraint(0).Columns.Get(0). I'm not sure we need another function.


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 1 at r1 (raw file):

exec-ddl

these tests shouldn't live in optbuilder. optbuilder tests should be used to test functionality in the optbuilder package. These should probably live in xform.


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 3 at r1 (raw file):

exec-ddl
CREATE TABLE metrics (
	id   SERIAL PRIMARY KEY,

nit: remove tabs here and below


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 54 at r1 (raw file):

 │    └── constraint: /6/5: [/'cpu' - /'cpu']
 └── filters
      └── time:2::STRING LIKE '202%'

looks like this plan doesn't use the new feature -- is this a costing issue?

edit: I see below that LIKE isn't supported, but as I mention below, I think you could possibly rewrite this using > or >=


pkg/sql/opt/xform/join_funcs.go, line 260 at r1 (raw file):

		numIndexKeyCols := index.LaxKeyColumnCount()

		var lookupFilters memo.FiltersExpr

nit: why the name change? I think constFilters is a bit clearer, since it doesn't include the equality conditions.


pkg/sql/opt/xform/join_funcs.go, line 302 at r1 (raw file):

			var foundRange bool = false
			if !ok {
				// Allow a range condition to occur after any constant filters.

nit: this comment is a bit confusing, since I don't think the range condition has to occur after the constant filters. Just seems like you're checking if you didn't find any constant filters.


pkg/sql/opt/xform/join_funcs.go, line 310 at r1 (raw file):

			if len(foundVals) > 1 || foundRange {
				if joinType == opt.LeftJoinOp || joinType == opt.AntiJoinOp || joinType == opt.InnerJoinOp || foundRange {

why do you need to add the check for InnerJoinOp? I would remove that since it's causing some unnecessary plan changes


pkg/sql/opt/xform/join_funcs.go, line 597 at r1 (raw file):

		constFilters = append(constFilters, constFilter)

		//RFC: should we break if we found range?  Only makes sense on end right?

yea, this sounds right to me -- just add a comment to explain


pkg/sql/opt/xform/join_funcs.go, line 600 at r1 (raw file):

}

// isCanonicalConstFilter checks that the given filter is a constant filter in

Don't you still need this? e.g., the LIKE operator with a wildcard would need to be converted to >


pkg/sql/opt/xform/join_funcs.go, line 976 at r1 (raw file):

	for filterIdx := range filters {
		props := filters[filterIdx].ScalarProps()
		if props.TightConstraints {

nit: it's a bit more common in our codebase to use

if !props.TightConstraints {
  continue
}
...

pkg/sql/opt/xform/join_funcs.go, line 977 at r1 (raw file):

		props := filters[filterIdx].ScalarProps()
		if props.TightConstraints {
			constraintCol, ok := props.Constraints.HasRangeConstraint(c.e.evalCtx)

Based on the code in the execution engine, it looks like you're currently only supporting one span -- so you should probably check here that the constraint is not unconstrained and has exactly one span.


pkg/sql/opt/xform/join_funcs.go, line 981 at r1 (raw file):

				continue
			}
			// Make sure we support the expression type

I think this is ok, but it would be better long term to just build the expression from the constraint if the original expression doesn't use one of the supported operators.


pkg/sql/rowexec/joinreader_span_generator.go, line 143 at r1 (raw file):

}

type spanRowIndices []spanRowIndex

would be good to add a benchmark to make sure this isn't significantly worse than the old data structure


pkg/sql/rowexec/joinreader_span_generator.go, line 232 at r1 (raw file):

// multiSpanGeneratorInequalityColInfo represents a column that is bound by a
// range expression.	If there are <,>, >= or <= inequalities we distill them

nit: extra spaces before "If"


pkg/sql/rowexec/joinreader_span_generator.go, line 367 at r1 (raw file):

// The optimizer should have ensured that all conditions fall into one of
// these categories. Any other expression types will return an error.
// TODO: We should probably be doing this at compile time, see #65773

nit: TODOs should always have a name/GitHub user associated with them.


pkg/sql/rowexec/joinreader_span_generator.go, line 437 at r1 (raw file):

		// out later in the span builder.
		var inequalityInfo multiSpanGeneratorInequalityColInfo
		if lval, err := getInfo(t.Left.(tree.TypedExpr)); lval != nil {

nit: seems better to check first if err != nil


pkg/sql/rowexec/joinreader_span_generator.go, line 503 at r1 (raw file):

	}

	//build spans for each row

nit: add a space/capital B before and period after


pkg/sql/rowexec/joinreader_span_generator.go, line 530 at r1 (raw file):

// the given key.
func (g *multiSpanGenerator) findInputRowIndicesByKey(key roachpb.Key) []int {
	i, j := 0, len(g.spanToInputRowIndices)

can you use sort.Search here?


pkg/sql/rowexec/joinreader_span_generator.go, line 567 at r1 (raw file):

		}
	}
	// not found, add it

nit: capital letter + period


pkg/sql/rowexec/joinreader_span_generator.go, line 588 at r1 (raw file):

			if inputRowIndices == nil {
				// RFC: should doing range spans affects how/whether we should call this?
				// it seems to break when calling it with inequality spans

seems like it might... I'm not too familiar with the logic in MaybeSplitSpanIntoSeparateFamilies, but I'd be surprised if it didn't apply in the case of inequality spans


pkg/sql/rowexec/joinreader_span_generator.go, line 602 at r1 (raw file):

	if len(g.spanToInputRowIndices) > 1 {
		sort.Sort(g.spanToInputRowIndices)

wonder if it makes sense to sort as we go to avoid the linear scans above... I guess a benchmark would tell us


pkg/sql/span/span_builder.go, line 148 at r1 (raw file):

}

// SpanFromEncDatumsWithRange encodes a range span.  The inequality is assumed

nit: extra space


pkg/sql/span/span_builder.go, line 168 at r1 (raw file):

	}

	makeKeyFromRow := func(values rowenc.EncDatumRow, len int) (k roachpb.Key, cn bool, e error) {

nit: a bit confusing to use the keyword len as a variable name


pkg/sql/span/span_builder.go, line 185 at r1 (raw file):

	}
	if containsNull || err != nil {
		return roachpb.Span{}, containsNull, err

Even though the current use case will discard the key if containsNull is true, it seems like we should still calculate the full span to avoid future confusion

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row, line 1470 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

These new plans aren't as good as the old one. We might need to fix costing.

I'll investigate.


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 3 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: get rid of tabs here and below

convert to spaces or should I collapse to single line?


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 41 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would be good to add some tests for left/semi/anti joins too

👍


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 80 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

add rowsort

rowsort doesn't work nicely with timestamps, I get an error about column count being wrong, apparently adding rowsort kicks in a " " based column parser?


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 108 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this test looks like a duplicate (might want to remove the first case to keep the order here)

👍 I'll also make the metric_values followed by metric_valuesd test pattern uniform.


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 134 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

another dup

👍

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 3 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

convert to spaces or should I collapse to single line?

just 2 spaces is good


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 80 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

rowsort doesn't work nicely with timestamps, I get an error about column count being wrong, apparently adding rowsort kicks in a " " based column parser?

Hm that's strange. Well I guess it doesn't matter here since there are no rows output, so feel free to leave as-is. (Maybe you knew this already, but any time there is more than one row returned you always need either an ORDER BY or rowsort)

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 80 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Hm that's strange. Well I guess it doesn't matter here since there are no rows output, so feel free to leave as-is. (Maybe you knew this already, but any time there is more than one row returned you always need either an ORDER BY or rowsort)

So in general prefer rowsort but falling back to GROUP BY when dealing with timestamps is okay?


pkg/sql/opt/constraint/constraint_set.go, line 336 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This function name is a bit misleading. But I also think we might not need it at all -- I think it would be better for the caller of this function to first check if s.IsUnconstrained(), and if not, call s.Constraint(0).Columns.Get(0). I'm not sure we need another function.

Makes sense.


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 1 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

these tests shouldn't live in optbuilder. optbuilder tests should be used to test functionality in the optbuilder package. These should probably live in xform.

Okay, I'll put it into xform/testdata/external


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 3 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: remove tabs here and below

👍


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 54 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

looks like this plan doesn't use the new feature -- is this a costing issue?

edit: I see below that LIKE isn't supported, but as I mention below, I think you could possibly rewrite this using > or >=

In my head this was just a negative test case to make sure I was properly rejecting plans that the joinreader doesn't support. Make sense?


pkg/sql/opt/xform/join_funcs.go, line 260 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: why the name change? I think constFilters is a bit clearer, since it doesn't include the equality conditions.

I associated "const' with constant values in my head so I thought this was an improvement but I guess inequalities against constants are also constFilters, I'll change it back.


pkg/sql/opt/xform/join_funcs.go, line 310 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

why do you need to add the check for InnerJoinOp? I would remove that since it's causing some unnecessary plan changes

I think that was a hack that predated foundRange I forgot to undo, nice catch!


pkg/sql/opt/xform/join_funcs.go, line 597 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

yea, this sounds right to me -- just add a comment to explain

👍


pkg/sql/opt/xform/join_funcs.go, line 600 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Don't you still need this? e.g., the LIKE operator with a wildcard would need to be converted to >

I'm sure its broken. I think I need help understanding what this code was accomplishing, what would an example be of an expression that wasn't an in or eq but had some const values here? I guess what this code is doing is cleaning up things so what we have to deal with at exec time is normalized? I no have no idea what that would mean for range expressions. I should have put an RFC here...


pkg/sql/opt/xform/join_funcs.go, line 981 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this is ok, but it would be better long term to just build the expression from the constraint if the original expression doesn't use one of the supported operators.

Interesting, I hadn't realized that was even a possibility but it sounds great, I'll try it. Why not just always use the constraint? Are there advantages to using the original expression in some cases?


pkg/sql/rowexec/joinreader_span_generator.go, line 143 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would be good to add a benchmark to make sure this isn't significantly worse than the old data structure

yeah I'm gonna put this on ice and add some benchmarks as a pre-cursor change.


pkg/sql/rowexec/joinreader_span_generator.go, line 367 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: TODOs should always have a name/GitHub user associated with them.

oof, I didn't expect to have to type cucaroach into the code! I'll just use tpr.


pkg/sql/rowexec/joinreader_span_generator.go, line 530 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you use sort.Search here?

I tried to but sort.Search has this requirement that the predicate return falses followed by trues and it zeros in on the false/true edge, I couldn't figure out how to get the higher spans that don't contain the key to return true.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 80 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

So in general prefer rowsort but falling back to GROUP BY when dealing with timestamps is okay?

Just depends on what you're trying to test. But yea, rowsort allows you to avoid complicating queries with an extra ORDER BY in most cases.


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 1 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Okay, I'll put it into xform/testdata/external

xform/testdata/external is sort of designed for "external" workloads like benchmarks, customer-inspired workloads, etc. I'd prefer to put this in xform/testdata/rules, since it has to do with the GenerateLookupJoins rule. You could add it to the existing xform/testdata/rules/join test file in the GenerateLookupJoins section, or just add this new file -- both seem fine to me.


pkg/sql/opt/optbuilder/testdata/lookup_join_spans, line 54 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

In my head this was just a negative test case to make sure I was properly rejecting plans that the joinreader doesn't support. Make sense?

Oh ok -- in that case maybe just add a comment to explain (perhaps along with a TODO that this could be supported in the future if you don't want to add support in this PR).


pkg/sql/opt/xform/join_funcs.go, line 600 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I'm sure its broken. I think I need help understanding what this code was accomplishing, what would an example be of an expression that wasn't an in or eq but had some const values here? I guess what this code is doing is cleaning up things so what we have to deal with at exec time is normalized? I no have no idea what that would mean for range expressions. I should have put an RFC here...

For example, SELECT ... WHERE x > 3 AND x < 5 has a tight equality constraint if x is an INT. The reason to have this function is that it basically allows us to use the intelligence of our constraint library to support many types of expressions, while only needing to support a handful in the execution engine.


pkg/sql/opt/xform/join_funcs.go, line 981 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Interesting, I hadn't realized that was even a possibility but it sounds great, I'll try it. Why not just always use the constraint? Are there advantages to using the original expression in some cases?

It could potentially save some computation or allocations, but probably not a very strong reason. This was the purpose of isCanonicalConstFilter.


pkg/sql/rowexec/joinreader_span_generator.go, line 367 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

oof, I didn't expect to have to type cucaroach into the code! I'll just use tpr.

It's partly so another engineer can figure out who to ask about it, so GitHub username might be preferable :)


pkg/sql/rowexec/joinreader_span_generator.go, line 530 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I tried to but sort.Search has this requirement that the predicate return falses followed by trues and it zeros in on the false/true edge, I couldn't figure out how to get the higher spans that don't contain the key to return true.

Ok!

@cucaroach cucaroach force-pushed the lookup-join-spans branch from 825920c to edca4c4 Compare June 25, 2021 20:02
@cucaroach cucaroach requested a review from a team June 25, 2021 20:03
@cucaroach cucaroach force-pushed the lookup-join-spans branch 3 times, most recently from 577133a to 2a1bb92 Compare June 28, 2021 14:30
@rytaft
Copy link
Collaborator

rytaft commented Jun 28, 2021

Is this ready for review? Or should I hold off?

@cucaroach
Copy link
Contributor Author

Hold off, I'll ping you when its ready. A performance comparison revealed I have some work todo...

@cucaroach cucaroach force-pushed the lookup-join-spans branch from c5687ae to a366ec5 Compare June 29, 2021 16:10
@cucaroach cucaroach force-pushed the lookup-join-spans branch 3 times, most recently from fc60035 to ac68e8e Compare July 15, 2021 13:05
Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 379 at r18 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think this still isn't really testing what we want. time > '2020-01-01 00:00:00+00:00' eliminates 2 out of the 3 NULL rows in metric_values. name='cpu' followed by the join condition metric_id=id eliminates the last NULL row, leaving none for v.nullable < -10 to eliminate. I would change or remove one of those other conditions (or add some more NULL rows).

Okay I think I got this right but my brain's SQL co-processor isn't as evolved as yours! One thing I'm confused about it is are we concerned about NULL values coming out properly as just values retrieved by the query or are we worried about nulls showing up in lookup span substitution slots and correctly being discarded? I THINK we have adequate coverage for both now but just wanted to be sure I understand your thinking. Thanks again!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r19.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 379 at r18 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Okay I think I got this right but my brain's SQL co-processor isn't as evolved as yours! One thing I'm confused about it is are we concerned about NULL values coming out properly as just values retrieved by the query or are we worried about nulls showing up in lookup span substitution slots and correctly being discarded? I THINK we have adequate coverage for both now but just wanted to be sure I understand your thinking. Thanks again!

Excellent! This shows that there is indeed a bug :)

Taking a look at this test case:

# Test NULL values in <= unbounded lookup span.
query ITIIIIT
SELECT *
FROM metric_values as v
INNER JOIN metrics as m
ON metric_id=id
WHERE
  v.nullable <= -10 AND
  name='cpu'
ORDER BY value
----
1  2020-01-01 00:00:00 +0000 UTC  NULL  0  1  NULL  cpu
2  2020-01-01 00:00:00 +0000 UTC  NULL  2  2  1     cpu
2  2020-01-01 00:01:01 +0000 UTC  -11   4  2  1     cpu
2  2020-01-01 00:01:02 +0000 UTC  -10   5  2  1     cpu

v.nullable <= -10 is a null-rejecting filter, meaning that there should be no output rows with v.nullable = NULL. But the output includes two rows with v.nullable = NULL, so we're doing something wrong.

The reason I expected this test case to show a bug if it existed is because in CockroachDB, NULLs sort before all other values in every type. So in order to exclude nulls for this filter, you'd actually need to construct a span like this: (/NULL - /-10] to exclude NULL values. What you're probably doing is constructing a span like this: [ - /-10]. I think you'll need to update multiSpanGenerator.generateNonNullSpans to fix this.

Sorry if I didn't articulate that well before... let me know if anything is still unclear.


pkg/sql/opt/exec/execbuilder/testdata/lookup_join_spans, line 70 at r17 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: this histogram doesn't make sense since it has 1000 distinct values, but the overall distinct count is 10. I'd just change distinct_range to 10 (I don't think it will change the tests below).

Not sure if you meant to resolve this but doesn't look like you made the update here or the other file (as I said I don't think it will change anything, just avoid confusion for future readers of this file)

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, and @rytaft)


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 379 at r18 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Excellent! This shows that there is indeed a bug :)

Taking a look at this test case:

# Test NULL values in <= unbounded lookup span.
query ITIIIIT
SELECT *
FROM metric_values as v
INNER JOIN metrics as m
ON metric_id=id
WHERE
  v.nullable <= -10 AND
  name='cpu'
ORDER BY value
----
1  2020-01-01 00:00:00 +0000 UTC  NULL  0  1  NULL  cpu
2  2020-01-01 00:00:00 +0000 UTC  NULL  2  2  1     cpu
2  2020-01-01 00:01:01 +0000 UTC  -11   4  2  1     cpu
2  2020-01-01 00:01:02 +0000 UTC  -10   5  2  1     cpu

v.nullable <= -10 is a null-rejecting filter, meaning that there should be no output rows with v.nullable = NULL. But the output includes two rows with v.nullable = NULL, so we're doing something wrong.

The reason I expected this test case to show a bug if it existed is because in CockroachDB, NULLs sort before all other values in every type. So in order to exclude nulls for this filter, you'd actually need to construct a span like this: (/NULL - /-10] to exclude NULL values. What you're probably doing is constructing a span like this: [ - /-10]. I think you'll need to update multiSpanGenerator.generateNonNullSpans to fix this.

Sorry if I didn't articulate that well before... let me know if anything is still unclear.

No kidding, I saw that and said to myself, oh I guess null's sort before everything and briefly considered testing in postgresql. I should probably include a similar query that doesn't use lookup exprs. Actually what I should do is pull this lookup_join_spans logictest into a separate commit before my changes to make sure there are no more cases like this. Nice find!

@rytaft
Copy link
Collaborator

rytaft commented Jul 15, 2021


pkg/sql/logictest/testdata/logic_test/lookup_join_spans, line 379 at r18 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

No kidding, I saw that and said to myself, oh I guess null's sort before everything and briefly considered testing in postgresql. I should probably include a similar query that doesn't use lookup exprs. Actually what I should do is pull this lookup_join_spans logictest into a separate commit before my changes to make sure there are no more cases like this. Nice find!

Sounds good!

@cucaroach cucaroach force-pushed the lookup-join-spans branch from ac68e8e to da839a4 Compare July 15, 2021 16:07
Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rytaft, and @yuzefovich)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join_spans, line 70 at r17 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Not sure if you meant to resolve this but doesn't look like you made the update here or the other file (as I said I don't think it will change anything, just avoid confusion for future readers of this file)

The two commits attached the PR have distinct count set to 10, maybe reviewable has lost the ball? I'm still frequently confused by reviewable. I've been throwing some serious curveballs at it with this PR!

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rytaft, and @yuzefovich)


pkg/sql/span/span_builder.go, line 185 at r21 (raw file):

		startKey, startContainsNull, err = makeKeyFromRow(values, prefixLen-1)
		startKey = encoding.EncodeNullAscending(startKey)
		startKey = startKey.Next()

These two lines are the fix for the NULL boundary issue, I can't get reviewable to show me the diff of just this change but its literally:

diff --git a/pkg/sql/span/span_builder.go b/pkg/sql/span/span_builder.go
index d9522c9154..96f9100034 100644
--- a/pkg/sql/span/span_builder.go
+++ b/pkg/sql/span/span_builder.go
@@ -181,6 +181,8 @@ func (s *Builder) SpanFromEncDatumsWithRange(
                }
        } else {
                startKey, startContainsNull, err = makeKeyFromRow(values, prefixLen-1)
+               startKey = encoding.EncodeNullAscending(startKey)
+               startKey = startKey.Next()
        }

        if err != nil {

@cucaroach cucaroach force-pushed the lookup-join-spans branch from da839a4 to 2943ec0 Compare July 16, 2021 16:18
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!! I think this is basically ready to be merged, but I do think we need one final test for descending index columns. So maybe add one additional logictest/execbuilder test where you delete the secondary index on nullable and then re-add it with the nullable column descending. The queries with inequalities on nullable should still return the same result.

Reviewed 24 of 24 files at r20, 23 of 23 files at r21.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach, @mgartner, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join_spans, line 70 at r17 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

The two commits attached the PR have distinct count set to 10, maybe reviewable has lost the ball? I'm still frequently confused by reviewable. I've been throwing some serious curveballs at it with this PR!

I see it now - sorry if I missed it before


pkg/sql/span/span_builder.go, line 185 at r21 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

These two lines are the fix for the NULL boundary issue, I can't get reviewable to show me the diff of just this change but its literally:

diff --git a/pkg/sql/span/span_builder.go b/pkg/sql/span/span_builder.go
index d9522c9154..96f9100034 100644
--- a/pkg/sql/span/span_builder.go
+++ b/pkg/sql/span/span_builder.go
@@ -181,6 +181,8 @@ func (s *Builder) SpanFromEncDatumsWithRange(
                }
        } else {
                startKey, startContainsNull, err = makeKeyFromRow(values, prefixLen-1)
+               startKey = encoding.EncodeNullAscending(startKey)
+               startKey = startKey.Next()
        }

        if err != nil {

I can see it comparing r19 to r21

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! I added such tests and they fail! I suspect maybe you already knew that would happen, I have more to learn about span construction apparently...

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)

@cucaroach cucaroach force-pushed the lookup-join-spans branch 2 times, most recently from f422635 to 45b0db9 Compare July 20, 2021 01:01
Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: thanks for working through all these edge cases! This turned out to be quite a difficult problem to solve!

Reviewed 1 of 24 files at r22, 23 of 23 files at r23.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)

@rytaft
Copy link
Collaborator

rytaft commented Jul 20, 2021


pkg/sql/span/span_builder.go, line 184 at r23 (raw file):

	} else {
		startKey, startContainsNull, err = makeKeyFromRow(values, prefixLen-1)
		startKey = encoding.EncodeNullAscending(startKey)

do we need to check the direction here too?

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/span/span_builder.go, line 184 at r23 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

do we need to check the direction here too?

Good question, I don't know. I'll poke around at callers of EncodeNullAscending and EncodeNullDescending and see if I can come up with something definitive to say.

Informs cockroachdb#51576

If filters exist on a lookup join that match columns that we are doing
the lookup against add them to the lookupExpr in the join reader spec
and build those filters into the multispan generator.

If we have inequality conditions we need to be able to lookup of the
prefix for keys found against range spans and not just point spans so
build a sorted slice of span+inputRowIndices we can binary search on.

Issue cockroachdb#51576 also encompasses allowing inequalities on columns from the
index to reference columns from the input, that will come in a later
commit.

Release note (sql change): Improve performance of lookup joins in some
cases. If join inequality conditions can be matched to index columns
include the conditions in the index lookup spans and remove them from
the runtime filters.
@cucaroach cucaroach force-pushed the lookup-join-spans branch from 45b0db9 to 649d66b Compare July 20, 2021 17:47
Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/span/span_builder.go, line 184 at r23 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Good question, I don't know. I'll poke around at callers of EncodeNullAscending and EncodeNullDescending and see if I can come up with something definitive to say.

Well I can definitely say I'm not sure what the correct thing to do here is ;-). But for symmetry I elided the null encoding when the index is descending and all the tests still pass so hopefully that passes muster.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r24.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/span/span_builder.go, line 184 at r23 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Well I can definitely say I'm not sure what the correct thing to do here is ;-). But for symmetry I elided the null encoding when the index is descending and all the tests still pass so hopefully that passes muster.

SGTM

@cucaroach
Copy link
Contributor Author

Well here goes nothing, thanks for the many reviews!

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 20, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: enable lookup joins to lookup spans, not just individual keys
5 participants