-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: index accelerate && (overlaps) expressions for array inverted indexes #76212
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
5299bf4
to
f21d1a5
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, @RajivTS! This is looking great. A few comments below.
Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RajivTS)
-- commits, line 12 at r2:
nit: Mention that this is specifically array expressions
pkg/sql/opt/invertedidx/json_array.go, line 366 at r2 (raw file):
} case *memo.OverlapsExpr: invertedExpr = getInvertedExprForArrayIndexForOverlaps(evalCtx, memo.ExtractConstDatum(t.Right))
I'm surprised you don't need an extract function similar to the above. Are we doing extra work if t.Left
is not actually an index column? Is it possible that ExtractConstDatum
will panic if it cannot extract a constant from t.Right
? It would be good to make sure we have test cases that exercise this.
pkg/sql/rowenc/index_encoding.go, line 866 at r2 (raw file):
// inKey is prefixed to all returned keys. // // Returns unique=true if the spans are guaranteed not to produce
I see that you copied this from above, but it looks like this comment has gotten stale so maybe you can fix it in both places?
We no longer return a value for unique
, instead we set Unique
appropriately on the returned inverted.SpanExpression
. Feel free to just remove this comment, and/or maybe add a comment where we set Unique=true
below. (A previous version of this code had a separate return value for unique
)
pkg/sql/rowenc/index_encoding.go, line 872 at r2 (raw file):
) (invertedExpr inverted.Expression, err error) { emptyArrSpanExpr := inverted.ExprForSpan( inverted.MakeSingleValSpan(encoding.EncodeEmptyArray(inKey)), false, /* tight */
do we need this for overlaps? ARRAY[]::int[] && ARRAY[]::int[];
is false
pkg/sql/rowenc/index_encoding.go, line 890 at r2 (raw file):
for _, key := range keys { spanExpr := inverted.ExprForSpan( inverted.MakeSingleValSpan(key), false, /* tight */
The comment above on line 713 said that the span expression will be tight, but it looks like that may not be the case? Do you think this should be true?
pkg/sql/sem/tree/eval.go, line 434 at r2 (raw file):
} // ArrayOverlaps return true if there is even one element common between the left and right arrays.
nit: comments should be <= 80 columns wide
pkg/sql/sem/tree/eval.go, line 435 at r2 (raw file):
// ArrayOverlaps return true if there is even one element common between the left and right arrays. func ArrayOverlaps(ctx *EvalContext, left, right *DArray) (*DBool, error) {
You should make sure that this is the same function we are actually using to evaluate &&
. I think you'll want to extract the code from here for this helper function, and call it from both places:
cockroach/pkg/sql/sem/tree/eval.go
Lines 2598 to 2616 in 63988a4
Fn: func(ctx *EvalContext, left Datum, right Datum) (Datum, error) { | |
array := MustBeDArray(left) | |
other := MustBeDArray(right) | |
if !array.ParamTyp.Equivalent(other.ParamTyp) { | |
return nil, pgerror.New(pgcode.DatatypeMismatch, "cannot compare arrays with different element types") | |
} | |
for _, needle := range array.Array { | |
// Nulls don't compare to each other in && syntax. | |
if needle == DNull { | |
continue | |
} | |
for _, hay := range other.Array { | |
if needle.Compare(ctx, hay) == 0 { | |
return DBoolTrue, nil | |
} | |
} | |
} | |
return DBoolFalse, nil | |
}, |
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1572 at r2 (raw file):
---- {rat} {rat,NULL,""}
nit: remove extra space (I think if you rewrite this file as follows it should remove the space):
make test PKG=./pkg/sql/logictest/... TESTS='TestLogic//inverted_index' TESTFLAGS='-rewrite=true'
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1446 at r2 (raw file):
│ columns: (a, b_inverted_key) │ inverted column: b_inverted_key │ num spans: 2
the formatting looks weird -- you might try to run this to fix it:
make test PKG=./pkg/sql/opt/exec/execbuilder... TESTS='TestExecBuild//inverted_index' TESTFLAGS='-rewrite=true'
pkg/sql/opt/xform/testdata/rules/select, line 9398 at r2 (raw file):
└── const-agg [as=b:5, outer=(5)] └── b:5
nit: remove this
pkg/sql/rowenc/index_encoding_test.go, line 749 at r2 (raw file):
// to determine if the spans produced from the second Array value will // correctly overlap or be distinct from the first value, indicated by // overlapsKeys. Then, if indexedValue && value, expected is true.
I don't see overlapsKeys
pkg/sql/rowenc/index_encoding_test.go, line 754 at r2 (raw file):
// unless the value produces a single empty array span. // First we test that the spans will include expected value.
Nice tests!
pkg/sql/rowenc/index_encoding_test.go, line 808 at r2 (raw file):
} // Array spans for && are never tight.
why not? (as I wrote elsewhere, I think you might be able to make the spans tight)
pkg/sql/rowenc/index_encoding_test.go, line 813 at r2 (raw file):
} // Array spans for && are never unique unless the result is a singular empty array span.
nit: comments <= 80 columns wide
pkg/sql/rowenc/index_encoding_test.go, line 845 at r2 (raw file):
// Generate two random arrays and evaluate the result of `left && right`. left := randgen.RandArray(rng, typ, 9 /* nullChance */) right := randgen.RandArray(rng, typ, 9 /* nullChance */)
why are you using 9 here for nullChance? (maybe just add a comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start! I left a few comments as well.
Reviewed 4 of 8 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RajivTS and @rytaft)
pkg/sql/rowenc/index_encoding.go, line 872 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
do we need this for overlaps?
ARRAY[]::int[] && ARRAY[]::int[];
is false
Hmmm, this is an interesting case. So what should we do when val
is an empty array, generate no spans? I don't know if there's precedent for scans over an empty range... maybe this can't be index accelerated?
Maybe we can add a normalization rule in a follow-up PR for this case. We have to be careful though. We can't simply normalize a && ARRAY[]::INT[]
to false
, because if a
is NULL
, it evaluates to NULL
. But I think we could normalize a filter with a && ARRAY[]::INT[]
to false
.
pkg/sql/rowenc/index_encoding.go, line 887 at r2 (raw file):
return nil, err } invertedExpr = emptyArrSpanExpr
Why do we include the empty array span in this case? It seems like it shouldn't be needed given that an empty array doesn't overlap with any other array.
pkg/sql/rowenc/index_encoding.go, line 890 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
The comment above on line 713 said that the span expression will be tight, but it looks like that may not be the case? Do you think this should be true?
@RajivTS FYI there's some explanation of tightness here:
cockroach/pkg/sql/inverted/expression.go
Lines 280 to 282 in a8fc111
// IsTight returns whether the inverted expression is tight, i.e., will the | |
// original expression not need to be reevaluated on each row output by the | |
// query evaluation over the inverted index. |
You can see examples of tight and non-tight expressions here:
testCases := []struct { |
This is a good example of a non-tight expression:
cockroach/pkg/sql/opt/invertedidx/json_array_test.go
Lines 266 to 273 in b2c04be
{ | |
filters: `j <@ '{"a": 1}'`, | |
indexOrd: jsonOrd, | |
ok: true, | |
tight: false, | |
unique: false, | |
remainingFilters: `j <@ '{"a": 1}'`, | |
}, |
We can scan the inverted span for '{"a": 1}'
, to get rows that may match, but we need to re-evaluate the expression to ensure that the JSON columns don't contain other values that aren't contained by '{"a": 1}'
.
I think these overlaps expressions should be tight.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1491 at r2 (raw file):
statement ok INSERT INTO cb VALUES (0, ARRAY[], ARRAY[]),
Can you add a row where numbers
and words
are NULL
?
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1504 at r2 (raw file):
SELECT numbers FROM cb@n WHERE numbers && ARRAY[]::INT[] ----
nit: remove extra lines after the empty test responses here and below
pkg/sql/opt/invertedidx/json_array_test.go, line 743 at r2 (raw file):
}, { filters: "a && '{1}' OR str && '{hello}'",
It'd be interesting to add expressions that combine &&
expressions with @>
and <@
with AND
s and OR
s.
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
// The expression is a union of spans, so unique should never be true // unless the value produces a single empty array span.
Is this true for overlaps? I think the spans for a single array overlap would be unique, like `a &&
pkg/sql/rowenc/index_encoding_test.go, line 784 at r2 (raw file):
{`{NULL}`, `{1, NULL}`, false, false}, {`{1,NULL}`, `{NULL}`, false, true}, {`{2, NULL}`, `{1, NULL}`, false, false},
Can you add test with NULL
arrays?
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1491 at r2 (raw file): Previously, mgartner (Marcus Gartner) wrote…
I think arrays of tuples would be interesting test cases too, like |
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1491 at r2 (raw file): Previously, mgartner (Marcus Gartner) wrote…
I don't think we support columns with arrays of tuples |
f21d1a5
to
79b434c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RajivTS, and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 366 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'm surprised you don't need an extract function similar to the above. Are we doing extra work if
t.Left
is not actually an index column? Is it possible thatExtractConstDatum
will panic if it cannot extract a constant fromt.Right
? It would be good to make sure we have test cases that exercise this.
Good catch. On adding new tests, I can confirm that having non-constant array column on the right leads to panic. Observing the implementation of EqExpr
(which is also commutative), I believe that in case of both t.Left
and t.Right
being indexed columns we should simply return a NonInvertedColExpression
indicating that index acceleration is not possible in this case. What do you think?
pkg/sql/rowenc/index_encoding.go, line 866 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I see that you copied this from above, but it looks like this comment has gotten stale so maybe you can fix it in both places?
We no longer return a value for
unique
, instead we setUnique
appropriately on the returnedinverted.SpanExpression
. Feel free to just remove this comment, and/or maybe add a comment where we setUnique=true
below. (A previous version of this code had a separate return value forunique
)
Done.
pkg/sql/rowenc/index_encoding.go, line 872 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Hmmm, this is an interesting case. So what should we do when
val
is an empty array, generate no spans? I don't know if there's precedent for scans over an empty range... maybe this can't be index accelerated?Maybe we can add a normalization rule in a follow-up PR for this case. We have to be careful though. We can't simply normalize
a && ARRAY[]::INT[]
tofalse
, because ifa
isNULL
, it evaluates toNULL
. But I think we could normalize a filter witha && ARRAY[]::INT[]
tofalse
.
I have kept this as is (except making tight=true for overlap) since I don't see a consensus on the expected behavior here. Please let me know if changes are needed.
pkg/sql/rowenc/index_encoding.go, line 887 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Why do we include the empty array span in this case? It seems like it shouldn't be needed given that an empty array doesn't overlap with any other array.
Yes strictly speaking its not needed and hence I have updated the code here. It did help in the scenario when the input array is non-empty but contains only NULL
values which then get filtered out leaving us again with an empty array (and 0 keys). invertedExpr = emptyArrSpanExpr
makes sure an empty span expression gets returned in this case. It also helps prevent an explicit nil
check within the loop.
pkg/sql/rowenc/index_encoding.go, line 890 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
@RajivTS FYI there's some explanation of tightness here:
cockroach/pkg/sql/inverted/expression.go
Lines 280 to 282 in a8fc111
// IsTight returns whether the inverted expression is tight, i.e., will the // original expression not need to be reevaluated on each row output by the // query evaluation over the inverted index. You can see examples of tight and non-tight expressions here:
testCases := []struct { This is a good example of a non-tight expression:
cockroach/pkg/sql/opt/invertedidx/json_array_test.go
Lines 266 to 273 in b2c04be
{ filters: `j <@ '{"a": 1}'`, indexOrd: jsonOrd, ok: true, tight: false, unique: false, remainingFilters: `j <@ '{"a": 1}'`, }, We can scan the inverted span for
'{"a": 1}'
, to get rows that may match, but we need to re-evaluate the expression to ensure that the JSON columns don't contain other values that aren't contained by'{"a": 1}'
.I think these overlaps expressions should be tight.
That's the understanding with which I began this implementation. However, if the generated inverted expression is marked tight
then the execution plan created in inverted_index
test shows a direct scan of all the rows for evaluating overlaps instead of filtering using the inverted index. I do not know the reason behind this but to ensure that the inverted index is used for array overlaps, I kept tight as false for now.
pkg/sql/sem/tree/eval.go, line 434 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: comments should be <= 80 columns wide
Done.
pkg/sql/sem/tree/eval.go, line 435 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
You should make sure that this is the same function we are actually using to evaluate
&&
. I think you'll want to extract the code from here for this helper function, and call it from both places:cockroach/pkg/sql/sem/tree/eval.go
Lines 2598 to 2616 in 63988a4
Fn: func(ctx *EvalContext, left Datum, right Datum) (Datum, error) { array := MustBeDArray(left) other := MustBeDArray(right) if !array.ParamTyp.Equivalent(other.ParamTyp) { return nil, pgerror.New(pgcode.DatatypeMismatch, "cannot compare arrays with different element types") } for _, needle := range array.Array { // Nulls don't compare to each other in && syntax. if needle == DNull { continue } for _, hay := range other.Array { if needle.Compare(ctx, hay) == 0 { return DBoolTrue, nil } } } return DBoolFalse, nil },
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1491 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think we support columns with arrays of tuples
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1504 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: remove extra lines after the empty test responses here and below
Done.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1572 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: remove extra space (I think if you rewrite this file as follows it should remove the space):
make test PKG=./pkg/sql/logictest/... TESTS='TestLogic//inverted_index' TESTFLAGS='-rewrite=true'
Done.
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1446 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
the formatting looks weird -- you might try to run this to fix it:
make test PKG=./pkg/sql/opt/exec/execbuilder... TESTS='TestExecBuild//inverted_index' TESTFLAGS='-rewrite=true'
Done.
pkg/sql/opt/xform/testdata/rules/select, line 9398 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: remove this
Done.
pkg/sql/opt/invertedidx/json_array_test.go, line 743 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It'd be interesting to add expressions that combine
&&
expressions with@>
and<@
withAND
s andOR
s.
Done.
pkg/sql/rowenc/index_encoding_test.go, line 749 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't see
overlapsKeys
Done.
pkg/sql/rowenc/index_encoding_test.go, line 754 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Nice tests!
Done.
pkg/sql/rowenc/index_encoding_test.go, line 784 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can you add test with
NULL
arrays?
Tried with NULL::INT[]
and NULL
but they could not be parsed while running the test. Got the below error:
Failed to parse array NULL::INT[]: could not parse "NULL::INT[]" as type int[]: array must be enclosed in { and }
pkg/sql/rowenc/index_encoding_test.go, line 808 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why not? (as I wrote elsewhere, I think you might be able to make the spans tight)
Same concern as mentioned earlier.
pkg/sql/rowenc/index_encoding_test.go, line 813 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: comments <= 80 columns wide
Done.
Code quote:
// Array spans for && are never unique unless the result
pkg/sql/rowenc/index_encoding_test.go, line 845 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why are you using 9 here for nullChance? (maybe just add a comment)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @RajivTS, and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 366 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
Good catch. On adding new tests, I can confirm that having non-constant array column on the right leads to panic. Observing the implementation of
EqExpr
(which is also commutative), I believe that in case of botht.Left
andt.Right
being indexed columns we should simply return aNonInvertedColExpression
indicating that index acceleration is not possible in this case. What do you think?
Yep, looks good to me. Thanks!
pkg/sql/rowenc/index_encoding.go, line 872 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
I have kept this as is (except making tight=true for overlap) since I don't see a consensus on the expected behavior here. Please let me know if changes are needed.
I think it's important that we don't scan spans that we don't need to. I think it's ok to just return &inverted.SpanEpression{Tight: true}
. If that causes problems, try &inverted.SpanEpression{Tight: true, SpansToRead: []Span{}, FactoredUnionSpans: []Span{}}}
. And if that still causes problems, you could just add a check in extractArrayOverlapsCondition
that the const datum is not an empty array (and if it is, return inverted.NonInvertedColExpression{}
).
We'll also want to add the normalization rule that @mgartner suggested, but I agree that can be done in a separate PR.
pkg/sql/rowenc/index_encoding.go, line 890 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
That's the understanding with which I began this implementation. However, if the generated inverted expression is marked
tight
then the execution plan created ininverted_index
test shows a direct scan of all the rows for evaluating overlaps instead of filtering using the inverted index. I do not know the reason behind this but to ensure that the inverted index is used for array overlaps, I kept tight as false for now.
Can you try forcing the use of the inverted index using the table@idx
syntax in the inverted_index
test? It's possible that we need to improve our statistics estimation to choose the correct plan for this case (that could be done in a follow up PR).
I do think we'll want to change tight
to true here (we can validate that this is correct with the randomized test that you've included, although you'll need to modify the test to not apply the additional filter on top).
pkg/sql/sem/tree/eval.go, line 438 at r3 (raw file):
func ArrayOverlaps(ctx *EvalContext, left, right *DArray) (*DBool, error) { array := MustBeDArray(left) other := MustBeDArray(right)
you don't need these two lines -- the parameters are already type DArray
79b434c
to
3430798
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
Previously, rytaft (Rebecca Taft) wrote…
nit: Mention that this is specifically array expressions
Done
pkg/sql/opt/invertedidx/json_array.go, line 366 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Yep, looks good to me. Thanks!
👍
pkg/sql/rowenc/index_encoding.go, line 872 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think it's important that we don't scan spans that we don't need to. I think it's ok to just return
&inverted.SpanEpression{Tight: true}
. If that causes problems, try&inverted.SpanEpression{Tight: true, SpansToRead: []Span{}, FactoredUnionSpans: []Span{}}}
. And if that still causes problems, you could just add a check inextractArrayOverlapsCondition
that the const datum is not an empty array (and if it is, returninverted.NonInvertedColExpression{}
).We'll also want to add the normalization rule that @mgartner suggested, but I agree that can be done in a separate PR.
Done. Returned inverted.NonInvertedColExpression{}
in case of empty arrays or arrays with only NULL
, in all other cases it produces a tight inverted expression. However this is being done in encodeOverlapsArrayInvertedIndexSpans
instead of extractArrayOverlapsCondition
since the null check for array elements can only be done once the value datums type has been confired as an Array.
The tests have been modified accordingly.
pkg/sql/rowenc/index_encoding.go, line 890 at r2 (raw file):
Can you try forcing the use of the inverted index using the table@idx syntax in the inverted_index test? It's possible that we need to improve our statistics estimation to choose the correct plan for this case (that could be done in a follow up PR).
Yes this did the trick :)
tight
is now true and the tests are working as expected.
pkg/sql/sem/tree/eval.go, line 438 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you don't need these two lines -- the parameters are already type
DArray
My bad. Done.
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is this true for overlaps? I think the spans for a single array overlap would be unique, like `a &&
I am not so sure. Since the same element can be present in multiple rows for the same array, can unique
be true? i.e. Will the spans produce no duplicate primary keys?
3430798
to
1683881
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close! Thanks for the updates.
Reviewed 7 of 7 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RajivTS)
pkg/sql/rowenc/index_encoding.go, line 861 at r4 (raw file):
// the given array, one slice of spans per entry in the array. The input // inKey is prefixed to all returned keys. //
nit: remove blank line (here and above where you removed the last bit of the function comment)
pkg/sql/rowenc/index_encoding.go, line 867 at r4 (raw file):
// If the given array is directly empty (i.e. Len == 0), // or contains only NULLs and thus has effective length 0, // we cannot generate an inverted expression.
We support the notion of a "contradiction" in regular constraints -- I think we might want to do the same for inverted constraints. A contradiction for inverted constraints would effectively be an empty SpanExpression (&inverted.SpanEpression{Tight: true, SpansToRead: []Span{}, FactoredUnionSpans: []Span{}}}
). Does returning that expression cause an error? If so, we may need to update the inverted.SpanEpression
code to explicitly support contradictions (can be a separate PR).
In the mean time, if that causes an error, returning inverted.NonInvertedColExpression{}
is fine. Could you just add a TODO here that this should be a contradiction?
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1398 at r4 (raw file):
spans: /"f"-/"f"/PrefixEnd /[]-/{} /Arr/"f"-/Arr/"f"/PrefixEnd /Arr/{}-/Arr/{}/PrefixEnd /Arr/"a"/"b"-/Arr/"a"/"b"/PrefixEnd /Arr/"c"/{}-/Arr/"c"/{}/PrefixEnd /Arr/"c"/"d"/[]-/Arr/"c"/"d"/{} /Arr/"c"/"d"/Arr/"e"-/Arr/"c"/"d"/Arr/"e"/PrefixEnd # Test that queries with overlaps && operator use the inverted index
This comment is no longer valid for this test case, so please add a TODO for this case that we should have a normalization rule that turns this into a no-op, since the predicate is a contradiction.
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1444 at r4 (raw file):
query T EXPLAIN (VERBOSE) SELECT * FROM e WHERE b && ARRAY[NULL]::INT[]
ditto here -- add a TODO
pkg/sql/opt/xform/testdata/rules/select, line 3365 at r3 (raw file):
opt expect=GenerateInvertedIndexScans SELECT * FROM c WHERE a && ARRAY[]::INT[]
I think these test cases are good, so it would be better to leave them in but use expect-not=GenerateInvertedIndexScans
and explain in a TODO that they should be addressed with a normalization rule rather than GenerateInvertedIndexScans
pkg/sql/opt/xform/testdata/rules/select, line 3338 at r4 (raw file):
opt expect=GenerateInvertedIndexScans SELECT * FROM c WHERE a && ARRAY[1]
you can use c@a_inv_idx
here to show the plan with the inverted index
pkg/sql/opt/xform/testdata/rules/select, line 3353 at r4 (raw file):
opt expect=GenerateInvertedIndexScans SELECT * FROM c WHERE a && ARRAY[1] OR a && ARRAY[2]
use c@a_inv_idx
pkg/sql/opt/invertedidx/json_array_test.go, line 744 at r3 (raw file):
{ // When operations affecting two different variables are OR-ed, we cannot // constrain either index.
why did you remove this comment? I think it's useful
pkg/sql/opt/invertedidx/json_array_test.go, line 718 at r4 (raw file):
}, { // Overlaps with an empty array produces a non-inverted expression
nit: end comments with a period (here and below)
pkg/sql/opt/invertedidx/json_array_test.go, line 724 at r4 (raw file):
}, { // Overlaps with conjunction of both tight expression with
nit: conjunction of both tight expression -> conjunction of two tight expressions (here and below)
pkg/sql/rowenc/index_encoding_test.go, line 742 at r4 (raw file):
indexedValue string value string inverted bool
I'd call this ok
instead
pkg/sql/rowenc/index_encoding_test.go, line 751 at r4 (raw file):
// correctly overlap or be distinct from the first value. // The expression is a union of spans, so unique should never be true
In this case, I don't think we need to include it as a parameter of the test case. It would be better to just unconditionally add a test below that the resulting SpanExpression
has Unique=false
pkg/sql/rowenc/index_encoding_test.go, line 817 at r4 (raw file):
// Array spans for && are never unique unless the result // is a singular empty array span.
I think you can change this to test that they are never unique and remove the expectUnique
param
pkg/sql/rowenc/index_encoding_test.go, line 826 at r4 (raw file):
} actual, _ := tree.ArrayOverlaps(&evalCtx, value.(*tree.DArray), indexedValue.(*tree.DArray))
you shouldn't need to apply this filter now, since the spans are tight
pkg/sql/rowenc/index_encoding_test.go, line 854 at r4 (raw file):
// We cannot check for false positives with these tests (due to the fact that // the spans are not tight), so we will only test for false negatives.
Since now these spans are always tight, I think you should change this test to check for false positives as well. This test should look very similar to TestEncodeContainingArrayInvertedIndexSpans
pkg/sql/rowenc/index_encoding_test.go, line 864 at r4 (raw file):
// An inverted expression can only be generated if the value array is // non-empty or contains atleast one non-NULL element. isInverted := rightArr.Len() > 0 && rightArr.HasNonNulls
nit: I'd call this ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RajivTS)
pkg/sql/rowenc/index_encoding.go, line 867 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
We support the notion of a "contradiction" in regular constraints -- I think we might want to do the same for inverted constraints. A contradiction for inverted constraints would effectively be an empty SpanExpression (
&inverted.SpanEpression{Tight: true, SpansToRead: []Span{}, FactoredUnionSpans: []Span{}}}
). Does returning that expression cause an error? If so, we may need to update theinverted.SpanEpression
code to explicitly support contradictions (can be a separate PR).In the mean time, if that causes an error, returning
inverted.NonInvertedColExpression{}
is fine. Could you just add a TODO here that this should be a contradiction?
Another reason I'd really like to just return an empty span (i.e. contradiction) here instead of inverted.NonInvertedColExpression
is that if we want to use this function to support inverted joins with &&
in the future, returning inverted.NonInvertedColExpression
won't work (and any normalization rule we add won't help).
Anyway, as I said, this is fine for now if returning an empty span doesn't work, but we'll need to change it in the future.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1512 at r4 (raw file):
---- query T
For all of these tests that return multiple rows, you'll either need to add an ORDER BY
clause or use the rowsort
directive (e.g., query T rowsort
). That will prevent the CI build failures we're seeing.
1683881
to
e8e33a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thanks again for the contribution.
Do you want to take another look, @mgartner? Otherwise I'll go ahead and merge this.
Reviewed 6 of 6 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor optional suggestions, and one question for @rytaft.
Reviewed 1 of 8 files at r3, 1 of 7 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @RajivTS, and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 398 at r5 (raw file):
// inverted expression with the constant value picked from right. constantVal = right } else if isIndexColumn(j.tabID, j.index, right, j.computedColumns) && memo.CanExtractConstDatum(left) {
This is fine for now because it's following a pattern already established in this file, but in the long term I think it'd be better to make a normalization rule to commute &&
expressions so that variables are always on the left and constants are always on the right. I'm not sure why we didn't do that for contains @>
and contained by <@
. Maybe those aren't commutative?
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
I am not so sure. Since the same element can be present in multiple rows for the same array, can
unique
be true? i.e. Will the spans produce no duplicate primary keys?
It would be impossible to have duplicate entries in the inverted index for the same primary key - they index keys would be identical. Here's a kvtrace test that proves it:
statement ok
CREATE TABLE t (
k INT PRIMARY KEY,
a INT[],
INVERTED INDEX (a),
FAMILY (k, a)
)
query T kvtrace
INSERT INTO t VALUES (1, ARRAY[1, 2])
----
CPut /Table/106/1/1/0 -> /TUPLE/
InitPut /Table/106/2/1/1/0 -> /BYTES/
InitPut /Table/106/2/2/1/0 -> /BYTES/
query T kvtrace
INSERT INTO t VALUES (2, ARRAY[1, 2, 1])
----
CPut /Table/106/1/2/0 -> /TUPLE/
InitPut /Table/106/2/1/2/0 -> /BYTES/
InitPut /Table/106/2/2/2/0 -> /BYTES/
Regardless, I think setting unique=true
for an expression like a && ARRAY[1]
would be a minor performance improvement. Leaving it as unique=false
wouldn't cause incorrect results, is that correct @rytaft?
pkg/sql/rowenc/index_encoding_test.go, line 784 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
Tried with
NULL::INT[]
andNULL
but they could not be parsed while running the test. Got the below error:
Failed to parse array NULL::INT[]: could not parse "NULL::INT[]" as type int[]: array must be enclosed in { and }
You could add a check for "NULL"
in parseArray like:
if s == "NULL" {
return tree.DNull
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @RajivTS)
pkg/sql/rowenc/index_encoding.go, line 882 at r5 (raw file):
inverted.MakeSingleValSpan(key), true, /* tight */ ) if invertedExpr == nil {
I think you should add spanExpr.Unique = true
above this line.
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It would be impossible to have duplicate entries in the inverted index for the same primary key - they index keys would be identical. Here's a kvtrace test that proves it:
statement ok CREATE TABLE t ( k INT PRIMARY KEY, a INT[], INVERTED INDEX (a), FAMILY (k, a) ) query T kvtrace INSERT INTO t VALUES (1, ARRAY[1, 2]) ---- CPut /Table/106/1/1/0 -> /TUPLE/ InitPut /Table/106/2/1/1/0 -> /BYTES/ InitPut /Table/106/2/2/1/0 -> /BYTES/ query T kvtrace INSERT INTO t VALUES (2, ARRAY[1, 2, 1]) ---- CPut /Table/106/1/2/0 -> /TUPLE/ InitPut /Table/106/2/1/2/0 -> /BYTES/ InitPut /Table/106/2/2/2/0 -> /BYTES/
Regardless, I think setting
unique=true
for an expression likea && ARRAY[1]
would be a minor performance improvement. Leaving it asunique=false
wouldn't cause incorrect results, is that correct @rytaft?
Yea -- I think you're totally right, @mgartner. I added a comment in the code where I think unique
should be set to true. So unfortunately you'll need to undo the change where you removed unique
from the test case, @RajivTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @RajivTS, and @rytaft)
pkg/sql/rowenc/index_encoding.go, line 867 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Another reason I'd really like to just return an empty span (i.e. contradiction) here instead of
inverted.NonInvertedColExpression
is that if we want to use this function to support inverted joins with&&
in the future, returninginverted.NonInvertedColExpression
won't work (and any normalization rule we add won't help).Anyway, as I said, this is fine for now if returning an empty span doesn't work, but we'll need to change it in the future.
Currently it does cause an error and the tests fail which is why I reverted to inverted.NonInvertedColExpression
. Will add the TODO for now.
pkg/sql/logictest/testdata/logic_test/inverted_index, line 1512 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
For all of these tests that return multiple rows, you'll either need to add an
ORDER BY
clause or use therowsort
directive (e.g.,query T rowsort
). That will prevent the CI build failures we're seeing.
Done.
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1398 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This comment is no longer valid for this test case, so please add a TODO for this case that we should have a normalization rule that turns this into a no-op, since the predicate is a contradiction.
Done.
pkg/sql/opt/exec/execbuilder/testdata/inverted_index, line 1444 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto here -- add a TODO
Done.
pkg/sql/opt/xform/testdata/rules/select, line 3365 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think these test cases are good, so it would be better to leave them in but use
expect-not=GenerateInvertedIndexScans
and explain in a TODO that they should be addressed with a normalization rule rather thanGenerateInvertedIndexScans
Done.
pkg/sql/opt/xform/testdata/rules/select, line 3338 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you can use
c@a_inv_idx
here to show the plan with the inverted index
Done.
pkg/sql/opt/xform/testdata/rules/select, line 3353 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
use c@a_inv_idx
Done.
pkg/sql/opt/invertedidx/json_array_test.go, line 744 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why did you remove this comment? I think it's useful
Done.
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Yea -- I think you're totally right, @mgartner. I added a comment in the code where I think
unique
should be set to true. So unfortunately you'll need to undo the change where you removedunique
from the test case, @RajivTS.
Unfortunately, I don't know how to read a KV trace so am unable to figure out how the above trace confirms that there cannot be a duplicate. If there are any resources that you can share to better understand the above terms, it would really help.
Let me describe how I think how this whole scenario is working out. Please rectify wherever I am incorrect.
A table T
contains an indexed INT
array column, let's call it A
. Few of the entries in A
are: [1], [1, 2], [2, 3], [3, 5, 6, 7], [6, 8], [9,0]
with auto-incrementing primary key starting at 1. The inverted index on A
would be a table roughly like below:
Array Value (ID) | Primary Key in T (VAL) |
---|---|
1 | 1, 2 |
2 | 2, 3 |
3 | 3, 4 |
5 | 4 |
6 | 4, 5 |
7 | 4 |
8 | 5 |
9 | 6 |
0 | 6 |
Assume that we are running a query as follows:
SELECT * FROM T WHERE A && [0,1,2,8]
After generating the inverted span expression for the constant value on the right hand-side, we get a modified query like:
SELECT * FROM T WHERE T_Idx IN (SELECT VAL FROM InvertedIdx_A WHERE ID == 0 OR ID == 1 OR ID == 2 OR ID == 8)
Assuming everything up until this point has been (roughly) correct, we can make the following claims about uniqueness:
- The above inverted span expression is not
unique
due to theOR
ing of the keys pulled from the array, i.e.A && [1]
would beunique
but notA && [0,1,2,8]
. Thus, overlaps between an indexed array and a single element array would always beunique
. OR
ing of two different overlaps will not beunique
even if the individual overlaps themselves areunique
, i.e.A && [1] OR A && [2]
won't beunique
.AND
ing of two different overlaps will beunique
provided the individual overlaps themselves areunique
, i.e.A && [1] AND A && [2]
will beunique
butA && [1, 2, 3] AND A && [5, 6]
will not beunique
.
pkg/sql/rowenc/index_encoding_test.go, line 742 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'd call this
ok
instead
Done.
pkg/sql/rowenc/index_encoding_test.go, line 751 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
In this case, I don't think we need to include it as a parameter of the test case. It would be better to just unconditionally add a test below that the resulting
SpanExpression
hasUnique=false
Done.
pkg/sql/rowenc/index_encoding_test.go, line 817 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you can change this to test that they are never unique and remove the
expectUnique
param
Done.
pkg/sql/rowenc/index_encoding_test.go, line 826 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you shouldn't need to apply this filter now, since the spans are tight
Done.
pkg/sql/rowenc/index_encoding_test.go, line 854 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Since now these spans are always tight, I think you should change this test to check for false positives as well. This test should look very similar to
TestEncodeContainingArrayInvertedIndexSpans
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @RajivTS, and @rytaft)
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
Unfortunately, I don't know how to read a KV trace so am unable to figure out how the above trace confirms that there cannot be a duplicate. If there are any resources that you can share to better understand the above terms, it would really help.
Let me describe how I think how this whole scenario is working out. Please rectify wherever I am incorrect.
A tableT
contains an indexedINT
array column, let's call itA
. Few of the entries inA
are:[1], [1, 2], [2, 3], [3, 5, 6, 7], [6, 8], [9,0]
with auto-incrementing primary key starting at 1. The inverted index onA
would be a table roughly like below:
Array Value (ID) Primary Key in T (VAL) 1 1, 2 2 2, 3 3 3, 4 5 4 6 4, 5 7 4 8 5 9 6 0 6 Assume that we are running a query as follows:
SELECT * FROM T WHERE A && [0,1,2,8]
After generating the inverted span expression for the constant value on the right hand-side, we get a modified query like:
SELECT * FROM T WHERE T_Idx IN (SELECT VAL FROM InvertedIdx_A WHERE ID == 0 OR ID == 1 OR ID == 2 OR ID == 8)
Assuming everything up until this point has been (roughly) correct, we can make the following claims about uniqueness:
- The above inverted span expression is not
unique
due to theOR
ing of the keys pulled from the array, i.e.A && [1]
would beunique
but notA && [0,1,2,8]
. Thus, overlaps between an indexed array and a single element array would always beunique
.OR
ing of two different overlaps will not beunique
even if the individual overlaps themselves areunique
, i.e.A && [1] OR A && [2]
won't beunique
.AND
ing of two different overlaps will beunique
provided the individual overlaps themselves areunique
, i.e.A && [1] AND A && [2]
will beunique
butA && [1, 2, 3] AND A && [5, 6]
will not beunique
.
Once the above logic gets confirmed, I will make changes to the unique
part and associated tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @RajivTS)
pkg/sql/rowenc/index_encoding_test.go, line 752 at r2 (raw file):
Previously, RajivTS (Rajiv Sharma) wrote…
Once the above logic gets confirmed, I will make changes to the
unique
part and associated tests.
@RajivTS your understanding is correct, as described in this comment:
cockroach/pkg/sql/inverted/expression.go
Lines 300 to 305 in a8fc111
// Unique is true if the spans are guaranteed not to produce duplicate | |
// primary keys. Otherwise, Unique is false. Unique may be true for certain | |
// JSON or Array SpanExpressions, and it holds when unique SpanExpressions | |
// are combined with And. It does not hold when these SpanExpressions are | |
// combined with Or. | |
Unique bool |
I think your examples above would make great test cases.
The logic for AND
and OR
is already taken care of in expression.go
, so you don't need to worry about implementing it.
The logic for AND
is implemented here:
cockroach/pkg/sql/inverted/expression.go
Line 655 in a8fc111
Unique: left.Unique && right.Unique, |
And the logic for OR
is implemented here:
cockroach/pkg/sql/inverted/expression.go
Lines 680 to 687 in a8fc111
expr := &SpanExpression{ | |
Tight: left.Tight && right.Tight, | |
SpansToRead: unionSpans(left.SpansToRead, right.SpansToRead), | |
FactoredUnionSpans: unionSpans(left.FactoredUnionSpans, right.FactoredUnionSpans), | |
Operator: SetUnion, | |
Left: left, | |
Right: right, | |
} |
(notice that the resulting
SpanExpression
has the default value for Unique
, which is false
for booleans in go.)
I think you should be able to just set spanExpr.Unique = true
where I indicated in index_encoding.go
and it should all "just work".
fa3c6d9
to
c368e32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 398 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This is fine for now because it's following a pattern already established in this file, but in the long term I think it'd be better to make a normalization rule to commute
&&
expressions so that variables are always on the left and constants are always on the right. I'm not sure why we didn't do that for contains@>
and contained by<@
. Maybe those aren't commutative?
Will work on that PR next. IMHO, contains
and contained by
don't appear to be commutative (i.e. A @> B
isn't always the same as B @> A
)
pkg/sql/rowenc/index_encoding.go, line 882 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you should add
spanExpr.Unique = true
above this line.
Done.
pkg/sql/rowenc/index_encoding_test.go, line 784 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
You could add a check for
"NULL"
in parseArray like:if s == "NULL" { return tree.DNull }
Done.
bors r- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @ajwerner
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @RajivTS)
pkg/sql/rowenc/index_encoding.go, line 874 at r6 (raw file):
Previously, ajwerner wrote…
This needs a rebase.
bors r-
@RajivTS I don't have permission to push to your branch, so can you please rebase from cockroach/master
and change
descpb.LatestNonPrimaryIndexDescriptorVersion
here to descpb.PrimaryIndexWithStoredColumnsVersion
. Thanks!
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
…dexes. Previously, we did not support index acceleration for queries involving array overlaps (&&). This change adds support for using the inverted index with && expressions on Array columns. When there is an inverted index available, a scan will be done on the Array column using the spans found from the constant value. Release note (performance improvement): Expressions using the overlaps (&&) operator for arrays now support index-acceleration for faster execution in some cases.
68e2d7e
to
d62a8f2
Compare
unfortunately, i think the rebase here was done incorrectly. there are 2436 files changed. i would suggest |
Yes, I will try to do this fresh with a new branch, avoiding this entire hassle. Will close this PR and create a new one (with link to the original) once I am done with my changes. |
Previously, we did not support index acceleration for queries involving
array overlaps (&&).
This change adds support for using the inverted index with && expressions on
Array columns. When there is an inverted index available, a scan will be done on
the Array column using the spans found from the constant value.
Fixes: #75477
Release note (performance improvement): Expressions using the overlaps (&&)
operator for arrays now support index-acceleration for faster execution in
some cases.