-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fix record-returning udfs when used as data source #98162
sql: fix record-returning udfs when used as data source #98162
Conversation
Please hold off on reviewing. I introduced some test failures when finalizing this PR. |
255e6c7
to
22deff9
Compare
Ok, I think this is RFAL now. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this solution. I think we should refactor some of the srf logic to also use the insideDataSource
check, since it seems less error-prone than the current method (see #98352).
Reviewed 10 of 11 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/routine.go
line 116 at r1 (raw file):
rt := g.expr.ResolvedType() var retTypes []*types.T if g.expr.MultiColOutput {
Do you think it would be worth asserting that the type is a tuple as expected?
pkg/sql/opt/optbuilder/scalar.go
line 730 at r1 (raw file):
} } if b.insideDataSource && (types.IsRecordType(rtyp) || rtyp.UserDefined()) {
Won't this set MultiColOutput
for any UDT? What happens for a user-defined enum? Could we just check if its a TupleFamily
type instead?
pkg/sql/opt/optbuilder/scalar.go
line 815 at r1 (raw file):
// For strict, set-returning UDFs, the evaluation logic achieves this // behavior. if !isSetReturning && !o.CalledOnNullInput && len(args) > 0 {
What happens for a strict UDF that's used as a table source?
pkg/sql/opt/optbuilder/scalar.go
line 847 at r1 (raw file):
if outCol == nil && isMultiColOutput { f.ResolvedOverload().ReturnsRecordType = types.IsRecordType(rtyp) return b.finishBuildGeneratorFunction(f, f.ResolvedOverload(), out, inScope, outScope, outCol)
What is finishBuildGeneratorFunction
doing that we haven't already done above - didn't we already synthesize all the needed columns?
pkg/sql/opt/optbuilder/scalar.go
line 851 at r1 (raw file):
// Should return a single tuple. // Synthesize an output column for set-returning UDFs. if outCol == nil && isSetReturning {
I know you didn't change this, but why are we also checking isSetReturning
here? It seems like we'd want to synthesize a column if we don't have one already, since finishBuildScalar
expects a non-nil outCol
. Maybe outCol == nil
implies isSetReturning
, but in that case we can remove the latter check anyway.
pkg/sql/opt/optbuilder/srfs.go
line 114 at r1 (raw file):
startCols := len(outScope.cols) isRecordReturningUDF := def != nil && funcExpr.ResolvedOverload().IsUDF && (texpr.ResolvedType().UserDefined() || types.IsRecordType(texpr.ResolvedType())) && b.insideDataSource
Similar to above - doesn't this condition capture UDFs that return non-tuple UDTs?
pkg/sql/logictest/testdata/logic_test/udf_record
line 185 at r1 (raw file):
---- a b 1 10
Other tests to consider:
- UDF that returns user-defined enum used as a table source
- UDF that returns SETOF RECORD used as a data source
- Strict UDF that returns RECORD used as a data source
22deff9
to
0fb39a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Good idea. When refactoring we'll have to be careful not to break the UDFs that currently use finishBuildGeneratorFunction
.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
pkg/sql/routine.go
line 116 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Do you think it would be worth asserting that the type is a tuple as expected?
Done.
pkg/sql/opt/optbuilder/scalar.go
line 730 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Won't this set
MultiColOutput
for any UDT? What happens for a user-defined enum? Could we just check if its aTupleFamily
type instead?
You're right, that's a bug. Fixed.
pkg/sql/opt/optbuilder/scalar.go
line 815 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
What happens for a strict UDF that's used as a table source?
Added a test, looks like we handle this correctly compared to postgres.
pkg/sql/opt/optbuilder/scalar.go
line 847 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
What is
finishBuildGeneratorFunction
doing that we haven't already done above - didn't we already synthesize all the needed columns?
It synthesizes the outScope columns (finsihBuildScalar
does this, but only for one column). The columns we synthesized in the code added above are an intermediate step when we need to add a column access to expand a tuple into columns.
pkg/sql/opt/optbuilder/scalar.go
line 851 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I know you didn't change this, but why are we also checking
isSetReturning
here? It seems like we'd want to synthesize a column if we don't have one already, sincefinishBuildScalar
expects a non-niloutCol
. MaybeoutCol == nil
impliesisSetReturning
, but in that case we can remove the latter check anyway.
Rewrote this slightly. There are cases when outScope is nil, in which case outCol is likely also nil, which can go to finishBuildScalar
, so I modified this to check that instead.
pkg/sql/opt/optbuilder/srfs.go
line 114 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Similar to above - doesn't this condition capture UDFs that return non-tuple UDTs?
Done.
pkg/sql/logictest/testdata/logic_test/udf_record
line 185 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Other tests to consider:
- UDF that returns user-defined enum used as a table source
- UDF that returns SETOF RECORD used as a data source
- Strict UDF that returns RECORD used as a data source
Great suggestions, done. We have some SETOF RECORD
tests in udf_setof
, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/optbuilder/scalar.go
line 815 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Added a test, looks like we handle this correctly compared to postgres.
Could we have an optimizer test for this case as well? I'm confused how a CaseExpr
is returning multiple columns. Or is it returning a tuple that gets expanded after?
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
statement ok CREATE FUNCTION f_strict() RETURNS RECORD STRICT AS
For the tests with strict UDFs, could we add a parameter to the functions? And also make sure they're called with both NULL and non-null inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
The typing of these expressions confuses me - it's counterintuitive why we need to convert back and forth between tuple and non-tuples. Some more explanation of why that's necessary would be helpful.
Reviewed 2 of 11 files at r1, 1 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)
-- commits
line 34 at r2:
nit: I don't think RECORD
return types are possible in v22.2, so I think we can get rid of the release note.
pkg/sql/routine.go
line 125 at r2 (raw file):
for i, c := range rt.TupleContents() { retTypes[i] = c }
It might be simpler to determine the return type of the routine in execbuilder, before calling tree.NewTypedRoutineExpr
, e.g. here:
cockroach/pkg/sql/opt/exec/execbuilder/scalar.go
Lines 887 to 894 in 74c8e59
return tree.NewTypedRoutineExpr( | |
udf.Name, | |
args, | |
planGen, | |
udf.Typ, | |
enableStepping, | |
udf.CalledOnNullInput, | |
), nil |
Which makes me wonder - why isn't the memo.UDFExpr correctly typed in optbuilder? If there are multiple columns returned by the last statement, it's type should be a tuple, no?
pkg/sql/opt/optbuilder/scalar.go
line 718 at r2 (raw file):
if isSingleTupleResult { rtyp.InternalType.TupleContents = stmtScope.cols[0].typ.TupleContents() rtyp.InternalType.TupleLabels = stmtScope.cols[0].typ.TupleLabels()
I'm not 100% that it's safe to muck around with a type's InternalType
. Other expressions may hold pointers to the same type, which would cause issues. I think it's safest to create a new type.
Can you add some comments describing these cases. I'm not understanding this first case - why do we need to change rtyp
?
pkg/sql/opt/optbuilder/scalar.go
line 734 at r2 (raw file):
if isSingleTupleResult { // When used as a data source, we need to expand the tuple into // individual columns.
It seems odd that we need to expand the tuple into individual columns here, only to put them back into a Tuple during execution. Is there a fundamental problem with out handling of tuples vs rows - they should be more-or-less the same thing, I think.
d8f0170
to
be790d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! I added some comments and rearranged the code to hopefully make it less confusing, let me know if that needs more work.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
Previously, mgartner (Marcus Gartner) wrote…
nit: I don't think
RECORD
return types are possible in v22.2, so I think we can get rid of the release note.
Done.
pkg/sql/routine.go
line 125 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It might be simpler to determine the return type of the routine in execbuilder, before calling
tree.NewTypedRoutineExpr
, e.g. here:cockroach/pkg/sql/opt/exec/execbuilder/scalar.go
Lines 887 to 894 in 74c8e59
return tree.NewTypedRoutineExpr( udf.Name, args, planGen, udf.Typ, enableStepping, udf.CalledOnNullInput, ), nil Which makes me wonder - why isn't the memo.UDFExpr correctly typed in optbuilder? If there are multiple columns returned by the last statement, it's type should be a tuple, no?
When there are multiple output columns, the function return type is a tuple (as seen here, the tuple contents are the column types). The issue that motivated this change is that when the routine returns multiple columns, if we don't have a 1:1 mapping with retTypes
the exec engine panics.
I think that adding multiple types at NewTypedRoutineExpr
would break the RoutineExpr
's TypedExpr
interface.
pkg/sql/opt/optbuilder/scalar.go
line 815 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Could we have an optimizer test for this case as well? I'm confused how a
CaseExpr
is returning multiple columns. Or is it returning a tuple that gets expanded after?
Added. I added more comments to scalar.go around expanding tuples to columns to try to clarify. Let me know if they need more detail.
pkg/sql/opt/optbuilder/scalar.go
line 718 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I'm not 100% that it's safe to muck around with a type's
InternalType
. Other expressions may hold pointers to the same type, which would cause issues. I think it's safest to create a new type.Can you add some comments describing these cases. I'm not understanding this first case - why do we need to change
rtyp
?
Good suggestion, this seems much less hacky. I updated some comments, please let me know if it's more or less clear.
pkg/sql/opt/optbuilder/scalar.go
line 734 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It seems odd that we need to expand the tuple into individual columns here, only to put them back into a Tuple during execution. Is there a fundamental problem with out handling of tuples vs rows - they should be more-or-less the same thing, I think.
Could you point me to where we are putting them back into a tuple? I could potentially see some optimization that could be done here to prevent this output tuple from being created in the first place, but after we expand it here I don't think we turn it into a tuple again. Might have missed something though.
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
For the tests with strict UDFs, could we add a parameter to the functions? And also make sure they're called with both NULL and non-null inputs.
Good call, it turns out we don't do this properly, but fixing it will take a bit more work because A) the bug is elsewhere (BuildUDF
isn't actually called if one of the function args is NULL), and B) reworking the CASE statement wrapper to return multiple NULLs or expanding columns after the CASE statement instead of before memoizing the UDF will take some refactoring.
I was going to disable all strict multi-col UDFs used as a datasource, but since the bug is elsewhere I'm learning towards allowing them for now. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)
pkg/sql/routine.go
line 125 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
When there are multiple output columns, the function return type is a tuple (as seen here, the tuple contents are the column types). The issue that motivated this change is that when the routine returns multiple columns, if we don't have a 1:1 mapping with
retTypes
the exec engine panics.I think that adding multiple types at
NewTypedRoutineExpr
would break theRoutineExpr
'sTypedExpr
interface.
Ahh thanks for the explanation.
pkg/sql/routine.go
line 120 at r3 (raw file):
// A routine with multiple output column should have its types in a tuple. if rt.Family() != types.TupleFamily { panic(errors.AssertionFailedf("routine expected to return multiple columns"))
I don't think it's safe to panic here, but we can return an error instead.
pkg/sql/opt/optbuilder/scalar.go
line 718 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Good suggestion, this seems much less hacky. I updated some comments, please let me know if it's more or less clear.
If stmtScope.cols[0].typ
already has contents and labels, isn't it already a labelled tuple? Why do we have to build a new labeled tuple type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sans the question about strict functions.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/optbuilder/scalar.go
line 743 at r3 (raw file):
// data source. Data sources may output multiple columns, and if the // statement body produces a tuple it needs to be expanded into columns. // When not used as a data source, statements producing multiple columns, combine them into a tuple. If the last statement is already returning a tuple
[nit] Is this line too long?
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Good call, it turns out we don't do this properly, but fixing it will take a bit more work because A) the bug is elsewhere (
BuildUDF
isn't actually called if one of the function args is NULL), and B) reworking the CASE statement wrapper to return multiple NULLs or expanding columns after the CASE statement instead of before memoizing the UDF will take some refactoring.I was going to disable all strict multi-col UDFs used as a datasource, but since the bug is elsewhere I'm learning towards allowing them for now. Thoughts?
The bug only applies when the NULL argument is a constant, right? What happens if its a column with a NULL value?
WRT B, is it correct to just return a tuple with all NULL elements instead of NULL? Or do you see some more complication involved with that approach?
be790d0
to
5656f70
Compare
5656f70
to
1c35278
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strict UDFs now work as data sources! The main changes are not building a CASE statement for this case and adding a new routine generator to generate nulls instead. This is RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
pkg/sql/routine.go
line 120 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't think it's safe to panic here, but we can return an error instead.
Done.
pkg/sql/opt/optbuilder/scalar.go
line 718 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
If
stmtScope.cols[0].typ
already has contents and labels, isn't it already a labelled tuple? Why do we have to build a new labeled tuple type?
Good point. Done.
pkg/sql/opt/optbuilder/scalar.go
line 743 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Is this line too long?
Done.
pkg/sql/logictest/testdata/logic_test/enums
line 1771 at r4 (raw file):
SELECT "🙏"('😊'), "🙏"(NULL:::"Emoji 😉") ---- NULL mixed
Tested in postgres, and the new output is correct.
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
The bug only applies when the NULL argument is a constant, right? What happens if its a column with a NULL value?
WRT B, is it correct to just return a tuple with all NULL elements instead of NULL? Or do you see some more complication involved with that approach?
I think the most recent revision fixes the strict case. The solution Marcus and I came up with is in strict functions where we return records and the function is being used as a data source, we do not add the case wrapper and instead let the routine return the multiple NULLs if necessary. We could return a tuple with all NULL elements, but the challenge has been expanding the tuple into columns as expected for data sources.
Please let me know if there are any other interesting test cases that are missing.
pkg/sql/logictest/testdata/logic_test/udf_record
line 321 at r4 (raw file):
# Test ambiguous function signatures with records subtest ambiguity
I came up with these examples when I was modifying type checking / function resolution. None work as expected, but I think that it's out of scope for this already big PR so I filed #100928 to fix this.
1c35278
to
cb0fcd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r3, 10 of 14 files at r4, 3 of 4 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
-- commits
line 33 at r2:
Why remove this? Also, worth mentioning that this is for UDFs used as a table source.
pkg/sql/opt/ops/scalar.opt
line 1290 at r6 (raw file):
# Generator is true if the function may output a set of rows. Generator bool
Isn't this already covered by SetReturning
?
pkg/sql/opt/optbuilder/scalar.go
line 740 at r6 (raw file):
} f.SetTypeAnnotation(rtyp) rtyp = f.ResolvedType()
[nit] I don't think this line does anything, it should always just return the same type.
cb0fcd7
to
90ff224
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
Previously, DrewKimball (Drew Kimball) wrote…
Why remove this? Also, worth mentioning that this is for UDFs used as a table source.
This fixes a bug with record-returning UDFs, which were only introduced in 23.1 which has not been released yet, so no need for a release note after all.
pkg/sql/opt/ops/scalar.opt
line 1290 at r6 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Isn't this already covered by
SetReturning
?
Yes, you're right. We only need a new field in routines.
pkg/sql/opt/optbuilder/scalar.go
line 740 at r6 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] I don't think this line does anything, it should always just return the same type.
You're right. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! It's quite a challenge to understand all this UDF behavior!
Reviewed 2 of 11 files at r1, 1 of 4 files at r2, 1 of 6 files at r3, 7 of 14 files at r4, 3 of 4 files at r5, 1 of 1 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rharding6373)
Previously, rharding6373 (Rachael Harding) wrote…
This fixes a bug with record-returning UDFs, which were only introduced in 23.1 which has not been released yet, so no need for a release note after all.
When I originally told you to remove this, I was unaware that we expect release notes for changes in pre-release (alphas and betas) behavior. So might be good to add a quick note here. Sorry to make extra work for you.
pkg/sql/routine.go
line 120 at r7 (raw file):
// A routine with multiple output column should have its types in a tuple. if rt.Family() != types.TupleFamily { return errors.New("routine expected to return multiple columns")
This should be an errors.AssertionFailedf
because it should be an internal error at this point, I think. If the return type of the function was incorrect we should have errored earlier during type-checking.
pkg/sql/routine.go
line 122 at r7 (raw file):
return errors.New("routine expected to return multiple columns") } retTypes = make([]*types.T, len(rt.TupleContents()))
Can we simplify this to retTypes = rt.TupleContents()
? As long as we don't mutate the types in retTypes
this should be safe, AFAIK.
pkg/sql/opt/ops/scalar.opt
line 1290 at r6 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Yes, you're right. We only need a new field in routines.
Did you mean to remove this?
pkg/sql/opt/optbuilder/scalar.go
line 739 at r7 (raw file):
rtyp = types.MakeLabeledTuple(tc, tl) } f.SetTypeAnnotation(rtyp)
Can you explain why we need to update the type of the tree.FuncExpr
?
pkg/sql/opt/optbuilder/scalar.go
line 746 at r7 (raw file):
When not used as a data source, statements producing multiple columns, combine them into a tuple.
nit: seems like there's a typo in this sentence
pkg/sql/rowexec/project_set.go
line 194 at r7 (raw file):
gen = builtins.NullGenerator(t.ResolvedType()) } else { gen = builtins.EmptyGenerator()
This logic seems wrong for a non-set-returning function that returns multiple columns as a data source. It should produce values, but I don't think it will with the empty generator. I see you have a test like the one below, but I'm not understanding how the empty generator is working for that.
marcus=# CREATE FUNCTION f() RETURNS RECORD LANGUAGE SQL AS 'select 1, 2, 3';
CREATE FUNCTION
marcus=# SELECT * FROM f() AS (a INT, b INT, c INT);
a | b | c
---+---+---
1 | 2 | 3
(1 row)
pkg/sql/sem/builtins/generator_builtins.go
line 1262 at r7 (raw file):
func NullGenerator(typ *types.T) eval.ValueGenerator { if typ.Family() != types.TupleFamily { return nil
Should this be an error instead?
pkg/sql/sem/tree/type_check.go
line 1143 at r7 (raw file):
// NULL is given as an argument. if len(s.overloadIdxs) > 0 && calledOnNullInputFns.Len() == 0 && funcCls != GeneratorClass && funcCls != AggregateClass && !hasUDFOverload {
Why was this type-checking change needed?
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
I think the most recent revision fixes the strict case. The solution Marcus and I came up with is in strict functions where we return records and the function is being used as a data source, we do not add the case wrapper and instead let the routine return the multiple NULLs if necessary. We could return a tuple with all NULL elements, but the challenge has been expanding the tuple into columns as expected for data sources.
Please let me know if there are any other interesting test cases that are missing.
Wouldn't hurt to label some of these UDFs as STABLE to make sure we're inlining correctly.
pkg/sql/logictest/testdata/logic_test/udf_record
line 132 at r7 (raw file):
(1,5) # subtest datasource
nit: un-comment the subtest
directive
Gentle ping @mgartner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. This is looking great! Just a few more nits and some suggestions on tests to add.
Reviewed 1 of 14 files at r4, 5 of 7 files at r8, 6 of 6 files at r9, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rharding6373)
pkg/sql/opt/ops/scalar.opt
line 1287 at r9 (raw file):
# MultiColOutput is true if the function may return multiple columns. MultiColOutput bool
This is only true if the UDF is used as a datasource, correct? Can you add that to the comment here? Does it make more sense to name this MultiColDataSource
?
pkg/sql/opt/optbuilder/scalar.go
line 739 at r7 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
I reworded the comment at the top of this block to try to explain this with more clarity. Could you PTAL and let me know if it needs more detail?
Looks good, thanks!
pkg/sql/rowexec/project_set.go
line 190 at r9 (raw file):
// (i.e., Generators), have different behavior and are handled // separately. gen, err = builtins.NullGenerator(t.ResolvedType())
It'd be nice to put this logic into eval.GetRoutineGenerator
, but I suspect that'll cause in import cycle betwen eval
and builtins
. For now, can you update the comment here to reflect that the empty or null generator can be used:
cockroach/pkg/sql/sem/eval/generators.go
Lines 54 to 56 in 1b45ced
// Strict routines (CalledOnNullInput=false) should not be | |
// invoked if any of their arguments are NULL. Return nil so | |
// that the EmptyGenerator is used. |
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Scattered some STABLEs throughout the test. This did find a bug, and when I tried to fix inlining to make it work with record-returning UDFs, I eventually remembered that subqueries can only return 1 column, so inlining when they're used as a datasource doesn't work :-(
Good find. I think that's an ok sacrifice for now. We can revisit lifting that restriction in the future.
pkg/sql/logictest/testdata/logic_test/udf_record
line 321 at r4 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
I came up with these examples when I was modifying type checking / function resolution. None work as expected, but I think that it's out of scope for this already big PR so I filed #100928 to fix this.
Nice catch!
pkg/sql/opt/optbuilder/testdata/udf
line 1175 at r9 (raw file):
build format=show-scalars SELECT * FROM strict_fn_record(1, 'foo', false) as bar(i INT, t TEXT, b BOOl)
Some tests here with nested datasources would be good. I'm most concerned about insideDataSource
working correctly with something like this:
SELECT * FROM (SELECT strict_fn_record(1, 'foo', false) FROM (VALUES (1), (2)))
Some logic tests like this might be good to have too.
a02942b
to
7ecc02d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Turns out there wasn't much that could be simplified here after the strict/inlining changes.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
pkg/sql/opt/ops/scalar.opt
line 1287 at r9 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This is only true if the UDF is used as a datasource, correct? Can you add that to the comment here? Does it make more sense to name this
MultiColDataSource
?
Done.
pkg/sql/rowexec/project_set.go
line 190 at r9 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It'd be nice to put this logic into
eval.GetRoutineGenerator
, but I suspect that'll cause in import cycle betweneval
andbuiltins
. For now, can you update the comment here to reflect that the empty or null generator can be used:cockroach/pkg/sql/sem/eval/generators.go
Lines 54 to 56 in 1b45ced
// Strict routines (CalledOnNullInput=false) should not be // invoked if any of their arguments are NULL. Return nil so // that the EmptyGenerator is used.
Yes, I tried that before and there's an import cycle :-(
Updated the comment.
pkg/sql/logictest/testdata/logic_test/udf_record
line 233 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Good find. I think that's an ok sacrifice for now. We can revisit lifting that restriction in the future.
Done.
pkg/sql/opt/optbuilder/testdata/udf
line 1175 at r9 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Some tests here with nested datasources would be good. I'm most concerned about
insideDataSource
working correctly with something like this:SELECT * FROM (SELECT strict_fn_record(1, 'foo', false) FROM (VALUES (1), (2)))
Some logic tests like this might be good to have too.
That was a good test. Found an error and fixed it.
When record-returning UDFs (both implicit and `RECORD` return types) are used as a data source in a query, the result should be treated as a row with separate columns instead of a tuple, which is how UDF output is normally treated. This PR closes this gap between CRDB and Postgres. For example: ``` CREATE FUNCTION f() RETURNS RECORD AS $$ SELECT 1, 2, 3; $$ LANGUAGE SQL SELECT f() f -------- (1,2,3) SELECT * FROM f() as foo(a int, b int, c int); a | b | c ---+---+--- 1 | 2 | 3 ``` The behavior is the same for implicit record return types. Epic: CRDB-19496 Fixes: cockroachdb#97059 Release note (bug fix): Fixes the behavior of UDFs to return its results as a row instead of a tuple when UDFs are called in a query as a data source. This is now compatible with postgres behavior.
7ecc02d
to
4539072
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job working through all these tricky edge cases!
Reviewed 13 of 13 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs everyone!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
Build failed: |
Bors failed with a known flake: #101336 bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 4539072 to blathers/backport-release-23.1-98162: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from 4539072 to blathers/backport-release-23.1.0-98162: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.0 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously, we would build scalars in VALUES clauses after resolving the value column types. However, for UDFs, we modify the type if it's a record-returning function during the build. In this change we reverse the order so that we build scalars and then resolve types. This bug was introduced by cockroachdb#98162. Epic: None Fixes: cockroachdb#102718 Release note: This fixes a bug in VALUES clauses containing a call to a record-returning UDF that could manifest as an internal error in some queries.
102299: tree: fix decimal strings for 0 with exponents < -6 r=rafiss a=otan Release note (bug fix): Fix a bug where 0 with exponents < -6 would display as `0E(exponent)` instead of printing all 0s, e.g. `0E-7` should be `0.0000000`. Informs #102217 103078: sql: resolve values types after building scalars in optbuilder r=rharding6373 a=rharding6373 Previously, we would build scalars in VALUES clauses after resolving the value column types. However, for UDFs, we modify the type if it's a record-returning function during the build. In this change we reverse the order so that we build scalars and then resolve types. This bug was introduced by #98162. Epic: None Fixes: #102718 Release note: This fixes a bug in VALUES clauses containing a call to a record-returning UDF that could manifest as an internal error in some queries. 103588: kvstreamer: add non-negative float validation to a cluster setting r=yuzefovich a=yuzefovich We forgot to add non-negative validation function to private `sql.distsql.streamer.avg_response_size_multiple` cluster setting. If this were set to a negative value, it would result in an undefined behavior of the streamer (we could try setting negative `TargetBytes` limit on `BatchRequest`s). I don't think anyone ever used it though so far. Epic: None Release note: None 103625: ccl/sqlproxyccl: fix flake on TestWatchTenants r=JeffSwenson a=jaylim-crl Fixes #103494. This commit fixes a flake on TestWatchTenants. There's a possibility where the cache invalidation logic races with the watcher termination logic in the test. This commit updates the test to wait for the cache invalidation before proceeding. Release note: None Epic: none Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: rharding6373 <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Jay <[email protected]>
Previously, we would build scalars in VALUES clauses after resolving the value column types. However, for UDFs, we modify the type if it's a record-returning function during the build. In this change we reverse the order so that we build scalars and then resolve types. This bug was introduced by #98162. Epic: None Fixes: #102718 Release note: This fixes a bug in VALUES clauses containing a call to a record-returning UDF that could manifest as an internal error in some queries.
This change backports a small change from a larger PR (cockroachdb#98162) that prevents short-circuiting type checking for UDF function overloads. This allows us to do more precise type checking even if inputs are NULL. Epic: none Fixes: cockroachdb#88374 Release note (bug fix): Fixes ambiguous calls to UDFs with NULL arguments.
When record-returning UDFs (both implicit and
RECORD
return types) are used as a data source in a query, the result should be treated as a row with separate columns instead of a tuple, which is how UDF output is normally treated. This PR closes this gap between CRDB and Postgres.For example:
The behavior is the same for implicit record return types.
Epic: CRDB-19496
Fixes: #97059
Release note: None