-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate non-aggregate string.Join to CONCAT_WS on SQL Server #28900
Conversation
does that mean any result string crossing 8000 (or 4000 if unicode), will be truncated? |
Is this required for 7.0? |
Yes 🙄 If there's a single varchar/nvarchar(max) in there, the result is also max, so no truncation occurs. So this only affects the case where all arguments (and the delimiter) are non-max.
No, not required.. The reason I did this is that we've added string.Join translation in another context (aggregate), so it's nice to be able to just say "we now support string.Join" (in both aggregate and non-aggregate contexts). It also seems very low-risk, but if we're against it we can do it for 8.0. |
It is not very "low-risk". I prefer not to do it. Not really a frequently asked feature. |
Well, it's just a new translation for something we didn't translate before, so I'm not sure in what sense the risk can be high. If you're concerned specifically with the truncation, we can introduce a CAST to varchar/nvarchar(max) e.g. on the delimiter, which would work around that. Or we can leave it as-is as a SQL Server quirk, like how we do with trailing whitespace. |
Making this a draft as we're not doing this in 7.0. |
Note: CONCAT_WS exists since SQL Server 2017 (14.x). We can use the compatibility level (#30163) to determine whether to translate or not (or to throw). |
d3dee8e
to
436843e
Compare
436843e
to
d2e274a
Compare
arguments[i + 1] = sqlArgument switch | ||
{ | ||
ColumnExpression { IsNullable: false } => sqlArgument, | ||
SqlConstantExpression constantExpression => constantExpression.Value is null | ||
? new SqlConstantExpression(string.Empty, stringTypeMapping) | ||
: constantExpression, | ||
_ => Dependencies.SqlExpressionFactory.Coalesce( | ||
sqlArgument, | ||
Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to eventually ensure that the nullability processor can simplify COALESCE
by dropping everything after the first known-non-null subexpression (and all of the known-null subexpressions). This would make it possible to simply use
arguments[i + 1] = sqlArgument switch | |
{ | |
ColumnExpression { IsNullable: false } => sqlArgument, | |
SqlConstantExpression constantExpression => constantExpression.Value is null | |
? new SqlConstantExpression(string.Empty, stringTypeMapping) | |
: constantExpression, | |
_ => Dependencies.SqlExpressionFactory.Coalesce( | |
sqlArgument, | |
Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))) | |
}; | |
arguments[i + 1] = Dependencies.SqlExpressionFactory.Coalesce( | |
sqlArgument, | |
Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))); |
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranma42 thanks, that's an interesting direction... A couple of thoughts:
- First, we can partially do this simplification at the source, i.e. improve SqlExpressionFactory.Coalesce() to not actually add the coalesce node for non-nullable columns/constants - I'm making that change in this PR. This has the small advantage of keeping the tree a bit cleaner, i.e. no unneeded coalesce node before the nullability processor.
- But you're right that for the general case of arbitrary expressions, this would (currently) need to happen at the nullability processor, since that's the only place where we know the nullability of arbitrary expressions (anything beyond column/constant).
- This general design is something we've discussed in the past; I've opened Consider introducing nullability on SqlExpression #33889 with some thoughts, but that's a very long-term, high-level architecture question that we can't really tackle in the near term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there is value in keeping the expression tree as clean as possible; OTOH duplicating the null propagation/simplification logic across the codebase has its disadvantages.
I think that an interesting approach to manage some simple cases (that are nonetheless currently repeated/spread across several files) could be a wrapper (possibly even as a collection of extension methods) that abstracts the pattern you are writing and makes it easily re-usable in several places.
I am thinking about something like CoalesceAndSimplify
(or other similar factory methods) that pre-emptively handles trivial optimizations (local&cheap ones). For an example of another compiler/expression translator that does this, see LLVM and how it performs trivial constant folding while building its AST.
A nit/question: is there a reason why one of the empty strings is built as new SqlConstantExpression(string.Empty, stringTypeMapping)
and the other one as Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that an interesting approach to manage some simple cases (that are nonetheless currently repeated/spread across several files) could be a wrapper (possibly even as a collection of extension methods) that abstracts the pattern you are writing and makes it easily re-usable in several places.
Right - that's what I tried to introduce in this PR, within SqlExpressionFactory itself. SqlExpressionFactory already goes considerably beyond simply creating instances of expression types (it wouldn't be very useful if it did just that), so it seems OK to add this sort of simplification logic in there too; after all, part of the point is to not have to think about "do I need coalesce" at each and every callsite, but just have the simplification happen automatically. And yeah, that's a little bit similar to the LLVM IR builder-level optimization you're referencing - you could view SqlExpressionFactory as our "IR builder"... Let me know if this all makes sense to you.
Unfortnuately, we have various tests exercising Coalesce functionality, which are implemented over non-nullable columns; the Coalesce node would be stripped away there, and the tests would become useless. I've split this out to #33890.
A nit/question: is there a reason why one of the empty strings is built as new SqlConstantExpression(string.Empty, stringTypeMapping) and the other one as Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))?
Thanks - no, no reason - just that this is an old PR being revived, and there's a bit of mess. I cleaned it up.
d2e274a
to
db11893
Compare
db11893
to
b484f9b
Compare
b484f9b
to
68a8154
Compare
@dotnet/efteam this should be ready for review. |
As usual, this was a bit more tricky than it looks.
varchar(14)
. We don't have column/parameter values, so I'm setting the return mapping to be varchar(max) or nvarchar(max) (based on whether we've seen nvarchar or not). See below forCloses #28899
Interesting experiments for CONCAT_WS result type