Translate non-aggregate string.Join to CONCAT_WS on SQL Server #28900

roji · 2022-08-26T15:49:56Z

As usual, this was a bit more tricky than it looks.

The translation is in SqlServerSqlTranslatingExpressionVisitor because there's an array parameter.
CONCAT_WS is interesting in that the length of the type it returns depends on its inputs: concatenating three 4-letter words with a 1-char delimiter returns a varchar(14). We don't have column/parameter values, so I'm setting the return mapping to varchar(max) or nvarchar(max) (based on whether we've seen an nvarchar argument or not). See the experiments below.
There's also CONCAT, which is similar to CONCAT_WS. We already have a relational translation for string.Concat in StringMethodTranslator, but it only handles the overloads with 2-4 arguments, not 5+ (which take an array parameter). We could translate to CONCAT just like for CONCAT_WS, but then we should override the relational translation and do it regardless of the number of arguments (we shouldn't use different translations for different argument counts). If you agree I can do this.

Interesting experiments for CONCAT_WS result type

SELECT CONCAT_WS(', ', CAST('foo' AS varchar(max)), CAST('bar' AS varchar(max)));

SELECT CONCAT_WS(', ', 'foo', 'bar'); -- varchar(8), adds lengths of arguments + delimiter as necessary
SELECT CONCAT_WS(', ', CAST('f' AS varchar(1)), CAST('b' AS varchar(1))); -- varchar(4)
SELECT CONCAT_WS(', ', CAST('f' AS varchar(1)), 'bar'); -- varchar(6)
SELECT CONCAT_WS(', ', CAST('f' AS varchar(3)), 'bar'); -- varchar(8)
SELECT CONCAT_WS(', ', 'f', CAST('bar' AS varchar(max))); -- varchar(max)

SELECT CONCAT_WS(', ', 'foo', CAST('bar' AS char(3))); -- varchar(8), char expanded to varchar
SELECT CONCAT_WS(CAST(', ' AS char(2)), CAST('foo' AS char(3)), CAST('bar' AS char(3))); -- varchar(8), even though all arguments are char

SELECT CONCAT_WS(', ', N'foo', 'bar'); -- nvarchar(16), varchar treated as nvarchar

-- Look at this thing (one for the book):
SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)

-- To find out an expression's type:
DECLARE @what sql_variant;
SELECT @what = 'some expression';
SELECT
    SQL_VARIANT_PROPERTY(@what, 'BaseType'),
    SQL_VARIANT_PROPERTY(@what, 'Precision'),
    SQL_VARIANT_PROPERTY(@what, 'Scale'),
    SQL_VARIANT_PROPERTY(@what, 'MaxLength');
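
The experiments above suggest a simple rule for the declared result type. The following is a minimal Python sketch of that rule, inferred only from the observations in this comment and not from SQL Server documentation. The names `SqlStringType`, `concat_ws_result_type`, `max_length_bytes` and `describe` are hypothetical. It assumes that the 16 reported for the `N'foo'` case is a byte length (SQL_VARIANT_PROPERTY reports MaxLength in bytes), and that the declared length is capped at 8000 characters (4000 for Unicode), which is an assumption based on the truncation seen with the 7999-character argument.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SqlStringType:
    unicode: bool           # True for nchar/nvarchar, False for char/varchar
    length: Optional[int]   # declared length in characters; None stands for (max)


def concat_ws_result_type(delimiter: SqlStringType,
                          args: Sequence[SqlStringType]) -> SqlStringType:
    """Model of the result type observed in the experiments (not an official rule)."""
    parts = [delimiter, *args]
    # A single Unicode part makes the whole result nvarchar.
    is_unicode = any(p.unicode for p in parts)
    # A single (max) part makes the whole result (max).
    if any(p.length is None for p in parts):
        return SqlStringType(is_unicode, None)
    # Otherwise: argument lengths plus one delimiter between each pair of arguments.
    total = sum(a.length for a in args) + delimiter.length * (len(args) - 1)
    # Assumed cap for non-max types: 8000 characters, or 4000 for Unicode.
    cap = 4000 if is_unicode else 8000
    return SqlStringType(is_unicode, min(total, cap))


def max_length_bytes(t: SqlStringType) -> Optional[int]:
    """Byte length as MaxLength would report it; None for (max) types."""
    if t.length is None:
        return None
    return t.length * 2 if t.unicode else t.length


def describe(t: SqlStringType) -> str:
    """Render the type with its length in characters, e.g. varchar(8)."""
    name = "nvarchar" if t.unicode else "varchar"
    return f"{name}({'max' if t.length is None else t.length})"
```

For example, a 2-character delimiter with two 3-character varchar arguments yields `varchar(8)` under this model, and replacing either argument with a `(max)` type yields `varchar(max)`.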

smitpatel · 2022-08-26T16:39:15Z

SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)

does that mean any result string crossing 8000 (or 4000 if unicode), will be truncated?

smitpatel · 2022-08-26T16:42:03Z

Is this required for 7.0?

roji · 2022-08-26T16:47:46Z

SELECT CONCAT_WS('|', REPLICATE('x', 7999), 'bar'); -- returns ...xxxxxb ('ar' is truncated)

does that mean any result string crossing 8000 (or 4000 if unicode), will be truncated?

Yes 🙄

If there's a single varchar/nvarchar(max) in there, the result is also max, so no truncation occurs. So this only affects the case where all arguments (and the delimiter) are non-max.

Is this required for 7.0?

No, not required. The reason I did this is that we've added string.Join translation in another context (aggregate), so it's nice to be able to just say "we now support string.Join" (in both aggregate and non-aggregate contexts). It also seems very low-risk, but if we're against it we can do it for 8.0.

smitpatel · 2022-08-26T17:10:38Z

It is not very "low-risk". I prefer not to do it. Not really a frequently asked feature.

roji · 2022-08-27T07:42:32Z

Well, it's just a new translation for something we didn't translate before, so I'm not sure in what sense the risk can be high.

If you're concerned specifically with the truncation, we can introduce a CAST to varchar/nvarchar(max) e.g. on the delimiter, which would work around that. Or we can leave it as-is as a SQL Server quirk, like how we do with trailing whitespace.
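
A minimal sketch of the first option, assuming the workaround is applied when the SQL text is generated. The helper `render_concat_ws` is hypothetical and only builds a string; it is not how EF Core constructs its SQL tree. It relies on the behaviour described earlier in this thread: a single varchar(max)/nvarchar(max) argument makes the whole result max, so no truncation occurs.

```python
from typing import Sequence


def render_concat_ws(delimiter_sql: str,
                     argument_sqls: Sequence[str],
                     unicode: bool = False) -> str:
    """Render CONCAT_WS with the delimiter cast to a (max) type.

    delimiter_sql and argument_sqls are already-rendered SQL fragments.
    """
    max_type = "nvarchar(max)" if unicode else "varchar(max)"
    # Casting only the delimiter is enough to make one argument (max).
    parts = [f"CAST({delimiter_sql} AS {max_type})", *argument_sqls]
    return f"CONCAT_WS({', '.join(parts)})"
```

Calling `render_concat_ws("'|'", ["REPLICATE('x', 7999)", "'bar'"])` produces `CONCAT_WS(CAST('|' AS varchar(max)), REPLICATE('x', 7999), 'bar')`. The cost is an extra CAST node in every generated query, including the ones that would never hit the length limit.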

roji · 2022-08-31T20:28:06Z

Making this a draft as we're not doing this in 7.0.

roji · 2023-04-04T09:41:03Z

Note: CONCAT_WS exists since SQL Server 2017 (14.x). We can use the compatibility level (#30163) to determine whether to translate or not (or to throw).
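
A rough sketch of that gating, in Python for brevity. The function and constant names are hypothetical, and how the configured compatibility level reaches the translator is not shown. Compatibility level 140 corresponds to SQL Server 2017 (14.x); returning `None` stands in for "not translated", and throwing would be the alternative mentioned above.

```python
from typing import Optional, Sequence

# Compatibility level of SQL Server 2017 (14.x), the first version with CONCAT_WS.
CONCAT_WS_MIN_COMPATIBILITY_LEVEL = 140


def translate_string_join(delimiter_sql: str,
                          argument_sqls: Sequence[str],
                          compatibility_level: int) -> Optional[str]:
    """Return CONCAT_WS SQL, or None when the target server is too old."""
    if compatibility_level < CONCAT_WS_MIN_COMPATIBILITY_LEVEL:
        return None  # caller falls back to its untranslated path (or raises)
    return f"CONCAT_WS({', '.join([delimiter_sql, *argument_sqls])})"
```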

ranma42 · 2024-06-03T22:47:43Z

src/EFCore.SqlServer/Query/Internal/SqlServerSqlTranslatingExpressionVisitor.cs

+                arguments[i + 1] = sqlArgument switch
+                {
+                    ColumnExpression { IsNullable: false } => sqlArgument,
+                    SqlConstantExpression constantExpression => constantExpression.Value is null
+                        ? new SqlConstantExpression(string.Empty, stringTypeMapping)
+                        : constantExpression,
+                    _ => Dependencies.SqlExpressionFactory.Coalesce(
+                        sqlArgument,
+                        Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string)))
+                };
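
Some background on why nullable arguments are wrapped at all: .NET's string.Join treats a null element as an empty string and still emits the delimiter around it, while CONCAT_WS skips NULL arguments together with their delimiter. Presumably that is what the COALESCE to an empty string in the diff above compensates for. A small Python model of the two behaviours (the function names are made up for illustration, and NULL delimiters are not modelled):

```python
from typing import Optional, Sequence


def dotnet_string_join(separator: str, values: Sequence[Optional[str]]) -> str:
    """Model of string.Join: a null element is treated as an empty string."""
    return separator.join("" if v is None else v for v in values)


def sql_concat_ws(separator: str, values: Sequence[Optional[str]]) -> str:
    """Model of CONCAT_WS: NULL arguments are skipped, with no delimiter added."""
    return separator.join(v for v in values if v is not None)


def sql_concat_ws_coalesced(separator: str, values: Sequence[Optional[str]]) -> str:
    """CONCAT_WS after wrapping each argument in COALESCE(arg, '')."""
    return sql_concat_ws(separator, ["" if v is None else v for v in values])
```

With the values `["a", None, "b"]` and the separator `", "`, the first and third functions return `"a, , b"`, while the plain CONCAT_WS model returns `"a, b"`.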


I would like to eventually ensure that the nullability processor can simplify COALESCE by dropping everything after the first known-non-null subexpression (and all of the known-null subexpressions). This would make it possible to simply use

Suggested change

-                arguments[i + 1] = sqlArgument switch
-                {
-                    ColumnExpression { IsNullable: false } => sqlArgument,
-                    SqlConstantExpression constantExpression => constantExpression.Value is null
-                        ? new SqlConstantExpression(string.Empty, stringTypeMapping)
-                        : constantExpression,
-                    _ => Dependencies.SqlExpressionFactory.Coalesce(
-                        sqlArgument,
-                        Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string)))
-                };
+                arguments[i + 1] = Dependencies.SqlExpressionFactory.Coalesce(
+                    sqlArgument,
+                    Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string)));

here

roji · 2024-06-04

@ranma42 thanks, that's an interesting direction... A couple of thoughts:

First, we can partially do this simplification at the source, i.e. improve SqlExpressionFactory.Coalesce() to not actually add the coalesce node for non-nullable columns/constants - I'm making that change in this PR. This has the small advantage of keeping the tree a bit cleaner, i.e. no unneeded coalesce node before the nullability processor.

But you're right that for the general case of arbitrary expressions, this would (currently) need to happen at the nullability processor, since that's the only place where we know the nullability of arbitrary expressions (anything beyond column/constant).

This general design is something we've discussed in the past; I've opened Consider introducing nullability on SqlExpression #33889 with some thoughts, but that's a very long-term, high-level architecture question that we can't really tackle in the near term.

ranma42 · 2024-06-04

I agree that there is value in keeping the expression tree as clean as possible; OTOH duplicating the null propagation/simplification logic across the codebase has its disadvantages.
I think that an interesting approach to manage some simple cases (that are nonetheless currently repeated/spread across several files) could be a wrapper (possibly even as a collection of extension methods) that abstracts the pattern you are writing and makes it easily re-usable in several places.
I am thinking about something like CoalesceAndSimplify (or other similar factory methods) that pre-emptively handles trivial optimizations (local and cheap ones). For an example of another compiler/expression translator that does this, see LLVM and how it performs trivial constant folding while building its IR.
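
To make the idea concrete, here is a minimal sketch of such a simplifying factory, written in Python over a toy expression model rather than EF Core's actual types. `Column`, `Constant`, `Coalesce` and `coalesce_and_simplify` are all hypothetical stand-ins. It applies the two rules mentioned earlier in this review: drop known-null operands, and drop everything after the first known-non-null operand. Only columns and constants have known nullability here, which mirrors the limitation discussed above for arbitrary expressions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool


@dataclass(frozen=True)
class Constant:
    value: Optional[str]  # None models a SQL NULL constant


@dataclass(frozen=True)
class Coalesce:
    operands: Tuple["Expr", ...]


Expr = Union[Column, Constant, Coalesce]


def is_known_null(e: Expr) -> bool:
    return isinstance(e, Constant) and e.value is None


def is_known_non_null(e: Expr) -> bool:
    if isinstance(e, Constant):
        return e.value is not None
    return isinstance(e, Column) and not e.nullable


def coalesce_and_simplify(*operands: Expr) -> Expr:
    """Build a COALESCE node, applying cheap local simplifications first."""
    kept = []
    for operand in operands:
        if is_known_null(operand):
            continue              # a NULL constant can never be the result
        kept.append(operand)
        if is_known_non_null(operand):
            break                 # later operands are unreachable
    if not kept:
        return Constant(None)     # every operand was a NULL constant
    if len(kept) == 1:
        return kept[0]            # no COALESCE node needed
    return Coalesce(tuple(kept))
```

With this in place a call site can always write `coalesce_and_simplify(argument, Constant(""))` and get the bare argument back when it is a non-nullable column or a non-null constant.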

A nit/question: is there a reason why one of the empty strings is built as new SqlConstantExpression(string.Empty, stringTypeMapping) and the other one as Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))?

roji · 2024-06-04

I think that an interesting approach to manage some simple cases (that are nonetheless currently repeated/spread across several files) could be a wrapper (possibly even as a collection of extension methods) that abstracts the pattern you are writing and makes it easily re-usable in several places.

Right - that's what I tried to introduce in this PR, within SqlExpressionFactory itself. SqlExpressionFactory already goes considerably beyond simply creating instances of expression types (it wouldn't be very useful if it did just that), so it seems OK to add this sort of simplification logic in there too; after all, part of the point is to not have to think about "do I need coalesce" at each and every callsite, but just have the simplification happen automatically. And yeah, that's a little bit similar to the LLVM IR builder-level optimization you're referencing - you could view SqlExpressionFactory as our "IR builder"... Let me know if this all makes sense to you.

Unfortunately, we have various tests exercising Coalesce functionality, which are implemented over non-nullable columns; the Coalesce node would be stripped away there, and the tests would become useless. I've split this out to #33890.

A nit/question: is there a reason why one of the empty strings is built as new SqlConstantExpression(string.Empty, stringTypeMapping) and the other one as Dependencies.SqlExpressionFactory.Constant(string.Empty, typeof(string))?

Thanks - no, no reason - just that this is an old PR being revived, and there's a bit of mess. I cleaned it up.

Closes dotnet#28899

roji · 2024-06-05T09:41:30Z

@dotnet/efteam this should be ready for review.

roji requested a review from smitpatel August 26, 2022 15:49

roji marked this pull request as draft August 31, 2022 20:28

roji mentioned this pull request Sep 17, 2022

Set operations: infer type mappings from the other side #29081

Open

roji mentioned this pull request Apr 4, 2023

SQL Server: Allow users to explicitly specify the target SQL Server version/type #30163

Closed

roji mentioned this pull request Apr 26, 2023

IN() list queries are not parameterized, causing increased SQL Server CPU usage #13617

Closed

FirdavsAsadov approved these changes Nov 15, 2023

roji force-pushed the NonAggregateStringJoin branch from d3dee8e to 436843e Compare June 3, 2024 21:52

roji changed the base branch from release/7.0 to main June 3, 2024 22:05

roji force-pushed the NonAggregateStringJoin branch from 436843e to d2e274a Compare June 3, 2024 22:12

ranma42 reviewed Jun 3, 2024

roji assigned maumar and cincuranet Jun 4, 2024

roji force-pushed the NonAggregateStringJoin branch from d2e274a to db11893 Compare June 4, 2024 07:09

roji marked this pull request as ready for review June 4, 2024 07:09

roji enabled auto-merge (squash) June 4, 2024 07:09

roji disabled auto-merge June 4, 2024 07:45

roji force-pushed the NonAggregateStringJoin branch from db11893 to b484f9b Compare June 4, 2024 08:15

Translate non-aggregate string.Join

68a8154

Closes dotnet#28899

roji force-pushed the NonAggregateStringJoin branch from b484f9b to 68a8154 Compare June 4, 2024 08:16

roji enabled auto-merge (squash) June 4, 2024 08:23

cincuranet approved these changes Jun 12, 2024

roji merged commit 87796b9 into dotnet:main Jun 12, 2024
7 checks passed

roji deleted the NonAggregateStringJoin branch June 12, 2024 12:04
