Implement more connector expression pushdowns in SQL Server #14570
Initial work on predicate modulo pushdown support.
lgtm, however my opinion is not authoritative enough here.
testing/trino-testing/src/main/java/io/trino/testing/TestingConnectorBehavior.java
@Test
public void testPredicateModuloPushdown()
{
    if (!hasBehavior(SUPPORTS_PREDICATE_MODULO_PUSHDOWN)) {
Due to the high number of connector behaviors, it's imperative that these declarations are self-tested, i.e. both "supports" and "does not support" should be verified. Otherwise, connector implementors will not know that they need to declare that a given behavior is supported, and it will not be tested.
Concrete example: PostgreSQL supports modulo pushdown, and automation should remind us to declare that in the PostgreSQL connector test.
Added:
if (!hasBehavior(SUPPORTS_PREDICATE_ARITHMETIC_EXPRESSION_PUSHDOWN)) {
    assertThat(query("SELECT shippriority FROM orders WHERE shippriority % 4 = 0")).isNotFullyPushedDown(FilterNode.class);
    return;
}
But I did not fully understand which node to pass. I used FilterNode for now; can you please clarify whether that is correct?
The easiest way is to change to isFullyPushedDown, see what the error shows the plan looks like, and add the node (or sub-tree) just above TableScan as the argument to isNotFullyPushedDown.
E.g. if a connector doesn't support LIMIT pushdown but supports aggregation pushdown, then a query like
SELECT regionkey, sum(nationkey) FROM (SELECT * FROM nation WHERE regionkey < 2 LIMIT 11) GROUP BY regionkey
will have a plan like:
... LimitNode ... -> TableScanNode
That can be represented as isNotFullyPushedDown(LimitNode.class) or, to be more precise, isNotFullyPushedDown(node(LimitNode.class, anyTree(node(TableScanNode.class)))) (i.e. a LimitNode followed by 1 or more nodes followed by a TableScanNode at the end).
Quoting: "The easiest way is to change to isFullyPushedDown, see what the error shows the plan looks like and add the node (or sub-tree) just above TableScan as argument to isNotFullyPushedDown"
I ran this test and see that we have ScanFilter in the plan, so it looks like isNotFullyPushedDown(FilterNode.class); is correct.
I cannot find an exact ScanFilter.class node. Can I assume that FilterNode is the same here?
Yes, FilterNode is correct here.
ScanFilter = FilterNode + TableScanNode
ScanFilterProject = ProjectNode + FilterNode + TableScanNode.
if (!hasBehavior(SUPPORTS_PREDICATE_MODULO_PUSHDOWN)) {
    return;
}
//modulo over bigint
style: we put a space after the // comment start
removed
(force-push: e68e5dc to 9198aad)
(force-push: 1fba516 to 6b6e75b)
@@ -46,6 +47,13 @@ public Optional<String> rewrite(Constant constant, Captures captures, RewriteCon
         if (slice == null) {
             return Optional.empty();
         }
+
+        boolean isAscii = CharMatcher.ascii().matchesAllOf(slice.toStringUtf8());
We are constructing a String from the Slice twice here. Let's do it once instead.
Also, we should invert the logic: let's test whether there is at least one non-ASCII character.
fixed
@@ -48,12 +48,13 @@ public Optional<String> rewrite(Constant constant, Captures captures, RewriteCon
             return Optional.empty();
         }
 
-        boolean isAscii = CharMatcher.ascii().matchesAllOf(slice.toStringUtf8());
+        String sliceUtf8String = slice.toStringUtf8();
+        boolean isNonAscii = !CharMatcher.ascii().matchesAllOf(sliceUtf8String);
CharMatcher.ascii().negate().precomputed().matchesAnyOf(sliceUtf8String)
Move the CharMatcher instance to a constant.
And let's rename it to isUnicodeString
fixed
I looked at the implementation, and it looks like it does not work in the more performant way we expected (stopping and returning as soon as it finds a non-ASCII char); instead it just uses the negation of match-all :(
There is a static Slice.isAscii method. Let's use that.
Sorry, I can't find it. I see only the non-static method slice.toStringAscii().
Maybe CharUtils.isAscii from commons-lang?
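The early-exit behaviour the thread is looking for can be sketched without any library helper, assuming the raw UTF-8 bytes of the constant are available. The class and method names below are hypothetical and are not part of Trino, airlift or Guava; this is only an illustration of the idea, not the code that was merged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class AsciiCheck
{
    private AsciiCheck() {}

    // Returns true as soon as a byte with the high bit set is found.
    // In UTF-8 every non-ASCII code point is encoded using only bytes >= 0x80,
    // so scanning the raw bytes is sufficient.
    static boolean containsNonAscii(byte[] utf8)
    {
        for (byte b : utf8) {
            if (b < 0) { // byte is signed in Java, so 0x80..0xFF are negative
                return true;
            }
        }
        return false;
    }

    // Convenience overload for callers that already hold a String.
    static boolean containsNonAscii(String value)
    {
        return containsNonAscii(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

With this shape the check stops at the first non-ASCII byte instead of visiting the whole value.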
if (isUnicodeString) {
    return Optional.of("N'" + sliceUtf8String.replace("'", "''") + "'");
}

return Optional.of("'" + sliceUtf8String.replace("'", "''") + "'");
this shouldn't be in trino-base-jdbc - it's specific to SQL Server.
It looks like it's part of the SQL-92 standard. SQL Server requires it, while other databases (MySQL, Postgres, DB2, etc.) support it but do not require it. So should we move only the SQL Server part?
Is the N string interpretation the same across databases?
This is what I found about the N prefix:
The "N" prefix stands for National Language in the SQL-92 standard, and is used for representing Unicode characters
While most databases do not need the added N prefix when using Unicode data, in SQL Server you must precede all Unicode strings with a prefix N when dealing with Unicode string constants.
All other supported backends work with or without the N prefix - the N prefix is not required in those environments but does work if used.
Not all databases / query engines support the N prefix. For example, does Trino support it?
Supporting it is one thing, but the bigger question is whether escapes are handled the same way across databases.
From an older offline discussion, my conclusion was that it might make more sense to make ConnectorExpression pushdown use prepared statements, which would make handling this (and other cases) easier.
If we are able to identify cases where the N behaviour differs across databases, then IMO we should remove all non-arithmetic pushdowns from this PR and revisit them when we have better mechanisms to support them.
So maybe for now we can move this N support only to SQL Server, as you suggested previously?
@hashhar Today I tried to implement the approach where we remove all non-arithmetic pushdowns, but the problem is that arithmetic pushdown works only if we have a rule like this:
.add(new RewriteComparison(ImmutableSet.of(RewriteComparison.ComparisonOperator.EQUAL, RewriteComparison.ComparisonOperator.NOT_EQUAL)))
But if we add such a rule, equal/not-equal pushdown starts working even for simple cases with varchars, like this failing one:
WHERE variable = 'łąka for the win'
So it looks like we need this UnicodeConstantRewrite rule if we want arithmetic pushdown, but we can keep everything in SQL Server only.
Let's keep it only in SQL Server for now in that case, until we have better ways to implement connector expression to SQL conversion plus more test coverage.
The problem seems to be that there is no way to "prevent a connector expression pushdown"; we can only add more rules. I.e. we cannot say "allow RewriteComparison, but only if the operands are integer types" (we can, but it would require writing an entirely new rule).
Quoting: "The problem seems to be that there is no way to "prevent a connector expression pushdown" - we can only add more rules. i.e. we cannot say allow RewriteComparison but only if operands are integer types (we can but it'd require writing entirely new rule)."
Exactly. I tried to do that, but with no luck.
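To make the "entirely new rule" point concrete, here is a toy model of a rule list in which a comparison rule declines unless the operands are integer-typed. Everything in it (Comparison, Rule, INTEGER_COMPARISON, rewrite) is a hypothetical stand-in written for this illustration. It does not use or mirror Trino's actual rewrite-rule API, and it says nothing about how hard the equivalent change is in the real codebase.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Toy model only; none of these types exist in Trino.
final class TypeGuardedRules
{
    private TypeGuardedRules() {}

    // Hypothetical stand-in for a comparison expression with already rendered operands.
    record Comparison(String operator, String operandType, String left, String right) {}

    // A rule either produces SQL text or declines with Optional.empty().
    interface Rule extends Function<Comparison, Optional<String>> {}

    private static final Set<String> INTEGER_TYPES = Set.of("tinyint", "smallint", "integer", "bigint");

    // Rewrites = and <> only when the operands are integer-typed; declines otherwise.
    static final Rule INTEGER_COMPARISON = comparison -> {
        if (!INTEGER_TYPES.contains(comparison.operandType())) {
            return Optional.empty();
        }
        return switch (comparison.operator()) {
            case "=", "<>" -> Optional.of(
                    "(" + comparison.left() + ") " + comparison.operator() + " (" + comparison.right() + ")");
            default -> Optional.empty();
        };
    };

    // The first rule that accepts wins; an empty result means "do not push down".
    static Optional<String> rewrite(List<Rule> rules, Comparison comparison)
    {
        for (Rule rule : rules) {
            Optional<String> sql = rule.apply(comparison);
            if (sql.isPresent()) {
                return sql;
            }
        }
        return Optional.empty();
    }
}
```

In this model a varchar comparison such as the failing `variable = 'łąka for the win'` case would simply not match any rule, which is the behaviour the thread could not express by configuring the existing rule alone.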
@hashhar @wendigo @findepi @ssheikin Can you please resume the review? @vlad-lyutenko says that all your concerns were addressed.
Happy to. From my perspective it would be helpful to squash the commits, as I'll read it anew.
@vlad-lyutenko It is also good practice to make sure that CI is happy with the changes you provide. It could be the case that there is a serious bug whose fix would require a big refactor.
(force-push: 9b86729 to 91527c1)
Commits squashed.
(force-push: 8b7411f to 4924e3c)
assertThat(query("SELECT shippriority FROM orders WHERE shippriority % 4 = 0")).isFullyPushedDown();

assertThat(query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % nationkey = 2"))
        .isFullyPushedDown()
        .matches("VALUES (BIGINT '3', CAST('CANADA' AS varchar(25)), BIGINT '1')");

// some databases calculate remainder instead of modulus when one of the values is negative
assertThat(query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % -nationkey = 2"))
        .isFullyPushedDown()
        .matches("VALUES (BIGINT '3', CAST('CANADA' AS varchar(25)), BIGINT '1')");

assertThatThrownBy(() -> query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % 0 = 2"))
        .hasMessageContaining("by zero");

// Expression that evaluates to 0 for some rows on RHS of modulus
assertThatThrownBy(() -> query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % (regionkey - 1) = 2"))
        .hasMessageContaining("by zero");
nit: I'd probably extract these to a separate method (not @Test) so that it's easy to group related things together and for other connectors to override easily if needed.
Also add a TODO to add coverage for other arithmetic pushdowns, and create an issue.
Issue #14808 created and TODO added, but I didn't get the idea of the method extraction.
Add a protected static void moduloPushdownTestCases() { ... }, move the modulo-related assertions to that method, and call that method from this test.
That way subclasses can override just modulo pushdown if needed. But maybe it's premature optimization, so feel free to ignore for now.
(force-push: 73e0445 to 47acd02)
Created new flaky test issue
{
    // because we have arithmetic pushdown, the exception now comes from SQL Server rather than from Trino,
    // so the error message will be different
    assertThatThrownBy(() -> getQueryRunner().execute("SELECT * FROM nation WHERE regionKey / nationKey - 1 = 0"))
Pass a session in the base test to disable the predicate pushdown. Then you don't need to override the test.
done
@@ -406,13 +406,6 @@ public void testShowCreateTable()
                 ")");
     }

-    @Override
-    public void testDeleteWithLike()
❤️
(force-push: 47acd02 to a0ab098)
LGTM % comments.
...n/trino-sqlserver/src/main/java/io/trino/plugin/sqlserver/RewriteUnicodeVarcharConstant.java
Slice slice = (Slice) constant.getValue();
if (slice == null) {
    return Optional.empty();
}
A stupid question maybe, but won't this be caught by the null check above?
To be honest, it was taken from the RewriteVarcharConstant rule, and I was not brave enough to touch this code.
Secondary check seems redundant 😄
(force-push: a0ab098 to 1c4a1fe)
Oh, also a reminder to squash commits before merging, since all of them depend on each other and it's a single logical change.
(force-push: 1c4a1fe to ebf34d6)
Test added in BaseJdbcConnectorTest and can be used in any connector using hasBehavior override
(force-push: ebf34d6 to fb0d057)
if (isUnicodeString) {
    return Optional.of("N'" + sliceUtf8String.replace("'", "''") + "'");
}

return Optional.of("'" + sliceUtf8String.replace("'", "''") + "'");
What happens if we unconditionally use N'<literal>'? Seems like it'll still be valid, and there is no need to bother with CharMatcher, which can be slow when matching larger values.
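The unconditional variant suggested here could look like the sketch below. The class and method names are hypothetical, and the quote escaping is copied from the snippet under review. Whether always emitting the N prefix behaves identically on the SQL Server side is the open question of this comment and is not settled by the sketch.

```java
// Hypothetical helper, for illustration only; not the code that was merged.
final class NationalLiteral
{
    private NationalLiteral() {}

    // Always emits an N-prefixed literal, doubling single quotes as in the reviewed code.
    static String toSqlServerLiteral(String value)
    {
        return "N'" + value.replace("'", "''") + "'";
    }
}
```

This removes the per-value character scan entirely, at the cost of sending the N prefix for plain ASCII values as well.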
LGTM
Description
Adds support for SQL Server predicate modulo pushdown.
Also adds a test in BaseJdbcConnectorTest, which can be used in any connector by overriding the hasBehavior method.
Non-technical explanation
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: