Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement more connector expression pushdowns in SQL Server #14570

Merged

Conversation

vlad-lyutenko
Copy link
Contributor

Description

Adding support for SqlServer predicate modulo pushdown,
Also added test in BaseJdbcConnectorTest, which can be used by overriding hasBehavior
method in any connector

Non-technical explanation

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 11, 2022
@vlad-lyutenko
Copy link
Contributor Author

vlad-lyutenko commented Oct 11, 2022

Initial work on predicate modulo push down support,
to discuss:

  • should we enable modulo push down tests for other connectors like Postgres in this PR?
  • what other connectors most require this changes and should I refactor them in this PR?
    e.t.c

Copy link
Contributor

@ssheikin ssheikin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, however my opinion is not authoritative enough here.

@Test
public void testPredicateModuloPushdown()
{
if (!hasBehavior(SUPPORTS_PREDICATE_MODULO_PUSHDOWN)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to high number for connector behaviors, it's imperative that these declarations are self-tested. I.e. both "supports" and "does not support" should be verified. Otherwise, connector implementors will not know that they need to declare that give behavior is supported and it will not be tested.

Concrete example: PostgreSQL supports modulo pushdown, and automation should remind us to declare that in the PostgreSQL's connector test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

   if (!hasBehavior(SUPPORTS_PREDICATE_ARITHMETIC_EXPRESSION_PUSHDOWN)) {
            assertThat(query("SELECT shippriority FROM orders WHERE shippriority % 4 = 0")).isNotFullyPushedDown(FilterNode.class);
            return;
        }

But didn't get fully what should be the node, I took for now FilterNode,
but can you please clarify is it correct way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way is to change to isFullyPushedDown, see what the error shows the plan looks like and add the node (or sub-tree) just above TableScan as argument to isNotFullyPushedDown.

e.g. if connector doesn't support LIMIT pushdown but supports aggregation pushdown then a query like SELECT regionkey, sum(nationkey) FROM (SELECT * FROM nation WHERE regionkey < 2 LIMIT 11) GROUP BY regionkey will have plan like:

... LimitNode ... -> TableScanNode

That can be represented as isNotFullyPushedDown(LimitNode.class) or to be more precise isNotFullyPushedDown(node(LimitNode.class, anyTree(node(TableScanNode.class)))) (i.e. a LimitNode followed by 1 or more nodes followed by a TableScanNode at the end).

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way is to change to isFullyPushedDown, see what the error shows the plan looks like and add the node (or sub-tree) just above TableScan as argument to isNotFullyPushedDown

I made this test and see that we have ScanFilter in plan
So looks like isNotFullyPushedDown(FilterNode.class); is correct

I cannot find exact ScanFilter.class node, could I assume that FilterNode is the same here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, FilterNode is correct here.

ScanFilter = FilterNode + TableScanNode
ScanFilterProject = ProjectNode + FilterNode + TableScanNode.

if (!hasBehavior(SUPPORTS_PREDICATE_MODULO_PUSHDOWN)) {
return;
}
//modulo over bigint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: we put a space after // comment start

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch 3 times, most recently from e68e5dc to 9198aad Compare October 14, 2022 11:20
@hashhar hashhar changed the title SqlServer predicate modulo pushdown support Implement more connector expression pushdowns in SQL Server Oct 17, 2022
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from 1fba516 to 6b6e75b Compare October 18, 2022 12:07
@@ -46,6 +47,13 @@ public Optional<String> rewrite(Constant constant, Captures captures, RewriteCon
if (slice == null) {
return Optional.empty();
}

boolean isAscii = CharMatcher.ascii().matchesAllOf(slice.toStringUtf8());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are constructing String from Slice twice here. Let's do it once instead.

Also we should invert the logic - let's test if there is at least single non ascii character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -48,12 +48,13 @@ public Optional<String> rewrite(Constant constant, Captures captures, RewriteCon
return Optional.empty();
}

boolean isAscii = CharMatcher.ascii().matchesAllOf(slice.toStringUtf8());
String sliceUtf8String = slice.toStringUtf8();
boolean isNonAscii = !CharMatcher.ascii().matchesAllOf(sliceUtf8String);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CharMatcher.ascii().negate().precomputed().matchesAnyOf(sliceUtf8String)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move CharMatcher instance to a const.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's rename it to isUnicodeString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at implementation, and looks like it's not working as we expect in more performance way( like as soon it will find non ascii char, it stops processing and return), instead it just use negation of match all :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Slice.isAscii static method. Let's use that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can't find it, I see only nonstatic method slice.toStringAscii()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe CharUtils.isAscii ? from commons-lang

Comment on lines 55 to 59
if (isUnicodeString) {
return Optional.of("N'" + sliceUtf8String.replace("'", "''") + "'");
}

return Optional.of("'" + sliceUtf8String.replace("'", "''") + "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be in trino-base-jdbc - it's specific to SQL Server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's part of SQL-92 standard and SQL server is required to use it, but all other support it (but not required), like MySQL, Postgres, DB-2 e.t.c, so should we move only for SQL Server part?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the N string interpretation same across databases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This what I found about N prefix:
The "N" prefix stands for National Language in the SQL-92 standard, and is used for representing Unicode characters
While most databases do not need the added N prefix when using Unicode data, in SQL Server you must precede all Unicode strings with a prefix N when dealing with Unicode string constants.

All other supported backends work with or without the N prefix - the N prefix is not required in those environments but does work if used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all databases / query engines support N prefix. For example, does Trino support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting it is one thing - but the bigger question is do escapes get handled the same across databases.

From an older offline discussion my conclusion was it might make more sense to make ConnectorExpression pushdown use prepared statements which would make handling this (and other cases) easier.

In case we are able to identify cases where the N behaviour differs across databases then IMO we should remove all non-arithmetic pushdowns from this PR and revisit them when we have better mechanisms to support them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we can move for now this N support as you suggested previously only to SQL Server?

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar I tried implement today the path, when we remove all non-arithmetic pushdowns
but the problem is, arithmetics push down will work only if we have such rule:

.add(new RewriteComparison(ImmutableSet.of(RewriteComparison.ComparisonOperator.EQUAL, RewriteComparison.ComparisonOperator.NOT_EQUAL)))

But if add such rule, equal/not equal push down start work even for simple cases with varchars like failing one:
WHERE variable = 'łąka for the win'

So looks like we need this UnicodeConstantRewrite rule, if we want arithmetics pushdown, but we can keep everything only in SQL Server

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it only in SQL Server for now in that case until we have better ways to implement connector expression to SQL conversion + more test coverage.

The problem seems to be that there is no way to "prevent a connector expression pushdown" - we can only add more rules. i.e. we cannot say allow RewriteComparision but only if operands are integer types (we can but it'd require writing entirely new rule).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem seems to be that there is no way to "prevent a connector expression pushdown" - we can only add more rules. i.e. we cannot say allow RewriteComparision but only if operands are integer types (we can but it'd require writing entirely new rule).

exactly, I tried to made it, but with no luck

@kokosing
Copy link
Member

@hashhar @wendigo @findepi @ssheikin Can you please resume the review? @vlad-lyutenko says that all yours concerns were addressed.

@findepi
Copy link
Member

findepi commented Oct 26, 2022

@hashhar @wendigo @findepi @ssheikin Can you please resume the review? @vlad-lyutenko says that all yours concerns were addressed.

Happy to.

from my perspective it would be helpful to squash the commits, as I'll read anew.
For example, i still see Support predicate modulo pushdown for SqlServer commit, even though I think we wanted arithmetics as a whole, not singling out the modulo (which is likely least interesting anyway).

@kokosing
Copy link
Member

@vlad-lyutenko It is also a good practice to make sure that CI is happy with changes you provide. It could be a case that you have a serious bug that may affect in big refactor.
If CI issues are because of flakiness, please report flaky tests as github issues or add a comment to existing issues. If you have some time, consider to fix flaky issues too (as separate PR :)).
:)

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from 9b86729 to 91527c1 Compare October 26, 2022 10:42
@vlad-lyutenko
Copy link
Contributor Author

@hashhar @wendigo @findepi @ssheikin Can you please resume the review? @vlad-lyutenko says that all yours concerns were addressed.

Happy to.

from my perspective it would be helpful to squash the commits, as I'll read anew. For example, i still see Support predicate modulo pushdown for SqlServer commit, even though I think we wanted arithmetics as a whole, not singling out the modulo (which is likely least interesting anyway).

commits squashed

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from 8b7411f to 4924e3c Compare October 26, 2022 15:25
Comment on lines +1018 to +1034
assertThat(query("SELECT shippriority FROM orders WHERE shippriority % 4 = 0")).isFullyPushedDown();

assertThat(query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % nationkey = 2"))
.isFullyPushedDown()
.matches("VALUES (BIGINT '3', CAST('CANADA' AS varchar(25)), BIGINT '1')");

// some databases calculate remainder instead of modulus when one of the values is negative
assertThat(query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % -nationkey = 2"))
.isFullyPushedDown()
.matches("VALUES (BIGINT '3', CAST('CANADA' AS varchar(25)), BIGINT '1')");

assertThatThrownBy(() -> query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % 0 = 2"))
.hasMessageContaining("by zero");

// Expression that evaluates to 0 for some rows on RHS of modulus
assertThatThrownBy(() -> query("SELECT nationkey, name, regionkey FROM nation WHERE nationkey > 0 AND (nationkey - regionkey) % (regionkey - 1) = 2"))
.hasMessageContaining("by zero");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd probably extract these to a separate method (not @Test) so that it's easy to group related things togethers and for other connectors to override easily if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add TODO to add coverage for other arithmetic pushdowns + create a issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created #14808, TODO added,
but I didn't got idea with method extraction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a protected static void moduloPushdownTestCases() { ... } and move the modulo related assertions to that method and call that method from this test.

That way subclasses can override just modulo pushdown if needed. But maybe it's premature optimization - so feel free to ignore for now.

Comment on lines 55 to 59
if (isUnicodeString) {
return Optional.of("N'" + sliceUtf8String.replace("'", "''") + "'");
}

return Optional.of("'" + sliceUtf8String.replace("'", "''") + "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting it is one thing - but the bigger question is do escapes get handled the same across databases.

From an older offline discussion my conclusion was it might make more sense to make ConnectorExpression pushdown use prepared statements which would make handling this (and other cases) easier.

In case we are able to identify cases where the N behaviour differs across databases then IMO we should remove all non-arithmetic pushdowns from this PR and revisit them when we have better mechanisms to support them.

@vlad-lyutenko
Copy link
Contributor Author

vlad-lyutenko commented Oct 28, 2022

Created new flaky test issue
#14814

{
// because we have arithmetic push down, now we will get exception not on trino side, but from sql server,
// so error message will be different
assertThatThrownBy(() -> getQueryRunner().execute("SELECT * FROM nation WHERE regionKey / nationKey - 1 = 0"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass a session in a base test to disable the predicate pushdown. Then you don't need to override the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -406,13 +406,6 @@ public void testShowCreateTable()
")");
}

@Override
public void testDeleteWithLike()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from 47acd02 to a0ab098 Compare October 28, 2022 10:33
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments.

Comment on lines +47 to +50
Slice slice = (Slice) constant.getValue();
if (slice == null) {
return Optional.empty();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A stupid question maybe but won't this be caught by the null check above?

Copy link
Member

@hashhar hashhar Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The topmost check was added in 15839ea.

The cast check was added in 2c5ce56.

I don't understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest it taken from RewriteVarcharConstant rule, and I was not brave enough to touch this code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a question for @findepi and @wendigo to answer. No change requested here - it's pre-existing code. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secondary check seems redundant 😄

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from a0ab098 to 1c4a1fe Compare October 28, 2022 11:04
@hashhar
Copy link
Member

hashhar commented Oct 28, 2022

Oh, also reminder to squash commits before merging since all of them depend on each other and it's single logical change.

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from 1c4a1fe to ebf34d6 Compare October 28, 2022 13:03
Test added in BaseJdbcConnectorTest and can be used in
any connector using hasBehavior override
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/sqlserver-modulo-pushdown branch from ebf34d6 to fb0d057 Compare October 31, 2022 10:53
Comment on lines +55 to +59
if (isUnicodeString) {
return Optional.of("N'" + sliceUtf8String.replace("'", "''") + "'");
}

return Optional.of("'" + sliceUtf8String.replace("'", "''") + "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if we unconditionally use N'<literal>'? Seems like it'll still be valid - no need to bother with CharMatcher which can be slow when matching larger values.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hashhar hashhar merged commit 5c65ffd into trinodb:master Oct 31, 2022
@github-actions github-actions bot added this to the 402 milestone Oct 31, 2022
@hashhar hashhar mentioned this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants