Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate IN predicate to connector expression #11396

Closed

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Mar 9, 2022

Description

Adds translation for InPredicate and InListExpression to connector expression form.

Is this change a fix, improvement, new feature, refactoring, or other?

New feature.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine & postgres connector.

How would you describe this change to a non-technical end user or system administrator?

This enabled pushdown of IN predicates which are not expressible with Domain (i.e. references other symbols or contains function calls)

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2022
@wendigo wendigo requested review from findepi, hashhar and assaf2 and removed request for findepi March 9, 2022 14:53
}

ConnectorExpression inListExpression = new Call(typeOf(node.getValueList()), StandardFunctions.IN_LIST_FUNCTION_NAME, values.build());
return Optional.of(new Call(typeOf(node), StandardFunctions.IN_PREDICATE_FUNCTION_NAME, List.of(valueExpression.get(), inListExpression)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think about efficiency of the representation, since IN list can be long and can be processed multiple times. This is the same problem as with large Domains, and with same performance challenges. See 82e8f93 and cc @raunaqmorarka

Now, I am aware that InListExpression doesn't have an efficient representation, but it can be evolved to have one. With ConnectorExpressions, such an evolution would be an SPI breaking change, so I'd suggest we consider performance in the first version already.

cc @martint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint any thoughts here?

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch 2 times, most recently from f886581 to b44ef48 Compare March 10, 2022 17:41
@wendigo wendigo requested a review from findepi March 10, 2022 18:01
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Translate IN predicate to connector expression" LGTM

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rewrite connector IN expression in PostgreSQL connector"

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from b44ef48 to d624b09 Compare March 15, 2022 12:25
@wendigo wendigo requested a review from findepi March 15, 2022 12:33
@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from d624b09 to 871d94c Compare March 15, 2022 18:57
@wendigo wendigo requested a review from findepi March 15, 2022 18:58
@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from 871d94c to 9138284 Compare March 16, 2022 11:25
@wendigo
Copy link
Contributor Author

wendigo commented Mar 17, 2022

@findepi PTAL

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch 2 times, most recently from 01875b6 to e629253 Compare March 24, 2022 13:28
@wendigo
Copy link
Contributor Author

wendigo commented Mar 24, 2022

Rebased & resolved conflicts. PTAL @findepi

@findepi
Copy link
Member

findepi commented Mar 24, 2022

@wendigo did the CI run? seems not. can you add empty / rebase?

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from 8a07a0a to 67c8132 Compare March 28, 2022 07:51
@wendigo
Copy link
Contributor Author

wendigo commented Mar 28, 2022

@findepi fixed, after merge checkstyle violation :(

@findepi
Copy link
Member

findepi commented Apr 1, 2022

AFAIU the "Improve NULL handling in IN predicates optimizations" commit prevents some problems/regressions that would be introduced by previous commits. is it correct?

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from dd7cbb3 to a22f682 Compare April 1, 2022 11:17
@wendigo wendigo requested a review from findepi April 1, 2022 11:18
@wendigo
Copy link
Contributor Author

wendigo commented Apr 1, 2022

All tests are passing. Seems in a good shape for next round of the review @findepi

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from ea916b7 to dd51415 Compare April 8, 2022 13:10
@wendigo
Copy link
Contributor Author

wendigo commented Apr 8, 2022

@findepi rebased and solved conflicts. please review

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help me understand the purpose and the mechanics of "Improve NULL handling in IN predicates optimizations" commit

@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch 2 times, most recently from 5114d72 to 666be07 Compare April 15, 2022 08:10
@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from 666be07 to 6336f75 Compare April 22, 2022 11:48
@wendigo wendigo requested a review from findepi April 22, 2022 11:51
@findepi findepi force-pushed the serafin/in-list-connector-pushdown branch from d9cd56b to 9933b83 Compare April 22, 2022 14:02
@wendigo wendigo force-pushed the serafin/in-list-connector-pushdown branch from 9933b83 to 94432ee Compare May 6, 2022 14:27
@wendigo
Copy link
Contributor Author

wendigo commented May 6, 2022

@findepi PTAL

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except of "Rewrite typed NULL char/varchar constants".

I'd suggest dropping this commit.

return addStandardRules(identifierQuote, type -> Optional.empty());
}

public JdbcConnectorExpressionRewriterBuilder addStandardRules(Function<String, String> identifierQuote, Function<Type, Optional<String>> typeMapping)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simple for PostgreSQL and varchar case, but generally this is tricky, since remote database and Trino have different type systems.

In case of

where x IN (a, b, c, NULL)

the NULL should be eliminated on the engine side (generic engine rule).

Do we have other cases where this is actually useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x IN (NULL) is always false, right? Even for NULL IN (NULL) so eliminating NULLs is safe?

also x IN () (empty list) is also false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x IN (NULL) is always false, right?

result is NULL

also x IN () (empty list) is also false

This is not allowed in SQL, but it it was allowed, it would be false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the NULL should be eliminated on the engine side (generic engine rule).

BTW such rule, if written, should go in separate PR.

in this PR please keep the tests exercising IN pushdown with NULLs (asserting that these doesn't get pushed down).

{
if (type instanceof VarcharType) {
VarcharType varcharType = (VarcharType) type;
return varcharType.getLength().map(length -> format("varchar(%d)", length)).or(() -> Optional.of("varchar"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trino bounded varchar can be of length 2147483646, PostgreSQL has a different limit

Comment on lines +41 to +42
.with(functionName().equalTo(CAST_FUNCTION_NAME))
.with(argument(0).capturedAs(ARGUMENT));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt

"3, 'c'",
"4, 'd'",
"5, 'f'"))) {
assertThat(query("SELECT id FROM " + table.getName() + " WHERE CAST(id AS VARCHAR(1)) = '2' OR id2 = 'd'"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trino and PostgreSQL have different CAST behavior for BIGINT -> varchar(1) cast.

trino> SELECT CAST(BIGINT '12' AS varchar(1));
Query 20220509_075440_00000_3ce4u failed: Value 12 cannot be represented as varchar(1)

PostgreSQL

SELECT CAST(BIGINT '12' AS varchar(1));
1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres ended up going the Hive way? Does the SQL spec say anything about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bitsondatadev
Copy link
Member

👋 @ping - this PR is inactive and doesn't seem to be under development, and it might already be implemented in #13136. If you'd like to continue work on this at any point in the future, feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants