Fix case insensitive predicate pushdown for MySQL, MemSQL and SQL Server #6753
LGTM % "Allow to reuse few testing servers" commit.
Reusing containers speeds up tests but we need to be sure that all existing tests clean up properly after themselves to avoid tests masking each other's failures. The most common thing I can think of is GRANTing permissions to some users.
The comments I've added are for intermediate commits which get removed in future commits and as such they are only nits.
@@ -120,6 +120,11 @@ protected TestTable createTableWithDefaultColumns()
            return Optional.empty();
        }

        if (typeName.contains("char(1)")) {
            // https://github.com/trinodb/trino/issues/6746
Please add the same link to the varchar branch above.
No. The varchar case above does not use case-insensitive data, so it fails for the reason given in the TODO.
I did not understand the explanation. Do you mean that MemSQL's varchar(1) is case sensitive (by default), while char(1) is not (by default)?
No. The varchar case above (which is not varchar(1)) is testing UTF-8.
@@ -139,6 +139,16 @@ protected boolean isColumnNameRejected(Exception exception, String columnName, b
            return Optional.empty();
        }

        if (typeName.equals("char(1)")) {
Can these be merged into a single if, since the cause is the same?
It could be, but I fix them one by one in separate commits.
@@ -103,6 +103,11 @@ protected TestTable createTableWithDefaultColumns()
            return Optional.empty();
        }

        if (typeName.contains("char(1)")) {
Why is varchar not skipped here?
It is. Notice contains.
Oh - I did not notice that. Please change to equals || equals :)
Or - actually - NVM - you get rid of that anyway.
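For readers following this thread, here is a minimal standalone sketch of the point about contains. It is not code from the PR: the class name, the method names and the extra type names (nchar(1), char(10)) are hypothetical and only illustrate how a substring check differs from an explicit equals || equals check.

```java
final class TypeNameMatchDemo
{
    private TypeNameMatchDemo() {}

    // Substring match: "varchar(1)" contains "char(1)", so varchar(1) is caught too,
    // and so is any other type name that happens to end the same way.
    static boolean matchesWithContains(String typeName)
    {
        return typeName.contains("char(1)");
    }

    // Explicit match: only the two type names spelled out here are caught.
    static boolean matchesWithEquals(String typeName)
    {
        return typeName.equals("char(1)") || typeName.equals("varchar(1)");
    }

    public static void main(String[] args)
    {
        // Hypothetical type names, chosen to show where the two checks disagree.
        for (String typeName : new String[] {"char(1)", "varchar(1)", "nchar(1)", "char(10)"}) {
            System.out.println(typeName
                    + ": contains=" + matchesWithContains(typeName)
                    + ", equals=" + matchesWithEquals(typeName));
        }
    }
}
```

The two checks agree on char(1), varchar(1) and char(10), and differ only for a name such as nchar(1), which is why the explicit form is easier to read at a glance.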
Fixes up to "Close testing servers after QueryRunner" look good. Consider extracting them into a separate PR so we can merge them already.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/PredicatePushdownController.java
        DomainPushdownResult disabledPredicatePushdown = new DomainPushdownResult(Domain.all(domain.getType()), domain);

        if (!domain.getValues().isDiscreteSet() && domain.getValues().complement().isDiscreteSet()) {
This is already covered by the for below. Right?
No. The loop below handles ranges like (-inf, value) or (value, +inf).
The one below handles any case where we have a range which looks like ..., x) or (x,... which is also true for the case covered by this if. Unless I am missing something.
@kokosing I very much want to review this. Currently this is 14 commits, though. Also, for the benefit of everyone involved here, can you please add a description to the PR?
BTW please fix the names: MySQL, MemSQL, SQL Server.
The commits regarding predicate pushdown changes are mostly good. Minor comments. Thanks.
I did not review the changes regarding container reusability; I have no context on that.
public final class TestContainers
{
    // Please turn it on locally so container will survive JVM reboot.
    private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
Boolean.getBoolean
Boolean.getBoolean reads a system property, while I used an environment variable, like testcontainers does.
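A small standalone sketch of the difference discussed here, for illustration only. The class name and the parseFlag helper are hypothetical; the variable name is the one quoted in the diff above. It shows that System.getenv reads the process environment, whereas Boolean.getBoolean looks up a JVM system property and never consults the environment.

```java
final class ReuseFlagDemo
{
    private ReuseFlagDemo() {}

    // Same parsing as the check quoted in the review: a null (unset) value yields false.
    static boolean parseFlag(String value)
    {
        return "true".equalsIgnoreCase(value);
    }

    public static void main(String[] args)
    {
        String name = "TESTCONTAINERS_REUSE_ENABLE";

        // Reads the process environment, e.g. after `export TESTCONTAINERS_REUSE_ENABLE=true`.
        boolean fromEnvironment = parseFlag(System.getenv(name));

        // Reads a JVM system property, e.g. `-DTESTCONTAINERS_REUSE_ENABLE=true`.
        // Setting only the environment variable leaves this false.
        boolean fromSystemProperty = Boolean.getBoolean(name);

        System.out.println("environment variable: " + fromEnvironment);
        System.out.println("system property: " + fromSystemProperty);
    }
}
```

With only the environment variable exported, the first line prints true and the second prints false, which is why the two calls are not interchangeable here.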
public final class TestContainers
{
    // Please turn it on locally so container will survive JVM reboot.
    private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
REUSE_ENABLE -> ENABLE_REUSE or REUSE_ENABLED
Sorry, I cannot change this. This environment variable is defined by testcontainers. I am just verifying that it works as expected. It also matches the testcontainers config property testcontainers.reuse.enable.
6b59231 to 0a2908a
LGTM
mind the red (I did not check if related)
@@ -290,18 +291,21 @@ protected String getTableSchemaName(ResultSet resultSet)
                 return Optional.of(decimalColumnMapping(createDecimalType(precision, max(decimalDigits, 0))));

             case Types.CHAR:
-                return Optional.of(defaultCharColumnMapping(typeHandle.getRequiredColumnSize()));
+                int columnSize = typeHandle.getRequiredColumnSize();
extract method as you did for LONGNVARCHAR
The extracted method is reused. I would prefer to keep it as it is.
0a2908a to 15860e5
@findepi Rebased. 🤯 I can't believe my eyes. Postgres fails the tests.
Postgres is case sensitive but it has different collation. Should
More tests:
This is covered by #3645
15860e5 to 0647256
@@ -30,7 +30,7 @@
         }
         return new DomainPushdownResult(domain, Domain.all(domain.getType()));
     };
-    PredicatePushdownController PUSHDOWN_AND_KEEP = (session, domain) -> new DomainPushdownResult(
+    PredicatePushdownController CASE_INSENSITIVE_PUSHDOWN = (session, domain) -> new DomainPushdownResult(
Rename PredicatePushdownController to match semantic
This commit does not rename PredicatePushdownController
The PredicatePushdownController interface is type-agnostic and "case-insensitive" is applicable to char/varchar types only, so maybe:
CASE_INSENSITIVE_PUSHDOWN -> e.g. CASE_INSENSITIVE_CHARACTER_PUSHDOWN
@@ -82,11 +82,11 @@
import static io.trino.plugin.jdbc.PredicatePushdownController.FULL_PUSHDOWN;
import static io.trino.plugin.jdbc.StandardColumnMappings.bigintColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.bigintWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.charReadFunction;
Case insensitive predicate pushdown for char type in MySQL
Can you please make it a sentence?
@@ -120,11 +120,6 @@ protected TestTable createTableWithDefaultColumns()
            return Optional.empty();
        }

        if (typeName.contains("char(1)")) {
With the TODO removals being scattered across commits, it's harder to see how many of them remained.
Also, the code in all these connectors looks very repetitive, so I think you can safely squash up the commit adding tests and commits fixing the problem across connectors, without any downside on the reviewability.
0647256 to d501959
                    domain.simplify(getDomainCompactionThreshold(session)),
                    domain);
        }
        return new DomainPushdownResult(Domain.all(domain.getType()), domain);
That requires field reordering.
        if (typeName.equals("time")
                || typeName.equals("timestamp(3) with time zone")) {
            return Optional.of(dataMappingTestSetup.asUnsupported());
bad rebase
d501959 to 61d5d63
Looks good overall.
Can you please apply the changes as fixups?
61d5d63 to 58ab40b
@findepi PTAL
@@ -1203,23 +1203,23 @@ public void testDataMappingSmokeTest(DataMappingTestSetup dataMappingTestSetup)

         // without pushdown, i.e. test read data mapping
         assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NULL", "VALUES 'null value'");
-        assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NOT NULL", "VALUES ('sample value'), ('high value')");
+        assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NOT NULL", "VALUES 'sample value', 'high value'");
"Simplify testDataMappingSmokeTest" can be extracted and merged already.
    @Override
    public void testCaseSensitiveDataMapping(DataMappingTestSetup dataMappingTestSetup)
    {
        throw new SkipException("https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino");
Can you please
assertThatThrownBy(super...)
        .hasMessage("something something");
before skipping?
2e37870 to 80d11f4
In case-insensitive predicate pushdown, consider the values 'A' and 'a'. Both are equal in the data source, but different in Trino. It was safe to push down `x = 'a'` or `x = 'A'`, as both values were always returned and we could filter them out in Trino. But operators like `!=`, `<` or `>` may return no value at all, which is incorrect, as Trino is case sensitive only today. Consider also the case of `B` and `a`. In Trino, `B` is lower than `a`, but if you compare them in a remote data source which is case insensitive, then `B` or `b` is higher than `A` or `a`. Considering the above, we can push down a predicate on a case-insensitive column only when the `=` operator is used.
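The reasoning in this commit message can be checked with a small standalone model. This is only a sketch under stated assumptions, not Trino code: the class, the ROWS list and the safeToPushDown helper are hypothetical, Java's String.equals/compareTo stand in for Trino's case-sensitive semantics, and equalsIgnoreCase/compareToIgnoreCase stand in for a case-insensitive remote column. A pushdown is treated as safe when the remote filter keeps every row Trino itself would keep, since extra rows can still be filtered out in Trino.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class CaseInsensitivePushdownDemo
{
    private CaseInsensitivePushdownDemo() {}

    // Hypothetical column contents in the remote data source.
    static final List<String> ROWS = List.of("A", "a", "B", "b");

    static List<String> filter(Predicate<String> predicate)
    {
        return ROWS.stream().filter(predicate).collect(Collectors.toList());
    }

    // Safe when the remote result is a superset of what Trino expects:
    // extra rows are removed by re-applying the predicate in Trino, missing rows are lost.
    static boolean safeToPushDown(Predicate<String> trino, Predicate<String> remote)
    {
        return filter(remote).containsAll(filter(trino));
    }

    public static void main(String[] args)
    {
        // x = 'a': remote returns A and a, Trino keeps a.
        System.out.println("=  safe: " + safeToPushDown(x -> x.equals("a"), x -> x.equalsIgnoreCase("a")));
        // x != 'a': Trino expects A, B, b but the remote drops A.
        System.out.println("!= safe: " + safeToPushDown(x -> !x.equals("a"), x -> !x.equalsIgnoreCase("a")));
        // x < 'a': Trino expects A and B but the remote returns nothing.
        System.out.println("<  safe: " + safeToPushDown(x -> x.compareTo("a") < 0, x -> x.compareToIgnoreCase("a") < 0));
    }
}
```

In this model only the `=` case reports true, which matches the conclusion above that equality is the only operator that can be pushed down for a case-insensitive column.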
80d11f4 to 9f04547
        }

        // case insensitive predicate pushdown could return incorrect results for operators like `!=`, `<` or `>`
        return DISABLE_PUSHDOWN.apply(session, domain);
As discussed offline, it may be reasonable to have a session toggle to unlock legacy unsafe, performant behavior -- especially for the cases where user knows it's safe (and we do not)
Sure, let me do it in a separate PR.
CI hit #7009
        try {
            super.testCaseSensitiveDataMapping(dataMappingTestSetup);
        }
        catch (AssertionError ignored) {
            // TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino
            assertThatThrownBy(() -> super.testCaseSensitiveDataMapping(dataMappingTestSetup))
                    .hasStackTraceContaining("not equal\nActual rows");
        }
This is a long version of saying
// TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino
as we do not validate whether the test passes or fails.
Also, this should throw SkipException instead of passing, so
// TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino
throw new SkipException("TODO");
Thanks, see: #7032
Fixes #6746
Fixes #6671