Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix case insensitive predicate pushdown for MySQL, MemSQL and SQL Server #6753

Merged
merged 2 commits into from
Feb 23, 2021

Conversation

kokosing
Copy link
Member

@kokosing kokosing commented Jan 28, 2021

Fixes #6746
Fixes #6671

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % "Allow to reuse few testing servers" commit.

Reusing containers speeds up tests but we need to be sure that all existing tests clean up properly after themselves to avoid tests masking each other's failures. The most common thing I can think of is GRANTing permissions to some users.

The comments I've added are for intermediate commits which get removed in future commits and as such they are only nits.

@@ -120,6 +120,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
// https://github.com/trinodb/trino/issues/6746
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the same link to the varchar branch above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Varchar above do not use case insensitive data. So it fails for the reason in TODO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand the explanation. Do you mean that MemSQL's varchar(1) is case sensitive (by default), while char(1) is not (by default)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. varchar above (is not varchar(1)) is testing UTF-8

@@ -139,6 +139,16 @@ protected boolean isColumnNameRejected(Exception exception, String columnName, b
return Optional.empty();
}

if (typeName.equals("char(1)")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be merged into a single if since the cause is same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, but I fix them one by one in separate commits.

@@ -103,6 +103,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is varchar not skipped here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is. Notice contains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - I did not notice that. Please change to equals || equals :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or - actually - NVM - you get rid of that anway.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes up to Close testing servers after QueryRunner good. Consider extracting separete PR so we merge already.


DomainPushdownResult disabledPredicatePushdown = new DomainPushdownResult(Domain.all(domain.getType()), domain);

if (!domain.getValues().isDiscreteSet() && domain.getValues().complement().isDiscreteSet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already covered by the for below. Right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The below handles ranges like -inf, value or value, +inf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one below handles any case where we have range which looks like ..., x) or (x,.... Which is also true for case which is covered by this if. Unless I am missing something.

@findepi
Copy link
Member

findepi commented Jan 29, 2021

@kokosing i very much want to review this. Currently this is 14 commits though.
Is there any way to reduce scope, eg review some simple refactors separately? Or you're concerned those refactors do not make sense on its own, without subsequent changes?

Also, for benefit of everyone involved here, can you please add description to the PR?

@findepi
Copy link
Member

findepi commented Jan 29, 2021

MySql, MemSql and SqlServer

BTW please fix the names: MySQL, MemSQL, SQL Server.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits reagarding predicatate pushdown changes mostly good. Minor comments. Thanks.

I did not review the changes regarding container reusability. No context on that.

public final class TestContainers
{
// Please turn it on locally so container will survive JVM reboot.
private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean.getBoolean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean.getBoolean

This is using system property, while I used env variable, like testcontainers did.

public final class TestContainers
{
// Please turn it on locally so container will survive JVM reboot.
private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REUSE_ENABLE -> ENABLE_REUSE or REUSE_ENABLED

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I cannot change this. This env variable is defined in testcontainers. I am just verifying it if it works as expected. Also it matches testcontainers config property testcontainers.reuse.enable.

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 6b59231 to 0a2908a Compare January 30, 2021 23:08
Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi @losipiuk CA, PTALA

public final class TestContainers
{
// Please turn it on locally so container will survive JVM reboot.
private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I cannot change this. This env variable is defined in testcontainers. I am just verifying it if it works as expected. Also it matches testcontainers config property testcontainers.reuse.enable.

public final class TestContainers
{
// Please turn it on locally so container will survive JVM reboot.
private static final boolean TESTCONTAINERS_REUSE_ENABLE = "true".equalsIgnoreCase(getenv("TESTCONTAINERS_REUSE_ENABLE"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean.getBoolean

This is using system property, while I used env variable, like testcontainers did.

@@ -120,6 +120,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
// https://github.com/trinodb/trino/issues/6746
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Varchar above do not use case insensitive data. So it fails for the reason in TODO.

@@ -139,6 +139,16 @@ protected boolean isColumnNameRejected(Exception exception, String columnName, b
return Optional.empty();
}

if (typeName.equals("char(1)")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, but I fix them one by one in separate commits.

@@ -103,6 +103,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is. Notice contains.


DomainPushdownResult disabledPredicatePushdown = new DomainPushdownResult(Domain.all(domain.getType()), domain);

if (!domain.getValues().isDiscreteSet() && domain.getValues().complement().isDiscreteSet()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The below handles ranges like -inf, value or value, +inf

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@losipiuk
Copy link
Member

losipiuk commented Feb 1, 2021

mind the red (I did not check if related)

@findepi
Copy link
Member

findepi commented Feb 1, 2021

@findepi @losipiuk CA, PTALA

@kokosing would it work for you if we got #6759 in, rebased this & and then reviewed?

@@ -290,18 +291,21 @@ protected String getTableSchemaName(ResultSet resultSet)
return Optional.of(decimalColumnMapping(createDecimalType(precision, max(decimalDigits, 0))));

case Types.CHAR:
return Optional.of(defaultCharColumnMapping(typeHandle.getRequiredColumnSize()));
int columnSize = typeHandle.getRequiredColumnSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract method as you did for LONGNVARCHAR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted method is reused. I would prefer to keep as it is.

@kokosing
Copy link
Member Author

kokosing commented Feb 1, 2021

@kokosing would it work for you if we got #6759 in, rebased this & and then reviewed?

@findepi Sure, also I would like to get #6776 in before rebasing. Reusing makes it nicer to work with MySQL connector.

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 0a2908a to 15860e5 Compare February 6, 2021 19:45
@kokosing
Copy link
Member Author

kokosing commented Feb 6, 2021

@findepi Rebased.

🤯 I can't believe my eyes. Postgres fails the tests.

@kokosing
Copy link
Member Author

kokosing commented Feb 6, 2021

tpch=# select * from tpch.test_data_mapping_smoke_char_1__jg9rm order by value;
    row_id    | value
--------------+-------
 high value   | a
 sample value | A
 null value   |
(3 rows)

tpch=# select * from tpch.test_data_mapping_smoke_char_1__jg9rm where value < 'A';
   row_id   | value
------------+-------
 high value | a
(1 row)

tpch=# select * from tpch.test_data_mapping_smoke_char_1__jg9rm where value > 'A';
 row_id | value
--------+-------
(0 rows)

Postgres is case sensitive but it has different collation. Should CASE_INSENSITIVE_PUSHDOWN become default for char and varchar columns?

@kokosing
Copy link
Member Author

kokosing commented Feb 6, 2021

More tests:

tpch=# select *, value = 'a' from tpch.test_data_mapping_smoke_char_1__chek9;
    row_id    | value | ?column?
--------------+-------+----------
 null value   |       |
 sample value | A     | f
 high value   | a     | t
(3 rows)

tpch=# select *, value = 'a' from tpch.test_data_mapping_smoke_char_1__chek9 order by value;
    row_id    | value | ?column?
--------------+-------+----------
 high value   | a     | t
 sample value | A     | f
 null value   |       |
(3 rows)

tpch=# select *, value = 'a' from tpch.test_data_mapping_smoke_char_1__chek9 order by value desc;
    row_id    | value | ?column?
--------------+-------+----------
 null value   |       |
 sample value | A     | f
 high value   | a     | t
(3 rows)

@findepi
Copy link
Member

findepi commented Feb 6, 2021

Postgres is case sensitive but it has different collation. Should CASE_INSENSITIVE_PUSHDOWN become default for char and varchar columns?

This is covered by #3645

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 15860e5 to 0647256 Compare February 7, 2021 20:16
@@ -30,7 +30,7 @@
}
return new DomainPushdownResult(domain, Domain.all(domain.getType()));
};
PredicatePushdownController PUSHDOWN_AND_KEEP = (session, domain) -> new DomainPushdownResult(
PredicatePushdownController CASE_INSENSITIVE_PUSHDOWN = (session, domain) -> new DomainPushdownResult(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename PredicatePushdownController to match semantic

This commit does not rename PredicatePushdownController

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PredicatePushdownController interface is type-agnostic and "case-insensitive" is applicable to char/varchar type only, so maybe:

CASE_INSENSITIVE_PUSHDOWN -> eg CASE_INSENSITIVE_CHARACTER_PUSHDOWN

@@ -120,6 +120,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
// https://github.com/trinodb/trino/issues/6746
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand the explanation. Do you mean that MemSQL's varchar(1) is case sensitive (by default), while char(1) is not (by default)?

@@ -82,11 +82,11 @@
import static io.trino.plugin.jdbc.PredicatePushdownController.FULL_PUSHDOWN;
import static io.trino.plugin.jdbc.StandardColumnMappings.bigintColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.bigintWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.charReadFunction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case insensitive predicate pushdown for char type in MySQL

Can you please make it a sentence?

@@ -120,11 +120,6 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the TODO removals being scattered across commits, it's harder to see how many of them remained.
Also, the code in all these connectors looks very repetitive, so I think you can safely squash up the commit adding tests and commits fixing the problem across connectors, without any downside on the reviewability.

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 0647256 to d501959 Compare February 17, 2021 12:37
Copy link
Member Author

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@losipiuk @findepi CA, PTAL

@@ -120,6 +120,11 @@ protected TestTable createTableWithDefaultColumns()
return Optional.empty();
}

if (typeName.contains("char(1)")) {
// https://github.com/trinodb/trino/issues/6746
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. varchar above (is not varchar(1)) is testing UTF-8

domain.simplify(getDomainCompactionThreshold(session)),
domain);
}
return new DomainPushdownResult(Domain.all(domain.getType()), domain);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That requires field reordering.

Comment on lines 94 to 96
if (typeName.equals("time")
|| typeName.equals("timestamp(3) with time zone")) {
return Optional.of(dataMappingTestSetup.asUnsupported());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad rebase

@@ -290,18 +291,21 @@ protected String getTableSchemaName(ResultSet resultSet)
return Optional.of(decimalColumnMapping(createDecimalType(precision, max(decimalDigits, 0))));

case Types.CHAR:
return Optional.of(defaultCharColumnMapping(typeHandle.getRequiredColumnSize()));
int columnSize = typeHandle.getRequiredColumnSize();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted method is reused. I would prefer to keep as it is.

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from d501959 to 61d5d63 Compare February 17, 2021 12:47
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

Can you please changes as fixups?

@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 61d5d63 to 58ab40b Compare February 19, 2021 13:24
@kokosing
Copy link
Member Author

@findepi PTAL

@findepi findepi changed the title Fix case insensitive predicate pushdown for MySql, MemSql and SqlServer Fix case insensitive predicate pushdown for MySQL, MemSQL and SQL Server Feb 21, 2021
@@ -1203,23 +1203,23 @@ public void testDataMappingSmokeTest(DataMappingTestSetup dataMappingTestSetup)

// without pushdown, i.e. test read data mapping
assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NULL", "VALUES 'null value'");
assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NOT NULL", "VALUES ('sample value'), ('high value')");
assertQuery("SELECT row_id FROM " + tableName + " WHERE rand() = 42 OR value IS NOT NULL", "VALUES 'sample value', 'high value'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify testDataMappingSmokeTest can be extracted and merged already.

@Override
public void testCaseSensitiveDataMapping(DataMappingTestSetup dataMappingTestSetup)
{
throw new SkipException("https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please

assertThatThrowBy(super...)
  .hasMessage("something something");

before skipping?

@kokosing kokosing force-pushed the origin/master/270_case_chars branch 3 times, most recently from 2e37870 to 80d11f4 Compare February 23, 2021 11:48
In case insensitive predicate pushdown, consider values 'A' and 'a'.
Both are equal in data source, but different in Trino.

It was safe to pushdown `x = 'a'` or `x = 'A'` as both values where
always returned and we could filter them out in Trino.

But operators like `!=`, `<` or `>` may return no value at all, which is
incorrect as Trino is case sensitive only today.

Consider also a casse of `B` and `a`. In Trino `B` is lower than `a`,
but if you compare them in remote data source which is casee
insensitive then `B` or `b` are higher `A` or `b`.

Considering the above we can only push down the predicate
for case insensitive column only when `=` operator is used.
@kokosing kokosing force-pushed the origin/master/270_case_chars branch from 80d11f4 to 9f04547 Compare February 23, 2021 13:39
}

// case insensitive predicate pushdown could return incorrect results for operators like `!=`, `<` or `>`
return DISABLE_PUSHDOWN.apply(session, domain);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, it may be reasonable to have a session toggle to unlock legacy unsafe, performant behavior -- especially for the cases where user knows it's safe (and we do not)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me do it as separate PR.

@kokosing
Copy link
Member Author

CI hit #7009

@kokosing kokosing merged commit 2da5aa2 into trinodb:master Feb 23, 2021
@kokosing kokosing deleted the origin/master/270_case_chars branch February 23, 2021 19:48
@kokosing kokosing mentioned this pull request Feb 23, 2021
10 tasks
Comment on lines +735 to +742
try {
super.testCaseSensitiveDataMapping(dataMappingTestSetup);
}
catch (AssertionError ignored) {
// TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino
assertThatThrownBy(() -> super.testCaseSensitiveDataMapping(dataMappingTestSetup))
.hasStackTraceContaining("not equal\nActual rows");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a long version of saying

// TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino

as we do not validate whether test passes or fails.

Also, this should throw SkippedException, instead of passing, so

// TODO https://github.com/trinodb/trino/issues/3645 - PostgreSQL has different collation than Trino
throw new SkippedException("TODO");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, see: #7032

@findepi findepi mentioned this pull request Mar 5, 2021
@martint martint added this to the 353 milestone Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
6 participants