Migrate Elasticsearch tests to use BaseConnectorTest #7109

losipiuk · 2021-03-02T11:57:25Z

#6989 introduced new BaseConnectorTest test class which replaces AbstractTestDistributedQueries and AbstractTestIntegrationSmokeTest.

Scope of this ticket is to restructure Elasticsearch test code to follow new pattern.
BaseElasticsearchSmokeTest should be renamed to BaseTestElasticsearchConnectorTest and extend BaseConnectorTest instead of AbstractTestIntegrationSmokeTest
TestElasticsearch6IntegrationSmokeTest should be renamed to TestElasticsearch6ConnectorTest.
TestElasticsearch7IntegrationSmokeTest should be renamed to TestElasticsearch7ConnectorTest.

A PR which restructures MySQL tests is a good reference: #7011

The text was updated successfully, but these errors were encountered:

ToumaKazusa-san · 2021-03-06T10:04:44Z

I am pretty interested in this issuse, I want to deal with it. Could you give me a chance to solve this problem?

losipiuk · 2021-03-06T14:11:38Z

Sure. I assigned you. Thanks @ToumaKazusa-san

findinpath · 2021-04-14T08:13:49Z

I've started to work on this one seeing that it's been a month since it was assigned.
Here is my draft branch https://github.com/findinpath/trino/tree/refactoring/7109
Can I be assigned on this one?

losipiuk · 2021-04-14T08:19:15Z

Can I be assigned on this one?

Sure. Thanks @findinpath

@ToumaKazusa-san are you still working on this one?

findinpath · 2021-04-14T11:38:34Z

@losipiuk any idea how to tackle the test io.trino.testing.AbstractTestQueries#testShowColumns

        MaterializedResult actual = computeActual("SHOW COLUMNS FROM orders");

        MaterializedResult expectedUnparameterizedVarchar = resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
                .row("orderkey", "bigint", "", "")
                .row("custkey", "bigint", "", "")
                .row("orderstatus", "varchar", "", "")
                .row("totalprice", "double", "", "")
                .row("orderdate", "date", "", "")
                .row("orderpriority", "varchar", "", "")
                .row("clerk", "varchar", "", "")
                .row("shippriority", "integer", "", "")
                .row("comment", "varchar", "", "")
                .build();
...
        assertTrue(actual.equals(expectedParameterizedVarchar)

but Elasticsearch delivers the table column metadata ordered by name alphabetically

➜  ~ curl -X GET "localhost:32833/orders/_mappings" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   306  100   306    0     0  25500      0 --:--:-- --:--:-- --:--:-- 25500
{
  "orders": {
    "mappings": {
      "doc": {
        "properties": {
          "clerk": {
            "type": "text"
          },
          "comment": {
            "type": "text"
          },
          "custkey": {
            "type": "long"
          },
          "orderdate": {
            "type": "date"
          },
          "orderkey": {
            "type": "long"
          },
          "orderpriority": {
            "type": "text"
          },
          "orderstatus": {
            "type": "text"
          },
          "shippriority": {
            "type": "integer"
          },
          "totalprice": {
            "type": "double"
          }
        }
      }
    }
  }
}

which makes the test to fail.

I've also tried creating the index explicitly by specifying the columns in the same order as they are delivered from the tpch orders result set via a create index call in io.trino.plugin.elasticsearch.ElasticsearchLoader but the result is the same.

The test obviously fails because the Elasticsearch columns are ordered alphabetically.
This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

losipiuk · 2021-04-14T11:55:00Z

@losipiuk any idea how to tackle the test io.trino.testing.AbstractTestQueries#testShowColumns

        MaterializedResult actual = computeActual("SHOW COLUMNS FROM orders");

        MaterializedResult expectedUnparameterizedVarchar = resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
                .row("orderkey", "bigint", "", "")
                .row("custkey", "bigint", "", "")
                .row("orderstatus", "varchar", "", "")
                .row("totalprice", "double", "", "")
                .row("orderdate", "date", "", "")
                .row("orderpriority", "varchar", "", "")
                .row("clerk", "varchar", "", "")
                .row("shippriority", "integer", "", "")
                .row("comment", "varchar", "", "")
                .build();
...
        assertTrue(actual.equals(expectedParameterizedVarchar)

but Elasticsearch delivers the table column metadata ordered by name alphabetically

which makes the test to fail.

I've also tried creating the index explicitly by specifying the columns in the same order as they are delivered from the tpch orders result set via a create index call in io.trino.plugin.elasticsearch.ElasticsearchLoader but the result is the same.

The test obviously fails because the Elasticsearch columns are ordered alphabetically.

Just provide an override for testShowColumns. It is not uncommon to do so. See other connectors, there is a bunch of overrides already.

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.

findinpath · 2021-04-14T12:00:58Z

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.

MaterializedResult actual = computeActual(
                "(TABLE orders ORDER BY orderkey) UNION ALL " +
                        "SELECT * FROM orders WHERE orderstatus = 'F' UNION ALL " +
                        "(TABLE orders ORDER BY orderkey LIMIT 20) UNION ALL " +
                        "(TABLE orders LIMIT 5) UNION ALL " +
                        "TABLE orders LIMIT 10");
        MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes());
        assertEquals(actual.getMaterializedRows().size(), 10);
        assertContains(all, actual);

The call

computeExpected("SELECT * FROM orders", actual.getTypes())

will fail, because in the translation of the resultset from H2 it will rely on the list of types retrieved from Elasticsearch (which is alphabetical) and therefore the conversion of the values from the result set will throw a conversion exception.

losipiuk · 2021-04-14T12:38:08Z

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.
MaterializedResult actual = computeActual(
                "(TABLE orders ORDER BY orderkey) UNION ALL " +
                        "SELECT * FROM orders WHERE orderstatus = 'F' UNION ALL " +
                        "(TABLE orders ORDER BY orderkey LIMIT 20) UNION ALL " +
                        "(TABLE orders LIMIT 5) UNION ALL " +
                        "TABLE orders LIMIT 10");
        MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes());
        assertEquals(actual.getMaterializedRows().size(), 10);
        assertContains(all, actual);
The call
computeExpected("SELECT * FROM orders", actual.getTypes())
will fail, because in the translation of the resultset from H2 it will rely on the list of types retrieved from Elasticsearch (which is alphabetical) and therefore the conversion of the values from the result set will throw a conversion exception.

I see. Please override this test for now.
And expand the * in MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes()); to match the ordering in ES. Should help.

And please put explanatory comment in overridden test method.

findinpath · 2021-04-14T21:15:23Z

I've stumbled on a different problem while working on io.trino.testing.AbstractTestQueries#testLimitWithAggregation

        MaterializedResult actual = computeActual("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey LIMIT 10");
        MaterializedResult all = computeExpected("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey", actual.getTypes());

Some of the double value for totalprice in the orders H2 table while being casted to BIGINT get rounded.
See the image below:

Note that after the casting to BIGINT we get the value 32000489 instead of 32000488.
I've tried to reproduce the problem locally with a small jdbi & H2 memory db test , but somehow there is no rounding happening.

Any ideas on this one? This is not related to Elasticsearch, but rather to Jdbi & H2

losipiuk · 2021-04-15T12:23:59Z

I've stumbled on a different problem while working on io.trino.testing.AbstractTestQueries#testLimitWithAggregation
        MaterializedResult actual = computeActual("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey LIMIT 10");
        MaterializedResult all = computeExpected("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey", actual.getTypes());
Some of the double value for totalprice in the orders H2 table while being casted to BIGINT get rounded.
See the image below:

Note that after the casting to BIGINT we get the value 32000489 instead of 32000488.
I've tried to reproduce the problem locally with a small jdbi & H2 memory db test , but somehow there is no rounding happening.

Any ideas on this one? This is not related to Elasticsearch, but rather to Jdbi & H2

It look weird. Note that we are aggregating on totalPrice * 100. So there should not be any fractional part.
But maybe there still is something if Elastic maps totalPrice to double. I most other databases we would map it to DECIMAL(..,2) and rounding errors are not a problem there.

Is the problem you are describing reproducable on your working branch already? I would like to play with that.

findinpath · 2021-04-15T16:04:45Z

Yes. I've pushed the code already. Also have a look on slack https://trinodb.slack.com/archives/CP1MUNEUX/p1618435906295800?thread_ts=1618435906.295800&cid=CP1MUNEUX I think the problem exists since a longer time and is related to H2 rather to Elastic

…

On Thu, 15 Apr 2021, 14:24 Łukasz Osipiuk, ***@***.***> wrote: I've stumbled on a different problem while working on io.trino.testing.AbstractTestQueries#testLimitWithAggregation MaterializedResult actual = computeActual("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey LIMIT 10"); MaterializedResult all = computeExpected("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey", actual.getTypes()); Some of the double value for totalprice in the orders H2 table while being casted to BIGINT get rounded. See the image below: [image: image] <https://user-images.githubusercontent.com/57260417/114779628-a4b59f00-9d76-11eb-8d68-59add875a8f5.png> Note that after the casting to BIGINT we get the value 32000489 instead of 32000488. I've tried to reproduce the problem locally with a small jdbi & H2 memory db test , but somehow there is no rounding happening. Any ideas on this one? This is not related to Elasticsearch, but rather to Jdbi & H2 It look weird. Note that we are aggregating on totalPrice * 100. So there should not be any fractional part. But maybe there still is something if Elastic maps totalPrice to double. I most other databases we would map it to DECIMAL(..,2) and rounding errors are not a problem there. Is the problem you are describing reproducable on your working branch already? I would like to play with that. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#7109 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANU3TAPB6QPLY4LG7HO5N5TTI3LG5ANCNFSM4YO2QLXQ> .

losipiuk · 2021-04-15T16:14:02Z

Yes. I've pushed the code already.

Thanks I will take a look how to work around that.

I think the problem exists since a longer time and is related to H2 rather to Elastic

Agreed. Just for connectors which use DECIMAL instead of DOUBLE it does not manifest in tests.

findinpath · 2021-04-15T20:18:35Z

The tests:

io.trino.testing.BaseConnectorTest#testPredicateReflectedInExplain
io.trino.testing.BaseConnectorTest#testSortItemsReflectedInExplain

are also failing.

Here is the relevant failed assertion:

java.lang.AssertionError: 
Expecting:
  "Fragment 0 [SINGLE]
    Output layout: [name]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    Output[name]
    │   Layout: [name:varchar]
    │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
    └─ RemoteSource[1]
           Layout: [name:varchar]

Fragment 1 [SOURCE]
    Output layout: [name]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    TableScan[elasticsearch:SCAN:nation, grouped = false]
        Layout: [name:varchar]
        Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}
        name := name::varchar
        nationkey::bigint
            :: [[42]]

"
to contain pattern:
  "(predicate|filterPredicate|constraint).{0,10}(nationkey|NATIONKEY)"

I've looked over io.trino.plugin.elasticsearch.ElasticsearchTableHandle#toString and it just introduces elasticsearch:SCAN:nation , but no reference to a predicate/filterPredicate/constraint.

@martint you've modified the ElasticsearchTableHandle#toString method not that long ago. Do you have any suggestions regarding the failing tests?

findinpath · 2021-04-15T20:53:57Z

@losipiuk regarding the test io.trino.testing.AbstractTestQueries#testLargeIn with 5000 clauses - the test fails due to a limitation of Elasticsearch which by default allows only 1024 clauses

Too_many_clauses: maxClauseCount is set to 1024 & index.query.bool.max_clause_count

This issue can be fixed by overriding the value of the setting indices.query.bool.max_clause_count to 5001 in the file /usr/share/elasticsearch/config/elasticsearch.yml in the elastic 6.x / 7.x testcontainers image.

I see two options here:

override the method io.trino.testing.AbstractTestQueries#largeInValuesCount in the BaseTestElasticsearchConnectorTest so that it provides only up to 1000value clauses

    @Override
    protected Object[][] largeInValuesCountData()
    {
        return new Object[][] {
                {200},
                {500},
                {1000}
        };
    }

provide custom elasticsearch.yml files for Elastic 6/ Elastic 7 test classes which support more than 5000 clauses.

I tend to go for the customized largeInValuesCountData implementation. Which solution would you preffer?

martint · 2021-04-15T22:33:39Z

you've modified the ElasticsearchTableHandle#toString method not that long ago. Do you have any suggestions regarding the failing tests?

Two thoughts:

Those tests are not appropriate for a generic suite. The format of the string representation of what gets shown in the table scan is connector-specific, and there's no requirement that the conform to a specific shape or contain certain keywords.
But, we should update that toString method to include information about the filter. That's information that's useful to display, regardless.

losipiuk · 2021-04-16T09:11:42Z

I tend to go for the customized largeInValuesCountData implementation. Which solution would you preffer?

This is fine :)

losipiuk · 2021-04-16T09:12:33Z

Regarding io.trino.testing.AbstractTestQueries#testLimitWithAggregation see #7610

findinpath · 2021-04-16T09:16:39Z

@losipiuk based on the previous comment from @martint I've added (at the moment) empty implementations for the methods:

testPredicateReflectedInExplain
testSortItemsReflectedInExplain

I'd appreciate some hints here on how to fill these methods with a concrete implementation.

losipiuk · 2021-04-16T09:34:35Z

@losipiuk based on the previous comment from @martint I've added (at the moment) empty implementations for the methods:

testPredicateReflectedInExplain

testSortItemsReflectedInExplain

I'd appreciate some hints here on how to fill these methods with a concrete implementation.

Maybe just override test method with one using ES specific assertion. Looking for

nationkey::bigint
            :: [[42]]

for testPredicateReflectedInExplain and accordingly for testSortItemsReflectedInExplain

Summary of changes: - Rename BaseElasticsearchSmokeTest.java has been renamed to BaseElasticsearchConnectorTest.java - Rename TestElasticsearch6IntegrationSmokeTest.java to TestElasticsearch6ConnectorTest.java - Rename TestElasticsearch7IntegrationSmokeTest.java to TestElasticsearch7ConnectorTest.java - Add to the class BaseTestElasticsearchConnectorTest overrides for several generic tests in order to make them specific for Elasticsearch connector Resolves: trinodb#7109

losipiuk added good first issue Good for newcomers test labels Mar 2, 2021

losipiuk assigned ToumaKazusa-san Mar 6, 2021

losipiuk assigned findinpath Apr 14, 2021

findinpath mentioned this issue Apr 16, 2021

Migrate Elasticsearch tests to use BaseConnectorTest #7609

Merged

martint closed this as completed in 3dd6f5f Apr 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate Elasticsearch tests to use BaseConnectorTest #7109

Migrate Elasticsearch tests to use BaseConnectorTest #7109

losipiuk commented Mar 2, 2021 •

edited

Loading

ToumaKazusa-san commented Mar 6, 2021

losipiuk commented Mar 6, 2021

findinpath commented Apr 14, 2021

losipiuk commented Apr 14, 2021

findinpath commented Apr 14, 2021 •

edited

Loading

losipiuk commented Apr 14, 2021

findinpath commented Apr 14, 2021

losipiuk commented Apr 14, 2021 •

edited

Loading

findinpath commented Apr 14, 2021

losipiuk commented Apr 15, 2021

findinpath commented Apr 15, 2021 via email

losipiuk commented Apr 15, 2021

findinpath commented Apr 15, 2021 •

edited

Loading

findinpath commented Apr 15, 2021

martint commented Apr 15, 2021

losipiuk commented Apr 16, 2021

losipiuk commented Apr 16, 2021

findinpath commented Apr 16, 2021

losipiuk commented Apr 16, 2021 •

edited

Loading

Migrate Elasticsearch tests to use BaseConnectorTest #7109

Migrate Elasticsearch tests to use BaseConnectorTest #7109

Comments

losipiuk commented Mar 2, 2021 • edited Loading

ToumaKazusa-san commented Mar 6, 2021

losipiuk commented Mar 6, 2021

findinpath commented Apr 14, 2021

losipiuk commented Apr 14, 2021

findinpath commented Apr 14, 2021 • edited Loading

losipiuk commented Apr 14, 2021

findinpath commented Apr 14, 2021

losipiuk commented Apr 14, 2021 • edited Loading

findinpath commented Apr 14, 2021

losipiuk commented Apr 15, 2021

findinpath commented Apr 15, 2021 via email

losipiuk commented Apr 15, 2021

findinpath commented Apr 15, 2021 • edited Loading

findinpath commented Apr 15, 2021

martint commented Apr 15, 2021

losipiuk commented Apr 16, 2021

losipiuk commented Apr 16, 2021

findinpath commented Apr 16, 2021

losipiuk commented Apr 16, 2021 • edited Loading

losipiuk commented Mar 2, 2021 •

edited

Loading

findinpath commented Apr 14, 2021 •

edited

Loading

losipiuk commented Apr 14, 2021 •

edited

Loading

findinpath commented Apr 15, 2021 •

edited

Loading

losipiuk commented Apr 16, 2021 •

edited

Loading