Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Elasticsearch tests to use BaseConnectorTest #7109

Closed
losipiuk opened this issue Mar 2, 2021 · 19 comments
Closed

Migrate Elasticsearch tests to use BaseConnectorTest #7109

losipiuk opened this issue Mar 2, 2021 · 19 comments
Assignees
Labels
good first issue Good for newcomers test

Comments

@losipiuk
Copy link
Member

losipiuk commented Mar 2, 2021

#6989 introduced new BaseConnectorTest test class which replaces AbstractTestDistributedQueries and AbstractTestIntegrationSmokeTest.

Scope of this ticket is to restructure Elasticsearch test code to follow new pattern.
BaseElasticsearchSmokeTest should be renamed to BaseTestElasticsearchConnectorTest and extend BaseConnectorTest instead of AbstractTestIntegrationSmokeTest
TestElasticsearch6IntegrationSmokeTest should be renamed to TestElasticsearch6ConnectorTest.
TestElasticsearch7IntegrationSmokeTest should be renamed to TestElasticsearch7ConnectorTest.

A PR which restructures MySQL tests is a good reference: #7011

@losipiuk losipiuk added good first issue Good for newcomers test labels Mar 2, 2021
@ToumaKazusa-san
Copy link

I am pretty interested in this issuse, I want to deal with it. Could you give me a chance to solve this problem?

@losipiuk
Copy link
Member Author

losipiuk commented Mar 6, 2021

Sure. I assigned you. Thanks @ToumaKazusa-san

@findinpath
Copy link
Contributor

I've started to work on this one seeing that it's been a month since it was assigned.
Here is my draft branch https://github.com/findinpath/trino/tree/refactoring/7109
Can I be assigned on this one?

@losipiuk
Copy link
Member Author

Can I be assigned on this one?

Sure. Thanks @findinpath

@ToumaKazusa-san are you still working on this one?

@findinpath
Copy link
Contributor

findinpath commented Apr 14, 2021

@losipiuk any idea how to tackle the test io.trino.testing.AbstractTestQueries#testShowColumns

        MaterializedResult actual = computeActual("SHOW COLUMNS FROM orders");

        MaterializedResult expectedUnparameterizedVarchar = resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
                .row("orderkey", "bigint", "", "")
                .row("custkey", "bigint", "", "")
                .row("orderstatus", "varchar", "", "")
                .row("totalprice", "double", "", "")
                .row("orderdate", "date", "", "")
                .row("orderpriority", "varchar", "", "")
                .row("clerk", "varchar", "", "")
                .row("shippriority", "integer", "", "")
                .row("comment", "varchar", "", "")
                .build();
...
        assertTrue(actual.equals(expectedParameterizedVarchar)

but Elasticsearch delivers the table column metadata ordered by name alphabetically

➜  ~ curl -X GET "localhost:32833/orders/_mappings" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   306  100   306    0     0  25500      0 --:--:-- --:--:-- --:--:-- 25500
{
  "orders": {
    "mappings": {
      "doc": {
        "properties": {
          "clerk": {
            "type": "text"
          },
          "comment": {
            "type": "text"
          },
          "custkey": {
            "type": "long"
          },
          "orderdate": {
            "type": "date"
          },
          "orderkey": {
            "type": "long"
          },
          "orderpriority": {
            "type": "text"
          },
          "orderstatus": {
            "type": "text"
          },
          "shippriority": {
            "type": "integer"
          },
          "totalprice": {
            "type": "double"
          }
        }
      }
    }
  }
}

which makes the test to fail.

I've also tried creating the index explicitly by specifying the columns in the same order as they are delivered from the tpch orders result set via a create index call in io.trino.plugin.elasticsearch.ElasticsearchLoader but the result is the same.

The test obviously fails because the Elasticsearch columns are ordered alphabetically.
This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

@losipiuk
Copy link
Member Author

@losipiuk any idea how to tackle the test io.trino.testing.AbstractTestQueries#testShowColumns

        MaterializedResult actual = computeActual("SHOW COLUMNS FROM orders");

        MaterializedResult expectedUnparameterizedVarchar = resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
                .row("orderkey", "bigint", "", "")
                .row("custkey", "bigint", "", "")
                .row("orderstatus", "varchar", "", "")
                .row("totalprice", "double", "", "")
                .row("orderdate", "date", "", "")
                .row("orderpriority", "varchar", "", "")
                .row("clerk", "varchar", "", "")
                .row("shippriority", "integer", "", "")
                .row("comment", "varchar", "", "")
                .build();
...
        assertTrue(actual.equals(expectedParameterizedVarchar)

but Elasticsearch delivers the table column metadata ordered by name alphabetically

which makes the test to fail.

I've also tried creating the index explicitly by specifying the columns in the same order as they are delivered from the tpch orders result set via a create index call in io.trino.plugin.elasticsearch.ElasticsearchLoader but the result is the same.

The test obviously fails because the Elasticsearch columns are ordered alphabetically.

Just provide an override for testShowColumns. It is not uncommon to do so. See other connectors, there is a bunch of overrides already.

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.

@findinpath
Copy link
Contributor

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.

MaterializedResult actual = computeActual(
                "(TABLE orders ORDER BY orderkey) UNION ALL " +
                        "SELECT * FROM orders WHERE orderstatus = 'F' UNION ALL " +
                        "(TABLE orders ORDER BY orderkey LIMIT 20) UNION ALL " +
                        "(TABLE orders LIMIT 5) UNION ALL " +
                        "TABLE orders LIMIT 10");
        MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes());
        assertEquals(actual.getMaterializedRows().size(), 10);
        assertContains(all, actual);

The call

computeExpected("SELECT * FROM orders", actual.getTypes())

will fail, because in the translation of the resultset from H2 it will rely on the list of types retrieved from Elasticsearch (which is alphabetical) and therefore the conversion of the values from the result set will throw a conversion exception.

@losipiuk
Copy link
Member Author

losipiuk commented Apr 14, 2021

This also causes the test io.trino.testing.AbstractTestQueries#testLimitPushDown to fail (due to same reason as described above).

Can you clarify? I do not see a reason why different column ordering should fail this test.

MaterializedResult actual = computeActual(
                "(TABLE orders ORDER BY orderkey) UNION ALL " +
                        "SELECT * FROM orders WHERE orderstatus = 'F' UNION ALL " +
                        "(TABLE orders ORDER BY orderkey LIMIT 20) UNION ALL " +
                        "(TABLE orders LIMIT 5) UNION ALL " +
                        "TABLE orders LIMIT 10");
        MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes());
        assertEquals(actual.getMaterializedRows().size(), 10);
        assertContains(all, actual);

The call

computeExpected("SELECT * FROM orders", actual.getTypes())

will fail, because in the translation of the resultset from H2 it will rely on the list of types retrieved from Elasticsearch (which is alphabetical) and therefore the conversion of the values from the result set will throw a conversion exception.

I see. Please override this test for now.
And expand the * in MaterializedResult all = computeExpected("SELECT * FROM orders", actual.getTypes()); to match the ordering in ES. Should help.

And please put explanatory comment in overridden test method.

@findinpath
Copy link
Contributor

I've stumbled on a different problem while working on io.trino.testing.AbstractTestQueries#testLimitWithAggregation

        MaterializedResult actual = computeActual("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey LIMIT 10");
        MaterializedResult all = computeExpected("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey", actual.getTypes());

Some of the double value for totalprice in the orders H2 table while being casted to BIGINT get rounded.
See the image below:

image

Note that after the casting to BIGINT we get the value 32000489 instead of 32000488.
I've tried to reproduce the problem locally with a small jdbi & H2 memory db test , but somehow there is no rounding happening.

Any ideas on this one? This is not related to Elasticsearch, but rather to Jdbi & H2

@losipiuk
Copy link
Member Author

I've stumbled on a different problem while working on io.trino.testing.AbstractTestQueries#testLimitWithAggregation

        MaterializedResult actual = computeActual("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey LIMIT 10");
        MaterializedResult all = computeExpected("SELECT custkey, SUM(CAST(totalprice * 100 AS BIGINT)) FROM orders GROUP BY custkey", actual.getTypes());

Some of the double value for totalprice in the orders H2 table while being casted to BIGINT get rounded.
See the image below:

image

Note that after the casting to BIGINT we get the value 32000489 instead of 32000488.
I've tried to reproduce the problem locally with a small jdbi & H2 memory db test , but somehow there is no rounding happening.

Any ideas on this one? This is not related to Elasticsearch, but rather to Jdbi & H2

It look weird. Note that we are aggregating on totalPrice * 100. So there should not be any fractional part.
But maybe there still is something if Elastic maps totalPrice to double. I most other databases we would map it to DECIMAL(..,2) and rounding errors are not a problem there.

Is the problem you are describing reproducable on your working branch already? I would like to play with that.

@findinpath
Copy link
Contributor

findinpath commented Apr 15, 2021 via email

@losipiuk
Copy link
Member Author

Yes. I've pushed the code already.

Thanks I will take a look how to work around that.

I think the problem exists since a longer time and is related to H2 rather to Elastic

Agreed. Just for connectors which use DECIMAL instead of DOUBLE it does not manifest in tests.

@findinpath
Copy link
Contributor

findinpath commented Apr 15, 2021

The tests:

  • io.trino.testing.BaseConnectorTest#testPredicateReflectedInExplain
  • io.trino.testing.BaseConnectorTest#testSortItemsReflectedInExplain

are also failing.

Here is the relevant failed assertion:

java.lang.AssertionError: 
Expecting:
  "Fragment 0 [SINGLE]
    Output layout: [name]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    Output[name]
    │   Layout: [name:varchar]
    │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
    └─ RemoteSource[1]
           Layout: [name:varchar]

Fragment 1 [SOURCE]
    Output layout: [name]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    TableScan[elasticsearch:SCAN:nation, grouped = false]
        Layout: [name:varchar]
        Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}
        name := name::varchar
        nationkey::bigint
            :: [[42]]

"
to contain pattern:
  "(predicate|filterPredicate|constraint).{0,10}(nationkey|NATIONKEY)"

I've looked over io.trino.plugin.elasticsearch.ElasticsearchTableHandle#toString and it just introduces elasticsearch:SCAN:nation , but no reference to a predicate/filterPredicate/constraint.

@martint you've modified the ElasticsearchTableHandle#toString method not that long ago. Do you have any suggestions regarding the failing tests?

@findinpath
Copy link
Contributor

@losipiuk regarding the test io.trino.testing.AbstractTestQueries#testLargeIn with 5000 clauses - the test fails due to a limitation of Elasticsearch which by default allows only 1024 clauses

Too_many_clauses: maxClauseCount is set to 1024 & index.query.bool.max_clause_count

This issue can be fixed by overriding the value of the setting indices.query.bool.max_clause_count to 5001 in the file /usr/share/elasticsearch/config/elasticsearch.yml in the elastic 6.x / 7.x testcontainers image.

I see two options here:

  • override the method io.trino.testing.AbstractTestQueries#largeInValuesCount in the BaseTestElasticsearchConnectorTest so that it provides only up to 1000value clauses
    @Override
    protected Object[][] largeInValuesCountData()
    {
        return new Object[][] {
                {200},
                {500},
                {1000}
        };
    }
  • provide custom elasticsearch.yml files for Elastic 6/ Elastic 7 test classes which support more than 5000 clauses.

I tend to go for the customized largeInValuesCountData implementation. Which solution would you preffer?

@martint
Copy link
Member

martint commented Apr 15, 2021

you've modified the ElasticsearchTableHandle#toString method not that long ago. Do you have any suggestions regarding the failing tests?

Two thoughts:

  • Those tests are not appropriate for a generic suite. The format of the string representation of what gets shown in the table scan is connector-specific, and there's no requirement that the conform to a specific shape or contain certain keywords.
  • But, we should update that toString method to include information about the filter. That's information that's useful to display, regardless.

@losipiuk
Copy link
Member Author

I tend to go for the customized largeInValuesCountData implementation. Which solution would you preffer?

This is fine :)

@losipiuk
Copy link
Member Author

Regarding io.trino.testing.AbstractTestQueries#testLimitWithAggregation see #7610

@findinpath
Copy link
Contributor

@losipiuk based on the previous comment from @martint I've added (at the moment) empty implementations for the methods:

  • testPredicateReflectedInExplain
  • testSortItemsReflectedInExplain

I'd appreciate some hints here on how to fill these methods with a concrete implementation.

@losipiuk
Copy link
Member Author

losipiuk commented Apr 16, 2021

@losipiuk based on the previous comment from @martint I've added (at the moment) empty implementations for the methods:

  • testPredicateReflectedInExplain
  • testSortItemsReflectedInExplain

I'd appreciate some hints here on how to fill these methods with a concrete implementation.

Maybe just override test method with one using ES specific assertion. Looking for

nationkey::bigint
            :: [[42]]

for testPredicateReflectedInExplain and accordingly for testSortItemsReflectedInExplain

findinpath added a commit to findinpath/trino that referenced this issue Apr 27, 2021
Summary of changes:
- Rename BaseElasticsearchSmokeTest.java has been renamed to BaseElasticsearchConnectorTest.java
- Rename TestElasticsearch6IntegrationSmokeTest.java to TestElasticsearch6ConnectorTest.java
- Rename TestElasticsearch7IntegrationSmokeTest.java to TestElasticsearch7ConnectorTest.java
- Add to the class BaseTestElasticsearchConnectorTest overrides for several generic tests in order to make them specific for Elasticsearch connector

Resolves: trinodb#7109
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers test
Development

No branches or pull requests

4 participants