Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not return all schema prefixes when there are more than 100 #17231

Merged

Conversation

huberty89
Copy link
Contributor

Description

Query:

SELECT * FROM information_schema.tables WHERE table_name LIKE 'test%'

fails when there is more than 100 schemas.

Additional context and related issues

Fix small regression from #17121

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 25, 2023
@huberty89 huberty89 requested review from kokosing and findepi April 25, 2023 15:17
@huberty89 huberty89 force-pushed the huberty89/fix-schema-prefixes-calculation branch from 84064b5 to 9f786ec Compare April 25, 2023 16:58
@github-actions github-actions bot added hive Hive connector tests:hive labels Apr 25, 2023
@huberty89 huberty89 requested a review from Praveen2112 April 26, 2023 06:57
if (schemaPrefixes.size() > MAX_PREFIXES_COUNT) {
// in case of high number of prefixes it is better to populate all data and then filter
// TODO this may cause re-running the above filtering upon next applyFilter
return defaultPrefixes(catalogName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably fixes a bug I introduced in 677b52d.
calculatePrefixesWithSchemaName could return many prefixes
then calculatePrefixesWithTableName could exit here


and then this verification would fail
verify(tablePrefixes.size() <= MAX_PREFIXES_COUNT, "calculatePrefixesWithTableName returned too many prefixes: %s", tablePrefixes.size());

Can you call the commit "Fix ..." if indeed this is a bug fix?
is it worth adding a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testSelectTablesWithLikeOverSchema and testSelectTablesWithLikeOverTableName from added TestHiveMetastoreReadAccessOperations detect that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would reorder commits then it would be easily noticed that it is an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed commit message with short explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huberty89 thanks for adding

When there is more than 100 schemas InformationSchemaMetadata#calculatePrefixesWithSchemaName should return default prefix instead of all schemas

did the code fail before?
locally see that the new test io.trino.plugin.hive.metastore.thrift.TestHiveMetastoreReadAccessOperations#testSelectTablesWithLikeOverSchema would fail without this change, so i am marking this PR as a release blocker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was failing - exception from verify check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw thanks for fixing my bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to help

Comment on lines 347 to 351
methodInvocations.add(Methods.GET_ALL_TABLES_WITH_PARAMETER);
return delegate.getAllTables(databaseName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why _WITH_PARAMETER? shouldn't this be GET_ALL_TABLES?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PR #17127 I will add getAllTables without any parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

queryRunner.installPlugin(new TestingHivePlugin(metastore));
queryRunner.createCatalog("hive", "hive", ImmutableMap.of());

for (int i = 0; i <= MAX_PREFIXES_COUNT; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i -> schemaNumber
< TEST_SCHEMAS_COUNT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This test is nice.

if (schemaPrefixes.size() > MAX_PREFIXES_COUNT) {
// in case of high number of prefixes it is better to populate all data and then filter
// TODO this may cause re-running the above filtering upon next applyFilter
return defaultPrefixes(catalogName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would reorder commits then it would be easily noticed that it is an improvement.

@@ -341,6 +344,7 @@ public void updatePartitionStatistics(Table table, Map<String, Function<Partitio
@Override
public List<String> getAllTables(String databaseName)
{
throw new UnsupportedOperationException();
methodInvocations.add(Methods.GET_ALL_TABLES_WITH_PARAMETER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import static io.trino.plugin.hive.metastore.file.FileHiveMetastore.createTestingFileHiveMetastore;
import static io.trino.testing.TestingSession.testSessionBuilder;

@Test(singleThreaded = true) // metastore invocation counters shares mutable state so can't be run from many threads simultaneously
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please copy-paste tests but for system.jdbc. To capture how things looks today and prevent any regression. Please do not fix any encountered problem as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but I feel we should use trino-jdbc to have exact queries used by JDBC

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system is also a plain connector so the fact it is used in jdbc is not that much important. We need are testing a system connector integration with hive connector here. Putting JDBC into the same test sounds like an overkill as it is already a bit complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already exercise system.jdbc directly in other tests (see eg BCT).
better to do that directly than not do that at all.

testing via JDBC could be more verbose, i wouldn't insist on that, but wouldn't also oppose strongly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system is also a plain connector so the fact it is used in jdbc is not that much important.

i don't agree.
system.jdbc is supposed to be accessed via JDBC and we care primarily about usage patterns generated by the JDBC driver.
at this point, i wouldn't bother about performance of, say, joins or union queries or anything like that on top of system.jdbc, because it's outside of typical usage pattern

When there is more than 100 schemas InformationSchemaMetadata#calculatePrefixesWithSchemaName should return default prefix instead of all schemas
@huberty89 huberty89 force-pushed the huberty89/fix-schema-prefixes-calculation branch from 9f786ec to 7625801 Compare April 26, 2023 13:19
@findepi
Copy link
Member

findepi commented Apr 26, 2023

release-blocker per #17231 (comment)
(regression introduced in #17121)

@huberty89 huberty89 force-pushed the huberty89/fix-schema-prefixes-calculation branch 7 times, most recently from 151ebf5 to 665b37c Compare April 27, 2023 05:50
@Test
public void testSelectColumnsWithLikeOverTableName()
{
assertMetastoreInvocations("SELECT * FROM information_schema.columns WHERE table_name LIKE 'test%'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the like pattern matches a single table ? Does it still be non-deterministic ?

import static io.trino.plugin.hive.metastore.CountingAccessHiveMetastore.Methods.GET_TABLE;

@Test(singleThreaded = true) // metastore invocation counters shares mutable state so can't be run from many threads simultaneously
public class TestHiveMetastoreJdbcMetadataQueriesAccessOperations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it possible to move the test and assertion to the base class - if the assertions are same ? We could expose some sort of an API - that could expose the tableName and a columnName which could be overridden for InformationSchema or system.jdbc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I merge both implementation together it's much easier to read which implementation (information_schema or JDBC) lagging behind

import static io.trino.plugin.hive.metastore.file.FileHiveMetastore.createTestingFileHiveMetastore;
import static io.trino.testing.TestingSession.testSessionBuilder;

public class AbstractTestHiveMetastoreMetadataQueriesAccessOperations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseHiveMetastoreMetadataQueriesAccessOperationsTest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class will be merged back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged together

import static io.trino.plugin.hive.metastore.CountingAccessHiveMetastore.Methods.GET_TABLE;

@Test(singleThreaded = true) // metastore invocation counters shares mutable state so can't be run from many threads simultaneously
public class TestHiveMetastoreJdbcMetadataQueriesAccessOperations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would merge these tests classes together

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. there is little benefit in having AbstractTestHiveMetastoreMetadataQueriesAccessOperations
it can encourage us to have inconsistent (=> insufficient) test coverage between informaito_schema and system.jdbc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged together

@huberty89 huberty89 force-pushed the huberty89/fix-schema-prefixes-calculation branch from 665b37c to 9cf99b4 Compare April 27, 2023 10:50
@huberty89 huberty89 force-pushed the huberty89/fix-schema-prefixes-calculation branch from 9cf99b4 to 33f8815 Compare April 28, 2023 09:02
@findepi
Copy link
Member

findepi commented Apr 28, 2023

@martint please merge

@martint martint merged commit 14dde00 into trinodb:master Apr 28, 2023
@github-actions github-actions bot added this to the 415 milestone Apr 28, 2023
Multiset<Methods> invocations = metastore.getMethodInvocations();

assertThat(invocations.count(GET_TABLE)).as("GET_TABLE invocations")
// some lengthy explanatory comment why variable count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"some lengthy explanatory comment why variable count" from #17231 (comment) wasn't supposed to be treated literally, it was a placeholder :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants