Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out Hive information_schema and sys #3008

Merged
merged 2 commits into from
Mar 9, 2020
Merged

Conversation

findepi
Copy link
Member

@findepi findepi commented Mar 5, 2020

information_schema will be inaccessible anyway.

sys could be accessible, but

  • it doesn't work (contains JdbcStorageHandler tables and Hive views)
  • exposing it may require proper handling in access control.

@findepi findepi requested review from aalbu and kokosing March 5, 2020 12:51
@cla-bot cla-bot bot added the cla-signed label Mar 5, 2020
import static io.prestosql.tests.utils.QueryExecutors.onPresto;
import static java.util.Objects.requireNonNull;

public class TestHiveSchema
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear Reviewer,
I know that the TestHiveSchema is awfully long and looks like 90% redundant, but it is actually not.

return ImmutableList.of(schemaName.get());
}
return listSchemaNames(session);
}

private static boolean filterSchema(String schemaName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a subjective comment, but to me if (!filterSchema(tableName.getSchemaName())) isn't very intuitive. Perhaps if the we'd call the method something like isSystemSchema() (of course, the return values would be reversed), those tests would read better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems filter is overloaded, and doesn't really mean whether it's positive (filter) or negative (filter out, exclude).
the reason i picked this name is -- java.util.stream.Stream#filter, io.prestosql.security.AccessControl#filterSchemas ...

hmm maybe isExcludedSchema ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am keeping filterSchema

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isHiveMetadataSchema?

filterSchema sounds like *AccessControl#filterSchemas. I don't like this name here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is with semntics of the method

While it's true today, in general: metadata schema != we should hide it

so isHiveMetadataSchema isn't IMO btter

`information_schema` will be inaccessible anyway.

`sys` could be accessible, but
- it doesn't work (contains JdbcStorageHandler tables and Hive views)
- exposing it may require proper handling in access control.
@findepi findepi mentioned this pull request Mar 5, 2020
29 tasks
return ImmutableList.of(schemaName.get());
}
return listSchemaNames(session);
}

private static boolean filterSchema(String schemaName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isHiveMetadataSchema?

filterSchema sounds like *AccessControl#filterSchemas. I don't like this name here.

}
if ("sys".equals(schemaName)) {
// Hive 3's `sys` schema contains no objects we can handle, so there is no point in exposing it.
// Also, exposing it may require proper handling in access control.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by: exposing it may require proper handling in access control.? What is sys schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys contains table paramters, partition paramters, some stats, etc.

return ImmutableList.of(schemaName.get());
}
return listSchemaNames(session);
}

private static boolean filterSchema(String schemaName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this method under the last usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

public void setUp()
{
// make sure hive.default schema is not empty
onPresto().executeQuery("DROP TABLE IF EXISTS hive.default.test_sys_schema_disabled_table_in_default");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to be called before each test. Maybe you could have a variable that would not create this table if it is already created, then tearDown could be removed. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general -- i agree

it's tiny overhead compared to the number of test queries. i choose to ignore this

import static io.prestosql.tests.utils.QueryExecutors.onPresto;
import static java.util.Objects.requireNonNull;

public class TestHiveSchema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a pity that this is a product test. After #3009 we could write "unit" test, but still it is not going to be much faster, but it would be easier to run tests from IDE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a necessity that there is a product test. i want to run with HMS whether information_schema/sys exists and where it doesn't

it would be good to have a unit test too though. out of scope

.satisfies(containsFirstColumnValue("information_schema"));
assertThat(onPresto().executeQuery("SELECT table_name FROM information_schema.tables WHERE table_schema = 'information_schema'"))
.containsOnly(allInformationSchemaTablesAsRows);
Assertions.assertThat(onPresto().executeQuery("SELECT table_schema, table_name FROM information_schema.tables").rows().stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import

assertThat?
I use tempto QueryAssert.assertThat and assertj Assertions.assertThat -- it happens; not pretty; we already do this in other tests

@findepi
Copy link
Member Author

findepi commented Mar 6, 2020

AC

@findepi findepi merged commit f01b3b4 into trinodb:master Mar 9, 2020
@findepi findepi deleted the hdp4 branch March 9, 2020 13:19
@findepi findepi mentioned this pull request Mar 9, 2020
6 tasks
@findepi findepi added this to the 331 milestone Mar 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants