Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce hive.translate-hive-views propertry #5636

Merged
merged 2 commits into from
Oct 23, 2020

Conversation

phd3
Copy link
Member

@phd3 phd3 commented Oct 21, 2020

Fixes #5623

@cla-bot cla-bot bot added the cla-signed label Oct 21, 2020
@phd3 phd3 added the WIP label Oct 21, 2020
@phd3 phd3 force-pushed the hive-view-config-enforce branch from b290bb2 to c151153 Compare October 21, 2020 17:54
@laurachenyu
Copy link
Contributor

Should we throw exception here or just return Optional.empty() like before? I am doing following:

    {
        if (!filterSchema(viewName.getSchemaName())) {
            return Optional.empty();
        }
        return metastore.getTable(new HiveIdentity(session), viewName.getSchemaName(), viewName.getTableName())
                .filter(ViewReaderUtil::canDecodeView)
                .flatMap(view -> {
                    if (translateHiveViews || isPrestoView(view)) {
                        ConnectorViewDefinition definition = createViewReader(metastore, new HiveIdentity(session), view, typeManager)
                                .decodeViewData(view.getViewOriginalText().get(), view, catalogName);
                        // use owner from table metadata if it exists
                        if (view.getOwner() != null && !definition.isRunAsInvoker()) {
                            definition = new ConnectorViewDefinition(
                                    definition.getOriginalSql(),
                                    definition.getCatalog(),
                                    definition.getSchema(),
                                    definition.getColumns(),
                                    definition.getComment(),
                                    Optional.of(view.getOwner()),
                                    false);
                        }
                        return Optional.of(definition);
                    }
                    return Optional.empty();
                });
    }```

Also, ThriftHiveMetastore:getTable() also used tranlativeHiveViews and we can recover it as well: 
                    if (!translateHiveViews
                            && table.getTableType().equals(TableType.VIRTUAL_VIEW.name())
                            && !"true".equals(table.getParameters().get(PRESTO_VIEW_FLAG))) {
                        throw new HiveViewNotSupportedException(new SchemaTableName(databaseName, tableName));
                    }

@phd3 phd3 force-pushed the hive-view-config-enforce branch 2 times, most recently from 359532b to ed776a0 Compare October 22, 2020 03:21
@phd3
Copy link
Member Author

phd3 commented Oct 22, 2020

@laurachenyu your suggestion also looks good. One difference though is that previously the isPrestoView api used hive's Table object, but now we use Presto's object, so that logic would be kind of duplicated. I'm fine with either approaches, @kokosing wdyt?

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return Optional.empty() when hive view is not supported. Otherwise it does make sense to have a method that returns Optional when it fails in case it cannot return a value.

ConnectorMetadata metadata = transaction.getMetadata();
assertThatThrownBy(() -> metadata.getView(session, new SchemaTableName(database, tableName)))
.isInstanceOf(HiveViewNotSupportedException.class)
.hasMessageContaining("Hive views are not supported");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice add a test where changing config property would actually return a view here.

We have product tests for that, so if adding a test here is problematic then maybe a comment would be enough.

@phd3 phd3 force-pushed the hive-view-config-enforce branch from ed776a0 to 3fc2b0e Compare October 22, 2020 17:25
@phd3
Copy link
Member Author

phd3 commented Oct 22, 2020

Updated based on the suggestions, However, not sure if this is the best way to test. currently, it verifies the logic for ThriftHiveMetastore and FileHiveMetastore through TestHive and TestHiveFileMetastore. For GlueMetastore, it seems that we haven't tested the support for the hive view translation so far, so I think that can be done in a separate effort.

In general, I feel that the testing related to view translation "integration" can be improved using one of the approaches below:

approach-1: (similar to current updated PR)

  • integration logic can be in different HiveMetastore implementations.
  • need to test them through metastore-specific tests, since they've different codepaths (e.g. a test in AbstractTestHive). If we want to test both the cases (enabled/disabled translation), it requires some amount of refactoring of AbstractTestHive.

approach-2:

  • Keep all integration related logic in HiveMetadata class
  • Test the cases for view-translation enabled and disabled using FileHiveMetastore.

approach-3:

  • make the translation enabling switch a session property and test along with product tests by enabling/disabling it.

I feel approach-2 and approach-3 might be easier, but approach-1 is more exhaustive. however, need to keep in mind that we don't overfit the code-implementation on the ease of testing :)

@phd3 phd3 removed the WIP label Oct 22, 2020
@phd3 phd3 requested a review from kokosing October 22, 2020 21:29
@martint martint requested a review from electrum October 22, 2020 21:45
@@ -314,7 +316,15 @@ else if (table.getTableType().equals(EXTERNAL_TABLE.name())) {
@Override
public synchronized Optional<Table> getTable(HiveIdentity identity, String databaseName, String tableName)
{
return getTable(databaseName, tableName);
Optional<Table> optionalTable = getTable(databaseName, tableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need any view specific logic in the metastore, as the metastore's job is to simply store/retrieve the table metadata. This specific check seems to be a sanity check that the file metastore doesn't contain a Hive view, but that should be impossible since the code is only used by Presto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, agreed. ThriftHiveMetastore::getAllViews also involves usage of view specific config, may be we can move it out too in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's always had logic to do filtering based on presto_view flag, but I forgot that it also has special logic based on whether or not we're translating Hive views. In that case, maybe it is required to add more logic to ThriftHiveMetastore. Though FileHiveMetastore doesn't have any today, so I'd need to be more convinced for that one.

@@ -343,6 +343,14 @@ public ThriftMetastoreStats getStats()
.stopOnIllegalExceptions()
.run("getTable", stats.getGetTable().wrap(() -> {
Table table = getTableFromMetastore(identity, databaseName, tableName);

// Check if hive view is supported
if (!translateHiveViews
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other comment, this check should be done in HiveMetadata where the translation occurs.

{
String tableName = "test_hive_view";
ConnectorSession session = newSession();
Table table = Table.builder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to create a new Hive view here, as these tests already create presto_test_view (referenced by the view field in this class). It should be enough to call

metadata.getView(session, view);

@electrum
Copy link
Member

The view translation logic should be independent of the metastore implementation (which is a "dumb" store in terms of view data) and thus so should the tests. The AbstractTestHive tests are ancient and it would be nice to deprecate them at some point. Doing this end-to-end with a product test is cleaner.

If it would make it easier, we could add a session property for this, then we could easily change it in the product test.

@phd3 phd3 force-pushed the hive-view-config-enforce branch from 3fc2b0e to 7460b57 Compare October 23, 2020 00:40
@phd3 phd3 requested a review from electrum October 23, 2020 04:29
@phd3
Copy link
Member Author

phd3 commented Oct 23, 2020

@electrum can you please take a look again? I've simplified the testing based on slack discussion, and just testing with TestHive for now.

Test failures were unrelated, retriggered to get a clean build.

@findepi
Copy link
Member

findepi commented Oct 23, 2020

The view translation logic should be independent of the metastore implementation (which is a "dumb" store in terms of view data) and thus so should the tests. The AbstractTestHive tests are ancient and it would be nice to deprecate them at some point. Doing this end-to-end with a product test is cleaner.

I agree, but note that product tests do not involve Glue today (#5426).
The Hive view support translation layer is independent from metastore, but listing of tables and views is.
https://github.com/prestosql/presto/blob/58ee2329fe769042b72974e5b683332e09fc973e/presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/thrift/ThriftHiveMetastore.java#L933-L942

I believe we do need both: product tests for "nice tests" and AbstractTestHive to cover Glue too until we do #5426.

since this is not a session property, I think it's currently hard to test in the product tests without adding a separate config file. But I'm not sure a different configuration is worth the overhead in product test framework.

If it would make it easier, we could add a session property for this, then we could easily change it in the product test.

@phd3 @electrum
adding a new catalog in product tests is really cheap, perhaps just one new file, the catalog configuration.
We should add a session property if we want this to be configurable on per-query basis.
I think this is a good idea, but I really want this not to be a result of our testing (in)abilities.

@phd3 phd3 force-pushed the hive-view-config-enforce branch from 7460b57 to 97c0a12 Compare October 23, 2020 14:17
@phd3
Copy link
Member Author

phd3 commented Oct 23, 2020

Two unrelated failures: #4173 and the one fixed in #5671. Rebasing and rerunning the jobs.

@phd3 phd3 force-pushed the hive-view-config-enforce branch from 97c0a12 to 20961dc Compare October 23, 2020 14:24
@martint martint added this to the 345 milestone Oct 23, 2020
@martint martint merged commit f4eb7f0 into trinodb:master Oct 23, 2020
@phd3 phd3 deleted the hive-view-config-enforce branch November 1, 2020 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Configuration property hive.translate-hive-views is partially ignored
7 participants