Add support for the custom hive catalog to support Hive3 #21502

Merged
merged 1 commit into from
May 17, 2024

Conversation

osscm
Contributor

@osscm osscm commented Apr 11, 2024

Description

Currently, Trino can only use the default Hive catalog (named "hive") when talking to the Hive metastore. As a result, users are limited to a two-level hierarchy (schema.table) within a Trino catalog/connector. The Hive Thrift API, by contrast, allows a custom Hive catalog name as the parent of the schema, which enables a three-level hierarchy such as hive-catalog.schema.table.

The Hive Thrift API has supported custom catalogs since Hive 3.
At a high level, this PR therefore makes the following changes:

  1. Make the Hive catalog name configurable in Trino's catalog settings and pass it through the factories to the Thrift client: StaticMetastoreConfig, StaticTokenAwareHttpMetastoreClientFactory, StaticTokenAwareMetastoreClientFactory, ThriftMetastoreClientFactory, DefaultThriftMetastoreClientFactory, HttpThriftMetastoreClientFactory.
  2. Add custom catalog support to the Thrift client: ThriftHiveMetastoreClient.
  3. There is a 1:1 mapping between a Trino catalog and a Hive catalog.
  4. "hive" remains the default catalog name.

Thanks to @dain @electrum @hashhar @findinpath @anusudarsan @samssh @mosabua

Additional context and related issues

The approach above was discussed in a few places and is adopted in this PR as well:
one
two

@samssh please let us know if anything is missing or needs attention.
Fixes #10287

Co-author: @samssh, who has worked on a similar change.

Release notes

(x) Release notes are required, with the following suggested text:

# Hive
* Add support for specifying catalog name in Thrift metastore. (`#10287`)

@cla-bot cla-bot bot added the cla-signed label Apr 11, 2024
@osscm osscm marked this pull request as draft April 11, 2024 00:12
@github-actions github-actions bot added tests:hive hive Hive connector labels Apr 11, 2024
@osscm osscm force-pushed the hive3thrift branch 3 times, most recently from 4dd8066 to 141b9e7 Compare April 17, 2024 07:09
@osscm osscm changed the title WIP/Temp: Hive3 compatible ThriftHiveMetastore client Add support for the custom hive catalog Apr 17, 2024
@osscm osscm assigned osscm and unassigned osscm Apr 17, 2024
@osscm osscm marked this pull request as ready for review April 17, 2024 16:00
@osscm
Contributor Author

osscm commented Apr 17, 2024

cc @samssh

Member

@anusudarsan anusudarsan left a comment

skimmed and first pass

@findinpath
Contributor

@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?

Member

@anusudarsan anusudarsan left a comment

@osscm TestThriftHttpMetastoreClient is failing.

You need to change https://github.com/trinodb/trino/blob/master/plugin/trino-hive/src/test/java/io/trino/plugin/hive/metastore/thrift/TestThriftHttpMetastoreClient.java#L60 to something like

                if (databaseName.equals("@hive#testDbName")) {
                    return new Database("testDbName", "testOwner", "testLocation", Map.of("key", "value"));
                }

and also update TestingThriftHttpMetastoreServer so that it handles the newly called method:

            case "getAllDatabases", "getDatabases" -> delegate.getAllDatabases();
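
The "@hive#testDbName" value in the snippet above suggests that the catalog name travels to the metastore inside the database-name argument, in the form @<catalog>#<database>. A minimal Java sketch of that encoding follows. The names CatalogQualifiedName, prependCatalog, and parse are hypothetical and chosen only for illustration; they are not the PR's or Hive's actual API, and the fallback to "hive" for unprefixed names is an assumption based on the default stated in the description.

```java
import java.util.Objects;

// Hypothetical helper for illustration only; not part of Trino or Hive.
final class CatalogQualifiedName
{
    // Marker and separator as they appear in "@hive#testDbName"
    private static final String CATALOG_MARKER = "@";
    private static final String CATALOG_SEPARATOR = "#";

    // Assumed default, matching the "hive" default named in the PR description
    static final String DEFAULT_CATALOG = "hive";

    private CatalogQualifiedName() {}

    // Builds "@<catalog>#<database>" from its two parts
    static String prependCatalog(String catalogName, String databaseName)
    {
        Objects.requireNonNull(catalogName, "catalogName is null");
        Objects.requireNonNull(databaseName, "databaseName is null");
        return CATALOG_MARKER + catalogName + CATALOG_SEPARATOR + databaseName;
    }

    // Splits a possibly prefixed name into {catalog, database}.
    // Names without the "@" marker are treated as belonging to the default catalog.
    static String[] parse(String qualifiedName)
    {
        Objects.requireNonNull(qualifiedName, "qualifiedName is null");
        if (!qualifiedName.startsWith(CATALOG_MARKER)) {
            return new String[] {DEFAULT_CATALOG, qualifiedName};
        }
        int separator = qualifiedName.indexOf(CATALOG_SEPARATOR);
        if (separator < 0) {
            throw new IllegalArgumentException("Missing '#' separator in: " + qualifiedName);
        }
        return new String[] {
                qualifiedName.substring(CATALOG_MARKER.length(), separator),
                qualifiedName.substring(separator + CATALOG_SEPARATOR.length())};
    }
}
```

With this sketch, prependCatalog("hive", "testDbName") yields the string the test stub compares against, and a stub could call parse(...) to recover the plain database name before building its Database result.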

@anusudarsan anusudarsan requested a review from electrum April 18, 2024 21:39
@anusudarsan
Member

@electrum can you enable test with secrets in this PR?

@osscm
Contributor Author

osscm commented Apr 19, 2024

@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?

Thanks @findinpath.
I added the details; please see if this makes sense.

cc @anusudarsan

@ebyhr
Member

ebyhr commented Apr 19, 2024

/test-with-secrets sha=e7556cb2cb2db99b03236409fcc1d1186badf430


github-actions bot commented Apr 19, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8748163326

@anusudarsan
Member

@osscm re: tests, can we extend TestHiveMetastoreMetadataQueriesAccessOperations and have the new test set a custom catalog name? As for creating the custom catalogs/namespaces in Hive, it doesn't look like Hive supports creating catalogs via the beeline CLI or SQL yet. We could update the metastore directly:

hiveHadoop.runOnMetastore("INSERT INTO CTLGS VALUES (2, 'test_custom_catalog', 'test catalog', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog')");

I guess you will also need to create a default schema in test_custom_catalog, either with CREATE DATABASE, if Hive supports it for custom catalogs, or by modifying the metastore DB directly, like:

hiveHadoop.runOnMetastore("INSERT INTO DBS VALUES (4, 'test default schema', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog/default.db', 'default', 'hive', 'USER', 'test_custom_catalog')");

let me know if you hit any issues with this or need help.

@findinpath @electrum is extending TestHiveMetastoreMetadataQueriesAccessOperations good enough for this functionality, or do we need BCT as well?

@osscm
Contributor Author

osscm commented Apr 19, 2024

fyi, will work on the pending comments today.

Contributor

@findinpath findinpath left a comment

LGTM % open comments

@ebyhr
Member

ebyhr commented May 6, 2024

/test-with-secrets sha=cf153475e20161039b1e99815c414b2bf2bfafb5


github-actions bot commented May 6, 2024

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8977088500

@osscm
Contributor Author

osscm commented May 13, 2024

Sorry, I got caught up in other work. I will try to address the remaining comments in the next few days.

@ebyhr ebyhr force-pushed the hive3thrift branch 4 times, most recently from c62f95e to f0b0a7f Compare May 13, 2024 09:20
@ebyhr
Member

ebyhr commented May 13, 2024

@osscm I pushed some changes instead of you.

@osscm
Contributor Author

osscm commented May 14, 2024

@osscm I pushed some changes instead of you.

Thanks a lot @ebyhr!

@ebyhr
Member

ebyhr commented May 15, 2024

@osscm Why did you revert my change?

@osscm
Contributor Author

osscm commented May 15, 2024

@osscm Why did you revert my change?

Did I revert it?
Apologies! It may have happened by mistake while squashing locally.

@github-actions github-actions bot added iceberg Iceberg connector bigquery BigQuery connector labels May 15, 2024
@ebyhr
Member

ebyhr commented May 16, 2024

@osscm I restored to the status before you reverted my change. Please add a fixup commit for additional changes.

@osscm
Contributor Author

osscm commented May 16, 2024

@osscm I restored to the status you reverted my change. Please add a fixup commit for additional changes.

Thanks @ebyhr!

I think it already has all the changes in, so no further changes are needed.

Two test cases are failing; checking why:

  1. test-jdbc failed, which seems to be transient?
com.github.dockerjava.api.exception.NotFoundException: Status 404: {"message":"manifest for trinodb/trino:448 not found: manifest unknown: manifest unknown"}
  2. pt (default, suite-7-non-generic, )

I am not sure about this one, as I was not able to run it locally. Can we try running the build again?

2024-05-16 10:09:50 INFO: FAILURE     /    io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan (Groups: cli) took 3.7 seconds
2024-05-16T04:24:50.1411172Z tests               | 2024-05-16 10:09:50 SEVERE: Failure cause:
2024-05-16T04:24:50.1412046Z tests               | java.lang.AssertionError: 
2024-05-16T04:24:50.1412723Z tests               | Expecting ArrayList:
2024-05-16T04:24:50.1413290Z tests               |   ["",
2024-05-16T04:24:50.1414102Z tests               |     "Query 20240516_042448_00042_raqe3 [RUNNING] i[0 0B 0B] o[0 0B 0B] splits[0/0/0]",
2024-05-16T04:24:50.1415007Z tests               |     "",
2024-05-16T04:24:50.1415937Z tests               |     "Query 20240516_042448_00042_raqe3 [FAILED] i[35 3.31K 2.05K] o[35 3.31K 2.05K] splits[4/8/4]",
2024-05-16T04:24:50.1417109Z tests               |     "Query 20240516_042448_00042_raqe3, FAILED, 1 node",
2024-05-16T04:24:50.1417922Z tests               |     "Splits: 16 total, 4 done (25.00%)",
2024-05-16T04:24:50.1418669Z tests               |     "1.58 [35 rows, 3.31KB] [22 rows/s, 2.09KB/s]",
2024-05-16T04:24:50.1419308Z tests               |     "",
2024-05-16T04:24:50.1420095Z tests               |     "Query 20240516_042448_00042_raqe3 failed: Error committing write parquet to Hive"]
2024-05-16T04:24:50.1421074Z tests               | to contain:
2024-05-16T04:24:50.1421690Z tests               |   ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1422472Z tests               | but could not find the following element(s):
2024-05-16T04:24:50.1423281Z tests               |   ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1423913Z tests               | 
2024-05-16T04:24:50.1425259Z tests               | 	at io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan(TestTrinoCli.java:416)
2024-05-16T04:24:50.1427234Z tests               | 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
2024-05-16T04:24:50.1428777Z tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
2024-05-16T04:24:50.1430220Z tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2024-05-16T04:24:50.1432046Z tests               | 	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
2024-05-16T04:24:50.1433710Z tests               | 	at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
2024-05-16T04:24:50.1435238Z tests               | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
2024-05-16T04:24:50.1479952Z tests               | 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
2024-05-16T04:24:50.1481452Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-05-16T04:24:50.1483137Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)

@anusudarsan
Member

@osscm the CI looks good now

@osscm
Contributor Author

osscm commented May 17, 2024

@osscm the CI looks good now

@anusudarsan, so good to merge now? :)

@ebyhr or @dain, could you please help merge this? Thanks!

@dain dain merged commit fe9b6ef into trinodb:master May 17, 2024
60 checks passed
@ebyhr ebyhr added this to the 449 milestone May 17, 2024
Labels
bigquery BigQuery connector cla-signed docs hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

Add support non-default (hive) catalog for Hive Metastore 3.x or newer
6 participants