Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to Optimize Trino Hive Metastore Query Latency by Caching createMetastoreClient() #21671

Open
yohengyang opened this issue Apr 23, 2024 · 3 comments
Labels
enhancement New feature or request performance

Comments

@yohengyang
Copy link

yohengyang commented Apr 23, 2024

Upon analyzing Hive and Iceberg queries in Trino, we observed a significant latency during the analyze phase in io.trino.plugin.hive.metastore.cache.CachingHiveMetastore#loadTable as shown below:

@Override
public Optional<Table> getTable(String databaseName, String tableName)
{
    try {
        return retry()
                .stopOn(NoSuchObjectException.class)
                .stopOnIllegalExceptions()
                .run("getTable", stats.getGetTable().wrap(() -> {
                    try (ThriftMetastoreClient client = createMetastoreClient()) {  // Takes 167 ms
                        return Optional.of(client.getTable(databaseName, tableName)); // Takes 20+ ms
                    }
                }));
    }
    ......
}

From the section of code above, we see that the majority of time is consumed at ThriftHiveMetastore:createMetastoreClient(), which is about 167 milliseconds, whereas the actual request to retrieve the table via ThriftMetastoreClient:getTable() consumes significantly less time - around 20+ ms.

Looking at the trace logs:

`---ts=2023-12-22 17:52:34;thread_name=Query-20231222_095234_00058_j8b4z-55744;id=d9c0;is_daemon=true;priority=5;TCCL=io.trino.server.PluginClassLoader@1a2d02d8
    `---[145.533198ms] io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore:createMetastoreClient()
        `---[145.454194ms] io.trino.plugin.hive.metastore.thrift.IdentityAwareMetastoreClientFactory:createMetastoreClientFor() #2126

We noticed that every transaction creates a new CachingHiveMetastore, meaning the acceleration effect of this cache is only effective within a single query which can't provide acceleration effect for this scenario.

Given the bottleneck at createMetastoreClient(), we're considering an optimization: caching the result of createMetastoreClient(), to avoid recreating the ThriftMetastoreClient for each request. We are interested in your thoughts on this proposed solution.

@findepi
Copy link
Member

findepi commented May 10, 2024

cc @electrum @dain @raunaqmorarka

@electrum
Copy link
Member

Are you able to tell why creation of the client takes so much time?

@yohengyang
Copy link
Author

yohengyang commented May 11, 2024

Are you able to tell why creation of the client takes so much time?

@electrum I haven't delved into the reasons yet.

We used Waggledance (version 3.7.0) as the Metastore(version 3.1.2) router; Trino is actually connected to Waggledance.
The Trino cluster, Waggledance, and Metastore are all in the same AZ, and there is no network bandwidth bottleneck.

The following is the time trace for Trino to create a connection with Waggledance, and it can be seen that the delay is almost on the Waggledance server side. I will continue to track the connection time of Waggledance.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Development

Successfully merging a pull request may close this issue.

3 participants