Upon analyzing Hive and Iceberg queries in Trino, we observed significant latency during the analysis phase in io.trino.plugin.hive.metastore.cache.CachingHiveMetastore#loadTable, as shown below:
From the trace above, we see that most of the time is spent in ThriftHiveMetastore:createMetastoreClient(), about 167 ms, whereas the actual request to retrieve the table via ThriftMetastoreClient:getTable() takes far less, just over 20 ms.
We noticed that every transaction creates a new CachingHiveMetastore, so the cache only accelerates repeated lookups within a single query and provides no benefit across queries in this scenario.
Given the bottleneck at createMetastoreClient(), we're considering an optimization: caching the result of createMetastoreClient() to avoid recreating the ThriftMetastoreClient for each request. We are interested in your thoughts on this proposed solution.
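For illustration, here is a minimal sketch of what such caching could look like: a small pool that hands out already-connected clients and only falls back to the factory when the pool is empty. The MetastoreClientPool class, its sizing, and the Supplier wrapper around createMetastoreClient() are assumptions for this sketch, not Trino's actual API.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical sketch: reuse already-connected clients instead of
// calling createMetastoreClient() on every request.
class MetastoreClientPool<T extends AutoCloseable>
{
    private final Supplier<T> clientFactory; // wraps createMetastoreClient()
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final int maxIdle;

    MetastoreClientPool(Supplier<T> clientFactory, int maxIdle)
    {
        this.clientFactory = clientFactory;
        this.maxIdle = maxIdle;
    }

    T borrow()
    {
        T client = idle.poll();
        // Reuse an idle client if one exists; otherwise pay the creation cost once
        return (client != null) ? client : clientFactory.get();
    }

    void release(T client) throws Exception
    {
        if (idle.size() < maxIdle) {
            idle.offer(client); // keep the connection open for the next query
        }
        else {
            client.close(); // pool full: drop the extra connection
        }
    }
}
```

A production version would also need liveness checks and TTL-based eviction, since a pooled Thrift connection can go stale or be dropped by the server side, and it would have to keep clients separated per identity when authentication such as Kerberos delegation tokens is in play.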
We used Waggledance (version 3.7.0) as the router in front of the Metastore (version 3.1.2); Trino is actually connected to Waggledance.
The Trino cluster, Waggledance, and Metastore are all in the same AZ, and there is no network bandwidth bottleneck.
The following is the time trace for Trino creating a connection to Waggledance; it shows that the delay is almost entirely on the Waggledance server side. I will continue to track Waggledance's connection setup time.
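As a first step in that tracking, one could time the raw TCP connect to the Waggledance endpoint to separate network latency from server-side session setup. This is only an illustrative sketch: the host name is a placeholder, and 9083 is assumed here as the usual metastore Thrift port.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: measure bare TCP connect time to the
// Waggledance endpoint, excluding Thrift handshake and auth.
public class ConnectTimer
{
    public static void main(String[] args) throws Exception
    {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("waggledance-host", 9083), 5_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("TCP connect took %d ms%n", elapsedMs);
        }
    }
}
```

If the bare connect is fast but createMetastoreClient() is still slow, the time is going into the Thrift/session setup on the Waggledance side rather than the network.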