-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get stats from hive metastore only for the necessary columns #16203
Get stats from hive metastore only for the necessary columns #16203
Conversation
7a75b11
to
29b2138
Compare
5d763ff
to
ee3f5fb
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Show resolved
Hide resolved
ee3f5fb
to
5cfb83d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CA
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
5cfb83d
to
fede049
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Outdated
Show resolved
Hide resolved
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
fede049
to
d777bb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetastoreClosure.java
Show resolved
Hide resolved
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
Show resolved
Hide resolved
d777bb7
to
d954002
Compare
rebased on the latest master (auto rebase were causing compilation errors) |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveTableHandle.java
Show resolved
Hide resolved
Before applyProjection is called or when projection pushdown is disabled, we should assume that all table columns are to be projected for populating HiveTableHandle#projectedColumns
d954002
to
7341c8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/cache/CachingHiveMetastore.java
Outdated
Show resolved
Hide resolved
HiveTableHandle contains the list of projected columns. Since only projected columns will be available to the engine, HiveMetadata#getTableStatistics can return statistics limited to those columns. This also requires CachingHiveMetastore to support caching statistics for a subset of table columns.
7341c8f
to
6280a6d
Compare
List<Column> requestedColumns = table.getDataColumns().stream() | ||
.filter(column -> requestedColumnNames.contains(column.getName())) | ||
.collect(toImmutableList()); | ||
table = Table.builder(table).setDataColumns(requestedColumns).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're building fake table
object, hoping the callee will trust the table
, and not eg use knowledge about column list from some other place
that's an opaque design, something explicit would probably be better.
Why not pass Set<String> requestedColumns
(or Optional
thereof`) down the call chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass Set requestedColumns (or Optional thereof`) down the call chain?
That's an excellent idea. It also applies to the getPartitionStatistics
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially yes
Description
HiveTableHandle
contains the list of projected columns. Since only projected columns will be available to the engine,HiveMetadata.getTableStatistics
can return statistics limited to those columns.This also requires
CachingHiveMetastore
to support caching statistics for a subset of table columns.Benchmarks
I tested the change on the glue metastore and table with 100 columns (details below) with caching stats disabled (
hive.metastore-stats-cache-ttl=0s)
and a simple query (select count(c1) from test_col_stats_part
) where planning dominates.There is a difference of about 20% for planning time/elapsed (so visible)
baseline 2.251s on average and only required stats 1.853s
This would have a potentially higher impact on a metastore under heavy load.
The table used:
I also ran standard tpch/tpcds on scale factor 1000, orc, partitioned. The results show mostly no change (except for some variability)
sf1k-orc-part-240223.pdf
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: