Improve estimation of row count from partition samples #11333
TPC benchmark results for partitioned sf1000 orc
6757cfe to db90cc1
db90cc1 to 4bc9ead
lgtm % comments
4bc9ead to 0b02f0f
Not an expert here, but seems legit.
lgtm
Reduce the possibility of estimation errors in averageRowsPerPartition and rowCount caused by a couple of outliers by excluding the min and max rowCount values from the calculation of the average rows per partition.
0b02f0f to d3ea6a9
lgtm % mind automation
Test failure due to #11368
Cherry-pick of trinodb/trino#11333. Co-authored-by: Raunaq Morarka
Description
Reduce the possibility of estimation errors in averageRowsPerPartition
and rowCount caused by a couple of outliers by excluding the
min and max rowCount values from the calculation of the
average rows per partition.
improvement
hive connector statistics
improves estimates for partitioned hive tables
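For readers skimming the thread, the idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code in MetastoreHiveStatisticsProvider: the class name `PartitionRowCountEstimate`, both method signatures, and the fallback to a plain mean when fewer than three samples are available are assumptions made for the example.

```java
import java.util.OptionalDouble;

// Hypothetical helper for illustration only; names and structure are assumptions,
// not the actual Trino implementation.
final class PartitionRowCountEstimate
{
    private PartitionRowCountEstimate() {}

    // Average rows per partition over the sampled partitions. With three or more
    // samples, one minimum and one maximum value are left out of the average.
    static OptionalDouble averageRowsPerPartition(long[] sampledRowCounts)
    {
        if (sampledRowCounts.length == 0) {
            return OptionalDouble.empty();
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double sum = 0;
        for (long rowCount : sampledRowCounts) {
            if (rowCount < 0) {
                throw new IllegalArgumentException("row count must be non-negative: " + rowCount);
            }
            min = Math.min(min, rowCount);
            max = Math.max(max, rowCount);
            sum += rowCount;
        }
        int count = sampledRowCounts.length;
        if (count < 3) {
            // Assumed fallback: too few samples to trim, so use the plain mean.
            return OptionalDouble.of(sum / count);
        }
        return OptionalDouble.of((sum - min - max) / (count - 2));
    }

    // Scale the per-partition average to the number of partitions being queried.
    static OptionalDouble estimateRowCount(long[] sampledRowCounts, int queriedPartitionCount)
    {
        OptionalDouble average = averageRowsPerPartition(sampledRowCounts);
        return average.isPresent()
                ? OptionalDouble.of(average.getAsDouble() * queriedPartitionCount)
                : OptionalDouble.empty();
    }
}
```

With sampled row counts of 100, 110, 90, 1,000,000 and 5, the sketch drops 5 and 1,000,000 and averages the remaining three values to 100 rows per partition, so 50 queried partitions are estimated at 5,000 rows. A plain mean over all five samples would be about 200,061 rows per partition.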
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: