Improve estimation of row count from partition samples #11333

raunaqmorarka · 2022-03-04T20:18:26Z

Description

Reduce the possibility of estimation errors in averageRowsPerPartition
and rowCount caused by a couple of outliers by excluding the
min and max rowCount values from the calculation of the
average rows per partition.
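
The idea described above can be illustrated with a minimal sketch. This is not the code from MetastoreHiveStatisticsProvider; the class name, method names, and parameters below are hypothetical. The fallback to a plain average when there are two or fewer samples, and the scaling of the average by the total partition count to obtain a row count, are assumptions made for illustration.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper, not the actual Trino implementation.
final class TrimmedRowCountEstimate
{
    private TrimmedRowCountEstimate() {}

    // Average of the sampled per-partition row counts, with one minimum and
    // one maximum value dropped so that a single outlier on either side
    // cannot dominate the result.
    static OptionalDouble averageRowsPerPartition(List<Long> sampledRowCounts)
    {
        if (sampledRowCounts.isEmpty()) {
            return OptionalDouble.empty();
        }
        // Assumption: with two or fewer samples nothing would remain after
        // trimming, so fall back to the plain average.
        if (sampledRowCounts.size() <= 2) {
            return sampledRowCounts.stream().mapToLong(Long::longValue).average();
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long rowCount : sampledRowCounts) {
            min = Math.min(min, rowCount);
            max = Math.max(max, rowCount);
            sum += rowCount;
        }
        return OptionalDouble.of((double) (sum - min - max) / (sampledRowCounts.size() - 2));
    }

    // Assumption: the table-level estimate is the trimmed average scaled by
    // the total number of partitions.
    static OptionalDouble estimateRowCount(List<Long> sampledRowCounts, int totalPartitions)
    {
        OptionalDouble average = averageRowsPerPartition(sampledRowCounts);
        return average.isPresent()
                ? OptionalDouble.of(average.getAsDouble() * totalPartitions)
                : OptionalDouble.empty();
    }
}
```

For sampled row counts of 10, 12, 11, 1000000 and 0, the sketch drops 0 and 1000000 and averages the remaining three values to 11.0, whereas a plain average over all five samples would be 200006.6.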

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

hive connector statistics

How would you describe this change to a non-technical end user or system administrator?

improves estimates for partitioned hive tables

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

raunaqmorarka · 2022-03-06T09:42:21Z

TPC benchmark results for partitioned sf1000 orc
Rowcount skew fix sf1000 orc partitioned.pdf

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java

sopel39

lgtm % comments

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java

...-hive/src/test/java/io/trino/plugin/hive/statistics/TestMetastoreHiveStatisticsProvider.java

skrzypo987

Not an expert here, but seems legit.

lukasz-stec

lgtm

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java

sopel39 · 2022-03-08T11:27:24Z

lgtm % mind automation

raunaqmorarka · 2022-03-08T11:43:02Z

Test failure due to #11368

cla-bot bot added the cla-signed label Mar 4, 2022

raunaqmorarka requested a review from sopel39 March 4, 2022 20:18

github-actions bot added the tests:hive label Mar 4, 2022

sopel39 reviewed Mar 7, 2022

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (resolved)

raunaqmorarka force-pushed the rowcount-skew branch from 6757cfe to db90cc1 Compare March 7, 2022 11:52

raunaqmorarka requested a review from sopel39 March 7, 2022 11:54

raunaqmorarka force-pushed the rowcount-skew branch from db90cc1 to 4bc9ead Compare March 7, 2022 12:06

sopel39 reviewed Mar 7, 2022

sopel39 requested review from lukasz-stec, skrzypo987 and radek-kondziolka March 7, 2022 14:17

raunaqmorarka force-pushed the rowcount-skew branch from 4bc9ead to 0b02f0f Compare March 7, 2022 14:31

raunaqmorarka requested a review from sopel39 March 7, 2022 14:33

skrzypo987 reviewed Mar 7, 2022

lukasz-stec approved these changes Mar 7, 2022

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (outdated, resolved)

Improve estimation of row count from partition samples

d3ea6a9

Reduce the possibility of estimation errors in averageRowsPerPartition and rowCount due to a couple of outliers by excluding the min and max rowCount values from the calculation of avg rows per partition.

raunaqmorarka force-pushed the rowcount-skew branch from 0b02f0f to d3ea6a9 Compare March 8, 2022 05:29

lukasz-stec reviewed Mar 8, 2022

...rino-hive/src/main/java/io/trino/plugin/hive/statistics/MetastoreHiveStatisticsProvider.java (resolved)

sopel39 approved these changes Mar 8, 2022

sopel39 merged commit 84f0b69 into trinodb:master Mar 8, 2022

raunaqmorarka deleted the rowcount-skew branch March 8, 2022 12:47

sopel39 mentioned this pull request Mar 8, 2022

Release notes for 373 #11288

Closed

github-actions bot added this to the 373 milestone Mar 8, 2022

mosabua mentioned this pull request Mar 8, 2022

Add Trino 373 release notes #11290

Merged

branimir-vujicic added a commit to axiomq/presto that referenced this pull request Mar 20, 2022

Improve estimation of row count from partition samples

2387ecc

Cherry-pick of trinodb/trino#11333 Co-authored-by: Raunaq Morarka <[email protected]>

branimir-vujicic mentioned this pull request Mar 21, 2022

Improve estimation of row count from partition samples prestodb/presto#17490

Closed

branimir-vujicic mentioned this pull request Mar 21, 2022

Improve estimation of row count from partition samples prestodb/presto#17492

Open
