Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tests for underestimating size of partitioned Glue hive tables #17782

Merged
merged 1 commit into from
Jun 8, 2023

Conversation

Dith3r
Copy link
Member

@Dith3r Dith3r commented Jun 7, 2023

Description

Fix tests for underestimating size of partitioned Glue hive tables introduced by #17677.

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Jun 7, 2023
@github-actions github-actions bot added hive Hive connector tests:hive labels Jun 7, 2023
@Dith3r Dith3r changed the title Avoid underestimating size of partitioned hive tables Fix tests for underestimating size of partitioned Glue hive tables Jun 7, 2023
@Dith3r Dith3r requested a review from findepi June 7, 2023 08:18
@wendigo
Copy link
Contributor

wendigo commented Jun 7, 2023

@raunaqmorarka please run this with secrets

@wendigo
Copy link
Contributor

wendigo commented Jun 7, 2023

LGTM, thanks @Dith3r

@findepi
Copy link
Member

findepi commented Jun 7, 2023

Fix tests for underestimating size of partitioned Glue hive tables.

What was broken? I.e. what problem is this fix for?

@findepi
Copy link
Member

findepi commented Jun 7, 2023

/test-with-secrets sha=8b8c7988a356de42e8250c3258b61a661d236ea4

Comment on lines +343 to +345
// When the table has partitions, but row count statistics are set to zero, we treat this case as empty
// statistics to avoid underestimation in the CBO. This scenario may be caused when other engines are
// used to ingest data into partitioned hive tables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test override is in Glue tests, but comment wording doesn't sound Glue-specific.
Please explain.

Copy link
Member Author

@Dith3r Dith3r Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR which introduced this regression was about altering hive metastore response for partitioned tables when metastore returned zero row count and is consistent across all impacted implementations (excluding in memory, file). Provided link to PR in the description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see similar overrides in other tests added in #17677.

shouldn't we rather change default test behavior to be what-we-want-most, and have the overrides for metastores that lag behind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per offline discussion. override is needed only for those (broken) metastores that incorrectly return rowCount=0 when table/partition has not been analyzed.

@Dith3r
Copy link
Member Author

Dith3r commented Jun 7, 2023

It is fix for https://github.com/trinodb/trino/actions/runs/5191815702/jobs/9360151132 which was introduced by 6028dec. Test for Glue were not triggered by PR.

@github-actions
Copy link

github-actions bot commented Jun 7, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5200127902

@raunaqmorarka
Copy link
Member

/test-with-secrets sha=692cad81648309aa5b07ca064d8be5319679dd65

@github-actions
Copy link

github-actions bot commented Jun 7, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5203972811

@raunaqmorarka raunaqmorarka merged commit 35fcb98 into trinodb:master Jun 8, 2023
@github-actions github-actions bot added this to the 420 milestone Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

4 participants