Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-allocate definition & repetition levels; perf improvement #11675

Closed
wants to merge 2 commits into from

Conversation

theosib-amazon
Copy link
Contributor

@theosib-amazon theosib-amazon commented Mar 25, 2022

Testing shows 5% to 15% performance improvement on a lot of queries by pre-reserving capacity for definitionLevels and repetitionLevels.

Description

I did profiling on Trino while doing TPCDS queries from a Parquet source and noticed that a lot of them were wasting time on array copy while growing capacity of the definitionLevels and definitionLevels integer array lists. Pre-reserving capacity sped up a lot of queries, some as much as 15%.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Change to the parquet reader component

How would you describe this change to a non-technical end user or system administrator?

Speed up parquet column reading a bit by pre-reserving capacity in some of the containers.

Related issues, pull requests, and links

Performance optimization

Documentation

(x) No documentation is needed.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Hive
* Improve performance when reading Parquet data. ({issue}`11675`)

# Iceberg
* Improve performance when reading Parquet data. ({issue}`11675`)

Testing shows 5% to 15% performance improvement on a lot of queries by pre-reserving capacity for definitionLevels and repetitionLevels.
Pre-allocate capacity for definition & repetition levels
@cla-bot
Copy link

cla-bot bot commented Mar 25, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

@theosib-amazon
Copy link
Contributor Author

I submitted my CLA a week ago; hopefully that will be acted upon soon.

@theosib-amazon theosib-amazon changed the title Pre-allocate capacity for definition & repetition levels Pre-allocate definition & repetition levels; perf improvement Mar 25, 2022
@hashhar hashhar requested a review from skrzypo987 March 25, 2022 20:19
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get rid of the merge commit

Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % Raunaq comment

@skrzypo987
Copy link
Member

@cla-bot check

@cla-bot
Copy link

cla-bot bot commented Mar 27, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.

@cla-bot
Copy link

cla-bot bot commented Mar 27, 2022

The cla-bot has been summoned, and re-checked this pull request!

@skrzypo987
Copy link
Member

@martint Have you received a CLA from @theosib-amazon ?

@theosib-amazon
Copy link
Contributor Author

Should I resubmit my CLA some other way?

@theosib-amazon
Copy link
Contributor Author

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Mar 28, 2022
@cla-bot
Copy link

cla-bot bot commented Mar 28, 2022

The cla-bot has been summoned, and re-checked this pull request!

@martint
Copy link
Member

martint commented Mar 28, 2022

Merged as 086d56f. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants