Pre-allocate definition & repetition levels; perf improvement #11675
Testing shows a 5% to 15% performance improvement on many queries from pre-reserving capacity for definitionLevels and repetitionLevels.
Pre-allocate capacity for definition & repetition levels
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/trinodb/cla.
I submitted my CLA a week ago; hopefully that will be acted upon soon.
Please get rid of the merge commit
LGTM, modulo Raunaq's comment.
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
@martint Have you received a CLA from @theosib-amazon?
Should I resubmit my CLA some other way?
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
Merged as 086d56f. Thanks!
Description
I profiled Trino while running TPC-DS queries against a Parquet source and noticed that many of them were wasting time on array copies while growing the capacity of the definitionLevels and repetitionLevels integer array lists. Pre-reserving capacity sped up many queries, some by as much as 15%.
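To illustrate why pre-reserving capacity helps, here is a minimal, hypothetical Java sketch. The class and method names (`PreSizeDemo`, `GrowableIntList`, `readLevels`) are illustrative only and are not Trino's actual reader code. A list that starts small must repeatedly allocate a larger array and copy its existing contents as it grows; sizing it up front to the expected number of values (assumed here to be known before the levels are read) avoids those copies.

```java
import java.util.Arrays;

// Hypothetical sketch, not Trino's implementation: shows the cost of growing
// an int list versus pre-sizing it.
public class PreSizeDemo {

    // Minimal growable int list that counts how often it reallocates.
    static final class GrowableIntList {
        private int[] elements;
        private int size;
        private int reallocations;

        GrowableIntList(int initialCapacity) {
            // Guard against a zero capacity, which doubling could never grow.
            elements = new int[Math.max(1, initialCapacity)];
        }

        void add(int value) {
            if (size == elements.length) {
                // Growth path: allocate a larger array and copy every existing element.
                elements = Arrays.copyOf(elements, elements.length * 2);
                reallocations++;
            }
            elements[size++] = value;
        }

        int size() {
            return size;
        }

        int reallocations() {
            return reallocations;
        }
    }

    // Simulates filling a level list for `valueCount` values (placeholder level values).
    static GrowableIntList readLevels(int valueCount, int initialCapacity) {
        GrowableIntList levels = new GrowableIntList(initialCapacity);
        for (int i = 0; i < valueCount; i++) {
            levels.add(i % 2);
        }
        return levels;
    }

    public static void main(String[] args) {
        int valueCount = 100_000;
        GrowableIntList grown = readLevels(valueCount, 16);
        GrowableIntList preSized = readLevels(valueCount, valueCount);
        System.out.println("default capacity: " + grown.reallocations() + " reallocations");
        System.out.println("pre-sized:        " + preSized.reallocations() + " reallocations");
    }
}
```

With doubling growth from an initial capacity of 16, filling 100,000 values triggers 13 reallocations, each copying the whole existing array, while the pre-sized list triggers none. The trade-off is that an overestimated size wastes memory, so the reservation should be based on a count the reader already knows.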
Improvement
Change to the Parquet reader component
Speeds up Parquet column reading by pre-reserving capacity in the definitionLevels and repetitionLevels lists.
Related issues, pull requests, and links
Performance optimization
Documentation
(x) No documentation is needed.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: