parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034

Closed
tustvold opened this issue Dec 12, 2021 · 0 comments · Fixed by #1035
Labels
enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate

Comments

@tustvold
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

StructArrayReader currently always computes repetition and definition level buffers, along with a null bitmask - see here. When the struct array is not nullable and has no nullable or repeated parents, this work is redundant: the bitmask will be all true, and no parent array reader will consult the level buffers. This is the common case for a flat schema.

Describe the solution you'd like

Skip the definition and repetition level logic when the definition level and repetition level of the struct are both 0.
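A rough sketch of what such a fast path might look like, to make the proposal concrete. This is not the crate's actual code: `StructLevels`, `compute_struct_levels`, and the one-`bool`-per-row bitmap are hypothetical simplifications, and the slow path only approximates the per-row logic (maximum child definition level compared against the struct's own level).

```rust
/// Hypothetical, simplified stand-in for what a struct reader produces
/// besides the child arrays themselves.
#[derive(Debug, PartialEq)]
pub struct StructLevels {
    /// Definition levels, or `None` when no parent will consult them.
    pub def_levels: Option<Vec<i16>>,
    /// Repetition levels, or `None` when no parent will consult them.
    pub rep_levels: Option<Vec<i16>>,
    /// Validity per row (one bool per row for clarity, not a packed bitmap),
    /// or `None` when every row is known to be valid.
    pub validity: Option<Vec<bool>>,
}

/// Sketch of the proposed behaviour (assumed signature, not the real API).
pub fn compute_struct_levels(
    struct_def_level: i16,
    struct_rep_level: i16,
    children_def_levels: &[Vec<i16>],
    first_child_rep_levels: &[i16],
) -> StructLevels {
    // Fast path: a required struct with no nullable or repeated ancestors.
    // Every row is valid and nobody reads the level buffers, so skip the work.
    if struct_def_level == 0 && struct_rep_level == 0 {
        return StructLevels {
            def_levels: None,
            rep_levels: None,
            validity: None,
        };
    }

    // Slow path: take the per-row maximum definition level across children.
    let len = children_def_levels.first().map_or(0, |c| c.len());
    let mut def_levels = vec![0i16; len];
    for child in children_def_levels {
        for (out, &d) in def_levels.iter_mut().zip(child) {
            *out = (*out).max(d);
        }
    }

    // A row is non-null when it is defined at least up to the struct's level.
    let validity = def_levels.iter().map(|&d| d >= struct_def_level).collect();

    StructLevels {
        def_levels: Some(def_levels),
        // In this sketch the struct simply reuses its first child's levels.
        rep_levels: Some(first_child_rep_levels.to_vec()),
        validity: Some(validity),
    }
}
```

With a flat schema, a call such as `compute_struct_levels(0, 0, ..)` returns immediately without allocating any buffers, whereas a nullable struct (definition level 1 or higher) still goes through the full computation.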

Describe alternatives you've considered

The logic could remain the same.

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Dec 12, 2021
tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 12, 2021

tustvold added a commit to tustvold/arrow-rs that referenced this issue Dec 12, 2021
@alamb alamb changed the title StructArrayReader Redundant Level & Bitmap Computation parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation Dec 20, 2021
@alamb alamb added the parquet Changes to the parquet crate label Dec 20, 2021
alamb pushed a commit that referenced this issue Jan 11, 2022
…uct arrays in parquet (#1035)

* Skip levels computation for required struct arrays (#1034)

* Review feedback