This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

reading of parquet binary type, is very slow: #666

Closed
ldn9638 opened this issue Dec 8, 2021 · 1 comment · Fixed by #670
Labels
enhancement An improvement to an existing feature no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments


ldn9638 commented Dec 8, 2021

I found that the capacity is not pre-allocated here: [https://github.com/jorgecarleitao/arrow2/blob/main/src/io/parquet/read/binary/mod.rs#L34]
As a result, `extend_from_slice` is very slow.

@jorgecarleitao
Owner

Ahahaha, I also found that out yesterday! Fully agree, we should pre-reserve this one for sure. Note that in plain encoding, a value is always [4 bytes][values], and each page announces how many values it contains, so we can pre-allocate exactly.
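
To illustrate the idea, here is a minimal sketch of the pre-allocation, not the actual arrow2 code or the change made in #670. It assumes a page buffer that contains only plain-encoded values (a 4-byte little-endian length followed by the bytes) and that the value count is already known from the page header. The function name `decode_plain_binary` and its signature are hypothetical.

```rust
/// Hypothetical helper (not part of arrow2): decodes `num_values`
/// plain-encoded binary values from `page` into offsets plus one
/// contiguous values buffer.
fn decode_plain_binary(page: &[u8], num_values: usize) -> (Vec<i32>, Vec<u8>) {
    // Each value carries a 4-byte length prefix, so the payload size is
    // whatever remains once the prefixes are subtracted.
    let values_capacity = page.len().saturating_sub(4 * num_values);

    // Reserve once up front so `extend_from_slice` never has to reallocate.
    let mut offsets: Vec<i32> = Vec::with_capacity(num_values + 1);
    let mut values: Vec<u8> = Vec::with_capacity(values_capacity);
    offsets.push(0);

    let mut rest = page;
    for _ in 0..num_values {
        // Stop early on a truncated buffer instead of panicking.
        if rest.len() < 4 {
            break;
        }
        let (prefix, tail) = rest.split_at(4);
        let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
        if tail.len() < len {
            break;
        }
        let (value, tail) = tail.split_at(len);
        values.extend_from_slice(value);
        offsets.push(values.len() as i32);
        rest = tail;
    }
    (offsets, values)
}
```

For a page holding "foo", an empty value, and "hi", the buffer is 17 bytes long, so the sketch reserves 17 - 3 * 4 = 5 bytes for the values and 4 slots for the offsets.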

@jorgecarleitao jorgecarleitao added the bug Something isn't working label Dec 8, 2021
@jorgecarleitao jorgecarleitao added enhancement An improvement to an existing feature no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog and removed bug Something isn't working labels Dec 9, 2021

2 participants