cat uses excessive memory when using large skip number #219

Closed
RamilNurtdinov opened this issue Jul 3, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@RamilNurtdinov

Hi.
I am trying to split a large (5 GB) Parquet file into smaller ones using "--skip" and "--limit":
skip 5 million rows and write the 6th chunk, then skip 6 million rows and write the 7th, and so on.

However, while skipping, "parquet-tools cat" starts consuming all of my memory. After reaching 128 GB it goes into swap and is eventually killed.
Could you check the memory usage during "--skip"?

Ramil
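
For readers following along, the splitting approach described above can be sketched as below. This is only an illustration, not the reporter's actual script: the total row count, the chunk size, the file name `big.parquet`, and both helper functions are assumptions; only the "--skip" and "--limit" flags of "parquet-tools cat" come from this thread.

```python
from typing import Iterator, List, Tuple


def chunk_ranges(total_rows: int, chunk_rows: int) -> Iterator[Tuple[int, int]]:
    """Yield (skip, limit) pairs that cover total_rows in chunks of chunk_rows."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    skip = 0
    while skip < total_rows:
        # The last chunk may be shorter than chunk_rows.
        yield skip, min(chunk_rows, total_rows - skip)
        skip += chunk_rows


def build_command(path: str, skip: int, limit: int) -> List[str]:
    """Build (but do not run) one 'parquet-tools cat' invocation for a chunk."""
    return ["parquet-tools", "cat", "--skip", str(skip), "--limit", str(limit), path]


if __name__ == "__main__":
    # Hypothetical numbers: 7.5 million rows split into 1-million-row chunks.
    for skip, limit in chunk_ranges(7_500_000, 1_000_000):
        print(" ".join(build_command("big.parquet", skip, limit)))
```

Each command re-reads the file from the start and skips further in, which is why the memory cost of "--skip" matters more with every chunk.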

@hangxie
Owner

hangxie commented Jul 3, 2023

Yes, this looks like an issue in parquet-go, which uses excessive memory when skipping rows. I need some time to troubleshoot it.

FYI, --limit is not the issue; the problem comes from --skip.

@hangxie
Owner

hangxie commented Jul 3, 2023

Filed a ticket with parquet-go: xitongsys/parquet-go#545

@hangxie hangxie changed the title from "memory" to "cat uses excessive memory when using large skip number" Jul 3, 2023
@hangxie
Owner

hangxie commented Jul 3, 2023

Try v1.19.5; it uses a hardcoded page size for SkipRows. I will add a CLI parameter to make this configurable (#221).
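
The thread does not show how the page size is applied internally, so the following is only a generic sketch of why bounding the batch size can cap memory while skipping. It is not parquet-go or parquet-tools code; `skip_rows` and its parameters are hypothetical names used for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def skip_rows(rows: Iterable, count: int, page_size: int) -> Iterator:
    """Discard the first `count` rows, holding at most `page_size` rows at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    it = iter(rows)
    remaining = count
    while remaining > 0:
        # Read one bounded batch and drop it before reading the next one.
        batch = list(islice(it, min(page_size, remaining)))
        if not batch:
            break  # Fewer rows than requested; nothing left to skip.
        remaining -= len(batch)
    return it


if __name__ == "__main__":
    rest = skip_rows(range(10), count=7, page_size=3)
    print(list(rest))
```

With a fixed `page_size`, peak memory for the skip depends on the batch size rather than on how many rows are skipped in total.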

@hangxie hangxie closed this as completed Jul 5, 2023
@hangxie hangxie added the bug Something isn't working label Jan 20, 2025