SkipRows utilize excessive memory #545

Open
hangxie opened this issue Jul 3, 2023 · 0 comments
hangxie commented Jul 3, 2023

I'm testing SkipRows with https://dpla-provider-export.s3.amazonaws.com/2021/04/all.parquet/part-00000-471427c6-8097-428d-9703-a751a6572cca-c000.snappy.parquet, which is about 4.4GB and contains 14M+ records. It seems that a larger skip number causes layout.(*Table).Merge to use an excessive amount of memory. Here is the profile output for different skip numbers:

  • 10K
    495.67MB 82.56% 82.56% 495.67MB 82.56% github.com/xitongsys/parquet-go/layout.(*Table).Merge
  • 100K
    630.81MB 79.52% 79.52% 630.81MB 79.52% github.com/xitongsys/parquet-go/layout.(*Table).Merge
  • 1M
    2.48GB 80.16% 80.16% 2.48GB 80.16% github.com/xitongsys/parquet-go/layout.(*Table).Merge
  • 10M
    20.28GB 84.17% 84.17% 20.28GB 84.17% github.com/xitongsys/parquet-go/layout.(*Table).Merge
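
To make the growth rate easier to see, the figures above can be turned into an approximate memory cost per additional skipped row. This is only a rough sketch: the `sample` type and `bytesPerSkippedRow` helper are hypothetical names introduced for illustration, and the conversion assumes the profiler reports binary units (1GB = 1024MB).

```go
package main

import "fmt"

// sample pairs a SkipRows argument with the memory the profile attributes
// to layout.(*Table).Merge, expressed in MB.
type sample struct {
	skipped int64
	mergeMB float64
}

// Figures copied from the profile output in this issue. GB values are
// converted assuming 1GB = 1024MB (an assumption about the profiler's units).
var samples = []sample{
	{10_000, 495.67},
	{100_000, 630.81},
	{1_000_000, 2.48 * 1024},
	{10_000_000, 20.28 * 1024},
}

// bytesPerSkippedRow estimates the extra Merge memory per additional
// skipped row between two measurements.
func bytesPerSkippedRow(a, b sample) float64 {
	const bytesPerMB = 1024 * 1024
	return (b.mergeMB - a.mergeMB) * bytesPerMB / float64(b.skipped-a.skipped)
}

func main() {
	for i := 1; i < len(samples); i++ {
		a, b := samples[i-1], samples[i]
		fmt.Printf("%d -> %d rows: ~%.0f bytes per extra skipped row\n",
			a.skipped, b.skipped, bytesPerSkippedRow(a, b))
	}
}
```

Under those assumptions, each interval works out to somewhere between 1KB and 3KB per extra skipped row, so the memory attributed to Merge grows roughly linearly with the skip count. That would be consistent with skipped rows still being accumulated in memory, although the profile alone does not confirm this.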