Is there a way to reduce memory utilization in dictPageReader/PageReader ? #90

vikramarsid · 2022-07-08T17:19:50Z

Describe the bug
I am loading multiple parquet files into memory concurrently and reading them row by row. I hit an OOM when reading 10 files of 50 MB each at the same time. Do you see anything obvious in the call graph? Thank you!

Unit test to reproduce
Please provide a unit test, either as a patch or text snippet or link to your fork. If you can't isolate it into a unit test then please provide steps to reproduce.

parquet-go specific details

What version are you using? v0.11.0

Misc Details

Are you using AWS Athena, Google BigQuery, presto... ? AWS S3
Any other relevant details... how big are the files / rowgroups you're trying to read/write? 10 - 100 MB
Do you have memory stats to share? Yes
Can you provide a stacktrace? Yes

vikramarsid changed the title ~~Is there a way to reduce memory utilization in deictPageReader/PageReader ?~~ Is there a way to reduce memory utilization in dictPageReader/PageReader ? Jul 8, 2022
