Describe the bug

When writing large Parquet files with AsyncArrowWriter, we found that memory usage is unexpectedly high, sometimes making the process run out of memory.

The bug is likely in the following code (arrow-rs/parquet/src/arrow/async_writer/mod.rs, line 145 at aac3aa9). It tries to trigger a flush once the buffer's length reaches half of its capacity. However, when data is written into the buffer, the capacity grows along with the length, so this condition does not work as intended:
if !force && buffer.len() < buffer.capacity() / 2 {
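A minimal, self-contained demonstration of why that check is a moving target (this is not the writer itself; the 1 KiB initial capacity and 4 KiB chunks are arbitrary): as bytes are appended, the Vec reallocates, so `capacity() / 2` keeps growing along with `len()` instead of staying pinned to the configured buffer size.

```rust
fn main() {
    let mut buffer: Vec<u8> = Vec::with_capacity(1024);
    println!("initial: len = {}, capacity = {}", buffer.len(), buffer.capacity());

    let chunk = [0u8; 4096];
    for i in 1..=5 {
        buffer.extend_from_slice(&chunk);
        // The flush threshold is recomputed from the *current* capacity,
        // which has just grown to accommodate the new data.
        println!(
            "after write {i}: len = {}, capacity = {}, threshold (capacity / 2) = {}",
            buffer.len(),
            buffer.capacity(),
            buffer.capacity() / 2
        );
    }
}
```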
To Reproduce
Read a big Parquet file, then write it to another file with AsyncArrowWriter. Since reading is usually faster than writing, data is buffered but not correctly flushed, causing an OOM. A sketch of the reproduction follows.
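A hedged sketch of the reproduction, assuming the tokio-backed async reader and writer APIs available at the referenced commit (where `AsyncArrowWriter::try_new` still took a `buffer_size` argument); the file paths and the 4 KiB buffer size are illustrative:

```rust
use futures::TryStreamExt;
use parquet::arrow::{AsyncArrowWriter, ParquetRecordBatchStreamBuilder};
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Stream record batches out of a large input file.
    let input = File::open("big.parquet").await?;
    let builder = ParquetRecordBatchStreamBuilder::new(input).await?;
    let schema = builder.schema().clone();
    let mut stream = builder.build()?;

    let output = File::create("copy.parquet").await?;
    let mut writer = AsyncArrowWriter::try_new(output, schema, 4096, None)?;

    // Reading usually outpaces writing, so batches pile up in the shared
    // buffer; with the broken condition they are never flushed.
    while let Some(batch) = stream.try_next().await? {
        writer.write(&batch).await?;
    }
    writer.close().await?;
    Ok(())
}
```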
Expected behavior
Trigger flushing based on the constant initial buffer capacity rather than the Vec's current, growing capacity.
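A minimal sketch of that fix, assuming the writer records the user-configured buffer size at construction time; `BufferedWriter`, `buffer_size`, and `should_flush` are illustrative names, not the actual arrow-rs API:

```rust
struct BufferedWriter {
    buffer: Vec<u8>,
    // The constant initial capacity; unlike `Vec::capacity`, it never changes.
    buffer_size: usize,
}

impl BufferedWriter {
    fn new(buffer_size: usize) -> Self {
        Self {
            buffer: Vec::with_capacity(buffer_size),
            buffer_size,
        }
    }

    // Flush once the buffered bytes reach half of the *configured* size,
    // so the threshold stays fixed even when the Vec reallocates.
    fn should_flush(&self, force: bool) -> bool {
        force || self.buffer.len() >= self.buffer_size / 2
    }
}

fn main() {
    let mut w = BufferedWriter::new(1024);
    w.buffer.extend_from_slice(&[0u8; 4096]);
    // The Vec has grown past its initial capacity, but the threshold is
    // unchanged, so the flush fires as intended.
    assert!(w.should_flush(false));
}
```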
Additional context