
Some memory reservations of GroupedHashAggregateStream seem to be mis-tagged as spillable while they do not allow spilling #11390

Open
Ablu opened this issue Jul 10, 2024 · 3 comments
Labels
bug Something isn't working

Comments

Ablu commented Jul 10, 2024

Describe the bug

When using grouping (I tested with DISTINCT ON) in combination with a FairSpillPool, not all memory seems to be tagged correctly. While the grouping itself gracefully handles allocation errors by spilling, the same reservation is also shared with a BatchBuilder that does not.
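To illustrate why the tagging matters, here is a simplified, self-contained model of a fair spill pool (this is a sketch of the general idea, not DataFusion's actual implementation): unspillable reservations may use everything that is free, while each spillable reservation is capped at an even share of what the unspillable ones leave behind, so that every spillable consumer keeps room to spill. A non-spilling allocation routed through a spillable reservation therefore hits a cap well before the pool is actually full.

```rust
// Simplified model of a fair spill pool (NOT DataFusion's real code).
struct FairPool {
    capacity: usize,
    unspillable_used: usize,
    spillable: Vec<usize>, // bytes used per spillable consumer
}

impl FairPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, unspillable_used: 0, spillable: Vec::new() }
    }

    fn register_spillable(&mut self) -> usize {
        self.spillable.push(0);
        self.spillable.len() - 1
    }

    /// A spillable consumer only gets a fair share of the remainder.
    fn try_grow_spillable(&mut self, id: usize, bytes: usize) -> Result<(), String> {
        let share = (self.capacity - self.unspillable_used) / self.spillable.len();
        if self.spillable[id] + bytes > share {
            return Err(format!(
                "Failed to allocate additional {bytes} bytes (fair share is {share} bytes)"
            ));
        }
        self.spillable[id] += bytes;
        Ok(())
    }

    /// An unspillable consumer may use everything that is still free.
    fn try_grow_unspillable(&mut self, bytes: usize) -> Result<(), String> {
        let used = self.unspillable_used + self.spillable.iter().sum::<usize>();
        if used + bytes > self.capacity {
            return Err("pool exhausted".to_string());
        }
        self.unspillable_used += bytes;
        Ok(())
    }
}

fn main() {
    let mut pool = FairPool::new(100);
    let a = pool.register_spillable();
    let _b = pool.register_spillable();
    // Two spillable consumers: each is capped at 50 even though all
    // 100 bytes are still free.
    assert!(pool.try_grow_spillable(a, 60).is_err());
    // The same 60 bytes succeed through an unspillable reservation,
    // which is why mis-tagging a non-spilling allocation as spillable
    // surfaces as a spurious ResourcesExhausted error.
    assert!(pool.try_grow_unspillable(60).is_ok());
    println!("ok");
}
```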

To Reproduce

(Tested on Linux, Fedora 40)

  1. Clone my reproducer: https://github.com/Ablu/datafusion-repro-resource-allocation (chances are that this can be simplified further)
  2. Generate the test file: cargo run --release -- generate out.parquet
  3. Increase the open file limit: ulimit -nS 102400
  4. Attempt to run the deduplication query: cargo run --release -- deduplicate out.parquet dedup.parquet
Failure to write back final result: ResourcesExhausted("Failed to allocate additional 8939616 bytes for GroupedHashAggregateStream[10] with 58373092 bytes already allocated - maximum available is 62500000")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It looks like it tries to acquire from the smaller "spillable" portion of the pool, while it should probably allocate through a non-spillable reservation.

As a hack, I removed the offending reservation calls:

diff --git a/datafusion/physical-plan/src/sorts/builder.rs b/datafusion/physical-plan/src/sorts/builder.rs
index d32c60697..d0305c0bf 100644
--- a/datafusion/physical-plan/src/sorts/builder.rs
+++ b/datafusion/physical-plan/src/sorts/builder.rs
@@ -65,15 +65,15 @@ impl BatchBuilder {
             indices: Vec::with_capacity(batch_size),
             reservation,
         }
     }
 
     /// Append a new batch in `stream_idx`
     pub fn push_batch(&mut self, stream_idx: usize, batch: RecordBatch) -> Result<()> {
-        self.reservation.try_grow(batch.get_array_memory_size())?;
+        // dbg!(self.reservation.try_grow(batch.get_array_memory_size()))?;
         let batch_idx = self.batches.len();
         self.batches.push((stream_idx, batch));
         self.cursors[stream_idx] = BatchCursor {
             batch_idx,
             row_idx: 0,
         };
         Ok(())
@@ -137,15 +137,15 @@ impl BatchBuilder {
             let retain = stream_cursor.batch_idx == batch_idx;
             batch_idx += 1;
 
             if retain {
                 stream_cursor.batch_idx = retained;
                 retained += 1;
             } else {
-                self.reservation.shrink(batch.get_array_memory_size());
+                // self.reservation.shrink(batch.get_array_memory_size());
             }
             retain
         });
 
         Ok(Some(RecordBatch::try_new(
             Arc::clone(&self.schema),
             columns,

That avoids the immediate error. It looks like we progress further, but eventually end up in some kind of deadlock (which I have not fully understood yet, though it does not seem to be related to this hack?)

Expected behavior

The query and write-back should succeed after reading back the spilled data.

Additional context

Originally discussed/analyzed on Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1260302583637999756

@Ablu Ablu added the bug Something isn't working label Jul 10, 2024

alamb commented Jul 10, 2024

cc @kazuyukitanimura


Ablu commented Aug 5, 2024

Simply sorting seems to run into the same kind of problem:

External(External(ResourcesExhausted("Failed to allocate additional 2470912 bytes for ExternalSorterMerge[0] with 530680832 bytes already allocated for this reservation - 2079445 bytes remain available for the total pool")))

EDIT: that is probably #5108?


alamb commented Aug 8, 2024

Hi @Ablu -- it could be.

In general, I think our support for ensuring a sort can spill needs some tweaking / help: once the system is under memory pressure (i.e. the pool is almost exhausted), several of the operations required to actually spill need more memory themselves (such as merging the in-memory buffers).

In other words, the process of trying to spill can itself trigger the "resources exhausted" error.

One approach I can think of is to turn off resource checking while spilling. This would be simple to implement, but would result in a memory overshoot.

An alternative is to pre-reserve memory for spilling before memory pressure gets too high, but this would likely result in memory that is almost never used (held in reserve only in case it is needed).
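The pre-reserve idea could be sketched like this (an illustration only, with made-up `Pool`/`Sorter` types, not DataFusion code): the sorter claims a fixed merge budget at construction, so the memory needed to merge spilled runs is guaranteed even once the pool is otherwise exhausted. The cost is exactly the downside mentioned above: those bytes sit idle unless a spill actually happens.

```rust
// Sketch of pre-reserving merge memory before pressure builds up.
struct Pool { capacity: usize, used: usize }

impl Pool {
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.capacity {
            return Err("ResourcesExhausted".into());
        }
        self.used += bytes;
        Ok(())
    }
}

struct Sorter {
    merge_budget: usize, // reserved up front, untouched until spill time
    buffered: usize,
}

impl Sorter {
    fn new(pool: &mut Pool, merge_budget: usize) -> Result<Self, String> {
        // Claim the merge memory while the pool still has headroom.
        pool.try_grow(merge_budget)?;
        Ok(Self { merge_budget, buffered: 0 })
    }

    fn insert(&mut self, pool: &mut Pool, bytes: usize) -> Result<(), String> {
        pool.try_grow(bytes)?;
        self.buffered += bytes;
        Ok(())
    }

    /// Merging spilled runs draws on the pre-reserved budget instead of
    /// asking the (exhausted) pool for more memory.
    fn merge_spilled(&self, needed: usize) -> Result<(), String> {
        if needed > self.merge_budget {
            return Err("merge needs more than the reserved budget".into());
        }
        Ok(())
    }
}

fn main() {
    let mut pool = Pool { capacity: 100, used: 0 };
    let mut sorter = Sorter::new(&mut pool, 20).unwrap();
    // Fill the rest of the pool with buffered batches.
    sorter.insert(&mut pool, 80).unwrap();
    assert!(pool.try_grow(1).is_err()); // the pool itself is now full...
    assert!(sorter.merge_spilled(15).is_ok()); // ...but merging still fits
    println!("ok");
}
```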
