External sorting not working for (maybe only for string columns??) #12136
Comments
This is probably unrelated, but something puzzles me about the fair spill pool logic. Consider a single spillable consumer that allocates the entire pool. It shouldn't be able to grow the resulting reservation. But if it splits the reservation into two, it can now grow it, even when the other reservation remains allocated. Here's a failing test showing this:
Let me know if I'm misunderstanding something here. |
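The scenario described above can be illustrated with a toy model in plain Rust. This is a sketch only, not DataFusion's `FairSpillPool` code: `ToyFairPool`, `ToyReservation`, and `split_then_grow_demo` are hypothetical names, and the model assumes that growth is checked per reservation against `pool_size / num_spill` and that a split reservation does not register a new consumer.

```rust
// Toy model of a "fair" spill pool. NOT DataFusion's implementation.
// Assumption: a spillable reservation may grow while
// `own_size + additional <= pool_size / num_spill`; the total handed out
// by the pool is never compared against the pool size.

/// Hypothetical pool state.
pub struct ToyFairPool {
    pool_size: usize,
    num_spill: usize, // number of registered spillable consumers
    used: usize,      // total bytes handed out
}

/// Hypothetical reservation: just a byte count.
pub struct ToyReservation {
    size: usize,
}

impl ToyFairPool {
    pub fn new(pool_size: usize) -> Self {
        Self { pool_size, num_spill: 0, used: 0 }
    }

    /// Register one spillable consumer and return its empty reservation.
    pub fn register(&mut self) -> ToyReservation {
        self.num_spill += 1;
        ToyReservation { size: 0 }
    }

    /// Per-reservation check only (the assumed behavior under discussion).
    pub fn try_grow(&mut self, r: &mut ToyReservation, additional: usize) -> Result<(), String> {
        let fair_share = self.pool_size / self.num_spill.max(1);
        if r.size + additional > fair_share {
            return Err(format!("cannot grow by {}: fair share is {}", additional, fair_share));
        }
        r.size += additional;
        self.used += additional;
        Ok(())
    }

    pub fn used(&self) -> usize {
        self.used
    }
}

impl ToyReservation {
    /// Move `bytes` into a second reservation of the same consumer.
    /// The pool's consumer count is unchanged (an assumption of this model).
    pub fn split(&mut self, bytes: usize) -> ToyReservation {
        assert!(bytes <= self.size);
        self.size -= bytes;
        ToyReservation { size: bytes }
    }
}

/// Returns (grow_before_split_ok, grow_after_split_ok, total_used).
pub fn split_then_grow_demo() -> (bool, bool, usize) {
    let mut pool = ToyFairPool::new(100);
    let mut r1 = pool.register();
    pool.try_grow(&mut r1, 100).unwrap(); // one consumer takes the whole pool
    let before = pool.try_grow(&mut r1, 1).is_ok(); // refused: over fair share
    let _r2 = r1.split(50); // r1 = 50, r2 = 50, still one consumer
    let after = pool.try_grow(&mut r1, 50).is_ok(); // accepted: 50 + 50 <= 100
    (before, after, pool.used())
}
```

In this model the second `try_grow` succeeds while the split-off half is still held, so the pool ends up accounting for 150 bytes against a 100-byte limit. Whether the real pool behaves the same way depends on the assumptions stated above.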
I could reproduce the pod freezing up at 1TB disk usage. This happened within a minute of spawning 500 threads. All threads were blocked on a mutex (
It actually makes sense because spill files use Apache IPC format without compression, while the partition uses Parquet files with Snappy compression. |
This change replaces `try_resize` with `resize` in three sites, allowing memory to overshoot the configured pool size. These are sites where we don't fall back to spilling to disk when the allocation fails. Fixes: apache#12136
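The difference between a fallible and an infallible resize, as the commit message describes it, can be sketched with a toy pool in plain Rust. This is not DataFusion's memory pool code: `ToyPool` and `overshoot_demo` are hypothetical, and only the method names `try_resize` and `resize` are taken from the commit message.

```rust
// Toy model only. NOT DataFusion's code; it mirrors the two method names
// from the commit message to show why `resize` can overshoot the limit.

/// Hypothetical pool with a configured limit and a running total.
pub struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    pub fn used(&self) -> usize {
        self.used
    }

    /// Fallible: refuses any new size that would push usage past the limit.
    pub fn try_resize(&mut self, current: &mut usize, new_size: usize) -> Result<(), String> {
        let projected = self.used - *current + new_size;
        if projected > self.limit {
            return Err(format!("{} bytes would exceed limit {}", projected, self.limit));
        }
        self.used = projected;
        *current = new_size;
        Ok(())
    }

    /// Infallible: always applies the new size, even past the limit.
    pub fn resize(&mut self, current: &mut usize, new_size: usize) {
        self.used = self.used - *current + new_size;
        *current = new_size;
    }
}

/// Returns (try_resize_was_refused, pool_usage_after_resize).
pub fn overshoot_demo() -> (bool, usize) {
    let mut pool = ToyPool::new(100);
    let mut reservation = 0usize;
    pool.try_resize(&mut reservation, 80).unwrap();
    let refused = pool.try_resize(&mut reservation, 120).is_err();
    pool.resize(&mut reservation, 120); // succeeds and overshoots the limit
    (refused, pool.used())
}
```

The trade-off in this sketch is that the infallible path never fails the query but lets accounted usage exceed the configured limit, which matches the "overshoot" the commit message mentions.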
Also hitting this bug, is there any update on a fix ? |
I don't know of anyone working explicitly to make external sorting better. Some recent work may make it better.
But realistically I think there is significant additional room for improvement. Any help welcome |
I don't believe that adding an option for compression in DiskManager would be all that difficult: the IPCWriter and reader already have support for LZ4. It would of course incur the cost of compression/decompression but would likely result in 50+% space savings on disk. It's a nice-to-have though; I think of it more as a band-aid than a solution. |
For the given reproducer, I got the error
Configuring datafusion.execution.sort_spill_reservation_bytes to 1MB lets it run successfully. (I don't know whether the parquet-related error message is caused by the same issue.)

    // Reproducer: place in datafusion/core/tests/memory_limit/mod.rs
    #[tokio::test]
    async fn test_sort_with_memory_limit() -> Result<()> {
        // initialize logging to see DataFusion's internal logging
        let _ = env_logger::try_init();

        // how much data to sort
        let row_limit = 10 * 1000;
        let mem_limit = 10 * 1024 * 1024; // 10 MB
        let sort_spill_reservation_bytes = 1024 * 1024; // 1 MB

        let generator = AccessLogGenerator::new()
            .with_row_limit(row_limit)
            .with_max_batch_size(100); // 100 rows per batch

        let pool = Arc::new(GreedyMemoryPool::new(mem_limit));
        let runtime = RuntimeEnvBuilder::new()
            .with_memory_pool(pool)
            .with_disk_manager(DiskManagerConfig::new())
            .build()?;
        let session_config = SessionConfig::new()
            .with_sort_spill_reservation_bytes(sort_spill_reservation_bytes);
        let state = SessionStateBuilder::new()
            .with_config(session_config)
            .with_runtime_env(Arc::new(runtime))
            .build();
        let ctx = SessionContext::new_with_state(state);

        // create a plan that simply sorts on the hostname
        let df = ctx
            .read_batches(generator)?
            .sort(vec![col("host").sort(true, true)])?;

        // execute the plan (it should succeed)
        let _results: Vec<RecordBatch> = df.collect().await?;
        Ok(())
    }

Reasons:
Thoughts: |
@2010YOUY01 Your solution does not work for me. I did play around with |
We have also encountered this issue. After some debugging (by adding debug logs before every call to I set After inserting some batches, the pool fills up and the in-memory sort starts. At this moment the pool usage is: datafusion/datafusion/physical-plan/src/sorts/sort.rs Lines 440 to 443 in 0228bee
finished without error. After the sort, all memory is released. Then the following code counts the sorted batches' memory: datafusion/datafusion/physical-plan/src/sorts/sort.rs Lines 445 to 449 in 0228bee
This size is 698840964 bytes, which exceeds the memory limit, so the following self.reservation.try_resize(size)?; failed:
I am not sure why the sorted batches use over 2.6x the memory of the batches before the sort. |
Describe the bug
Filing a ticket based on a conversation in discord: https://discord.com/channels/885562378132000778/1166447479609376850/1275728622224932959
Basically, I expect that when properly configured, DataFusion would be able to sort data that doesn't fit in RAM, but instead it results in an error like
To Reproduce
Here is a reproducer: rust_playground.tar.gz
tar xf rust_playground.tar.gz
cd rust_playground/
cargo run
The code looks like this
Expected behavior
I expect the query to succeed (by spilling data to disk, etc)
Additional context
@westonpace notes #10073 may be related
Here is some of the commentary from discord:
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Line 303 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Line 434 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Line 513 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/builder.rs
Line 72 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Line 434 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Line 572 in a50aeef
datafusion/datafusion/physical-plan/src/sorts/builder.rs
Line 72 in a50aeef