Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Possible Congestion Scenario in SPM #12230

Closed
wants to merge 4 commits into from

Conversation

berkaysynnada
Copy link
Contributor

@berkaysynnada berkaysynnada commented Aug 29, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

During the initialization of cursors of SPM (SortPreservingMergeExec), it needs to get Poll::Ready() for each partition, to initiate the next partition cursor. We have experienced a scenario in our downstream where SPM is polling 1st partition continuously but never get a Poll::Ready(). Those polls get all upstream operator buffers and channels grow. However, if SPM have skipped that partition and tried to initiate other partitions before retry, we did not experience this problem (congestion in partition 1 is cleared once the other partitions are pulled).

What changes are included in this PR?

SPM does not wait initiating the next cursor partition. Instead, it iterates over all partitions continuously whether it is OK or not, until they are all successfully initiated.

Are these changes tested?

Yes, with a unit test. The test depends on a timeout duration to sense deadlock but I could also add a memory usage test.

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Aug 29, 2024
@berkaysynnada berkaysynnada marked this pull request as draft August 29, 2024 12:06
@berkaysynnada berkaysynnada marked this pull request as ready for review September 2, 2024 12:33
@berkaysynnada
Copy link
Contributor Author

@tustvold @alamb, could you please take a look when you have time?

@berkaysynnada berkaysynnada marked this pull request as draft September 2, 2024 14:55
@berkaysynnada berkaysynnada marked this pull request as ready for review September 2, 2024 15:21
Update merge_fuzz.rs

Update merge.rs

Update merge.rs

Update merge.rs

Update merge_fuzz.rs

Update merge.rs

Update merge_fuzz.rs

Update merge.rs

Counter is not enough

Termination logic with counter
@berkaysynnada berkaysynnada force-pushed the feature/spm-cursor-init branch from 0449f47 to 4382945 Compare September 2, 2024 15:22
@berkaysynnada berkaysynnada marked this pull request as draft September 3, 2024 06:35
@alamb
Copy link
Contributor

alamb commented Sep 4, 2024

Follow on PR is #12302

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants