-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve repartition buffering #4865
Labels
enhancement
New feature or request
Comments
I'm working on this (code is done, I "just" need a solid amount of tests and good docs). |
crepererum
added a commit
to crepererum/arrow-datafusion
that referenced
this issue
Jan 10, 2023
crepererum
added a commit
to crepererum/arrow-datafusion
that referenced
this issue
Jan 10, 2023
crepererum
added a commit
to crepererum/arrow-datafusion
that referenced
this issue
Jan 13, 2023
alamb
added a commit
that referenced
this issue
Jan 13, 2023
* refactor: improve repartition buffering Closes #4865. * fix: remove dead code * refactor: improve timer handling * docs: improve Co-authored-by: Andrew Lamb <[email protected]> * docs: explain test * test: extend `test_close_channel_by_dropping_rx_on_closed_gate` * test: `test_close_channel_by_dropping_rx_on_closed_gate` Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In #4820 @alamb and I discussed that the repartition node could have a slightly smarter buffering. This is a tracking issue for this.
Describe the solution you'd like
While the repartition node needs an unbounded buffer to prevent dead locks, it doesn't need to buffer unlimited amount of data in all cases. To be precise: if ALL output channels have data (i.e. are not empty), than the input workers can be paused. However if it least one output channel is empty, we need to drive the input workers. In the worst case, a few channels will fill up with unbounded data but one channel will forever stay empty. Realistically, this will not happen for any reasonable repartition configuration.
Describe alternatives you've considered
Keeping the current state.
Additional context
-
The text was updated successfully, but these errors were encountered: