event: fix Resubscribe deadlock when unsubscribing after inner sub ends #28359

Merged
fjl merged 4 commits into ethereum:master on Oct 22, 2023

Contributor

@Inphi Inphi commented Oct 17, 2023

A goroutine oversees the lifetime of subscriptions handled by resubscriptions. This goroutine terminates when the subscription ends without any errors. However, the resub goroutine needs to live long enough to read from the unsub channel. Otherwise, an Unsubscribe call deadlocks when writing to the unsub channel.

A goroutine is used to manage the lifetime of subscriptions managed by
resubscriptions. When the subscription ends with no error, the resub
goroutine ends as well. However, the resub goroutine needs to live
long enough to read from the unsub channel. Otherwise, an Unsubscribe
call deadlocks when writing to the unsub channel.
Comment on lines 162 to 172
	sub := ResubscribeErr(100*time.Millisecond, func(ctx context.Context, lastErr error) (Subscription, error) {
		return NewSubscription(func(unsubscribed <-chan struct{}) error {
			select {
			case <-time.After(2 * time.Second):
				innerSubDone <- struct{}{}
				return nil
			case <-unsubscribed:
				return nil
			}
		}), nil
	})
Contributor

I'm not deeply familiar with the semantics of subscriptions, but I've been trying to figure out whether this is 'correct'. The docs for NewSubscription say:

NewSubscription runs a producer function as a subscription in a new goroutine. The
channel given to the producer is closed when Unsubscribe is called. If fn returns an
error, it is sent on the subscription's error channel.

In other words, unsubscribed will be closed when an external caller wants the producer to stop. In this testcase, however, the producer just stops producing and exits without returning an error or closing any channel. And yeah, that intuitively seems like something that could lead to a deadlock.

If the select is changed into

			select {
			case <-time.After(2 * time.Second):
				innerSubDone <- struct{}{}
				return errors.New("time to go")
			case <-unsubscribed:
				return nil
			}

Then the deadlock disappears.

So my thinking goes: this testcase is based on a flawed producer. But also, the NewSubscription documentation should mention something like "the producer must either exit with an error or keep listening to the given channel".

Contributor Author

I see. Clarifying the behavior of producers would be a better fix in that case. Will do that instead. Perhaps it'll be good to add defensive checks against nil producer errors by panicking in that case?

Contributor

Note, though: I'm no authority here. @fjl would know

Contributor Author

@Inphi Inphi Oct 18, 2023

I see that the documentation already specifies the behavior only for the case where fn returns an error, which implies that producers are permitted to return nil errors. Subscription also explicitly checks for a nil error. While this behavior doesn't compose well with Resubscriptions, we should maintain the current API contract.

Contributor

fjl commented Oct 18, 2023

Maybe an easier fix would be adding a buffer of 1 to unsub.

Contributor Author

Inphi commented Oct 18, 2023

@fjl Simplified the fix by making unsub buffered. Thanks for the suggestion.

	sub := ResubscribeErr(100*time.Millisecond, func(ctx context.Context, lastErr error) (Subscription, error) {
		return NewSubscription(func(unsubscribed <-chan struct{}) error {
			select {
			case <-time.After(2 * time.Second):
Contributor

It's weird to have a timeout here. You can achieve the necessary synchronization with another channel, like so:

quitInner := make(chan struct{})
...
    select {
    case <-quitInner:
...

close(quitInner)
<-innerSubDone
sub.Unsubscribe()
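
The handshake suggested above can be shown as a standalone program. This is only a sketch of the synchronization pattern: `runProducer` and `stopInnerThenWait` are hypothetical names, and the producer is a plain goroutine rather than a real NewSubscription producer.

```go
package main

import "fmt"

// runProducer is a hypothetical stand-in for the inner subscription's
// producer: it exits when told to quit or when it is unsubscribed.
func runProducer(quitInner, unsubscribed <-chan struct{}, innerSubDone chan<- struct{}) {
	select {
	case <-quitInner:
		// Acknowledge that the producer is about to finish.
		innerSubDone <- struct{}{}
	case <-unsubscribed:
	}
}

// stopInnerThenWait closes quitInner and waits for the acknowledgement,
// so the caller knows the producer has taken the quit branch without
// relying on a timer.
func stopInnerThenWait() bool {
	quitInner := make(chan struct{})
	unsubscribed := make(chan struct{})
	innerSubDone := make(chan struct{})

	go runProducer(quitInner, unsubscribed, innerSubDone)

	close(quitInner)
	<-innerSubDone
	return true
}

func main() {
	fmt.Println("producer acknowledged quit:", stopInnerThenWait())
}
```

Compared with `time.After`, the test no longer waits a fixed two seconds, and the ordering between the producer finishing and the later Unsubscribe call is explicit.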

Contributor Author

Good suggestion. Made the change.

@fjl fjl changed the title from "event: fix deadlock when unsubscribing a resub" to "event: fix Resubscribe deadlock when unsubscribing after inner sub ends" Oct 22, 2023
@fjl fjl merged commit ffc6a0f into ethereum:master Oct 22, 2023
1 check passed
@fjl fjl added this to the 1.13.5 milestone Oct 22, 2023
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
…ds (ethereum#28359)

A goroutine is used to manage the lifetime of subscriptions managed by
resubscriptions. When the subscription ends with no error, the resub
goroutine ends as well. However, the resub goroutine needs to live
long enough to read from the unsub channel. Otherwise, an Unsubscribe
call deadlocks when writing to the unsub channel.

This is fixed by adding a buffer to the unsub channel.
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
devopsbo3 added a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
ajsutton added a commit to ethereum-optimism/op-geth that referenced this pull request Nov 12, 2023
Dergarcon pushed a commit to specialmechanisms/mev-geth-0x2mev that referenced this pull request Jan 31, 2024