Rare occurrences of panic: send on closed channel in rpc/subscription.go #22322

Closed
samsondav opened this issue Feb 12, 2021 · 4 comments · Fixed by #22597

samsondav commented Feb 12, 2021

System information

Geth version: 1.9.25
OS & Version: Linux

Expected behaviour

We are calling Unsubscribe on the subscription like so:

    sub, _ := client.SubscribeFilterLogs(ctx, filterQuery, ch)
    // ... some time later ...
    sub.Unsubscribe()
    close(ch)

I would expect this to always be safe.

Actual behaviour

In very rare cases this can panic with send on closed channel. It seems that messages can still be sent on the passed-in channel even after Unsubscribe returns.

We found this while load testing with fast resubscribes and hundreds of thousands of iterations.

Steps to reproduce the behaviour

Almost impossible to reproduce. Extremely high rates of subscribe/unsubscribe with channel close as detailed above might expose it.
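
For reference, a minimal sketch of the kind of stress loop that can expose it, assuming an ethclient connection; the iteration count, the drain goroutine and the helper name are illustrative, not our exact harness:

    package stresstest

    import (
        "context"
        "log"

        "github.com/ethereum/go-ethereum"
        "github.com/ethereum/go-ethereum/core/types"
        "github.com/ethereum/go-ethereum/ethclient"
    )

    // stress subscribes, unsubscribes and closes the channel in a tight loop.
    // With enough iterations, the forwarding goroutine can occasionally send on
    // ch after Unsubscribe has returned and close(ch) has run, and panic.
    func stress(ctx context.Context, client *ethclient.Client, filterQuery ethereum.FilterQuery) {
        for i := 0; i < 100000; i++ {
            ch := make(chan types.Log)
            sub, err := client.SubscribeFilterLogs(ctx, filterQuery, ch)
            if err != nil {
                log.Fatal(err)
            }
            // Drain ch until it is closed so deliveries never block on our side.
            drained := make(chan struct{})
            go func() {
                for range ch {
                }
                close(drained)
            }()
            sub.Unsubscribe() // should guarantee no further sends on ch ...
            close(ch)         // ... so closing here should always be safe
            <-drained
        }
    }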

Backtrace

panic: send on closed channel

goroutine 276873727 [running]:
reflect.rselect(0xc03196b860, 0x3, 0x3, 0x188458e, 0xe)
	/usr/local/go/src/runtime/select.go:544 +0x38a
reflect.Select(0xc013d83ee0, 0x3, 0x3, 0xc02e50f6e0, 0x1796e20, 0xc0353b62c0, 0x0, 0x0)
	/usr/local/go/src/reflect/value.go:2229 +0x170
github.com/ethereum/go-ethereum/rpc.(*ClientSubscription).forward(0xc005e0c090, 0xc0324ef100, 0x0, 0x0)
	/root/go/pkg/mod/github.com/ethereum/[email protected]/rpc/subscription.go:299 +0x2db
github.com/ethereum/go-ethereum/rpc.(*ClientSubscription).start(0xc005e0c090)
	/root/go/pkg/mod/github.com/ethereum/[email protected]/rpc/subscription.go:279 +0x2b
created by github.com/ethereum/go-ethereum/rpc.(*handler).handleResponse
	/root/go/pkg/mod/github.com/ethereum/[email protected]/rpc/handler.go:284 +0x23d

Probable cause and suggested fix

It seems that quitWithError:

    func (sub *ClientSubscription) quitWithError(unsubscribeServer bool, err error)

can race with forward:

    func (sub *ClientSubscription) forward() (unsubscribeServer bool, err error)

causing messages to be sent even after Unsubscribe returns.

Adding a coordination channel so that quitWithError waits until forward has exited could resolve it.
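
For illustration, a rough sketch of that idea (made-up names, not the actual rpc package code): the forwarding goroutine closes a dedicated done channel when it exits, and Unsubscribe waits on that channel after signalling quit, so no send on the user channel can happen once Unsubscribe has returned.

    package rpcsketch

    import "sync"

    // subscription is a heavily simplified stand-in for rpc.ClientSubscription;
    // the fields quit and forwardDone are illustrative only.
    type subscription struct {
        channel     chan interface{} // user-supplied delivery channel
        quit        chan struct{}    // closed to ask forward to stop
        forwardDone chan struct{}    // closed by forward when it has exited
        quitOnce    sync.Once
    }

    // forward delivers incoming messages until quit is closed, then signals
    // completion so Unsubscribe can safely return.
    func (sub *subscription) forward(in <-chan interface{}) {
        defer close(sub.forwardDone)
        for {
            select {
            case <-sub.quit:
                return
            case msg := <-in:
                select {
                case sub.channel <- msg:
                case <-sub.quit:
                    return
                }
            }
        }
    }

    // Unsubscribe tells forward to stop and then waits until it actually has,
    // so the caller may close(sub.channel) afterwards without racing a send.
    func (sub *subscription) Unsubscribe() {
        sub.quitOnce.Do(func() { close(sub.quit) })
        <-sub.forwardDone
    }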

samsondav added a commit to smartcontractkit/go-ethereum that referenced this issue Feb 15, 2021
@fjl fjl self-assigned this Feb 16, 2021
fjl (Contributor) commented Mar 25, 2021

I have been looking into this issue for quite a while today. I'm not sure how this case can occur, unless sync.Once is somehow broken.

There is only one write to the sub.err channel, and it happens in quitWithError, which is protected by quitOnce.
The channel is closed in Unsubscribe, which calls quitWithError before closing the channel. The behavior of sync.Once.Do(f) is that "only the first call will invoke f" and "no call to Do returns until the one call to f returns". This implies that the send must finish before the channel is closed.
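
To make the argument concrete, a stripped-down sketch of that structure (simplified names, not the actual rpc code; the error channel is buffered here so the send inside Do cannot block):

    package oncesketch

    import (
        "errors"
        "sync"
    )

    // sub mimics the relevant fields of ClientSubscription in reduced form.
    type sub struct {
        err  chan error
        once sync.Once
    }

    func newSub() *sub {
        return &sub{err: make(chan error, 1)} // buffered: the send below never blocks
    }

    // quitWithError holds the only send on s.err, guarded by the Once.
    func (s *sub) quitWithError(err error) {
        s.once.Do(func() {
            s.err <- err
        })
    }

    // Unsubscribe calls quitWithError first; because Once.Do does not return
    // until the callback has returned, the send has completed before close
    // runs, so the send and the close cannot race.
    func (s *sub) Unsubscribe() {
        s.quitWithError(errors.New("unsubscribed"))
        close(s.err)
    }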

@samsondav did this issue occur only once, or do you have a way to reproduce it?

fjl (Contributor) commented Mar 26, 2021

Damn, I just now realized that this panic is not related to sub.err. The panic happens because Unsubscribe does not wait for forward to return after closing sub.quit.
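
In other words, a simplified sketch of the pre-fix shape (names made up, and the real loop uses reflect.Select, but the race is the same): closing quit does not prevent a send that the forwarding select is already willing to take, and Unsubscribe returns without waiting for the loop to exit.

    package racesketch

    // forward is a reduced model of the forwarding loop: it keeps trying to
    // deliver values on userCh until quit is closed.
    func forward(in <-chan int, userCh chan<- int, quit <-chan struct{}) {
        for {
            select {
            case <-quit:
                return
            case v := <-in:
                // If quit is closed while this select is pending, Go may still
                // pick the send case; if the caller has already closed userCh
                // by then, that send panics.
                select {
                case userCh <- v:
                case <-quit:
                    return
                }
            }
        }
    }

    // unsubscribe models the pre-fix shutdown: it signals quit but returns
    // immediately, so a caller that then does close(userCh) can race with a
    // send still in flight inside forward.
    func unsubscribe(quit chan struct{}) {
        close(quit)
    }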

fjl added a commit that referenced this issue Mar 30, 2021
This fixes a rare issue where the client subscription forwarding loop
would attempt to send on the subscription's channel after Unsubscribe has
returned, leading to a panic if the subscription channel was already
closed by the user. Example:

    sub, _ := client.Subscribe(..., channel, ...)
    sub.Unsubscribe()
    close(channel)

The race occurred because Unsubscribe called quitWithError to tell the
forwarding loop to stop sending on sub.channel, but did not wait for the
loop to actually shut down. This is fixed by adding an additional channel
to track the shutdown, on which Unsubscribe now waits.

Fixes #22322
shoenseiwaso (Contributor) commented

@samsondav thanks for reporting, @fjl thanks for fixing.

For posterity: we encountered a deadlock (not a panic) that #22597 also fixed, where:

  • A websocket read error is received and the dispatch loop calls conn.close()
  • At the same time, we call Unsubscribe() from another goroutine
  • Unsubscribe() wins the race into quitWithError() (since refactored away in #22597, rpc: tighter shutdown synchronization in client subscription), but then gets stuck because...
  • ...the dispatch loop is itself stuck waiting on the Once.Do() in quitWithError()

Documenting here in case it comes up in the future. We have seen an increased rate of read errors of the form "websocket: close 1011 (internal server error): Internal response error" from one of our node providers, which was triggering this issue every few days.
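
For anyone who hits something similar, the shape of that deadlock in a deliberately generic sketch (names made up, not the rpc code): the function run under Once.Do waits on the dispatch loop, while the dispatch loop is itself blocked in Once.Do waiting for that same function to return.

    package oncedeadlock

    import "sync"

    var (
        quitOnce sync.Once
        ack      = make(chan struct{})
    )

    // teardown is only ever run via quitOnce.Do; it waits for the dispatch
    // loop to acknowledge the shutdown before returning.
    func teardown() {
        <-ack
    }

    // Unsubscribe (user goroutine): wins the race into quitOnce.Do and then
    // blocks inside teardown, waiting for ack.
    func Unsubscribe() {
        quitOnce.Do(teardown)
    }

    // onReadError (dispatch loop): also calls quitOnce.Do, which blocks until
    // the first call's teardown returns, so the acknowledgement below is never
    // sent. Both goroutines are now stuck.
    func onReadError() {
        quitOnce.Do(teardown)
        ack <- struct{}{}
    }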

fjl (Contributor) commented Apr 27, 2021

Very good to hear it fixed another problem as well.

atif-konasl pushed a commit to frozeman/pandora-execution-engine that referenced this issue Oct 15, 2021