FireFly crashed after attempting to send on closed websocket channel #265

Closed
awrichar opened this issue Oct 19, 2021 · 6 comments · Fixed by #298
Labels
bug Something isn't working

Comments

@awrichar
Contributor

While running E2E tests, I observed that one of the FireFly nodes suddenly went down and did not come back up. Logs (below) reveal that FireFly attempted to invoke wc.send from dispatchChangeEvent after the websocket was already closed, which triggered a fatal panic.

[2021-10-19T17:34:31.493Z] ERROR Read failed: websocket: close 1006 (abnormal closure): unexpected EOF pid=1 websocket=b7cd0598-f799-4a6b-a96c-c72b578665d0
[2021-10-19T17:34:31.493Z] DEBUG Sender closing pid=1 websocket=b7cd0598-f799-4a6b-a96c-c72b578665d0
[2021-10-19T17:34:31.493Z] DEBUG SQL<- commit dbtx=NHUhdPFo pid=1 role=aggregator
[2021-10-19T17:34:31.494Z] DEBUG Closing 1 dispatcher(s) for connection 'b7cd0598-f799-4a6b-a96c-c72b578665d0' pid=1
panic: send on closed channel

goroutine 834 [running]:
github.com/hyperledger/firefly/internal/events/websockets.(*websocketConnection).send(0xc0004be5a0, 0xc89840, 0xc0000e9410, 0x0, 0x0)
	/firefly/internal/events/websockets/websocket_connection.go:226 +0xce
github.com/hyperledger/firefly/internal/events/websockets.(*websocketConnection).dispatchChangeEvent(0xc0004be5a0, 0xc000101a40, 0xc000362780, 0x24)
	/firefly/internal/events/websockets/websocket_connection.go:180 +0x116
github.com/hyperledger/firefly/internal/events/websockets.(*WebSockets).ChangeEvent(0x1331000, 0xc000362780, 0x24, 0xc000101a40)
	/firefly/internal/events/websockets/websockets.go:96 +0xcc
github.com/hyperledger/firefly/internal/events.(*eventDispatcher).deliverEvents(0xc0002acc30)
	/firefly/internal/events/event_dispatcher.go:399 +0x617
created by github.com/hyperledger/firefly/internal/events.(*eventDispatcher).electAndStart
	/firefly/internal/events/event_dispatcher.go:147 +0x276
@awrichar
Contributor Author

Seems like this may have been the same issue:
https://github.com/hyperledger/firefly/pull/264/checks?check_run_id=3942509622

@peterbroadhurst
Contributor

Another instance, FYI @awrichar or @nguyer, if either of you has cycles to investigate this one:
https://github.com/hyperledger/firefly/actions/runs/1385766854

@nguyer
Contributor

nguyer commented Oct 26, 2021

I just ran into this one too: https://github.com/hyperledger/firefly/runs/4014224317

I'll take a look and see what I can do to fix this. It's getting annoying.

@nguyer nguyer added the bug Something isn't working label Oct 26, 2021
@nguyer nguyer self-assigned this Oct 26, 2021
@awrichar
Contributor Author

Reopening because I'm definitely still seeing this locally.

This is a confusing issue, but it boils down to three goroutines contending over the websocketConnection.sendMessages channel.

  • send() puts things into this channel
  • sendLoop() reads things from this channel
  • receiveLoop() closes this channel
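
The failure mode in the list above can be reproduced outside FireFly in a few lines. This is a minimal sketch, not FireFly code: `trySend` is a hypothetical helper, and the channel only borrows the `sendMessages` name for readability. It recovers the panic so the result can be printed; in the crash reported here nothing recovered it, so the process exited.

```go
package main

import "fmt"

// trySend is a hypothetical helper: it attempts a send and reports
// whether the send panicked (which happens when ch is already closed).
func trySend(ch chan string, msg string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch <- msg
	return false
}

func main() {
	// Buffered so that the send on the open channel below does not block.
	sendMessages := make(chan string, 1)

	fmt.Println("open channel panicked:", trySend(sendMessages, "event"))

	// Stand-in for what receiveLoop() does when the read fails.
	close(sendMessages)

	// Stand-in for send() being called afterwards from another goroutine.
	fmt.Println("closed channel panicked:", trySend(sendMessages, "event"))
}
```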

The problem actually isn't related to websockets at all. It's simply the way this Go channel is managed: one method may close it while another still attempts to send on it, and sending on a closed channel panics.
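
One common way to avoid this class of panic is to never close the data channel at all, and to signal shutdown through a separate channel that is closed exactly once. The sketch below illustrates that pattern under stated assumptions: `conn`, `closing`, `closeOnce`, and `errClosed` are hypothetical names, and this is not necessarily how #298 resolved the issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical error returned when sending after shutdown.
var errClosed = errors.New("connection closed")

// conn is a hypothetical stand-in for websocketConnection.
type conn struct {
	sendMessages chan string   // never closed; written by send, read by sendLoop
	closing      chan struct{} // closed exactly once to signal shutdown
	closeOnce    sync.Once
}

func newConn() *conn {
	return &conn{
		sendMessages: make(chan string),
		closing:      make(chan struct{}),
	}
}

// close may be called from any goroutine, any number of times.
func (c *conn) close() {
	c.closeOnce.Do(func() { close(c.closing) })
}

// send returns errClosed instead of panicking once shutdown has begun.
func (c *conn) send(msg string) error {
	// Check for shutdown first so a send after close always fails.
	select {
	case <-c.closing:
		return errClosed
	default:
	}
	select {
	case c.sendMessages <- msg:
		return nil
	case <-c.closing:
		return errClosed
	}
}

// sendLoop forwards messages to out until shutdown is signalled.
func (c *conn) sendLoop(out chan<- string) {
	for {
		select {
		case msg := <-c.sendMessages:
			out <- msg
		case <-c.closing:
			return
		}
	}
}

func main() {
	c := newConn()
	out := make(chan string, 1)
	go c.sendLoop(out)

	err := c.send("hello")
	fmt.Println(err, <-out)

	c.close()
	c.close() // a second close is a no-op rather than a panic
	fmt.Println(c.send("late"))
}
```

With this shape, whichever goroutine notices the failure first (the read loop in this issue) can call `close()` without coordinating with senders, and a late `send()` gets an error it can log or ignore instead of taking the process down.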

@peterbroadhurst
Contributor

https://github.com/hyperledger/firefly/actions/runs/1720901977 hit this in #418 e2e

firefly_core_0_1  | [2022-01-20T00:21:06.319Z] ERROR Read failed: websocket: close 1006 (abnormal closure): unexpected EOF pid=1 websocket=4c783f1d-2ea6-4346-b7a3-17e30d0fefe8
firefly_core_0_1  | panic: send on closed channel
firefly_core_0_1  |
firefly_core_0_1  | goroutine 2490 [running]:
firefly_core_0_1  | github.com/hyperledger/firefly/internal/events/websockets.(*websocketConnection).send(0xc000a91f40, 0xe0dac0, 0xc000383b60, 0x0, 0x0)
firefly_core_0_1  | 	/firefly/internal/events/websockets/websocket_connection.go:229 +0xdd
firefly_core_0_1  | github.com/hyperledger/firefly/internal/events/websockets.(*websocketConnection).dispatchChangeEvent(0xc000a91f40, 0xc0003a10e0, 0xc0003ed170, 0x24)
firefly_core_0_1  | 	/firefly/internal/events/websockets/websocket_connection.go:180 +0x116
firefly_core_0_1  | github.com/hyperledger/firefly/internal/events/websockets.(*WebSockets).ChangeEvent(0x162d2a0, 0xc0003ed170, 0x24, 0xc0003a10e0)
firefly_core_0_1  | 	/firefly/internal/events/websockets/websockets.go:96 +0xcc
firefly_core_0_1  | github.com/hyperledger/firefly/internal/events.(*eventDispatcher).deliverEvents(0xc000aba000)
firefly_core_0_1  | 	/firefly/internal/events/event_dispatcher.go:399 +0x617
firefly_core_0_1  | created by github.com/hyperledger/firefly/internal/events.(*eventDispatcher).electAndStart
firefly_core_0_1  | 	/firefly/internal/events/event_dispatcher.go:147 +0x276

4 participants