Fix RPCClient Deadlock on Unsubscribe and NewHead #14236
Looks good
LGTM
@@ -130,6 +131,33 @@ func TestRPCClient_SubscribeNewHead(t *testing.T) {
assert.Equal(t, int64(0), highestUserObservations.FinalizedBlockNumber)
assert.Equal(t, (*big.Int)(nil), highestUserObservations.TotalDifficulty)
})
t.Run("Concurrent Unsubscribe and onNewHead calls do not lead to a deadlock", func(t *testing.T) {
const numberOfAttempts = 1000 // need a large number to increase the odds of reproducing the issue
If this only reproduces the issue probabilistically, does that make the test flaky? Or is there just a low chance of it passing with a false positive?
It is probably just creating a potential race condition, and now that we have the right locking, we don't expect the test to fail or deadlock. So it shouldn't be flaky, I think.
@reductionista, flaky behavior will only occur if there is regression. 1000 attempts seem to be large enough. Before the fix, the test failed on each run.
* Fix RPCClient Deadlock on Unsubscribe and NewHead * changeset (cherry picked from commit 0294e1f)
Cherry-pick of smartcontractkit/chainlink#14236 --------- Co-authored-by: Dmytro Haidashenko <[email protected]>
Ticket
Concurrent calls to onNewHead and UnsubscribeAllExceptAliveLoop caused a deadlock on stateMu. As a result, any call to any configured RPC for that chain is blocked.

The PR addresses the issue by adding a separate lock that synchronizes ChainInfo-related operations independently of subscription management.
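
To illustrate the idea of the fix, here is a minimal sketch in Go. It is not the chainlink implementation: the `client` and `subscription` types, `chainInfoLock`, `push`, `highest`, and `run` are hypothetical names invented for this example. It also assumes, without confirmation from the PR text, that the deadlock arose because unsubscribing held `stateMu` while waiting for a head-forwarding goroutine that itself needed `stateMu` inside `onNewHead`. With chain info guarded by its own lock, that wait can always complete.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical, simplified stand-in for an RPC client.
type client struct {
	stateMu sync.RWMutex // guards subscription management only
	subs    []*subscription

	chainInfoLock sync.RWMutex // separate lock: guards chain info only
	highestBlock  int64
}

// subscription forwards heads to the client from its own goroutine.
type subscription struct {
	heads chan int64
	stop  chan struct{}
	done  chan struct{}
}

func (c *client) subscribe() *subscription {
	s := &subscription{
		heads: make(chan int64),
		stop:  make(chan struct{}),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(s.done)
		for {
			select {
			case h := <-s.heads:
				c.onNewHead(h)
			case <-s.stop:
				return
			}
		}
	}()
	c.stateMu.Lock()
	c.subs = append(c.subs, s)
	c.stateMu.Unlock()
	return s
}

// push delivers a head unless the subscription has been stopped.
func (s *subscription) push(h int64) {
	select {
	case s.heads <- h:
	case <-s.stop:
	}
}

// unsubscribe stops forwarding and waits for any in-flight onNewHead call.
func (s *subscription) unsubscribe() {
	close(s.stop)
	<-s.done
}

// onNewHead takes only chainInfoLock. If it took stateMu instead, it could
// block forever while unsubscribeAll holds stateMu and waits on s.done.
func (c *client) onNewHead(h int64) {
	c.chainInfoLock.Lock()
	defer c.chainInfoLock.Unlock()
	if h > c.highestBlock {
		c.highestBlock = h
	}
}

func (c *client) highest() int64 {
	c.chainInfoLock.RLock()
	defer c.chainInfoLock.RUnlock()
	return c.highestBlock
}

// unsubscribeAll holds stateMu while it waits for every forwarder to exit.
func (c *client) unsubscribeAll() {
	c.stateMu.Lock()
	defer c.stateMu.Unlock()
	for _, s := range c.subs {
		s.unsubscribe()
	}
	c.subs = nil
}

// run races push against unsubscribeAll many times, in the spirit of the test.
func run(attempts int) int64 {
	c := &client{}
	for i := 1; i <= attempts; i++ {
		s := c.subscribe()
		head := int64(i)
		var wg sync.WaitGroup
		wg.Add(2)
		go func() { defer wg.Done(); s.push(head) }()
		go func() { defer wg.Done(); c.unsubscribeAll() }()
		wg.Wait()
	}
	return c.highest()
}

func main() {
	// The exact value depends on scheduling; the point is that run returns.
	fmt.Println("highest head observed:", run(1000))
}
```

Repeating the race many times mirrors the role of `numberOfAttempts` in the test above: a single attempt may not hit the problematic interleaving, so a regression would show up as a hang rather than as a failed assertion.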