Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RPCClient Deadlock on Unsubscribe and NewHead #14236

Merged
merged 3 commits into from
Aug 26, 2024

Conversation

dhaidashenko
Copy link
Collaborator

Ticket

Concurrent calls to onNewHead and UnsubscribeAllExceptAliveLoop caused deadlock on stateMu. As a result, any call to any configured RPC for that chain is blocked.

PR addresses the issue by adding a separate lock to sync operations related to ChainInfo separately from subscription management.

@dhaidashenko dhaidashenko marked this pull request as ready for review August 26, 2024 17:49
@dhaidashenko dhaidashenko requested review from a team as code owners August 26, 2024 17:49
@dhaidashenko dhaidashenko requested review from Atrax1 and removed request for a team August 26, 2024 17:49
@DylanTinianov DylanTinianov self-requested a review August 26, 2024 18:23
Copy link
Contributor

@DylanTinianov DylanTinianov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link
Contributor

@amit-momin amit-momin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -130,6 +131,33 @@ func TestRPCClient_SubscribeNewHead(t *testing.T) {
assert.Equal(t, int64(0), highestUserObservations.FinalizedBlockNumber)
assert.Equal(t, (*big.Int)(nil), highestUserObservations.TotalDifficulty)
})
t.Run("Concurrent Unsubscribe and onNewHead calls do not lead to a deadlock", func(t *testing.T) {
const numberOfAttempts = 1000 // need a large number to increase the odds of reproducing the issue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this only reproduces the issue probabilistically, does that make the test flaky? Or is it just that there will be a low chance of it PASS'ing with a false positive?

Copy link
Contributor

@huangzhen1997 huangzhen1997 Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just creating a potential race condition, and now we have the right locking, we don't expect the test to fail/deadlock happens. So it shouldn't be flaky I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reductionista, flaky behavior will only occur if there is regression. 1000 attempts seem to be large enough. Before the fix, the test failed on each run.

@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Aug 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 26, 2024
@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Aug 26, 2024
Merged via the queue into develop with commit 0294e1f Aug 26, 2024
134 checks passed
@prashantkumar1982 prashantkumar1982 deleted the fix/BCI-4095-rpc-deadlock branch August 26, 2024 21:58
dhaidashenko added a commit that referenced this pull request Aug 27, 2024
* Fix RPCClient Deadlock on Unsubscribe and NewHead

* changeset

(cherry picked from commit 0294e1f)
dhaidashenko added a commit that referenced this pull request Aug 27, 2024
* Fix RPCClient Deadlock on Unsubscribe and NewHead

* changeset

(cherry picked from commit 0294e1f)
0xnogo added a commit to smartcontractkit/ccip that referenced this pull request Sep 11, 2024
0xnogo added a commit to smartcontractkit/ccip that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants