Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to continue lookups after adding peers #5993

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

dapplion
Copy link
Collaborator

Issue Addressed

Testing v5.2.1 a node hit a corner case of lookup sync causing a lookup to get stuck:

The lookup was originally created from an unknown parent block event with a block as child components. Therefore the original state of each request was:

  • block: AwaitingProcess
  • blobs: AwaitingDownload

When the parent imported successfully, it got called lookup continue_request. For each request:

  • block: request is in AwaitingProcess so the block was sent to processing
  • blobs: request is AwaitingDownload but the lookup has 0 peers so it doesn't make progress

When the block completes processing and it gets the MissingComponents, continue_request is called again, but the lookup still has 0 peers. So no progress is made.

Then immediately after that attempt at continue_request the lookup gets peers. When the drop_lookups_without_peers function runs, the lookup has peers but nothing is trying to make progress on it.

It appears the comment below is too optmistic. Yes this situation is a rare case but it results in lookups getting stuck. We should attempt to make progress on lookups when adding peers

// We may choose to attempt to continue a lookup here. It is possible that a lookup had zero
// peers and after adding this set of peers it can make progress again. Note that this
// recursive function iterates from child to parent, so continuing the child first is weird.
// However, we choose to not attempt to continue the lookup for simplicity. It's not
// strictly required and just and optimization for a rare corner case.

Proposed Changes

Attempt to make progress on existing lookups that are not awaiting parent and got new peers

@dapplion dapplion added ready-for-review The code is ready for review v5.2.1 Patch release for v5.2.0 labels Jun 25, 2024
Copy link
Member

@realbigsean realbigsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too!
It's indeed a rare case (wasn't easy for me to reproduce) and seem to happen on only few overwhelmed node mostly.

@jimmygchen
Copy link
Member

I'm merging this into the testing branch. Please do not merge to release branch yet.

jimmygchen added a commit that referenced this pull request Jun 26, 2024
Squashed commit of the following:

commit 6d3d06b
Author: dapplion <[email protected]>
Date:   Tue Jun 25 16:50:26 2024 +0200

    Attempt to continue lookups after adding peers
@jimmygchen jimmygchen removed the ready-for-review The code is ready for review label Jun 26, 2024
@jimmygchen
Copy link
Member

@mergify queue

Copy link

mergify bot commented Jun 26, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at b38019c

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed do-not-merge labels Jun 26, 2024
mergify bot added a commit that referenced this pull request Jun 26, 2024
@mergify mergify bot merged commit b38019c into sigp:release-v5.2.1 Jun 26, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v5.2.1 Patch release for v5.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants