Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use reprocessing queue to send HTTP responses for duplicate blocks #4643

Closed

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #4473

Proposed Changes

In order to avoid returning HTTP 400s when valid duplicate blocks are posted to the API:

  • Split the BlockIsAlreadyKnown error into three variants for "valid", "invalid" and "processing". If a block is valid (i.e. in fork choice with valid status) then we can return a 200 straight away. Conversely if it's invalid (i.e. in fork choice with invalid status) we can return a 400 straight away.
  • Classify a block as "processing" if it is optimistically imported or if it is present in the gossip filter but not (yet) present in fork choice.
  • To handle "processing" blocks, use the reprocessing queue to wait for a block with the same block root to be imported. Send a 200 as soon as one is.
  • Timeout on the reprocess queue after 2 seconds, at which point the status will be checked once more in fork choice, before a 400 is returned.

Additional Info

There are still several ways in which this behaves sub-optimally

Sub-optimal case 1: block is invalid

We don't usually add invalid blocks to fork choice (only if they are imported optimistically then invalidated), and we also don't remember invalid blocks outside of the networking crate. Therefore if we receive an invalid block over HTTP which we've already processed and rejected, we will get a BlockAlreadyKnownProcessingOrInvalid from gossip verification. This will then be force us to wait the full 2 second timeout, at which point we'll return a 400. Other than the long wait, this is the correct behaviour (400 for an invalid block).

We initially talked about using the DuplicateCache from networking to guess whether a block is processing or invalid, but I haven't made this change because it would require making the duplicate cache accessible from the beacon_chain. This seems like a delicate change that might require more careful thought and justification than this PR (which is already a large code change just to fix a slightly annoying corner case).

Sub-optimal case 2: valid block race

  1. Check fork choice: block is not there.
  2. Check gossip cache: block is there (it is being processed).
  3. Block finishes processing.
  4. Wait 2 seconds for a notification that will never arrive.
  5. Timeout, re-check fork choice, see the valid status, and return 200 OK.

This is also the correct behaviour, just with a sub-optimal 2 second wait. In practice I think block processing is unlikely to finish before step (4) if it was started at a similar time to step (1), which is the case for block relays. This is because gossip validation is much faster than full block import. For home users connected to relays it will be more likely, but in this case there's no harm done, just a slower HTTP response to the VC.

Sub-optimal case 3: slow-to-process valid blocks

If a block takes longer than 2s to process, then we'll hit the timeout and return a 400. This is the same as our behaviour today but hopefully reduced in frequency enough to not be too annoying.

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress v4.4.1 ETA August 2023 labels Aug 21, 2023
@michaelsproul
Copy link
Member Author

Closing this as it's too complicated and buggy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v4.4.1 ETA August 2023 work-in-progress PR is a work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant