Use reprocessing queue to send HTTP responses for duplicate blocks #4643

michaelsproul · 2023-08-21T07:40:57Z

Issue Addressed

Closes #4473

Proposed Changes

In order to avoid returning HTTP 400s when valid duplicate blocks are posted to the API:

Split the BlockIsAlreadyKnown error into three variants for "valid", "invalid" and "processing". If a block is valid (i.e. in fork choice with valid status) then we can return a 200 straight away. Conversely if it's invalid (i.e. in fork choice with invalid status) we can return a 400 straight away.
Classify a block as "processing" if it is optimistically imported or if it is present in the gossip filter but not (yet) present in fork choice.
To handle "processing" blocks, use the reprocessing queue to wait for a block with the same block root to be imported. Send a 200 as soon as one is.
Timeout on the reprocess queue after 2 seconds, at which point the status will be checked once more in fork choice, before a 400 is returned.

Additional Info

There are still several ways in which this behaves sub-optimally

Sub-optimal case 1: block is invalid

We don't usually add invalid blocks to fork choice (only if they are imported optimistically then invalidated), and we also don't remember invalid blocks outside of the networking crate. Therefore if we receive an invalid block over HTTP which we've already processed and rejected, we will get a BlockAlreadyKnownProcessingOrInvalid from gossip verification. This will then be force us to wait the full 2 second timeout, at which point we'll return a 400. Other than the long wait, this is the correct behaviour (400 for an invalid block).

We initially talked about using the DuplicateCache from networking to guess whether a block is processing or invalid, but I haven't made this change because it would require making the duplicate cache accessible from the beacon_chain. This seems like a delicate change that might require more careful thought and justification than this PR (which is already a large code change just to fix a slightly annoying corner case).

Sub-optimal case 2: valid block race

Check fork choice: block is not there.
Check gossip cache: block is there (it is being processed).
Block finishes processing.
Wait 2 seconds for a notification that will never arrive.
Timeout, re-check fork choice, see the valid status, and return 200 OK.

This is also the correct behaviour, just with a sub-optimal 2 second wait. In practice I think block processing is unlikely to finish before step (4) if it was started at a similar time to step (1), which is the case for block relays. This is because gossip validation is much faster than full block import. For home users connected to relays it will be more likely, but in this case there's no harm done, just a slower HTTP response to the VC.

Sub-optimal case 3: slow-to-process valid blocks

If a block takes longer than 2s to process, then we'll hit the timeout and return a 400. This is the same as our behaviour today but hopefully reduced in frequency enough to not be too annoying.

michaelsproul · 2023-08-22T05:10:54Z

Closing this as it's too complicated and buggy

michaelsproul added 2 commits August 21, 2023 16:46

Use reprocessing queue to check duplicate blocks

425b0b6

Handle valid block race

724ff53

michaelsproul added work-in-progress PR is a work-in-progress v4.4.1 ETA August 2023 labels Aug 21, 2023

michaelsproul closed this Aug 22, 2023

michaelsproul mentioned this pull request Aug 22, 2023

Parallel gossip verification and block unblinding #4647

Closed

michaelsproul mentioned this pull request Sep 5, 2023

Handle unknown head block during attestation publishing #4699

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use reprocessing queue to send HTTP responses for duplicate blocks #4643

Use reprocessing queue to send HTTP responses for duplicate blocks #4643

michaelsproul commented Aug 21, 2023

michaelsproul commented Aug 22, 2023

Use reprocessing queue to send HTTP responses for duplicate blocks #4643

Use reprocessing queue to send HTTP responses for duplicate blocks #4643

Conversation

michaelsproul commented Aug 21, 2023

Issue Addressed

Proposed Changes

Additional Info

Sub-optimal case 1: block is invalid

Sub-optimal case 2: valid block race

Sub-optimal case 3: slow-to-process valid blocks

michaelsproul commented Aug 22, 2023