Use reprocessing queue to send HTTP responses for duplicate blocks #4643
Issue Addressed
Closes #4473
Proposed Changes
In order to avoid returning HTTP 400s when valid duplicate blocks are posted to the API, this PR splits the `BlockIsAlreadyKnown` error into three variants for "valid", "invalid" and "processing". If a block is valid (i.e. in fork choice with valid status) then we can return a 200 straight away. Conversely, if it's invalid (i.e. in fork choice with invalid status) we can return a 400 straight away. If it's still processing, the HTTP response is deferred via the reprocessing queue, subject to the 2 second timeout discussed below.

Additional Info
There are still several ways in which this behaves sub-optimally:
Sub-optimal case 1: block is invalid
We don't usually add invalid blocks to fork choice (only if they are imported optimistically and then invalidated), and we also don't remember invalid blocks outside of the networking crate. Therefore, if we receive an invalid block over HTTP which we've already processed and rejected, we will get a `BlockAlreadyKnownProcessingOrInvalid` error from gossip verification. This will then force us to wait the full 2 second timeout, at which point we'll return a 400. Other than the long wait, this is the correct behaviour (400 for an invalid block).

We initially talked about using the `DuplicateCache` from networking to guess whether a block is processing or invalid, but I haven't made this change because it would require making the duplicate cache accessible from the `beacon_chain` crate. This seems like a delicate change that might require more careful thought and justification than this PR (which is already a large code change just to fix a slightly annoying corner case).

Sub-optimal case 2: valid block race
This is also the correct behaviour, just with a sub-optimal 2 second wait. In practice I think block processing is unlikely to finish before step (4) if it was started at a similar time to step (1), which is the case for block relays. This is because gossip validation is much faster than full block import. For home users connected to relays it will be more likely, but in this case there's no harm done, just a slower HTTP response to the VC.
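The 2 second wait in these cases comes from parking the HTTP request until either the import result arrives or the timeout fires. Here is a minimal sketch of that fallback pattern, using a std channel in place of Lighthouse's actual async reprocessing queue; all names here are hypothetical:

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical import outcome delivered once processing finishes.
#[derive(Debug, PartialEq)]
enum ImportResult {
    Imported,
    Rejected,
}

/// Wait up to `timeout` for the in-flight import of a duplicate block to
/// finish. If the result arrives in time, map it to an HTTP status;
/// otherwise fall back to 400, mirroring the "hit the timeout and return
/// a 400" behaviour described above.
fn await_duplicate_block(rx: mpsc::Receiver<ImportResult>, timeout: Duration) -> u16 {
    match rx.recv_timeout(timeout) {
        Ok(ImportResult::Imported) => 200,
        Ok(ImportResult::Rejected) => 400,
        Err(_) => 400, // timed out (or sender dropped): treat as a failure
    }
}
```

In Lighthouse the wait is asynchronous on the beacon processor's reprocessing queue rather than a blocking channel read; this version only illustrates the timeout fallback.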
Sub-optimal case 3: slow-to-process valid blocks
If a block takes longer than 2s to process, then we'll hit the timeout and return a 400. This is the same as our behaviour today but hopefully reduced in frequency enough to not be too annoying.
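For reference, the three-way split described under Proposed Changes boils down to a decision like the following. This is only a sketch with made-up names, not the actual Lighthouse types:

```rust
/// Hypothetical three-way split of the old `BlockIsAlreadyKnown` error.
/// Variant names are illustrative; the real enum in Lighthouse may differ.
#[derive(Debug, PartialEq)]
enum DuplicateStatus {
    /// Already in fork choice with valid status: return 200 immediately.
    KnownValid,
    /// Already in fork choice with invalid status: return 400 immediately.
    KnownInvalid,
    /// Still being processed: no immediate answer, park the request on
    /// the reprocessing queue (up to the 2 second timeout).
    Processing,
}

/// Immediate HTTP status for a duplicate block, where `None` means
/// "defer the response via the reprocessing queue".
fn immediate_status(status: &DuplicateStatus) -> Option<u16> {
    match status {
        DuplicateStatus::KnownValid => Some(200),
        DuplicateStatus::KnownInvalid => Some(400),
        DuplicateStatus::Processing => None,
    }
}
```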