-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
op-node: prevent spamming of reqs for blocks triggered by checkForGapInUnsafeQueue
#10063
Conversation
WalkthroughWalkthroughThe update enhances the synchronization process in the peer-to-peer network module. It introduces a new Changes
Recent Review DetailsConfiguration used: .coderabbit.yml Files selected for processing (3)
Files skipped from review as they are similar to previous changes (3)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (1)
- op-node/p2p/sync.go (10 hunks)
Additional comments not posted (5)
op-node/p2p/sync.go (5)
101-102
: The addition ofrangeReqId
topeerRequest
struct is a good approach for identifying batch requests.
216-219
: The introduction ofactiveRangeRequests
map and related channels is well-implemented for managing batch requests efficiently.
398-401
: Ensure the use of a secure random number generator forrandomReqId
.
540-544
: Efficiently handling canceled sync requests by checkingactiveRangeRequests
before proceeding is a smart optimization.
557-565
: Consider implementing a more nuanced error handling strategy for different types of request failures.Suggest exploring a strategy that differentiates between transient and permanent errors, potentially incorporating a retry mechanism for the former.
Semgrep found 7
Named return arguments to functions must be appended with an underscore ( Semgrep found 2 Prefer Semgrep found 7
Inputs to functions must be prepended with an underscore ( |
ad4e06f
to
5d67d73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR lgtm. @protolambda if you have chance I'd appreciate a quick look as weel.
Semgrep found 2 Named return arguments to functions must be appended with an underscore ( |
48962aa
to
17667af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good generally but I'm reasonably confident that we've introduced a race condition with the change from *atomic.Bool
to bool
because previously the map was only accessed by a single thread and the *atomic.Bool
shared, whereas now the map is shared. It is a nice simplification though - I wonder if it's reasonable to add a mutex to protect the map - just need to make sure we don't introduce any deadlocks but as long as the lock is tightly scoped I think it would be fine.
585c922
to
c2e27fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
74bd258
to
863744f
Compare
Rebased on top of |
checkForGapInUnsafeQueue
checkForGapInUnsafeQueue
Description
Based on this comment, we would like to bail on a batch of block requests when we receive a
block not found
error from one of the blocks in the range. That will prevent the op-node from spamming its peers with block requests in the event that the sequencer is offline and unsafe blocks cannot be found, meaning the requests are destined-to-fail. This PR adds arangeRequestId
field to thepeerRequest
struct so that when we detect the aforementioned error, we can set a flag mapped to thatrangeRequestId
, which allows us to drop all other requests with that same id instead of continuing to send those requests to peers.Tests
The existing
TestMultiPeerSync
covers the updated code, as evidenced by the logs generated by that test. I added some checks within that same test to verify that arangeReq
was properly cancelled.Additional context
I considered an alternative approach that attached a shared
context
andcancel
function to thepeerRequest
struct instead of therangeRequestId
. However, the use ofrangeRequestId
seemed more straightforward and less prone to errors such as accidentally canceling block requests if the parent context is cancelled.In the linked issue and associated comment thread, it was also suggested that we consider backing off the
altSync
timer when ablock not found
error is produced. In the worst case scenario, with this updated code, we would send a single destined-to-fail request every (2 * blockTime) seconds to each peer before the entire batch of requests is cancelled. This doesn't seem like a very heavy load to put on any given peer so backoff logic is not currently included in this PR.Metadata
Summary by CodeRabbit
Summary by CodeRabbit