-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle host not available scenario during peer bootstrap more gracefully #1677
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1677 +/- ##
========================================
- Coverage 71.8% 71.8% -0.1%
========================================
Files 968 968
Lines 81169 81181 +12
========================================
+ Hits 58345 58351 +6
- Misses 18990 18992 +2
- Partials 3834 3838 +4
Continue to review full report at Codecov.
|
This will be a good improvement, unfortunate this scenario wasn't considered when implementing bootstrap consistency (flaw with my initial implementation). |
@@ -2833,13 +2846,6 @@ func (s *session) streamBlocksBatchFromPeer( | |||
result, attemptErr = client.FetchBlocksRaw(tctx, req) | |||
}) | |||
err := xerrors.FirstError(borrowErr, attemptErr) | |||
// Do not retry if cannot borrow the connection or | |||
// if the connection pool has no connections | |||
switch err { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted this since this will be non-retryable by default
c3f38c9
to
8533d75
Compare
@robskillington Yeah no worries, its a pretty difficult thing to consider. We only caught it because of the way Odin does node adds was triggering it in a reproducible way. Also I addressed all your feedback and added a regression test if you have a moment to review again |
src/dbnode/client/session.go
Outdated
// raw (retryable) hostNotAvailableError since the error is technically | ||
// retryable but was wrapped to prevent the exponential backoff in the | ||
// actual retrier. | ||
if isHostNotAvailableError(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to get back a bool and the inner error from this function call? can avoid the extra fn call that way and i prefer the encapsulation of the innerNonRetryableError stuff to be hidden behind the hostNotAvailableError utility method
src/dbnode/client/session.go
Outdated
gaugeReportInterval = 500 * time.Millisecond | ||
blockMetadataChBufSize = 4096 | ||
shardResultCapacity = 4096 | ||
streamBlocksMetadataFromPeerErrorSleepInterval = 1 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why so low, why not 10/100 ms?
src/dbnode/client/session.go
Outdated
@@ -2151,6 +2155,12 @@ func (s *session) streamBlocksMetadataFromPeers( | |||
atomic.AddInt32(&success, 1) | |||
return | |||
} | |||
|
|||
// Prevent the loop from spinning too aggressively if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about sleeping for a longer duration only if the specialised error condition is triggered
b31157f
to
7525d5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
Fixes #1667