Retry.WithBackoff for all agents. #1377
Conversation
Walkthrough
This update introduces a retry mechanism to enhance the resilience of contract and RPC calls across various components. It also includes minor changes to error handling, logging, and debugging. The retry logic is implemented using the new `github.com/synapsecns/sanguine/core/retry` package.
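For context on the pattern referenced throughout the review below, here is a minimal, self-contained sketch of how these calls are wrapped. It assumes the `retry.WithBackoff` and `retry.WithMaxAttemptTime` signatures exactly as they appear in the diffs in this thread; the `getChainID` callback is a hypothetical stand-in for any contract or RPC call.

package example

import (
	"context"
	"fmt"
	"math/big"
	"time"

	"github.com/synapsecns/sanguine/core/retry"
)

// fetchChainID wraps a single fallible call in the retry pattern used throughout this PR:
// capture the call in a closure, then hand it to retry.WithBackoff with a bounded attempt window.
// getChainID is a hypothetical stand-in for any contract or RPC call.
func fetchChainID(ctx context.Context, getChainID func(context.Context) (*big.Int, error)) (*big.Int, error) {
	var chainID *big.Int

	call := func(ctx context.Context) error {
		var err error
		chainID, err = getChainID(ctx)
		if err != nil {
			return fmt.Errorf("could not get chain ID: %w", err)
		}
		return nil
	}

	// The agents build this slice once from cfg.MaxRetrySeconds; 30s mirrors their default.
	retryConfig := []retry.WithBackoffConfigurator{
		retry.WithMaxAttemptTime(30 * time.Second),
	}

	if err := retry.WithBackoff(ctx, call, retryConfig...); err != nil {
		return nil, fmt.Errorf("could not get chain ID after retries: %w", err)
	}

	return chainID, nil
}

The closure captures the result variable, so the caller receives the value from whichever attempt finally succeeded.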
Review Status
Actionable comments generated: 2
Files selected for processing (4)
- agents/agents/executor/executor.go (9 hunks)
- agents/agents/executor/executor_utils.go (3 hunks)
- agents/agents/guard/fraud.go (4 hunks)
- agents/config/executor/config.go (2 hunks)
Files skipped from review due to trivial changes (1)
- agents/agents/guard/fraud.go
Additional comments (Suppressed): 12
agents/config/executor/config.go (2)
3-15: The import order has been changed. Ensure that this change is consistent with the project's import ordering conventions.
45-50: A new field `MaxRetrySeconds` has been added to the `Config` struct. This field represents the maximum number of seconds to retry an RPC call (not a transaction). Make sure that all instances where the `Config` struct is used have been updated to include this new field, and that it is properly initialized where necessary.

agents/agents/executor/executor.go (9)
11-16: The import of `github.com/jpillora/backoff` has been removed and replaced with `github.com/synapsecns/sanguine/core/retry`. Ensure that all instances of the old backoff mechanism have been replaced with the new retry mechanism.
23-29: The import of `github.com/synapsecns/sanguine/core/retry` has been added. This is consistent with the PR summary which mentions the introduction of a new retry mechanism.
87-94: A new field `retryConfig` of type `[]retry.WithBackoffConfigurator` has been added to the `Executor` struct. This is consistent with the PR summary which mentions adding a new field for the retry configuration.
150-159: The `retryConfig` is being initialized based on the `MaxRetrySeconds` value from the `executor.Config`. This is consistent with the PR summary which mentions initializing the retry configuration in the `NewExecutor` function.
245-251: The `retryConfig` is being passed to the `Executor` struct during its initialization. This is consistent with the changes made in lines 150-159 where the `retryConfig` was initialized.
347-362: The `retry.WithBackoff` function is being used to retry the contract call if it fails. This is consistent with the PR summary which mentions using the retry mechanism for certain contract calls.
440-470: The `retry.WithBackoff` function is being used to retry the contract call if it fails. This is consistent with the PR summary which mentions using the retry mechanism for certain contract calls.
557-581: The `retry.WithBackoff` function is being used to retry the chain call if it fails. This is consistent with the PR summary which mentions using the retry mechanism for certain contract calls.
637-662: The `retry.WithBackoff` function is being used to retry the contract call if it fails. This is consistent with the PR summary which mentions using the retry mechanism for certain contract calls.

agents/agents/executor/executor_utils.go (1)
- 4-16: The import of the `github.com/synapsecns/sanguine/core/retry` package is a good replacement for the previously used `github.com/jpillora/backoff` package. The new retry mechanism seems to be more flexible and configurable, which should improve error handling in network-related operations.
return fmt.Errorf("could not encode state: %w", err) | ||
} | ||
// Verify that the state is valid w.r.t. Origin. | ||
valid, err := e.chainExecutors[state.Origin()].boundOrigin.IsValidState( | ||
ctx, | ||
statePayload, | ||
) | ||
var valid bool | ||
contractCall := func(ctx context.Context) error { | ||
valid, err = e.chainExecutors[state.Origin()].boundOrigin.IsValidState( | ||
ctx, | ||
statePayload, | ||
) | ||
if err != nil { | ||
return fmt.Errorf("could not check validity of state: %w", err) | ||
} | ||
|
||
return nil | ||
} | ||
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...) | ||
if err != nil { | ||
return fmt.Errorf("could not check validity of state: %w", err) | ||
} |
The use of the `retry.WithBackoff` function here is a good improvement over the previous implementation. It provides a cleaner way to handle retries on the contract call. However, ensure that the context passed to the `retry.WithBackoff` function has a timeout set to prevent indefinite retries in case of persistent errors.
- err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
+ ctxWithTimeout, cancel := context.WithTimeout(ctx, time.Duration(e.MaxRetrySeconds)*time.Second)
+ defer cancel()
+ err = retry.WithBackoff(ctxWithTimeout, contractCall, e.retryConfig...)
}

// If the attestation is on a remote chain, we need to fetch the timestamp via an RPC call.
b := &backoff.Backoff{
Factor: 2,
Jitter: true,
Min: 30 * time.Millisecond,
Max: 3 * time.Second,
}

timeout := time.Duration(0)

var logHeader *ethTypes.Header
var err error
contractCall := func(ctx context.Context) error {
logHeader, err = e.chainExecutors[chainID].rpcClient.HeaderByNumber(ctx, big.NewInt(int64(logBlockNumber)))
if err != nil {
return fmt.Errorf("could not get log header: %w", err)
}

retryLoop:
for {
select {
case <-ctx.Done():
return fmt.Errorf("context canceled: %w", ctx.Err())
case <-time.After(timeout):
if b.Attempt() >= rpcRetry {
return fmt.Errorf("could not get log header: %w", err)
}
logHeader, err = e.chainExecutors[chainID].rpcClient.HeaderByNumber(ctx, big.NewInt(int64(logBlockNumber)))
if err != nil {
timeout = b.Duration()

continue
}
return nil
}
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
if err != nil {
return fmt.Errorf("could not get log header: %w", err)
}
Again, the use of `retry.WithBackoff` is a good improvement. But similar to the previous comment, ensure that the context passed to the `retry.WithBackoff` function has a timeout set to prevent indefinite retries in case of persistent errors.
- err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
+ ctxWithTimeout, cancel := context.WithTimeout(ctx, time.Duration(e.MaxRetrySeconds)*time.Second)
+ defer cancel()
+ err = retry.WithBackoff(ctxWithTimeout, contractCall, e.retryConfig...)
Review Status
Actionable comments generated: 4
Files ignored due to filter (1)
- agents/go.mod
Files selected for processing (1)
- agents/agents/executor/executor.go (9 hunks)
Additional comments (Suppressed): 5
agents/agents/executor/executor.go (5)
11-16: The import statement for the `backoff` package has been removed and no new issues are introduced.
23-29: The `retry` package is imported to replace the previous backoff mechanism. Ensure that all dependencies are correctly managed and updated.
87-94: A new field `retryConfig` of type `[]retry.WithBackoffConfigurator` is added to the `Executor` struct. This change seems fine as long as it's properly initialized before use.
150-163: The `MaxRetrySeconds` configuration value is checked and set to a default value if not provided. Then, it's used to create a `retryConfig`. This is a good practice to ensure that the retry mechanism works even when the configuration is not explicitly provided.
249-254: The `retryConfig` is passed to the `Executor` struct during initialization. This is a correct way to pass configurations to structs. (A compact sketch of this wiring follows below.)
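To make the wiring in these comments concrete, here is a rough sketch of the described flow, using trimmed stand-in `Config` and `Executor` types rather than the real definitions from `agents/agents/executor/executor.go`; only the field name, the 30-second default, and the two `retry` helpers are taken from this review.

package example

import (
	"time"

	"github.com/synapsecns/sanguine/core/retry"
)

// Config is a trimmed stand-in for the executor config; only the retry-related field is shown.
type Config struct {
	MaxRetrySeconds int
}

// Executor is a trimmed stand-in that carries the configurators reused by every wrapped call.
type Executor struct {
	retryConfig []retry.WithBackoffConfigurator
}

// NewExecutor sketches the initialization described above: default MaxRetrySeconds when unset,
// then build the backoff configurators once and store them on the struct.
func NewExecutor(cfg Config) *Executor {
	if cfg.MaxRetrySeconds == 0 {
		cfg.MaxRetrySeconds = 30 // default noted in the review comments
	}

	return &Executor{
		retryConfig: []retry.WithBackoffConfigurator{
			retry.WithMaxAttemptTime(time.Second * time.Duration(cfg.MaxRetrySeconds)),
		},
	}
}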
return false, nil
}

proof, err := e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(*nonce-1, (*state).Nonce())
var proof [][]byte
contractCall := func(ctx context.Context) error {
proof, err = e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(*nonce-1, (*state).Nonce())
if err != nil {
return fmt.Errorf("could not get merkle proof: %w", err)
}

return nil
}
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get merkle proof: %w", err)
}
The `retry.WithBackoff` function is used to retry the `contractCall` function in case of failure. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
// verifyMessageMerkleProof verifies a message against the merkle tree at the state of the given nonce.
func (e Executor) verifyMessageMerkleProof(message types.Message) (bool, error) {
root, err := e.chainExecutors[message.OriginDomain()].merkleTree.Root(message.Nonce())
var root []byte
contractCall := func(ctx context.Context) error {
var err error
root, err = e.chainExecutors[message.OriginDomain()].merkleTree.Root(message.Nonce())
if err != nil {
return fmt.Errorf("could not get root: %w", err)
}

return nil
}
err := retry.WithBackoff(context.Background(), contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get root: %w", err)
}

proof, err := e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(message.Nonce()-1, message.Nonce())
var proof [][]byte
contractCall = func(ctx context.Context) error {
proof, err = e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(message.Nonce()-1, message.Nonce())
if err != nil {
return fmt.Errorf("could not get merkle proof: %w", err)
}

return nil
}
err = retry.WithBackoff(context.Background(), contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get merkle proof: %w", err)
}
The `retry.WithBackoff` function is used twice to retry fetching the root and merkle proof. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
return nil, nil
}

b := &backoff.Backoff{
Factor: 2,
Jitter: true,
Min: 30 * time.Millisecond,
Max: 3 * time.Second,
}

timeout := time.Duration(0)

var currentTime uint64
chainCall := func(ctx context.Context) error {
var err error
latestHeader, err := e.chainExecutors[chainID].rpcClient.HeaderByNumber(ctx, nil)
if err != nil {
return fmt.Errorf("could not get latest header: %w", err)
}

retryLoop:
for {
select {
case <-ctx.Done():
return nil, fmt.Errorf("context canceled: %w", ctx.Err())
case <-time.After(timeout):
if b.Attempt() >= rpcRetry {
return nil, fmt.Errorf("could not get latest header: %w", err)
}

latestHeader, err := e.chainExecutors[destinationDomain].rpcClient.HeaderByNumber(ctx, nil)
if err != nil {
timeout = b.Duration()

continue
}
if latestHeader == nil {
return fmt.Errorf("latest header is nil")
}

currentTime = latestHeader.Time
currentTime = latestHeader.Time

break retryLoop
}
return nil
}
err = retry.WithBackoff(ctx, chainCall, e.retryConfig...)
if err != nil {
return nil, fmt.Errorf("could not get latest header: %w", err)
}

if *messageMinimumTime > currentTime {
The `retry.WithBackoff` function is used to retry fetching the latest header. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
The e.retryConfig... does that. If the user does not pass in a set time, we automatically set it to 30 seconds.
@CryptoMaxPlanck That's a good approach. Having a default timeout value ensures that the retry mechanism doesn't run indefinitely in case of permanent failures, which could lead to resource exhaustion. It also provides flexibility for users to configure the timeout based on their specific needs. Just make sure to document this behavior clearly so users are aware of the default timeout and how they can override it.
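As a concrete illustration of the behavior discussed above (and worth documenting, as noted), the hedged sketch below shows how the configurator built from `MaxRetrySeconds` bounds the retry window even when the caller's context has no deadline. The always-failing call is a stand-in, and only the two `retry` helpers are taken from the diffs in this PR.

package example

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/synapsecns/sanguine/core/retry"
)

// callWithDefaultCap shows how the per-agent retryConfig keeps retries bounded: once the
// configured attempt window (30s by default) is exhausted, WithBackoff returns the last
// error instead of retrying indefinitely.
func callWithDefaultCap(ctx context.Context, maxRetrySeconds int) error {
	if maxRetrySeconds == 0 {
		maxRetrySeconds = 30 // default applied by the agents when the config omits it
	}

	retryConfig := []retry.WithBackoffConfigurator{
		retry.WithMaxAttemptTime(time.Second * time.Duration(maxRetrySeconds)),
	}

	// Stand-in for a contract/RPC call against an unreachable endpoint.
	alwaysFails := func(ctx context.Context) error {
		return errors.New("rpc endpoint unavailable")
	}

	if err := retry.WithBackoff(ctx, alwaysFails, retryConfig...); err != nil {
		return fmt.Errorf("gave up after roughly %ds of retries: %w", maxRetrySeconds, err)
	}

	return nil
}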
metrics.EndSpanWithErr(span, err)
}()

b := &backoff.Backoff{
Factor: 2,
Jitter: true,
Min: 30 * time.Millisecond,
Max: 3 * time.Second,
}

timeout := time.Duration(0)

for {
select {
case <-ctx.Done():
return false, fmt.Errorf("context canceled: %w", ctx.Err())
case <-time.After(timeout):
if b.Attempt() >= rpcRetry {
return false, fmt.Errorf("could not get executed status: %w", ctx.Err())
}

executed, err := e.chainExecutors[message.DestinationDomain()].boundDestination.MessageStatus(ctx, message)
if err != nil {
timeout = b.Duration()
span.AddEvent("could not get executed status",
trace.WithAttributes(attribute.String("error", err.Error())),
trace.WithAttributes(attribute.String("timeout", timeout.String())),
)
continue
}
var executed uint8
contractCall := func(ctx context.Context) error {
var err error
executed, err = e.chainExecutors[message.DestinationDomain()].boundDestination.MessageStatus(ctx, message)
if err != nil {
return fmt.Errorf("could not get executed status: %w", err)
}

if execTypes.MessageStatusType(executed) == execTypes.Success {
span.AddEvent("message executed")
return true, nil
}
return nil
}
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get executed status: %w", err)
}

span.AddEvent("message not executed")
return false, nil
}
if execTypes.MessageStatusType(executed) == execTypes.Success {
span.AddEvent("message executed")
return true, nil
}

span.AddEvent("message not executed")
return false, nil
}
The `retry.WithBackoff` function is used to retry fetching the executed status. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## master #1377 +/- ##
===================================================
- Coverage 50.89321% 49.98967% -0.90355%
===================================================
Files 353 343 -10
Lines 24966 24193 -773
Branches 277 277
===================================================
- Hits 12706 12094 -612
+ Misses 10989 10912 -77
+ Partials 1271 1187 -84
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry.
Review Status
Actionable comments generated: 4
Files selected for processing (2)
- agents/agents/executor/executor.go (10 hunks)
- agents/agents/executor/executor_utils.go (3 hunks)
Additional comments (Suppressed): 6
agents/agents/executor/executor_utils.go (3)
4-16: The `github.com/jpillora/backoff` package has been replaced with the `github.com/synapsecns/sanguine/core/retry` package. Ensure that this new package provides the same or better functionality and is compatible with the rest of the codebase.
153-178: The function `processSnapshot` now uses a retry mechanism with backoff for the contract call to check the validity of the state. This change enhances the reliability of the code by allowing it to handle temporary issues such as network instability or rate limiting from the Ethereum node.
209-225: The function `processAttestation` now uses a retry mechanism with backoff for the RPC call to fetch the log header. This change enhances the reliability of the code by allowing it to handle temporary issues such as network instability or rate limiting from the Ethereum node.

agents/agents/executor/executor.go (3)
- 23-29: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [16-27]

The import of the `github.com/synapsecns/sanguine/core/retry` package is new and replaces the previous `github.com/jpillora/backoff` package. Ensure that the new retry package provides all the necessary functionality and is compatible with the rest of the codebase.

91-94: A new field `retryConfig` of type `[]retry.WithBackoffConfigurator` has been added to the `Executor` struct. This field is used to store the configuration for retrying RPC calls with backoff.
152-158: The `MaxRetrySeconds` configuration value is checked and, if it's zero, it's set to a default value of 30 seconds. Then, this value is used to create a `retryConfig`, which is an array of `retry.WithBackoffConfigurator`. The `retryConfig` is used later in the code to configure the retry mechanism for RPC calls. This is a good practice as it allows for flexible configuration of the retry mechanism.
contractCall := func(ctx context.Context) error {
proof, err = e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(*nonce-1, (*state).Nonce())
if err != nil {
return fmt.Errorf("could not get merkle proof: %w", err)
}

return nil
}
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
if err != nil {
The `retry.WithBackoff` function is used to retry the contract call in case of temporary failures. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
var root []byte
contractCall := func(ctx context.Context) error {
var err error
root, err = e.chainExecutors[message.OriginDomain()].merkleTree.Root(message.Nonce())
if err != nil {
return fmt.Errorf("could not get root: %w", err)
}

return nil
}
err := retry.WithBackoff(context.Background(), contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get root: %w", err)
}

proof, err := e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(message.Nonce()-1, message.Nonce())
var proof [][]byte
contractCall = func(ctx context.Context) error {
proof, err = e.chainExecutors[message.OriginDomain()].merkleTree.MerkleProof(message.Nonce()-1, message.Nonce())
if err != nil {
return fmt.Errorf("could not get merkle proof: %w", err)
}

return nil
}
err = retry.WithBackoff(context.Background(), contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get merkle proof: %w", err)
}
The `retry.WithBackoff` function is used twice here to retry two different contract calls. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `context.Background()` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
chainCall := func(ctx context.Context) error {
var err error
latestHeader, err := e.chainExecutors[chainID].rpcClient.HeaderByNumber(ctx, nil)
if err != nil {
return fmt.Errorf("could not get latest header: %w", err)
}

retryLoop:
for {
select {
case <-ctx.Done():
return nil, fmt.Errorf("context canceled: %w", ctx.Err())
case <-time.After(timeout):
if b.Attempt() >= rpcRetry {
return nil, fmt.Errorf("could not get latest header: %w", err)
}

latestHeader, err := e.chainExecutors[destinationDomain].rpcClient.HeaderByNumber(ctx, nil)
if err != nil {
timeout = b.Duration()

continue
}
if latestHeader == nil {
return fmt.Errorf("latest header is nil")
}

currentTime = latestHeader.Time
currentTime = latestHeader.Time

break retryLoop
}
return nil
}
err = retry.WithBackoff(ctx, chainCall, e.retryConfig...)
if err != nil {
return nil, fmt.Errorf("could not get latest header: %w", err)
}
The `retry.WithBackoff` function is used to retry fetching the latest header. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
contractCall := func(ctx context.Context) error {
var err error
executed, err = e.chainExecutors[message.DestinationDomain()].boundDestination.MessageStatus(ctx, message)
if err != nil {
return fmt.Errorf("could not get executed status: %w", err)
}

if execTypes.MessageStatusType(executed) == execTypes.Success {
span.AddEvent("message executed")
return true, nil
}
return nil
}
err = retry.WithBackoff(ctx, contractCall, e.retryConfig...)
if err != nil {
return false, fmt.Errorf("could not get executed status: %w", err)
}
The `retry.WithBackoff` function is used to retry getting the executed status. This is a good practice to handle temporary network issues or service unavailability. However, make sure that the context `ctx` passed to `retry.WithBackoff` has a timeout or deadline to prevent infinite retries in case of permanent failures.
Review Status
Actionable comments generated: 3
Files selected for processing (2)
- agents/agents/guard/guard.go (3 hunks)
- agents/agents/notary/notary.go (11 hunks)
Additional comments (Suppressed): 9
agents/agents/guard/guard.go (1)
- 287-298: The new hunk introduces a retry mechanism for the contract call in the `loadSummitLatestStates` function. This is a good improvement as it makes the function more robust against transient errors. However, the error handling has been changed. In the old hunk, if an error occurred during the contract call, the error was logged and the function continued with the next domain. In the new hunk, if an error occurs, the function immediately returns the error and stops processing the remaining domains. This change could potentially cause the function to fail prematurely if there is a problem with one of the domains. Please verify if this change in behavior is intended.

agents/agents/notary/notary.go (8)
- 7-28: > Note: This review was outside of the patch, but no patch was found that overlapped with it. Original lines [28-27]

The import of the `github.com/synapsecns/sanguine/core/retry` package is new. This package provides functionality for retrying operations with backoff.

- 45-45: A new field `retryConfig` of type `[]retry.WithBackoffConfigurator` has been added to the `Notary` struct. This field will be used to configure the retry mechanism.

- 94-100: The `MaxRetrySeconds` configuration value is being checked. If it's zero, a default value of 30 seconds is set. Then, this value is used to initialize the `retryConfig` field in the `Notary` struct.

+ if cfg.MaxRetrySeconds == 0 {
+ cfg.MaxRetrySeconds = 30
+ }
+
+ notary.retryConfig = []retry.WithBackoffConfigurator{
+ retry.WithMaxAttemptTime(time.Second * time.Duration(cfg.MaxRetrySeconds)),
+ }

- 178-196: The `GetLatestNotaryAttestation` contract call has been wrapped in a retry mechanism. If the call fails, it will be retried according to the `retryConfig`.

+ var latestNotaryAttestation types.NotaryAttestation
+ contractCall := func(ctx context.Context) (err error) {
+ latestNotaryAttestation, err = n.summitDomain.Summit().GetLatestNotaryAttestation(ctx, n.bondedSigner)
+ if err != nil {
+ return fmt.Errorf("could not get latest notary attestation: %w", err)
+ }
+
+ return nil
+ }
+ err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)

- 208-243: The `GetAgentRoot` contract calls for both the bonding manager and destination light manager have been wrapped in a retry mechanism. If any of these calls fail, they will be retried according to the `retryConfig`.

+ var bondingManagerAgentRoot [32]byte
+ contractCall := func(ctx context.Context) (err error) {
+ bondingManagerAgentRoot, err = n.summitDomain.BondingManager().GetAgentRoot(ctx)
+ if err != nil {
+ return fmt.Errorf("could not get agent root: %w", err)
+ }
+
+ return nil
+ }
+ err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)
+
+ var destinationLightManagerAgentRoot [32]byte
+ contractCall = func(ctx context.Context) (err error) {
+ destinationLightManagerAgentRoot, err = n.destinationDomain.LightManager().GetAgentRoot(ctx)
+ if err != nil {
+ return fmt.Errorf("could not get agent root: %w", err)
+ }
+
+ return nil
+ }
+ err = retry.WithBackoff(ctx, contractCall, n.retryConfig...)

- 249-265: The `GetAgentStatus` contract call has been wrapped in a retry mechanism. If the call fails, it will be retried according to the `retryConfig`.

+ var agentStatus types.AgentStatus
+ contractCall = func(ctx context.Context) (err error) {
+ agentStatus, err = n.destinationDomain.LightManager().GetAgentStatus(ctx, n.bondedSigner.Address())
+ if err != nil {
+ return fmt.Errorf("could not get agent status: %w", err)
+ }
+
+ return nil
+ }
+ err = retry.WithBackoff(ctx, contractCall, n.retryConfig...)

- 292-310: The `GetAttestationNonce` contract call has been wrapped in a retry mechanism. If the call fails, it will be retried according to the `retryConfig`.

+ var attNonce uint32
+ contractCall := func(ctx context.Context) (err error) {
+ attNonce, err = n.destinationDomain.Destination().GetAttestationNonce(ctx, n.myLatestNotaryAttestation.Attestation().SnapshotRoot())
+ if err != nil {
+ return fmt.Errorf("could not get attestation nonce: %w", err)
+ }
+
+ return nil
+ }
+ err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)

- 501-535: The `GetProof` and `GetAgentStatus` contract calls on the bonding manager have been wrapped in a retry mechanism. If any of these calls fail, they will be retried according to the `retryConfig`.

+ var agentProof [][32]byte
+ contractCall := func(ctx context.Context) (err error) {
+ agentProof, err = n.summitDomain.BondingManager().Get
agents/agents/guard/guard.go
Outdated
if cfg.MaxRetrySeconds == 0 {
cfg.MaxRetrySeconds = 30
}

guard.retryConfig = []retry.WithBackoffConfigurator{
retry.WithMaxAttemptTime(time.Second * time.Duration(cfg.MaxRetrySeconds)),
}
The new hunk introduces a default value for `MaxRetrySeconds` if it is not set in the configuration. This is a good practice to ensure that the retry mechanism works even when the configuration does not explicitly specify a value for `MaxRetrySeconds`. However, it would be better to move this logic to where the configuration is loaded or validated, rather than having it in the `NewGuard` function.
- if cfg.MaxRetrySeconds == 0 {
- cfg.MaxRetrySeconds = 30
- }
agents/agents/guard/guard.go
Outdated
// var latestState types.State
// var err error
// contractCall := func(ctx context.Context) error {
// latestState, err = domain.Origin().SuggestLatestState(ctx)
// if err != nil {
// latestState = nil
// return fmt.Errorf("failed calling SuggestLatestState for originID %d on the Origin contract: %w", originID, err)
// }
//
// return nil
// }
// err = retry.WithBackoff(ctx, contractCall, g.retryConfig...)
// if latestState == nil || latestState.Nonce() == uint32(0) {
// logger.Errorf("No latest state found for origin id %d", originID)
// span.AddEvent("No latest state found for origin id", trace.WithAttributes(
// attribute.Int("originID", int(originID)),
// ))
// }
//
// if err == nil {
// g.originLatestStates[originID] = latestState
// }

latestState, err := domain.Origin().SuggestLatestState(ctx)
if err != nil {
latestState = nil
The commented-out code in this hunk suggests that a similar retry mechanism was planned for the `loadOriginLatestStates` function but was not implemented. If the retry mechanism is beneficial for the `loadSummitLatestStates` function, it might also be beneficial here. Consider implementing it.
agents/agents/notary/notary.go
Outdated
originID := domain.Config().DomainID

// var guardLatestState types.State
// contractCall := func(ctx context.Context) (err error) {
// guardLatestState, err = n.summitDomain.Summit().GetLatestState(ctx, originID)
// if err != nil {
// return fmt.Errorf("could not get latest state: %w", err)
// }
//
// return nil
// }
// err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)
// if err == nil && guardLatestState.Nonce() > uint32(0) {
// n.summitGuardLatestStates[originID] = guardLatestState
// }

guardLatestState, err := n.summitDomain.Summit().GetLatestState(ctx, originID)
if err != nil {
guardLatestState = nil
The code for getting the latest state from the Summit has been commented out and replaced with the old code that doesn't use the retry mechanism. This seems like an oversight or unfinished work.
Review Status
Actionable comments generated: 6
Files selected for processing (3)
- agents/agents/guard/guard.go (3 hunks)
- agents/agents/notary/notary.go (11 hunks)
- session.vim (1 hunks)
Files skipped from review due to trivial changes (1)
- session.vim
Additional comments (Suppressed): 6
agents/agents/guard/guard.go (1)
- 314-317: The comment suggests that the `SuggestLatestState` call should be wrapped with a retry mechanism if the `Start` behavior changes. This indicates that there are plans to modify the `Start` function, which could potentially impact the behavior of this code. It would be beneficial to clarify what changes are expected in the `Start` function and how they might necessitate a retry mechanism for the `SuggestLatestState` call.

agents/agents/notary/notary.go (5)
7-27: The new hunk introduces the `github.com/synapsecns/sanguine/core/retry` package for implementing a retry mechanism with backoff functionality. Ensure that this package is correctly installed and imported.
42-47: A new field `retryConfig` of type `[]retry.WithBackoffConfigurator` is added to the `Notary` struct. This field will be used to configure the retry mechanism.
91-103: The `NewNotary` function now initializes the `retryConfig` field based on the `MaxRetrySeconds` configuration value. If `MaxRetrySeconds` is zero, it's set to a default value of 30 seconds. This change introduces a retry mechanism with backoff functionality to the notary agent.
132-146: No significant changes are made in this hunk. The comment on line 143 suggests that a retry loop might be added in the future if the current `Start` behavior deviates.
588-593: No significant changes are made in this hunk. The `Start` method remains unchanged.
if cfg.MaxRetrySeconds == 0 {
cfg.MaxRetrySeconds = 60
}

guard.retryConfig = []retry.WithBackoffConfigurator{
retry.WithMaxAttemptTime(time.Second * time.Duration(cfg.MaxRetrySeconds)),
}
The new hunk introduces a default value for `cfg.MaxRetrySeconds` if it's zero. This is a good practice to ensure that the retry mechanism works even if the configuration value is not explicitly set. However, it would be better to move this logic to a configuration validation or initialization function to keep the `NewGuard` function focused on initializing the `Guard` struct.
- if cfg.MaxRetrySeconds == 0 {
- cfg.MaxRetrySeconds = 60
- }
contractCall := func(ctx context.Context) error {
latestState, err = g.domains[g.summitDomainID].Summit().GetLatestAgentState(ctx, originID, g.bondedSigner)
if err != nil {
return fmt.Errorf("failed calling GetLatestAgentState for originID %d on the Summit contract: err = %w", originID, err)
}

return nil
}
if latestState != nil && latestState.Nonce() > uint32(0) {
err = retry.WithBackoff(ctx, contractCall, g.retryConfig...)
if err == nil && latestState.Nonce() > uint32(0) {
g.summitLatestStates[originID] = latestState
}
The new hunk introduces a retry mechanism with backoff functionality for the `GetLatestAgentState` contract call. This is a good improvement as it makes the code more resilient to transient errors. However, the error handling could be improved. If the retry mechanism exhausts all attempts and still fails, the error is logged but the function continues execution. It might be better to return the error to the caller so that it can decide how to handle the failure.
- err = retry.WithBackoff(ctx, contractCall, g.retryConfig...)
- if err == nil && latestState.Nonce() > uint32(0) {
- g.summitLatestStates[originID] = latestState
- }
+ if err := retry.WithBackoff(ctx, contractCall, g.retryConfig...); err != nil {
+ return fmt.Errorf("failed to get latest agent state after retries: %w", err)
+ }
+ if latestState.Nonce() > uint32(0) {
+ g.summitLatestStates[originID] = latestState
+ }
))
defer span.End()

latestNotaryAttestation, err := n.summitDomain.Summit().GetLatestNotaryAttestation(ctx, n.bondedSigner)
var latestNotaryAttestation types.NotaryAttestation
contractCall := func(ctx context.Context) (err error) {
latestNotaryAttestation, err = n.summitDomain.Summit().GetLatestNotaryAttestation(ctx, n.bondedSigner)
if err != nil {
return fmt.Errorf("could not get latest notary attestation: %w", err)
}

return nil
}
err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)
if err != nil {
span.AddEvent("GetLatestNotaryAttestation failed", trace.WithAttributes(
attribute.String("err", err.Error()),
))
}

if latestNotaryAttestation != nil {
if n.myLatestNotaryAttestation == nil ||
latestNotaryAttestation.Attestation().SnapshotRoot() != n.myLatestNotaryAttestation.Attestation().SnapshotRoot() {
The `loadNotaryLatestAttestation` method now uses the retry mechanism when calling the `GetLatestNotaryAttestation` contract method. If the call fails, it will be retried according to the `retryConfig` settings.
attribute.Int(metrics.ChainID, int(n.destinationDomain.Config().DomainID)),
))
defer span.End()
var bondingManagerAgentRoot [32]byte
contractCall := func(ctx context.Context) (err error) {
bondingManagerAgentRoot, err = n.summitDomain.BondingManager().GetAgentRoot(ctx)
if err != nil {
return fmt.Errorf("could not get agent root: %w", err)
}

bondingManagerAgentRoot, err := n.summitDomain.BondingManager().GetAgentRoot(ctx)
return nil
}
err := retry.WithBackoff(ctx, contractCall, n.retryConfig...)
if err != nil {
span.AddEvent("GetAgentRoot failed on bonding manager", trace.WithAttributes(
attribute.String("err", err.Error()),
))

return false, false
}

destinationLightManagerAgentRoot, err := n.destinationDomain.LightManager().GetAgentRoot(ctx)
var destinationLightManagerAgentRoot [32]byte
contractCall = func(ctx context.Context) (err error) {
destinationLightManagerAgentRoot, err = n.destinationDomain.LightManager().GetAgentRoot(ctx)
if err != nil {
return fmt.Errorf("could not get agent root: %w", err)
}

return nil
}
err = retry.WithBackoff(ctx, contractCall, n.retryConfig...)
if err != nil {
span.AddEvent("GetAgentRoot failed on destination light manager", trace.WithAttributes(
attribute.String("err", err.Error()),
))

return false, false
}
The `shouldNotaryRegisteredOnDestination` method now uses the retry mechanism when calling the `GetAgentRoot` and `GetAgentStatus` contract methods. If any of these calls fail, they will be retried according to the `retryConfig` settings.
return false, false
}

agentStatus, err := n.destinationDomain.LightManager().GetAgentStatus(ctx, n.bondedSigner.Address())
var agentStatus types.AgentStatus
contractCall = func(ctx context.Context) (err error) {
agentStatus, err = n.destinationDomain.LightManager().GetAgentStatus(ctx, n.bondedSigner.Address())
if err != nil {
return fmt.Errorf("could not get agent status: %w", err)
}

return nil
}
err = retry.WithBackoff(ctx, contractCall, n.retryConfig...)
if err != nil {
span.AddEvent("GetAgentStatus failed", trace.WithAttributes(
attribute.String("err", err.Error()),
))

return false, false
}

if agentStatus.Flag() == types.AgentFlagUnknown {
// Here we want to add the Notary and proceed with sending to destination
return true, true
The `checkDidSubmitNotaryLatestAttestation` method now uses the retry mechanism when calling the `GetAttestationNonce` contract method. If the call fails, it will be retried according to the `retryConfig` settings.
"time" | ||
|
||
"github.com/ethereum/go-ethereum/accounts/abi/bind" | ||
"github.com/ethereum/go-ethereum/common" | ||
ethTypes "github.com/ethereum/go-ethereum/core/types" | ||
"github.com/synapsecns/sanguine/agents/agents/notary/db" | ||
"github.com/synapsecns/sanguine/core/metrics" | ||
signerConfig "github.com/synapsecns/sanguine/ethergo/signer/config" | ||
"github.com/synapsecns/sanguine/ethergo/submitter" | ||
omnirpcClient "github.com/synapsecns/sanguine/services/omnirpc/client" | ||
"go.opentelemetry.io/otel/attribute" | ||
"go.opentelemetry.io/otel/trace" | ||
|
||
"github.com/ethereum/go-ethereum/common" | ||
"github.com/synapsecns/sanguine/agents/config" | ||
"github.com/synapsecns/sanguine/agents/contracts/summit" | ||
"github.com/synapsecns/sanguine/agents/domains" | ||
"github.com/synapsecns/sanguine/agents/domains/evm" | ||
"github.com/synapsecns/sanguine/agents/types" | ||
"github.com/synapsecns/sanguine/core/metrics" | ||
"github.com/synapsecns/sanguine/core/retry" | ||
signerConfig "github.com/synapsecns/sanguine/ethergo/signer/config" | ||
"github.com/synapsecns/sanguine/ethergo/signer/signer" | ||
"github.com/synapsecns/sanguine/ethergo/submitter" | ||
omnirpcClient "github.com/synapsecns/sanguine/services/omnirpc/client" | ||
"go.opentelemetry.io/otel/attribute" | ||
"go.opentelemetry.io/otel/trace" | ||
"golang.org/x/sync/errgroup" | ||
) | ||
|
Note: This review was outside of the patch, but no patch was found that overlapped with it. Original lines [313-298]
The `registerNotaryOnDestination` method now uses the retry mechanism when calling the `GetProof`, `GetAgentStatus`, and `SubmitTransaction` contract methods. If any of these calls fail, they will be retried according to the `retryConfig` settings.
This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Description
This PR adds the retry.WithBackoff functionality to the agents' RPC calls.
Summary by CodeRabbit
- A retry mechanism with backoff has been added to the `Executor`, `Guard`, and `Notary` modules. This enhancement increases the resilience of contract calls, improving overall system reliability.
- The `NewExecutor`, `NewGuard`, and `NewNotary` functions now initialize the retry configuration based on the `MaxRetrySeconds` setting.
- Logging in the `handleStatusUpdated` and `updateAgentStatus` functions was improved for better visibility during execution.
- A new configuration option, `MaxRetrySeconds`, lets operators customize the retry behavior (a brief usage sketch follows below).
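For operators, here is a hedged sketch of setting the new option on the executor config. The import path and the value 60 are assumptions for illustration; the `MaxRetrySeconds` field name and the fallback default come from the review above.

package example

import "github.com/synapsecns/sanguine/agents/config/executor" // assumed import path for agents/config/executor/config.go

// exampleConfig shows the new knob in isolation; a zero value falls back to the agents'
// built-in default (30 seconds for the executor, per the review comments).
func exampleConfig() executor.Config {
	return executor.Config{
		// ...other fields omitted for brevity...
		MaxRetrySeconds: 60, // allow up to ~60s of backoff per wrapped RPC/contract call
	}
}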