
failover that retries rpc #346

Merged
merged 25 commits into from
Mar 21, 2024


bxue-l2 (Contributor) commented Mar 14, 2024

Why are these changes needed?

This PR replaces the previous PR #333 with a new implementation of the RPC retry logic.

The retry logic is implemented only for the batcher, because the operator uses instrumentedClient. Multi-homing could be enabled on operators too, but some metric definitions need further clarification, so operator support is not included in this PR.

In the new retry logic, a multi-homing client handles both read and write RPC calls to Ethereum. It is a wrapper around geth.EthClient. The alternative would be to use goethereum.Client, but that would require copying the functions that implement our custom interface.

The multi-homing client uses numRetries (configurable via an argument) to decide how many additional attempts to make after the initial call fails.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

bxue-l2 requested review from jianoaix, ian-shim and dmanc and removed request for jianoaix March 14, 2024 20:06
bxue-l2 marked this pull request as ready for review March 14, 2024 20:14
ian-shim requested a review from mooselumph March 14, 2024 23:51
ian-shim (Contributor) left a comment

Thanks for updating. Looks good overall!

common/geth/cli.go (outdated)
common/geth/client.go (outdated)
common/geth/multihoming_client.go (outdated)
common/geth/failover.go (outdated)
operators/churner/cmd/main.go (outdated)
retriever/cmd/main.go (outdated)
common/geth/handle_error.go (outdated)
common/geth/failover.go (outdated)
common/geth/handle_error.go (outdated)
common/geth/multihoming_client.go
bxue-l2 (Contributor, Author) commented Mar 15, 2024

Additional note: I will check whether it makes sense to add a timeout at the multi-homing client level, depending on whether the caller already sets one.

common/geth/cli.go
common/geth/failover.go (outdated)
rpcFault := f.handleError(err)

if rpcFault {
f.NumberRpcFault += 1
Contributor

Should it be "ServerFault", which aligns better with the gRPC/HTTP error code definitions?

common/geth/failover.go (outdated)
)

// handleHttpError returns a boolean indicating whether the error is attributable to the remote RPC
func (f *FailoverController) handleHttpError(httpRespError rpc.HTTPError) bool {
Contributor

Does this assume the service provider exposes a gRPC interface but uses HTTP error codes?
It is quite mixed whether this is about RPC or HTTP.

Contributor (Author)

fixed


func (m *MultiHomingClient) SuggestGasTipCap(ctx context.Context) (*big.Int, error) {
var errLast error
for i := 0; i < m.NumRetries+1; i++ {
Contributor

This block of code is repeated in each function; can it be pulled out and shared? It looks like only the instance.SomeFunc(SomeArgs) part needs to be wrapped.

Contributor (Author)

I don't think geth's ethclient supports gRPC, only JSON-RPC. https://geth.ethereum.org/docs/interacting-with-geth/rpc

common/geth/handle_error.go (outdated)
common/geth/handle_error.go (outdated)
common/geth/handle_error.go (outdated)
common/geth/multihoming_client_test.go (outdated)
common/geth/multihoming_client_test.go (outdated)
common/geth/multihoming_client_test.go (outdated)
common/geth/multihoming_client_test.go
ian-shim (Contributor) left a comment

looks great
sorry there's a lot of conflicts from my other PR 😭

return NewRPC, Retry
}

// handleError returns a boolean indicating if the current connection should be rotated.
Contributor

This needs to be updated with the new return type

Contributor

still to be done

common/geth/failover.go (outdated)
bxue-l2 force-pushed the ethclient-failover-retry branch from 9148ba5 to a1fa817 March 21, 2024 07:40
bxue-l2 (Contributor, Author) commented Mar 21, 2024

looks great sorry there's a lot of conflicts from my other PR 😭

All conflicts fixed.

ian-shim (Contributor) left a comment

lgtm

)

type RPCStatistics struct {
numberRpcFault uint64
Contributor

nit: group private members and public members separately

{
  mu *sync.RWMutex
  numberRpcFault uint64

  Logger logging.Logger
}
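The suggested layout, sketched as compilable Go under a few assumptions: the stdlib `*log.Logger` stands in for eigensdk-go's `logging.Logger`, the mutex is held by value (so the zero value is ready to use) instead of as `*sync.RWMutex`, and `ProcessError` and `NumberRpcFault` are hypothetical methods showing how the private counter stays behind the lock. Only the name `FailoverController` comes from the PR.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

type FailoverController struct {
	// Private state, guarded by mu.
	mu             sync.RWMutex
	numberRpcFault uint64

	// Public configuration.
	Logger *log.Logger
}

// ProcessError counts a server fault under the write lock.
func (f *FailoverController) ProcessError(serverFault bool) {
	if !serverFault {
		return
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.numberRpcFault++
}

// NumberRpcFault reads the counter under the read lock.
func (f *FailoverController) NumberRpcFault() uint64 {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return f.numberRpcFault
}

func main() {
	var f FailoverController
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f.ProcessError(true)
		}()
	}
	wg.Wait()
	fmt.Println(f.NumberRpcFault()) // 100
}
```

If the counter is the only shared state, `sync/atomic` (e.g. `atomic.Uint64`, Go 1.19+) would avoid the lock entirely.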

if errors.As(err, &httpRespError) {
// if error is http error, i.e. non 2xx error, it is handled here
// if it is 2xx error, the error message is nil, https://github.com/ethereum/go-ethereum/blob/master/rpc/http.go,
// execution does not entere here.
Contributor

entere -> enter

"github.com/Layr-Labs/eigensdk-go/logging"
)

type RPCStatistics struct {
Contributor

Not sure why you changed to this name. This does more than collect statistics such as counting the number of errors: it actually handles errors and makes failover decisions. The old name may fit better.

bxue-l2 merged commit f9c3c67 into Layr-Labs:master Mar 21, 2024
5 checks passed
3 participants