[node] Add checks for RPC node health #475
Why are these changes needed?
EigenDA relies on a healthy JSON-RPC endpoint for many of its operations. Currently, that endpoint can be unhealthy, and although EigenDA will throw errors, those errors do not indicate whether the JSON-RPC endpoint is the cause. In that case, the EigenDA operator may suspect an EigenDA bug, an issue on the Dispersers, a problem with the JSON-RPC endpoint, a networking issue on the machine, and so on.
If EigenDA runs health checks against the JSON-RPC node (by calling the eth_syncing method), troubleshooting becomes considerably easier. This PR adds such health checks in the following places:
ValidateBatch
The following logs show examples of these checks (tested on Nethermind's Holesky EigenDA node):
JSON-RPC node unhealthy
Goroutine check
This fork uses Bump v0.6.1 (#458) as a stable base.
This PR also introduces a Chain ID check at EigenDA initialization so that Operators can verify that their JSON-RPC endpoint points to the correct network.
In the above logs, EigenDA is intended to run on Holesky, but a Mainnet JSON-RPC node was configured. The node is synced and healthy, but the Transactor fails because it is connected to Mainnet instead of Holesky. The Chain ID check lets the Operator identify the issue quickly.

Checks