Track maintenance blocks #449

Merged (3 commits) on Aug 12, 2022

Conversation

nlordell
Contributor

@nlordell nlordell commented Aug 11, 2022

Fixes #444.

This PR replaces maintenance error alerts (i.e. tracing::error! log messages that cause alert notifications) with metrics tracking the last seen block, as well as the last successfully updated block.
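
As a rough illustration of the idea (not the code in this PR), the sketch below records the last seen block before running maintenance and the last updated block only after it succeeds, returning the error to the caller instead of logging it at error level. `MaintenanceMetrics` and `run_maintenance` are hypothetical names, and plain atomics stand in for the real Prometheus gauges.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the two gauges; the real service would
/// register Prometheus gauges instead of using bare atomics.
#[derive(Default)]
pub struct MaintenanceMetrics {
    last_seen_block: AtomicU64,
    last_updated_block: AtomicU64,
}

impl MaintenanceMetrics {
    pub fn last_seen_block(&self) -> u64 {
        self.last_seen_block.load(Ordering::Relaxed)
    }

    pub fn last_updated_block(&self) -> u64 {
        self.last_updated_block.load(Ordering::Relaxed)
    }
}

/// Runs one maintenance pass for `block`.
///
/// The "seen" gauge is always updated; the "updated" gauge only moves
/// when `maintain` succeeds. A failure is returned to the caller rather
/// than turned into an alerting log message.
pub fn run_maintenance<E>(
    metrics: &MaintenanceMetrics,
    block: u64,
    maintain: impl FnOnce(u64) -> Result<(), E>,
) -> Result<(), E> {
    metrics.last_seen_block.store(block, Ordering::Relaxed);
    maintain(block)?;
    metrics.last_updated_block.store(block, Ordering::Relaxed);
    Ok(())
}
```

With this shape, a failed pass leaves the two gauges apart, and the gap closes again on the next successful pass.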

This allows us to define more complex alerts (such as more than 3 successive maintenance failures, or no new block in the past minute) and reduces noise.
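
To make the two example conditions concrete, here is a hedged sketch of how they could be expressed as predicates over the two gauges. In practice such rules would live in the alerting system rather than in service code; `Alert` and `evaluate_alerts` are hypothetical, and using the gap between the gauges as a proxy for successive failures assumes one maintenance run per block.

```rust
use std::time::Duration;

/// Hypothetical alert kinds matching the two examples above.
#[derive(Debug, PartialEq, Eq)]
pub enum Alert {
    /// The last successful update trails the last seen block by too much.
    MaintenanceLagging { blocks_behind: u64 },
    /// The last seen block has not changed for too long.
    NoNewBlock { stalled_for: Duration },
}

/// Evaluates both conditions from the current gauge values and the time
/// elapsed since the "last seen block" gauge last changed.
pub fn evaluate_alerts(
    last_seen_block: u64,
    last_updated_block: u64,
    since_last_seen_change: Duration,
) -> Vec<Alert> {
    let mut alerts = Vec::new();

    // Assumption: one maintenance run per block, so the gap approximates
    // the number of successive failed runs.
    let blocks_behind = last_seen_block.saturating_sub(last_updated_block);
    if blocks_behind > 3 {
        alerts.push(Alert::MaintenanceLagging { blocks_behind });
    }

    if since_last_seen_change > Duration::from_secs(60) {
        alerts.push(Alert::NoNewBlock {
            stalled_for: since_last_seen_change,
        });
    }

    alerts
}
```

A single failed run (a gap of one block) stays silent under these thresholds, which is the noise reduction the change is after.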

EDIT: The context for this change is that we run connected to load-balanced nodes that have inconsistent views of the blockchain - one node may return N for eth_blockNumber(latest), but a subsequent eth_call(..., N) may get served by another node and fail if that node hasn't seen block N yet.
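
The failure mode can be reproduced with a small model. This is only a sketch: `Node`, `RoundRobin` and `UnknownBlock` are made-up types, and round-robin is an assumed balancing strategy, not a description of the actual setup.

```rust
/// Hypothetical model of one node behind the load balancer.
pub struct Node {
    /// Latest block this node has seen.
    pub head: u64,
}

/// Error returned when a node is asked about a block it has not seen.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownBlock {
    pub requested: u64,
    pub head: u64,
}

impl Node {
    /// Models eth_blockNumber(latest).
    pub fn block_number(&self) -> u64 {
        self.head
    }

    /// Models eth_call(..., block): fails if the block is ahead of this node.
    pub fn call_at(&self, block: u64) -> Result<(), UnknownBlock> {
        if block <= self.head {
            Ok(())
        } else {
            Err(UnknownBlock { requested: block, head: self.head })
        }
    }
}

/// Assumed balancing strategy: each request goes to the next node in turn.
pub struct RoundRobin {
    nodes: Vec<Node>,
    next: usize,
}

impl RoundRobin {
    pub fn new(nodes: Vec<Node>) -> Self {
        assert!(!nodes.is_empty(), "need at least one node");
        Self { nodes, next: 0 }
    }

    fn pick(&mut self) -> &Node {
        let index = self.next;
        self.next = (self.next + 1) % self.nodes.len();
        &self.nodes[index]
    }

    pub fn block_number(&mut self) -> u64 {
        self.pick().block_number()
    }

    pub fn call_at(&mut self, block: u64) -> Result<(), UnknownBlock> {
        self.pick().call_at(block)
    }
}
```

With two nodes at heads 100 and 99, the first request reports block 100, the follow-up call at block 100 lands on the lagging node and fails, and a retry succeeds. Errors of this kind are transient, which is why a single failure is a poor alert trigger.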

Test Plan

Check the metrics are there:

% cargo run -p orderbook &
% cargo run -p solver &
% cargo run -p autopilot &

% curl -s http://localhost:9586/metrics | rg maintenance
# HELP gp_v2_api_maintenance_last_seen_block Service maintenance last seen block.
# TYPE gp_v2_api_maintenance_last_seen_block gauge
gp_v2_api_maintenance_last_seen_block 15322558
# HELP gp_v2_api_maintenance_last_updated_block Service maintenance last seen block.
# TYPE gp_v2_api_maintenance_last_updated_block gauge
gp_v2_api_maintenance_last_updated_block 15322558

% curl -s http://localhost:9587/metrics | rg maintenance
# HELP gp_v2_solver_maintenance_last_seen_block Service maintenance last seen block.
# TYPE gp_v2_solver_maintenance_last_seen_block gauge
gp_v2_solver_maintenance_last_seen_block 15322560
# HELP gp_v2_solver_maintenance_last_updated_block Service maintenance last seen block.
# TYPE gp_v2_solver_maintenance_last_updated_block gauge
gp_v2_solver_maintenance_last_updated_block 15322560

% curl -s http://localhost:9589/metrics | rg maintenance
# HELP gp_v2_autopilot_maintenance_last_seen_block Service maintenance last seen block.
# TYPE gp_v2_autopilot_maintenance_last_seen_block gauge
gp_v2_autopilot_maintenance_last_seen_block 15322560
# HELP gp_v2_autopilot_maintenance_last_updated_block Service maintenance last seen block.
# TYPE gp_v2_autopilot_maintenance_last_updated_block gauge
gp_v2_autopilot_maintenance_last_updated_block 15322560

@nlordell nlordell requested a review from a team as a code owner August 11, 2022 19:40
@nlordell nlordell enabled auto-merge (squash) August 12, 2022 09:08
@nlordell nlordell merged commit f1731d4 into main Aug 12, 2022
@nlordell nlordell deleted the maintenance-metrics branch August 12, 2022 09:10
@github-actions github-actions bot locked and limited conversation to collaborators Aug 12, 2022
Successfully merging this pull request may close these issues.

Don't Alert On Maintenance Errors