feat: e2e metrics reporting #9776
Merged
84 commits
ae8d791 fix: gossipsub metrics adapter (Maddiaa0)
923db38 check point (Maddiaa0)
0287326 fix: view discv5 metric gauges (Maddiaa0)
33f45f0 fix: enable for other discv5 service (Maddiaa0)
09e27f2 temp (Maddiaa0)
54d9b6c fix: noop non gauge for now (Maddiaa0)
fe6974c fmt (Maddiaa0)
c57fab1 chore: add telemetry client to bootstrap node start (Maddiaa0)
b29bd48 fmt (Maddiaa0)
ea6180e fix: build (Maddiaa0)
d4704ed fix: prepare (Maddiaa0)
a5dd3e7 feat: make collection time configuratble (Maddiaa0)
d8a8e26 fmt (Maddiaa0)
8602665 fix (Maddiaa0)
a74f1e1 fmt (Maddiaa0)
c05b2ae Merge branch 'master' into md/gossip-sub-metrics (Maddiaa0)
5dcaf15 Merge branch 'md/gossip-sub-metrics' into md/configure-network-scrapi… (Maddiaa0)
3ace13b fix: remove shorter export times (Maddiaa0)
c4adf22 fix: clean todos (Maddiaa0)
ce27e08 fix (Maddiaa0)
b11f0f8 Merge branch 'md/gossip-sub-metrics' into md/configure-network-scrapi… (Maddiaa0)
62c672c fix: add unit test (Maddiaa0)
3f52f71 fix: Include revertdata in Avm simulation errors for failures in nonr… (sirasistant)
1337807 chore: switch to installing published binaries of foundry (#9731) (ludamad)
594f7f6 fix: fix broken e2e_pending_note_hashes (#9748) (sklppy88)
e677782 chore(docs): authwit note, not simulating simulations (#9438) (critesjosh)
f7f5069 fix: non state update from pub processor (#9634) (MirandaWood)
976e530 chore: simplify docker compose instrumentation and native testnet met… (Maddiaa0)
9862bab fix: limit max block size (#9757) (just-mitch)
a88cf6e feat(avm): remove rethrowable reverts hack (#9752) (fcarreiro)
aabf175 feat: recursive verifier for decider and last folding proof (#9626) (maramihali)
9c1423a Rename DISABLE_TBB flag and disable on MacOS by default (#9747) (wraitii)
9110c36 fix: telemetry stopping on shutdown (#9740) (ludamad)
8282f14 feat: Validator stateful set load balancers (#9765) (stevenplatt)
efdd3db chore: upload logs in kind-network-test (#9755) (ludamad)
fee53be feat: lock to propose (#9430) (LHerskind)
95dd880 git subrepo push --branch=master barretenberg (AztecBot)
bc1225d chore: replace relative paths to noir-protocol-circuits (AztecBot)
b77bda0 git_subrepo.sh: Fix parent in .gitrepo file. [skip ci] (AztecBot)
bc22444 git subrepo push --branch=master noir-projects/aztec-nr (AztecBot)
c9971b1 Merge branch 'master' into md/gossip-sub-metrics (Maddiaa0)
82043b8 fmt (Maddiaa0)
678c24f feat: e2e test metrics alerting (Maddiaa0)
80942c9 feat: alert checker (Maddiaa0)
682539d fix: reenable "flakey" p2p tests (Maddiaa0)
9f0c692 feat: run e2e test with alerts (Maddiaa0)
8151e7b fix: add e2e p2p to test config (Maddiaa0)
22eab85 fix (Maddiaa0)
d77b1f4 run seperately (Maddiaa0)
1748069 fix: extend allowlist to matching prefixes (Maddiaa0)
dbe9432 fix: shorten reqresp test name (Maddiaa0)
f3fd818 fix fmt (Maddiaa0)
028c637 fix: simplify labels (Maddiaa0)
1fac13d Merge branch 'master' into md/re-enable-p2p (Maddiaa0)
0f71edb fix: chmod (Maddiaa0)
cae0811 fix (Maddiaa0)
55c232a Merge branch 'md/re-enable-p2p' into md/e2e-metrics-alerting (Maddiaa0)
f45268a fix: activate with alerts for e2e tests (Maddiaa0)
9874d3a fmt (Maddiaa0)
7b1af82 test: trigger comment (Maddiaa0)
1e65e94 fix: test (Maddiaa0)
59c19d7 Merge branch 'master' into md/re-enable-p2p (Maddiaa0)
0d9d68b fix: update gerousia test name (Maddiaa0)
3244c37 Merge branch 'md/re-enable-p2p' into md/e2e-metrics-alerting (Maddiaa0)
279a865 Merge branch 'md/configure-network-scraping-time' into md/e2e-metrics… (Maddiaa0)
8d4cafb fix: add metrics collection to other p2p tests (Maddiaa0)
aa40acb Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
76ad9ad Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
f1c48d9 fix: eslint (Maddiaa0)
281573c fix: read metrics port from env (Maddiaa0)
e87d277 fix: update metrics port (Maddiaa0)
2edad87 fix: quote port (Maddiaa0)
8cd9f8f fix: move octo out of dev deps (Maddiaa0)
4646909 fix: boxes (Maddiaa0)
5923150 fix: remove posting on pr (Maddiaa0)
b9cd7cd Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
35013ee Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
9311b96 fix (Maddiaa0)
47ded5f fmt (Maddiaa0)
43945ee fix: bump number, run on gossip only (Maddiaa0)
264f920 Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
cfbc38f Merge branch 'master' into md/e2e-metrics-alerting (Maddiaa0)
caaf7c3 fmt (Maddiaa0)
e8da93e bump time post reex (Maddiaa0)
@@ -0,0 +1,51 @@
```bash
#!/bin/bash
## Run an end-to-end test with alerts

# Runs an end-to-end test alongside the otel-lgtm stack (otel-collector, Grafana, Prometheus, Tempo and Loki),
# then checks the collected metrics against the alerts defined in alerts.yaml.
# Note: these tests must run with METRICS enabled

# Usage: ./e2e_test_with_alerts.sh <test-name> <...extra-args>
# Example: ./e2e_test_with_alerts.sh gossip_network

set -e

test_path=$1
# Drop the test name so that "$@" below holds only the extra args
shift

echo "Running otel stack"
CONTAINER_ID=$(docker run -d -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm grafana/otel-lgtm)

# Stop the stack when the script exits or is interrupted
trap "docker stop $CONTAINER_ID" EXIT SIGINT SIGTERM

echo "Waiting for LGTM stack to be ready..."
timeout=90
while [ $timeout -gt 0 ]; do
  # The container logs this line once the collector and Grafana are up
  if docker logs $CONTAINER_ID 2>&1 | grep -q "The OpenTelemetry collector and the Grafana LGTM stack are up and running"; then
    echo "LGTM stack is ready!"
    break
  fi
  sleep 1
  ((timeout--))
done

if [ $timeout -eq 0 ]; then
  echo "Timeout waiting for LGTM stack to be ready"
  docker stop $CONTAINER_ID
  exit 1
fi

## Run the existing e2e test, passing through any extra args
# env_args and ignore_failures are not set in this script; they expand to empty unless the caller exports them
docker run \
  --network host \
  -e HARDWARE_CONCURRENCY="$HARDWARE_CONCURRENCY" \
  -e FAKE_PROOFS="$FAKE_PROOFS" \
  -e METRICS_PORT="4318" \
  -e COLLECT_METRICS="true" \
  -e PULL_REQUEST="$PULL_REQUEST" \
  $env_args \
  --rm aztecprotocol/end-to-end:$AZTEC_DOCKER_TAG \
  "$test_path" "$@" || [ "$ignore_failures" = "true" ]

echo "Running alert checker..."
docker run --network host --rm aztecprotocol/end-to-end:$AZTEC_DOCKER_TAG quality_of_service/alert_checker.test.ts
```
88 changes: 88 additions & 0 deletions
yarn-project/end-to-end/src/quality_of_service/alert_checker.test.ts
@@ -0,0 +1,88 @@
```typescript
import { type DebugLogger, createDebugLogger } from '@aztec/aztec.js';
import { fileURLToPath } from '@aztec/foundation/url';

import * as fs from 'fs';
import * as yaml from 'js-yaml';
import { dirname, join } from 'path';

const GRAFANA_ENDPOINT = 'http://localhost:3000/api/datasources/proxy/uid/prometheus/api/v1/query';

interface AlertConfig {
  alert: string;
  expr: string;
  // `for` and `annotations` are optional: alerts.yaml does not set them yet
  for?: string;
  labels: Record<string, string>;
  annotations?: Record<string, string>;
}

// Define __dirname for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

// Load YAML configuration
function loadAlertsConfig(filePath: string): AlertConfig[] {
  const fileContents = fs.readFileSync(join(__dirname, filePath), 'utf8');
  const data = yaml.load(fileContents) as { alerts: AlertConfig[] };
  return data.alerts;
}

// Query Prometheus through Grafana's datasource proxy for a single expression
async function queryGrafana(expr: string): Promise<number> {
  // Create base64 encoded credentials for basic auth
  const credentials = Buffer.from('admin:admin').toString('base64');

  const response = await fetch(`${GRAFANA_ENDPOINT}?query=${encodeURIComponent(expr)}`, {
    headers: {
      Authorization: `Basic ${credentials}`,
    },
  });

  if (!response.ok) {
    throw new Error(`Failed to fetch data from Grafana: ${response.statusText}`);
  }

  const data = await response.json();
  const result = data.data.result;
  // Only the first returned series is inspected; an empty result maps to 0
  return result.length > 0 ? parseFloat(result[0].value[1]) : 0;
}

// Check each alert expression and fail if any of them fires
async function checkAlerts(alerts: AlertConfig[], logger: DebugLogger) {
  let alertTriggered = false;

  for (const alert of alerts) {
    logger.info(`Checking alert: ${JSON.stringify(alert)}`);

    const metricValue = await queryGrafana(alert.expr);
    logger.info(`Metric value: ${metricValue}`);
    // A comparison such as `metric > 2500` only returns series that satisfy it,
    // so a positive value here means the alert condition holds
    if (metricValue > 0) {
      logger.error(`Alert ${alert.alert} triggered! Value: ${metricValue}`);
      alertTriggered = true;
    } else {
      logger.info(`Alert ${alert.alert} passed.`);
    }
  }

  // If any alerts have been triggered we fail the test
  if (alertTriggered) {
    throw new Error('Test failed due to triggered alert');
  }
}

// Main function to run tests
async function runAlertChecker(logger: DebugLogger) {
  const alerts = loadAlertsConfig('alerts.yaml');
  try {
    await checkAlerts(alerts, logger);
    logger.info('All alerts passed.');
  } catch (error) {
    logger.error(error instanceof Error ? error.message : String(error));
    // Rethrow so jest reports the failure instead of the process exiting mid-test
    throw error;
  }
}

// Running as a jest test to use existing end to end test framework
describe('Alert Checker', () => {
  const logger = createDebugLogger('aztec:alert-checker');
  it('should check alerts', async () => {
    await runAlertChecker(logger);
  });
});
```
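The checker reads `data.data.result[0].value[1]` from the query response. As a minimal sketch of the shape it relies on, assuming the standard Prometheus instant-query vector format, the helper below takes the largest value across all returned series rather than only the first. The names `PromSample`, `PromVectorResponse` and `maxSampleValue` are hypothetical and not part of this PR.

```typescript
// Assumed shape of a Prometheus instant-query response with resultType "vector".
interface PromSample {
  metric: Record<string, string>;
  value: [number, string]; // [unix timestamp, sample value encoded as a string]
}

interface PromVectorResponse {
  status: 'success' | 'error';
  data: { resultType: 'vector'; result: PromSample[] };
}

// Hypothetical helper: largest sample value across all series,
// or undefined when the expression matched no series at all.
function maxSampleValue(response: PromVectorResponse): number | undefined {
  const values = response.data.result.map(sample => parseFloat(sample.value[1]));
  return values.length > 0 ? Math.max(...values) : undefined;
}

// Made-up example data: two validators both exceed the 2500 threshold.
const example: PromVectorResponse = {
  status: 'success',
  data: {
    resultType: 'vector',
    result: [
      { metric: { instance: 'validator-0' }, value: [1730000000, '2600'] },
      { metric: { instance: 'validator-1' }, value: [1730000000, '3100.5'] },
    ],
  },
};

console.log(maxSampleValue(example)); // 3100.5
```

Returning `undefined` for an empty result keeps "no series matched" distinct from a genuine sample value of 0, which the `: 0` fallback in the checker cannot do.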
10 changes: 10 additions & 0 deletions
yarn-project/end-to-end/src/quality_of_service/alerts.yaml
@@ -0,0 +1,10 @@
```yaml
## A set of alerts for the quality of service of the sequencer, these are tested for in certain e2e tests

## In end to end tests - page, will cause a test to fail
## Warning will write a message to the PR

alerts:
  - alert: SequencerTimeToCollectAttestations
    expr: aztec_sequencer_time_to_collect_attestations > 2500
    labels:
      severity: page
```
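The rule in this file sets only `alert`, `expr` and `labels`. A minimal sketch of validating the parsed file before any query is issued, assuming the parsed YAML arrives as an `unknown` value (for instance from `yaml.load`). `AlertRule`, `isAlertRule` and `validateAlerts` are hypothetical names, not part of this PR.

```typescript
// Hypothetical, trimmed-down rule shape: only the fields alerts.yaml actually sets.
interface AlertRule {
  alert: string;
  expr: string;
  labels?: Record<string, string>;
}

// Type guard: a rule needs a non-empty name and a non-empty expression.
function isAlertRule(value: unknown): value is AlertRule {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const rule = value as Record<string, unknown>;
  return (
    typeof rule.alert === 'string' &&
    rule.alert.length > 0 &&
    typeof rule.expr === 'string' &&
    rule.expr.length > 0
  );
}

// Checks the top-level structure and every entry, throwing on the first problem.
function validateAlerts(parsed: unknown): AlertRule[] {
  const alerts = (parsed as { alerts?: unknown } | null)?.alerts;
  if (!Array.isArray(alerts)) {
    throw new Error('Expected a top-level "alerts" list');
  }
  alerts.forEach((entry, index) => {
    if (!isAlertRule(entry)) {
      throw new Error(`alerts[${index}] is missing "alert" or "expr"`);
    }
  });
  return alerts as AlertRule[];
}

// The rule from alerts.yaml, written as the object a YAML parser would produce.
const rules = validateAlerts({
  alerts: [
    {
      alert: 'SequencerTimeToCollectAttestations',
      expr: 'aztec_sequencer_time_to_collect_attestations > 2500',
      labels: { severity: 'page' },
    },
  ],
});
console.log(rules.length); // 1
```

Failing early on a malformed rule gives a clearer error than a failed Grafana query for an `undefined` expression.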
Do we actually use severity yet?
Not used yet, but I was toying with different severity levels either sending alerts or failing a test. For now, everything just fails the test.
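One way the severity idea could look, as a sketch only: the merged code fails the test on every firing alert, and nothing below is part of this PR. It assumes the two levels named in the comments of `alerts.yaml` (`page` and `warning`); `AlertResult` and `partitionBySeverity` are hypothetical names.

```typescript
// Assumed severity levels, taken from the comments in alerts.yaml.
type Severity = 'page' | 'warning';

// Hypothetical result of evaluating one alert rule.
interface AlertResult {
  name: string;
  severity: Severity;
  firing: boolean;
}

// Splits firing alerts into those that should fail the test ("page")
// and those that should only be reported ("warning").
function partitionBySeverity(results: AlertResult[]): { failures: string[]; warnings: string[] } {
  const firing = results.filter(result => result.firing);
  return {
    failures: firing.filter(result => result.severity === 'page').map(result => result.name),
    warnings: firing.filter(result => result.severity === 'warning').map(result => result.name),
  };
}

// Made-up results for illustration; only the first name comes from alerts.yaml.
const outcome = partitionBySeverity([
  { name: 'SequencerTimeToCollectAttestations', severity: 'page', firing: true },
  { name: 'ExampleSlowGossip', severity: 'warning', firing: true },
  { name: 'ExampleQuietAlert', severity: 'page', firing: false },
]);

console.log(outcome.failures); // [ 'SequencerTimeToCollectAttestations' ]
console.log(outcome.warnings); // [ 'ExampleSlowGossip' ]
```

With this split, a caller would throw only when `failures` is non-empty and log `warnings` without failing the run.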