
feat: collect and add proposal stats in grafana metrics #5448

Merged

merged 15 commits into unstable on May 15, 2023

Conversation

@g11tech (Contributor) commented May 1, 2023

Compute and add proposal stats in grafana metrics on finalization

TODO:

  • Test extensively
  • Add Grafana metrics
  • Add unit tests

Closes #4636

@g11tech g11tech changed the title feat: Add proposal stats in grafana metrics feat: Collect and add proposal stats in grafana metrics May 1, 2023
@g11tech g11tech changed the title feat: Collect and add proposal stats in grafana metrics feat: collect and add proposal stats in grafana metrics May 1, 2023
@github-actions bot commented May 2, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: ae299f4 Previous: c42adb2 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 850.27 us/op 719.78 us/op 1.18
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 45.809 us/op 47.044 us/op 0.97
BLS verify - blst-native 1.2067 ms/op 1.2190 ms/op 0.99
BLS verifyMultipleSignatures 3 - blst-native 2.4510 ms/op 2.4908 ms/op 0.98
BLS verifyMultipleSignatures 8 - blst-native 5.2644 ms/op 5.3351 ms/op 0.99
BLS verifyMultipleSignatures 32 - blst-native 19.029 ms/op 19.378 ms/op 0.98
BLS aggregatePubkeys 32 - blst-native 25.545 us/op 25.945 us/op 0.98
BLS aggregatePubkeys 128 - blst-native 99.808 us/op 101.14 us/op 0.99
getAttestationsForBlock 50.749 ms/op 56.323 ms/op 0.90
isKnown best case - 1 super set check 245.00 ns/op 248.00 ns/op 0.99
isKnown normal case - 2 super set checks 240.00 ns/op 250.00 ns/op 0.96
isKnown worse case - 16 super set checks 243.00 ns/op 245.00 ns/op 0.99
CheckpointStateCache - add get delete 4.8520 us/op 5.1390 us/op 0.94
validate gossip signedAggregateAndProof - struct 2.7321 ms/op 2.7767 ms/op 0.98
validate gossip attestation - struct 1.3067 ms/op 1.3253 ms/op 0.99
pickEth1Vote - no votes 1.2497 ms/op 1.3109 ms/op 0.95
pickEth1Vote - max votes 9.5481 ms/op 10.613 ms/op 0.90
pickEth1Vote - Eth1Data hashTreeRoot value x2048 8.6483 ms/op 9.1202 ms/op 0.95
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 14.246 ms/op 14.443 ms/op 0.99
pickEth1Vote - Eth1Data fastSerialize value x2048 634.91 us/op 656.48 us/op 0.97
pickEth1Vote - Eth1Data fastSerialize tree x2048 7.2685 ms/op 4.6979 ms/op 1.55
bytes32 toHexString 469.00 ns/op 530.00 ns/op 0.88
bytes32 Buffer.toString(hex) 329.00 ns/op 381.00 ns/op 0.86
bytes32 Buffer.toString(hex) from Uint8Array 527.00 ns/op 603.00 ns/op 0.87
bytes32 Buffer.toString(hex) + 0x 329.00 ns/op 379.00 ns/op 0.87
Object access 1 prop 0.15600 ns/op 0.18200 ns/op 0.86
Map access 1 prop 0.15600 ns/op 0.16700 ns/op 0.93
Object get x1000 6.1800 ns/op 7.1950 ns/op 0.86
Map get x1000 0.60400 ns/op 0.64200 ns/op 0.94
Object set x1000 51.653 ns/op 55.971 ns/op 0.92
Map set x1000 42.823 ns/op 45.231 ns/op 0.95
Return object 10000 times 0.23410 ns/op 0.24140 ns/op 0.97
Throw Error 10000 times 4.1550 us/op 4.2326 us/op 0.98
fastMsgIdFn sha256 / 200 bytes 3.4160 us/op 3.4810 us/op 0.98
fastMsgIdFn h32 xxhash / 200 bytes 278.00 ns/op 297.00 ns/op 0.94
fastMsgIdFn h64 xxhash / 200 bytes 395.00 ns/op 416.00 ns/op 0.95
fastMsgIdFn sha256 / 1000 bytes 11.536 us/op 11.599 us/op 0.99
fastMsgIdFn h32 xxhash / 1000 bytes 402.00 ns/op 438.00 ns/op 0.92
fastMsgIdFn h64 xxhash / 1000 bytes 451.00 ns/op 508.00 ns/op 0.89
fastMsgIdFn sha256 / 10000 bytes 103.93 us/op 103.67 us/op 1.00
fastMsgIdFn h32 xxhash / 10000 bytes 1.9050 us/op 1.9680 us/op 0.97
fastMsgIdFn h64 xxhash / 10000 bytes 1.3660 us/op 1.4380 us/op 0.95
enrSubnets - fastDeserialize 64 bits 1.2670 us/op 1.3610 us/op 0.93
enrSubnets - ssz BitVector 64 bits 485.00 ns/op 518.00 ns/op 0.94
enrSubnets - fastDeserialize 4 bits 165.00 ns/op 186.00 ns/op 0.89
enrSubnets - ssz BitVector 4 bits 473.00 ns/op 561.00 ns/op 0.84
prioritizePeers score -10:0 att 32-0.1 sync 2-0 108.88 us/op 108.81 us/op 1.00
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 131.57 us/op 152.88 us/op 0.86
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 167.42 us/op 186.32 us/op 0.90
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 298.01 us/op 330.38 us/op 0.90
prioritizePeers score 0:0 att 64-1 sync 4-1 369.61 us/op 407.32 us/op 0.91
array of 16000 items push then shift 1.6265 us/op 1.6597 us/op 0.98
LinkedList of 16000 items push then shift 8.7680 ns/op 8.8020 ns/op 1.00
array of 16000 items push then pop 77.101 ns/op 108.99 ns/op 0.71
LinkedList of 16000 items push then pop 8.4640 ns/op 8.7900 ns/op 0.96
array of 24000 items push then shift 2.3265 us/op 2.4070 us/op 0.97
LinkedList of 24000 items push then shift 8.5450 ns/op 9.0320 ns/op 0.95
array of 24000 items push then pop 76.799 ns/op 86.522 ns/op 0.89
LinkedList of 24000 items push then pop 8.1300 ns/op 8.7000 ns/op 0.93
intersect bitArray bitLen 8 12.766 ns/op 13.227 ns/op 0.97
intersect array and set length 8 74.188 ns/op 83.127 ns/op 0.89
intersect bitArray bitLen 128 42.445 ns/op 43.915 ns/op 0.97
intersect array and set length 128 1.0138 us/op 1.1125 us/op 0.91
Buffer.concat 32 items 2.6020 us/op 2.8930 us/op 0.90
Uint8Array.set 32 items 2.8710 us/op 2.1850 us/op 1.31
pass gossip attestations to forkchoice per slot 2.5731 ms/op 3.0400 ms/op 0.85
computeDeltas 3.4764 ms/op 3.0063 ms/op 1.16
computeProposerBoostScoreFromBalances 1.7837 ms/op 1.7760 ms/op 1.00
altair processAttestation - 250000 vs - 7PWei normalcase 2.0458 ms/op 2.8171 ms/op 0.73
altair processAttestation - 250000 vs - 7PWei worstcase 3.2455 ms/op 3.9306 ms/op 0.83
altair processAttestation - setStatus - 1/6 committees join 133.84 us/op 149.34 us/op 0.90
altair processAttestation - setStatus - 1/3 committees join 277.40 us/op 284.35 us/op 0.98
altair processAttestation - setStatus - 1/2 committees join 360.26 us/op 364.48 us/op 0.99
altair processAttestation - setStatus - 2/3 committees join 454.72 us/op 463.05 us/op 0.98
altair processAttestation - setStatus - 4/5 committees join 635.91 us/op 649.22 us/op 0.98
altair processAttestation - setStatus - 100% committees join 755.05 us/op 768.31 us/op 0.98
altair processBlock - 250000 vs - 7PWei normalcase 16.972 ms/op 17.197 ms/op 0.99
altair processBlock - 250000 vs - 7PWei normalcase hashState 27.346 ms/op 24.724 ms/op 1.11
altair processBlock - 250000 vs - 7PWei worstcase 48.032 ms/op 51.144 ms/op 0.94
altair processBlock - 250000 vs - 7PWei worstcase hashState 67.527 ms/op 68.661 ms/op 0.98
phase0 processBlock - 250000 vs - 7PWei normalcase 1.9019 ms/op 2.1006 ms/op 0.91
phase0 processBlock - 250000 vs - 7PWei worstcase 28.040 ms/op 29.000 ms/op 0.97
altair processEth1Data - 250000 vs - 7PWei normalcase 454.49 us/op 488.74 us/op 0.93
vc - 250000 eb 1 eth1 1 we 0 wn 0 - smpl 15 7.0620 us/op 9.2560 us/op 0.76
vc - 250000 eb 0.95 eth1 0.1 we 0.05 wn 0 - smpl 219 19.766 us/op 31.102 us/op 0.64
vc - 250000 eb 0.95 eth1 0.3 we 0.05 wn 0 - smpl 42 10.849 us/op 13.880 us/op 0.78
vc - 250000 eb 0.95 eth1 0.7 we 0.05 wn 0 - smpl 18 7.3440 us/op 9.3100 us/op 0.79
vc - 250000 eb 0.1 eth1 0.1 we 0 wn 0 - smpl 1020 94.997 us/op 115.80 us/op 0.82
vc - 250000 eb 0.03 eth1 0.03 we 0 wn 0 - smpl 11777 628.92 us/op 654.28 us/op 0.96
vc - 250000 eb 0.01 eth1 0.01 we 0 wn 0 - smpl 16384 893.38 us/op 904.19 us/op 0.99
vc - 250000 eb 0 eth1 0 we 0 wn 0 - smpl 16384 905.43 us/op 892.60 us/op 1.01
vc - 250000 eb 0 eth1 0 we 0 wn 0 nocache - smpl 16384 2.3598 ms/op 2.4228 ms/op 0.97
vc - 250000 eb 0 eth1 1 we 0 wn 0 - smpl 16384 1.4684 ms/op 1.5835 ms/op 0.93
vc - 250000 eb 0 eth1 1 we 0 wn 0 nocache - smpl 16384 3.8979 ms/op 4.1535 ms/op 0.94
Tree 40 250000 create 323.24 ms/op 352.69 ms/op 0.92
Tree 40 250000 get(125000) 176.25 ns/op 193.36 ns/op 0.91
Tree 40 250000 set(125000) 933.29 ns/op 1.0984 us/op 0.85
Tree 40 250000 toArray() 17.582 ms/op 22.054 ms/op 0.80
Tree 40 250000 iterate all - toArray() + loop 17.888 ms/op 22.247 ms/op 0.80
Tree 40 250000 iterate all - get(i) 67.787 ms/op 76.784 ms/op 0.88
MutableVector 250000 create 9.8108 ms/op 10.387 ms/op 0.94
MutableVector 250000 get(125000) 6.2260 ns/op 6.2540 ns/op 1.00
MutableVector 250000 set(125000) 245.56 ns/op 258.76 ns/op 0.95
MutableVector 250000 toArray() 2.5867 ms/op 2.9910 ms/op 0.86
MutableVector 250000 iterate all - toArray() + loop 2.8213 ms/op 3.0935 ms/op 0.91
MutableVector 250000 iterate all - get(i) 1.5212 ms/op 1.5076 ms/op 1.01
Array 250000 create 2.4858 ms/op 2.7521 ms/op 0.90
Array 250000 clone - spread 1.1536 ms/op 1.1386 ms/op 1.01
Array 250000 get(125000) 0.53500 ns/op 0.58300 ns/op 0.92
Array 250000 set(125000) 0.62100 ns/op 0.66700 ns/op 0.93
Array 250000 iterate all - loop 81.529 us/op 86.963 us/op 0.94
effectiveBalanceIncrements clone Uint8Array 300000 25.726 us/op 32.881 us/op 0.78
effectiveBalanceIncrements clone MutableVector 300000 342.00 ns/op 356.00 ns/op 0.96
effectiveBalanceIncrements rw all Uint8Array 300000 169.45 us/op 169.29 us/op 1.00
effectiveBalanceIncrements rw all MutableVector 300000 80.646 ms/op 83.924 ms/op 0.96
phase0 afterProcessEpoch - 250000 vs - 7PWei 119.77 ms/op 115.56 ms/op 1.04
phase0 beforeProcessEpoch - 250000 vs - 7PWei 48.315 ms/op 43.827 ms/op 1.10
altair processEpoch - mainnet_e81889 312.01 ms/op 330.33 ms/op 0.94
mainnet_e81889 - altair beforeProcessEpoch 55.205 ms/op 68.200 ms/op 0.81
mainnet_e81889 - altair processJustificationAndFinalization 17.598 us/op 20.890 us/op 0.84
mainnet_e81889 - altair processInactivityUpdates 5.7107 ms/op 6.0642 ms/op 0.94
mainnet_e81889 - altair processRewardsAndPenalties 68.937 ms/op 64.098 ms/op 1.08
mainnet_e81889 - altair processRegistryUpdates 2.5200 us/op 2.5900 us/op 0.97
mainnet_e81889 - altair processSlashings 458.00 ns/op 516.00 ns/op 0.89
mainnet_e81889 - altair processEth1DataReset 514.00 ns/op 645.00 ns/op 0.80
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.2266 ms/op 1.2592 ms/op 1.77
mainnet_e81889 - altair processSlashingsReset 3.5440 us/op 4.8840 us/op 0.73
mainnet_e81889 - altair processRandaoMixesReset 4.6150 us/op 4.7670 us/op 0.97
mainnet_e81889 - altair processHistoricalRootsUpdate 1.6020 us/op 943.00 ns/op 1.70
mainnet_e81889 - altair processParticipationFlagUpdates 3.7910 us/op 4.2870 us/op 0.88
mainnet_e81889 - altair processSyncCommitteeUpdates 741.00 ns/op 694.00 ns/op 1.07
mainnet_e81889 - altair afterProcessEpoch 130.35 ms/op 128.35 ms/op 1.02
phase0 processEpoch - mainnet_e58758 378.04 ms/op 363.19 ms/op 1.04
mainnet_e58758 - phase0 beforeProcessEpoch 143.21 ms/op 125.21 ms/op 1.14
mainnet_e58758 - phase0 processJustificationAndFinalization 16.410 us/op 21.915 us/op 0.75
mainnet_e58758 - phase0 processRewardsAndPenalties 64.119 ms/op 56.374 ms/op 1.14
mainnet_e58758 - phase0 processRegistryUpdates 7.8200 us/op 8.2340 us/op 0.95
mainnet_e58758 - phase0 processSlashings 478.00 ns/op 1.5760 us/op 0.30
mainnet_e58758 - phase0 processEth1DataReset 578.00 ns/op 715.00 ns/op 0.81
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.8412 ms/op 1.0337 ms/op 1.78
mainnet_e58758 - phase0 processSlashingsReset 4.7120 us/op 4.1330 us/op 1.14
mainnet_e58758 - phase0 processRandaoMixesReset 4.7190 us/op 5.1500 us/op 0.92
mainnet_e58758 - phase0 processHistoricalRootsUpdate 585.00 ns/op 734.00 ns/op 0.80
mainnet_e58758 - phase0 processParticipationRecordUpdates 5.1310 us/op 4.4860 us/op 1.14
mainnet_e58758 - phase0 afterProcessEpoch 101.35 ms/op 99.006 ms/op 1.02
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.6720 ms/op 1.2539 ms/op 1.33
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.3576 ms/op 1.6424 ms/op 1.44
altair processInactivityUpdates - 250000 normalcase 25.656 ms/op 21.101 ms/op 1.22
altair processInactivityUpdates - 250000 worstcase 26.495 ms/op 25.502 ms/op 1.04
phase0 processRegistryUpdates - 250000 normalcase 8.2630 us/op 15.596 us/op 0.53
phase0 processRegistryUpdates - 250000 badcase_full_deposits 281.50 us/op 304.62 us/op 0.92
phase0 processRegistryUpdates - 250000 worstcase 0.5 134.55 ms/op 118.31 ms/op 1.14
altair processRewardsAndPenalties - 250000 normalcase 63.371 ms/op 62.618 ms/op 1.01
altair processRewardsAndPenalties - 250000 worstcase 74.620 ms/op 65.284 ms/op 1.14
phase0 getAttestationDeltas - 250000 normalcase 6.6867 ms/op 7.1462 ms/op 0.94
phase0 getAttestationDeltas - 250000 worstcase 6.6669 ms/op 6.8557 ms/op 0.97
phase0 processSlashings - 250000 worstcase 3.5702 ms/op 3.4050 ms/op 1.05
altair processSyncCommitteeUpdates - 250000 189.14 ms/op 180.60 ms/op 1.05
BeaconState.hashTreeRoot - No change 326.00 ns/op 273.00 ns/op 1.19
BeaconState.hashTreeRoot - 1 full validator 55.041 us/op 54.256 us/op 1.01
BeaconState.hashTreeRoot - 32 full validator 586.23 us/op 514.56 us/op 1.14
BeaconState.hashTreeRoot - 512 full validator 6.5854 ms/op 5.4991 ms/op 1.20
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 65.311 us/op 65.080 us/op 1.00
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 967.92 us/op 868.85 us/op 1.11
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 12.594 ms/op 11.360 ms/op 1.11
BeaconState.hashTreeRoot - 1 balances 52.420 us/op 46.665 us/op 1.12
BeaconState.hashTreeRoot - 32 balances 451.36 us/op 465.34 us/op 0.97
BeaconState.hashTreeRoot - 512 balances 4.8740 ms/op 4.3749 ms/op 1.11
BeaconState.hashTreeRoot - 250000 balances 76.852 ms/op 74.166 ms/op 1.04
aggregationBits - 2048 els - zipIndexesInBitList 16.696 us/op 17.806 us/op 0.94
regular array get 100000 times 41.993 us/op 33.095 us/op 1.27
wrappedArray get 100000 times 32.989 us/op 33.043 us/op 1.00
arrayWithProxy get 100000 times 16.996 ms/op 15.694 ms/op 1.08
ssz.Root.equals 551.00 ns/op 606.00 ns/op 0.91
byteArrayEquals 550.00 ns/op 563.00 ns/op 0.98
shuffle list - 16384 els 7.0828 ms/op 6.9243 ms/op 1.02
shuffle list - 250000 els 104.38 ms/op 101.66 ms/op 1.03
processSlot - 1 slots 8.3970 us/op 9.1680 us/op 0.92
processSlot - 32 slots 1.3539 ms/op 1.3723 ms/op 0.99
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 37.883 ms/op 35.870 ms/op 1.06
getCommitteeAssignments - req 1 vs - 250000 vc 2.9038 ms/op 2.9227 ms/op 0.99
getCommitteeAssignments - req 100 vs - 250000 vc 4.1186 ms/op 4.1499 ms/op 0.99
getCommitteeAssignments - req 1000 vs - 250000 vc 4.4594 ms/op 4.5131 ms/op 0.99
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.6100 ns/op 4.7800 ns/op 0.96
state getBlockRootAtSlot - 250000 vs - 7PWei 964.02 ns/op 621.29 ns/op 1.55
computeProposers - vc 250000 10.776 ms/op 10.362 ms/op 1.04
computeEpochShuffling - vc 250000 99.039 ms/op 103.23 ms/op 0.96
getNextSyncCommittee - vc 250000 166.85 ms/op 174.33 ms/op 0.96
computeSigningRoot for AttestationData 13.085 us/op 13.378 us/op 0.98
hash AttestationData serialized data then Buffer.toString(base64) 2.3530 us/op 2.4777 us/op 0.95
toHexString serialized data 1.0270 us/op 1.1070 us/op 0.93
Buffer.toString(base64) 320.37 ns/op 347.43 ns/op 0.92

by benchmarkbot/action

packages/beacon-node/src/metrics/metrics/beacon.ts (comments outdated, resolved)
packages/beacon-node/src/chain/archiver/index.ts (comments outdated, resolved)
@@ -58,6 +58,66 @@ export function createBeaconMetrics(register: RegistryMetricCreator) {

// Non-spec'ed

// Finalized block and proposal stats
allValidators: {
Member:

Just reading the metric name `all_validators_expected_count` or `allValidators.expected` does not make it obvious that this is about blocks
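For illustration only (hypothetical names, not Lodestar's actual metrics API), a sketch of a more self-describing naming scheme that makes the block-proposal scope explicit:

```typescript
// Hypothetical sketch: self-describing metric names make it clear these
// counters are about *block proposals*, not validators in general.
type GaugeSpec = {name: string; help: string};

function proposalGauges(scope: "all_validators" | "attached_validators"): GaugeSpec[] {
  return ["expected", "finalized", "orphaned", "missed"].map((status) => ({
    // e.g. beacon_block_proposals_all_validators_expected_count
    name: `beacon_block_proposals_${scope}_${status}_count`,
    help: `Number of ${status} block proposals for ${scope.replace(/_/g, " ")}`,
  }));
}
```

With a `beacon_block_proposals_` prefix, a dashboard author sees at a glance what the gauge measures without opening the source.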

@nflaig (Member) commented May 10, 2023

Should this PR also add dashboard panels in Grafana to visualize the metrics? The title suggests that, but right now it only adds the Prometheus metrics

@g11tech g11tech marked this pull request as ready for review May 11, 2023 07:41
@g11tech g11tech requested a review from a team as a code owner May 11, 2023 07:41
@g11tech (Contributor, Author) commented May 11, 2023

Should this PR also add dashboard panels in grafana to visualize the metrics? The title suggests that but right now it only added the prometheus metrics

updated

Comment on lines 299 to 307
this.logger.info("All validators finalized proposal stats", {
...allValidators,
finalizedCanonicalCheckpointsCount,
finalizedFoundCheckpointsInStateCache,
});
this.logger.info("Attached validators finalized proposal stats", {
...attachedValidators,
finalizedAttachedValidatorsCount,
});
Member:

Would like more eyes here @philknows @dapplion since these are new info logs and need to be extremely polished and well thought out.

My first thought is that I don't think we want the "all validators" log, and only want the "attached validators" log.
Also I think we should remove finalizedAttachedValidatorsCount (or at the very least rename it to be less wordy).

My justification for the above is that info logs should not be used for extensive general chain metrics (Prometheus metrics cover that); they should be reserved for individual node status and individual validator statuses.

Member:

Could just hide behind a feature flag, similar to validator monitor logs (to be added)

@g11tech (Contributor, Author) commented May 11, 2023

we can push the all-validators log (and finalizedAttachedValidatorsCount) to debug and retain the attached-validators log at info

reasoning: not all deployments have metrics set up, but all deployments write a debug file by default, which has been helpful for debugging in the past

Member:

@g11tech After seeing this log (Attached validators finalized proposal stats expected=0, finalized=0, orphaned=0, missed=0, finalizedAttachedValidatorsCount=0) on different beacon node instances, I find it not very useful as an info log that is printed out by default.

I think we should reconsider and only enable it via a flag, as I suggested above. Right now it seems rather arbitrary, as we report stats on block proposals but not attestations.

It also only prints the proposal stats accumulated while the beacon node was running; on restart, all values reset to 0. Unless someone is running hundreds of validators, this log will not change for months.

Focusing the BN logs on just sync and peer info by default feels a lot cleaner to me, as there might not even be validators attached.

Contributor:

I do agree here

Member:

We had a discussion about this in Discord here: https://discord.com/channels/593655374469660673/1105347927548907600/1110563872152236133

The compromise made is that this will not show on info if the node is still syncing, there are no validators attached, or all stats are 0. There will also be a flag to disable this.
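A minimal TypeScript sketch of that gating logic (hypothetical names and flag, not Lodestar's actual implementation):

```typescript
// Hypothetical sketch of the compromise above: only surface proposal stats
// at info level when they are actually meaningful; otherwise fall back to debug.
type ProposalStats = {expected: number; finalized: number; orphaned: number; missed: number};

function proposalStatsLogLevel(opts: {
  disableLogs: boolean; // assumed opt-out CLI flag
  isSynced: boolean;
  attachedValidators: number;
  stats: ProposalStats;
}): "info" | "debug" {
  const allZero = Object.values(opts.stats).every((v) => v === 0);
  if (opts.disableLogs || !opts.isSynced || opts.attachedValidators === 0 || allZero) {
    return "debug"; // still captured in the debug log file
  }
  return "info";
}
```

This keeps the default info output focused on sync and peer status for nodes with no attached validators, while operators who do propose blocks still see the stats.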

@dapplion (Contributor) left a comment

While the functionality is important, I think this PR breaks a bunch of separation-of-concerns in our code, which makes me very reluctant to approve:

  • Now the archiver class is mixed with the role of the validator monitor
  • The dependency between the archiver and BeaconProposerCache is nasty
  • Having to bloat the fork-choice data structure for a purely metric feature is also not pretty
  • The archiver doing a validator status info log at non-regular intervals is nasty; that should be the responsibility of the validator monitor, see Expose validator monitor via logs #5336

Also, that random `const checkpointState = checkpointStateCache.get(checkpointHex);` in the middle of the archiver is not good, as we want to eventually move to a more black-box style regen and there's no guarantee the state is available at all

I don't have good suggestions for all the points brought up, but the current implementation seems expensive to maintain long term

@g11tech (Contributor, Author) commented May 14, 2023

> While the functionality is important I think this PR breaks a bunch of separation of concerns in our code that makes me very reluctant to approve […]

hmmm, let me suggest ways to address these concerns:

  1. What we need from the state is which proposers in BeaconProposerCache were expected to propose in that epoch; maybe this can be computed and cached in BeaconProposerCache itself during prepareForNextSlot's epoch transition.
  2. Move the computation to the fork-choice, where the finalization event is emitted.
  3. Collect proposerIndex in the block db; imo putting proposerIndex in the fork-choice isn't much bloat, but I can understand the concern.

If you like this approach, I can refactor this
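To make the intended computation concrete, here is a hypothetical sketch (names assumed, not Lodestar's actual API) of cross-referencing expected proposers against finalized and orphaned blocks at finalization time:

```typescript
// Hypothetical sketch: classify each expected proposal as finalized,
// orphaned, or missed by checking which proposers actually landed blocks.
type Stats = {expected: number; finalized: number; orphaned: number; missed: number};

function computeProposalStats(
  expectedProposers: number[], // validator indices expected to propose (assumed from BeaconProposerCache)
  finalizedProposers: Set<number>, // proposers of blocks on the finalized canonical chain
  orphanedProposers: Set<number> // proposers of known non-canonical (orphaned) blocks
): Stats {
  const stats: Stats = {expected: expectedProposers.length, finalized: 0, orphaned: 0, missed: 0};
  for (const proposer of expectedProposers) {
    if (finalizedProposers.has(proposer)) stats.finalized++;
    else if (orphanedProposers.has(proposer)) stats.orphaned++;
    else stats.missed++; // no block from this proposer was seen at all
  }
  return stats;
}
```

The same routine can be run once over all validators and once filtered to attached validators, yielding the two metric groups discussed above.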

@dapplion (Contributor) left a comment

Did a second pass and I think the approach is good. I attempted alternative solutions and they have their own trade-offs, specifically:

  • Register in the validator monitor blocks seen by attached validators
  • Extend the EpochContext to include the previous epoch's proposers
  • Track orphan data at the end of each epoch, cross-referencing those data points

That approach does not require changing the fork-choice but requires another cache in the validator monitor. I think we can explore that later, but for now this PR is good. Also, there's a bug that prevents that approach from always working

@g11tech g11tech merged commit 249aa75 into unstable May 15, 2023
@wemeetagain (Member):

🎉 This PR is included in v1.9.0 🎉

Development

Successfully merging this pull request may close these issues: Metric to track missed blocks

5 participants