-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expensive fork-choice queued attestation mutation #6206
Comments
Nice detective work @jimmygchen ! |
I've done some further investigation and noticed an invalid attestation (bad I think it might be worth running a memtest on the machine. |
Further investigation shows that this cannot happen (see the link above) and @michaelsproul found that the bad attestation's slot actually has 1 bit flipped compared to its correct slot, so it could very well be a memory / disk issue. I couldn't get anything from memtests on that machine. I think we could deploy v5.3.0 to more testnet nodes and see if it happens elsewhere (including slasher), if it doesn't we could probably conclude this bug to be likely a hardware issue, and move this issue out of v5.3.0. |
If this happens more often with nodes that have been running slashers, maybe it is some UB in the slasher code, potentially the C database code! We could try running a slasher node under valgrind |
This undefined behaviour might have surfaced since the recent refactor in #4529, which revealed a bug in the LMDB bindings. See #6211 for a workaround fix. Under normal circumstances the queued attestations vec should be quite small as there are validations to prevent attestations for future slots to be queued. It would be useful to add a metric for this though. |
@jimmygchen this PR adds a metric for the queued attestations vec: Closing this issue as won't fix |
For reference, the metric to check for this issue is cc @chong-he |
Description
Debugging a Holesky node we noted that the split_off here was moving ~1 GB of data every time we process an attestation
lighthouse/consensus/fork_choice/src/fork_choice.rs
Line 258 in c52c598
Questions:
Tackling 2 and 3, we could switch to a
HashMap<Slot, Vec<QueuedAttestation>>
as time strictly advances forward, so each slot Vec is strictly append-only.WIP of this approach: https://github.com/sigp/lighthouse/compare/stable...dapplion:lighthouse:fork-choice-queued-attestations?expand=1
However we should answer 1 too as there may be some other lingering issue
The text was updated successfully, but these errors were encountered: