-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Disallow attestation production earlier than head #2130
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
michaelsproul
approved these changes
Jan 7, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agree with your assessment of the situation. This is a good simplification. That codepath was unlikely to be useful, and getting rid of it removes DoS risk.
4 tasks
paulhauner
added
ready-for-review
The code is ready for review
and removed
work-in-progress
PR is a work-in-progress
labels
Jan 19, 2021
bors r+ |
paulhauner
added
ready-for-merge
This PR is ready to merge.
and removed
ready-for-review
The code is ready for review
labels
Jan 20, 2021
bors bot
pushed a commit
that referenced
this pull request
Jan 20, 2021
## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
Pull request successfully merged into unstable. Build succeeded: |
bors
bot
changed the title
Disallow attestation production earlier than head
[Merged by Bors] - Disallow attestation production earlier than head
Jan 20, 2021
bors bot
pushed a commit
that referenced
this pull request
Jan 20, 2021
## Issue Addressed - Resolves #2064 ## Proposed Changes Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators. Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint. Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored. See the new docs included in this PR for more info. ## TODO - [x] Track validator balance, `slashed` status, etc. - [x] ~~Register slashings in current epoch, not offense epoch~~ - [ ] Publish Grafana dashboard, update TODO link in docs - [x] ~~#2130 is merged into this branch, resolve that~~
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue Addressed
The non-finality period on Pyrmont between epochs
9114
and9182
was contributed to by all thelighthouse_team
validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill thelighthouse bn
process. TheRestart=on-failure
directive forsystemd
caused the process to bounce in ~10-30m intervals.Diagnosis with
heaptrack
showed that theBeaconChain::produce_unaggregated_attestation
function was callingstore::beacon_state::get_full_state
and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated whenlighthouse bn
was killed by the OS.There was no CPU analysis (e.g.,
perf
), but theBeaconChain::produce_unaggregated_attestation
is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.Proposed Changes
BeaconChain::produce_unaggregated_attestation
has two paths:Path (2) is the only path that calls
store::beacon_state::get_full_state
, therefore it is the path causing this excessive CPU/RAM usage.This PR removes the current functionality of path (2) and replaces it with a static error (
BeaconChainError::AttestingPriorToHead
).This change reduces the generality of
BeaconChain::produce_unaggregated_attestation
(and therefore/eth/v1/validator/attestation_data
), but I argue that this functionality is an edge-case and arguably a violation of the Honest Validator spec.It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:
Additional Info
It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.