Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Use advanced state for block production #2241

Closed
wants to merge 6 commits into from

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Mar 2, 2021

Issue Addressed

NA

Proposed Changes

  • Use the pre-states from [Merged by Bors] - Advance state to next slot after importing block #2174 during block production.
    • Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms.
  • Create crit and warn logs when a block is published to the API later than we expect.
    • On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s.
  • Rename some methods on the SnapshotCache for clarity.
  • Add the ability to pass the state root to BeaconChain::produce_block_on_state to avoid computing a state root. This is a very common LH optimization.
  • Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is technically a duplicate of a ValidatorMonitor log, but I wanted to have it for the case where we aren't monitoring validators too.

@paulhauner paulhauner added the work-in-progress PR is a work-in-progress label Mar 2, 2021
@paulhauner paulhauner marked this pull request as ready for review March 3, 2021 04:51
@paulhauner paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Mar 3, 2021
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Super clean, can't fault it 🚀

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 4, 2021
@michaelsproul
Copy link
Member

Bors should succeed now, the cargo-audit issue is fixed on unstable

@paulhauner
Copy link
Member Author

Thank you!!

bors r+

bors bot pushed a commit that referenced this pull request Mar 4, 2021
## Issue Addressed

NA

## Proposed Changes

- Use the pre-states from #2174 during block production.
    - Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms.
- Create `crit` and `warn` logs when a block is published to the API later than we expect.
    - On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s.
- Rename some methods on the `SnapshotCache` for clarity.
- Add the ability to pass the state root to `BeaconChain::produce_block_on_state` to avoid computing a state root. This is a very common LH optimization.
- Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is *technically* a duplicate of a `ValidatorMonitor` log, but I wanted to have it for the case where we aren't monitoring validators too.
@bors bors bot changed the title Use advanced state for block production [Merged by Bors] - Use advanced state for block production Mar 4, 2021
@bors bors bot closed this Mar 4, 2021
michaelsproul pushed a commit that referenced this pull request Mar 10, 2021
## Issue Addressed

NA

## Proposed Changes

- Use the pre-states from #2174 during block production.
    - Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms.
- Create `crit` and `warn` logs when a block is published to the API later than we expect.
    - On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s.
- Rename some methods on the `SnapshotCache` for clarity.
- Add the ability to pass the state root to `BeaconChain::produce_block_on_state` to avoid computing a state root. This is a very common LH optimization.
- Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is *technically* a duplicate of a `ValidatorMonitor` log, but I wanted to have it for the case where we aren't monitoring validators too.
michaelsproul added a commit that referenced this pull request Mar 10, 2021
bors bot pushed a commit that referenced this pull request Mar 17, 2021
## Issue Addressed

Closes #2052

## Proposed Changes

- Refactor the attester/proposer duties endpoints in the BN
    - Performance improvements
    - Fixes some potential inconsistencies with the dependent root fields.
    - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
    - Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
    - Required to reduce the delay on broadcasting new blocks.
    - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
    - Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
    - Unfortunately this has created lots of dust changes.
 - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
 - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
    - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
 - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
    
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments. 

## Observations on Prymont

- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.

## Additional Info

- ~~Blocked on #2241~~
- ~~Blocked on #2234~~

## TODO

- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots


Co-authored-by: pawan <[email protected]>
Co-authored-by: Michael Sproul <[email protected]>
@paulhauner paulhauner deleted the faster-block-prod branch March 17, 2021 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants