
[Merged by Bors] - Run fork choice before block proposal #3168

Closed

Conversation

michaelsproul
Member

@michaelsproul michaelsproul commented May 5, 2022

Issue Addressed

Upcoming spec change ethereum/consensus-specs#2878

Proposed Changes

  1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
  2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), dequeueing attestations for the next slot.
  3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

Additional Info

Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag --fork-choice-before-proposal-timeout 0. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

Implementation

Fork choice gets invoked at the start of the slot via the per_slot_task function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
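The condition-variable signalling described here can be sketched with `Mutex` and `Condvar` from the standard library. This is a hypothetical illustration, not Lighthouse's actual types: the struct and method names are invented, and the timeout values are examples only.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical sketch of the signalling pattern described above (names are
// invented for illustration; this is not Lighthouse's implementation).
struct ForkChoiceSignal {
    latest_slot: Mutex<Option<u64>>,
    condvar: Condvar,
}

impl ForkChoiceSignal {
    fn new() -> Self {
        Self {
            latest_slot: Mutex::new(None),
            condvar: Condvar::new(),
        }
    }

    /// Called by the per-slot timer once fork choice has run for `slot`.
    fn notify_fork_choice_complete(&self, slot: u64) {
        *self.latest_slot.lock().unwrap() = Some(slot);
        self.condvar.notify_all();
    }

    /// Called by block production: wait until fork choice has run for `slot`,
    /// giving up after `timeout`. Returns `true` if fork choice caught up.
    fn wait_for_fork_choice(&self, slot: u64, timeout: Duration) -> bool {
        let guard = self.latest_slot.lock().unwrap();
        let (guard, wait_result) = self
            .condvar
            .wait_timeout_while(guard, timeout, |latest| *latest < Some(slot))
            .unwrap();
        !wait_result.timed_out() && *guard >= Some(slot)
    }
}

fn main() {
    let signal = Arc::new(ForkChoiceSignal::new());
    let timer = Arc::clone(&signal);
    // Simulate the slot timer finishing fork choice shortly after slot start.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        timer.notify_fork_choice_complete(42);
    });
    // Block production blocks until the signal arrives (or the timeout).
    assert!(signal.wait_for_fork_choice(42, Duration::from_millis(500)));
}
```

If the proposer times out waiting, it proceeds anyway and logs a warning, which matches the behaviour of the `--fork-choice-before-proposal-timeout` flag described above.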

@michaelsproul michaelsproul added the work-in-progress PR is a work-in-progress label May 5, 2022
@michaelsproul michaelsproul force-pushed the fork-choice-before-proposal branch from 8a8eb08 to a247b35 Compare May 6, 2022 02:06
@michaelsproul
Member Author

michaelsproul commented May 10, 2022

I've decided we need to move to the fork choice schedule that Prysm uses: always run fork choice at the start of the slot, and then wait for that run to finish (perhaps with a timeout) when proposing a block. As an optimisation we can also run fork choice preemptively in the state advance timer, at 9s into the previous slot.

@michaelsproul
Member Author

michaelsproul commented May 10, 2022

Current TODO:

  • Fix Clippy
  • Make the wait timeout dynamic with a new CLI flag
  • Double-check the full implications for merged networks: forkchoiceUpdated, payload preparation, etc.
  • Test on a merge network

@michaelsproul
Member Author

To establish that the impact of this change on merge networks is minimal, let's consider a few scenarios. We'll assume that the BN needs to propose a block in slot n + 1.

Case 1: Healthy chain, blocks every slot

[n - 1] <- [n] <- [n + 1]
  1. At the start of slot n before the block arrives we run fork choice. We issue one fcU without payload attributes. We skip issuing a second fcU with payload attributes as neither of the trigger conditions is satisfied (we're not in the last 4s of the slot, and the head slot is n - 1, two slots behind the prepare slot n + 1).
  2. The block arrives on time and we run fork choice again. We issue fcU without payload attributes, and then again with payload attributes because the head is now one slot behind the prepare slot. This yields payloadID#1.
  3. In the state advance timer we run fork choice for slot n + 1 at 9s into slot n. We issue fcU again, first without and then with payload attributes. The EL would seemingly be justified in returning a different payload ID here (payloadID#2), which we would then use to overwrite payloadID#1. If so, this is sub-optimal, as it gives the EL less time to prepare a payload. However, this issue (if it exists) also exists in the current flow, where we run fork choice before the state advance.
  4. We run fork choice at the start of slot n + 1. We do not send any payload attributes because we're too early in the slot, and the prepare_slot is now two slots ahead of the head (= n + 2).

Aside from the 2nd payload ID issue, I think this case looks good: we've added just 1 extra fcU relative to what we have currently.
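The fcU decision logic used in this walkthrough can be condensed into a small predicate. This is a hedged sketch (function and constant names are mine, not Lighthouse's) of the two trigger conditions as described in step 1: payload attributes are sent either in the last 4s of the slot, or when the head is the direct parent of the prepare slot.

```rust
// Illustrative only: the two conditions for sending an fcU *with* payload
// attributes, as paraphrased from the Case 1 walkthrough above.
const SECONDS_PER_SLOT: u64 = 12;

fn should_send_payload_attributes(
    seconds_into_slot: u64,
    head_slot: u64,
    prepare_slot: u64,
) -> bool {
    // Condition 1: we're in the last 4s of the slot.
    let in_last_third = seconds_into_slot >= SECONDS_PER_SLOT - 4;
    // Condition 2: the head is exactly one slot behind the prepare slot.
    let head_is_parent_of_prepare = head_slot + 1 == prepare_slot;
    in_last_third || head_is_parent_of_prepare
}

fn main() {
    // Step 1: start of slot n (= 10), head = n - 1, prepare slot = n + 1.
    assert!(!should_send_payload_attributes(0, 9, 11));
    // Step 2: block n arrives, head = n, prepare slot = n + 1.
    assert!(should_send_payload_attributes(4, 10, 11));
    // Step 3: 9s into slot n, i.e. in the last third of the slot.
    assert!(should_send_payload_attributes(9, 9, 11));
}
```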

Case 2: Skip slot at n

[n - 1] <-- _ <-- [n + 1]

Same as Case 1 above, except that we don't issue any fcU for the block arriving (because it doesn't). We'll issue an fcU with payload attributes at 9s into slot n, and then propose a payload based on that.

Case 3: Re-org from n to n - 1

[n - 2] <-- [n - 1] <--  _  <-- [n + 1]
      ^---------------- [n]

This is the case this PR is designed to help with.

  1. At the start of slot n we issue fcU with n - 1 as head. No payload attributes.
  2. Block n arrives and becomes head, we issue fcU with n as head, and we issue a second fcU with payload attributes for a payload building atop n.
  3. In the state advance we run fork choice taking into account the slot n attestations and we re-org to slot n - 1 based on their weight. We issue two new fcU messages for n - 1, including one with payload attributes for building atop n - 1 (because we're later than 8s in the slot). The payload ID cache is keyed by the head block hash, timestamp and other parameters which prevent a collision with the payload ID from step 2.
  4. We issue another fcU at the start of slot n + 1 with n - 1 as head. No payload attributes.
  5. We propose a block atop n - 1 at slot n + 1, using the payload prepared 3+ seconds earlier in step 3.
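The keying property relied on in step 3 can be sketched minimally as follows. The field set and names here are illustrative (the real cache includes more payload attributes): because the head block hash is part of the key, payloads prepared atop n and atop n - 1 occupy distinct entries and cannot collide.

```rust
use std::collections::HashMap;

// Illustrative payload ID cache key: the head block hash distinguishes
// payloads built atop different heads at the same timestamp.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
struct PayloadCacheKey {
    head_block_hash: [u8; 32],
    timestamp: u64,
    // ...other payload attributes (fee recipient, prev_randao, etc.)
}

fn main() {
    let mut cache: HashMap<PayloadCacheKey, u64> = HashMap::new();
    // Key for building atop block n.
    let key_n = PayloadCacheKey { head_block_hash: [0xaa; 32], timestamp: 100 };
    // Key for building atop block n - 1 after the re-org: same timestamp,
    // different head block hash, therefore a different cache entry.
    let key_n_minus_1 = PayloadCacheKey { head_block_hash: [0xbb; 32], timestamp: 100 };
    cache.insert(key_n.clone(), 1); // payloadID#1 atop n
    cache.insert(key_n_minus_1.clone(), 2); // payload ID atop n - 1
    assert_eq!(cache.get(&key_n), Some(&1));
    assert_eq!(cache.get(&key_n_minus_1), Some(&2));
}
```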

Case 4: Re-org from n - 1 to n

This is the same as Case 3 except that there's no re-org in step 3, so we re-issue fcUs for the same head at slot n. This suffers from the same payload-forgetfulness issue as Case 1, because we get two payload IDs for building atop block n.

Case 5: Late block in slot n

With proposer boost re-org:

[n - 1] <--  _  <-- [n + 1]
      ^---- [n]

Without proposer boost re-org:

[n - 1] <-- [n] <-- [n + 1]
  1. Start of slot n, issue fcU for n - 1 as usual. No payload attributes.
  2. State advance runs before the block arrives: issue fcU for n - 1 again, with payload attributes to build atop n - 1.
  3. Block n arrives and becomes head (regardless of proposer boost). Issue fcU for n with payload attributes to build atop n.
  4. Fork choice runs at the start of slot n + 1 and block n is still head, so we issue another fcU without payload attributes for n. As the proposer we then have a choice between:
    a) Re-org out the block at slot n using proposer boost ([Merged by Bors] - Enable proposer boost re-orging #2860). We fetch the payload ID for the parent block at slot n - 1 that was prepared in step 2.
    b) Don't re-org out the block at slot n. We fetch the payload ID for the parent block n that was prepared in step 3.

Both variants of this case seem fine as well.

@paulhauner paulhauner self-requested a review May 17, 2022 06:18
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels May 17, 2022
@michaelsproul
Member Author

In the last week there was one instance of a warning from this branch being tripped on Prater, and it was on the notoriously flaky Singapore node while it was overwhelmed doing a bunch of other stuff:

May 16 14:57:48 prater-sin-bv0 lighthouse-bn[2945737]: May 16 14:57:48.258 WARN Timed out waiting for fork choice before proposal, message: this block may be orphaned, service: beacon
May 16 14:57:55 prater-sin-bv0 lighthouse-bn[2945737]: May 16 14:57:55.170 CRIT Block was broadcast too late root: 0xb84dc63c75611188390bc4d1fd043e7e01a1f998f00ccdd712c88e8adfde0dab, slot: 3017089, delay_ms: 5225, msg: system may be overloaded, block likely to be orphaned

Even so, the timeout was effective, and the block managed to become part of the canonical chain: https://prater.beaconcha.in/block/3017089

@michaelsproul
Member Author

Based on out-of-band discussions with @paulhauner I've just pushed two commits, 06908ad and 9808754, which move the preemptive fork choice run to 500ms before the start of the next slot, rather than 3s before. This has the following advantages:

  • Allows block proposals up to 500ms before the start of the slot. Previously such block proposals would skip waiting for fork choice and log a warning. Now they'll only log a warning if they time out waiting for fork choice (i.e. they're made more than 750ms before their slot).
  • Defensively reduces the time during which fork choice is "time warped" (ahead of the wall-clock slot).

The disadvantages are:

  • Slightly increased complexity in the state advance timer.
  • If fork choice takes longer than 500ms, it delays block proposal more than an equally slow run at -3s would. This seems unlikely, and we should make sure fork choice never takes much longer than 500ms. The timeout from block proposal also caps how late the proposal can be.
  • On merge networks, we may give the EL less time to prepare a payload in case of a re-org.

In commit 06908ad I also fixed an issue that occurs when the state advance runs past the end of the slot in which it started, which would cause it to skip the following slot and wait for the next one. I'm not strongly attached to this change and would be happy to revert the offset to its previous value, which assumes the state advance always completes before the end of the slot.
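The schedule change above is just offset arithmetic over the 12s slot; a tiny sketch (constant and function names are mine) makes the before/after offsets concrete.

```rust
use std::time::Duration;

// Illustrative arithmetic only: a lead time before the *next* slot maps to
// an offset from the start of the *current* slot.
const SLOT_DURATION: Duration = Duration::from_secs(12);

fn run_offset(lead_time_before_next_slot: Duration) -> Duration {
    SLOT_DURATION - lead_time_before_next_slot
}

fn main() {
    // Old schedule: 3s before the next slot = 9s into the current slot.
    assert_eq!(run_offset(Duration::from_secs(3)), Duration::from_secs(9));
    // New schedule: 500ms before the next slot = 11.5s into the current slot.
    assert_eq!(run_offset(Duration::from_millis(500)), Duration::from_millis(11_500));
}
```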

Member

@paulhauner paulhauner left a comment


LGTM! 🚀

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 20, 2022
@paulhauner
Member

bors r+

bors bot pushed a commit that referenced this pull request May 20, 2022
@bors

bors bot commented May 20, 2022

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request May 20, 2022
@michaelsproul
Member Author

bors r-

gotta fix udeps

@bors

bors bot commented May 20, 2022

Canceled.

@michaelsproul
Member Author

bors r+

bors bot pushed a commit that referenced this pull request May 20, 2022
@bors bors bot changed the title Run fork choice before block proposal [Merged by Bors] - Run fork choice before block proposal May 20, 2022
@bors bors bot closed this May 20, 2022
@michaelsproul michaelsproul deleted the fork-choice-before-proposal branch May 20, 2022 07:03
bors bot pushed a commit that referenced this pull request May 20, 2022
*This PR was adapted from @pawanjay176's work in #3197.*

## Issue Addressed

Fixes a regression in #3168

## Proposed Changes

#3168 added calls to `fork_choice` in the `BeaconChain::per_slot_task` function. This leads to a panic because `per_slot_task` is called from an async context, and fork choice in turn calls `block_on`.

This PR changes the timer to call the `per_slot_task` function in a blocking thread.

Co-authored-by: Pawan Dhananjay <[email protected]>
bors bot pushed a commit that referenced this pull request May 25, 2022
## Issue Addressed

Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync:

```
May 24 23:06:40.542 WARN Error signalling fork choice waiter     slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon
```

This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync.

Additionally, these parallel fork choice runs were causing issues in the database:

```
May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon
May 24 07:49:05.101 WARN Database migration failed               error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon
```

In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ :flushed:. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (#3175).

## Proposed Changes

* Don't run per-slot fork choice during sync (if head is older than 4 slots)
* Don't run state-advance fork choice during sync (if head is older than 4 slots)
* Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
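The monotonic-finalization check proposed in the last bullet could look roughly like this. The types and names below are invented for illustration (the real migrator surfaces errors like `HotColdDBError::FreezeSlotError`): the migrator should refuse any update that would move the split slot backwards.

```rust
// Illustrative defensive check: finalization (the hot/cold split slot) must
// only ever advance, never move backwards.
#[derive(Debug, PartialEq)]
enum MigrationError {
    NonMonotonicFinalization { current: u64, proposed: u64 },
}

fn check_monotonic(
    current_split_slot: u64,
    proposed_split_slot: u64,
) -> Result<(), MigrationError> {
    if proposed_split_slot < current_split_slot {
        Err(MigrationError::NonMonotonicFinalization {
            current: current_split_slot,
            proposed: proposed_split_slot,
        })
    } else {
        Ok(())
    }
}

fn main() {
    // The out-of-order case from the logs: split at 92608, proposal at 92576.
    assert!(check_monotonic(92608, 92576).is_err());
    // A normal forward advance is accepted.
    assert!(check_monotonic(92608, 92640).is_ok());
}
```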