Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Epoch Extensions in HotStuff Committee #6154

Merged
merged 36 commits into from
Jul 9, 2024

Conversation

jordanschalm
Copy link
Member

@jordanschalm jordanschalm commented Jun 25, 2024

This PR updates the Consensus Committee to support epoch extensions. Addresses #5730.

Changes

  • Modifies protocol.Epoch FinalView method to account for any extensions.
  • Modifies committees.Consensus to handle EpochExtended protocol event.
    When we observe an epoch extension, we simply re-compute the leader selection for the entire epoch, including the new view range added by the extension. This is slightly wasteful, incurring a cost of around 20-30ms per extension on Mainnet, but simplifies the business logic and state quite a bit compared to tracking each extension independently.
  • Moves internal event processing to a shared channel
    Now that we can move into and out of EFM several times over the lifecycle of the component, it is easier to reason about correctness if we use a shared channel for events to enforce in-order processing. The event consumer function prepares a (potentially blocking) event handler function to be executed and passes the closure into the channel. The worker goroutine pulls event handler closures off the channel and executes them in order.

@codecov-commenter
Copy link

codecov-commenter commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 61.15702% with 47 lines in your changes missing coverage. Please review.

Project coverage is 41.70%. Comparing base (9c3ed48) to head (43754d1).

Files Patch % Lines
...nsensus/hotstuff/committees/consensus_committee.go 74.39% 16 Missing and 5 partials ⚠️
consensus/hotstuff/committees/leader/consensus.go 0.00% 13 Missing ⚠️
state/protocol/inmem/epoch.go 31.25% 10 Missing and 1 partial ⚠️
state/protocol/badger/snapshot.go 77.77% 2 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                    @@
##           feature/efm-recovery    #6154      +/-   ##
========================================================
+ Coverage                 41.68%   41.70%   +0.01%     
========================================================
  Files                      1975     1975              
  Lines                    139306   139258      -48     
========================================================
- Hits                      58072    58071       -1     
+ Misses                    75182    75139      -43     
+ Partials                   6052     6048       -4     
Flag Coverage Δ
unittests 41.70% <61.15%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jordanschalm jordanschalm marked this pull request as ready for review June 27, 2024 18:58
@jordanschalm jordanschalm requested review from durkmurder, AlexHentschel and kc1116 and removed request for durkmurder June 27, 2024 18:58
@jordanschalm jordanschalm changed the title WIP: Support Epoch Extensions in HotStuff Committee Support Epoch Extensions in HotStuff Committee Jun 27, 2024
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

started to take a look, but didn't have much time. First set of comments.

Copy link
Member

@durkmurder durkmurder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, nicely done!

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great work. Enjoyed reading the PR. Very clean code and detailed, clear documentation 👏. My comments are mostly suggestions for further refining code clarity, comments and documentation. Structurally, I'd like to suggest the following revisions

  1. reduce surface for inconsistent inputs for method epochInfo.recomputeLeaderSelectionForExtendedViewRange as described in my comment right below
  2. I think adding the sanity check as described here would be good
  3. testing a broader range of views for the committee.Consensus (see comment here, here, here and here)
  4. I have major reservations about including EpochExtensions in the setupEpoch as explained in this comment
    • While I think it is beyond the scope of this PR, I would like to explore if we can remove setupEpoch entirely (also explained in this comment). Could you please make sure an issue exists and add a links to my comments (i), (ii), (iii), (iv)

consensus/hotstuff/committees/consensus_committee.go Outdated Show resolved Hide resolved
consensus/hotstuff/committees/consensus_committee.go Outdated Show resolved Hide resolved
consensus/hotstuff/committees/consensus_committee.go Outdated Show resolved Hide resolved
consensus/hotstuff/committees/consensus_committee.go Outdated Show resolved Hide resolved
consensus/hotstuff/committees/consensus_committee.go Outdated Show resolved Hide resolved
state/protocol/badger/mutator_test.go Show resolved Hide resolved
Comment on lines +45 to 52
// setupEpoch is an implementation of protocol.Epoch backed by an EpochSetup service event.
// Includes any extensions which have been included as of the reference block.
// This is used for converting service events to inmem.Epoch.
type setupEpoch struct {
// EpochSetup service event
setupEvent *flow.EpochSetup
extensions []flow.EpochExtension
}
Copy link
Member

@AlexHentschel AlexHentschel Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ I find this a problematic structure, because it is inconsistent with the protocol definition: only committed epochs can have extensions. Previously (without extensions), our model was quite a bit simpler. But now with extension driving up the mental complexity I think it becomes very important to have a clear consistent implementation.

In my opinion, setupEpoch simply cannot be a valid representation of protocol.Epoch as it misses a bunch of the important information. We have the convention that an epoch essentially only starts to exist after the EpochCommit Service event has been finalized. I think we had the plan to remove any exposure of uncommitted epochs from the major APIs. Though, NewSetupEpoch is still used and returned in some of the broadly-used interfaces.

  • For the scope of this PR, I would not include extensions in the setupEpoch because it further increases the inconsistency of protocol definition and implementation. Instead, I would include the extensions into committedEpoch and override the FinalView() method inherited from setupEpoch.
  • Not sure if we have already an issue for removing setupEpoch as a return value from the EpochQuery API and only return epochs once they are committed. Is such issue already exists, maybe add a reference to this comment. If not, would you mind creating such issue please? 🙏 🙇

Copy link
Member Author

@jordanschalm jordanschalm Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only committed epochs can have extensions

👍 Agree, will move the extensions to committedEpoch.

We have the convention that an epoch essentially only starts to exist after the EpochCommit Service event has been finalized

Epoch APIs cannot entirely avoid exposing data for un-committed epochs, because the node software requires access to a subset of un-committed epoch data in order to commit that epoch, by completing the DKG and generating root cluster QCs.

There are other ways we can improve the epoch APIs, but we will need something similar to the progressive disclosure of setupEpoch and committedEpoch. For example, we could hide some info from the EpochSetup event until the epoch is committed (like the view range), and we could use different interface types to represent the two states.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this tech debt issue as a place to capture these ideas for improving the API: #6191

}

func (eq Epochs) Next() protocol.Epoch {
switch eq.entry.EpochPhase() {
case flow.EpochPhaseStaking, flow.EpochPhaseFallback:
return invalid.NewEpoch(protocol.ErrNextEpochNotSetup)
case flow.EpochPhaseSetup:
return NewSetupEpoch(eq.entry.NextEpochSetup)
return NewSetupEpoch(eq.entry.NextEpochSetup, eq.entry.NextEpoch.EpochExtensions)
Copy link
Member

@AlexHentschel AlexHentschel Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this should yield an error ErrNextEpochNotCommitted

return invalid.NewEpoch(protocol.ErrNextEpochNotCommitted)

Though, this would require an update of the EpochQuery API ... so maybe add a todo to the issue is mentioned in my previous comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 248 to +249
// No errors are expected during normal operations.
func NewSetupEpoch(setupEvent *flow.EpochSetup) protocol.Epoch {
func NewSetupEpoch(setupEvent *flow.EpochSetup, extensions []flow.EpochExtension) protocol.Epoch {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this method should be deprecated for the scope of this Pr, including a TODO to remove it entirely

Suggested change
// No errors are expected during normal operations.
func NewSetupEpoch(setupEvent *flow.EpochSetup) protocol.Epoch {
func NewSetupEpoch(setupEvent *flow.EpochSetup, extensions []flow.EpochExtension) protocol.Epoch {
//
// DEPRECATED and TO BE REMOVED, because setupEpoch simply cannot be a valid representation of protocol.Epoch
// as it misses a bunch of the important information. We have updated our convention such that an epoch
// essentially only starts to exist after the EpochCommit Service event has been finalized. The plan is to
// remove any exposure of uncommitted epochs from the `EpochQuery` API.
//
// No errors are expected during normal operations.
func NewSetupEpoch(setupEvent *flow.EpochSetup) protocol.Epoch {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -419,12 +419,12 @@ func (q *EpochQuery) Next() protocol.Epoch {
// if we are in setup phase, return a SetupEpoch
nextSetup := entry.NextEpochSetup
if phase == flow.EpochPhaseSetup {
return inmem.NewSetupEpoch(nextSetup)
return inmem.NewSetupEpoch(nextSetup, entry.NextEpoch.EpochExtensions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would suggest adding TODO to indicate that epochs will only be exposed once they are committed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jordanschalm jordanschalm merged commit edc301d into feature/efm-recovery Jul 9, 2024
55 checks passed
@jordanschalm jordanschalm deleted the jord/5730-hotstuff-committee branch July 9, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants