Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic Protocol State] KV store state machine #5513

Conversation

durkmurder
Copy link
Member

@durkmurder durkmurder commented Mar 5, 2024

#5312

Context

This PR implements a dedicated state machine in context of Dynamic Protocol State to process service events that update the KV store.

This PR relies on previous PR created by Jordan that introduces new service event type for upgrading protocol state version.

  • I've added processing of the upgrade event by the state machine.
  • Additionally I had to extend model interface to return VersionUpgrade and be able to set it.
  • All models MUST support this notion to have the KV store upgradable.

I've took the liberty to update package structure to avoid nasty import cycles and to increase modularity in overall.

⚠️
I've made a debatable choice of using EpochCommitSafetyThreshold as threshold for activation of upgrade event. I feel this is legit since it serves the same purpose but maybe we should change the naming then. Happy to use a separate param as well.

jordanschalm and others added 30 commits February 20, 2024 12:20
Change to encode/decode passes tests, still requires cleanup:
- adds encode/decode for new event type
- adds tests for new event type
- changes how decoding is implemented to prevent a bug. Previously we
  would re-encode, then re-decode the entire service event, after first
  decoding it into a weakly typed map. In particular with JSON, this
  could cause the data to change between encode/decode cycles, I think because
  of numeric type sizes.
…version upgarde fields. Reflected that in API and Reader
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ver nice work. Mostly comments regarding documentation. My only design concern is regarding the KVStoreAPI's Clone method. Added a detailed comment and suggestion here.

state/protocol/protocol_state/epochs/base_statemachine.go Outdated Show resolved Hide resolved
state/protocol/protocol_state/state/mutator.go Outdated Show resolved Hide resolved
state/protocol/protocol_state/kvstore.go Outdated Show resolved Hide resolved
state/protocol/protocol_state/kvstore.go Outdated Show resolved Hide resolved
state/protocol/protocol_state/kvstore.go Outdated Show resolved Hide resolved
state/protocol/protocol_state/kvstore.go Outdated Show resolved Hide resolved
Comment on lines 68 to 70
// Clone returns a deep copy of the KVStoreAPI.
// This is used to create a new instance of the KVStoreAPI to avoid mutating the original.
Clone() KVStoreAPI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️

Where do the version upgrades happen, i.e. switching from a prior data model / protocol version to a newer? For example, consider the following scenario:

  • We are the leader constructing the block at ActivationView. So the parent's protocol state is still the old data model. But we are now switching to the newer data model.

  • Lets assume the new protocol version / data model adds a new filed X that wasn't there before.

  • I get the intend: we are copying the parent block's protocol state 👍

    • However, to "clone" generally means to create an identical copy. In other words, when we clone the parent state, we should get (according to the name) and interface backed by the prior data model that still does not know X
    • But now, we are to generate a Protocol State Snapshot with the new data model / new protocol version that can have a value for X

    So where is that conversion happening?

  • A further complicating factor is that various state machines might need to have their data converted at the same view.

Suggestions

  • I think we need a more generalized concept compared to "clone". In my head it is something like the following:

    • Given a Protocol State Snapshot of version v (represented here abstractly as through the KVStoreAPI interface), we want the ability to create a new Protocol State Snapshot of version w. Lets use the terminology of "Parent Snapshot" and "Child Snapshot" for now.
    • In the most simple case, we desire all existing entries in Child Snapshot to have the same value as the Parent Snapshot.
    • BUT, there could be data conversion necessary in some cases.
    • Lets assume that value X is required in the new protocol version, and we maybe can't just use a fixed constant. Instead, we need to compute something to get a legitimate starting value for X

    In my head, this sounds much more like a "replication" than a "clone". If my understanding of the word "replication" is correct, they are very similar in their outwards form and shape but can differ by minor internal details ... exactly the situation we have here.

  • Happy with other suggestions for terminology, but in absence of a better term I would suggest to name this method Replicate

  • As I have explained, I think we need to represent the concept of changing from one protocol version to a newer one. So "create a protocol state snapshot of a newer protocol version (i.e. the child) and replicate the content of the parent snapshot into it".

    • The version of the protocol state data model should be an input to the Replicate method. Something like
      Replicate(protocolVersion uint64)
    • I would like to accommodate scenarios where the Replicate step still requires further inputs, or work, or mutations to result in the valid Protocol State Snapshot according to the new protocol version. In other words, I would like the code to be very explicit that the return value from Replicate call is not necessarily valid as is. We generally do this though builder pattern.

In summary, my suggestion would look something like the following (tried to already add documentation, so its less work for you):

// KVStoreAPI is the latest read-writer interface to the Protocol State key-value store.
// It is backward-compatible with all versioned model types defined in the models.go
// for this software version.
type KVStoreAPI interface {
	Reader

	// Replicate instantiates a Protocol State Snapshot of the given `protocolVersion`.
	// We reference to the Protocol State Snapshot, whose `Replicate` method is called
	// as the 'Parent Snapshot'.
	// If the `protocolVersion` matches the version of the Parent Snapshot, `Replicate` behaves
	// exactly like a deep copy. If `protocolVersion` is newer, the data model corresponding
	// to the newer version is used and values from the Parent Snapshot are replicated into
	// the new data model. In all cases, the new Snapshot can be mutated without changing the
	// Parent Snapshot.
	//
	// Caution:
	// Since we are potentially converting from the older protocol version of the parent to
	// a newer version, it is possible that the replicated state snapshot is incomplete
	// or otherwise requires further inputs or mutations to result in the valid Protocol
	// State Snapshot according to the new protocol version. Therefore, we represent outcome
	// of the replication as a `KVStoreMutator`
	// Expected errors during normal operations:
	//  - ErrIncompatibleVersionChange if replicating the Parent Snapshot into a Snapshot
	//    with the specified `protocolVersion` is not supported.
	Replicate(protocolVersion uint64) (KVStoreMutator, error)
}

// KVStoreMutator represents a Protocol State Snapshot in an iterim format. It might be
// incomplete or otherwise requires further inputs or mutations to evolve into a valid Protocol
// State Snapshot according to its protocol version.
// After all operations required to complete the conversion have been applied and furthermore all
// Service Events in the current block were processed as well, a valid Protocol State Snapshot
// can be generated via the `Build` method.
type KVStoreMutator interface {
	Reader

	// v0

	// SetVersionUpgrade sets the protocol upgrade version. This method is used
	// to update the Protocol State version when a flow.ProtocolStateVersionUpgrade is processed.
	// It contains the new version and the view at which it has to be applied.
	SetVersionUpgrade(version *ViewBasedActivator[uint64])

	// v1

	// SetInvalidEpochTransitionAttempted sets the epoch fallback mode flag.
	// Errors:
	//  - ErrKeyNotSupported if the key does not exist in the current model version.
	SetInvalidEpochTransitionAttempted(attempted bool) error

	// Build returns the Snapshot of the updated key-value store together with its hash commitment `snapshotID`.
	// The flag `hasChanges` indicates whether there were any changes (version and/or content) compared to its
	// Parent Snapshot.
	// `Build` may apply consistency and other checks:
	//  * In particular we recommend that `Build` enforces that all mandatory new fields have proper values,
	//    in case the Snapshot has a newer protocol version compared to its parent.
	//  * However, `Build` does not necessarily guarantee that only valid Snapshots are returned. It is the
	//    responsibility of the respective state machines to guarantee protocol-compliant evolution of the
	//    Protocol State.
	// Errors:
	//  * MissingValueError if some required value is still missing.
	//  * NoncompliantStateError if some values failed a consistency or sanity check.
	Build() (updatedSnapshot KVStoreAPI, snapshotID flow.Identifier, hasChanges bool, err error)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for detailed explanation and argumentation, the more I read it, the more I tend to get rid of Clone/Replicate as I have implemented it in the first place. I feel like we are mixing two concepts, mutating data and creation process.
I have tried to replicate the same structure as we have with epoch protocol state but it doesn't seem to work out. Meaning the state machine doesn't have enough context to provide a correct version to make a replication with correct version even if it had we are mixing creation pattern and behavior(factory + state machine).

In my mind it works well where the stateMutator acts as a factory and creates a correct versioned KVStoreMutator which is then passed to the state machine. Similarly to what we have with the epoch state machine. This way we separate creation and mutation. This means we don't need Replicate or Clone as part of API and the state machine won't take a copy on it's own, instead it will rely on the instance that will be received via constructor from a component that has needed context to create it.

Lets continue this discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of sync (Yurii, Jordan, Alex, March 12, 2024)

  1. Replicate(..) or Clone must produce valid Snapshot, because in many blocks there are no service events
  2. from 1. it follows that we don't need builder pattern
  3. extending stateMutator to be the orchestrator of all state machines (currently only orchestrates Epoch State machine)
    • stateMutator will call Replicate
    • state machines are applied, they mutate the combined state in-place
    • we Build method in stateMutator where we could add sanity checks

We want to move the entire Epoch data into the KV-store, but that is beyond the current deliverables.

// A separate instance is created for each block that is being processed.
type KeyValueStoreStateMachine interface {
// Build returns updated key-value store model, state ID and a flag indicating if there were any changes.
Build() (updatedState KVStoreReader, stateID flow.Identifier, hasChanges bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, it would be great if the Build method returned a representation of the protocol state snapshot that permits mutations. So specifically KVStoreAPI. For instance, if we cache the results to apply the state-changing events from the next block (one we receive it), we would want them represented via the mutable KVStoreAPI (hope that makes sense).

Suggested change
Build() (updatedState KVStoreReader, stateID flow.Identifier, hasChanges bool)
Build() (updatedState KVStoreAPI, snapshotID flow.Identifier, hasChanges bool)

Base automatically changed from jordan/5351-upgrade-service-event to feature/protocol-state-kvstore March 11, 2024 21:01
@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 80.53097% with 22 lines in your changes are missing coverage. Please review.

Project coverage is 55.97%. Comparing base (4d20066) to head (ba1f763).

Files Patch % Lines
...te/protocol/protocol_state/kvstore/statemachine.go 65.51% 16 Missing and 4 partials ⚠️
state/protocol/protocol_state/kvstore/models.go 93.10% 2 Missing ⚠️
Additional details and impacted files
@@                        Coverage Diff                         @@
##           feature/protocol-state-kvstore    #5513      +/-   ##
==================================================================
- Coverage                           56.02%   55.97%   -0.05%     
==================================================================
  Files                                1025     1019       -6     
  Lines                               98784    98142     -642     
==================================================================
- Hits                                55339    54932     -407     
+ Misses                              39303    39097     -206     
+ Partials                             4142     4113      -29     
Flag Coverage Δ
unittests 55.97% <80.53%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the diff since our last discussion. Looks good overall -- I have added a suggestion for how to organize the upgrade logic (in Replicate vs in Replicates caller), although I am happy to punt that discussion to the PR where we hook up the state machine.

// Expected errors during normal operations:
// - ErrIncompatibleVersionChange if replicating the Parent Snapshot into a Snapshot
// with the specified `protocolVersion` is not supported.
func (model *modelv0) Replicate(protocolVersion uint64) (protocol_state.KVStoreMutator, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed having Replicate live alongside the models and be an "atomic" process, in the sense that we will strictly return a valid KVStore model from Replicate.

In my mind that constraint includes making sure that a KVStore which has been upgraded to protocol version N does not include a pending VersionUpgrade for protocol version N. However we're not doing that here.

I'm wondering if Replicate should instead receive as input the block view, and internally decide whether an upgrade is needed, based on the VersionUpgrade field?

That way, the higher-level logic would not need to deal with:

  • deciding when the version becomes active
  • removing the old VersionUpgrade following the upgrade

Based on your comment here, I assume your plan is to keep outdated VersionUpgrades in the protocol state until they are replaced. If we choose that route, this comment is mostly irrelevant (we could still choose to pass in view rather than protocolVersion to Replicate).

Overall, I'm happy to merge this as-is and make a final decision when we hook up the state machine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about cleaning VersionUpgrade when we replicate, but the process still waits for trigger. Happy to discuss after. For now leaving as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you moved the epoch-related state machines into the dedicated package epochs.

For clarity of naming, I would suggest to consolidate the names. That might be better to do in a separate PR do reduce conflicts and avoid further growing this PR.

My thoughts:

  • the package hierarchy protocol_state/epochs is already very clear what this logic is about
    • therefore, I think the terms protocol and epoch in the file names and struct names are largely redundant
    • I think the differentiation between happy path and fallback is the most important in this package
  • Specifically, I would suggest the following renaming:
    1. rename file state/protocol/protocol_state/epochs/protocol_statemachine.gohappypath_statemachine.go (similarly the corresponding test file)
    2. rename struct protocolStateMachinehappyPathStateMachine
    3. rename file state/protocol/protocol_state/epochs/epoch_fallback_statemachine.gofallback_statemachine.go (similarly the corresponding test file)
    4. rename struct EpochFallbackStateMachineFallbackStateMachine
    5. the interface ProtocolStateMachine in state/protocol/protocol_state/epochs/base_statemachine.go, I would suggest to rename to EpochStateMachine and move it into its own file statemachine.go
    6. the struct baseProtocolStateMachine I would rename to baseStateMachine
      In short, in all cases, we remove epoch and protocol from the struct and file names, because this is already contained in the package hierarchy

As a side comment:
noticed that EpochFallbackStateMachine struct is exported while protocolStateMachine is not. That is fine, but if we can make it consistent, that would be even better! 😅 (feel free to ignore)

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are on the architecture level. I think this PR is good to go. I would like to iterate on the design to get to more modularity and a clearer representation of the conceptual aspects. I felt a variety of detailed comments (partially on existing discussions).

If we could get this issued up, that would be great. But before we write issues, let's make sure we are on the same page regarding the design outlook.

@durkmurder durkmurder merged commit 6fbc7d6 into feature/protocol-state-kvstore Mar 19, 2024
50 of 51 checks passed
@durkmurder durkmurder deleted the yurii/5312-key-value-store-statemachine branch March 19, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants