-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynamic Protocol State] KV store state machine #5513
[Dynamic Protocol State] KV store state machine #5513
Conversation
Change to encode/decode passes tests, still requires cleanup: - adds encode/decode for new event type - adds tests for new event type - changes how decoding is implemented to prevent a bug. Previously we would re-encode, then re-decode the entire service event, after first decoding it into a weakly typed map. In particular with JSON, this could cause the data to change between encode/decode cycles, I think because of numeric type sizes.
… Added base interface for KV store.
…m/onflow/flow-go into yurii/5312-key-value-store-statemachine
…m/onflow/flow-go into yurii/5312-key-value-store-statemachine
…version upgarde fields. Reflected that in API and Reader
…hub.com/onflow/flow-go into yurii/5312-key-value-store-statemachine
Co-authored-by: Jordan Schalm <[email protected]>
Co-authored-by: Jordan Schalm <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ver nice work. Mostly comments regarding documentation. My only design concern is regarding the KVStoreAPI
's Clone
method. Added a detailed comment and suggestion here.
// Clone returns a deep copy of the KVStoreAPI. | ||
// This is used to create a new instance of the KVStoreAPI to avoid mutating the original. | ||
Clone() KVStoreAPI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓⚠️ ❓
Where do the version upgrades happen, i.e. switching from a prior data model / protocol version to a newer? For example, consider the following scenario:
-
We are the leader constructing the block at
ActivationView
. So the parent's protocol state is still the old data model. But we are now switching to the newer data model. -
Lets assume the new protocol version / data model adds a new filed
X
that wasn't there before. -
I get the intend: we are copying the parent block's protocol state 👍
- However, to "clone" generally means to create an identical copy. In other words, when we clone the parent state, we should get (according to the name) and interface backed by the prior data model that still does not know
X
- But now, we are to generate a Protocol State Snapshot with the new data model / new protocol version that can have a value for
X
So where is that conversion happening?
- However, to "clone" generally means to create an identical copy. In other words, when we clone the parent state, we should get (according to the name) and interface backed by the prior data model that still does not know
-
A further complicating factor is that various state machines might need to have their data converted at the same view.
Suggestions
-
I think we need a more generalized concept compared to "clone". In my head it is something like the following:
- Given a Protocol State Snapshot of version
v
(represented here abstractly as through theKVStoreAPI
interface), we want the ability to create a new Protocol State Snapshot of versionw
. Lets use the terminology of "Parent Snapshot" and "Child Snapshot" for now. - In the most simple case, we desire all existing entries in Child Snapshot to have the same value as the Parent Snapshot.
- BUT, there could be data conversion necessary in some cases.
- Lets assume that value
X
is required in the new protocol version, and we maybe can't just use a fixed constant. Instead, we need to compute something to get a legitimate starting value forX
In my head, this sounds much more like a "replication" than a "clone". If my understanding of the word "replication" is correct, they are very similar in their outwards form and shape but can differ by minor internal details ... exactly the situation we have here.
- Given a Protocol State Snapshot of version
-
Happy with other suggestions for terminology, but in absence of a better term I would suggest to name this method
Replicate
-
As I have explained, I think we need to represent the concept of changing from one protocol version to a newer one. So "create a protocol state snapshot of a newer protocol version (i.e. the child) and replicate the content of the parent snapshot into it".
- The version of the protocol state data model should be an input to the
Replicate
method. Something likeReplicate(protocolVersion uint64)
- I would like to accommodate scenarios where the
Replicate
step still requires further inputs, or work, or mutations to result in the valid Protocol State Snapshot according to the new protocol version. In other words, I would like the code to be very explicit that the return value fromReplicate
call is not necessarily valid as is. We generally do this though builder pattern.
- The version of the protocol state data model should be an input to the
In summary, my suggestion would look something like the following (tried to already add documentation, so its less work for you):
// KVStoreAPI is the latest read-writer interface to the Protocol State key-value store.
// It is backward-compatible with all versioned model types defined in the models.go
// for this software version.
type KVStoreAPI interface {
Reader
// Replicate instantiates a Protocol State Snapshot of the given `protocolVersion`.
// We reference to the Protocol State Snapshot, whose `Replicate` method is called
// as the 'Parent Snapshot'.
// If the `protocolVersion` matches the version of the Parent Snapshot, `Replicate` behaves
// exactly like a deep copy. If `protocolVersion` is newer, the data model corresponding
// to the newer version is used and values from the Parent Snapshot are replicated into
// the new data model. In all cases, the new Snapshot can be mutated without changing the
// Parent Snapshot.
//
// Caution:
// Since we are potentially converting from the older protocol version of the parent to
// a newer version, it is possible that the replicated state snapshot is incomplete
// or otherwise requires further inputs or mutations to result in the valid Protocol
// State Snapshot according to the new protocol version. Therefore, we represent outcome
// of the replication as a `KVStoreMutator`
// Expected errors during normal operations:
// - ErrIncompatibleVersionChange if replicating the Parent Snapshot into a Snapshot
// with the specified `protocolVersion` is not supported.
Replicate(protocolVersion uint64) (KVStoreMutator, error)
}
// KVStoreMutator represents a Protocol State Snapshot in an iterim format. It might be
// incomplete or otherwise requires further inputs or mutations to evolve into a valid Protocol
// State Snapshot according to its protocol version.
// After all operations required to complete the conversion have been applied and furthermore all
// Service Events in the current block were processed as well, a valid Protocol State Snapshot
// can be generated via the `Build` method.
type KVStoreMutator interface {
Reader
// v0
// SetVersionUpgrade sets the protocol upgrade version. This method is used
// to update the Protocol State version when a flow.ProtocolStateVersionUpgrade is processed.
// It contains the new version and the view at which it has to be applied.
SetVersionUpgrade(version *ViewBasedActivator[uint64])
// v1
// SetInvalidEpochTransitionAttempted sets the epoch fallback mode flag.
// Errors:
// - ErrKeyNotSupported if the key does not exist in the current model version.
SetInvalidEpochTransitionAttempted(attempted bool) error
// Build returns the Snapshot of the updated key-value store together with its hash commitment `snapshotID`.
// The flag `hasChanges` indicates whether there were any changes (version and/or content) compared to its
// Parent Snapshot.
// `Build` may apply consistency and other checks:
// * In particular we recommend that `Build` enforces that all mandatory new fields have proper values,
// in case the Snapshot has a newer protocol version compared to its parent.
// * However, `Build` does not necessarily guarantee that only valid Snapshots are returned. It is the
// responsibility of the respective state machines to guarantee protocol-compliant evolution of the
// Protocol State.
// Errors:
// * MissingValueError if some required value is still missing.
// * NoncompliantStateError if some values failed a consistency or sanity check.
Build() (updatedSnapshot KVStoreAPI, snapshotID flow.Identifier, hasChanges bool, err error)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for detailed explanation and argumentation, the more I read it, the more I tend to get rid of Clone/Replicate
as I have implemented it in the first place. I feel like we are mixing two concepts, mutating data and creation process.
I have tried to replicate the same structure as we have with epoch protocol state but it doesn't seem to work out. Meaning the state machine doesn't have enough context to provide a correct version to make a replication with correct version even if it had we are mixing creation pattern and behavior(factory + state machine).
In my mind it works well where the stateMutator
acts as a factory and creates a correct versioned KVStoreMutator
which is then passed to the state machine. Similarly to what we have with the epoch state machine. This way we separate creation and mutation. This means we don't need Replicate
or Clone
as part of API and the state machine won't take a copy on it's own, instead it will rely on the instance that will be received via constructor from a component that has needed context to create it.
Lets continue this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of sync (Yurii, Jordan, Alex, March 12, 2024)
Replicate(..)
orClone
must produce valid Snapshot, because in many blocks there are no service events- from 1. it follows that we don't need builder pattern
- extending
stateMutator
to be the orchestrator of all state machines (currently only orchestrates Epoch State machine)stateMutator
will callReplicate
- state machines are applied, they mutate the combined state in-place
- we
Build
method instateMutator
where we could add sanity checks
We want to move the entire Epoch data into the KV-store, but that is beyond the current deliverables.
// A separate instance is created for each block that is being processed. | ||
type KeyValueStoreStateMachine interface { | ||
// Build returns updated key-value store model, state ID and a flag indicating if there were any changes. | ||
Build() (updatedState KVStoreReader, stateID flow.Identifier, hasChanges bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, it would be great if the Build
method returned a representation of the protocol state snapshot that permits mutations. So specifically KVStoreAPI
. For instance, if we cache the results to apply the state-changing events from the next block (one we receive it), we would want them represented via the mutable KVStoreAPI
(hope that makes sense).
Build() (updatedState KVStoreReader, stateID flow.Identifier, hasChanges bool) | |
Build() (updatedState KVStoreAPI, snapshotID flow.Identifier, hasChanges bool) |
Co-authored-by: Alexander Hentschel <[email protected]>
…nflow/flow-go into yurii/5312-key-value-store-statemachine
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/protocol-state-kvstore #5513 +/- ##
==================================================================
- Coverage 56.02% 55.97% -0.05%
==================================================================
Files 1025 1019 -6
Lines 98784 98142 -642
==================================================================
- Hits 55339 54932 -407
+ Misses 39303 39097 -206
+ Partials 4142 4113 -29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…ocol state versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the diff since our last discussion. Looks good overall -- I have added a suggestion for how to organize the upgrade logic (in Replicate
vs in Replicate
s caller), although I am happy to punt that discussion to the PR where we hook up the state machine.
// Expected errors during normal operations: | ||
// - ErrIncompatibleVersionChange if replicating the Parent Snapshot into a Snapshot | ||
// with the specified `protocolVersion` is not supported. | ||
func (model *modelv0) Replicate(protocolVersion uint64) (protocol_state.KVStoreMutator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed having Replicate
live alongside the models and be an "atomic" process, in the sense that we will strictly return a valid KVStore model from Replicate
.
In my mind that constraint includes making sure that a KVStore which has been upgraded to protocol version N does not include a pending VersionUpgrade
for protocol version N. However we're not doing that here.
I'm wondering if Replicate
should instead receive as input the block view, and internally decide whether an upgrade is needed, based on the VersionUpgrade
field?
That way, the higher-level logic would not need to deal with:
- deciding when the version becomes active
- removing the old
VersionUpgrade
following the upgrade
Based on your comment here, I assume your plan is to keep outdated VersionUpgrade
s in the protocol state until they are replaced. If we choose that route, this comment is mostly irrelevant (we could still choose to pass in view rather than protocolVersion
to Replicate
).
Overall, I'm happy to merge this as-is and make a final decision when we hook up the state machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about cleaning VersionUpgrade
when we replicate, but the process still waits for trigger. Happy to discuss after. For now leaving as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you moved the epoch-related state machines into the dedicated package epochs
.
For clarity of naming, I would suggest to consolidate the names. That might be better to do in a separate PR do reduce conflicts and avoid further growing this PR.
My thoughts:
- the package hierarchy
protocol_state/epochs
is already very clear what this logic is about- therefore, I think the terms
protocol
andepoch
in the file names and struct names are largely redundant - I think the differentiation between happy path and fallback is the most important in this package
- therefore, I think the terms
- Specifically, I would suggest the following renaming:
- rename file
state/protocol/protocol_state/epochs/protocol_statemachine.go
→happypath_statemachine.go
(similarly the corresponding test file) - rename struct
protocolStateMachine
→happyPathStateMachine
- rename file
state/protocol/protocol_state/epochs/epoch_fallback_statemachine.go
→fallback_statemachine.go
(similarly the corresponding test file) - rename struct
EpochFallbackStateMachine
→FallbackStateMachine
- the interface
ProtocolStateMachine
instate/protocol/protocol_state/epochs/base_statemachine.go
, I would suggest to rename toEpochStateMachine
and move it into its own filestatemachine.go
- the struct
baseProtocolStateMachine
I would rename tobaseStateMachine
In short, in all cases, we removeepoch
andprotocol
from the struct and file names, because this is already contained in the package hierarchy
- rename file
As a side comment:
noticed that EpochFallbackStateMachine
struct is exported while protocolStateMachine
is not. That is fine, but if we can make it consistent, that would be even better! 😅 (feel free to ignore)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are on the architecture level. I think this PR is good to go. I would like to iterate on the design to get to more modularity and a clearer representation of the conceptual aspects. I felt a variety of detailed comments (partially on existing discussions).
If we could get this issued up, that would be great. But before we write issues, let's make sure we are on the same page regarding the design outlook.
6fbc7d6
into
feature/protocol-state-kvstore
#5312
Context
This PR implements a dedicated state machine in context of Dynamic Protocol State to process service events that update the KV store.
This PR relies on previous PR created by Jordan that introduces new service event type for upgrading protocol state version.
VersionUpgrade
and be able to set it.I've took the liberty to update package structure to avoid nasty import cycles and to increase modularity in overall.
I've made a debatable choice of using
EpochCommitSafetyThreshold
as threshold for activation of upgrade event. I feel this is legit since it serves the same purpose but maybe we should change the naming then. Happy to use a separate param as well.