Optimize storage and convergence time in causal CmRDTs
Context
Eventual consistency
Unlike Strongly Consistent systems, Eventually Consistent (EC) systems don't require synchronization between peers in order to modify data. Changes can be made locally, or against a small number of replicas, and are then asynchronously replicated to the others, eventually reaching all of them. These systems are more resilient to network partitions, and are thus suited to decentralized environments where connectivity can be poor. The drawback is that, at any given point in time, data is not guaranteed to be synchronized between peers.
Strong Eventual Consistency
Strong Eventual Consistency (SEC) is a stronger form of EC with an additional guarantee: once all replicas have received all messages, regardless of the order of delivery, they are guaranteed to reach the same state.
Under these constraints, data is still not guaranteed to be equal across all replicas at any given point in time, but it is guaranteed to eventually converge and to be monotonic (a replica never undoes a change). Conflict-free Replicated Data Types are a class of data types that guarantee Strong Eventual Consistency.
Conflict-Free Replicated Data Types
Conflict-Free Replicated Data Types (CRDTs) are a class of data types that provide strong eventual consistency guarantees and have several interesting properties:
Changes are local: All changes are performed locally, without the need to contact other replicas to guarantee a write. The synchronization is done asynchronously, when and if there is network connectivity.
Determinism: No matter the order in which messages are delivered, replicas are guaranteed to converge. This makes them particularly suitable for poorly connected networks, and also enables offline editing.
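To make these two properties concrete, here is a minimal sketch (not taken from any particular CRDT library; all names are illustrative) of an operation-based counter: operations are applied locally, commute with each other, and duplicate deliveries are ignored via operation ids, so replicas converge regardless of delivery order.

```python
# Illustrative sketch: an operation-based counter whose increment
# operations commute, so replicas converge regardless of delivery order.

class OpCounter:
    def __init__(self):
        self.value = 0
        self.seen = set()          # operation ids, for exactly-once application

    def apply(self, op_id, amount):
        if op_id in self.seen:     # duplicate delivery is ignored
            return
        self.seen.add(op_id)
        self.value += amount

# The same set of operations delivered in different orders
# yields the same state on both replicas.
ops = [("a", 1), ("b", 2), ("c", 3)]

r1, r2 = OpCounter(), OpCounter()
for op in ops:
    r1.apply(*op)
for op in reversed(ops):
    r2.apply(*op)

assert r1.value == r2.value == 6
```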
Operation-based CRDTs
CRDTs come in two basic flavors: state-based (convergent, or CvRDTs) and operation-based (commutative, or CmRDTs). State-based CRDTs replicate by transmitting their entire state, while operation-based CRDTs store and transmit operations, which must be delivered exactly once.
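As a hedged illustration of the state-based family (names are hypothetical, not from the text), a grow-only counter replicates by shipping its whole state, a map of per-replica counts, and merging with an element-wise max, which is commutative and idempotent:

```python
# Sketch of a state-based (CvRDT) grow-only counter.
# State: {replica_id: local_count}. Replication ships the whole map.

def merge(a, b):
    # Element-wise max is commutative, associative, and idempotent,
    # so merges converge no matter how states are exchanged.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

s1 = {"A": 3, "B": 1}   # replica A's view
s2 = {"A": 2, "B": 4}   # replica B's view

assert merge(s1, s2) == merge(s2, s1) == {"A": 3, "B": 4}
```

Note that the payload here is the full state, whereas a CmRDT would ship only the individual operations; that difference is what makes the storage and catch-up questions below specific to the operation-based family.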
Operation DAG
Operation-based CRDTs that enforce causality tend to form a directed acyclic graph (DAG) of operations. A fork in this operation graph represents a divergence, where two or more replicas performed concurrent operations.
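One possible sketch of such an operation DAG (the hashing scheme and names are illustrative, not prescribed by the text): each operation references the current heads as its causal parents, and two operations that share a parent but do not reference each other form a fork:

```python
import hashlib
import json

# Illustrative operation DAG: operations are content-addressed and
# reference their causal parents; concurrent operations create a fork.

def op_id(op):
    data = json.dumps(op, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()[:8]

class OpDAG:
    def __init__(self):
        self.ops = {}       # id -> operation
        self.heads = set()  # operations with no known children

    def append(self, payload, parents=None):
        parents = sorted(parents if parents is not None else self.heads)
        op = {"payload": payload, "parents": parents}
        oid = op_id(op)
        self.ops[oid] = op
        self.heads -= set(parents)
        self.heads.add(oid)
        return oid

dag = OpDAG()
root = dag.append("create doc")
a = dag.append("edit from replica A", parents=[root])
b = dag.append("edit from replica B", parents=[root])  # concurrent with a

assert dag.heads == {a, b}   # two heads: the graph has forked
```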
The problem
Intuition
When a network partition occurs, each partition can keep editing the document independently, syncing the changes once the network allows the partitions to reconnect. This means that the operation graph keeps growing independently on each side. These changes are based on a common operation ancestor, but keep diverging. Once the network heals, all partitions propagate their changes to each other, referring back to the common ancestor.
Intuitively, we can say that every replica needs to keep the entire operation history around in case there is an offline replica that needs to replicate some time in the future.
This brings some obvious problems:
Unbounded storage size
In a naive implementation, an operation-based CRDT requires unbounded storage. This can be a problem for long-running CRDTs, since, although the state size can be bounded, the size of the storage required for the operations cannot.
Slow join and catch-up
For new replicas joining a CRDT that has a long history, all of these operations have to be sent across the network. Convergence time for a newly joined, fresh replica is therefore proportional to the size of the history, which is not ideal for long-running CRDTs. The same problem affects replicas that have been offline for a while and need to catch up with (and replicate) the latest operations from other replicas: the time it takes for one replica to catch up with another is proportional to the total size of the new operations that need to be transmitted.
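A toy sketch of this catch-up cost, assuming a hypothetical anti-entropy exchange in which a replica advertises the operation ids it already holds and a peer ships the rest; the transferred payload grows linearly with the divergence:

```python
# Illustrative sketch: sync cost is proportional to the number of
# operations the lagging replica is missing.

def missing_ops(local_ids, remote_log):
    """remote_log is an ordered list of (op_id, payload) pairs."""
    return [(oid, p) for oid, p in remote_log if oid not in local_ids]

log = [("op%d" % i, "payload%d" % i) for i in range(1000)]

fresh_replica = set()                          # a new replica knows nothing
stale_replica = {oid for oid, _ in log[:990]}  # briefly-offline replica

assert len(missing_ops(fresh_replica, log)) == 1000  # entire history
assert len(missing_ops(stale_replica, log)) == 10    # proportional to lag
```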
Hard requirements
Reduce the time a new replica takes to sync the state by optimizing the number of messages and their total payload size, without compromising system security.
Keep in mind that replicas can be offline for a long period of time, creating local operations that will be later replicated some time in the future. The ability for other replicas to accept these changes should not be compromised.
(If necessary, define upper bounds for the amount of offline time / divergence this system can sustain).
Soft requirements
Enable defining an upper bound for the total size of local storage (dedicated to storing operations) required for each replica.
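One conceivable (and lossy) way to meet this requirement, sketched here purely as an illustration rather than a proposed solution: fold operations older than a budget into a state snapshot and prune them, at the cost of no longer being able to merge replicas that diverged before the snapshot point.

```python
# Illustrative sketch: bound local operation storage by folding old
# operations into a snapshot. Trade-off: replicas that diverged before
# the snapshot point can no longer be merged operation-by-operation.

MAX_OPS = 100

class BoundedLog:
    def __init__(self):
        self.snapshot = 0   # state folded from pruned operations
        self.ops = []       # tail of the operation log, kept verbatim

    def append(self, amount):
        self.ops.append(amount)
        if len(self.ops) > MAX_OPS:
            pruned = self.ops[:-MAX_OPS]
            self.snapshot += sum(pruned)    # fold into the snapshot state
            self.ops = self.ops[-MAX_OPS:]

    def value(self):
        return self.snapshot + sum(self.ops)

log = BoundedLog()
for _ in range(250):
    log.append(1)

assert log.value() == 250        # state is preserved...
assert len(log.ops) <= MAX_OPS   # ...while op storage stays bounded
```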
Open problems
In any causal CmRDT, is there a protocol that can optimize the time required to synchronize any fresh replica without compromising eventual changes in offline replicas?
In any causal CmRDT, is there a protocol that can optimize the time required for synchronizing any two replicas that have strongly diverged without compromising changes in offline replicas?
In any causal CmRDT, is there a protocol that allows defining an upper bound for the required total size of local storage dedicated to operations without compromising changes in offline replicas?
Hi @pgte, thank you for your input! 🚀 This discussion was pointed to an RFP discussion, so we are now closing the issue. Feel free to reopen it in the future if you want to restart the conversation on this topic, and please keep sharing your ideas with us.