
dkv is designed with the following principles and capabilities:

  • Ability to distribute copies of data across cluster members. This data distribution serves two primary purposes:
    • Improve data durability guarantees by keeping multiple copies of the data, made consistent using consensus protocols. dkv uses Raft via the Nexus project for data replication across DCs.
    • Scale read throughput by replicating data to multiple Followers (typically across DCs), which serve reads and can be promoted to Leader, and to Slaves, used primarily to serve reads.
  • A simplified runtime that combines concerns like the service interface, storage, and the replication mechanism (based on distributed consensus) into a single OS process.
  • Ability to incrementally add features such as sharding and data structures (e.g. queues) as the system evolves.
  • Efficient API interface for remote clients (using gRPC) and network abstraction via sidecar processes (using Envoy).
  • Support for expanding an existing cluster by adding new dkv nodes, and for restoring data on existing nodes from backups (with some caveats on consistency).
  • Provide an API for the most commonly used KV operations such as Put, Get, and Iterate; a client sketch follows this list.
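
To make the API surface concrete, here is a minimal Go sketch of a remote client issuing Put and Get over gRPC. The import path, the DKVClient constructor, and the request/response message names are assumptions for illustration; the authoritative definitions live in dkv's proto files.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/dkv/serverpb" // hypothetical import path for dkv's generated gRPC stubs
)

func main() {
	// Dial a dkv node over gRPC; the address and port are placeholders.
	conn, err := grpc.Dial("localhost:8080", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	client := pb.NewDKVClient(conn) // hypothetical generated client constructor
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Put: write a key-value pair; the write is replicated via Raft (Nexus).
	if _, err := client.Put(ctx, &pb.PutRequest{Key: []byte("hello"), Value: []byte("world")}); err != nil {
		log.Fatalf("put: %v", err)
	}

	// Get: read the value back from the same node.
	res, err := client.Get(ctx, &pb.GetRequest{Key: []byte("hello")})
	if err != nil {
		log.Fatalf("get: %v", err)
	}
	log.Printf("hello => %s", res.Value)
}
```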

Data flow design for Put and Get requests

The diagram below depicts a cluster deployed across 2 DCs, simulating 3 availability zones, that supports Linearizable and Sequential consistent reads. It also demonstrates data replication within a DC using a CDC (Change Data Capture) mechanism to scale reads with Eventual consistency.

[design diagram: Put/Get data flow across the 2-DC, 3-AZ cluster]
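
The sketch below, reusing the hypothetical stubs from the earlier example, shows how a client might pick among the three consistency levels in the diagram. The ReadConsistency field and its enum values are assumed names for illustration, not dkv's confirmed API.

```go
package dkvreads

import (
	"context"
	"log"

	pb "example.com/dkv/serverpb" // same hypothetical stubs as the earlier sketch
)

// readAtEachLevel issues the same Get at the three consistency levels shown in
// the diagram. ReadConsistency and its enum values are assumed names.
func readAtEachLevel(ctx context.Context, master, slave pb.DKVClient, key []byte) {
	// Linearizable: routed through the Raft quorum, so it reflects the latest
	// committed write at the cost of a consensus round.
	if res, err := master.Get(ctx, &pb.GetRequest{
		Key: key, ReadConsistency: pb.ReadConsistency_LINEARIZABLE,
	}); err == nil {
		log.Printf("linearizable: %s", res.Value)
	}

	// Sequential: answered from the node's locally applied state; cheaper,
	// and consistent with the order in which that node applied the log.
	if res, err := master.Get(ctx, &pb.GetRequest{
		Key: key, ReadConsistency: pb.ReadConsistency_SEQUENTIAL,
	}); err == nil {
		log.Printf("sequential: %s", res.Value)
	}

	// Eventual: a Slave replicated via CDC serves the read; its answer may
	// lag the Leader by the replication delay.
	if res, err := slave.Get(ctx, &pb.GetRequest{Key: key}); err == nil {
		log.Printf("eventual (slave): %s", res.Value)
	}
}
```

The trade-off mirrors the diagram: stronger consistency routes reads closer to the consensus layer, while eventually consistent reads against CDC-fed Slaves scale out cheaply across a DC.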