kvserver: track bytes size of raft receive queue #82144
Labels: A-kv-observability, A-kv-replication, C-enhancement, sync-me, sync-me-5
In #80155 we observed OOMs below raft. Diagnosing these issues has been difficult due to a lack of observability. We should add metrics on the state of the raft receive queue to help understand how heavily the queues are utilized.
cr.store.raft.rcvd.queued_bytes: gauge (total size of all entries waiting to be handed to raft)
cr.store.raft.rcvd.stepped_bytes: counter (total size of all entries handed to RawNode.Step)
cr.store.raft.rcvd.dropped_bytes: counter (total size of all entries that were dropped because the receive queue was full)
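To make the intended semantics of the three metrics concrete, here is a minimal sketch of the accounting. It is not the kvserver implementation: `recvQueue`, `recvQueueMetrics`, `Enqueue`, `Drain`, and the count-based `maxLen` limit are all hypothetical names and assumptions made for illustration, and the `step` callback merely stands in for handing a message to `RawNode.Step`.

```go
package main

import (
	"fmt"
	"sync"
)

// recvQueueMetrics is a hypothetical stand-in for the three proposed metrics.
type recvQueueMetrics struct {
	QueuedBytes  int64 // gauge: bytes currently waiting in the queue
	SteppedBytes int64 // counter: bytes handed to the step function so far
	DroppedBytes int64 // counter: bytes dropped because the queue was full
}

// recvQueue is a hypothetical bounded queue that only records message sizes.
type recvQueue struct {
	mu      sync.Mutex
	maxLen  int     // assumed capacity limit, in number of messages
	pending []int64 // sizes (in bytes) of queued messages
	metrics recvQueueMetrics
}

func newRecvQueue(maxLen int) *recvQueue {
	return &recvQueue{maxLen: maxLen}
}

// Enqueue adds a message of the given size. If the queue is full, the
// message is dropped and its size is added to DroppedBytes.
func (q *recvQueue) Enqueue(size int64) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.pending) >= q.maxLen {
		q.metrics.DroppedBytes += size
		return false
	}
	q.pending = append(q.pending, size)
	q.metrics.QueuedBytes += size
	return true
}

// Drain hands every queued message to step (a placeholder for
// RawNode.Step) and moves its size from QueuedBytes to SteppedBytes.
func (q *recvQueue) Drain(step func(size int64)) {
	q.mu.Lock()
	pending := q.pending
	q.pending = nil
	q.mu.Unlock()

	for _, size := range pending {
		step(size)
		q.mu.Lock()
		q.metrics.QueuedBytes -= size
		q.metrics.SteppedBytes += size
		q.mu.Unlock()
	}
}

// Snapshot returns a copy of the current metric values.
func (q *recvQueue) Snapshot() recvQueueMetrics {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.metrics
}

func main() {
	q := newRecvQueue(2)
	q.Enqueue(100)
	q.Enqueue(50)
	q.Enqueue(30) // queue is full, so this message is dropped
	fmt.Printf("before drain: %+v\n", q.Snapshot())

	q.Drain(func(size int64) {}) // no-op step function
	fmt.Printf("after drain:  %+v\n", q.Snapshot())
}
```

In this sketch the gauge rises on enqueue and falls only once a message has been stepped, so a persistently high `QueuedBytes` alongside a growing `DroppedBytes` would point at a receive queue that cannot keep up.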
Jira issue: CRDB-16464