kvserver: re-evaluate raft log truncation heuristics #75802
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Is your feature request related to a problem? Please describe.
The raft log truncation heuristics date back to a world in which we were very aggressive about truncations (since the log needed to be included in snapshots and could make them very large) while simultaneously being very defensive (if a snapshot is in flight, we will never cut it off from the log).
We just saw (internal link) that these heuristics can be problematic, and that some of the thresholds were not adjusted when we changed our default range size from 64 MB to 512 MB.
We should revisit all of these heuristics and, through thought and experimentation, adjust them with the goal of avoiding both pathological build-up of the raft log and "unnecessary" rejection of snapshots.
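To make the tension concrete, here is a minimal sketch of how an "aggressive but defensive" truncation decision might look. It is not the actual kvserver logic: `LogState`, `truncation_index`, and the `MAX_LOG_BYTES` threshold are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical size threshold; not an actual CockroachDB setting.
MAX_LOG_BYTES = 16 * 1024 * 1024


@dataclass
class LogState:
    first_index: int            # oldest entry still present in the log
    commit_index: int           # highest committed entry
    log_bytes: int              # approximate size of the log
    follower_match: List[int]   # highest index known to be on each follower
    snapshot_index: Optional[int] = None  # index of an in-flight snapshot, if any


def truncation_index(s: LogState, max_log_bytes: int = MAX_LOG_BYTES) -> int:
    """Return the new first index to retain.

    A result equal to s.first_index means "do not truncate".
    """
    if s.log_bytes > max_log_bytes:
        # Aggressive: the log is oversized, so drop everything committed,
        # even if that forces slow followers to catch up via snapshot.
        candidate = s.commit_index + 1
    else:
        # Otherwise keep whatever the slowest follower still needs.
        candidate = min(s.follower_match + [s.commit_index]) + 1

    if s.snapshot_index is not None:
        # Defensive: never cut an in-flight snapshot off from the log.
        # Its recipient needs every entry after snapshot_index.
        candidate = min(candidate, s.snapshot_index + 1)

    # Truncation never moves the first index backwards.
    return max(candidate, s.first_index)
```

In this sketch, an oversized log with a long-running snapshot at a low index cannot shrink at all, which is one way the log build-up mentioned above could arise. Choosing the threshold relative to the range size is exactly the kind of tuning this issue asks to revisit.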
Additional context
#36262 also proposes allowing followers to truncate "locally" (i.e. without being prompted to truncate by the leader), in which case they would use different (simpler) heuristics than the leader.
Jira issue: CRDB-12843
Epic CRDB-39898