Describe the problem
See the detailed analysis in #104588.
When many splits (say, size-based ones) happen alongside replica movement, many replicas can end up needing a raft snapshot at the same time. These snapshots then trickle in very slowly, essentially due to "keyspace contention" between "stale" replicas (that haven't caught up across all splits) and "new" snapshots (that reflect all splits).
Fundamentally, this is because splitting a range for which a follower needs a snapshot results in two ranges for which a follower needs a snapshot, but the snapshot for the right-hand side can only go through once the snapshot for the left-hand side has. This interdependence between snapshots is not visible to the raft snapshot queue, so the backlog is processed very slowly, especially once it has ballooned to hundreds of snapshots, which can easily happen with enough splits. Additionally, if the snapshots involved are large, further pathologies arise, such as the lease changing hands while snapshots are still in flight[1], resulting in wasted work.
To Reproduce
Not sure how to reliably trigger this. These kinds of issues have kept us busy for a long time[2]. Usually, stressing a test that suitably combines rebalancing and splits, while verifying that there aren't any raft snapshots, is enough to see these kinds of interactions.
Expected behavior
The goal should be that the only raft snapshots we ever see are due to log truncations (in which case we may wonder if the log truncation heuristics could be improved, but this is outside of the scope of this issue).
Jira issue: CRDB-30091
Epic CRDB-39952
Footnotes
1. This happens a few times in #104588 (roachtest: splits/largerange/size=32GiB,nodes=6 failed [raft snaps; needs #106813]), but I'm not sure why.
2. See https://cockroachlabs.atlassian.net/wiki/spaces/CORE/pages/64749670/Raft+Snapshots+and+why+you+see+them+when+you+oughtn+t (internal).