kv: add to replicaGCQueue in replicaMsgAppDropper, not gcQueue
Fixes #73838.

This commit is the first of the three "next steps" identified in #73838.
It fixes a case where we were accidentally adding a replica to the wrong
queue. When dropping a `MsgApp` in `maybeDropMsgApp`, we want to GC the
replica on the LHS of the split if it has been removed from its range.
However, we were instead passing it to the MVCC GC queue, which was both
irrelevant and also a no-op because the LHS was not the leaseholder.
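
The mix-up can be pictured with a minimal sketch. It is not the kvserver implementation: the `replica` type, both queue types, and the `maybeAdd` method are hypothetical stand-ins, and the acceptance rules are deliberately reduced to the two conditions named above (leaseholder status and removal from the range).

```go
package main

import "fmt"

// replica is a hypothetical, simplified stand-in for a kvserver replica.
type replica struct {
	rangeID     int
	leaseholder bool // whether this replica holds the range lease
	removed     bool // whether this replica was removed from its range
}

// mvccGCQueue is a hypothetical model of a queue that only acts on
// replicas holding the lease.
type mvccGCQueue struct{}

// maybeAdd reports whether the queue would do any work for r.
func (mvccGCQueue) maybeAdd(r *replica) bool { return r.leaseholder }

// replicaGCQueue is a hypothetical model of a queue that cleans up
// replicas that are no longer members of their range.
type replicaGCQueue struct{}

// maybeAdd reports whether the queue would do any work for r.
func (replicaGCQueue) maybeAdd(r *replica) bool { return r.removed }

func main() {
	// The LHS replica in the scenario above: removed, and not the leaseholder.
	lhs := &replica{rangeID: 1, leaseholder: false, removed: true}

	fmt.Println("MVCC GC queue acts on LHS:   ", mvccGCQueue{}.maybeAdd(lhs))
	fmt.Println("replica GC queue acts on LHS:", replicaGCQueue{}.maybeAdd(lhs))
}
```

Under these assumptions only the replica GC queue does anything for the LHS, which is why the old call had no effect.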

It's possible that we have seen the effects of this in roachtests like
`splits/largerange`. This bug could have delayed a snapshot to the RHS
of a split for up to `maxDelaySplitTriggerTicks * 200ms = 20s` in some
rare cases. We've seen the logs corresponding to this issue in a few
tests over the past year: https://github.com/cockroachdb/cockroach/issues?q=is%3Aissue+%22would+have+dropped+incoming+MsgApp+to+wait+for+split+trigger%22+is%3Aclosed.
nvanbenschoten committed Dec 20, 2021
1 parent 7bd8974 commit f11f912
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions pkg/kv/kvserver/client_merge_test.go
@@ -2756,7 +2756,7 @@ func TestStoreRangeMergeSlowUnabandonedFollower_WithSplit(t *testing.T) {
t.Fatal(pErr)
}

-// Now split the newly merged range splits back out at exactly the same key.
+// Now split the newly merged range back out at exactly the same key.
// When the replica GC queue looks in meta2 it will find the new RHS range, of
// which store2 is a member. Note that store2 does not yet have an initialized
// replica for this range, since it would intersect with the old RHS replica.
@@ -2769,7 +2769,7 @@ func TestStoreRangeMergeSlowUnabandonedFollower_WithSplit(t *testing.T) {
tc.RemoveVotersOrFatal(t, lhsDesc.StartKey.AsRawKey(), tc.Target(2))

// Transfer the lease on the new RHS to store2 and wait for it to apply. This
-// will force its replica to of the new RHS to become up to date, which
+// will force its replica of the new RHS to become up to date, which
// indirectly tests that the replica GC queue cleans up both the LHS replica
// and the old RHS replica.
tc.TransferRangeLeaseOrFatal(t, *newRHSDesc, tc.Target(2))
4 changes: 2 additions & 2 deletions pkg/kv/kvserver/split_trigger_helper.go
@@ -37,7 +37,7 @@ func (rd *replicaMsgAppDropper) ShouldDrop(startKey roachpb.RKey) (fmt.Stringer,
if lhsRepl == nil {
return nil, false
}
-lhsRepl.store.gcQueue.AddAsync(context.Background(), lhsRepl, replicaGCPriorityDefault)
+lhsRepl.store.replicaGCQueue.AddAsync(context.Background(), lhsRepl, replicaGCPriorityDefault)
return lhsRepl, true
}

@@ -48,7 +48,7 @@ type msgAppDropper interface {

// maybeDropMsgApp returns true if the incoming Raft message should be dropped.
// It does so if the recipient replica is uninitialized (i.e. has no state) and
-// is waiting for a split trigger to apply,in which case delivering the message
+// is waiting for a split trigger to apply,in which case delivering the message
// in this situation would result in an unnecessary Raft snapshot: the MsgApp
// would be rejected and the rejection would prompt the leader to send a
// snapshot, while the split trigger would likely populate the replica "for