kvserver: pass Desc and SpanConf through allocator #108197

Merged

Conversation

@andrewbaptist (Contributor) commented Aug 4, 2023

Previously, the different layers of the allocator would load the Desc and
SpanConf as needed. This risked the values changing between the various
loads, which could cause strange and hard-to-track-down races. Now they are
loaded once and passed through all the layers.

Epic: none

Release note: None
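
A minimal sketch of the pattern the description refers to; the function names processRangeSketch and planChangeSketch are illustrative placeholders rather than the actual allocator entry points, while DescAndSpanConfig, roachpb.RangeDescriptor, and roachpb.SpanConfig are the real identifiers discussed in the review below:

// Before this change, each layer re-read the descriptor and span config on
// its own, so two reads could observe different states of the range.
// New shape (sketched): the caller loads both once and threads them through
// every layer as explicit arguments.
func processRangeSketch(r *Replica) {
	desc, conf := r.DescAndSpanConfig() // single read, one lock acquisition
	planChangeSketch(desc, conf)        // hypothetical downstream layer
}

func planChangeSketch(desc *roachpb.RangeDescriptor, conf roachpb.SpanConfig) {
	// Operates only on the snapshot it was handed; it never re-reads state
	// from the replica, so desc and conf stay mutually consistent.
}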

@cockroach-teamcity (Member) commented:

This change is Reviewable

@andrewbaptist force-pushed the 2023-08-03-allocator-dont-reload branch 8 times, most recently from 68c2f9a to be35e0d on August 8, 2023 15:02
@andrewbaptist marked this pull request as ready for review August 8, 2023 16:01
@andrewbaptist requested a review from a team as a code owner August 8, 2023 16:01
@andrewbaptist changed the title from "kvserver: pass Desc and SpanConf through" to "kvserver: pass Desc and SpanConf through allocator" Aug 8, 2023
@andrewbaptist requested a review from kvoli August 8, 2023 16:01
@kvoli (Collaborator) left a comment

Reviewed 13 of 13 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvserver/replica_command.go line 3994 at r1 (raw file):

			allowLeaseTransfer = true
		}
		desc := r.Desc()

nit: Use DescAndSpanConfig() so that both values are read under the same read lock.

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}

Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.
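
For contrast, a hedged sketch of the difference the nit is pointing at; readSeparately and readTogether are illustrative names, not functions in the codebase:

// Racy pattern (as in the snippet above): the span config can change between
// the two calls, so desc and conf may describe different points in time.
func readSeparately(r *Replica) (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	desc := r.Desc()
	conf := r.SpanConfig()
	return desc, conf
}

// Suggested pattern: a single lock acquisition returns a matching pair.
func readTogether(r *Replica) (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	return r.DescAndSpanConfig()
}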


pkg/kv/kvserver/replica_command.go line 4016 at r1 (raw file):

	if args.RandomizeLeases && r.OwnsValidLease(ctx, r.store.Clock().NowAsClockTimestamp()) {
		desc := r.Desc()
		conf := r.SpanConfig()

nit: same as above.


pkg/kv/kvserver/replica_range_lease.go line 1578 at r1 (raw file):

	now := r.Clock().NowAsClockTimestamp()
	r.mu.RLock()
	preferences := conf.LeasePreferences

I don't think this needs to be done under a read lock now.


pkg/kv/kvserver/replicate_queue.go line 654 at r1 (raw file):

	// TODO(baptist): Its possible that the conf or desc have changed between the
	// call to shouldQueue above and this call to process. Consider calling
	// shouldQueue again here and verify that it returns the same results.

Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.

Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time.

However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

	leaseRepl interface {
		StoreID() roachpb.StoreID
		RaftStatus() *raft.Status

Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here, i.e. inconsistency that was less likely to exist previously?


pkg/kv/kvserver/asim/queue/replicate_queue.go line 80 at r1 (raw file):

	if err != nil {
		log.VEventf(ctx, 1, "conf not found=%s, config=%s", desc, &conf)
		return false

Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.

@andrewbaptist force-pushed the 2023-08-03-allocator-dont-reload branch from be35e0d to 8478b85 on August 8, 2023 18:42
@andrewbaptist (Contributor, Author) left a comment

Thanks for the feedback. Can you take one more pass?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replica_command.go line 3994 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: Use DescAndSpanConfig() so that both values are read under the same read lock.

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}

Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.

Good idea. I did change it in a follow-up PR, but there is still a bit of work to do on that, so it's better to keep these combined here as you mention.


pkg/kv/kvserver/replica_command.go line 4016 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: same as above.

Done


pkg/kv/kvserver/replica_range_lease.go line 1578 at r1 (raw file):

Previously, kvoli (Austen) wrote…

I don't think this needs to be done under a read lock now.

Changed. Only the lease status check needed the read lock, and it was easier to call it using the CurrentLeaseStatus method.
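
A hedged sketch of the resulting shape (illustrative, not the exact diff); CurrentLeaseStatus is the replica method mentioned above and, per the comment, handles the locking itself:

	now := r.Clock().NowAsClockTimestamp()
	// No explicit r.mu.RLock() here: conf was passed in as a snapshot, and
	// the lease status check goes through CurrentLeaseStatus instead.
	preferences := conf.LeasePreferences
	status := r.CurrentLeaseStatus(ctx)
	// now, preferences, and status feed the lease preference check (elided).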


pkg/kv/kvserver/replicate_queue.go line 654 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.

Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time.

However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.

Removed this comment. I had originally thought we would want to keep the conf between calls, but it doesn't really serve a purpose, and this doesn't need to change.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here, i.e. inconsistency that was less likely to exist previously?

That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure that is better, though, since we want the most up-to-date Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later call to process. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?)


pkg/kv/kvserver/asim/queue/replicate_queue.go line 80 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.

Changed to log.Fatalf. For the real queue it would have to just be a warning, but as you mention I don't think the simulator should ever hit this condition.
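
A hedged sketch of the resulting check in the simulator path (the message text is illustrative):

	if err != nil {
		// In the simulator a missing span config is a programming error,
		// so fail loudly instead of silently skipping the range.
		log.Fatalf(ctx, "conf not found for range %s: %v", desc, err)
	}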

@andrewbaptist requested a review from kvoli August 9, 2023 14:10
@kvoli (Collaborator) left a comment

:lgtm:

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure that is better, though, since we want the most up-to-date Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later call to process. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?)

The conf being stale is okay. I agree, it is unlikely to change that quickly, and probably won't have disastrous consequences if it does.

The range descriptor being inconsistent with the raft status is the issue I mentioned above. That said, the status is only used for checking whether a target may need a snapshot afaik, then excluding it.

@andrewbaptist (Contributor, Author) commented:

bors r=kvoli

craig bot (@craig, Contributor) commented Aug 9, 2023

Build succeeded:

@craig (craig bot) merged commit 1925449 into cockroachdb:master Aug 9, 2023
@andrewbaptist deleted the 2023-08-03-allocator-dont-reload branch December 15, 2023 21:37