kvserver: pass Desc and SpanConf through allocator #108197

Merged

Conversation

@andrewbaptist (Contributor) commented Aug 4, 2023

Previously, the different layers of the allocator would load the Desc and
SpanConf as needed. This risked the values changing between the various
loads, which could cause strange and hard-to-track-down races. Now they are
loaded once and passed through all the layers.

Epic: none

Release note: None
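
A minimal sketch of the pattern the description refers to; the function names processRangeSketch and planChangeSketch are illustrative placeholders rather than the actual allocator entry points, while DescAndSpanConfig, roachpb.RangeDescriptor, and roachpb.SpanConfig are the real identifiers discussed in the review below:

// Before this change, each layer re-read the descriptor and span config on
// its own, so two reads could observe different states of the range.
// New shape (sketched): the caller loads both once and threads them through
// every layer as explicit arguments.
func processRangeSketch(r *Replica) {
	desc, conf := r.DescAndSpanConfig() // single read, one lock acquisition
	planChangeSketch(desc, conf)        // hypothetical downstream layer
}

func planChangeSketch(desc *roachpb.RangeDescriptor, conf roachpb.SpanConfig) {
	// Operates only on the snapshot it was handed; it never re-reads state
	// from the replica, so desc and conf stay mutually consistent.
}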

@cockroach-teamcity (Member) commented:

This change is Reviewable

@andrewbaptist force-pushed the 2023-08-03-allocator-dont-reload branch 8 times, most recently from 68c2f9a to be35e0d on August 8, 2023 15:02
@andrewbaptist marked this pull request as ready for review August 8, 2023 16:01
@andrewbaptist requested a review from a team as a code owner August 8, 2023 16:01
@andrewbaptist changed the title from "kvserver: pass Desc and SpanConf through" to "kvserver: pass Desc and SpanConf through allocator" Aug 8, 2023
@andrewbaptist requested a review from kvoli August 8, 2023 16:01
@kvoli (Collaborator) left a comment

Reviewed 13 of 13 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvserver/replica_command.go line 3994 at r1 (raw file):

			allowLeaseTransfer = true
		}
		desc := r.Desc()

nit: Use DescAndSpanConfig() so that both values are read under the same read lock.

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}

Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.
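
For contrast, a hedged sketch of the difference the nit is pointing at; readSeparately and readTogether are illustrative names, not functions in the codebase:

// Racy pattern (as in the snippet above): the span config can change between
// the two calls, so desc and conf may describe different points in time.
func readSeparately(r *Replica) (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	desc := r.Desc()
	conf := r.SpanConfig()
	return desc, conf
}

// Suggested pattern: a single lock acquisition returns a matching pair.
func readTogether(r *Replica) (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	return r.DescAndSpanConfig()
}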


pkg/kv/kvserver/replica_command.go line 4016 at r1 (raw file):

	if args.RandomizeLeases && r.OwnsValidLease(ctx, r.store.Clock().NowAsClockTimestamp()) {
		desc := r.Desc()
		conf := r.SpanConfig()

nit: same as above.


pkg/kv/kvserver/replica_range_lease.go line 1578 at r1 (raw file):

	now := r.Clock().NowAsClockTimestamp()
	r.mu.RLock()
	preferences := conf.LeasePreferences

I don't think this needs to be done under a read lock now.


pkg/kv/kvserver/replicate_queue.go line 654 at r1 (raw file):

	// TODO(baptist): Its possible that the conf or desc have changed between the
	// call to shouldQueue above and this call to process. Consider calling
	// shouldQueue again here and verify that it returns the same results.

Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.

Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time.

However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

	leaseRepl interface {
		StoreID() roachpb.StoreID
		RaftStatus() *raft.Status

Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here, i.e. inconsistency that was less likely to exist previously?


pkg/kv/kvserver/asim/queue/replicate_queue.go line 80 at r1 (raw file):

	if err != nil {
		log.VEventf(ctx, 1, "conf not found=%s, config=%s", desc, &conf)
		return false

Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.

@andrewbaptist force-pushed the 2023-08-03-allocator-dont-reload branch from be35e0d to 8478b85 on August 8, 2023 18:42
@andrewbaptist (Contributor, Author) left a comment

Thanks for the feedback. Can you take one more pass?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replica_command.go line 3994 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: Use DescAndSpanConfig() so that both values are read under the same read lock.

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}

Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.

Good idea. I did change it in a follow-up PR, but there is still a bit of work to do on that, so it's better to keep these combined here as you mention.


pkg/kv/kvserver/replica_command.go line 4016 at r1 (raw file):

Previously, kvoli (Austen) wrote…

nit: same as above.

Done


pkg/kv/kvserver/replica_range_lease.go line 1578 at r1 (raw file):

Previously, kvoli (Austen) wrote…

I don't think this needs to be done under a read lock now.

Changed. Only the lease status check needed the read lock, and it was easier to call it using the CurrentLeaseStatus method.
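
A hedged sketch of the resulting shape (illustrative, not the exact diff); CurrentLeaseStatus is the replica method mentioned above and, per the comment, handles the locking itself:

	now := r.Clock().NowAsClockTimestamp()
	// No explicit r.mu.RLock() here: conf was passed in as a snapshot, and
	// the lease status check goes through CurrentLeaseStatus instead.
	preferences := conf.LeasePreferences
	status := r.CurrentLeaseStatus(ctx)
	// now, preferences, and status feed the lease preference check (elided).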


pkg/kv/kvserver/replicate_queue.go line 654 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.

Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time.

However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.

Removed this comment. I had originally thought we would want to keep the conf between calls, but it doesn't really serve a purpose, and this doesn't need to change.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here, i.e. inconsistency that was less likely to exist previously?

That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure that is better, though, since we want the most up-to-date Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later call to process. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?)


pkg/kv/kvserver/asim/queue/replicate_queue.go line 80 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.

Changed to log.Fatalf. For the real queue it would have to just be a warning, but as you mention I don't think the simulator should ever hit this condition.
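
A hedged sketch of the resulting check in the simulator path (the message text is illustrative):

	if err != nil {
		// In the simulator a missing span config is a programming error,
		// so fail loudly instead of silently skipping the range.
		log.Fatalf(ctx, "conf not found for range %s: %v", desc, err)
	}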

@andrewbaptist requested a review from kvoli August 9, 2023 14:10
@kvoli (Collaborator) left a comment

:lgtm:

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1969 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure that is better, though, since we want the most up-to-date Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later call to process. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?)

The conf being stale is okay. I agree, it is unlikely to change that quickly, and probably won't have disastrous consequences if it does.

The range descriptor being inconsistent with the raft status is the issue I mentioned above. That said, the status is only used for checking whether a target may need a snapshot afaik, then excluding it.

@andrewbaptist (Contributor, Author) commented:

bors r=kvoli

craig bot (@craig, Contributor) commented Aug 9, 2023

Build succeeded:

@craig (craig bot) merged commit 1925449 into cockroachdb:master Aug 9, 2023
@andrewbaptist deleted the 2023-08-03-allocator-dont-reload branch December 15, 2023 21:37