kvserver: pass Desc and SpanConf through allocator #108197
Conversation
Force-pushed from 68c2f9a to be35e0d.
Reviewed 13 of 13 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist)
pkg/kv/kvserver/replica_command.go
line 3994 at r1 (raw file):
	allowLeaseTransfer = true
}
desc := r.Desc()

nit: Use DescAndSpanConfig() so that both values are read under the same read lock.
cockroach/pkg/kv/kvserver/replica.go, lines 1033 to 1039 in 4c39761:

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}
Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.
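For illustration, a minimal standalone sketch of the race being discussed, using toy stand-in types rather than the real kvserver ones: two separately locked reads can straddle a concurrent write, while a single paired read cannot.

```go
package main

import (
	"fmt"
	"sync"
)

// Toy stand-ins for the real descriptor/config types.
type rangeDesc struct{ gen int }
type spanConfig struct{ gen int }

type replica struct {
	mu   sync.RWMutex
	desc rangeDesc
	conf spanConfig
}

// Desc and SpanConfig each take the lock separately; a writer updating
// both fields can slip in between the two calls, so the caller may see
// a desc and conf from different generations.
func (r *replica) Desc() rangeDesc {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.desc
}

func (r *replica) SpanConfig() spanConfig {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.conf
}

// DescAndSpanConfig reads both values under one read lock, so the pair
// is always consistent.
func (r *replica) DescAndSpanConfig() (rangeDesc, spanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.desc, r.conf
}

func main() {
	r := &replica{}
	d, c := r.DescAndSpanConfig()
	fmt.Println(d.gen == c.gen) // always true for the paired read
}
```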
pkg/kv/kvserver/replica_command.go
line 4016 at r1 (raw file):
if args.RandomizeLeases && r.OwnsValidLease(ctx, r.store.Clock().NowAsClockTimestamp()) {
	desc := r.Desc()
	conf := r.SpanConfig()
nit: same as above.
pkg/kv/kvserver/replica_range_lease.go
line 1578 at r1 (raw file):
now := r.Clock().NowAsClockTimestamp()
r.mu.RLock()
preferences := conf.LeasePreferences
I don't think this needs to be done under a read lock now.
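A hedged sketch of why the lock can go away, again with toy types rather than the real roachpb ones: once the conf is passed in by value as a snapshot, reading its fields cannot race with concurrent span config updates on the replica.

```go
package main

import "fmt"

// Illustrative stand-ins for roachpb.SpanConfig and its lease
// preference type.
type leasePreference struct{ constraints string }
type spanConfig struct{ LeasePreferences []leasePreference }

// preferredTargets reads conf without any lock: conf is a value copy
// taken earlier (under the replica lock), so nothing can mutate it out
// from under this function.
func preferredTargets(conf spanConfig) []leasePreference {
	return conf.LeasePreferences
}

func main() {
	conf := spanConfig{LeasePreferences: []leasePreference{{"+region=us-east"}}}
	fmt.Println(preferredTargets(conf))
}
```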
pkg/kv/kvserver/replicate_queue.go
line 654 at r1 (raw file):
// TODO(baptist): Its possible that the conf or desc have changed between the
// call to shouldQueue above and this call to process. Consider calling
// shouldQueue again here and verify that it returns the same results.
Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.
Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time. However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.
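A toy sketch of the two-phase queue shape under discussion (names made up, not the kvserver API): shouldQueue decides admission from a snapshot, and process must tolerate the state having changed by the time the item is dequeued.

```go
package main

import "fmt"

type rangeState struct{ needsRebalance bool }

// shouldQueue decides admission and priority from a snapshot of state.
func shouldQueue(s rangeState) (bool, float64) {
	if s.needsRebalance {
		return true, 1.0
	}
	return false, 0
}

// process re-reads state when the item is finally dequeued; the
// cluster may have changed in the meantime, so a stale decision must
// degrade to a no-op rather than a wrong action.
func process(current rangeState) {
	if !current.needsRebalance {
		fmt.Println("state changed since shouldQueue; nothing to do")
		return
	}
	fmt.Println("rebalancing")
}

func main() {
	if ok, _ := shouldQueue(rangeState{needsRebalance: true}); ok {
		// Simulate the range changing while the item sat in the queue.
		process(rangeState{needsRebalance: false})
	}
}
```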
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1969 at r1 (raw file):
leaseRepl interface {
	StoreID() roachpb.StoreID
	RaftStatus() *raft.Status
Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here? i.e., an inconsistency that was less likely to exist previously?
pkg/kv/kvserver/asim/queue/replicate_queue.go
line 80 at r1 (raw file):
if err != nil {
	log.VEventf(ctx, 1, "conf not found=%s, config=%s", desc, &conf)
	return false
Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.
Force-pushed from be35e0d to 8478b85.
Thanks for the feedback. Can you take one more pass?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/replica_command.go
line 3994 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: Use DescAndSpanConfig() so that both values are read under the same read lock.

cockroach/pkg/kv/kvserver/replica.go, lines 1033 to 1039 in 4c39761:

// DescAndSpanConfig returns the authoritative range descriptor as well
// as the span config for the replica.
func (r *Replica) DescAndSpanConfig() (*roachpb.RangeDescriptor, roachpb.SpanConfig) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc, r.mu.conf
}

Or is that the intention, that in a follow-up PR, SpanConfig calls the store reader? I still think it would be better to read under the same lock here, while the code is transitioning.
Good idea. I did change it in a future PR, but there is still a bit of work to do on that, so it's better to keep these combined here, as you mention.
pkg/kv/kvserver/replica_command.go
line 4016 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: same as above.
Done
pkg/kv/kvserver/replica_range_lease.go
line 1578 at r1 (raw file):
Previously, kvoli (Austen) wrote…
I don't think this needs to be done under a read lock now.
Changed. Only the lease status check needed the read lock, and it was easier to call it using the CurrentLeaseStatus method.
pkg/kv/kvserver/replicate_queue.go
line 654 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Why should shouldQueue be called again? It seems reasonable that the state of the cluster/range may have changed between calling shouldQueue and process.
Is there a benefit to doing this? I could see priority inversion being a problem, where we want to process the highest priority at some point in time. However, doing so doesn't seem feasible due to the overhead of checking each leaseholder replica every X seconds.
Removed this comment. I had originally thought we would want to keep the conf between calls, but it doesn't really serve a purpose, and this doesn't need to change.
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1969 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Do you anticipate any issues with the descriptor being a snapshot while the RaftStatus is pulled ad hoc here? i.e., an inconsistency that was less likely to exist previously?
That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure if that is better, though, since we want the "most up-to-date" Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later process call. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?).
pkg/kv/kvserver/asim/queue/replicate_queue.go
line 80 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Could this instead panic() or log.Fatalf? This isn't expected to ever occur in the simulator (yet). Similar to what you added below.
Changed to log.Fatalf. For the real queue it would have to just be a warning, but as you mention I don't think the simulator should ever hit this condition.
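A hedged sketch of the shape of that change, using the standard library's log.Fatalf rather than CockroachDB's util/log (which has its own Fatalf variant): in the simulator a missing conf is an invariant violation, so crashing loudly beats a verbosity-gated event nobody has enabled.

```go
package main

import "log"

type simRange struct{ confFound bool }

// processRange treats a missing conf as a programming error in the
// simulator rather than an expected runtime condition.
func processRange(r simRange) bool {
	if !r.confFound {
		// Crash loudly instead of logging at a verbosity level that
		// is normally disabled.
		log.Fatalf("conf not found for range; simulator invariant violated")
	}
	return true
}

func main() {
	processRange(simRange{confFound: true})
}
```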
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go
line 1969 at r1 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
That is a good point that there could be more inconsistency between these. One option would be to also check the RaftStatus at the start of the loop. I'm not sure if that is better, though, since we want the "most up-to-date" Raft status. Generally, the conf is still only held for a short time (<1s), especially since we are not going to reuse the conf from the shouldQueue call in the later process call. So while this does add a longer delay between the calls, I don't know if that is worse (or maybe it's even better?).
The conf being stale is okay. I agree, it is unlikely to change that quickly, and probably won't have disastrous consequences if it does.
The range descriptor being inconsistent with the raft status is the issue I mentioned above. That said, the status is only used for checking whether a target may need a snapshot afaik, then excluding it.
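For illustration only, a toy version of that exclusion; the field names and the "caught up" test are stand-ins, not the allocator's real logic. A candidate whose replicated log index trails the commit index is assumed to need a snapshot and is filtered out of the transfer targets.

```go
package main

import "fmt"

// candidate is an illustrative stand-in for a replica plus its raft
// progress as seen by the leader.
type candidate struct {
	id     int
	match  uint64 // highest log index known replicated to this replica
	commit uint64 // leader's commit index
}

// filterSnapshotTargets drops candidates that lag behind the commit
// index, on the (simplified) assumption that they would need a
// snapshot before they could usefully take over.
func filterSnapshotTargets(cands []candidate) []candidate {
	var out []candidate
	for _, c := range cands {
		if c.match >= c.commit {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cands := []candidate{{id: 1, match: 100, commit: 100}, {id: 2, match: 10, commit: 100}}
	fmt.Println(filterSnapshotTargets(cands)) // candidate 2 is excluded
}
```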
bors r=kvoli
Build succeeded.
Previously the different layers of the allocator would load the Desc and
SpanConf as needed; this risked them changing between the various loads
and could cause strange, hard-to-track-down races. Now they are loaded
once and passed through all the layers.

Epic: none

Release note: None
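To make the commit message concrete, a minimal sketch of the load-once-and-pass-through shape, with made-up names rather than the allocator's actual signatures: the snapshot is taken once, under one lock, and every layer receives the same pair as arguments.

```go
package main

import (
	"fmt"
	"sync"
)

type desc struct{ gen int }
type conf struct{ gen int }

type replica struct {
	mu sync.RWMutex
	d  desc
	c  conf
}

// descAndConf takes the snapshot exactly once, under one read lock.
func (r *replica) descAndConf() (desc, conf) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.d, r.c
}

// Each layer receives the snapshot as arguments instead of re-reading
// shared state, so all layers act on the same consistent pair.
func plan(d desc, c conf) string  { return fmt.Sprintf("plan(desc=%d conf=%d)", d.gen, c.gen) }
func apply(d desc, c conf) string { return fmt.Sprintf("apply(desc=%d conf=%d)", d.gen, c.gen) }

func main() {
	r := &replica{}
	d, c := r.descAndConf() // loaded once...
	fmt.Println(plan(d, c))
	fmt.Println(apply(d, c)) // ...and passed through every layer
}
```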