kvserver: track allocator errors via replicate queue action metrics
While allocator errors were intended to be reported via the "Replicate
Queue Failures by Allocator Action" metric, i.e.
`queue.replicate.<action>.error`, these were not getting reported. This
change makes sure to report these errors whenever we are not in a dry
run.

Epic: None

Release note: None
AlexTalks committed Dec 3, 2022
1 parent 7e04319 commit 7d1793b
Showing 2 changed files with 16 additions and 1 deletion.
8 changes: 8 additions & 0 deletions pkg/kv/kvserver/allocation_op.go
@@ -42,6 +42,8 @@ type AllocationTransferLeaseOp struct {
 	sideEffects func()
 }
 
+var _ AllocationOp = &AllocationTransferLeaseOp{}
+
 // lhBeingRemoved returns true when the leaseholder is will be removed if this
 // operation succeeds, otherwise false. This is always true for lease
 // transfers.
@@ -76,6 +78,8 @@ type AllocationChangeReplicasOp struct {
 	sideEffects func()
 }
 
+var _ AllocationOp = &AllocationChangeReplicasOp{}
+
 // lhBeingRemoved returns true when the voter removals for this change replicas
 // operation includes the leaseholder store.
 func (o AllocationChangeReplicasOp) lhBeingRemoved() bool {
@@ -107,6 +111,8 @@ func (o AllocationChangeReplicasOp) trackPlanningMetrics() {
 // atomic change replicas operation and remove any remaining learners.
 type AllocationFinalizeAtomicReplicationOp struct{}
 
+var _ AllocationOp = &AllocationFinalizeAtomicReplicationOp{}
+
 // TODO(kvoli): This always returns false, however it is possible that the LH
 // may have been removed here.
 func (o AllocationFinalizeAtomicReplicationOp) lhBeingRemoved() bool { return false }
@@ -116,6 +122,8 @@ func (o AllocationFinalizeAtomicReplicationOp) trackPlanningMetrics()
 // AllocationNoop represents no operation.
 type AllocationNoop struct{}
 
+var _ AllocationOp = &AllocationNoop{}
+
 func (o AllocationNoop) lhBeingRemoved() bool { return false }
 func (o AllocationNoop) applyImpact(storepool storepool.AllocatorStorePool) {}
 func (o AllocationNoop) trackPlanningMetrics() {}
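
The `var _ AllocationOp = &...{}` lines in this file are compile-time interface assertions: they make the build fail if a type stops satisfying the interface, without allocating a named variable. A minimal standalone sketch of the idiom follows; `Op` and `Noop` are hypothetical stand-ins, not the kvserver types, and the method set is reduced for illustration.

```go
package main

import "fmt"

// Op is a hypothetical, reduced stand-in for an interface like AllocationOp.
type Op interface {
	lhBeingRemoved() bool
	trackPlanningMetrics()
}

// Noop is a hypothetical operation that does nothing.
type Noop struct{}

// Compile-time assertion: if *Noop ever stops implementing Op (for example,
// a method is renamed or its signature changes), this line fails to compile.
// Assigning to the blank identifier means no variable is kept.
var _ Op = &Noop{}

func (o Noop) lhBeingRemoved() bool  { return false }
func (o Noop) trackPlanningMetrics() {}

func main() {
	var op Op = Noop{}
	fmt.Println(op.lhBeingRemoved()) // prints "false"
}
```

The assertion costs nothing at runtime; its value is that a broken implementation is reported next to the type definition rather than at some distant call site.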
9 changes: 8 additions & 1 deletion pkg/kv/kvserver/replicate_queue.go
@@ -938,7 +938,14 @@ func (rq *replicateQueue) processOneChange(
 	// will change quickly enough in order to not get the same error and
 	// outcome.
 	if err != nil {
-		// Annotate the planning error if it is associated with a decomission
+		// If there was a change during the planning process, possibly due to
+		// allocator errors finding a target, we should report this as a failure
+		// for the associated allocator action metric if we are not in dry run.
+		if !dryRun {
+			rq.metrics.trackErrorByAllocatorAction(ctx, change.Action)
+		}
+
+		// Annotate the planning error if it is associated with a decommission
 		// allocator action so that the replica will be put into purgatory
 		// rather than waiting for the next scanner cycle. This is also done
 		// for application failures below.