
allocator: prioritize non-voter promotion to satisfy voter constraints #111609

Merged

Conversation

Collaborator

@kvoli kvoli commented Oct 2, 2023

Previously, it was possible for a satisfiable voter constraint to never
be satisfied when:

1. There was the correct number of `VOTER` and `NON_VOTER` replicas.
2. All existing replicas were necessary to satisfy a replica constraint
   or a voter constraint.

The allocator relies on the `RebalanceVoter` path to resolve voter
constraint violations when there is a correct number of each replica
type.

Candidates which are necessary to satisfy a constraint are ranked
higher as rebalance targets than those which are not. Under most
circumstances this leads to constraint conformance. However, when every
existing replica is necessary to satisfy a replica constraint and a
voter constraint is unsatisfied, `RebalanceVoter` would not consider
swapping a `VOTER` and a `NON_VOTER` to satisfy the voter constraint.

For example, consider a setup where there are two stores, one in
locality `a` and the other in `b`, where some range has the following
config and initial placement:

```
replicas          = a(non-voter) b(voter)
constraints       = a:1 b:1
voter_constraints = a:1
```

In this example, the only satisfiable placement is `a(voter)
b(non-voter)`, which would require promoting `a(non-voter) -> a(voter)`
and demoting `b(voter) -> b(non-voter)`. However, both existing
replicas are necessary to satisfy `constraints`, so no rebalance
occurs.
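
To make the example concrete, here is a small standalone Go sketch
(illustration only, not CockroachDB code; the `replica` type and
`satisfies` helper are invented) that checks both possible placements
against `constraints` and `voter_constraints`:

```go
package main

import "fmt"

type replica struct {
	locality string
	voter    bool
}

// satisfies reports whether the placement has at least `want` replicas
// in `locality`, optionally counting voters only.
func satisfies(placement []replica, locality string, want int, votersOnly bool) bool {
	count := 0
	for _, r := range placement {
		if r.locality == locality && (!votersOnly || r.voter) {
			count++
		}
	}
	return count >= want
}

func main() {
	placements := map[string][]replica{
		"a(non-voter) b(voter)": {{locality: "a", voter: false}, {locality: "b", voter: true}},
		"a(voter) b(non-voter)": {{locality: "a", voter: true}, {locality: "b", voter: false}},
	}
	for name, p := range placements {
		// constraints = a:1 b:1 (any replica type), voter_constraints = a:1.
		ok := satisfies(p, "a", 1, false) &&
			satisfies(p, "b", 1, false) &&
			satisfies(p, "a", 1, true)
		fmt.Printf("%s -> conforms: %t\n", name, ok)
	}
}
```

Only `a(voter) b(non-voter)` conforms, which is exactly the swap the
allocator previously refused to make.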

Add an additional field to the allocator candidate struct, which is used
to sort rebalance candidates. The new field, `voterNecessary`, sorts
strictly after `necessary` but before `diversityScore`.

The `voterNecessary` field can be true only when rebalancing voters, and
only when the rebalance candidate is necessary to satisfy a voter
constraint, the rebalance candidate already has a non-voter, and the
existing voter is not necessary to satisfy any voter constraint.

Note that these rebalances are turned into swaps (promotion and
demotion) in `plan.ReplicationChangesForRebalance`, so they incur no
snapshots.
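
The resulting sort priority can be pictured with a simplified comparison
like the one below. This is a sketch under the assumption of a
boolean-then-score ordering, not the actual `allocatorimpl` candidate
comparison, which weighs many more attributes:

```go
// candidate is a pared-down illustration of an allocator rebalance
// candidate, keeping only the fields relevant to this change.
type candidate struct {
	necessary      bool    // needed to satisfy an all-replica constraint
	voterNecessary bool    // needed to satisfy a voter constraint (new field)
	diversityScore float64 // higher is better
}

// betterThan reports whether c should sort ahead of o as a rebalance
// target: necessary first, then voterNecessary, then diversity.
func (c candidate) betterThan(o candidate) bool {
	if c.necessary != o.necessary {
		return c.necessary
	}
	if c.voterNecessary != o.voterNecessary {
		return c.voterNecessary
	}
	return c.diversityScore > o.diversityScore
}
```

With this ordering, a store that is needed to satisfy a voter constraint
outranks one that only improves diversity, but never outranks a store
that is necessary for an all-replica constraint.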

Fixes: #98020
Fixes: #106559
Fixes: #108127

Release note (bug fix): Voter constraints which were never satisfied due
to all existing replicas being considered necessary to satisfy a replica
constraint, will now be satisfied by promoting existing non-voters.

@kvoli kvoli self-assigned this Oct 2, 2023
@cockroach-teamcity
Member

This change is Reviewable

@kvoli kvoli force-pushed the 230922.allocator-voter-constraint-violation branch from 0fa98e5 to abe8923 Compare October 2, 2023 21:02
@kvoli kvoli changed the title allocator: promote non-voters to satisfy voter constraint allocator: prioritize non-voter promotion to satisfy voter constraints Oct 2, 2023
@kvoli kvoli force-pushed the 230922.allocator-voter-constraint-violation branch 6 times, most recently from c55adce to e123576 Compare October 3, 2023 14:13
@kvoli kvoli marked this pull request as ready for review October 3, 2023 14:59
@kvoli kvoli requested review from a team as code owners October 3, 2023 14:59
@kvoli kvoli requested a review from sumeerbhola October 3, 2023 14:59
kvoli added 2 commits October 3, 2023 17:47
Include the replicate queue trace and expected matching string for
`TestReplicateQueueShouldQueueNonVoter`.

Epic: none
Release note: None
`TestReplicateQueueShouldQueueNonVoter` takes 300+ seconds when run
under stress race. The majority of this time is spent setting up the
replica placement for the should-queue check that follows.

Skip under stress race.

Epic: none
Release note: None
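
For reference, skipping under stress race is typically a one-line guard
at the top of the test via the `testutils/skip` package. A minimal
sketch of the pattern (the package name and skip message below are
assumptions, not copied from this PR):

```go
package kvserver_test // package name assumed for illustration

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils/skip"
)

func TestReplicateQueueShouldQueueNonVoter(t *testing.T) {
	// Setting up the replica placement takes 300+ seconds under stress
	// race, so skip the test in that configuration.
	skip.UnderStressRace(t, "takes >300s to set up replica placement")

	// ... remainder of the test ...
}
```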
@kvoli kvoli force-pushed the 230922.allocator-voter-constraint-violation branch from e123576 to d871218 Compare October 3, 2023 21:47
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 6 of 6 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


-- commits line 77 at r4:
nit: the code calls it necessaryVoter


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1788 at r4 (raw file):

		// removed) in order to satisfy a voter constraint. When every replica is
		// necessary to satisfy an all-replica, or voter constraint, the remove
		// replica will not always be the existingCandidate.

should this be "the remove replica will always be the existingCandidate"?


pkg/kv/kvserver/asim/tests/testdata/non_rand/example_conformance line 53 at r3 (raw file):

# #106559.
set_span_config delay=15m
[0,10000): num_replicas=6 num_voters=5 constraints={'+region=eu-west-1':1,'+region=us-central-1':1,'+region=us-east-1':1,'+region=us-west-1':1} voter_constraints={'+region=us-west-1':2,'+region=us-east-1':2}

We've increased the number of voters we need in us-west-1, so I thought that was the constraint that would not be satisfied.
But r1 has (n6,s6):9, (n7,s7):4 which are in us-west-1.
Can you add some code comments specifying what is unexpected in the violating constraints output?

@kvoli kvoli force-pushed the 230922.allocator-voter-constraint-violation branch from d871218 to 3681197 Compare October 5, 2023 21:55
kvoli added 2 commits October 6, 2023 00:53
Add a simulator reproduction for the voter constraint violation bug
tracked in cockroachdb#106559.

The reproduction uses 10 ranges, using a simplified locality setup to
that seen in the linked issue. Initially, the span config specifies the
final replica position seen in the issue, then after 5 minutes switches
the span config to require a voter <-> non-voter swap between the
localities.

As expected, the conformance assertion fails due to voter constraint
violations.

A datadriven simulator command `SetNodeLocality` is added to simplify
the reproduction.

Part of: cockroachdb#106559
Epic: None
Release note: None
@kvoli kvoli force-pushed the 230922.allocator-voter-constraint-violation branch from 3681197 to ed51752 Compare October 6, 2023 01:06
Collaborator Author

@kvoli kvoli left a comment


TYFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @sumeerbhola)


-- commits line 77 at r4:

Previously, sumeerbhola wrote…

nit: the code calls it necessaryVoter

Renamed to `voterNecessary` in the code.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1788 at r4 (raw file):

Previously, sumeerbhola wrote…

should this be "the remove replica will always be the existingCandidate"?

I was referring to the `simulateRemoveTarget` below, which is normally selected as the same target as the replica being removed. Here it won't be, because the `voterNecessary` field is only used for rebalancing. Updated to extend the comment.


pkg/kv/kvserver/asim/tests/testdata/non_rand/example_conformance line 53 at r3 (raw file):

Previously, sumeerbhola wrote…

We've increased the number of voters we need in us-west-1, so I thought that was the constraint that would not be satisfied.
But r1 has (n6,s6):9, (n7,s7):4 which are in us-west-1.
Can you add some code comments specifying what is unexpected in the violating constraints output?

If the nodes in the AU locality were not added, then that would be the case. I realized that the initial config constraints were running into the same bug here, so the placement doesn't actually match the initial span config applied.

This led to the AU locality nodes violating the constraints for some of the ranges, as the issue persisted.

I updated the test to use simpler localities and removed the AU nodes by setting the node locality after the cluster was set up, rather than adding nodes with a specific locality (a current shortcoming of the test harness).

Added additional comments as well.

Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 3 of 8 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/asim/tests/testdata/non_rand/example_conformance line 53 at r3 (raw file):

Previously, kvoli (Austen) wrote…

If the nodes in the AU locality were not added, then that would be the case. I realized that the initial config constraints were running into the same bug here, so the placement doesn't actually match the initial span config applied.

This led to the AU locality nodes violating the constraints for some of the ranges, as the issue persisted.

I updated the test to use simpler localities and removed the AU nodes by setting the node locality after the cluster was set up, rather than adding nodes with a specific locality (a current shortcoming of the test harness).

Added additional comments as well.

This is easier to understand -- thanks.
Is there a way to log the placement after 4m, to show the initial assignment, and to log the placement after 10m (even for the successful case -- I realize that the unsuccessful one has the failed assertion that logs the placement).

Collaborator Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/asim/tests/testdata/non_rand/example_conformance line 53 at r3 (raw file):

Previously, sumeerbhola wrote…

This is easier to understand -- thanks.
Is there a way to log the placement after 4m, to show the initial assignment, and to log the placement after 10m (even for the successful case -- I realize that the unsuccessful one has the failed assertion that logs the placement).

This is possible in the rand_test framework, but not with arbitrary localities and constraints.

https://github.com/kvoli/cockroach/blob/ed5175253df75879c39d3c8ec155364852167ea1/pkg/kv/kvserver/asim/tests/rand_test.go#L140-L148

So there's no easy way atm, but it would be nice to add in the near future.

@kvoli
Collaborator Author

kvoli commented Oct 6, 2023

bors r=sumeerbhola

@craig
Contributor

craig bot commented Oct 6, 2023

Build succeeded:

@craig craig bot merged commit 44c96e9 into cockroachdb:master Oct 6, 2023