Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: refactor replicate queue to enable allocator code reuse #94114

Merged
merged 2 commits into from
Jan 7, 2023

Conversation

AlexTalks
Copy link
Contributor

@AlexTalks AlexTalks commented Dec 21, 2022

This change refactors parts of the replicate queue's PlanOneChange(..)
and addOrRemove{Non}Voters(..) functions to reusable helper functions
that simplify usage of the allocator and deduplicate repeated code
paths. The change also adds convenience methods to the AllocatorAction
enum, to move certain determinations (such as if a computed allocator
action is a remove or a replace) closer to the allocator type it is
based on. These changes further enable the ability to use the allocator
outside of the replicate queue, and enable the ability for some of this
logic to move into the allocator itself in the future.

Depends on #91941.

Epic: none

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@AlexTalks AlexTalks force-pushed the dpf_rq_refactor branch 3 times, most recently from d45aa72 to fdce47e Compare December 22, 2022 06:17
@AlexTalks AlexTalks changed the title kvserver: refactor replicate queue kvserver: refactor replicate queue to enable allocator code reuse Dec 22, 2022
@AlexTalks AlexTalks requested a review from kvoli December 22, 2022 06:22
@AlexTalks AlexTalks marked this pull request as ready for review December 22, 2022 06:22
@AlexTalks AlexTalks requested a review from a team as a code owner December 22, 2022 06:22
// In case of an add action, no replicas are removed and -1 is returned, and if no
// candidates for replacement can be found during a replace action, the returned
// nothingToDo flag will be set to true.
// TODO(sarkesian): If possible, move this logic into the allocator.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should certainly try and move this into the allocatorimpl pkg.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 1178 at r1 (raw file):

Previously, kvoli (Austen) wrote…

We should certainly try and move this into the allocatorimpl pkg.

Done.

Also moved the "avoid fragile quorum" check into the allocator as well.

Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There is one test failing that I would look into:

TestPromoteNonVoterInAddVoter

There is fragile logic which decides to promote a non-voter on removal.

Reviewed 1 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks)


pkg/kv/kvserver/replicate_queue.go line 1303 at r2 (raw file):

) (op AllocationOp, _ error) {
	effects := effectBuilder{}
	conf := repl.SpanConfig()

This should be passed in to be consistent with the other commit that makes desc/conf consistent.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 1303 at r2 (raw file):

Previously, kvoli (Austen) wrote…

This should be passed in to be consistent with the other commit that makes desc/conf consistent.

Some work will be required to rebase that change on top of all the other changes to the replicate queue/allocator, so I'll wait until these are merged to do that.

Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)


pkg/kv/kvserver/replicate_queue.go line 1178 at r1 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Done.

Also moved the "avoid fragile quorum" check into the allocator as well.

Great!


pkg/kv/kvserver/replicate_queue.go line 1303 at r2 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Some work will be required to rebase that change on top of all the other changes to the replicate queue/allocator, so I'll wait until these are merged to do that.

ack


pkg/kv/kvserver/replicate_queue.go line 1038 at r5 (raw file):

				action,
				voterReplicas, nonVoterReplicas,
				liveVoterReplicas, deadVoterReplicas,

Just curious - was this the source of the test failures previously?


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 136 at r5 (raw file):

)

func (a AllocatorAction) Add() bool {

nit: does the linter enforce export function commenting? These are pretty self explanatory, just wondering why that got removed.

Copy link
Contributor Author

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replicate_queue.go line 1303 at r2 (raw file):

Previously, kvoli (Austen) wrote…

ack

Done.


pkg/kv/kvserver/replicate_queue.go line 1038 at r5 (raw file):

Previously, kvoli (Austen) wrote…

Just curious - was this the source of the test failures previously?

One of them, unfortunately.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 136 at r5 (raw file):

Previously, kvoli (Austen) wrote…

nit: does the linter enforce export function commenting? These are pretty self explanatory, just wondering why that got removed.

I don't think so, can add if needed.

This change refactors parts of the replicate queue's `PlanOneChange(..)`
and `addOrRemove{Non}Voters(..)` functions to reusable helper functions
that simplify usage of the allocator and deduplicate repeated code
paths. The change also adds convenience methods to the `AllocatorAction`
enum, to move certain determinations (such as if a computed allocator
action is a remove or a replace) closer to the allocator type it is
based on. These changes move more of the logic needed to use the
allocator into the `allocatorimpl` package itself, enabling usage of the
allocator outside of the replicate queue.

Part of cockroachdb#91571.

Release note: None
While previously we checked that we can avoid a fragile quorum during
the addition of new voters in the replicate queue, this change moves the
check logic into the allocator code itself, allowing it to be reused by
other users of the allocator. This enables us to perform this check when
evaluating decommission viability, or anywhere else that uses the
allocator for new voter replicas.

Part of cockroachdb#91570.

Release note: None
@AlexTalks
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 7, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 7, 2023

Build succeeded:

@craig craig bot merged commit 7594fb2 into cockroachdb:master Jan 7, 2023
@AlexTalks AlexTalks deleted the dpf_rq_refactor branch January 7, 2023 09:26
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 7, 2023
This change exposes support via a store for checking the allocator
action and upreplication target (if applicable) for any range descriptor.
The range does not need to have a replica on the given store, nor is it
required to evaluate given the current state of the cluster (i.e. the
store's configured StorePool), as a store pool override can be
provided in order to evaluate possible future states.

Depends on cockroachdb#94114.

Part of cockroachdb#91570.

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 10, 2023
This change exposes support via a store for checking the allocator
action and upreplication target (if applicable) for any range descriptor.
The range does not need to have a replica on the given store, nor is it
required to evaluate given the current state of the cluster (i.e. the
store's configured StorePool), as a store pool override can be
provided in order to evaluate possible future states.

Depends on cockroachdb#94114.

Part of cockroachdb#91570.

Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Jan 10, 2023
This change exposes support via a store for checking the allocator
action and upreplication target (if applicable) for any range descriptor.
The range does not need to have a replica on the given store, nor is it
required to evaluate given the current state of the cluster (i.e. the
store's configured StorePool), as a store pool override can be
provided in order to evaluate possible future states.

Depends on cockroachdb#94114.

Part of cockroachdb#91570.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 10, 2023
94024: kvserver: support checking allocator action and target by range r=AlexTalks a=AlexTalks

This change exposes support via a store for checking the allocator
action and upreplication target (if applicable) for any range descriptor.
The range does not need to have a replica on the given store, nor is it
required to evaluate given the current state of the cluster (i.e. the
store's configured StorePool), as a store pool override can be
provided in order to evaluate possible future states.

Depends on #94114.

Part of #91570.

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants