kvserver: sustained constraint non-conformance for MR schemas #108127

Closed
irfansharif opened this issue Aug 3, 2023 · 13 comments · Fixed by #111609
Labels
A-kv-distribution Relating to rebalancing and leasing.
A-kv-server Relating to the KV-level RPC server
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv KV Team

Comments

@irfansharif
Contributor

irfansharif commented Aug 3, 2023

Originally posted by @dikshant in #106128 (comment)

Here is a debug zip; see below for repro steps.
https://drive.google.com/file/d/1Ilkl1vWS8CpyuNDku93dC9XLs0U9aI4k/view?usp=sharing

I tried this on v23.1.7 on an 18-node multi-region cluster in roachprod.

So this is interesting. Mapping replicas to replica_localities using @j82w's fixed query shows the correct mappings:

root@localhost:26257/meetup> SELECT DISTINCT
                          ->       split_part(unnest(replica_localities), ',', 2) replica_localities,
                          ->       unnest(replicas) replica,
                          ->       range_id
                          ->     FROM [SHOW RANGE FROM TABLE product FOR ROW ('europe-west1', '2f22da46-d983-4878-8ad2-a6e6ff7e8f39')];
    replica_localities   | replica | range_id
-------------------------+---------+-----------
  region=europe-west1    |       7 |       69
  region=europe-west1    |       9 |       69
  region=europe-central2 |      11 |       69
  region=europe-central2 |      12 |       69
  region=europe-north1   |      15 |       69
(5 rows)

However, the violating range is still present, even after waiting 10+ minutes:

root@localhost:26257/meetup> SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
  zone_id | subzone_id |       type       |       config       | report_id |        violation_start        | violating_ranges
----------+------------+------------------+--------------------+-----------+-------------------------------+-------------------
      116 |          0 | voter_constraint | +region=us-east4:2 |         1 | 2023-08-02 23:44:58.271424+00 |                3
(1 row)

Time: 105ms total (execution 105ms / network 0ms)

Reproduction steps:

  1. Create an MR cluster. I used:

    roachprod create dikshant-test -n 18 --gce-zones 'us-east4-a','us-east4-a','us-east4-a','us-central1-a','us-central1-a','us-central1-a','europe-west1-b','europe-west1-b','europe-west1-b','europe-central2-b','europe-central2-b','europe-central2-b','europe-north1-b','europe-north1-b','europe-north1-b','us-west1-a','us-west1-a','us-west1-a' && roachprod stage dikshant-test release v23.1.7 && roachprod start dikshant-test:1-18
    
  2. Apply the following DDL and DML:
    https://gist.github.com/dikshant/d4d170d70e493119b7cb6306aedb7551

  3. Check for violating ranges after waiting for ~10 minutes:

    SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
    

It seems the violating range's config always references the primary region. I don't know if this is expected behavior.

For example, I ran an ALTER to change the primary region:

ALTER DATABASE "meetup" SET PRIMARY REGION "us-west1";
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;

And got:

  zone_id | subzone_id |       type       |       config       | report_id |        violation_start        | violating_ranges
----------+------------+------------------+--------------------+-----------+-------------------------------+-------------------
      116 |          0 | voter_constraint | +region=us-west1:2 |         1 | 2023-08-03 00:11:17.833192+00 |                2
(1 row)

Whereas running:

SET alter_primary_region_super_region_override = 'on';
ALTER DATABASE "meetup" SET PRIMARY REGION "europe-west1";

Gives us (after waiting a bit):

SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
  zone_id | subzone_id |       type       |         config         | report_id |        violation_start        | violating_ranges
----------+------------+------------------+------------------------+-----------+-------------------------------+-------------------
      116 |          0 | voter_constraint | +region=europe-west1:2 |         1 | 2023-08-03 00:16:19.488701+00 |                8

Jira issue: CRDB-30324

@blathers-crl

blathers-crl bot commented Aug 3, 2023

Hi @irfansharif, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 3, 2023
@irfansharif irfansharif added the A-kv-server Relating to the KV-level RPC server label Aug 3, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Aug 3, 2023
@irfansharif irfansharif added the A-kv-replication Relating to Raft, consensus, and coordination. label Aug 3, 2023
@blathers-crl

blathers-crl bot commented Aug 3, 2023

cc @cockroachdb/replication

@andrewbaptist andrewbaptist removed A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Aug 4, 2023
@shralex shralex added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 4, 2023
@arulajmani arulajmani added the A-kv-distribution Relating to rebalancing and leasing. label Aug 4, 2023
@kvoli
Collaborator

kvoli commented Aug 7, 2023

It looks likely that this is an instance of #106559.

The violated voter constraint:

root@localhost:26257/defaultdb> SELECT full_config_yaml
                             ->   FROM crdb_internal.zones
                             ->   INNER JOIN system.replication_constraint_stats
                             ->     ON crdb_internal.zones.zone_id = system.replication_constraint_stats.zone_id
                             ->   WHERE system.replication_constraint_stats.violating_ranges > 0;
                                                                         full_config_yaml
------------------------------------------------------------------------------------------------------------------------------------------------------------------
  range_min_bytes: 134217728
  range_max_bytes: 536870912
  gc:
    ttlseconds: 14400
  global_reads: null
  num_replicas: 7
  num_voters: 5
  constraints: {+region=europe-central2: 1, +region=europe-north1: 1, +region=europe-west1: 1, +region=us-central1: 1, +region=us-east4: 1, +region=us-west1: 1}
  voter_constraints: {+region=us-east4: 2}
  lease_preferences: [[+region=us-east4]]

(1 row)

Time: 1.270s total (execution 1.270s / network 0.000s)

@andrewbaptist
Collaborator

@kvoli can you clarify the following statement? "The number of constraints for each region should be greater than or equal to the sum of the voter_constraints and non_voter_constraints for that region." Should we add a doc change to specify this and let customers who violate it know? We should still leave the issue open and eventually fix it, but clear guidance on how to avoid this will be important. Is there ever a reason why this wouldn't make sense to do?

@kvoli
Collaborator

kvoli commented Aug 7, 2023

There is no such thing as non_voter_constraints. These are zone configs which SQL generates for the user from higher-level MR primitives.

There used to be a loose invariant (which some time ago stopped being true) that:

// NB: This assumes that the sum of all constraints.NumReplicas is equal to
// configured number of replicas for the range, or that there's just one set of
// constraints with NumReplicas set to 0. This is meant to be enforced in the
// config package.

@andrewbaptist
Collaborator

Ah ok, that makes sense. Does it still make sense to strongly recommend (or force) that num_voters in a region be <= the constraints for that region? I'm not sure I fully understand why the constraints are set up as they are, so there may be a use case I'm missing.

@kvoli kvoli removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 8, 2023
@kvoli
Collaborator

kvoli commented Aug 8, 2023

Ah ok, that makes sense. Does it still make sense to strongly recommend (or force) that num_voters in a region be <= the constraints for that region? I'm not sure I fully understand why the constraints are set up as they are, so there may be a use case I'm missing.

It does make sense. I think we need to take a stroll through the SQL code which validates and generates these configs, and look for some low-hanging fruit like this (if possible).
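
As a sketch of the kind of low-hanging-fruit validation being suggested here (a hypothetical helper, not the actual SQL validation code): for every region that voter_constraints pins voters to, require that the all-replica constraints reserve at least that many replicas there.

```go
package main

import "fmt"

// checkVoterConstraintsCovered is a hypothetical validation, not CockroachDB
// code: every region that voter_constraints pins voters to should have at
// least as many replicas reserved in the all-replica constraints.
func checkVoterConstraintsCovered(constraints, voterConstraints map[string]int) error {
	for region, voters := range voterConstraints {
		if reserved := constraints[region]; reserved < voters {
			return fmt.Errorf(
				"region %s: voter_constraints requires %d voters but constraints only reserve %d replicas",
				region, voters, reserved)
		}
	}
	return nil
}

func main() {
	// The config from this issue would fail this check: voter_constraints
	// wants 2 voters in us-east4 while constraints reserve only 1 replica.
	constraints := map[string]int{
		"europe-central2": 1, "europe-north1": 1, "europe-west1": 1,
		"us-central1": 1, "us-east4": 1, "us-west1": 1,
	}
	voterConstraints := map[string]int{"us-east4": 2}
	fmt.Println(checkVoterConstraintsCovered(constraints, voterConstraints))
}
```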

@dikshant

dikshant commented Aug 11, 2023

I was speaking to @otan, who mentioned that the code for generating the zone config from high-level abstractions is located here:
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/region_util.go#L765-L765

I also realized that this particular issue only happens when using Super Regions + Regional By Row Tables. With Super Regions + Regional tables, I was unable to reproduce.

@rafiss
Collaborator

rafiss commented Aug 11, 2023

I also realized that this particular issue only happens when using Super Regions + Regional By Row Tables. With Super Regions + Regional tables, I was unable to reproduce.

I saw above that this issue was linked to #106559, which was initially discovered through a test that only uses secondary regions (no super regions or RBR). I don't understand yet how the issues relate; could someone explain?

@kvoli
Collaborator

kvoli commented Aug 14, 2023

I saw above that this issue was linked to #106559, which was initially discovered through a test that only uses secondary regions (no super regions or RBR). I don't understand yet how the issues relate; could someone explain?

They both have configs which require a non-voter in a specific region, and voter constraints which aren't also present in the all-replica constraints. They both end up with a voter where there should be a non-voter, which needs to be swapped with a non-voter (in the voter locality) to satisfy the voter constraint.

So they are related because they both have configs which are vulnerable to the bug described in #106559.

This issue's zcfg:

```
num_replicas: 7
num_voters: 5
constraints: {+region=europe-central2: 1, +region=europe-north1: 1, +region=europe-west1: 1, +region=us-central1: 1, +region=us-east4: 1, +region=us-west1: 1}
voter_constraints: {+region=us-east4: 2}
```

Other issue's zcfg:

```
num_replicas=6
num_voters=5
constraints={'+region=eu-west-1':1,'+region=us-central-1':1,'+region=us-east-1':1,'+region=us-west-1':1}
voter_constraints={'+region=us-west-1':2,'+region=us-east-1':2}
```
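
To make the mechanics concrete, here is a hedged walkthrough of this issue's config (the counts come from the zcfg above; the framing is illustrative, not allocator code):

```go
package main

import "fmt"

func main() {
	// Counts from this issue's zcfg above.
	numReplicas := 7 // num_replicas
	pinned := 6      // constraints pin one replica in each of six regions
	free := numReplicas - pinned
	fmt.Printf("replicas not pinned by constraints: %d\n", free) // 1

	// voter_constraints wants 2 voters in us-east4 while constraints reserve
	// only 1 replica there, so the second us-east4 voter must be the single
	// free replica. If the allocator instead places that free replica in
	// us-east4 as a NON_VOTER, every replica is "necessary" to satisfy
	// constraints, and (before the fix below) RebalanceVoter would never
	// consider the voter/non-voter swap that resolves the violation.
	votersWanted, reserved := 2, 1
	fmt.Printf("us-east4 voters needed beyond constraints: %d\n", votersWanted-reserved) // 1
}
```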

kvoli added a commit to kvoli/cockroach that referenced this issue Oct 3, 2023
Previously, it was possible for a satisfiable voter constraint to never
be satisfied when:

1. There was the correct number of `VOTER` and `NON_VOTER` replicas.
2. All existing replicas were necessary to satisfy a replica constraint,
   or voter constraint.

The allocator relies on the `RebalanceVoter` path to resolve voter
constraint violations when there is the correct number of each replica
type.

Candidates which are `necessary` to satisfy a constraint are
ranked higher as rebalance targets than those which are not. Under most
circumstances this leads to constraint conformance. However, when every
existing replica is necessary to satisfy a replica constraint, and a
voter constraint is unsatisfied -- `RebalanceVoter` would not consider
swapping a `VOTER` and `NON_VOTER` to satisfy the constraint.

For example, consider a setup where there are two stores, one in
locality `a` and the other `b`, where some range has the following
config and initial placement:

```
replicas          = a(non-voter) b(voter)
constraints       = a:1 b:1
voter_constraints = a:1
```

In this example, the only satisfiable placement is `a(voter)`
`b(non-voter)`, which would require promoting `a(non-voter) ->
a(voter)`, and demoting `b(voter)->b(non-voter)`. However, both are
necessary to satisfy `constraints` leading to no rebalance occurring.

Add an additional field to the allocator candidate struct, which is used
to sort rebalance candidates. The new field, `voterNecessary`, is sorted
strictly after `necessary`, but before `diversityScore`.

The `voterNecessary` field can be true only when rebalancing voters, and
when the rebalance candidate is necessary to satisfy a voter constraint,
the rebalance candidate already has a non-voter, and the existing voter
is not necessary to satisfy *any* voter constraint.

Note these rebalances are turned into swaps (promotion and demotion) in
`plan.ReplicationChangesForRebalance`, so incur no snapshots.

Fixes: cockroachdb#98020
Fixes: cockroachdb#106559
Fixes: cockroachdb#108127

Release note (bug fix): Voter constraints which were never satisfied due
to all existing replicas being considered necessary to satisfy a replica
constraint, will now be satisfied by promoting existing non-voters.
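
For illustration, here is a minimal sketch (not CockroachDB's actual allocator code; the struct and ordering function are simplified stand-ins) of a candidate ordering where `voterNecessary` sorts strictly after `necessary` but before `diversityScore`, as the commit message describes:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a simplified stand-in for the allocator's candidate struct.
type candidate struct {
	store          int
	necessary      bool    // needed to satisfy an all-replica constraint
	voterNecessary bool    // voter constraint satisfiable by promoting a non-voter
	diversityScore float64 // higher is more diverse
}

// better ranks a above b: necessary first, then voterNecessary, then diversity.
func better(a, b candidate) bool {
	if a.necessary != b.necessary {
		return a.necessary
	}
	if a.voterNecessary != b.voterNecessary {
		return a.voterNecessary
	}
	return a.diversityScore > b.diversityScore
}

func main() {
	cands := []candidate{
		{store: 1, diversityScore: 0.9},
		{store: 2, voterNecessary: true, diversityScore: 0.1},
		{store: 3, necessary: true, diversityScore: 0.5},
	}
	sort.Slice(cands, func(i, j int) bool { return better(cands[i], cands[j]) })
	for _, c := range cands {
		fmt.Println("store", c.store) // store 3, then 2, then 1
	}
}
```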
craig bot pushed a commit that referenced this issue Oct 6, 2023
111609: allocator: prioritize non-voter promotion to satisfy voter constraints r=sumeerbhola a=kvoli

Co-authored-by: Austen McClernon <[email protected]>
@craig craig bot closed this as completed in ed51752 Oct 6, 2023
@kvoli
Collaborator

kvoli commented Oct 16, 2023

Re-opening based on conversation with @dikshant.

@kvoli kvoli reopened this Oct 16, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 17, 2023
Add a manual simulator reproduction for cockroachdb#108127.

Note the reproduction doesn't fail on this commit, but may not be
meeting the conditions for the failure when setting up the initial
replica placement.

Epic: none
Release note: None
@kvoli
Collaborator

kvoli commented Oct 17, 2023

I couldn't reproduce this following the original steps. The violated constraints get resolved within a few minutes of setting the config (the same config as listed above).

Note: to set up the cluster you'll need to use n1 machine types, since some zones don't support Ice Lake.

export CLUSTER=austen-repro-108127
roachprod create $CLUSTER -n 18 --gce-zones 'us-east4-a','us-east4-a','us-east4-a','us-central1-a','us-central1-a','us-central1-a','europe-west1-b','europe-west1-b','europe-west1-b','europe-central2-b','europe-central2-b','europe-central2-b',"europe-north1-b","europe-north1-b","europe-north1-b","us-west1-a","us-west1-a","us-west1-a" --gce-machine-type=n1-standard-4

I also tried reproducing with the allocation simulator, similar to a test added in the patch.

This was using v23.2.0-alpha.4-dev-2ff635e724fc56b1603e9871b033a15f82fa3d34, the latest master binary. The patch is on the release-23.2 branch, but might not have made it into an earlier alpha?

EDIT: The patch isn't in an alpha yet (v23.2.0-alpha.3 was cut 2 days before it merged).

I'll try again when the next alpha comes out; that (hopefully) explains the behavior seen in testing.

@kvoli
Collaborator

kvoli commented Oct 19, 2023

Closing again, this doesn't reproduce.
