
kv: replication reports consider range with non-voting replicas to be underreplicated #69335

Closed · nvanbenschoten opened this issue Aug 24, 2021 · 5 comments · Fixed by #76430
Labels: A-kv-observability, A-multiregion, C-bug

nvanbenschoten (Member) commented Aug 24, 2021

➜ ./cockroach demo --empty --nodes=4

root@127.0.0.1:26257/defaultdb> create table t (i int primary key);
CREATE TABLE

root@127.0.0.1:26257/defaultdb> alter table t configure zone using num_replicas = 3;
CONFIGURE ZONE 1

root@127.0.0.1:26257/defaultdb> SET CLUSTER setting kv.replication_reports.interval = '1s';
SET CLUSTER SETTING

root@127.0.0.1:26257/defaultdb> select * from system.replication_stats where zone_id = (select zone_id from crdb_internal.zones where table_name = 't');
  zone_id | subzone_id | report_id | total_ranges | unavailable_ranges | under_replicated_ranges | over_replicated_ranges
----------+------------+-----------+--------------+--------------------+-------------------------+-------------------------
       52 |          0 |         3 |            1 |                  0 |                       0 |                      0
(1 row)

root@127.0.0.1:26257/defaultdb> alter table t configure zone using num_replicas = 4, num_voters = 1;
CONFIGURE ZONE 1

root@127.0.0.1:26257/defaultdb> select * from system.replication_stats where zone_id = (select zone_id from crdb_internal.zones where table_name = 't');
  zone_id | subzone_id | report_id | total_ranges | unavailable_ranges | under_replicated_ranges | over_replicated_ranges
----------+------------+-----------+--------------+--------------------+-------------------------+-------------------------
       52 |          0 |         3 |            1 |                  0 |                       1 |                      0
(1 row)

This is because the replication reports only consult the NumReplicas field on zone configurations, and not the NumVoters field (when set):

// Only NumReplicas is consulted here; a configured NumVoters is never read.
if zone.NumReplicas != nil {
    numReplicas = int(*zone.NumReplicas)
    return true
}

cc. @aayushshah15

Jira issue: CRDB-9550

@nvanbenschoten nvanbenschoten added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-observability A-multiregion Related to multi-region T-kv KV Team labels Aug 24, 2021
@exalate-issue-sync exalate-issue-sync bot added T-kv-observability and removed T-kv KV Team labels Jan 19, 2022
rmloveland (Collaborator) commented

Is this also why the replication_constraint_stats replication report does not appear to show the table (or index) names of non-voting replicas that are violating constraints?

E.g. using this query from the docs:

WITH
    partition_violations
        AS (
            SELECT
                *
            FROM
                system.replication_constraint_stats
            WHERE
                violating_ranges > 0
        ),
    report
        AS (
            SELECT
                crdb_internal.zones.zone_id,
                crdb_internal.zones.subzone_id,
                target,
                database_name,
                table_name,
                index_name,
                partition_violations.type,
                partition_violations.config,
                partition_violations.violation_start,
                partition_violations.violating_ranges
            FROM
                crdb_internal.zones, partition_violations
            WHERE
                crdb_internal.zones.zone_id
                = partition_violations.zone_id
        )
SELECT * FROM report;

results in output with NULL table and index names:

  zone_id | subzone_id |    target     | database_name | table_name | index_name |    type    |         config         |        violation_start        | violating_ranges
----------+------------+---------------+---------------+------------+------------+------------+------------------------+-------------------------------+-------------------
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=europe-west1:1 | 2022-01-19 16:33:49.485535+00 |               16
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=us-west1:1     | 2022-01-19 16:34:49.930886+00 |               78
       52 |          0 | DATABASE movr | movr          | NULL       | NULL       | constraint | +region=us-east1:1     | 2022-01-19 16:34:49.930886+00 |               78

Please let me know if this is the wrong issue to comment on, or if I should open another one. This seems adjacent but IDK if it's the same issue.

(Discovered while working on DOC-1178)

nvanbenschoten (Member, Author) commented

@rmloveland I'm not sure. I don't see why this issue would have that effect, but I also don't understand this code enough to say that it wouldn't. I think it's worth opening a separate issue.

rmloveland (Collaborator) commented

> I think it's worth opening a separate issue.

OK thanks @nvanbenschoten - filed #75821

Santamaura (Contributor) commented Feb 9, 2022

I've been trying to reproduce this, but I actually don't get any rows in system.replication_stats when I do exactly as above. Is there another way to populate replication_stats?
[two screenshots showing the reproduction attempt returning zero rows from replication_stats]

Santamaura (Contributor) commented

Figured out why from the #kv Slack channel: reproducing this requires the --multitenant=false flag.

Santamaura added a commit to Santamaura/cockroach that referenced this issue Mar 3, 2022
kv: adjust number of voters needed calculation when determining replication status

Before this patch, the replication report's over/under-replication determination
for a range was broken for ranges with non-voting replicas; we were comparing the
desired number of voters against all the current replicas (voters and non-voters).
This patch fixes it by comparing the desired number of voters with the current voters.
In doing so, the over/under-replication counts in the report become defined as relating
exclusively to voters; a range with the wrong number of non-voters is not counted
anywhere in the report. Resolves cockroachdb#69335.

Release note (bug fix): A bug causing incorrect counts of under_replicated_ranges and
over_replicated_ranges in the crdb_internal.replication_stats table for multi-region
databases has been fixed.
craig bot pushed a commit that referenced this issue Mar 10, 2022
72991: server,sql: implement connection_wait for graceful draining r=ZhouXing19 a=ZhouXing19

Currently, the draining process consists of three consecutive periods:

1. Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phase's duration is set with the cluster setting `server.shutdown.drain_wait`.

2. Drain SQL connections: New SQL connections are not allowed. SQL connections with no queries in flight are closed by the server immediately, and the rest are terminated as soon as their queries finish. This phase exits early if all queries are finished. Its maximum duration is set with the cluster setting `server.shutdown.query_wait`.

3. Drain range leases: the server keeps retrying forever until all range leases on the draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

This commit reorganizes the draining process by adding a phase in which the server waits for SQL connections to be closed; once all SQL connections are closed before the timeout, the server proceeds to the next draining phase.

The newly proposed draining process is:

1. (unchanged) Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phase's duration is set with the cluster setting `server.shutdown.drain_wait`.

2. (new phase) Wait for SQL connections to close: New SQL connections are no longer allowed. The server waits for the remaining SQL connections to close, or for the timeout to expire. Once all SQL connections are closed, draining proceeds to the next phase. The maximum duration of this phase is determined by the cluster setting `server.shutdown.connection_wait`.

3. (unchanged) Drain SQL connections: New SQL connections are not allowed. SQL connections with no queries in flight are closed by the server immediately, and the rest are terminated as soon as their queries finish. This phase exits early if all queries are finished. Its maximum duration is set with the cluster setting `server.shutdown.query_wait`.

4. (unchanged) Drain range leases: the server keeps retrying forever until all range leases on the draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

The duration of the new phase ("Wait for SQL connections to close") can be set similarly to the other three existing draining phases:
```
SET CLUSTER SETTING server.shutdown.connection_wait = '40s'
```
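
For reference, the other phases named above are tuned through their own cluster settings in the same way; the durations below are illustrative values, not recommendations:
```
SET CLUSTER SETTING server.shutdown.drain_wait = '8s';
SET CLUSTER SETTING server.shutdown.connection_wait = '40s';
SET CLUSTER SETTING server.shutdown.query_wait = '10s';
SET CLUSTER SETTING server.shutdown.lease_transfer_timeout = '5s';
```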

Resolves #66319

Release note (ops change): add `server.shutdown.connection_wait` to the
draining process configuration. This provides a workaround for customers who
encountered intermittent blips and failed requests while performing
operations related to restarting nodes.

Release justification: Low risk, high benefit changes to existing functionality
(optimize the node draining process).

76430: [CRDB-9550] kv: adjust number of voters needed calculation when determining replication status r=Santamaura a=Santamaura

Currently, when a range has non-voting replicas and is queried through replication
stats, it is reported as underreplicated. This is because, when a zone is configured
to have non-voting replicas, the over/under-replication counts compare the current
number of voters to the total number of replicas, which is erroneous. Instead, we now
compare the current number of voters to the desired number of voters when it has been
set, and otherwise fall back to the total number of replicas. This patch ignores the
desired non-voter count for the purposes of this report, for better or worse.
Resolves #69335.

Release justification: low risk bug fix

Release note (bug fix): use total number of voters if set when determining replication
status

Before change:
![Screen Shot 2022-02-11 at 10 03 57 AM](https://user-images.githubusercontent.com/17861665/153615571-85163409-5bac-40f4-9669-20dce77185cf.png)

After change:
![Screen Shot 2022-02-11 at 9 53 04 AM](https://user-images.githubusercontent.com/17861665/153615316-785b156b-bd23-4cfa-a76d-7c9fa47fbf1e.png)

77315: backupccl: backup correctly tries reading in from base directory if latest/checkpoint files aren't found r=DarrylWong a=DarrylWong

Before, we only tried reading from the base directory if we caught an ErrFileDoesNotExist error. However,
this does not account for the error thrown when the progress/latest directories themselves don't exist.
This changes it so we now correctly retry reading from the base directory.
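
A hypothetical sketch of the described control flow; the helper names are invented for illustration and are not the actual backupccl functions:
```
// Try the latest/progress directory first; fall back to the base
// directory on any failure, not only ErrFileDoesNotExist, since a
// missing directory can surface as a different error.
data, err := readFromProgressDir(ctx, store)
if err != nil {
	data, err = readFromBaseDir(ctx, store)
}
```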

We also put the latest directory inside a metadata directory, to avoid potential conflicts
between a latest file and a latest directory existing in the same base directory.

Also wraps errors in findLatestFile and readLatestCheckpointFile for more clarity when
reads from both the base and latest/progress directories fail.

Fixes #77312

Release justification: Low risk bug fix
Release note: none

77406: backupccl: test ignore ProtectionPolicy for exclude_data_from_backup r=dt a=adityamaru

This change adds an end-to-end test to ensure that a table excluded
from backup will not hold up GC on its replica, even in the presence
of a protected timestamp record covering the replica.

From a user's point of view, this allows them to mark a table whose
row data will be excluded from backup, and to set that table's gc.ttl
to a very low value. Backups that write PTS records will no longer
hold up GC on such low-GC-TTL tables.

Fixes: #73536

Release note: None

Release justification: low risk update to new functionality

77450: ui: add selected period as part of cached key r=maryliag a=maryliag

Previously, the fingerprint id and the app names were used
as the key for the statement details cache. This commit adds
the start and end time (when present) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected period to the hour,
since that is what the persisted statistics use, with
the start value keeping its hour and the end value rounding
up one hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00
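
A minimal sketch of that rounding, shown in Go for illustration (the actual change lives in the UI code; requires the standard "time" package):
```
// roundPeriod keeps the start value's hour and rounds the end value
// up to the next hour boundary, unless it is already on one.
func roundPeriod(start, end time.Time) (time.Time, time.Time) {
	start = start.Truncate(time.Hour) // 17:45:23 -> 17:00:00
	if t := end.Truncate(time.Hour); !t.Equal(end) {
		end = t.Add(time.Hour) // 20:14:32 -> 21:00:00
	}
	return start, end
}
```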

Partially addresses #72129

Release note: None
Release Justification: Low risk, high benefit change

77597: kv: Add `created` column to `active_range_feeds` table. r=miretskiy a=miretskiy

Add `created` column to `active_range_feeds` table.
This column is initialized to the time when the partial range feed
was created. This allows us to determine, among other things,
whether or not the rangefeed is currently performing a catch-up scan
(i.e. its resolved column is 0), and how long the scan has been running
for.
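
For example, a query along these lines could report how long each rangefeed has been running; column names other than `created` are assumptions about the virtual table's schema, and per the description above a zero `resolved` value indicates a catch-up scan:
```
SELECT range_id, resolved, created, now() - created AS running_for
FROM crdb_internal.active_range_feeds;
```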

Release Notes (enterprise): Add a created time column
to the `crdb_internal.active_range_feeds` virtual table to improve observability
and debuggability of the rangefeed system.

Fixes #77581

Release Justification: Low impact observability/debuggability improvement.

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Santamaura <[email protected]>
Co-authored-by: Darryl <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
craig bot closed this as completed in e6a483c Mar 10, 2022
Santamaura added a commit to Santamaura/cockroach that referenced this issue Jul 20, 2022, with the same commit message as the Mar 3 commit above.
aliher1911 pushed a commit to aliher1911/cockroach that referenced this issue Jul 21, 2022, with the same commit message as the Mar 3 commit above.