Merge pull request #955 from GuptaManan100/durability-doc
Add docs for the durability policies
deepthi authored Feb 2, 2022
2 parents 098e06a + 5506cff commit b3b34e6
Showing 4 changed files with 57 additions and 2 deletions.
@@ -58,6 +58,8 @@ This command performs the following actions when used to initialize the first pr
- On the primary-elect tablet, insert a row into an internal table and then update the global shard object's PrimaryAlias record.
- In parallel on each replica, set the new primary and wait for the inserted row to replicate to the replica tablet.

The new primary (if unspecified) is chosen using the configured [Durability Policy](../durability_policy).

### EmergencyReparentShard: Emergency reparenting

The `EmergencyReparentShard` command is used to force a reparent to a new primary when the current primary is unavailable. The command assumes that data cannot be retrieved from the current primary because it is dead or not working properly.
@@ -76,6 +78,8 @@ This command performs the following actions:
- On the primary-elect tablet, Vitess inserts an entry in a test table and then updates the `PrimaryAlias` record of the global Shard object.
- In parallel on each replica, excluding the old primary, Vitess sets the primary and waits for the test entry to replicate to the replica tablet. Replica tablets that had not been replicating before the command was called are left in their current state and do not start replication after the reparenting process.

The new primary (if unspecified) is chosen using the configured [Durability Policy](../durability_policy).

## External Reparenting

External reparenting occurs when another tool handles the process of changing a shard's primary tablet. After that occurs, the tool needs to call the [`vtctl TabletExternallyReparented`](../../../reference/programs/vtctl/shards/#tabletexternallyreparented) command to ensure that the topology service, replication graph, and serving graph are updated accordingly.
@@ -0,0 +1,48 @@
---
title: Durability Policy
weight: 10
---

Vitess now supports a configurable interface for durability policies. Through this interface, users can define which tablets are eligible to be promoted to PRIMARY. They can also specify how many semi-sync ACKs a primary requires and which tablets are eligible to send those ACKs.

The interface definition looks like:
```go
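// durabler is the interface used to get the promotion rules for candidates and the semi-sync setup.
type durabler interface {
	// promotionRule returns the promotion rule for the given tablet.
	promotionRule(*topodatapb.Tablet) promotionrule.CandidatePromotionRule
	// semiSyncAckers returns the number of semi-sync ACKs required when the given tablet is the primary.
	semiSyncAckers(*topodatapb.Tablet) int
	// isReplicaSemiSync reports whether the replica is eligible to send semi-sync ACKs to the primary.
	isReplicaSemiSync(primary, replica *topodatapb.Tablet) bool
}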
```

There are 3 implementations bundled with Vitess:
- ***semi_sync*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers to acknowledge semi-sync. It returns NeutralPromoteRule for Primary and Replica tablet types, and MustNotPromoteRule for everything else.
- ***none** (default)* - This durability policy does not set any semi-sync configuration. It returns NeutralPromoteRule for Primary and Replica tablet types, and MustNotPromoteRule for everything else.
- ***cross_cell*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi-sync. This means that a write must be in two cells for it to be acknowledged. It returns NeutralPromoteRule for Primary and Replica tablet types, and MustNotPromoteRule for everything else.
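
The three bundled policies can be illustrated with a minimal, self-contained sketch. It is not the Vitess implementation: `Tablet`, `TabletType`, `PromotionRule`, and the `nonePolicy`, `semiSyncPolicy`, and `crossCellPolicy` types are hypothetical stand-ins for `topodatapb.Tablet` and `promotionrule.CandidatePromotionRule`, and the sketch assumes that `none` requires zero ACKers.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Vitess types; NOT the real topodatapb or promotionrule packages.
type TabletType int

const (
	Primary TabletType = iota
	Replica
	Rdonly
)

type Tablet struct {
	Cell string
	Type TabletType
}

type PromotionRule string

const (
	NeutralPromote PromotionRule = "NeutralPromoteRule"
	MustNotPromote PromotionRule = "MustNotPromoteRule"
)

// durabler mirrors the shape of the interface quoted above, using the stand-in types.
type durabler interface {
	promotionRule(*Tablet) PromotionRule
	semiSyncAckers(*Tablet) int
	isReplicaSemiSync(primary, replica *Tablet) bool
}

func isPrimaryOrReplica(t *Tablet) bool {
	return t.Type == Primary || t.Type == Replica
}

// defaultRule is the promotion rule that all three described policies share.
func defaultRule(t *Tablet) PromotionRule {
	if isPrimaryOrReplica(t) {
		return NeutralPromote
	}
	return MustNotPromote
}

// nonePolicy: no semi-sync configuration (assumed here to mean zero ACKers).
type nonePolicy struct{}

func (nonePolicy) promotionRule(t *Tablet) PromotionRule { return defaultRule(t) }
func (nonePolicy) semiSyncAckers(*Tablet) int            { return 0 }
func (nonePolicy) isReplicaSemiSync(_, _ *Tablet) bool   { return false }

// semiSyncPolicy: one ACKer; any Primary or Replica type tablet may ACK.
type semiSyncPolicy struct{}

func (semiSyncPolicy) promotionRule(t *Tablet) PromotionRule { return defaultRule(t) }
func (semiSyncPolicy) semiSyncAckers(*Tablet) int            { return 1 }
func (semiSyncPolicy) isReplicaSemiSync(_, replica *Tablet) bool {
	return isPrimaryOrReplica(replica)
}

// crossCellPolicy: one ACKer; only tablets in a different cell from the primary may ACK.
type crossCellPolicy struct{}

func (crossCellPolicy) promotionRule(t *Tablet) PromotionRule { return defaultRule(t) }
func (crossCellPolicy) semiSyncAckers(*Tablet) int            { return 1 }
func (crossCellPolicy) isReplicaSemiSync(primary, replica *Tablet) bool {
	return isPrimaryOrReplica(replica) && primary.Cell != replica.Cell
}

func main() {
	primary := &Tablet{Cell: "zone1", Type: Primary}
	sameCell := &Tablet{Cell: "zone1", Type: Replica}
	otherCell := &Tablet{Cell: "zone2", Type: Replica}

	policies := []struct {
		name   string
		policy durabler
	}{
		{"none", nonePolicy{}},
		{"semi_sync", semiSyncPolicy{}},
		{"cross_cell", crossCellPolicy{}},
	}
	for _, p := range policies {
		fmt.Printf("%s: ackers=%d sameCellAcks=%t otherCellAcks=%t\n",
			p.name,
			p.policy.semiSyncAckers(primary),
			p.policy.isReplicaSemiSync(primary, sameCell),
			p.policy.isReplicaSemiSync(primary, otherCell))
	}
}
```

The only difference between the `semi_sync` and `cross_cell` sketches is the cell comparison in `isReplicaSemiSync`, which is what forces a write to reach a second cell before it is acknowledged.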


[EmergencyReparentShard](../../configuration-advanced/reparenting/#emergencyreparentshard-emergency-reparenting) and [PlannedReparentShard](../../configuration-advanced/reparenting/#plannedreparentshard-planned-reparenting) use the durability rules when choosing a candidate for promotion.

Specify the policy in [vtctld](../vtctld), [vtctl](../../../concepts/vtctl), and vtworker with the `-durability_policy` flag. In [vtorc](../vtorc), specify it with the `Durability` config setting.

{{< info >}}
Currently, the durability policies are not used to set up semi-sync in EmergencyReparentShard or PlannedReparentShard. All the RPCs still use the `-enable_semi_sync` flag on vttablet to set up semi-sync. The `-durability_policy` flag is currently used for promotion rules and to log discrepancies in the semi-sync setup. Nonetheless, it should be specified correctly to ease upgrades to future releases, in which the durability policies will be used to set up semi-sync and `-enable_semi_sync` will be deprecated.
{{< /info >}}

{{< info >}}
If you see log messages like the following, check that your vtctld and vttablet configurations match:
```
invalid configuration - semi-sync should be setup according to durability policies, but enable_semi_sync is not set
```
```
invalid configuration - semi-sync should be setup according to durability policies, but the tablet is not primaryEligible
```
If `-enable_semi_sync` is set on the vttablets, then the `semi_sync` durability policy should be used. If semi-sync is not being used, then `-durability_policy` should be set to `none`.

If the configurations are in order, please report the problem by creating an issue [here](https://github.com/vitessio/vitess/issues).

If the following message is logged after all the components have been upgraded, it should also be reported:
```
invalid configuration - enabling semi sync even though not specified by durability policies. Possibly in the process of upgrading
```
{{< /info >}}
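
The pairing rule described in the note above can be sketched as a small check. `checkSemiSyncConfig` is a hypothetical helper, not a Vitess function, and its error strings are illustrative rather than the log lines Vitess emits. The text gives no pairing for `cross_cell`, so the sketch flags any other policy as not covered instead of guessing.

```go
package main

import (
	"errors"
	"fmt"
)

// checkSemiSyncConfig is a hypothetical helper (not part of Vitess). It applies the
// rule from the note above: -enable_semi_sync on vttablet pairs with the semi_sync
// policy, and no semi-sync pairs with the none policy.
func checkSemiSyncConfig(enableSemiSync bool, durabilityPolicy string) error {
	switch durabilityPolicy {
	case "none":
		if enableSemiSync {
			return errors.New("-enable_semi_sync is set, so -durability_policy should be semi_sync")
		}
	case "semi_sync":
		if !enableSemiSync {
			return errors.New("-enable_semi_sync is not set, so -durability_policy should be none")
		}
	default:
		// The note above states no rule for other policies; report that instead of guessing.
		return fmt.Errorf("no pairing rule stated for durability policy %q", durabilityPolicy)
	}
	return nil
}

func main() {
	configs := []struct {
		enableSemiSync bool
		policy         string
	}{
		{true, "semi_sync"},
		{false, "none"},
		{true, "none"},
		{false, "semi_sync"},
	}
	for _, c := range configs {
		err := checkSemiSyncConfig(c.enableSemiSync, c.policy)
		fmt.Printf("enable_semi_sync=%t durability_policy=%s -> %v\n", c.enableSemiSync, c.policy, err)
	}
}
```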
@@ -12,9 +12,12 @@ vtctld <topo_flags> <backup_flags> \
-log_dir=${VTDATAROOT}/tmp \
-port=15000 \
-grpc_port=15999 \
-service_map='grpc-vtctl'
-service_map='grpc-vtctl' \
-durability_policy='none'
```

See [Durability Policies](../durability_policy) for more information on the available durability policies.

If the TopoServer is unreachable, or if the topo flags are incorrectly configured, vtctld will fail to start. You may see an error message like the following in the logs:

```text
@@ -41,7 +41,7 @@ vtorc <topo_flags> \

Bringing up `vtorc` should immediately cause a primary to be elected among the vttablets that have come up.

The `vtorc` config supports a new `Durability` setting that can currently be set to `none`, `semi_sync` or `cross_cell`. The `semi_sync` setting is the equivalent to setting the vttablet’s `enable_semi_sync` flag, whereas `cross_cell` will ensure that a primary will acknowledge a commit only if a `replica` that is not in the current cell has received the binary logs.
The `vtorc` config supports a `Durability` setting. See [Durability Policies](../durability_policy) for more information on the available durability policies.

You can optionally add a `clusters_to_watch` flag that contains a comma separated list of keyspaces or `keyspace/shard` values. If specified, `vtorc` will manage only those clusters.
