From 175b8acd5466d301836b837403ce32787d0b74e3 Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Thu, 27 Jan 2022 17:04:35 +0530 Subject: [PATCH 1/7] docs: added documentation explaining all the durability policies Signed-off-by: Manan Gupta --- .../configuration-basic/durability_policy.md | 44 +++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md diff --git a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md new file mode 100644 index 000000000..1735362e8 --- /dev/null +++ b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md @@ -0,0 +1,44 @@ +--- +title: Durability Policy +weight: 10 +--- + +Vitess now supports a configurable interface for the durability policies. The users can now define in the interface which tablets are eligible to be promoted to a PRIMARY instance. They can also specify the number of semi-sync ACKs it requires and the tablets which are eligible to send these ACKs. + +The interface definition looks like - +```go +// durabler is the interface which is used to get the promotion rules for candidates and the semi sync setup +type durabler interface { + promotionRule(*topodatapb.Tablet) promotionrule.CandidatePromotionRule + semiSyncAckers(*topodatapb.Tablet) int + isReplicaSemiSync(primary, replica *topodatapb.Tablet) bool +} +``` + +There are 3 implementations bundled with Vitess - + - ***semi_sync*** - This durability policy setups the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers to acknowledge semi sync. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + - ***none** (default)* - This durability policy does not setup any semi-sync configurations. 
It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + - ***cross_cell*** - This durability policy setups the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi sync. This means that a transaction must be in two cells for it to be acknowledged. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + + +[EmergencyReparentShard](../../configuration-advanced/reparenting/#emergencyreparentshard-emergency-reparenting) and [PlannedReparentShard](../../configuration-advanced/reparenting/#plannedreparentshard-planned-reparenting) will use the durability rules while choosing the correct candidate for promotion. + +This configuration should be specified in [vtctld](../vtctld), [vtctl](../../../concepts/vtctl) and vtworker as a flag `-durability_policy`. It should be specified in [vtorc](../vtorc) as `Durability` config. + +{{< info >}} +Currently the durability policies are not used to setup semi-sync in EmergencyReparentShard or PlannedReparentShard. All the RPCs are still using `-enable_semi_sync` flag on vttablet to setup semi-sync. This flag is currently being used for promotion rules and to log discrepancies in semi-sync setup. Nonetheless, this flag should be specified correctly for upgrade considerations to future releases when the durability policies will be used to setup semi-sync and `-enable_semi_sync` is deprecated. 
+{{< /info >}} + +{{< info >}} +In case the user notices any logs that look like the following, they should create an issue and report it - +``` +invalid configuration - semi-sync should be setup according to durability policies, but enable_semi_sync is not set +``` +``` +invalid configuration - semi-sync should be setup according to durability policies, but the tablet is not primaryEligible +``` +If the following log is noticed when all the components are upgraded, then it should also be reported - +``` +invalid configuration - enabling semi sync even though not specified by durability policies. Possibly in the process of upgrading +``` +{{< /info >}} From 10084865496fdbeb271ed15319335e2b08f68cc9 Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Thu, 27 Jan 2022 17:08:16 +0530 Subject: [PATCH 2/7] docs: link durability policies from vtctld Signed-off-by: Manan Gupta --- .../en/docs/13.0/user-guides/configuration-basic/vtctld.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md b/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md index b55859a92..e1172e24e 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md @@ -12,9 +12,12 @@ vtctld \ -log_dir=${VTDATAROOT}/tmp \ -port=15000 \ -grpc_port=15999 \ - -service_map='grpc-vtctl' + -service_map='grpc-vtctl' \ + -durability_policy='none' ``` +Look at [Durability Policies](../durability_policy) for more information on the possible durability policies. + If the TopoServer is unreachable, or if the topo flags are incorrectly configured, vtctld will fail to start. 
You may see an error message like the following in the logs: ```text From 518e933ee4f7d3b7cf26a1054aa3b2306c25250d Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Thu, 27 Jan 2022 17:10:55 +0530 Subject: [PATCH 3/7] docs: link durability policies from vtorc Signed-off-by: Manan Gupta --- content/en/docs/13.0/user-guides/configuration-basic/vtorc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md b/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md index 526c776e5..fb0ce3faf 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md @@ -41,7 +41,7 @@ vtorc \ Bringing up `vtorc` should immediately cause a primary to be elected among the vttablets that have come up. -The `vtorc` config supports a new `Durability` setting that can currently be set to `none`, `semi_sync` or `cross_cell`. The `semi_sync` setting is the equivalent to setting the vttablet’s `enable_semi_sync` flag, whereas `cross_cell` will ensure that a primary will acknowledge a commit only if a `replica` that is not in the current cell has received the binary logs. +The `vtorc` config supports `Durability` setting. Look at [Durability Policies](../durability_policy) for more information on the possible durability policies. You can optionally add a `clusters_to_watch` flag that contains a comma separated list of keyspaces or `keyspace/shard` values. If specified, `vtorc` will manage only those clusters. 
From 0e4024ea1e239f12a80f2f6c493f5669e2cd263b Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Thu, 27 Jan 2022 17:14:04 +0530 Subject: [PATCH 4/7] docs: link durability policies from ers and prs Signed-off-by: Manan Gupta --- .../13.0/user-guides/configuration-advanced/reparenting.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md b/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md index 52e43f04a..49930ad9c 100644 --- a/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md +++ b/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md @@ -58,6 +58,8 @@ This command performs the following actions when used to initialize the first pr - On the primary-elect tablet, insert a row into an internal table and then update the global shard object's PrimaryAlias record. - In parallel on each replica, set the new primary and wait for the inserted row to replicate to the replica tablet. +The new primary (if unspecified) is chosen using the [Durability Policies](../durability_policy) configured. + ### EmergencyReparentShard: Emergency reparenting The `EmergencyReparentShard` command is used to force a reparent to a new primary when the current primary is unavailable. The command assumes that data cannot be retrieved from the current primary because it is dead or not working properly. @@ -76,6 +78,8 @@ This command performs the following actions: - On the primary-elect tablet, Vitess inserts an entry in a test table and then updates the `PrimaryAlias` record of the global Shard object. - In parallel on each replica, excluding the old primary, Vitess sets the primary and waits for the test entry to replicate to the replica tablet. Replica tablets that had not been replicating before the command was called are left in their current state and do not start replication after the reparenting process. 
+The new primary (if unspecified) is chosen using the [Durability Policies](../durability_policy) configured. + ## External Reparenting External reparenting occurs when another tool handles the process of changing a shard's primary tablet. After that occurs, the tool needs to call the [`vtctl TabletExternallyReparented`](../../../reference/programs/vtctl/shards/#tabletexternallyreparented) command to ensure that the topology service, replication graph, and serving graph are updated accordingly. From 962681a7b0d59121401359d991324e0e12e57fef Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Fri, 28 Jan 2022 11:48:40 +0530 Subject: [PATCH 5/7] docs: fix grammar and representation Signed-off-by: Manan Gupta --- .../configuration-basic/durability_policy.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md index 1735362e8..a1364cfc3 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md @@ -3,9 +3,9 @@ title: Durability Policy weight: 10 --- -Vitess now supports a configurable interface for the durability policies. The users can now define in the interface which tablets are eligible to be promoted to a PRIMARY instance. They can also specify the number of semi-sync ACKs it requires and the tablets which are eligible to send these ACKs. +Vitess now supports a configurable interface for durability policies. Users can now define, in the interface, which tablets are eligible to be promoted to a PRIMARY instance. They can also specify the number of semi-sync ACKs it requires and the tablets which are eligible to send these ACKs. 
-The interface definition looks like - +The interface definition looks like: ```go // durabler is the interface which is used to get the promotion rules for candidates and the semi sync setup type durabler interface { @@ -15,10 +15,10 @@ type durabler interface { } ``` -There are 3 implementations bundled with Vitess - - - ***semi_sync*** - This durability policy setups the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers to acknowledge semi sync. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else - - ***none** (default)* - This durability policy does not setup any semi-sync configurations. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else - - ***cross_cell*** - This durability policy setups the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi sync. This means that a transaction must be in two cells for it to be acknowledged. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else +There are 3 implementations bundled with Vitess: + - ***semi_sync*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers to acknowledge semi sync. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + - ***none** (default)* - This durability policy does not set any semi-sync configurations. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + - ***cross_cell*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi sync. This means that a transaction must be in two cells for it to be acknowledged. 
It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else [EmergencyReparentShard](../../configuration-advanced/reparenting/#emergencyreparentshard-emergency-reparenting) and [PlannedReparentShard](../../configuration-advanced/reparenting/#plannedreparentshard-planned-reparenting) will use the durability rules while choosing the correct candidate for promotion. @@ -26,18 +26,18 @@ There are 3 implementations bundled with Vitess - This configuration should be specified in [vtctld](../vtctld), [vtctl](../../../concepts/vtctl) and vtworker as a flag `-durability_policy`. It should be specified in [vtorc](../vtorc) as `Durability` config. {{< info >}} -Currently the durability policies are not used to setup semi-sync in EmergencyReparentShard or PlannedReparentShard. All the RPCs are still using `-enable_semi_sync` flag on vttablet to setup semi-sync. This flag is currently being used for promotion rules and to log discrepancies in semi-sync setup. Nonetheless, this flag should be specified correctly for upgrade considerations to future releases when the durability policies will be used to setup semi-sync and `-enable_semi_sync` is deprecated. +Currently the durability policies are not used to setup semi-sync in EmergencyReparentShard or PlannedReparentShard. All the RPCs are still using the `-enable_semi_sync` flag on vttablet to setup semi-sync. This flag is currently being used for promotion rules and to log discrepancies in semi-sync setup. Nonetheless, this flag should be specified correctly for upgrade considerations to future releases when the durability policies will be used to setup semi-sync and `-enable_semi_sync` is deprecated. 
{{< /info >}} {{< info >}} -In case the user notices any logs that look like the following, they should create an issue and report it - +In case you notice any logs that look like the following, you should create an issue [here](https://github.com/vitessio/vitess/issues) and report it: ``` invalid configuration - semi-sync should be setup according to durability policies, but enable_semi_sync is not set ``` ``` invalid configuration - semi-sync should be setup according to durability policies, but the tablet is not primaryEligible ``` -If the following log is noticed when all the components are upgraded, then it should also be reported - +If the following log is noticed when all the components are upgraded, then it should also be reported: ``` invalid configuration - enabling semi sync even though not specified by durability policies. Possibly in the process of upgrading ``` From 777380333eacf8eaf5ff18a701dca18917d97012 Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Tue, 1 Feb 2022 08:05:09 +0530 Subject: [PATCH 6/7] docs: improve language Signed-off-by: Manan Gupta --- .../13.0/user-guides/configuration-advanced/reparenting.md | 4 ++-- .../13.0/user-guides/configuration-basic/durability_policy.md | 2 +- .../en/docs/13.0/user-guides/configuration-basic/vtctld.md | 2 +- content/en/docs/13.0/user-guides/configuration-basic/vtorc.md | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md b/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md index 49930ad9c..8e88e6bc9 100644 --- a/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md +++ b/content/en/docs/13.0/user-guides/configuration-advanced/reparenting.md @@ -58,7 +58,7 @@ This command performs the following actions when used to initialize the first pr - On the primary-elect tablet, insert a row into an internal table and then update the global shard object's PrimaryAlias record. 
- In parallel on each replica, set the new primary and wait for the inserted row to replicate to the replica tablet. -The new primary (if unspecified) is chosen using the [Durability Policies](../durability_policy) configured. +The new primary (if unspecified) is chosen using the configured [Durability Policy](../durability_policy). ### EmergencyReparentShard: Emergency reparenting @@ -78,7 +78,7 @@ This command performs the following actions: - On the primary-elect tablet, Vitess inserts an entry in a test table and then updates the `PrimaryAlias` record of the global Shard object. - In parallel on each replica, excluding the old primary, Vitess sets the primary and waits for the test entry to replicate to the replica tablet. Replica tablets that had not been replicating before the command was called are left in their current state and do not start replication after the reparenting process. -The new primary (if unspecified) is chosen using the [Durability Policies](../durability_policy) configured. +The new primary (if unspecified) is chosen using the configured [Durability Policy](../durability_policy). ## External Reparenting diff --git a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md index a1364cfc3..011b72c6a 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md @@ -18,7 +18,7 @@ type durabler interface { There are 3 implementations bundled with Vitess: - ***semi_sync*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers to acknowledge semi sync. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else - ***none** (default)* - This durability policy does not set any semi-sync configurations. 
It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else - - ***cross_cell*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi sync. This means that a transaction must be in two cells for it to be acknowledged. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else + - ***cross_cell*** - This durability policy sets the number of required semi-sync ACKers to 1. It only allows Primary and Replica type servers from a different cell to acknowledge semi sync. This means that a write must be in two cells for it to be acknowledged. It returns NeutralPromoteRule for Primary and Replica tablet types, MustNotPromoteRule for everything else [EmergencyReparentShard](../../configuration-advanced/reparenting/#emergencyreparentshard-emergency-reparenting) and [PlannedReparentShard](../../configuration-advanced/reparenting/#plannedreparentshard-planned-reparenting) will use the durability rules while choosing the correct candidate for promotion. diff --git a/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md b/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md index e1172e24e..dd5e8eddb 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/vtctld.md @@ -16,7 +16,7 @@ vtctld \ -durability_policy='none' ``` -Look at [Durability Policies](../durability_policy) for more information on the possible durability policies. +Look at [Durability Policies](../durability_policy) for more information on the available durability policies. If the TopoServer is unreachable, or if the topo flags are incorrectly configured, vtctld will fail to start. 
You may see an error message like the following in the logs: diff --git a/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md b/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md index fb0ce3faf..9ac2ba657 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/vtorc.md @@ -41,7 +41,7 @@ vtorc \ Bringing up `vtorc` should immediately cause a primary to be elected among the vttablets that have come up. -The `vtorc` config supports `Durability` setting. Look at [Durability Policies](../durability_policy) for more information on the possible durability policies. +The `vtorc` config supports a `Durability` setting. Look at [Durability Policies](../durability_policy) for more information on the available durability policies. You can optionally add a `clusters_to_watch` flag that contains a comma separated list of keyspaces or `keyspace/shard` values. If specified, `vtorc` will manage only those clusters. 
From 5506cfff906f1fd4e90fb7fc508b481e17c7d510 Mon Sep 17 00:00:00 2001 From: Manan Gupta Date: Tue, 1 Feb 2022 08:15:00 +0530 Subject: [PATCH 7/7] docs: only ask the report an issue if the configuration of vtctld and vttablet are in order Signed-off-by: Manan Gupta --- .../user-guides/configuration-basic/durability_policy.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md index 011b72c6a..4c58f13b8 100644 --- a/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md +++ b/content/en/docs/13.0/user-guides/configuration-basic/durability_policy.md @@ -30,13 +30,17 @@ Currently the durability policies are not used to setup semi-sync in EmergencyRe {{< /info >}} {{< info >}} -In case you notice any logs that look like the following, you should create an issue [here](https://github.com/vitessio/vitess/issues) and report it: +In case you notice any logs that look like the following, please check that your vtctld and vttablet configurations match: ``` invalid configuration - semi-sync should be setup according to durability policies, but enable_semi_sync is not set ``` ``` invalid configuration - semi-sync should be setup according to durability policies, but the tablet is not primaryEligible ``` +If `-enable_semi_sync` is set on the vttablets, then the `semi_sync` durability policy should be used. If semi-sync is not being used, then `-durability_policy` should be set to `none`. + +If the configurations are in order, then you should create an issue [here](https://github.com/vitessio/vitess/issues) and report it. + If the following log is noticed when all the components are upgraded, then it should also be reported: ``` invalid configuration - enabling semi sync even though not specified by durability policies. Possibly in the process of upgrading