Somewhat related to #8316, in that the testcase is the same up to a point:
Scenario:
Simple test environment with 3 tablets in single unsharded keyspace
No write traffic running to the keyspace at all
Semi-sync is on
Version (recent main): Version: 11.0.0-SNAPSHOT (Git revision e018d0fd94 branch 'main') built on Thu Jun 10 20:52:10 PDT 2021 by jacques@dhoomtop using go1.16.3 linux/amd64
Run ERS. It succeeds despite the warning below, which basically just says that the existing master tablet is not a replica:
$ ~/vt/vitess/bin/vtctlclient -server 127.0.0.1:15999 EmergencyReparentShard -keyspace_shard=keyspace1/0
W0610 21:09:15.753305 172587 main.go:67] W0611 04:09:15.752892 replication.go:221] failed to get replication status from zone1-0000000100: rpc error: code = Unknown desc = TabletManager.StopReplicationAndGetStatus on zone1-0000000100 error: before status failed: no replication status: before status failed: no replication status
Now, inspect the tablet data directories for do_not_replicate sentinel files:
$ find vt_000000010* -type f | grep do_not_repl
vt_0000000101/do_not_replicate
vt_0000000102/do_not_replicate
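For a test harness, the same check could be scripted so that a run fails whenever sentinel files are left behind. This is a minimal sketch, assuming the tablet data directories are named `vt_<uid>` under a common root, as in the listing above. The function name `find_leftover_sentinels` and its parameters are hypothetical and not part of Vitess.

```python
from pathlib import Path

# File name as it appears in the listing above.
SENTINEL_NAME = "do_not_replicate"


def find_leftover_sentinels(data_root, tablet_glob="vt_*"):
    """Return the sorted paths of sentinel files found directly inside
    tablet data directories under data_root.

    tablet_glob is an assumption about how tablet directories are named.
    """
    root = Path(data_root)
    return sorted(
        path
        for path in root.glob(f"{tablet_glob}/{SENTINEL_NAME}")
        if path.is_file()
    )
```

A test could call `find_leftover_sentinels(data_root)` after a successful ERS and assert that the result is empty.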
This is a problem: on the next ERS, where tablet zone1-0000000100 may become the master, the presence of these files will prevent tablets zone1-0000000101 and zone1-0000000102 from starting to replicate from it. Since semi-sync is on, this will end up blocking writes to the master.
A clean, successful ERS should clearly terminate without leaving these sentinel files on disk. They are set during ERS to prevent the replication reporter from restarting replication while we inspect the replicas and decide which one to promote. Once that phase is done and the reparent is complete, they should be removed.
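The expected lifecycle can be illustrated with a small, self-contained sketch. This is not the Vitess implementation. It only models the idea of a marker file that exists for the duration of the inspection phase and is always removed afterwards. The name `replication_paused` and its parameters are hypothetical.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def replication_paused(tablet_dirs, marker_name="do_not_replicate"):
    """Create a marker file in each tablet directory for the duration of
    the with-block, and remove every marker again on exit."""
    markers = [Path(d) / marker_name for d in tablet_dirs]
    try:
        for marker in markers:
            marker.touch()
        yield markers
    finally:
        # Runs on both normal exit and exceptions, so no marker is
        # left on disk once the block has finished.
        for marker in markers:
            marker.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
```

The sketch cleans up on failure as well as on success. Whether a failed ERS should also clear the markers is a separate question that this report does not address.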