Bug Report: Shard-by-shard migrations mangle output on the second sharded keyspace: #13508

Closed
FancyFane opened this issue Jul 14, 2023 · 2 comments · Fixed by #13515

Comments

@FancyFane
Collaborator

Overview of the Issue

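When running shard-by-shard migrations for two sharded keyspaces, the workflow state reported for the second keyspace is mangled: a newly created, stopped workflow is shown as having reads and writes partially switched.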

$ sharded_keyspace="keyspace_1"
$ shard="80-"

$ vtctlclient --server localhost:15999 MoveTables -- --cells="${CELLS}" --all --auto_start=false --tablet_types="in_order:REPLICA,PRIMARY" --source="${sharded_keyspace}_source" --source_shards="${shard}" Create ${sharded_keyspace}.import-shard${shard}
Workflow has been created in Stopped state
Create was successful for workflow keyspace_1.import-shard80-
Start State: Not Created
Current State: Reads Not Switched. Writes Not Switched

$ sharded_keyspace="keyspace_2"
$ shard="-40"
$ vtctlclient --server localhost:15999 MoveTables -- --cells="${CELLS}" --all --auto_start=false --tablet_types="in_order:REPLICA,PRIMARY" --source="${sharded_keyspace}_source" --source_shards="${shard}" Create ${sharded_keyspace}.import-shard${shard}
Workflow has been created in Stopped state
Create was successful for workflow keyspace_2.import-shard-40
Start State: Not Created
Current State: Reads partially switched, for shards: -40,40-80,80-. Writes partially switched, for shards: -40,40-80,80-

Reproduction Steps

  1. Configure two sharded keyspaces.
  2. Run a shard-by-shard workflow on all of the shards of the first sharded keyspace.
  3. Start a shard-by-shard workflow on the second keyspace. The output is mangled for the first shard and for all subsequent shards.

Binary Version

Version: 16.0.X

Operating System and Environment details

n/a

Log Fragments

n/a
FancyFane added the Type: Bug and Needs Triage labels on Jul 14, 2023
mattlord self-assigned this on Jul 16, 2023
mattlord added the Component: VReplication label and removed the Needs Triage label on Jul 16, 2023
@mattlord
Contributor

mattlord commented Jul 16, 2023

I was able to repeat it using the local examples this way:

git checkout main && make build

cd examples/local

./101_initial_cluster.sh

for i in 200 201 202; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-40 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer/-40 zone1-200

for i in 210 211 212; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=40-80 CELL=zone1 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer/40-80 zone1-210

for i in 220 221 222; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer/80- zone1-220

for i in 230 231 232; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-40 CELL=zone1 KEYSPACE=customer2 TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer2/-40 zone1-230

for i in 240 241 242; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=40-80 CELL=zone1 KEYSPACE=customer2 TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer2/40-80 zone1-240

for i in 250 251 252; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=customer2 TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force customer2/80- zone1-250

for i in 300 301 302; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-80 CELL=zone1 KEYSPACE=xtra TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force xtra/-80 zone1-300

for i in 310 311 312; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=xtra TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force xtra/80- zone1-310

for i in 320 321 322; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-80 CELL=zone1 KEYSPACE=xtra2 TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force xtra2/-80 zone1-320

for i in 330 331 332; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=xtra2 TABLET_UID=$i ../common/scripts/vttablet-up.sh
done
vtctldclient InitShardPrimary --force xtra2/80- zone1-330

vtctldclient ApplyVSchema --vschema-file vschema_customer_sharded.json customer

vtctldclient ApplyVSchema --vschema-file vschema_customer_sharded.json xtra

mysql customer < create_commerce_schema.sql
vtctlclient MoveTables -- --source customer --source_shards='-40' --tables 'customer' --auto_start=false Create customer2.dash40
vtctlclient MoveTables -- --source customer --source_shards='40-80' --tables 'customer' --auto_start=false Create customer2.40dash80

sleep 60

printf "\n\nFinal result:\n"
mysql xtra < create_commerce_schema.sql
vtctlclient MoveTables -- --source xtra --source_shards='-80' --tables 'customer' --auto_start=false Create xtra2.dash80

With the final command output being:

Workflow has been created in Stopped state
Create was successful for workflow xtra2.dash80
Start State: Not Created
Current State: Reads partially switched, for shards: 40-80,80-,-40. Writes partially switched, for shards: 40-80,80-,-40

@mattlord
Contributor

This patch seems to fix the issue:

diff --git a/go/vt/wrangler/traffic_switcher.go b/go/vt/wrangler/traffic_switcher.go
index 43ab57b8ca..86968e3bb9 100644
--- a/go/vt/wrangler/traffic_switcher.go
+++ b/go/vt/wrangler/traffic_switcher.go
@@ -252,10 +252,13 @@ func (wr *Wrangler) getWorkflowState(ctx context.Context, targetKeyspace, workfl

 			rules := shardRoutingRules.Rules
 			for _, rule := range rules {
-				if rule.ToKeyspace == ts.SourceKeyspaceName() {
+				switch rule.ToKeyspace {
+				case ts.SourceKeyspaceName():
 					state.ShardsNotYetSwitched = append(state.ShardsNotYetSwitched, rule.Shard)
-				} else {
+				case ts.TargetKeyspaceName():
 					state.ShardsAlreadySwitched = append(state.ShardsAlreadySwitched, rule.Shard)
+				default:
+					// Not a relevant rule.
 				}
 			}
 		} else {
