vtctlclient Validate claims slaves are not replicating #4532

Closed
slanning opened this issue Jan 17, 2019 · 0 comments · Fixed by #4595

slanning commented Jan 17, 2019

vtctlclient Validate claims that slaves aren't replicating although they are.

Running this command:
$ vtctlclient -server :15999 Validate -ping-tablets

The output includes pairs of error lines like this for each slave:

E0117 13:36:01.812848    3926 main.go:60] E0117 13:36:01.812707 validator.go:52] slave hostname.example.com not in replication graph for shard ks1/80- (mysql instance without vttablet?)
E0117 13:36:01.814426    3926 main.go:60] E0117 13:36:01.814245 validator.go:52] slave cell1-0000001234 not replicating: 192.168.0.1 slave list: ["hostname.example.com" (others elided...)]

As far as I can tell, the code around https://github.com/vitessio/vitess/blob/850574e0f70/go/vt/wrangler/validator.go#L203 assumes that GetSlaves returns a list of IP addresses. That list comes from FindSlaves in https://github.com/vitessio/vitess/blob/850574e0f70/go/vt/mysqlctl/replication.go#L274

Although that's documented as "FindSlaves gets IP addresses for all currently connected slaves", what it actually does is: run show processlist on the master, find rows whose Command is like 'Binlog Dump%', and strip the port off the Host column. Poking around various databases here, the Host column generally contains hostnames rather than IP addresses, so GetSlaves returns hostnames.

Back in validator.go, tabletIPMap is keyed by IP address, so the lookup in if tabletIPMap[normalizeIP(slaveAddr)] == nil always comes back nil, because normalizeIP(slaveAddr) is a hostname rather than an IP address. That is why every slave gets reported as not replicating. I guess FindSlaves should be fixed to return IP addresses, as documented.

slanning pushed a commit to slanning/vitess that referenced this issue Feb 6, 2019
fixes vitessio#4532

I didn't see a good way to test it (other than manually running the command).

Signed-off-by: Scott Lanning <[email protected]>