
Can't delete ipv4 node from ipv6 consul cluster #7691

Open
archekb opened this issue Apr 23, 2020 · 1 comment
Labels
theme/ipv6 Relating to IPv6 theme/service-metadata Anything related to management/tracking of service metadata type/question Not an "enhancement" or "bug". Please post on discuss.hashicorp

Comments


archekb commented Apr 23, 2020

Overview of the Issue

I have a Consul cluster with 3 IPv6 nodes. Yesterday I added an IPv4 node to the cluster by mistake, and now I don't know how to remove that node from the cluster. Related to #6856.

What I tried:

  1. Stopped the IPv4 Consul client node. (After stopping it, the web UI still showed: Agent alive and reachable.)
  2. Executed on a server node:
    consul force-leave test0
    curl -i -X PUT -d '{"Node":"test0"}' 'http://[xxxx:xxx:xxx:xxx::2]:8500/v1/catalog/deregister'
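For reference, the deregister call in step 2 can also be scripted. A minimal sketch in Python (standard library only), reusing this issue's masked server address as a placeholder; the helper name is hypothetical. The main point, as in the curl command above, is that an IPv6 host must be wrapped in brackets inside the URL:

```python
import json
from urllib.request import Request

# Build a PUT request against Consul's catalog deregister endpoint.
# The server address below is the issue's masked placeholder, not a real IP.
def deregister_request(server_ip, node, port=8500):
    # An IPv6 literal must be bracketed inside a URL; IPv4 must not be.
    host = f"[{server_ip}]" if ":" in server_ip else server_ip
    url = f"http://{host}:{port}/v1/catalog/deregister"
    body = json.dumps({"Node": node}).encode()
    return Request(url, data=body, method="PUT")

req = deregister_request("xxxx:xxx:xxx:xxx::2", "test0")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would then hit the same endpoint as the curl command.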

Now I can't see this node in the Consul web UI, but consul members still shows:
Node                Address                       Status   Type    Build  Protocol  DC   Segment
s0.dev.example.com  [xxxx:xxx:xxx:xxx::2]:8301   alive    server  1.7.2  2         dc1
s1.dev.example.com  [xxxx:xxx:xxx:yyy::2]:8301   alive    server  1.7.2  2         dc1
s2.dev.example.com  [xxxx:xxx:xxx:zz::2]:8301    alive    server  1.7.2  2         dc1
e0.dev.example.com  [xxxx:xxx:xxx:aaa::2]:8301   alive    client  1.7.1  2         dc1
e1.dev.example.com  [xxxx:xxx:xxx:bbb::2]:8301   alive    client  1.7.1  2         dc1
test0               172.16.147.16:8301           leaving  client  1.7.2  2         dc1
test1               [xxxx:xxx:xxx:ccc::2]:8301   alive    client  1.7.2  2         dc1
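The members output above can also be scanned programmatically to spot the stray node. A minimal sketch (Python, standard library only; the sample reuses a few of this issue's masked lines, and the function name is hypothetical) that flags members whose gossip address is IPv4:

```python
import ipaddress

# Sample `consul members` lines from this issue (masked IPv6 placeholders).
members_output = """\
s0.dev.example.com [xxxx:xxx:xxx:xxx::2]:8301 alive server 1.7.2 2 dc1
test0 172.16.147.16:8301 leaving client 1.7.2 2 dc1
test1 [xxxx:xxx:xxx:ccc::2]:8301 alive client 1.7.2 2 dc1"""

def ipv4_members(output):
    """Return node names whose gossip address parses as IPv4."""
    stray = []
    for line in output.splitlines():
        node, addr = line.split()[:2]
        host = addr.rsplit(":", 1)[0].strip("[]")  # drop the :8301 port and brackets
        try:
            if isinstance(ipaddress.ip_address(host), ipaddress.IPv4Address):
                stray.append(node)
        except ValueError:
            pass  # masked placeholders like xxxx:... do not parse as addresses
    return stray
```

Against the sample above this flags only test0, the node that joined over IPv4.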

Reproduction Steps

Steps to reproduce this issue, eg:

  1. Create a cluster with 1 client ipv4 node and 3 server ipv6 nodes

Consul info for both Client and Server

Client info

./consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 9ea1a20
version = 1.7.2
consul:
acl = disabled
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 4
goroutines = 42
max_procs = 4
os = linux
version = go1.13.7
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 55
failed = 0
health_score = 1
intent_queue = 0
left = 0
member_time = 256
members = 7
query_queue = 0
query_time = 21

Server info

./consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 9ea1a20
version = 1.7.2
consul:
acl = disabled
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = [xxxx:xxx:xxx:yyy::2]:8300
server = true
raft:
applied_index = 425297
commit_index = 425297
fsm_pending = 0
last_contact = 0
last_log_index = 425297
last_log_term = 117
last_snapshot_index = 409679
last_snapshot_term = 117
latest_configuration = [{Suffrage:Voter ID:26365a1c-28c7-cd87-c604-2eb8faf78f81 Address:[xxxx:xxx:xxx:xxx::2]:8300} {Suffrage:Voter ID:4188bade-2a7b-ff34-90e4-ade73cf7c052 Address:[xxxx:xxx:xxx:yyy::2]:8300} {Suffrage:Voter ID:2fae1f7b-b49b-0749-6f9c-38a4db60111d Address:[xxxx:xxx:xxx:zz::2]:8300}]
latest_configuration_index = 0
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 117
runtime:
arch = amd64
cpu_count = 4
goroutines = 122
max_procs = 4
os = linux
version = go1.13.7
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 55
failed = 0
health_score = 0
intent_queue = 4969
left = 0
member_time = 255
members = 7
query_queue = 0
query_time = 21
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 130
members = 3
query_queue = 0
query_time = 1

Client config

{
"bootstrap": false,
"server": false,
"bind_addr": "172.16.147.16",
"client_addr": "172.16.147.16",
"datacenter": "DC1",
"encrypt": "=======encrypted key here===========",
"data_dir": "/tmp/consul",
"enable_local_script_checks": true,
"log_level": "INFO",
"retry_join": ["xxxx:xxx:xxx:xxx::2", "xxxx:xxx:xxx:yyy::2", "xxxx:xxx:xxx:zz::2"],
"ui": false,
"leave_on_terminate": true,
"disable_update_check": true,
"disable_host_node_id": true,
"skip_leave_on_interrupt": false,
"reconnect_timeout": "8h"
}
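Note the mismatch in the client config above: bind_addr and client_addr are IPv4 while retry_join lists only IPv6 servers, which is what produced the stuck node. A hypothetical client fragment bound to IPv6 instead (reusing the issue's masked test1 address as a stand-in; this is a sketch, not a verified fix):

```json
{
  "server": false,
  "bind_addr": "xxxx:xxx:xxx:ccc::2",
  "client_addr": "xxxx:xxx:xxx:ccc::2",
  "retry_join": ["xxxx:xxx:xxx:xxx::2", "xxxx:xxx:xxx:yyy::2", "xxxx:xxx:xxx:zz::2"]
}
```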

Server config

{
"bootstrap": false,
"server": true,
"bind_addr": "xxxx:xxx:xxx:xxx::2",
"client_addr": "xxxx:xxx:xxx:xxx::2",
"datacenter": "DC1",
"encrypt": "=======encrypted key here===========",
"data_dir": "/tmp/consul",
"enable_local_script_checks": false,
"log_level": "INFO",
"retry_join": ["xxxx:xxx:xxx:xxx::2", "xxxx:xxx:xxx:yyy::2", "xxxx:xxx:xxx:zz::2"],
"ui": true,
"leave_on_terminate": false,
"disable_update_check": true,
"disable_host_node_id": true,
"skip_leave_on_interrupt": false,
"reconnect_timeout": "8h"
}

Operating system and Environment details

Linux test0 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux
Docker version 19.03.5, build 633a0ea838

Log Fragments

Server error

2020-04-23T07:00:35.415Z [ERROR] agent.server.memberlist.lan: memberlist: Failed to send ping: write udp [xxxx:xxx:xxx:xxx::2]:8301->172.16.147.16:8301: sendto: network is unreachable

Client error

consul_1 | ==> Starting Consul agent...
consul_1 | Version: 'v1.7.2'
consul_1 | Node ID: '2d35f5d6-f85f-3958-e2d9-044a3d0cdcf4'
consul_1 | Node name: 'test0'
consul_1 | Datacenter: 'dc1' (Segment: '')
consul_1 | Server: false (Bootstrap: false)
consul_1 | Client Addr: [172.16.147.16] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
consul_1 | Cluster Addr: 172.16.147.16 (LAN: 8301, WAN: 8302)
consul_1 | Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
consul_1 |
consul_1 | ==> Log data will now stream in as it occurs:
consul_1 |
consul_1 | 2020-04-23T07:13:06.810Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: test0 172.16.147.16
consul_1 | 2020-04-23T07:13:06.811Z [INFO] agent: Started DNS server: address=172.16.147.16:8600 network=udp
consul_1 | 2020-04-23T07:13:06.811Z [INFO] agent: Started DNS server: address=172.16.147.16:8600 network=tcp
consul_1 | 2020-04-23T07:13:06.811Z [INFO] agent: Started HTTP server: address=172.16.147.16:8500 network=tcp
consul_1 | 2020-04-23T07:13:06.812Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
consul_1 | 2020-04-23T07:13:06.812Z [INFO] agent: Joining cluster...: cluster=LAN
consul_1 | 2020-04-23T07:13:06.812Z [INFO] agent: (LAN) joining: lan_addresses=[xxxx:xxx:xxx:xxx::2, xxxx:xxx:xxx:yyy::2, xxxx:xxx:xxx:zz::2]
consul_1 | 2020-04-23T07:13:06.812Z [INFO] agent: started state syncer
consul_1 | ==> Consul agent running!
consul_1 | 2020-04-23T07:13:06.812Z [WARN] agent.client.manager: No servers available
consul_1 | 2020-04-23T07:13:06.812Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: e1.dev.example.com xxxx:xxx:xxx:aaa::2
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: s1.dev.example.com xxxx:xxx:xxx:yyy::2
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: e0.dev.example.com xxxx:xxx:xxx:bbb::2
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client: adding server: server="s1.dev.example.com (Addr: tcp/[xxxx:xxx:xxx:yyy::2]:8300) (DC: id1)"
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: s0.dev.example.com xxxx:xxx:xxx:xxx::2
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client: adding server: server="s0.dev.example.com (Addr: tcp/[xxxx:xxx:xxx:xxx::2]:8300) (DC: dc1)"
consul_1 | 2020-04-23T07:13:06.815Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: test1 xxxx:xxx:xxx:ccc::2
consul_1 | 2020-04-23T07:13:06.816Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: s2.dev.example.com xxxx:xxx:xxx:zz::2
consul_1 | 2020-04-23T07:13:06.816Z [WARN] agent.client.memberlist.lan: memberlist: Refuting an alive message for 'test0' (172.16.147.16:8301) meta:([255 137 165 98 117 105 108 100 174 49 46 55 46 50 58 57 101 97 49 97 50 48 52 164 97 99 108 115 161 48 164 114 111 108 101 164 110 111 100 101 162 105 100 218 0 36 51 54 99 102 98 50 101 55 45 57 49 100 97 45 49 57 101 98 45 54 101 100 102 45 53 49 50 54 53 57 54 101 57 99 54 53 163 118 115 110 161 50 167 118 115 110 95 109 105 110 161 50 167 118 115 110 95 109 97 120 161 51 162 100 99 164 105 116 109 104 167 115 101 103 109 101 110 116 160] VS [255 137 163 118 115 110 161 50 167 118 115 110 95 109 105 110 161 50 167 115 101 103 109 101 110 116 160 162 105 100 218 0 36 50 100 51 53 102 53 100 54 45 102 56 53 102 45 51 57 53 56 45 101 50 100 57 45 48 52 52 97 51 100 48 99 100 99 102 52 167 118 115 110 95 109 97 120 161 51 165 98 117 105 108 100 174 49 46 55 46 50 58 57 101 97 49 97 50 48 52 164 97 99 108 115 161 48 164 114 111 108 101 164 110 111 100 101 162 100 99 164 105 116 109 104]), vsn:([1 5 2 2 5 4] VS [1 5 2 2 5 4])
consul_1 | 2020-04-23T07:13:06.816Z [INFO] agent.client: adding server: server="s2.dev.example.com (Addr: tcp/[xxxx:xxx:xxx:yyy::2]:8300) (DC: dc1)"
consul_1 | 2020-04-23T07:13:06.819Z [INFO] agent: (LAN) joined: number_of_nodes=2
consul_1 | 2020-04-23T07:13:06.819Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=2
consul_1 | 2020-04-23T07:13:07.011Z [ERROR] agent.client.memberlist.lan: memberlist: Failed to send gossip to [xxxx:xxx:xxx:yyy::2]:8301: write udp 172.16.147.16:8301->[2a02:17d0:8115:4::2]:8301: address xxxx:xxx:xxx:aaa::2: non-IPv4 address
consul_1 | 2020-04-23T07:13:07.011Z [ERROR] agent.client.memberlist.lan: memberlist: Failed to send gossip to [xxxx:xxx:xxx:zz::2]:8301: write udp 172.16.147.16:8301->[2a02:17d0:8115:107::2]:8301: address xxxx:xxx:xxx:bbb::2: non-IPv4 address
consul_1 | 2020-04-23T07:13:07.011Z [ERROR] agent.client.memberlist.lan: memberlist: Failed to send gossip to [xxxx:xxx:xxx:ccc::2]:8301: write udp 172.16.147.16:8301->[xxxx:xxx:xxx:ccc::2]:8301: address xxxx:xxx:xxx:ccc::2: non-IPv4 address
consul_1 | 2020-04-23T07:13:07.211Z [ERROR] agent.client.memberlist.lan: memberlist: Failed to send gossip to [xxxx:xxx:xxx:yyy::2]:8301: write udp 172.16.147.16:8301->[2a02:17d0:8110:11a::2]:8301: address xxxx:xxx:xxx:yyy::2 non-IPv4 address

@jsosulska jsosulska added theme/service-metadata Anything related to management/tracking of service metadata type/question Not an "enhancement" or "bug". Please post on discuss.hashicorp labels May 4, 2020
@gabrielexoscale

I have the same issue but with an IPv6 server in an IPv4 cluster.

I was testing adding an IPv6 node to verify that we can run mixed IPv6/IPv4 clusters, and I hit this issue.

The IPv6 Consul node appeared in the cluster but was not able to talk to any of the machines (my mistake).

I then stopped the node; we have it set to leave on exit, so that should have been the end of it, but the server was still there.

I ran consul force-leave -prune consul02 to forcibly remove it, but got:

Error force leaving: Unexpected response code: 500 (agent: No node found with name 'consul02')

but if I list the catalog nodes, there it is:
consul catalog nodes | grep consul002
consul002 fa6b7a7d ipv6 DC

I tried re-registering the node with a new IPv4 address, and it complains that there is already a node with the same name!

At one point I stopped the other nodes and tried rebuilding the cluster following the instructions at https://learn.hashicorp.com/consul/day-2-operations/outage, but it still would not use the new IPv4 address; it kept complaining about the old one.
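For reference, with Raft protocol 3 (which the server info in the first comment reports), the outage-recovery procedure uses a raft/peers.json file containing a JSON array of voter entries, with IPv6 addresses bracketed. Using the server IDs and masked addresses from the first comment's latest_configuration purely as sample data, it would look roughly like this:

```json
[
  { "id": "26365a1c-28c7-cd87-c604-2eb8faf78f81", "address": "[xxxx:xxx:xxx:xxx::2]:8300", "non_voter": false },
  { "id": "4188bade-2a7b-ff34-90e4-ade73cf7c052", "address": "[xxxx:xxx:xxx:yyy::2]:8300", "non_voter": false },
  { "id": "2fae1f7b-b49b-0749-6f9c-38a4db60111d", "address": "[xxxx:xxx:xxx:zz::2]:8300", "non_voter": false }
]
```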

This is on Ubuntu 18.04 with the standard binary downloaded.

Any suggestions on how to fix this?

consul --version
Consul v1.7.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

@jsosulska jsosulska added the theme/ipv6 Relating to IPv6 label Mar 16, 2021