[FEATURE] k3d IPAM (to prevent etcd failures) #550
Comments
Hi @mindkeep , thanks for opening this issue! |
Log
If there's no flag for k3s/etcd that we could use to work around this, we'll have to check how feasible it is to manage static IPs for k3d nodes (i.e. similar to using the |
Worth noting, if I shut down both containers and restart k3d-test-k3d-server-0 first, we get the original IP ordering and everything comes back up. So it's purely a docker IP ordering problem. Given that we already spin up a new docker network, could we explore assigning more static IPs within that network? |
Yep..
Implementing some IPAM in k3d would be the only possible solution, yep 🤔 Let's turn this into a feature request 👍 |
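To make the IPAM idea above concrete, here is a minimal sketch of deterministic address assignment inside one subnet. It is not k3d's implementation: the helper name `assign_static_ips`, the `reserved` parameter, and the assumption that the first host address belongs to the network gateway are all hypothetical and only illustrate the principle that each node gets the same address no matter which container starts first.

```python
import ipaddress


def assign_static_ips(subnet, node_names, reserved=1):
    """Map each node name to a fixed address in `subnet`.

    Hypothetical helper for illustration only. The first `reserved`
    host addresses are skipped (assumed to be used by the gateway).
    """
    hosts = ipaddress.ip_network(subnet).hosts()
    for _ in range(reserved):
        next(hosts, None)

    assignments = {}
    for name in node_names:
        address = next(hosts, None)
        if address is None:
            raise ValueError(f"subnet {subnet} has no free address for {name}")
        assignments[name] = str(address)
    return assignments


if __name__ == "__main__":
    # Node names follow the pattern used in this thread; the list is illustrative.
    nodes = ["k3d-test-k3d-server-0", "k3d-test-k3d-agent-0", "k3d-test-k3d-agent-1"]
    for node, ip in assign_static_ips("172.20.0.0/16", nodes).items():
        print(node, ip)
```

Because the mapping depends only on the subnet and the order of the node list, it can be recomputed or stored and then passed to the container runtime. Docker, for example, accepts a fixed address for a container on a user-defined network that was created with an explicit subnet.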
Hi @mindkeep, while I'm still trying to figure out the best way to cope with this, I already did some work on this. |
First try
|
@bukowa ... well.. crap 😁 |
After restarting docker and deleting that cluster, my old clusters with 1 server won't boot up :(
|
@bukowa can you inspect those node containers? |
@iwilltry42 no overlapping networks + all nodes started right now after another docker restart, strange :) before they were in exit loop |
@bukowa should be fixed here: https://github.com/rancher/k3d/releases/tag/v4.5.0-dev.1 |
@iwilltry42 I made a small GitHub action that may be useful to test this: https://github.com/bukowa/k3dtest/runs/2352095712?check_suite_focus=true |
Thanks for that @bukowa , I forked it and added kubectl to check that output. |
@iwilltry42 seems to work but not on my machine ^^
|
@iwilltry42 I think I've got something, because I still have old clusters running:
Now each time I restart docker, it looks like the alive nodes are shifting. Take a look:
restart 1:
restart 2:
restart 3:
restart 4:
|
@bukowa , I am like absolutely lost there 🤔 |
@iwilltry42 ok, maybe this finding can become useful when someone encounters a similar issue |
I tried to do
Now it hangs here:
INFO[0000] Stopping cluster 'dev'
INFO[0000] Starting cluster 'dev'
INFO[0000] Starting the initializing server...
INFO[0000] Starting Node 'k3d-dev-server-0'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-dev-server-1'
The cluster was initially created with:
k3d registry create registry.localhost --port 5000
k3d cluster create dev \
  --k3s-server-arg "--no-deploy=traefik" \
  --registry-use k3d-registry.localhost:5000 \
  --port 80:80@loadbalancer \
  --port 443:443@loadbalancer \
  --api-port 6443 --servers 3 --agents 3 |
Hi @renepardon , can you please share the logs of the |
Hello Thorsten,
I work on Ubuntu 20.04 and have the latest version of k3d.
I had to delete and recreate the cluster to be able to continue with my workload. Next time this issue appears I will post the logs.
Sent from my mobile device
On 27.05.2021 at 12:36, Thorsten Klein ***@***.***> wrote:
Hi @renepardon, can you please share the logs of the k3d-dev-server-0 and k3d-dev-server-1 containers?
Also, which version of k3d and k3s are you using and what is your environment?
|
k3d version v4.4.3. But I can't find the log files, neither in my home directory nor in /var/log. |
@renepardon , if you were to search for actual log files, they'd be in |
The k3d-dev-server-0 log repeats infinitely like this:
time="2021-06-07T06:42:26.583579237Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [k3d-dev-server-1-b8022d18=https://172.21.0.4:2380 k3d-dev-server-2-bb40fdf0=https://172.21.0.5:2380 k3d-dev-server-0-f3a8f84f=https://172.21.0.3:2380], expect: k3d-dev-server-0-f3a8f84f=172.21.0.7"
time="2021-06-07T06:42:26.881852734Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:26 +0000 UTC"
time="2021-06-07T06:42:26.882548039Z" level=info msg="Cluster-Http-Server 2021/06/07 06:42:26 http: TLS handshake error from 172.21.0.6:48878: remote error: tls: bad certificate"
time="2021-06-07T06:42:26.883032016Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:26 +0000 UTC"
time="2021-06-07T06:42:27.350483832Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:27 +0000 UTC"
time="2021-06-07T06:42:27.351461555Z" level=info msg="Cluster-Http-Server 2021/06/07 06:42:27 http: TLS handshake error from 172.21.0.2:35352: remote error: tls: bad certificate"
and k3d-dev-server-1:
{"level":"info","ts":"2021-06-07T06:43:10.559Z","caller":"mvcc/index.go:189","msg":"compact tree index","revision":427102}
{"level":"info","ts":"2021-06-07T06:43:10.572Z","caller":"mvcc/kvstore_compaction.go:56","msg":"finished scheduled compaction","compact-revision":427102,"took":"12.938268ms"}
W0607 06:43:13.421471 8 handler_proxy.go:102] no RequestInfo found in the context
E0607 06:43:13.421512 8 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0607 06:43:13.421518 8 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
{"level":"warn","ts":"2021-06-07T06:43:13.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:13.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:18.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:18.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:23.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:23.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:28.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:28.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:33.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:33.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:38.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:38.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
W0607 06:43:43.357254 8 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
{"level":"warn","ts":"2021-06-07T06:43:43.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:43.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:48.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:48.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:53.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:53.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:58.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:58.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:03.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:03.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:08.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:08.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"} |
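The two logs above appear to tell one story: the etcd member list still records k3d-dev-server-0 at 172.21.0.3 (which k3d-dev-server-1 keeps dialing and finds refused), while the server itself now reports 172.21.0.7. A minimal sketch for spotting that mismatch automatically follows. The function name `find_ip_mismatch` is hypothetical, the regular expression is fitted only to the message text quoted in this thread, and reading the `expect:` value as the node's current address is an interpretation rather than documented k3s behavior.

```python
import re

# Message text copied from the k3d-dev-server-0 log in this thread.
MESSAGE = (
    "Failed to test data store connection: this server is a not a member of the etcd cluster. "
    "Found [k3d-dev-server-1-b8022d18=https://172.21.0.4:2380 "
    "k3d-dev-server-2-bb40fdf0=https://172.21.0.5:2380 "
    "k3d-dev-server-0-f3a8f84f=https://172.21.0.3:2380], "
    "expect: k3d-dev-server-0-f3a8f84f=172.21.0.7"
)


def find_ip_mismatch(message):
    """Return (member, registered_ip, current_ip) if they differ, else None.

    Hypothetical helper: it only understands the message shape shown above.
    """
    match = re.search(r"Found \[(.*?)\], expect: (\S+)=(\S+)", message)
    if match is None:
        return None
    members_raw, member, current_ip = match.groups()

    # Build a name -> host map from the "name=https://host:port" entries.
    registered = {}
    for entry in members_raw.split():
        name, _, url = entry.partition("=")
        host = re.match(r"https?://([^:/]+)", url)
        if host:
            registered[name] = host.group(1)

    registered_ip = registered.get(member)
    if registered_ip is None or registered_ip == current_ip:
        return None
    return member, registered_ip, current_ip


if __name__ == "__main__":
    print(find_ip_mismatch(MESSAGE))
```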
@renepardon , I'll follow up on the similar issue in #619 👍 |
@iwilltry42 I can still reproduce that IPs are occasionally shuffled on restart on k3d 5.4.9 - k3s v1.24.12+k3s1 (especially after a host reboot). Does it help if I open a new issue? |
I'm experiencing the same issue. After rebooting the host, the cluster becomes irrecoverably lost. k3d v5.6.3 - k3s v1.28.8-k3s1. |
Feature Request
Original Bug Report
What did you do
I was toying with system resiliency around restarting servers and agents, and found that the Init server didn't come back after the following sequence:
where we wind up with this message repeated:
As it turns out the IP for k3d-test-k3d-server-0 moves from 172.20.0.2 to 172.20.0.6 (presumably swapping with k3d-test-k3d-agent-1). Is there any way to lock this down a bit so docker doesn't accidentally flip things around?
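One way to confirm this kind of swap is to record the container-to-IP mapping before and after the restart and compare the two snapshots. The sketch below assumes the mappings have already been collected by hand (for example from the output of `docker network inspect`). The function name `changed_ips` is hypothetical, and the address for k3d-test-k3d-agent-1 reflects the presumed swap described above, not a confirmed observation.

```python
def changed_ips(before, after):
    """Return {name: (old_ip, new_ip)} for nodes whose address changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


if __name__ == "__main__":
    # Snapshots as described in this report; the agent-1 values are presumed.
    before = {"k3d-test-k3d-server-0": "172.20.0.2", "k3d-test-k3d-agent-1": "172.20.0.6"}
    after = {"k3d-test-k3d-server-0": "172.20.0.6", "k3d-test-k3d-agent-1": "172.20.0.2"}
    for node, (old, new) in changed_ips(before, after).items():
        print(f"{node}: {old} -> {new}")
```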
What did you expect to happen
I expected the IPs to be maintained...
Which OS & Architecture
Which version of k3d
k3d version
Which version of docker
docker version and docker info