multi-node: Kubernetes cluster does not start after Docker re-assigns node's IP addresses after (Docker) restart #2045
Comments
I was just about to open the same issue. My information is as follows. Environment:
Kind setup
After creating it, inspecting the related Docker network shows
After a reboot of the whole machine or a Docker restart, the IPs have usually changed
Changed IP addresses
This mix-up confuses the cluster internally, since it never picks up the new addresses. The pods themselves all reach the Running state, but the cluster is non-functional: for example, you can't start a new pod.
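A minimal sketch of how the IP change described above could be detected, assuming two snapshots of `docker network inspect` JSON output are saved before and after the restart. The helper names and the sample addresses are hypothetical; only the `Containers` / `Name` / `IPv4Address` fields come from Docker's output.

```python
from __future__ import annotations

import json


def node_ips(inspect_json: str) -> dict[str, str]:
    """Map container name -> IPv4 address from `docker network inspect` JSON."""
    ips = {}
    for network in json.loads(inspect_json):
        # "Containers" maps container IDs to their per-network details.
        for container in (network.get("Containers") or {}).values():
            # Docker reports addresses in CIDR form, e.g. "172.18.0.2/16".
            ips[container["Name"]] = container["IPv4Address"].split("/")[0]
    return ips


def changed_ips(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {name: (old_ip, new_ip)} for nodes whose address differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Hypothetical snapshots: the two nodes swapped addresses after a restart.
BEFORE = '[{"Containers": {"a1": {"Name": "my-k8s-control-plane", "IPv4Address": "172.18.0.2/16"}, "b2": {"Name": "my-k8s-worker", "IPv4Address": "172.18.0.3/16"}}}]'
AFTER = '[{"Containers": {"a1": {"Name": "my-k8s-control-plane", "IPv4Address": "172.18.0.3/16"}, "b2": {"Name": "my-k8s-worker", "IPv4Address": "172.18.0.2/16"}}}]'

print(changed_ips(node_ips(BEFORE), node_ips(AFTER)))
```

An empty result would mean Docker happened to hand out the same addresses again, which is the case where the cluster comes back healthy.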
Looking through various pods: kube-controller-manager-my-k8s-control-plane
coredns
kube-apiserver-my-k8s-control-plane
kube-scheduler-my-k8s-control-plane
|
OKi, I've realized that I've patched This seems to solve the issue. Docker reuses the IPs after restart. Take a look at my branch. I'll try to generalize it and make it more flexible, but I might fail as I have no idea about Go. :-/ |
/assign Custom IP allocation was discussed before and it is not easy to implement; bear in mind that you can have multiple clusters created at the same time ... It would be interesting to understand the root cause. We switched to DNS in most parts to allow restarts ... Could this be a Kubernetes limitation, or are we missing something? |
So, I've implemented simple mechanism which:
The mechanism has one disadvantage, however: it fills possible gaps. But it works even for subsequent additional cluster creations. Check out the code, but don't take it too seriously: these are the first lines of Go I have ever written. P.S.: I'll take a look at how difficult it would be for me to implement a values parameter so one can specify the Docker IP manually, e.g.:
P.P.S.: I've uploaded compiled binary for anybody who wants to test it: https://github.com/hadrabap/kind-test-snapshots/blob/main/kind |
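The steps of the mechanism are not preserved in this thread, so the following is only an illustration of the "fills possible gaps" behaviour mentioned above, not the code from the branch. It assumes the allocator picks the lowest unused host address in the network's subnet and that the first host address is reserved for the gateway; the function name and addresses are hypothetical.

```python
from __future__ import annotations

import ipaddress


def next_free_ip(subnet: str, used: set[str]) -> str:
    """Return the lowest unused host address in `subnet`.

    Assumption: the first host address (e.g. 172.18.0.1) is the gateway
    and is never handed out.
    """
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    next(hosts, None)  # skip the assumed gateway address
    for host in hosts:
        if str(host) not in used:
            return str(host)
    raise RuntimeError(f"no free address left in {subnet}")


# 172.18.0.3 was released by a deleted node, so it is reused first.
print(next_free_ip("172.18.0.0/16", {"172.18.0.2", "172.18.0.4"}))
```

Reusing the lowest free address is what makes the allocation "fill gaps": a node created after another one was deleted inherits the deleted node's address rather than getting a fresh one at the end of the range.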
I think this is Docker related. The kubernetes cluster itself uses DNS but it will not get there as kubeadm is launched by systemd with configuration files using IPs only. These files are generated by |
not strictly. this is a bug in
|
Actually, docker does not guarantee this unless you create a network that excludes the chosen IPs from the auto-allocated range. You can see more on this in the previous discussion in this repo. |
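As a rough sketch of that constraint: `docker network create` accepts `--subnet` and `--ip-range`, and `docker run` accepts `--ip`, so a static address is only safe if it lies inside the subnet but outside the auto-allocated range. The check below illustrates this with hypothetical values; it does not account for the gateway or broadcast addresses.

```python
import ipaddress


def is_safe_static_ip(ip: str, subnet: str, auto_range: str) -> bool:
    """True if `ip` is inside `subnet` but outside the auto-allocated range."""
    addr = ipaddress.ip_address(ip)
    in_subnet = addr in ipaddress.ip_network(subnet)
    in_auto_range = addr in ipaddress.ip_network(auto_range)
    return in_subnet and not in_auto_range


# Hypothetical layout: Docker auto-allocates from 172.18.0.0/24,
# leaving the rest of 172.18.0.0/16 for manually pinned addresses.
SUBNET = "172.18.0.0/16"
AUTO_RANGE = "172.18.0.0/24"

for candidate in ("172.18.1.10", "172.18.0.10", "10.0.0.1"):
    print(candidate, is_safe_static_ip(candidate, SUBNET, AUTO_RANGE))
```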
This also smells like a bug somewhere between kubeadm and kind, there's no reason I shouldn't be able to put the APIserver behind DNS and have components respect that. |
I hope I understand it better now. Let me summarize and please correct me if I'm wrong:
What I've found is that the second step does something silly. Originally I wrote a shell script that reconfigures the IP addresses to match the current state, but that fails as well, because all the security certificates generated in step 1 are based on IP addresses in their CNs. This leads to a situation where the services are finally able to contact each other but reject the certificates, and we are back to square one. I see only two ways to solve this issue permanently:
I hope I got the idea. |
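The certificate problem described above can be illustrated with a small check: a certificate is only accepted for an endpoint if that endpoint appears among the names or addresses it was issued for. This is a simplified sketch with hypothetical inputs; it takes the DNS names and IPs as plain lists instead of parsing a real certificate, and it ignores wildcard names.

```python
from __future__ import annotations

import ipaddress


def cert_covers(endpoint: str, cert_dns: list[str], cert_ips: list[str]) -> bool:
    """True if `endpoint` (hostname or IP) is listed in the certificate."""
    try:
        addr = ipaddress.ip_address(endpoint)
    except ValueError:
        # Not an IP literal: compare as a case-insensitive DNS name.
        return endpoint.lower() in {name.lower() for name in cert_dns}
    return addr in {ipaddress.ip_address(ip) for ip in cert_ips}


# Hypothetical certificate issued when the node had address 172.18.0.3.
DNS_NAMES = ["my-k8s-control-plane", "localhost"]
IPS = ["172.18.0.3", "127.0.0.1"]

print(cert_covers("172.18.0.3", DNS_NAMES, IPS))            # old address
print(cert_covers("172.18.0.4", DNS_NAMES, IPS))            # address after restart
print(cert_covers("my-k8s-control-plane", DNS_NAMES, IPS))  # hostname
```

The hostname case is why connecting by DNS name would survive an IP change as long as the certificate also lists the node's hostname.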
Yeah, I think that's pretty much it. We could maybe fix 2) from the first list by regenerating the certs (a bit of a headache, and it would have to happen at bootup, I think). We could also take some additional approaches re: the second list:
Regarding 1.) we could also limit this to just control plane nodes. One additional problematic thing for kind is that we support provisioning N clusters in parallel in separate invocations against the same docker. Currently some users depend on this in an even worse extended circumstance: the dockerd is not on the host the I think the cleanest solution is getting everything to use DNS names, but I don't remember why that's not happening right now. I'm not sure how soon I can dig into this deeper ... lots going on right now. |
And to further clarify: the entrypoint IP update is intended to handle places where we bind to an IP, in addition to the node reporting its IP via kubelet. That intention should be fine, because certs should also be signed for the hostname and the hostname should be correctly mapped to the new IP. So the problem with that approach is things still connecting to IPs instead of domains. Even if we can fix that, we still want to update the local node's references to its old IP on restart. |
Do we know exactly which components are failing, or are all the components failing? @neolit123 do you have the case in kubeadm of people switching IPs on their cluster once installed? |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale |
One thing to clarify: It's perfectly fine for the nodes to have changing IPs and be fixing that up in some places in the config, notably we have kubelet registering and setting the node IP on the node object, which is used for things like routing traffic to blocks of pod IPs, which is fine to do dynamically. What is problematic is the parts that use fixed certificates / contacting the control plane components. All of that should be using DNS, but it's not. Unfortunately rebootable multi-node clusters are something with a somewhat limited (but certainly not none!) use case and personally this is not a priority versus other work (mostly Kubernetes things which is not even kind related to begin with at the moment...) The bot will not close issues in this repo now, I've disabled it for us. A good start if someone wants to see progress on this would be identifying where IP addresses are being used. |
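One possible starting point for that search is a scan of a node's configuration directories for IPv4 literals. The sketch below is an assumption about how one might go about it, not an existing kind tool; the function name is hypothetical, and the regular expression will also match version-like strings with four numeric parts.

```python
from __future__ import annotations

import ipaddress
import re
from pathlib import Path

# Four dot-separated groups of 1-3 digits; validated with ipaddress below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ip_literals(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, ip) for each non-loopback IPv4 literal."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for candidate in IPV4_RE.findall(line):
                try:
                    addr = ipaddress.ip_address(candidate)
                except ValueError:
                    continue  # e.g. 999.1.1.1
                # 127.0.0.1 and 0.0.0.0 do not change across restarts.
                if not addr.is_loopback and not addr.is_unspecified:
                    hits.append((str(path), lineno, candidate))
    return hits
```

Run inside a node container against directories such as /etc/kubernetes or /kind, for example with `find_ip_literals("/etc/kubernetes")`, this would list the files that embed the node's address at creation time.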
Hi, thanks for disabling the bot! The use case is simple: you work on a project and need a stable multi-node environment. As it stands, you need to rebuild the cluster each day or after each reboot. The goal of kind is to simulate an environment, but having to rebuild it every time you reboot is clearly a deal breaker for kind. I stopped using kind and built a real cluster until this issue is solved. Thanks. |
Hi, yes if a persistent "real" cluster suits your needs kind is not what you're looking for.
Regarding goals, see: https://kind.sigs.k8s.io/docs/contributing/project-scope/ For application testing long persistence is a nice to have. kind is not intended to be a persistent workload cluster and it would be insecure to keep it permanently (persistent clusters should be upgraded regularly to keep ahead of security fixes ...) See also: https://kind.sigs.k8s.io/docs/user/configuration/#api-server Most apps should not need multi-node at all to test, we need that to test some particular Kubernetes expectations around rolling node behavior for a few tests. |
No news here? That's a pity! This is necessary in many kinds of contexts, such as creating, teaching, or taking courses, for example
can you expand on that? Why do you need to restart docker? |
Of course, just imagine I start a course that lasts 2 days (when it could well be 5),
I think kind is almost perfect for teaching (and learning), but this issue continues to be a bit of a headache. Best regards
There are some people who have implemented creative solutions to work around it. I think some of them shared them in the Slack channels, using some bash scripts with |
@pablodgonzalez this happens with a multi-node cluster, right? |
@aojea Yes! the issue is with multinode HA cluster with 2 or more control-planes. |
I see, because multinode works after restarts too
what is the requirement for HA? |
Mainly I use it to show how the cluster responds when a control-plane node fails. Multi-node is simple to show, but the control-plane part is an extra. |
ok, the problem is that the etcd cluster has the IPs hardcoded
hence, when the cluster reboots and the nodes change their ips it blows up |
Changing the IPs of CP nodes would require updating static pods on disk,
and also changing annotations on the mirror pods.
Perhaps there is a way to reserve Docker IPs on restart instead of adapting
to IP changes.
|
and using dns names instead of IPs? |
kube-* components have flags that only work with IPs. Those could be set to 0.0.0.0, but the mirror pod annotations still have to be updated post ip detection. |
Howdy. I was just pointed here from Slack because I had a KinD cluster that has two control-plane nodes, and was perplexed at why it didn't survive a reboot of the host OS. However, I saw your comment, and while it is a bit of a pain to set up first-day, you and your students could use Vagrant to manage suspending and bringing up a VM with a KinD cluster in the guest OS as a workaround. I shared what I use to manage setup here: https://github.com/boldandbusted/vagrant-kind . Cheers. |
@boldandbusted Thanks for sharing your repo, but the idea is to avoid installing any VMs or spending time explaining other tools. |
This should be fixed for most multi-node clusters in the latest sources at HEAD, and in the forthcoming v0.15.0 (TBD, we'll want to wrap up some other things and make sure this is working widely before cutting a release). #1689 remains for tracking clusters with multiple control-plane nodes ("HA") which we haven't dug into yet. |
(Thanks @tnqn !) |
A kind Kubernetes cluster does not survive a Docker restart. It seems that Docker assigns new IPs to containers on each start-up. The kind nodes, however, have their original IP addresses specified in the generated configuration files, leaving Kubernetes services unable to talk to each other. The most affected ones are the scheduler and the controller.
What happened:
Kubernetes starts in a broken state even though kubectl get pods -A reports otherwise (everything 1/1). The cluster is unable to start deployed pods (if deployed before the restart) and is unable to deploy anything new because the scheduler is not connected to the apiserver.
What you expected to happen:
Kubernetes cluster continues working as expected even after Docker restart.
How to reproduce it (as minimally and precisely as possible):
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
dnsutils are in Pending state
Anything else we need to know?:
/kind and /etc/kubernetes files, but then the services start complaining about the certificate not being issued for the IP address. Changing IP addresses each time the cluster starts is therefore not a solution.
Environment:
kind version (use kind version):
Kubernetes version (use kubectl version):
Docker version (use docker info):
OS (e.g. from /etc/os-release): macOS Catalina 10.15.7 Intel