Migrating from docker to containerd results in dependency errors -- docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed #8431
Comments
If anyone else hits this bump, I was able to work around it by manually draining and isolating the node, then uninstalling docker and containerd from it before running cluster.yml again.
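For anyone wanting the concrete commands, here is a minimal sketch of that workaround, assuming a Debian/Ubuntu node and a typical Kubespray inventory layout (node name, inventory path and package list are placeholders/assumptions, not taken from the issue):

```bash
# Evict workloads and mark the node unschedulable
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# On the node itself: remove the old engine packages (Debian/Ubuntu package names)
sudo apt-get purge -y docker-ce docker-ce-cli containerd.io

# Re-run Kubespray against that node, then put it back into service
ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml --limit <node-name>
kubectl uncordon <node-name>
```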
Note that we don't quite support transitioning from one container engine to another. If you are upgrading from pre-2.18 to 2.18+ you need to explicitly set container_manager in your inventory to keep your current engine, since the default changed to containerd. Changing the container manager of an existing cluster is not something kubespray handles.
Thank you for the feedback, @cristicalin ! Migrating the container engine does not seem any more disruptive than any other Kubespray upgrade. For years, I have been upgrading our clusters with Kubespray, and that usually means restarting the docker daemon, Calico and other core components. It has always been necessary to cordon and drain nodes before proceeding. I can see how Kubespray does not yet support this officially. Correct me if I'm wrong, but as far as I understand, changing the container engine involves a few simple steps:
Natively, k8s supports having nodes with different engines. I was able to run the above steps and, apparently, it works fine as long as you don't need to run Kubespray playbooks to reconfigure something else in the cluster during the process.
In theory yes, those are the steps, but there are a few more changes that kubespray does not handle and would need to be done manually. The node cleanup is not something we do in the upgrade procedure, and kubespray itself is unaware of what the old container manager was, since we don't have any detection in place to clean up the old one. You would have a much easier time if you can just remove and reprovision the nodes, though we never tested the cluster expansion procedure with a change of container manager. We would be happy to have feedback on how well this works in practice, fix issues you may encounter, or accept your code contributions for any fixes.
Thank you. I'll be glad to get back with feedback on our cluster after this.
We came across the same issue. I think the very least that has to be tested before a release is:
This test would have failed.
Such tests are performed as part of the CI; it is just that we don't test the change of engines. The CI was explicitly hardcoded to test upgrades for the new default configuration. There are some discussions on how to properly support container manager changes, as in the linked PR, but explicitly removing the docker engine in the containerd role is not a maintainable long-term solution. Once we have a clean solution we will consider backporting it to 2.18.x. For now, it is highly recommended to test the upgrade with your existing configuration, read the release notes for changes in the public configuration variables, and update and pin your inventories before performing production upgrades.
I'll have to agree with @cristicalin. The scenario raised by this issue is not currently supported, so discussing any related errors seems like a moot point at this moment. I'll be on the lookout for PRs and new improvements in that sense. If there are dependency errors during a normal upgrade between k8s versions, that deserves a new issue on its own. Closing this for now. Thank you!
Hi guys. Just popping back to provide some feedback on this adventure. We managed to migrate the container engine from docker to containerd in our Kubespray installation, along with a few manual steps. Docker has been fully uninstalled from all nodes and, as far as we can tell, everything is working as expected.

The following steps were performed in our particular environment (specs below) and, as mentioned before, there are no guarantees that nothing else needs to be adjusted or cleaned up in the cluster. Any other ideas and missing steps are appreciated as further feedback. Hopefully, this can provide some insights into how this procedure could be integrated into the playbooks officially.

Environment

Nodes: Ubuntu 18.04 LTS

Important considerations

If you require minimum downtime, nodes need to be cordoned and drained before being processed, one by one. If you wish to run the playbook against all nodes at once, expect downtime for the whole cluster. Processing nodes one-by-one also means you will not be able to update any other cluster configuration using Kubespray before this procedure is finished and the cluster is fully migrated. Everything done here requires full root access to every node.

Steps

1) Pick one or more nodes for processing

I am not sure how the order might affect this procedure. So, to be sure, I decided to start with master and etcd nodes all together, followed by each worker node individually.

2) Cordon and drain the node

... because, downtime.

3) Adjust the inventory so container_manager is set to containerd (see the sketch below)
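A minimal sketch of steps 2 and 3, assuming a standard kubectl setup and a typical Kubespray inventory layout (the node name and the group_vars path are placeholders, not the original poster's exact values):

```bash
# Step 2: cordon and drain the node being migrated
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Step 3: switch the container manager in the cluster group_vars, e.g. in
#   inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
#     container_manager: containerd
```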
4) Stop docker and kubelet daemons
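Something along these lines, assuming systemd-managed services with their default unit names:

```bash
# Stop kubelet first so it does not try to restart containers,
# then stop the docker service and its socket
sudo systemctl stop kubelet
sudo systemctl stop docker.socket docker.service
```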
5) Uninstall docker + dependencies
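On Debian/Ubuntu this could look like the following; the exact package list is an assumption and may differ on your nodes:

```bash
# Purge the docker engine packages and anything pulled in alongside them
sudo apt-get purge -y docker-ce docker-ce-cli containerd.io
sudo apt-get autoremove -y
```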
6) Run cluster.yml limited to the node being migrated (see the sketch below)
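For example (inventory path and node name are placeholders):

```bash
# Re-run the full playbook against just the node being migrated
ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml --limit <node-name>
```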
This effectively reinstalls containerd and seems to place all config files in the right place. When this completes, kubelet will immediately pick up the new container engine and start spinning up DaemonSets and kube-system Pods. Optionally, if you feel confident, you can remove the leftover docker data directories on the node as well.
You can watch the new containers coming up with crictl:
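For instance, pointed at containerd's CRI socket (the socket path assumes Kubespray's default containerd configuration):

```bash
# List running containers through the CRI socket
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps

# or, if /etc/crictl.yaml already points at containerd, simply:
watch sudo crictl ps
```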
7) Replace the cri-socket node annotation to point to the new container engine

Node annotations need to be adjusted. Kubespray will not do this, but a simple kubectl annotate will do the job.
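A sketch of that annotation change; the socket path assumes Kubespray's default containerd setup:

```bash
# Point kubeadm's cri-socket annotation at containerd instead of docker
kubectl annotate node <node-name> --overwrite \
  kubeadm.alpha.kubernetes.io/cri-socket=/run/containerd/containerd.sock
```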
As far as I can tell, the annotation is only required by kubeadm to follow through future cluster upgrades.

8) Reboot the node

Reboot, just to make sure everything restarts fresh before the node is uncordoned.

Afterthoughts

If your cluster runs a log aggregator, like fluentd+Graylog, you will likely need to adjust collection filters and parsers. While docker generates JSON logs, containerd has its own space-delimited format. For example:
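As an illustration (the log lines below are made up), the same message written by docker's json-file driver and in the containerd CRI format, which is `<timestamp> <stream> <P|F> <message>`:

```
# docker json-file logging driver
{"log":"hello world\n","stream":"stdout","time":"2022-01-20T10:15:30.123456789Z"}

# containerd CRI logging format
2022-01-20T10:15:30.123456789Z stdout F hello world
```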
In our case, we just had to switch the fluentd parser to fluent-plugin-parser-cri.
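For reference, a hypothetical fluentd tail source using the cri parser type provided by fluent-plugin-parser-cri; the config path, log path and tag are assumptions, not the poster's actual configuration:

```bash
# Append a container log source that parses the CRI format
cat <<'EOF' >> /etc/fluent/fluent.conf
<source>
  @type tail
  path /var/log/containers/*.log
  tag kubernetes.*
  read_from_head true
  <parse>
    @type cri
  </parse>
</source>
EOF
```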
PS: we also tested the next Kubernetes upgrade from 1.21 to 1.22. Works like a charm :D
This procedure would actually make for a good addition to our docs. Would you mind submitting a documentation PR and updating https://github.com/kubernetes-sigs/kubespray/blob/master/docs/upgrades.md ?
Sure. It's a lot of new information. Should I add a new *.md or just append to upgrades.md?
You can create a new folder called upgrades under docs/ and add the new guide there.
On the way.
…-containerd), with special emphasis on the fact that the procedure is still not officially supported. (#8471) Follow up from #8431. Signed-off-by: Julio Morimoto <[email protected]>
@juliohm1978 Did you ever encounter any issues with etcd? I'm trying to do a migration from kubespray 2.17.1 -> 2.18.0 following your guide to migrate to containerd. When I now run the playbook and look in the cluster, I see that etcd is still running since the static manifest (from kubespray 2.17) is still there and it's using the old version (etcd v3.4.13).

EDIT: Just remembered that I have etcd_kubeadm_enabled set, which probably explains the difference.
Can confirm that there seemed to be a missing step if you had etcd_kubeadm_enabled set.
I am using the Kubespray default for etcd_kubeadm_enabled. Our etcd was installed as a docker container. It was replaced by a host-native binary.
Environment:

- Cloud provider or hardware configuration: Baremetal
- OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): Ubuntu 18.04 LTS
- Kubespray version (commit) (git rev-parse --short HEAD): using ansible/python/kubespray provided by the official docker image quay.io/kubespray/kubespray:v2.18.0
- Network plugin used: Calico
- Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): inventory.zip
- Command used to invoke ansible:
- Output of ansible run: output.zip

Anything else do we need to know:
We deployed and have been upgrading our k8s clusters with Kubespray for a few years now. With more recent upgrades, we decided to migrate the container engine from docker to containerd, in preparation for the definitive deprecation of docker.
Running the adjusted playbook cluster.yml with version 2.18.0 results in the following error:

docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed

As far as I can tell, the container-engine role is trying to uninstall a previous container engine dependency:
All dependencies were installed by kubespray itself, so I guess I expected the playbook to be able to handle this type of migration.
Any notes on how we are supposed to proceed? I guess I could always isolate nodes and remove all dependencies by force. But it would be nice to know if I'm missing something or doing something wrong here.
Regards.