Migrating from docker to containerd results in dependency errors -- docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed #8431

Closed
juliohm1978 opened this issue Jan 14, 2022 · 17 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@juliohm1978
Contributor

juliohm1978 commented Jan 14, 2022

Environment:

  • Cloud provider or hardware configuration: Baremetal

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 4.15.0-166-generic x86_64
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Kubespray version (commit) (git rev-parse --short HEAD):

Using ansible/python/kubespray provided by the official docker image: quay.io/kubespray/kubespray:v2.18.0

Network plugin used: Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

inventory.zip

Command used to invoke ansible:

ansible-playbook -i /inventory/inventory.ini --become \
  /kubespray/cluster.yml \
  -u infrasw \
  --limit=k8s-master01-lab-20190604.dis.tjpr.jus.br

Output of ansible run:

output.zip

Anything else we need to know:

We deployed and have been upgrading our k8s clusters with Kubespray for a few years now. With more recent upgrades, we decided to migrate the container engine from docker to containerd, in preparation for the definitive deprecation of docker.

Running the adjusted cluster.yml playbook with version 2.18.0 results in the following error:

TASK [container-engine/containerd : containerd | Remove any package manager controlled containerd package] ***
fatal: [k8s-master01-lab-20190604.dis.tjpr.jus.br]: FAILED! => {"changed": false, "msg": "'apt-get remove 'containerd.io'' failed: E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.\n", "rc": 100, "stderr": "E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.\n", "stderr_lines": ["E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed"]}

As far as I can tell, the container-engine role is trying to uninstall a previous container engine dependency:

# apt-get remove 'containerd.io'
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 docker-ce : Depends: containerd.io (>= 1.4.1) but it is not going to be installed
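
For what it's worth, the resolver message points at held packages; on an apt-based node you can inspect that state before forcing anything. This is only a rough sketch: the package names below are the usual docker-ce ones and may differ on your hosts.

# List the packages apt currently has on hold; a held docker-ce can make apt
# refuse to remove containerd.io because of the docker-ce -> containerd.io dependency.
apt-mark showhold

# Dry-run the removal to see what apt would actually do, without changing anything.
apt-get remove --simulate containerd.io docker-ce docker-ce-cli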

All dependencies were installed by kubespray itself, so I guess I expected the playbook to be able to handle this type of migration.

Any notes on how we are supposed to proceed? I guess I could always isolate nodes and remove all dependencies by force. But it would be nice to know if I'm missing something or doing something wrong here.

Regards.

juliohm1978 added the kind/bug label (Categorizes issue or PR as related to a bug.) on Jan 14, 2022
@juliohm1978
Contributor Author

If anyone else hits this bump, I was able to work around it by manually draining and isolating the node, then uninstalling docker and containerd from it before running cluster.yml again.

sudo apt remove docker-ce docker-ce-cli containerd.io

@cristicalin
Contributor

Note that we don't quite support transitioning from one container engine to another. If you are upgrading from pre-2.18 to 2.18+, you need to set your container_manager inventory variable to keep compatibility with your old version; in this case, set container_manager: docker.
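
For reference, a minimal sketch of that pin, assuming the usual inventory layout (the group_vars path below is an example and may differ in your setup):

# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml  (example path)
container_manager: docker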

Changing the container_manager of a cluster is extremely disruptive and, at the moment, involves redeploying the cluster, which is your best course of action here. In this case, just run the reset.yml playbook to clean the cluster and then deploy the new version with cluster.yml.
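
A rough sketch of that reset-and-redeploy path, reusing the invocation style from the issue description (inventory path and remote user are placeholders):

# WARNING: reset.yml wipes the whole cluster; only run it if you accept the downtime.
ansible-playbook -i /inventory/inventory.ini --become -u infrasw /kubespray/reset.yml
ansible-playbook -i /inventory/inventory.ini --become -u infrasw /kubespray/cluster.yml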

@juliohm1978
Contributor Author

Thank you for the feedback, @cristicalin !

Migrating the container engine does not seem any more disruptive than any other Kubespray upgrade. For years, I have been upgrading our clusters with Kubespray, and that usually means restarting the docker daemon, calico and other core components. It has always been necessary to cordon and drain nodes before proceeding.

I can see how Kubespray does not yet support this officially. Correct me if I'm wrong, but as far as I understand, changing the container engine involves a few simple steps:

  • Cordon and drain the node
  • Uninstall docker-ce and docker-ce-cli
  • Install containerd (if that wasn't already there because of docker)
  • Change the container engine references in the kubelet configuration and a couple of other places
  • Reboot the node to get a fresh clean set of containers in the new engine

Natively, k8s supports having nodes with different engines. I was able to run the above steps and, apparently, it works fine as long as you don't need to run Kubespray playbooks to reconfigure something else in the cluster during the process.
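
A quick way to confirm that mixed state during a node-by-node migration (just a sketch; kubectl output columns vary slightly across versions):

# The CONTAINER-RUNTIME column shows docker://... or containerd://... per node,
# so a partially migrated cluster is easy to spot.
kubectl get nodes -o wide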

@cristicalin
Contributor

cristicalin commented Jan 15, 2022

In theory yes, those are the steps, but there are a few more changes that kubespray does not handle and which would need to be done manually.

Node cleanup is not something we do in the upgrade procedure, and kubespray itself is unaware of what the old container manager was, since we don't have any detection in place to clean up the old one.

You would have a much easier time if you could just remove and reprovision the nodes, though we have never tested the cluster expansion procedure with a change of container manager. We would be happy to get feedback on how well this works in practice and fix issues you may encounter, or to accept your code contributions for any fixes.

Note that containerd, for kubespray, means the one from the upstream GitHub release page; we no longer support the one built by docker, outside of using it with the docker engine itself.

@juliohm1978
Contributor Author

Thank you.

I'll be glad to get back with feedback on our cluster after this.

@dtodor

dtodor commented Jan 18, 2022

We came across the same issue. I think the very least that has to be tested before a release is:

  • create a clean installation with the current tag (e.g. v2.17.1), no configuration changes whatsoever
  • perform an upgrade to the next tag (e.g. v2.18.0)

This test would have failed.

@cristicalin
Contributor

Such tests are performed as part of the CI; it is just that we don't test changing engines. The CI was explicitly hardcoded to test upgrades with the new default configuration.

There are some discussions on how to properly support container manager changes, as in the linked PR, but explicitly removing the docker engine in the containerd role is not a maintainable long-term solution. Once we have a clean solution, we will consider backporting it to 2.18.x.

For now, it is highly recommended to test the upgrade with your existing configuration, read the release notes for changes to the public configuration variables, and update and pin your inventories before performing production upgrades.

@juliohm1978
Contributor Author

I'll have to agree with @cristicalin. The scenario raised by this issue is not currently supported, so discussing any related errors seems like a moot point at this moment.

I'll be on the lookout for PRs and new improvements in that direction.

If there are dependency errors during a normal upgrade between k8s versions, that deserves a new issue of its own.

Closing this for now.

Thank you!

@juliohm1978
Contributor Author

Hi guys. Just popping back to provide some feedback on this adventure.

We managed to migrate the container engine from docker to containerd in our Kubespray installation, with the help of a few manual steps. Docker has been fully uninstalled from all nodes and, as far as we can tell, everything is working as expected.

The following steps were performed in our particular environment (specs below) and, as mentioned before, there is no guarantee that nothing else needs to be adjusted or cleaned up in the cluster. Any other ideas and missing steps are appreciated as further feedback. Hopefully, this can provide some insight into how this procedure could be officially integrated into the playbooks.

Environment

Nodes: Ubuntu 18.04 LTS
Cloud Provider: None (baremetal or VMs)
Kubernetes version: 1.21.5
Kubespray version: 2.18.0

Important considerations

If you require minimal downtime, nodes need to be cordoned and drained before being processed, one by one. If you wish to run cluster.yml only once and get it all done in one swoop, downtime will be significantly higher, since docker will need to be manually removed from all nodes before the playbook runs. For minimal downtime, the following steps will be executed multiple times, once for each node.

Processing nodes one by one also means you will not be able to update any other cluster configuration with Kubespray until this procedure is finished and the cluster is fully migrated.

Everything done here requires full root access to every node.

Steps

1) Pick one or more nodes for processing.

I am not sure how the order might affect this procedure. So, to be safe, I decided to start with all master and etcd nodes together, followed by each worker node individually.

2) Cordon and drain the node

... because, downtime.
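
For reference, the cordon/drain step boils down to something like this sketch (NODENAME is a placeholder; tune the drain flags to your workloads):

kubectl cordon NODENAME
# --ignore-daemonsets is needed because DaemonSet pods are not evicted;
# --delete-emptydir-data discards emptyDir volumes living on the node.
kubectl drain NODENAME --ignore-daemonsets --delete-emptydir-data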

3) Adjust k8s-cluster.yml in your inventory.

resolvconf_mode: host_resolvconf
container_manager: containerd

4) Stop docker and kubelet daemons

service kubelet stop
service docker stop

5) Uninstall docker + dependencies

apt-get remove -y --allow-change-held-packages containerd.io docker-ce docker-ce-cli docker-ce-rootless-extras

6) Run cluster.yml playbook with --limit

cluster.yml --limit=NODENAME
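
Spelled out, that is roughly the invocation from the issue description with a --limit added (inventory path and remote user are placeholders):

ansible-playbook -i /inventory/inventory.ini --become -u infrasw \
  /kubespray/cluster.yml \
  --limit=NODENAME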

This effectively reinstalls containerd and seems to put all config files in the right place. When this completes, kubelet will immediately pick up the new container engine and start spinning up DaemonSets and kube-system Pods.

Optionally, if you feel confident, you can remove /var/lib/docker anytime after this step.

rm -fr /var/lib/docker

You can watch new containers using crictl.

crictl ps -a

7) Replace the cri-socket node annotation to point to the new container engine

Node annotations need to be adjusted. Kubespray will not do this, but a simple kubectl command is enough.

kubectl annotate node NODENAME --overwrite kubeadm.alpha.kubernetes.io/cri-socket=/var/run/containerd/containerd.sock

As far as I can tell, the annotation is only required by kubeadm to follow through future cluster upgrades.
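
To double-check that the annotation landed, a simple grep over the node description is enough:

# Should print the containerd socket path set above.
kubectl describe node NODENAME | grep cri-socket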

8) Reboot the node

Reboot, just to make sure everything restarts fresh before the node is uncordoned.
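
Once the node comes back and reports Ready on the new runtime, uncordon it as usual:

kubectl get node NODENAME        # wait for STATUS to be Ready
kubectl uncordon NODENAME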

Afterthoughts

If your cluster runs a log aggregator, like fluentd+Graylog, you will likely need to adjust collection filters and parsers. While docker generates JSON logs, containerd has its own space-delimited format. For example:

2020-01-10T18:10:40.01576219Z stdout F application log message...

In our case, we just had to switch the fluentd parser to fluent-plugin-parser-cri.
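
As a rough sketch only (the tail source, path and tag below are assumptions; adapt them to your own pipeline), the switch with fluent-plugin-parser-cri looks roughly like this:

<source>
  @type tail
  path /var/log/containers/*.log
  tag kubernetes.*
  <parse>
    # parser type provided by the fluent-plugin-parser-cri plugin
    @type cri
  </parse>
</source>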

@juliohm1978
Contributor Author

PS: we also tested the next Kubernetes upgrade from 1.21 to 1.22.

Works like a charm :D

@cristicalin
Contributor

This procedure would actually make a good addition to our docs; would you mind submitting a documentation PR and updating https://github.com/kubernetes-sigs/kubespray/blob/master/docs/upgrades.md ?

@juliohm1978
Contributor Author

Sure. It's a lot of new information.

Should I add a new *.md or just append to upgrades.md?

@cristicalin
Contributor

You can create a new folder called docs/upgrades, add the procedure there, and link it from upgrades.md; this would open up the docs for future, more complex upgrade procedures.

juliohm1978 added a commit to jhmorimoto/kubespray that referenced this issue Jan 25, 2022
…-containerd), with special emphasis on the fact that the procedure is still not officially supported.

Follow up from kubernetes-sigs#8431.

Signed-off-by: Julio Morimoto <[email protected]>
@juliohm1978
Contributor Author

On the way
#8471

k8s-ci-robot pushed a commit that referenced this issue Jan 27, 2022
…-containerd), with special emphasis on the fact that the procedure is still not officially supported. (#8471)

Follow up from #8431.

Signed-off-by: Julio Morimoto <[email protected]>
@Xartos
Contributor

Xartos commented Feb 10, 2022

@juliohm1978 Did you never encounter any issues with etcd? I'm trying to do a migration from kubespray 2.17.1 -> 2.18.0, following your guide to migrate to containerd.

Because when I now run ansible-playbook cluster.yml ... --limit=NODENAME on one of the masters, I get an error.

When I look in the cluster, I see that etcd is still running, since the static manifest (from kubespray 2.17) is still there and it's using the old version (etcd v3.4.13).
I tried to remove that manifest and run it again, but then I got stuck on some other task.

EDIT: Just remembered that I have etcd_kubeadm_enabled set. Maybe that could be causing the issue?

@Xartos
Contributor

Xartos commented Feb 10, 2022

Can confirm that there seems to be a missing step if you have etcd_kubeadm_enabled set.
PTAL #8528

@juliohm1978
Contributor Author

I am using the Kubespray default for etcd_kubeadm_enabled. Our etcd was installed as a docker container. It was replaced by a host-native binary.
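
A quick hedged check for that on a migrated control-plane node ("etcd" is assumed to be the systemd unit name kubespray uses for host-deployed etcd):

# etcd should now run as a systemd service on the host, not as a container.
systemctl status etcd
crictl ps | grep etcd    # should return nothing in host-deployment mode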
