
containerd_registries is ignored by nerdctl #8375

Closed
sathieu opened this issue Jan 5, 2022 · 19 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@sathieu
Contributor

sathieu commented Jan 5, 2022

Environment:

  • Cloud provider or hardware configuration:

not relevant

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.19.0-17-amd64 x86_64
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Version of Ansible (ansible --version):
ansible 2.9.20
  config file = /opt/kubitus/kubespray/ansible.cfg
  configured module search path = ['/opt/kubitus/kubespray/library']
  ansible python module location = /opt/kubitus/kubespray/venv/lib/python3.7/site-packages/ansible
  executable location = /opt/kubitus/kubespray/venv/bin/ansible
  python version = 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0]
  • Version of Python (python --version):
Python 3.7.3

Kubespray version (commit) (git rev-parse --short HEAD):

92f25bf (v2.18.0)

Network plugin used:

calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Only relevant part:

# $ cat inventory/foobar/group_vars/all/containerd.yml
containerd_registries:
  "docker.io":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/docker.io"
  "gcr.io":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/gcr.io"
  "ghcr.io":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/ghcr.io"
  "k8s.gcr.io":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/k8s.gcr.io"
  "quay.io":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/quay.io"
  "registry.access.redhat.com":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/registry.access.redhat.com"
  "registry.developers.crunchydata.com":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/registry.developers.crunchydata.com"
  "registry.gitlab.com":
  - "https://gitlab-registry.k8s.example.org/v2/external-registries/registry.gitlab.com"
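For context, kubespray renders this variable into the CRI plugin section of /etc/containerd/config.toml; the config.toml diff later in this thread shows the resulting docker.io entry. The Python snippet below is a minimal sketch of that mapping, assuming the output shape seen in that diff. It is not kubespray's actual Jinja template, and the function name render_cri_mirrors is hypothetical. Its main purpose is to make one point visible: every mirror lands under [plugins."io.containerd.grpc.v1.cri"], which is the section nerdctl does not read.

```python
import json

# Parent table for all CRI registry mirrors (same header as in config.toml).
CRI_MIRRORS = '[plugins."io.containerd.grpc.v1.cri".registry.mirrors]'


def render_cri_mirrors(registries: dict[str, list[str]]) -> str:
    """Render a containerd_registries-style mapping as CRI mirror TOML.

    Illustrative only: mimics the shape of the generated config.toml,
    not kubespray's real template.
    """
    lines = [CRI_MIRRORS]
    for registry, endpoints in registries.items():
        # One nested table per registry, still under the CRI plugin section.
        lines.append(f'  {CRI_MIRRORS[:-1]}."{registry}"]')
        # json.dumps emits a double-quoted array, which is also valid TOML.
        lines.append(f"    endpoint = {json.dumps(endpoints)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_cri_mirrors({
        "quay.io": ["https://gitlab-registry.k8s.example.org/v2/external-registries/quay.io"],
    }))
```

Because the generated tables only exist for the CRI plugin, any client that talks to containerd without going through CRI will resolve quay.io directly, which matches the timeout in the output below.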

⚠️ The host has no direct access to quay.io.

Command used to invoke ansible:

ansible-playbook -i inventory/sample/hosts.yaml -i inventory/foobar/hosts.yaml --ask-pass --become --ask-become-pass upgrade-cluster.yml

Output of ansible run:

TASK [download_container | Download image if required] ************************************************
fatal: [node1 -> node1]: FAILED! => {"attempts": 4, "changed": true, "cmd": ["/usr/local/bin/nerdctl", "-n", "k8s.io", "pull", "--quiet", "quay.io/calico/node:v3.20.3"], "delta": "0:00:30.052334", "end": "2022-01-05 15:34:47.855447", "msg": "non-zero return code", "rc": 1, "start": "2022-01-05 15:34:17.803113", "stderr": "time=\"2022-01-05T15:34:47+01:00\" level=info msg=\"trying next host\" error=\"failed to do request: Head \\\"https://quay.io/v2/calico/node/manifests/v3.20.3\\\": dial tcp 34.199.249.102:443: i/o timeout\" host=quay.io\ntime=\"2022-01-05T15:34:47+01:00\" level=fatal msg=\"failed to resolve reference \\\"quay.io/calico/node:v3.20.3\\\": failed to do request: Head \\\"https://quay.io/v2/calico/node/manifests/v3.20.3\\\": dial tcp 34.199.249.102:443: i/o timeout\"", "stderr_lines": ["time=\"2022-01-05T15:34:47+01:00\" level=info msg=\"trying next host\" error=\"failed to do request: Head \\\"https://quay.io/v2/calico/node/manifests/v3.20.3\\\": dial tcp 34.199.249.102:443: i/o timeout\" host=quay.io", "time=\"2022-01-05T15:34:47+01:00\" level=fatal msg=\"failed to resolve reference \\\"quay.io/calico/node:v3.20.3\\\": failed to do request: Head \\\"https://quay.io/v2/calico/node/manifests/v3.20.3\\\": dial tcp 34.199.249.102:443: i/o timeout\""], "stdout": "", "stdout_lines": []}

Anything else we need to know:

This is a regression from #8239, more precisely 056a566.

@sathieu sathieu added the kind/bug Categorizes issue or PR as related to a bug. label Jan 5, 2022
@sathieu
Contributor Author

sathieu commented Jan 5, 2022

There is an entry in the nerdctl FAQ:

nerdctl ignores [plugins."io.containerd.grpc.v1.cri"] config

Expected behavior, because nerdctl does not use CRI (Kubernetes Container Runtime Interface) API.

See the questions below for how to configure nerdctl.

@cristicalin
Contributor

The reason I introduced nerdctl was to be able to account for multi-arch images, which neither ctr nor crictl handles properly; it turns out nerdctl bypasses the CRI interface (I did not spot this earlier). It seems we have to wait for one of the other projects to fix multi-arch support, as using CRI is outside the scope of nerdctl.

It's not clear to me from the FAQ whether a CRI-like approach is doable with nerdctl. Is it possible to set the aliases you need in ${DOCKER_CONFIG}/config.json?

@cristicalin
Contributor

Indeed, I forgot about that issue. The docs for it sound like they should apply when using nerdctl as well. It would be great if they did, and I think we would need to backport the fix to 2.18, since this is a major regression.

@cristicalin
Contributor

/cc @floryut

@sathieu
Contributor Author

sathieu commented Jan 5, 2022

Unfortunately, this doesn't work:

$ sudo diff -u /etc/containerd/config.toml{.orig,}
--- /etc/containerd/config.toml.orig    2022-01-05 16:45:57.417920441 +0100
+++ /etc/containerd/config.toml 2022-01-05 16:40:48.606142310 +0100
@@ -29,6 +29,7 @@
           [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
             systemdCgroup = true
     [plugins."io.containerd.grpc.v1.cri".registry]
+      config_path = "/etc/containerd/certs.d"
       [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
         [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
           endpoint = ["https://gitlab-registry.k8s.example.org/v2/external-registries/docker.io"]

$ sudo mkdir -p /etc/containerd/certs.d/quay.io
$ sudo cat /etc/containerd/certs.d/quay.io/hosts.toml
# server = "https://quay.io"

[host."https://gitlab-registry.k8s.example.org/v2/external-registries/quay.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
$ sudo systemctl restart containerd.service
$ sudo "/usr/local/bin/nerdctl" -n k8s.io pull --debug-full quay.io/calico/node:v3.20.3

EDIT: this is expected, as documented, because config_path is part of [plugins."io.containerd.grpc.v1.cri"].
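The hosts.toml layout above can be generated for every entry of containerd_registries rather than only quay.io. Below is a minimal Python sketch, assuming one <registry>/hosts.toml per registry under /etc/containerd/certs.d and the same capabilities and override_path settings as the quay.io file above; the helper names render_hosts_toml and write_certs_d are hypothetical. As this thread shows, the nerdctl version in use did not read this directory, so such files only help once the client honors it (containerd/nerdctl#668 is reported as resolved in nerdctl 0.16.0).

```python
import json
from pathlib import Path


def render_hosts_toml(mirrors: list[str]) -> str:
    """Build a hosts.toml body with one [host."<url>"] table per mirror."""
    blocks = []
    for url in mirrors:
        # json.dumps yields a double-quoted string, usable as the TOML key.
        header = f"[host.{json.dumps(url)}]"
        # override_path = true because the mirror URL already carries the
        # /v2/... API root, as in the quay.io example above.
        blocks.append(
            f'{header}\n  capabilities = ["pull", "resolve"]\n  override_path = true\n'
        )
    return "\n".join(blocks)


def write_certs_d(registries: dict[str, list[str]], root: Path) -> list[Path]:
    """Write <root>/<registry>/hosts.toml for each registry and return the paths."""
    written = []
    for registry, mirrors in registries.items():
        target = root / registry / "hosts.toml"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_hosts_toml(mirrors))
        written.append(target)
    return written
```

Calling write_certs_d(registries, Path("/etc/containerd/certs.d")) with the mapping from the inventory would produce one file per registry; containerd still needs the config_path setting and a restart, as in the commands above.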

@sathieu
Contributor Author

sathieu commented Jan 5, 2022

I've submitted containerd/nerdctl#668, asking for an implementation on the nerdctl side.

In the meantime, 056a566 should probably be reverted or made configurable.

@cristicalin
Contributor

Reverting 056a566 would unfortunately break CI, because some of the images we use are multi-arch.

The only option would be to go back to containerd 1.4.x, which does not exhibit the issue with multi-arch images.

@cristicalin
Contributor

You might try setting registry-mirrors in ~/.docker/config.json, as the nerdctl docs would seem to imply that this should work.

@sathieu
Contributor Author

sathieu commented Jan 5, 2022

@cristicalin I don't see any way to configure mirrors in ~/.docker/config.json.

Which images are multi-arch?

It looks like sudo crictl pull quay.io/calico/node:v3.20.3 works. Maybe use that instead? It probably supports multi-arch images.

@sathieu
Contributor Author

sathieu commented Jan 5, 2022

I'm unable to reproduce the bug from https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/1826926575#L3500 (mentioned in #8245):

$ sudo crictl pull docker.io/library/nginx:1.19
Image is up to date for sha256:f0b8a9a541369db503ff3b9d4fa6de561b300f7363920c2bff4577c6c24c5cf6

$ sudo ctr -n k8s.io image export  /var/tmp/docker.io_library_nginx_1.19.tar docker.io/library/nginx:1.19

$ ls -lh /var/tmp/docker.io_library_nginx_1.19.tar
-rw-r----- 1 root root 52M janv.  5 18:15 /var/tmp/docker.io_library_nginx_1.19.tar

@sathieu
Contributor Author

sathieu commented Jan 5, 2022

@cristicalin Please add ok-to-test to #8376 🙏 (this is just to test).

@floryut
Member

floryut commented Jan 6, 2022

> @cristicalin Please add ok-to-test to #8376 🙏 (this is just to test).

FYI, ok-to-test doesn't really do anything besides a YAML prow test; all other tests are triggered by mirroring to gitlab-ci 😋

@sathieu
Contributor Author

sathieu commented Jan 6, 2022

Thanks @floryut. I've seen that most of the tests were run before the ok-to-test.

This CI setup is amazing. Is it documented somewhere (I need something similar for kubitus-installer#51)?

@floryut
Member

floryut commented Jan 6, 2022

> Thanks @floryut. I've seen that most of the tests were run before the ok-to-test.
>
> This CI setup is amazing. Is it documented somewhere (I need something similar for kubitus-installer#51)?

Yes, it is kind of documented here https://github.com/failfast-ci/failfast-api (the most verbose part being https://github.com/failfast-ci/failfast-api#architecture) 👍

@sathieu
Contributor Author

sathieu commented Jan 6, 2022

@floryut I didn't know about failfast. This is a great tool!

Are there any docs about the runner setup? Is this on GCE?

sathieu added a commit to sathieu/kubespray that referenced this issue Jan 6, 2022
This allows working around kubernetes-sigs#8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.
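The workaround comes down to which CLI pulls images: nerdctl talks to containerd directly and skips the CRI registry config, while crictl goes through the CRI API, which is why the crictl pull above succeeded. The Python sketch below is hypothetical (the pull_command helper is illustrative, not kubespray's download role); it only reuses the two command lines seen in this thread, keyed on the image_command_tool setting.

```python
def pull_command(image_command_tool: str, image: str) -> list[str]:
    """Return the argv used to pull `image` with the selected tool (illustrative)."""
    if image_command_tool == "nerdctl":
        # Talks to containerd directly: CRI registry mirrors are NOT applied.
        return ["/usr/local/bin/nerdctl", "-n", "k8s.io", "pull", "--quiet", image]
    if image_command_tool == "crictl":
        # Goes through the CRI API: containerd_registries mirrors apply.
        return ["crictl", "pull", image]
    raise ValueError(f"unsupported image_command_tool: {image_command_tool!r}")


if __name__ == "__main__":
    print(pull_command("crictl", "quay.io/calico/node:v3.20.3"))
```

With image_command_tool set to crictl, the quay.io/calico/node pull from the failing task should be resolved through the configured mirror instead of dialing quay.io directly.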
@floryut
Copy link
Member

floryut commented Jan 6, 2022

> @floryut I didn't know about failfast. This is a great tool!
>
> Are there any docs about the runner setup? Is this on GCE?

Not sure there is much documentation; you could contact @ant31 on the k8s Slack.
For us, the failfast workers are hosted on our k8s cluster.

k8s-ci-robot pushed a commit that referenced this issue Jan 11, 2022
This allows working around #8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.
sathieu added a commit to sathieu/kubespray that referenced this issue Jan 12, 2022
This allows working around kubernetes-sigs#8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.

(cherry picked from commit cfd9873)

The cherry-pick was adapted because nerdctl_extra_flags is not in
the release-2.18 branch (kubernetes-sigs#8339).
k8s-ci-robot pushed a commit that referenced this issue Jan 17, 2022
This allows working around #8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.

(cherry picked from commit cfd9873)

The cherry-pick was adapted because nerdctl_extra_flags is not in
the release-2.18 branch (#8339).
@oomichi
Contributor

oomichi commented Jan 25, 2022

@sathieu
The related pull requests have been merged already, can we close this issue?

@sathieu
Contributor Author

sathieu commented Jan 25, 2022

@oomichi The related PRs (#8380 + #8409) are more like workarounds. The proper fix is upstream containerd/nerdctl#668, which is marked as resolved in nerdctl 0.16.0. We need to test it, but I guess at least #8199 needs to be fixed first.

Closing this issue, as I don't plan to work on this (now that I have a workaround).

@sathieu sathieu closed this as completed Jan 25, 2022
sakuraiyuta pushed a commit to sakuraiyuta/kubespray that referenced this issue Apr 16, 2022
This allows working around kubernetes-sigs#8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.
LuckySB pushed a commit to southbridgeio/kubespray that referenced this issue Jun 29, 2023
This allows working around kubernetes-sigs#8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.
LuckySB pushed a commit to southbridgeio/kubespray that referenced this issue Oct 20, 2023
This allows working around kubernetes-sigs#8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.

Also changes image_info_command_on_localhost for docker to return digests.
4 participants