Issue pulling only some images #2455
According to the log message, the image in question has an excessively long label:
Can you check the labels on this image and confirm that none of them violate this limit? Is there a public copy of the image with the same labels that I can look at?
Hi Brandon, the image is a private one, but I can provide the labels from docker inspect:
Those seem to be picked up from the base image which is
Nothing stands out to me as being unusual here, and I can indeed use the
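For reference, a small sketch of one way to dump just the labels of a locally pulled image with docker inspect (the image reference below is a placeholder):

```sh
# Print only the labels of a locally pulled image as JSON.
# The image reference is a placeholder; substitute your own.
docker inspect --format '{{json .Config.Labels}}' \
  registry.example.com:5000/myteam/r-session-base:latest | jq .
```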
The interesting thing about the error message is that it shows an empty key for the label. I don't see any empty keys in the output you shared. The message is coming from here: https://github.com/rancher/k3s/blob/v1.19.3+k3s1/vendor/github.com/containerd/containerd/labels/validate.go#L34
Can you try a tool like skopeo to inspect the image directly in the registry?
skopeo inspect docker://dockerregistry.xxx.co.uk:5000/xxx/r-session-base:latest --config | jq '.config.Labels'
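As a side note, a small shell sketch (the image reference is a placeholder) that prints the combined key+value length of each label, which makes it easy to spot any label that exceeds a size limit:

```sh
# Print "<key+value length> <key>" for every label on an image, largest first.
# The image reference is a placeholder; substitute your own registry/image.
skopeo inspect docker://registry.example.com:5000/myteam/r-session-base:latest --config \
  | jq -r '.config.Labels | to_entries[] | "\((.key | length) + (.value | length)) \(.key)"' \
  | sort -rn
```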
Here's the output using skopeo:
So there doesn't seem to be a difference there from the docker inspect output. It's most puzzling, as we don't add our own labels at any point, so all of these come straight from SUSE.
To give more background on the image: it is actually the result of several Dockerfiles building on each other. Everything from 'suse-base' onwards is ours and lives in our private repo:
opensuse/leap:15.1 -> suse-base -> r-base -> r-packages -> r-generic -> r-session-base
I was concerned the issue was some complication of having such a hierarchy, but I've traced it all the way up to r-base: it does not happen with suse-base, but it does with r-base and everything built from it. Here is the Dockerfile for r-base:
And here's the skopeo output for that:
I've reproduced the same problem on Ubuntu 18.04 in my home lab on k3s v1.19.3, but it does not happen on k3s v1.18.10, where I can pull the image with no issues. The containerd version on k3s v1.18.10 is 1.3.3-k3s2, while k3s v1.19.3 is on containerd 1.4.0-k3s1, so perhaps something is going on there, or there is a breaking change I'm not aware of.
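For anyone comparing versions, a quick way to confirm which containerd build a k3s node is running (this assumes a default k3s install, where k3s bundles its own crictl):

```sh
# Print the container runtime name and version reported over CRI on a k3s node.
k3s crictl version
```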
Can you make the image you reproduced with in your home lab available?
I couldn't make that image available, but I've now replicated the issue by building an image from
(the RUN is repeated 60 times.) The magic number seemed to be around 60 RUN lines, at which point the error appears; you should be able to replicate it yourself now:
Is there a limit on the number of layers in containerd that I'm not aware of?
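A minimal sketch of how such an image could be generated, assuming any small base image will do (the base image and tag below are placeholders, not necessarily what was used above):

```sh
# Generate a Dockerfile with 60 trivial RUN layers and build it.
# The base image and tag are placeholders for illustration only.
{
  echo 'FROM alpine:3.12'
  for i in $(seq 1 60); do
    echo "RUN echo layer-$i > /tmp/layer-$i"
  done
} > Dockerfile
docker build -t layer-test:many-layers .
```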
I don't believe there is any such limit. This will probably require a fix to containerd upstream, if it hasn't already been resolved.
OK, so I added some debug prints to the containerd code, and it looks like it injects some labels into the image - including one that is a comma-separated list of layers, apparently as a way to pass the layer list to the snapshotter implementation:
This would obviously present a problem if you have a lot of RUN commands, since each command creates a layer.
This should probably be handled better - in particular, the label key is truncated to 10 characters for the log message, which makes it really difficult to figure out what's going on. I'll take a look at opening an issue upstream, but I imagine their immediate recommendation will be to follow best practices and minimize the number of image layers by combining your RUN commands.
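For context, a rough sketch of the kind of check being described (not the exact containerd source; the 4096-byte limit here is an assumption, but the 10-character key truncation matches the behaviour mentioned above):

```go
package labels

import "fmt"

const (
	maxSize   = 4096 // assumed combined key+value limit
	keyMaxLen = 10   // how much of the key survives into the error message
)

// Validate rejects a label whose key plus value exceeds maxSize. Note that
// the key is truncated to keyMaxLen characters in the error, which is why the
// offending label is so hard to identify from the log line alone.
func Validate(k, v string) error {
	if total := len(k) + len(v); total > maxSize {
		if len(k) > keyMaxLen {
			k = k[:keyMaxLen]
		}
		return fmt.Errorf("label key and value length (%d bytes) greater than maximum size (%d bytes), key: %s", total, maxSize, k)
	}
	return nil
}
```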
As per containerd/containerd#4684 (comment), these labels are in support of an experimental feature and should not have been enabled by default; the next release of containerd will turn them off. In the meantime we can probably update our containerd config.toml template to set
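A sketch of what such a config override might look like, assuming the setting in question is the containerd 1.4 CRI plugin's disable_snapshot_annotations option (the exact key is not confirmed in this thread) and that k3s picks up a user-supplied config.toml.tmpl:

```toml
# Hypothetical snippet for a k3s containerd config template, e.g.
# /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl.
# The option name is an assumption based on containerd 1.4's CRI plugin.
[plugins."io.containerd.grpc.v1.cri".containerd]
  disable_snapshot_annotations = true
```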
Related to #2455 and containerd/containerd#4684
These were not meant to be enabled by default, break images with many layers, and will be disabled by default on the next containerd release.
Signed-off-by: Brad Davidson <[email protected]>
Reproduced the issue in v1.19.3+k3s2 and validated the fix using commit ID c72c186; created an image with the example above.
While the issue is fixed, the location of config.toml has moved from /var/lib/rancher/k3s/agent/etc/containerd/config.toml to /var/lib/rancher/k3s/etc/containerd/config.toml, which is unexpected.
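For reference, a quick check (using the paths from the comment above) to see which of the two locations actually exists on a node:

```sh
# List whichever of the two candidate config.toml locations exists on this node.
ls -l /var/lib/rancher/k3s/agent/etc/containerd/config.toml \
      /var/lib/rancher/k3s/etc/containerd/config.toml 2>/dev/null
```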
@ShylajaDevadiga, create a separate issue for that.
Environmental Info:
K3s Version:
k3s version v1.19.3+k3s1 (974ad30)
Node(s) CPU architecture, OS, and Version:
Linux devkubewkr04 5.3.18-lp152.47-default #1 SMP Thu Oct 15 16:05:25 UTC 2020 (41f7396) x86_64 x86_64 x86_64 GNU/Linux
OpenSUSE Leap 15.2
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Cluster Configuration:
3 masters (OpenSUSE Leap 15.2 virtual machines), 3 workers (OpenSUSE Leap 15.2 physical servers)
Describe the bug:
Some container images cannot be pulled and throw an error:
Other images do not have this issue:
This happens on all nodes in the cluster, master and worker.
Steps To Reproduce:
Expected behavior:
Pods are able to pull and run any specified image.
Actual behavior:
Pods are not able to pull some images and cannot start.
Additional context / logs:
kubectl describe from an affected pod: