
CI: packet_centos7-flannel-addons-ha failing #10670

Closed
VannTen opened this issue Nov 29, 2023 · 15 comments · Fixed by #10775
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments

@VannTen
Contributor

VannTen commented Nov 29, 2023

Which jobs are failing:
packet_centos7-flannel-addons-ha

Since when has it been failing:

Around 15 hours ago, first affected PR: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/commit/3ba93daa7071f7277525389325c06b95befa0160

Last successful run: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/5635821499

Reason for failure:
No idea yet

Anything else we need to know:

VannTen added the kind/failing-test label on Nov 29, 2023
@VannTen
Contributor Author

VannTen commented Nov 29, 2023

Actually the CI as a whole seems to be in bad shape.
This job is apparently stuck for hours, and some others try to create VMs which apparently can't get any IPs (for instance here).
Infrastructure trouble, maybe? @yankay @floryut (whenever you have some time)

@VannTen
Contributor Author

VannTen commented Nov 29, 2023

@ErikJiang
Member

Similar to this issue: #8786

@VannTen
Contributor Author

VannTen commented Dec 1, 2023

Actually the centos7-flannel-addons-ha seems to be a separate problem: we're hitting containerd/nerdctl#2357, I think. @yankay, did you find a workaround last time? (I see you authored that issue.)

@yankay
Member

yankay commented Dec 4, 2023

> Actually the centos7-flannel-addons-ha seems to be a separate problem: we're hitting containerd/nerdctl#2357, I think. @yankay, did you find a workaround last time? (I see you authored that issue.)

Hi @VannTen

The root cause is a bug in containerd: containerd/containerd#8973, but it's a little complex to fix.

The workaround is to use ctr to pull, because ctr uses a different code path than nerdctl :-)

Use

ctr -n k8s.io i pull ghcr.io/stargz-containers/alpine:3.13-org

instead of

nerdctl pull ghcr.io/stargz-containers/alpine:3.13-org

The bug only occurs when a layer has been removed, so I'm not sure about the relationship between the error in the kubespray CI logs and this bug, or whether this is a good way for kubespray to work around it.

@VannTen
Contributor Author

VannTen commented Dec 4, 2023 via email

@yankay
Member

yankay commented Dec 4, 2023

> > The root cause is a bug in containerd: containerd/containerd#8973, but it's a little complex to fix.
>
> Ok, makes sense, now I understand why --all-platforms is a workaround. I'm fine with using ctr, do you want to make the PR? Should we replace all instances of nerdctl with ctr? (Thanks for the background)

ctr is different from nerdctl, so I think the change may be a little big :-)
And the root cause has not been found yet.
The error did not occur before two months ago, so I just want to wait a month to see how often it happens.
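
For reference, the --all-platforms workaround mentioned in the quoted reply would look roughly like this. This is a sketch rather than a command taken from the thread; it reuses the example image from the earlier comment:

```sh
# Hypothetical invocation of the --all-platforms workaround (not quoted from the thread);
# -n k8s.io targets the containerd namespace that kubelet-managed images live in.
nerdctl -n k8s.io pull --all-platforms ghcr.io/stargz-containers/alpine:3.13-org
```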

@VannTen
Contributor Author

VannTen commented Dec 4, 2023 via email

@VannTen
Contributor Author

VannTen commented Dec 4, 2023 via email

@VannTen
Contributor Author

VannTen commented Dec 5, 2023

/close
Fixed by #10687

@k8s-ci-robot
Contributor

@VannTen: Closing this issue.

In response to this:

> /close
> Fixed by #10687

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tmurakam
Contributor

tmurakam commented Jan 4, 2024

@yankay
I found that this fix breaks pulling from insecure registries.
For nerdctl, this was fixed by #8339 and #10196, but the ctr command does not handle insecure registries.

@yankay
Member

yankay commented Jan 5, 2024

> @yankay I found that this fix breaks pulling from insecure registries. For nerdctl, this was fixed by #8339 and #10196, but the ctr command does not handle insecure registries.

Thanks @tmurakam

It can succeed using

ctr -n k8s.io i pull --hosts-dir "/etc/containerd/certs.d" 10.6.108.200:5000/static:latest

Can it solve the problem?

If it can, we can submit a PR to fix it :-)
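
For context, the --hosts-dir flag points ctr at containerd's certs.d-style registry configuration. A minimal sketch of what that directory could contain for the insecure registry in the example above (assumed layout, not taken from the thread; adapt the address to your registry):

```sh
# Sketch (assumption): the directory name must match the registry host:port used in
# image references, and the http:// scheme in hosts.toml marks the registry as insecure.
mkdir -p /etc/containerd/certs.d/10.6.108.200:5000
cat > /etc/containerd/certs.d/10.6.108.200:5000/hosts.toml <<'EOF'
server = "http://10.6.108.200:5000"

[host."http://10.6.108.200:5000"]
  capabilities = ["pull", "resolve"]
EOF
```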

@tmurakam
Contributor

tmurakam commented Jan 5, 2024

It may solve the problem, but I'm not confident. I will try it now.

@tmurakam
Contributor

tmurakam commented Jan 5, 2024

@yankay
I confirmed that your configuration solves the problem.

Here is my configuration in group_vars/all:
nerdctl_image_pull_command: "{{ bin_dir }}/ctr -n k8s.io images pull --hosts-dir /etc/containerd/certs.d"
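
With that override, the command Kubespray ends up running for an image pull would look roughly like the following. This is a sketch: it assumes bin_dir resolves to /usr/local/bin and reuses the registry/image from earlier in the thread:

```sh
# Hypothetical rendered command after variable substitution (bin_dir assumed to be /usr/local/bin):
/usr/local/bin/ctr -n k8s.io images pull --hosts-dir /etc/containerd/certs.d 10.6.108.200:5000/static:latest
```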
