Nodeup can't find container-selinux-2.68-1.el7.noarch.rpm when trying to bootstrap a new node to a cluster #7608
I'm seeing this as well.
We are seeing this issue as well. It looks like this package was removed from the CentOS repo, returning a 404:
This causes a major issue for autoscaling (cluster-autoscaler), which takes down nodes, and new ones never join the cluster. Ideally, for resiliency, kops should not resolve artifacts required for nodeup/bootstrapping from public repos at node runtime. Not sure if this is the way to go, but possibly consider placing such critical rpms/binaries in the state store during init and fetching them from there at runtime?
I noticed that the recent container-selinux issue on CentOS was reporting a hash mismatch rather than a 404. See the error message here: kubernetes#7608. The "actual" sha1 response is that of the 404 page:
```
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
curl http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm | shasum -a 1
```
Experiencing this in a production cluster as well. Is there any way to fast-track this? Added a PR.
A manual workaround is downloading the missing RPM from a working node. Some CentOS mirror sites might still have the old RPM file; see: https://mirror-status.centos.org/
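A minimal sketch of that manual approach, run over ssh on the affected node. The vault.centos.org URL and the /var/cache/nodeup path are assumptions, not confirmed by this thread — check a working node for the exact cache location and filename nodeup expects:
```
# Assumption: nodeup keeps its download cache under /var/cache/nodeup
# and skips the download when the expected file is already present.
sudo mkdir -p /var/cache/nodeup
sudo curl -fLo /var/cache/nodeup/container-selinux-2.68-1.el7.noarch.rpm \
  http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm

# Alternatively, copy the file straight from a node that still has it cached:
# scp <working-node>:/var/cache/nodeup/container-selinux-2.68-1.el7.noarch.rpm /tmp/ \
#   && sudo mv /tmp/container-selinux-2.68-1.el7.noarch.rpm /var/cache/nodeup/
```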
This has just bitten us as well; #7609 should resolve it, however.
@rdjy Thanks for the answer, it did the trick for us.
Now that #7609 is merged, how can I leverage this change? Do I have to wait for a new kops release, or how is nodeup released?
We're working on getting a 1.13/1.14 cut with these fixes asap. You'll either need to build and deploy your own version of kops (including protokube and nodeup), use a workaround as suggested above (you can probably utilize a hook to automate it: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#hooks), or wait for a release, which we're actively working on getting out asap!
Hi, I had no luck using a hook to curl the correct file, as hooks seem to run AFTER nodeup. All I can think of is to build a custom AMI instead of vanilla Amazon Linux 2.
Indeed, hooks won't work. We figured that out at the exact same time as @alexinthesky 😂 Then we switched to the Debian AMI to avoid further damage from dying spot instances.
+1, seeing the same.
It's a bit involved, but we found a workaround until a new release is cut (especially for people hitting this issue in production).
This way the cache is there before nodeup is run.
Below is an improved workaround, inspired by previous comments and pull requests. kops supports arbitrary user data; a snippet along the lines of the sketch below needs to be added to each instance group spec.
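The snippet itself was lost from this thread, so what follows is only a hypothetical reconstruction of its shape: a cloud-config bootcmd that pre-seeds nodeup's cache before nodeup runs (bootcmd runs early in boot, unlike hooks). The vault URL, cache path, and filename are assumptions — verify them against a working node:
```
# Hypothetical reconstruction -- not the original snippet.
# Added via: kops edit ig <instance-group-name>
spec:
  additionalUserData:
    - name: prefetch-container-selinux.txt
      type: text/cloud-config
      content: |
        #cloud-config
        bootcmd:
          # Assumption: nodeup checks this cache before downloading.
          - mkdir -p /var/cache/nodeup
          - curl -fLo /var/cache/nodeup/container-selinux-2.68-1.el7.noarch.rpm http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
```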
Hi, I connected to the node via ssh and downloaded the package from another URL.
Was able to work around the issue by running the below commands on both masters and nodes.
This workaround no longer works as of today. As a workaround you can use another source that still serves the RPM. But really, container-selinux needs to be updated to a newer version.
OK, so it looks like we'll be doing 1.13.2 this morning. I'd also really prefer to get away from the OS packaging (towards "tar.gz" installation), as it seems to be introducing more problems than it solves. For 2.68.1 -> 2.107.3: we try not to make potentially breaking changes once we have released the 1.x.0 of kops, but we do so for security fixes etc. So we can look at getting it into 1.14.0 (which hasn't quite released yet). But is it a security fix (in which case we would get it into 1.13.0)?
Here's the changelog; it looks like there's no strict security-fix vs. feature distinction, so we probably shouldn't introduce the new version in kops 1.13:
Can the packages be externalised into a yaml/json file that nodeup reads, instead of being compiled into the binary? That would enable people to source the rpm and store it locally (S3, cloud storage, etc.). I've opted to save the rpm in S3 and then add it into kops with this in the instance groups:
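(The actual snippet didn't survive in this thread; below is a hedged sketch of what such an instance-group addition might look like, assuming an additionalUserData shell script that copies the RPM from a private S3 bucket into nodeup's cache. The bucket name, key, and cache path are placeholders:)
```
# Hypothetical sketch -- bucket name, key, and cache path are placeholders.
spec:
  additionalUserData:
    - name: fetch-container-selinux.sh
      type: text/x-shellscript
      content: |
        #!/bin/sh
        # Assumption: nodeup skips the download when the file is already cached.
        mkdir -p /var/cache/nodeup
        aws s3 cp s3://my-kops-assets/container-selinux-2.68-1.el7.noarch.rpm \
          /var/cache/nodeup/container-selinux-2.68-1.el7.noarch.rpm
```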
Then you just need to sort out the bucket policy and IAM privileges for kops to read from the bucket. This is in an AWS environment, obviously; I'm sure there are similar approaches for the other cloud platforms.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi everyone, our team encountered this issue yesterday on a kops 1.14.8 cluster, related to this vault.centos.org issue. We had previously used this fix successfully on an older cluster, but we had a problem with the bootcmd approach detailed in that comment. We ended up using the following approach in an additionalUserData stanza:
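(The stanza itself was lost from this thread; this is only a plausible shape, assuming an x-shellscript part rather than bootcmd. The mirror URL is a placeholder — the commenter's real fix pointed at an internal repo, as the next paragraph notes:)
```
# Hypothetical shape of the stanza -- the mirror URL is a placeholder.
spec:
  additionalUserData:
    - name: prefetch-rpm.sh
      type: text/x-shellscript
      content: |
        #!/bin/sh
        mkdir -p /var/cache/nodeup
        curl -fLo /var/cache/nodeup/container-selinux-2.68-1.el7.noarch.rpm \
          "http://<mirror-that-still-serves-the-rpm>/container-selinux-2.68-1.el7.noarch.rpm"
```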
For full disclosure, our actual fix pointed to our company's internal yum repo, so if you have the ability to do that, it's probably a better solution than relying on a public mirror. Hope this helps save everyone else some pain!
1. What kops version are you running? The command kops version will display this information.
Version 1.13.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Version 1.13.0

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Adding a node to a cluster results in nodeup looking for
Downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
which does not exist anymore due to the CentOS 7.7 release.

5. What happened after the commands executed?
kops tries to bootstrap the node, but nodeup fails because it points to a nonexistent package.

6. What did you expect to happen?
New node bootstrapped and joined to the cluster.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.