Upgrade causes POD failures if the cluster scales during the master upgrade. #7323
I had a similar problem with a … cluster; I used … . The new IG nodes came up with 1.13.10 before the masters were updated and failed to join the cluster, causing the cluster to fail validation and halting the rolling update. I was unable to pinpoint the exact cause of the failure, but the stuck state was that the instance running 1.13.10 reported that we are specifying …, yet the calico pod would not initialize. This does not make sense to me: according to the documentation, the state I noticed … . I don't know enough about the startup sequence to debug this further, so I welcome suggestions.

Steps to reproduce: this is easy to reproduce, given a cluster running Kubernetes 1.12.8 using … . That's it. Do NOT run … . The new node will be created but fail to join the cluster.
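Since the exact commands were stripped out of the comment above, here is a minimal sketch of what this kind of reproduction could look like on a kops-managed cluster. The instance group name nodes, the target version 1.13.10, and the guess that the "Do NOT run" step refers to kops rolling-update cluster are all assumptions, not taken from the original report:

```bash
# Sketch only — IG name "nodes" and version 1.13.10 are assumptions.

# 1. Point the cluster at the new Kubernetes version.
kops edit cluster                 # set spec.kubernetesVersion: 1.13.10

# 2. Push the new launch configuration / user data to the ASGs
#    without replacing any running instances.
kops update cluster --yes

# 3. Trigger a scale-up so a brand-new node boots with the 1.13.10
#    kubelet while the masters are still on 1.12.8 (done manually here;
#    a cluster-autoscaler scale-up has the same effect).
kops edit ig nodes                # e.g. bump minSize/maxSize by one
kops update cluster --yes

# Presumably the "Do NOT run" above refers to `kops rolling-update cluster`:
# the new node fails to join even without rolling any existing instances.
```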
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We just experienced this during an upgrade from 1.16 -> 1.17. It seems like we need a way to update the userdata of the master ASGs in a separate step from the node ASGs, so that any node autoscaling would create new nodes with the old k8s version until all masters have been upgraded to the new k8s version.
An easy-to-implement workaround could be disabling the cluster-autoscaler before … (a rough sketch follows below). I also wonder how #8198 would affect this: with kops-controller serving artifacts to nodes, we'll need to make sure they're receiving the desired version of those artifacts during an upgrade.
/reopen
@rifelpet: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
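As a rough illustration of the workaround suggested above, here is a minimal sketch assuming the cluster-autoscaler runs as a Deployment named cluster-autoscaler in the kube-system namespace (the name and namespace are assumptions that depend on how it was installed):

```bash
# Pause autoscaling so no new nodes are created from the updated
# launch configuration while the masters are still on the old version.
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0

# Roll the control plane and nodes with kops as usual.
kops rolling-update cluster --yes

# Re-enable the autoscaler once the upgrade has finished.
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=1
```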
Kops does upgrades ASG by ASG, and the master ones before the node ones. I think quite a lot of users would benefit from having a step in between the rolling updates of each ASG.
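One partial way to get that "step in between" today, if your kops version supports the --instance-group flag, is to roll the master instance groups explicitly before touching the node groups; the group names below are placeholders, and note that this does not stop an autoscaler from launching new-version nodes in the meantime, since the node ASGs' userdata has already been updated:

```bash
# Roll only the master instance groups first (names are examples).
kops rolling-update cluster --instance-group master-us-east-1a --yes
kops rolling-update cluster --instance-group master-us-east-1b --yes
kops rolling-update cluster --instance-group master-us-east-1c --yes

# Once all masters report the new version, roll the node groups.
kops rolling-update cluster --instance-group nodes --yes
```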
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
1. What kops version are you running? The command kops version will display this information.
1.13.0-beta.2
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
v1.12 -> v1.13
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
After setting the new config to k8s 1.13 I started a cluster rotation with:
kops rolling-update cluster --yes
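For context, the full sequence behind "setting the new config" and starting the rotation is roughly the following; using kops edit cluster to change the version is an assumption, since the report only says the config was updated:

```bash
# Set spec.kubernetesVersion to the new 1.13.x release (assumed method).
kops edit cluster

# Apply the change, which updates the ASG launch configurations / userdata.
kops update cluster --yes

# Replace instances; kops rolls the master instance groups before the nodes.
kops rolling-update cluster --yes
```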
5. What happened after the commands executed?
The master rotation started upgrading the master nodes. However, because I make use of the cluster-autoscaler, new nodes started coming up with 1.13 before all the master nodes were upgraded. This would normally have been fine; however, due to the changes made in kubernetes/kubernetes#74529, the kubelet must be at the same version as or older than the API server. Because of this change, Pods deployed to the new hosts brought up by auto-scaling fail with:
Error: nil pod.spec.enableServiceLinks encountered, cannot construct envvars
Kops' k8s versioning is global, so there is no way to upgrade just the masters before you upgrade the nodes if you use auto-scaling.
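A quick way to see the skew described above is to compare the API server version with the kubelet versions of the autoscaled nodes; these commands are an illustration, not output from the original report:

```bash
# Control plane version — still 1.12.x in the middle of the upgrade.
kubectl version --short

# Kubelet version per node — autoscaled nodes already show v1.13.x.
kubectl get nodes
```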
6. What did you expect to happen?
For k8s to be able to handle the upgrade with a newer version of the kubelet coming up.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
N/A
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
N/A
9. Anything else do we need to know?