Cluster validation didn't pass after upgrading to kops version 1.11.0 #6292
Comments
I'm not able to reproduce this. I've tried with ... Does the cluster recover despite the validation failure? In other words, is it just that 5 minutes is too short a time? It seems unlikely, but maybe if you have a pod that is slow to terminate or restart.
@justinsb Does kops 1.11.0 support etcd v2?
@tsahoo yes, and etcd3. The upgrade from etcd2 -> etcd3 relies on etcd-manager, and the plan is to finish up the final edge cases for that upgrade in kops 1.12.
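For context, this is roughly how etcd-manager is opted into in a kops cluster spec of that era; a minimal sketch only, with field names assumed from the kops etcdClusters configuration and a placeholder instance group name, not taken from the reporter's manifest:

```yaml
# Sketch: opting the etcd clusters into etcd-manager via `kops edit cluster`.
# Field names are assumed from the kops etcdClusters spec; the instance
# group name is a placeholder.
etcdClusters:
- name: main
  provider: Manager          # etcd-manager drives the etcd2 -> etcd3 migration
  etcdMembers:
  - name: a
    instanceGroup: master-us-east-1a
- name: events
  provider: Manager
  etcdMembers:
  - name: a
    instanceGroup: master-us-east-1a
```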
Edit: actually, it looks like we know what happened - the new machine did not join the cluster.
@justinsb Yes. When we upgrade the cluster, the new master node does not join the cluster with Kubernetes version 1.11.6, and after that the cluster validation doesn't pass. But kops 1.11.0 runs fine with Kubernetes versions below 1.11.x.
Thanks @tsahoo - are you able to SSH to the instance which didn't join (it should be the one that started most recently) and look at the logs to figure out what went wrong? The error should either be in ... You could also try ...
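A rough sketch of what that inspection could look like; the SSH user, IP, and log paths below are assumptions based on typical kops-built Debian images, not details from the comment itself:

```sh
# Sketch: inspecting the master that failed to join (placeholder user/IP;
# paths are typical for kops-built Debian images).
ssh admin@<new-master-ip>
sudo journalctl -u kubelet --no-pager | tail -n 200    # kubelet startup errors
sudo tail -n 200 /var/log/kube-apiserver.log           # control-plane errors on kops masters
sudo docker ps -a | grep -E 'apiserver|etcd'           # crash-looping control-plane containers?
```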
Also experiencing this problem upgrading from ... I am seeing this log multiple times in ...
I had the same problem, but it turned out to be because of the enable-custom-metrics flag, which is deprecated in 1.11.
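For anyone hitting the same thing, a sketch of where that flag typically lives in the cluster spec; the field name is assumed from the kops kubelet configuration, so verify against your own `kops edit cluster` output before changing anything:

```yaml
# Sketch: the deprecated flag as it usually appears under spec.kubelet in
# `kops edit cluster`; deleting the line (then `kops update cluster --yes`
# and a rolling update) is the fix described above.
spec:
  kubelet:
    enableCustomMetrics: true   # deprecated in 1.11; remove before upgrading
```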
@justinsb It would be a good idea to put it in the required actions.
I think I've figured out the cause of my problems so I'll post a new issue as I don't want to hijack this one.
For folks having problems with 1.11, if you are using OIDC for cluster authentication, see this comment.
I have the same issue on AWS with kops 1.11, trying to upgrade from 1.10.6, then 1.10.12, to 1.11.6. Every time I get something like this: ... None of the advice about horizontalPodAutoscalerUseRestClients and RBAC works for me.
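For readers unfamiliar with that setting, a sketch of where it normally sits in the cluster spec; the placement is assumed from the kops kube-controller-manager configuration and is shown for orientation only, since it did not solve the commenter's problem:

```yaml
# Sketch: the HPA REST-clients toggle under the kube-controller-manager
# section of the cluster spec (edit with `kops edit cluster`).
spec:
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: true
```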
Here was my upgrade procedure, which worked from 1.9 -> 1.11.

Procedure

Pre-upgrade

The kubelet configuration (in my case) needed to be changed from:

...

to

...

The ...

Post-upgrade

Necessary kubelet-api fix (introduced in v1.10): need to authorize kubelet-api to access the kubelet API (see the sketch below).

Looks like it's fixed in the next version of kops.
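The exact fix was not captured above; the commonly documented version of this kubelet-api authorization, assuming the kubelet runs with anonymous auth disabled, is:

```sh
# When kubelet anonymous auth is disabled, the API server's kubelet-api
# user needs explicit access to the kubelet API (e.g. for `kubectl logs`
# and `kubectl exec`).
kubectl create clusterrolebinding kubelet-api-admin \
  --clusterrole=system:kubelet-api-admin \
  --user=kubelet-api
```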
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
1. What kops version are you running?
Version 1.11.0

2. What Kubernetes version are you running?
v1.10.11

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

5. What happened after the commands executed?
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "*" has not yet joined cluster
master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of "5m0s""

6. What did you expect to happen?
Cluster validation should pass for both the kops upgrade and the Kubernetes version upgrade.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest.
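For orientation, this is the general kops upgrade flow the thread is discussing; a minimal sketch under default kops conventions, not the reporter's exact commands, since those were not preserved above:

```sh
# Minimal sketch of the kops upgrade flow discussed in this issue
# (add --name/--state or set KOPS_STATE_STORE as appropriate).
kops upgrade cluster --yes         # bump kubernetesVersion in the cluster spec
kops update cluster --yes          # apply the new configuration to the cloud
kops rolling-update cluster --yes  # replace masters, then nodes, validating between them
kops validate cluster              # re-run manually if the built-in 5m window times out
```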