
VPA Helm Chart: Updater error: fail to get pod controller, node is not a valid owner #656

Closed
sarg3nt opened this issue May 16, 2024 · 4 comments

sarg3nt commented May 16, 2024

I'm deploying the latest chart with no configuration changes and am finding that kube-system/vertical-pod-autoscaler-updater is throwing the following errors:

vertical-pod-autoscaler-updater-bdcd45465-qgdh4 E0516 17:03:33.841287       1 api.go:153] fail to get pod controller: pod=kube-proxy-lpul-vault-k8s-server-0.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-0.vault.ad.selinc.com, last error node is not a valid owner
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 E0516 17:03:33.841304       1 api.go:153] fail to get pod controller: pod=cloud-controller-manager-lpul-vault-k8s-server-2.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-2.vault.ad.selinc.com, last error node is not a valid owner
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 E0516 17:03:33.841316       1 api.go:153] fail to get pod controller: pod=kube-apiserver-lpul-vault-k8s-server-0.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-0.vault.ad.selinc.com, last error node is not a valid owner
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 E0516 17:03:33.841325       1 api.go:153] fail to get pod controller: pod=kube-apiserver-lpul-vault-k8s-server-2.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-2.vault.ad.selinc.com, last error node is not a valid owner 
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 E0516 17:03:33.841331       1 api.go:153] fail to get pod controller: pod=kube-proxy-lpul-vault-k8s-server-2.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-2.vault.ad.selinc.com, last error node is not a valid owner 
etc.
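
Every failing pod above has a targetRef of kind Node. One way to inspect the owner directly (a sketch, reusing a pod name from the logs):

kubectl -n kube-system get pod kube-proxy-lpul-vault-k8s-server-0.vault.ad.selinc.com \
  -o jsonpath='{.metadata.ownerReferences[0].kind}'
# static (mirror) pods print "Node" here, and VPA cannot treat a Node as a controller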

The only pods the updater does seem to process are in the kube-system namespace:

vertical-pod-autoscaler-updater-bdcd45465-qgdh4 I0516 17:03:33.841558       1 pods_eviction_restriction.go:226] too few replicas for ReplicaSet kube-system/rke2-snapshot-controller-59cc9cd8f4. Found 1 live pods, needs 2 (global 2) 
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 I0516 17:03:33.841585       1 pods_eviction_restriction.go:226] too few replicas for ReplicaSet kube-system/rke2-snapshot-validation-webhook-54c5989b65. Found 1 live pods, needs 2 (global 2) 
vertical-pod-autoscaler-updater-bdcd45465-qgdh4 I0516 17:03:33.841604       1 pods_eviction_restriction.go:226] too few replicas for ReplicaSet kube-system/rke2-metrics-server-655477f655. Found 1 live pods, needs 2 (global 2)        

I have Terraform that deploys the raw manifests and that works fine, but I would like to switch to your Helm chart, which is not working.
I tried comparing the ClusterRoles the raw manifests deploy against the ones from the Helm chart, but they are so different that the comparison is difficult (one approach is sketched below).

In any case, this does not appear to work for us.
Could it be the Kubernetes version?
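
To sketch that comparison: list the VPA-related roles from each install, dump them to files, and diff (the file names are placeholders):

kubectl get clusterrole,clusterrolebinding -o name | grep -i vpa
# dump each matching object with kubectl get <name> -o yaml, once per install, then:
diff raw-manifest-roles.yaml helm-chart-roles.yaml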

Specs:

  • Kubernetes: v1.28.9+rke2r1
  • OS: Rocky Linux
  • Deployment: Terraform Helm Provider
@sebastien-prudhomme
Contributor

Hi @sarg3nt, it seems to be related to this bug in the latest version of the app: kubernetes/autoscaler#6808. Can you try the chart at version 9.7.0?
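
For example, to pin the chart version (a sketch; the repo alias and URL assume the cowboysysop charts repository that publishes this chart):

helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm upgrade --install vertical-pod-autoscaler cowboysysop/vertical-pod-autoscaler \
  --namespace kube-system --version 9.7.0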

@sebastien-prudhomme
Contributor

It should be fixed by #657


sarg3nt commented May 17, 2024

I tried the new version and am getting the same error.
I confirmed the updater is now at 1.1.2:

autoscaling/vpa-updater:1.1.2

E0517 23:33:11.844076       1 api.go:153] fail to get pod controller: pod=cloud-controller-manager-lpul-vault-k8s-server-1.vault.ad.selinc.com err=Unhandled targetRef v1 / Node / lpul-vault-k8s-server-1.vault.ad.selinc.com, last error node is not a valid owner


sarg3nt commented May 17, 2024

Update:
I noticed my custom deployment logs those errors for the kube-system static pods as well, so I think that is normal; it makes sense given that static pods are owned by the Node rather than by a controller the updater can act on.
However, the Helm chart deployment only tries to update workloads in the kube-system namespace, whereas my custom deployment updates everything.

I'm not seeing a config option that would limit it to the kube-system namespace. Am I missing something?
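
One way to check whether the updater is being scoped (a sketch, assuming the deployment name from the logs; if I read the upstream flags correctly, vpa-updater accepts a --vpa-object-namespace flag that restricts it to a single namespace, so it's worth seeing whether the chart sets it):

kubectl -n kube-system get deployment vertical-pod-autoscaler-updater \
  -o jsonpath='{.spec.template.spec.containers[0].args}'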

Also, when I deploy the Helm chart with Terraform, the VPA resources fail to deploy. It's as if the Helm release finishes installing before the CRDs are fully registered. When I install my custom version this doesn't happen, and I have the same Terraform depends_on logic in place, so I'm not sure why it behaves differently.
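
One way to rule out that race is to wait for the CRDs to be established before creating any VPA objects (a sketch; these are the upstream VPA CRD names):

kubectl wait --for condition=established --timeout=60s \
  crd/verticalpodautoscalers.autoscaling.k8s.io \
  crd/verticalpodautoscalercheckpoints.autoscaling.k8s.io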

A question as well: how does the chart handle certificate renewal? Does it renew the certs automatically on chart upgrade, or are they going to expire?
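
For reference, one way to check when the admission controller's webhook certificate expires (a sketch; the secret name varies by chart, so <secret-name> is a placeholder):

kubectl -n kube-system get secret <secret-name> -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -enddate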
