Cannot deploy cluster-autoscaler with Rancher RKE2 #5140
Comments
@ctrox Do you have any ideas from the above?
From the message in your logs, the autoscaler does not have permission to get nodes, so I'm assuming it is missing some permissions on the downstream cluster. If the autoscaler is running on the downstream cluster, you need to make sure the service account you set in the deployment has these permissions, but that should happen automatically with the Helm chart's default values. Also, please note that the cluster-autoscaler 1.23.1 you linked does not contain the rancher provider; I'm assuming it will be in the next minor release (1.25.0).
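For reference, the node permissions being discussed amount to RBAC roughly along these lines. This is only a minimal sketch: the service account name and namespace are assumptions, and the Helm chart's default values already create equivalent (broader) RBAC automatically.

```yaml
# Minimal sketch of RBAC letting the autoscaler get/list/watch nodes.
# The ServiceAccount name and namespace below are assumptions; the Helm
# chart normally creates this for you with its default values.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-nodes-read
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler-nodes-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler-nodes-read
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler   # assumed service account name
    namespace: kube-system     # assumed namespace
```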
Thanks @ctrox. I was wondering about that 1.23.1. I will double-check the service account.
Hi @ctrox, I'm deploying the cluster-autoscaler chart similarly to what's described in this issue, although my … The API calls seem to be successful, since the autoscaler is able to read the node names in the cluster and discover the node group, based on the log message: … I'm on Kubernetes version …
Same issue here: cluster-autoscaler version 1.25.0, installed via Helm; Rancher version 2.6.9; RKE2 version 1.24.4+rke2r1; cluster type Amazon EC2. Cluster-autoscaler is terminating with the error "Failed to find readiness information for cpu-worker" (exit code 137). "cpu-worker" is the name of a pool in the Rancher cluster. Please see the cluster-autoscaler log in the attached file.
Can you try without the `--nodes` flag? Also, can you tell me the ProviderID of one of your nodes?
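(For context, a sketch of how that flag is usually passed, assuming the generic cluster-autoscaler `--nodes=<min>:<max>:<name>` format with the pool name from the report above; the bounds and paths are made up:)

```yaml
# Hedged sketch of the static node-group flag under discussion, using the
# generic cluster-autoscaler format --nodes=<min>:<max>:<name>.
# "cpu-worker" is the pool name mentioned above; 1 and 5 are invented bounds.
command:
  - ./cluster-autoscaler
  - --cloud-provider=rancher
  - --cloud-config=/config/cloud-config   # assumed mount path
  - --nodes=1:5:cpu-worker
```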
I just verified here that cluster-autoscaler v1.25.0 runs fine with an RKE2 cluster, even with a way older version of …
Hello @ctrox, without the `--nodes` flag the result is the same. Here is a snippet from the cluster-autoscaler log:

`I1107 17:39:19.455773 37 klogx.go:86] Pod ci-test/node-example-main-657d4bb7f4-fqwvn is unschedulable`

And the output of `kubectl describe node i-0d33022be1ed6ac78.eu-central-1.compute.internal | grep ProviderID`:

`ProviderID: aws:///eu-central-1a/i-0d33022be1ed6ac78`
Aha, it makes sense now why it does not work with your EC2-backed cluster. This is a bit weird; it looks like I (wrongly) assumed Rancher would always set the ProviderID … Just to be sure, you created your cluster with EC2 using RKE2 like so? Would you mind sharing a full node object with …?
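(The exact command got cut off above; dumping the full node object would typically be something like the following, reusing the node name from the earlier comment:)

```sh
# Dump the complete node object, including spec.providerID and the
# rancher/rke2 labels and annotations, for the node mentioned above.
kubectl get node i-0d33022be1ed6ac78.eu-central-1.compute.internal -o yaml
```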
Q: The Rancher cloud provider is not yet supported in the Helm chart, right?
I have not tested it, but I think it should work with the Helm chart. You just need to set a few values like … (see the sketch below).

Thanks @nugzarg, I can think of a possible fix, but I'm not yet sure when I will have time for that. I will look into it more on Friday.
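For anyone following along, a hedged sketch of the kind of values this refers to. The value names (`image.tag`, `cloudProvider`, `cloudConfigPath`, `extraVolumeSecrets`) and the secret name are assumptions based on the cluster-autoscaler chart's documented values; check the chart's values.yaml before relying on them.

```yaml
# Hedged sketch of Helm values for running the rancher provider via the chart.
# Value names and the secret name are assumptions; verify against the chart.
image:
  tag: v1.25.0            # rancher provider is only included from 1.25.0 (see above)
cloudProvider: rancher
cloudConfigPath: /config/cloud-config
extraVolumeSecrets:
  cloud-config:
    name: cluster-autoscaler-cloud-config   # assumed pre-created secret
    mountPath: /config
```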
Thanks @ctrox.
Hey @ctrox, thanks for taking a look at this! What infrastructure provider are you using in your test environment, if any? Our clusters are being created on vSphere and running …
I'm using a custom node driver which is not built-in. My guess is that it's just the ones that don't have a …
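A quick way to check which nodes have a providerID set at all, relevant to the guess above, is something like:

```sh
# Print each node name next to its spec.providerID (blank when none is set).
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```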
Which component are you using?:
cluster-autoscaler / cluster-autoscaler-chart
What version of the component are you using?:
Component version:
cluster-autoscaler 1.23.1 / cluster-autoscaler-chart-9.20.0
What k8s version are you using (`kubectl version`)?:
Output: …
What environment is this in?:
Dev
What did you expect to happen?:
The Cluster Autoscaler to deploy via the Helm chart to my Rancher RKE2 cluster after the changes from PR 4975.
What happened instead?:
This error in the cluster-autoscaler pod logs: …
How to reproduce it (as minimally and precisely as possible):
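(The reproduction steps were left empty; a typical Helm-based deployment of the chart, with assumed release name, namespace, and values, would look roughly like this:)

```sh
# Hedged sketch of the Helm install being attempted. The repo URL is the
# upstream autoscaler chart repo; release name, namespace, and values are assumptions.
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set cloudProvider=rancher
```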
Anything else we need to know?:
On Rancher 2.6.7 with RKE2 1.23.x.
Tried to deploy to both the downstream and management clusters (both on RKE2 1.23.x).
I am wondering if something is wrong with how my deployed cloud-config is being read?
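For comparison, a minimal sketch of the cloud-config the rancher provider reads, assuming the url/token/clusterName layout from the provider's README; the values are placeholders.

```yaml
# Rough sketch of the cloud-config the rancher provider expects (field names
# are assumptions; the values here are placeholders).
url: https://rancher.example.com     # Rancher server URL
token: <rancher-api-token>           # API token with access to the cluster
clusterName: my-downstream-cluster   # name of the cluster to autoscale
```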