Azure based acs-engine autoscaler failing with standard vm type (availability set) #644

Closed
alexquintero opened this issue Feb 12, 2018 · 6 comments
Labels: area/cluster-autoscaler, area/provider/azure

Comments

@alexquintero

Issue:
Cluster-autoscaler appears to be requesting a scaleset from Azure despite the vmtype being set to standard.

Log Output:

E0212 18:25:12.947818       1 static_autoscaler.go:135] Failed to update node registry: compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/genpop' under resource group 'k8s' was not found."

Versions:
Kubernetes version 1.9.2
ACS-Engine version 0.12.5
cluster-autoscaler version v1.1.1

Additional Details:

My deployment is based entirely on the azure cloudprovider readme's cluster-autoscaler-standard-master.yaml.

I made the following modifications:

  • Set the image line in the deployment to match my Kubernetes version: image: gcr.io/google_containers/cluster-autoscaler:v1.1.1
  • Added additional rights to the clusterrole for storageclasses
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
  • I am not going to post my secret.yaml file for obvious reasons, but the VMType is set to c3RhbmRhcmQ=, which is the base64 encoding of standard. As I mentioned, this value seems to be ignored. I filled in the rest of the values based on my service principal and the acs-engine parameters, as directed by the readme.
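
The VMType value quoted above can be checked by round-tripping it through base64. The following is a minimal sketch; the helper names `encode_secret_value` and `decode_secret_value` are hypothetical and not part of the autoscaler or the readme. It also shows one common way such values go wrong: encoding the string with a trailing newline produces a different result.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a plain-text value (hypothetical helper for illustration)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value back to plain text (hypothetical helper)."""
    return base64.b64decode(encoded).decode("utf-8")


# The value from the report decodes to the expected VM type.
print(repr(decode_secret_value("c3RhbmRhcmQ=")))

# A trailing newline (easy to introduce when piping text into an encoder)
# changes the encoded value, so it is worth decoding and inspecting repr().
print(encode_secret_value("standard"))
print(encode_secret_value("standard\n"))
```

In this thread the encoded value was correct; the setting was ignored for a different reason, as the reply below explains.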
@feiskyer
Member

feiskyer commented Feb 13, 2018

@alexquintero Thanks for trying the autoscaler. The v1.1.1 image doesn't support VMAS yet (only VMSS is included). Could you try building an image from the master branch? For example:

cd cluster-autoscaler
REGISTRY=<your-registry> make dev-release

or use the image I have already built: feisky/cluster-autoscaler:dev.

Please also note that useInstanceMetadata should be set to false. See #641.

Please let me know if you still have problems.

@alexquintero
Author

@feiskyer I'll give it a go today and let you know.

@alexquintero
Author

@feiskyer Well, it turns out my VMs have useInstanceMetadata set to true. Is that an acs-engine setting that must be set prior to cluster creation, or is it something I can resolve on an existing cluster?

@feiskyer
Member

@alexquintero It can be set when creating the cluster, or changed afterwards by editing /etc/kubernetes/azure.json. But please note that, with the latter approach, new VMs created by the autoscaler still set useInstanceMetadata to false. So it's better to set useInstanceMetadata to false when creating the cluster.

Also, after kubernetes/kubernetes#59603, useInstanceMetadata can be set to true (maybe in v1.9.4).
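
To see which value a node is currently using, the useInstanceMetadata field can be read out of the contents of /etc/kubernetes/azure.json. This is only a sketch: the function name `instance_metadata_enabled` is hypothetical, the sample JSON is made up for illustration, and treating a missing key as false is an assumption rather than documented behavior.

```python
import json


def instance_metadata_enabled(config_text: str) -> bool:
    """Return the useInstanceMetadata flag from azure.json-style text.

    Assumption: a missing key is treated as False here.
    """
    config = json.loads(config_text)
    return bool(config.get("useInstanceMetadata", False))


# Made-up sample content; a real azure.json contains many more fields.
sample = '{"resourceGroup": "k8s", "useInstanceMetadata": true}'
print(instance_metadata_enabled(sample))
```

On a node, the text would come from reading /etc/kubernetes/azure.json, for example with `open("/etc/kubernetes/azure.json").read()`.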

@alexquintero
Author

alexquintero commented Feb 21, 2018

@feiskyer I finally got around to giving this a go and ran into a different issue, but hey, that's progress.

I0221 22:39:27.968163       1 scale_up.go:199] Estimated 2 nodes needed in genpop
I0221 22:39:27.968186       1 scale_up.go:288] Final scale-up plan: [{genpop 3->5 (max: 10)}]
I0221 22:39:27.968213       1 scale_up.go:340] Scale-up: setting group genpop size to 5
I0221 22:39:28.175358       1 azure_agent_pool.go:246] Waiting for deploymentsClient.CreateOrUpdate(k8sv2, cluster-autoscaler-1224553423, {%!s(*resources.DeploymentProperties=&{0xc4209ae6e8 <nil> 0xc4209ae6f0 <nil> Incremental <nil>})})
W0221 22:39:28.232445       1 clusterstate.go:248] Disabling scale-up for node group genpop until 2018-02-21 22:44:28.232428596 +0000 UTC m=+390.245860041
E0221 22:39:28.232519       1 static_autoscaler.go:298] Failed to scale up: failed to increase node group size: resources.DeploymentsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidDeploymentParameterValue" Message="The value of deployment parameter 'etcdPeerPrivateKey0' is null. Please specify the value or use the parameter reference. See https://aka.ms/arm-deploy/#parameter-file for details."
I0221 22:39:38.411936       1 scale_up.go:56] Pod default/stress-deployment-65c769644d-p4629 is unschedulable
I0221 22:39:38.411972       1 scale_up.go:56] Pod default/stress-deployment-65c769644d-6gxgc is unschedulable

I imagine that if etcdPeerPrivateKey0 is required, then etcdPeerPrivateKey1 and etcdPeerPrivateKey2 would be too. I did try supplying those values through an environment variable but got the same error, so I imagine the container isn't looking for those environment variables.
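
When scanning autoscaler logs for failures like the one above, it can help to pull out the status, the error code, and the offending deployment parameter. The sketch below does this with regular expressions. The function name `parse_azure_error` is hypothetical, the sample string is a shortened excerpt of the log line above, and the patterns are based only on the two error lines in this thread, so other messages may not match.

```python
import re
from typing import Optional


def parse_azure_error(line: str) -> dict:
    """Extract status, error code, and parameter name from an error line.

    The patterns are assumptions derived from the log lines in this thread.
    """
    status = re.search(r"\bStatus=(\d+)", line)
    code = re.search(r'\bCode="([^"]+)"', line)
    param = re.search(r"deployment parameter '([^']+)'", line)

    def group_or_none(match: Optional[re.Match]) -> Optional[str]:
        return match.group(1) if match else None

    status_text = group_or_none(status)
    return {
        "status": int(status_text) if status_text is not None else None,
        "code": group_or_none(code),
        "parameter": group_or_none(param),
    }


# Shortened excerpt of the scale-up failure logged above.
sample = (
    'StatusCode=400 -- Original Error: autorest/azure: Service returned an error. '
    'Status=400 Code="InvalidDeploymentParameterValue" '
    "Message=\"The value of deployment parameter 'etcdPeerPrivateKey0' is null.\""
)
print(parse_azure_error(sample))
```

Running the same function over every error line would show whether only etcdPeerPrivateKey0 is reported or whether other parameters fail as well.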

@feiskyer
Member

@alexquintero Yep, those parameters are not handled yet; the autoscaler needs an update to support them.

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Remove a --arch=amd64 flag from envtest command