Azure based acs-engine autoscaler failing with standard vm type (availability set) #644

alexquintero · 2018-02-12T18:44:11Z

Issue:
Cluster-autoscaler appears to be requesting a scaleset from Azure despite the vmtype being set to standard.

Log Output:

E0212 18:25:12.947818       1 static_autoscaler.go:135] Failed to update node registry: compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/genpop' under resource group 'k8s' was not found."

Versions:
Kubernetes version 1.9.2
ACS-Engine version 0.12.5
cluster-autoscaler version v1.1.1

Additional Details:

My deployment is based entirely on the azure cloudprovider readme's cluster-autoscaler-standard-master.yaml.

I made the following modifications:

Set the image line in the deployment to match my Kubernetes version: image: gcr.io/google_containers/cluster-autoscaler:v1.1.1
Added additional rights to the clusterrole for storageclasses

- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]

I am not going to post my secret.yaml file for obvious reasons, but VMType is set to c3RhbmRhcmQ=, which is the base64 encoding of standard. As I mentioned, this value seems to be ignored. I filled in the rest of the values based on my service principal and the acs-engine parameters, as directed by the readme.
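For anyone checking their own secret values, here is a minimal sketch of how the VMType encoding can be verified. The helper names are hypothetical and not part of cluster-autoscaler. It also shows a common pitfall: encoding the value with a trailing newline gives a different string that looks almost the same.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a plain-text value the way a Secret's data field expects."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value back to plain text for inspection."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # The value used in this issue.
    print(encode_secret_value("standard"))
    print(repr(decode_secret_value("c3RhbmRhcmQ=")))
    # A value encoded together with a trailing newline decodes differently.
    print(repr(decode_secret_value("c3RhbmRhcmQK")))
```

In this issue the encoding was correct, so the check only rules out one possible cause.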


feiskyer · 2018-02-13T05:41:15Z

@alexquintero Thanks for trying the autoscaler. The v1.1.1 image doesn't support VMAS yet (only VMSS is included). Could you try building an image from the master branch? For example:

cd cluster-autoscaler
REGISTRY=<your-registry> make dev-release

Or use the image I have already built: feisky/cluster-autoscaler:dev.

Please also note that useInstanceMetadata should be set to false. See #641.

Please let me know if you still have problems.

alexquintero · 2018-02-13T15:27:14Z

@feiskyer I'll give it a go today and let you know.

alexquintero · 2018-02-13T16:52:29Z

@feiskyer Well, it turns out my VMs have useInstanceMetadata set to true. Is that an acs-engine setting that must be set prior to cluster creation, or is it something I can resolve on an existing cluster?

feiskyer · 2018-02-14T03:54:03Z

@alexquintero It can be set when creating the cluster, or by changing /etc/kubernetes/azure.json afterwards. But please note that with the latter approach, new VMs created by the autoscaler are still setting useInstanceMetadata to false. So it's better to set useInstanceMetadata to false when creating the cluster.
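As a rough illustration of the second option, the sketch below rewrites the useInstanceMetadata key in the text of an azure.json file. The function name and the sample config are hypothetical, and the other keys in the sample are placeholders rather than a complete cloud config. It only transforms a JSON string; reading and writing /etc/kubernetes/azure.json on each node is left out.

```python
import json


def set_use_instance_metadata(config_text: str, enabled: bool = False) -> str:
    """Return the cloud config JSON with useInstanceMetadata set to `enabled`.

    All other keys are preserved as they were parsed.
    """
    config = json.loads(config_text)
    config["useInstanceMetadata"] = enabled
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Placeholder content; a real azure.json has many more fields.
    sample = '{"cloud": "AzurePublicCloud", "useInstanceMetadata": true}'
    print(set_use_instance_metadata(sample, enabled=False))
```

This covers existing nodes only, which is why setting the value at cluster creation is the recommended route.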

After kubernetes/kubernetes#59603, useInstanceMetadata can be set to true (possibly in v1.9.4).

alexquintero · 2018-02-21T22:53:45Z

@feiskyer I finally got around to giving this a go and ran into a different issue, but hey, that's progress.

I0221 22:39:27.968163       1 scale_up.go:199] Estimated 2 nodes needed in genpop
I0221 22:39:27.968186       1 scale_up.go:288] Final scale-up plan: [{genpop 3->5 (max: 10)}]
I0221 22:39:27.968213       1 scale_up.go:340] Scale-up: setting group genpop size to 5
I0221 22:39:28.175358       1 azure_agent_pool.go:246] Waiting for deploymentsClient.CreateOrUpdate(k8sv2, cluster-autoscaler-1224553423, {%!s(*resources.DeploymentProperties=&{0xc4209ae6e8 <nil> 0xc4209ae6f0 <nil> Incremental <nil>})})
W0221 22:39:28.232445       1 clusterstate.go:248] Disabling scale-up for node group genpop until 2018-02-21 22:44:28.232428596 +0000 UTC m=+390.245860041
E0221 22:39:28.232519       1 static_autoscaler.go:298] Failed to scale up: failed to increase node group size: resources.DeploymentsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidDeploymentParameterValue" Message="The value of deployment parameter 'etcdPeerPrivateKey0' is null. Please specify the value or use the parameter reference. See https://aka.ms/arm-deploy/#parameter-file for details."
I0221 22:39:38.411936       1 scale_up.go:56] Pod default/stress-deployment-65c769644d-p4629 is unschedulable
I0221 22:39:38.411972       1 scale_up.go:56] Pod default/stress-deployment-65c769644d-6gxgc is unschedulable

I imagine that if etcdPeerPrivateKey0 is required, then etcdPeerPrivateKey1 and etcdPeerPrivateKey2 would be too. I did try supplying those values through an environment variable but got the same error, so I imagine the container isn't looking for those environment variables.
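To see which other parameters might trigger the same InvalidDeploymentParameterValue error, a quick check along these lines could list every parameter whose value is null. This is a sketch under the assumption that the parameters use the usual ARM shape of name mapped to {"value": ...}. The function name and the agentCount entry in the sample are hypothetical.

```python
def find_null_parameters(parameters: dict) -> list:
    """Return the sorted names of parameters that have no usable value.

    Assumes the ARM parameter shape {"name": {"value": ...}}; an entry that
    is not a dict, or whose "value" is missing or None, counts as null.
    """
    return sorted(
        name
        for name, entry in parameters.items()
        if not isinstance(entry, dict) or entry.get("value") is None
    )


if __name__ == "__main__":
    # Hypothetical sample; real acs-engine parameters contain many more keys.
    sample = {
        "agentCount": {"value": 3},
        "etcdPeerPrivateKey0": {"value": None},
        "etcdPeerPrivateKey1": {},
    }
    print(find_null_parameters(sample))
```

Running it against the decoded parameters from the secret would show whether etcdPeerPrivateKey1 and etcdPeerPrivateKey2 are affected as well.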

feiskyer · 2018-02-22T01:28:46Z

@alexquintero Yep, those parameters are not handled yet; the autoscaler needs an update to support them.


aleksandra-malinowska added the area/cluster-autoscaler and area/provider/azure labels Feb 13, 2018

aleksandra-malinowska mentioned this issue Feb 13, 2018

Release Cluster Autoscaler 1.2 rc #618

Closed

aleksandra-malinowska assigned feiskyer Feb 14, 2018

feiskyer mentioned this issue Feb 24, 2018

Make deployment parameters robust for various acs-engine versions #682

Merged

mwielgus closed this as completed in #682 Feb 26, 2018

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024

Merge pull request kubernetes#644 from tenzen-y/remove-specifying-arch

c0a51b2

Remove a --arch=amd64 flag from envtest command
