Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop #10938

MoSheikh · 2021-02-26T20:56:39Z

1. What kops version are you running? The command kops version, will display
this information.

1.19.1
1.20.0-beta.1

Bug occurs in both versions.

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.20.2

3. What cloud provider are you using?

Google Compute Engine

4. What commands did you run? What is the simplest way to reproduce this issue?

Created a basic GCE cluster as per documentation. In my configuration, api.dev.k8s.local was the name of my cluster, and us-east1-b was the zone I selected for my cluster. After creating the cluster, run the following.

export KOPS_FEATURE_FLAGS=AlphaAllowGCE,SpecOverrideFlag
kops set cluster spec.clusterAutoscaler.enabled=true 
kops update cluster --admin=87600h -y

5. What happened after the commands executed?

The cluster-autoscaler pod created by kops get stuck in the following event: Back-off restarting failed container

Kops creates the a cluster-autoscaler deployment file with a spec.containers[0].command array that included the following line:

--nodes=1:1:nodes-us-east1-b.api.example.k8s.local

Which is in the format of:

--nodes=<minNodes>:<maxNodes>:<instanceGroupName>.<clusterName>

6. What did you expect to happen?

The above command should be

--nodes=1:1:nodes-us-east1-b

Which is in the format of:

--nodes=<minNodes>:<maxNodes>:<instanceGroupName>

Replacing the values for that Deployment spec field resolves the issue.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2021-02-26T13:57:13Z"
  generation: 5
  name: api.dev.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    gceServiceAccount: [email protected]
  cloudProvider: gce
  clusterAutoscaler:
    cpuRequest: 100m
    enabled: true
    memoryRequest: 300Mi
    skipNodesWithLocalStorage: true
    skipNodesWithSystemPods: true
  configBase: gs://devops.example.app/kops/state/api.dev.k8s.local
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-east1-b
      name: b
      volumeType: pd-standard
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-east1-b
      name: b
      volumeType: pd-standard
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.19.7
  masterInternalName: api.internal.api.dev.k8s.local
  masterPublicName: api.api.dev.k8s.local
  metricsServer: {}
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  project: dojolaunch-302702
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - name: us-east1
    region: us-east1
    type: Public
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

The text was updated successfully, but these errors were encountered:

MoSheikh changed the title ~~Incorrect deployment command syntax for cluster-autoscaler on GCE~~ Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop Feb 26, 2021

olemarkus mentioned this issue Feb 27, 2021

gce doesn't suffix the IG names with ClusterName #10944

Merged

k8s-ci-robot closed this as completed in #10944 Mar 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop #10938

Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop #10938

MoSheikh commented Feb 26, 2021 •

edited

Loading

Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop #10938

Incorrect deployment command format for cluster-autoscaler on GCE causing "Back-off restarting failed container" loop #10938

Comments

MoSheikh commented Feb 26, 2021 • edited Loading

MoSheikh commented Feb 26, 2021 •

edited

Loading