Autoscaler not respecting the taint tag in AWS #2434

Closed
dcherniv opened this issue Oct 9, 2019 · 30 comments
Labels
area/provider/aws Issues or PRs related to aws provider lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dcherniv

dcherniv commented Oct 9, 2019

It seems that the cluster autoscaler ignores the taints in some cases and tries to scale up a tainted node pool in order to schedule a pod that has no toleration for the taint.

Example pod spec:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  labels:
    app: dev-pay-stub-services
    pod-template-hash: 64f54bbcd
  name: dev-pay-stub-services-64f54bbcd-d5z2r
  namespace: dev
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dev-pay-stub-services-64f54bbcd
spec:
  containers:
  - env:
REDACTED
    image: REDACTED/pay-stub-services:0.9.1-5-g3d1358f
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: http
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 30
    name: pay-stub-services
    ports:
    - containerPort: 5000
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: http
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 30
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: 1500m
        memory: 3Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
REDACTED
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: newrelic.ini
        path: newrelic.ini
      name: dev-pay-stub-services
    name: newrelic-config
  - name: default-token-nj7fb
    secret:
      defaultMode: 420
      secretName: default-token-nj7fb

Autoscaler logs from the time. Strangely, it remembered that gpu-4 had a taint, yet it happily spun up gpu-0 from another ASG in a different AZ.


I1009 04:30:28.048387       1 utils.go:237] Pod dev-pay-stub-services-64f54bbcd-27xzb can't be scheduled on nonproduction01-spot-gpu-4, predicate failed: PodToleratesNodeTaints predicate mismatch, reason: node(s) had taints that the pod didn't tolerate
I1009 04:30:28.048459       1 utils.go:227] Pod dev-pay-stub-services-64f54bbcd-htm6g can't be scheduled on nonproduction01-spot-gpu-4. Used cached predicate check results
I1009 04:30:28.048470       1 scale_up.go:411] No pod can fit to nonproduction01-spot-gpu-4
I1009 04:30:28.048574       1 scale_up.go:338] Skipping node group nonproduction01-spot-gpu-6 - max size reached
I1009 04:30:28.048589       1 utils.go:237] Pod dev-pay-stub-services-64f54bbcd-27xzb can't be scheduled on nonproduction01-spot-gpu-7, predicate failed: PodToleratesNodeTaints predicate mismatch, reason: node(s) had taints that the pod didn't tolerate
I1009 04:30:28.048661       1 utils.go:227] Pod dev-pay-stub-services-64f54bbcd-htm6g can't be scheduled on nonproduction01-spot-gpu-7. Used cached predicate check results
I1009 04:30:28.048674       1 scale_up.go:411] No pod can fit to nonproduction01-spot-gpu-7
I1009 04:30:28.048679       1 scale_up.go:423] Best option to resize: nonproduction01-spot-gpu-0
I1009 04:30:28.048683       1 scale_up.go:427] Estimated 1 nodes needed in nonproduction01-spot-gpu-0
I1009 04:30:28.048922       1 scale_up.go:521] Splitting scale-up between 4 similar node groups: {nonproduction01-spot-gpu-0, nonproduction01-spot-gpu-1, nonproduction01-spot-gpu-2, nonproduction01-spot-gpu-3}
I1009 04:30:28.048933       1 scale_up.go:529] Final scale-up plan: [{nonproduction01-spot-gpu-0 0->1 (max: 1)}]
I1009 04:30:28.048945       1 scale_up.go:694] Scale-up: setting group nonproduction01-spot-gpu-0 size to 1
I1009 04:30:28.048971       1 auto_scaling_groups.go:211] Setting asg nonproduction01-spot-gpu-0 size to 1

Here are the settings for the node pool in AWS:
[screenshot of the node pool settings in the AWS console]

So, a couple of questions. Does the autoscaler only find out about the taints when it tries to spin up an instance from the node pool?
Judging by the logs, it appears there's some caching of the taints/labels going on. How long does it cache them?
Why did it spin up an instance from a tainted node pool?
How can I make sure that pods that don't tolerate the taints won't trigger a scaling action?

@Jeffwan
Contributor

Jeffwan commented Oct 9, 2019

/assign @Jeffwan

@Jeffwan
Contributor

Jeffwan commented Oct 11, 2019

Does the autoscaler only find out about the taints when it tries to spin up an instance from the node pool?

It uses scheduler predicates to check whether the pod can be scheduled on each candidate node group's template node.

Judging by the logs, it appears there's some caching of the taints/labels going on. How long does it cache them?

The results are cached for each run, roughly 10s.

Why did it spin up an instance from a tainted node pool?

Did you scale up from 0? If there is a node, can you describe it? I'd like to know whether you only tagged the ASG or also tainted the node itself. The ASG tag is used for the scale-from-0 case.

How can I make sure that pods that don't tolerate the taints won't trigger a scaling action?

If there's no qualified node group, it won't trigger a scale-up.
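
For illustration, a minimal sketch (not taken from this issue) of how an ASG can be tagged so that the autoscaler's template node carries the taint in the scale-from-0 case; the ASG name, taint key, and value are placeholders:

# Placeholder ASG name and taint key; the tag value uses the "<taint value>:<effect>" format.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-tainted-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/taint/dedicated,Value=gpu:NoSchedule,PropagateAtLaunch=true"

Note that this tag only informs the autoscaler; it does not taint the nodes themselves (see the discussion further down).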

@Jeffwan
Contributor

Jeffwan commented Oct 11, 2019

/area aws

@k8s-ci-robot
Contributor

@Jeffwan: The label(s) area/aws cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/area aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Jeffwan
Contributor

Jeffwan commented Oct 11, 2019

/area provider/aws

@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Oct 11, 2019
@d-baranowski

I'm having the same issue on v1.13.8 :(

@Jeffwan
Contributor

Jeffwan commented Nov 26, 2019

Em.. @d-baranowski, thanks for reporting. I will take a look.

@d-baranowski

d-baranowski commented Nov 27, 2019

I built a custom autoscaler container based off the v1.13.8 branch and deployed it to our cluster. I added a bunch of print statements trying to make sense of the behaviour. You can find the logs and the ASG definition in the following gist: https://gist.github.com/d-baranowski/0f727f1426df438f69e8c906adacc060

Oddly, the autoscaler appears to be picking up the taints and labels, but neither gets applied to the new nodes as I'd expect. Also, when considering the ASGs for scale-up, it marks them as unsuitable, despite the fact that, according to the label tag, the pod's node selector would match.

@rapuckett

Bump. AWS autoscaler (1.15.4) is still not applying node labels & taints to new instances.

@d-baranowski

@Jeffwan Did you have any luck with this?

@Jeffwan
Contributor

Jeffwan commented Jan 21, 2020

@rapuckett @d-baranowski The autoscaler won't apply labels & taints to the Node object; it fetches them from the ASG tags and constructs a template node with those taints in CA memory. Could you share what your ASG tags look like?

@d-baranowski

d-baranowski commented Jan 21, 2020

        {
            "ResourceType": "auto-scaling-group",
            "ResourceId": "spl-test-asg-monitoring-az2-cluster",
            "PropagateAtLaunch": true,
            "Value": "monitoring-only",
            "Key": "k8s.io/cluster-autoscaler/node-template/label/restrict"
        },
        {
            "ResourceType": "auto-scaling-group",
            "ResourceId": "spl-test-asg-monitoring-az2-cluster",
            "PropagateAtLaunch": true,
            "Value": "true:NoSchedule",
            "Key": "k8s.io/cluster-autoscaler/node-template/taint/monitoring"
        },

@karolinepauls

karolinepauls commented Feb 28, 2020

@Jeffwan

The autoscaler won't apply labels & taints to the Node object; it fetches them from the ASG tags and constructs a template node with those taints in CA memory.

So, to be 100% clear: if an autoscaling group is tagged (e.g. "Key": "k8s.io/cluster-autoscaler/node-template/taint/spot", "Value": "true:NoSchedule", "PropagateAtLaunch": true), do we expect cluster-autoscaler to understand that spinning up a new instance in that group will result in the instance being tainted with the same key/value, and therefore not to do so when it needs to fit a pod that doesn't tolerate the taint in question?

However, when a node from that group is spun up for a different reason (e.g. when other pods with matching affinity and tolerations need to be accommodated), the taints have to be applied to those nodes in some other way, for example as arguments to the kubelet?

In other words, the k8s.io/cluster-autoscaler/node-template/taint/* tags only tell the autoscaler what it should expect nodes to look like when they are spun up?

If the above is correct, it should be documented - I can make a PR to the AWS FAQ document (not sure about others). If not, the real behaviour still needs to be documented but maybe someone else should do it.

@seh

seh commented Feb 28, 2020

Yes, @karolinepauls, your understanding is correct: The cluster autoscaler does not establish or change taints on Kubernetes Node objects. The resource tags on the EC2 autoscaling groups are there to tell the autoscaler which taints you as the cluster operator intend to apply to the Nodes for the EC2 instances that these ASGs will create. How you apply those taints is up to you, but yes, setting them in the kubelet's --register-with-taints command-line flag is a common technique.

@oscar60310
Contributor

@seh, @karolinepauls thank you, it's clearer now. I also think we should add a description of the node-template tags to the FAQ.

BTW, @d-baranowski, if you need to add labels/taints to a node when it bootstraps with the amazon-eks-node AMI, you can try adding them via --kubelet-extra-args, as in the sketch below.
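
For example, a sketch of instance user data that calls the EKS AMI's bootstrap script, reusing the label and taint from the ASG tags shown above; the cluster name is a placeholder:

#!/bin/bash
# Placeholder cluster name; the label and taint mirror the node-template ASG tags above.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--node-labels=restrict=monitoring-only --register-with-taints=monitoring=true:NoSchedule'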

@karolinepauls

karolinepauls commented Mar 23, 2020

EDIT: Actually, it seems that I shouldn't have used CriticalAddonsOnly, because that's a "rescheduler taint" and it's ignored by CA. I have to use my own taint.

I have 3 OnDemand ASGs and 3 Spot ASGs. The OnDemand ASGs are tainted with CriticalAddonsOnly=true:NoSchedule. This should prevent the cluster-autoscaler from scaling up the on-demand instances to allocate pods that don't tolerate the taint.

$ aws autoscaling describe-auto-scaling-groups --query '[AutoScalingGroups[*].AutoScalingGroupName]' --output text | xargs -n1 echo | grep cluster
cluster1-ondemand-eu-west-1a20200312110536898200000017
cluster1-ondemand-eu-west-1b2020031211053694470000001b
cluster1-ondemand-eu-west-1c20200312110536895300000016
cluster1-spot-eu-west-1a20200312110536924700000018
cluster1-spot-eu-west-1b20200312110536925200000019
cluster1-spot-eu-west-1c2020031211053694290000001a
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names cluster1-ondemand-eu-west-1a20200312110536898200000017 cluster1-ondemand-eu-west-1b2020031211053694470000001b cluster1-ondemand-eu-west-1c20200312110536895300000016 --query 'AutoScalingGroups[*].[AutoScalingGroupName,Tags[*].[Key,Value,PropagateAtLaunch]]' --output text | grep -e cluster | column -t
cluster1-ondemand-eu-west-1a20200312110536898200000017
Name                                                                   kube-cluster1    True
k8s.io/cluster-autoscaler/cluster1                                     cluster1         False
k8s.io/cluster-autoscaler/enabled                                      true             False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  ondemand         True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi            False
k8s.io/cluster-autoscaler/node-template/taint/CriticalAddonsOnly       true:NoSchedule  True
kubernetes.io/cluster/cluster1                                         owned            True
cluster1-ondemand-eu-west-1b2020031211053694470000001b
Name                                                                   kube-cluster1    True
k8s.io/cluster-autoscaler/cluster1                                     cluster1         False
k8s.io/cluster-autoscaler/enabled                                      true             False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  ondemand         True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi            False
k8s.io/cluster-autoscaler/node-template/taint/CriticalAddonsOnly       true:NoSchedule  True
kubernetes.io/cluster/cluster1                                         owned            True
cluster1-ondemand-eu-west-1c20200312110536895300000016
Name                                                                   kube-cluster1    True
k8s.io/cluster-autoscaler/cluster1                                     cluster1         False
k8s.io/cluster-autoscaler/enabled                                      true             False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  ondemand         True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi            False
k8s.io/cluster-autoscaler/node-template/taint/CriticalAddonsOnly       true:NoSchedule  True
kubernetes.io/cluster/cluster1                                         owned            True
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names cluster1-spot-eu-west-1a20200312110536924700000018 cluster1-spot-eu-west-1b20200312110536925200000019 cluster1-spot-eu-west-1c2020031211053694290000001a --query 'AutoScalingGroups[*].[AutoScalingGroupName,Tags[*].[Key,Value,PropagateAtLaunch]]' --output text | grep -e cluster | column -t
cluster1-spot-eu-west-1a20200312110536924700000018
Name                                                                   kube-cluster1  True
k8s.io/cluster-autoscaler/cluster1                                     cluster1       False
k8s.io/cluster-autoscaler/enabled                                      true           False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  spot           True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi          False
kubernetes.io/cluster/cluster1                                         owned          True
cluster1-spot-eu-west-1b20200312110536925200000019
Name                                                                   kube-cluster1  True
k8s.io/cluster-autoscaler/cluster1                                     cluster1       False
k8s.io/cluster-autoscaler/enabled                                      true           False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  spot           True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi          False
kubernetes.io/cluster/cluster1                                         owned          True
cluster1-spot-eu-west-1c2020031211053694290000001a
Name                                                                   kube-cluster1  True
k8s.io/cluster-autoscaler/cluster1                                     cluster1       False
k8s.io/cluster-autoscaler/enabled                                      true           False
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/lifecycle  spot           True
k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage    100Gi          False
kubernetes.io/cluster/cluster1                                         owned          True

However, when trying to allocate space for a new pod managed by a StatefulSet, with unallocated storage, cluster-autoscaler seems to be confused about storage and ends up expanding a node group tagged with k8s.io/cluster-autoscaler/node-template/taint/CriticalAddonsOnly=true:NoSchedule, which no pod will tolerate by default.

I0323 14:24:39.755508       1 scale_up.go:263] Pod default/my-release-postgresql-0 is unschedulable
I0323 14:24:39.755557       1 scale_up.go:300] Upcoming 0 nodes
I0323 14:24:39.755586       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756172       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-ondemand-eu-west-1a20200312110536898200000017-2945094011860928531"
I0323 14:24:39.756197       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-ondemand-eu-west-1a20200312110536898200000017-2945094011860928531" ...
I0323 14:24:39.756219       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756233       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.756241       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756293       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756658       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-ondemand-eu-west-1b2020031211053694470000001b-2172180488295428279"
I0323 14:24:39.756682       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-ondemand-eu-west-1b2020031211053694470000001b-2172180488295428279" ...
I0323 14:24:39.756701       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756712       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.756718       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.756753       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757126       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-ondemand-eu-west-1c20200312110536895300000016-2920899871041967569"
I0323 14:24:39.757145       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-ondemand-eu-west-1c20200312110536895300000016-2920899871041967569" ...
I0323 14:24:39.757163       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757176       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.757189       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757228       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757565       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-spot-eu-west-1a20200312110536924700000018-3126217379181517886"
I0323 14:24:39.757587       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-spot-eu-west-1a20200312110536924700000018-3126217379181517886" ...
I0323 14:24:39.757609       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757619       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.757624       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.757662       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758025       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-spot-eu-west-1b20200312110536925200000019-4660979458788806340"
I0323 14:24:39.758047       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-spot-eu-west-1b20200312110536925200000019-4660979458788806340" ...
I0323 14:24:39.758064       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758082       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.758092       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758131       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758496       1 scheduler_binder.go:697] No matching volumes for Pod "default/my-release-postgresql-0", PVC "default/data-my-release-postgresql-0" on node "template-node-for-cluster1-spot-eu-west-1c2020031211053694290000001a-7420554225850925066"
I0323 14:24:39.758514       1 scheduler_binder.go:752] Provisioning for claims of pod "default/my-release-postgresql-0" that has no matching volumes on node "template-node-for-cluster1-spot-eu-west-1c2020031211053694290000001a-7420554225850925066" ...
I0323 14:24:39.758528       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758539       1 csi_volume_predicate.go:135] Persistent volume had no name for claim default/data-my-release-postgresql-0
I0323 14:24:39.758545       1 predicates.go:440] PVC default/data-my-release-postgresql-0 is not bound, assuming PVC matches predicate when counting limits
I0323 14:24:39.758588       1 scale_up.go:423] Best option to resize: cluster1-ondemand-eu-west-1a20200312110536898200000017
I0323 14:24:39.758603       1 scale_up.go:427] Estimated 1 nodes needed in cluster1-ondemand-eu-west-1a20200312110536898200000017
I0323 14:24:39.758833       1 scale_up.go:521] Splitting scale-up between 3 similar node groups: {cluster1-ondemand-eu-west-1a20200312110536898200000017, cluster1-ondemand-eu-west-1b2020031211053694470000001b, cluster1-ondemand-eu-west-1c20200312110536895300000016}
I0323 14:24:39.758855       1 scale_up.go:529] Final scale-up plan: [{cluster1-ondemand-eu-west-1a20200312110536898200000017 1->2 (max: 10)}]
I0323 14:24:39.758871       1 scale_up.go:694] Scale-up: setting group cluster1-ondemand-eu-west-1a20200312110536898200000017 size to 2
I0323 14:24:39.758922       1 auto_scaling_groups.go:221] Setting asg cluster1-ondemand-eu-west-1a20200312110536898200000017 size to 2

It looks like the volume confusion prevents cluster-autoscaler from considering taints?

For reference, the pod fails to schedule after the scale-ups it triggers; it later manages to schedule when something else causes an on-demand instance group to expand.

Events:
  Type     Reason                  Age                From                                                 Message
  ----     ------                  ----               ----                                                 -------
  Normal   TriggeredScaleUp        27m                cluster-autoscaler                                   pod triggered scale-up: [{cluster1-ondemand-eu-west-1b2020031211053694470000001b 1->2 (max: 10)}]
  Warning  FailedScheduling        27m (x4 over 27m)  default-scheduler                                    0/10 nodes are available: 3 node(s) had taints that the pod didn't tolerate, 7 Insufficient cpu.
  Normal   TriggeredScaleUp        27m                cluster-autoscaler                                   pod triggered scale-up: [{cluster1-spot-eu-west-1b20200312110536925200000019 1->2 (max: 20)}]
  Warning  FailedScheduling        26m (x5 over 26m)  default-scheduler                                    0/11 nodes are available: 4 node(s) had taints that the pod didn't tolerate, 7 Insufficient cpu.
  Warning  FailedScheduling        25m (x5 over 25m)  default-scheduler                                    0/12 nodes are available: 5 node(s) had taints that the pod didn't tolerate, 7 Insufficient cpu.
  Normal   Scheduled               25m                default-scheduler                                    Successfully assigned default/my-release-postgresql-0 to ip-10-0-188-213.eu-west-1.compute.internal
  Normal   SuccessfulAttachVolume  25m                attachdetach-controller                              AttachVolume.Attach succeeded for volume "pvc-c6df6fd5-6d33-11ea-8fc7-0218e6b2b19a"
  Normal   Pulling                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Pulling image "docker.io/bitnami/minideb:latest"
  Normal   Pulled                  25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Successfully pulled image "docker.io/bitnami/minideb:latest"
  Normal   Created                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Created container init-chmod-data
  Normal   Started                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Started container init-chmod-data
  Normal   Pulling                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Pulling image "docker.io/bitnami/postgresql:11.3.0-debian-9-r38"
  Normal   Pulled                  25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Successfully pulled image "docker.io/bitnami/postgresql:11.3.0-debian-9-r38"
  Normal   Created                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Created container my-release-postgresql
  Normal   Started                 25m                kubelet, ip-10-0-188-213.eu-west-1.compute.internal  Started container my-release-postgresql

Cluster Autoscaler 1.14.6
EKS 1.14

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 21, 2020
@ghost

ghost commented Aug 7, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 7, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 5, 2020
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 16, 2020
@lawliet89

lawliet89 commented Dec 16, 2020

Landed here after observing the same issue. According to eksctl docs, you need to add the Effect to the tag value too.

        {
            "ResourceType": "auto-scaling-group",
            "ResourceId": "spl-test-asg-monitoring-az2-cluster",
            "PropagateAtLaunch": true,
            "Value": "monitoring-only:NoSchedule",
            "Key": "k8s.io/cluster-autoscaler/node-template/label/restrict"
        },

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 15, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@awoimbee

I know it's a rather old issue, but why is this detail important:

Did you scale up from 0?

If I tell CA that my node group is tainted and that new nodes will be tainted, who cares whether I'm scaling from 0, 1, or 20?

This is preventing me from implementing partial/graceful updates to my cluster, since the autoscaler keeps scaling up the outdated (and tainted) node group.

@sleterrier

sleterrier commented Mar 12, 2023

@awoimbee, from cluster-autoscaler documentation:

When scaling up from 0 nodes, the Cluster Autoscaler reads ASG tags to derive information about the specifications of the nodes, i.e. the labels and taints in that ASG. Note that it does not actually apply these labels or taints - this is done by an AWS-generated user data script. It gives the Cluster Autoscaler information about whether pending pods will be able to be scheduled should a new node be spun up for a particular ASG, with the assumption that the ASG tags accurately reflect the labels/taints actually applied.
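
Given that, one hedged way to sanity-check that the ASG tags and the taints actually applied to nodes stay in sync, assuming a node from that ASG already exists (the node and ASG names below are placeholders):

# Compare the taints actually on the node with the ASG's node-template taint tags.
kubectl get node ip-10-0-0-1.eu-west-1.compute.internal -o jsonpath='{.spec.taints}'
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg \
  --query 'AutoScalingGroups[0].Tags[?starts_with(Key, `k8s.io/cluster-autoscaler/node-template/taint/`)]'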

@tomthetommy

Same issue. I'm on CA 1.24 / EKS 1.24. My understanding was that the autoscaler can now describe nodegroups in order to view tags/taints. It seems to be viewing these tags (as I can scale from 0 without the template tags on the ASGs), but not the taints.

It'll spin up a node, and then complain that the taint isn't tolerated by the pod.

@abstrask

Same issue. I'm on CA 1.24 / EKS 1.24. My understanding was that the autoscaler can now describe nodegroups in order to view tags/taints. It seems to be viewing these tags (as I can scale from 0 without the template tags on the ASGs), but not the taints.

It'll spin up a node, and then complain that the taint isn't tolerated by the pod.

While that's true, the taints are not being read correctly (see #6481).

My colleague, @wcarlsen, and I believe we have a fix for this (PR #6482).
