Autoscaler not respecting the taint tag in AWS #2434
/assign @Jeffwan
It uses scheduler predicates to check whether the pod can be scheduled on the candidate node group's template.
It caches them for every run (~10s).
Did you scale up from 0? Can you describe the node if there is one? I'd like to know whether you just tagged the ASG or also tainted the node; the ASG tag is used for the scale-from-0 case.
If no node group qualifies, it won't trigger a scale-up.
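For illustration (not from this thread; the node name below is a placeholder), checking what an existing node actually carries looks something like:

```sh
# Inspect the taints and labels actually registered on a node.
# "ip-10-0-1-23.ec2.internal" is a placeholder node name.
kubectl describe node ip-10-0-1-23.ec2.internal | grep -i -A 3 taints
kubectl get node ip-10-0-1-23.ec2.internal --show-labels
```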
/area aws
@Jeffwan: The label(s) `area/aws` cannot be applied, because the repository doesn't have them. In response to this: /area aws
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/area provider/aws
I'm having the same issue on v1.13.8 :(
@d-baranowski Thanks for reporting. I will take a look.
I built a custom autoscaler container based off the v1.13.8 branch and deployed it to our cluster. I added a bunch of print statements trying to make sense of the behaviour. You can find the logs and ASG definition in the following gist: https://gist.github.com/d-baranowski/0f727f1426df438f69e8c906adacc060 Oddly, the autoscaler appears to be picking up the taints and labels, but neither gets applied to the new nodes as I'd expect. Also, when considering the ASG for scale-up, it marks the ASGs as unsuitable despite the fact that, according to the label tag, the pod's node selector would match.
Bump. AWS autoscaler (1.15.4) is still not applying node labels & taints to new instances.
@Jeffwan Did you have any luck with this?
@rapuckett @d-baranowski The autoscaler won't apply labels & taints to the node object; it will fetch them from the ASG tags and construct a node with those taints in CA memory. Could I see what your ASG tags look like?
So, to be 100% clear: are we expecting that if an autoscaling group is tagged with a node-template taint tag, the tag only affects the autoscaler's simulation (e.g. when scaling up from 0)? However, when a node from that group is spun up for a different reason (e.g. when other pods with matching affinity + tolerations need to be accommodated), in order for these nodes to receive taints, one has to add them in a different way, for example as arguments to the kubelet? If the above is correct, it should be documented; I can make a PR to the AWS FAQ document (not sure about others). If not, the real behaviour still needs to be documented, but maybe someone else should do it.
Yes, @karolinepauls, your understanding is correct: the cluster autoscaler does not establish or change taints on Kubernetes Node objects. The resource tags on the EC2 autoscaling groups are there to tell the autoscaler which taints you, as the cluster operator, intend to apply to the Nodes for the EC2 instances that these ASGs will create. How you apply those taints is up to you, but yes, setting them via the kubelet's --register-with-taints flag is one way to do it.
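For illustration, a tag following the documented node-template format might look like this (the ASG name and the dedicated=spot taint below are made-up placeholders, not values from this issue):

```sh
# Hypothetical example: declare to the cluster autoscaler that nodes launched
# by this ASG will carry the taint dedicated=spot:NoSchedule.
# "spot-workers-asg" and the taint key/value are placeholders.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=spot-workers-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/taint/dedicated,Value=spot:NoSchedule,PropagateAtLaunch=true"
```

As discussed above, this only tells the autoscaler what to assume in its simulation; it does not itself taint the Node objects.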
@seh, @karolinepauls thank you, it's clearer now. I also think we should add some description to the FAQ about the node-template tags. BTW, @d-baranowski, if you need to add labels/taints on a node when it bootstraps with the amazon-eks-node AMI, you can try adding args using --kubelet-extra-args.
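As a rough sketch of that suggestion (the cluster name, taint, and label below are placeholders, not taken from this thread), the node user data for an amazon-eks-node AMI could pass the taint and label through to the kubelet:

```sh
#!/bin/bash
# Illustrative only: "my-cluster" and dedicated=spot are placeholders.
# bootstrap.sh ships with the amazon-eks-node AMI; --kubelet-extra-args is
# forwarded to the kubelet, which registers the node with these taints/labels.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--register-with-taints=dedicated=spot:NoSchedule --node-labels=dedicated=spot'
```

For this to line up with scale-from-0, the same taint/label should also appear in the ASG's node-template tags.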
EDIT: Actually, it seems that I shouldn't have used CriticalAddonsOnly because that's a "rescheduler taint", and it's ignored by CA. I have to use my own taint. I have 3 OnDemand ASGs and 3 Spot ASGs. OnDemand ASGs are tainted with
However, when trying to allocate space for a new pod managed by a statefulset, with unallocated storage, cluster-autoscaler seems to be confused about storage and ends up expanding a node group tagged with
Looks like the volume confusion prevents cluster-autoscaler from considering taints? For reference, the pod fails to schedule after the scale-ups it triggers; it later manages to do so when something else happens to cause an on-demand instance group to expand.
Cluster Autoscaler 1.14.6
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Landed here after observing the same issue. According to my ASG tags:
{
"ResourceType": "auto-scaling-group",
"ResourceId": "spl-test-asg-monitoring-az2-cluster",
"PropagateAtLaunch": true,
"Value": "monitoring-only:NoSchedule",
"Key": "k8s.io/cluster-autoscaler/node-template/label/restrict"
},
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community. /close
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I know it's a rather old issue, but why is this detail important (that the ASG taint tags only matter for the scale-from-0 case)?
If I tell CA that my node group is tainted and that new nodes will be tainted, who cares if I'm scaling from 0, 1 or 20? This is preventing me from implementing partial/graceful updates to my cluster, since the autoscaler keeps scaling up the outdated (and tainted) node group.
@awoimbee, from cluster-autoscaler documentation:
Same issue. I'm on CA 1.24 / EKS 1.24. My understanding was that the autoscaler can now describe nodegroups in order to view tags/taints. It seems to be viewing these tags (as I can scale from 0 without the template tags on the ASGs), but not the taints. It'll spin up a node, and then complain that the taint isn't tolerated by the pod.
While that's true, the taints are not being read correctly (see #6481). My colleague, @wcarlsen, and I believe we have a fix for this (PR #6482).
It seems that the cluster autoscaler is ignoring the taints in some cases and tries to spin up a node pool that is tainted in order to schedule a pod with no toleration for the taint.
Example pod spec:
Autoscaler logs from the time: strangely, it remembered that gpu-4 had a taint, but it happily spun up gpu-0 from another ASG in a different AZ.
Here's the setting for the node pool in AWS.
So, a couple of questions.
Does the autoscaler only find out about the taints when it tries to spin up an instance from the node pool?
Judging by the logs, it appears there's some caching of the taints/labels going on. How long will it cache them for?
Why did it spin up an instance from a tainted node pool?
How can I make sure that pods that don't tolerate any taints won't trigger a scaling action?
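For illustration (not from the original report), one way to see which node-template taints the autoscaler will pick up is to list the tags on the ASG; the ASG name below is a placeholder:

```sh
# List the cluster-autoscaler node-template tags on an ASG.
# "my-gpu-asg" is a placeholder name, not taken from this issue.
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=my-gpu-asg" \
  --query "Tags[?starts_with(Key, 'k8s.io/cluster-autoscaler/node-template/')]"
```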