CA fails to scale-up or cancel in-progress scale down when there are un-schedulable pods #4456
Comments
I am facing the same issue as @varkey. I also observed that when the existing CA pod is deleted manually, forcing a new CA pod to come up, the pending pods get scheduled.
We've noticed this as well; re-creating the CA pod seems to reset the state somehow. It seems to forget about the node which is being drained, or at least it no longer waits for the remaining termination grace period.
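A minimal sketch of this workaround, assuming the autoscaler runs in kube-system with the label app=cluster-autoscaler (both the namespace and the label are assumptions; adjust them to your deployment):

```sh
# Deleting the running cluster-autoscaler pod forces a restart. After the
# restart, the new pod no longer remembers the in-progress scale-down, so
# pending pods are reconsidered for scale-up.
kubectl -n kube-system delete pod -l app=cluster-autoscaler
```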
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
@x13n: Reopened this issue.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/remove-lifecycle rotten
@varkey: Reopened this issue.
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
fix: make `HasInstance` in aws provider thread-safe (follow-up to kubernetes#5054; might fix kubernetes#4456). Signed-off-by: vadasambar <[email protected]> (cherry picked from commit 1cb55fe)
/remove-lifecycle rotten
/reopen
@chrisyao2022: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0

What k8s version are you using (kubectl version)?:
v1.20.10

What environment is this in?:
KOPS (v1.20.2) based cluster deployed on AWS.
CA is run with the following arguments
The nodes span multiple AZs, but we use separate ASGs for each AZ. Also there are several node groups with different taints so that we can place applications in dedicated node groups.
In the above case, all `node-group-1-xx` node groups have the same taints, instance types, etc.

We use HPA for many of our applications and can dynamically scale the number of replicas up and down. As can be seen below, the scale-up happens every hour, and once the load subsides, the HPA scales the number of replicas back down.
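For context, a rough sketch of the kind of HPA that drives this pattern; the deployment name, target, and bounds are assumptions, not taken from this report:

```sh
# Hypothetical HPA: hourly load spikes push CPU past the target, the HPA adds
# replicas, and the resulting Pending pods are what CA is expected to act on.
kubectl autoscale deployment app-1 --cpu-percent=70 --min=2 --max=50
```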
Due to the nature of our applications, some of them are configured with a termination grace period of up to 3 hours. Since we do not want CA to prematurely terminate a node, we have configured CA with `--max-graceful-termination-sec=10800`, which is the maximum termination grace period across all pods in the cluster.
What did you expect to happen?:
When an HPA scales up a deployment, CA should scale up the appropriate node group without any delay if there is no available capacity. The pods will fit any of the nodes under `node-group-x-az1`, `node-group-x-az2` or `node-group-x-az3`.

What happened instead?:
We noticed that CA does not always trigger a scale up, and the pod remains in a `Pending` state for 20-30 minutes, or at times even more.

In the above case, the pod remained in the `Pending` state for 20 minutes and CA was repeatedly logging messages such as the below. From a kube-scheduler perspective, the log said
Investigating further on the `1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636105837}` message, we noticed there was a scale down in progress which had been initiated around 76 minutes before the new un-schedulable pod was created. The node in question was `ip-10-10-77-101.ec2.internal`, which was the same node that had the `ToBeDeletedByClusterAutoscaler 1636105837` taint. The node also belonged to one of the node groups where the pod could have been scheduled.

CA seems to think that the above node can be used to schedule the un-schedulable pod, but it is actually being deleted due to under-utilisation. Because of the max termination grace period of 3 hours (and because the pods already running on the node failed to gracefully terminate before that), CA had to wait 3 hours for all pods to be removed from the affected node. During this interval, no scale up was triggered by CA even though there were pods that needed to be scheduled.
To summarise, it appears CA will not trigger a scale up in node groups where a scale down is in progress, even if there are `Pending` pods waiting to be scheduled. This may not be a real issue if the termination grace period isn't this long. Still, the current behaviour doesn't seem right: CA thinks the pod is schedulable and does nothing, but there is no capacity, so it should have triggered a scale-up. I am not sure if that is efficient though; perhaps CA should just cancel the in-progress scale down by removing the `ToBeDeletedByClusterAutoscaler` taint, allowing the `Pending` pod to be scheduled immediately? I would think this is more efficient than allowing a new node to launch.

We see this behaviour multiple times a day in our `prod` environments. At times, capacity is made available after a couple of minutes without a scale up, possibly due to a reduction in the number of replicas of other services/pods running on the same node groups. But the fundamental issue of CA failing to scale up or cancel the in-progress scale down is a problem.
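As a stopgap, the cancellation suggested above can be approximated by removing the taint manually; this is a sketch of a workaround rather than a supported cancellation mechanism, and whether CA then fully abandons the drain is not guaranteed:

```sh
# The trailing "-" removes taints with that key, letting the scheduler place
# Pending pods on the node again instead of waiting for a new node to launch.
kubectl taint nodes ip-10-10-77-101.ec2.internal ToBeDeletedByClusterAutoscaler-
```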
How to reproduce it (as minimally and precisely as possible):
This should be reproducible whenever there are `Pending` pods to be scheduled and there is a scale down in progress due to under-utilisation that requires eviction of existing pods. If CA and the pods are configured with a high max termination grace period and the pods take a while to gracefully terminate, CA will not trigger a scale up or cancel the in-progress scale down.

Anything else we need to know?: