Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max Node Provision Time + Priority Expander + Node Unavailability #3490

Closed
nitrag opened this issue Sep 4, 2020 · 18 comments
Closed

Max Node Provision Time + Priority Expander + Node Unavailability #3490

nitrag opened this issue Sep 4, 2020 · 18 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@nitrag
Copy link
Contributor

nitrag commented Sep 4, 2020

I have an issue where we did not scale up because spot instances were not available.

AWS EKS: 1.15
Cluster-Autoscaler: 1.15.7
Expander: priority

I have configured --max-node-provision-time="5m0s". The expected behavior would be that after 5 minutes, the node group is marked as unavailable and the next highest priority group is chosen. The logs below shot it's being ignored. IThe default value for --max-node-provision-time is 15 minutes but I see cluster-autoscaler recognizing the node group unhealthy at 10 minutes. Also, why was the spot not group (marked?) unhealthy yet priority expander still chooses it later on? What is the purpose of the i-placeholder instances?

Here are the logs:

--max-node-provision-time="5m0s"  <<confirmed in first 50 lines of pod startup>>

...

2020-09-02T11:22:40.246-04:00
{"log":"I0902 15:22:40.246183 1 scale_up.go:699] Scale-up: setting group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot size to 1

<< nothing 5 minutes later 11:27:xx >>

2020-09-02T11:32:08.458-04:00
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
Failed to find readiness information for pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot
...<< continuous logs starting 11:32:08 through 11:33:20 >>

2020-09-02T11:33:10.351-04:00
{"log":"W0902 15:33:10.351097 1 clusterstate.go:612] Readiness for node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot not found

<< 10 minutes later?? >>

2020-09-02T11:33:10.353-04:00
{"log":"W0902 15:33:10.353075 1 scale_up.go:326] Node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot is not ready for scaleup - unhealthy


2020-09-02T11:33:10.833-04:00
{"log":"I0902 15:33:10.831490 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0


2020-09-02T11:35:25.090-04:00
{"log":"I0902 15:35:25.086715 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:35:56.010-04:00
{"log":"I0902 15:35:55.995416 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:35:56.505-04:00
{"log":"I0902 15:35:56.505028 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:36:57.919-04:00
{"log":"I0902 15:36:57.917399 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:38:00.028-04:00
{"log":"I0902 15:38:00.025189 1 auto_scaling_groups.go:407] Instance group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot has only 0 instances created while requested count is 1. Creating placeholder instance with ID i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0.

2020-09-02T11:38:20.823-04:00
{"log":"I0902 15:38:20.823195 1 utils.go:494] Removing unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:20.823-04:00
{"log":"I0902 15:38:20.823334 1 auto_scaling_groups.go:267] instance i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0 is detected as a placeholder, decreasing ASG requested size instead of deleting instance

2020-09-02T11:38:21.011-04:00
{"log":"I0902 15:38:20.823348 1 auto_scaling_groups.go:221] Setting asg pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot size to 0

2020-09-02T11:38:26.107-04:00
{"log":"I0902 15:38:21.289446 1 event.go:258] Event(v1.ObjectReference{Kind:\"ConfigMap\", Namespace:\"kube-system\", Name:\"cluster-autoscaler-status\", UID:\"b055dfc9-ed07-44eb-b3de-df4798d0716b\", APIVersion:\"v1\", ResourceVersion:\"121379234\", FieldPath:\"\"}): type: 'Normal' reason: 'DeleteUnregistered' Removed unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0\n

2020-09-02T11:38:31.559-04:00
{"log":"W0902 15:38:31.558973 1 clusterstate.go:577] Nodegroup is nil for aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:31.559-04:00
{"log":"I0902 15:38:31.559051 1 utils.go:494] Removing unregistered node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0

2020-09-02T11:38:31.559-04:00
{"log":"W0902 15:38:31.559062 1 utils.go:501] No node group for node aws:///us-east-1a/i-placeholder-pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot-0


2020-09-02T11:38:31.562-04:00
{"log":"W0902 15:38:31.561377 1 priority.go:91] Priority expander: node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge found with score 10


2020-09-02T11:38:31.562-04:00
{"log":"W0902 15:38:31.561392 1 priority.go:91] Priority expander: node group pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot found with score 40


2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561415 1 scale_up.go:427] Best option to resize: pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot


2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561427 1 scale_up.go:431] Estimated 1 nodes needed in pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot

2020-09-02T11:38:31.562-04:00
{"log":"I0902 15:38:31.561446 1 scale_up.go:533] Final scale-up plan: [{pv-platform-prod-worker-gpu-accelerated-p2-xlarge-spot 0->1

<< at this point ec2 spot is successful in finding an instance (we got lucky) but our use case we can't take longer than 5 minutes to scale up a node and must fallback to non-spot (the lower priority node group) >>
@daganida88
Copy link

I encountered the same issue.
Did you resolve this??

@nitrag
Copy link
Contributor Author

nitrag commented Sep 24, 2020

@daganida88 No unfortunately. But thanks for confirming it’s not just me. Maybe now this will get some attention.

@kavicse87
Copy link

Any status on this ? I'm facing the same issue.

@4pits
Copy link

4pits commented Oct 28, 2020

Well for me it worked.

See Sample logs

I1028 05:44:57.187621 1 static_autoscaler.go:303] Filtering out schedulables I1028 05:44:57.187676 1 static_autoscaler.go:320] No schedulable pods I1028 05:44:57.187691 1 scale_up.go:263] Pod default/monte-carlo-pi-service-75b6457d79-gx6wf is unschedulable I1028 05:44:57.187727 1 scale_up.go:300] Upcoming 0 nodes

I1028 05:45:07.367756 1 scale_up.go:263] Pod default/monte-carlo-pi-service-75b6457d79-gx6wf is unschedulable I1028 05:45:07.367869 1 scale_up.go:300] Upcoming 1 nodes

And then after 5 mins

W1028 05:50:19.239644 1 scale_up.go:325] Node group eksctl-eksworkshop-eksctl-nodegroup-Special-NodeGroup-M6YPOMZGXQAF is not ready for scaleup - backoff

And it then starts a node from next one from Priority Expander

I1028 05:52:20.714525 1 scale_up.go:300] Upcoming 1 nodes I1028 05:52:20.714546 1 scale_up.go:338] Skipping node group eks-72bab455-6bfe-d81f-55d8-8596f99d207d - max size reached I1028 05:52:20.714616 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Large-NodeGroup-1RNXIXEQXPVYJ I1028 05:52:20.714665 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Medium-NodeGroup-1MKUB7WL0ZSTV I1028 05:52:20.714718 1 scale_up.go:408] No need for any nodes in eksctl-eksworkshop-eksctl-nodegroup-Small-NodeGroup-1P7ZJ9QGOF34V W1028 05:52:20.714727 1 scale_up.go:325] Node group eksctl-eksworkshop-eksctl-nodegroup-Special-NodeGroup-M6YPOMZGXQAF is not ready for scaleup - backoff

I1028 05:53:11.047588 1 utils.go:147] Pod monte-carlo-pi-service-75b6457d79-gx6wf marked as unschedulable can be scheduled on ip-192-168-74-245.us-west-2.compute.internal. Ignoring in scale up. I1028 05:53:11.047608 1 static_autoscaler.go:317] Schedulable pods present I1028 05:53:11.047623 1 static_autoscaler.go:328] No unschedulable pods I1028 05:53:11.047635 1 static_autoscaler.go:375] Calculating unneeded node

Now my Pod is scheduled.

@nitrag
Copy link
Contributor Author

nitrag commented Oct 28, 2020

@4pits what version of EKS and cluster-autoscaler are you using?

How did you get scaleup backoff to be 5 minutes? I believe it is 15 minutes by default. Did you change max-node-provision-time?

@MaciekPytel
Copy link
Contributor

Scale-up backoff always starts at 5 minutes and grows exponentially up to 30 minutes [1].

Regarding i-placeholder instances - CA operates on individual nodes, not number of nodes. When it scales-up it tracks the status and timeouts individually for each node and placeholder instances are needed to allow that. You can find more details in discussions on #2008.

[1] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/clusterstate/clusterstate.go#L54

@nitrag
Copy link
Contributor Author

nitrag commented Oct 29, 2020

@MaciekPytel

I don't see any configuration variables that would allow us to customize these variables. Here

How would I accomplish this: I have two priority groups. The first priority fails to scale up within 5 minutes, give up on it, mark it as failed, replace it with the next highest priority, don't attempt again for 2 hours. Will I have to edit the source and build manually?

@MaciekPytel
Copy link
Contributor

I think so, as you say those values are consts and can't currently be changed via flags.

@4pits
Copy link

4pits commented Oct 30, 2020

Yes @nitrag
I used Kubernetes version 1.17 and eks version 0.30
--max-node-provision-time=5m0s

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2021
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2021
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tarun-asthana
Copy link

@nitrag I faced the same issue. Were you also facing a spot interruption in your fleet?

@nitrag
Copy link
Contributor Author

nitrag commented May 19, 2022

We haven't seen this since 1.19+ and even before then it was rare.

@tarun-asthana
Copy link

@nitrag I am using v1.18.2 on Kubernetes 1.18. It was spot interruption which I confirmed from AWS Support. Shouldn't cluster autoscalar not use an ASG which it has marked unhealthy ? In my case it was still using it.

@tarun-asthana
Copy link

Created a separate bug for the same - #4900

@charlesritchea
Copy link

Heyyy just happened again on the same cluster as OP on 1.28 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

No branches or pull requests

9 participants