Scale up Windows on AWS EKS cluster #3133
Comments
Can confirm on EKS 1.16. But if you never had any instance, it doesn't work. I haven't tried waiting a few days after downscaling to 0; it may stop working again.
I have the same issue with 1.17.
Did you add the labels as ASG tags?
This should be resolved in the last release. #2888
/assign @Jeffwan
@Jeffwan yes, it scales up if you already have at least one node up.
Scale from 0 should work as well. Could you share your ASG tags?
@Jeffwan Here are my tags:
Name: eks-windows-node-1a-Node
alpha.eksctl.io/cluster-name: eks
alpha.eksctl.io/eksctl-version: 0.24.0
alpha.eksctl.io/nodegroup-name: windows-node-1a
alpha.eksctl.io/nodegroup-type: unmanaged
eksctl.cluster.k8s.io/v1alpha1/cluster-name: eks
eksctl.io/v1alpha2/nodegroup-name: windows-node-1a
k8s.io/cluster-autoscaler/eks: owned
k8s.io/cluster-autoscaler/enabled: true
k8s.io/cluster-autoscaler/node-template/label/windows-node: 1a
k8s.io/cluster-autoscaler/node-template/taint/windows: true:NoSchedule
kubernetes.io/cluster/eks: owned
As you can see, I also have taints on these nodes.
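For context, a minimal sketch (not from the thread; all names are illustrative) of a workload whose nodeSelector and toleration match the node-template/label and node-template/taint tags above. When the group is at zero, CA builds a template node from these tags, so a pending pod has to match them for a scale-up to be triggered.

```sh
# Hypothetical test pod; name, image, and values are illustrative.
# The nodeSelector mirrors the node-template/label tag (windows-node: 1a)
# and the toleration mirrors the node-template/taint tag (windows=true:NoSchedule),
# so CA's simulated template node can accept the pod when the ASG is at 0.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: windows-scale-test
spec:
  nodeSelector:
    windows-node: "1a"
  tolerations:
    - key: windows
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: pause
      image: mcr.microsoft.com/oss/kubernetes/pause:3.9
EOF
```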
Can you add these tags? Otherwise CA won't know your node has ENI and IP addresses. Please check #2888 (comment) for more details.
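As a hedged illustration of that suggestion (the ASG name, the value, and the exact resource key are assumptions based on the #2888 discussion, not confirmed here), such a tag could be added with the AWS CLI:

```sh
# Sketch only: replace the ResourceId with your ASG name and adjust the
# value to the number of PrivateIPv4Address resources a node provides.
# The node-template/resources/... tag tells CA what allocatable resources
# a node in this group would advertise when the group is scaled to zero.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=eks-windows-node-1a,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address,Value=10"
```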
@Jeffwan that didn't help.
After I scaled the ASG up to one manually and added more workflows, the autoscaler scales it up successfully:
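For reference, a minimal sketch of that manual step (the ASG name is a placeholder):

```sh
# Bring the group from 0 to 1 by hand; once a real node has joined,
# CA has an actual node to base further scale-ups on.
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name eks-windows-node-1a \
  --desired-capacity 1
```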
Did you restart your CA or wait for a while after you applied the tag changes?
@Jeffwan yes, I did:
@iusergii One last thing: which patch version are you using?
@Jeffwan sorry, I didn't get that.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@dschunack: You can't reopen an issue/PR unless you authored it or you are a collaborator.
Hi, the problem still exists on EKS 1.17 and 1.18. It is not solved yet.
Please reopen the issue. /reopen
@dschunack: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@chmielas: Reopened this issue.
@dschunack Issue has been reopened.
Any news?
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue.
/reopen
@dschunack: You can't reopen an issue/PR unless you authored it or you are a collaborator.
The problem still exists; please reopen the issue.
I commented in June and can confirm that when you scale down to zero and wait some time (not sure how much), it stops working again. The only solution is either setting the minimum to 1 or scaling up manually from zero every time.
This is not really a solution; a better one could be to add the stable APIs, as described in my other issue #3802.
/reopen
@chmielas: Reopened this issue.
@dschunack Issue has been reopened.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue.
For those who are still hitting this: explicitly specify the amount of allocatable resources.
Tested with cluster-autoscaler v1.23.
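A quick, hedged way to sanity-check such settings (the node name is a placeholder): compare a running Windows node's allocatable resources against the values declared in the node-template tags.

```sh
# Print the node's allocatable resources; the values should roughly match
# the k8s.io/cluster-autoscaler/node-template/resources/... ASG tags,
# otherwise scale-from-zero simulations will be based on the wrong numbers.
kubectl get node <windows-node-name> -o jsonpath='{.status.allocatable}{"\n"}'
```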
Hi,
I'm using Kubernetes based on EKS 1.15 with a Windows node group, the VPC controller and webhook, and the cluster autoscaler:
cluster-autoscaler v1.15.6
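For completeness, a hedged sketch of the flags such a setup typically runs with (the cluster name "eks" is taken from the ASG tags shown earlier in the thread; the rest is a standard auto-discovery configuration, not the exact deployment used here):

```sh
# Typical cluster-autoscaler invocation with ASG auto-discovery on AWS;
# the tag filter matches the k8s.io/cluster-autoscaler/enabled and
# k8s.io/cluster-autoscaler/eks tags present on the Windows ASG.
./cluster-autoscaler \
  --cloud-provider=aws \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks \
  --balance-similar-node-groups \
  --skip-nodes-with-system-pods=false
```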
The problem that I have is similar to #2888
When the ASG needs to be scaled from 0 to 2 instances after a couple of days of inactivity, the autoscaler doesn't trigger a scale-up.
The workaround is to set the minimum size of the ASG to 1. In that case, the autoscaler has no problem with scale-up and scale-down.
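A hedged sketch of that workaround with the AWS CLI (the ASG name is a placeholder):

```sh
# Keep at least one Windows node running so CA never has to scale the
# group up from zero.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name eks-windows-node-1a \
  --min-size 1
```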
After updating to v1.15.6, the problem still occurs.
Here is the pod output:
and some logs from the autoscaler: