Scale from zero fails to work on k8s >= 1.18 AWS #83
Comments
I have tried to reproduce the issue, and the experimentation gives the results described in the issue. The problem is due to a missing label.
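For context, here is a hypothetical sketch of the zone labels typically involved on an AWS node. All label names, the node name, and the zone values below are assumptions for illustration, not quotes from this issue:

```yaml
# Hypothetical AWS node showing the zone-related labels (names/values are assumptions).
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.eu-west-1.compute.internal      # example node name
  labels:
    failure-domain.beta.kubernetes.io/zone: eu-west-1a   # legacy in-tree zone label
    topology.kubernetes.io/zone: eu-west-1a              # well-known topology label
    topology.ebs.csi.aws.com/zone: eu-west-1a            # label added by the EBS CSI node plugin
```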
I have dug into the 3rd solution discussed in the last comment and referred to in kubernetes#3230 (comment). I used v0.10.x for the CSI driver, both with and without the flag. After using v0.10.x and v1.0.0 (not v1.1.0, though) for the CSI driver, the well-known topology label is now part of the nodeAffinity in the PV along with the AWS EBS label, and according to kubernetes-sigs/aws-ebs-csi-driver#729 (comment) that should solve our problem by prioritizing the well-known label, but it doesn't. Also, the flag
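A rough sketch of the PV nodeAffinity described above (the PV name and zone value are made up, and other PV fields are omitted). Both the well-known label and the EBS CSI label appear, yet scale from zero still fails:

```yaml
# Illustrative PV nodeAffinity after provisioning with CSI driver v0.10.x / v1.0.0.
# Names and zone values are assumptions; only the nodeAffinity part is shown.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123abcd                              # hypothetical PV name
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone        # well-known topology label
          operator: In
          values:
          - eu-west-1a
        - key: topology.ebs.csi.aws.com/zone      # AWS EBS CSI driver label
          operator: In
          values:
          - eu-west-1a
```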
@amshuman-kr - What do you think would be a good approach between (1) and (2) above, then?
Fixed with gardener/gardener-extension-provider-aws#365. |
What happened:
Moving a pod from one worker group node type to another via a node label selector resulted in the following autoscaler error when the destination worker group is at a minimum size of 0. This was noticed after we upgraded from k8s 1.17 -> 1.18 on AWS.
What you expected to happen:
The new node should be scaled up and the pod assigned to it (the PV and the node are both in the same zone).
How to reproduce it (as minimally and precisely as possible):
- A pod with a PV is running in the aws-r5d-large (small) worker group. The volume binding mode is WaitForFirstConsumer and the volume is created at the time the pod is scheduled there. The pod is running. (A minimal manifest sketch follows this list.)
- The pod is then moved via node label selector to the aws-r5d-2xlarge (medium) worker group, which has a minimum size of 0. The pod stays Pending because no node is available for it to run. Describing the pod yields the event above repeating.
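A minimal manifest sketch of the setup above; the StorageClass name, PVC name, image, and the worker-group label key are assumptions for illustration:

```yaml
# Hypothetical StorageClass with late binding, as described in the repro steps.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-wffc                                  # hypothetical name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
# Hypothetical pod pinned to the medium worker group via a node label selector.
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  nodeSelector:
    worker.gardener.cloud/pool: aws-r5d-2xlarge   # assumed label key for the worker group
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: demo-pvc                         # hypothetical PVC using the class above
```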
Anything else we need to know:
Through experimentation we can see:
- If we change the aws-r5d-2xlarge (medium) worker group to a minimum of 1, the new node is created (to meet the minimum request) and the pod is scheduled and able to run. This means there appears to be no real zone conflict preventing mounting.
- If we schedule another pod onto an aws-r5d-2xlarge (medium) node, the node is created for this pod and it runs. The autoscaler then also scales up a 2nd node for the originally stuck pending pod (and it is scheduled and able to run).
Environment:
/platform aws
/platform azure
Issue credits: Bryon Hummel