
Scale from zero fails to work on k8s >= 1.18 AWS #83

Closed
prashanth26 opened this issue Jun 7, 2021 · 4 comments
Labels
kind/bug (Bug), platform/aws (Amazon web services platform/infrastructure), platform/azure (Microsoft Azure platform/infrastructure), priority/2 (Priority: lower number equals higher priority), status/in-progress (Issue is in progress/work)

Comments

@prashanth26

What happened:

Moving a pod from one worker group node type to another via a node label selector results in the following autoscaler error when the destination worker group is at a minimum size of 0. This was noticed after we upgraded from Kubernetes 1.17 to 1.18 on AWS.

  Normal   NotTriggerScaleUp  18m (x14 over 33m)    cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 Insufficient cpu, 5 Insufficient memory, 1 node(s) had volume node affinity conflict, 4 node(s) didn't match node selector
  • Note, in particular, the volume node affinity conflict.

What you expected to happen:

A new node should be scaled up and the pod assigned to it (the PV and the node are both in the same zone).

How to reproduce it (as minimally and precisely as possible):

  1. A pod with a volume is scheduled onto the aws-r5d-large (small) worker group. The volume binding mode is WaitForFirstConsumer, so the volume is created at the time the pod is scheduled there. The pod is running.
  2. The pod spec is then changed such that the node label selector forces it onto the other worker group node type, aws-r5d-2xlarge (medium); see the sketch after this list.
  3. The pod gets stuck in Pending because no node is available for it to run on. Describing the pod shows the event above repeating.
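
For reference, a minimal sketch of the setup described above, assuming the pool is targeted via the Gardener worker-pool node label (the exact selector label used in the original report is not stated, and the pod/PVC/StorageClass names are placeholders):

```yaml
# StorageClass with delayed binding: the volume is only provisioned once the
# pod is scheduled (step 1).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-wffc                     # placeholder name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
# Pod excerpt after the change in step 2: the nodeSelector now forces the pod
# onto the aws-r5d-2xlarge pool. The label key is an assumption (Gardener sets
# worker.gardener.cloud/pool on its nodes); the PVC is assumed to already exist.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  nodeSelector:
    worker.gardener.cloud/pool: aws-r5d-2xlarge
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc
```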

Anything else we need to know:

Through experimentation we can see:

  1. While the pod is stuck in this Pending state, if we adjust the shoot worker group aws-r5d-2xlarge (medium) to a minimum of 1, the new node is created (to meet the minimum request) and the pod is scheduled and able to run. So there appears to be no real zone conflict preventing the volume from being mounted.
  2. While the pod is stuck in this Pending state, if we schedule another new pod onto an aws-r5d-2xlarge (medium) node, a node is created for that pod and it runs. The autoscaler then also scales up a second node for the originally stuck pod, which is scheduled and able to run.
  3. While the pod is stuck in this Pending state, if we adjust the shoot worker group definition to include two additional zone topology labels, the node is created and the pod is able to run (a shoot spec sketch follows this list). It appears the labels are needed for the scale-up decision but are not visible to the autoscaler when the worker group is at 0.
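
For illustration, the workaround from experiment 3 might look roughly like this in the shoot spec; the exact label keys and zone values used in the experiment are not stated in the report, so these are assumptions:

```yaml
# Shoot spec excerpt (hypothetical values): zone topology labels added to the
# worker pool definition so that they appear on the node template the
# autoscaler builds when the pool is at zero.
spec:
  provider:
    workers:
      - name: aws-r5d-2xlarge
        minimum: 0
        maximum: 3
        zones:
          - eu-west-1a                              # illustrative zone
        labels:
          topology.kubernetes.io/zone: eu-west-1a   # assumed label pair
          topology.ebs.csi.aws.com/zone: eu-west-1a
```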

Environment:
/platform aws
/platform azure

Issue credits: Bryon Hummel

@prashanth26 added the kind/bug label Jun 7, 2021
@gardener-robot added the platform/aws and platform/azure labels Jun 7, 2021
@prashanth26 changed the title from "Scale from zero fails to" to "Scale from zero fails to work on k8s >= 1.18 AWS" Jun 7, 2021
@prashanth26 added the status/in-progress and priority/2 labels Jun 7, 2021
@himanshu-kun

I have tried to reproduce the issue, and the experiments give the results described in the issue. The problem is the missing label topology.ebs.csi.aws.com/zone on the destination node. PVs created by the AWS EBS CSI driver use this label in their node affinity to get attached to a node (see the PV excerpt after the list below). The well-known zone label topology.kubernetes.io/zone is not used because it was not yet well known at the time the CSI driver was released (see kubernetes-sigs/aws-ebs-csi-driver#729 (comment)).
The cluster autoscaler doesn't know about the AWS EBS label, so we can either add the label manually to the worker pools through the shoot spec, or go with one of the following solutions (credits @prashanth26):

  1. Update the cluster autoscaler code to add this label (for AWS and Azure) during scale-up calculations. This would be ugly, though.
  2. Update the extension providers to add this label on worker pool creation.
  3. The community mentions a way to ignore some labels during the calculations. However, I don't fully understand how this would play together; I will try to spend some time over the week to look into it.
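
To illustrate why the missing label matters, here is an excerpt of a PV as provisioned by the AWS EBS CSI driver (names and values are illustrative): the node affinity is keyed on the CSI zone label, which the node template the autoscaler builds for a pool at zero does not carry.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234abcd                      # illustrative
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # illustrative
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - eu-west-1a
```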

@himanshu-kun

himanshu-kun commented Jun 15, 2021

I have dug into the third solution mentioned in the last comment and referred to kubernetes#3230 (comment). I used v0.10.x of the CSI driver both with and without the flag --balancing-ignore-label=topology.ebs.csi.aws.com/zone, but it didn't solve the problem.

With CSI driver v0.10.x and v1.0.0 (though not v1.1.0), the well-known topology label is now part of the PV's nodeAffinity alongside the AWS EBS label, and according to kubernetes-sigs/aws-ebs-csi-driver#729 (comment) that should solve our problem by prioritizing the well-known label, but it doesn't.

Also, the flag --balancing-ignore-label only ignores labels when comparing whether two node groups are similar or not (see the deployment excerpt below).
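
For completeness, this is roughly how the flag was passed for the experiment (cluster-autoscaler deployment excerpt; the image tag and the other flags shown are illustrative):

```yaml
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0     # tag illustrative
    command:
      - ./cluster-autoscaler
      - --v=2
      # The label is ignored only for the node-group similarity comparison,
      # which is why this did not help with scale from zero.
      - --balancing-ignore-label=topology.ebs.csi.aws.com/zone
```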

@prashanth26 (Author)

@amshuman-kr - Which do you think would be the better approach between (1) and (2) above, then?

@prashanth26 (Author)

Fixed with gardener/gardener-extension-provider-aws#365.
/close
