
No ScaleUp action for a new Pod and existing PVC/PV (AWS EKS, EBS Volume) #4739

Closed
KashifSaadat opened this issue Mar 15, 2022 · 9 comments
Labels: area/cluster-autoscaler, kind/bug

Comments


KashifSaadat commented Mar 15, 2022

Which component are you using?: cluster-autoscaler, helm chart

What version of the component are you using?: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.0, cluster-autoscaler-chart-9.9.2

What k8s version are you using (kubectl version)?: v1.21.5-eks-bc4871b

kubectl version Output
$ kubectl version

Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: Amazon Elastic Kubernetes Service (EKS)

What did you expect to happen?:

I created a PersistentVolumeClaim using the default gp2 StorageClass (EBS volume), and then a simple Deployment referencing the PVC. The Volume was created successfully and the Pod scheduled (cluster-autoscaler brought up a new Node to meet demand). I then scaled the Deployment down to 0, causing the cluster-autoscaler to remove 1 Node as it was no longer needed. On scaling the Deployment back up to 1 replica (whilst the current Nodepool was at max capacity), the cluster-autoscaler should have detected this and scaled up the Nodepool, so the Pod could be scheduled onto a Node and attach the PV.

What happened instead?:

The Pod is stuck in a Pending state.

kubectl get events Output
$ kubectl -n test get events

LAST SEEN TYPE REASON OBJECT MESSAGE
70s Warning FailedScheduling pod/nginx-6bdfccff8f-s5b4k 0/3 nodes are available: 1 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
66s Normal NotTriggerScaleUp pod/nginx-6bdfccff8f-s5b4k pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict
71s Normal SuccessfulCreate replicaset/nginx-6bdfccff8f Created pod: nginx-6bdfccff8f-s5b4k
71s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-6bdfccff8f to 1

kubectl logs -lapp.kubernetes.io/name=aws-cluster-autoscaler Output
I0315 16:44:07.705409       1 static_autoscaler.go:229] Starting main loop
I0315 16:44:07.705978       1 filter_out_schedulable.go:65] Filtering out schedulables
I0315 16:44:07.705999       1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I0315 16:44:07.706132       1 scheduler_binder.go:795] PersistentVolume "pvc-1d8afbd0-6c46-425d-ae75-25dfbd0077cf", Node "ip-10-0-191-168.eu-west-2.compute.internal" mismatch for Pod "test/nginx-6bdfccff8f-s5b4k": no matching NodeSelectorTerms
I0315 16:44:07.706156       1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I0315 16:44:07.706168       1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I0315 16:44:07.706185       1 filter_out_schedulable.go:82] No schedulable pods
I0315 16:44:07.706195       1 klogx.go:86] Pod test/nginx-6bdfccff8f-s5b4k is unschedulable
I0315 16:44:07.706220       1 scale_up.go:364] Upcoming 0 nodes
I0315 16:44:07.706300       1 scheduler_binder.go:775] Could not get a CSINode object for the node "template-node-for-eks-compute-8abebfcd-72a5-97e9-5082-4dcb9b7dbc11-5821563729444175575": csinode.storage.k8s.io "template-node-for-eks-compute-8abebfcd-72a5-97e9-5082-4dcb9b7dbc11-5821563729444175575" not found
I0315 16:44:07.706350       1 scheduler_binder.go:795] PersistentVolume "pvc-1d8afbd0-6c46-425d-ae75-25dfbd0077cf", Node "template-node-for-eks-compute-8abebfcd-72a5-97e9-5082-4dcb9b7dbc11-5821563729444175575" mismatch for Pod "test/nginx-6bdfccff8f-s5b4k": no matching NodeSelectorTerms
I0315 16:44:07.706365       1 scale_up.go:288] Pod nginx-6bdfccff8f-s5b4k can't be scheduled on eks-compute-8abebfcd-72a5-97e9-5082-4dcb9b7dbc11, predicate checking error: node(s) had volume node affinity conflict; predicateName=VolumeBinding; reasons: node(s) had volume node affinity conflict; debugInfo=
I0315 16:44:07.706384       1 scale_up.go:437] No pod can fit to eks-compute-8abebfcd-72a5-97e9-5082-4dcb9b7dbc11
I0315 16:44:07.706417       1 scale_up.go:441] No expansion options
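
(For anyone debugging the same thing: the topology keys the PV actually requires can be read straight off the PV object and compared against the template-node labels in the logs above. The command below is illustrative, using the PV name from these logs.)

$ kubectl get pv pvc-1d8afbd0-6c46-425d-ae75-25dfbd0077cf \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'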

How to reproduce it (as minimally and precisely as possible):

  1. Provision an AWS EKS cluster with the cluster-autoscaler deployed (versions as above)
  2. Ensure the test cluster is fully utilised so that no new workloads can be scheduled without scaling up a Nodepool (or cordon Nodes)
  3. Create a PVC (default gp2 StorageClass)
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-data
    spec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
      - ReadWriteOnce
  4. Create a Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: nginx
      name: nginx
      namespace: test
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - image: nginxinc/nginx-unprivileged
            name: nginx-unprivileged
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: test-data
  5. Wait for the PV and Pod to be successfully created and Running
  6. Scale the Deployment down to 0 replicas, forcing the cluster-autoscaler to reduce the Nodepool size by 1
  7. Scale the Deployment back up to 1 replica, and observe the Pod state, namespace events, and cluster-autoscaler logs (see the example commands below)
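
For reference, steps 6 and 7 amount to the following commands:

$ kubectl -n test scale deployment/nginx --replicas=0
# wait for the cluster-autoscaler to remove the now-empty Node, then:
$ kubectl -n test scale deployment/nginx --replicas=1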

Anything else we need to know?:

I first thought this could be related to our EKS Cluster configuration, Nodepool configuration (tags are all there), cluster-autoscaler args (defaults, all standard) etc. However, if I simply scale the Nodepool manually by 1 instance (or create another workload causing a new Node to be created), the Pod is finally scheduled onto a Node where the EBS volume can be attached and used.
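
(For reference, the manual workaround is just a one-instance bump of the ASG's desired capacity; the ASG name and capacity below are placeholders.)

$ aws autoscaling set-desired-capacity \
    --auto-scaling-group-name <nodepool-asg-name> \
    --desired-capacity <current-desired-capacity + 1>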

Edit: I've also tested this with the latest cluster-autoscaler release for Kubernetes v1.21 (v1.21.1) and get the same issue.

KashifSaadat added the kind/bug label Mar 15, 2022

mmerrill3 commented Apr 9, 2022

@KashifSaadat, I see a similar issue on my clusters as well. In my case, I see PVs getting provisioned with the new topology keys on a k8s 1.21 cluster:

    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.kubernetes.io/region
        operator: In
        values:
        - ca-central-1
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - ca-central-1a

However, the function in aws_manager in CA still uses the old labels when building template nodes for ASGs that are at zero size. These labels are:

LabelFailureDomainBetaZone   = "failure-domain.beta.kubernetes.io/zone"   // deprecated
LabelFailureDomainBetaRegion = "failure-domain.beta.kubernetes.io/region" // deprecated

You can see the labels getting applied for the hypothetical node here:

func buildGenericLabels(template *asgTemplate, nodeName string) map[string]string {

In my case, I see the error message "node(s) had volume node affinity conflict" because the labels don't match.

@mmerrill3

When nodes are created in EKS, they get both the old and the new topology labels. It's just the CA that still uses only the old ones when building a hypothetical node.

Should be an easy fix. Can you confirm this is what is happening to your cluster?
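
One way to check (an illustrative command, not from the original thread) is to list the nodes with both the deprecated and the GA zone labels as columns; real EKS nodes should show a value for both:

$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone,topology.kubernetes.io/zone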

@mmerrill3

aws_manager.go is updated to use the new topology labels in the 1.22 and 1.23 tags of CA, but not in 1.21.x. This commit would fix our issue: 8f11490


KashifSaadat commented Apr 11, 2022

Hey @mmerrill3, nice find, thank you! I increased the log verbosity to see the references to the deprecated labels and confirmed the discrepancy as you've described. I tried this out with cluster-autoscaler v1.22.2 and it did manage to scale up successfully!

I ran into another issue while validating this on my test cluster. I had a single ASG spanning 3 Availability Zones, so when it scaled up, the new Node happened to be in a different AZ from the volume. The Pod could not be scheduled on this Node, so it remained empty until a ScaleDown event, and no further ScaleUp actions were performed. This is already mentioned as a gotcha in the AWS-specific docs, with the recommended approach being one ASG per AZ.
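
(The per-AZ setup from the docs just means registering one ASG per zone with the autoscaler; a rough sketch, with made-up ASG names and sizes:)

./cluster-autoscaler \
    --cloud-provider=aws \
    --nodes=1:10:eks-compute-eu-west-2a \
    --nodes=1:10:eks-compute-eu-west-2b \
    --nodes=1:10:eks-compute-eu-west-2c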

Edit: I'll close this issue as it appears to be correctly resolved from cluster-autoscaler v1.22.0 onwards and there isn't any further work required as far as I'm aware.


Xyaren commented Apr 13, 2022

@KashifSaadat you mentioned you are using k8s 1.21.5.
Do you notice any problems running a 1.22 cluster-autoscaler with that version?

To the maintainers: would this be something that could be backported?

@KashifSaadat

Hey @Xyaren. I haven't noticed any issues myself, but generally prefer to run the component version in line with my cluster version.


debu99 commented Oct 19, 2022

Looks like there is a new label for gp3:

Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [ap-southeast-1a]


just1900 commented Nov 8, 2022

Looks like there is a new label for gp3:

Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [ap-southeast-1a]

Same here; the PV for the EBS volume contains the following nodeSelectorTerms:

      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.ebs.csi.aws.com/zone
          operator: In
          values:
          - us-west-2b

@youwalther65

@just1900 and @debu99: gp3 is only supported via the EBS CSI driver, which applies the new topology labels, unlike the in-tree EBS provisioner used for gp2.
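
For context, a minimal gp3 StorageClass backed by the EBS CSI driver looks roughly like this (illustrative values):

$ kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
EOF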
