
Scale up from 0 does not work with existing AWS EBS CSI PersistentVolume #3845

Closed
Xyaren opened this issue Jan 25, 2021 · 26 comments · Fixed by #6090
Labels: area/cluster-autoscaler, kind/bug, lifecycle/rotten


Xyaren commented Jan 25, 2021

Which component are you using?:

  • cluster-autoscaler

What version of the component are you using?:

  • v1.18.3 (also happened with v1.18.2)
Cluster-Autoscaler Deployment YAML
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::AWS_ACCOUNT_ID_OMMITTED:role/mycompany-iam-k8s-cluster-autoscaler-test
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "services"
      - "replicationcontrollers"
      - "persistentvolumeclaims"
      - "persistentvolumes"
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resourceNames: ["cluster-autoscaler"]
    resources: ["leases"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8085'
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: cluster-critical
      containers:
        - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3 #Major & Minor should match cluster version: https://docs.aws.amazon.com/de_de/eks/latest/userguide/cluster-autoscaler.html#ca-deploy
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mycompany-test-eks
            - --ignore-daemonsets-utilization=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --balance-similar-node-groups=false
            - --min-replica-count=0
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"

Component version:

What k8s version are you using (kubectl version)?:

kubectl version Output
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

What did you expect to happen?:
I have an ASG dedicated to a single CronJob that gets triggered 6 times a day.
That ASG is pinned to a specific AWS AZ by its assigned subnet.
The CronJob is pinned to that specific ASG by affinity + toleration.
The job uses a PV that is provisioned (AWS EBS) on the first-ever run and then reused on each subsequent run.
I expect the ASG to be scaled up to 1 after the Pod gets created, and scaled back down shortly after the Pod/Job has finished.

What happened instead?:

The ASG will not be scaled up by the cluster-autoscaler.

cluster-autoscaler log output after the Job is created and the Pod is pending
2021-01-25T05:19:22.523Z : Starting main loop			
2021-01-25T05:19:22.524Z : "Found multiple availability zones for ASG "mycompany-test-eks-myapp-elastic-group-1-20210108154118845300000003"	 using eu-central-1a"		
2021-01-25T05:19:22.525Z : "Found multiple availability zones for ASG "mycompany-test-eks-myapp-worker-group-2-20201029130225136800000004"	 using eu-central-1a"		
2021-01-25T05:19:22.525Z : "Found multiple availability zones for ASG "mycompany-test-eks-worker-group-1-20201029130715836900000005"	 using eu-central-1a"		
2021-01-25T05:19:22.526Z : Filtering out schedulables			
2021-01-25T05:19:22.526Z : 0 pods marked as unschedulable can be scheduled.			
2021-01-25T05:19:22.526Z : No schedulable pods			
2021-01-25T05:19:22.526Z : Pod myapp-masterdata/masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw is unschedulable			
2021-01-25T05:19:22.526Z : Upcoming 0 nodes			
2021-01-25T05:19:22.526Z : Skipping node group mycompany-test-eks-myapp-elastic-group-1-20210108154118845300000003 - max size reached			
2021-01-25T05:19:22.526Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-elastic-group-2-20201029130715759300000004, predicate checking error: node(s) didn't match node selector	 predicateName=NodeAffinity	 reasons: node(s) didn't match node selector	 debugInfo="
2021-01-25T05:19:22.526Z : No pod can fit to mycompany-test-eks-myapp-elastic-group-2-20201029130715759300000004			
2021-01-25T05:19:22.526Z : "Could not get a CSINode object for the node "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836": csinode.storage.k8s.io "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836" not found"			
2021-01-25T05:19:22.527Z : "PersistentVolume "pvc-ef85dcce-e63e-42da-b869-c3389bbd948d", Node "template-node-for-mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003-8426967936887117836" mismatch for Pod "myapp-masterdata/masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw": No matching NodeSelectorTerms"			
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003, predicate checking error: node(s) had volume node affinity conflict	 predicateName=VolumeBinding	 reasons: node(s) had volume node affinity conflict	 debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-myapp-masterdata-import-20210120105639236000000003			
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-myapp-worker-group-120200916154409048800000006, predicate checking error: node(s) didn't match node selector	 predicateName=NodeAffinity	 reasons: node(s) didn't match node selector	 debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-myapp-worker-group-120200916154409048800000006			
2021-01-25T05:19:22.527Z : Skipping node group mycompany-test-eks-myapp-worker-group-2-20201029130225136800000004 - max size reached			
2021-01-25T05:19:22.527Z : Skipping node group mycompany-test-eks-worker-group-1-20201029130715836900000005 - max size reached			
2021-01-25T05:19:22.527Z : "Pod masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw can't be scheduled on mycompany-test-eks-worker-group-220200916162252020100000006, predicate checking error: node(s) didn't match node selector	 predicateName=NodeAffinity	 reasons: node(s) didn't match node selector	 debugInfo="
2021-01-25T05:19:22.527Z : No pod can fit to mycompany-test-eks-worker-group-220200916162252020100000006			
2021-01-25T05:19:22.527Z : No expansion options			
2021-01-25T05:19:22.527Z : Calculating unneeded nodes			
[...]
2021-01-25T05:19:22.528Z : Scale-down calculation: ignoring 2 nodes unremovable in the last 5m0s			
2021-01-25T05:19:22.528Z : Scale down status: unneededOnly=false lastScaleUpTime=2021-01-25 05:00:14.980160831 +0000 UTC m=+6970.760701246 lastScaleDownDeleteTime=2021-01-25 03:04:22.928996296 +0000 UTC m=+18.709536671 lastScaleDownFailTime=2021-01-25 03:04:22.928996376 +0000 UTC m=+18.709536751 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false			
2021-01-25T05:19:22.528Z : Starting scale down			
2021-01-25T05:19:22.528Z : No candidates for scale down			
2021-01-25T05:19:22.528Z : "Event(v1.ObjectReference{Kind:"Pod", Namespace:"myapp-masterdata", Name:"masterdata-import-cronjob-lambda-d0ad7add-e9b0-424e-94dc-0wbrzw", UID:"97956c38-55f3-4749-ab74-7e7fc674e832", APIVersion:"v1", ResourceVersion:"217276797", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 max node group size reached, 3 node(s) didn't match node selector, 1 node(s) had volume node affinity conflict"			
2021-01-25T05:19:22.946Z : k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309: Watch close - *v1beta1.PodDisruptionBudget total 0 items received			
2021-01-25T05:19:32.542Z : Starting main loop			

Anything else we need to know?:
Basically, this works fine without the volume.
With the volume, it works when the volume is not provisioned yet, but fails when it has already been provisioned.
The job also gets scheduled right away when I manually scale up the ASG.

I noticed the volume node affinity on the PV:

Node Affinity:                                                                                                                                │
  Required Terms:                                                                                                                             │
    Term 0:        topology.ebs.csi.aws.com/zone in [eu-central-1b] 

That label is probably set on the node by the "ebs-csi-node" DaemonSet and is therefore unknown to the cluster-autoscaler.

Am I expected to tag the ASG with k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone?
If so, how am I supposed to set it for multi-AZ ASGs?

Possibly related: #3230

@Xyaren Xyaren added the kind/bug Categorizes issue or PR as related to a bug. label Jan 25, 2021
@Xyaren Xyaren changed the title Scale from 0 does not work with existing AWS EBS CSI PersistentVolume Scale up from 0 does not work with existing AWS EBS CSI PersistentVolume Jan 25, 2021

westernspion commented Feb 4, 2021

Same problem here (edited after realizing there is no relevant difference between my previous post and what you wrote).

After doing some spelunking, I think you are correct: it has something to do with scaling from 0, the use of the topology.ebs.csi.aws.com/zone label, and the ability of the autoscaler to recognize it. Some experimentation corroborates this.


westernspion commented Feb 5, 2021

Tagging the ASG with k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone is the approach I am taking, and it works like a charm.

I can do some footwork in Terraform to get the tags set up. I'm not sure what you're using to provision your cluster.

Still, it would be nice to have the labels generated from the list of AZs assigned to an ASG.
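
For anyone wondering what that looks like in practice, here is a minimal CLI sketch (the ASG name and zone are placeholders; whether the tag is set via Terraform or the AWS CLI, the key and value are the same):

    # Tag a single-AZ ASG so cluster-autoscaler knows which zone a node
    # launched from 0 would get from the EBS CSI driver.
    aws autoscaling create-or-update-tags --tags \
      "ResourceId=my-single-az-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone,Value=eu-central-1b,PropagateAtLaunch=false"

PropagateAtLaunch can stay false here: cluster-autoscaler only needs to read the tag on the ASG itself, and the EBS CSI node plugin labels the node once it joins.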

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2021
@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 18, 2021
@mparikhcloudbeds

How do we resolve this issue for StatefulSet deployments attached to custom storage classes on EKS?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 14, 2021

FarhanSajid1 commented Jan 4, 2022

How do we resolve this issue for StatefulSet deployments attached to custom storage classes on EKS?

So just set

k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone: "us-east-2a"

for example? Like the OP mentions, how are we supposed to do this for multiple AZs?


iomarcovalente commented Mar 8, 2022

I have this exact problem too. To add further info, the error I get on the pod that is unable to scale from zero is:
pod didn't trigger scale-up (it wouldn't fit if a new node is added): 3 node(s) had volume node affinity conflict


jbg commented Mar 29, 2022

@FarhanSajid1 you should have one node group (and thus one ASG) for each AZ. The above tag needs to be applied to the ASG.
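
For example, a sketch assuming three single-AZ ASGs with placeholder names, tagging each group with both the generic zone label and the EBS CSI zone label (PVs provisioned by the EBS CSI driver pin to topology.ebs.csi.aws.com/zone):

    # One node group (and ASG) per AZ, each tagged with its own zone.
    for az in us-east-2a us-east-2b us-east-2c; do
      aws autoscaling create-or-update-tags --tags \
        "ResourceId=my-nodegroup-${az},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone,Value=${az},PropagateAtLaunch=false" \
        "ResourceId=my-nodegroup-${az},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone,Value=${az},PropagateAtLaunch=false"
    done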

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 27, 2022

decipher27 commented Sep 20, 2022

Hi Folks! Facing the same issue:
CA Version: v1.21.1
aws-ebs-csi-driver version: v1.10.0-eksbuild.1

Cluster-autoscaler logs:

I0920 17:30:00.585954       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-173-251.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-173-251.ap-south-1.compute.internal" not found
I0920 17:30:00.586008       1 scheduler_binder.go:823] PersistentVolume "pvc-50c002d3-a5cc-4143-adf2-1362d18fc40e", Node "ip-10-121-173-251.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.586074       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-68-79.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-68-79.ap-south-1.compute.internal" not found
I0920 17:30:00.586107       1 scheduler_binder.go:823] PersistentVolume "pvc-31af46c4-0d27-4eea-8ef6-148bbb2b4f0b", Node "ip-10-121-68-79.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.586149       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-162-179.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-162-179.ap-south-1.compute.internal" not found
I0920 17:30:00.586172       1 scheduler_binder.go:823] PersistentVolume "pvc-50c002d3-a5cc-4143-adf2-1362d18fc40e", Node "ip-10-121-162-179.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.586247       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-241-242.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-241-242.ap-south-1.compute.internal" not found
I0920 17:30:00.586275       1 scheduler_binder.go:823] PersistentVolume "pvc-50c002d3-a5cc-4143-adf2-1362d18fc40e", Node "ip-10-121-241-242.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.586328       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-5-204.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-5-204.ap-south-1.compute.internal" not found
I0920 17:30:00.586350       1 scheduler_binder.go:823] PersistentVolume "pvc-31af46c4-0d27-4eea-8ef6-148bbb2b4f0b", Node "ip-10-121-5-204.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.586533       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-173-251.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-173-251.ap-south-1.compute.internal" not found
I0920 17:30:00.586572       1 scheduler_binder.go:823] PersistentVolume "pvc-0c9887c2-eea3-4ef7-baae-c4c0aca78699", Node "ip-10-121-173-251.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.586622       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-68-79.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-68-79.ap-south-1.compute.internal" not found
I0920 17:30:00.586663       1 scheduler_binder.go:823] PersistentVolume "pvc-df590cf4-a584-4842-9842-9629312c0e45", Node "ip-10-121-68-79.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.586711       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-162-179.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-162-179.ap-south-1.compute.internal" not found
I0920 17:30:00.586737       1 scheduler_binder.go:823] PersistentVolume "pvc-0c9887c2-eea3-4ef7-baae-c4c0aca78699", Node "ip-10-121-162-179.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.586802       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-241-242.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-241-242.ap-south-1.compute.internal" not found
I0920 17:30:00.586827       1 scheduler_binder.go:823] PersistentVolume "pvc-0c9887c2-eea3-4ef7-baae-c4c0aca78699", Node "ip-10-121-241-242.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.586869       1 scheduler_binder.go:803] Could not get a CSINode object for the node "ip-10-121-5-204.ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-121-5-204.ap-south-1.compute.internal" not found
I0920 17:30:00.586907       1 scheduler_binder.go:823] PersistentVolume "pvc-df590cf4-a584-4842-9842-9629312c0e45", Node "ip-10-121-5-204.ap-south-1.compute.internal" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.586929       1 filter_out_schedulable.go:170] 0 pods were kept as unschedulable based on caching
I0920 17:30:00.586938       1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I0920 17:30:00.586952       1 filter_out_schedulable.go:82] No schedulable pods
I0920 17:30:00.586966       1 klogx.go:86] Pod kafka/kafka-0 is unschedulable
I0920 17:30:00.586972       1 klogx.go:86] Pod kafka/kafka-1 is unschedulable
I0920 17:30:00.587014       1 scale_up.go:376] Upcoming 0 nodes
I0920 17:30:00.587153       1 scheduler_binder.go:803] Could not get a CSINode object for the node "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083": csinode.storage.k8s.io "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083" not found
I0920 17:30:00.587188       1 scheduler_binder.go:823] PersistentVolume "pvc-31af46c4-0d27-4eea-8ef6-148bbb2b4f0b", Node "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083" mismatch for Pod "kafka/kafka-0": no matching NodeSelectorTerms
I0920 17:30:00.587210       1 scale_up.go:300] Pod kafka-0 can't be scheduled on eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09, predicate checking error: node(s) had volume node affinity conflict; predicateName=VolumeBinding; reasons: node(s) had volume node affinity conflict; debugInfo=
I0920 17:30:00.587316       1 scheduler_binder.go:803] Could not get a CSINode object for the node "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083": csinode.storage.k8s.io "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083" not found
I0920 17:30:00.587361       1 scheduler_binder.go:823] PersistentVolume "pvc-df590cf4-a584-4842-9842-9629312c0e45", Node "template-node-for-eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09-6789034556239763083" mismatch for Pod "kafka/kafka-1": no matching NodeSelectorTerms
I0920 17:30:00.587386       1 scale_up.go:300] Pod kafka-1 can't be scheduled on eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09, predicate checking error: node(s) had volume node affinity conflict; predicateName=VolumeBinding; reasons: node(s) had volume node affinity conflict; debugInfo=
I0920 17:30:00.587417       1 scale_up.go:449] No pod can fit to eks-atlan-node-kafka-pod-spot-20220920151848482300000005-36c1adbd-7aef-51ce-830e-d848e9f27e09

Our pods are in the Pending state due to a volume node affinity conflict.

Describe kafka-1 pod

LAST SEEN   TYPE      REASON              OBJECT        MESSAGE
6m52s       Warning   FailedScheduling    pod/kafka-0   0/5 nodes are available: 5 node(s) had volume node affinity conflict.
6m52s       Warning   FailedScheduling    pod/kafka-1   0/5 nodes are available: 5 node(s) had volume node affinity conflict.
73s         Normal    NotTriggerScaleUp   pod/kafka-0   pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict
73s         Normal    NotTriggerScaleUp   pod/kafka-1   pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict


JBOClara commented Sep 21, 2022

Hi @decipher27,

Could you show us the tags on your AWS ASGs (aws autoscaling describe-auto-scaling-groups)?

My understanding of this issue is that you need the topology tags:

                {
                    "ResourceId": "eks-spot-2-XXXX",
                    "ResourceType": "auto-scaling-group",
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone",
                    "Value": "us-east-1c",
                    "PropagateAtLaunch": false
                },

I've also added

                {
                    "ResourceId": "eks-spot-2-5xxxx",
                    "ResourceType": "auto-scaling-group",
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone",
                    "Value": "us-east-1c",
                    "PropagateAtLaunch": false
                },

When your ASG is at 0, there is no node to retrieve the topology from. You must have the topology labels as tags on the ASG itself to allow the CA and the CSI driver to retrieve the topology.
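
A quick way to check what cluster-autoscaler will actually see on the group (a sketch; the group name is the placeholder from the JSON above):

    # List all node-template tags present on the ASG itself.
    aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names eks-spot-2-XXXX \
      --query "AutoScalingGroups[].Tags[?starts_with(Key, 'k8s.io/cluster-autoscaler/node-template/')]"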


decipher27 commented Sep 29, 2022

We don't have the tags mentioned above, and it was working earlier. However, we found that the issue was with the scheduler; we are using a custom scheduler.
Our vendor made some tweaks and it's fixed. Thank you @JBOClara


decipher27 commented Sep 29, 2022

Also, from your comment, what do you mean by "When your ASG is at 0"? Do you mean if I set the desired count to '0'?


JBOClara commented Sep 30, 2022

Also, from your comment, what do you mean by "When your ASG is at 0"? Do you mean if I set the desired count to '0'?
@decipher27

Exactly: when an ASG's desired capacity is set to 0 (for instance, after a downscale of all replicas with kube-downscaler, except those of the CA itself), the CA will not be able to read node labels, because there is no node.


debu99 commented Oct 19, 2022

Got the same issue: a PVC and pod were created, then the ASG was suspended and scaled down to 0 to save cost over the weekend, but on Monday this pod is not able to start from 0; other stateless pods are okay.

@JBOClara

@debu99
Look at:

Hi @decipher27,

Could you show us the tags on your AWS ASGs (aws autoscaling describe-auto-scaling-groups)?

My understanding of this issue is that you need the topology tags:

                {
                    "ResourceId": "eks-spot-2-XXXX",
                    "ResourceType": "auto-scaling-group",
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone",
                    "Value": "us-east-1c",
                    "PropagateAtLaunch": false
                },

I've also added

                {
                    "ResourceId": "eks-spot-2-5xxxx",
                    "ResourceType": "auto-scaling-group",
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone",
                    "Value": "us-east-1c",
                    "PropagateAtLaunch": false
                },

When your ASG is at 0, there is no node to retrieve the topology from. You must have the topology labels as tags on the ASG itself to allow the CA and the CSI driver to retrieve the topology.


debu99 commented Oct 19, 2022

My PV requires:

Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [ap-southeast-1a]

But I believe this label is added automatically to all nodes? I didn't add it to the ASG tags, yet all my nodes have it:

ip-10-40-44-63.ap-southeast-1.compute.internal    Ready    <none>   5h3m    v1.21.14-eks-ba74326   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3a.large,beta.kubernetes.io/os=linux,dedicated=redis,failure-domain.beta.kubernetes.io/region=ap-southeast-1,failure-domain.beta.kubernetes.io/zone=ap-southeast-1b,k8s-node-lifecycle=on-demand,k8s-node-role/on-demand-worker=true,k8s-node-role/type=none,k8s-node/instance-level=large,k8s-node/worker-type=t-type,k8s.io/cloud-provider-aws=be298adc77b66eafc3745cf0a9c131e0,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-40-44-63.ap-southeast-1.compute.internal,kubernetes.io/os=linux,node.kubernetes.io/instance-type=t3a.large,sb-subnet/type=primary,sb-subnet/zone-id=1,topology.ebs.csi.aws.com/zone=ap-southeast-1b,topology.kubernetes.io/region=ap-southeast-1,topology.kubernetes.io/zone=ap-southeast-1b
ip-10-40-7-219.ap-southeast-1.compute.internal    Ready    <none>   25m     v1.21.14-eks-ba74326   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/instance-type=r6g.large,beta.kubernetes.io/os=linux,dedicated=prometheus-operator,failure-domain.beta.kubernetes.io/region=ap-southeast-1,failure-domain.beta.kubernetes.io/zone=ap-southeast-1a,k8s-node-lifecycle=on-demand,k8s-node-role/on-demand-worker=true,k8s-node-role/type=none,k8s-node/instance-level=large,k8s-node/worker-type=r-type,k8s.io/cloud-provider-aws=be298adc77b66eafc3745cf0a9c131e0,kubernetes.io/arch=arm64,kubernetes.io/hostname=ip-10-40-7-219.ap-southeast-1.compute.internal,kubernetes.io/os=linux,node.kubernetes.io/instance-type=r6g.large,sb-subnet/type=primary,sb-subnet/zone-id=0,topology.ebs.csi.aws.com/zone=ap-southeast-1a,topology.kubernetes.io/region=ap-southeast-1,topology.kubernetes.io/zone=ap-southeast-1a


jbg commented Oct 19, 2022

Yes, but when the ASG is at 0, there are no nodes. cluster-autoscaler needs the labels tagged on the ASG to know which labels a node would have if it were to scale the ASG up from 0.

@KiranReddy230

We are facing the same issue with the VolumeNodeAffinity error, and our ASGs have nodes spun up across AZs. What is the best way for CA to spin up the nodes in the right AZ? We use the priority expander.
CA also throws the error:

I0103 17:43:29.663090       1 scale_up.go:449] No pod can fit to eks-atlan-node-spot-c2c299ee-8af5-1b60-2ce3-2e4dc50b5484
I0103 17:43:29.663106       1 scale_up.go:453] No expansion options

The above error occurs even though there is enough room for CA to spin up new nodes in the node group, and there is one more node group where CA can launch, but CA is not functioning as expected. CA version: 1.21


jbg commented Mar 6, 2023

@KiranReddy230 if you read the comments above yours, the question has been answered three times already. You need to add the tags mentioned above to your ASG. In order for this to work properly, each node group (and thus each ASG) should have only one zone (this is the recommended architecture anyway).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2023

michalschott commented Jul 13, 2023

I have this issue despite (I believe) having everything set up correctly.

EKS - 1.25

CA - 1.25.2:

      - command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/XXX
        - --balance-similar-node-groups=true
        - --emit-per-nodegroup-metrics=true
        - --expander=most-pods,least-waste
        - --ignore-taint=node.cilium.io/agent-not-ready
        - --logtostderr=true
        - --namespace=kube-system
        - --regional=true
        - --scan-interval=1m
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=error
        - --v=0
        env:
        - name: AWS_REGION
          value: eu-west-1

My 3 ASGs are tagged as follows (each of them covers a single AZ, a/b/c):

k8s.io/cluster-autoscaler/node-template/label/failure-domain.beta.kubernetes.io/zone	eu-west-1a / eu-west-1b / eu-west-1c
k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type  m5.2xlarge
k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone  eu-west-1a / eu-west-1b / eu-west-1c
k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/region	 eu-west-1
k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone	 eu-west-1a / eu-west-1b / eu-west-1c
k8s.io/cluster-autoscaler/node-template/taint/node.cilium.io/agent-not-ready	true:NO_EXECUTE	Yes

I'm running Prometheus as an STS with a PVC (affinity rules are set to ensure replicas are spread across AZs and hosts):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    polaris.fairwinds.com/automountServiceAccountToken-exempt: "true"
    prometheus-operator-input-hash: "4772490143308579296"
  creationTimestamp: "2023-03-03T20:52:48Z"
  generation: 56
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 47.0.0
    argocd.argoproj.io/instance: xxx-prometheus
    chart: kube-prometheus-stack-47.0.0
    heritage: Helm
    operator.prometheus.io/mode: server
    operator.prometheus.io/name: prometheus-prometheus
    operator.prometheus.io/shard: "0"
    release: prometheus
  name: prometheus-prometheus-prometheus
  namespace: prometheus
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Prometheus
    name: prometheus-prometheus
    uid: ce818fdf-02b4-4718-a430-f4ff4c5acbc5
  resourceVersion: "342440131"
  uid: 662e082a-af26-40e4-b39e-d354a023fe0a
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: prometheus-prometheus
      app.kubernetes.io/managed-by: prometheus-operator
      app.kubernetes.io/name: prometheus
      operator.prometheus.io/name: prometheus-prometheus
      operator.prometheus.io/shard: "0"
      prometheus: prometheus-prometheus
  serviceName: prometheus-operated
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        kubectl.kubernetes.io/default-container: prometheus
        linkerd.io/inject: enabled
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: prometheus-prometheus
        app.kubernetes.io/managed-by: prometheus-operator
        app.kubernetes.io/name: prometheus
        app.kubernetes.io/version: 2.44.0
        operator.prometheus.io/name: prometheus-prometheus
        operator.prometheus.io/shard: "0"
        prometheus: prometheus-prometheus
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/instance: prometheus-prometheus
                app.kubernetes.io/name: prometheus
                prometheus: prometheus-prometheus
            topologyKey: topology.kubernetes.io/zone
          - labelSelector:
              matchLabels:
                app.kubernetes.io/instance: prometheus-prometheus
                app.kubernetes.io/name: prometheus
                prometheus: prometheus-prometheus
            topologyKey: kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
      - args:
        - --web.console.templates=/etc/prometheus/consoles
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
        - --web.enable-lifecycle
        - --web.external-url=https://prometheus.xxx.xxx
        - --web.route-prefix=/
        - --log.level=error
        - --log.format=json
        - --storage.tsdb.retention.time=3h
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.wal-compression
        - --web.config.file=/etc/prometheus/web_config/web-config.yaml
        - --storage.tsdb.max-block-duration=2h
        - --storage.tsdb.min-block-duration=2h
        image: XXX.dkr.ecr.eu-west-1.amazonaws.com/quay.io/prometheus/prometheus:v2.44.0
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/healthy
            port: http-web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        name: prometheus
        ports:
        - containerPort: 9090
          name: http-web
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: http-web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          limits:
            memory: 20Gi
          requests:
            cpu: 300m
            memory: 20Gi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        startupProbe:
          failureThreshold: 60
          httpGet:
            path: /-/ready
            port: http-web
            scheme: HTTP
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 3
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: config-out
          readOnly: true
        - mountPath: /etc/prometheus/certs
          name: tls-assets
          readOnly: true
        - mountPath: /prometheus
          name: prometheus-prometheus-prometheus-db
          subPath: prometheus-db
        - mountPath: /etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0
          name: prometheus-prometheus-prometheus-rulefiles-0
        - mountPath: /etc/prometheus/web_config/web-config.yaml
          name: web-config
          readOnly: true
          subPath: web-config.yaml
      - args:
        - --listen-address=:8080
        - --reload-url=http://127.0.0.1:9090/-/reload
        - --config-file=/etc/prometheus/config/prometheus.yaml.gz
        - --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
        - --watched-dir=/etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0
        - --log-level=error
        - --log-format=json
        command:
        - /bin/prometheus-config-reloader
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: SHARD
          value: "0"
        image: XXX.dkr.ecr.eu-west-1.amazonaws.com/quay.io/prometheus-operator/prometheus-config-reloader:v0.66.0
        imagePullPolicy: Always
        name: config-reloader
        ports:
        - containerPort: 8080
          name: reloader-web
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 50m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /etc/prometheus/config
          name: config
        - mountPath: /etc/prometheus/config_out
          name: config-out
        - mountPath: /etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0
          name: prometheus-prometheus-prometheus-rulefiles-0
      - args:
        - sidecar
        - --prometheus.url=http://127.0.0.1:9090/
        - '--prometheus.http-client={"tls_config": {"insecure_skip_verify":true}}'
        - --grpc-address=:10901
        - --http-address=:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --tsdb.path=/prometheus
        - --log.level=error
        - --log.format=json
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: config
              name: thanos-config
        image: XXX.dkr.ecr.eu-west-1.amazonaws.com/bitnami/thanos:0.31.0
        imagePullPolicy: Always
        name: thanos-sidecar
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        - containerPort: 10901
          name: grpc
          protocol: TCP
        resources:
          limits:
            memory: 256Mi
          requests:
            cpu: 10m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /prometheus
          name: prometheus-prometheus-prometheus-db
          subPath: prometheus-db
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - --watch-interval=0
        - --listen-address=:8080
        - --config-file=/etc/prometheus/config/prometheus.yaml.gz
        - --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
        - --watched-dir=/etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0
        - --log-level=error
        - --log-format=json
        command:
        - /bin/prometheus-config-reloader
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: SHARD
          value: "0"
        image: XXX.dkr.ecr.eu-west-1.amazonaws.com/quay.io/prometheus-operator/prometheus-config-reloader:v0.66.0
        imagePullPolicy: Always
        name: init-config-reloader
        ports:
        - containerPort: 8080
          name: reloader-web
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 50m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /etc/prometheus/config
          name: config
        - mountPath: /etc/prometheus/config_out
          name: config-out
        - mountPath: /etc/prometheus/rules/prometheus-prometheus-prometheus-rulefiles-0
          name: prometheus-prometheus-prometheus-rulefiles-0
      nodeSelector:
        node.kubernetes.io/instance-type: m5.2xlarge
      priorityClassName: prometheus
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-prometheus
      serviceAccountName: prometheus-prometheus
      terminationGracePeriodSeconds: 600
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/instance: prometheus-prometheus
            app.kubernetes.io/name: prometheus
            prometheus: prometheus-prometheus
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - name: config
        secret:
          defaultMode: 420
          secretName: prometheus-prometheus-prometheus
      - name: tls-assets
        projected:
          defaultMode: 420
          sources:
          - secret:
              name: prometheus-prometheus-prometheus-tls-assets-0
      - emptyDir:
          medium: Memory
        name: config-out
      - configMap:
          defaultMode: 420
          name: prometheus-prometheus-prometheus-rulefiles-0
        name: prometheus-prometheus-prometheus-rulefiles-0
      - name: web-config
        secret:
          defaultMode: 420
          secretName: prometheus-prometheus-prometheus-web-config
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: prometheus-prometheus-prometheus-db
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ebs-sc-preserve
      volumeMode: Filesystem
    status:
      phase: Pending
~ k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                               STORAGECLASS      REASON   AGE
pvc-e6df1f14-4f62-41ce-8f21-97b73b0c055f   10Gi       RWO            Retain           Bound    prometheus/prometheus-prometheus-prometheus-db-prometheus-prometheus-prometheus-0   ebs-sc-preserve            61d
pvc-f40a6589-6fcf-4419-9486-70e5efa43575   10Gi       RWO            Retain           Bound    prometheus/prometheus-prometheus-prometheus-db-prometheus-prometheus-prometheus-1   ebs-sc-preserve            9d

~ k describe pv pvc-e6df1f14-4f62-41ce-8f21-97b73b0c055f pvc-f40a6589-6fcf-4419-9486-70e5efa43575
Name:              pvc-e6df1f14-4f62-41ce-8f21-97b73b0c055f
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      ebs-sc-preserve
Status:            Bound
Claim:             prometheus/prometheus-prometheus-prometheus-db-prometheus-prometheus-prometheus-0
Reclaim Policy:    Retain
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          10Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [eu-west-1c]
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            ebs.csi.aws.com
    FSType:            ext4
    VolumeHandle:      vol-08b0f4a31f192dad7
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1683859406228-8081-ebs.csi.aws.com
Events:                <none>


Name:              pvc-f40a6589-6fcf-4419-9486-70e5efa43575
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      ebs-sc-preserve
Status:            Bound
Claim:             prometheus/prometheus-prometheus-prometheus-db-prometheus-prometheus-prometheus-1
Reclaim Policy:    Retain
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          10Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [eu-west-1b]
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            ebs.csi.aws.com
    FSType:            ext4
    VolumeHandle:      vol-07d31d533b2e01a4b
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1687797020030-8081-ebs.csi.aws.com
Events:                <none>

Every night between 00:00 and 06:00 (I believe this is when AWS rebalancing happens), at least one of the Prometheus replicas gets stuck in the Pending state. Once cluster-autoscaler is restarted (k -n kube-system rollout restart deploy cluster-autoscaler), the ASG is properly scaled up again.

For now I had to set minCapacity = 1 for these ASGs to prevent such situations.
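
For anyone debugging a similar setup, one cross-check (a sketch; the ASG name is a placeholder) is to compare the zone the PV is pinned to with the node-template tag value on the matching single-AZ ASG; if the tag is missing or differs while the group is at 0, the simulated node fails the VolumeBinding predicate:

    # Zone the PV requires (from its nodeAffinity).
    kubectl get pv pvc-e6df1f14-4f62-41ce-8f21-97b73b0c055f \
      -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}'

    # Zone advertised on the ASG via the node-template tag.
    aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names my-eu-west-1c-asg \
      --query "AutoScalingGroups[].Tags[?Key=='k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone'].Value" \
      --output text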

@mmerrill3

This is closely related to issue #4739, which was fixed in cluster autoscaler version 1.22 onward. If you look at the function that generates a hypothetical new node to satisfy the pending pod, the new label that is needed to satisfy volumes created by the EBS CSI driver is not part of that function. It will not scale up unless you add the tag to the ASG manually.

Current function:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L409

The next function is why adding the labels to the ASG makes this work

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L423

Since the label is widely used now, maybe we should update the buildGenericLabels function to set topology.ebs.csi.aws.com/zone as well when the hypothetical new node is built.

@msvticket
Contributor

I can take a stab at providing a PR with a fix.
