Support region topology for the ebs in tree provider #1109

Closed
Noksa opened this issue Jan 8, 2022 · 14 comments
Labels
documentation (Improvements or additions to documentation), feature (New feature or request)

Comments

Noksa commented Jan 8, 2022

Version

Karpenter: v0.5.4

Kubernetes: v1.21.5

Expected Behavior

Karpenter creates nodes using volume topology when a PVC already exists for a pod

Actual Behavior

Karpenter can't create a new node due to an invalid nodeSelector

Steps to Reproduce the Problem

Use the following steps and manifests.

  • Apply the manifests for the first time (a node will be provisioned)
  • Scale the StatefulSet to 0 (the PVC from the volumeClaimTemplate is retained; see the sketch after the manifests)
  • Wait until the node is deleted
  • Scale the StatefulSet back to 1
  • Observe the error

So it looks like Karpenter can't provision nodes if a PVC already exists for a pod.

Manifests:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
    - port: 80
      name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        only-db: "true"
      tolerations:
        - key: "only-db"
          operator: Exists
          effect: NoSchedule
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
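
For context on the reproduction steps: scaling the StatefulSet to 0 deletes the pods but keeps the PVCs created from the volumeClaimTemplate above. A rough sketch of the leftover claim for ordinal 0 (illustrative only, it was not captured in this issue; the name follows the <template>-<statefulset>-<ordinal> convention and gp2 is assumed because it is the cluster's default StorageClass):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0            # <volumeClaimTemplate>-<statefulset>-<ordinal>; www-web-1 exists for the second replica
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: gp2      # assumed: filled in from the cluster's default StorageClass
  resources:
    requests:
      storage: 1Gi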

Resource Specs and Logs

Provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  annotations:
    meta.helm.sh/release-name: smb-eu-central-1
    meta.helm.sh/release-namespace: smb-eu-central-1
  creationTimestamp: "2022-01-08T05:39:28Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: storage-db-smb-eu-central-1
  resourceVersion: "171015299"
  uid: 3e707cf2-a6a0-4b06-b8ae-5d8675722cbc
spec:
  kubeletConfiguration: {}
  labels:
    only-db: "true"
  limits: {}
  provider:
    apiVersion: extensions.karpenter.sh/v1alpha1
    instanceProfile: ad-eks-c-NodeInstanceProfile
    kind: AWS
    launchTemplate: ad-eks-db-lt
    securityGroupSelector:
      aws:eks:cluster-name: ad-eks-c
    subnetSelector:
      kubernetes.io/cluster/ad-eks-c: '*'
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  taints:
  - effect: NoSchedule
    key: only-db
    value: "true"
  ttlSecondsAfterEmpty: 120
status:
  resources:
    cpu: "0"
    memory: "0"

Log:

tried provisioner/storage-db-smb-eu-central-1: invalid nodeSelector "topology.kubernetes.io/region", [eu-central-1] not in []	{"commit": "7e79a67", "pod": "kazoo-db/storage-db-smb-eu-central-1-0"}

Part of pod spec:

      nodeSelector:
        only-db: "true"
      tolerations:
      - effect: NoSchedule
        key: only-db
        value: "true"
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: db
            release: smb-eu-central-1
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: db
            release: smb-eu-central-1
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule

As far as I can see, Karpenter is trying to apply a nodeSelector that isn't in the pod spec for some reason.

Noksa added the "bug" label Jan 8, 2022
Noksa changed the title from "Karpenter doesn't provision pods with persistent volumes (StatefulSet)" to "Karpenter doesn't provision nodes when PVC already exist for a pod (StatefulSet)" Jan 8, 2022
Noksa changed the title from "Karpenter doesn't provision nodes when PVC already exist for a pod (StatefulSet)" to "Karpenter doesn't provision nodes when a PVC already exists for a pod (StatefulSet)" Jan 8, 2022
ellistarn (Contributor) commented:

It looks like the topology region label is getting applied to your volume somehow. Is it possible that this label is a property on your storage class for some reason?

Karpenter doesn't currently support the region label since it doesn't support multiple regions. We could always add no-op support for it, but I'm curious how the label is getting applied in the first place.

Noksa commented Jan 8, 2022

Yeah, my volumes have the following labels:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: "2021-12-27T08:10:47Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1a
...

And the following nodeAffinity:

...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - eu-central-1a
        - key: topology.kubernetes.io/region
          operator: In
          values:
          - eu-central-1
...

But the StorageClass doesn't have any unexpected settings:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2021-01-26T12:45:33Z"
  name: gp2
  resourceVersion: "76963236"
  uid: d2acc111-2227-4953-9c5d-ea09084b70a0
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

I have 3 EKS clusters in different regions and all of them have the same behavior.

One difference I see is that I'm using the default gp2 StorageClass with the in-tree kubernetes.io/aws-ebs provisioner, while #1015 used the EBS CSI driver.

olemarkus (Contributor) commented:

I know of a couple of configurations that will cause this. I don't know what EKS uses, but I am guessing it depends on versions.

First, there is the KCM admission controller.
Second, there is the CCM's GetLabelsForVolume. I believe this is only used with the in-tree EBS volume driver, which is what you are using (the kubernetes.io/aws-ebs bit).
If you were using the AWS EBS CSI driver, I believe these labels would not be set.

Noksa commented Jan 8, 2022

Yeah, I've tested it.

When I use the in-tree provisioner (https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore), the labels and the nodeAffinity are set on the volumes.

When I use the AWS EBS CSI driver, the volume looks like this and it seems to work as expected:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
  creationTimestamp: "2022-01-08T08:41:35Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/ebs-csi-aws-com
  name: pvc-2986b2aa-d788-4549-a9b7-01bb8198dc48
  resourceVersion: "171094109"
  uid: f91cbd91-a115-4c3e-b39e-77d5dc6b62f4
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 4Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ebs-claim
    namespace: smb-eu-central-1
    resourceVersion: "171093369"
    uid: 2986b2aa-d788-4549-a9b7-01bb8198dc48
  csi:
    driver: ebs.csi.aws.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: ...
    volumeHandle: ...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.ebs.csi.aws.com/zone
          operator: In
          values:
          - eu-central-1a
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ebs-sc
  volumeMode: Filesystem
status:
  phase: Bound

At the same time, the in-tree provisioner (https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore) is the default EBS provisioner in EKS clusters at the moment.
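
The PV above is bound through a CSI-backed StorageClass named ebs-sc, which is not shown in this thread. A minimal sketch of what such a StorageClass could look like (the parameters here are assumptions, not taken from the cluster):

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
parameters:
  type: gp3                    # assumed volume type
provisioner: ebs.csi.aws.com   # the out-of-tree AWS EBS CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer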

ellistarn changed the title from "Karpenter doesn't provision nodes when a PVC already exists for a pod (StatefulSet)" to "Support region topology for the ebs in tree provider" Jan 8, 2022
ellistarn (Contributor) commented:

How difficult would it be for you to migrate to the out-of-tree provider? I think we likely want to support the in-tree provider in the short term, but perhaps not in the long term.

Noksa commented Jan 8, 2022

@ellistarn
I have already moved to the EBS CSI driver because of this issue, so it is not a problem anymore, at least for me.

This issue was a good prompt to try an out-of-tree storage provider (:

Noksa commented Jan 8, 2022

It would probably be good to document this limitation to help avoid similar issues in the future.

ellistarn added the "documentation" and "feature" labels and removed the "bug" label Jan 8, 2022
jeffreymlewis commented Jan 11, 2022

I ran into this issue yesterday. Having support for the in-tree provisioner would be greatly appreciated! I will look into switching to the AWS EBS CSI driver, but I assume that will be too much churn right now.

DWSR commented Jan 19, 2022

We're running into an issue where the in-tree provisioner used the deprecated failure-domain.beta.kubernetes.io/{region,zone} labels, and so Karpenter will not select a provisioner for the pod:

2022-01-19T19:37:11.206Z	DEBUG	controller.selection	Could not schedule pod, matched 0/1 provisioners, tried provisioner/default: invalid nodeSelector "failure-domain.beta.kubernetes.io/region", [us-east-2] not in []	{"commit": "7e79a67", "pod": "cockroachdb-system/cockroachdb-1"}

It would be really helpful if Karpenter could support these labels for those of us still using in-tree provisioned volumes.

More context: https://gist.github.com/DWSR/ee176d7cb1d7678ebbe23f8b38ae2885
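
For illustration (not taken from the gist above), a PV provisioned by the in-tree driver on an older cluster can carry the deprecated topology keys in its node affinity, which is what ends up as the rejected nodeSelector. The zone value below is an assumption:

  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-east-2a      # assumed zone
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - us-east-2       # matches the region in the log above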

tobernguyen commented Jan 20, 2022

We're unfortunately running into this same issue, and it is preventing us from adopting Karpenter. A workaround supporting the in-tree provisioner would be greatly appreciated, because migrating in-tree PVs to CSI PVs won't be supported in EKS for the foreseeable future: kubernetes-sigs/aws-ebs-csi-driver#480 (comment)

vietwow commented Jan 26, 2022

I'm also hitting the same issue. Really hoping this feature will be supported soon. Thanks

lbvffvbl commented Feb 1, 2022

Hi, folks. It looks like v1.LabelFailureDomainBetaRegion should be added to IgnoredLabels too, because v1.LabelTopologyRegion only covers "topology.kubernetes.io/region". I kept getting the error

invalid nodeSelector "failure-domain.beta.kubernetes.io/region", [us-east-2] not in []

until I added v1.LabelFailureDomainBetaRegion (failure-domain.beta.kubernetes.io/region) to IgnoredLabels.

tobernguyen commented (replying to lbvffvbl):

I'm using v0.6.0 and have the exact same issue.
