
"attachable-volumes-aws-ebs" not being set on nodes even when --volume-attach-limit is used... #1258

Closed
diranged opened this issue Jun 2, 2022 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@diranged

diranged commented Jun 2, 2022

/kind bug

What happened?

We have noticed that even though we're setting the --volume-attach-limit=15 flag on our ebs-csi-node daemonsets, our instances are reporting 39 attachable EBS volumes. Shouldn't setting --volume-attach-limit=15 turn around and tell Kubernetes that the limit is 15 for that particular node?

What you expected to happen?

I expect .status.allocatable."attachable-volumes-aws-ebs" to report 15, not 39:

$  k get node ip-100-64-142-206.us-west-2.compute.internal -o json | jq '.status.allocatable."attachable-volumes-aws-ebs"'
"39"

How to reproduce it (as minimally and precisely as possible)?

We are running EKS 1.22, Bottlerocket 1.7.2 nodes, and the 2.6.7 Helm chart for the aws-ebs-csi-driver.

Anything else we need to know?:

Logs

MacBook-Air:components diranged$ k logs ebs-csi-node-kx4nf ebs-plugin
I0602 00:11:32.299797       1 driver.go:73] Driver: ebs.csi.aws.com Version: v1.6.1
I0602 00:11:32.299866       1 node.go:83] [Debug] Retrieving node info from metadata service
I0602 00:11:32.299870       1 metadata.go:85] retrieving instance data from ec2 metadata
I0602 00:11:35.443637       1 metadata.go:92] ec2 metadata is available
I0602 00:11:35.456282       1 driver.go:143] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0602 00:11:35.634807       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:35.765117       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:36.365784       1 node.go:513] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:50.595302       1 identity.go:61] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:54.013703       1 node.go:497] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:54.014772       1 node.go:434] NodeGetVolumeStats: called with args {VolumeId:vol-0b263a488ca00e136 VolumePath:/var/lib/kubelet/pods/3af51f7e-f82a-43f4-848f-75fa818dd938/volumes/kubernetes.io~csi/pvc-28501c63-166b-4b0a-87ab-6927150b9113/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:54.567156       1 node.go:497] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:11:54.568297       1 node.go:434] NodeGetVolumeStats: called with args {VolumeId:vol-03b7c8133aa062f03 VolumePath:/var/lib/kubelet/pods/a39ff523-a9c4-46ad-a2c1-b17c947f5713/volumes/kubernetes.io~csi/pvc-a2b1c935-8df5-436b-9649-ffa564fd7349/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:12:00.595768       1 identity.go:61] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:12:01.271964       1 node.go:497] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:12:01.272808       1 node.go:434] NodeGetVolumeStats: called with args {VolumeId:vol-0da2b5c4505dda7f5 VolumePath:/var/lib/kubelet/pods/c547ea16-e784-4ed8-bee2-e82df1f52c07/volumes/kubernetes.io~csi/pvc-7d9e42af-975e-4dae-b4eb-40b52a55ec2a/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:12:10.581161       1 node.go:497] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0602 00:12:10.581649       1 node.go:434] NodeGetVolumeStats: called with args {VolumeId:vol-09d5a89d3710b79df VolumePath:/var/lib/kubelet/pods/9a718c70-5b4a-420e-a47c-5e

Example pod config:

$ k get pod ebs-csi-node-kx4nf -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2022-06-02T00:11:30Z"
  generateName: ebs-csi-node-
  labels:
    app: ebs-csi-node
    app.kubernetes.io/component: csi-driver
    app.kubernetes.io/instance: aws-ebs-csi-driver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-ebs-csi-driver
    app.kubernetes.io/version: 1.6.1
    controller-revision-hash: 847fb8885d
    helm.sh/chart: aws-ebs-csi-driver-2.6.7
    pod-template-generation: "21"
    test: foo
  name: ebs-csi-node-kx4nf
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: ebs-csi-node
    uid: ddad0dc2-6263-4091-a719-8a919f34884e
  resourceVersion: "715435360"
  uid: c9e8428f-8306-4b11-bdf9-79d2cc7e758f
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - ip-100-64-159-134.us-west-2.compute.internal
  containers:
  - args:
    - node
    - --endpoint=$(CSI_ENDPOINT)
    - --volume-attach-limit=15
    - --logtostderr
    - --v=10
    env:
    - name: CSI_ENDPOINT
      value: unix:/csi/csi.sock
    - name: CSI_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: .../ebs-csi-driver/aws-ebs-csi-driver:v1.6.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /healthz
        port: healthz
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 3
    name: ebs-plugin
    ports:
    - containerPort: 9808
      hostPort: 9808
      name: healthz
      protocol: TCP
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kubelet
      mountPropagation: Bidirectional
      name: kubelet-dir
    - mountPath: /csi
      name: plugin-dir
    - mountPath: /dev
      name: device-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5n5r7
      readOnly: true
  - args:
    - --csi-address=$(ADDRESS)
    - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    - --v=2
    env:
    - name: ADDRESS
      value: /csi/csi.sock
    - name: DRIVER_REG_SOCK_PATH
      value: /var/lib/kubelet/plugins/ebs.csi.aws.com/csi.sock
    image: .../eks-distro/kubernetes-csi/node-driver-registrar:v2.1.0-eks-1-18-13
    imagePullPolicy: IfNotPresent
    name: node-driver-registrar
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /csi
      name: plugin-dir
    - mountPath: /registration
      name: registration-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5n5r7
      readOnly: true
  - args:
    - --csi-address=/csi/csi.sock
    image: .../eks-distro/kubernetes-csi/livenessprobe:v2.2.0-eks-1-18-13
    imagePullPolicy: IfNotPresent
    name: liveness-probe
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /csi
      name: plugin-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5n5r7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-100-64-159-134.us-west-2.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: ebs-csi-node-sa
  serviceAccountName: ebs-csi-node-sa
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/kubelet
      type: Directory
    name: kubelet-dir
  - hostPath:
      path: /var/lib/kubelet/plugins/ebs.csi.aws.com/
      type: DirectoryOrCreate
    name: plugin-dir
  - hostPath:
      path: /var/lib/kubelet/plugins_registry/
      type: Directory
    name: registration-dir
  - hostPath:
      path: /dev
      type: Directory
    name: device-dir
  - name: kube-api-access-5n5r7
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

Example Node

$ k get node ip-100-64-159-134.us-west-2.compute.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-05a5d080d738d23dc","efs.csi.aws.com":"i-05a5d080d738d23dc"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-06-01T13:37:11Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: c6a.48xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-west-2
    failure-domain.beta.kubernetes.io/zone: us-west-2c
    karpenter.sh/capacity-type: spot
    karpenter.sh/provisioner-name: compute-amd64
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-100-64-159-134.us-west-2.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: c6a.48xlarge
    topology.ebs.csi.aws.com/zone: us-west-2c
    topology.kubernetes.io/region: us-west-2
    topology.kubernetes.io/zone: us-west-2c
    vpc.amazonaws.com/eniConfig: us-west-2c
  name: ip-100-64-159-134.us-west-2.compute.internal
  resourceVersion: "715459005"
  uid: 5d9f7df7-ac3a-4050-8fa3-bbd60268d43c
spec:
  providerID: aws:///us-west-2c/i-05a5d080d738d23dc
status:
  addresses:
  - address: 100.64.159.134
    type: InternalIP
  - address: ip-100-64-159-134.....internal
    type: Hostname
  - address: ip-100-64-159-134.....internal
    type: InternalDNS
  allocatable:
    attachable-volumes-aws-ebs: "39"
    cpu: 191450m
    ephemeral-storage: "189155564434"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: "374179144204"
    pods: "250"
  capacity:
    attachable-volumes-aws-ebs: "39"
    cpu: "192"
    ephemeral-storage: 206412008Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 387880464Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2022-06-02T00:28:17Z"
    lastTransitionTime: "2022-06-01T13:37:32Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2022-06-02T00:28:17Z"
    lastTransitionTime: "2022-06-01T13:37:32Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2022-06-02T00:28:17Z"
    lastTransitionTime: "2022-06-01T13:37:32Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2022-06-02T00:28:17Z"
    lastTransitionTime: "2022-06-01T13:38:22Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
...
  nodeInfo:
    architecture: amd64
    bootID: 69558e8e-53f1-44b2-8331-259300b6b352
    containerRuntimeVersion: containerd://1.5.11+bottlerocket
    kernelVersion: 5.10.109
    kubeProxyVersion: v1.22.6-eks-b18cdc9
    kubeletVersion: v1.22.6-eks-b18cdc9
    machineID: ec2b847666ff9cb00b9f622fb1159f46
    operatingSystem: linux
    osImage: Bottlerocket OS 1.7.2 (aws-k8s-1.22)
    systemUUID: ec2b8476-66ff-9cb0-0b9f-622fb1159f46
...
  volumesAttached:
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-04b842f0da9265366
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0fbd4097dd35d8915
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0af89b29d77a5f1e4
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0b263a488ca00e136
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-06b8d8cc3105bf2a5
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0f0e58da8ce7a6ad5
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0aa7e6af0d289c3ec
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0136af4088062b7d5
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-010897a3c3c9bbad4
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0d756f57c5846a0a5
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-03b7c8133aa062f03
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-06a2d7a6108abc578
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-0da2b5c4505dda7f5
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-09d5a89d3710b79df
  - devicePath: ""
    name: kubernetes.io/csi/ebs.csi.aws.com^vol-08a92132de6bfd5fe
  volumesInUse:
  - kubernetes.io/csi/ebs.csi.aws.com^vol-010897a3c3c9bbad4
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0136af4088062b7d5
  - kubernetes.io/csi/ebs.csi.aws.com^vol-03b7c8133aa062f03
  - kubernetes.io/csi/ebs.csi.aws.com^vol-04b842f0da9265366
  - kubernetes.io/csi/ebs.csi.aws.com^vol-06a2d7a6108abc578
  - kubernetes.io/csi/ebs.csi.aws.com^vol-06b8d8cc3105bf2a5
  - kubernetes.io/csi/ebs.csi.aws.com^vol-08a92132de6bfd5fe
  - kubernetes.io/csi/ebs.csi.aws.com^vol-09d5a89d3710b79df
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0aa7e6af0d289c3ec
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0af89b29d77a5f1e4
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0b263a488ca00e136
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0d756f57c5846a0a5
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0da2b5c4505dda7f5
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0f0e58da8ce7a6ad5
  - kubernetes.io/csi/ebs.csi.aws.com^vol-0fbd4097dd35d8915
...

Environment
Kubernetes: v1.22.6-eks-14c7a48

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Aug 31, 2022
@pkit

pkit commented Sep 7, 2022

@diranged it also always reports zero in kubectl describe node:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         2360m (29%)   2500m (31%)
  memory                      8064Mi (26%)  8128Mi (26%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0

This is confusing, but the authors say you should only look at the CSINode object:

$ kubectl describe csinode ip-192-168-33-33.ec2.internal
Name:               ip-192-168-33-33.ec2.internal
Labels:             <none>
Annotations:        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/cinder
CreationTimestamp:  Fri, 19 Aug 2022 12:05:23 +0000
Spec:
  Drivers:
    efs.csi.aws.com:
      Node ID:  i-12345678900000000
    ebs.csi.aws.com:
      Node ID:  i-12345678900000000
      Allocatables:
        Count:        25
      Topology Keys:  [topology.ebs.csi.aws.com/zone]
Events:               <none>

@torredil
Member

torredil commented Sep 8, 2022

The Kubelet populates a CSINode object for the CSI driver as part of Kubelet plugin registration using the node-driver-registrar sidecar container. The correct volume attach limit is only reported via this CSINode object. You can view the CSINode "Allocatables" count with kubectl describe csinode insert_node_name_here. My understanding is that the attachable-volumes-aws-ebs number being reported when describing a node is a remnant of the in-tree driver and not an accurate representation of attachable volume count.
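
If you prefer a raw value over the describe output, a jsonpath query like the following should return the same count (a sketch; substitute your own node name):

$ kubectl get csinode ip-100-64-142-206.us-west-2.compute.internal \
    -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'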

@sok1234

sok1234 commented Sep 9, 2022

@torredil I've set the volume limit on the CSI driver to 100 and tested on an EKS 1.23 cluster with a t3.xlarge node, and I couldn't attach more than 23 volumes (not even 25).

describe node shows 25:

$ kubectl describe nodes ip-10-4-151-53..ec2.internal 
...
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         4
  ephemeral-storage:           104845292Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      16203828Ki
  pods:                        58
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         3920m

describe csinode shows 100

$ kubectl describe csinode ip-10-4-151-53.ec2.internal
Name:               ip-10-4-151-53.ec2.internal
Labels:             <none>
Annotations:        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/cinder,kubernetes.io/gce-pd
CreationTimestamp:  Fri, 02 Sep 2022 12:36:42 +0300
Spec:
  Drivers:
    efs.csi.aws.com:
      Node ID:  i-12345678900000000
    ebs.csi.aws.com:
      Node ID:  i-12345678900000000
      Allocatables:
        Count:        100
      Topology Keys:  [topology.ebs.csi.aws.com/zone]
Events:         <none>

I'm using the aws-ebs-csi-driver-2.10.1 Helm chart and the node is running v1.23.9-eks-ba74326.

@torredil
Member

torredil commented Sep 9, 2022

@sok1234 The attachable-volumes-aws-ebs count displayed when running kubectl describe nodes is not relevant in any way. The CSINode object Allocatables property is what you want to be looking at, which correctly reports the number you've set.

My understanding is that t3.xlarge instances support a maximum of 28 attachments (volumes and ENIs). How many ENIs are attached to your instance?
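
For reference, one way to count them with the AWS CLI (a sketch; the instance ID below is a placeholder):

$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[].Instances[].NetworkInterfaces[].NetworkInterfaceId' \
    --output text | wc -w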

@sok1234

sok1234 commented Sep 12, 2022

@torredil the instance has 4 ENIs.

If I understand correctly, the total available attachments are calculated as (Total - [ENIs] - [root EBS volume]). That gives 23 volumes in this case, which is exactly what I get.
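
As a quick sanity check, here is that arithmetic with this instance's numbers plugged in (an illustrative shell sketch; the 28-attachment figure for t3.xlarge comes from the comment above):

total_attachments=28   # t3.xlarge (Nitro) shared attachment slots
enis=4                 # ENIs attached to this instance
root_volume=1          # root EBS volume
echo $(( total_attachments - enis - root_volume ))   # prints 23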

It looks like the instance type limits the attachable volumes, which makes --volume-attach-limit useful only for values lower than the default. Is there any way around this, or are we limited by the instance type?

That's really bad, because even though we can have up to 58 pods on this instance type, we can only attach volumes to half of them.

@torredil
Member

@sok1234 Your calculation is correct and I would expect to see an attachment limit of 23 (as reported) given the instance has 4 ENIs. The max number of attachments is limited by the instance type.

By default, the CSI driver parses the instance type and decides the volume limit, but this is only a rough approximation and not accurate in some cases. Specifying the volume limit via --volume-attach-limit is an alternative solution until we can dynamically discover the max number of attachments per instance type. For more context see #347.
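
For anyone arriving here from the Helm chart, a sketch of passing that flag through chart values (the node.volumeAttachLimit value name is an assumption; verify it against your chart version, or set the --volume-attach-limit argument on the node DaemonSet directly as shown in the pod spec above):

# Assumption: node.volumeAttachLimit maps to --volume-attach-limit on the node plugin.
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.volumeAttachLimit=15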

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label on Oct 12, 2022
@sotiriougeorge

If I understand correctly, the total available attachments are calculated as: (Total - [ENIs] - [root EBS volume]). That gives 23 volumes in this case which is exactly what I get.

It looks like the instance type is limiting the attachable volumes which makes the --volume-attach-limit useful only for values lower than the default. Is there any way around it or we're limited by the instance type?

That's really bad because even though we can have up to 58 pods on this instance type we can only attach volumes to half of them.

So do the ENIs attached to a node only reduce the number of attachable volumes from a "default" total, or do they affect the total itself? While trying to tackle this issue and searching around Google, I've found some answers that seem to imply the latter.

On the other hand, I might have this wrong; I'm still trying to understand what can be done to overcome this limitation.

@stevehipwell
Contributor

@sotiriougeorge I've replied to your question on #1163, but for completeness, the following calculation defines the maximum number of attachable volumes.

max_volume_attachments = min(user_defined_volume_attach_limit, instance_defined_attach_limit - 1)

So for a Nitro instance, max_volume_attachments can't be greater than 27 no matter how much you want it to be otherwise, since the attach limit is 28 and there is always at least 1 ENI attached (see docs). Adding additional ENIs, or using an instance type with additional NVMe drives, will lower this value further.
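
Spelling that out with the numbers from this thread (a sketch; values are illustrative):

# max_volume_attachments = min(user_defined_volume_attach_limit, instance_defined_attach_limit - ENIs)
user_limit=100
instance_limit=28   # Nitro shared volume/ENI attachment slots
enis=1              # at least one ENI is always attached
effective=$(( instance_limit - enis ))
(( user_limit < effective )) && effective=$user_limit
echo "$effective"   # prints 27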

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) on Dec 3, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
