Unable to create CSI snapshot-EBS csi driver #1249

Closed
keerthana1608 opened this issue May 23, 2022 · 14 comments · Fixed by #1257
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@keerthana1608

/kind bug

What happened?
Unable to create CSI snapshot

What you expected to happen?
Should be able to create CSI driver snapshot without any failure.

How to reproduce it (as minimally and precisely as possible)?
Install the latest 1.5 (1.5.3) or 1.6 (1.6.2) EBS CSI driver on an EKS cluster, then try creating a VolumeSnapshot.

Anything else we need to know?:
We are seeing the following error repeated constantly in the controller logs:
E0523 16:45:25.101262 1 reflector.go:127] k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1beta1.VolumeSnapshotContent: failed to list *v1beta1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
E0523 16:45:25.101335 1 reflector.go:127] k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1beta1.VolumeSnapshotClass: failed to list *v1beta1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)

Environment

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-eks-14c7a48", GitCommit:"717bfb2b8ceb809a42a6c0baabde59fae28637ef", GitTreeState:"clean", BuildDate:"2022-04-01T03:17:28Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

  • Driver version: 1.5.3, 1.6.2
    csi-snapshotter_logs.txt

k8s-ci-robot added the kind/bug label on May 23, 2022
@torredil
Member

Hi @keerthana1608 👋

Have you installed the external-snapshotter sidecar? If you intend to use the csi-snapshotter functionality, you will need to first install the CSI Snapshotter.

@sgatdell

@torredil Yes, the external snapshotter was installed. The same tests were working fine with CSI driver 1.5.1 when this was tested previously a while back.

@rdpsin
Contributor

rdpsin commented May 23, 2022

Looks like the snapshot CRDs are not installed.

Can you try installing them with:

kubectl kustomize client/config/crd | kubectl create -f -

@phuongatemc

What we observed is that the VolumeSnapshot was created and the associated VolumeSnapshotContent was also created, but the EBS CSI driver log shows no sign that it starts working on the VolumeSnapshotContent, so both objects just hang there.
We ran "kubectl logs -n kube-system ebs-csi-controller-xxx csi-snapshotter" and saw that it keeps repeating the error mentioned in the bug description.
We checked the API version of the VolumeSnapshot and VolumeSnapshotContent. Both are v1.
We also ran "kubectl get crd volumesnapshotcontent -o yaml" to check for any "v1beta1" but saw nothing except v1. We are not sure why the csi-snapshotter log keeps referring to v1beta1 and does not recognize the VolumeSnapshotContent being created.
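
The manual check described above (looking for "v1beta1" in the CRD) could be scripted roughly as follows. This is a minimal sketch, assuming kubectl access to the cluster; the helper names crd_versions and has_version are made up for illustration and are not part of kubectl or the driver.

```shell
# Hypothetical helper: print the API versions defined in a CRD, space-separated.
# Needs kubectl and access to the cluster.
crd_versions() {
  kubectl get crd "$1" -o jsonpath='{.spec.versions[*].name}'
}

# Hypothetical helper: succeed if the space-separated list in $1 contains version $2.
has_version() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use: `has_version "$(crd_versions volumesnapshotcontents.snapshot.storage.k8s.io)" v1beta1 || echo "v1beta1 is not defined"`. If the sidecar log asks for v1beta1 and the CRD only defines v1, the sidecar and the CRDs are likely from incompatible releases.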

@tgip-work

tgip-work commented May 25, 2022

Same behaviour for me as reported by @phuongatemc but I am using the managed add-on aws-ebs-csi-driver v1.6.1-eksbuild.1

Creating PVCs works fine (type gp3) but not taking snapshots via volumesnapshots

K8S/Kubectl Version:

Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-13+d2965f0db10712", GitCommit:"d2965f0db1071203c6f5bc662c2827c71fc8b20d", GitTreeState:"clean", BuildDate:"2021-06-26T01:02:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-eks-0d102a7", GitCommit:"eb09fc479c1b2bfcc35c47416efb36f1b9052d58", GitTreeState:"clean", BuildDate:"2022-02-17T16:36:28Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

Log from container csi-snapshotter :

csi-snapshotter E0525 13:38:38.687352       1 reflector.go:127] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: Failed to watch *v1beta1.VolumeSnapshotClass: failed to list *v1beta1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)

CRDs (related to Volume Snapshots) taken from https://github.com/kubernetes-csi/external-snapshotter/tree/v6.0.0/client/config/crd

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    api-approved.kubernetes.io: https://github.com/kubernetes-csi/external-snapshotter/pull/665
    controller-gen.kubebuilder.io/version: v0.8.0
  creationTimestamp: "2022-05-25T12:27:24Z"
  generation: 1
  name: volumesnapshotclasses.snapshot.storage.k8s.io
... AND MORE ...

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    api-approved.kubernetes.io: https://github.com/kubernetes-csi/external-snapshotter/pull/665
    controller-gen.kubebuilder.io/version: v0.8.0
  creationTimestamp: "2022-05-25T12:27:25Z"
  generation: 1
  name: volumesnapshotcontents.snapshot.storage.k8s.io
... AND MORE ...

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    api-approved.kubernetes.io: https://github.com/kubernetes-csi/external-snapshotter/pull/665
    controller-gen.kubebuilder.io/version: v0.8.0
  creationTimestamp: "2022-05-25T12:27:27Z"
  generation: 1
  name: volumesnapshots.snapshot.storage.k8s.io

@gtxu
Contributor

gtxu commented May 25, 2022

Hi @phuongatemc @sgatdell @tgip-work, this PR will fix the API version issue in the sidecar. Updating the snapshot controller CRDs after this PR will work; for the long term we will bump our snapshotter sidecar version to make sure the v1 API works.

@keerthana1608
Author

keerthana1608 commented May 26, 2022

@gtxu
I picked the latest snapshot CRDs from https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd.
Still seeing the same issue

CSI snapshotter logs:
I0526 06:13:41.908757 1 snapshot_controller_base.go:111] Starting CSI snapshotter
E0526 06:13:41.911982 1 reflector.go:127] k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1beta1.VolumeSnapshotContent: failed to list *v1beta1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
E0526 06:13:41.912034 1 reflector.go:127] k8s.io/[email protected]/tools/cache/reflector.go:156: Failed to watch *v1beta1.VolumeSnapshotClass: failed to list *v1beta1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
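
When the log is long, it can help to reduce it to the distinct API types the sidecar failed to list. A minimal sketch, assuming the reflector log format shown above; failed_list_types is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: read log lines on stdin and print each distinct
# "<version>.<Kind>" that the reflector reported it could not list.
failed_list_types() {
  sed -n 's/.*failed to list \*\([A-Za-z0-9]*\.[A-Za-z]*\):.*/\1/p' | sort -u
}
```

It can be fed from the sidecar log, for example `kubectl logs -n kube-system <controller-pod> -c csi-snapshotter | failed_list_types`, where `<controller-pod>` is a placeholder for the actual pod name.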

@tgip-work

Hi @gtxu

as @keerthana1608 did, I also used the latest CRDs from https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd.

and I still see the same issue as before.

csi-snapshotter E0527 09:25:08.324162       1 reflector.go:127] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: Failed to watch *v1beta1.VolumeSnapshotContent: failed to list *v1beta1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
csi-snapshotter E0527 09:25:13.834563       1 reflector.go:127] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: Failed to watch *v1beta1.VolumeSnapshotClass: failed to list *v1beta1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)

My setup:

Maybe I am totally wrong, but the sidecar container csi-snapshotter (for the managed addon) is using image 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/csi-snapshotter:v3.0.3. Is this image identical to k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.3 ?
The helm chart is using k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.3

According to the docs this is an outdated version! Most current image is k8s.gcr.io/sig-storage/csi-snapshotter:v6.0.1 which has not been updated in the docs yet.

I will try to run the sidecar container with k8s.gcr.io/sig-storage/csi-snapshotter:v6.0.1 and will give an update here.

In addition to the CSI Snapshotter issue, there is an event on the VolumeSnapshotContent. Is a permission missing?

  Type     Reason                  Age                From                             Message
  ----     ------                  ----               ----                             -------
  Warning  SnapshotCreationFailed  27m (x4 over 27m)  csi-snapshotter ebs.csi.aws.com  Failed to create snapshot: failed to add VolumeSnapshotBeingCreated annotation on the content snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2: "snapshot controller failed to update snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2 on API server: volumesnapshotcontents.snapshot.storage.k8s.io \"snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2\" is forbidden: User \"system:serviceaccount:kube-system:ebs-csi-controller-sa\" cannot patch resource \"volumesnapshotcontents\" in API group \"snapshot.storage.k8s.io\" at the cluster scope"

@tgip-work

tgip-work commented May 27, 2022

SHORT UPDATE and good news!

A snapshot has been created:

Events:
  Type    Reason           Age    From                 Message
  ----    ------           ----   ----                 -------
  Normal  SnapshotCreated  2m10s  snapshot-controller  Snapshot default/ebs-volume-snapshot was successfully created by the CSI driver.
  Normal  SnapshotReady    105s   snapshot-controller  Snapshot default/ebs-volume-snapshot is ready to use.

Summary: What I needed to change:

  1. Using the latest Helm chart 2.6.8 with csi-snapshotter image v6.0.1 (instead of v3.0.3, which is the default). The managed EKS add-on does not seem to work, as I cannot customize the image that is used for the CSI Snapshotter sidecar container.
  2. Manually added verb "patch" to ClusterRole ebs-external-snapshotter-role

Step by Step

1. My helm values file myvalues.yaml :

controller:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: <ROLE_ARN>  # ARN of the IAM role; see https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html

sidecars:
  snapshotter:
    image:
      tag: "v6.0.1"

2. Helm Installation

helm upgrade --install aws-ebs-csi-driver \
    --namespace kube-system \
    --version 2.6.8 \
    -f myvalues.yaml \
    aws-ebs-csi-driver/aws-ebs-csi-driver

3. ClusterRole ebs-external-snapshotter-role

Due to error

  Type     Reason                  Age                From                             Message
  ----     ------                  ----               ----                             -------
  Warning  SnapshotCreationFailed  27m (x4 over 27m)  csi-snapshotter ebs.csi.aws.com  Failed to create snapshot: failed to add VolumeSnapshotBeingCreated annotation on the content snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2: "snapshot controller failed to update snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2 on API server: volumesnapshotcontents.snapshot.storage.k8s.io \"snapcontent-cfca045d-559b-4075-9200-ba6270fe59e2\" is forbidden: User \"system:serviceaccount:kube-system:ebs-csi-controller-sa\" cannot patch resource \"volumesnapshotcontents\" in API group \"snapshot.storage.k8s.io\" at the cluster scope"

I added the verb patch to resource volumesnapshotcontents - also see #1243

- apiGroups:
  - snapshot.storage.k8s.io
  resources:
  - volumesnapshotcontents
  verbs:
  - create
  - get
  - list
  - watch
  - update
  - delete
  - patch

4. Then I had to restart the pods ebs-csi-controller-* in namespace kube-system so that the ClusterRole change took effect as quickly as possible.
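
Steps 3 and 4 could be verified and applied from the command line roughly like this. It is a sketch under assumptions: the Deployment name ebs-csi-controller is inferred from the pod name prefix, the `--as` flag needs impersonation rights, and the function names are hypothetical.

```shell
# Hypothetical helper: build the RBAC user name of a service account
# from its namespace ($1) and name ($2).
sa_user() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Ask the API server whether that service account may patch
# VolumeSnapshotContents. Prints "yes" or "no"; needs kubectl.
can_patch_contents() {
  kubectl auth can-i patch volumesnapshotcontents.snapshot.storage.k8s.io \
    --as="$(sa_user "$1" "$2")"
}

# Restart the controller pods. The Deployment name is an assumption.
restart_controller() {
  kubectl -n kube-system rollout restart deployment ebs-csi-controller
}
```

For the service account named in the error event, `can_patch_contents kube-system ebs-csi-controller-sa` should answer "yes" once the patch verb has been added to the ClusterRole.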

Appendix

If you want the Helm release to look exactly like the EKS managed add-on (except for 1. aws-ebs-csi-driver 1.6.2 instead of 1.6.1 and 2. csi-snapshotter 6.0.1 instead of 3.0.3), you can use these Helm values.

image:
  repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/aws-ebs-csi-driver

controller:
  k8sTagClusterId: <CLUSTER_ID>
  region: <REGION>
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: <ROLE_ARN>
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - ebs-csi-controller
          topologyKey: kubernetes.io/hostname
        weight: 100
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 10m
      memory: 40Mi

node:
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 10m
      memory: 40Mi


sidecars:
  provisioner:
    image:
      repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/csi-provisioner
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 40Mi
  attacher:
    image:
      repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/csi-attacher
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 40Mi
  snapshotter:
    image:
      #repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/csi-snapshotter
      repository: k8s.gcr.io/sig-storage/csi-snapshotter
      tag: "v6.0.1"
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 40Mi
  resizer:
    image:
      repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/csi-resizer
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 40Mi
  livenessProbe:
    image:
      repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/livenessprobe
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 40Mi
  nodeDriverRegistrar:
    image:
      repository: 602401143452.dkr.ecr.<REGION>.amazonaws.com/eks/csi-node-driver-registrar

@gtxu
Contributor

gtxu commented May 27, 2022

Hi @tgip-work, thanks for the attempt and the detailed description of the solution. The latest snapshot controller (deployed with the CRDs) includes several changes that do not work well with external-snapshotter sidecar versions lower than v6.0.0. We will update the Helm chart to use the latest external-snapshotter after testing.
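
To see which side of that v6.0.0 boundary a running sidecar is on, something like the following may help. This is a sketch, not the project's tooling: the Deployment name ebs-csi-controller is assumed from the pod name prefix, and both function names are hypothetical.

```shell
# Hypothetical helper: extract the major version from an image reference
# such as k8s.gcr.io/sig-storage/csi-snapshotter:v6.0.1.
image_major() {
  tag=${1##*:}      # keep what follows the last colon, e.g. v6.0.1
  tag=${tag#v}      # drop a leading "v"
  printf '%s\n' "${tag%%.*}"
}

# Print the image used by the csi-snapshotter container. Needs kubectl;
# the Deployment name is an assumption.
snapshotter_image() {
  kubectl -n kube-system get deployment ebs-csi-controller \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-snapshotter")].image}'
}
```

`image_major "$(snapshotter_image)"` then gives a number that can be compared against 6.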

@gtxu
Contributor

gtxu commented May 27, 2022

Also, if you install and manage your EBS CSI driver with Helm, I currently recommend pulling the sidecar images from GCR:

k8s.gcr.io/sig-storage/livenessprobe
k8s.gcr.io/sig-storage/csi-node-driver-registrar
k8s.gcr.io/sig-storage/csi-resizer
k8s.gcr.io/sig-storage/csi-attacher
k8s.gcr.io/sig-storage/csi-provisioner
k8s.gcr.io/sig-storage/csi-snapshotter

@J1a-wei

J1a-wei commented May 31, 2022

@tgip-work I tried this and am still seeing the same issue: ebs-csi-controller/csi-snapshotter can't find the snapshot.

@J1a-wei

J1a-wei commented May 31, 2022

I tried the following and it works well:

Applied an older version of the CRDs, v6.0.0-rc4.

Used csi-snapshotter image v3.0.3.
Installed the driver with Helm, chart version 2.6.8.

@keerthana1608 @gtxu

@keerthana1608
Author

Updated to the latest snapshotter sidecars and added the verb patch to the resource volumesnapshotcontents.
It works fine.

torredil added a commit to torredil/aws-ebs-csi-driver that referenced this issue Jun 1, 2022