
EBS CSI Driver does not work on arm64 based instances #604

Closed · gkrizek opened this issue Nov 3, 2020 · 26 comments · Fixed by #707
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments

gkrizek commented Nov 3, 2020

/kind bug

What happened?

ARM support was added in #527 for the driver's own Docker images. However, the deployment manifests in this repo reference sidecar Docker images that are not multi-arch, so the deployment gets stuck in a crash loop and ARM still won't work.

These images:
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/963eccfc404f56540f025a93fd4966fa6a73458f/deploy/kubernetes/overlays/stable/kustomization.yaml

The snapshotter and resizer images as well. The ones from quay.io are not multi-arch.
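A quick way to check whether a given tag is multi-arch is to inspect its manifest list; a hedged sketch (the tag below is illustrative, not necessarily the exact one in the base):

# A multi-arch tag lists several platform entries (amd64, arm64, ...).
docker manifest inspect quay.io/k8scsi/csi-provisioner:v1.6.0 | grep architecture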

What you expected to happen?

A successful deployment of the EBS CSI Driver on ARM based servers like m6g.

How to reproduce it (as minimally and precisely as possible)?

Try to deploy the EBS CSI on ARM based systems.

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-eks-7c9bda", GitCommit:"7c9bda52c425d0d56d7b93f1377a826b4132c05c", GitTreeState:"clean", BuildDate:"2020-08-28T23:04:33Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version:
    v0.7.0
@k8s-ci-robot added the kind/bug label on Nov 3, 2020
gkrizek (Author) commented Nov 3, 2020

Some Kubernetes output:

kube-system                             ebs-csi-controller-6ccc47b87c-9wgdz                      1/6     CrashLoopBackOff   29         3m36s
kube-system                             ebs-csi-node-sz75j                                       1/3     CrashLoopBackOff   18         8m8s
$ kubectl logs ebs-csi-controller-6ccc47b87c-9wgdz -n kube-system -c liveness-probe
standard_init_linux.go:211: exec user process caused "exec format error"
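An exec format error means the container binary was built for a different CPU architecture than the node. A minimal check to compare the two (standard kubectl; the arch label is set by the kubelet):

# Show each node's CPU architecture; arm64 nodes need arm64 images.
kubectl get nodes -L kubernetes.io/arch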

Chili-Man commented

This would be great, as we want to run this on the Graviton EC2 instances.

gkrizek (Author) commented Dec 2, 2020

Totally. I rebuilt all the necessary images myself with ARM support to get them to work with Graviton2, but they're in my private ECR, so I don't have a link to share.
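For anyone rebuilding the images themselves, a minimal sketch with docker buildx (the registry name is a placeholder; adjust the tag to the release you need):

# Build and push a manifest list covering both architectures.
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 \
  -t <your-registry>/aws-ebs-csi-driver:v0.7.0 --push .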

StevenLou commented

Any update on this issue?

Chili-Man commented Dec 9, 2020

It looks like, with the recent commits, they now publish ARM container images on Docker Hub: https://hub.docker.com/r/amazon/aws-ebs-csi-driver/tags?page=1&ordering=last_updated

gkrizek (Author) commented Dec 9, 2020

The EBS CSI driver images are, but that's not the issue here. The referenced Kubernetes sidecar images that get deployed with this CSI driver are not.

Chili-Man commented

Ahh, that's right; none of the images sourced from Quay.io have ARM64 builds. However, newer versions of the CSI sidecar images do have ARM64 builds, so it appears AWS should upgrade their CSI dependencies to something more recent.

ayberk (Contributor) commented Dec 9, 2020

We have a backlog item for this, but please thumbs-up this issue if it affects you. We take those into account when prioritizing :)

ayberk (Contributor) commented Dec 11, 2020

We've merged a new kustomization overlay with multi-arch images: #653. Unfortunately some of the images are pulled from Docker Hub, but it should unblock you for now.

My initial tests were successful, please let me know if you encounter any issues.
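Applying the overlay from a checkout of this repo should look roughly like this (a sketch; the arm64 overlay path matches the one referenced later in this thread):

kubectl apply -k deploy/kubernetes/overlays/stable/arm64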

shrivastavshubham34 commented

I'm sorry, but I'm not sure whether I'm supposed to pull v0.8 or v0.7 of k8s.gcr.io/provider-aws/aws-ebs-csi-driver.
Does v0.7.0 work with multi-arch?

ayberk (Contributor) commented Dec 20, 2020

@shrivastavshubham34 We're currently having issues with our v0.8, so I'd suggest using the arm overlay with the v0.7 image for now.

shrivastavshubham34 commented

    image_name: 'k8s.gcr.io/provider-aws/aws-ebs-csi-driver'
    image_tag: 'v0.7.0'

@ayberk It's still giving me

standard_init_linux.go:219: exec user process caused: exec format error

ayberk (Contributor) commented Dec 21, 2020

@shrivastavshubham34 Yeah, I was able to isolate the issue this morning, and it seems that for some reason the images are not being pushed as multi-arch to GCR.

For example, I was able to run amazon/aws-ebs-csi-driver:v0.8.0-amazonlinux and amazon/aws-ebs-csi-driver:v0.7.1 on my Graviton instance. You can pull them from Docker Hub. Alternatively, you can use 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-ebs-csi-driver:v0.7.1 if you prefer ECR.

I'll try to push the v0.8.0 image to ECR today.
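To confirm which architecture a pulled tag actually resolves to on the instance, a hedged check with standard docker commands:

docker pull amazon/aws-ebs-csi-driver:v0.7.1
docker image inspect --format '{{.Os}}/{{.Architecture}}' amazon/aws-ebs-csi-driver:v0.7.1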

shrivastavshubham34 commented

@ayberk Neither the ECR image nor amazon/aws-ebs-csi-driver:v0.7.1 works for me.

helm upgrade --install aws-ebs-csi-driver \
    --namespace kube-system \
    --set enableVolumeScheduling=true \
    --set enableVolumeResizing=true \
    --set enableVolumeSnapshot=true \
    --set image.repository='amazon/aws-ebs-csi-driver' \
    --set image.tag='v0.7.1' \
    aws-ebs-csi-driver/aws-ebs-csi-driver

ayberk (Contributor) commented Dec 21, 2020

Unfortunately you won't be able to install on ARM using the Helm chart, because the sidecar images it pulls aren't multi-arch. You'd need to use the arm64 kustomization overlay.

shrivastavshubham34 commented

@ayberk Ohh, now the earlier comments make sense.
Still, the given ECR and Docker Hub images don't work for me on EKS:

kubectl apply -k aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/arm64/
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../../base
images:
  - name: amazon/aws-ebs-csi-driver
    newTag: v0.7.1
  - name: quay.io/k8scsi/csi-provisioner
    newName: raspbernetes/csi-external-provisioner
    newTag: "1.6.0"
  - name: quay.io/k8scsi/csi-attacher
    newName: raspbernetes/csi-external-attacher
    newTag: "2.2.0"
  - name: quay.io/k8scsi/livenessprobe
    newName: k8s.gcr.io/sig-storage/livenessprobe
    newTag: "v2.1.0"
  - name: quay.io/k8scsi/csi-node-driver-registrar
    newName: raspbernetes/csi-node-driver-registrar
    newTag: "1.3.0"
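Before applying, the overlay can be rendered to confirm the image substitutions took effect; a sketch using kubectl's built-in kustomize:

kubectl kustomize aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/arm64/ | grep 'image:'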

shrivastavshubham34 commented Dec 26, 2020

Unfortunately,

kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-0.8"

Gives the same error on amazon-eks-arm64-node-1.18-v20201211 (ami-03b83573d40dfcd0d):

standard_init_linux.go:219: exec user process caused: exec format error
ebs-csi-controller-7bbc66476-7p6g8   0/4     CrashLoopBackOff   28         11m
ebs-csi-controller-7bbc66476-tn8n9   0/4     CrashLoopBackOff   28         11m
ebs-csi-node-c9nnf                   0/3     CrashLoopBackOff   21         11m
ebs-csi-node-h4xlp                   0/3     CrashLoopBackOff   21         11m
ebs-csi-node-jt2f2                   0/3     CrashLoopBackOff   21         11m

Am I doing something wrong? Is there a forum or a Slack channel where I can place my query?
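One way to see exactly which images the crashing pods resolved to (a hedged sketch using a documented jsonpath pattern, narrowed to kube-system):

kubectl get pods -n kube-system -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u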

darewreck54 commented

Are there any updates on this?

ajaykmis commented

@ayberk I followed this thread and installed the EBS CSI driver, but the snapshot-controller is still failing for me. Checking the overlay code, I don't see a replacement image for the snapshot-controller here: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/deploy/kubernetes/overlays/stable/arm64/kustomization.yaml
Could you suggest whether we are supposed to find the other images ourselves?

kubectl get pod -n kube-system | grep ebs
ebs-csi-controller-85c48b7d6d-d6r2m 4/4 Running 0 19h
ebs-csi-controller-85c48b7d6d-qphx9 4/4 Running 0 19h
ebs-csi-node-4k84n 3/3 Running 0 19h
ebs-csi-node-6bzzk 3/3 Running 0 19h
ebs-csi-node-9mcx9 3/3 Running 0 19h
ebs-csi-node-f4gt6 3/3 Running 0 19h
ebs-csi-node-ffqn7 3/3 Running 0 19h
ebs-csi-node-hpm7j 3/3 Running 0 19h
ebs-csi-node-l8pcx 3/3 Running 0 19h
ebs-csi-node-pdr27 3/3 Running 0 19h
ebs-csi-node-q8v7n 3/3 Running 0 19h
ebs-csi-node-qs28g 3/3 Running 0 19h
ebs-snapshot-controller-0 0/1 CrashLoopBackOff 2 46s

kubectl logs ebs-snapshot-controller-0 -n kube-system
standard_init_linux.go:219: exec user process caused: exec format error

ajaykmis commented

Hi,

So, I realized the arm64 overlay doesn't include the snapshot-controller. I tried to do something similar to the alpha overlay by replacing the snapshot-controller images with an arm64 variant.
This is the one I'm using that seems to start for me: csiplugin/snapshot-controller:v2.1.0

But the snapshotting itself doesn't seem to work.
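For reference, a minimal sketch of such an image override in a kustomization.yaml; the base image name quay.io/k8scsi/snapshot-controller is an assumption, so check what the alpha overlay actually references:

images:
  - name: quay.io/k8scsi/snapshot-controller   # assumed base image name
    newName: csiplugin/snapshot-controller
    newTag: "v2.1.0"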

ayberk (Contributor) commented Jan 27, 2021

We have an issue open to remove the snapshotter, since technically we shouldn't be deploying it with the driver. We're also working on updating the sidecar images, so the arm overlay won't be needed soon.

darewreck54 commented Jan 27, 2021 via email

ayberk (Contributor) commented Jan 27, 2021

Unfortunately I don't. But I know this has been a lingering issue, and it's at the top of my list as soon as I have some cycles to work on the driver.

ayberk (Contributor) commented Jan 27, 2021

@ajaykmis Can you elaborate on how it's failing? Any logs/yamls would be helpful.

ajaykmis commented

@ayberk:

So, using the ARM overlay we were able to install the CSI driver, and now PVCs can be created using the CSI provisioner.

But the snapshotting yamls weren't included in the arm overlay, so I modified the alpha overlay and some yamls to include the correct images for the external-snapshotter.

Here's the diff:
alpha.txt

Pod output: we can see all the pods started running fine, so the images don't seem to be an issue.
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-44gnv 1/1 Running 0 6m50s
aws-node-4dwhx 1/1 Running 0 6m51s
aws-node-6m6gx 1/1 Running 0 6m51s
aws-node-8lgkm 1/1 Running 0 6m51s
aws-node-cbgrl 1/1 Running 0 6m50s
aws-node-jqj2z 1/1 Running 1 10d
aws-node-sjq47 1/1 Running 0 6m51s
aws-node-t7g4l 1/1 Running 0 6m51s
aws-node-txk4m 1/1 Running 0 6m51s
aws-node-zm9s8 1/1 Running 0 6m51s
coredns-556765db45-7dtb2 1/1 Running 0 7d1h
coredns-556765db45-z797t 1/1 Running 0 7d1h
ebs-csi-controller-87fbf4d87-gjlfl 5/5 Running 0 5m8s
ebs-csi-controller-87fbf4d87-xnjr7 5/5 Running 0 5m8s
ebs-csi-node-26rtd 3/3 Running 0 5m8s
ebs-csi-node-4zl7b 3/3 Running 0 5m8s
ebs-csi-node-59577 3/3 Running 0 5m8s
ebs-csi-node-7f8wn 3/3 Running 0 5m8s
ebs-csi-node-8r8m5 3/3 Running 0 5m8s
ebs-csi-node-cf7mh 3/3 Running 0 5m8s
ebs-csi-node-fmfkk 3/3 Running 0 5m8s
ebs-csi-node-mfgrb 3/3 Running 0 5m8s
ebs-csi-node-nv4vh 3/3 Running 0 5m8s
ebs-csi-node-vwr5j 3/3 Running 0 5m8s
ebs-snapshot-controller-0 1/1 Running 0 5m8s
kube-proxy-2tf8j 1/1 Running 0 6m50s
kube-proxy-5jc4q 1/1 Running 0 6m50s
kube-proxy-bvpc6 1/1 Running 0 10d
kube-proxy-crpl7 1/1 Running 0 6m50s
kube-proxy-ll9sc 1/1 Running 0 6m50s
kube-proxy-nx57l 1/1 Running 0 6m51s
kube-proxy-r7jnv 1/1 Running 0 6m50s
kube-proxy-t25r2 1/1 Running 0 6m50s
kube-proxy-xhktg 1/1 Running 0 6m50s
kube-proxy-zpdl7 1/1 Running 0 6m50s

Update: it's working fine now. I'm not sure what I had wrong, but I just tried it again and it works. Thanks for the help!

ayberk (Contributor) commented Jan 28, 2021

Awesome, glad it worked! Hopefully we'll have this fixed soon so you won't need an overlay.
