Can't start v1 docker image "panic: runtime error: invalid memory address or nil pointer dereference" #876

Closed
lens0021 opened this issue May 9, 2021 · 5 comments · Fixed by #897
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lens0021

lens0021 commented May 9, 2021

/kind bug

What happened?

The amazon/aws-ebs-csi-driver:v1.0.0 and amazon/aws-ebs-csi-driver:v1.0.0-amazonlinux Docker images cannot start.

$ docker run --rm amazon/aws-ebs-csi-driver:v1.0.0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12a42e8]

goroutine 1 [running]:
k8s.io/client-go/kubernetes.NewForConfig(0x0, 0x1b52201, 0x4000523c80, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/vendor/k8s.io/client-go/kubernetes/clientset.go:395 +0x28
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud.NewMetadata(0x5, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata.go:84 +0xb4
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newControllerService(0x400007cfa0, 0x400057fb60, 0x0, 0x4000405540, 0x40004afe08)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/controller.go:81 +0x150
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0x40004aff30, 0x7, 0x7, 0x400057f980, 0x28e3a00, 0x1b522e0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:96 +0x270
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x190
$ docker run --rm amazon/aws-ebs-csi-driver:v1.0.0-amazonlinux
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12a42e8]

goroutine 1 [running]:
k8s.io/client-go/kubernetes.NewForConfig(0x0, 0x1b52201, 0x400003bd40, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/vendor/k8s.io/client-go/kubernetes/clientset.go:395 +0x28
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud.NewMetadata(0x5, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata.go:84 +0xb4
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newControllerService(0x40000bc550, 0x400057fb60, 0x0, 0x40000432f0, 0x40001cbe08)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/controller.go:81 +0x150
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0x40001cbf30, 0x7, 0x7, 0x400057f9e0, 0x28e3a00, 0x1b522e0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:96 +0x270
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x190

Adding -e "AWS_REGION=xx-xxxx-x" does not help.

What you expected to happen?

I expected it to start, as v0.10.1 does:

$ docker run --rm amazon/aws-ebs-csi-driver:v0.10.1
I0509 03:00:01.501769       1 driver.go:71] Driver: ebs.csi.aws.com Version: v0.10.1
W0509 03:00:04.749582       1 metadata.go:136] Failed to parse the outpost arn: 
W0509 03:00:07.993771       1 metadata.go:136] Failed to parse the outpost arn: 
I0509 03:00:08.001284       1 driver.go:141] Listening for connections on address: &net.UnixAddr{Name:"/tmp/csi.sock", Net:"unix"}

How to reproduce it (as minimally and precisely as possible)?

  1. Launch an EC2 instance (in this case, Amazon Linux 2 on a t4g.micro; full Terraform code)
  2. Install and start Docker as described in the Developer Guide.
  3. Pull the docker image: docker pull amazon/aws-ebs-csi-driver:v1.0.0
  4. Execute: docker run amazon/aws-ebs-csi-driver:v1.0.0

Environment

  • Driver version: v1.0.0
  • Docker version:
    • Client: 20.10.4
    • Server: 20.10.4
    • containerd: 1.4.4
  • uname -a: Linux ip-172-31-26-55.ap-northeast-1.compute.internal 4.14.231-173.360.amzn2.aarch64 #1 SMP Mon Apr 19 23:20:36 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  • AMI name: amzn2-ami-minimal-hvm-2.0.20210421.0-arm64-ebs
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 9, 2021
@AndyXiangLi
Contributor

Hi @lens0021, for v1.0.0 we added a new service account for the driver. Can you confirm you created the clusterrole, clusterrolebinding, and serviceaccount in your cluster?

@AndyXiangLi AndyXiangLi added triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 10, 2021
@lens0021
Author

Oh, I didn't. Before I tested with 'docker run', I had tried to run the driver using HashiCorp Nomad as described here and failed.
Then, is it fine and expected that there is no way to run the driver with a docker command directly since v1.0.0? If so, I will close this issue.

@tgross

tgross commented May 14, 2021

Hi @AndyXiangLi! I'm one of the developers of Nomad. I just took a pass through the section of the code in question, and it looks like when the InClusterConfig call is made we get a nil config back, and the ErrNotInCluster error that gets returned is ignored. Handling ErrNotInCluster would avoid the panic, but it looks like the following sections of the code rely on the config being there.

Making the query of the EC2 metadata service strictly dependent on being in a K8s cluster is unfortunate. Any chance that could be made opt-in when K8s is available, falling back to a typical HTTP scrape of the EC2 metadata endpoint when it's not?
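
To illustrate the shape of that suggestion, here is a minimal Go sketch, assuming the metadata lookup can be restructured: it handles rest.ErrNotInCluster instead of passing a nil *rest.Config to kubernetes.NewForConfig, and falls back to a plain IMDS query through the AWS SDK when not running in a cluster. This is not the driver's actual code; the structure and names are illustrative only.

package main

import (
	"errors"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	switch {
	case err == nil:
		// Running inside Kubernetes: build the clientset and query the API
		// server for node metadata (the v1.0.0 behaviour).
		clientset, err := kubernetes.NewForConfig(cfg)
		if err != nil {
			log.Fatalf("building clientset: %v", err)
		}
		_ = clientset // clientset.CoreV1().Nodes().Get(...) would go here
	case errors.Is(err, rest.ErrNotInCluster):
		// Not in a cluster: fall back to the EC2 instance metadata service
		// instead of panicking on a nil config.
		sess := session.Must(session.NewSession())
		imds := ec2metadata.New(sess)
		if !imds.Available() {
			log.Fatal("neither in-cluster config nor EC2 instance metadata is available")
		}
		doc, err := imds.GetInstanceIdentityDocument()
		if err != nil {
			log.Fatalf("reading instance identity document: %v", err)
		}
		fmt.Printf("instance %s in %s\n", doc.InstanceID, doc.Region)
	default:
		log.Fatalf("loading in-cluster config: %v", err)
	}
}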

@AndyXiangLi
Contributor

Hi @tgross, the reason we made this change is to avoid exposing instance metadata at the pod level, and we also suggest that customers disable the instance metadata service on their clusters for security reasons. But what you mentioned makes sense: the driver should work if the instance metadata service is available but the cluster config is not. NewMetadata requires the k8s config; if we only set up the k8s clientset when instance metadata is not available, that should solve this issue. @wongma7 @vdhanan, what do you think?

@AndyXiangLi AndyXiangLi added kind/bug Categorizes issue or PR as related to a bug. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels May 17, 2021
@tgross

tgross commented May 18, 2021

NewMetadata requires the k8s config; if we only set up the k8s clientset when instance metadata is not available, that should solve this issue

That will solve the crash. It would also need a no-op for the call to clientset.CoreV1().Nodes().Get() so that it doesn't just error out later instead of panicking.
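
As a hedged sketch of that no-op (illustrative only; lookupNode is a hypothetical helper, and the context-taking Get signature assumes a recent client-go), the lookup could simply be skipped when no clientset exists, letting the caller fall back to IMDS-derived values:

package metadata

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupNode returns the Node object when a clientset is available, and
// (nil, nil) otherwise so the caller can use IMDS-derived values instead
// of erroring out.
func lookupNode(ctx context.Context, clientset kubernetes.Interface, nodeName string) (*v1.Node, error) {
	if clientset == nil {
		return nil, nil // no API server access: skip the lookup
	}
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("getting node %q: %w", nodeName, err)
	}
	return node, nil
}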
