Can't start v1 docker image "panic: runtime error: invalid memory address or nil pointer dereference" #876

Closed
lens0021 opened this issue May 9, 2021 · 5 comments · Fixed by #897
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lens0021

lens0021 commented May 9, 2021

/kind bug

What happened?

The amazon/aws-ebs-csi-driver:v1.0.0 and amazon/aws-ebs-csi-driver:v1.0.0-amazonlinux Docker images cannot start.

$ docker run --rm amazon/aws-ebs-csi-driver:v1.0.0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12a42e8]

goroutine 1 [running]:
k8s.io/client-go/kubernetes.NewForConfig(0x0, 0x1b52201, 0x4000523c80, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/vendor/k8s.io/client-go/kubernetes/clientset.go:395 +0x28
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud.NewMetadata(0x5, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata.go:84 +0xb4
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newControllerService(0x400007cfa0, 0x400057fb60, 0x0, 0x4000405540, 0x40004afe08)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/controller.go:81 +0x150
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0x40004aff30, 0x7, 0x7, 0x400057f980, 0x28e3a00, 0x1b522e0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:96 +0x270
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x190
$ docker run --rm amazon/aws-ebs-csi-driver:v1.0.0-amazonlinux
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12a42e8]

goroutine 1 [running]:
k8s.io/client-go/kubernetes.NewForConfig(0x0, 0x1b52201, 0x400003bd40, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/vendor/k8s.io/client-go/kubernetes/clientset.go:395 +0x28
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud.NewMetadata(0x5, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata.go:84 +0xb4
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newControllerService(0x40000bc550, 0x400057fb60, 0x0, 0x40000432f0, 0x40001cbe08)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/controller.go:81 +0x150
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0x40001cbf30, 0x7, 0x7, 0x400057f9e0, 0x28e3a00, 0x1b522e0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:96 +0x270
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x190

Adding -e "AWS_REGION=xx-xxxx-x" does not help.

What you expected to happen?

I expected it to start, as v0.10.1 does:

$ docker run --rm amazon/aws-ebs-csi-driver:v0.10.1
I0509 03:00:01.501769       1 driver.go:71] Driver: ebs.csi.aws.com Version: v0.10.1
W0509 03:00:04.749582       1 metadata.go:136] Failed to parse the outpost arn: 
W0509 03:00:07.993771       1 metadata.go:136] Failed to parse the outpost arn: 
I0509 03:00:08.001284       1 driver.go:141] Listening for connections on address: &net.UnixAddr{Name:"/tmp/csi.sock", Net:"unix"}

How to reproduce it (as minimally and precisely as possible)?

  1. Launch an EC2 instance (in this case, Amazon Linux 2 on a t4g.micro; full Terraform code)
  2. Install and start Docker as described in the Developer Guide.
  3. Pull the docker image: docker pull amazon/aws-ebs-csi-driver:v1.0.0
  4. Execute: docker run amazon/aws-ebs-csi-driver:v1.0.0

Environment

  • Driver version: v1.0.0
  • Docker version:
    • Client: 20.10.4
    • Server: 20.10.4
    • containerd: 1.4.4
  • uname -a: Linux ip-172-31-26-55.ap-northeast-1.compute.internal 4.14.231-173.360.amzn2.aarch64 #1 SMP Mon Apr 19 23:20:36 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  • AMI name: amzn2-ami-minimal-hvm-2.0.20210421.0-arm64-ebs
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 9, 2021
@AndyXiangLi
Contributor

Hi @lens0021, for v1.0.0 we added a new service account for the driver. Can you confirm you created the clusterrole, clusterrolebinding, and serviceaccount in your cluster?

@AndyXiangLi AndyXiangLi added triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 10, 2021
@lens0021
Author

Oh, I didn't. Before I tested with 'docker run', I had tried to run the driver using HashiCorp Nomad as described here and failed.
Then, is it fine and expected that there is no way to run the driver with a docker command directly since v1.0.0? If so, I will close this issue.

@tgross

tgross commented May 14, 2021

Hi @AndyXiangLi! I'm one of the developers of Nomad. I just took a pass through the section of the code in question, and it looks like when the InClusterConfig call is made we get a nil config back, and the ErrNotInCluster error that gets returned is ignored. Handling ErrNotInCluster would avoid the panic, but it looks like the following sections of the code rely on the config being there.

Making the query of the EC2 metadata service strictly dependent on being in a K8s cluster is unfortunate. Any chance that could be made opt-in when K8s is available, falling back to a typical HTTP scrape of the EC2 metadata endpoint when it's not?
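
To illustrate the shape of that suggestion, here is a minimal Go sketch, assuming the metadata lookup can be restructured: it handles rest.ErrNotInCluster instead of passing a nil *rest.Config to kubernetes.NewForConfig, and falls back to a plain IMDS query through the AWS SDK when not running in a cluster. This is not the driver's actual code; the structure and names are illustrative only.

package main

import (
	"errors"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	switch {
	case err == nil:
		// Running inside Kubernetes: build the clientset and query the API
		// server for node metadata (the v1.0.0 behaviour).
		clientset, err := kubernetes.NewForConfig(cfg)
		if err != nil {
			log.Fatalf("building clientset: %v", err)
		}
		_ = clientset // clientset.CoreV1().Nodes().Get(...) would go here
	case errors.Is(err, rest.ErrNotInCluster):
		// Not in a cluster: fall back to the EC2 instance metadata service
		// instead of panicking on a nil config.
		sess := session.Must(session.NewSession())
		imds := ec2metadata.New(sess)
		if !imds.Available() {
			log.Fatal("neither in-cluster config nor EC2 instance metadata is available")
		}
		doc, err := imds.GetInstanceIdentityDocument()
		if err != nil {
			log.Fatalf("reading instance identity document: %v", err)
		}
		fmt.Printf("instance %s in %s\n", doc.InstanceID, doc.Region)
	default:
		log.Fatalf("loading in-cluster config: %v", err)
	}
}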

@AndyXiangLi
Contributor

Hi @tgross, the reason we made this change is to avoid exposing instance metadata at the pod level, and we also suggest that customers disable the instance metadata service on their clusters for security reasons. But what you mentioned makes sense: the driver should work if the instance metadata service is available but the cluster config is not. NewMetadata requires the k8s config; if we only set up the k8s clientset when instance metadata is not available, that should solve this issue. @wongma7 @vdhanan, what do you think?

@AndyXiangLi AndyXiangLi added kind/bug Categorizes issue or PR as related to a bug. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels May 17, 2021
@tgross

tgross commented May 18, 2021

NewMetadata requires the k8s config; if we only set up the k8s clientset when instance metadata is not available, that should solve this issue

That will solve the crash. It would also need a no-op for the call to clientset.CoreV1().Nodes().Get() so that it doesn't just error out later instead of panicking.
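
As a hedged sketch of that no-op (illustrative only; lookupNode is a hypothetical helper, and the context-taking Get signature assumes a recent client-go), the lookup could simply be skipped when no clientset exists, letting the caller fall back to IMDS-derived values:

package metadata

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupNode returns the Node object when a clientset is available, and
// (nil, nil) otherwise so the caller can use IMDS-derived values instead
// of erroring out.
func lookupNode(ctx context.Context, clientset kubernetes.Interface, nodeName string) (*v1.Node, error) {
	if clientset == nil {
		return nil, nil // no API server access: skip the lookup
	}
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("getting node %q: %w", nodeName, err)
	}
	return node, nil
}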
