Can't create volume with full-private cluster #889

Closed
yufan022 opened this issue May 17, 2021 · 6 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@yufan022

yufan022 commented May 17, 2021

/kind bug

What happened?
The EBS CSI driver can't create a volume in a fully private EKS cluster.
Reference docs: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

I got some error logs:

I0517 03:09:50.566279       1 driver.go:72] Driver: ebs.csi.aws.com Version: v1.0.0
I0517 03:09:50.566574       1 driver.go:142] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
I0517 03:09:50.694431       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0517 03:09:50.801730       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0517 03:09:50.802216       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0517 03:09:50.930425       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0517 03:09:51.218834       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0517 03:10:58.144863       1 controller.go:101] CreateVolume: called with args {Name:pvc-49a940b3-2063-4a9a-b2d4-27b6500ca908 CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0517 03:11:08.144104       1 driver.go:119] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0517 03:11:09.144978       1 controller.go:101] CreateVolume: called with args {Name:pvc-49a940b3-2063-4a9a-b2d4-27b6500ca908 CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0517 03:11:19.144697       1 driver.go:119] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
[root@ip-10-0-1-46 dynamic-provisioning]# kubectl get pvc
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ebs-claim   Pending                                      ebs-sc         10m
[root@ip-10-0-1-46 dynamic-provisioning]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
app    0/1     Pending   0          10m

My EKS cluster config (YAML):

privateCluster:
  enabled: true
  additionalEndpointServices:
  - "autoscaling"
  - "logs"
  - "cloudformation"

Default VPC endpoints created by eksctl:
[image]

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
[root@ip-10-0-1-46 dynamic-provisioning]# kubectl version
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:21:03Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version:
k8s-ci-robot added the kind/bug label on May 17, 2021
@AndyXiangLi
Contributor

Hi @yufan022, there could be multiple causes for this issue. Can you enable debug mode and restart the driver? Basically, add --aws-sdk-debug-log=true to the controller manifests. For more info
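
A minimal sketch of where that flag might go, assuming the controller runs as a Deployment whose driver container is named ebs-plugin (the name that appears in the kubectl logs command later in this thread). Only --aws-sdk-debug-log=true comes from the comment above; the other args and the surrounding structure are assumptions and may differ in your manifest.

```yaml
# Fragment of the controller Deployment, not a complete manifest.
# Assumption: the driver container is named ebs-plugin.
spec:
  template:
    spec:
      containers:
        - name: ebs-plugin
          args:
            - --endpoint=$(CSI_ENDPOINT)   # assumed existing arg
            - --logtostderr                # assumed existing arg
            - --aws-sdk-debug-log=true     # the debug flag suggested above
```

After changing the args, the controller pods need to restart before the AWS SDK request details show up in the ebs-plugin container logs.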

AndyXiangLi added the triage/needs-information label on May 18, 2021
@yufan022
Author

yufan022 commented May 19, 2021

[root@ip-10-0-1-46 dynamic-provisioning]# kubectl logs -n kube-system ebs-csi-controller-6d66756ccf-r4wls ebs-plugin
I0519 07:19:00.440150       1 driver.go:72] Driver: ebs.csi.aws.com Version: v1.0.0
I0519 07:19:00.440411       1 driver.go:142] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
I0519 07:19:00.561712       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0519 07:19:00.649988       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0519 07:19:00.650450       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0519 07:19:00.755134       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0519 07:19:01.030537       1 controller.go:360] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0519 07:19:35.629872       1 controller.go:101] CreateVolume: called with args {Name:pvc-6a22713f-8ad9-40c2-9b36-25e7b29175fd CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
2021/05/19 07:19:35 DEBUG: Request sts/AssumeRoleWithWebIdentity Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: sts.amazonaws.com
User-Agent: aws-sdk-go/1.35.37 (go1.15.6; linux; amd64)
Content-Length: 1278
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: gzip


-----------------------------------------------------
2021/05/19 07:19:45 DEBUG: Sign Request ec2/DescribeVolumes failed, not retrying, error RequestCanceled: request context canceled
caused by: context canceled
E0519 07:19:45.628809       1 driver.go:119] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0519 07:19:46.629698       1 controller.go:101] CreateVolume: called with args {Name:pvc-6a22713f-8ad9-40c2-9b36-25e7b29175fd CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0519 07:19:56.629468       1 driver.go:119] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
2021/05/19 07:19:56 DEBUG: Sign Request ec2/DescribeVolumes failed, not retrying, error RequestCanceled: request context canceled
caused by: context deadline exceeded
I0519 07:19:58.630290       1 controller.go:101] CreateVolume: called with args {Name:pvc-6a22713f-8ad9-40c2-9b36-25e7b29175fd CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"ap-northeast-1c" > segments:<key:"topology.kubernetes.io/zone" value:"ap-northeast-1c" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
2021/05/19 07:20:05 DEBUG: Send Request sts/AssumeRoleWithWebIdentity failed, attempt 0/8, error RequestError: send request failed
caused by: Post "https://sts.amazonaws.com/": dial tcp 54.239.29.25:443: i/o timeout
2021/05/19 07:20:05 DEBUG: Request sts/AssumeRoleWithWebIdentity Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST / HTTP/1.1
Host: sts.amazonaws.com
User-Agent: aws-sdk-go/1.35.37 (go1.15.6; linux; amd64)
Content-Length: 1278
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: gzip


-----------------------------------------------------
2021/05/19 07:20:08 DEBUG: Sign Request ec2/DescribeVolumes failed, not retrying, error RequestCanceled: request context canceled
caused by: context canceled
E0519 07:20:08.630023       1 driver.go:119] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled

@AndyXiangLi I guess this error is caused by the cluster being fully private. I tried deploying in a cluster with public subnets, and the CSI driver worked.

[image]

These endpoints were created by eksctl.

@xposix

xposix commented May 19, 2021

I'm about to configure a cluster in a similar manner, so I'm interested in the solution to this.

@yufan022
Author

region: "ap-northeast-1"

controller:
  extraVars:
    AWS_STS_REGIONAL_ENDPOINTS: regional

@AndyXiangLi
Found it here. Setting these environment variables in the chart values works. Thanks for the help.
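
For installs that do not use the Helm chart, the same settings can presumably be applied as environment variables on the driver container. This is a sketch, not taken from the chart: it assumes the container is named ebs-plugin and that the chart's region value corresponds to the AWS_REGION variable. The debug log above shows the SDK calling the global sts.amazonaws.com endpoint and timing out. With AWS_STS_REGIONAL_ENDPOINTS set to regional, the AWS SDK targets sts.ap-northeast-1.amazonaws.com instead, which a private cluster can reach if the VPC has an STS interface endpoint.

```yaml
# Fragment of the controller Deployment, not a complete manifest.
# Assumptions: container name ebs-plugin; region passed as AWS_REGION.
spec:
  template:
    spec:
      containers:
        - name: ebs-plugin
          env:
            - name: AWS_REGION
              value: ap-northeast-1
            # Use the regional STS endpoint instead of the global one.
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```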

@xposix It's already supported in this PR

@AndyXiangLi
Contributor

Thanks, I will close this issue then
/close

@k8s-ci-robot
Contributor

@AndyXiangLi: Closing this issue.

