Unable to deploy driver - Failed getting project and zone #490

Closed

prashantokochavara opened this issue Apr 17, 2020 · 23 comments
Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

@prashantokochavara

I've double-checked all credentials, but I'm not sure why I keep hitting this issue with any version I deploy, stable or alpha.
[screenshot of the driver pod logs: 2020-04-17_12-33-44]

Any idea what could be going wrong here?

@msau42 (Contributor) commented Apr 17, 2020

Which version of the driver are you using?

Also, the error message is cut off; could you paste the entire "Failed to get cloud provider" error?

@msau42 (Contributor) commented Apr 17, 2020

I see you are using v0.7.0.

@prashantokochavara (Author) commented Apr 17, 2020

Correct, I'm installing the alpha version for snapshot feature support.
It happens with stable, dev, and alpha, for all versions 0.5.0 through 0.7.0, though.

Full output:

I0417 16:33:29.986931 1 main.go:67] Driver vendor version v0.7.0-gke.0
I0417 16:33:29.987006 1 gce.go:80] Using GCE provider config
I0417 16:33:29.987166 1 gce.go:125] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/cloud-sa.json
I0417 16:33:29.987176 1 gce.go:129] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.cancelCtx)(0xc000296300), conf:(*jwt.Config)(0xc000118780)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
F0417 16:33:31.212427 1 main.go:83] Failed to get cloud provider: Failed getting Project and Zone: Get http://169.254.169.254/computeMetadata/v1/instance/zone: dial tcp 169.254.169.254:80: connect: connection refused
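
The call that fails is the GCE instance metadata endpoint. As a quick check (a sketch only; it assumes curl is available inside the driver container, which it may not be), reachability from within the pod can be tested with:

curl -s -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/zone

A "connection refused" here matches the failure above; a zone path such as projects/&lt;project-number&gt;/zones/&lt;zone&gt; means the metadata server is reachable.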

@msau42 (Contributor) commented Apr 17, 2020

Do you have hostNetwork: true set in the DaemonSet spec?

@prashantokochavara (Author)

Nope, I do not. Do I need to add that to the spec somewhere?

@msau42 (Contributor) commented Apr 17, 2020

Yes, see:
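
For orientation, a minimal sketch of where the field goes in a node DaemonSet spec (the names and image tag here are illustrative placeholders, not necessarily the driver's actual manifest):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-gce-pd-node
spec:
  selector:
    matchLabels:
      app: gcp-compute-persistent-disk-csi-driver
  template:
    metadata:
      labels:
        app: gcp-compute-persistent-disk-csi-driver
    spec:
      hostNetwork: true   # lets the driver reach the GCE metadata server at 169.254.169.254
      containers:
        - name: gce-pd-driver
          image: gke.gcr.io/gcp-compute-persistent-disk-csi-driver:v0.7.0-gke.0   # placeholder tag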

@prashantokochavara (Author)

I tried that and am still facing the same issue.
This is an OpenShift environment that I am working with: 4.3 with Kubernetes 1.16. Are there any known supportability issues?

@msau42 (Contributor) commented Apr 17, 2020

@gnufied @jsafrane are you aware of any configuration that needs to be done in OpenShift to access the GCP metadata server?

@jsafrane (Contributor)

I'm not sure what the GCP metadata server is... Is it the link-local address used to get VM metadata? The DaemonSet pods must use hostNetwork: true.

OpenShift does not allow random pods to reach the VM metadata; we used to put some sensitive material there (I don't remember exactly what, some certificates?).

@jsafrane (Contributor)

There is nothing else OpenShift-specific...

@msau42 (Contributor) commented Apr 20, 2020

Yes, I mean the link-local address used to get VM metadata: 169.254.169.254:80: connect: connection refused

@prashantokochavara (Author)

@msau42 have we been able to confirm that non-OCP environments are not having this issue?

@msau42 (Contributor) commented Apr 20, 2020

Yes, we have CI running successfully in the kubetest GCP environment:
https://k8s-testgrid.appspot.com/provider-gcp-compute-persistent-disk-csi-driver#Kubernetes%20Master%20Driver%20Latest

@gnufied @jsafrane are you able to run the PD driver in your OCP environment?

@jsafrane (Contributor)

Yes, I am able to run e2e tests with the manifests from https://github.com/kubernetes/kubernetes/tree/master/test/e2e/testing-manifests/storage-csi/gce-pd on GCP.

@prashantokochavara (Author)

I'm reading another issue, from an AWS project, where similar problems are being hit:
aws/aws-node-termination-handler#21

I am also hitting similar metadata issues when using EC2 with OCP and the EBS CSI driver.
Is there a common code path between the two CSI drivers, by any chance?

@prashantokochavara (Author)

@msau42
kubernetes-sigs/aws-ebs-csi-driver#474 (comment)
Could the GCP driver be hitting the same thing?

@msau42 (Contributor) commented Apr 27, 2020

Are you trying to run the controller on a node that doesn't have access to the metadata service?

@msau42 (Contributor) commented Apr 27, 2020

There is no common code path between the two drivers, but the ideas are similar: they both require access to the metadata service in order to get the project/zone information of the cluster they are running in.

There has been work in both drivers to remove this requirement and allow the controllers to run outside of the Kubernetes cluster, but it requires additional arguments to be passed to the driver and is not the normal case.
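
For illustration only (this is a sketch using the public cloud.google.com/go/compute/metadata package, not the drivers' actual code), the startup lookup both drivers rely on looks roughly like this; without host networking or equivalent access to the link-local metadata address, these calls fail with the "connection refused" error shown above:

package main

import (
	"fmt"
	"log"

	"cloud.google.com/go/compute/metadata" // public GCE metadata client
)

func main() {
	// Both calls go to the link-local metadata address (169.254.169.254).
	// If the pod cannot reach it (for example, no hostNetwork), they fail
	// with "connect: connection refused", as in the log above.
	project, err := metadata.ProjectID()
	if err != nil {
		log.Fatalf("failed getting project: %v", err)
	}
	zone, err := metadata.Zone()
	if err != nil {
		log.Fatalf("failed getting zone: %v", err)
	}
	fmt.Printf("project=%s zone=%s\n", project, zone)
}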

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Jul 26, 2020

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Aug 25, 2020

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
