
OpenShift + Pipeline on GCP is broken #1742

Closed
vdemeester opened this issue Dec 12, 2019 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@vdemeester
Member

Expected Behavior

Deploying Tekton Pipeline on an OpenShift cluster running in GCP should work.

Actual Behavior

With an OCP 4.2 cluster installed on GCP and the Red Hat OpenShift Pipelines Operator 0.8.0, I see that creating a runtime object (TaskRun or PipelineRun) does not create any resources such as pods. Checking the pipeline controller log shows nothing; the controller is actually looping forever.

Quoting @bbrowning

I tracked this down some today and the problem is an infinite retry loop in the k8schain library used by Knative, Tekton, and various other projects. For some reason, in OpenShift on GCP, this library cannot contact the expected Google metadata server.

The main issue is with the "k8s.io/kubernetes/pkg/credentialprovider/gcp" import, and what is magically happening there, especially here. This metadata URL is being disallowed by OpenShift and thus this loops forever (with backoff, but still).

Steps to Reproduce the Problem

  1. Install OpenShift on GCP
  2. Install tekton on it (using the OpenShift Pipelines operator or directly applying the release yaml)
  3. Observe that the controller logs nothing and the expected resources (pods) are never created.

Additional Info

One easy way to fix it would be to put the following magic imports behind build tags (upstream in go-containerregistry):

	_ "k8s.io/kubernetes/pkg/credentialprovider/aws"
	_ "k8s.io/kubernetes/pkg/credentialprovider/azure"
	_ "k8s.io/kubernetes/pkg/credentialprovider/gcp"

/assign
/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 12, 2019
@imjasonh
Member

To summarize, so that I'm sure I understand the issue:

  1. k8schain uses the credentialprovider/gcp magic import to fetch GCP creds from GCP metadata, in order to fetch image metadata (only from GCP?) to inject Tekton's entrypoint binary.

  2. OpenShift-on-GCP blocks GCP metadata requests, so when k8schain is used it fails continuously, and doesn't surface that failure or fall back to anonymous.

Is that correct?

One solution would be to not use k8schain, but I'm not sure what we'd use instead to get the necessary credentials to fetch image data. Using k8schain without the magic imports is also possible (as suggested in the above bug report), but this would presumably break auth for users who rely on it today.
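
For reference, a minimal sketch of how a controller typically uses k8schain together with go-containerregistry to fetch image config (the namespace, service account and image reference are illustrative, not Tekton's actual code, and the exact k8schain.New signature has varied across versions):

	package main

	import (
		"fmt"
		"log"

		"github.com/google/go-containerregistry/pkg/authn/k8schain"
		"github.com/google/go-containerregistry/pkg/name"
		"github.com/google/go-containerregistry/pkg/v1/remote"
		"k8s.io/client-go/kubernetes"
		"k8s.io/client-go/rest"
	)

	func main() {
		cfg, err := rest.InClusterConfig()
		if err != nil {
			log.Fatal(err)
		}
		client, err := kubernetes.NewForConfig(cfg)
		if err != nil {
			log.Fatal(err)
		}

		// Importing k8schain transitively pulls in the credentialprovider
		// magic imports; their init() functions run before any of this code.
		kc, err := k8schain.New(client, k8schain.Options{
			Namespace:          "default",
			ServiceAccountName: "default",
		})
		if err != nil {
			log.Fatal(err)
		}

		// Resolve the image config, e.g. to find its entrypoint.
		ref, err := name.ParseReference("gcr.io/example/image:latest")
		if err != nil {
			log.Fatal(err)
		}
		img, err := remote.Image(ref, remote.WithAuthFromKeychain(kc))
		if err != nil {
			log.Fatal(err)
		}
		cf, err := img.ConfigFile()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(cf.Config.Entrypoint)
	}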

cc @jonjohnsonjr

@vdemeester
Member Author

@imjasonh I think this is true even without the entrypoint magic, as k8schain is also used in knative/pkg, which we depend on for the controller.

Created google/go-containerregistry#630 upstream 👼

@vdemeester
Member Author

2. OpenShift-on-GCP blocks GCP metadata requests, so when `k8schain` is used it fails continuously, and doesn't surface that failure or fall back to anonymous.

Yes, https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/gcp/metadata.go#L239 blocks (as it loops forever with backoff), and thus the rest of the code never gets executed (which means, for the controller, that it is never ready to reconcile anything 😅 ).
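
Roughly, the failure mode looks like this: a simplified, illustrative sketch (not the actual kubernetes code) of an init-time check that polls the metadata endpoint with endless backoff:

	package main

	import (
		"log"
		"net/http"
		"time"
	)

	// waitForMetadata models the problematic behaviour: retry the metadata
	// server forever. If the address is blocked (as on OpenShift-on-GCP),
	// this never returns, and anything scheduled to run afterwards (such as
	// the controller's reconcile loop) never starts.
	func waitForMetadata(url string) {
		backoff := time.Second
		for {
			resp, err := http.Get(url)
			if err == nil {
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					return // reachable: the provider can be enabled
				}
			}
			log.Printf("metadata not reachable, retrying in %v", backoff)
			time.Sleep(backoff)
			if backoff < 30*time.Second {
				backoff *= 2 // capped exponential backoff, but it never gives up
			}
		}
	}

	func main() {
		// 169.254.169.254 is the GCE metadata address.
		waitForMetadata("http://169.254.169.254/computeMetadata/v1/")
		log.Println("only reached when the metadata server is reachable")
	}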

@imjasonh
Member

Ah okay, so just importing the magic import causes the controller to block forever, when installed on OpenShift-on-GCP. Is that correct? These magic imports seem like more trouble than they're worth to be honest. 👿

Did this work until recently? AFAIK we've had an indirect dependency on the magic imports for quite a while.

@vdemeester
Member Author

Ah okay, so just importing the magic import causes the controller to block forever, when installed on OpenShift-on-GCP. Is that correct? These magic imports seem like more trouble than they're worth to be honest. 👿

Did this work until recently? AFAIK we've had an indirect dependency on the magic imports for quite a while.

We only tried that recently on GCP so… I am guessing it never worked before. It is the same for Knative, by the way. Yeah, I am really not a huge fan of magic imports and the use of init() in those credential packages. Having, at least, a way to disable those is a "best-effort" fix for now, I think 👼

@vdemeester
Member Author

vdemeester commented Dec 12, 2019

I can confirm that with master...vdemeester:k8schain-quick-fix and GOFLAGS="-tags=disable_gcp" … it works 👼

~/s/g/t/p/e/taskruns k8schain-quick-fix *2 λ kubectl create -f git-resource.yaml
taskrun.tekton.dev/git-resource-tag-dg5kt created
taskrun.tekton.dev/git-resource-branch-8hvqq created
taskrun.tekton.dev/git-resource-ref-vxscg created
~/s/g/t/p/e/taskruns k8schain-quick-fix *2 λ kubectl get pods
NAME                                  READY   STATUS     RESTARTS   AGE
git-resource-branch-8hvqq-pod-hq2h9   0/2     Init:0/3   0          3s
git-resource-ref-vxscg-pod-flmqj      0/2     Init:0/3   0          3s
git-resource-tag-dg5kt-pod-zcl9j      0/2     Init:0/3   0          3s

@bbrowning

@imjasonh This is not technically OpenShift-specific either. We've had reports in Knative of other managed Kubernetes services on GCP hitting this same issue. Basically, anyone who can hit that metadata URL can gain credentials that a random user on a K8s cluster shouldn't necessarily be able to get. That's why OpenShift and other managed K8s distros block that metadata URL from pods in the cluster unless the pods are running with host networking.
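
To illustrate why that URL gets blocked: any process that can reach the GCE metadata server can request an access token for the node's service account, roughly like this (an illustrative snippet using the documented metadata endpoint and header):

	package main

	import (
		"fmt"
		"io/ioutil"
		"log"
		"net/http"
	)

	func main() {
		// Documented GCE metadata endpoint for the default service account's
		// access token; the Metadata-Flavor header is required.
		url := "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Metadata-Flavor", "Google")

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			// On OpenShift-on-GCP this request is blocked for ordinary pods.
			log.Fatal(err)
		}
		defer resp.Body.Close()

		body, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(string(body)) // JSON containing an OAuth2 access token
	}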

@vdemeester
Member Author

Upstream issue: kubernetes/kubernetes#86245

@vdemeester
Member Author

This can be considered complete as #1882 has been merged, so there is just a build tag to set and we are good to go.
@bobcatfish should we close?

@vdemeester
Member Author

As we are tracking that downstream and the required bump of go-containerregistry is in, I'll go ahead and close this one.

/close

@tekton-robot
Collaborator

@vdemeester: Closing this issue.

In response to this:

As we are tracking that downstream and the required bump of go-containerregistry is in, I'll go ahead and close this one.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
