Set fake AWS credentials on controller to workaround aws-sdk bug #4073
Conversation
Several issues have now reared their head which are directly caused by an update to the aws-sdk. The update results in extremely long delays in the execution of tasks after the Pipelines controller is first deployed in a cluster. The aws-sdk is initialized through a transitive dependency that Pipelines pulls in via k8schain. Here are the recent issues directly related to this aws-sdk bug:

- #3627 (since December!)
- #4084

One quick way to work around this problem is to set fake AWS credentials in the environment of the deployed controller. This apparently causes the aws-sdk to skip whatever process it has introduced that causes massive delays in execution. So this commit does exactly that: it sets fake AWS credentials in the deployment's env vars. This is an unfortunate hack to mitigate the problem until a better solution presents itself.
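For illustration, here is a minimal sketch of the kind of env entries this adds to the controller Deployment. The values (and the AWS_REGION entry) are placeholder assumptions, not necessarily what the PR itself uses - see the diff hunk further down for the actual change.

```yaml
# Sketch only, not the exact hunk from this PR: entries appended to the controller
# container's existing `env:` list. The values are meaningless placeholders; their
# mere presence lets aws-sdk-go's default credential chain resolve from the
# environment instead of stalling while it probes for real credentials.
env:
  - name: AWS_ACCESS_KEY_ID
    value: "not-a-real-access-key"
  - name: AWS_SECRET_ACCESS_KEY
    value: "not-a-real-secret"
  # A region entry may also be needed so the SDK skips region discovery as well.
  - name: AWS_REGION
    value: "not-a-real-region"
```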
/hold cancel
Thanks for figuring this out @sbwsg! This seems fine as a short-term fix, but I think we should at least track this somewhere so we can come back to it (I think you might be working on this already, but I'd like to track it anyway just in case).
I'm also wondering what (if any) implications this might have for folks who are actually using AWS credentials - might be worth asking if someone using AWS is willing to try this out and make sure it doesn't cause any issues for them.
/approve
@@ -93,6 +93,16 @@ spec:
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # These phony AWS credentials are here to work around a bug in the aws go sdk
Could we add an issue to track this and mention it in the comment? (either here or in go-containerregistry) 🙏
Basically I'm just trying to make sure that when someone stumbles on this years from now because they need to change it, they know where to go to see what the latest state is. (They can always track it through the blame, but since this is kind of a hack workaround, it feels like we'd want to pursue a long-term solution at some point.)
+1, we should track this in its own parent issue and probably open or identify an aws-sdk-go issue that's at the bottom of this turtle stack.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
+1 on making this change as a short-term stopgap. Another route to fixing this that we should explore: if aws-sdk-go has fixed the bug since v1.31.12 (the version we depend on, a year+ old), the bug might go away just by upgrading dependencies. We currently depend on a k8schain release from January; we could upgrade to the latest release, v0.5.1, which depends on k8s-pkg-credentialprovider v1.21.0-1, which depends on aws-sdk-go v1.35.24 (November 2020). The latest aws-sdk-go is v1.39.4 (released two hours ago); we could see if upgrading that dependency, then upgrading k8schain, then upgrading tektoncd/pipeline fixes the bug.
@bobcatfish @imjasonh I've created #4087 to capture the problem with all the notes I've been piecing together.
Updating the SDK may well help indeed; I've just seen that aws/aws-sdk-go#3066, from back in January, reduces a timeout in requests to the EC2 metadata service. I am super confused how we are seeing timeouts of ~75 seconds per connection attempt when their timeout configuration was only 5 seconds even prior to that PR, though. I might not be fully grokking what's actually going wrong here.
@imjasonh @sbwsg I'd be happy to merge that asap even as a temporary workaround - so that it might still get into the next release or minor release - wdyt? @pritidesai
This is really phony, but as long as it's a temporary workaround I am fine with this. Thanks a bunch @sbwsg 🙏
/lgtm
Workaround for issue #4087 while we look into better solutions.
Changes
Several issues have now reared their head which are directly caused
by an update to the aws-sdk. The update results in extremely long
delays in the execution of tasks after the Pipelines controller is
first deployed in a cluster. The aws-sdk is initialized through
a transitive dependency that Pipelines pulls in via k8schain or go-containerregistry.
Here are the recent issues directly related to this bug:

- #3627 (since December!)
- #4084

One quick way to work around this problem is to set fake AWS
credentials in the environment of the deployed controller. This
apparently causes the aws-sdk to skip whatever process it has
introduced that causes massive delays in execution. So this commit
does exactly that: it sets fake AWS credentials in the deployment's
env vars.
This is an unfortunate hack to mitigate the problem until
a better solution presents itself. Ideally go-containerregistry or
k8s-pkg-credentialprovider would provide a way for us to disable
AWS SDK initialization to avoid this misbehaviour.
/kind misc
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
Meets the Tekton contributor standards (including functionality, content, code)
Release Notes