Set fake AWS credentials on controller to workaround aws-sdk bug #4073
Conversation
Several issues have now reared their head which are directly caused by an update to the aws-sdk. The update results in extremely long delays in the execution of tasks after the Pipelines controller is first deployed in a cluster. The aws-sdk is initialized through a transitive dependency that Pipelines pulls in via k8schain. Here are the recent issues directly related to this aws-sdk bug:

- #3627 (since December!)
- #4084

One quick way to work around this problem is to set fake AWS credentials in the environment of the deployed controller. This apparently causes the aws-sdk to skip whatever process it has introduced that causes massive delays in execution. So this commit does exactly that: it sets fake AWS credentials in the deployment's env vars. This is an unfortunate hack to mitigate the problem until a better solution presents itself.
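For illustration, here is a minimal sketch of the kind of env entries this adds to the controller Deployment. The values (and the AWS_REGION entry) are placeholder assumptions, not necessarily what the PR itself uses - see the diff hunk further down for the actual change.

```yaml
# Sketch only, not the exact hunk from this PR: entries appended to the controller
# container's existing `env:` list. The values are meaningless placeholders; their
# mere presence lets aws-sdk-go's default credential chain resolve from the
# environment instead of stalling while it probes for real credentials.
env:
  - name: AWS_ACCESS_KEY_ID
    value: "not-a-real-access-key"
  - name: AWS_SECRET_ACCESS_KEY
    value: "not-a-real-secret"
  # A region entry may also be needed so the SDK skips region discovery as well.
  - name: AWS_REGION
    value: "not-a-real-region"
```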
/hold cancel
Thanks for figuring this out @sbwsg! This seems fine as a short-term fix, but I think we should at least track this somewhere so we can come back to it (I think you might be working on this already, but I'd like to track it anyway just in case).
I'm also wondering what (if any) implications this might have for folks who are actually using AWS credentials - might be worth asking if someone using AWS is willing to try this out and make sure it doesn't cause any issues for them.
/approve
@@ -93,6 +93,16 @@ spec:
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # These phony AWS credentials are here to work around a bug in the aws go sdk
Could we add an issue to track this and mention it in the comment? (either here or in go-containerregistry) 🙏
Basically I'm just trying to make sure that when someone stumbles on this years from now because they need to change it, they know where to go to see what the latest state is. (They can always track it through the blame, but since this is kind of a hack workaround, it feels like we'd want to pursue a long-term solution at some point.)
+1, we should track this in its own parent issue and probably open or identify an aws-sdk-go issue that's at the bottom of this turtle stack.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
+1 on making this change as a short-term stopgap. Another route to fixing this that we should explore: if aws-sdk-go has fixed the bug since v1.31.12 (the version we depend on, a year+ old), the bug might go away just by upgrading dependencies. We currently depend on a k8schain release from January; we could upgrade to the latest release, v0.5.1, which depends on k8s-pkg-credentialprovider v1.21.0-1, which depends on aws-sdk-go v1.35.24 (November 2020). The latest aws-sdk-go is v1.39.4 (released two hours ago); we could see if upgrading that dependency, then upgrading k8schain, then upgrading tektoncd/pipeline fixes the bug.
@bobcatfish @imjasonh I've created #4087 to capture the problem with all the notes I've been piecing together.
Updating the SDK may well help indeed; I've just seen that aws/aws-sdk-go#3066, from back in January, reduces a timeout in requests to the EC2 metadata service. I am super confused how we are seeing timeouts of ~75 seconds per connection attempt when their timeout configuration was only 5 seconds even prior to that PR, though. I might not be fully grokking what's actually going wrong here.
@imjasonh @sbwsg I'd be happy to merge that asap even as a temporary workaround - so that it might still get into the next release or minor release - wdyt? @pritidesai
This is really phony, but as long as it's a temporary workaround I am fine with this. Thanks a bunch @sbwsg 🙏
/lgtm
Workaround for issue #4087 while we look into better solutions.
Changes
Several issues have now reared their head which are directly caused
by an update to the aws-sdk. The update results in extremely long
delays in the execution of tasks after the Pipelines controller is
first deployed in a cluster. The aws-sdk is initialized through
a transitive dependency that Pipelines pulls in via k8schain or go-containerregistry.
Here are the recent issues directly related to this bug:

- #3627 (since December!)
- #4084

One quick way to work around this problem is to set fake AWS
credentials in the environment of the deployed controller. This
apparently causes the aws-sdk to skip whatever process it has
introduced that causes massive delays in execution. So this commit
does exactly that: it sets fake AWS credentials in the deployment's
env vars.
This is an unfortunate hack to mitigate the problem until
a better solution presents itself. Ideally go-containerregistry or
k8s-pkg-credentialprovider would provide a way for us to disable
AWS SDK initialization to avoid this misbehaviour.
/kind misc
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
Meets the Tekton contributor standards (including functionality, content, code)
Release Notes