Credential provider fails intermittently on ECS (fargate) #697
Comments
Thanks for the report. I'm currently attempting to reproduce.
I've reproduced this. The container metadata service appears to be rate-limiting; it returns this string if you try to obtain credentials too many times in a short period:
What's concerning to me is that this is returned as a raw string instead of a JSON payload, which is what we expect from the API. We even have code in this credentials provider to handle JSON-based errors that it might return. I'm hesitant to simply key off of this message, given that it's an out-of-band string: I think there's a good chance it's not part of the credential service's modeled API, and therefore not something we should rely on receiving. We'll need to discuss amongst the team how we want to handle this error, if at all. In the meantime, you should be able to avoid this if you create a single client for message polling.
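The workaround suggested above, one client shared by every poller instead of a new client per poll, can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeQueueClient` and `drainWithSharedClient` are hypothetical names, and the fake client stands in for the real `SqsClient` so the example runs without the SDK or an ECS environment. The counter shows how many clients, and therefore how many credential lookups, the workers trigger.

```kotlin
import java.io.Closeable
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the SDK client. In the real worker this would be
// the client returned by SqsClient.fromEnvironment(); constructing it is the
// step that resolves credentials.
class FakeQueueClient(private val messages: ConcurrentLinkedQueue<String>) : Closeable {
    init { created.incrementAndGet() }

    // Returns the next message, or null when the queue is empty.
    fun poll(): String? = messages.poll()

    override fun close() {}

    companion object {
        // Number of clients constructed so far (a proxy for credential lookups).
        val created = AtomicInteger(0)
    }
}

// Starts `workers` pollers on a thread pool. All of them share ONE client,
// which is created before the loop and closed after every poller finishes.
// Returns the number of messages processed.
fun drainWithSharedClient(messages: ConcurrentLinkedQueue<String>, workers: Int = 10): Int {
    val processed = AtomicInteger(0)
    val pool = Executors.newFixedThreadPool(workers)
    try {
        FakeQueueClient(messages).use { client ->
            val futures = (1..workers).map {
                CompletableFuture.runAsync({
                    // Hot polling loop: reuses the shared client on every iteration.
                    while (client.poll() != null) {
                        processed.incrementAndGet()
                    }
                }, pool)
            }
            CompletableFuture.allOf(*futures.toTypedArray()).join()
        }
    } finally {
        pool.shutdown()
    }
    return processed.get()
}

fun main() {
    val queue = ConcurrentLinkedQueue((1..1000).map { "message-$it" })
    val processed = drainWithSharedClient(queue)
    println("processed=$processed, clients created=${FakeQueueClient.created.get()}")
}
```

With this structure the client count stays at one regardless of how many messages are processed, whereas building a client inside the loop would create one per poll.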
Thanks for the deep look. The rate-limit explanation makes a lot of sense given the pattern we saw: a lot of successes on the first run and then a sudden error, and upon restart a much shorter time before the error. I agree that not naively authenticating on every iteration is the solution for us, but I just wanted to be sure to bring the error up here, as it was still surprising, and would hope to be able to catch some specific Thanks again!
Describe the bug
While running some code that processes messages rapidly from SQS, I saw frequent but irregular errors loading credentials for the worker under load. The setup is one Dropwizard command process that starts up 10 CompletableFutures from a thread pool. Each of those threads then rapidly polls for messages on the queue, processes them, and goes back to polling. Currently, each time it polls it reauths with SqsClient.fromEnvironment() (not ideal, but I didn't expect an issue like this when using built-in creds). This is all running on one ECS task at a time (the error forced rerunning the task a few times in the end).
Over the course of processing about 15k messages the error happened 5 times. Below is a single exception pulled from the logs:
Expected behavior
Getting credentials while running on ECS should not arbitrarily fail
Current behavior
Getting credentials while running on ECS is arbitrarily failing
Steps to Reproduce
Run SqsClient.fromEnvironment() in a hot loop on ECS with an execution role configured.
Possible Solution
There is both a SocketException and a DeserializationException in the stack provided, which seem like a start.
Context
No response
AWS Kotlin SDK version used
0.17.0-beta
Platform (JVM/JS/Native)
JVM
Operating System and version
fargate provider for ECS