Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Credential provider fails intermittently on ECS (fargate) #697

Closed
jonesetc opened this issue Sep 8, 2022 · 4 comments
Closed

Credential provider fails intermittently on ECS (fargate) #697

jonesetc opened this issue Sep 8, 2022 · 4 comments
Assignees
Labels
bug This issue is a bug.

Comments

@jonesetc
Copy link

jonesetc commented Sep 8, 2022

Describe the bug

While running some code that processes messages rapidly from SQS, I saw a frequent, but irregular errors loading credentials for the worker under load. The set up is 1 dropwizard command process that starts up 10 CompletableFutures from a thread pool. Each of those threads then rapidly polls for messages on the queue, process it, and then goes back to polling. Currently each time it polls it reauths with SqsClient.fromEnvironment() (not ideal, but didn't expect an issue like this when using built-in creds). This is all running on one ECS task at a time (The error forced rerunning the task a few times in the end).

Over the course of processing about 15k messages the error happened 5 times. Below is a single exception pulled from the logs:

aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: No credentials could be loaded from the chain: CredentialsProviderChain -> EnvironmentCredentialsProvider -> ProfileCredentialsProvider -> StsWebIdentityProvider -> EcsCredentialsProvider -> ImdsCredentialsProvider
    at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$chainException$1.invoke(CredentialsProviderChain.kt:36)
    at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$chainException$1.invoke(CredentialsProviderChain.kt:36)
    at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
    at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:42)
    at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials(CredentialsProviderChain.kt)
    at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invokeSuspend(CachedCredentialsProvider.kt:58)
    at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invoke(CachedCredentialsProvider.kt)
    at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invoke(CachedCredentialsProvider.kt)
    at aws.sdk.kotlin.runtime.config.CachedValue.getOrLoad(CachedValue.kt:59)
    at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider.getCredentials(CachedCredentialsProvider.kt:55)
    at aws.sdk.kotlin.runtime.auth.credentials.DefaultChainCredentialsProvider.getCredentials(DefaultChainCredentialsProvider.kt:68)
    at aws.smithy.kotlin.runtime.auth.awssigning.DefaultAwsSignerImpl.sign(DefaultAwsSigner.kt:26)
    at aws.smithy.kotlin.runtime.auth.awssigning.middleware.AwsSigningMiddleware.modifyRequest(AwsSigningMiddleware.kt:158)
    at aws.smithy.kotlin.runtime.auth.awssigning.middleware.AwsSigningMiddleware.modifyRequest(AwsSigningMiddleware.kt:20)
    at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:64)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.http.operation.MutateHandler.call(SdkOperationExecution.kt:126)
    at aws.smithy.kotlin.runtime.http.operation.MutateHandler.call(SdkOperationExecution.kt:123)
    at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invokeSuspend(Retry.kt:52)
    at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invoke(Retry.kt)
    at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invoke(Retry.kt)
    at aws.smithy.kotlin.runtime.retries.StandardRetryStrategy.doTryLoop(StandardRetryStrategy.kt:51)
    at aws.smithy.kotlin.runtime.retries.StandardRetryStrategy.retry(StandardRetryStrategy.kt:32)
    at aws.smithy.kotlin.runtime.http.middleware.Retry.handle$suspendImpl(Retry.kt:36)
    at aws.smithy.kotlin.runtime.http.middleware.Retry.handle(Retry.kt)
    at aws.sdk.kotlin.runtime.http.middleware.AwsRetryMiddleware.handle(AwsRetryMiddleware.kt:38)
    at aws.sdk.kotlin.runtime.http.middleware.AwsRetryMiddleware.handle(AwsRetryMiddleware.kt:30)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:64)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.http.operation.SerializeHandler.call(SdkOperationExecution.kt:119)
    at aws.smithy.kotlin.runtime.http.operation.SerializeHandler.call(SdkOperationExecution.kt:102)
    at aws.smithy.kotlin.runtime.http.operation.InitializeHandler.call(SdkOperationExecution.kt:99)
    at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:60)
    at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
    at aws.smithy.kotlin.runtime.http.operation.SdkHttpOperationKt.execute(SdkHttpOperation.kt:83)
    at aws.smithy.kotlin.runtime.http.operation.SdkHttpOperationKt.roundTrip(SdkHttpOperation.kt:66)
    at aws.sdk.kotlin.services.sqs.DefaultSqsClient.receiveMessage(DefaultSqsClient.kt:620)
    at com.example.Sqs$poll$1.invokeSuspend(SqsTaskChannel.kt:36)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at com.example.Sqs.poll(SqsTaskChannel.kt:34)
    at com.example.Task.processNextAvailableTask(TaskService.kt:19)
    at com.example.Run.run$lambda-3$lambda-1$lambda-0(RunTasks.kt:69)
    at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
    ... 4 common frames omitted
    Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Missing value for environment variable `AWS_ACCESS_KEY_ID`
        at aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider.requireEnv(EnvironmentCredentialsProvider.kt:23)
        at aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider.getCredentials(EnvironmentCredentialsProvider.kt:26)
        at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:39)
        ... 61 common frames omitted
    Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: could not find source profile default
        at aws.sdk.kotlin.runtime.auth.credentials.profile.ProfileChain$Companion.resolve$aws_config(ProfileChain.kt:44)
        at aws.sdk.kotlin.runtime.auth.credentials.ProfileCredentialsProvider.getCredentials(ProfileCredentialsProvider.kt:101)
        at aws.sdk.kotlin.runtime.auth.credentials.ProfileCredentialsProvider$getCredentials$1.invokeSuspend(ProfileCredentialsProvider.kt)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
        ... 14 common frames omitted
    Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Required field `roleArn` could not be automatically inferred for StsWebIdentityCredentialsProvider. Either explicitly pass a value, set the environment variable `AWS_ROLE_ARN`, or set the JVM system property `aws.roleArn`
        at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityCredentialsProvider$Companion.fromEnvironment-TUY-ock(StsWebIdentityCredentialsProvider.kt:133)
        at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityCredentialsProvider$Companion.fromEnvironment-TUY-ock$default(StsWebIdentityCredentialsProvider.kt:54)
        at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityProvider.getCredentials(DefaultChainCredentialsProvider.kt:87)
        at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:39)
        at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$1.invokeSuspend(CredentialsProviderChain.kt)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
        ... 14 common frames omitted
    Suppressed: aws.smithy.kotlin.runtime.serde.DeserializationException: Unexpected JSON token at offset 0; found `Y`, expected one of `{`, `[`, `\"`, `null`, `true`, `false`, `<number>`
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.fail(JsonLexer.kt:356)
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.fail$default(JsonLexer.kt:355)
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.unexpectedToken(JsonLexer.kt:352)
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.readToken(JsonLexer.kt:202)
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.doPeek(JsonLexer.kt:85)
        at aws.smithy.kotlin.runtime.serde.json.JsonLexer.peek(JsonLexer.kt:72)
        at aws.smithy.kotlin.runtime.serde.json.JsonDeserializer.deserializeStruct(JsonDeserializer.kt:110)
        at aws.sdk.kotlin.runtime.auth.credentials.JsonCredentialsDeserializerKt.deserializeJsonCredentials(JsonCredentialsDeserializer.kt:133)
        ... 18 common frames omitted
    Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.InvalidJsonCredentialsException: invalid JSON credentials response
        at aws.sdk.kotlin.runtime.auth.credentials.JsonCredentialsDeserializerKt.deserializeJsonCredentials(JsonCredentialsDeserializer.kt:116)
        at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsDeserializer.deserialize(EcsCredentialsProvider.kt:177)
        at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsDeserializer$deserialize$1.invokeSuspend(EcsCredentialsProvider.kt)
        ... 16 common frames omitted
    Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: Failed to get credentials from container metadata service
        at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsProvider.getCredentials(EcsCredentialsProvider.kt:114)
        at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsProvider$getCredentials$1.invokeSuspend(EcsCredentialsProvider.kt)
        ... 16 common frames omitted
    Suppressed: java.net.SocketException: Invalid argument
        at java.base/sun.nio.ch.Net.connect0(Native Method)
        at java.base/sun.nio.ch.Net.connect(Net.java:579)
        at java.base/sun.nio.ch.Net.connect(Net.java:568)
        at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
        at java.base/java.net.Socket.connect(Socket.java:633)
        at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
        at okhttp3.internal.connection.ConnectPlan.connectSocket(ConnectPlan.kt:254)
        at okhttp3.internal.connection.ConnectPlan.connectTcp(ConnectPlan.kt:128)
        at okhttp3.internal.connection.SequentialExchangeFinder.find(SequentialExchangeFinder.kt:33)
        at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:267)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:84)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:65)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at datadog.trace.instrumentation.okhttp3.TracingInterceptor.intercept(TracingInterceptor.java:35)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:205)
        at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:533)
        ... 3 common frames omitted
    Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: failed to load instance profile
        at aws.sdk.kotlin.runtime.auth.credentials.ImdsCredentialsProvider.getCredentials(ImdsCredentialsProvider.kt:62)
        at aws.sdk.kotlin.runtime.auth.credentials.ImdsCredentialsProvider$getCredentials$1.invokeSuspend(ImdsCredentialsProvider.kt)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:233)
        at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
        ... 14 common frames omitted
Wrapped by: java.util.concurrent.CompletionException: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: No credentials could be loaded from the chain: CredentialsProviderChain -> EnvironmentCredentialsProvider -> ProfileCredentialsProvider -> StsWebIdentityProvider -> EcsCredentialsProvider -> ImdsCredentialsProvider
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
    at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1807)
    at com.example.Executor.execute$lambda-1(ContextAwareExecutor.kt:42)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Expected behavior

Getting credentials while running on ECS should not arbitrarily fail

Current behavior

Getting credentials while running on ECS is arbitrarily failing

Steps to Reproduce

Run SqsClient.fromEnvironment() on ECS with an execution role configured in a hot loop

Possible Solution

there is both a SocketException and a DeserializationException in the stack provided, which seem like a start

Context

No response

AWS Kotlin SDK version used

0.17.0-beta

Platform (JVM/JS/Native)

JVM

Operating System and version

fargate provider for ECS

@jonesetc jonesetc added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 8, 2022
@lucix-aws lucix-aws removed the needs-triage This issue or PR still needs to be triaged. label Sep 12, 2022
@lucix-aws
Copy link
Contributor

Thanks for the report. I'm currently attempting to reproduce.

@lucix-aws
Copy link
Contributor

I've reproduced this.

The container metadata service appears to be rate-limiting, it returns this string if you try to obtain credentials too many times in a short period:

You have reached maximum request limit. (notice the "unexpected token Y" in the deserialization exception)

What's concerning to me is that this is returned as a raw string instead of a json payload, which is what we expect from the API. We even have code in this credentials provider to handle json-based errors that it might return. I'm hesitant to simply key off of this message given that it's a string out of band, I think there's a good chance it's not part of the credential service's modeled API and therefore it's not something we should rely on receiving. We'll need to discuss amongst the team how we want to handle this error, if at all.

In the meantime, you should be able to avoid this if you create a single client for message polling.

@jonesetc
Copy link
Author

Thanks for the deep look, and the rate limit explanation makes a lot of sense with the pattern that we saw: a lot of successes on the first run and then a sudden error, and upon restart a much shorter time before the error.

I agree that not doing the naive authenticating every iteration is the solution for us, but just wanted to be sure to bring the error up here as it was still surprising and would hope to be able to catch some specific ClientException in the future.

Thanks again!

@github-actions
Copy link

github-actions bot commented Dec 5, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

2 participants