DefaultS3Presigner from AWS SDK v2 leaks context. #9238

Closed
zbytt opened this issue Aug 17, 2023 · 3 comments · Fixed by #9275
Labels
bug, repro provided

Comments

zbytt commented Aug 17, 2023

Describe the bug
Trace IDs are not cleared between requests when an AWS SDK v2 presigned URL is created while serving a request.

Steps to reproduce
We have a Spring Boot application deployed on AWS, running with the instrumentation agent from AWS OTel (plus X-Ray tracing).
Some of the app's endpoints create presigned URLs for S3.
Threads that serve those requests get stuck with the same trace ID for the rest of the application's lifetime.
Serving subsequent requests on those threads yields the following OTel debug logs:

[otel.javaagent 2023-08-17 08:31:07:036 +0000] [main/http-nio-53] WARN io.opentelemetry.javaagent.shaded.instrumentation.api.internal.ContextPropagationDebug - Unexpected non-root current context found when extracting remote context!

[otel.javaagent 2023-08-17 08:31:07:036 +0000] [main/http-nio-53] WARN io.opentelemetry.javaagent.shaded.instrumentation.api.internal.ContextPropagationDebug - It contains this span: SdkSpan{traceId=64dddac92b0ced3b40080da00be017e7, spanId=c2dde67bc892128e, parentSpanContext=ImmutableSpanContext{traceId=64dddac92b0ced3b40080da00be017e7, spanId=ff04fb63db7023d0, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=s3.GetObject, kind=CLIENT, attributes=AttributesMap{data={rpc.service=s3, aws.bucket.name=bucket, rpc.method=GetObject, rpc.system=aws-api, thread.name=main/http-nio-53, thread.id=53}, capacity=128, totalAddedValues=6}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1692261065853441702, endEpochNanos=0}

or

[otel.javaagent 2023-08-17 08:31:47:224 +0000] [main/http-nio-53] WARN io.opentelemetry.javaagent.shaded.instrumentation.api.internal.ContextPropagationDebug - Unexpected non-root current context found when extracting remote context!

[otel.javaagent 2023-08-17 08:31:47:224 +0000] [main/http-nio-53] WARN io.opentelemetry.javaagent.shaded.instrumentation.api.internal.ContextPropagationDebug - It contains this span: SdkSpan{traceId=64dddaf1767d21764df7f5611b19c2a4, spanId=b8b2a9dcd612bc31, parentSpanContext=ImmutableSpanContext{traceId=64dddaf1767d21764df7f5611b19c2a4, spanId=c42e895af4025ab3, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=s3.PutObject, kind=CLIENT, attributes=AttributesMap{data={rpc.service=s3, aws.bucket.name=bucket, rpc.method=PutObject, rpc.system=aws-api, thread.name=main/http-nio-53, thread.id=53}, capacity=128, totalAddedValues=6}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1692261105507624333, endEpochNanos=0}

We're using software.amazon.awssdk.services.s3.internal.signing.DefaultS3Presigner (from SDK v2) to presign our requests.
I came across PR #8815, which appears to explain the problem and fixes it for SDK v1. After I replaced our usage of the SDK v2 presigner with the presigning method mentioned in that PR, all the warnings disappeared.

What did you expect to see?
New trace IDs provided by API Gateway should not be ignored; new traces should be created and propagated.
What did you see instead?
New trace IDs provided by API Gateway are ignored, and the trace from a request that created presigned URLs stays stuck on the thread (visible in the MDC logs).
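
To illustrate the kind of behavior described above, here is a minimal, self-contained sketch of how a thread-bound context can stick to a pooled thread when a scope is opened but never closed. It is only an analogy under stated assumptions: `ContextLeakSketch`, `CURRENT_TRACE`, `Scope`, `handle`, and `run` are hypothetical names made up for this example, and none of it is the actual OpenTelemetry agent or AWS SDK code. Whether the presigner instrumentation fails in exactly this way is not confirmed by this issue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: not OpenTelemetry or AWS SDK code.
final class ContextLeakSketch {
    // Stand-in for a tracing library's thread-bound "current context".
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    // Stand-in for a scope: installs a trace id and restores the previous one on close().
    static final class Scope implements AutoCloseable {
        private final String previous;

        Scope(String traceId) {
            previous = CURRENT_TRACE.get();
            CURRENT_TRACE.set(traceId);
        }

        @Override
        public void close() {
            if (previous == null) {
                CURRENT_TRACE.remove();
            } else {
                CURRENT_TRACE.set(previous);
            }
        }
    }

    // Simulates serving one request. Returns the trace id that was already
    // present on the thread when the request started (null means clean).
    static String handle(String incomingTraceId, boolean closeScope) {
        String stale = CURRENT_TRACE.get();
        Scope scope = new Scope(incomingTraceId);
        if (closeScope) {
            scope.close(); // a well-behaved instrumentation always reaches this
        }
        return stale;
    }

    // Serves two requests on the same pooled thread and reports what each one found.
    static String[] run(boolean closeScope) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> firstRequest = () -> handle("trace-A", closeScope);
            Callable<String> secondRequest = () -> handle("trace-B", closeScope);
            String first = pool.submit(firstRequest).get();
            String second = pool.submit(secondRequest).get();
            return new String[] {first, second};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("scope never closed, second request found: " + run(false)[1]);
        System.out.println("scope closed, second request found: " + run(true)[1]);
    }
}
```

In the sketch, the second request on the reused thread finds the first request's trace ID still installed whenever the scope is left open, which is analogous to the "Unexpected non-root current context" warning in the logs above.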

What version are you using?
https://github.com/aws-observability/aws-otel-java-instrumentation/releases/tag/v1.28.0

Environment
Compiler: openjdk 17.0.7 2023-04-18
OS: Ubuntu 20.04

zbytt added the bug label Aug 17, 2023
Contributor

laurit commented Aug 17, 2023

@zbytt Could you share the code that reproduces the issue?

Author

zbytt commented Aug 18, 2023

I will try to provide a sample over the weekend.

zbytt pushed a commit to zbytt/otel-s3-presigner-bug that referenced this issue Aug 21, 2023
Author

zbytt commented Aug 21, 2023

@laurit
https://github.com/zbytt/otel-s3-presigner-bug contains a minimal app that reproduces the bug.
The README.md file contains instructions on how to reproduce it.
Running the app with the 'bug' profile uses the AWS SDK v2 presigner, which leads to the erroneous request handling; running the same app without any profile uses the AWS SDK v1 presigner, in which case the error does not occur.
The provided requests need to be executed in order to show the problem.

In case of any questions, feel free to ask.


3 participants