Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question/problem: agent context shading vs explicit context propagation #11042

Open
lmolkova opened this issue Apr 5, 2024 · 5 comments
Open
Assignees
Labels
enhancement New feature or request needs triage New issue that requires triage

Comments

@lmolkova
Copy link
Contributor

lmolkova commented Apr 5, 2024

Is your feature request related to a problem? Please describe.

Azure SDKs are instrumented natively. We have a plugin (azure-core-tracing-opentelemetry) shaded into the otel java agent.

Our SDKs are reactor/async heavy and we can't always rely that context propagation happens correctly. Also reactor instrumentation has relatively high overhead and some users want to disable it.

To help with context propagation we recommend users to pass it over explicitly through our own com.azure.core.util.Context property bag.

It looks similar to the following:

  Span parent = tracer.spanBuilder("parent").startSpan();
  io.opentelemetry.context.Context traceContext = io.opentelemetry.context.Context.current().with(parent);

  String response = sampleAzureClient.methodCall("get items",
      new Context(PARENT_TRACE_CONTEXT_KEY, traceContext));
  ...

Life's good: tests are green, applications that manually instrument by installing the plugin work just fine.

But it does not work with the agent - as long as plugin is shaded, it does not recognize the type of the context in PARENT_TRACE_CONTEXT_KEY key.

Repro

context_prop_issue.zip

run with
-javaagent:"path/to/opentelemetry-javaagent.jar" -Dotel.traces.exporter=logging -Dotel.logs.exporter=none -Dotel.metrics.exporter=none

It'll print two scenarios:

  • [Demonstrates the issue] parent span created in user app and passed explicitly as azure context prop. The child span (for http request created by Azure SDK) won't be correlated with parent passed by user app
  • [All good] parent span created using the plugin as if it was created by an Azure SDK. The child span will be correlated with parent

Root cause

The plugin needs to cast the object behind PARENT_TRACE_CONTEXT_KEY to io.opentelemetry.context.Context to use it as an explicit parent to new spans.

But during shading and relocation, the type I need to cast to becomes io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.

The context created by the application is io.opentelemetry.javaagent.instrumentation.opentelemetryapi.context.AgentContextWrapper and it implements the original io.opentelemetry.context.Context, therefore it can't be used as parent.

Describe the solution you'd like

I'd like users to be able to pass me context explicitly that I can correlate with.
Ideally, native instrumentations should not worry about relocation/shading at all, but I'd be happy to add any duct tape and detect that plugin runs in the agent.

So essentially I'm looking for any guidance on how to solve this either in the shaded instrumentation part, or inside the azure plugin.

Describe alternatives you've considered

I tried detecting that I'm in the agent and calling AgentContextStorage.toAgentContext to convert user-passed-context to the shaded one.

The only way however, I can resolve this method with reflection is by iterating over all methods on the class like

Class<?> agentCtxStorageClass = Class
    .forName("io.opentelemetry.javaagent.instrumentation.opentelemetryapi.context.AgentContextStorage");
Method[] getAgentContext = agentCtxStorageClass.getMethods();
for (Method method : getAgentContext) {
    if (method.getName().equals("getAgentContext")) {
        return (io.opentelemetry.context.Context) method.invoke(null, data);
    }
}

because even string literals in Class.forName("io.opentelemetry.context.Context") are rewritten to the new shaded context type.

@laurit
Copy link
Contributor

laurit commented Apr 8, 2024

because even string literals in Class.forName("io.opentelemetry.context.Context") are rewritten to the new shaded context type.

Shading plugins try to rename everything that looks like a class name, usually you can trick them with something like Class.forName("#io.opentelemetry.context.Context".substring(1)) or some other construct that avoids having the class name as an constant in the application source code. If you need to use the Context from the opentelemetry api that is inside the application another option is using application.io.opentelemetry.context.Context like you used in https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/reactor/reactor-3.1/javaagent

I didn't try to run your application, but I guess that the issue is that you expect to set the context that is used in application under PARENT_TRACE_CONTEXT_KEY and inside the agent because of the shading when you read you expect to get the agent context, but get the application one. You could set up a relocation rule trace-context -> agent-trace-context to ensure that PARENT_TRACE_CONTEXT_KEY would be different for application and agent code. This probably would get rid of the class cast but you still wouldn't be able to pass the context from the application to the agent. I think your best option is to instrument the code that you shade inside the agent and there when you read PARENT_TRACE_CONTEXT_KEY from the azure context apply then conversion using AgentContextStorage.getAgentContext().

@lmolkova
Copy link
Contributor Author

lmolkova commented Apr 8, 2024

Thanks a lot @laurit! I decided to start with hacky but simple Class.forName("#io.opentelemetry.context.Context".substring(1)) and reflection-based AgentContextStorage.getAgentContext conversion since it allows me to keep all instrumentation-related code inside the azure tracing plugin.

Once the Azure/azure-sdk-for-java#39602 is merged and released, I want to add a test inside the https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/azure-core/azure-core-1.36/javaagent/src/testAzure/java/io/opentelemetry/javaagent/instrumentation/azurecore/v1_36/AzureSdkTest.java to validate if it works (validated manually for now). So could you please keep this issue open until then and assign it to me? Thanks!

@laurit
Copy link
Contributor

laurit commented Apr 10, 2024

FYI while this may work in the short term it is likely that we will break it when work on #8999 resumes.

@MrSinisterX
Copy link

MrSinisterX commented Jun 28, 2024

I am having maybe similar issues while using AwsXrayPropagator's public <C> Context extract(Context context, @Nullable C carrier, TextMapGetter<C> getter) method. I am getting a NoSuchMethod exception while calling it. While debugging I figured out that the agent is somehow changing the io.opentelemetry.context.Context class to another one from a shaded package.

image

@laurit
Copy link
Contributor

laurit commented Jun 29, 2024

@MrSinisterX Agent renames opentelemetry api classes as to not conflict with application provided opentelemetry api.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs triage New issue that requires triage
Projects
None yet
Development

No branches or pull requests

3 participants