Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context propagation with OpenTelemetry and Smallrye context propagation #30362

Closed
OpenGuidou opened this issue Jan 13, 2023 · 19 comments
Closed

Comments

@OpenGuidou
Copy link
Contributor

OpenGuidou commented Jan 13, 2023

Describe the bug

Hello,

When using asynchronous code (based on CompletionStages) and context propagation, it seems that the opentelemetry context is not correctly propagated.

It's visible when using a custom ExecutorService and the ThreadContext.withContextPropagation.
When injecting the ManagedExecutor, it's working correctly.

Expected behavior

The tracing should be correctly propagated from one thread to the other.

Actual behavior

The opentelemetry information is lost when jumping to a new thread.

How to Reproduce?

I've created here a reproducer:
https://github.com/OpenGuidou/opentelemetry-propagation-reproducer

It contains one implementation with opentracing for which the context is correctly propagated, and one with opentelemetry for which it isn't.

Output of java -version

openjdk version "17.0.5" 2022-10-18 LTS
OpenJDK Runtime Environment Microsoft-6841604 (build 17.0.5+8-LTS)
OpenJDK 64-Bit Server VM Microsoft-6841604 (build 17.0.5+8-LTS, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

See reproducer

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.7 (b89d5959fcde851dcb1c8946a785a163f14e1e29)

@quarkus-bot
Copy link

quarkus-bot bot commented Jan 13, 2023

/cc @FroMage (context-propagation), @Ladicek (smallrye), @brunobat (opentelemetry), @jmartisk (smallrye), @manovotn (context-propagation), @phillip-kruger (smallrye), @radcortez (opentelemetry,smallrye)

@OpenGuidou
Copy link
Contributor Author

After some debugging, it would seem that it's the VertxContext that is not correctly propagated.
Something similar to #25818

@radcortez
Copy link
Member

I'll have a look.

@OpenGuidou
Copy link
Contributor Author

OpenGuidou commented Jan 17, 2023

The workaround proposed here #26353 (comment) seems to do the trick, but it doesn't cover the case where there is no vertx context (using MDCEnabledContextStorage).
Anyway I guess it's better to fix that in the library rather than having each client doing it on its side.

@OpenGuidou
Copy link
Contributor Author

OpenGuidou commented Jan 17, 2023

A working workaround for both VertxContext and MDCEnabledContextStorage:

package com.quarkus.reproducer;

import io.quarkus.opentelemetry.runtime.QuarkusContextStorage;
import io.vertx.core.Vertx;
import io.vertx.core.impl.ContextInternal;
import org.eclipse.microprofile.context.spi.ThreadContextProvider;
import org.eclipse.microprofile.context.spi.ThreadContextSnapshot;

import java.util.Map;

public class VertxContextProvider implements ThreadContextProvider {

  @Override
  public ThreadContextSnapshot currentContext(Map<String, String> props) {

    io.vertx.core.Context vertXContext = Vertx.currentContext();
    if (vertXContext != null) {
      return () -> {
        io.vertx.core.Context currentContext = Vertx.currentContext();
        if (vertXContext != null && currentContext == null) {
          final ContextInternal vertxContext = (ContextInternal)vertXContext;
          vertxContext.beginDispatch();
          return () -> vertxContext.endDispatch(null);
        }
        return () -> {
        };
      };
    } else {
      io.opentelemetry.context.Context context = QuarkusContextStorage.INSTANCE.current();
      return () -> {
        io.opentelemetry.context.Context currentContext = QuarkusContextStorage.INSTANCE.current();
        if (context != null && currentContext == null) {
          QuarkusContextStorage.INSTANCE.attach(context);
        }
        return () -> {
        };
      };
    }
  }

  @Override
  public ThreadContextSnapshot clearedContext(Map<String, String> props) {
    return () -> () -> {
    };
  }

  @Override
  public String getThreadContextType() {
    return "Vertx";
  }
}

I hope there is a way to do it better 😄

@FroMage
Copy link
Member

FroMage commented Jan 17, 2023

When using asynchronous code (based on CompletionStages) and context propagation, it seems that the opentelemetry context is not correctly propagated.

It's visible when using a custom ExecutorService and the ThreadContext.withContextPropagation.
When injecting the ManagedExecutor, it's working correctly.

From what you describe, context propagation is working correctly, no? Context propagation only works when using ThreadContext.withContextPropagation or ManagedExecutor to submit tasks. It should not work with CompletionStage by default unless you enable context propagation.

Or did I perhaps misunderstand the issue?

@OpenGuidou
Copy link
Contributor Author

When using asynchronous code (based on CompletionStages) and context propagation, it seems that the opentelemetry context is not correctly propagated.
It's visible when using a custom ExecutorService and the ThreadContext.withContextPropagation.
When injecting the ManagedExecutor, it's working correctly.

From what you describe, context propagation is working correctly, no? Context propagation only works when using ThreadContext.withContextPropagation or ManagedExecutor to submit tasks. It should not work with CompletionStage by default unless you enable context propagation.

Or did I perhaps misunderstand the issue?

I am in fact using the ThreadContext.withContextPropagation: https://github.com/OpenGuidou/opentelemetry-propagation-reproducer/blob/995920a8045acd0520d684d2f8e7dfadf6831342/opentelemetry-propagation/src/main/java/com/quarkus/reproducer/GreetingResource.java#L45

I'm just providing a custom ExecutorService to execute the tasks.
It's working with the default ExecutorService in Quarkus (EnhancedQueueExecutor) as it seems there is a built-in VertXContext propagation.
I would expect the context propagation extension to provide the same level of service for the users having a different worker-threads pool.

@radcortez
Copy link
Member

@OpenGuidou

I don't think that is a workaround. That is probably what we need to implement between OTel and CP.

We have something similar for OpenTracing:
https://github.com/smallrye/smallrye-fault-tolerance/blob/7b13710882d8038335d1f415bcc41b2c40015374/implementation/tracing-propagation/src/main/java/io/smallrye/faulttolerance/tracing/TracingContextProvider.java

@OpenGuidou
Copy link
Contributor Author

OpenGuidou commented Jan 18, 2023

@OpenGuidou

I don't think that is a workaround. That is probably what we need to implement between OTel and CP.

We have something similar for OpenTracing: https://github.com/smallrye/smallrye-fault-tolerance/blob/7b13710882d8038335d1f415bcc41b2c40015374/implementation/tracing-propagation/src/main/java/io/smallrye/faulttolerance/tracing/TracingContextProvider.java

Nice catch, then if you agree I can open a PR with the change I proposed.
The next question would be: where should it be done: smallrye-context-propagation, quarkus smallrye-context-propagation extension or quarkus open-telemetry extension ?

@FroMage
Copy link
Member

FroMage commented Jan 18, 2023

Well, OK. So what you're doing is creating your own executor and using ThreadContext.withContextPropagation on threads created by this service. Why not use the Quarkus ManagedExecutor?

Your threads will not have a Vert.x context because they're not Vert.x/Quarkus threads. So if OTel stores its context in a Vert.x context then yes, this will fail.

Otherwise, the shown VertxContextProvider can't be used as-is for Quarkus, because it messes with the Vert.x context, which is handled automatically for Quarkus and all the cases besides this custom executor service, so this would mess it up. The part with io.opentelemetry.context.Context looks more correct and general-case, though incomplete.

@OpenGuidou
Copy link
Contributor Author

OpenGuidou commented Jan 18, 2023

I'm using a custom executor to be able to use a ForkJoinPool as threadpool, it's something we can't (or I didn't find how) customize in the ManagedExecutor.
The ThreadContext.withContextPropagation is an alternative in case we don't want to use the ManagedExecutor, so the same use case should work with it.

I indeed tried to only use the part dealing with the QuarkusContextStorage, but as is it only works with the MDCEnabledStorage mode, not with the vertx one.
If you have some hints about a proper fix, please advise.

@radcortez
Copy link
Member

I think this should be enough for the OTel integration:

Context current = QuarkusContextStorage.INSTANCE.current();
if (current != null) {
    return () -> {
        QuarkusContextStorage.INSTANCE.attach(current);
        return () -> {};
    };
}

return () -> () -> {};

Is there a case that this does not cover (considering your proposal)?

@FroMage
Copy link
Member

FroMage commented Jan 18, 2023

Well, normally you'd also restore the previous context in the returned lambda ;)

@OpenGuidou
Copy link
Contributor Author

I think this should be enough for the OTel integration:

Context current = QuarkusContextStorage.INSTANCE.current();
if (current != null) {
    return () -> {
        QuarkusContextStorage.INSTANCE.attach(current);
        return () -> {};
    };
}

return () -> () -> {};

Is there a case that this does not cover (considering your proposal)?

Yes indeed it seems to do the trick, I don't know why I didn't stick to that version earlier 🤔

@radcortez
Copy link
Member

Well, normally you'd also restore the previous context in the returned lambda ;)

Not sure if that is required because we just want to propagate the OTel Context to the new thread. When the context is restored, we are in the Vert.x Context, which already contains the OTel Context. Am I missing something?

@OpenGuidou
Copy link
Contributor Author

I guess it's just to stick with the ThreadContextProvider contract.
So something like that ?

import io.quarkus.opentelemetry.runtime.QuarkusContextStorage;
import lombok.extern.slf4j.Slf4j;
import org.eclipse.microprofile.context.spi.ThreadContextProvider;
import org.eclipse.microprofile.context.spi.ThreadContextSnapshot;

import java.util.Map;

@Slf4j
public class VertxContextProvider  implements ThreadContextProvider {

  @Override
  public ThreadContextSnapshot currentContext(Map<String, String> props) {

    io.opentelemetry.context.Context context = QuarkusContextStorage.INSTANCE.current();
    return () -> {

      io.opentelemetry.context.Context currentContext = QuarkusContextStorage.INSTANCE.current();
      if (context != null) {
        QuarkusContextStorage.INSTANCE.attach(context);
        return () -> {
          QuarkusContextStorage.INSTANCE.attach(currentContext);
        };
      }
      return () -> {
      };
    };
  }

  @Override
  public ThreadContextSnapshot clearedContext(Map<String, String> props) {
    return () -> () -> {};
  }

  @Override
  public String getThreadContextType() {
    return "Vertx";
  }
}

@radcortez
Copy link
Member

Looks reasonable. We want this in the OTel Extension with a build step that checks if CP propagation is available and only registers the service in that case.

@OpenGuidou
Copy link
Contributor Author

Ok I'm on it

@radcortez
Copy link
Member

Great. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants