Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry Mutiny #25576

Closed
wants to merge 12 commits into from
Closed

Conversation

holomekc
Copy link
Contributor

@holomekc holomekc commented May 14, 2022

Hi in #25453 it was mentioned that OpenTelemetry does not work with Mutiny.

I already worked on this topic and thought maybe the stuff I created might be helpful. I am pretty sure that it is not 100% complete but maybe it can save some time.

I am also not sure about a few things, so I hope the Experts in Mutiny etc. can help here.

Maybe shortly how it works:
OpenTelemetry provides the so called AsyncOperationEndStrategy. So you can basically register different type (here Uni and Multi) and the strategy then needs to make sure that spans are closed correctly.

Due to the fact that async strategies can be used in the same context as imperative things I adjusted the WithSpanInterceptor (Or to be more exact I moved some things around. I tried to explain why in code). So basically when you use WithSpan, the strategy checks if it can handle the return type. If so it does some magic. Otherwise, it falls back to imperative. Also errors close the span properly now.

Besides that the main Topic is to get the suitable context on different events from Mutiny (onItem, onComplete, etc.) Therefore, I used the Mutiny Interceptor approach to wrap the created Unis. I tried to follow the approach OpenTelemetry does for rxjava. And here I really hope that the Mutiny Experts say: "What? Why so complicated that is much easier to achieve. Just do this and that."

What else:

  • I add a configurable event which is written as soon as the Uni/Multi subscription happens. Customizable text can be used here and is configurable.
  • A Span Attribute (the other e.g. rxjava implementations do that as well), which is set in case a stream is cancelled. Also configurable.
  • As mentioned above I moved around some stuff regarding WithSpanInterceptor. The idea was to provide a way to use all the implemented logic via the WithSpan annotation but also be able to use it programmatically. The so called InstrumenterTracer can be used for that. I had no better idea for a name... So basically you can call: withSpan(, () -> supplySomethingSyncOrAsync); The parameters provide even more control in case the annotation is not powerfull enough.
  • WithRoot. Is a new annotation, which can be used with WithSpan and then a new trace is started instead. I think this is really important to have. Otherwise, traces can become very, very long. Sadly there is no approach yet in OpenTelemetry but maybe this changes at some point in time. Then the custom annotation could be removed. See Feature Request: Allow @WithSpan to start a new Trace open-telemetry/opentelemetry-java-instrumentation#1036. Furthermore, the annotation also allows to set the flag links. This will not only start a new trace but also link it to the previous one. So you can create shorter, more readble traces if you want to but you can also link them so that you can see where the trace came from.

QuarkusContextStorage
In general I wrote some TODOs: which also contain some questions but there is one thing I am not that sure about and this is the QuarkusContextStorage. During Scope creation it also checks onClose if the current Context is the one which was used during attach. In my tests on a project, this was not working well in cobination with the Vertx SQL traces. I am also not sure why this happens. Seems like a span with scope is started but closed at transaction commit (I think), but then might overlap with scopes from Mutiny etc. But maybe you see if I did a mistake here.

In general I am not that familiar with Quarkus extension development so I probably forgot something or the maven structure / dependencies are not suitable.
The build is still running (seems like to take some time) but I thought I could already create the PR so that you can take a look.

Ok this was more text than I thought. I hope it helps and feedback would be really awesome.

holomekc added 12 commits May 13, 2022 22:14
Add MutinyAsyncOperationEndStrategy
add some configuration to add some events or span attributes during certain mutiny events (onSubscription/onCancellation)
register async strategy in OpenTelemetry
… case the async-mutiny dependency is not used or disabled no mutiny strategy will be registered and async strategies implementation from OpenTelemetry falls back to sync.
…ld need feedback here.

Created Tracing Mutiny streams and subscriber
…allows programmatic creation of "withSpan". Also supports async return values.

Add @withroot annotation which allows to start a new trace when used in combination with @WithSpan. Also allows to link to previous active span context.
add more junit tests
revert some line formatting executed on bom/pom.xml
@quarkus-bot quarkus-bot bot added area/core area/dependencies Pull requests that update a dependency file area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/documentation area/tracing labels May 14, 2022
@quarkus-bot
Copy link

quarkus-bot bot commented May 14, 2022

/cc @radcortez

@holomekc
Copy link
Contributor Author

This is the QuarkusContextStorage line I talked about above:

log.warn("Context in storage not the expected context, Scope.close was not called correctly");

@holomekc
Copy link
Contributor Author

I think what is still missing is a way to register to WithSpanInterceptor, so that it is possible to adjust / replace the instrumenter. I think this would be awesome. Why? Some tools ignore the Exception set by instrumenter but it works when attributes are set manually. But this would not be possible if the instrumenter is not adjustable.

@gsmet
Copy link
Member

gsmet commented Feb 2, 2023

@radcortez @brunobat @jponge @cescoffier ping. There are conflicts now but it would be nice to check if the approach is worth pursuing.

}

private static Multi<String> createDefaultMulti(final SpanContext context, final int delayMs) {
return Multi.createFrom().items("SUPERSONIC", "SUBATOMIC", "JAVA")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Uni and Multi tests use the "delayIt" operators which use the Quarkus scheduler.
It would be interesting to test:

  • truly async using the duplicated context
  • unmanaged thread.

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.infrastructure.MultiInterceptor;

public class OpenTelemetryMultiInterceptor implements MultiInterceptor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this will have a massive impact on the performance. I don't know a better way, but that's something to keep in mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. When I check other implementation like rxjava it seems like this is necessary:
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/rxjava/rxjava-2.0/library/src/main/java/io/opentelemetry/instrumentation/rxjava/v2_0/TracingAssembly.java

I do not know if there is a different way to achieve tracing in reactive. But I am afraid it is not. This is why I hoped that the Munity guys say: No, no Chris there is a way easier approach to achieve the same. Without that I always ended up with broken, not connected traces, which is not really helpful.

import io.smallrye.mutiny.infrastructure.UniInterceptor;
import io.smallrye.mutiny.subscription.UniSubscriber;

public class OpenTelemetryUniInterceptor implements UniInterceptor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the Multi version - it has a massive performance cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above

if (multi instanceof TracingMulti) {
return multi;
}
return new TracingMulti<>(Context.current(), multi);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the Context.current is what you want?

It will be the current at assembly time. I'm pretty sure you need the one at subscription time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Long time ago... I cannot really remember. It is also present in onSubscription. I mean the tests are there, so feel free to improve the approach. I quickly checked the rxjava impl. I do not see it there. There the current context is only used at subscription time. I think something was broken when I did not do this, but as said, I cannot really remember. I had some difficulties with the Multi instrumentation, that is at least what I can remember.

private final Subscriber<? super T> subscriber;

public TracingMultiSubscriber(final Context context, final Subscriber<? super T> subscriber) {
this.context = context;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the assembly time context. Most probably you need the subscription time one (onSubscribe()).

public void onItem(final T item) {
try (final Scope ignored = context.makeCurrent()) {
subscriber.onNext(item);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected to close the scope? onNext is likely going to do async operation. So, the return of onNext does not mean we are done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@radcortez radcortez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to discuss what would be our preferred approach.

Looking only at what we have here:

  • We do need more integration tests to make sure the context is being correctly propagated and that spans are generated as we expect, no matter the origin or the target
  • The Extension cannot be something separate. It should be part of the main extension
  • I share the same concerns as @cescoffier

@gsmet gsmet added the triage/needs-rebase This PR needs to be rebased first because it has merge conflicts label Feb 4, 2023
Copy link
Contributor

@brunobat brunobat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss this, as stated above.
This should be part of the main extension.
The main extension is in the process of a massive rewrite: #30033

@holomekc holomekc closed this May 10, 2024
@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core area/dependencies Pull requests that update a dependency file area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins area/documentation area/tracing triage/invalid This doesn't seem right triage/needs-rebase This PR needs to be rebased first because it has merge conflicts
Projects
Development

Successfully merging this pull request may close these issues.

5 participants