Context Propagation - OpenTelemetry Integration #7351
@reta @Gaganjuneja @Bukhtawar @backslasht I would like to know what you think about the framework/consumer contract set here. Thanks.
Sample usage in source peer recovery code -
- new ActionListenerResponseHandler<>(listener, reader, ThreadPool.Names.GENERIC)
+ new ActionListenerResponseHandler<>(
+     new OTelContextPreservingActionListener<>(listener, Context.current()),
+     reader, ThreadPool.Names.GENERIC)

+ if (OpenTelemetryService.isThreadPoolAllowed(name())) {
+     executor = OpenTelemetryContextWrapper.wrapTask(executor);
+ }

- recover(request, new ChannelActionListener<>(channel, Actions.START_RECOVERY, request));
+ BiFunction<Object[], ActionListener<?>, Void> recoverFunction = (args, actionListener) -> {
+     recover((StartRecoveryRequest) args[0], (ActionListener<RecoveryResponse>) actionListener);
+     return null;
+ };
+ OpenTelemetryService.callFunctionAndStartSpan("recover", recoverFunction,
+     new ChannelActionListener<>(channel, Actions.START_RECOVERY, request), request);
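A minimal, stdlib-only sketch of what wrapping an executor, as `OpenTelemetryContextWrapper.wrapTask(executor)` does in the diff, might involve: capture the context on the submitting thread and reattach it on the worker thread. The `ThreadLocal<String>` named `CURRENT` is a hypothetical stand-in for OpenTelemetry's `Context`, and `wrapTask`/`wrapExecutor` are illustrative helpers, not the PR's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ContextPreservingExecutorSketch {

    // Hypothetical stand-in for OpenTelemetry's Context: the current span name per thread.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Captures the submitter's context and reattaches it while the task runs on the worker.
    static Runnable wrapTask(Runnable task) {
        final String captured = CURRENT.get(); // read on the submitting thread
        return () -> {
            String previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                task.run();
            } finally {
                CURRENT.set(previous); // leave the pool thread as we found it
            }
        };
    }

    // Executor-level wrapper: every task is wrapped on the thread that calls execute().
    static Executor wrapExecutor(Executor delegate) {
        return command -> delegate.execute(wrapTask(command));
    }

    // Returns the context a task observes on a pool thread, with or without wrapping.
    static String observedInPool(boolean wrap) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT.set("recover");
        try {
            Executor executor = wrap ? wrapExecutor(pool) : pool;
            AtomicReference<String> seen = new AtomicReference<>();
            CountDownLatch done = new CountDownLatch(1);
            executor.execute(() -> {
                seen.set(CURRENT.get());
                done.countDown();
            });
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task did not run");
            }
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CURRENT.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain pool:   " + observedInPool(false)); // null: a ThreadLocal is not carried over
        System.out.println("wrapped pool: " + observedInPool(true));  // recover
    }
}
```

Capturing at `execute()` time, rather than when the wrapper is created, is what lets a single wrapped executor serve many callers that each have a different context.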
Here is the complete code change for source peer recovery, along with all the framework changes described here. It produces the following spans:
@rishabhmaurya thanks for the summary. Wrapping action listeners and thread pools (the 1st and 2nd use cases) makes perfect sense; the
I believe this is always the case: the span should be closed on response or failure. How is it different from the 1st option you described?
@reta thanks for the review.
You're right about the span closure. The first option is only for context propagation and isn't tied to starting and closing a new span: if there is an existing context and some asynchronous flow starts, wrapping the ActionListener ensures that the context is reattached on callback. The helper that starts a span looks like this:
public static <R> void callFunctionAndStartSpan(String spanName, BiFunction<Object[], ActionListener<?>, R> function,
        ActionListener<?> actionListener, Object... args) {
    Context beforeAttach = Context.current();
    Span span = startSpan(spanName);
    try (Scope ignored = span.makeCurrent()) {
        actionListener = new OTelContextPreservingActionListener<>(actionListener, beforeAttach, span.getSpanContext().getSpanId());
        function.apply(args, actionListener);
    }
}
On callback:
@Override
public void onResponse(Response r) {
    try (Scope ignored = Objects.requireNonNull(afterAttachContext).makeCurrent()) {
        Span span = Span.current();
        closeCurrentScope(span);
    }
    try (Scope ignored = Objects.requireNonNull(beforeAttachContext).makeCurrent()) {
        delegate.onResponse(r);
    }
}
In the first case, the before-attach and after-attach contexts would be the same, as we are not starting a new span.
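A stdlib-only sketch of the span-closing flow in the excerpt above: start a span, run the function with it current, and have the wrapped listener end the span and then call the delegate in the before-attach context. `ActionListener`, `Span` and the `ThreadLocal` named `CURRENT` are minimal hypothetical stand-ins for the OpenSearch and OpenTelemetry types. The listener also holds the span directly instead of reattaching an after-attach context and looking it up via `Span.current()`, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class SpanClosingListenerSketch {

    // Minimal stand-in for OpenSearch's ActionListener.
    interface ActionListener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical span: a name plus an "ended" flag.
    static final class Span {
        final String name;
        boolean ended;
        Span(String name) { this.name = name; }
        void end() { ended = true; }
    }

    // Hypothetical per-thread current span, standing in for OpenTelemetry's Context.
    static final ThreadLocal<Span> CURRENT = new ThreadLocal<>();

    // Runs body with ctx as the current span, then restores the previous one.
    static void runWith(Span ctx, Runnable body) {
        Span previous = CURRENT.get();
        CURRENT.set(ctx);
        try {
            body.run();
        } finally {
            CURRENT.set(previous);
        }
    }

    // Ends the span on response/failure, then calls the delegate in the before-attach context.
    static final class SpanClosingListener<T> implements ActionListener<T> {
        private final ActionListener<T> delegate;
        private final Span beforeAttach; // null if no span was current
        private final Span span;

        SpanClosingListener(ActionListener<T> delegate, Span beforeAttach, Span span) {
            this.delegate = delegate;
            this.beforeAttach = beforeAttach;
            this.span = span;
        }

        @Override
        public void onResponse(T response) {
            span.end();
            runWith(beforeAttach, () -> delegate.onResponse(response));
        }

        @Override
        public void onFailure(Exception e) {
            span.end();
            runWith(beforeAttach, () -> delegate.onFailure(e));
        }
    }

    // Starts a span, runs the function with it current, and hands the function a span-closing listener.
    static <T> void callFunctionAndStartSpan(String spanName,
                                             BiFunction<Object[], ActionListener<T>, Void> function,
                                             ActionListener<T> listener, Object... args) {
        Span before = CURRENT.get();
        Span span = new Span(spanName);
        ActionListener<T> wrapped = new SpanClosingListener<>(listener, before, span);
        runWith(span, () -> function.apply(args, wrapped));
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Span[] started = new Span[1];
        ActionListener<String> caller = new ActionListener<String>() {
            @Override
            public void onResponse(String r) {
                Span current = CURRENT.get();
                log.add("response=" + r + " current=" + (current == null ? "none" : current.name));
            }

            @Override
            public void onFailure(Exception e) {
                log.add("failure=" + e.getMessage());
            }
        };
        callFunctionAndStartSpan("recover", (fnArgs, l) -> {
            started[0] = CURRENT.get();
            log.add("inside span=" + started[0].name + " arg=" + fnArgs[0]);
            l.onResponse("ok"); // in real code this would typically fire later, on another thread
            return null;
        }, caller, "request-1");
        log.add("ended=" + started[0].ended);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Calling the listener synchronously in `demo()` keeps the sketch deterministic. The listener carries both the span and the before-attach context with it rather than reading them from whichever thread invokes the callback.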
Thanks @rishabhmaurya, I believe the span could be started by the method itself (
I was a bit hesitant as well and felt it's more of a utility method than something that needs to be part of the framework. Given that 1 can be used to achieve 3 via
public OTelContextPreservingActionListener(ActionListener<Response> delegate, Context beforeAttachContext, String spanID)
I will remove it from the framework.
public OTelContextPreservingActionListener(ActionListener<Response> delegate)
I was also wondering whether this could be expressed as an annotation when the last argument is an ActionListener, something like:
@StartSpan("spanName")
fun(arg1, arg2, actionListener) {}
However, I will park this for now and create a separate discussion on shorthand notations and annotations that can be used to avoid code pollution.
I will wait for the basic OpenTelemetry framework changes being implemented in #7026 to be pushed to the feature branch before raising the PR here.
@rishabhmaurya, the tracing framework would take care of context propagation automatically through ThreadContext. A tracing-aware implementation of ActionListener makes sense in some scenarios, such as when the same thread submits multiple async tasks. TracingAwareActionListener should be able to call the framework to start and end the span.
@Gaganjuneja I somehow missed your comment earlier. Thanks for the review.
Assuming someone started a span using the tracing framework, this would internally translate into the following:
Or do you mean modifying the default behavior of Span by changing the OpenTelemetry classes to use ThreadContext instead of interacting with the ThreadLocal directly? That would mean ensuring, with each new OpenTelemetry version, that the conversion logic stays compatible. I'm not against using ThreadContext, but I think it introduces a translation layer to convert the context back and forth between the OpenTelemetry representation and ThreadContext, which can add maintenance overhead across OTel versions and is an additional step that could be avoided. I may be completely wrong here, though, so it would be great if you could elaborate on the approach you have in mind to help us make a better call. Also, do you agree that we need something like 1), which you referred to as
Hi @rishabhmaurya, I have written a detailed prototype in #7026. The idea is to simply keep the current span in the thread context. As for TracingAwareActionListener, yes, we would need it, but TracingAwareActionListener should take care of starting and closing the span.
@Gaganjuneja Yes, I read it, but it wasn't clear there, which is why I asked the question above. It would be great if you could answer it and share more details on the implementation of
This is discussed above, and I think it shouldn't start and close the span automatically by default.
@rishabhmaurya Sure. There is a slight change following the discussion on the same thread: we will now store the current span in the ThreadContext. Span would be our own implementation, containing the OTel span and the parent span. ThreadContext would then take care of propagating the current span to new threads and to calls on other data nodes, and the framework would take care of updating the current span based on the start and end span calls.
Yes, my bad: the start should be in the method itself, but the end would be inside the listener's onResponse and onFailure methods. Thoughts?
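The start/end bookkeeping described in this comment (our own Span holding its parent, with the current span kept in a per-thread slot) could look roughly like the sketch below. `Span`, `startSpan`, `endSpan` and the `ThreadLocal` standing in for OpenSearch's `ThreadContext` are hypothetical; propagation to other threads and data nodes, which ThreadContext would handle, is not modeled.

```java
class ThreadContextSpanSketch {

    // Hypothetical span wrapper that remembers its parent.
    static final class Span {
        final String name;
        final Span parent; // null for a root span

        Span(String name, Span parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Hypothetical stand-in for OpenSearch's ThreadContext: one current-span slot per thread.
    static final ThreadLocal<Span> CURRENT_SPAN = new ThreadLocal<>();

    // Start: the new span's parent is whatever is current; the new span becomes current.
    static Span startSpan(String name) {
        Span span = new Span(name, CURRENT_SPAN.get());
        CURRENT_SPAN.set(span);
        return span;
    }

    // End: only the current span may be ended; its parent becomes current again.
    static void endSpan(Span span) {
        if (CURRENT_SPAN.get() != span) {
            throw new IllegalStateException("span '" + span.name + "' is not the current span");
        }
        CURRENT_SPAN.set(span.parent);
    }

    static String currentName() {
        Span current = CURRENT_SPAN.get();
        return current == null ? "none" : current.name;
    }

    static String demo() {
        StringBuilder trace = new StringBuilder();
        Span recover = startSpan("recover");
        Span phase1 = startSpan("phase1");
        trace.append(currentName()).append(" (parent=").append(phase1.parent.name).append(")");
        endSpan(phase1);
        trace.append(" -> ").append(currentName());
        endSpan(recover);
        trace.append(" -> ").append(currentName());
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // phase1 (parent=recover) -> recover -> none
    }
}
```

Keeping the parent on the span means ending a span never needs a separate stack; in an asynchronous flow the `endSpan` call would move into the listener's `onResponse`/`onFailure`, which is where the strict "must be current" check may need relaxing.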
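Span context isn't propagated automatically when using OpenTelemetry manual instrumentation. Because it is stored in a ThreadLocal, it needs to be propagated whenever a thread context switch happens, to avoid losing it further down the execution. It needs to be propagated in the following cases:
1. When execution of a task moves to another thread asynchronously.
2. When a non-blocking IO operation completes and its callback runs on another thread.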
The basic idea is: if a thread has a context, it should pass the context to the next thread on a context switch and preserve it across non-blocking IO operations. The context propagation logic shouldn't be left to the consumer to incorporate; it should be handled by the framework wherever possible.
For 1, the OpenSearch ActionListener interface is used heavily throughout the codebase, so preserving the context there should cover all such cases.
For 2, non-blocking IO calls are likewise wrapped around the ActionListener interface, so preserving the context in the ActionListener for such calls ensures that whichever thread picks up the onResponse() of the IO event will have the context.
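A sketch of the context-preserving listener idea from these two paragraphs: the wrapper captures the context when it is created and reattaches it around onResponse()/onFailure(), so the callback sees the context even when another thread delivers the response. `ActionListener` and the `ThreadLocal` named `CURRENT` are hypothetical stand-ins for the OpenSearch and OpenTelemetry types, and the single-thread executor only simulates a non-blocking IO completion.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ContextPreservingListenerSketch {

    // Minimal stand-in for OpenSearch's ActionListener.
    interface ActionListener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical stand-in for OpenTelemetry's Context.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Captures the creating thread's context; reattaches it around each callback.
    static final class ContextPreservingListener<T> implements ActionListener<T> {
        private final ActionListener<T> delegate;
        private final String captured = CURRENT.get(); // evaluated during construction

        ContextPreservingListener(ActionListener<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void onResponse(T response) {
            runWithCaptured(() -> delegate.onResponse(response));
        }

        @Override
        public void onFailure(Exception e) {
            runWithCaptured(() -> delegate.onFailure(e));
        }

        private void runWithCaptured(Runnable body) {
            String previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                body.run();
            } finally {
                CURRENT.set(previous);
            }
        }
    }

    // Simulates an IO response delivered on a separate thread and returns
    // the context the caller's listener observed in onResponse ("null" if none).
    static String contextSeenInCallback(boolean preserve) {
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        CURRENT.set("peer-recovery");
        try {
            CompletableFuture<String> seen = new CompletableFuture<>();
            ActionListener<String> listener = new ActionListener<String>() {
                @Override
                public void onResponse(String response) {
                    seen.complete(String.valueOf(CURRENT.get()));
                }

                @Override
                public void onFailure(Exception e) {
                    seen.completeExceptionally(e);
                }
            };
            ActionListener<String> passed = preserve ? new ContextPreservingListener<>(listener) : listener;
            ioThread.execute(() -> passed.onResponse("chunk"));
            return seen.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT.remove();
            ioThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain listener:      " + contextSeenInCallback(false)); // null
        System.out.println("preserving listener: " + contextSeenInCallback(true));  // peer-recovery
    }
}
```

Capturing in the constructor ties the context to the point where the async step is initiated, which is why the wrapping has to happen on the calling thread before the request is sent.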
For 1, most of the time it is the DirectExecutorService that is used to spawn a thread asynchronously using the same executor service; however, the consumer can also use the ThreadPool instance to get the desired executor service directly and execute the asynchronous step with it. This usage pattern depends heavily on the business logic, so it will be left to the consumer of the framework to wrap such ActionListeners to preserve the context when needed.
For 2, which is less business-logic oriented, there are a limited number of cases, depending on the nature of the IO operation. For example, FileChannel is blocking IO and thus doesn't need to be handled. Network calls, on the other hand, are managed through transport actions, are non-blocking most of the time, and use the ActionListener interface, where the context needs to be preserved. Thus, wrapping the ActionListener in TransportService before executing the transport action to preserve the context should take care of the majority of cases. All such cases will be handled by the framework on an on-demand basis.
Let's summarize the above in terms of requirements from the framework (more details can be found in the associated code changes):
1. OTelContextPreservingActionListener(ActionListener delegate) - wrapper over ActionListener to preserve the context.
2. OTelContextPreservingExecutorService(ExecutorService delegate) - wrapper over ExecutorService to automatically stash the context and pass it to the next thread when that thread starts working on the task.

The figure above illustrates the expected life cycle of a span that is started in the middle of the execution of a task started by Thread 1 (of ExecutorService 1). Thread 1 performs an IO operation, and on completion of the IO, Thread 2 (of ExecutorService 1) resumes the task. Thread 2 also performs an IO operation, and the task is later resumed by Thread 3 (of ExecutorService 1) on IO completion. Thread 3 performs 3 different sub-tasks, which it executes in parallel by spawning three threads in different executor services. This is non-blocking, so Thread 3 continues working on the rest of the task while the sub-tasks are delegated to other threads. It does wait for the sub-tasks to complete, without blocking the thread, and on the response to the last sub-task, the task is resumed by Thread 2 (of ExecutorService 1) and runs to completion; the span is closed somewhere in between. ExecutorService 3 isn't onboarded to preserve the context, so it doesn't have the span propagated (no blue strip). However, the ActionListener used by the main task when spawning threads for the sub-tasks was context preserving, so Thread 2 (of ExecutorService 1) was able to resume the span on the response to the last sub-task.

In many use cases, span closure is tightly coupled to the completion of an async step, which brings us to the third requirement from the framework: an option to automatically close the span on the ActionListener's response. To keep it simple and clean, the StartSpan API can be extended for such cases. The contract is that the consumer must move all of a span's logic into a function, and pass the span details (the span to be started), the function, and the ActionListener (to be wrapped with span-closure logic) as arguments.
3. StartSpan(SpanDetails, Function, FunArgs, ActionListener) - invokes the function, wraps the ActionListener using OTelContextPreservingActionListener, and closes the span on the response/failure of the ActionListener.

These requirements need to be incorporated into the overall initiative of Tracing framework APIs in OpenSearch #7026 and #6750.