Add ForceFlush to the TracerProvider Trace API specification #2944

richardmcmillen · 2022-11-14T08:06:33Z

What are you trying to achieve?

I expected to see that the OpenTelemetry Trace API specify that TracerProvider should implement ForceFlush

My proposal is to specify that the Trace API TracerProvider should implement ForceFlush as a no-op which always reports success.

Additional context.

The Trace SDK TracerProvider specification contain sections on Shutdown and ForceFlush

Neither of these are specified in the API.

Issue for adding Shutdown to the SDK spec: Add a Shutdown() method to *Provider #1073
Issue for adding ForceFlush to the SDK spec: Add ForceFlush to Tracing SDK #1451

An issue that was created as a result of some of the above proposals and subsequent PR: #1457

As an instrumentation author, writing instrumentation for libraries. I know when a library will perform work and then exit, with this knowledge I would like to ensure spans are exported by using ForceFlush. However, when this is only implemented in the SDK I can not write my instrumentation code purely against the API.

Here is a discussion I had for opentelemetry-ruby: open-telemetry/opentelemetry-ruby#1401

My understanding that as instrumentation authors we should be depending on only the API, I would consider ForceFlush a method that instrumentation authors would want to perform.

The text was updated successfully, but these errors were encountered:

ahayworth · 2022-11-14T15:29:14Z

For context, the discussion linked above ☝️ talks about how two of our ruby instrumentation libraries inadvertently rely upon an SDK tracer method (rather than solely using API methods). One such example is here

The motivation for that reliance in the instrumentation is because in some contexts (background jobs, task runners, etc) - the language runtime quickly exits after finishing work and may not live long enough to flush out spans batched for export. The instrumentation wanted to provide a convenience for users to ensure that spans would be exported.

We have options to clean this up on our end (including some that preserve the functionality via ✨ metaprogramming ✨), but we did wonder why these weren't part of the API to begin with.

reyang · 2022-11-18T04:45:29Z

Could you use something like this? https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Extensions#autoflushactivityprocessor

reyang · 2022-11-18T04:54:35Z

I can understand why folks would want to have ForceFlush in the API, similar to many I/O APIs that provide flush().

I would vote for not having this because the negative impact to the ecosystem:

ForceFlush allows some timeout, it is impossible for instrumentation libraries to know the application developers' intention regarding timeout.
Imagine a developer using dozens of instrumentation libraries and all of them use ForceFlush somewhere, it'll be super inefficient and hard to troubleshoot random latency/contention.
What if ForceFlush failed when the instrumentation library called it? Most libraries will have no idea since the application is the one which decides the expected behavior in most cases.

richardmcmillen · 2022-11-19T10:32:20Z

Thanks for the response to the issue @reyang, I’ll have a go at responding to your commented below.

ForceFlush allows some timeout, it is impossible for instrumentation libraries to know the application developers' intention regarding timeout.

I think what you are saying here is that instrumentation libraries can’t assume timeout values. To this point I would suggest that instrumentation library authors should allow this timeout to be configurable. When the instrumentation is installed there would be some mechanism to say, timeout after 5 seconds.

Imagine a developer using dozens of instrumentation libraries and all of them use ForceFlush somewhere, it'll be super inefficient and hard to troubleshoot random latency/contention.

To a large extent I think we trust instrumentation authors to do the right thing. I don’t think ForceFlush will be commonly called in instrumentation libraries. But if an instrumentation author knows the code they are tracing will exit its process, possibly stopping batched spans from being exported, I think it is reasonable to give them the affordance to flush spans. And not put the burden on the end users have to wire this up.

Instrumentation authors could do all sorts of chaotic things that are or are not in the specification. Some guidance on when to make calls to force flush could help with this concern. If the author is in any doubt it would definitely be advisable for instrumentation authors to make the force flushing optional.

For your specific concern, in what circumstance do you think multiple instrumentation authors would be calling ForceFlush? Perhaps I am missing how common this would be.

What if ForceFlush failed when the instrumentation library called it? Most libraries will have no idea since the application is the one which decides the expected behavior in most cases.

I am not quite sure what you are getting at here. Could you expand on this a little? If I was to guess I think you mean, if it is deep in instrumentation code that the force flush fails, the end user applicatoin might have a hard time knowing that this happened. You are concerned about how instrumentation code should behave when a force flush fails, you think this should be a an end user responsibility? Is this correct?

Again, thanks for the discussion. I am not totally across the philosophy of the specification, API vs SDK, so please take all my comments as purely an attempt to improve my understanding of your concerns.

reyang · 2022-11-21T19:41:55Z

Short version, I feel ForceFlush is more like an SDK feature, I suggest that we get more evidence before considering it in the API specification, so we know that 1) there is a high demand 2) the outcome is greater than the problem it'll introduce.

@richardmcmillen I suggest that you reach out to other community members and users to collect the requirements and see how important this is.

ahayworth · 2022-11-21T22:33:43Z

Could you use something like this? https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Extensions#autoflushactivityprocessor

Maybe? Flushing in general still requires SDK-only methods, so even if we used some kind of processor to accomplish it we'd still call SDK-only methods.

That said, it's a great idea overall - I could see this kind of processor being useful for folks. We might implement it regardless.

[all of your other comments]
I suggest that we get more evidence before considering it in the API specification, so we know that 1) there is a high demand 2) the outcome is greater than the problem it'll introduce.

Your concerns make sense to me; and I agree that we should collect more information to see if anyone else even needs this. It's possible that in trying to make life easy for users, we might have actually gone too far 😆

(And in any case, I think we can work around it in the ruby repo with magic for now, so we can wait to see what the broader community thinks).

richardmcmillen added the spec:trace Related to the specification/trace directory label Nov 14, 2022

github-actions bot assigned jmacd Nov 14, 2022

This was referenced Nov 14, 2022

ActiveJob instrumentation relies on SDK method instead of API only open-telemetry/opentelemetry-ruby-contrib#189

Closed

Rake instrumentation relies on SDK method instead of API only open-telemetry/opentelemetry-ruby-contrib#190

Closed

arminru added the area:sdk Related to the SDK label Nov 14, 2022

reyang added the [label deprecated] triaged-needmoreinfo [label deprecated] The issue is triaged - the OTel community needs more information to decide label Nov 18, 2022

yurishkuro mentioned this issue Dec 19, 2022

OpenTracing Shim: Add Tracer.Close() operation. #3048

Merged

Nevay mentioned this issue Jan 25, 2023

TracerProvider::forceFlush should be an API method? open-telemetry/opentelemetry-php#911

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ForceFlush to the TracerProvider Trace API specification #2944

Add ForceFlush to the TracerProvider Trace API specification #2944

richardmcmillen commented Nov 14, 2022 •

edited

Loading

ahayworth commented Nov 14, 2022 •

edited

Loading

reyang commented Nov 18, 2022 •

edited

Loading

reyang commented Nov 18, 2022

richardmcmillen commented Nov 19, 2022 •

edited

Loading

reyang commented Nov 21, 2022 •

edited

Loading

ahayworth commented Nov 21, 2022

Add ForceFlush to the TracerProvider Trace API specification #2944

Add ForceFlush to the TracerProvider Trace API specification #2944

Comments

richardmcmillen commented Nov 14, 2022 • edited Loading

ahayworth commented Nov 14, 2022 • edited Loading

reyang commented Nov 18, 2022 • edited Loading

reyang commented Nov 18, 2022

richardmcmillen commented Nov 19, 2022 • edited Loading

reyang commented Nov 21, 2022 • edited Loading

ahayworth commented Nov 21, 2022

richardmcmillen commented Nov 14, 2022 •

edited

Loading

ahayworth commented Nov 14, 2022 •

edited

Loading

reyang commented Nov 18, 2022 •

edited

Loading

richardmcmillen commented Nov 19, 2022 •

edited

Loading

reyang commented Nov 21, 2022 •

edited

Loading