Sync and Async children (FOLLOWS_FROM) #65

tedsuo · 2019-04-22T16:36:33Z

In OpenTracing, we have CHILD_OF and FOLLOWS_FROM. In the new project, we are considering whether to include this concept as a flag on the SpanBuilder option when setting the span parent. The new naming is proposed to be sync and async children, to make the relationship more clear.

Reference PR: open-telemetry/opentelemetry-java#130

Questions:

Do we still want this at all? It can be useful for critical path and other types of trace analysis.
Do we also need an unknown flag as well?


tylerbenson · 2019-05-02T21:26:54Z

@tedsuo How would you label a server span with a parent propagated from an HTTP request?

Perhaps a better name would be direct/indirect?
(Anything pulled directly from the current context would be direct; anything else would be indirect.)

Perhaps that wouldn't give you the information you need for critical path analysis... part of the problem is that only the caller really knows whether it is blocking, whereas the link/relationship is established by the callee.

{I guess some of this was already discussed in #14}

rochdev · 2019-05-22T16:04:28Z

I don't think sync and async are the correct semantics for this. My understanding of ChildOf and FollowsFrom is that ChildOf is the previous operation that directly created the new span, and FollowsFrom is any previous operation that indirectly caused the new span. This is unrelated to the references being asynchronous or not. Both potentially have value but they are conceptually different.

An example of this in JavaScript is promises. The ChildOf would generally be the span in the scope where promise.then() was called, and FollowsFrom would be the span where resolve() was called for example.

My take on the questions above:

Do we still want this at all? It can be useful for critical path and other types of trace analysis.

I think both ChildOf/FollowsFrom and sync/async potentially have value for different reasons.

Do we also need an unknown flag as well?

What is the case where this would not be known? I think this is something that is always known in advance.

carlosalberto · 2019-06-03T21:02:04Z

@rochdev Hey, sorry for the late answer.

Agreed with what you wrote - but what names do you think we should use? @tylerbenson already mentioned direct/indirect as options, and if you have something in mind feel free to propose it.

rochdev · 2019-06-03T22:29:05Z

I think ChildOf and FollowsFrom made a lot of sense. To be honest, semantic-wise I think OpenTracing got a lot of things right.

What was this relationship called in OpenCensus?

bogdandrutu · 2019-06-03T22:43:16Z

@rochdev I am worried that your understanding of childOf and followsFrom is different from that of others. What you explained is different from what others have explained to me, and I am now very confused about the correct meaning of childOf vs followsFrom.

SergeyKanzhelev · 2019-06-03T22:57:27Z

Moving to the API revision milestone on specification. We need to collect more feedback.

rochdev · 2019-06-03T23:00:46Z

In Node this concept is very important and is core to how context propagation works in the runtime itself. For this purpose specifically, we call them execution and trigger. In Node you can only have one of each because it's limited to function calls, but from a tracing perspective it makes sense that you could follow from multiple different operations.

Let me give an example specific to Node at the language level:

const promise = new Promise((resolve, reject) => {
  resolve() // execution ID here is 2
})

// execution ID here is 1
promise.then(() => {
  // here, execution ID is 1 and trigger ID is 2
})

The reasoning for the above is that resolve() is what triggered the execution of the callback, but the callback was actually registered when then() was called, so that's its execution parent.

For a case like this, we could then say that the callback is running as a ChildOf the context where then() was called, and FollowsFrom the context where resolve() was called.
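As a rough illustration of that mapping, the sketch below models the callback's span with two references. The `makeSpan` helper and the `childOf` / `followsFrom` field names are hypothetical and exist only for this example; they are not an OpenTracing or OpenTelemetry API, and the IDs simply mirror the execution IDs in the snippet above.

```javascript
// Hypothetical minimal span model; not a real tracing API.
let nextId = 1;

function makeSpan(name, { childOf = null, followsFrom = [] } = {}) {
  return {
    id: nextId++,
    name,
    childOf,      // the context that registered the work (at most one)
    followsFrom,  // the contexts that triggered the work (possibly several)
  };
}

const thenSpan = makeSpan('scope where then() was called');       // id 1
const resolveSpan = makeSpan('scope where resolve() was called'); // id 2

// The callback is a ChildOf the then() scope and FollowsFrom the resolve() scope.
const callbackSpan = makeSpan('then() callback', {
  childOf: thenSpan.id,
  followsFrom: [resolveSpan.id],
});

console.log(callbackSpan);
```

Keeping `followsFrom` as a list reflects the point made earlier that, from a tracing perspective, a span could follow from multiple operations.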

It's possible I got this completely wrong and that ChildOf/FollowsFrom has nothing to do with the relationship described above. I think the best person to explain the real meaning of these is probably @tedsuo.

In general, I think the different wordings proposed in this thread make sense, but they don't necessarily map 1:1 with each other.

yurishkuro · 2019-06-04T00:22:12Z

The problem here is slightly bigger than what to call these. Kudos @rochdev for thinking OpenTracing got it right, but it didn't, not quite (cf. this blog post). The most fundamental question in analyzing the graph of events is Lamport's happens-before relationship. In the OpenTracing span model the following holds:

parent.start  happens-before child.start

That's it! Neither child-of nor follows-from implies any further causality. Child-of only means parent depends on the outcome of child, in some way. It doesn't mean the parent is blocked - it can be doing other things (thus sync/async naming isn't quite right). It doesn't mean child completes before parent - it looks this way, but parent (RPC caller) may time out before child (RPC server). In case of such a timeout, OpenTracing does not have a convention on how parent should record that fact (sad face).
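To make the timeout case concrete, here is a small sketch with made-up timestamps. It assumes a single clock so that timestamps can stand in for causal ordering, and the span shape and function names are hypothetical, not part of any tracing API.

```javascript
// The only ordering the span model guarantees: parent.start happens-before child.start.
function satisfiesSpanModel(parent, child) {
  return parent.start <= child.start;
}

// Whether the child finished before the parent did. NOT guaranteed by the model.
function childEndsWithinParent(parent, child) {
  return child.end <= parent.end;
}

// Made-up timestamps in ms: the RPC caller times out at 100, the server finishes at 250.
const caller = { name: 'rpc client', start: 0, end: 100 };
const server = { name: 'rpc server', start: 10, end: 250 };

console.log(satisfiesSpanModel(caller, server));    // true
console.log(childEndsWithinParent(caller, server)); // false
```

The pair is a valid child-of relationship under the model even though the child outlives the parent.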

The difference between child-of and follows-from is useful, in practice, for calculating critical path, but strictly speaking that calculation is not possible since the causality is not captured between the ends of spans, so critical path can only be calculated via a heuristic (I would love to be disproven on this!).

Another odd thing about child-of and follows-from is that it's the child span that defines this reference type, even though it talks about parent's dependency on child outcome. If you're a remote server, how do you even know if parent/caller does or does not depend on your outcome? I tend to think of this as the nature of the protocol: producer of a message to Kafka does not expect any response, so the receiver should use follows-from. Sender of HTTP request does expect a response, so the server always uses child-of, even if the sender doesn't care about the outcome - in that case it can internally create a follows-from span first, and then a normal pair of RPC call spans. So it's possible to rationalize this way, but it's still kind of dirty.
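The protocol-based rule of thumb above could be sketched on the receiving side roughly as follows. The `referenceTypeFor` function and its string constants are hypothetical, and only the two protocols named in the paragraph are covered.

```javascript
// Hypothetical helper: the receiver picks the reference type from the protocol,
// since it cannot know whether the sender actually depends on the outcome.
function referenceTypeFor(protocol) {
  switch (protocol) {
    case 'http':
      return 'child_of';      // the sender of an HTTP request expects a response
    case 'kafka':
      return 'follows_from';  // the producer of a Kafka message expects no response
    default:
      throw new Error(`no convention assumed for protocol: ${protocol}`);
  }
}

console.log(referenceTypeFor('http'));
console.log(referenceTypeFor('kafka'));
```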

Of course, there's always the argument that OpenTelemetry 1.0 is not supposed to improve upon OpenTracing/OpenCensus (convergence is more important than improvements), in which case it doesn't matter much what we call these, because the model would need to be revisited anyway.

rochdev · 2019-06-04T01:28:09Z

Thanks for the clarification @yurishkuro! It sounds like this is a larger discussion then. Would it make more sense to wait then instead of implementing this knowing that it's not necessarily the correct way to handle this relationship?

Of course, this depends on whether users are currently depending on the feature. If that's the case, then I think we should get more information about exactly how it's used, which would give us a better understanding of what the currently used feature should be called.

AlexanderWert · 2019-09-09T05:49:16Z

I think a flag for sync / async (blocking / non-blocking) spans would be very useful for trace analysis. With it, you could identify hot spots along the critical path, and thus the "root causes" of long trace timings, much more easily and quickly.
Such a flag is important because it indicates whether a child span's time is included in the parent's time or not, and thus whether a parent span is slow because of the child span or independently of it. Without such a flag, this question cannot be answered.

I agree that in many cases only the caller knows whether a child span is blocking or not, but in such cases this information could be propagated with the context, so it could be added on the child span's side.
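A minimal sketch of how an analysis tool might use such a flag: the parent's own ("self") time is its duration minus the time spent waiting on blocking children. The span shape and the `blocking` field are hypothetical, and the calculation assumes blocking children do not overlap one another.

```javascript
// Hypothetical span shape: { start, end, blocking, children }, times in ms.
// Assumes blocking children do not overlap each other.
function selfTime(span) {
  const total = span.end - span.start;
  const blockedOn = span.children
    .filter((child) => child.blocking)
    .reduce((sum, child) => sum + (child.end - child.start), 0);
  return total - blockedOn;
}

const parentSpan = {
  start: 0,
  end: 100,
  children: [
    { start: 10, end: 70, blocking: true, children: [] },  // parent waits for this one
    { start: 20, end: 90, blocking: false, children: [] }, // runs independently
  ],
};

console.log(selfTime(parentSpan)); // 40
```

Without the flag, the tool could not tell whether to subtract 60 ms, 70 ms, or nothing from the parent's 100 ms.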

Oberon00 · 2019-09-09T11:03:15Z

To fully capture all (or at least more) of the possible relationships between Spans, in addition to the create(parent) API, we would need APIs that signal that a span begins/ends waiting for a particular (set of) child(ren) and possibly even that it consumes the result of a particular child. Of course we would need to identify a child on the parent side without the child communicating back its span ID, which is a whole other problem (but can probably be solved elegantly by introducing IDs not only for the nodes but also the edges in the span graph). Heck, you can even wait synchronously (occupying a thread) or asynchronously (by using something like async/await where other operations can be scheduled while a different one waits for I/O).

To show some difficult cases (pseudo C#):

// Start async (the child has no idea whether we used a blocking API or an async
// one -- nor should it have to know).
var request = myClient.GetAsync("http://example.com/myAPI"); 
var myCalcResult = /* some expensive calculation */; // Could be its own subspan

// Block for child request, but not indefinitely
var maybeResult = await request.withTimeout(500);
if (maybeResult.HasValue) {
  renderCompletion(myCalcResult, maybeResult.Value); // Consume result
} else {
  renderPleaseWait();

  // Another operation/span will consume the result (the handle-response part of the
  // client could become a child of the server Span, or it could be an independent
  // root span with a CONSUMES relation to the server span).
  delayedRequests.AddPending(new PendingInfo(myCalcResult, request)); 
}
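For readers who want to run the scenario, here is a rough JavaScript analogue of the pseudo C# above. Everything in it is an assumption made for illustration: `withTimeout`, `fakeRequest`, and `handle` are hypothetical helpers, the remote call is simulated with a timer, and no spans are actually recorded.

```javascript
// Resolves to { hasValue: true, value } if `promise` settles within `ms`,
// otherwise to { hasValue: false }. The original promise keeps running either way.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ hasValue: false }), ms);
  });
  const wrapped = promise.then((value) => ({ hasValue: true, value }));
  return Promise.race([wrapped, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a remote call that takes `ms` milliseconds.
function fakeRequest(ms) {
  return new Promise((resolve) => setTimeout(() => resolve('response'), ms));
}

async function handle(requestMs, timeoutMs, pending) {
  const request = fakeRequest(requestMs); // start the child, do not wait yet
  const myCalcResult = 21 * 2;            // local work that overlaps the request

  // Wait for the child, but not indefinitely.
  const maybeResult = await withTimeout(request, timeoutMs);
  if (maybeResult.hasValue) {
    return `done: ${myCalcResult} ${maybeResult.value}`; // result consumed here
  }

  // Another operation will consume the result later.
  pending.push({ myCalcResult, request });
  return 'please wait';
}
```

Calling `handle(5, 200, [])` takes the first branch, while `handle(200, 5, pending)` takes the second and leaves the still-running request in `pending`. The child behaves identically in both cases, which is the difficulty being pointed out.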

SergeyKanzhelev · 2019-10-03T05:21:10Z

Moving to v0.3

jmacd · 2020-01-22T16:53:08Z

Closing this as being accomplished through Links in the current spec.

yurishkuro · 2020-01-22T17:07:38Z

Links don't address the issue in this ticket. I suggest closing #86 instead, because there is more discussion here.

carlosalberto · 2020-09-02T16:06:13Z

After giving this a try via #906, we decided to postpone it in order to develop it properly. Re-labeling it so we can add this feature after GA.


dotNetDR · 2022-04-18T06:37:04Z

any update on this?

h1z3y3 · 2022-04-25T13:15:03Z

+1

austinlparker · 2024-04-23T20:32:41Z

Solved via span links.


tedsuo changed the title ~~Boolean field to set a Span as asynchronous~~ Sync and Async children (FOLLOWS_FROM) Apr 22, 2019

SergeyKanzhelev transferred this issue from open-telemetry/opentelemetry-java Jun 3, 2019

SergeyKanzhelev added the area:api Cross language API specification issue label Jun 3, 2019

SergeyKanzhelev added this to the API revision: 07-2019 milestone Jun 3, 2019

iredelmeier added spec:context Related to the specification/context directory area:sampling Related to trace sampling labels Jul 30, 2019

iredelmeier added the area:span-relationships Related to span relationships label Aug 13, 2019

AlexanderWert mentioned this issue Sep 9, 2019

[Question] Additional analysis view for detailed trace overview and span statistics jaegertracing/jaeger#1779

Open

SergeyKanzhelev removed this from the API revision: 07-2019 milestone Sep 27, 2019

SergeyKanzhelev added this to the Alpha v0.3 milestone Oct 3, 2019

SergeyKanzhelev mentioned this issue Oct 3, 2019

should not allow Linking spans belonging to same trace #207

Closed

rghetia mentioned this issue Oct 4, 2019

Exporter: export Link from SpanData in Jaeger exporter open-telemetry/opentelemetry-go#165

Closed

haf mentioned this issue Oct 7, 2019

clarifies parent and links #283

Merged

SergeyKanzhelev mentioned this issue Nov 1, 2019

Clarify the interpretation of SpanKind #337

Merged

SergeyKanzhelev mentioned this issue Dec 18, 2019

Link type is missing from the protocol open-telemetry/opentelemetry-proto#86

Closed

jmacd closed this as completed Jan 22, 2020

reyang mentioned this issue Jul 29, 2020

End-to-end example applications open-telemetry/opentelemetry-dotnet#936

Merged

Oberon00 mentioned this issue Aug 3, 2020

Formalize the translation of OpenTracing references to span parent and links #562

Open

andrewhsu assigned andrewhsu and carlosalberto and unassigned andrewhsu Aug 4, 2020

carlosalberto mentioned this issue Aug 10, 2020

Add Initial OpenTracing compatibility requirements #768

Closed

eyakimov-bbg mentioned this issue Aug 18, 2020

Non-immediate and Non-causal Links between Spans #826

Open

carlosalberto mentioned this issue Sep 1, 2020

Add ParentReferenceType. #906

Closed

andrewhsu added priority:p2 Medium priority level and removed priority:p1 Highest priority level labels Sep 1, 2020

carlosalberto added release:after-ga Not required before GA release, and not going to work on before GA and removed priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 2, 2020

arminru mentioned this issue Sep 16, 2020

Messaging Batch Receive - swap parent / link? #958

Closed

Oberon00 mentioned this issue Oct 12, 2020

Questions about span relationships in messaging semantic convention #1085

Closed

TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this issue Oct 15, 2020

Plug the SpanProcessor into the Tracer implementation (open-telemetry#65) 132aca1

catostrophe mentioned this issue Nov 30, 2020

Wrong SpanRefType mapping in JaegerSpanExporter trace4cats/trace4cats#151

Closed

kotharironak mentioned this issue Jun 4, 2021

handling spans related to async trace having follow_from construct hypertrace/hypertrace#250

Open

kenfinnigan mentioned this issue Jul 7, 2021

Update Kafka messaging example #1799

Closed

alnr mentioned this issue Dec 20, 2022

fix: spurious cancelation of async webhooks, better tracing ory/kratos#2969

Merged

joe-elliott mentioned this issue Jan 9, 2023

Generating trace-level metrics grafana/tempo#1968

Closed

austinlparker closed this as completed Apr 23, 2024
