Allow samplers to be called during different moments in the Span lifetime #307
Comments
If I were a sampler with enough sophistication to respond to each of these events in a span lifecycle, I might also like to maintain state between the calls in the span processor state itself. This feels like something for a specialized SDK, not something for a public sampler API IMO. |
We've hit some questions (open-telemetry/opentelemetry-go#224) that I think are related to this issue. By the current specs, if a span that was initially defined as To have a span that contains all data we would need to keep every event, attribute, status, etc. from every span. If we are doing this, what is the point of the |
I would like to propose a different solution. Presently, there are open issues in the Go repository about removing the WithRecord option and about the UpdateName vs SetName debate, which are both connected with this issue. Instead of "re-triggering" the sampler API or re-evaluating it at different moments during the span lifetime, I see two good options: -- --
This addresses the confusion. A sampler can make relatively simplistic decisions up front, or it can make fully-informed decisions at the end of the span, but that is all it can do. This is, after all, the simple case. This suggests that there should be TWO calls in the Sampler API. At the start, decide whether to record. At the end, decide whether to sample. |
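A minimal sketch of the two-call idea above, assuming a hypothetical `TwoPhaseSampler` with `should_record` (at start) and `should_sample` (at end). The class names, the `healthcheck` prefix rule, and the 100 ms threshold are illustrative assumptions, not part of any OpenTelemetry API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical recording span; not an OpenTelemetry SDK type."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0


class TwoPhaseSampler:
    """Hypothetical sampler with exactly two decision points."""

    def should_record(self, name: str) -> bool:
        # Decision 1 (span start): cheap, uses only what is known up front.
        return not name.startswith("healthcheck")

    def should_sample(self, span: Span) -> bool:
        # Decision 2 (span end): fully informed, sees attributes and duration.
        return bool(span.attributes.get("error", False)) or span.duration_ms > 100.0


def run_operation(sampler: TwoPhaseSampler, name: str,
                  attributes: dict, duration_ms: float) -> Optional[Span]:
    """Return the span if it should be exported, otherwise None."""
    if not sampler.should_record(name):
        return None  # nothing is recorded at all
    span = Span(name=name)
    span.attributes.update(attributes)
    span.duration_ms = duration_ms
    return span if sampler.should_sample(span) else None
```

With this shape, a span rejected at start costs nothing, while a span accepted at start is only exported if the end-of-span decision also says yes.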
Note that when the Sampler (or the exporter for that matter) decides to drop the Span only at step 2, the Span's context may already have been propagated with a Sampled=true TraceOptions flag. I.e., if the decision changes from sampled in to sampled out any time after span creation (step 1), then you will probably get broken traces. |
@Oberon00 how is this different from someone updating a span name and it being sampled out after context propagation may have happened? If sampling happens anywhere after creation, there will always be a chance of broken traces. |
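The broken-trace scenario can be illustrated with a toy model. This is only a sketch: spans are plain dicts, and `find_orphans` is a hypothetical helper, not part of any SDK.

```python
def find_orphans(exported):
    """Return the ids of exported spans whose parent was never exported."""
    ids = {s["span_id"] for s in exported}
    return [s["span_id"] for s in exported
            if s["parent_id"] is not None and s["parent_id"] not in ids]


# Step 1: at creation the parent is sampled in and propagates the flag downstream.
parent = {"span_id": "a", "parent_id": None, "sampled": True}
child = {"span_id": "b", "parent_id": parent["span_id"], "sampled": parent["sampled"]}

# Step 2: the parent's decision flips to "out" at the end of the span.
# The child, possibly in another service, has already acted on Sampled=true.
parent["sampled"] = False

exported = [s for s in (parent, child) if s["sampled"]]
orphans = find_orphans(exported)  # the child now points at a missing parent
```

The child span is exported with a parent id that no backend will ever receive, which is what "broken trace" means here.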
@paivagustavo: Exactly that was my point 😃 EDIT: My understanding of the current sampling approach is that sampling happens immediately at span creation only. |
AFAIK this is something that tracers like Jaeger might already run into, and this is done at their own risk ;)
I really like the second option. Given that a lot of things have been added lately, and more of them will, I'd stick to the simplest, correct choice whenever possible. |
Remember that there are more uses for a Span than just constructing "complete" traces. If I want to study the statistical properties of a certain operation, let's say, I might decide to record all Spans and use some kind of latency-weighted sampling logic, for example. There's no requirement to collect a full trace, and I'd like to have control over whether spans are recorded or not. Z-pages are the other important use-case. |
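One possible reading of "latency-weighted sampling" is sketched below: the keep probability grows with the span's duration, and each kept span carries an inverse-probability weight so aggregate statistics can be re-scaled. The function name, the `reference_ms` parameter, and the linear formula are assumptions made for illustration only.

```python
import random


def latency_weighted_keep(duration_ms, reference_ms=1000.0, rng=random):
    """Decide at span end whether to keep a recorded span.

    Returns (keep, weight). The keep probability is duration / reference,
    capped at 1.0; weight is 1/p for kept spans and 0.0 for dropped ones.
    """
    p = min(1.0, duration_ms / reference_ms)
    keep = rng.random() < p
    weight = 1.0 / p if keep else 0.0
    return keep, weight
```

Because this decision needs the duration, it can only run on spans that were recorded from the start, independent of whether a full trace is collected.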
Moving this issue to the next milestone as there's no consensus here yet. |
Responding to @jmacd's #307 (comment) about restricting the sampler to two decisions only, instead of the multiple decisions proposed in the issue description. Restricting the second decision to span.finish means that any other activity the request has performed that resulted in downstream RPCs will not be recorded if the original decision was No, since that decision will be propagated to other services. In contrast, allowing a sampling decision on every update of the span means the information needed for a Yes decision (like a certain tag) can become available early in the span life cycle. |
FYI I've created an OTEP around this: open-telemetry/oteps#115. Will attempt to get discussions moving, so review is much appreciated! |
Just to clarify my earlier remark, I had described two decisions, but not two sampling decisions. One was "attach a real span object vs a no-op span", which equates with "Maybe sample" or "Definitely do not sample"--this is a decision to possibly sample the span before all the information is available. If the first decision is No, there will be no information recorded. The second decision was a tail-sampling decision where all the information is available. I view my suggestion as pretty close to open-telemetry/oteps#115. |
Given that open-telemetry/oteps#115 has no approvals yet and there is no clear understanding of how this issue will be resolved, do we still believe this is a must-have for 1.0 GA? It appears to me this functionality can be added in a backwards-compatible manner, and if so it can be postponed and addressed after GA. Thoughts? |
I think it's fine to leave until after GA, since my main concern (discouraging calling update_name) has been removed from the spec. That said, the OTEP as it stands has either addressed all feedback or is waiting on responses from others. If someone can help guide me in how to get more eyes, I'm happy to get right back to working on it. |
Currently Sampler is called only onSpanCreation.
Based on the comment #296 (comment) we should consider to allow:
- onStart - span creation time
- onSpanNameUpdate - when updateName is called
- onSetAttribute - when setAttribute is called
- onEnd - when end is called.
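The four proposed hooks (onStart, onSpanNameUpdate, onSetAttribute, onEnd) could look roughly like the sketch below. Everything here is hypothetical: the snake_case method names, the `LifecycleSampler` and `DebugTagSampler` classes, and the toy `Span` are assumptions for illustration, not the specification's API.

```python
class LifecycleSampler:
    """Hypothetical sampler interface with one hook per span lifecycle event."""

    def on_start(self, span):
        return False  # default: not sampled at creation

    def on_span_name_update(self, span):
        return span.sampled  # default: keep the current decision

    def on_set_attribute(self, span, key, value):
        return span.sampled

    def on_end(self, span):
        return span.sampled


class DebugTagSampler(LifecycleSampler):
    """Upgrades the decision to Yes as soon as a debug=True attribute appears."""

    def on_set_attribute(self, span, key, value):
        return span.sampled or (key == "debug" and value is True)


class Span:
    """Toy span that re-consults the sampler on every lifecycle event."""

    def __init__(self, name, sampler):
        self.name = name
        self.attributes = {}
        self.sampler = sampler
        self.sampled = False
        self.sampled = sampler.on_start(self)

    def update_name(self, name):
        self.name = name
        self.sampled = self.sampler.on_span_name_update(self)

    def set_attribute(self, key, value):
        self.attributes[key] = value
        self.sampled = self.sampler.on_set_attribute(self, key, value)

    def end(self):
        self.sampled = self.sampler.on_end(self)
```

Note that this toy span stores every attribute regardless of the current decision, which is exactly the cost raised in the comments: a late Yes is only useful if the earlier data was kept.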