This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

Provide a way to sample entire trace which contains span with tag "error=true" #465

Closed
quaff opened this issue Jun 25, 2018 · 7 comments

Comments

@quaff
Contributor

quaff commented Jun 25, 2018

Requirement - what kind of business use case are you trying to solve?

I wish Jaeger provided a way to sample an entire trace (or at least all spans in the same service) when it contains a span tagged "error=true".

We have 2 requirements:

  1. A reasonable sampling strategy, such as ProbabilisticSampler, for normal requests.
  2. ConstSampler for abnormal requests that have error spans.

Problem - what in Jaeger blocks you from solving the requirement?

Currently, there are three ways to guarantee that a trace is sampled:

  1. Use ConstSampler, but that breaks requirement 1.
  2. Set a positive sampling.priority on the span when an error occurs, but it will not be sampled if the parent span is not sampled.
  3. Attach a debugId before the request starts, but we cannot know that a request is abnormal before executing it.
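
To make the constraint behind these three options concrete, here is a minimal sketch of a head-based probabilistic decision. It assumes the decision is derived from the trace ID alone at the moment the trace starts, which is why it cannot react to an error that happens later. The class name HeadSampler and its boundary arithmetic are hypothetical and only illustrate the idea; this is not the Jaeger client's ProbabilisticSampler.

```java
// Hypothetical illustration of head-based sampling; not Jaeger's implementation.
final class HeadSampler {
    private final double rate;
    private final long boundary;

    HeadSampler(double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        this.rate = rate;
        // Fraction of the non-negative long range that counts as "sampled".
        this.boundary = (long) (Long.MAX_VALUE * rate);
    }

    // Decided once, when the trace starts, from the trace ID only.
    // Nothing about the outcome of the request is known at this point.
    boolean isSampled(long traceId) {
        // Clear the sign bit so the comparison works on a non-negative value.
        return rate > 0.0 && (traceId & Long.MAX_VALUE) <= boundary;
    }
}
```

Because the same trace ID always yields the same answer, every service that uses the same rate reaches the same decision for a given trace, but an error=true tag set later has no influence on it.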
@jpkrohling
Collaborator

This is mentioned on the roadmap as the item "Post-Trace Sampling" (we also called it "tail-based sampling", IIRC).

This is not really something for the Java client, but rather, something for the Agent/Collector. In fact, it looks like there's an issue for it already: jaegertracing/jaeger#425 . We would love to get you involved in the discussion!

I'm therefore closing this issue here. Feel free to reopen if you think this is relevant for the client as well.

@yurishkuro
Member

I think a simpler solution is being asked for here, where we only capture spans within the service once an error happens. We could also push the sampling decision back up to the callers, but OpenTracing doesn't really provide a backwards-propagation API.

For the same-service spans, it would require rewriting how the reporter batches the spans. Right now it keeps just a list and flushes it when the desired size is reached. It would have to change to keeping spans per-trace and only flushing them when the top-level span in the service is finished and the sampling decision has been overwritten to YES (e.g. by error=true tag).
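
A rough sketch of that per-trace batching idea, under stated assumptions: PerTraceBuffer and SpanData are hypothetical names rather than types from the Jaeger Java client, a span carries a flag marking it as the top-level span in this service, and the in-memory flushed list stands in for sending spans to the backend. Traces that were already sampled at the head are left out to keep the example small.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Jaeger client's reporter.
final class PerTraceBuffer {
    // Minimal stand-in for a finished span.
    static final class SpanData {
        final String traceId;
        final String name;
        final boolean error;     // true if the span was tagged error=true
        final boolean topLevel;  // true if this is the top-level span in this service

        SpanData(String traceId, String name, boolean error, boolean topLevel) {
            this.traceId = traceId;
            this.name = name;
            this.error = error;
            this.topLevel = topLevel;
        }
    }

    private final Map<String, List<SpanData>> pending = new HashMap<>();
    private final Set<String> forced = new HashSet<>();
    private final List<SpanData> flushed = new ArrayList<>();

    // Called when a span finishes. Spans are held per trace instead of in one flat list.
    synchronized void report(SpanData span) {
        pending.computeIfAbsent(span.traceId, id -> new ArrayList<>()).add(span);
        if (span.error) {
            forced.add(span.traceId); // sampling decision overwritten to YES
        }
        if (span.topLevel) {
            // The service's part of the trace is complete: flush it or drop it.
            List<SpanData> spans = pending.remove(span.traceId);
            if (forced.remove(span.traceId)) {
                flushed.addAll(spans);
            }
        }
    }

    synchronized List<SpanData> flushed() {
        return new ArrayList<>(flushed);
    }

    synchronized int pendingTraces() {
        return pending.size();
    }
}
```

One design consequence worth noting: memory use now grows with the number of in-flight traces, so a real implementation would likely need a bound or timeout for traces whose top-level span never finishes.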

Btw, from our experience at Uber, error=true is not a very reliable indicator, i.e. we have plenty of traces that have this tag, but the traces otherwise are successful, due to retries, etc. So implementing this on-error sampling might cause unexpected inflow of traces to the backend.

@quaff
Contributor Author

quaff commented Jun 26, 2018

It would have to change to keeping spans per-trace and only flushing them when the top-level span in the service is finished and the sampling decision has been overwritten to YES (e.g. by error=true tag).

Totally agreed! This concerns the Java client, not the Agent/Collector.
I set error=true only for an actual error (e.g. a caught Exception). You could make this an option on the tracer that defaults to false.

@jpkrohling
Collaborator

Sorry, that indeed makes sense. I'm reopening this issue.

@sdanzo

sdanzo commented Jan 5, 2019

@jpkrohling -- Is anyone working on this issue? If not, I may be interested in doing the work.

@yurishkuro
Member

AFAIK nobody is working on it. If you want to take a crack at it, I would recommend first posting a proposed design here.

@doctorpangloss

doctorpangloss commented Feb 24, 2021

This seems like a pretty useful thing. Any opinions on what to do?


5 participants