tail_sampling drops otel.library.name span attribute #13642

Closed
ryepup opened this issue Aug 26, 2022 · 18 comments
Labels: bug (Something isn't working), discussion needed (Community discussion needed), help wanted (Extra attention is needed), processor/tailsampling (Tail sampling processor)

Comments


ryepup commented Aug 26, 2022

Describe the bug

Spans that pass through a tail_sampling processor (even with an always_sample policy) have their otel.library.name attribute removed.

Steps to reproduce

Create an otel-collector configured with two pipelines, one that does tail_sampling and one that doesn't. Send a trace to each, using go.opentelemetry.io/otel. I've packaged this up with docker-compose in a gist.

  1. download files from https://gist.github.com/ryepup/d70eebd8b0e6bb1a4578ac7c06a842cf
  2. docker-compose up
  3. open Grafana at http://localhost:3000/explore
  4. go to the Search tab, then hit "Run query"; you should see two traces

What did you expect to see?

Both traces have an otel.library.name attribute that matches the name given to TracerProvider.Tracer.
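
The name in question comes from the tracer name set in the Go SDK. A minimal sketch of what the instrumented side looks like (illustrative names, not the exact gist contents; provider and exporter setup omitted):

package main

import (
    "context"

    "go.opentelemetry.io/otel"
)

func main() {
    // The string passed to Tracer() becomes the instrumentation scope
    // (library) name, which shows up downstream as otel.library.name.
    // "demo-tracer" and "demo-span" are illustrative values.
    tracer := otel.Tracer("demo-tracer")
    _, span := tracer.Start(context.Background(), "demo-span")
    span.End()
}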

What did you see instead?

The trace that went through a pipeline with tail_sampling did not have an otel.library.name attribute.

What version did you use?

dockerized otel/opentelemetry-collector-contrib:0.58.0

What config did you use?

https://gist.github.com/ryepup/d70eebd8b0e6bb1a4578ac7c06a842cf#file-otel-yaml

Environment

Seen in dockerized local development on macOS and in Linux-based Kubernetes on Azure.

Additional context

My production code passes through https://github.com/grafana/agent, but the problem is reproducible without that layer.

Other span attributes seem unaffected, even when added via another processor.

ryepup added the bug label on Aug 26, 2022
@jpkrohling (Member)

This is tricky: we are re-assembling the spans for a trace into a new ResourceSpans object, and those spans likely came from different sources with differing instrumentation libraries. We could reorganize things into a single pdata.Traces with several ResourceSpans inside it, one for each unique Resource+InstrumentationLibrary, but I feel like this would be highly inefficient.

I'm leaving this open so that we can hear other people's opinions.

@ryepup, would you be willing to perform the mentioned changes and check the performance difference?
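
For illustration, the regrouping described above could look roughly like this with the collector's pdata API; this is a hedged sketch, not the processor's actual code, and the group/regroup names are hypothetical:

package sketch

import (
    "go.opentelemetry.io/collector/pdata/pcommon"
    "go.opentelemetry.io/collector/pdata/ptrace"
)

// group holds the sampled spans for one unique Resource + InstrumentationScope pair.
type group struct {
    resource pcommon.Resource
    scope    pcommon.InstrumentationScope
    spans    []ptrace.Span
}

// regroup rebuilds a ptrace.Traces with one ResourceSpans/ScopeSpans per
// unique Resource + scope pair, instead of flattening all sampled spans
// into a single anonymous ResourceSpans (which is what loses otel.library.name).
func regroup(groups []group) ptrace.Traces {
    out := ptrace.NewTraces()
    for _, g := range groups {
        rs := out.ResourceSpans().AppendEmpty()
        g.resource.CopyTo(rs.Resource())
        ss := rs.ScopeSpans().AppendEmpty()
        g.scope.CopyTo(ss.Scope())
        for _, s := range g.spans {
            s.CopyTo(ss.Spans().AppendEmpty())
        }
    }
    return out
}

The performance concern above comes from the extra copying and per-trace bookkeeping this implies.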

jpkrohling added the processor/tailsampling label on Aug 29, 2022
codeboten added the discussion needed label and removed the needs-discussion label on Sep 9, 2022

paychex-ssmithrand commented Sep 13, 2022

@jpkrohling As a workaround, is there a processor that can access the InstrumentationScope information and set it as an attribute on each individual Span before the span gets to the Tail Sampling Processor? We, for example, are reliant on Instrumentation Library information in order to classify the Span in a custom exporter component we've developed.

@jpkrohling (Member)

Could you check whether the transform processor can read the instrumentation scope?


parkedwards commented Oct 19, 2022

> Could you check whether the transform processor can read the instrumentation scope?

@jpkrohling if I'm not mistaken, it looks like this was recently updated to have access to this scope, at least via OTTL?
#14892

Update: looks like this isn't yet in a released version (currently at 0.62.1)

@Aneurysm9 (Member)

> This is tricky: we are re-assembling the spans for a trace into a new ResourceSpans object, and those spans likely came from different sources with differing instrumentation libraries. We could reorganize things into a single pdata.Traces with several ResourceSpans inside it, one for each unique Resource+InstrumentationLibrary, but I feel like this would be highly inefficient.

Isn't this precisely what the component should be doing? If a sampler is losing information on spans that are not dropped, that seems like a problem.


parkedwards commented Oct 20, 2022

I'm also wondering if this is less that the span attribute is dropped, and more that the telemetry loses its view of the instrumentation scope itself when it is tail-sampled.

This is a tough tradeoff for us: we'd like to do some trace-specific sampling at the tail, but it means we lose instrumentation granularity when the traces are reassembled.


jpkrohling commented Oct 20, 2022

> If a sampler is losing information on spans that are not dropped, that seems like a problem.

That's true. I'm not sure what the original engineer had in mind when this component was written, but I believe they decided to gain some performance by avoiding the extra processing of reorganizing this data.

I would be open to reviewing a PR with a proposal to change the current behavior. Bonus points if it could include a benchmark so that we can understand what we are giving up in terms of performance.

@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 20, 2022
fatsheep9146 added the help wanted label and removed the Stale label on Dec 20, 2022
@roryharness

This caused major confusion and issues exporting traces & APM data to Datadog.

It appears that operation names and the trace.* APM metrics use the library name, following the pattern trace.<otel.library>.<metric>. Examples below:
trace.opentelemetry_ecto.client => trace.opentelemetry.client
trace.opentelemetry_tesla.client => trace.opentelemetry.client
trace.opentelemetry_redix.client => trace.opentelemetry.client
This was not only confusing, but it ultimately combined metrics from all clients, making the APM metrics inaccurate.

We were able to work around this by implementing the DatadogProcessor, but I would love to see this fixed so that the operation names in traces and metrics line up again.

@jpkrohling (Member)

@roryharness, agreed, this would be great to have. From my previous message:

> I would be open to reviewing a PR with a proposal to change the current behavior. Bonus points if it could include a benchmark so that we can understand what we are giving up in terms of performance.

@parkedwards

@roryharness out of curiosity - what was your setup to retain the otel library name while tail sampling?

@paychex-ssmithrand

> @roryharness out of curiosity - what was your setup to retain the otel library name while tail sampling?

The workaround we implemented was to add a custom span processor that the SDK uses in code. This span processor takes the instrumentation library name and sets it as an attribute on each span:

// In a custom SpanProcessor registered with the SDK TracerProvider:
onStart(span, parentContext) {
  // copy the instrumentation library name onto the span as a regular attribute
  span.setAttribute("instrumentationLibrary.name", span.instrumentationLibrary.name);
}

It is not elegant, but it works for now if you really, really need the data to be there by the time it gets to tail_sampling.
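
For Go services (as in the original reproduction), a roughly equivalent workaround is a custom SpanProcessor in the Go SDK. This is a hedged sketch with an illustrative attribute key, not code from this thread; note that InstrumentationScope() is the accessor in recent SDK versions (older versions expose InstrumentationLibrary() instead):

package tracing

import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// scopeAttrProcessor copies the instrumentation scope (library) name onto
// every span as a plain attribute so it survives tail sampling.
type scopeAttrProcessor struct{}

func (scopeAttrProcessor) OnStart(_ context.Context, s sdktrace.ReadWriteSpan) {
    s.SetAttributes(attribute.String("instrumentationLibrary.name", s.InstrumentationScope().Name))
}

func (scopeAttrProcessor) OnEnd(sdktrace.ReadOnlySpan)      {}
func (scopeAttrProcessor) Shutdown(context.Context) error   { return nil }
func (scopeAttrProcessor) ForceFlush(context.Context) error { return nil }

// Register it when building the provider, for example:
//   tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(scopeAttrProcessor{}))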


parkedwards commented May 12, 2023

Thanks @paychex-ssmithrand - I was able to do the same, but I'm uncovering some other interesting behaviors through this experiment.

For context, I'm using the tail sampling processor similarly to @roryharness: I'm collecting spans (from the Python OTel SDK) through the Collector and passing them into a pipeline of processor/datadog (for accurate APM stats) -> processor/tailsampling -> exporter/datadog. The processor/datadog step solves the metric name misalignment that @roryharness mentioned.

However, the spans themselves appear to be missing some span-level metadata after passing through processor/tailsampling. It's unclear to me whether it's span.name (this is obscured from me in the Datadog layer), but the end result is that spans in the vendor are labelled generically, with some default value instead of what we set at the library level. I've ruled out that it's only otel.library.name, because I added a custom span processor at the SDK level to explicitly set that property and I do see it on the vendor side, though with the same default span name behavior.

# SDK level
name = "my-custom-span-name"
with tracer.start_as_current_span(name):
  ...

# After tail sampling, the span name in the vendor shows as:
opentelemetry.internal

Obviously this is dependent on the way that Datadog (the vendor) parses these spans, but I'm trying to better understand which span attributes/metadata are being lost in processor/tailsampling other than otel.library.name. After that, I may be able to dig into the processor code to see how to rectify this.

@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jul 12, 2023
@jpkrohling (Member)

> but I'm uncovering some other interesting behaviors through this experiment

Can you open an issue for that? I could probably look into that myself.


siriusfreak commented Aug 11, 2023

Hello everyone! I have prepared a fix for this problem. The fix has already been tested in a production environment under a load of 1k rps and everything looks good.

Please review and add comments! I will be happy to contribute to this project!

PR: #25117

@jpkrohling (Member)

I reviewed this and it looks good to me, but you might need to sort out the CLA check.

TylerHelmuth pushed a commit that referenced this issue Aug 31, 2023
…info (#25117)

**Description:** Added saving of instrumentation library (scope) information.
Fixes issue:
#13642

How it was solved:
1. Save information about the instrumentation scope for buffered traces (processor/tailsamplingprocessor/processor.go:307).
2. Sampled spans are now released under their original instrumentation scope (processor/tailsamplingprocessor/processor.go:435).

**Testing:** in production under high load; unit tests in progress.

**Documentation:** no documentation changes.

songy23 commented Aug 31, 2023

Fixed by #25117
