Distinguish Tail Sampling Policies in Metrics by Prefixing Processor Name #34099

Closed
EOjeah opened this issue Jul 16, 2024 · 3 comments · Fixed by #34192
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed processor/tailsampling Tail sampling processor

Comments

@EOjeah
Contributor

EOjeah commented Jul 16, 2024

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

Currently, the otelcol_processor_tail_sampling_sampling_decision_latency_bucket metric does not distinguish between tail sampling policies that share the same name. This makes it difficult to tell apart metrics emitted by different policies, especially when those policies are defined in separate tail sampling processors. For example, in the configuration below, both the tail_sampling/match and tail_sampling/string processors have a policy named default-policy, leading to identical metric labels and confusing metric analysis.

```yaml
  tail_sampling/match:
    policies: [
      {
        name: default-policy,
        type: ottl_condition,
        ottl_condition: {
          span: [
            'IsMatch(attributes["app.force_sample"], "true")',
          ]
        }
      }
    ]

  tail_sampling/string:
    policies: [
      {
        name: default-policy,
        type: ottl_condition,
        ottl_condition: {
          span: [
            'IsString(attributes["app.force_sample"])',
          ]
        }
      }
    ]
```
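
For illustration (the values below are hypothetical), without any prefixing both processors report their per-policy metrics under the same label set, so their observations cannot be told apart:

```bash
otelcol_processor_tail_sampling_sampling_decision_latency_bucket{policy="default-policy",service_instance_id="X",service_name="otelcontribcol",le="5000"} 2
```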

Describe the solution you'd like

To resolve this issue, I propose prefixing the policy label in the metric with the name of the tail sampling processor. This way, the metrics will be uniquely identifiable based on both the processor and the policy name. For instance:

The metric for the default-policy in the tail_sampling/match processor would be labeled as tail_sampling/match-default-policy (or, if the tail_sampling prefix is too obvious, just the part after the /, i.e. match-default-policy).

The metric for the default-policy in the tail_sampling/string processor would be labeled as tail_sampling/string-default-policy.

Describe alternatives you've considered

Unique Names for Each Policy: One alternative is to ensure that each policy has a unique name across all processors. However, this approach is not scalable, especially when policies are defined across multiple files. It would require more coordination and management to avoid name collisions.

Additional context

I only raised this issue for otelcol_processor_tail_sampling_sampling_decision_latency_bucket, but the same likely applies to the other per-policy metrics emitted by the tail sampling processor.

@EOjeah EOjeah added enhancement New feature or request needs triage New item requiring triage labels Jul 16, 2024
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Jul 16, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Member

Seems like an easy one to tackle. Do you want to give it a try? You can get the "componentID" from processor.Settings that the Factory receives. You'd need to propagate it down to the place we initialize the telemetry.

@codeboten, perhaps the component ID should be a parameter of the telemetry builder? I believe the same problem would occur to every other component.
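
As a rough sketch of that suggestion (only `processor.Settings` and `component.ID` are real collector types; the struct and constructor below are hypothetical stand-ins):

```go
package tailsamplingprocessor

import (
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/processor"
)

// tailSamplingSpanProcessor is a simplified stand-in for the real processor struct.
type tailSamplingSpanProcessor struct {
	componentID component.ID
}

// newTracesProcessor sketches how the component ID carried in processor.Settings
// could be captured in the factory and later handed to telemetry initialization.
func newTracesProcessor(set processor.Settings) *tailSamplingSpanProcessor {
	return &tailSamplingSpanProcessor{
		// set.ID is the full component ID, e.g. "tail_sampling/custom_name".
		componentID: set.ID,
	}
}
```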

@jpkrohling jpkrohling added help wanted Extra attention is needed good first issue Good for newcomers and removed needs triage New item requiring triage labels Jul 16, 2024
@EOjeah
Contributor Author

EOjeah commented Jul 16, 2024

Sure, I'll give it a go; you can assign the issue to me.

EOjeah added a commit to EOjeah/opentelemetry-collector-contrib that referenced this issue Jul 22, 2024
This change includes the componentID as a dot prefix to the metrics
'policy' dimension when generating metrics for the processor. The change
ensures that similarly named policies in the tail sampling processor that
belong to different components also have a unique value.

Resolves: open-telemetry#34099
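
A rough sketch of the prefixing described in this commit message, assuming the `policy` value is attached as a metric attribute (the helper functions below are hypothetical; the actual wiring in the processor may differ):

```go
package tailsamplingprocessor

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// policyAttribute derives the `policy` attribute value by prefixing the policy
// name with the processor instance name (the part after "tail_sampling/"),
// joined with a dot. An unnamed processor keeps the plain policy name.
func policyAttribute(id component.ID, policyName string) attribute.KeyValue {
	if name := id.Name(); name != "" {
		// e.g. "custom_name.test-policy-1" for tail_sampling/custom_name
		return attribute.String("policy", name+"."+policyName)
	}
	return attribute.String("policy", policyName)
}

// recordDecisionLatency shows how that attribute would be attached when
// recording the sampling decision latency histogram.
func recordDecisionLatency(ctx context.Context, hist metric.Float64Histogram, latency float64, attr attribute.KeyValue) {
	hist.Record(ctx, latency, metric.WithAttributes(attr))
}
```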
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…licy' (open-telemetry#34192)

**Description:**
Fixing a bug - This change includes the componentID as a dot prefix to
the metrics `policy` dimension when generating metrics for the
processor. The change ensures that similarly named policies in the tail
sampling processor that belong to different components also have a unique
value in the `policy` field for the metrics.

Also includes a minor refactor to rename `telemetry` to
`telemetryBuilder` where applicable (i.e. where the value is returned by
`NewTelemetryBuilder`).

Resolves: open-telemetry#34099

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** 
Ran the collector locally with `make run` using the configuration below,
which uses the tail sampling processor and exposes metrics in Prometheus
format, then sent sample Zipkin spans to the receiver.
```yaml
receivers:
  zipkin:

processors:
  tail_sampling:
    policies:
      [
          {
            name: test-policy-1,
            type: always_sample
          }
      ]

  tail_sampling/custom_name:
    policies:
      [
          {
            name: test-policy-1,
            type: always_sample
          }
      ]

exporters:
  debug:

service:
  telemetry:
    logs:
    metrics:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: [tail_sampling, tail_sampling/custom_name]
      exporters: [debug]
```

Curling the metrics endpoint shows the policy name is unique for both
tail sampling processors
```bash
otelcol_processor_tail_sampling_sampling_decision_latency_bucket{policy="custom_name.test-policy-1",service_instance_id="X",service_name="otelcontribcol",service_version="0.105.0-dev",le="5000"} 1

otelcol_processor_tail_sampling_sampling_decision_latency_bucket{policy="test-policy-1",service_instance_id="X",service_name="otelcontribcol",service_version="0.105.0-dev",le="5000"} 1
```

Tasks

- [ ] Confirm prefix separator as `.` 
- [ ] Update change log entry