Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior of And policy #29637

Closed
chenlujjj opened this issue Dec 4, 2023 · 6 comments
Closed

Unexpected behavior of And policy #29637

chenlujjj opened this issue Dec 4, 2023 · 6 comments
Assignees
Labels
bug Something isn't working closed as inactive processor/tailsampling Tail sampling processor Stale

Comments

@chenlujjj
Copy link
Contributor

Component(s)

processor/tailsampling

What happened?

Description

I tried to use the tailsampling processor to drop the health check traces of a particular service (let's name is foo here), but the behavior of the And policy was out of my expectation.

Steps to Reproduce

The first version of config:

tail_sampling:
  policies:
    - name: backwards-compatibility-policy
      type: and
      and:
        and_sub_policy:
          - name: services-using-tail_sampling-policy
            type: string_attribute
            string_attribute:
              invert_match: true
              key: service.name
              values:
                - foo
    - name: sample-others
      type: and
      and:
        and_sub_policy:
          - name: service-name-policy
            string_attribute:
              key: service.name
              values:
                - foo
            type: string_attribute
          - name: operation-policy
            string_attribute:
              invert_match: true
              key: http.route
              values:
                - /actuator/health/**
            type: string_attribute

It worked as I expected, excluding all traces with http.route=/actuator/health/** of service foo.

Then I thought maybe I could simplify the config a bit, as below:

tail_sampling:
  policies:
    - name: services-using-tail_sampling-policy
      type: string_attribute
      string_attribute:
        invert_match: true
        key: service.name
        values:
          - foo
    - name: sample-others
      type: and
      and:
        and_sub_policy:
          - name: service-name-policy
            string_attribute:
              key: service.name
              values:
                - foo
            type: string_attribute
          - name: operation-policy
            string_attribute:
              invert_match: true
              key: http.route
              values:
                - /actuator/health/**
            type: string_attribute

The only change was moving the sub policy of And to the top level.
Then I found that other traces with http.route not equal to /actuator/health/** of service foo were dropped too.

Expected Result

Moving the only one sub policy of And to the top level should not change its behavior.

Actual Result

The policy behavior is changed

Collector version

0.79.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@chenlujjj chenlujjj added bug Something isn't working needs triage New item requiring triage labels Dec 4, 2023
Copy link
Contributor

github-actions bot commented Dec 4, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Dec 4, 2023
@chenlujjj
Copy link
Contributor Author

The issue is related to the two parts of code:

(1) https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.79.0/processor/tailsamplingprocessor/processor.go#L253-L261

(2) https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.79.0/processor/tailsamplingprocessor/internal/sampling/and.go#L39-L41

In the first version config, the non-/actuator/health/** traces are evaluated as NotSampled in the first policy and Sampled in the second policy, so the final decision is Sampled.

In the second version config, the non-/actuator/health/** traces are evaluated as InvertNotSampled in the first policy, so the final decision is NotSampled, even if they are evaluated as Sampled in the second policy.

@chenlujjj
Copy link
Contributor Author

I don't quite understand the purpose to differentiate NotSampled and InvertNotSampled, and why "InvertNotSampled takes precedence over any other decision" ?

@jpkrohling jpkrohling self-assigned this Dec 5, 2023
@jpkrohling jpkrohling removed the needs triage New item requiring triage label Dec 5, 2023
@jiekun
Copy link
Member

jiekun commented Dec 6, 2023

typically, invert match is used to exclude small parts of data that user don't want to see it.

For example, when using invert match to exclude health check trace:

- invert-match-policy
	invert-match=true
	path=/health
- other-policy
	bla bla bla

I don't quite understand the purpose to differentiate NotSampled and InvertNotSampled, and why "InvertNotSampled takes precedence over any other decision" ?

Well if InvertNotSampled does not take precedence over other decision, in the example I mentioned above, /health sampling result (may) rely on other-policy if other-policy did not exclude this path. So invert-match-policy is useless in this case, which is ALSO not working as expected.

I think this may be an unavoidable trade-off where there will always be a configuration layout that needs to compromise.

In your case, I modified the configuration to make it more symmetry:

tail_sampling:
  policies:
    - name: policy-for-other-service
      type: and
      and:
        and_sub_policy:
          - name: define-other-service
            type: string_attribute
            string_attribute:
              invert_match: true
              key: service.name
              values:
                - foo
          # the following part is omitted in your config, I add it here for comprehensive.
          - name: real-policy
            type: always_sample
    - name: policy-for-foo-service
      type: and
      and:
        and_sub_policy:
          - name: define-foo-service
            type: string_attribute
            string_attribute:
              key: service.name
              values:
                - foo
          - name: real-policy
            type: string_attribute
            string_attribute:
              invert_match: true
              key: http.route
              values:
                - /actuator/health/**

Now it looks neat.

I understand that it's complicated; however, if the behavior of InvertNotSampled changed, there would also be someone asking why it's not working with the invert match on /health.

Me personally suggestion (put on my end-user hat):

  1. Test the configuration first before appling it to the production.
  2. Minimize the intersection of the impact scope among different policies as much as possible.

Copy link
Contributor

github-actions bot commented Feb 5, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 5, 2024
Copy link
Contributor

github-actions bot commented Apr 6, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working closed as inactive processor/tailsampling Tail sampling processor Stale
Projects
None yet
Development

No branches or pull requests

3 participants