
Multiple service policies result in unusual sample rates #6640

Closed · Tracked by #6894
bryce-b opened this issue Nov 16, 2021 · 2 comments
bryce-b (Contributor) commented Nov 16, 2021

APM Server version (apm-server version): 7.16.0-SNAPSHOT

Description of the problem including expected versus actual behavior:
In testing, I used multiple tail-based sampling policies designed to apply distinct sample rates to specific services. To determine the effective sample rate, I compared the request transaction count reported by the aggregated transaction metrics against the number of request transactions actually captured as traces. A useful dashboard for this comparison can be found here: https://github.com/bryce-b/apm-server/blob/tbs-metrics/cmd/TBS/tbs_charts_3.2.ndjson
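For reference, a minimal sketch of that comparison (not part of the original report; the helper and the numbers below are made up for illustration):

# Hypothetical helper: estimate the effective tail-based sampling rate for a
# service by comparing the transaction count reported by the aggregated
# metrics (all traffic) with the number of transaction documents actually
# stored (sampled traffic).
def observed_sample_rate(metrics_tx_count: int, stored_tx_count: int) -> float:
    """Return stored / total, i.e. the fraction of transactions kept by sampling."""
    if metrics_tx_count == 0:
        return 0.0
    return stored_tx_count / metrics_tx_count

# Example with made-up numbers: 10,000 opbeans-ruby transactions reported by the
# metrics, 6,100 transaction documents stored -> observed rate 0.61 versus the
# configured 0.8.
print(observed_sample_rate(10_000, 6_100))  # 0.61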

Here are the policies used in testing:

apm-server:
  sampling:
    keep_unsampled: false
    tail:
      enabled: true
      policies:
        - service:
            name: "opbeans-java"
          sample_rate: 0.5
        - service:
            name: "opbeans-python"
          sample_rate: 0.2
        - service:
            name: "opbeans-ruby"
          sample_rate: 0.8
        - service:
            name: "opbeans-go"
          sample_rate: 0.6
        - sample_rate: 0.1

[Screenshot: Screen Shot 2021-11-16 at 10.38.39 AM]

In testing, each service showed some drift from its configured sample rate, which is not surprising, but the observed rate for opbeans-ruby is drastically different from its configured rate.

Steps to reproduce:


  1. Start a cloud deployment.
  2. Apply the APM integration to the deployment.
  3. Start apm-integration-testing to generate traffic (see the note after this list).
  4. Apply the apm-server config described above.
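Note on step 3: a typical way to generate this traffic is the compose script in elastic/apm-integration-testing, e.g. ./scripts/compose.py start master --with-opbeans-java --with-opbeans-python --with-opbeans-ruby --with-opbeans-go (the invocation and flags here are assumed from that repository's README, not taken from this issue; adjust the stack version to match the deployment under test).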


@bryce-b bryce-b added the bug label Nov 16, 2021
@simitt simitt mentioned this issue Dec 17, 2021 (21 tasks)
@simitt simitt added this to the 8.1 milestone Dec 17, 2021
@simitt simitt changed the title Tail Based Sampling: multiple service policies result in unusual sample rates Multiple service policies result in unusual sample rates Dec 17, 2021
@axw axw self-assigned this Jan 11, 2022
axw (Member) commented Jan 11, 2022

@bryce-b do you recall how you ran apm-integration-testing? Did you start all the opbeans services together?

If all the opbeans services are started together, they proxy some requests to one another to demonstrate distributed tracing. A request to opbeans-ruby may therefore have started in another service, in which case the sampling rate configured for that other service takes effect.
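To make that effect concrete, here is a small sketch assuming, as described above, that the policy applied to a trace is the one matching the root transaction's service; the rates are the ones from the report, while the traffic shares are invented:

# Per-service rates from the config in this issue, plus the catch-all policy.
policy_rates = {
    "opbeans-java": 0.5,
    "opbeans-python": 0.2,
    "opbeans-ruby": 0.8,
    "opbeans-go": 0.6,
    "default": 0.1,
}

def expected_rate_for_service(root_service_share: dict) -> float:
    """Expected fraction of a service's transactions that are kept, given how its
    traces are distributed over root services (the shares must sum to 1)."""
    return sum(share * policy_rates.get(root, policy_rates["default"])
               for root, share in root_service_share.items())

# Invented example: if only 40% of opbeans-ruby transactions belong to traces
# rooted at opbeans-ruby and 60% to traces rooted at opbeans-go, the expected
# rate drops from 0.8 to 0.4 * 0.8 + 0.6 * 0.6 = 0.68.
print(expected_rate_for_service({"opbeans-ruby": 0.4, "opbeans-go": 0.6}))  # 0.68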

Then, to add on top: I just ran apm-integration-testing with opbeans-go and opbeans-ruby, and none of the transactions coming through from Ruby are root transactions. Seems there's something wrong there, so I'd ignore the opbeans-ruby results.

axw (Member) commented Feb 3, 2022

Opened elastic/apm-agent-ruby#1231 for the Ruby issue. Closing this one, since the unexpected tail-based sampling rates are a symptom of that agent bug.

@axw axw closed this as completed Feb 3, 2022