
Multiple service policies result in unusual sample rates #6640

Closed · Tracked by #6894
bryce-b opened this issue Nov 16, 2021 · 2 comments
bryce-b (Contributor) commented Nov 16, 2021

APM Server version (apm-server version): 7.16.0-SNAPSHOT

Description of the problem including expected versus actual behavior:
In testing, I used multiple tail-based sampling policies designed to apply distinct sample rates to specific services. To determine the effective sample rate, I compared the request transaction count reported by the aggregated transaction metrics against the number of request transactions actually captured as traces. A useful dashboard for this comparison can be found here: https://github.com/bryce-b/apm-server/blob/tbs-metrics/cmd/TBS/tbs_charts_3.2.ndjson
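For reference, a minimal sketch of that comparison (not part of the original report; the helper and the numbers below are made up for illustration):

# Hypothetical helper: estimate the effective tail-based sampling rate for a
# service by comparing the transaction count reported by the aggregated
# metrics (all traffic) with the number of transaction documents actually
# stored (sampled traffic).
def observed_sample_rate(metrics_tx_count: int, stored_tx_count: int) -> float:
    """Return stored / total, i.e. the fraction of transactions kept by sampling."""
    if metrics_tx_count == 0:
        return 0.0
    return stored_tx_count / metrics_tx_count

# Example with made-up numbers: 10,000 opbeans-ruby transactions reported by the
# metrics, 6,100 transaction documents stored -> observed rate 0.61 versus the
# configured 0.8.
print(observed_sample_rate(10_000, 6_100))  # 0.61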

Here are the policies used in testing:

apm-server:
  sampling:
    keep_unsampled: false
    tail:
      enabled: true
      policies:
        - service:
            name: "opbeans-java"
          sample_rate: 0.5
        - service:
            name: "opbeans-python"
          sample_rate: 0.2
        - service:
            name: "opbeans-ruby"
          sample_rate: 0.8
        - service:
            name: "opbeans-go"
          sample_rate: 0.6
        - sample_rate: 0.1

[Screenshot: Screen Shot 2021-11-16 at 10.38.39 AM]

In testing, each service showed some drift from its configured sample rate, which is not surprising, but the observed rate for opbeans-ruby is drastically different from its configured rate.

Steps to reproduce:


  1. Start a cloud deployment.
  2. Apply the APM integration to the deployment.
  3. Start apm-integration-testing to generate traffic (see the note after this list).
  4. Apply the apm-server config described above.
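Note on step 3: a typical way to generate this traffic is the compose script in elastic/apm-integration-testing, e.g. ./scripts/compose.py start master --with-opbeans-java --with-opbeans-python --with-opbeans-ruby --with-opbeans-go (the invocation and flags here are assumed from that repository's README, not taken from this issue; adjust the stack version to match the deployment under test).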


@bryce-b bryce-b added the bug label Nov 16, 2021
@simitt simitt mentioned this issue Dec 17, 2021 (21 tasks)
@simitt simitt added this to the 8.1 milestone Dec 17, 2021
@simitt simitt changed the title Tail Based Sampling: multiple service policies result in unusual sample rates Multiple service policies result in unusual sample rates Dec 17, 2021
@axw axw self-assigned this Jan 11, 2022
axw (Member) commented Jan 11, 2022

@bryce-b do you recall how you ran apm-integration-testing? Did you start all the opbeans services together?

If all the opbeans services are started together, they proxy some requests to one another to demonstrate distributed tracing. A request to opbeans-ruby may therefore have started in another service, in which case the sampling rate configured for that other service takes effect.
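To make that effect concrete, here is a small sketch assuming, as described above, that the policy applied to a trace is the one matching the root transaction's service; the rates are the ones from the report, while the traffic shares are invented:

# Per-service rates from the config in this issue, plus the catch-all policy.
policy_rates = {
    "opbeans-java": 0.5,
    "opbeans-python": 0.2,
    "opbeans-ruby": 0.8,
    "opbeans-go": 0.6,
    "default": 0.1,
}

def expected_rate_for_service(root_service_share: dict) -> float:
    """Expected fraction of a service's transactions that are kept, given how its
    traces are distributed over root services (the shares must sum to 1)."""
    return sum(share * policy_rates.get(root, policy_rates["default"])
               for root, share in root_service_share.items())

# Invented example: if only 40% of opbeans-ruby transactions belong to traces
# rooted at opbeans-ruby and 60% to traces rooted at opbeans-go, the expected
# rate drops from 0.8 to 0.4 * 0.8 + 0.6 * 0.6 = 0.68.
print(expected_rate_for_service({"opbeans-ruby": 0.4, "opbeans-go": 0.6}))  # 0.68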

Then, to add on top: I just ran apm-integration-testing with opbeans-go and opbeans-ruby, and none of the transactions coming through from Ruby are root transactions. Seems there's something wrong there, so I'd ignore the opbeans-ruby results.

axw (Member) commented Feb 3, 2022

Opened elastic/apm-agent-ruby#1231 for the Ruby issue. Closing this one, since the unexpected tail-based sampling rates are a symptom of that agent bug.

@axw axw closed this as completed Feb 3, 2022