APM Server version (`apm-server version`): 7.16.0-SNAPSHOT
Description of the problem including expected versus actual behavior:
The issue appears when `tail.interval` is introduced in the configuration YAML. The documentation describes using a flush interval no greater than half the duration of `tail.ttl`. Following this instruction, or setting `tail.interval` to any other value, causes all APM Server throughput to cease.
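For reference, a minimal tail-based sampling config of the kind described above might look like the sketch below. The specific values (interval, ttl, sample rate) are illustrative assumptions, not the exact config from the affected deployment:

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      # Flush interval; per the docs, keep this no greater than half of ttl.
      # 1m / 30m are assumed example values.
      interval: 1m
      ttl: 30m
      policies:
        - sample_rate: 0.1
```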
Steps to reproduce:
Please include a minimal but complete recreation of the problem,
including server configuration, agent(s) used, etc. The easier you make it
for us to reproduce it, the more likely that somebody will take the time to
look at it.
simitt changed the title from "Tail Based Sampling: defining apm-server.sampling.tail.interval causes throughput to cease" to "Defining apm-server.sampling.tail.interval causes throughput to cease" on Dec 17, 2021
Using the latest 8.0 snapshots for Kibana and Elasticsearch + apm-server (6a45a89), I was able to ingest events using the config provided in the issue description. I sent 1000 events, and confirmed that 100 events (corresponding to sample_rate: 0.1) were present in traces-apm.sampled-default.
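For anyone re-running this check, the sampled-trace count can be read from the data stream directly. A sketch using the Elasticsearch `_count` API (the index name is taken from the comment above; run it in Kibana Dev Tools or via curl against your cluster):

```
GET traces-apm.sampled-default/_count
```

With 1000 ingested events and sample_rate: 0.1, the returned `count` should be close to 100.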
@bryce-b do you remember which opbeans you used? Or, do you still have the command line invocation that started apm-integration-testing?
I've also given it a shot with apm-integration-testing, using ./scripts/compose.py start 8.1.0 --with-opbeans-python. I modified docker-compose.yml with the config specified in the description (excluding data_streams & keep_unsampled, which are now the defaults). I ran that for a while, and then changed sample_rate to 0.5 and ran that for a while.
Here's a screenshot of the number of sampled transaction docs in Discover.
With sample_rate=0.1, the number of docs is approximately 10% of the original; with sample_rate=0.5, it's approximately 50%.
Jumping over to the APM app, we can see the throughput is fairly steady regardless of the sampling rate:
There's a drop in the throughput chart at the end, because the final (i.e. current) bucket is incomplete.
All seems to be working as expected. Seeing as neither @stuartnelson3 nor I could reproduce it, I'm going to close this.
@bryce-b if you are still able to reproduce the issue, or provide more details that can enable us to do so, please reopen.