Extrapolate Jaeger transaction count from reported sampling rate #3722

axw · 2020-04-30T08:30:08Z

Motivation/summary

When Jaeger is configured to sample a percentage of traces, the statistics reported in the APM UI are proportional to the sampling rate rather than to the actual number of operations. Jaeger reports the sampling rate as a pair of tags (sampler.type and sampler.param) on each span. We can use these to extrapolate the number of transactions when performing aggregations.
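A minimal sketch of the extrapolation, assuming the span tags are available as a Go map keyed by tag name and that a probabilistic sampler reports sampler.param as a float64 between 0 and 1. The helper name `representativeCount` is hypothetical, and falling back to 1 for other sampler types (or missing/invalid tags) is an assumption rather than confirmed apm-server behaviour.

```go
package main

import "fmt"

// representativeCount is a hypothetical helper: it returns how many operations a
// single sampled span stands for, based on Jaeger's sampler.type and
// sampler.param tags. Only the probabilistic sampler is extrapolated; anything
// else (including missing or out-of-range values) counts as 1.
func representativeCount(tags map[string]interface{}) float64 {
	samplerType, _ := tags["sampler.type"].(string)
	if samplerType != "probabilistic" {
		return 1
	}
	rate, ok := tags["sampler.param"].(float64)
	if !ok || rate <= 0 || rate > 1 {
		return 1
	}
	return 1 / rate // e.g. a 10% sampling rate means each span represents 10 operations
}

func main() {
	tags := map[string]interface{}{
		"sampler.type":  "probabilistic",
		"sampler.param": 0.1,
	}
	fmt.Println(representativeCount(tags)) // 10
}
```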

Checklist

I have signed the Contributor License Agreement.
My code follows the style guidelines of this project (run make check-full for static code checks and linting)
I have rebased my changes on top of the latest master branch
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have updated CHANGELOG.asciidoc

How to test these changes

Enable transaction aggregation (apm-server.aggregation.enabled=true)
Run an app instrumented with Jaeger, configured with remote sampling
Set a sampling rate of 0.1 for the service in the APM UI
Run 1000 operations
Check there are 100 (i.e. 1000 * 0.1) transactions
Check that the counts in the transaction.duration.histogram fields sum to 1000 (e.g. using Histogram field type support for ValueCount and Avg aggregations, elasticsearch#55933, once it lands)
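The expected numbers in the test steps follow from simple arithmetic: at a sampling rate of 0.1, about one in ten operations is sampled, and each sampled transaction stands for 1/0.1 = 10 operations. The sketch below (the helper name is hypothetical) computes the idealized values; with real probabilistic sampling the stored transaction count will only be close to 100, and the histogram sum correspondingly close to 1000.

```go
package main

import (
	"fmt"
	"math"
)

// expectedCounts is a hypothetical helper computing the idealized outcome of the
// manual test: how many transaction documents should be stored, and what the
// histogram counts should sum to once each one is weighted by 1/rate.
func expectedCounts(operations int, rate float64) (transactions int, histogramSum float64) {
	transactions = int(math.Round(float64(operations) * rate))
	for i := 0; i < transactions; i++ {
		histogramSum += 1 / rate // each sampled transaction represents 1/rate operations
	}
	return transactions, histogramSum
}

func main() {
	txs, sum := expectedCounts(1000, 0.1)
	fmt.Println(txs, sum) // 100 1000
}
```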

Related issues

Closes #3011

axw · 2020-04-30T08:33:35Z

Rounding error on this is going to be pretty bad for some sampling rates (e.g. 0.3, 0.4), so it might be worth scaling up all counts recorded in the in-memory histograms (e.g. multiplying by 100) and scaling them back down when creating metricset docs.
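To illustrate the rounding concern: if histogram counts are integers, a rate of 0.3 gives 1/0.3 ≈ 3.33, which rounds to 3, so 300 sampled transactions (about 1000 operations) are recorded as only 900. Scaling by 100 before rounding records 333 per transaction, and dividing back down when emitting gives 999. This sketch assumes integer counts rounded with `math.Round`; the scale constant and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// histogramCountScale is a hypothetical scale factor applied when recording counts.
const histogramCountScale = 100

// unscaledCount rounds each transaction's representative count (1/rate) to an
// integer before recording, then sums over n sampled transactions.
func unscaledCount(rate float64, n int) int64 {
	perTx := int64(math.Round(1 / rate))
	return perTx * int64(n)
}

// scaledCount records counts multiplied by histogramCountScale, and divides the
// total back down when the metricset document would be created.
func scaledCount(rate float64, n int) float64 {
	perTx := int64(math.Round(histogramCountScale / rate))
	return float64(perTx*int64(n)) / histogramCountScale
}

func main() {
	// 300 sampled transactions at a 0.3 sampling rate represent ~1000 operations.
	fmt.Println(unscaledCount(0.3, 300)) // 900
	fmt.Println(scaledCount(0.3, 300))   // 999
}
```

With the scale factor, the per-transaction rounding error drops from up to 0.5 operations to at most 0.005.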

codecov-io · 2020-04-30T08:46:09Z

Codecov Report

Merging #3722 into master will decrease coverage by 0.05%.
The diff coverage is 96.66%.

@@            Coverage Diff             @@
##           master    #3722      +/-   ##
==========================================
- Coverage   80.35%   80.30%   -0.06%     
==========================================
  Files         131      131              
  Lines        6022     6042      +20     
==========================================
+ Hits         4839     4852      +13     
- Misses       1183     1190       +7

Impacted Files	Coverage Δ
model/transaction.go	`88.09% <ø> (ø)`
...ack/apm-server/aggregation/txmetrics/aggregator.go	`93.50% <92.30%> (+0.16%)`	⬆️
processor/otel/consumer.go	`93.40% <100.00%> (+0.26%)`	⬆️
kibana/connecting_client.go	`64.51% <0.00%> (-8.07%)`	⬇️
beater/jaeger/common.go	`78.78% <0.00%> (-6.07%)`	⬇️

simitt

This looks like a fairly straightforward way to approximate the number of transactions. I'm not sure whether it could lead to problematic edge cases, but I couldn't think of any that this handling would trigger. It would be great to see this land to improve APM UI usability for data collected by Jaeger agents.

apmmachine · 2020-06-25T07:30:11Z

💚 Build Succeeded


Build stats

Build Cause: [Pull request #3722 updated]
Start Time: 2020-07-01T06:39:20.946+0000
Duration: 47 min 7 sec

Test stats 🧪

Test	Results
Failed	0
Passed	3217
Skipped	147
Total	3364

Steps errors


Name: Compress
- Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage
- Duration: 0 min 0 sec
- Start Time: 2020-07-01T06:54:35.458+0000
Name: Compress
- Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
- Duration: 0 min 0 sec
- Start Time: 2020-07-01T07:05:41.752+0000
Name: Test Sync
- Description: ./script/jenkins/sync.sh
- Duration: 3 min 31 sec
- Start Time: 2020-07-01T06:49:31.371+0000

Set Transaction.RepresentativeCount based on the value of sampler.param if sampler.type=probabilistic.

Once the UI side of this is in, the histograms will be used for RPM graphs, which will take sampling into account.

x-pack/apm-server/aggregation/txmetrics/aggregator.go

x-pack/apm-server/aggregation/txmetrics/aggregator_test.go

bmorelli25

Docs LGTM

…stic#3722)
* model: add Transaction.RepresentativeCount field
* jaeger: set Transaction.RepresentativeCount
  Set Transaction.RepresentativeCount based on the value of sampler.param if sampler.type=probabilistic.
* aggregation/txmetrics: use RepresentativeCount
* Update changelog
* docs: remove caveats about Jaeger sampling & RPMs
  Once the UI side of this is in, the histograms will be used for RPM graphs, which will take sampling into account.
* Fix/add comments

…) (#3932)
* model: add Transaction.RepresentativeCount field
* jaeger: set Transaction.RepresentativeCount
  Set Transaction.RepresentativeCount based on the value of sampler.param if sampler.type=probabilistic.
* aggregation/txmetrics: use RepresentativeCount
* Update changelog
* docs: remove caveats about Jaeger sampling & RPMs
  Once the UI side of this is in, the histograms will be used for RPM graphs, which will take sampling into account.
* Fix/add comments

…ate (elastic#3722)" This reverts commit 76ac96e.

axw mentioned this pull request May 28, 2020

Feature: destination service metrics elastic/apm#270

Closed

simitt reviewed May 28, 2020


axw force-pushed the jaeger-histogram-aggregation branch from 69b8f86 to f266588 on June 25, 2020 07:25

axw force-pushed the jaeger-histogram-aggregation branch from 6c7b5cf to e3aedd6 on June 25, 2020 09:20

axw added 4 commits June 25, 2020 17:20

model: add Transaction.RepresentativeCount field

c06a629

jaeger: set Transaction.RepresentativeCount

f26dcd1


aggregation/txmetrics: use RepresentativeCount

a75b457

Update changelog

dcb130d

axw force-pushed the jaeger-histogram-aggregation branch from e3aedd6 to dcb130d on June 25, 2020 09:21

docs: remove caveats about Jaeger sampling & RPMs

4d65995


axw marked this pull request as ready for review June 25, 2020 09:57

axw requested review from simitt and bmorelli25 June 25, 2020 09:57

simitt approved these changes Jun 29, 2020


x-pack/apm-server/aggregation/txmetrics/aggregator.go (outdated, resolved)

x-pack/apm-server/aggregation/txmetrics/aggregator_test.go (resolved)

bmorelli25 approved these changes Jun 29, 2020


axw added 2 commits July 1, 2020 14:35

Merge branch 'master' into jaeger-histogram-aggregation

329e6c3

Fix/add comments

1aa63cb

axw merged commit 76ac96e into elastic:master Jul 1, 2020

axw deleted the jaeger-histogram-aggregation branch July 1, 2020 07:28

axw added the v7.9.0 label Jul 1, 2020

axw mentioned this pull request Jul 1, 2020

[7.x] Extrapolate Jaeger transaction count from reported sampling rate (#3722) #3932

Merged

jalvz added a commit to jalvz/apm-server that referenced this pull request Jul 2, 2020

Revert "Extrapolate Jaeger transaction count from reported sampling r…

3dc8562

