Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tail Based Sampling configuration options to APM Integration #121534

Closed
Tracked by #6894
simitt opened this issue Dec 17, 2021 · 8 comments · Fixed by #124025
Closed
Tracked by #6894

Add Tail Based Sampling configuration options to APM Integration #121534

simitt opened this issue Dec 17, 2021 · 8 comments · Fixed by #124025
Assignees
Labels
apm:fleet apm:test-plan-done Pull request that was successfully tested during the test plan Team:APM All issues that need APM UI Team support v8.1.0

Comments

@simitt
Copy link
Contributor

simitt commented Dec 17, 2021

Kibana version: 8.1

Describe the feature:
APM Server introduces a new feature Tail Based Sampling, which adds new configuration options to the APM Integration. The APM Integration Editor needs to expose these configuration options.

The definition of the configuration options can be found in elastic/apm-server#6895 (comment).

Design solution

image

Depends On
elastic/apm-server#6895

@simitt simitt added Team:APM All issues that need APM UI Team support needs design apm:fleet labels Dec 17, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@simitt
Copy link
Contributor Author

simitt commented Jan 19, 2022

There should be a new Section Tail Sampling, that can be enabled or disabled, depending on the value of tail_sampling_enabled. It is disabled by default.

If tail_sampling_enabled: true, following configuration options are enabled:

  • tail_sampling_interval: optional; duration; defaults to 1m ; minimum is 1s
  • tail_sampling_policies: conditionally required (only if tail_sampling_enabled: true); array of policies, stored in yaml representation

The APM Server runs a couple of validations against the policies, which ideally would be validated already in the UI. Given the time restrictions, we could start with a simple yaml box without validations in the APM UI for 8.1 and then add more fine grained visual components and validations for 8.2.

A sampling policy can consist of following attributes:

  • sample_rate: required, min=0,max=1
  • trace.name: optional, string
  • trace.outcome: optional, string
  • service.name: optional, string
  • service.environment: optional, string

Additional to the value validations (min, max), exactly one default policy needs to always be configured.

I'll update with suggestions for the description soon.

@simitt
Copy link
Contributor Author

simitt commented Jan 19, 2022

Tail Based sampling is going to be under the platinum license. I don't think that we currently support any other non-basic feature in the configuration editor. So this likely also entails to only conditionally show the config options if the correct license is detected.

@simitt
Copy link
Contributor Author

simitt commented Jan 20, 2022

Adding a more concise overview of the options and the promised descriptions. Please note the more precise description that the sample rate max value is exclusive of 1.

@bmorelli25 I'd value your review on the description texts.

Tail sampling top level tail config options

config option title type required default validation description
tail_sampling_enabled Tail Sampling bool yes false   Enable tail based sampling.
tail_sampling_interval Interval duration no 1m min=1s Interval for syncronisation between multiple APM Servers. Should be in the order of tens of seconds or low minutes.
tail_sampling_policies Policies []policy yes, if tail sampling enabled empty exactly 1 catch-all-policy (only specifying a sample_rate) Events are matched in the exact order of how policies are specified. Every policy needs to specify a sample rate. A trace needs to match all policy constraints to match a policy. Specify one policy at the end of the list with only a sample rate. This policy is applied to all traces that do not match a more fine grained policy.

Tail sampling Policy

config option title type required default validation description
sample_rate Sample Rate number yes   [0,1) Sample rate to apply to trace events matching this policy.
trace.name Trace Name string no     Specify the trace name required for events to match this policy.
trace.outcome Trace Outcome string no     Specify the trace outcome required for events to match this policy.
service.name Service Name string no     Specify the service name required for events to match this policy.
service.environment Service Environment string no     Specify the service environment required for events to match this policy.

@formgeist can you help figure out the concrete design for the tail sampling policies. I believe we don't yet have the same requirements on any existing config options.

@MiriamAparicio please note that the order of the specified policies is important, events are matched against the policies in order of how policies are specified. Given this order, we should provide a nice way to users to change the order of their specified policies to change how events match against the policies, without having to delete and recreate them, for example by allowing to drag them around.

@formgeist
Copy link
Contributor

@formgeist can you help figure out the concrete design for the tail sampling policies. I believe we don't yet have the same requirements on any existing config options.

Yes, of course - I do think your proposal for the 8.1 implementations of supporting the policy creation in the YAML input will be sufficient and allow the user to enable the feature and its options. @MiriamAparicio I've created a quick mock of the panel and its options to make it more clear what should go in and how based on @simitt's descriptions above. Let me know if you have any questions.

image

Tail Based sampling is going to be under the platinum license. I don't think that we currently support any other non-basic feature in the configuration editor. So this likely also entails to only conditionally show the config options if the correct license is detected.

@simitt OK, I wasn't aware but thanks for bringing this up. I would suggest (as we've already spoken about) that we sync with the APM PM team and figure out the right steps for the UX of how users will use this feature and perhaps its integration into the APM app. Perhaps sampling policies should be more tightly integrated into the current APM workflows within the APM app, instead of solely being managed in the Integration configuration.

@bmorelli25
Copy link
Member

@bmorelli25 I'd value your review on the description texts.

✅ Left my comments in Miriam's PR

@graphaelli
Copy link
Member

graphaelli commented Jan 26, 2022

@formgeist I think the UI should mention the license requirements. Insufficient license should still allow configuration, but should be clear that the settings will be ignored in that case.

I'd like to request s a stretch goal for this, to capture telemetry on TBS usage - a simple boolean for enabled would suffice for now. In the future we'd also like to know the actual details around the configuration.

@formgeist
Copy link
Contributor

@graphaelli Agreed - we can add another badge to describe this as a Platinum feature. It will display a tooltip with an explanation of being able to configure it but the configs will be ignored if the Kibana license is anything but Platinum. Not sure if that should be the right copy cc @bmorelli25

image

@kpatticha kpatticha self-assigned this Feb 7, 2022
@kpatticha kpatticha added the apm:test-plan-done Pull request that was successfully tested during the test plan label Feb 8, 2022
@kpatticha kpatticha removed their assignment Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:fleet apm:test-plan-done Pull request that was successfully tested during the test plan Team:APM All issues that need APM UI Team support v8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants