TBS support for APM Integration when using non-Elasticsearch output #7025

Closed
Tracked by #6894
simitt opened this issue Jan 10, 2022 · 14 comments

Comments

@simitt
Contributor

simitt commented Jan 10, 2022

Define if/how tail-based sampling (TBS) should work for the APM Integration when an output other than Elasticsearch is configured.
In a first version, the feature might be disabled. This would require some wiring between the Integration policy editor and the configured output, which might not be trivial and needs to be clarified with the UI team.
Another option might be to add a freeform yaml box for configuring a TBS-specific ES endpoint.

@simitt simitt mentioned this issue Jan 10, 2022
21 tasks
@simitt simitt changed the title from "TBS support when using logstash output and elastic agent" to "TBS support for APM Integration when using non-Elatsichsearch output" on Jan 10, 2022
@simitt simitt added the discuss label Jan 10, 2022
@joshdover

> In a first version, the feature might be disabled. This would require some wiring between the Integration policy editor and the configured output, which might not be trivial and needs to be clarified with the UI team.

One challenge with this option is that outputs can be changed for an agent policy after the APM integration was configured. This means the output could change to a non-ES one without the user going through the APM policy editor at all. In such cases, I'd expect APM to successfully switch outputs without any downtime (except for the TBS feature).

Some ideas:

  • APM server could handle this in a graceful way by disabling TBS when only a Logstash output is configured. APM UI could show a warning when TBS is enabled on a policy that is using a Logstash output.
  • We could introduce a new "output requirements" concept to the package spec to prevent users who have TBS enabled from configuring a non-ES output.
  • The APM integration could introduce a separate policy template for TBS within the same package, which would allow for separate output configuration. This certainly has some UX implications which we'd need to consider. This would likely depend on "[Elastic Agent] Infrastructure to support multiple outputs in a policy" (beats#27442) or similar work.
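
To make the first idea more concrete, here is a minimal sketch of the "graceful disable" behaviour. It is not how apm-server is implemented (apm-server is written in Go, and nothing in this thread specifies the logic); the names `resolve_tbs`, `EffectiveConfig`, and the output type strings are all hypothetical and exist only for illustration. The assumption is that the server knows the requested TBS setting and the type of the output on the agent policy, and that only TBS is switched off while ingestion continues.

```python
from dataclasses import dataclass

# Hypothetical output type names, chosen for this sketch only.
ELASTICSEARCH = "elasticsearch"
LOGSTASH = "logstash"


@dataclass(frozen=True)
class EffectiveConfig:
    """Result of reconciling the requested TBS setting with the policy output."""
    tbs_enabled: bool
    warnings: tuple = ()


def resolve_tbs(tbs_requested: bool, output_type: str) -> EffectiveConfig:
    """Disable TBS, with a warning, when the output is not Elasticsearch.

    Only the TBS feature is switched off; everything else keeps running.
    """
    if tbs_requested and output_type != ELASTICSEARCH:
        warning = (
            f"tail-based sampling disabled: output '{output_type}' "
            "is not Elasticsearch"
        )
        return EffectiveConfig(tbs_enabled=False, warnings=(warning,))
    # Either TBS was not requested, or the output supports it.
    return EffectiveConfig(tbs_enabled=tbs_requested)
```

The returned warnings are what a UI (Fleet or APM) could surface to the user; where they should be shown is exactly the open question in this issue.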

@simitt
Contributor Author

simitt commented Jan 10, 2022

I believe all of the suggestions have some impact on Fleet UI.

> APM server could handle this in a graceful way by disabling TBS when only a Logstash output is configured. APM UI could show a warning when TBS is enabled on a policy that is using a Logstash output.

If the APM Server disables TBS (and other features), this should be surfaced in the Fleet UI for the configured APM integration, close to where the issue arises. But I doubt that this would be prominent enough; it should most certainly also be highlighted somewhere more visible. On the other hand, we might want to avoid highlighting config issues in the APM UI to avoid dependencies and complexity. With Fleet, the configuration and the data are getting more isolated from each other, so we might not want to mix that up again.

> We could introduce a new "output requirements" concept to the package spec to prevent users who have TBS enabled from configuring a non-ES output.

The Fleet UI should surface which package policies are blocking the output change. That might be a nice option to handle that. (AFAIK the output will be configurable per agent policy, right?)

> The APM integration could introduce a separate policy template for TBS within the same package, which would allow for separate output configuration.

This sounds like a step 2 to the suggestion of providing a yaml box for now. Or am I misreading this?

@formgeist, @chrisdistasio, @alex-fedotyev can we have some PM/design input on this?
cc @mostlyjason

@simitt
Contributor Author

simitt commented Jan 10, 2022

Similar issue with API Key support #7028

@graphaelli graphaelli changed the title from "TBS support for APM Integration when using non-Elatsichsearch output" to "TBS support for APM Integration when using non-Elasticsearch output" on Jan 12, 2022
@axw
Member

axw commented Jan 13, 2022

@ruflin and I are discussing how to best enable tail-based sampling when running under Fleet (related to elastic/fleet-server#1048). Too early to say if, how, or when, but one possibility is that apm-server does the TBS pub/sub via fleet-server. If that were the case, I believe the APM Server's output is irrelevant.

@joshdover

With regards to an "output requirements" feature:

> The Fleet UI should surface which package policies are blocking the output change. That might be a nice option to handle that. (AFAIK the output will be configurable per agent policy, right?)

Yes, we will support configuring different outputs for each agent policy (see screenshot below). If we went with this option, I'd expect the UI to surface which policies are blocking an output change. It'd also be nice to provide some guidance text or documentation links on how to proceed if using a specific output is more important than the features that don't currently support it.

[Screenshot: per-agent-policy output configuration in the Fleet UI]

> The APM integration could introduce a separate policy template for TBS within the same package, which would allow for separate output configuration.

> This sounds like a step 2 to the suggestion of providing a yaml box for now. Or am I misreading this?

I guess it could be a step 2, but I think it would be preferable to a yaml input since it allows us to leverage the existing API key generation infrastructure inside Fleet Server to generate the credentials for the ES output rather than requiring a user to input it manually.

I do believe the yaml input box is the lowest-effort option for moving forward. I think the main thing we need to decide here is whether or not yaml input is an acceptable solution since it can result in users dropping TBS traces without any indication in the UI when they switch to using a Logstash output.

An output requirements feature in the package spec + UI would be more robust and would give users fewer "foot-guns" / traps to fall into. As @ruflin mentioned in #7028 (comment), it's likely not low-effort since we need to support conditionals and add accompanying UI. IMO we should start scoping out the effort required for this option so we can make a proper tradeoff decision for the primary target audience here.
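
For scoping purposes, the check behind an "output requirements" feature could look roughly like the sketch below. This is an illustration under stated assumptions, not the package spec or Fleet's actual data model: `PackagePolicy`, `feature_outputs`, `enabled_features`, and `blocking_policies` are hypothetical names, and the idea that a package declares supported outputs per feature is one possible reading of the "conditionals" mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class PackagePolicy:
    """Hypothetical view of a package policy attached to an agent policy."""
    name: str
    # Hypothetical: maps a feature name to the output types it can work with.
    feature_outputs: dict = field(default_factory=dict)
    # Features the user has switched on in this package policy.
    enabled_features: set = field(default_factory=set)


def blocking_policies(policies, new_output):
    """Return (policy name, feature) pairs that are incompatible with new_output."""
    blockers = []
    for policy in policies:
        for feature in sorted(policy.enabled_features):
            supported = policy.feature_outputs.get(feature)
            # A feature with no declared requirement is assumed to work anywhere.
            if supported is not None and new_output not in supported:
                blockers.append((policy.name, feature))
    return blockers


if __name__ == "__main__":
    apm = PackagePolicy(
        name="apm-1",
        feature_outputs={
            "tail_sampling": {"elasticsearch"},
            "api_key_auth": {"elasticsearch"},
        },
        enabled_features={"tail_sampling"},
    )
    system = PackagePolicy(name="system-1")
    # Only the enabled, incompatible feature blocks the change.
    print(blocking_policies([apm, system], "logstash"))
```

Because the check is keyed on enabled features rather than on the package as a whole, an APM policy with TBS turned off would not block a switch to Logstash. The returned pairs are the information a UI would need in order to show which policies are blocking an output change.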

@simitt
Contributor Author

simitt commented Jan 17, 2022

@chrisdistasio @alex-fedotyev and @mostlyjason can you help define what would be an acceptable (potentially multi-step) solution here?

@axw
Member

axw commented Jan 18, 2022

@ruflin and I had a discussion about elastic/fleet-server#1048 where this issue also came up.

One option to enable TBS we considered is to introduce pub/sub as a feature in Elastic Agent and/or Fleet Server. Integrations like APM Server would not communicate with Elasticsearch directly, but instead through Elastic Agent. If we were to do this, then this issue would be moot; integrations would not require additional output configuration.

That might be reasonable in the future, but at the moment it's probably too early to generalise such a feature.

@simitt
Contributor Author

simitt commented Jan 18, 2022

IMO the important part for now is how APM Server with TBS enabled (or API Keys enabled) should behave when Logstash output is supported.

The simplest option would be to only document that these features are not compatible. That might be OK in the very short term, but it is not very user friendly.
A somewhat better option would be a callout somewhere in Fleet and/or the APM UI mentioning the feature incompatibility as soon as a Logstash output is configured and the APM Integration is detected.

The next step would be to disallow configuring a Logstash output if the APM Integration is part of the agent policy, and to disallow adding the APM Integration if Logstash is already configured. And then, even more advanced, only apply the restriction if an APM feature is enabled that is incompatible with Logstash.

And then mid-term, work on feature support for Logstash. This would give us some time to define how the Elastic Agent/Fleet-Server could support this. Logstash support is one missing feature that prevents us from deprecating the standalone binary.

I was hoping to get PM input on the outlined possibilities for handling this in the short term, once Logstash output is generally supported.

@ruflin
Contributor

ruflin commented Jan 18, 2022

I'm supportive of the path @simitt proposes above. It gives us some more time to dig through the ideal design to solve this in Elastic Agent but at the same time move forward in small steps.

@alex-fedotyev

@simitt would we ever expect users to use a separate ES for TBS instead of the main ES?
Would this configuration be useful and/or desired?
I'm thinking of the potential overhead on the central ES cluster from TBS load, and also of the network footprint of that communication.

@alex-fedotyev

I also wonder, from a usability perspective:
Why would anyone put APM server on a shared policy together with some other integrations?

@axw
Member

axw commented Jan 19, 2022

> would we ever expect users to use a separate ES for TBS instead of the main ES?

It's possible, but I can't think of a reason why anyone would want to do that.

> Would this configuration be useful and/or desired?
> I'm thinking of the potential overhead on the central ES cluster from TBS load, and also of the network footprint of that communication.

I wouldn't expect the overhead to be significant enough to warrant it. For the theoretically best performance you might want to use some separate pub/sub system, but this would come with considerable operational overhead, particularly if you've got a multi-cloud architecture.

> Why would anyone put APM server on a shared policy together with some other integrations?

For TBS, the idea is that APM Server runs on or near the instrumented service host. If it's on the host, then it's probably going to be co-located with the System integration, maybe Endpoint, maybe Prodfiler.

@simitt simitt added this to the 8.2 milestone Feb 3, 2022
@simitt
Contributor Author

simitt commented Feb 7, 2022

Short term (8.2): APM Integration does not support outputs other than ES.

Long-term: Find a way to support TBS for different outputs.

Please refer to #7028 (comment) for more details.

@simitt simitt removed this from the 8.2 milestone Feb 7, 2022
@axw axw removed the 8.3-candidate label Mar 15, 2022
@axw
Member

axw commented Nov 15, 2022

At present there are no concrete plans to support other outputs in the APM integration. Let's reopen this if/when that changes.
