Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for community supported plugins #937

Closed
6 tasks done
SylvainJuge opened this issue Nov 21, 2019 · 17 comments
Closed
6 tasks done

Add support for community supported plugins #937

SylvainJuge opened this issue Nov 21, 2019 · 17 comments
Assignees
Labels
agent-java enhancement Enhancement of an existing feature
Milestone

Comments

@SylvainJuge
Copy link
Member

SylvainJuge commented Nov 21, 2019

Issue description

Community contributions are split in two broad categories

  1. agent "core" and common infrastructure code
  2. agent "plugins", which provide support for specific frameworks/libraries.

Plugins (2) are the easiest way to add features to the agent and are also the most common contribution type.
We (Elastic Java Agent team) currently take ownership of all those added features and merge them to the plugins, which means asking for changes in pull-requests to meet quality/style standards.

While doing so seems fair, it means that merging contributions takes a considerable effort and time, and that our team is often the bottleneck here, often with back and forth with contributors.
Also, keeping pull-requests open for a while makes them harder to merge.

Proposed solution

https://docs.google.com/document/d/1LFTBDLM3-Sj7cU1pHFxfgI3donyTMp8tvjFpyJM7dF8/edit

  • Create apm-agent-plugin-sdk: used both by internal and external/community plugins
    • agnostic of the tracer API
  • External plugins can use the public API or the OpenTracing API bridge
  • External plugins are loadable from $AGENT_HOME/plugins
  • Testing: relies on internal API and internal test helpers such as AbstractInstrumentationTest and MockReporter
    • Creating and maintaining abstractions that are backwards compatible doesn't seem feasible and will always lag behind
    • The important objective of the plugin being compatible for the users of the plugin is still maintained
    • When the plugin maintainer bumps the agent version, they have to resolve breaking changes in test code
    • Makes it easier to eventually pull external plugins into the core agent as the same testing framework is used.
  • Plugin registry
    • Would be just the supported technologies page linking to external GH repos
    • The plugins can be downloaded from the releases page

Tasks

@SylvainJuge
Copy link
Member Author

Also, we currently have "incubating" features (currently JMS is one of them), those are disabled by default because disable_instrumentations=incubating in the default configuration.

There are two possible approaches:

  1. reuse the same feature and set disable_instrumentations=incubating,community by default
  2. create a dedicated opt-in option for enabled community instrumentations enabled_community_instrumentations that would be empty by default.

Pros/cons

  1. will automatically enable community-supported plugins if default value is changed, even if only a few users do, that would be a breaking change
  2. adds another configuration option

@felixbarny
Copy link
Member

I would prefer the explicit opt-in via enabled_community_instrumentations

@SylvainJuge
Copy link
Member Author

I've changed a bit my mind for the naming part, thus I feel now that community is a better choice for the following reasons:

  • it's independent from maturity of the plugin: alpha/beta/incubating, we currently use instrumentation group names for this purpose.
  • it focuses on ownership, and here our goal is to 1) make contributions easier and 2) set users expectations

Also, we should probably start to have a distinction between core plugins and community plugins for naming, but we will keep the existing structure for the core part.

+1 on using a single enabled_community_instrumentations to opt-in those, as adding new ones will not be made active when updating the agent.

@SylvainJuge SylvainJuge added this to the 7.7 milestone Mar 10, 2020
@felixbarny
Copy link
Member

allow building plugins outside of agent codebase & load them dynamically

Let's discuss whether that would be a feasible option.

It has the following advantages:

  • Users can independently develop and distribute plugins
  • Clearer ownership as the plugins would be in a separate GitHub repo
  • Plugins can incubate independently from the main repo and as they mature, we can pull them in

Downsides of the initial suggestion of creating a dedicated contrib folder within the main repo:

  • we might miss the main objective of having community-owned plugins: we would still be the "gatekeepers" and might be slow to respond to PRs or to merge them in.

The challenges

  • Our internal API can always change without considering that a breaking change
  • Only the public API is guaranteed to stay stable within a major version
    • The public API is does not have the full functionality of the internal API
    • The performance is worse, as transactions and spans have to be wrapped
    • There's no support for auto-instrumentation

What we currently recommend to users that want to add plugins for their custom frameworks, which should not be included in the main repo, is to fork the repo and add an additional plugin in the apm-agents-plugin folder. They already have to deal with changes in the internal API and have to rebase with the upstream and build their own fork whenever they want to update the agent version.

I think it makes sense to fold the two use cases of community plugins and custom plugins and improve the experience of developing them in the following ways:

  • Create examples/tutorials/blogs/templates/maven archetypes for external plugins
  • Make plugins loadable from the $AGENT_HOME/plugins directory
  • List community plugins in the contrib repo or in the supported technologies docs.
  • Documenting the path how to get a community plugin into the main repo
    As those plugins work in the same way as internal plugins, it is very easy, from a technical standpoint, to just move those plugins into the apm-agent-plugins folder.
  • On a best effort basis, list changes in the internal API in the release notes

The plugin development for Kibana and Elasticsearch is actually quite similar. Maybe the situation has changed since I had my last touchpoint with it but in Kibana, you had to explicitly declare for which version an external plugin works because the internal APIs can change without notice.

@SylvainJuge
Copy link
Member Author

I really like this proposal as it's definitely a "best of both worlds" option, without much extra complexity on agent architecture.

We should probably add a way to check for agent/plugin compatibility, probably using jar manifest, that would allow us to still break things when needed, and then incompatible plugins would be disabled with a proper warning and an optional link to plugin github repository (if any).

On a side note, that probably means publishing our internal APIs as regular jars, which might be a bit confusing when people are searching maven central (plugins outside of our codebase will depend on those). This is quite minor and if we make a good walkthrough/doc we can make this user-proof.

@felixbarny
Copy link
Member

On a side note, that probably means publishing our internal APIs as regular jars

I don’t think we’d be doing that. Users need to compile against the shaded version of Byte Buddy, for example. Do they’d just have a provided dependency on the elastic-apm-agent module, which we already publish on maven central.

@eyalkoren
Copy link
Contributor

I like this discussion and I agree that moving outside the repo is important in order to achieve what we want. I am not feeling comfortable doing that and relying on our internal APIs though.

I just did two refactors lately that we wanted to do in our internal API. One would have practically break all community plugins and the other would have break those creating transactions (currently in progress are such for Ratpack, WebFlux, RocketMQ, Dubbo and probably Scala). We already know of a future refactor we have in mind that will practically break all in multiple places. I can easily see how having 30 plugins being affected by such changes affect our decisions, or the effort they take. Start breaking things for all the users of those plugins upon upgrade is unpleasant.

What if we did what @felixbarny suggested but allow usage of the public API only?
Regarding the related listed concerns:

The public API is does not have the full functionality of the internal API

True, but it is already reacher than the OpenTracing API that was proved useful for creation of custom plugins. Like we did so far, we can be attentive and adjust it to support future requirements.

The performance is worse, as transactions and spans have to be wrapped

I am pretty sure that wouldn't be the major performance concern for most community plugins that we are not involved in developing. The fact that these plugins are likely to be not-as-efficient is inherent and I believe acceptable. Going back to OpenTracing- it has the same (or worse) limitations and it got its adoption.

There's no support for auto-instrumentation

We can do the same as suggested above - loading plugins from the $AGENT_HOME/plugins directory.

@felixbarny
Copy link
Member

Definitely valid concerns.

I can easily see how having 30 plugins being affected by such changes affect our decisions, or the effort they take. Start breaking things for all the users of those plugins upon upgrade is unpleasant.

Part of this would be educating users that things break at any time. It should not influence our decisions about refactoring the internal API, however.

Also, note that we're already in this situation for users that are adding plugins for their internal frameworks. But I acknowledge that promoting public API-based community plugins would expose many more users to this problem.

What if we did what @felixbarny suggested but allow usage of the public API only?

Why not both 🙂
We could let the user decide which API (internal or public API) to use.

However, there are some technical complications with allowing the public API in auto-instrumentation plugins.

  • Plugins using the public API would still have a provided dependency to elastic-apm-agent in order to leverage type matchers and advice.
  • At the moment, the core of the agent is loaded form the bootstrap class loader
    • As a consequence, plugins can't ship with any libraries, including the public API, as it could introduce version conflicts.
    • We'd have to load the plugins from an isolated class loader to avoid conflicts. However, the consequence is that advices, which are inlined in the methods they instrument, can't access static variables declared in that plugin for example.
    • Public API-based auto instrumentation plugins could become viable again as we rework the agent/advice loading architecture to require using method handles to call the agent from within an advice.

In the meantime, I'd suggest starting with internal API plugins and consider public API plugins after we have refactored the agent to allow for that.

@eyalkoren
Copy link
Contributor

Plugins using the public API would still have a provided dependency to elastic-apm-agent in order to leverage type matchers and advice.

Right, although this is not agent code. They can rely on ByteBuddy and the build will shade those. Or other solution.

As a consequence, plugins can't ship with any libraries, including the public API, as it could introduce version conflicts.

They don't need to ship with it, the API can be provided as well. There is no expectation that these plugins will work without an agent.

@SylvainJuge SylvainJuge removed this from the 7.7 milestone Mar 12, 2020
@SylvainJuge
Copy link
Member Author

Update on the spec after today's discussion.

Summary

  • community plugins should not be part of our codebase
  • we will profide and maintain a list of available community plugins
  • those plugins will be separate binaries loaded by the agent at runtime
  • we defined a custom instrumentation ladder (see below) where community plugins are the 3rd level

Custom instrumentation ladder

1st level

  • Instrument application itself through explicit changes in application code
  • Relies on pubic agent API
  • User manual: mostly documentation

2cnd level

  • Instrument framework using interceptors (filters, event hooks, framework interceptors, ...)
  • Relies on pubic agent API
  • Shipped as regular application dependencies
  • Promoted through list of community plugins
  • User manual: documentation + blog posts + contrib repository samples

3rd level

  • Instrument framework using byte-buddy advices (auto-instrumentation)
  • Relies on internal agent API (thus more likely to break).
  • Shipped as external community agent plugins, outside of application dependencies
  • Major technical requirement: depends on changing current plugin classloading/isolation
  • Strong version agent/plugin version check (up to patch version).
  • Promoted through list of community plugins
  • User manual: Documentation + blog posts + contrib repository samples

4th level

  • Instrument framework using byte-buddy advices (auto-instrumentation)
  • Shipped as core agent plugins, embedded within the agent
  • Plugin is part of our official roadmap
  • Plugins promoted to this stage require a 3rd level plugin

Given there is a strong technical dependency on solving classloading/isolation, we will need to solve this first as it's a pre-requisite.

@felixbarny
Copy link
Member

@roncohen @alvarolobato As you raised some concerns with the previous plan to have community plugins within the main repo, could we get your opinion on the new plan?

See #937 (comment) and #937 (comment)

@alvarolobato
Copy link
Contributor

Thanks team for looking into this, this is a key initiative

I read through the proposal and it looks good to me with the exception of allowing plugins to rely on the internal API.
You argue that it is a decision of the user to use the internal API or not, but the reality is that it is a decision of the developer of the plugin, not the final user of the plugin.

My concern is getting into a situation where plugins are hard to maintain because we break the internal API, maintainers take a long time to update and our users start feeling the pain.

This also can happen with public API's, but that's the objective of public API's and that's why we are so careful when breaking them.

I've lived this situation and is very painful and in the end, it tends to hamper your progress because you are conditioned to not break your own internal API.

Having the plugins in an external repo is something we need, but it will make all this worse by making it very hard to do a breaking change that also fixes the plugin (we also don't want to do for community plugins we don't maintain).

I don't fully understand why we want to allow internal API, what are the advantages over just allowing public API?

@felixbarny
Copy link
Member

The truth is that to develop an auto-instrumentation plugin, you not only rely on the tracer API (to create transactions/spans). You also need to define the instrumentation. Byte Buddy helps in that regard as it lets you define matchers (which methods to instrument) and advices (the code to be injected into methods). We have developed lots of glue code that helps to navigate the issues around class visibility. This glue code is subject to change as well. In fact, we have plans for a major overhaul of this in order to properly support runtime attachment.

If we'd only allow access to the public API for external plugins, developers would be left without support when it comes to the hardest part of agent development: classloading/visibility. If we give them access to those helpers, we're basically allowing them to access the internals of the agent: any changes we make would be breaking ones.

Maybe we could extract some of these helpers to a separate module which is guaranteed to be stable. Or even try to work towards getting some of those changes into Byte Buddy.

But developing plugins only with the public API will always be quite limiting. There are a number of things you wouldn't be able to do like

  • Adding configuration options
  • Add lifecycle listeners (callbacks on start/stop/pause/resume)
  • Add activation listeners (callback when a span is activated/deactivated)
  • object pooling
  • I have probably forgotten some

While not all plugins may need that, for some the need will eventually arise. In that case, there's no way around relying on internal APIs

@felixbarny
Copy link
Member

Maybe we could extract some of these helpers to a separate module which is guaranteed to be stable. Or even try to work towards getting some of those changes into Byte Buddy.

I just had a chat with @roncohen about this.

He also shared concerns around the implications of more people relying on the internal API. We would either break lots of plugins by making changes in the internal API or be forced to maintain backwards compatibility for our own internal API. Both don't sound like good options.

We have discussed what we could have a plugin SDK, possibly in a separate repo, that allows people to write bytecode instrumentation plugins without relying on the internal API. The SDK would contain the public API (which could be made a bit more powerful in future versions) and an opinionated Byte Buddy setup, including things similar to ElasticApmInstrumentation and HelperClassManager.
However, there should be no shared code among the internal API and the plugin SDK so that we don't break the SDK API when changing something in the agent core. Instead, we may adapt some the agent core classes and duplicate them in the plugin SDK repo. This makes it easier to maintain compatibility.

Over time, we can make the SDK more powerful. For example, we could allow adding additional configuration options. But to start with it can be lightweight, even if it means it's not as powerful as the internal API.

This has a dependency on #1085, as we are revising the class loading and HelperClassManager mechanism in the core agent.

@alvarolobato
Copy link
Contributor

SGTM, thanks

@felixbarny
Copy link
Member

I've created a doc with the proposed changes in the module structure of the agent https://docs.google.com/document/d/1LFTBDLM3-Sj7cU1pHFxfgI3donyTMp8tvjFpyJM7dF8/edit

TL;DR:

  • Create apm-agent-plugin-sdk: used both by internal and external/community plugins
    • agnostic of the tracer API
  • External plugins can use the public API or the OpenTracing API bridge
  • External plugins are loadable from $AGENT_HOME/plugins
  • Testing: relies on internal API and internal test helpers such as AbstractInstrumentationTest and MockReporter
    • Creating and maintaining abstractions that are backwards compatible doesn't seem feasible and will always lag behind
    • The important objective of the plugin being compatible for the users of the plugin is still maintained
    • When the plugin maintainer bumps the agent version, they have to resolve breaking changes in test code
    • Makes it easier to eventually pull external plugins into the core agent as the same testing framework is used.
  • Plugin registry
    • Would be just the supported technologies page linking to external GH repos
    • The plugins can be downloaded from the releases page

@SylvainJuge SylvainJuge added enhancement Enhancement of an existing feature and removed [zube]: Backlog labels Feb 1, 2021
@AlexanderWert AlexanderWert added this to the 7.16 milestone Aug 26, 2021
@AlexanderWert
Copy link
Member

first iteration on external plugins support is completed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java enhancement Enhancement of an existing feature
Projects
None yet
Development

No branches or pull requests

6 participants