Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APM Metering API #99832

Merged
merged 26 commits into from
Sep 29, 2023
Merged

APM Metering API #99832

merged 26 commits into from
Sep 29, 2023

Conversation

stu-elastic
Copy link
Contributor

@stu-elastic stu-elastic commented Sep 22, 2023

Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types:

  • Gauge - a single number that can go up or down
  • Histogram - bucketed samples
  • Counter - monotonically increasing summed value
  • UpDownCounter - summed value that may decrease

Supports both Long* and Double* versions of the instruments.

Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.

The metering provider starts as the open telemetry noop provider.

Adds Metering instrument interfaces and adapter implementations for
opentelemetry instrument types:
* Gauge - a signle number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increaming summed value
* UpDownCounter - summed value that may decrease

Supports both Long* and Double* versions of the instruments.

Instruments can be registered and retrieved by MetricName through APMMeter
which is available via the APMTelemetryProvider.

The metering provider starts as the open telemetry noop provider.
@stu-elastic stu-elastic added >enhancement :Core/Infra/Core Core issues without another label labels Sep 22, 2023
@elasticsearchmachine elasticsearchmachine added v8.11.0 Team:Core/Infra Meta label for core/infra team labels Sep 22, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @stu-elastic, I've created a changelog YAML for you.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. I'm really pleased with the relative simplicity of the api, and am happy you were able to isolate the implementation.

I left some thoughts. In general the main things to look at are adding javadocs and thinking through some of the terminology (it was a bit confusing at times while reading through).


import static org.elasticsearch.telemetry.apm.internal.APMAgentSettings.APM_ENABLED_SETTING;

public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would APMMeter be a better name? It matches more with "APMTracer" IMO, and also matches the underly otel terminology, while metric sounds to me like a single datapoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on Meter to match OTEL terminology

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to APMMeter.

this.enabled = APM_ENABLED_SETTING.get(settings);
}

public void setEnabled(boolean enabled) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this called? Doesn't it need to be wired up to the dynamic enabled setting? Also, are there any concurrency considerations then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is here wired to dynamic cluster settings
https://github.com/elastic/elasticsearch/pull/99832/files#diff-8e5ca1315006cc00c58ef42d4bfc14f50b30d11567a845f8e551ecf3e52c830bR58
regarding concurrency considerations we concluded that:

  1. enabled should be volatile
  2. Instruments.setProvider method is acquiring a lock. It needs this so that new instruments are not added at the same time when the implementation of the Meter (noop/otel) is set
  3. whenever we add a new instrument via register* methods we need to acquire a lock
    adding new instruments calling setProvider should be rare (we expect those to be called at most once during node lifetime)


public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric {
private static final Meter NOOP = OpenTelemetry.noop().getMeter("noop");
private final Instruments instruments = new Instruments(NOOP);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this start with the correct instrument if enabled on construction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

import java.util.Objects;

public class MetricName {
public static final MetricName EMPTY = new MetricName("EMPTY");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case do we need an empty metric name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOOP meter from the TelemetryProvider. Currently used by default in a bunch of tests that don't care about metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed MetricName and changed these names to noop.


import java.util.Objects;

public class MetricName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class seems like a simple wrapper around a String name. Any reason we can't just use String, or at worse a record?

Copy link
Contributor Author

@stu-elastic stu-elastic Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to make it easier to find metric names in the code base based on inspiration of TracerId but we can start with a String and refactor if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case we should remove MetricName class? I cannot see if it used.
in https://github.com/elastic/elasticsearch/pull/99832/files#r1336150731 you mention you removed, perhaps you missed to commit it


package org.elasticsearch.telemetry.metric;

public interface Metric {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadocs on this class and methods will help other devs learn how to use the api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

import java.util.Map;

public interface LongUpDownCounter extends Instrument {
void add(long inc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me when one would be expected to use these 3 variants of add. Javadocs would help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the threadcontext version and added javadocs.


@Override
LongUpDownCounter buildInstrument(Meter meter) {
return Objects.requireNonNull(meter).upDownCounterBuilder(getName()).setDescription(getDescription()).setUnit(getUnit()).build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider making a local var for the builder so this is not such a long line (difficult to read).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in all the adapters will long lines.

import java.util.Map;

public interface LongUpDownCounter extends Instrument {
void add(long inc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me when one would be expected to use these 3 variants of add. Javadocs would help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the threadcontext version and added javadocs.

import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

public abstract class AbstractInstrument<T, I> implements Instrument {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The terminology here is a bit confusing. This is an instrument, but it also has an instrument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more like a delegate/decorator pattern.
One role of this class is to replace the implementation of the underlying delegate instrument. Perhaps we could rename this?
Brainstorming some ideas for a name: SwitchableInstrument , AbstractSwitchableInstrument (I am biased on my previous draft)

But is also has buildInstrument which fits the simple AbstractInstrument too.

alternatively we could just rename instrumentRef to delegate. Although I am not sure how much it helps

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, left comments


import static org.elasticsearch.telemetry.apm.internal.APMAgentSettings.APM_ENABLED_SETTING;

public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on Meter to match OTEL terminology

private static final Meter NOOP = OpenTelemetry.noop().getMeter("noop");
private final Instruments instruments = new Instruments(NOOP);
private volatile boolean enabled;
private final AtomicReference<APMServices> services = new AtomicReference<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use services in this implementation. All we need is instruments right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

public abstract class AbstractInstrument<T, I> implements Instrument {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more like a delegate/decorator pattern.
One role of this class is to replace the implementation of the underlying delegate instrument. Perhaps we could rename this?
Brainstorming some ideas for a name: SwitchableInstrument , AbstractSwitchableInstrument (I am biased on my previous draft)

But is also has buildInstrument which fits the simple AbstractInstrument too.

alternatively we could just rename instrumentRef to delegate. Although I am not sure how much it helps

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 comments.
I think we should also add some tests (I will look into this and try to commit to this PR)
alternatively the API (interfaces only) could be split from this PR and merged. And we could iterate on the implementation and keep on adding tests/improvements

record APMServices(Meter meter, OpenTelemetry openTelemetry) {}

public APMMetric(Settings settings) {
this.enabled = APM_ENABLED_SETTING.get(settings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call setEnabled here. currently a default value for tracing.apm.enabled is false, so there will always be a cluster state update event that will update the value to true.
However, if we at some point change the default to true, there will be no setting value change, so setEnabled will never be called from cluster settings updater https://github.com/elastic/elasticsearch/blob/main/modules/apm/src/main/java/org/elasticsearch/telemetry/apm/internal/APMAgentSettings.java#L45

I think we can mark the class final, so that we can safely call setEnabled (it is a public method, so there is a risk of this escape problem if someone extend this class).

this.enabled = APM_ENABLED_SETTING.get(settings);
}

public void setEnabled(boolean enabled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is here wired to dynamic cluster settings
https://github.com/elastic/elasticsearch/pull/99832/files#diff-8e5ca1315006cc00c58ef42d4bfc14f50b30d11567a845f8e551ecf3e52c830bR58
regarding concurrency considerations we concluded that:

  1. enabled should be volatile
  2. Instruments.setProvider method is acquiring a lock. It needs this so that new instruments are not added at the same time when the implementation of the Meter (noop/otel) is set
  3. whenever we add a new instrument via register* methods we need to acquire a lock
    adding new instruments calling setProvider should be rare (we expect those to be called at most once during node lifetime)

@felixbarny
Copy link
Member

Great to see metric infrastructure being added to Elasticsearch!

Just a drive-by question out of curiosity. Why is this abstracting/wrapping the OpenTelemetry Metric API into an Elasticsearch-specific metrics API instead of directly providing access to the OpenTelemetry Metric API?

@pgomulka
Copy link
Contributor

Why is this abstracting/wrapping the OpenTelemetry Metric API into an Elasticsearch-specific metrics API instead of directly providing access to the OpenTelemetry Metric API?

@felixbarny Using OTEL api directly would require introducing this as a dependency to all the modules/plugins that want to use it. With metrics (but tracing too) we also want to have control when the feature is enabled (hence the 'switcheable' abstract implementation)
More generally abstracting the 3rd party library allows us to hide the implementation and refactor if we ever need this.

private final T unit;

public AbstractInstrument(Meter meter, MetricName name, String description, T unit) {
this.instrumentRef = new AtomicReference<>(buildInstrument(meter));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a problem with this approach.

  1. buildInstrument implementations are relying on the getName, getDescription and getUnit methods. THose are not initialised yet.
  2. the implementing classes might implement this 'in a wrong way' (unlikely but still) leading to concurrency problems. Perhaps we can mitigate this by sealed class.

I think we should change the signature of buildInstrument method to accept name, description and unit and we could pass it as a constructor parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, moved buildInstrument last to avoid other fields being uninitialized.

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, I will address comments from this review

@@ -159,7 +167,7 @@ public void setAgentSetting(String key, String value) {

public static final Setting<Boolean> APM_ENABLED_SETTING = Setting.boolSetting(
APM_SETTING_PREFIX + "enabled",
false,
true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should switch this back to false before we merge and override this in serverless

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a commit to do this.

return provider.get();
}

private void destroyApmServices() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could refactor this method to return noop instance.
(same as above createApmServices)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a commit to do this.

this.otelMeterSupplier = otelMeterSupplier;
this.noopMeterSupplier = noopMeterSupplier;
this.instruments = new Instruments(enabled ? createApmServices() : noopMeterSupplier.get());
setEnabled(enabled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the above line, we no longer have to call setEnabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a commit to do this.


import java.util.Objects;

public class MetricName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case we should remove MetricName class? I cannot see if it used.
in https://github.com/elastic/elasticsearch/pull/99832/files#r1336150731 you mention you removed, perhaps you missed to commit it

@ldematte
Copy link
Contributor

It was quite a long PR to go through but worth it. I really like how linear the API is, and how the OTel implementation is hidden effectively. I personally don't like abstract classes like AbstractInstrument (I like a very flat hierarchy) but it's a matter of personal taste and I can see why it is very convenient here.
Looks GREAT to me 👍

@stu-elastic stu-elastic merged commit f8d09e9 into elastic:main Sep 29, 2023
stu-elastic pushed a commit that referenced this pull request Sep 29, 2023
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options
"enable_experimental_instrumentations", "true"
which is required to enable the otel-metrics-instrumentation.

relates #99832
piergm pushed a commit to piergm/elasticsearch that referenced this pull request Oct 2, 2023
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types:
* Gauge - a single number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increasing summed value
* UpDownCounter - summed value that may decrease

Supports both Long* and Double* versions of the instruments.

Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.

The metering provider starts as the open telemetry noop provider.

`telemetry.metrics.enabled` turns on metering.
piergm pushed a commit to piergm/elasticsearch that referenced this pull request Oct 2, 2023
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options
"enable_experimental_instrumentations", "true"
which is required to enable the otel-metrics-instrumentation.

relates elastic#99832
jakelandis pushed a commit to jakelandis/elasticsearch that referenced this pull request Oct 2, 2023
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types:
* Gauge - a single number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increasing summed value
* UpDownCounter - summed value that may decrease

Supports both Long* and Double* versions of the instruments.

Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.

The metering provider starts as the open telemetry noop provider.

`telemetry.metrics.enabled` turns on metering.
jakelandis pushed a commit to jakelandis/elasticsearch that referenced this pull request Oct 2, 2023
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options
"enable_experimental_instrumentations", "true"
which is required to enable the otel-metrics-instrumentation.

relates elastic#99832
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants