-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
APM Metering API #99832
APM Metering API #99832
Conversation
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types: * Gauge - a signle number that can go up or down * Histogram - bucketed samples * Counter - monotonically increaming summed value * UpDownCounter - summed value that may decrease Supports both Long* and Double* versions of the instruments. Instruments can be registered and retrieved by MetricName through APMMeter which is available via the APMTelemetryProvider. The metering provider starts as the open telemetry noop provider.
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Hi @stu-elastic, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good. I'm really pleased with the relative simplicity of the api, and am happy you were able to isolate the implementation.
I left some thoughts. In general the main things to look at are adding javadocs and thinking through some of the terminology (it was a bit confusing at times while reading through).
|
||
import static org.elasticsearch.telemetry.apm.internal.APMAgentSettings.APM_ENABLED_SETTING; | ||
|
||
public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would APMMeter be a better name? It matches more with "APMTracer" IMO, and also matches the underly otel terminology, while metric sounds to me like a single datapoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ on Meter to match OTEL terminology
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to APMMeter
.
this.enabled = APM_ENABLED_SETTING.get(settings); | ||
} | ||
|
||
public void setEnabled(boolean enabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this called? Doesn't it need to be wired up to the dynamic enabled setting? Also, are there any concurrency considerations then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is here wired to dynamic cluster settings
https://github.com/elastic/elasticsearch/pull/99832/files#diff-8e5ca1315006cc00c58ef42d4bfc14f50b30d11567a845f8e551ecf3e52c830bR58
regarding concurrency considerations we concluded that:
- enabled should be volatile
- Instruments.setProvider method is acquiring a lock. It needs this so that new instruments are not added at the same time when the implementation of the
Meter
(noop/otel) is set - whenever we add a new instrument via register* methods we need to acquire a lock
adding new instruments calling setProvider should be rare (we expect those to be called at most once during node lifetime)
|
||
public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric { | ||
private static final Meter NOOP = OpenTelemetry.noop().getMeter("noop"); | ||
private final Instruments instruments = new Instruments(NOOP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this start with the correct instrument if enabled on construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import java.util.Objects; | ||
|
||
public class MetricName { | ||
public static final MetricName EMPTY = new MetricName("EMPTY"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case do we need an empty metric name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOOP meter from the TelemetryProvider. Currently used by default in a bunch of tests that don't care about metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed MetricName
and changed these names to noop
.
|
||
import java.util.Objects; | ||
|
||
public class MetricName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class seems like a simple wrapper around a String name. Any reason we can't just use String, or at worse a record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to make it easier to find metric names in the code base based on inspiration of TracerId
but we can start with a String and refactor if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case we should remove MetricName class? I cannot see if it used.
in https://github.com/elastic/elasticsearch/pull/99832/files#r1336150731 you mention you removed, perhaps you missed to commit it
|
||
package org.elasticsearch.telemetry.metric; | ||
|
||
public interface Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadocs on this class and methods will help other devs learn how to use the api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
import java.util.Map; | ||
|
||
public interface LongUpDownCounter extends Instrument { | ||
void add(long inc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me when one would be expected to use these 3 variants of add. Javadocs would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the threadcontext version and added javadocs.
|
||
@Override | ||
LongUpDownCounter buildInstrument(Meter meter) { | ||
return Objects.requireNonNull(meter).upDownCounterBuilder(getName()).setDescription(getDescription()).setUnit(getUnit()).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider making a local var for the builder so this is not such a long line (difficult to read).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in all the adapters will long lines.
import java.util.Map; | ||
|
||
public interface LongUpDownCounter extends Instrument { | ||
void add(long inc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me when one would be expected to use these 3 variants of add. Javadocs would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the threadcontext version and added javadocs.
import java.util.Objects; | ||
import java.util.concurrent.atomic.AtomicReference; | ||
|
||
public abstract class AbstractInstrument<T, I> implements Instrument { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The terminology here is a bit confusing. This is an instrument, but it also has an instrument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more like a delegate/decorator pattern.
One role of this class is to replace the implementation of the underlying delegate instrument. Perhaps we could rename this?
Brainstorming some ideas for a name: SwitchableInstrument
, AbstractSwitchableInstrument
(I am biased on my previous draft)
But is also has buildInstrument
which fits the simple AbstractInstrument
too.
alternatively we could just rename instrumentRef
to delegate
. Although I am not sure how much it helps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, left comments
|
||
import static org.elasticsearch.telemetry.apm.internal.APMAgentSettings.APM_ENABLED_SETTING; | ||
|
||
public class APMMetric extends AbstractLifecycleComponent implements org.elasticsearch.telemetry.metric.Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ on Meter to match OTEL terminology
private static final Meter NOOP = OpenTelemetry.noop().getMeter("noop"); | ||
private final Instruments instruments = new Instruments(NOOP); | ||
private volatile boolean enabled; | ||
private final AtomicReference<APMServices> services = new AtomicReference<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use services in this implementation. All we need is instruments
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
import java.util.Objects; | ||
import java.util.concurrent.atomic.AtomicReference; | ||
|
||
public abstract class AbstractInstrument<T, I> implements Instrument { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more like a delegate/decorator pattern.
One role of this class is to replace the implementation of the underlying delegate instrument. Perhaps we could rename this?
Brainstorming some ideas for a name: SwitchableInstrument
, AbstractSwitchableInstrument
(I am biased on my previous draft)
But is also has buildInstrument
which fits the simple AbstractInstrument
too.
alternatively we could just rename instrumentRef
to delegate
. Although I am not sure how much it helps
modules/apm/src/main/java/org/elasticsearch/telemetry/apm/internal/metrics/Instruments.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 comments.
I think we should also add some tests (I will look into this and try to commit to this PR)
alternatively the API (interfaces only) could be split from this PR and merged. And we could iterate on the implementation and keep on adding tests/improvements
record APMServices(Meter meter, OpenTelemetry openTelemetry) {} | ||
|
||
public APMMetric(Settings settings) { | ||
this.enabled = APM_ENABLED_SETTING.get(settings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call setEnabled here. currently a default value for tracing.apm.enabled
is false, so there will always be a cluster state update event that will update the value to true.
However, if we at some point change the default to true, there will be no setting value change, so setEnabled
will never be called from cluster settings updater https://github.com/elastic/elasticsearch/blob/main/modules/apm/src/main/java/org/elasticsearch/telemetry/apm/internal/APMAgentSettings.java#L45
I think we can mark the class final, so that we can safely call setEnabled (it is a public method, so there is a risk of this escape
problem if someone extend this class).
this.enabled = APM_ENABLED_SETTING.get(settings); | ||
} | ||
|
||
public void setEnabled(boolean enabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is here wired to dynamic cluster settings
https://github.com/elastic/elasticsearch/pull/99832/files#diff-8e5ca1315006cc00c58ef42d4bfc14f50b30d11567a845f8e551ecf3e52c830bR58
regarding concurrency considerations we concluded that:
- enabled should be volatile
- Instruments.setProvider method is acquiring a lock. It needs this so that new instruments are not added at the same time when the implementation of the
Meter
(noop/otel) is set - whenever we add a new instrument via register* methods we need to acquire a lock
adding new instruments calling setProvider should be rare (we expect those to be called at most once during node lifetime)
Great to see metric infrastructure being added to Elasticsearch! Just a drive-by question out of curiosity. Why is this abstracting/wrapping the OpenTelemetry Metric API into an Elasticsearch-specific metrics API instead of directly providing access to the OpenTelemetry Metric API? |
@felixbarny Using OTEL api directly would require introducing this as a dependency to all the modules/plugins that want to use it. With metrics (but tracing too) we also want to have control when the feature is enabled (hence the 'switcheable' abstract implementation) |
private final T unit; | ||
|
||
public AbstractInstrument(Meter meter, MetricName name, String description, T unit) { | ||
this.instrumentRef = new AtomicReference<>(buildInstrument(meter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a problem with this approach.
- buildInstrument implementations are relying on the
getName
, getDescription and getUnit methods. THose are not initialised yet. - the implementing classes might implement this 'in a wrong way' (unlikely but still) leading to concurrency problems. Perhaps we can mitigate this by sealed class.
I think we should change the signature of buildInstrument method to accept name, description and unit and we could pass it as a constructor parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, moved buildInstrument last to avoid other fields being uninitialized.
… into meter-noop-api
…/elasticsearch into meter-noop-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, I will address comments from this review
@@ -159,7 +167,7 @@ public void setAgentSetting(String key, String value) { | |||
|
|||
public static final Setting<Boolean> APM_ENABLED_SETTING = Setting.boolSetting( | |||
APM_SETTING_PREFIX + "enabled", | |||
false, | |||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should switch this back to false before we merge and override this in serverless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a commit to do this.
return provider.get(); | ||
} | ||
|
||
private void destroyApmServices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could refactor this method to return noop instance.
(same as above createApmServices)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a commit to do this.
this.otelMeterSupplier = otelMeterSupplier; | ||
this.noopMeterSupplier = noopMeterSupplier; | ||
this.instruments = new Instruments(enabled ? createApmServices() : noopMeterSupplier.get()); | ||
setEnabled(enabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the above line, we no longer have to call setEnabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a commit to do this.
|
||
import java.util.Objects; | ||
|
||
public class MetricName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case we should remove MetricName class? I cannot see if it used.
in https://github.com/elastic/elasticsearch/pull/99832/files#r1336150731 you mention you removed, perhaps you missed to commit it
It was quite a long PR to go through but worth it. I really like how linear the API is, and how the OTel implementation is hidden effectively. I personally don't like abstract classes like AbstractInstrument (I like a very flat hierarchy) but it's a matter of personal taste and I can see why it is very convenient here. |
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options "enable_experimental_instrumentations", "true" which is required to enable the otel-metrics-instrumentation. relates #99832
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types: * Gauge - a single number that can go up or down * Histogram - bucketed samples * Counter - monotonically increasing summed value * UpDownCounter - summed value that may decrease Supports both Long* and Double* versions of the instruments. Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider. The metering provider starts as the open telemetry noop provider. `telemetry.metrics.enabled` turns on metering.
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options "enable_experimental_instrumentations", "true" which is required to enable the otel-metrics-instrumentation. relates elastic#99832
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types: * Gauge - a single number that can go up or down * Histogram - bucketed samples * Counter - monotonically increasing summed value * UpDownCounter - summed value that may decrease Supports both Long* and Double* versions of the instruments. Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider. The metering provider starts as the open telemetry noop provider. `telemetry.metrics.enabled` turns on metering.
The latest version contains a fix to allow sending metrics to APM server. also adds a apm agent jvm options "enable_experimental_instrumentations", "true" which is required to enable the otel-metrics-instrumentation. relates elastic#99832
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types:
Supports both Long* and Double* versions of the instruments.
Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.
The metering provider starts as the open telemetry noop provider.