Allow to unregister/stop/destroy instruments #2232
Comments
(Following https://github.com/open-telemetry/opentelemetry-specification/blob/main/CONTRIBUTING.md#issue-triaging process). @jsuereth I am reassigning this to you, I believe you know the context of this better. |
@mateuszrzeszutek micrometer is an API + Some implementation(s). I am not sure why that should translate into calls to our API instead of offering an implementation of micrometer that implements the /cc @jsuereth |
@bogdandrutu Micrometer was just one example but do you think apps should not have a way to stop reporting metrics? Dropwizard metrics in Java also has |
There are advantages to calling OTel API in micrometer bridge instrumentation:
Also, micrometer aside, the unregister/remove functionality is still very much needed. The database connection pool example from my first post here is still valid. |
Just another note about Micrometer, I expect it to be used for a very long time by Spring and likely Spring users too as a result. For example, for tracing they will use Micrometer Tracing, not the OpenTelemetry tracing API. I think the reality of the Java ecosystem means we should consider Micrometer as a first class API for Java metrics, and this seems OK to me in practice. As such, I'd like Micrometer usage to benefit from features like exemplars / baggage which means going through our API (or well our SDK implementations directly maybe, but not |
It is just a representation difference, it is still a Histogram, I don't see the point.
The micrometer instrumentation is not really an instrumentation; it is actually an SDK/Producer (whatever you call it), since micrometer is an API + Impl to produce metrics. I am not sure any metrics library should be seen as "instrumentation" rather than as a producer of telemetry.
I am not saying to not have an integration (not instrumentation) with micrometer.
It is very hard to do that since their API is not well designed for that (or was not designed with that in mind). FYI: We don't have enough tracing APIs let's create another one :))) |
Well that's true, conceptually it's the same. It results in different
Doesn't that sort of work by default? If you're using a |
I would like to re-phrase this feature request: Can we allow deleting or unregistering Callbacks? For discussion in the Tuesday Jan 25 8AM PT Spec SIG. |
There is a connected topic discussed in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#asynchronous-example-attribute-removal-in-a-view The question is how, if at all, should an SDK know to stop reporting a metric? |
Just wanted to add an example showing that it's fairly easy to introduce memory leaks into apps without the ability to unregister callbacks. For example, if a connection pool is being decommissioned, possibly due to a database resharding, then it should be eligible for garbage collection. But with any natural registration of an async instrument (no complication of weak references), and without the ability to remove the callback from OpenTelemetry, this pool can never be collected. |
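The leak described above can be reproduced with a small, self-contained sketch. It does not use the OpenTelemetry API: `ToyMeter`, `register_callback`, and `ConnectionPool` are hypothetical stand-ins, and the only assumption is that the meter holds strong references to registered callbacks.

```python
import gc
import weakref


class ToyMeter:
    """Hypothetical meter that keeps strong references to its callbacks."""

    def __init__(self):
        self._callbacks = []

    def register_callback(self, callback):
        self._callbacks.append(callback)

        def unregister():
            # Dropping the callback releases the meter's reference to it.
            self._callbacks.remove(callback)

        return unregister

    def collect(self):
        return [callback() for callback in self._callbacks]


class ConnectionPool:
    """Hypothetical pool that exposes one observable value."""

    def __init__(self, name):
        self.name = name
        self.active = 3

    def observe_active(self):
        return (self.name, self.active)


meter = ToyMeter()
pool = ConnectionPool("shard-1")
# The bound method keeps a reference to the pool itself.
unregister = meter.register_callback(pool.observe_active)
pool_ref = weakref.ref(pool)

# The application drops the pool, but the meter still reaches it.
del pool
gc.collect()
still_alive_while_registered = pool_ref() is not None

# After unregistering (and dropping the handle, which also captured the
# callback), nothing references the pool any more.
unregister()
del unregister
gc.collect()
collected_after_unregister = pool_ref() is None
```

In this sketch the pool only becomes collectable once the callback is removed, which is the behavior the comment asks the spec to make possible.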
To back up @anuraaga's comment: while it's not as popular, old-school Java EE servers (like tomcat) or anything that uses hot / dynamic loading of plugins need the ability to clean up their memory and usage so Java can evict their bytecode and RAM. For @bogdandrutu, I want to directly address the notion that Micrometer should use MetricProducer as is done for OpenCensus. The only reason we use the MetricProducer bridge for OpenCensus is that the API is so divergent from OpenTelemetry that we are unable to do a direct API bridge. If we could have done a direct API bridge, we'd have preferred that one. However, because the OpenCensus metric model is close enough to OTel's, we're not losing too much with the MetricProducer bridge. I think @mateuszrzeszutek raises some great points around Metric bridges, and I want to call out a few of them:
An API <-> API bridge provides the easiest path forward for instrumentation authors to accomplish that goal. |
@jsuereth Nice to hear all these great things, but I want to also ensure that we are not adding unnecessary complexity to our APIs to support this, hence the "very divergent" part that you mention. The proposal says if this API has a bad thing we should support it because we prefer this type of integration.
A "MetricProducer bridge" will give that to the micrometer users.
A "MetricProducer bridge" will give that to the micrometer users. Resource can still be attached to the bridge instance.
Not 100% convinced that we cannot offer this, but I think we can with a bit of work. @jsuereth it is very important to distinguish between an "instrumentation user friendly API", which is designed to be used in application instrumentation, and a "sink" API, which is what you are building when wrapping this API. I don't want us to become a "sink" API. If that is the goal, then let's have a sink API design and not try to change the current API, designed for users to instrument applications, to fit that goal. Another argument is that micrometer is not an API :) sorry but it has way too much in that artifact to be considered an API. |
@jsuereth also I do agree that there are good reasons to support unregister/close/etc for an instrument, but I don't think the reason should be that another API has it. |
@jsuereth @mateuszrzeszutek is #2317 resolving this? I don't have that feeling... |
To clarify my intention, I believe #2317 narrowly solves this problem stating the SDK SHOULD give applications a way to create callbacks that support being unregistered, which allows a user to stop certain asynchronous measurements from reporting without shutting down an entire
The Prometheus ecosystem uses a
As the description states, users can simply stop using synchronous instruments if they want to stop reporting measurements; this doesn't seem like a problem we need to help the user with, but if we did, I would suggest that stop/close/destroy simply invalidates the instrument instance so that it can no longer be used to capture measurements.
I suggest we file a separate issue or issues to discuss the following points:
Need for Delete() verb?
The
Need for exporter memory preferences?
In a push-based exporter, whether using cumulative or delta temporality, there's a question of whether to push a value that has not changed. Since we expect cumulative data to be pushed into a Prometheus ecosystem, it seems wise to default to pushing all values, even unchanged ones. In an exporter with a preference for deltas, it's typical to avoid reporting any delta where the delta (sum) or count (histogram) are zero. For synchronous instruments, this is something the SDK can facilitate by detecting stale streams and, after a while, forgetting them. SHOULD the SDK be required to support forgetting streams?
For asynchronous instruments, if the user simply stops reporting values they will stop reporting, unless we require the SDK to detect staleness. Should the Prometheus exporter be responsible for this on its own, or should the SDK facilitate an exporter preference for "synchronous instrument memory"?
This topic is connected with #2132 because if the SDK is required to detect staleness, then no additional memory is required to perform cumulative-to-delta translation.
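The idea of a delta-preferring SDK that skips zero deltas and eventually forgets stale streams can be sketched as follows. This is an illustration only, not how any OpenTelemetry SDK is implemented: `DeltaSumStore`, `max_idle_cycles`, and `tracked_streams` are hypothetical names, and "stale" is assumed to mean "no non-zero delta for a fixed number of collection cycles".

```python
class DeltaSumStore:
    """Hypothetical delta-temporality store for one synchronous sum instrument."""

    def __init__(self, max_idle_cycles=2):
        self._max_idle_cycles = max_idle_cycles
        self._pending = {}  # attribute set -> delta accumulated since last collect
        self._idle = {}     # attribute set -> consecutive collects with no delta

    def add(self, value, attrs):
        key = frozenset(attrs.items())
        self._pending[key] = self._pending.get(key, 0) + value
        self._idle[key] = 0

    def collect(self):
        """Return non-zero deltas and forget streams that have gone stale."""
        points = {}
        for key in list(self._idle):
            delta = self._pending.pop(key, 0)
            if delta != 0:
                points[key] = delta
                continue
            # Nothing to report for this stream in this cycle.
            self._idle[key] += 1
            if self._idle[key] >= self._max_idle_cycles:
                del self._idle[key]  # forget the stale stream entirely
        return points

    def tracked_streams(self):
        return len(self._idle)


store = DeltaSumStore(max_idle_cycles=2)
a = frozenset({"device": "a"}.items())
b = frozenset({"device": "b"}.items())

store.add(5, {"device": "a"})
store.add(1, {"device": "b"})
first = store.collect()    # both streams report

store.add(2, {"device": "a"})
second = store.collect()   # "b" is idle once and is not reported

store.add(1, {"device": "a"})
third = store.collect()    # "b" is idle twice and is forgotten
```

A cumulative exporter would need the opposite default, as the comment notes: it keeps reporting unchanged values, so forgetting a stream there requires an explicit signal rather than an idle counter.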
@mateuszrzeszutek please signal whether simply the option to unregister callbacks is enough, coupled with memory options for exporters? Or are you looking for something like Prometheus' |
I'd like to add to this issue in favor of having a cleanup mechanism in the spec: We perform monitoring of IoT devices which can be added/removed from our systems at any time by operators and would also benefit from being able to stop reporting metrics when those IoT devices are being decommissioned. Currently, the .NET implementation of asynchronous counter/gauges supports removing metric points from a specific metric, but this causes the last value emitted for that metric point to be re-exported ad-infinitum (or just become a stale entry in the metric point list when using delta export mode) by the OTEL SDK implementation. Another thing that would be interesting to specify is stale markers when a time series is known to be "finished". e.g. Prometheus Remote Write also defines stale markers that can tell Prometheus that a time series is known to be stale and immediately marks it as stale so that it no longer shows up. I think it would be nice to have a specified way to do something similar via OTLP. I would be willing to spend some time working on this if there is interest. Thanks for the hard work on OTEL, by the way! EDIT: I had the wrong link to stale markers in Prometheus |
At the moment, it is not possible to destroy instruments that send metrics. Therefore, when a user removes a domain, metrics about it might still be emitted. A temporary workaround is to restart the pulpcore-api process to reload meters. Ref: open-telemetry/opentelemetry-specification#2232 closes pulp#4603
We're elevating this and we have closed #3985 in favour of this issue |
Is there anything I can do to help move the extension of the spec forward? Is there perhaps a PR open with proposed changes to the spec for allowing removal? |
Hey, I'm interested in this and would like to make progress, but first would like to understand some details. Overall I see this as two parts:
Based on the feedback, I think the important part is focusing on 2), as what we have for 1) at this time seems enough. Opinions? cc @jmacd |
Re 1: this is probably not only about the SDK but also about the Metrics API, right? Reference #2232 (comment) |
Asynchronous instruments can already be unregistered: opentelemetry-specification/specification/metrics/api.md Lines 417 to 419 in ca6b2fc
Asynchronous instruments can also already manage cardinality within the callback since they can return only Measurements Synchronous instruments don't have a way to unregister the instrument as a whole, and also don't have a way to delete a single attribute set.
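To illustrate how a callback can manage cardinality on its own, here is a minimal sketch that does not depend on the OpenTelemetry API. The `devices` dictionary and `observe_temperatures` function are hypothetical, and the callback is assumed to return `(attributes, value)` pairs for whatever entities exist at collection time.

```python
# Hypothetical application state: the devices that currently exist.
devices = {"sensor-1": 21.5, "sensor-2": 19.0}


def observe_temperatures():
    """Callback that reports one measurement per device that still exists."""
    return [({"device": name}, value) for name, value in sorted(devices.items())]


first = observe_temperatures()

# Decommissioning a device removes its attribute set from later observations
# without unregistering the callback or the instrument.
del devices["sensor-2"]
second = observe_temperatures()
```

Whether the removed attribute set also disappears from exported data depends on the SDK honoring the observation, which is the point raised about the SDK recommendation later in the thread.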
Being able to stop exporting metric points for deleted asynchronous instruments is already a recommendation: opentelemetry-specification/specification/metrics/sdk.md Lines 769 to 772 in 0319dea
Presumably, if (1) is implemented for synchronous instruments, the SDK would have a similar requirement around not producing data for deleted attribute sets. There are two orthogonal issues that are related to this, but IMO can be solved separately:
Overall, I would recommend starting by trying to define a way to delete an attribute set from a synchronous instrument. |
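One possible shape for deleting a single attribute set from a synchronous instrument is sketched below. It is a toy model and not a proposed API: `ToyCounter` and its `delete` method are hypothetical, loosely modeled on the Prometheus-style `Delete()` verb mentioned in this thread, and cumulative temporality is assumed.

```python
class ToyCounter:
    """Hypothetical cumulative counter that keeps one sum per attribute set."""

    def __init__(self):
        self._sums = {}

    @staticmethod
    def _key(attrs):
        return tuple(sorted(attrs.items()))

    def add(self, value, attrs):
        key = self._key(attrs)
        self._sums[key] = self._sums.get(key, 0) + value

    def delete(self, attrs):
        """Drop one attribute set; return True if it was being tracked."""
        return self._sums.pop(self._key(attrs), None) is not None

    def collect(self):
        # With cumulative temporality, every tracked attribute set is
        # reported on every collection, changed or not.
        return dict(self._sums)


requests = ToyCounter()
requests.add(1, {"domain": "a"})
requests.add(2, {"domain": "b"})
before = requests.collect()

# Without delete(), domain "a" would be reported forever at its last value.
removed = requests.delete({"domain": "a"})
after = requests.collect()
removed_again = requests.delete({"domain": "a"})
```

An open design question this sketch leaves aside is what should happen if `add` is called again for a deleted attribute set: here it would simply start a new sum from zero.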
@carlosalberto the way I understand this issue, we can see this as an API, an SDK, or a data model problem. The API problem, maybe, is that there's no equivalent of Prometheus Delete(), which is an action meaning to "delete a single attribute set" from residence in the SDK. The SDK problem, maybe, is that we have SDKs that can be configured with Delta temporality but do not automatically drop memory for unused attribute sets. This is an optimization demonstrated in https://github.com/lightstep/otel-launcher-go/blob/main/lightstep/sdk/metric/README.md, however I'm not sure which or how many OTel SDKs perform this way. @alxbl wrote about this, and
The data model problem is linked in @dashpole's reply above. |
This seems to go against this: opentelemetry-specification/specification/metrics/sdk.md Lines 769 to 772 in 0319dea
If that is accurate, I would open an issue in https://github.com/open-telemetry/opentelemetry-dotnet. |
I was away from work for a while, I'm back now. I'll review the thread and open the issue in the .NET implementation. I want to validate that the behavior is still like that in the latest release (my comment was written back in 1.4.0) first though. EDIT: I am able to repro on the 1.11.1 build of the SDK and there's already an issue open in the dotnet SDK, I've referred back to here. |
What are you trying to achieve?
I'm currently working on the micrometer->OTel bridge instrumentation in the javaagent. Micrometer offers the possibility to remove a meter from the MeterRegistry and stop emitting whatever metrics it used to collect. For example, suppose you use a database connection pool that's instrumented with metrics - when you close/destroy the whole pool you probably want to stop collecting any metrics associated to it (because it doesn't exist anymore). This is useful for both asynchronous instruments (since once they're registered there's no way to stop them) and synchronous instruments (you can just stop using them, but the metrics SDK will still send in the last recorded value).
Additional context.
Micrometer bridge PR: open-telemetry/opentelemetry-java-instrumentation#4919
CC @jsuereth