-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics SDK synchronous instruments #1108
Conversation
433deeb
to
8404617
Compare
aa18e8a
to
f64082e
Compare
SUCCESS | ||
end | ||
|
||
def preferred_temporality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we also need a way to specify aggregation temporality - i.e. an attribute writer or an initializer argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will differ configuration of temporality to a follow up PR.
@exporter = exporter | ||
end | ||
|
||
def collect; end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want an optional timeout
parameter here as well. Not sure yet.
return | ||
end | ||
|
||
@metric_readers = @metric_readers.push(metric_reader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't really need the assignment here, since push
mutates the receiver.
@metric_readers = @metric_readers.push(metric_reader) | |
@metric_readers.push(metric_reader) |
Dumb question: this will return the @metric_readers
, right? Is that a good idea or should we explicitly return nil
? I know we do the same in TracerProvider
- I think that might be a mistake as well.
# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred. | ||
def force_flush(timeout: nil) | ||
@mutex.synchronize do | ||
return Export::SUCCESS if @stopped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the style difference between this and shutdown
above, specifically return
ing from the block here, and not in shutdown
. Poking around, it looks like MRI will ensure the mutex is released even on return
, though this isn't explicitly specified in the docs. We use return
in the TracerProvider
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just a stylistic difference. Do you have a preference around the use of explicit vs implicit returns for a method like this? (I guess the argument could be made that we should just remain consistent with the style already being used in the tracer provider)
# Attempts to stop all the activity for this {MeterProvider}. | ||
# | ||
# Calls MetricReader#shutdown for all registered MetricReaders. | ||
# Calls MetricExporter#shutdown for all registered MetricExporters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not doing anything with MetricExporter
s yet.
418b7d7
to
f075de5
Compare
c78ee10
to
99d64ca
Compare
Hi everyone, do we have plan to support this? |
Hey @yfractal, could you clarify what you mean by support this? This is a WIP branch for the metrics SDK implementation, when it's no longer a WIP we will release and support it if that is what you are asking. |
module Metrics | ||
module Export | ||
class ConsoleMetricExporter | ||
PREFERRED_TEMPORALITY = 'delta' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're using frozen_string_literal
, is there any benefit to the named constant?
Hi @robertlaurin, yeah. it is, so great! hope it will be merged into master soon. I'm working on Ruby OpenTelemetry metric related thing, like to help on this. |
Definitely want this to be out sooner than later too 😄 Right now the focus is getting the foundational pieces of metrics SDK implemented to make it easier for collaboration on it. So an non exhaustive list would be:
Once those are at a point where we feel like there is some stability this PR will ideally get merged. Some of the obvious work to pick up after that will be:
|
I read through the specification and Python SDK those days, I fond there are a lot of concepts and not easy to understand. If we have the "foundational pieces", it will be help a lot ! |
@ahayworth @fbogsany this is at reviewable stopping point again. |
module Export | ||
# MetricReader provides a minimal example implementation. | ||
# It is not required to subclass this class to provide an implementation | ||
# of MetricReader, provided the interface is satisfied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's problematic that we have to expose metric_store
as part of the interface. That should be a SDK-internal concern, which means we probably do have to mark it as API-private and require concrete implementations to subclass this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I was really hoping to provide a simple duck type interface similar to the span processor instead of requiring it them to inherit from a base metric reader.
The intention is to have more focused work around this be part of the follow up PRs for things like the periodic pull reader exporter.
metrics_sdk/lib/opentelemetry/sdk/metrics/export/in_memory_metric_pull_exporter.rb
Outdated
Show resolved
Hide resolved
@mutex = Mutex.new | ||
end | ||
|
||
def collect(start_time, end_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does data_points
keep growing without bound? I'm not clear on the expected behaviour here - part of me thinks we're supposed to only return data points recorded during the interval start_time .. end_time
, but I'm really not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That needs to be addressed as part of the aggregation temporality work that is not part of this PR, i.e. sum or delta.
In the case of sum, there should be some concept of pruning. It's mentioned in the guidelines.
So here are some suggestions that we encourage SDK implementers to consider:
- You want to control the memory usage rather than allow it to grow indefinitely / unbounded - regardless of what aggregation temporality is being used.
- You want to improve the memory efficiency by being able to forget about things that are no longer needed.
- You probably don't want to keep exporting the same thing over and over again, if there is no updates. You might want to consider Resets and Gaps. For example, if a Cumulative metrics stream hasn't received any updates for a long period of time, would it be okay to reset the start time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much to add other than what Francis already mentioned!
Co-authored-by: Francis Bogsanyi <[email protected]>
Co-authored-by: Andrew Hayworth <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions — feel free to disregard, as most are borne from not fully understanding the spec. Nice work!
|
||
@mutex.synchronize do | ||
raise DuplicateInstrumentError if @registry.include? name | ||
OpenTelemetry.logger.warn("duplicate instrument registration occurred for instrument #{name}") if @instrument_registry.include? name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might consider including something like "this can result in corrupted / invalid metrics" or whatever, since that's a potential consequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will get addressed in future revisions/reviews of the API, a few things changed since we merged the original API PR.
module Aggregation | ||
extend self | ||
|
||
SUM = ->(v1, v2) { v1 + v2 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth a comment as to why we're setting a constant to a lambda, here (related: why are we setting a constant to a lambda, here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduce allocations, similar idea to
opentelemetry-ruby/api/lib/opentelemetry/baggage/propagation.rb
Lines 17 to 25 in a46ba22
TEXT_MAP_PROPAGATOR = TextMapPropagator.new | |
private_constant :TEXT_MAP_PROPAGATOR | |
# Returns a text map propagator that propagates context using the | |
# W3C Baggage format. | |
def text_map_propagator | |
TEXT_MAP_PROPAGATOR | |
end |
This bit is likely to change in a follow up PR I'm working on.
module Metrics | ||
# The ConfiguratorPatch implements a hook to configure the metrics | ||
# portion of the SDK. | ||
module ConfiguratorPatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the file be called configurator_patch.rb
or this class be called ConfigurationPatch
or is misalignment intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, the name misalignment is not intentional. I'll fix it in my follow up PR.
# The metrics_configuration_hook method is where we define the setup process | ||
# for metrics SDK. | ||
def metrics_configuration_hook | ||
OpenTelemetry.meter_provider = Metrics::MeterProvider.new(resource: @resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth a comment indicating that @resource
instance var should be avail in the Configurator
, but 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @resource instance var will be available in the configurator, it's a little awkward as were developing this as a separate gem, but once we have stabilized the metrics api/sdk it will get merged into the SDK/API and this patch stuff will go away.
# Array values must not contain nil elements and all elements must be of | ||
# the same basic type (string, numeric, boolean). | ||
def record(amount, attributes: nil) | ||
update(OpenTelemetry::Metrics::Measurement.new(amount, attributes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to warn if negative numeric value, here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The histogram aggregation needs to be added properly, I'm working on that in a separate branch to follow up on this one.
def shutdown(timeout: nil) | ||
@mutex.synchronize do | ||
if @stopped | ||
OpenTelemetry.logger.warn('calling MetricProvider#shutdown multiple times.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always be true? If not, perhaps clearer to say "calling MetricProvider#shutdown while MeterProvider is marked as @stopped
"?
class MetricStore | ||
def initialize | ||
@mutex = Mutex.new | ||
@epoch_start_time = now_in_nano |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An epoch
from start_time to end_time is basically the duration between calls to collect
, is that right?
|
||
def add_metric_stream(metric_stream) | ||
@mutex.synchronize do | ||
@metric_streams = @metric_streams.dup.push(metric_stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why .dup
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a means of making the change atomic, the @metric_streams value won't change mid process. It'll either have the old array, or the new one. It won't change "in flight" so to speak.
|
||
def collect(start_time, end_time) | ||
@mutex.synchronize do | ||
MetricData.new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on how instantiating this object fulfills the obligations of collect
, but I'm definitely missing something....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think of it as collecting a snapshot of the metrics at a certain point in time. The metrics will accumulate between collect calls, so we want to capture all the information necessary to export and the current state of the data points. What's not shown here is that we will also need to configure the temporality of the metrics, so in the case of delta, after grabbing this snapshot we will drop all the datapoints. In cumulative temporality we need the datapoints and their values at the time of collection, but after collection finishes the data points on the metric stream can continue to grow, but we don't want to have that data change once it has been sent off for export. This is intended to be a simple struct to encapsulate this information distinctly from the metric stream.
|
||
def create_instrument(kind, name, unit, description, callback) | ||
super do | ||
case kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that super do
will yield the return of the case
statement to the superclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
Hey merging this is a huge achievement! Way to go @robertlaurin! |
This is exciting! Would love to use it, when will be a release cut? |
Hi @lisamburns - not in the immediate future. We need to add aggregation support in progress here, and then exporter support (not yet formally started, although very rough versions have been written as proof-of-concept stuff). I don't know that we have a specific timeline to share, but @robertlaurin has been working on getting it to a state where the work can be done in parallel. He could likely share more information! |
Hey @lisamburns, the metrics isn't quite ready for consumption yet. We'll cut a release when more of the implementation is complete. In its current state you could simple create synchronous instruments but the export path is not implemented, and the aggregation is incomplete. You can track the work in flight and the work to do on the metrics project board https://github.com/open-telemetry/opentelemetry-ruby/projects/2 |
First pass implementation of the Metrics SDK.
In scope:
Out of scope: