-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/opentelemetry: Add OpenTelemetry instrumentation component #7066
stats/opentelemetry: Add OpenTelemetry instrumentation component #7066
Conversation
a417898
to
d60d637
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #7066 +/- ##
==========================================
- Coverage 82.39% 81.17% -1.23%
==========================================
Files 305 345 +40
Lines 31389 33927 +2538
==========================================
+ Hits 25864 27541 +1677
- Misses 4463 5213 +750
- Partials 1062 1173 +111 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review the tests on the next pass; wanted to get this pass out ASAP.
stats/opentelemetry/opentelemetry.go
Outdated
// EmptyMetrics represents no metrics. To start from a clean slate if want to | ||
// pick a subset of metrics, use this and add onto it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"if want to" - please fix. Or even just delete this bit. An example for how to enable/disable metrics would be more useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to "// EmptyMetrics represents no metrics. To start from a clean slate if the
// intended effect is to pick a subset of metrics, use this and add onto it."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm let me look into this Example test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do this once we nail down this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping - it will be nice to show examples of initializing: with metrics disabled, with 1-2 metrics enabled, with the default set of metrics enabled, and with 1-2 metrics disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
case *stats.Begin: | ||
ci := getCallInfo(ctx) | ||
if ci == nil { | ||
// Shouldn't happen, set by interceptor, defensive programming. Log it won't record? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah log.Error
SGTM. And in HandleRPC
above, processRPCEnd
below, etc.
And everywhere if you log.Error with "___ unexpected ___" then you shouldn't need a comment to double-explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I'll add errors in logs. I'll leave the comment that it shouldn't happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
metrics: map[string]bool{ | ||
"grpc.client.attempt.started": true, | ||
"grpc.client.attempt.duration": true, | ||
"grpc.client.attempt.sent_total_compressed_message_size": true, | ||
"grpc.client.attempt.rcvd_total_compressed_message_size": true, | ||
"grpc.client.call.duration": true, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Metrics.Empty.Add()
? (Or create a NewMetrics(...)
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do the NewMetrics, but I think the decided API restricts the operations of construction enough (can't just pass in a list[], you have a default list you scale up and down by single metrics) that we should do a metrics.Empty.Add() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add
has, and NewMetrics
can have a variadic parameter. So this could be NewMetrics("grpc.client.attempt.started", "grpc.client.attempt.duration", ....)
.
Please use consts for these metric strings (as above). I think we can just delete EmptyMetrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NewMetrics having a variadic parameter breaks the API that we discussed...Add Remove or Empty() than add. I understand what you're going for, but in that case I might as well have passed a slice of metrics to initialize like I originally wanted but they pushed back wrt API on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how EmptyMetrics.Add(...)
is any different than NewMetrics(...)
except that one is much more obvious how to use it and doesn't have the weird loophole of someone doing (otel.EmptyMetrics = <something else>
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping on this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline; switched.
var DefaultClientMetrics = *EmptyMetrics.Add("grpc.client.attempt.started"). | ||
Add("grpc.client.attempt.duration"). | ||
Add("grpc.client.attempt.sent_total_compressed_message_size"). | ||
Add("grpc.client.attempt.rcvd_total_compressed_message_size"). | ||
Add("grpc.client.call.duration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please export const
s for the metric names (with type MetricName
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
stats/opentelemetry/opentelemetry.go
Outdated
// Users of this component should use these bucket boundaries as part of their | ||
// SDK MeterProvider passed in. This component sends this as "advice" to the | ||
// API, which works, however this stability is not guaranteed, so for safety the | ||
// SDK Meter Provider should set these bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove any lingering duplication from this where it's also repeated below. E.g. the bit about how users should set it in their SDK MeterProvider.
stats/opentelemetry/opentelemetry.go
Outdated
// the string "other". If unset, will use the method string as is. This is | ||
// intended to be used only for generic methods (see | ||
// grpc.UnknownServiceHandler), and not registered static methods (from | ||
// generated service protos). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like Set this to record the method name of RPCs handled by grpc.UnknownServiceHandler, but take care to limit the values allowed, as allowing too many will increase cardinality and could cause severe memory or performance problems.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appended "Take care to limit the values allowed, as allowing too many will increase cardinality and could cause severe memory or performance problems." If you feel strongly about switching the whole comment, let me know, I felt like it was important to call out generic vs. static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option is settable for both the server and client-side stats handler, right?
Is it only honored on server-side? If so, that should be included.
And then the bit about "from generated service protos" is too specific, since you can register handlers without using protobuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I'll switch over to your comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait no this is client and server side. Client side you also can want a knob for filtering arbitrary method names. Generated service protos applies to client side, server side is registered protos + registered handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client side you also can want a knob for filtering arbitrary method names.
I thought that was what the call option was for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/grpc/proposal/blob/master/A66-otel-stats.md calls for the flow to be something like -
Is the method registered/known?
-- If it's registered/known, record it
-- If it's not, use the method filter to record it.
But, yeah, in your case, you can achieve this with the knob since the knob is available to the users. Your call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per offline discussion, we decided to go the route you mentioned server side (really no other option). On client side, generic methods also go through StaticMethodCallOption exposed to users (i.e. Invoke() or NewStream()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote a new documentation comment.
// rpcInfo is RPC information scoped to the RPC attempt life span client side, | ||
// and the RPC life span server side. | ||
type rpcInfo struct { | ||
mi *metricsInfo | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If rpcInfo
is meant to be per-attempt then it should probably be attemptInfo
instead.
return | ||
} | ||
|
||
meter := csh.mo.MeterProvider.Meter("gRPC-Go") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per https://github.com/grpc/proposal/blob/master/A66-otel-stats.md#language-specific-details, you should specify the version as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok just saw this section. Added version to both client and server side. I remember talking last year about how this name is a no-op and trying to figure out what to do with this.
} | ||
|
||
meter := csh.mo.MeterProvider.Meter("gRPC-Go") | ||
if meter == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the behavior over here? Can this actually happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the no-op meter provider thingy. It just doesn't record metrics. I don't think this can actually happen, this is mainly for defensive programming.
clientMetrics := clientMetrics{} | ||
|
||
if _, ok := setOfMetrics["grpc.client.attempt.started"]; ok { | ||
asc, err := meter.Int64Counter("grpc.client.attempt.started", metric.WithUnit("attempt"), metric.WithDescription("The total number of RPC attempts started, including those that have not completed.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Number of client call attempts started"
Unit - "{attempt}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.client.attempt.duration"]; ok { | ||
cad, err := meter.Float64Histogram("grpc.client.attempt.duration", metric.WithUnit("s"), metric.WithDescription("End-to-end time taken to complete an RPC attempt including the time it takes to pick a subchannel."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "End-to-end time taken to complete a client call attempt"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.client.attempt.sent_total_compressed_message_size"]; ok { | ||
cas, err := meter.Int64Histogram("grpc.client.attempt.sent_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) sent across all request messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Compressed message bytes sent per client call attempt"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
serverMetrics := serverMetrics{} | ||
if _, ok := setOfMetrics["grpc.server.call.started"]; ok { | ||
scs, err := meter.Int64Counter("grpc.server.call.started", metric.WithUnit("call"), metric.WithDescription("The total number of RPCs started, including those that have not completed.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Number of server calls started"
Unit - "{call}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.server.call.sent_total_compressed_message_size"]; ok { | ||
ss, err := meter.Int64Histogram("grpc.server.call.sent_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) sent across all response messages (metadata excluded) per RPC; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Compressed message bytes sent per server call"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.server.call.rcvd_total_compressed_message_size"]; ok { | ||
sr, err := meter.Int64Histogram("grpc.server.call.rcvd_total_compressed_message_size", metric.WithUnit("By"), metric.WithDescription("Total bytes (compressed but not encrypted) received across all request messages (metadata excluded) per RPC; does not include grpc or transport framing bytes."), metric.WithExplicitBucketBoundaries(DefaultSizeBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Compressed message bytes received per server call"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.server.call.duration"]; ok { | ||
scd, err := meter.Float64Histogram("grpc.server.call.duration", metric.WithUnit("s"), metric.WithDescription("This metric aims to measure the end2end time an RPC takes from the server transport’s perspective."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - End-to-end time taken to complete a call from server transport's perspective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
if _, ok := setOfMetrics["grpc.client.call.duration"]; ok { | ||
ccs, err := meter.Float64Histogram("grpc.client.call.duration", metric.WithUnit("s"), metric.WithDescription("This metric aims to measure the end-to-end time the gRPC library takes to complete an RPC from the application’s perspective."), metric.WithExplicitBucketBoundaries(DefaultLatencyBounds...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description - "Time taken by gRPC to complete an RPC from application's perspective"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Hi all. So the proposal only talks about metrics. What about tracing? Would be nice to have both as an officially maintained package. Thank you! |
The gRPC OpenTelemetry Tracing gRFC is in flight: grpc/proposal#389. Eventually, the plan is to add tracing as part of this same module, as part of the client and server stats handler. This is like our OpenCensus instrumentation component, which does both: https://github.com/grpc/grpc-go/blob/master/stats/opencensus/opencensus.go. :) |
This is great! Thank you for the hard work! |
One more question: is the plan to replace https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrumentation/google.golang.org/grpc/otelgrpc? Two implementations would not benefit anyone, clearly. |
stats/opentelemetry/opentelemetry.go
Outdated
|
||
var joinDialOptions = internal.JoinDialOptions.(func(...grpc.DialOption) grpc.DialOption) | ||
|
||
// MetricName is a name of a metric. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: MetricName is an identifier for a metric provided by this package.
or something?
Also peeking at the godoc might help you see things from your users' eyes:
$ ~/go/bin/godoc
# open browser at http://localhost:6060
Maybe just Metric
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to Metric. Switched docstring.
const ( | ||
// ClientAttemptStartedName is the name of the client attempt started | ||
// metric. | ||
ClientAttemptStartedName = MetricName("grpc.client.attempt.started") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These will look slightly better as ClientAttemptStartedName MetricName = "grpc.client.attempt.started"
(which mimics how time.Duration
s are declared).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Yeah that looks nicer.
stats/opentelemetry/opentelemetry.go
Outdated
metrics map[MetricName]bool | ||
} | ||
|
||
// NewMetrics returns a Metrics with the metrics provided specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"provided specified" -> please fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't know what to say here really for this docstring, so I said "provided" which become "specified". I'll just drop the specified part and switch to "NewMetrics returns a Metrics containing the MetricName's provided".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch to "NewMetrics returns a Metrics containing the MetricName's provided".
That's fine, but note that adding 's
is not how we make things plural. MetricNames
please. (Or with markdown, MetricName
s is preferred but godoc doesn't support that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment also talked about below, this switched to "NewMetrics returns a Metrics containing Metrics".
const ( | ||
// ClientAttemptStartedName is the name of the client attempt started | ||
// metric. | ||
ClientAttemptStartedName = MetricName("grpc.client.attempt.started") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Name// please. Note that several of your defined metrics don't have the Name
suffix already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// ClientAttemptStartedName is the name of the client attempt started | ||
// metric. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of saying "BlahBlahBlah is the name of the blah blah blah metric" which gives no information, specify what the metric actually measures, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Pulled Yashes consice description strings to these docstrings. @yashykt you should add these description strings to gRFC to officially spec them :).
stats/opentelemetry/opentelemetry.go
Outdated
// Metrics are the metrics to instrument. Will turn on the corresponding | ||
// metric supported by the client and server instrumentation components if | ||
// applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "Will turn on ________" mean?
Also: "If not set, the default metrics will be recorded"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added If not set, the default metrics will be recorded. Switched language to "will create instrument and record telemetry".
// Metrics are the metrics to instrument. Will turn on the corresponding | ||
// metric supported by the client and server instrumentation components if | ||
// applicable. | ||
Metrics *Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a user sets an unsupported metric? (E.g. if they specify server metrics when they initialize the client)? Ideally DialOption
would error??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembering discussing this with Yash, and I think the decision was just to leave this as a no-op in this scenario (I.e. just ignore the metrics). Right @yashykt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, no-op is what i'm thinking. Consider the case of experimental metrics. There might be a case where we decide to remove experimental metrics and we don't want removal of metrics to break user code (if they had enabled the experimental metrics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds good to me. Let me know what you think Doug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, consider the case where they're initializing a DialOption
and accidentally passing the default server metrics object. I could easily see this happening, as all the types line up and everything. In this scenario, they'll get no metrics at all.
If we remove an experimental metric, maybe the user wants an init-time error so they know it was removed?
And in our case, that's what will happen anyway, unless we leave the const behind for the removed metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're calling the metrics in A66 stable, which is why we have default + exported consts. I think the goal was the defaults now are stable (i.e. ones defined in A66).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a user set Server Metrics for Dial Option :)? I guess the tradeoff is the scenario you mentioned earlier vs. being defensive now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By accident. Spot the bug:
opentelemetry.DialOption(opentelemetry.Options{
LoggingOptions: ...,
MetricsOptions: opentelemetry.MetricsOptions{
MeterProvider: blahblahblah,
Metrics: opentelemetry.DefaultServerMetrics,
TargetAttributeFilter: func(t target) bool {
return allowedTargets[t]
},
},
TracingOptions: ...,
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's separate the metrics field to client and server metrics then for type safety? not entirely type safe but at least the user knows whether setting for client or sever and not tied to DialOption/ServerOption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline; decided to make it DefaultMetrics that contains both client and server, which will be ignored on the other side to the initialized option.
stats/opentelemetry/opentelemetry.go
Outdated
// Users of this component should use these bucket boundaries as part of their | ||
// SDK MeterProvider passed in. This component sends this as "advice" to the | ||
// API, which works, however this stability is not guaranteed, so for safety the | ||
// SDK Meter Provider should set these bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this does show up in godoc. Please check out the godoc view.
Also, please remove duplication. We don't want the same "Users of this component should..." message appearing 3x.
stats/opentelemetry/opentelemetry.go
Outdated
// EmptyMetrics represents no metrics. To start from a clean slate if want to | ||
// pick a subset of metrics, use this and add onto it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping - it will be nice to show examples of initializing: with metrics disabled, with 1-2 metrics enabled, with the default set of metrics enabled, and with 1-2 metrics disabled.
stats/opentelemetry/e2e_test.go
Outdated
func ExampleMetric_Remove() { | ||
// Use DefaultClientMetrics without ClientAttemptDuration and | ||
// ClientCallDurationName. | ||
DefaultClientMetrics.Remove(ClientAttemptDuration, ClientCallDurationName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these would be more useful on the DialOption / ServerOption, or Options, and should show how to build the DialOption. Just one line of code isn't super useful.
E.g. something like:
func ExampleDialOption_ExcludeMetrics() {
// To exclude specific metrics, initialize Options as follows:
otelOpts := opentelemetry.Option{
MetricsOptions: MetricsOptions{
Metrics: DefaultClientMetrics.Remove(opentelemetry.ClientAttemptDuration,
},
}
dialOption := opentelemetry.DialOption(otelOpts)
cc, _ := grpc.Dial("<target string>", dialOption)
// Handle err.
defer cc.Close()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't the examples technically have to be linked to a single exported function from the package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there are many ways of attaching examples at different places in the package. Check the testing documentation for more details: https://pkg.go.dev/testing#hdr-Examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added numerous examples.
stats/opentelemetry/opentelemetry.go
Outdated
// NewMetrics returns a Metrics with the metrics provided specified. | ||
func NewMetrics(metrics ...MetricName) *Metrics { | ||
newMetrics := make(map[MetricName]bool) | ||
// NewMetrics returns a Metrics containing the Metric's provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another choice is to say "containing metrics." - directly referencing the parameter's name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, switched.
// DefaultLatencyBounds are the default bounds for latency metrics. Users of | ||
// this component should set these bucket boundaries as part of their SDK | ||
// MeterProvider passed in for desired latency metrics. | ||
// DefaultLatencyBounds are the default bounds for latency metrics. | ||
DefaultLatencyBounds = []float64{0, 0.00001, 0.00005, 0.0001, 0.0003, 0.0006, 0.0008, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.01, 0.013, 0.016, 0.02, 0.025, 0.03, 0.04, 0.05, 0.065, 0.08, 0.1, 0.13, 0.16, 0.2, 0.25, 0.3, 0.4, 0.5, 0.65, 0.8, 1, 2, 5, 10, 20, 50, 100} // provide "advice" through API, SDK should set this too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be nice to show users how to do this, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as part of example. Verified it worked by playing around with example configuration wrt bounds and seeing the result.
ClientAttemptRcvdCompressedTotalMessageSize Metric = "grpc.client.attempt.rcvd_total_compressed_message_size" | ||
// ClientCallDurationName is the time taken by gRPC to complete an RPC from | ||
// application's perspective. | ||
ClientCallDurationName Metric = "grpc.client.call.duration" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is still named Name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, whoops. Removed.
Sorry for the delay on the example, it's pretty complicated. I need to set bounds and instruments, and it also seems like it needs to output something so should I set up a whole RPC system? That's the only way the example can really "do" something. Showing users how to use default bounds links in MeterProvider SDK, and a reader which to show how to use I need to do a full RPC. And then each knob you want to show how it works (i.e. add/delete metrics is a whole other contained example that needs to spin up a server and do an rpc etc.). Is the e2e test with Defaults explicitly set and a manual reader set/read from an asserted on a good enough example? I'm fine adding an example, but to have it do anything (esp with default bounds) feels incredibly verbose compared to the examples I've seen. If you think it's not too verbose, what would be scope of your example? Can you review the e2e tests while you're at it while I'm writing example? |
The example doesn't have to be a complete system. Just show the code users need to do a thing. An example showing how users set the bounds on the meter should be a couple function calls in theory? They don't need to actually do anything. |
Added numerous examples and got to all comments from last pass. Examples and tests are ready to be looked at. |
Sorry, can you create a new PR for this and squash your commits? There are 180+ comments and github is getting bogged down. Thanks! |
Done in #7166. |
This PR implements the gRPC-Go instrumentation component for OpenTelemetry, as outlined in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md.
RELEASE NOTES: