A66: OpenTelemetry Metrics #380

Merged on Sep 28, 2023 (31 commits).
Commits (31); the diff below shows the changes from 1 commit:

- `3eae932` OTel metrics proposal (yashykt, Jul 21, 2023)
- `7dd3b2a` mdformat (yashykt, Jul 21, 2023)
- `6f67deb` Add link (yashykt, Jul 21, 2023)
- `7730c4e` Add discussion thread (yashykt, Jul 21, 2023)
- `dccfec3` Fixes (yashykt, Jul 21, 2023)
- `58ed163` Resolve some reviewer comments (yashykt, Jul 21, 2023)
- `be41d1d` More details (yashykt, Aug 24, 2023)
- `c05f4ed` Sample implementation (yashykt, Aug 24, 2023)
- `f66acdf` Canonicalized name (yashykt, Aug 25, 2023)
- `3135189` C++ API changes and target considerations (yashykt, Aug 28, 2023)
- `e8ac550` Fill out CallTracer details (yashykt, Sep 8, 2023)
- `353c884` added Java API for OpenTelemetry metrics (DNVindhya, Sep 11, 2023)
- `f0f935d` Merge pull request #1 from DNVindhya/yashykt-otel-javaapi (yashykt, Sep 12, 2023)
- `6fab0c1` Reviewer comments (yashykt, Sep 18, 2023)
- `9227f9b` Reviewer comments (yashykt, Sep 18, 2023)
- `185358b` Reviewer comments (yashykt, Sep 18, 2023)
- `e349577` Reviewer comments (yashykt, Sep 18, 2023)
- `7190fdc` Reviewer comments (yashykt, Sep 18, 2023)
- `7cb1644` Reviewer comments (yashykt, Sep 22, 2023)
- `906e5db` Reviewer comments (yashykt, Sep 25, 2023)
- `0c94c64` Add Python API (yashykt, Sep 25, 2023)
- `082d554` Reviewer comments (yashykt, Sep 25, 2023)
- `7663a4e` OTel Plugin Arch details for Java and GO (yashykt, Sep 26, 2023)
- `b0c22c7` Fix hyperlink (yashykt, Sep 26, 2023)
- `8751cf1` Reviewer comments (yashykt, Sep 27, 2023)
- `368f45b` Update go API doc (yashykt, Sep 27, 2023)
- `41d9e5c` Java API doc (yashykt, Sep 27, 2023)
- `8045b22` Reviewer comments (yashykt, Sep 27, 2023)
- `6dcb1b9` s/OTel/OpenTelemetry (yashykt, Sep 28, 2023)
- `d24ba7e` Reviewer comments (yashykt, Sep 28, 2023)
- `9b1bcd8` Moving gRFC to Final status (yashykt, Sep 28, 2023)
`A66-otel-stats.md`: 26 changes (18 additions, 8 deletions)

@@ -57,7 +57,8 @@ spec. The OpenTelemetry API has added an experimental feature for
[advice](https://opentelemetry.io/docs/specs/otel/metrics/api/#instrument-advice)
that would allow the gRPC library to provide these buckets as a hint. Since this
is still an experimental feature and not yet implemented in all languages, it is
up to the user to choose the right bucket boundaries and set them through the
[OTel SDK](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view).

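Because the bucket boundaries are left to the user, it can help to see how an explicit-bucket histogram assigns each measurement. The Python sketch below assumes the OpenTelemetry explicit-bucket semantics, where bucket `i` covers `(boundaries[i-1], boundaries[i]]` and a final bucket catches everything above the last boundary. The boundary values and the helper names `bucket_index` and `record_all` are hypothetical illustrations, not buckets recommended by gRPC; in practice the boundaries would be supplied through an SDK View rather than hand-written code.

```python
from bisect import bisect_left

# Hypothetical latency boundaries in seconds (unit `s`); NOT gRPC's recommended values.
LATENCY_BOUNDARIES_S = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]


def bucket_index(boundaries, value):
    """Return the explicit bucket index for `value`.

    Bucket i covers (boundaries[i-1], boundaries[i]] (upper bound inclusive);
    index 0 is (-inf, boundaries[0]] and index len(boundaries) is the
    overflow bucket (boundaries[-1], +inf).
    """
    # bisect_left returns the first i with boundaries[i] >= value,
    # which is exactly the upper-inclusive bucket.
    return bisect_left(boundaries, value)


def record_all(boundaries, values):
    """Aggregate values into per-bucket counts, as a histogram would."""
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        counts[bucket_index(boundaries, v)] += 1
    return counts


print(record_all(LATENCY_BOUNDARIES_S, [0.0004, 0.001, 0.002, 0.07, 2.5]))
# [2, 1, 0, 0, 1, 0, 0, 1]
```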
Also note that, as per an
[OpenTelemetry proposal on stability](https://docs.google.com/document/d/1Nvcf1wio7nDUVcrXxVUN_f8MNmcs0OzVAZLvlth1lYY/edit#heading=h.dy1cg9doaq26)

@@ -69,10 +70,12 @@ the scale to better fit the data.
#### Attributes

* `grpc.method` : Full gRPC method name, including package, service and
method, e.g. "google.bigtable.v2.Bigtable/CheckAndMutateRow". Note that gRPC
servers can receive arbitrary method names, i.e., method names that have not
been registered in advance with the server. Such RPCs are normally rejected
with an UNIMPLEMENTED status. Some gRPC implementations allow servers to
handle these generic method names. Since the stats plugin would record all of
these RPCs, this could expose the server to malicious attacks that cause
metrics to be stored with high cardinality. To prevent this,
unregistered/generic method names should by default be reported with the value
"other" instead. Implementations should

@@ -96,19 +99,22 @@ the scale to better fit the data.
The total number of RPC attempts started, including those that have not completed. <br>
*Attributes*: grpc.method, grpc.target <br>
*Type*: Counter <br>
*Unit*: `{attempt}` <br>
* **grpc.client.attempt.duration** <br>
End-to-end time taken to complete an RPC attempt, including the time it takes to pick a subchannel. <br>
*Attributes*: grpc.method, grpc.target, grpc.status <br>
*Type*: Histogram (Latency Buckets) <br>
*Unit*: `s` <br>
* **grpc.client.attempt.sent_total_compressed_message_size** <br>
Total bytes (compressed but not encrypted) sent across all request messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes. <br>
*Attributes*: grpc.method, grpc.target, grpc.status <br>
*Type*: Histogram (Size Buckets) <br>
*Unit*: `By` <br>
* **grpc.client.attempt.rcvd_total_compressed_message_size** <br>
Total bytes (compressed but not encrypted) received across all response messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes. <br>
*Attributes*: grpc.method, grpc.target, grpc.status <br>
*Type*: Histogram (Size Buckets) <br>
*Unit*: `By` <br>

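To make the relationship between the four client attempt instruments concrete, here is a minimal Python sketch of a per-attempt recorder. `InMemoryMeter` and `AttemptTracer` are hypothetical stand-ins, not an OpenTelemetry or gRPC API; the sketch assumes the caller reports compressed, pre-encryption message sizes, and it takes an injectable clock so the results are deterministic.

```python
import time
from collections import defaultdict


class InMemoryMeter:
    """Hypothetical stand-in for an OpenTelemetry meter that keeps raw values."""

    def __init__(self):
        self.counters = defaultdict(int)     # (name, attrs) -> running sum
        self.histograms = defaultdict(list)  # (name, attrs) -> recorded values

    @staticmethod
    def _key(name, attrs):
        return (name, tuple(sorted(attrs.items())))

    def add(self, name, value, attrs):
        self.counters[self._key(name, attrs)] += value

    def record(self, name, value, attrs):
        self.histograms[self._key(name, attrs)].append(value)


class AttemptTracer:
    """Hypothetical per-attempt tracer for the client attempt instruments."""

    def __init__(self, meter, method, target, clock=time.monotonic):
        self.meter = meter
        self.clock = clock  # returns seconds, matching unit `s`
        self.attrs = {"grpc.method": method, "grpc.target": target}
        self.start = clock()
        self.sent_bytes = 0
        self.rcvd_bytes = 0
        # Counted when the attempt starts, so in-flight attempts are included.
        meter.add("grpc.client.attempt.started", 1, self.attrs)

    def on_message_sent(self, compressed_size):
        # Compressed but not encrypted message bytes; framing is not counted.
        self.sent_bytes += compressed_size

    def on_message_received(self, compressed_size):
        self.rcvd_bytes += compressed_size

    def on_attempt_end(self, status):
        attrs = {**self.attrs, "grpc.status": status}
        self.meter.record("grpc.client.attempt.duration", self.clock() - self.start, attrs)
        self.meter.record(
            "grpc.client.attempt.sent_total_compressed_message_size", self.sent_bytes, attrs)
        self.meter.record(
            "grpc.client.attempt.rcvd_total_compressed_message_size", self.rcvd_bytes, attrs)
```

In this model `grpc.client.attempt.started` is recorded without `grpc.status`, since the status is not yet known when an attempt starts; the other three instruments are recorded at attempt end with the status attached, matching the attribute lists above.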
#### Client Per-Call Instruments

@@ -119,6 +125,7 @@ the scale to better fit the data.

If the implementation uses an interceptor, the exact start and end timestamps depend on the ordering of the interceptors. Non-interceptor implementations should record the timestamps as close as possible to the top of the gRPC stack, i.e., payload serialization should be included in the measurement. <br>
*Attributes*: grpc.method, grpc.target, grpc.status <br>
*Type*: Histogram (Latency Buckets) <br>
*Unit*: `s` <br>

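The difference between per-call and per-attempt timing shows up when an RPC is retried. The Python sketch below uses a hypothetical `run_call` helper (not a gRPC API) that starts the call timer before payload serialization, as recommended above for non-interceptor implementations, and times each attempt separately; the retry loop and the status strings are simplifying assumptions.

```python
import time


def run_call(request, serialize, attempt_fn, max_attempts, clock=time.monotonic):
    """Hypothetical top-of-stack wrapper timing a call and each of its attempts.

    Returns (final_status, call_duration_s, [attempt_duration_s, ...]).
    """
    call_start = clock()           # taken before serialization...
    payload = serialize(request)   # ...so serialization counts toward the call duration
    attempt_durations = []
    status = "UNAVAILABLE"         # assumed status if no attempt is made
    for _ in range(max_attempts):
        attempt_start = clock()
        status = attempt_fn(payload)
        attempt_durations.append(clock() - attempt_start)
        if status == "OK":         # simplistic policy: retry any non-OK status
            break
    return status, clock() - call_start, attempt_durations


ticks = iter([0.0, 0.25, 0.75, 1.0, 1.5, 2.0])  # fake monotonic clock
statuses = iter(["UNAVAILABLE", "OK"])          # first attempt fails, retry succeeds
print(run_call({"id": 1}, lambda r: repr(r).encode(), lambda p: next(statuses), 3,
               clock=lambda: next(ticks)))
# ('OK', 2.0, [0.5, 0.5])
```

Because the call timer also covers serialization and any gaps between attempts, the call duration in this model is never smaller than the sum of the attempt durations.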
#### Server Instruments

@@ -131,16 +138,19 @@ the scale to better fit the data.

Total bytes (compressed but not encrypted) sent across all response messages (metadata excluded) per RPC; does not include grpc or transport framing bytes. <br>
*Attributes*: grpc.method, grpc.status <br>
*Type*: Histogram (Size Buckets) <br>
*Unit*: `By` <br>
* **grpc.server.call.rcvd_total_compressed_message_size** <br>
Total bytes (compressed but not encrypted) received across all request messages (metadata excluded) per RPC; does not include grpc or transport framing bytes. <br>
*Attributes*: grpc.method, grpc.status <br>
*Type*: Histogram (Size Buckets) <br>
*Unit*: `By` <br>
* **grpc.server.call.duration** <br>
This metric measures the end-to-end time an RPC takes from the perspective of the server transport (HTTP2 / inproc / cronet). <br>
Start timestamp: after the transport learns that it has a new stream. For HTTP2, this is after the first header frame for the stream has been received and decoded. Whether the timestamp is recorded before or after HPACK decoding is left to the implementation. <br>
End timestamp: the first point at which the transport considers the stream done. For HTTP2, this is when scheduling a trailing header with END_STREAM to be written, on RST_STREAM, or on a connection abort. Note that this does not necessarily mean that the bytes have also been immediately scheduled to be written by TCP. <br>
*Attributes*: grpc.method, grpc.status <br>
*Type*: Histogram (Latency Buckets) <br>
*Unit*: `s` <br>

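The start and end rules above can be read as a small state machine over the transport's stream events. The Python sketch below assumes hypothetical event hooks (`on_first_headers_decoded`, `on_trailers_with_end_stream_scheduled`, `on_rst_stream`, `on_connection_abort`) rather than any real transport API; its main point is that only the first terminal event ends the measurement.

```python
import time


class ServerCallTimer:
    """Hypothetical transport-level timer for one server stream's duration."""

    def __init__(self, record, clock=time.monotonic):
        self._record = record  # callback(duration_seconds, status)
        self._clock = clock
        self._start = None
        self._done = False

    def on_first_headers_decoded(self):
        # Start: the transport now knows it has a new stream.
        if self._start is None:
            self._start = self._clock()

    def _finish(self, status):
        # End: only the FIRST terminal event is recorded; later ones are ignored.
        if self._done or self._start is None:
            return
        self._done = True
        self._record(self._clock() - self._start, status)

    def on_trailers_with_end_stream_scheduled(self, status):
        self._finish(status)

    def on_rst_stream(self, status):
        self._finish(status)

    def on_connection_abort(self, status):
        self._finish(status)


recorded = []
ticks = iter([5.0, 5.5])  # fake monotonic clock
timer = ServerCallTimer(lambda d, s: recorded.append((d, s)), clock=lambda: next(ticks))
timer.on_first_headers_decoded()
timer.on_trailers_with_end_stream_scheduled("OK")
timer.on_rst_stream("CANCELLED")  # ignored: the stream is already done
print(recorded)  # [(0.5, 'OK')]
```

Recording the duration only once means that a late RST_STREAM arriving after the trailers were scheduled does not produce a second measurement.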
### OpenTelemetry Plugin Architecture

@@ -157,8 +167,8 @@ attempt, and the `CallAttemptTracer` gets invoked during the lifetime of the
attempt. On the server side, we have an equivalent `ServerCallTracer`. (There is
no concept of an attempt on the server side.)

The OTel plugin will configure CallTracer factories on gRPC channels and
servers.

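As a rough illustration of the factory relationship described above, the Python sketch below models a hypothetical plugin that hands out tracer factories: a channel-level factory creates one `CallTracer` per call, which in turn creates one `CallAttemptTracer` per attempt, while a server-level factory creates one `ServerCallTracer` per call. The class names mirror the text, but `OpenTelemetryPluginSketch`, the Python shapes, and the `start_new_attempt` method are assumptions, not the actual gRPC API in any language.

```python
from dataclasses import dataclass, field


@dataclass
class CallAttemptTracer:
    """Hypothetical: one per attempt of a client call."""
    method: str
    target: str
    attempt_number: int


@dataclass
class CallTracer:
    """Hypothetical: one per client call; hands out one attempt tracer per attempt."""
    method: str
    target: str
    attempts: list = field(default_factory=list)

    def start_new_attempt(self):
        tracer = CallAttemptTracer(self.method, self.target, len(self.attempts) + 1)
        self.attempts.append(tracer)
        return tracer


@dataclass
class ServerCallTracer:
    """Hypothetical: one per server call; there are no attempts on the server side."""
    method: str


class OpenTelemetryPluginSketch:
    """Hypothetical plugin whose only job is to supply tracer factories."""

    def client_call_tracer_factory(self, target):
        # Installed on a channel: the target is fixed per channel, the method varies per call.
        return lambda method: CallTracer(method, target)

    def server_call_tracer_factory(self):
        # Installed on a server.
        return lambda method: ServerCallTracer(method)


plugin = OpenTelemetryPluginSketch()
make_call_tracer = plugin.client_call_tracer_factory("dns:///localhost:50051")
call = make_call_tracer("pkg.Service/Method")
call.start_new_attempt()
call.start_new_attempt()  # e.g. a retry
print([a.attempt_number for a in call.attempts])  # [1, 2]
```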
A CallTracer needs to know the channel's target in its canonical form and the
fully qualified method name, in order to fill in the attributes needed on the metrics.
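
A minimal sketch of these two normalizations, assuming the behavior described in gRPC's naming documentation: a target without a registered resolver scheme gets the default `dns:///` prefix, and the method attribute is the HTTP/2 `:path` without its leading slash (matching the `grpc.method` example above). The set of registered schemes and the helper names `canonical_target` and `method_attribute` are hypothetical; real implementations consult their resolver registry, and the details vary by language.

```python
# Hypothetical set of resolver schemes registered in this process.
REGISTERED_SCHEMES = {"dns", "unix", "ipv4", "ipv6", "xds"}
DEFAULT_PREFIX = "dns:///"


def canonical_target(target: str) -> str:
    """Return the target in canonical URI form (sketch under the assumptions above)."""
    scheme, sep, _ = target.partition(":")
    if sep and scheme.lower() in REGISTERED_SCHEMES:
        return target
    # No scheme, or an unregistered one (e.g. "localhost:50051"): use the default.
    return DEFAULT_PREFIX + target


def method_attribute(path: str) -> str:
    """Turn an HTTP/2 :path such as '/pkg.Service/Method' into 'pkg.Service/Method'."""
    return path[1:] if path.startswith("/") else path


print(canonical_target("localhost:50051"))        # dns:///localhost:50051
print(canonical_target("unix:///tmp/grpc.sock"))  # unix:///tmp/grpc.sock
print(method_attribute("/google.bigtable.v2.Bigtable/CheckAndMutateRow"))
# google.bigtable.v2.Bigtable/CheckAndMutateRow
```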