-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add protocol for dynamic configuration service #155
[WIP] Add protocol for dynamic configuration service #155
Conversation
I think we will probably want to coordinate this with the Metric Views API. This looks good as a start, for configuring collection periods, but I think people are even more interested in configuring the labels and aggregations used. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jmacd that tighter integration with views API would be beneficial. Left a few comments. Indeed a good start.
@open-telemetry/technical-committee should this be developed as another experimental feature?
3. `metrics` package contains the Metrics Service protos. | ||
1. `trace` package contains the Trace Service protos. | ||
2. `metrics` package contains the Metrics Service protos. | ||
3. `dynamicconfig`package contains the Dynamic Config Service protos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you adding a note here, also add a it to the maturity matrix: https://github.com/open-telemetry/opentelemetry-proto#maturity-level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
message ConfigRequest{ | ||
// Describes the version of the config service protocol to use. Required. | ||
int32 proto_version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proto_version
may be misunderstood as a proto2 or proto3. Perhaps a better name can be used here.
int32 proto_version = 1; | |
int32 config_format_version = 1; |
|
||
message ConfigRequest{ | ||
// Describes the version of the config service protocol to use. Required. | ||
int32 proto_version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would uint32
be a better type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this be used? Protocol Buffers by nature handle versioning, and if a drastic version change was needed you'd probably use a v2
package instead. If it's for the opaque metadata, it would be better to have a struct to contains metadata and metadata_format_version, but it also seems reasonable to leave that versioning to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point, proto_version removed
// whether an update to the configs had occurred -- if the client request's | ||
// fingerprint differs from the server's, then the server knows that a change | ||
// had occurred and an updated configuration should be sent. | ||
string last_fingerprint = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also consider adding some error reporting from SDK to collector when it failed to apply the config?
|
||
// Optional. The client is suggested to wait this long (in seconds) before | ||
// pinging the configuration service again. | ||
int32 suggested_wait_time = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32 suggested_wait_time = 5; | |
int32 suggested_wait_time_sec = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// whether an update to the configs had occurred -- if the client request's | ||
// fingerprint differs from the server's, then the server knows that a change | ||
// had occurred and an updated configuration should be sent. | ||
string last_fingerprint = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string last_fingerprint = 3; | |
string last_known_fingerprint = 3; |
just a suggestion. I think it make the name more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
// Describes the version of the config service protocol to use. Required. | ||
int32 proto_version = 1; | ||
|
||
// The target resource. This resource pings the config server requesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required or optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is ConfigRequest
, target is confusing, since target is something you usually send a request to. So maybe
// The resource for which configuration should be returned.
Also don't think we have to talk about pings / responses since those are implementation details and not part of the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, changed
// protocol, but still desire to communicate these settings to | ||
// instrumented applications. An application may in turn piggyback | ||
// metadata on a vendor's metric exporter to communicate information back | ||
// to its metric backend. In this way, metadata offers a channel to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how exporter will know this metadata?
} | ||
|
||
// Metrics with names that match a rule in the inclusion_patterns are | ||
// targeted by this schedule. Metrics that match the exclusion_patterns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about labels matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It'll be cool to have a configuration service if we can work out good semantics that apply to many systems. For reference, here's X-Ray's tracing sapling rules. They're fairly simple and probably similar to others, giving the link in case it provides information about filling out the trace config section
https://docs.aws.amazon.com/xray/latest/devguide/xray-api-sampling.html
|
||
message ConfigRequest{ | ||
// Describes the version of the config service protocol to use. Required. | ||
int32 proto_version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this be used? Protocol Buffers by nature handle versioning, and if a drastic version change was needed you'd probably use a v2
package instead. If it's for the opaque metadata, it would be better to have a struct to contains metadata and metadata_format_version, but it also seems reasonable to leave that versioning to the user.
// Describes the version of the config service protocol to use. Required. | ||
int32 proto_version = 1; | ||
|
||
// The target resource. This resource pings the config server requesting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is ConfigRequest
, target is confusing, since target is something you usually send a request to. So maybe
// The resource for which configuration should be returned.
Also don't think we have to talk about pings / responses since those are implementation details and not part of the protocol.
// updated configs, and the server responds with the config information. | ||
opentelemetry.proto.resource.v1.Resource resource = 2; | ||
|
||
// Optional. Identifies the last configuration fingerprint / version number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Optional. The value of ConfigResponse.fingerprint for the last configuration a resource received.
And don't think we need details about how the fingerprint is used in this protocol, it's up to the server whether it uses it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
message ConfigResponse { | ||
// Describes the version of the config service protocol to use. Should match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
// the request’s version. | ||
int32 proto_version = 1; | ||
|
||
// Optional. The fingerprint associated with the current config settings. Each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fingerprint associated with this ConfigResponse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// change in configs yields a different fingerprint. The client may cache a | ||
// fingerprint, then return it on the next request to enable the server to | ||
// quickly check whether a change in the configs occurred. If no changes | ||
// occurred, then all other fields in the response are optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear, especially with the unimplemented blocks, whether it's trivial for a client to understand the difference between a response containing no fields because it didn't change or containing default fields. How about adding a boolean indicating whether this response is valid or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I follow. The original intention was that should nothing change, then the response will have only fingerprint
, which equals the client's last_known_fingerprint
, by which the client knows that no change happened. Does that sound like a practical outcome? I'll also update the comments to make the process more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a fingerprint is like a hash I don't think comparison with a previous value can guarantee nothing has changed due to hash collisions - it's more a hint which is more useful in the server which actually can compute the fingerprints.
So I think here for the client to know no change happened, it seems it needs to check the actual content. This would probably be many hasX
checks using the generated code for proto and might work but may be indirect.
Another thought is perhaps this fingerprint is overoptimization, at least for now so maybe ok to remove for now. The configs don't seem so large and comparing fingerprints (etag + 204 responses in the http world) may not be worth it in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points, thanks for the quick response! I'll bring it up with the rest of the group, and we'll update our proposal (which will hopefully be made available soon)
// occurred, then all other fields in the response are optional. | ||
string fingerprint = 2; | ||
|
||
// Dynamic configs specific to metrics, namely the metric collection schedule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might add different configs later, don't think there's value in restricting to , namely the ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
// adopts the configuration specified by the schedule. | ||
message Schedule { | ||
|
||
// Pattern dictates a light-weight pattern that can match 1 or more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A light-weight pattern that can...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
repeated Pattern exclusion_patterns = 2; | ||
|
||
// CollectionPeriod describes the sampling period for each metric. All | ||
// larger units are divisible by all smaller ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some background on why this should be an enum and not just arbitrary uint32
would make it clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent was to enforce the property that all larger units be divisible by smaller ones (which makes the implementation much cleaner), though we're thinking of altering this -- for example in case a vendor has an extremely good reason to use a collection period of, say, 2 days
// * Using alternate representations for collection schedules, matching | ||
// metrics, resources, etc. | ||
// * Enabling other optimizations | ||
string metadata = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes
is better for opaque data, easy to store a string in bytes field, impossible to store arbitrary binary in a string field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea, changed
This is an interesting proposal that I think warrants an OTEP where it can be discussed. The attached proposal in PDF format is not suitable for commenting. I would like to see a few things addressed that I don't see in the proposal:
I think it is best to discuss these as part of an OTEP before discussing the protocol since protocol design/schema will be likely driven by the choices made in that OTEP. |
Thanks for all the comments! Really appreciate the interest. We'll finalize the proposal and open an OTEP as soon as possible |
Changes:
Goal
We suggest a new protocol for a dynamic configuration service that enables updating metric collection schedules, trace sampling parameters, and other configurations without having to restart an instrumented application. An early draft of the proposal is attached:
OpenTelemetry Configuration Service.pdf
We'd love to hear your thoughts! An example implementation is currently underway, implemented as an extension on the collector and modifications to the Go SDK. Our goal is to deliver a functioning prototype in the upcoming weeks to give an example of what this system may look like.
About
We're an intern group at Google. Members include: