-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add jaeger remote sampler to SDK configuration #1791
Add jaeger remote sampler to SDK configuration #1791
Conversation
|
||
Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows: | ||
|
||
- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset. | ||
- For `jaeger_remote_sampler`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from this the remote sampler needs 2 configuration options:
- service name - it will (re)use
otel.service.name
- collector endpoint - it will (re)use
otel.exporter.jaeger.endpoint
|
||
Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows: | ||
|
||
- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset. | ||
- For `jaeger_remote_sampler`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use initial_sampler
and initial_sampler_arg
instead of (or possibly in addition to) just a rate. The intent is to use a different Otel Sampler as initial, not to restrict it I think. This is reflected in how the code API accepts Sampler
as the argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the API in OTEL might support this but let me see if it is supported in Jaeger. The main use-case is to provide a migration path for Jaeger users.
In the long term there will be a native OTEL remote sampling implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Jaeger Java SDK the default sampler type in environment config is always percentage-based https://github.com/jaegertracing/jaeger-client-java/blob/568ab68ec2ac5e9e811b1d0d76f752f3242d45d3/jaeger-core/src/main/java/io/jaegertracing/Configuration.java#L394
The programmatic API allows to supply any sampler like in OTEL, but not via env variables.
@yurishkuro please loop in here. Do we need to set the initial sampler to rate limiting type? Was this something users were using? We didn't define environment configuration for it.
|
||
Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows: | ||
|
||
- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset. | ||
- For `jaeger_remote`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will (re)use
otel.exporter.jaeger.endpoint
Why not have a separate setting like server=
? Reusing otel.exporter.jaeger.endpoint
is both confusing (what does exporter have to do with sampler?) and inflexible, as it would couple exporter and sampler, even though someone might want to have them at different addresses.
Also, is this a gRPC or HTTP address?
poolingIntervalMs=100
It's polling, not pooling.
I would use a different, more realistic example value than 100ms, like 10000. Does this even need to be in ms (is that the convention for other vars?), or can it be in seconds?
`initialSamplingRate=0.25
I would rename initialSamplingRate to defaultSamplingRate.
And I would add clear documentation of the sub-properties:
For jaeger_remote
: The value is a comma separated list of:
pollingIntervalMs
in milliseconds indicating how often the sampler will poll the backend for updates to sampling strategydefaultSamplingRate
in the [0..1] range, which is used as the sampling probability when the backend cannot be reached to retrieve a sampling strategy. This value stops having an effect once a sampling strategy is retrieved successfully, as the remote strategy will be used until a new update is retrieved.server
(name tbd) - the address of gRPC (or http?) server that will serve the sampling strategy for the service (link to schema)- Example:
pollingIntervalMs=10000,defaultSamplingRate=0.25,server=...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about adding the endpoint/address to the arguments. I think it makes sense. 99.99999% users will use the same address as for the exporter, but having it in the argument makes it easy understandable what needs to be configured.
I don't like server
. I will use endpoint
to be consistent with other names in the SDK. The address could be perhaps both HTTP and grpc (like in case of OTLP), this offers flexibility for the SDKs to implement or or the other. I will check if the OTELcol exposes both protocols.
PR updated |
1dcdc6f
to
7242eb1
Compare
|
||
## Batch Span Processor | ||
- For `jaeger_remote`: The value is a comma separated list: | ||
- `endpoint`: the endpoint of gRPC server that serves the sampling strategy for the service ([sampling.proto](https://github.com/jaegertracing/jaeger-idl/blob/master/proto/api_v2/sampling.proto)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the syntax, just host:port? Is TLS supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is an example of the endpoint. I will add it here to be more obvious...
TLS could be supported by providing a scheme in the endpoint like it is done for OTLP e.g. https://github.com/open-telemetry/opentelemetry-java/blob/43f1ecf7ede49104c2aac1c9e5448b7cf7859ce7/exporters/otlp/trace/src/main/java/io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.java#L131. We could add another option with path to the cert.
@@ -56,12 +56,16 @@ Known values for `OTEL_TRACES_SAMPLER` are: | |||
- `"parentbased_always_on"`: `ParentBased(root=AlwaysOnSampler)` | |||
- `"parentbased_always_off"`: `ParentBased(root=AlwaysOffSampler)` | |||
- `"parentbased_traceidratio"`: `ParentBased(root=TraceIdRatioBased)` | |||
- `"jaeger_remote"`: `JaegerRemoteSampler` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to give one more push for just jaeger
. Doesn't the syntax make most cognitive sense looking something like OTEL_EXPORTER=jaeger OTEL_PROPAGATORS=jaeger OTEL_SAMPLER=jaeger
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for me, I would still advocate for jaeger_remote
. Just jaeger
seems ambiguous to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that rather than calling out a named "jaeger_remote" this should be perhaps "custom" or something similar. The resulting value can still be "JeagerRemoteSampler"
And then the configuration below would be specific to the instance of the custom sampler and would be decoded by the sampler.
So the OTEL_TRACES_SAMPLER_ARG
field would then be <CustomSampler>;<CustomSamplerSpecificArgs>
So an example: JaegerRemoteSampler;endpoint=localhost:14250,pollingIntervalMs=5000,initialSamplingRate=0.25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MSNev this complicates the parsing and still requires the spec to call out valid custom samplers & schemes. I don't see a lot of benefits over the current proposed format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means we don't have to spec every sampler and the sdk developers don't have to make any code changes to support new samplers (beyond creating the sampler) so in the spec we just say use the custom sampler config.
still requires the spec to call out valid custom samplers & schemes
Only if we want them to be officially provided, ie. we don't have to call out "valid custom samplers" we just say you can add a customer sampler using this format with the JaegerRemote Sampler in this case providing an example. This could include a simple section on available/provided customer samplers as I can envisage that all environments won't necessarily want or need all samplers
So another example could just be
OTEL_TRACES_SAMPLER=custom
OTEL_TRACES_SAMPLER_ARG=MyCustomSampler;<MyCustomSamplerSpecificArgs>
So the value after the ';' is simple "passed" to the custom sampler itself to process rather than enforcing every customer sampler to conform to something it doesn't need. eg. There could be a sampler that doesn't take any args.
Available custom samplers
Jaeger Remote
Name: JaegerRemoteSampler
Args: ... as defined ...
Example Usage:
OTEL_TRACES_SAMPLER=custom
OTEL_TRACES_SAMPLER_ARG=JaegerRemoteSampler;endpoint=localhost:14250,pollingIntervalMs=5000,initialSamplingRate=0.25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still requires the spec to call out valid custom samplers & schemes
Only if we want them to be officially provided, ie. we don't have to call out "valid custom samplers"
Well, the whole point of this PR is to have Jaeger's remote sampler officially supported, so how does it benefit from being classified as "custom" yet still being fully spec-ed?
And even for unofficial custom samplers, I still don't see the value in the indirection. If someone wants to use a truly custom sampler that is not in the spec, they can still read its name from OTEL_TRACES_SAMPLER, and the SDK needs to have a configurable factory that could recognize the name and invoke the appropriate builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, #1829 could be a good candidate for OTEL_TRACES_SAMPLER=custom
Signed-off-by: Pavol Loffay <[email protected]>
Signed-off-by: Pavol Loffay <[email protected]>
1a043b1
to
92b740c
Compare
@tedsuo could you please review as well? Here is OTEPS for OTEL remote sampler open-telemetry/oteps#167. I would like to get this PR merged to have a clear migration path for Jaeger users. The native OTEL based approach will take a longer time to get it to fully working solution. |
Before merging this work, I have a couple of requests: First, can we get a specification for the current Jaeger remote sampling format and protocol OpenTelemetry is expected to support via a sampling plugin? Either written here as a spec, or linked to in the Jaeger documentation? Second, is it possible to get a commitment from the Jaeger team that future work on remote sampling will be moved over to the OpenTelemetry Sampling SIG? In other words, we'd like to add these sampling mechanisms in order to provide legacy support for existing Jaeger backends, which will allow the Jaeger community to switch over to using the OTel clients. But there are some concerns about having two independent groups working on sampling mechanisms in an uncoordinated fashion. @pavolloffay @yurishkuro @jpkrohling for future sampling work, is the Jaeger community comfortable with specifying those mechanisms as part of an OpenTelemetry SIG? The main concern is that plugin-based sampling affects data collection in a transparent fashion, and could lead to users creating an incoherent configuration which results in data loss or inaccurate analysis. So we want to ensure that all of the sampling features available in OpenTelemetry are designed to be coherent for the broader set of features OpenTelemetry provides, such as metrics and exemplars. Jaeger is the only backend that OpenTelemetry is committed to supporting which is doing work in this field, so I just want to confirm that you all are fine with coordinating on this work going forwards via an OpenTelemetry SIG. |
@tedsuo sgtm |
Green light from me as well. |
In Jaeger community we don't have any plans on enhancing the remote sampler, in fact we want to sunset our clients/SDKs in favor of OTEL. |
The text already has a link to jaeger sampling spec (proto file). Is that enough? |
Signed-off-by: Pavol Loffay <[email protected]>
What is the process to get this merged? It has been approved by a couple of people. I would like to start working on the implementation. |
* Add jaeger remote sampler to SDK configuration Signed-off-by: Pavol Loffay <[email protected]> * Use = Signed-off-by: Pavol Loffay <[email protected]> * add title back Signed-off-by: Pavol Loffay <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Pavol Loffay [email protected]
Related to open-telemetry/opentelemetry-java#3368
This PR adds SDK configuration for
jaeger_remote_sampler
. The goal is to make the jaeger remote sampler available and configurable in auto-instrumentations.cc) @tedsuo @anuraaga