Add tracing service and configuration #12699

Merged: 7 commits, May 26, 2022

rosstimothy
Contributor

Provides a new tracing configuration block that controls
whether and how spans are exported to a telemetry backend.
In the example below, the tracing service is enabled and
exports spans to collector.example.com:4317 via gRPC with
mTLS enabled.

tracing_service:
  enabled: yes
  exporter_url: collector.example.com:4317
  sampling_rate_per_million: 1000000
  ca_certs:
    - /certs/rootCA.pem
  keypairs:
    - key_file:  /certs/example.com-client-key.pem
      cert_file: /certs/example.com-client.pem

This configuration is consumed by the TeleportProcess and passed
to tracing.NewTraceProvider, which sets up the OpenTelemetry
Exporter, TracerProvider, Propagator, and Sampler. For spans to
be exported, the tracing_service must be enabled and have a
sampling_rate_per_million value greater than 0.
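
As a rough illustration of the export condition described above, here is a minimal Go sketch. The `tracingConfig` type and its methods are hypothetical names used only for this example, not the code in this PR, and the conversion from a per-million rate to a sampling ratio is an assumption about how the value reaches the sampler.

```go
package main

import "fmt"

// tracingConfig is a hypothetical mirror of a few fields of the
// tracing_service YAML block; it is not the type used by Teleport.
type tracingConfig struct {
	Enabled                bool
	ExporterURL            string
	SamplingRatePerMillion int
}

const million = 1_000_000

// exportsSpans reports whether any spans can be exported: the service
// must be enabled and the sampling rate must be greater than zero.
func (c tracingConfig) exportsSpans() bool {
	return c.Enabled && c.SamplingRatePerMillion > 0
}

// samplingRatio converts the per-million rate into a fraction in [0, 1],
// clamping out-of-range values (the clamping is an assumption).
func (c tracingConfig) samplingRatio() float64 {
	switch {
	case c.SamplingRatePerMillion <= 0:
		return 0
	case c.SamplingRatePerMillion >= million:
		return 1
	}
	return float64(c.SamplingRatePerMillion) / million
}

func main() {
	cfg := tracingConfig{
		Enabled:                true,
		ExporterURL:            "collector.example.com:4317",
		SamplingRatePerMillion: 1000000,
	}
	fmt.Println("exports spans:", cfg.exportsSpans())
	fmt.Println("sampling ratio:", cfg.samplingRatio())
}
```

With the example configuration, every trace is sampled; setting `sampling_rate_per_million` to 0 or `enabled` to `no` would leave the sketch reporting that nothing is exported.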

#12241

rosstimothy force-pushed the tross/tracing_config branch 2 times, most recently from 5d81950 to 8277251 on May 17, 2022 16:22
rosstimothy marked this pull request as ready for review on May 17, 2022 17:14
lib/observability/tracing/tracing.go (outdated)
return trace.BadParameter("exporter URL cannot be empty")
}

if !strings.Contains(c.ExporterURL, "://") {
Contributor

I'd prefer a proper URL parse and then a check on Scheme here.

Contributor Author

I initially tried doing this; however, url.Parse("localhost:4317") results in a URL with the Scheme set to localhost (https://go.dev/play/p/VoYSAAhscPQ). Open to other ideas if you have them.

Contributor

We could either try to parse it as an address first and then as a URL on error, or we could just always require a URL, and whatever error happens ("unsupported schema localhost:") is the user's fault.

Contributor Author

My goal here was to have an empty scheme default to grpc, which seems to be how most other tracing exporters operate. Forcing users to always provide one of grpc, http, or https would allow us to just do a url.Parse here, but it might annoy users. I suppose that would be fine as long as I note in the yet-to-exist tracing docs that grpc:// is required.

Contributor

To identify that no Scheme was provided, we could check if the Host is empty after parsing the URL (https://go.dev/play/p/K_m82MHG1OO). As the documentation states:

URLs that do not start with a slash after the scheme are interpreted as:
scheme:opaque[?query][#fragment]

Once we know that no scheme was provided, defaulting to the most commonly used scheme is valuable, but the default should also be documented.
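
The parsing behavior discussed in this thread can be sketched as follows. `normalizeExporterURL` is a hypothetical helper, not the code merged in this PR; it assumes that a missing scheme should default to grpc and that only grpc, http, and https are accepted.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// normalizeExporterURL is a hypothetical helper that defaults a bare
// host:port to the grpc scheme and rejects unsupported schemes.
func normalizeExporterURL(raw string) (*url.URL, error) {
	if raw == "" {
		return nil, errors.New("exporter URL cannot be empty")
	}

	u, err := url.Parse(raw)
	// url.Parse("localhost:4317") succeeds but yields Scheme "localhost"
	// and an empty Host; inputs such as "127.0.0.1:4317" may fail to
	// parse at all. Both cases are treated as "no scheme provided".
	if err != nil || u.Host == "" {
		u, err = url.Parse("grpc://" + raw)
		if err != nil {
			return nil, fmt.Errorf("invalid exporter URL %q: %w", raw, err)
		}
	}

	switch u.Scheme {
	case "grpc", "http", "https":
		return u, nil
	default:
		return nil, fmt.Errorf("unsupported scheme %q in exporter URL %q", u.Scheme, raw)
	}
}

func main() {
	inputs := []string{
		"localhost:4317",
		"127.0.0.1:4317",
		"https://collector.example.com:4318",
		"ftp://collector.example.com",
	}
	for _, in := range inputs {
		u, err := normalizeExporterURL(in)
		if err != nil {
			fmt.Printf("%-40s error: %v\n", in, err)
			continue
		}
		fmt.Printf("%-40s scheme=%s host=%s\n", in, u.Scheme, u.Host)
	}
}
```

The empty-Host check follows the suggestion above; the retry with a `grpc://` prefix is one possible way to apply the default, and whichever default is chosen would still need to be documented.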

lib/observability/tracing/tracing.go (outdated)
sdktrace.WithResource(res),
sdktrace.WithSpanProcessor(sdktrace.NewBatchSpanProcessor(exporter)),
)}
otel.SetTracerProvider(provider)
Contributor

What happens during tests or during a CA rotation, when the global provider might get overwritten?

Contributor Author

Hmm, good question. I will test this and possibly add a check to see if the global provider is already configured correctly.

Contributor Author

In most cases, the global provider should only be used at creation time to provide a component-specific tracer.

Contributor

Could we perhaps run the tracing configuration earlier than other components, and store this as a variable in TeleportProcess rather than storing it globally? Or is otel really only supposed to work with a single global exporter for the whole program?

Contributor Author (rosstimothy, May 25, 2022)

I think we should do both. The default global provider is a noop that won't record any spans. I'm explicitly setting it here so that, in the event a provider can't be supplied as a dependency, we can fall back to retrieving it from the global provider. That said, I hope we can avoid using the global provider as much as possible.
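
The "do both" approach from this thread might look roughly like the sketch below. It deliberately uses only the standard library: `tracerProvider`, `noopProvider`, `exportingProvider`, `setGlobalProvider`, and `newComponent` are hypothetical stand-ins that model the pattern, not the OpenTelemetry API or Teleport's code.

```go
package main

import (
	"fmt"
	"sync"
)

// tracerProvider is a hypothetical stand-in for a tracing provider.
type tracerProvider interface {
	Name() string
}

// noopProvider models a default provider that records nothing.
type noopProvider struct{}

func (noopProvider) Name() string { return "noop" }

// exportingProvider models a provider configured from tracing_service.
type exportingProvider struct{}

func (exportingProvider) Name() string { return "exporting" }

var (
	globalMu sync.RWMutex
	global   tracerProvider = noopProvider{} // the default is a noop
)

// setGlobalProvider replaces the process-wide fallback provider.
func setGlobalProvider(p tracerProvider) {
	globalMu.Lock()
	defer globalMu.Unlock()
	global = p
}

// globalProvider returns the current process-wide fallback provider.
func globalProvider() tracerProvider {
	globalMu.RLock()
	defer globalMu.RUnlock()
	return global
}

// component holds the provider it resolved at creation time.
type component struct {
	provider tracerProvider
}

// newComponent prefers an injected provider and only falls back to the
// global one when no dependency was supplied.
func newComponent(p tracerProvider) *component {
	if p == nil {
		p = globalProvider()
	}
	return &component{provider: p}
}

func main() {
	early := newComponent(nil) // created before tracing is configured
	setGlobalProvider(exportingProvider{})
	late := newComponent(nil)
	injected := newComponent(exportingProvider{})

	fmt.Println(early.provider.Name(), late.provider.Name(), injected.provider.Name())
}
```

In this model, a component created before the global provider is set keeps the noop, which is one reason to run tracing configuration early and pass the provider explicitly where possible.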

rosstimothy force-pushed the tross/tracing_config branch 2 times, most recently from e90bff2 to 69376c7 on May 23, 2022 20:04
lib/service/cfg.go (outdated)
Comment on lines 974 to 975
// KeyPairs are the key and certificate pairs that the tracing service will.
// use for mTLS.
Contributor

Suggested change
- // KeyPairs are the key and certificate pairs that the tracing service will.
- // use for mTLS.
+ // KeyPairs are the paths for key and certificate pairs that the tracing service will use for outbound TLS connections.

Contributor

Should we ask Cloud if they'd like a way to specify these inline in the configuration file?

Contributor Author

Cloud actually designed and implemented the metrics service, and since this is a direct copy of that, I don't think they will have any objections. I can ping them to be certain, though.
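
For context on what path-based `ca_certs` and `keypairs` imply, here is a minimal sketch of building an outbound TLS configuration from files on disk. `keyPair` and `buildTLSConfig` are hypothetical names for illustration, not Teleport's implementation, and the minimum TLS version is an assumption.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// keyPair is a hypothetical mirror of one entry under keypairs.
type keyPair struct {
	KeyFile  string
	CertFile string
}

// buildTLSConfig loads CA certificates and client key pairs from the
// given paths and returns a client-side TLS configuration.
func buildTLSConfig(caCerts []string, keyPairs []keyPair) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12} // assumed minimum

	if len(caCerts) > 0 {
		pool := x509.NewCertPool()
		for _, path := range caCerts {
			pemBytes, err := os.ReadFile(path)
			if err != nil {
				return nil, fmt.Errorf("reading CA certificate %q: %w", path, err)
			}
			if !pool.AppendCertsFromPEM(pemBytes) {
				return nil, fmt.Errorf("no PEM certificates found in %q", path)
			}
		}
		cfg.RootCAs = pool
	}

	for _, kp := range keyPairs {
		// LoadX509KeyPair takes the certificate path first, then the key.
		cert, err := tls.LoadX509KeyPair(kp.CertFile, kp.KeyFile)
		if err != nil {
			return nil, fmt.Errorf("loading key pair %q: %w", kp.CertFile, err)
		}
		cfg.Certificates = append(cfg.Certificates, cert)
	}

	return cfg, nil
}

func main() {
	_, err := buildTLSConfig(
		[]string{"/certs/rootCA.pem"},
		[]keyPair{{
			KeyFile:  "/certs/example.com-client-key.pem",
			CertFile: "/certs/example.com-client.pem",
		}},
	)
	if err != nil {
		fmt.Println("TLS configuration failed:", err)
		return
	}
	fmt.Println("TLS configuration loaded")
}
```

Supporting inline certificates, as raised above, would mean accepting PEM content directly instead of reading it with `os.ReadFile`.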

log.Info("Shutting down immediately.")
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
warnOnErr(provider.Shutdown(ctx), log)
Contributor

FYI, the scram shutdown (on SIGINT/SIGTERM) will not wait for services to shut down, so it's very likely that we won't have the full 5 seconds to perform the provider shutdown. Is that going to be a problem?

Contributor Author

Any queued spans that haven't been flushed yet will be lost.

rosstimothy force-pushed the tross/tracing_config branch from 69376c7 to b8332b7 on May 25, 2022 15:29
rosstimothy enabled auto-merge (squash) on May 26, 2022 22:35
rosstimothy merged commit 5b4a18b into master on May 26, 2022
rosstimothy added a commit that referenced this pull request Jul 5, 2022
* Add tracing service and configuration
hydridity pushed a commit to hydridity/teleport that referenced this pull request Aug 26, 2022
Add tracing service and configuration (gravitational#12699)
rosstimothy deleted the tross/tracing_config branch on January 5, 2023 20:48
3 participants