Distributed Tracing #12241
Labels: feature-request (Used for new features in Teleport; improvements to current should be #enhancements)
Comments
rosstimothy added the feature-request label on Apr 26, 2022
Tasks related to RFD 65 - at the end Server-side changes
This was referenced May 10, 2022
This was referenced Jun 6, 2022
rosstimothy added a commit that referenced this issue on Jun 11, 2022
Create spans for all public-facing TeleportClient, ProxyClient, and NodeClient methods. This makes spans easier to correlate and reason about when looking at `tsh` traces. As a result of creating spans, some additional context propagation is required as well to ensure that spans are linked properly. This also removes the unused `quiet` argument from `ConnectToCluster`. Its usage was not consistent among existing callers, and it was ignored, so it was removed to avoid confusion in future calls. #12241
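To make the span-per-client-method pattern above concrete, here is a minimal sketch using OpenTelemetry for Go; the method name and attributes are hypothetical and not Teleport's actual client code:

```go
package client

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// connectToNode is a hypothetical public-facing client method. Each call
// starts a span, records a few attributes, and passes the returned ctx to
// callees so their spans become children of this one.
func connectToNode(ctx context.Context, addr string) error {
	ctx, span := otel.Tracer("teleport/client").Start(ctx, "teleportClient/connectToNode")
	defer span.End()

	span.SetAttributes(attribute.String("node.addr", addr))

	// ... dial and authenticate, always passing ctx down the call chain ...
	_ = ctx
	return nil
}
```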
rosstimothy added a commit that referenced this issue on Jun 11, 2022
Adds a `trace.Tracer` to the `backend.Reporter` wrapper so that all `backend.Backend` implementations can be traced. Further instrumentation of each specific backend will be added at a later date to see how long each SQL query or call to DynamoDB/etcd takes within each backend operation. #12241
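As an illustration of the wrapper approach described above (a sketch only; the real `backend.Reporter` wraps a much larger interface), tracing a single backend operation might look like:

```go
package backend

import (
	"context"

	"go.opentelemetry.io/otel"
	oteltrace "go.opentelemetry.io/otel/trace"
)

// getter is a narrowed, illustrative stand-in for backend.Backend.
type getter interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
}

// tracedGetter wraps any backend so every Get shows up as a span,
// regardless of which concrete backend (etcd, DynamoDB, SQLite, ...) is used.
type tracedGetter struct {
	tracer oteltrace.Tracer
	inner  getter
}

func newTracedGetter(inner getter) *tracedGetter {
	return &tracedGetter{tracer: otel.Tracer("backend"), inner: inner}
}

func (t *tracedGetter) Get(ctx context.Context, key []byte) ([]byte, error) {
	ctx, span := t.tracer.Start(ctx, "backend/Get")
	defer span.End()
	return t.inner.Get(ctx, key)
}
```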
rosstimothy added a commit that referenced this issue on Jun 24, 2022
Add wrappers for ssh.Client, ssh.Session, ssh.Channel, ssh.ServerConn, and ssh.NewChannel that pass tracing context along with all SSH messages. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an envelope that includes the original payload AND any trace context. Servers now try to unmarshal all payloads into an envelope when processing messages. If an envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
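The envelope mechanism described above could be sketched roughly as follows; the types and field names are illustrative assumptions, not the exact wire format Teleport uses:

```go
package tracessh

import (
	"context"
	"encoding/json"

	"go.opentelemetry.io/otel/propagation"
)

// envelope carries the original SSH payload plus W3C trace-context headers.
type envelope struct {
	TraceContext map[string]string `json:"traceContext"`
	Payload      []byte            `json:"payload"`
}

// wrapPayload is the client side: inject the current span context and wrap
// the original payload before sending it as a request/channel payload.
func wrapPayload(ctx context.Context, payload []byte) ([]byte, error) {
	carrier := propagation.MapCarrier{}
	propagation.TraceContext{}.Inject(ctx, carrier)
	return json.Marshal(envelope{TraceContext: carrier, Payload: payload})
}

// unwrapPayload is the server side: if the payload is an envelope, extract
// the remote span context so the handler's span is linked to the client's
// trace; otherwise return the raw bytes untouched (an older client).
func unwrapPayload(ctx context.Context, raw []byte) (context.Context, []byte) {
	var e envelope
	if err := json.Unmarshal(raw, &e); err != nil || e.Payload == nil {
		return ctx, raw
	}
	ctx = propagation.TraceContext{}.Extract(ctx, propagation.MapCarrier(e.TraceContext))
	return ctx, e.Payload
}
```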
rosstimothy added a commit that referenced this issue on Jul 5, 2022
Adds a `trace.Tracer` to the `backend.Reporter` wrapper so that all `backend.Backend` implementations can be traced. Further instrumentation of each specific backend will be added at a later date to see how long each SQL query or call to DynamoDB/etcd takes within each backend operation. #12241
rosstimothy added a commit that referenced this issue on Jul 6, 2022
Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
rosstimothy added a commit that referenced this issue on Jul 6, 2022
Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
rosstimothy added a commit that referenced this issue on Jul 9, 2022
Tracing clients can detect if a server doesn't support tracing by checking for a trace.NotImplemented error in response to an UploadTraces request. Since the grpc.Conn used by the client is likely to be bound to that server for the duration of its life, it doesn't make sense to keep trying to forward traces. Instead, the client now remembers that a server doesn't support tracing and will drop any spans. Part of #12241
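A rough sketch of the "remember and drop" behavior described above; the commit checks gravitational/trace's NotImplemented error, while this sketch uses the equivalent gRPC Unimplemented status code and hypothetical types:

```go
package tracing

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// uploader is a stand-in for the gRPC client that forwards span batches.
type uploader interface {
	UploadTraces(ctx context.Context, batch []byte) error
}

// forwardingClient stops retrying once the upstream server has signalled
// that it does not implement the tracing service.
type forwardingClient struct {
	upstream     uploader
	notSupported atomic.Bool
}

func (c *forwardingClient) Export(ctx context.Context, batch []byte) error {
	if c.notSupported.Load() {
		return nil // server can't ingest spans; drop them silently
	}
	err := c.upstream.UploadTraces(ctx, batch)
	if status.Code(err) == codes.Unimplemented {
		c.notSupported.Store(true) // remember for the life of this connection
		return nil
	}
	return err
}
```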
rosstimothy added a commit that referenced this issue on Jul 28, 2022
Adds a wrapper around `ssh.Session` which injects tracing context in a similar manner to the `ssh.Client` wrapper. All usages of `ssh.Session` have now been replaced and have the appropriate `context.Context` passed along. Part of #12241
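For illustration, the `ssh.Session` wrapper pattern might look like the sketch below; `Run` is just one of the wrapped methods, and the tracer name is an assumption:

```go
package tracessh

import (
	"context"

	"go.opentelemetry.io/otel"
	"golang.org/x/crypto/ssh"
)

// Session wraps ssh.Session so callers supply a context.Context, which the
// plain golang.org/x/crypto/ssh API has no parameter for.
type Session struct {
	*ssh.Session
}

// Run records a span around the remote command; the injected trace context
// would travel to the server inside the tracing envelope described earlier.
func (s *Session) Run(ctx context.Context, cmd string) error {
	_, span := otel.Tracer("teleport/ssh").Start(ctx, "ssh/Run")
	defer span.End()
	return s.Session.Run(cmd)
}
```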
rosstimothy added a commit that referenced this issue on Jul 28, 2022
Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
rosstimothy added a commit that referenced this issue on Aug 1, 2022
Adds a `teleport.forwarded.for` attribute to all spans that are forwarded to the auth server. This allows consumers of the spans to identify where the spans are coming from and take action if needed. In some scenarios it may be desirable to drop forwarded spans along the collection process; tagging them provides a way for those consumers to identify them. It also allows for potentially identifying a malicious user that may be trying to spam the telemetry backend with spans. Part of #12241
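To illustrate the tagging described above, a sketch using the OTLP protobuf types; the function and its placement are assumptions, and only the `teleport.forwarded.for` key comes from the commit message:

```go
package forward

import (
	commonv1 "go.opentelemetry.io/proto/otlp/common/v1"
	resourcev1 "go.opentelemetry.io/proto/otlp/resource/v1"
	tracev1 "go.opentelemetry.io/proto/otlp/trace/v1"
)

// tagForwardedSpans stamps every forwarded resource with the identity of the
// uploader so downstream collectors can filter or audit forwarded spans.
func tagForwardedSpans(identity string, resourceSpans []*tracev1.ResourceSpans) {
	attr := &commonv1.KeyValue{
		Key: "teleport.forwarded.for",
		Value: &commonv1.AnyValue{
			Value: &commonv1.AnyValue_StringValue{StringValue: identity},
		},
	}
	for _, rs := range resourceSpans {
		if rs.Resource == nil {
			rs.Resource = &resourcev1.Resource{}
		}
		rs.Resource.Attributes = append(rs.Resource.Attributes, attr)
	}
}
```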
rosstimothy added a commit that referenced this issue on Aug 1, 2022
SSH request tracing (#14124): Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
rosstimothy added a commit that referenced this issue on Aug 2, 2022
* Tag forwarded spans with custom attributes. Adds a `teleport.forwarded.for` attribute to a resource or all spans that are forwarded to the auth server. This allows consumers of the spans to identify where the spans are coming from and take action if needed. In some scenarios it may be desirable to drop forwarded spans along the collection process; tagging them provides a way for those consumers to identify them. It also allows for potentially identifying a malicious user that may be trying to spam the telemetry backend with spans. Part of #12241
github-actions bot pushed a commit that referenced this issue on Aug 4, 2022
Adds a wrapper around `ssh.Session` which injects tracing context in a similar manner to the `ssh.Client` wrapper. All usages of `ssh.Session` have now been replaced and have the appropriate `context.Context` passed along. Part of #12241
reedloden pushed a commit that referenced this issue on Aug 10, 2022
Trace ssh sessions: Adds a wrapper around `ssh.Session` which injects tracing context in a similar manner to the `ssh.Client` wrapper. All usages of `ssh.Session` have now been replaced and have the appropriate `context.Context` passed along. Part of #12241
reedloden pushed a commit that referenced this issue on Aug 10, 2022
Tag forwarded spans with custom attributes (#14706): Adds a `teleport.forwarded.for` attribute to a resource or all spans that are forwarded to the auth server. This allows consumers of the spans to identify where the spans are coming from and take action if needed. In some scenarios it may be desirable to drop forwarded spans along the collection process; tagging them provides a way for those consumers to identify them. It also allows for potentially identifying a malicious user that may be trying to spam the telemetry backend with spans. Part of #12241
reedloden pushed a commit that referenced this issue on Aug 15, 2022
SSH request tracing: Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of #12241
reedloden pushed a commit that referenced this issue on Aug 15, 2022
Manually instrument `backend.Backend` (#13268): Adds a `trace.Tracer` to the `backend.Reporter` wrapper so that all `backend.Backend` implementations can be traced. Further instrumentation of each specific backend will be added at a later date to see how long each SQL query or call to DynamoDB/etcd takes within each backend operation. #12241
logand22 pushed a commit that referenced this issue on Aug 19, 2022
Tag forwarded spans with custom attributes (#14706): Adds a `teleport.forwarded.for` attribute to a resource or all spans that are forwarded to the auth server. This allows consumers of the spans to identify where the spans are coming from and take action if needed. In some scenarios it may be desirable to drop forwarded spans along the collection process; tagging them provides a way for those consumers to identify them. It also allows for potentially identifying a malicious user that may be trying to spam the telemetry backend with spans. Part of #12241
logand22 pushed a commit that referenced this issue on Aug 19, 2022
…#15480) Prevent forwarding traces to servers which don't support tracing (#14281): Tracing clients can detect if a server doesn't support tracing by checking for a trace.NotImplemented error in response to an UploadTraces request. Since the grpc.Conn used by the client is likely to be bound to that server for the duration of its life, it doesn't make sense to keep trying to forward traces. Instead, the client now remembers that a server doesn't support tracing and will drop any spans. Part of #12241
hydridity pushed a commit to hydridity/teleport that referenced this issue on Aug 26, 2022
SSH request tracing (gravitational#14124): Add tracing support for SSH global requests and channels. Wrappers for `ssh.Client`, `ssh.Channel`, and `ssh.NewChannel` provide a mechanism for tracing context to be propagated via a `context.Context`. To maintain backwards compatibility, the ssh.Client wrapper tries to open a TracingChannel when constructed. Any servers that don't support tracing will reject the unknown channel. The client will only provide tracing context to servers which do NOT reject the TracingChannel request. To pass tracing context along, all SSH payloads are wrapped in an Envelope that includes the original payload AND any tracing context. Servers now try to unmarshal all payloads into said Envelope when processing messages. If an Envelope is provided, a new span will be created and the original payload will be passed along to handlers. Part of gravitational#12241
hydridity pushed a commit to hydridity/teleport that referenced this issue on Aug 26, 2022
Manually instrument `backend.Backend` (gravitational#13268): Adds a `trace.Tracer` to the `backend.Reporter` wrapper so that all `backend.Backend` implementations can be traced. Further instrumentation of each specific backend will be added at a later date to see how long each SQL query or call to DynamoDB/etcd takes within each backend operation. gravitational#12241
hydridity pushed a commit to hydridity/teleport that referenced this issue on Aug 26, 2022
Allow traces to be exported to files (gravitational#14332): Adds support for exporting traces to a file. While not recommended for production use, some folks may need to collect traces without having any telemetry infrastructure in place to store them. To do so, they can simply update their tracing_service to point to a directory, as seen in the following config snippet:

```yaml
tracing_service:
  exporter_url: "file:///var/lib/teleport/traces"
```

The file will contain one JSON-encoded OTLP trace per line. Files written by the exporter all follow the naming convention `<unix_timestamp>-<random_number>.trace`. To prevent a trace file from growing unbounded, there is a default limit of 100MB, after which the file is rotated. Users can adjust the file size limit by adding a query parameter to the exporter URL, like `?limit=12345`. Part of gravitational#12241
hydridity pushed a commit to hydridity/teleport that referenced this issue on Aug 26, 2022
…gravitational#15479) Prevent forwarding traces to servers which don't support tracing (gravitational#14281): Tracing clients can detect if a server doesn't support tracing by checking for a trace.NotImplemented error in response to an UploadTraces request. Since the grpc.Conn used by the client is likely to be bound to that server for the duration of its life, it doesn't make sense to keep trying to forward traces. Instead, the client now remembers that a server doesn't support tracing and will drop any spans. Part of gravitational#12241
hydridity pushed a commit to hydridity/teleport that referenced this issue on Aug 26, 2022
Tag forwarded spans with custom attributes (gravitational#14706): Adds a `teleport.forwarded.for` attribute to a resource or all spans that are forwarded to the auth server. This allows consumers of the spans to identify where the spans are coming from and take action if needed. In some scenarios it may be desirable to drop forwarded spans along the collection process; tagging them provides a way for those consumers to identify them. It also allows for potentially identifying a malicious user that may be trying to spam the telemetry backend with spans. Part of gravitational#12241
hatched pushed a commit that referenced this issue on Aug 30, 2022
Allow traces to be exported to files (#14332): Adds support for exporting traces to a file. While not recommended for production use, some folks may need to collect traces without having any telemetry infrastructure in place to store them. To do so, they can simply update their tracing_service to point to a directory, as seen in the following config snippet:

```yaml
tracing_service:
  exporter_url: "file:///var/lib/teleport/traces"
```

The file will contain one JSON-encoded OTLP trace per line. Files written by the exporter all follow the naming convention `<unix_timestamp>-<random_number>.trace`. To prevent a trace file from growing unbounded, there is a default limit of 100MB, after which the file is rotated. Users can adjust the file size limit by adding a query parameter to the exporter URL, like `?limit=12345`. Part of #12241
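Assuming the `limit` query parameter described above is simply appended to the same exporter URL, the tuned configuration might look like this (the value reuses the example from the commit message and is illustrative only):

```yaml
tracing_service:
  exporter_url: "file:///var/lib/teleport/traces?limit=12345"
```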
rosstimothy added a commit that referenced this issue on Sep 14, 2022
The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
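For context on the new Distributed Tracing docs page, a minimal `tracing_service` configuration might look like the snippet below; treat it as an unverified sketch (the collector address is a placeholder), and take the exact field names and defaults from the docs page itself:

```yaml
tracing_service:
  enabled: yes
  # OTLP gRPC collector endpoint; a file:// URL also works, as shown earlier.
  exporter_url: grpc://collector.example.com:4317
  # Sample every span (the rate is expressed per million).
  sampling_rate_per_million: 1000000
```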
rosstimothy added a commit that referenced this issue on Sep 19, 2022
The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
rosstimothy added a commit that referenced this issue on Sep 29, 2022
The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
rosstimothy added a commit that referenced this issue on Oct 10, 2022
The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
rosstimothy added a commit that referenced this issue on Oct 14, 2022
* Enhance diagnostics docs: The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
github-actions bot pushed a commit that referenced this issue on Oct 14, 2022
The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
rosstimothy added a commit that referenced this issue on Oct 14, 2022
* Enhance diagnostics docs: The diagnostics docs were all in one page (`setup/reference/metrics.mdx`), which made things a bit hard to find and wasn't conducive to adding new content. Diagnostics information is now located at `setup/diagnostics`, with each section from the original `metrics.mdx` doc now having its own dedicated page. The section will now show up in the navbar at "Setup" > "Monitoring Your Cluster". The content of the profiling page is expanded to include information from https://github.com/gravitational/teleport/blob/740d184d1cfc69ae2e96c50ee738b13884fb232b/assets/monitoring/README.md#low-level-monitoring to illustrate what the different profile types are, the information that they capture, and how to retrieve them. A new Distributed Tracing page is also added to instruct users on how to set up the `tracing_service` to collect and export spans (the last open item for #12241).
What would you like Teleport to do?
Instrument and export distributed tracing spans to make Teleport easier to monitor, manage, and troubleshoot.
This is an umbrella issue for the distributed tracing work including RFD, backend and CLI changes.
What problem does this solve?
It helps identify areas of latency, gives a better picture of how components interact across service boundaries, and aids in collecting debug information.
If a workaround exists, please include it.
Try to piece together log statements.