Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit error handling in trace #371

Merged
merged 19 commits into from
Nov 30, 2020

Conversation

TommyCpp
Copy link
Contributor

@TommyCpp TommyCpp commented Nov 26, 2020

Current approaches

Currently, most of the errors in trace have been returned via Box<dyn std::error::Error + Send + Sync + 'static>. This approach gives users good flexibility to return basically any errors. But it totally depends on user to return meaningful messages in errors to help end user to understand the issue.

Proposed approaches

As discussed in #286, we decided to use thiserror library wide to allow multiple level of errors throughout the whole process to send span.

  1. We define a TraceError enum to represent any possible errors from trace module. It provides some predefined variantion, and we should add more variantions as needed. It also provides two generic variantions Other and Boxed. Other could hold any string and Boxed could hold any Box<dyn std::error::Error + Send + Sync + 'static>. It allows the user to adapt their currently implementation easily.

  2. We define a ExporterError trait, which contains one method to return the name of the exporter. Exporter implementations are engouraged to define their own Error type and implement ExporterError. This allow any exporter implementations to pass their custom errors back to SDK. This should also help narrow down which exporter causes the error when errors being returned.

  3. Change function signature to return TraceError or ExporterError when possible. And use global error handler to handle the rest of errors

It could help propagate error from upstream components.
Export error will consist two parts.

The first one is two variant in TraceError. One for timed out, the other for all other errors.

The other part is a trait ExportError, which extended the Error trait by asking user to provide the name of the exporter.

Users should implement their own Error for their exporters and implement ExportError on those errors.

(todo): Add errors for all exporters
@codecov
Copy link

codecov bot commented Nov 26, 2020

Codecov Report

Merging #371 (47836ea) into master (ac8e4e5) will decrease coverage by 0.50%.
The diff coverage is 19.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #371      +/-   ##
==========================================
- Coverage   54.00%   53.50%   -0.51%     
==========================================
  Files          71       72       +1     
  Lines        6051     6117      +66     
==========================================
+ Hits         3268     3273       +5     
- Misses       2783     2844      +61     
Impacted Files Coverage Δ
opentelemetry-otlp/src/span.rs 2.67% <0.00%> (ø)
opentelemetry/src/api/metrics/mod.rs 0.00% <0.00%> (ø)
opentelemetry/src/api/mod.rs 0.00% <0.00%> (ø)
opentelemetry/src/api/trace/mod.rs 0.00% <0.00%> (ø)
opentelemetry/src/exporter/trace/mod.rs 56.52% <0.00%> (-7.68%) ⬇️
opentelemetry/src/exporter/trace/stdout.rs 17.64% <0.00%> (-5.43%) ⬇️
opentelemetry/src/global/error_handler.rs 0.00% <0.00%> (ø)
opentelemetry/src/sdk/metrics/processors/basic.rs 0.00% <0.00%> (ø)
opentelemetry/src/testing/trace.rs 65.06% <31.57%> (-10.70%) ⬇️
opentelemetry/src/sdk/trace/span_processor.rs 78.36% <72.22%> (-0.49%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac8e4e5...47836ea. Read the comment docs.

@TommyCpp TommyCpp force-pushed the error-handling branch 2 times, most recently from 25ae3ec to 9c1952d Compare November 26, 2020 23:16
Copy link
Member

@jtescher jtescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start, couple of thoughts as the draft progresses

examples/basic-otlp/src/main.rs Outdated Show resolved Hide resolved
opentelemetry-jaeger/src/lib.rs Outdated Show resolved Hide resolved
opentelemetry-otlp/src/lib.rs Outdated Show resolved Hide resolved
opentelemetry/src/api/mod.rs Outdated Show resolved Hide resolved
opentelemetry/src/api/trace/mod.rs Outdated Show resolved Hide resolved
opentelemetry/src/exporter/trace/mod.rs Outdated Show resolved Hide resolved
opentelemetry/src/exporter/trace/mod.rs Outdated Show resolved Hide resolved
opentelemetry/src/exporter/trace/mod.rs Outdated Show resolved Hide resolved
opentelemetry/src/exporter/trace/stdout.rs Outdated Show resolved Hide resolved
* Fix typo and naming
* Use thiserror for stdout exporter and datadog exporter
* Implement Error for opentelemetry::Error, renamed from OpenTelemetryError
…try-rust into error-handling

� Conflicts:
�	opentelemetry-jaeger/Cargo.toml
* Add ExportError in MetricsError.
It can help unify the export error among trace, metric and logging in the future.

* Merge TraceError::Other into TraceError::Boxed.
Those two seems to have a lot in common. Created a custom wrap type for string and metric those two variants.

* Remove default implement for exporter_name.
It should have used in case like http client failure. But it could also encourage ppl not implement it at all.
@TommyCpp TommyCpp force-pushed the error-handling branch 3 times, most recently from 6914268 to 01c8265 Compare November 29, 2020 03:12
* Remove HttpClientError
Copy link
Member

@jtescher jtescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! only a couple of small comments, also looks like the msrv tests are now broken by an async std dependency, maybe we could feature flag those tests so they only run if async-std feature is used?

opentelemetry-jaeger/src/lib.rs Outdated Show resolved Hide resolved
opentelemetry/src/sdk/trace/span_processor.rs Outdated Show resolved Hide resolved
@jtescher
Copy link
Member

jtescher commented Nov 29, 2020

Or maybe just remove async-std from msrv job in

cargo test --verbose --manifest-path=opentelemetry/Cargo.toml --features trace,metrics,serialize,tokio,async-std,serde,http,tonic,reqwest &&
if that fixes it

Copy link
Member

@jtescher jtescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits left but looking good, ready whenever out of draft state

opentelemetry-contrib/src/trace/exporter/datadog/mod.rs Outdated Show resolved Hide resolved
opentelemetry/Cargo.toml Outdated Show resolved Hide resolved
@TommyCpp TommyCpp marked this pull request as ready for review November 30, 2020 01:59
@TommyCpp TommyCpp requested a review from a team November 30, 2020 01:59
Copy link
Member

@jtescher jtescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@jtescher jtescher merged commit 64bc7ba into open-telemetry:master Nov 30, 2020
@TommyCpp TommyCpp mentioned this pull request Nov 30, 2020
@TommyCpp TommyCpp deleted the error-handling branch December 1, 2020 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants