Issue 2678: Span PoC - direction/feedback #2697
Conversation
Initial spike of one approach to adding tracing to hyper. Feedback sought on:

- use of feature flags
- naming/layout
- suitability of examples as initial test cases

Signed-off-by: Mark van de Vyver <[email protected]>
i have some concerns about the current approach here. i don't really understand why it's necessary for `hyper` itself to implement its own `Layer`s for recording `tracing` data. as far as i can tell, these don't implement any `hyper`-specific behaviors, and are better off implemented outside of `hyper`.

if there is a compelling reason for `hyper` to provide `hyper`-specific `tracing` layers that implement behaviors tightly coupled with `hyper`, I think those probably ought to go in a separate crate, rather than within the `hyper` crate. since the `tracing` facade allows a loose coupling between instrumentation and subscribers, there shouldn't be any reason for any `hyper` layers to need to access `hyper`'s internals.

also, if there is some reason these layers need to be defined within `hyper` itself...they almost certainly should be enabled by a separate feature flag than the `tracing` spans and events exposed by `hyper`. if i want to use `tracing-subscriber::fmt` to record those spans and events, i might want to enable that feature flag...but i don't need any of the `hyper` layers and their (often heavyweight) dependencies. there should be a separate feature flag for instrumentation in hyper.
```toml
# Cargo.toml, @@ -108,6 +119,21 @@
tcp = [
    "tokio/time",
]

# Tracing layers
layers = [
```
why is the feature flag to enable `tracing` support called `layers`? i would expect it to be called `tracing` or something...
> why is the feature flag to enable `tracing` support called `layers`? i would expect it to be called `tracing` or something...
I believe that is where I started - again from memory - you can't have a feature name that is congruent to a crate name - but I'm not a Rust expert
> you can't have a feature name that is congruent to a crate name - but I'm not a Rust expert
An optional dependency is equivalent to a feature flag that only enables one dependency. I think that `tracing` should just be an optional dependency and the instrumentation points should all be flagged with `#[cfg(feature = "tracing")]`. If we do end up exposing other functionality, like the OpenTelemetry layer you've implemented, it should be under a separate feature flag.
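A minimal sketch of that shape - assuming `tracing = { version = "0.1", optional = true }` in `Cargo.toml`, so the optional dependency doubles as the feature flag; the function here is illustrative, not hyper's actual code:

```rust
// Illustrative instrumentation point -- not hyper's actual code.
// With `tracing` as an optional dependency, `feature = "tracing"`
// exists automatically and compiles the event in only when enabled.
fn encode_head() {
    #[cfg(feature = "tracing")]
    tracing::trace!("encoding message head");

    // ...the actual encoding work would go here...
}

fn main() {
    encode_head();
}
```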
> the instrumentation points should all be flagged with `#[cfg(feature = "tracing")]`
What do you think of using two flags, `debug` and `trace`, or some such, where the use cases are distinct? Use `debug` when you are investigating the integration of your app and hyper. Use `trace` when you are investigating hyper internals. The debug spans would then be constrained to around only public functions. Not sure if that type of distinction can work out?
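A sketch of how that split might look, assuming the hypothetical `debug` and `trace` feature flags proposed above:

```rust
// Hypothetical split: `debug` gates spans around public entry points,
// `trace` gates events inside the internals. Feature names are the
// proposal above, not existing hyper features.
pub fn send_request() {
    #[cfg(feature = "debug")]
    let _guard = tracing::debug_span!("send_request").entered();

    dispatch();
}

fn dispatch() {
    #[cfg(feature = "trace")]
    tracing::trace!("dispatching request to the connection pool");
}

fn main() {
    send_request();
}
```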
> If we do end up exposing other functionality, like the OpenTelemetry layer you've implemented, it should be under a separate feature flag.
Agreed, or probably better to have a separate crate
```toml
# Cargo.toml, @@ -42,7 +41,18 @@
want = "0.3"

# Optional

libc = { version = "0.2", optional = true }
log = { version = "0.4.14", optional = true }
```
why is there now a dependency on the `log` crate?
> why is there now a dependency on the `log` crate?
It is required to turn off tracing in an app that uses hyper (the example). Also, from memory, I believe the dependency was already there, just not surfaced/explicit. I may be wrong in that recollection.
"log", | ||
"opentelemetry", | ||
"opentelemetry-jaeger", | ||
"serde_json", | ||
"reqwest", | ||
"tracing", | ||
"tracing-core", | ||
"tracing-log", | ||
"tracing-opentelemetry", | ||
"tracing-subscriber", | ||
"tracing-tree", |
i don't think that the feature that enables `tracing` instrumentation in Hyper should enable all of these dependencies. suppose I want to log `tracing` spans from hyper using `tracing-subscriber`, in the plaintext format. in that case, i would want to enable `tracing` instrumentation in `hyper`...but i would not want `hyper` to suddenly depend on `log`, `serde_json`, `reqwest`, `tracing-opentelemetry`, `opentelemetry`, `opentelemetry-jaeger`, and `tracing-tree`.
> i don't think that the feature that enables `tracing` instrumentation in Hyper should enable all of these dependencies. [...]
Summarizing your response to the question of introducing a feature: no objection; in fact, please consider features that allow more fine-grained control. Accurate summary?
```toml
[[example]]
name = "client_json_tracing_json"
path = "examples/client_json_tracing_json.rs"
required-features = ["full",
    "tracing/max_level_trace",
]

[[example]]
name = "client_json_tracing_off"
path = "examples/client_json_tracing_off.rs"
required-features = ["full",
    "tracing/max_level_off",
    "log/max_level_off",
]

[[example]]
name = "client_json_tracing_otel"
path = "examples/client_json_tracing_otel.rs"
required-features = ["full",
    "tracing/max_level_trace",
]
```
take it or leave it: i think it might be worth putting all the `tracing` examples in an `examples/tracing` directory, so it's clear they're all related to various `tracing`-related configurations?
> i think it might be worth putting all the `tracing` examples in an `examples/tracing` directory [...]
Off-topic, but I'll take it into account closer to the time a final PR is ready.
```rust
let root_span = tracing::span!(tracing::Level::INFO, "root_span_echo").entered();
root_span.in_scope(|| async {
    // Log a `tracing` "event".
    info!(status = true, answer = 42, message = "first event");

    let url = "http://jsonplaceholder.typicode.com/users".parse().unwrap();
    let users = fetch_json(url).await.expect("Vector of user data");
    // print users
    println!("users: {:#?}", users);

    // print the sum of ids
    let sum = users.iter().fold(0, |acc, user| acc + user.id);
    println!("sum of ids: {}", sum);

    // ...but, it can also be exited explicitly, returning the `Span`
    // struct:
    //let _root_span = root_span.exit();
}).await;
```
this is incorrect. `Span::in_scope` here is not instrumenting the `async` block, but the closure that returns it...which, in this case, doesn't contain any `tracing` events, as it immediately returns the `async` block. So, calling `in_scope` here is not doing anything. instead, these events occur within `root_span`'s scope because of the `Span::entered` call prior to the closure...but `entered` should not be used in asynchronous code. see the documentation for details on why.
instead, this should be:

```rust
use tracing::instrument::Instrument;

let root_span = tracing::span!(tracing::Level::INFO, "root_span_echo");
async {
    // Log a `tracing` "event".
    info!(status = true, answer = 42, message = "first event");

    let url = "http://jsonplaceholder.typicode.com/users".parse().unwrap();
    let users = fetch_json(url).await.expect("Vector of user data");
    // print users
    println!("users: {:#?}", users);

    // print the sum of ids
    let sum = users.iter().fold(0, |acc, user| acc + user.id);
    println!("sum of ids: {}", sum);
}
.instrument(root_span)
.await;
```
Thanks for the correction. I think this makes the point that examples are required - ideally someone should be able to get tracing without having to become an expert in the tracing stack - 80/20 rule - some standard boilerplate setup should suffice for 80%+ of use cases.
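For reference, the sort of boilerplate meant here - assuming plain `tracing-subscriber` with its default features - is tiny:

```rust
use tracing::info;

fn main() {
    // The one-line default subscriber: plaintext output to stdout.
    tracing_subscriber::fmt::init();

    info!("spans and events are now visible without further setup");
}
```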
```rust
tracing::Span::current()
    .record("http.method", &format!("{:?}", msg.req_method).as_str());
tracing::Span::current().record(
```
it's probably more efficient to avoid getting the current span multiple times --- `Span::current` performs a thread-local storage access. prefer:

```rust
let span = tracing::Span::current();
span.record("http.method", &format!("{:?}", msg.req_method).as_str());
span.record(
```
also, we should record the method as a `Debug` trait object, in order to avoid allocating a string here:

```rust
let span = tracing::Span::current();
span.record("http.method", &msg.req_method as &dyn std::fmt::Debug);
span.record(
```
> also, we should record the method as a `Debug` trait object, in order to avoid allocating a string here:

Thanks, that is way beyond my ken at this point.
.record("http.method", &format!("{:?}", msg.req_method).as_str()); | ||
tracing::Span::current().record( | ||
"http.status_code", | ||
&format!("{:?}", msg.head.subject).as_str(), |
&format!("{:?}", msg.head.subject).as_str(), | |
&msg.head.subject as &dyn std::fmt::Debug |
@hawkw I've tried these suggestions with no luck:

```rust
span.as_ref()
    .unwrap()
    .record("http.method", &msg.req_method as &dyn std::fmt::Debug);
```

gives this error:

```text
the size for values of type `dyn std::fmt::Debug` cannot be known at compilation time
the trait `Sized` is not implemented for `dyn std::fmt::Debug`
```

I tried some suggestions by David Barsky (via Discord), which give different errors. I've also tried umpteen variations I'd come up with in partnership with the compiler... which I won't bore you with.

Consequently, the follow-up PR will use what I have here, pending a working alternative.
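For what it's worth, one variant that should compile on tracing 0.1 is the `tracing::field::debug` wrapper, which records via the `Debug` impl without allocating a string or needing the unsized cast (sketch, with a stand-in method type):

```rust
use tracing::field;

// Stand-in for the type of `msg.req_method`; anything implementing Debug.
#[derive(Debug)]
struct Method(&'static str);

fn main() {
    tracing_subscriber::fmt::init();

    // Declare the field as Empty so it can be recorded later;
    // record() on a field the span never declared is silently dropped.
    let span = tracing::info_span!("encode", http.method = field::Empty);

    let method = Method("GET");
    // field::debug wraps any T: Debug as a tracing Value -- no String
    // allocation, no `&dyn Debug` cast.
    span.record("http.method", &field::debug(&method));
}
```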
From memory... this is what led me to add layers, likely incorrectly. I'd come across some blogs that made me think any logic around value processing belonged in a layer.
For what it is worth: I've removed all layers, and if they are appropriate, we'd need to discuss where they live - a proc-macro-attribute crate doesn't appear to allow modules to be exported - so they don't/can't live in the proc-macro crate I've created as part of the follow-up PR.
"http.status_code", | ||
&format!("{:?}", msg.head.subject).as_str(), | ||
); | ||
trace!("Server::encode body={:?}", msg.body); |
prefer structured fields:

```diff
-trace!("Server::encode body={:?}", msg.body);
+trace!(body = ?msg.body);
```
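As context for the `?` sigil: it is `tracing`'s shorthand for capturing a field with its `Debug` impl (`%` is the `Display` equivalent). A tiny self-contained illustration:

```rust
fn main() {
    tracing_subscriber::fmt::init();

    let body = vec![1u8, 2, 3];
    // `?body` records the field via its Debug impl...
    tracing::info!(body = ?body);
    // ...which is equivalent to wrapping it explicitly:
    tracing::info!(body = tracing::field::debug(&body));
}
```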
> prefer structured fields:

I'm pretty much leaving existing code unchanged where it is not fully replaced by some span/event structure. If I were to propose changing this, it would be to store the byte count in one of the span fields.
On further reflection I've followed this suggestion - if the proposed approach in the PR gets rejected it won't be because of this :)
tracing::Span::current().record("http.flavor", &0.0); | ||
tracing::Span::current().record("otel.status", &format!("{:?}", opentelemetry::trace::StatusCode::Error).as_str()); |
again, we probably want to avoid getting the current span twice if we don't have to...
```rust
// src/proto/h1/role.rs, @@ -1011,6 +1127,30 @@ impl Http1Transaction for Client {
    }
}

#[cfg_attr(feature = "layers",
    instrument( level = "trace",
        skip(msg),
```
probably also don't want to record `dst` here...

```diff
-        skip(msg),
+        skip(msg, dst),
```
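For context on `skip`: arguments listed there are omitted from the span's fields entirely, which matters for large buffers like `dst` that would otherwise be captured via `Debug`. A minimal sketch (the signature is illustrative, not hyper's actual `encode`):

```rust
use tracing::instrument;

// `msg` and `dst` are omitted from the span's fields; skipped
// arguments also don't need to implement Debug.
#[instrument(level = "trace", skip(msg, dst))]
fn encode(msg: &str, dst: &mut Vec<u8>) {
    dst.extend_from_slice(msg.as_bytes());
}

fn main() {
    let mut dst = Vec::new();
    encode("hello", &mut dst);
}
```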
Thanks for the PR! I appreciate your effort to improve tracing in hyper. I agree with several of Eliza's points.

This isn't to say that the layers here aren't useful; I'm sure they are from an OpenTelemetry perspective. Likely, building them and the examples can help you come up with what exact events and spans you actually need to serialize. I'd say keep experimenting with the examples, but with the goal of doing as I mentioned in #2678: determine what spans, events, and fields absolutely need to be inside hyper (as opposed to ones that can occur outside hyper, in a custom layer). The point is that hyper should emit spans and events that a user couldn't figure out from outside hyper. I assume this will most often be timing related. Most of the other information can be recorded at the start of an …
Overall I agree. I don't believe I proposed examples or layers in the questions I posed. Would appreciate your feedback on the questions raised.
Largely agree with these points. I'm trying to check how these spans 'look' in different contexts and to decide whether any changes are required - as temporary workarounds. Still exploring/experimenting. Any layers in hyper would be proposed separately. Apologies if the questions I posed gave the impression I wanted to bring these layers in.
I understand the concern. The issue is: getting debug/trace output in a usable form should be off-the-shelf functionality. IMO, the Hyper instructions need not be more complex than this:

```sh
podman run -p6831:6831/udp -p6832:6832/udp -p16686:16686 jaegertracing/all-in-one:latest
cargo run --features="some thing tracing/max_level_trace" --example client_json_tracing_otel
firefox http://localhost:16686
```

or some such.
See @hawkw's earlier comment on the subtleties of setting up tracing for async. IMO, debug output should just work and not require you to spend more than 2 minutes on setup. Adding spans/events, and nothing more, just throws the same problem onto every user to keep solving. To make it concrete:

Right now it isn't clear to me that some layer is/isn't required - even as a workaround - to get the full output needed to solve the problems.

Correct. They've exposed several things that need to be addressed, but that is likely my ignorance, and they may be non-issues.

Isn't this that?

Ditto. Also bear in mind:

I think this means erring on the side of being liberal in extracting information - there is no runtime performance penalty, and we need debug output that doesn't result in issues sitting open for >1 month (ideally)?
OK. I take your point about the Finagle structure being proven prior art. There are two reports of hang behavior that we are blind about. I'm not even sure if they have the same root cause. The spans/events required, as a first step, are those that expose the cause of the problem - the Finagle guide is one useful way to organize them. However, I don't want to get to the point of finding out what spans/events/logs are needed to identify the issue in my repro, only to find out that the approach taken is totally unacceptable, e.g. think your …
I apologize, I should've been more clear when I mentioned …
@taqtiqa-mark, re:

I understand your motivation here. The point I've been trying to make, though, is that enabling logging, and determining how it's formatted, is an application-level concern. The general consensus in the Rust library ecosystem is that library crates (almost) never print anything to standard out by default, and don't have opinions about how logs should be formatted; instead, the `tracing` facade leaves those choices to the application. If users need to enable logging in a new application or in a reproduction of a bug report or something, they can just paste `tracing_subscriber::fmt::init();` into their `main` function. If users developing a production application want to use OpenTelemetry, they can configure `tracing-opentelemetry` themselves.
Thanks for engaging - I agree with all of that. Also, to my mind, providing a default does not necessarily constrain you outside of the default case. I have in mind only debug- and trace-level spans, etc. I also think they should be off by default, so not under the `full` flag. This comment is why I sought early feedback:
Buying into OpenTelemetry means you can format for stdout, JSON, Jaeger, OTLP, Prometheus, etc. I just want to make sure I understand whether you're suggesting, say, a … In particular, how would that work w.r.t. empty fields being populated at runtime? It seems the OTel details would still leak into the hyper crate.
For a while I've wanted to add support for OpenTelemetry's conventional span field names to `tower-http`'s `TraceLayer`. Using that with `axum` would look something like this:

```rust
use axum::{routing::get, Router};
use tower_http::trace::TraceLayer;
use tracing_subscriber::{fmt, prelude::*, EnvFilter};

#[tokio::main]
async fn main() {
    opentelemetry::global::set_text_map_propagator(
        opentelemetry::sdk::propagation::TraceContextPropagator::new(),
    );
    let tracer = opentelemetry_jaeger::new_pipeline()
        .with_service_name("server")
        .install_batch(opentelemetry::runtime::Tokio)
        .unwrap();

    tracing_subscriber::registry()
        .with(fmt::layer().pretty())
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .with(
            EnvFilter::default()
                .add_directive("tower_http=trace".parse().unwrap()),
        )
        .init();

    let app = Router::new()
        .route("/", get(|| async { "Hello, World!" }))
        // install the middleware and opt in to opentelemetry
        .layer(TraceLayer::new_for_http().opentelemetry_server());

    axum::Server::bind(&"0.0.0.0:3000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();

    opentelemetry::global::shutdown_tracer_provider();
}
```
Thanks @davidpdrsn, I'll take a look at that. Right now I think I have enough feedback to make a meaningfully different second iteration to try to address some of the issues raised.
Seeking feedback on the approach:

TL;DR:

- Use of a feature flag `layers` (see issue RFC: Debug/tracing feature flags #2698), adding the flag to `full` and placing tracing logic behind that flag.
- Use OpenTelemetry semantic conventions as the starting point.
- The initial documentation is in `src/common/layers/mod.rs`.
- In `src/proto/h1/role.rs`, the `Server::encode` span (lines 328-470) is the most complete and serves as the point of reference. We start here because these are the functions that are the subject of spans in the current code base:
  - The `cfg_attr` is a template I propose to use elsewhere (see the sketch after this list): Acceptable?
  - `tracing::Span::current().record(...)`: Acceptable?
  - The `cfg!(feature = ~~"layers"~~ TBD)` guard: Acceptable?
  - Errors are left to the `tracing-error` crate, so I don't propose to do too much around the errors: Acceptable?
  - Types that don't implement the `Value` trait are skipped. The proposed workaround until the `Value` trait is unsealed is to `impl Debug for ...` and `impl Display for ...` in `src/common/layers/values/*.rs`: Acceptable?
  - Move `debug!(..)` and `trace!(..)` either into a span field or, if not appropriate, behind a `cfg!(feature = "layers")` guard; leave all other levels alone: Acceptable?

Once these four spans (`Client::encode`, `Client::parse`, `Server::encode`, `Server::parse`) are stabilized and merged: `tracing-error` …
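To make the `cfg_attr` template above concrete, here is a sketch of the pattern (the feature name, field set, and recorded values are illustrative, pending the RFC):

```rust
// Sketch of the proposed template: instrumentation attaches only when
// the feature flag (here `layers`; final name TBD) is enabled, with an
// OTel-convention field declared Empty and record()ed at runtime.
#[cfg_attr(
    feature = "layers",
    tracing::instrument(
        level = "trace",
        skip(msg),
        fields(http.method = tracing::field::Empty)
    )
)]
fn encode(msg: &str) {
    let _ = msg;
    tracing::Span::current().record("http.method", &"GET");
}

fn main() {
    encode("example");
}
```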