-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add tracing to worker and proxy #1014
feat: add tracing to worker and proxy #1014
Conversation
a06eda5
to
d68170d
Compare
What is left to do on this PR? I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics. |
This PR is missing a cleanup, some configuration options and documentation.
Ok! Sounds good. |
9ba6f50
to
e09374f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left some comments inline - most doc-related, but would be good for @igamigo and @Mirko-von-Leipzig to take a look as well.
bin/tx-prover/src/utils.rs
Outdated
let exporter = create_span_exporter(); | ||
|
||
TracerProvider::builder() | ||
.with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(1.0)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily for this PR: Not sure if this will impact performance on production since it will depends on traces, but maybe this 1.0
could be an env var or configurable somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great. It might be good to determine if the tracing/logging/metrics related configuration should be done through env vars like RUST_LOG=<level>
or use a configuration file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make it a cli arg with env var support. clap supports this out the box, and personally I really dislike having these config files 😅 They just make deployment a nightmare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened up this issue to tackle this.
bin/tx-prover/src/api/mod.rs
Outdated
async fn prove_transaction( | ||
&self, | ||
request: Request<ProveTransactionRequest>, | ||
) -> Result<Response<ProveTransactionResponse>, tonic::Status> { | ||
debug!(request = ?request, "Processing reply"); | ||
info!("Received request to prove transaction"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function has the instrument macro, does this provide anything valuable? Perhaps if looking at the logs on stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make much sense to keep it. If looking at stdout out we already the log from inside the prover:
prover:prove_transaction:prove_program: miden_prover: Generated execution trace ...
I'm removing it and also changing the instrumentation. In particular, I'm changing the log level of the return vale log from info
to debug
.
Currently, using info
logs:
2024-12-16T17:45:13.526435Z INFO prover:prove_transaction: miden-tx-prover: return=Response { metadata: MetadataMap { headers: {} }, message: ProveTransactionResponse { proven_transaction: [187, 199, 33, ... ] }
And proven_transaction
is quite big to have it on by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks good! I left a few more comments inline.
@@ -18,6 +21,7 @@ impl StartProxy { | |||
/// | |||
/// This method will first read the config file to get the list of workers to start. It will | |||
/// then start a proxy with each worker as a backend. | |||
#[tracing::instrument(target = MIDEN_TX_PROVER, name = "proxy:execute")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this get a different proxy target
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, my idea was to group both services under the same target in order to be able to eventually join traces (adding the worker trace to the proxy essentially). Is that ok? Maybe there is other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure of how e.g. jaeger wants it, but I imagine one needs a common request UUID of sorts in order to aggregate a single request as it flows through.
Would the transaction ID not be good enough if we add it as a top level span in each process?
A potential issue would be someone sending the same transaction multiple times; in which case we would need to attach a more unique request identifier which is usually communicated using the X-Request-ID
http extension header. Though this feels unnecessary at this stage, and probably we only care about this on a transaction ID level.
You already have more practical experience with jaeger though, so maybe what I'm thinking isn't applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both from the proxy and worker perspective I think that eventually adding a X-Request-ID
header is the best option, using the UUID created in the proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a follow-up issue for this? Likely we'll also want to do this in the node, and maybe even in the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done #1031
bin/tx-prover/src/utils.rs
Outdated
let exporter = create_span_exporter(); | ||
|
||
TracerProvider::builder() | ||
.with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(1.0)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make it a cli arg with env var support. clap supports this out the box, and personally I really dislike having these config files 😅 They just make deployment a nightmare.
@bobbinth I simplified the readme following your suggestion above and added a small part about how to change the exporter endpoint in case we want to use another solution. Also, I ended up changing the calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left one last doc-related comment inline.
I'm currently working on rebasing this branch with |
I'll do a more thorough review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a very good first stab. Once we have a better feeling for how dashboards, traces and logs look we can refine what we in/exclude.
opentelemetry = { version = "0.27", features = ["metrics", "trace"] } | ||
opentelemetry-otlp = { version = "0.27", features = ["grpc-tonic"] } | ||
opentelemetry_sdk = { version = "0.27", features = ["metrics", "rt-tokio"] } | ||
opentelemetry-semantic-conventions = "0.27" | ||
opentelemetry-jaeger = "0.22" | ||
tracing-opentelemetry = "0.28" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side comment: You'd think they'd re-export them at least..
@@ -18,6 +21,7 @@ impl StartProxy { | |||
/// | |||
/// This method will first read the config file to get the list of workers to start. It will | |||
/// then start a proxy with each worker as a backend. | |||
#[tracing::instrument(target = MIDEN_TX_PROVER, name = "proxy:execute")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a follow-up issue for this? Likely we'll also want to do this in the node, and maybe even in the client.
bin/tx-prover/src/proxy/mod.rs
Outdated
ProxyHttpDefaultImpl.early_request_filter(_session, &mut ()).await | ||
} | ||
|
||
#[tracing::instrument(name = "proxy:connected_to_upstream", parent = &ctx.parent_span, skip(_session, _sock, _fd))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not be skipping all
in these? Or do we want to log context etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep context here, but I'm removing all other params
538835a
to
6e8f3b4
Compare
review: rename to MIDEN_TX_PROVER, add target to parent trace for each request review: use default implementations directly from trait review: unpin patch versions in cargo.toml review: remove debug log with transaction witness chore: remove unused import review: improve comment on default methods implementation review: remove info! log and change log level of the prove_transaction method return review: use log level from env review: improve tracing setup documentation review: refactor logging and tracing sections in readme review: add transaction ID to worker trace review: merge tracing setup functions fix: add default implementation proxy review: tracing section in readme shorter review: address readme comments fix: add missing import review: move changelog entry to bottom review: add missing empty line in readme review: exclude parameters from trace in connected_to_upstream
6e8f3b4
to
69f61df
Compare
this PR is part of #1004