feat: add tracing to worker and proxy #1014

Merged
SantiagoPittella merged 1 commit into next from santiagopittella-add-tracing-to-worker-proxy on Dec 19, 2024

Conversation

@SantiagoPittella (Collaborator) commented Dec 11, 2024

this PR is part of #1004

@SantiagoPittella force-pushed the santiagopittella-add-tracing-to-worker-proxy branch 2 times, most recently from a06eda5 to d68170d, on December 12, 2024 19:13
@bobbinth (Contributor)

What is left to do on this PR? I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics.

@SantiagoPittella (Collaborator, Author)

> What is left to do on this PR?

This PR is missing a cleanup, some configuration options and documentation.

> I would probably try to finish this one first, then address #1008, and only after that try to tackle metrics.

Ok! Sounds good.

@SantiagoPittella force-pushed the santiagopittella-add-tracing-to-worker-proxy branch from 9ba6f50 to e09374f on December 13, 2024 16:55
@SantiagoPittella changed the title from "wip: add tracing to worker and proxy" to "feat: add tracing to worker and proxy" on Dec 13, 2024
@SantiagoPittella marked this pull request as ready for review on December 13, 2024 16:55
@bobbinth (Contributor) left a comment

Looks good! Thank you! I left some comments inline; most are doc-related, but it would be good for @igamigo and @Mirko-von-Leipzig to take a look as well.

(Resolved inline review threads on bin/tx-prover/src/proxy/mod.rs, bin/tx-prover/src/utils.rs, bin/tx-prover/Cargo.toml, bin/tx-prover/README.md, and bin/tx-prover/src/api/mod.rs.)
let exporter = create_span_exporter();

TracerProvider::builder()
.with_sampler(Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(1.0))))
Collaborator

Not necessarily for this PR: I'm not sure if this will impact performance in production since it will depend on the volume of traces, but maybe this 1.0 could be an env var or configurable somehow.

Collaborator Author

That would be great. It might be good to determine whether the tracing/logging/metrics-related configuration should be done through env vars like RUST_LOG=<level> or through a configuration file.

Contributor

Why not just make it a CLI arg with env var support? clap supports this out of the box, and personally I really dislike having these config files 😅 They just make deployment a nightmare.

Collaborator Author

Opened up this issue to tackle this.
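For illustration, the clap suggestion above could look roughly like the sketch below (the --trace-sample-ratio flag, the MIDEN_TRACE_SAMPLE_RATIO env var, and the clap derive/env features are assumptions for this example, not something the PR adds):

use clap::Parser;
use opentelemetry_sdk::trace::Sampler;

/// Hypothetical tracing options, parsed from the CLI with env-var fallback.
#[derive(Parser, Debug)]
struct TracingOpts {
    /// Fraction of traces to sample (0.0 to 1.0).
    #[arg(long, env = "MIDEN_TRACE_SAMPLE_RATIO", default_value_t = 1.0)]
    trace_sample_ratio: f64,
}

fn main() {
    let opts = TracingOpts::parse();
    // Same sampler shape as in the PR, but with the ratio taken from the CLI/env
    // instead of the hardcoded 1.0.
    let _sampler =
        Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(opts.trace_sample_ratio)));
}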

(Resolved inline review thread on bin/tx-prover/src/proxy/mod.rs.)
async fn prove_transaction(
&self,
request: Request<ProveTransactionRequest>,
) -> Result<Response<ProveTransactionResponse>, tonic::Status> {
debug!(request = ?request, "Processing reply");
info!("Received request to prove transaction");
Collaborator

Since this function has the instrument macro, does this provide anything valuable? Perhaps if looking at the logs on stdout?

Collaborator Author

It does not make much sense to keep it. When looking at stdout, we already have the log from inside the prover:

prover:prove_transaction:prove_program: miden_prover: Generated execution trace ...

I'm removing it and also changing the instrumentation. In particular, I'm changing the log level of the return value log from info to debug.

Currently, using info, it logs:

2024-12-16T17:45:13.526435Z  INFO prover:prove_transaction: miden-tx-prover: return=Response { metadata: MetadataMap { headers: {} }, message: ProveTransactionResponse { proven_transaction: [187, 199, 33, ... ] }

And proven_transaction is quite big to have enabled by default.
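For reference, a minimal sketch of the described change using tracing's instrument attribute (the function signature and body here are simplified stand-ins for the actual handler, and tracing-subscriber is assumed for the output):

use tracing::{instrument, Level};

// `skip_all` keeps the large request out of the span fields, and `ret(level = ...)`
// logs the (potentially huge) return value at debug instead of info.
#[instrument(name = "prover:prove_transaction", skip_all, ret(level = Level::DEBUG))]
fn prove_transaction(request: &[u8]) -> Vec<u8> {
    // ... proving work would happen here ...
    request.to_vec()
}

fn main() {
    tracing_subscriber::fmt().with_max_level(Level::DEBUG).init();
    let _proof = prove_transaction(&[1, 2, 3]);
}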

@SantiagoPittella (Collaborator, Author)

@bobbinth @igamigo I answered a couple of your questions on the code/documentation itself, in particular the ones about the tracing setup and Jaeger. Let me know if something was left unaddressed.

@bobbinth (Contributor) left a comment

Thank you! Looks good! I left a few more comments inline.

(Resolved inline review threads on bin/tx-prover/README.md, bin/tx-prover/src/utils.rs, and bin/tx-prover/src/api/mod.rs.)
@@ -18,6 +21,7 @@ impl StartProxy {
///
/// This method will first read the config file to get the list of workers to start. It will
/// then start a proxy with each worker as a backend.
#[tracing::instrument(target = MIDEN_TX_PROVER, name = "proxy:execute")]
Contributor

Should this get a different proxy target?

Collaborator Author

Not sure; my idea was to group both services under the same target in order to eventually be able to join traces (essentially adding the worker trace to the proxy). Is that ok? Maybe there is another way.

Contributor

I'm unsure of how e.g. jaeger wants it, but I imagine one needs a common request UUID of sorts in order to aggregate a single request as it flows through.

Would the transaction ID not be good enough if we add it as a top level span in each process?

A potential issue would be someone sending the same transaction multiple times, in which case we would need to attach a more unique request identifier, which is usually communicated using the X-Request-ID HTTP extension header. Though this feels unnecessary at this stage, and probably we only care about this at the transaction ID level.

You already have more practical experience with jaeger though, so maybe what I'm thinking isn't applicable.

Collaborator Author

From both the proxy and worker perspectives, I think that eventually adding an X-Request-ID header, using the UUID created in the proxy, is the best option.

Contributor

Could you create a follow-up issue for this? Likely we'll also want to do this in the node, and maybe even in the client.

Collaborator Author

Done #1031
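As a rough sketch of the X-Request-ID idea discussed above (not part of this PR; the helper name is made up and the uuid crate is assumed), the proxy could tag each outgoing gRPC request like this:

use tonic::metadata::MetadataValue;
use tonic::Request;
use uuid::Uuid;

// Attach a per-request UUID so the worker's spans can later be joined with the proxy's.
fn tag_with_request_id<T>(mut request: Request<T>) -> Request<T> {
    let request_id = Uuid::new_v4().to_string();
    let value: MetadataValue<_> = request_id.parse().expect("UUIDs are valid ASCII");
    request.metadata_mut().insert("x-request-id", value);
    request
}

fn main() {
    let request = tag_with_request_id(Request::new(()));
    println!("{:?}", request.metadata().get("x-request-id"));
}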

@SantiagoPittella (Collaborator, Author)

@bobbinth I simplified the readme following your suggestion above and added a small part about how to change the exporter endpoint in case we want to use another solution.

Also, I ended up changing the HttpProxy::{method} calls because they weren't working as expected. Contrary to my belief, they were calling the LoadBalancer implementation instead of the trait's default one. The workaround I implemented is creating a dummy structure that implements the trait as well.
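For illustration, a configurable exporter endpoint along these lines might look like the sketch below (this assumes the opentelemetry-otlp 0.27 builder API and the standard OTEL_EXPORTER_OTLP_ENDPOINT variable; the helper mirrors the create_span_exporter name seen earlier but is not necessarily the PR's implementation):

use opentelemetry_otlp::{SpanExporter, WithExportConfig};

// Build an OTLP/gRPC span exporter; the endpoint falls back to the default
// OTLP gRPC port (4317), which Jaeger also accepts.
fn create_span_exporter() -> SpanExporter {
    let endpoint = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
        .unwrap_or_else(|_| "http://localhost:4317".to_string());

    SpanExporter::builder()
        .with_tonic()
        .with_endpoint(endpoint)
        .build()
        .expect("failed to build OTLP span exporter")
}

fn main() {
    let _exporter = create_span_exporter();
}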

@bobbinth (Contributor) left a comment

Looks good! Thank you! I left one last doc-related comment inline.

(Resolved inline review thread on bin/tx-prover/README.md.)
@SantiagoPittella (Collaborator, Author)

I'm currently working on rebasing this branch onto next. I don't know if @Mirko-von-Leipzig wanted to review it before merging. I plan to merge it at the end of the day today.

@Mirko-von-Leipzig (Contributor)

> I'm currently working on rebasing this branch onto next. I don't know if @Mirko-von-Leipzig wanted to review it before merging. I plan to merge it at the end of the day today.

I'll do a more thorough review now.

@Mirko-von-Leipzig (Contributor) left a comment

I think this is a very good first stab. Once we have a better feeling for how dashboards, traces, and logs look, we can refine what we include/exclude.

(Resolved inline review thread on CHANGELOG.md.)
Comment on lines +42 to +47
opentelemetry = { version = "0.27", features = ["metrics", "trace"] }
opentelemetry-otlp = { version = "0.27", features = ["grpc-tonic"] }
opentelemetry_sdk = { version = "0.27", features = ["metrics", "rt-tokio"] }
opentelemetry-semantic-conventions = "0.27"
opentelemetry-jaeger = "0.22"
tracing-opentelemetry = "0.28"
Contributor

Side comment: You'd think they'd re-export them at least..

bin/tx-prover/README.md Show resolved Hide resolved
ProxyHttpDefaultImpl.early_request_filter(_session, &mut ()).await
}

#[tracing::instrument(name = "proxy:connected_to_upstream", parent = &ctx.parent_span, skip(_session, _sock, _fd))]
Contributor

Should we not be skipping all in these? Or do we want to log context etc?

Collaborator Author

We should keep the context here, but I'm removing all the other params.
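For illustration, keeping the span while excluding the arguments can be expressed with skip_all plus an explicit fields list (a minimal sketch with made-up parameter names, not the actual Pingora signature; tracing-subscriber is assumed for output):

use tracing::instrument;

// `skip_all` drops every argument from the span; anything still worth recording
// (here, whether the upstream connection was reused) is re-added via `fields`.
#[instrument(name = "proxy:connected_to_upstream", skip_all, fields(reused = reused))]
fn connected_to_upstream(peer_addr: &str, reused: bool) {
    // ... handler body would go here ...
    let _ = peer_addr;
}

fn main() {
    tracing_subscriber::fmt().init();
    connected_to_upstream("127.0.0.1:8080", true);
}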

review: rename to MIDEN_TX_PROVER, add target to parent trace for each request

review: use default implementations directly from trait

review: unpin patch versions in cargo.toml

review: remove debug log with transaction witness

chore: remove unused import

review: improve comment on default methods implementation

review: remove info! log and change log level of the prove_transaction method return

review: use log level from env

review: improve tracing setup documentation

review: refactor logging and tracing sections in readme

review: add transaction ID to worker trace

review: merge tracing setup functions

fix: add default implementation proxy

review: tracing section in readme shorter

review: address readme comments

fix: add missing import

review: move changelog entry to bottom

review: add missing empty line in readme

review: exclude parameters from trace in connected_to_upstream
@SantiagoPittella force-pushed the santiagopittella-add-tracing-to-worker-proxy branch from 6e8f3b4 to 69f61df on December 19, 2024 18:18
@SantiagoPittella merged commit 87bbcca into next on Dec 19, 2024
9 checks passed
@SantiagoPittella deleted the santiagopittella-add-tracing-to-worker-proxy branch on December 19, 2024 19:13