Add prometheus metrics for batch calls #403
@@ -147,6 +147,10 @@ impl BatchTransport for HttpTransport {

        async move {
            let _guard = metrics.on_request_start("batch");
            let _guards: Vec<_> = calls
                .iter()
                .map(|call| metrics.on_request_start(method_name(call)))
                .collect();

            let outputs = execute_rpc(client, inner, id, &Request::Batch(calls)).await?;
            handle_batch_response(&ids, outputs)
@@ -258,6 +262,8 @@ impl TransportMetrics {

#[cfg(test)]
mod tests {
    use super::*;
    use crate::transport::create_env_test_transport;
    use crate::Web3;

    #[test]
    fn handles_batch_response_being_in_different_order_than_input() {

@@ -303,4 +309,31 @@ mod tests {
        )
        .is_err());
    }

    #[tokio::test]
    #[ignore]
    async fn batch_call_success() {
        let http = create_env_test_transport();
        let web3 = Web3::new(http);
        let request = web3.transport().prepare("eth_blockNumber", Vec::default());
        let request2 = web3.transport().prepare("eth_chainId", Vec::default());
        let _batch_response = web3
            .transport()
            .send_batch([request, request2])
            .await
            .unwrap()
            .into_iter();

        let metric_storage =
            TransportMetrics::instance(global_metrics::get_metric_storage_registry()).unwrap();

        for method_name in ["eth_blockNumber", "eth_chainId", "batch"] {
            let number_calls = metric_storage
                .requests_complete
                .with_label_values(&[method_name]);
            let inflight_calls = metric_storage
                .requests_inflight
                .with_label_values(&[method_name]);
            assert_eq!(number_calls.get(), 1);
            assert_eq!(inflight_calls.get(), 0);
        }
    }
}

A review comment on this test: One issue I see with this test is that other tests may influence its result (since it uses a global storage registry). This could lead to flaky tests when run with other code that increments HTTP metrics. Since this is an ignored test, this isn't super critical. Maybe we can create a local registry that is used just for this test.
This mixes up some things in the metrics. I feel it would be more useful to duplicate the existing request counter into a new metric whose only responsibility is to count individual requests inside of batch requests: inner_batch_requests_complete.
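A minimal sketch of what that dedicated metric could look like with the plain prometheus crate; the struct and method names below are hypothetical and only meant to show that counting inner batch calls needs nothing beyond a single labelled counter.

    use prometheus::{IntCounterVec, Opts, Registry};

    // Hypothetical holder for the proposed metric; not part of the crate today.
    struct BatchCallMetrics {
        inner_batch_requests_complete: IntCounterVec,
    }

    impl BatchCallMetrics {
        fn new(registry: &Registry) -> Result<Self, prometheus::Error> {
            let inner_batch_requests_complete = IntCounterVec::new(
                Opts::new(
                    "inner_batch_requests_complete",
                    "Number of inner requests completed as part of a batch RPC call",
                ),
                &["method"],
            )?;
            registry.register(Box::new(inner_batch_requests_complete.clone()))?;
            Ok(Self {
                inner_batch_requests_complete,
            })
        }

        // All the inner calls need: bump one counter, no timing, no DropGuard.
        fn on_inner_batch_request(&self, method: &str) {
            self.inner_batch_requests_complete
                .with_label_values(&[method])
                .inc();
        }
    }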
Good point. I agree that we should separate timing information for batch requests, as it will be low signal (each request in the batch will take the total time, as you suggested). This could lead to us thinking "why is eth_blockNumber taking so long?" when in fact it only does because it is included in a batch with 1000 complicated eth_calls.
To be clear, I don't think we need timing for the inner requests of the batches at all. We only need to count them. So I would only duplicate the one metric I mentioned and not the histogram or in-flight.
How about not duplicating counters and instead just adding an argument to on_request_start() which allows discarding the timing data? That seems like a low-effort solution on the Rust side as well as the Grafana side.

Edit: A separate function would mean we wouldn't have to touch existing call sites to add the new argument, though.
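Roughly what the flag variant would look like; the struct and signature below are assumptions made up for this illustration and do not reflect the crate's real on_request_start, which per the discussion also returns a guard used for timing.

    use prometheus::{HistogramVec, IntCounterVec};

    struct TransportMetricsSketch {
        requests_complete: IntCounterVec,
        requests_duration_seconds: HistogramVec,
    }

    impl TransportMetricsSketch {
        // The flag variant: inner batch calls would pass false, while every
        // existing call site would have to be updated to pass true.
        fn on_request_start(&self, method: &str, record_timing: bool) {
            self.requests_complete.with_label_values(&[method]).inc();
            if record_timing {
                // Timer/guard setup would live here; elided in this sketch.
                let _ = self
                    .requests_duration_seconds
                    .with_label_values(&[method]);
            }
        }
    }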
on_request_start does a bunch of things, most of which are not useful for the inner batch calls. It feels cleaner to me to handle the inner batch requests completely separately, because all we want to do there is increment one Prometheus counter. There is no need for DropGuard logic either, because that is only used for timing the request.
I think we should keep timing information for the batch, but not for the inner calls. This is what I meant with my ambiguous "should separate timing information for batch requests". I think we are arguing for the same thing.
I would: