Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bandwidth metering metrics #844

Closed
wants to merge 28 commits into from

Conversation

akirillo
Copy link
Contributor

Resolves #783, introduces a type that exposes metrics from an underlying BandwidthMeter

@akirillo akirillo added C-enhancement New feature or request A-observability Related to tracing, metrics, logs and other observability tools A-networking Related to networking in general labels Jan 13, 2023
@akirillo akirillo self-assigned this Jan 13, 2023
@akirillo
Copy link
Contributor Author

Assumes the API introduced in #785

@akirillo akirillo added the S-blocked This cannot more forward until something else changes label Jan 13, 2023
@onbjerg onbjerg changed the title Bandwidth metering metrics feat: bandwidth metering metrics Jan 13, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2023

Codecov Report

Merging #844 (766297d) into main (b5dab61) will decrease coverage by 0.05%.
The diff coverage is 82.50%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main     #844      +/-   ##
==========================================
- Coverage   74.91%   74.87%   -0.05%     
==========================================
  Files         313      313              
  Lines       34108    34139      +31     
==========================================
+ Hits        25551    25560       +9     
- Misses       8557     8579      +22     
Flag Coverage Δ
unit-tests 74.87% <82.50%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
crates/net/common/src/lib.rs 100.00% <ø> (ø)
crates/net/network/src/manager.rs 47.98% <ø> (-0.03%) ⬇️
crates/net/network/src/network.rs 57.57% <ø> (+0.64%) ⬆️
crates/net/network/src/session/handle.rs 50.00% <ø> (ø)
crates/net/common/src/metered_stream.rs 81.86% <81.86%> (ø)
crates/net/network/src/session/active.rs 84.67% <100.00%> (-0.06%) ⬇️
crates/net/network/src/session/mod.rs 68.51% <100.00%> (-0.07%) ⬇️
crates/net/eth-wire/src/capability.rs 62.13% <0.00%> (-0.98%) ⬇️
crates/stages/src/stages/sender_recovery.rs 92.51% <0.00%> (-0.54%) ⬇️
crates/net/network/src/peers/manager.rs 83.39% <0.00%> (-0.19%) ⬇️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@akirillo
Copy link
Contributor Author

Going to mark as blocked for now by #892

Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need to get this over the line?

@gakonst
Copy link
Member

gakonst commented Jan 23, 2023

As discussed we only want to be counting Ingress/Egress, and not KB/s or MB/s.

@akirillo
Copy link
Contributor Author

As discussed we only want to be counting Ingress/Egress, and not KB/s or MB/s.

Ack, that's effectively what this does, will rename to make it clearer.

@akirillo akirillo marked this pull request as ready for review January 23, 2023 18:33
@akirillo
Copy link
Contributor Author

As discussed we only want to be counting Ingress/Egress, and not KB/s or MB/s.

For viz: after discussing w/ @onbjerg, I realized I was overloading the term "metric." We don't want to expose bytes/s as an external metric (e.g. in Prometheus) but it will be helpful to have it in-process, picking up on that in #992

@mattsse mattsse requested a review from onbjerg January 24, 2023 17:26
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Wdyt @onbjerg ?

@akirillo akirillo removed the S-blocked This cannot more forward until something else changes label Jan 24, 2023
Copy link
Collaborator

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the trait. Another question I have is what the difference is between NetworkIOMeter and MeteredStream? Could MeteredStream not tally this up itself, i.e. is it not possible to merge NetworkIOMeter and MeteredStream? If so, is there a reason not to?

@akirillo
Copy link
Contributor Author

Another question I have is what the difference is between NetworkIOMeter and MeteredStream? Could MeteredStream not tally this up itself, i.e. is it not possible to merge NetworkIOMeter and MeteredStream? If so, is there a reason not to?

There is a slight difference - one NetworkIOMeter can be shared by multiple MeteredStreams. We don't currently do this, for now we're just doing per-stream metering, but it is possible to have multiple streams incrementing the same meter

@onbjerg
Copy link
Collaborator

onbjerg commented Jan 24, 2023

I would prob rename NetworkIOMeter to MeteredStreamMetrics instead then and add a comment explaining that it can be shared between multiple streams to make it a bit clearer - allows us to potentially extract it later for non-network stream metering

@akirillo
Copy link
Contributor Author

I would prob rename NetworkIOMeter to MeteredStreamMetrics instead then and add a comment explaining that it can be shared between multiple streams to make it a bit clearer - allows us to potentially extract it later for non-network stream metering

I'm being a bit pedantic, but I'm wary of calling it Metrics because it's a struct containing in-process measurements, we have a separate struct NetworkIOMeterMetrics which actually exposes metrics via metrics-derive.

You're right though that it's not just for network streams, perhaps I'll call it StreamMeter? lol
Or MeteredStreamCounts?

@akirillo akirillo requested a review from onbjerg January 24, 2023 22:48
@akirillo
Copy link
Contributor Author

@onbjerg does this line up w how you wanted discovery ingress/egress monitored?

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be precise about what we actually want to

  1. measure
  2. expose via metric counter/gauge

Because right now I'm a little lost and not sure about either. But I'm strongly against instrumenting a TCP stream with metrics because this is likely more expensive that just incrementing an Atomic via relaxed for statistical purposes, like we're doing on main

is "total bytes read/written by all sessions combined" even useful?
what do we want/need from a single session in order to rate the quality of the session?

Comment on lines +126 to +128
/// The [`MeteredStreamCounts`] structs this can increment ingress / egress counts for,
/// with optional metrics associated with each.
meters: HashMap<&'a str, (MeteredStreamCounts, Option<MeteredStreamMetrics>)>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we using this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. here

// Attach meter for total across all sessions
metered_stream.add_meter(SESSIONS_METER_NAME, metered_stream_counts);
// Attach meter for this session
metered_stream.add_meter("session", MeteredStreamCounts::default());

we meter ingress/egress both on the meter shared by all sessions, and on a meter just for the current session

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this is basically just

struct MeteredStream {
  inner,
  total: MeteredStreamCounts,
  session: MeteredStreamCounts
} 

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, i admittedly jumped the gun on flexibility so it can be any number of MeteredStreamCounts

}
}

impl MeteredStream<'_, UdpSocket> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm it is a bit odd that we implement something called Stream for a UdpSocket

since it only sends individual packets, perhaps counting bytes read/written is not really ideal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just want to count whole packets, we could make an extension to the MeteredSender, i.e. make it work over any Sink & have a similar MeteredReceiver that works over any Stream, and use a UdpFramed here to have a type that impls Sink + Stream.

Or, of course, we could make a new type here that simply counts whole packets, but FWIW the MeteredSender isn't used anywhere yet

@akirillo
Copy link
Contributor Author

strongly against instrumenting a TCP stream with metrics

Yeah, we're not exposing metrics over the TCP streams in the peer sessions, just counting up via Atomics

is "total bytes read/written by all sessions combined" even useful?

Honestly, as something that we'd want to reference in code, I don't think so. But now the MeteredStream is sufficiently flexible to easily remove the meter across all sessions, and only keep the per-session one.

what do we want/need from a single session in order to rate the quality of the session?

As I'm thinking through #908, I really think it's a matter of throughput and latency. Latency is out of scope here, but for throughput we need a rate. The new approach I'm working on for #1002 actually does read the per-stream ingress/egress byte values to calculate this rate, for use in dropping peers

@gakonst gakonst added the S-needs-investigation This issue requires detective work to figure out what's going wrong label Feb 9, 2023
@gakonst
Copy link
Member

gakonst commented Mar 17, 2023

Closing for now as we couldn't figure out a great way to integrate this at the time - think we'll revisit metrics in the coming weeks and we'll cherry-pick parts of this and the other PR then!

@gakonst gakonst closed this Mar 17, 2023
@gakonst gakonst deleted the andrew/bandwidth-metering-metrics branch March 17, 2023 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-networking Related to networking in general A-observability Related to tracing, metrics, logs and other observability tools C-enhancement New feature or request S-needs-investigation This issue requires detective work to figure out what's going wrong
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Expose bandwidth metering metrics
5 participants