
Use tracing for improved diagnostics #1533

Closed · thomaseizinger opened this issue Mar 31, 2020 · 6 comments · Fixed by #4282
@thomaseizinger (Contributor) commented Mar 31, 2020

Tracing: A scoped, structured logging and diagnostics system.

rust-libp2p is designed as an async, event-based system. This makes it fairly hard to add rich logs in user land because peer information is not always available. For example, the InboundUpgrade trait doesn't know which peer it actually connected to.

It might be worth considering adding a dependency on the tracing crate and creating Spans around the calls to these methods that set certain variables in the context.

A downstream crate that also uses tracing would then automatically pick those scopes up, and messages printed to the log file would include this context. For example, a log message like "Upgrading inbound substream to /foo/bar/1.0.0" would show up as:

Mar 31 15:38:36.158 INFO libp2p{peer_id=QmbhXbFML9JbBHr96eHKXWbc7t2fWCLdnh6aNbzFAoz2rw} app::network: Upgrading inbound substream to /foo/bar/1.0.0

In this example, libp2p is a tracing scope and peer_id is a variable inside that scope.

Scopes also have log levels: a "trace"-level scope will not show up as part of an "info"-level message, etc.
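For illustration, here is a minimal sketch of how such a scope could be created with the tracing crate (the span name, field, and `handle_inbound` function are made up for this example, not rust-libp2p API; it assumes the `tracing` and `tracing-subscriber` crates as dependencies):

```rust
use tracing::{info, info_span};

fn handle_inbound(peer_id: &str) {
    // Open an info-level span named "libp2p" with `peer_id` as a field.
    // A trace-level span would be created with `trace_span!` instead.
    let span = info_span!("libp2p", peer_id = %peer_id);
    let _guard = span.enter();

    // While the guard is alive, every event is emitted inside the span
    // and therefore carries the `peer_id` context automatically.
    info!("Upgrading inbound substream to /foo/bar/1.0.0");
}

fn main() {
    tracing_subscriber::fmt::init();
    handle_inbound("QmbhXbFML9JbBHr96eHKXWbc7t2fWCLdnh6aNbzFAoz2rw");
}
```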

Note: tracing is maintained by the folks from tokio but works entirely independently of the tokio runtime.

@tomaka (Member) commented Apr 29, 2020

I'm personally not a fan of forcing this kind of addition into the code. For example, I already strongly dislike the fact that we're using the log crate.

But what I dislike is the "forcing" part. I'm in favour of adding (optional) wrappers around ProtocolsHandler and other traits.
In other words, the user has to explicitly opt in to these measurements.

@thomaseizinger (Contributor, Author) commented May 2, 2020

> In other words, the user has to explicitly opt in to these measurements.

Would a disabled-by-default cargo feature also fulfill this requirement? (Assuming it doesn't add too much of a maintenance burden to the codebase.)
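For illustration, such an opt-in gate could look like this (the feature name `diagnostics` and the `record_upgrade` function are hypothetical, not actual rust-libp2p API):

```rust
// With the hypothetical `diagnostics` feature enabled, `tracing` would
// be compiled in as an optional dependency and spans get recorded.
#[cfg(feature = "diagnostics")]
pub fn record_upgrade(peer_id: &str, protocol: &str) {
    let span = tracing::info_span!("upgrade", peer_id = %peer_id);
    let _guard = span.enter();
    tracing::info!(%protocol, "upgrading inbound substream");
}

// With the feature disabled (the default), this compiles to a no-op
// and adds no overhead or extra dependency.
#[cfg(not(feature = "diagnostics"))]
pub fn record_upgrade(_peer_id: &str, _protocol: &str) {}
```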

> I'm in favour of adding (optional) wrappers around ProtocolsHandler and other traits.

From what I understand, most of the tracing spans and events would have to be added in libp2p-core, as it is the part of the code that polls the other components and has the necessary contextual information. I am not sure wrappers can achieve the same thing.

I'll try to put up a draft PR to show what I mean.

@folex (Contributor) commented Nov 26, 2020

It would be totally awesome if I could somehow mark a whole execution flow with some generated ID and then figure out which request triggered what, and how long it spent where. But I imagine that would require a big overhaul of the libp2p-core codebase, which is unfortunate at best.
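For what it's worth, tracing's span mechanism can express exactly that pattern. A minimal sketch (the `handle_request` function and the id are made up, not libp2p code; the `futures` crate is assumed for the executor):

```rust
use tracing::Instrument;

// Tag an entire async execution flow with a generated id by wrapping
// it in a span; every event emitted inside, across await points,
// inherits the id.
async fn handle_request(request_id: u64) {
    async {
        tracing::info!("request started");
        // ... do the actual work; nested calls stay inside the span ...
        tracing::info!("request finished");
    }
    .instrument(tracing::info_span!("request", id = request_id))
    .await;
}

fn main() {
    tracing_subscriber::fmt::init();
    futures::executor::block_on(handle_request(42));
}
```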

@thomaseizinger what do you have (or had? :)) in mind: mark all tracing events with the peer_id from inject_event, or what kind of info would you add?

@thomaseizinger (Contributor, Author) commented:

> @thomaseizinger what do you have (or had? :)) in mind: mark all tracing events with the peer_id from inject_event, or what kind of info would you add?

Back when I created this ticket, I was primarily interested in the PeerId being included in a Span for all the code within libp2p that is implicitly scoped to a single peer: ProtocolsHandlers and upgrades. Once you reach the NetworkBehaviour abstraction level, the PeerId is provided to you as a parameter anyway, so this would not be necessary there.

Unfortunately, I never got around to following up with a draft PR 😅

I think it might be possible to achieve this with wrappers around the components that are passed to the SwarmBuilder. However, I am not familiar enough with the internals of libp2p to point out exactly where we would have to hook in.

Conceptually, what we need is:

  • Access to the point at which we have already learned the PeerId of the other party. I think this implies that we are past the authentication stage. Maybe a custom upgrade that wraps any other authentication upgrade is enough?
  • Once we have the PeerId + connection, we would need to keep passing on wrappers as the connection flows through libp2p-core and swarm and create the necessary spans (see the sketch below).
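A minimal sketch of that last idea, using tracing's Instrument adapter (the function names and the String peer id are placeholders, not rust-libp2p types):

```rust
use tracing::Instrument;

// Once authentication has yielded the remote's identity, wrap the
// future that drives the rest of the connection in a span; everything
// it logs then carries the peer id.
async fn spawn_connection(peer_id: String) {
    let span = tracing::debug_span!("connection", peer_id = %peer_id);
    drive_connection().instrument(span).await;
}

async fn drive_connection() {
    // Any event emitted here is embedded in the "connection" span.
    tracing::debug!("polling connection");
}
```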

@eserilev (Contributor) commented Aug 1, 2023

Hello, I listened in on the maintainers call today and heard you mention this issue. From what I understood, adding tracing should be relatively straightforward and can be done in chunks. I think it was also mentioned that it would be nice to have this tracing specifically for Kademlia, so maybe that's a good place to start? Thomas also provided firezone/firezone#1741 as an example of how he implemented tracing in a different repo, so it could serve as a nice blueprint to work from.

I'd really like to work on this feature. I have virtually no experience with the inner workings of libp2p, but maybe a task like this would be a good introduction?

@thomaseizinger (Contributor, Author) commented:

> I'd really like to work on this feature.

Great! Thank you :)

I think a useful place to start is here: https://github.com/libp2p/rust-libp2p/blob/master/swarm/src/connection.rs

This is the main state machine that drives each connection. It has the context of which peer we are connected to and which connection ID we assigned to it. I would start by capturing this in an "error"-level span.

Following from that, you can change one of the protocols from log to tracing and modify an example or test to initialize a tracing subscriber instead of a log subscriber. You should then see those scopes in the logs.
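To make that last step concrete, here is a minimal sketch of initializing a tracing subscriber in an example or test (assumes the tracing-subscriber crate with its "env-filter" feature enabled; this is not code from the eventual PR):

```rust
fn main() {
    // Replaces e.g. a log-based `env_logger::init()`; honours RUST_LOG
    // directives for per-module filtering.
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();

    // Events emitted inside this span show up with its fields attached.
    let span = tracing::error_span!("connection", id = 1);
    let _guard = span.enter();
    tracing::info!("this event carries the connection span's context");
}
```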

eserilev mentioned this issue on Aug 1, 2023
thomaseizinger changed the title from "Consider using tracing for improved diagnostics" to "Use tracing for improved diagnostics" on Nov 1, 2023
mergify bot closed this as completed in #4282 on Nov 2, 2023
mergify bot pushed a commit that referenced this issue Nov 2, 2023
We replace `log` with `tracing` across the codebase. Where possible, we now make use of structured logging instead of templating strings.

`tracing` also offers the ability to record "spans". A span lasts until it is dropped and describes the entire duration for which it is active. All logs (in `tracing` terms, "events") are hierarchically embedded in their parent spans.

We introduce several spans:

- On debug level: One for `new_outgoing_connection`, `new_incoming_connection` and `new_established_connection`
- On debug level: `Connection::poll`, `Swarm::poll` and `Pool::poll`
- On trace level: `NetworkBehaviour::poll` for each implementation of `NetworkBehaviour`
- On trace level: `ConnectionHandler::poll` for each (protocol) `ConnectionHandler` implementation

The idea here is that logging on debug level gives you a decent overview of what the system is doing. You get spans for the duration of connections and how often each connection gets polled.

Dropping down to trace level gives you an extremely detailed view of how long each individual `ConnectionHandler` was executed as part of `poll`, which could be used for a detailed analysis of how busy certain handlers are.

Most importantly, simply logging on `info` does not give you any spans. We consider `info` to be a good default that should be reasonably quiet.

Resolves #1533.

Pull-Request: #4282.
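As an illustration of the level-based behaviour described in the commit message (not part of the PR itself), an EnvFilter can keep the quiet `info` default while opting specific crates into the span-producing levels:

```rust
fn main() {
    // Everything at info (no spans), connection/swarm spans at debug,
    // per-handler polling spans at trace for Kademlia.
    let filter = tracing_subscriber::EnvFilter::new(
        "info,libp2p_swarm=debug,libp2p_kad=trace",
    );
    tracing_subscriber::fmt().with_env_filter(filter).init();
}
```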
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this issue on Mar 8, 2024