
Update persistent networking parameters manager. #1762

Merged: 7 commits into main from remove-address-from-kademlia on Aug 8, 2023

Conversation

shamil-gadelshin (Member):

This PR contains two changes to the persistent networking parameters manager:

  • add an ignore list that skips address adding, to prevent bootstrap nodes from ending up in persistent storage (see the sketch after this list)
  • add an address-removed event so that an address can be removed from Kademlia when the networking parameters manager reports it as obsolete
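A minimal sketch of the ignore-list idea, using illustrative names rather than the crate's actual API (`NetworkingParametersManager`, `ignore_peer_list`, and `add_known_peer` are assumptions here):

```rust
use std::collections::HashSet;

use libp2p::{Multiaddr, PeerId};

struct NetworkingParametersManager {
    // Peers whose addresses must never be persisted (e.g. bootstrap nodes).
    ignore_peer_list: HashSet<PeerId>,
    known_peers: Vec<(PeerId, Vec<Multiaddr>)>,
}

impl NetworkingParametersManager {
    fn add_known_peer(&mut self, peer_id: PeerId, addresses: Vec<Multiaddr>) {
        // Consult the ignore list before caching or persisting anything, so
        // bootstrap-node addresses never reach permanent storage.
        if self.ignore_peer_list.contains(&peer_id) {
            return;
        }
        self.known_peers.push((peer_id, addresses));
    }
}
```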

Code contributor checklist:

@shamil-gadelshin shamil-gadelshin added the networking (Subspace networking (DSN)) label Aug 4, 2023
@shamil-gadelshin shamil-gadelshin self-assigned this Aug 4, 2023
@@ -62,6 +76,13 @@ pub trait NetworkingParametersRegistry: Send + Sync {

/// Enables Clone implementation for `Box<dyn NetworkingParametersRegistry>`
fn clone_box(&self) -> Box<dyn NetworkingParametersRegistry>;

/// Triggers when we removed the peer address from the permanent storage. Returns optional
nazar-pc (Member):

I think this should be a bit more specific: we don't really remove it from persistent storage when this event is triggered. Maybe call it `on_unreachable` or something of that sort?

shamil-gadelshin (Member Author):

Hmm, I like `on_unreachable_address`, but `NetworkingParametersRegistry` only knows about address removal, nothing about connectivity. I could add a comment describing the intended usage, though.

nazar-pc (Member):

But it does have a timer, and it essentially checks how long an address has been unreachable.

shamil-gadelshin (Member Author):

I updated the comment.

nazar-pc (Member):

I still think we need to rename it: the current name is misleading because the address is literally not removed from anywhere.

shamil-gadelshin (Member Author):

Ok, I'll change it to `on_unreachable_address`.
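For reference, a hedged sketch of what the renamed hook could look like; the event type and the exact signature are illustrative, not the crate's real definitions:

```rust
use libp2p::{Multiaddr, PeerId};

// Illustrative event payload; the real type may carry more fields
// (e.g. a `last_address` flag).
struct PeerAddressRemovedEvent {
    peer_id: PeerId,
    address: Multiaddr,
}

trait NetworkingParametersRegistry {
    /// Triggered when an address has been unreachable for long enough that
    /// the registry drops it, letting callers mirror the removal in Kademlia.
    fn on_unreachable_address(
        &mut self,
        handler: Box<dyn Fn(&PeerAddressRemovedEvent) + Send>,
    );
}
```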

last_address
};

address_removed_events.push(address_removed);
nazar-pc (Member):

Why not call the event handlers from here? It looks like you're collecting the events into a vector just to iterate over it afterwards. Or you could turn this for_each into a filter_map and return an iterator of these things without collecting first.

shamil-gadelshin (Member Author):

It's easier to test this way. I don't expect a lot of elements in this vector.
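For comparison, the reviewer's filter_map alternative might look like this; the types are illustrative (reusing the hypothetical `PeerAddressRemovedEvent` from the earlier sketch), and the expired flag is invented:

```rust
// Instead of pushing events into a Vec inside a for_each and iterating it
// afterwards, a filter_map yields the events lazily as an iterator.
fn removed_events(
    checked: Vec<(PeerId, Multiaddr, bool)>, // (peer, address, expired?)
) -> impl Iterator<Item = PeerAddressRemovedEvent> {
    checked
        .into_iter()
        .filter_map(|(peer_id, address, expired)| {
            expired.then(|| PeerAddressRemovedEvent { peer_id, address })
        })
}
```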

Comment on lines 394 to 404
if event.last_address {
self.swarm
.behaviour_mut()
.kademlia
.remove_peer(&event.peer_id);
} else {
self.swarm
.behaviour_mut()
.kademlia
.remove_address(&event.peer_id, &event.address);
}
nazar-pc (Member):

Just curious: will Kademlia really keep a peer in the routing table when we know no address for it? Also, maybe you can query Kademlia for the list of addresses rather than sending last_address with the event? It just looks like an awkward API.

shamil-gadelshin (Member Author):

Yes, it does. However, I think there is a bug here anyway: the problem arises when we remove known addresses that are unresponsive and, via remove_peer, also remove other addresses that we don't know about because we never dialed them. I changed the logic to handle the UnroutablePeer event instead.
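A hedged sketch of that change, mirroring the fragment style of the snippet quoted above (rust-libp2p circa 2023 emits `KademliaEvent::UnroutablePeer` when it observes a peer for which no listen address is known; the surrounding handler structure is assumed):

```rust
use libp2p::kad::KademliaEvent;

// Inside the swarm event loop (structure assumed):
match kademlia_event {
    KademliaEvent::UnroutablePeer { peer } => {
        // No usable address is known for this peer, so drop it from the
        // routing table instead of relying on a `last_address` flag.
        self.swarm.behaviour_mut().kademlia.remove_peer(&peer);
    }
    _ => {}
}
```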

nazar-pc (Member):

Okay, that makes sense, and I think we should then replace handle_removed_address_event with handle_removed_peer. But even then it seems problematic that we may not remember all the peers in Kademlia anyway, and thus be unable to remove them. Should known peers be unbounded in memory and bounded only when written to disk (by unbounded I mean bounded externally by Kademlia)?

shamil-gadelshin (Member Author):

I'm not sure I followed here. I moved the remove_peer call from there to the UnroutablePeer handler. This way I both remove unreachable addresses and handle unroutable peers. Am I missing something?

nazar-pc (Member):

What I meant is that our known-peers cache is limited in size. It seems possible that some peers will be added to Kademlia but not be in known peers. If that is the case, they will never be removed from Kademlia, and we have no event for things evicted from the LRU cache of known peers.
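To illustrate the eviction point, a sketch assuming the lru crate (not necessarily what the known-peers cache actually uses): LruCache::push returns the evicted entry, which could in principle be surfaced as an event.

```rust
use std::num::NonZeroUsize;

use lru::LruCache;

fn main() {
    let mut known_peers: LruCache<u32, &str> =
        LruCache::new(NonZeroUsize::new(2).expect("2 is non-zero"));
    known_peers.push(1, "peer-a");
    known_peers.push(2, "peer-b");
    // The cache is full; pushing a third entry evicts the least recently
    // used one, and `push` returns the evicted pair so it can be observed.
    if let Some((evicted, _)) = known_peers.push(3, "peer-c") {
        println!("evicted {evicted}: could remove it from Kademlia here");
    }
}
```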

shamil-gadelshin (Member Author):

Can we derive the cache limits in your pending PR from the Kademlia internals?

nazar-pc (Member):

Not dynamically, but if we have fixed Kademlia parameters and know the upper bound, then yes. I agree this PR is an improvement; I'm just trying to understand whether it is a 100% solution to the problem.

@@ -397,10 +398,16 @@ where
.kademlia
.remove_peer(&event.peer_id);
} else {
// Remove both versions of the address
nazar-pc (Member):

Is it really possible to have both? It seems odd not to have the identity in the address; how would you establish a secure connection to someone whose identity you don't know?

shamil-gadelshin (Member Author):

Technically, it's possible: https://github.com/libp2p/rust-libp2p/blob/f95f39f634d08dad8df72d7ecf92312f9ad2d826/swarm/src/dial_opts.rs#L79

We don't use this feature, though. AFAIK, whenever we need to establish a connection with a peer, the function signature includes both the PeerId and the Multiaddress.

nazar-pc (Member):

I wouldn't handle it then if it is not used

shamil-gadelshin (Member Author):

I meant that when we (implicitly) use an address without the P2p(peer_id) protocol, the PeerId is still known; but that's Kademlia internals. I saw both types of addresses in the logs, which is why I remove both variants as well.

nazar-pc (Member):

Well, I'm not sure when we use an address without a Peer ID; this is why I'm confused.

shamil-gadelshin (Member Author):

We use addresses without a Peer ID when we add them as self-reported addresses to Kademlia. They become self-reported when they are taken from the listening and external address lists of the other peer.
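A hedged sketch of handling both variants (the helper name is invented; depending on the libp2p version, Protocol::P2p wraps either a PeerId or its multihash, and `.into()` covers both cases):

```rust
use libp2p::multiaddr::Protocol;
use libp2p::{Multiaddr, PeerId};

// Hypothetical helper: given an address, produce both the bare variant and
// the `/p2p/<peer id>`-suffixed variant so that both can be removed from
// the Kademlia routing table.
fn address_variants(address: &Multiaddr, peer_id: &PeerId) -> (Multiaddr, Multiaddr) {
    let mut bare = address.clone();
    // Drop a trailing /p2p component if present.
    if matches!(bare.iter().last(), Some(Protocol::P2p(_))) {
        bare.pop();
    }
    let mut suffixed = bare.clone();
    // `.into()` converts the PeerId into whatever Protocol::P2p wraps in
    // the libp2p version in use.
    suffixed.push(Protocol::P2p(peer_id.clone().into()));
    (bare, suffixed)
}
```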

nazar-pc (Member):

But we do not add our own address to our Kademlia, do we? I don't think I follow you here.

shamil-gadelshin (Member Author), Aug 8, 2023:

Actually, we do (implicitly): when we add external addresses and specify listening addresses. When we request providers, Kademlia fills the provider records with those addresses (both lists chained).


The merge commit noted conflicts in:
  crates/subspace-networking/src/behavior/tests.rs
  crates/subspace-networking/src/bin/subspace-bootstrap-node/main.rs
  crates/subspace-networking/src/create.rs

@nazar-pc nazar-pc disabled auto-merge August 8, 2023 11:46
@nazar-pc nazar-pc merged commit 1e781eb into main Aug 8, 2023
@nazar-pc nazar-pc deleted the remove-address-from-kademlia branch August 8, 2023 11:46