[Merged by Bors] - Fix libp2p identify race #6573

Closed
ivan4th wants to merge 7 commits into develop from fix/p2p-identify-race


ivan4th
Contributor

@ivan4th ivan4th commented Dec 26, 2024

Motivation

This supersedes #6570

When a P2P test is set up using mocknet.FullMeshConnected(...) and then calls p2p/server.New(...), a race is possible because of how the libp2p identify service works. When a new peer connects, an active identify request is sent to it, asking (among other things) which protocols the peer supports; the peer must reply with an identify response message. In addition, whenever SetStreamHandler is called, an identify response message is pushed to the currently connected peers. In some cases, the following race is possible:

  1. Peer A connects to peer B.
  2. Peer B sends an identify request to peer A.
  3. Peer A sends a response to the identify request from peer B. This response contains the list of protocols, but that list lacks the protocol used by the Server in step 4, because the Server is not set up yet.
  4. Peer A sets up a Server, which calls SetStreamHandler. At this point peer A pushes an identify response message to peer B, without a corresponding identify request.
  5. Peer B receives the pushed identify response from step 4, even though it was sent after the response from step 3. This may happen because of how libp2p handles incoming requests. Peer B sets the supported protocols for peer A in its ProtoBook; the list now contains the protocol specified for the Server in step 4.
  6. Peer B receives the identify response that A sent in step 3, even though it was sent before the push from step 4, due to possible reordering. This response also carries a list of protocols, but it lacks the protocol specified for the Server in step 4. Peer B again sets the supported protocols for peer A in its ProtoBook, and now that list lacks the necessary protocol.
  7. Peer B tries to find peers that support the protocol used by the Server in step 4, or to connect to peer A using that protocol. This fails because the ProtoBook entry for peer A contains the wrong protocol list.
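
The reordering in steps 3 to 6 can be reproduced with a toy model. The sketch below is a minimal simulation, not libp2p code: `protoBook`, `deliver`, and the `/example/server/1` protocol ID are hypothetical names introduced for illustration, and the only behavior assumed is the one described above, namely that each identify message replaces the whole protocol list recorded for the peer.

```go
package main

import "fmt"

// protoBook is a toy stand-in for peer B's protocol book. Each update
// replaces the whole list recorded for a peer (last write wins).
type protoBook map[string][]string

// setProtocols overwrites the protocol list stored for peer.
func (b protoBook) setProtocols(peer string, protos []string) {
	b[peer] = append([]string(nil), protos...)
}

// supports reports whether proto is in the list stored for peer.
func (b protoBook) supports(peer, proto string) bool {
	for _, p := range b[peer] {
		if p == proto {
			return true
		}
	}
	return false
}

// deliver applies identify messages from peer "A" to a fresh book
// in the order in which they arrive at peer B.
func deliver(msgs ...[]string) protoBook {
	book := protoBook{}
	for _, m := range msgs {
		book.setProtocols("A", m)
	}
	return book
}

func main() {
	const serverProto = "/example/server/1" // hypothetical protocol ID

	// Step 3: response built before the Server exists.
	response := []string{"/ipfs/id/1.0.0"}
	// Step 4: push built after SetStreamHandler was called.
	push := []string{"/ipfs/id/1.0.0", serverProto}

	inOrder := deliver(response, push)   // arrival order matches send order
	reordered := deliver(push, response) // steps 5 and 6: stale response arrives last

	fmt.Println("in order:", inOrder.supports("A", serverProto))
	fmt.Println("reordered:", reordered.supports("A", serverProto))
}
```

With in-order delivery the final list contains the Server protocol; with the reordered delivery the stale response is applied last and the protocol is lost, which is the failure in step 7.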

In addition to this, there's an issue with the protocol support checks that Fetcher performs to decide which peers it can retrieve data from. When a peer has just connected, the active identify request towards it may not have finished by the time the fetcher checks that peer. Although unlikely, this can cause valid peers to be ignored.

Description

This change removes the uses of mocknet.FullMeshConnected(...) that may cause the identify race, replacing them with mocknet.FullMeshLinked(...) followed by mesh.ConnectAllButSelf() once the Servers are set up. It also fixes the fetcher's peer selection mechanism so that it waits for any pending identify request to finish, similar to how Host.NewStream does.

Previously, some tests checked the contents of the protocol list, but that check worked mostly by chance; it is now replaced with the delayed mesh connection.

Test Plan

Make sure the tests pass.


codecov bot commented Dec 26, 2024

Codecov Report

Attention: Patch coverage is 76.74419% with 10 lines in your changes missing coverage. Please review.

Project coverage is 79.9%. Comparing base (1c27dca) to head (aba67b2).
Report is 2 commits behind head on develop.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fetch/fetch.go | 58.3% | 4 Missing and 1 partial ⚠️ |
| p2p/upgrade.go | 79.1% | 4 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #6573     +/-   ##
=========================================
- Coverage     79.9%   79.9%   -0.1%     
=========================================
  Files          356     356             
  Lines        47357   47399     +42     
=========================================
+ Hits         37879   37880      +1     
- Misses        7342    7369     +27     
- Partials      2136    2150     +14     


@ivan4th force-pushed the fix/p2p-identify-race branch from 119c374 to 54840f8 on December 26, 2024 21:06
p2p/server/server_test.go (outdated, resolved)
p2p/host.go (outdated, resolved)
p2p/upgrade.go (outdated, resolved)
p2p/host.go (outdated, resolved)
fetch/fetch.go (outdated, resolved)
p2p/server/server_test.go (outdated, resolved)
@fasmat self-requested a review January 2, 2025 11:38
fetch/fetch.go (outdated, resolved)
fetch/mesh_data_test.go (resolved)
@ivan4th
Contributor Author

ivan4th commented Jan 14, 2025

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 14, 2025
@spacemesh-bors

Build failed:

@ivan4th
Contributor Author

ivan4th commented Jan 14, 2025

TestPartition_50_50 flake:

     logger.go:146: 2025-01-14T13:04:37.112Z	INFO	TestPartition_50_50	tests/common.go:60	address needs to be spawned	***"client": "smesher-4", "address": "stest1qqqqqqphl83yguzuswa5qs3txuh0n54trlxl42sphn72c"***
    partition_test.go:177: 
        	Error Trace:	/src/systest/tests/partition_test.go:177
        	            				/src/systest/tests/partition_test.go:228
        	Error:      	Not equal: 
        	            	expected: [202 69 88 85 90 102 21 157 58 207 11 81 31 248 149 206 184 81 26 198 148 34 141 149 236 99 130 126 213 40 89 22]
        	            	actual  : [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

@ivan4th
Contributor Author

ivan4th commented Jan 14, 2025

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Jan 14, 2025
@spacemesh-bors

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title Fix libp2p identify race [Merged by Bors] - Fix libp2p identify race Jan 14, 2025
@spacemesh-bors spacemesh-bors bot closed this Jan 14, 2025
@spacemesh-bors spacemesh-bors bot deleted the fix/p2p-identify-race branch January 14, 2025 15:09
3 participants