Base node connectivity starvation - cannot ping active connections or receive metadata #6516
SWvheerden pushed a commit that referenced this issue on Sep 5, 2024
Description
---
Cleared the handshake error from the client side by forcefully disconnecting from the peer.

Motivation and Context
---
This is to stop the phenomenon whereby a peer can dial another peer (`dial-peer XXXXX`) but cannot ping that same peer (`ping-peer XXXXX`). (See #6516)

How Has This Been Tested?
---
Unit tests pass.
Long-term system-level testing still has to be done to confirm that this solved the issue.

What process can a PR reviewer use to test or verify this change?
---
Review code changes.

Breaking Changes
---
- [x] None
- [ ] Requires data directory on base node to be deleted
- [ ] Requires hard fork
- [ ] Other - Please specify
See PR #6655
SWvheerden added a commit that referenced this issue on Nov 1, 2024
#6655)

Description
---
Added check connections to the p2p services (`MonitorPeersService`). All active connections are pinged on a set (slowish) interval (10 times slower than the _auto ping metadata interval_). Nodes that do not respond timeously with a corresponding pong on three consecutive iterations are disconnected. This will help keep the list of active connections (lazily) up to date.

**Edit:** Fixed an error in the liveness service where misbehaving ping peers were never disconnected. The liveness service and monitor peers service work hand in hand. Liveness randomly selects 8 connected peers to obtain metadata from and will disconnect any of those that misbehave after 1 minute (2 x ping intervals). The monitor peers service assesses all connected peers at a much slower pace and disconnects non-responsive peers after 15 minutes (10 x 3 ping intervals).

Motivation and Context
---
See #6516

How Has This Been Tested?
---
Performed system-level testing. From the log below we can see that 5 of 41 active peer connections did not respond to pings. Peer `e19e1454a1e0519866297960ad` was disconnected because it did not respond three times in a row:

```
2024-10-29 15:12:07.664466900 [minotari::base_node::monitor_peers] TRACE Found 5 of 41 outbound base node peer connections that did not respond to pings
2024-10-29 15:12:07.664619800 [minotari::base_node::monitor_peers] TRACE Peer e2fa82050c2f7579febafb7e08 stats - (iteration, connected, responsive) [(3, true, true), (4, true, true), (5, true, false)]
2024-10-29 15:12:07.664683300 [minotari::base_node::monitor_peers] DEBUG Disconnecting e19e1454a1e0519866297960ad as the peer is no longer responsive - (iteration, connected, responsive) [(2, true, true), (3, true, false), (4, true, false), (5, true, false)]
2024-10-29 15:12:07.665853300 [minotari::base_node::monitor_peers] TRACE Peer 6ea597117476676d5ddcb18153 stats - (iteration, connected, responsive) [(1, true, true), (2, true, true), (3, true, true), (4, true, true), (5, true, false)]
2024-10-29 15:12:07.665965500 [minotari::base_node::monitor_peers] TRACE Peer a671f812efe5ab14cbb3c1f9f4 stats - (iteration, connected, responsive) [(2, true, true), (3, true, true), (4, true, true), (5, true, false)]
2024-10-29 15:12:07.665997800 [minotari::base_node::monitor_peers] TRACE Peer e336b264e02f611cf4fbf51f22 stats - (iteration, connected, responsive) [(2, true, true), (3, true, true), (4, true, true), (5, true, false)]
```

What process can a PR reviewer use to test or verify this change?
---
- Code review
- System-level testing

Breaking Changes
---
- [x] None
- [ ] Requires data directory on base node to be deleted
- [ ] Requires hard fork
- [ ] Other - Please specify

---------

Co-authored-by: SW van Heerden <[email protected]>
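The disconnect rule described in the commit above (ping every active connection each iteration, drop a peer after three consecutive missed pongs) can be illustrated with a small sketch. This is not the `MonitorPeersService` code: `PeerMonitor`, `record_iteration` and `disconnect_delay_seconds` are hypothetical names invented for illustration, and the 30-second ping interval in the usage note is only inferred from the description's own figure of "1 minute (2 x ping intervals)".

```python
from collections import defaultdict, deque


class PeerMonitor:
    """Hypothetical tracker for per-peer ping results (illustration only)."""

    def __init__(self, max_misses=3, history_len=5):
        # max_misses=3 mirrors "three consecutive iterations" in the description.
        self.max_misses = max_misses
        # Per peer: a bounded list of (iteration, connected, responsive) tuples,
        # shaped like the stats printed in the log excerpt.
        self.history = defaultdict(lambda: deque(maxlen=history_len))

    def record_iteration(self, iteration, connected_peers, responsive_peers):
        """Record one ping round; return the peers that should be disconnected."""
        connected = set(connected_peers)
        # Forget peers that are no longer connected.
        for peer in list(self.history):
            if peer not in connected:
                del self.history[peer]

        to_disconnect = []
        for peer in sorted(connected):
            self.history[peer].append((iteration, True, peer in responsive_peers))
            recent = list(self.history[peer])[-self.max_misses:]
            # Disconnect only after max_misses consecutive unanswered pings.
            if len(recent) == self.max_misses and not any(r for _, _, r in recent):
                to_disconnect.append(peer)

        for peer in to_disconnect:
            del self.history[peer]
        return to_disconnect


def disconnect_delay_seconds(ping_interval_s, slowdown=10, misses=3):
    """Time until a silent peer is dropped: slowdown x misses x ping interval."""
    return ping_interval_s * slowdown * misses
```

Replaying the logged history of peer `e19e1454a1e0519866297960ad` (responsive in iteration 2, silent in iterations 3 to 5) through `record_iteration` flags it for disconnection in iteration 5, while a peer that only missed iteration 5 stays connected. Under the assumed 30-second ping interval, `disconnect_delay_seconds(30)` gives 900 seconds, which matches the 15 minutes quoted in the description.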
Node 1 to node 3
Node 3 to node 1