Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(comms): improve simultaneous connection handling #3697

Conversation

sdbondi
Copy link
Member

@sdbondi sdbondi commented Jan 11, 2022

Description

  • Handle case where connection is disconnected due to simultaneous dial can cause new connection to be removed from connectivity state
  • Remove messaging protocol timeouts, messaging protocol will end correctly when disconnected (ref fix: end stale outbound queue immediately on disconnect, auto retry outbound messages #3664)
  • Remove condition that connection should have 0 substreams for reaping. A connection may only be reaped if it's age is >= 20 minutes and there are 0 handles held for the connection
  • Detect if remote outbound stream is closed, by reading from it (yamux requires reading to determine if the stream is still alive)
  • Ensure disconnect event can only be emitted once per peer connection
  • remove delayed connection close and "will close" event

Motivation and Context

Observed a case where two nodes simultaneously dial and end up with no connections to each other due to a mis-timed peer disconnected event. The connection would have to wait for DHT connectivity to eventually redial to recover. Inactivity timeouts for messaging were a patch for the outbound messaging staying open, but this is not properly detected and handled.

How Has This Been Tested?

Existing tests updated
Manually: two base nodes that previously "lost" their connection due to this bug

@sdbondi sdbondi force-pushed the comms-improve-simultaneous-connections branch 3 times, most recently from d8522d5 to 154086e Compare January 12, 2022 05:53
@sdbondi sdbondi force-pushed the comms-improve-simultaneous-connections branch from 154086e to c564a9a Compare January 12, 2022 05:54
@aviator-app aviator-app bot merged commit 99ba6a3 into tari-project:development Jan 12, 2022
sdbondi added a commit to sdbondi/tari that referenced this pull request Jan 12, 2022
* development:
  fix(comms): improve simultaneous connection handling (tari-project#3697)
  refactor(mempool)!: optimisations,excess sig index,fix weight calc (tari-project#3691)
sdbondi added a commit to sdbondi/tari that referenced this pull request Jan 12, 2022
* development:
  fix(comms): improve simultaneous connection handling (tari-project#3697)
  refactor(mempool)!: optimisations,excess sig index,fix weight calc (tari-project#3691)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants