Handle tcp self connection issues #4599
Merged
Issue
Fixes: #4588
If a ZeroMQ socket attempts to connect or reconnect to an ephemeral port on the same host that is not yet active (or is recovering from being down), then as it cycles through ephemeral ports for the src port it may end up connecting to itself (src and dst ip/port match). The result is a failed connection with `protocol_error` and no further connection attempts.
A good explanation of "tcp self connection" can be found here: https://totozhang.github.io/2016-01-11-tcp-self-connection/
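For illustration, the src/dst clash described above can be detected by comparing a connected socket's local and peer addresses. This is a minimal sketch for POSIX sockets, not the code in this PR: the helper names `same_endpoint` and `is_self_connection` are hypothetical, and only IPv4 and IPv6 are handled.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>

// Hypothetical helper (not part of libzmq): true when two socket
// addresses have the same family, IP address and port.
static bool same_endpoint (const sockaddr_storage &a, const sockaddr_storage &b)
{
    if (a.ss_family != b.ss_family)
        return false;
    if (a.ss_family == AF_INET) {
        const sockaddr_in *x = reinterpret_cast<const sockaddr_in *> (&a);
        const sockaddr_in *y = reinterpret_cast<const sockaddr_in *> (&b);
        return x->sin_port == y->sin_port
               && x->sin_addr.s_addr == y->sin_addr.s_addr;
    }
    if (a.ss_family == AF_INET6) {
        const sockaddr_in6 *x = reinterpret_cast<const sockaddr_in6 *> (&a);
        const sockaddr_in6 *y = reinterpret_cast<const sockaddr_in6 *> (&b);
        return x->sin6_port == y->sin6_port
               && std::memcmp (&x->sin6_addr, &y->sin6_addr,
                               sizeof (in6_addr)) == 0;
    }
    // Other address families are out of scope for this sketch.
    return false;
}

// Hypothetical helper: true when the socket's local (src) endpoint is
// identical to its peer (dst) endpoint, i.e. a TCP self connection.
// Returns false if either address cannot be queried.
static bool is_self_connection (int fd)
{
    sockaddr_storage local {};
    sockaddr_storage peer {};
    socklen_t local_len = sizeof local;
    socklen_t peer_len = sizeof peer;
    if (getsockname (fd, reinterpret_cast<sockaddr *> (&local), &local_len) != 0)
        return false;
    if (getpeername (fd, reinterpret_cast<sockaddr *> (&peer), &peer_len) != 0)
        return false;
    return same_endpoint (local, peer);
}
```

A caller could run such a check when the handshake fails and, if it returns true, close the socket and let the normal reconnect logic try again with a new src port.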
Fix
When a `protocol_error` occurs, check to see if the src and dst clash. If they do, allow the connection to FIN and carry on reconnecting on the next ephemeral port the OS picks.
Test
I can't think of a good way to add an automatic test, as the issue is very random (the OS randomizes port selection for src). To force the issue locally, I restrict the ephemeral port range.
Manual tests
before change
after change