Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update reconnect protocol to deal with half-open connections #276

Merged
merged 8 commits into from
Aug 10, 2021

Conversation

fhcarter
Copy link
Collaborator

@fhcarter fhcarter commented Aug 9, 2021

Fixes #275 -- will require fix to Vantiq/vantiq-Issues#295 to work completely.

Adds reconnect secret to the connection requests. This secret allows a connector to usurp its old connection to Vantiq. This is required when the server fails to notice that the connection between the connector and the server has been unavailable. This is not uncommon in unstable network connection cases. When this happens, the connector's attempts to reconnect are rejected by the server because it (thinks it) already has a connection for the source in question.

This fix, in a parallel with the server side fix, allows the connector, using a reconnect secret to assert that it is the connection the server thinks it knows about. The server will, then, honor the connection & close the old, now defunct, session.

The fix uses parts of the connect extension message not previous used, so old servers will ignore it. And it is optional, so old clients talking to new servers will be fine.

This will allow _this_ client to reconnect to the same source, allowing it to
recover from the _half open_ condition.
@@ -38,6 +38,9 @@
public static final String RESPONSE_ADDRESS_HEADER = "X-Reply-Address";
public static final String PROPERTY_MESSAGE_HEADERS = "messageHeaders";

// Set in parameters to allow reconnect after network half-open situation
public static String RECONNECT_SECRET = "reconnectSecret";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PMD Java] Do not use non-final non-private static fields (view)

Rule Ruleset Priority
MutableStaticState Design 3

References:

You can close this issue if no need to fix it. Learn more.

@@ -29,6 +29,7 @@
public static String testSourceName = null;
public static String testTypeName = null;
public static String testRuleName = null;
public static Boolean testRepeatedConnectsEnabled = null;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PMD Java] Do not use non-final non-private static fields (view)

Rule Ruleset Priority
MutableStaticState Design 3

References:

You can close this issue if no need to fix it. Learn more.

@@ -38,6 +38,9 @@
public static final String RESPONSE_ADDRESS_HEADER = "X-Reply-Address";
public static final String PROPERTY_MESSAGE_HEADERS = "messageHeaders";

// Set in parameters to allow reconnect after network half-open situation
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PMD Java] Too many fields (view)

Rule Ruleset Priority
TooManyFields Design 3

References:

You can close this issue if no need to fix it. Learn more.

* Unique identifier for this loader's class. This is set on load and then left. This
* shared secret allows instances of this client to perform reconnects to the server.
*/
protected final static String clientReconnectSecret = UUID.randomUUID().toString();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Checkstyle] 'static' modifier out of order with the JLS suggestions. (view)

Rule Severity
ModifierOrderCheck info

References:

You can close this issue if no need to fix it. Learn more.

@@ -38,6 +38,9 @@
public static final String RESPONSE_ADDRESS_HEADER = "X-Reply-Address";
public static final String PROPERTY_MESSAGE_HEADERS = "messageHeaders";

// Set in parameters to allow reconnect after network half-open situation
public static String RECONNECT_SECRET = "reconnectSecret";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Checkstyle] Name 'RECONNECT_SECRET' must match pattern '^[a-z][a-zA-Z0-9]*$'. (view)

Rule Severity
StaticVariableNameCheck info

References:

You can close this issue if no need to fix it. Learn more.


if (rootProject.hasProperty("TestRepeatedConnects")) {
systemProperty "TestRepeatedConnects", rootProject.findProperty("TestRepeatedConnects") ?: "empty"
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary until we're working with 1.32 systems

* shared secret allows instances of this client to perform reconnects to the server.
*/
protected final static String clientReconnectSecret = UUID.randomUUID().toString();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual secret. It is generated at class load/initialization time and allows this connector instance to reconnect at will. It will be the same for all sources to which this connector connects, but that's irrelevant -- we just want to prove ownership.

@@ -40,6 +41,8 @@ public static void getProps() {
testSourceName = System.getProperty("EntConTestSourceName", "testSourceName");
testTypeName = System.getProperty("EntConTestTypeName", "testTypeName");
testRuleName = System.getProperty("EntConTestRuleName", "testRuleName");
testRepeatedConnectsEnabled =
Boolean.valueOf(System.getProperty("TestRepeatedConnects", "false"));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary -- we can remove when we're testing mostly against 1.32+ systems

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add an issue reminding us to remove this once 1.32 is out.

assertTrue("Failed to open connection", client.isOpen());
assertTrue("Failed to authenticate", client.isAuthed());
assertTrue("Failed to connect to source (may fail if run against Vantiq version < 1.32)",
client.isConnected());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but just in case, we'll alert...

# This allows for manual testing for connector disconnection recovery.

for ((i = 0; i < 100; i++)) do
echo Attempt $i;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ShellCheck] Double quote to prevent globbing and word splitting. (view)

Rule Severity Code
SC2086 info 2086

References:

You can close this issue if no need to fix it. Learn more.

# Shell script for Mac OS that will turn off then on wifi to monkey with network.
# This allows for manual testing for connector disconnection recovery.

for ((i = 0; i < 100; i++)) do
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ShellCheck] In POSIX sh, arithmetic for loops are undefined. (view)

Rule Severity Code
SC3005 warning 3005

References:

You can close this issue if no need to fix it. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ignored as per spellcheck directive atop file

# Shell script for Mac OS that will turn off then on wifi to monkey with network.
# This allows for manual testing for connector disconnection recovery.

for ((i = 0; i < 100; i++)) do
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ShellCheck] In POSIX sh, ++ is undefined. (view)

Rule Severity Code
SC3018 warning 3018

References:

You can close this issue if no need to fix it. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ignored as per spellcheck directive atop file

@fhcarter
Copy link
Collaborator Author

fhcarter commented Aug 9, 2021

Related Vantiq PR: Vantiq/ag2rs#8032

@fhcarter
Copy link
Collaborator Author

fhcarter commented Aug 9, 2021

Jenkins run

OPCUA failures are linux/amz vs Mac OS.

@fhcarter
Copy link
Collaborator Author

fhcarter commented Aug 9, 2021

@rblumer4, this may be related to the OPC errors. Looks like it all stems from some bug(s) in OpenJDK...

bcgit/bc-java#941 says it's fixed in their 301 release. It's not important to fix this now, just noting for the record.

Copy link
Contributor

@namirfa namirfa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Two minor comments, one about adding a new issue and one about a potential name change. Feel free to ignore.

@@ -38,6 +39,9 @@
public static final String RESPONSE_ADDRESS_HEADER = "X-Reply-Address";
public static final String PROPERTY_MESSAGE_HEADERS = "messageHeaders";

// Set in parameters to allow reconnect after network half-open situation
public static final String RECONNECT_SECRET = "reconnectSecret";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the "reconnect secret" or "connect secret"?

I think it's fine either way, but the secret would be used for both the first connection as well as any other reconnection afterwards.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to leave it as is. CONNECT_SECRET sounds like something that you need to connect (and, generally, you don't). It's also invisible to users of the SDK.

I'll think if there's a comment I can add that clarifies things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me!

@@ -40,6 +41,8 @@ public static void getProps() {
testSourceName = System.getProperty("EntConTestSourceName", "testSourceName");
testTypeName = System.getProperty("EntConTestTypeName", "testTypeName");
testRuleName = System.getProperty("EntConTestRuleName", "testRuleName");
testRepeatedConnectsEnabled =
Boolean.valueOf(System.getProperty("TestRepeatedConnects", "false"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add an issue reminding us to remove this once 1.32 is out.

@fhcarter
Copy link
Collaborator Author

Issue to remove stuff is #277

@fhcarter
Copy link
Collaborator Author

fhcarter commented Aug 10, 2021

Jenkins re-run after all the changes...

(no changes)

Comment on lines 43 to 48
// This is sent on any CONNECT_EXTENSION messages. It is used when a connector
// "re-connects" to the server to verify that it's the same connector connecting in
// (otherwise, the reconnect is rejected). So, while it's send on all CONNECT... calls,
// it's use is in only in the reconnect case (that is, usurping an existing connection).
// The name's a bit confusing, but CONNECT_SECRET sounds too much like something you need
// to connect (which, under normal circumstances, you don't).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is sent on any CONNECT_EXTENSION messages. It is used when a connector
// "re-connects" to the server to verify that it's the same connector connecting in
// (otherwise, the reconnect is rejected). So, while it's send on all CONNECT... calls,
// it's use is in only in the reconnect case (that is, usurping an existing connection).
// The name's a bit confusing, but CONNECT_SECRET sounds too much like something you need
// to connect (which, under normal circumstances, you don't).
// This is sent on any CONNECT_EXTENSION messages. It is used when a connector
// "re-connects" to the server to verify that it's the same connector connecting in
// (otherwise, the reconnect is rejected). So, while it's sent on all CONNECT... calls,
// its use is only in the reconnect case (that is, usurping an existing connection).
// The name's a bit confusing, but CONNECT_SECRET sounds too much like something you need
// to connect (which, under normal circumstances, you don't).

@namirfa
Copy link
Contributor

namirfa commented Aug 10, 2021

Jenkins re-run after all the changes...

(no changes)

(wrong link)

Corrected link

@fhcarter fhcarter merged commit 9a47e7f into master Aug 10, 2021
@fhcarter
Copy link
Collaborator Author

fhcarter commented Aug 10, 2021

Note: For this to fully work, this requires an update to the Vantiq server. This is anticipated in release 1.32.
And has been backported to 1.31.16

@fhcarter fhcarter deleted the fhc-updateReconnect-#275 branch August 10, 2021 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Half-open connection prevents reconnect
2 participants