-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
connection resume/recover #167
base: main
Are you sure you want to change the base?
Changes from 1 commit
807237c
f499132
ef36b77
2875633
0501809
6b67256
906e586
9523b6d
248bbc6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -505,13 +505,14 @@ h3(#realtime-connection). Connection | |
** @(RTN15g)@ Connection state is only maintained server-side for a brief period, given by the @connectionStateTtl@ in the @connectionDetails@, see "CD2f":#CD2f. If a client has been disconnected for longer than the @connectionStateTtl@, it should not attempt to resume. Instead, it should clear the local connection state, and any connection attempts should be made as for a fresh connection | ||
*** @(RTN15g1)@ This check should be made before each connection attempt. It is generally not sufficient to merely clear the connection state when moving to @SUSPENDED@ state (though that may be done too), since the device may have been sleeping / suspended, in which case it may have been many hours since it was last actually connected, even though, having been in the @CONNECTED@ state when it was put to sleep, it has only moved out of that state very recently (after waking up and noticing it's no longer connected) | ||
*** @(RTN15g2)@ Another consequence of that is that the measure of whether the client been disconnected for too long (for the purpose of this check) cannot just be whether the client left the @CONNECTED@ state more than @connectionStateTtl@ ago. Instead, it should be whether the difference between the current time and the last activity time is greater than the sum of the @connectionStateTtl@ and the @maxIdleInterval@, where the last activity time is the time of the last known actual sign of activity from Ably per "RTN23a":#RTN23a | ||
*** @(RTN15g3)@ When a connection attempt succeeds after the connection state has been cleared in this way, channels that were previously @ATTACHED@, @ATTACHING@, or @SUSPENDED@ must be automatically reattached, just as if the connection was a resume attempt which failed per "RTN15c7":#RTN15c7 | ||
*** @(RTN15g3)@ When a connection attempt succeeds after the connection state has been cleared in this way, channels that were previously @ATTACHED@, @ATTACHING@, or @SUSPENDED@ must be automatically reattached, just as if the connection was a resume attempt which failed per "RTN15c8":#RTN15c8 | ||
** @(RTN15b)@ In order for a connection to be resumed and connection state to be recovered, the client must have received a @CONNECTED@ ProtocolMessage which will include a private connection key. To resume that connection, the library reconnects to the "websocket":https://ably.com/topic/websockets endpoint with an additional querystring param: | ||
*** @(RTN15b1)@ @resume@ is the @ProtocolMessage#connectionKey@ from the most recent @CONNECTED@ @ProtocolMessage@ received | ||
*** @(RTN15b2)@ This clause has been deleted. It was valid up to and including specification version @1.2@. | ||
** @(RTN15c)@ The system's response to a resume request will be one of the following: | ||
**** @(RTN15c6)@ A @CONNECTED@ @ProtocolMessage@ with the same @connectionId@ as the current client (and no @error@ property). This indicates that the resume attempt was valid. The client library should move all channels that were in the @ATTACHING@, @ATTACHED@, or @SUSPENDED@ states to the @ATTACHING@ state, and initiate an @RTL4c@ attach sequence for each. The connection should also process any messages queued per @RTL6c2@ (there is no need to wait for the attaches to finish before processing queued messages). | ||
**** @(RTN15c7)@ @CONNECTED@ @ProtocolMessage@ with a new @connectionId@ and an @ErrorInfo@ in the @error@ field. In this case, the resume was invalid, and the error indicates the cause. The @error@ should be set as the @reason@ in the @CONNECTED@ event, and as the @Connection#errorReason@. The internal @msgSerial@ counter should be reset so that the first message published to Ably will contain a @msgSerial@ of @0@. The rest of the process is the same as for @RTN16c6@: The client library should move all channels that were in the @ATTACHING@, @ATTACHED@, or @SUSPENDED@ states to the @ATTACHING@ state, and initiate an @RTL4c@ attach sequence for each. The connection should also process any messages queued per @RTL6c2@. | ||
** @(RTN15c)@ The system's response to a resume request (with a non-empty connectionKey) will be one of the following: | ||
**** @(RTN15c6)@ A @CONNECTED@ @ProtocolMessage@ with the same @connectionId@ as the current client (and no @error@ property). This indicates that the resume attempt was valid. | ||
**** @(RTN15c7)@ @CONNECTED@ @ProtocolMessage@ with a new @connectionId@ and an @ErrorInfo@ in the @error@ field. In this case, the resume was invalid, and the error indicates the cause. The @error@ should be set as the @reason@ in the @CONNECTED@ event, and as the @Connection#errorReason@. The internal @msgSerial@ counter should be reset so that the first message published to Ably will contain a @msgSerial@ of @0@. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Spec doesn't mention about value of channel |
||
**** @(RTN15c8)@ Irrespective of success/failure for a resume request, the client library should move all channels that were in the @ATTACHING@, @ATTACHED@, or @SUSPENDED@ states to the @ATTACHING@ state, and initiate an @RTL4c@ attach sequence for each. The connection should also process any messages queued per @RTL6c2@ (there is no need to wait for the attaches to finish before processing queued messages). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But this item is a child of a spec item saying "The system's response to a resume request (with a non-empty connectionKey) will be one of the following:". Every other child is one of the four things that can happen:
Adding an item in the middle that's actually is a shared spec item for the first two of the items doesn't make sense There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, though the first 2 do exhibit a common behavior for attaching channels and processing pending messages. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can remove this if not needed. Though I am not sure if resume failure applies here too as per #167 (comment) |
||
**** @(RTN15c5)@ @ERROR@ @ProtocolMessage@ indicating a failure to authenticate as a result of a token error (see "RTN15h":#RTN15h). The transport will be closed by the server. The spec described in "RTN15h":#RTN15h must be followed for a connection being resumed with a token error | ||
**** @(RTN15c4)@ Any other @ERROR@ @ProtocolMessage@ indicating a fatal error in the connection. The server will close the transport immediately after. The client should transition to the @FAILED@ state triggering all attached channels to transition to the @FAILED@ state as well. Additionally the @Connection#errorReason@ will be set should be set with the error received from Ably | ||
**** @(RTN15c1)@ This clause has been replaced by "@RTN15c6@":#RTN15c6. It was valid up to and including specification version @1.2@. | ||
|
@@ -535,7 +536,7 @@ h3(#realtime-connection). Connection | |
** @(RTN16d)@ The library may wish to test that after a connection has been successfully recovered, the @Connection#id@ should be identical to the @id@ of the connection that was recovered, and @Connection#key@ should have been updated to the @ConnectionDetails#connectionKey@ provided in the @CONNECTED@ @ProtocolMessage@. | ||
** @(RTN16m)@ @Connection#recoveryKey@ is a deprecated property that, when read, contains the same value that would be returned by calling @Connection#createRecoveryKey@. If the implementation languague allows, it should be implemented in a way that does not require that it be recalculated on every message. If this is not possible, it should be recalculated on every incoming message. | ||
*** @(RTN16m1)@ @Connection#recoveryKey@ must be removed on the next major release of the SDK. | ||
** @(RTN16l)@ Recovery failures should be handled identically to resume failures, per "RTN15c7":#RTN15c7, "RTN15c5":#RTN15c5, and "RTN15c4":#RTN15c4. | ||
** @(RTN16l)@ Recovery success/failures should be handled identically to resume, per "RTN15c8":#RTN15c8, "RTN15c5":#RTN15c5, and "RTN15c4":#RTN15c4. | ||
** @(RTN16a)@ This clause has been replaced by "@RTN16i@":#RTN16i. It was valid up to and including specification version @1.2@. | ||
** @(RTN16b)@ This clause has been replaced by "@RTN16g@":#RTN16g and "@RTN16m@":#RTN16m. It was valid up to and including specification version @1.2@. | ||
** @(RTN16c)@ This clause has been replaced by "@RTN16g2@":#RTN16g2. It was valid up to and including specification version @1.2@. | ||
|
@@ -554,7 +555,7 @@ h3(#realtime-connection). Connection | |
* @(RTN19)@ Transport state side effects - when a transport is disconnected for any reason: | ||
** @(RTN19a)@ Any @ProtocolMessage@ that is awaiting an @ACK@/@NACK@ on the old transport will not receive the @ACK@/@NACK@ on the new transport. The client library must therefore resend any @ProtocolMessage@ that is awaiting a @ACK@/@NACK@ to Ably in order to receive the expected @ACK@/@NACK@ for that message (subject to @RTN7c@/@RTN7d@). The Ably service is responsible for keeping track of messages, ignoring duplicates and responding with suitable @ACK@/@NACK@ messages | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Imagine, connection was in In this case, what should happen to ack/nack messages? Are we supposed to treat it as a resume failure and send ack/nack messages with incremental message-serial ? Same we do it for channel attach in case of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, it feels we are attaching channels (in attaching, attached or suspended state ) for almost every type of reconnection 🤔 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update - I checked the code, seems we clear There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
they'll have been cleared already per RTN7c
yes: before you're connected for the first time There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Statement says There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update -
Not sure how server actually checks for duplicates : ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update - for the second case, since we are adding ack/nack to pending messages, is it a desired behavior? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is what RTN19a2 is for, which requires the re-send attempt (for a successful resume) to use the same msgSerial. The message id for a realtime connection is implicitly In the case of a resume failure, you're out of luck, so yes you can get duplicates. |
||
*** @(RTN19a1)@ One possible implementation of this requirement would be to add any in-flight messages to the @RTL6c2@ connection-wide queue of messages that will be sent once the connection next becomes @CONNECTED@ | ||
*** @(RTN19a2)@ In the case of an @RTN15c6@ successful resume, the @msgSerial@ of the reattempted @ProtocolMessage@s should remain the same as for the original attempt. In the case of an @RTN15c7@ failed resume, the message must be assigned a new @msgSerial@ from the SDK's internal counter. | ||
*** @(RTN19a2)@ In the case of an @RTN15c6@ successful resume, the @msgSerial@ of the reattempted @ProtocolMessage@s should remain the same as for the original attempt. In the case of an @RTN15c7@ failed resume, the message must be assigned a new @msgSerial@ from the SDK's internal counter ( need clarification). | ||
sacOO7 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
** @(RTN19b)@ If there are any pending channels i.e. in the @ATTACHING@ or @DETACHING@ state, the respective @ATTACH@ or @DETACH@ message should be resent to Ably | ||
* @(RTN23)@ Heartbeats | ||
** @(RTN23a)@ If a transport does not receive any indication of activity on a transport for a period greater than the sum of the @maxIdleInterval@ (which will be sent in the @connectionDetails@ of the most recent @CONNECTED@ message received on that transport) and the @realtimeRequestTimeout@, that transport should be disconnected. Any message (or non-message indicator, see @RTN23b@) received counts as an indication of activity and should reset the timer, not merely heartbeat messages. However, it must be received (that is, sent from the server to the client); client-sent data does not count. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if there was no resume attempt as a part of suspended retry, are we supposed to treat is as resume failed? That means we need to reset
msgSerial
in this case too 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we will get a
new connectionId
but there won't be an error in the errorField. I think this third case is coming for almost every other spec that mentions just a dual behavior for resume success/failure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think resume failure should be
new connectionId
!=prev connectionId
. We shouldn't have extra checks for errors as such. So, if there is no resume attempt (nullkey
), it will be treated as a resume failure.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTN15g: "If a client has been disconnected for longer than the @connectionStateTtl@, it should not attempt to resume. Instead, it should clear the local connection state, and any connection attempts should be made as for a fresh connection" -- the msgSerial is part of the connection state, so it will have been reset in this case. (Perhaps the spec should explicitly list the properties that should be cleared, for clarity?)
(see also RTN15g1 which explains why RTN15g is strictly stronger than just doing this on going into SUSPENDED)
There's two cases. Either connection state (including connectionKey and msgSerial) is still there, in which case the library will try and do a resume, and the possible cases are listed in RTN15c. Or the connection state has been cleared, including the msgSerial, in which case the library is just doing a clean connection. The sdk never stops trying to resume but keeps its msgSerial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the
ably-js
, I can findmessageSerial
being cleared as a part ofclearing connection state
.https://github.com/ably/ably-js/blob/d735e23d41b558925b02fab0420832f80a77491a/src/common/lib/transport/connectionmanager.ts#L1087-L1092
But, I couldn't find any references to clearing messageSerial as a part of https://sdk.ably.com/builds/ably/specification/main/features/#RTN8c and https://sdk.ably.com/builds/ably/specification/main/features/#RTN9c