Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement incremental backoff and jitter #1431

Closed
deanna-lad opened this issue Jun 9, 2022 · 2 comments · Fixed by #1613
Closed

Implement incremental backoff and jitter #1431

deanna-lad opened this issue Jun 9, 2022 · 2 comments · Fixed by #1613
Assignees
Labels
enhancement New feature or improved functionality.

Comments

@deanna-lad
Copy link

deanna-lad commented Jun 9, 2022

Please see the example implementation in js here ably/ably-js#997

Incremental backoff and jitter was added to the features spec in ably/docs#1445

To implement this we need to change the retry behaviour for:

  • When the client connection is in the DISCONNECTED state
  • When a RealtimeChannel is in the SUSPENDED state

The new retry behaviour is specified in RTB1 of the features spec.

RTB1a

The backoff coefficient described in RTB1a can be calculated in javascript like so:

function getBackoffCoefficient(n) {
  return Math.min((n + 2) / 3, 2);
}

RTB1b

The jitter coefficient described in RTB1b can be calculated in javascript like so:

function getJitterCoefficient() {
  return 1 - Math.random() * 0.2;
}

RTB1

The overall retry time calculation should look like:

function getRetryTime(initValue, n) {
  return initValue * getBackoffCoefficient(n) * getJitterCoefficient();
}

The following code can be used to test it:

[1, 2, 3, 4, 5].forEach((n) => {
  console.log(getRetryTime(initValue, n));
});

Which, with an initValue of 15 (seconds) should print something like:

13.917470451245713
18.415226855678757
20.444851032898747
26.650729887092275
27.803382948778786

┆Issue is synchronized with this Jira Story by Unito

@deanna-lad deanna-lad added the enhancement New feature or improved functionality. label Jun 9, 2022
lawrence-forooghian added a commit that referenced this issue Mar 30, 2023
We’ll use this in the implementation of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 30, 2023
lawrence-forooghian added a commit that referenced this issue Mar 30, 2023
I’m going to add further dependencies to this in the implementation
of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
We’ll use this in the implementation of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
We’ll use this in the implementation of #1431.

Here, we add an implementation of this protocol which returns a constant
delay. This represents the current (i.e. pre-RTB1) behaviour. I’ll
subsequently open a refactoring PR to actually make use of this
implementation.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
This provides an implementation of ARTRetryDelayCalculator which uses
the incremental backoff and jitter rules of RTB1. We’ll use it in the
implementation of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
I’m going to add further dependencies to this in the implementation
of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
We’ll use this in the implementation of #1431.

Here, we add an implementation of this protocol which returns a constant
delay. This represents the current (i.e. pre-RTB1) behaviour. I’ll
subsequently open a refactoring PR to actually make use of this
implementation.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
This provides an implementation of ARTRetryDelayCalculator which uses
the incremental backoff and jitter rules of RTB1. We’ll use it in the
implementation of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
I’m going to add further dependencies to this in the implementation
of #1431.
lawrence-forooghian added a commit that referenced this issue Mar 31, 2023
I’m going to add further dependencies to this in the implementation
of #1431.
lawrence-forooghian added a commit that referenced this issue Apr 2, 2023
In the upcoming implementation of #1431, I’m going to add further
(private) information to the ARTConnectionStateChange object emitted by
an ARTRealtime instance. This information needs to get from the place
that it is generated, all the way to the ARTRealtime method that emits
the state change. This method is currently named
transition:withErrorInfo:, and if we decide we want to emit further
information then this method will need to gain new parameters. Hence,
everything that (directly or indirectly) calls this method will need to
be updated to accept these new parameters (or to provide overrides that
only accept a subset of these parameters).

So, instead, bundle up all of the metadata related to a connection state
change request into a single object, so that as we expand this metadata
we don’t need to keep changing method signatures (except in the
initializers for this metadata, which can easily gain convenience
initializers to avoid having to change existing code if we wish).

(Further down the line, we may also wish to add some sort of request ID
to this metadata, which can be logged at the point where the request is
generated and the point where it is handled, so that our logs give us
richer information about why, for example, the library emitted a given
state change.)
lawrence-forooghian added a commit that referenced this issue Apr 2, 2023
lawrence-forooghian added a commit that referenced this issue Apr 2, 2023
This is similar to bdaa458, but one layer of indirection further out.
The aim (needed for the upcoming implementation of #1431) is for the
ARTRealtimeChannelSuspended case in ARTRealtimeChannelInternal’s
transition:withMetadata: method to be able to add metadata to the
emitted channel state change object. (To achieve this, we still need to
update -reattachWithReason: to accept a request object, which we’ll do
next.)

The convenience initializers of ARTAttachRequest match the removed
overloads of internalAttach.
lawrence-forooghian added a commit that referenced this issue Apr 3, 2023
In the upcoming implementation of #1431, I’m going to add further
(private) information to the ARTConnectionStateChange object emitted by
an ARTRealtime instance. This information needs to get from the place
that it is generated, all the way to the ARTRealtime method that emits
the state change. This method is currently named
transition:withErrorInfo:, and if we decide we want to emit further
information then this method will need to gain new parameters. Hence,
everything that (directly or indirectly) calls this method will need to
be updated to accept these new parameters (or to provide overrides that
only accept a subset of these parameters).

So, instead, bundle up all of the metadata related to a connection state
change request into a single object, so that as we expand this metadata
we don’t need to keep changing method signatures (except in the
initializers for this metadata, which can easily gain convenience
initializers to avoid having to change existing code if we wish).

(Further down the line, we may also wish to add some sort of request ID
to this metadata, which can be logged at the point where the request is
generated and the point where it is handled, so that our logs give us
richer information about why, for example, the library emitted a given
state change.)
lawrence-forooghian added a commit that referenced this issue Apr 3, 2023
This is similar to 5df1a9d, but one layer of indirection further out.
The aim (needed for the upcoming implementation of #1431) is for the
ARTRealtimeChannelSuspended case in ARTRealtimeChannelInternal’s
transition:withMetadata: method to be able to add metadata to the
emitted channel state change object. (To achieve this, we still need to
update -reattachWithReason: to accept a metadata object, which we’ll do
next.)

The convenience initializers of ARTAttachRequestMetadata match the
removed overloads of internalAttach.
lawrence-forooghian added a commit that referenced this issue Apr 3, 2023
We’ll use this in the implementation of #1431, in which the time we need
to wait before performing a retry of an operation varies depending on
how many times we have already retried.
lawrence-forooghian added a commit that referenced this issue Apr 3, 2023
This is just a refactor, which lays some groundwork for #1431. It’ll
only become useful when we implement #1431, at which point we’ll use
BackoffRetryDelayCalculator instead of ConstantRetryDelayCalculator.
lawrence-forooghian added a commit that referenced this issue Apr 4, 2023
We’ll use this when testing our upcoming implementation of #1431.
lawrence-forooghian added a commit that referenced this issue Jun 28, 2023
lawrence-forooghian added a commit that referenced this issue Jun 29, 2023
We’ll use this when testing the incremental backoff and jitter behaviour
that we’ll implement in #1431.
lawrence-forooghian added a commit that referenced this issue Jun 29, 2023
Part of #1431.

TODO how on earth to test resetting of sequence

Co-authored-by: Lawrence Forooghian <[email protected]>
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
As it stands, this class does nothing of worth. But when we introduce
backoff and jitter in #1431, I want to be able to test the logic that
resets the retry count when the channel transitions out of the SUSPENDED
<-> ATTACHING sequence, hence a separate class that I can unit test.
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
As it stands, this class does nothing of worth. But when we introduce
backoff and jitter in #1431, I want to be able to test the logic that
resets the retry count when the channel transitions out of the SUSPENDED
<-> ATTACHING sequence, hence a separate class that I can unit test.
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
We’ll use this when testing the incremental backoff and jitter behaviour
that we’ll implement in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
lawrence-forooghian added a commit that referenced this issue Jul 3, 2023
lawrence-forooghian added a commit that referenced this issue Jul 4, 2023
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
This bug describes how the library will sometimes perform an extra retry
when it should transition to the SUSPENDED state. This can cause
test__059 to fail depending on the exact chosen value of
connectionStateTtl. In #1431 I will be changing the value of
connectionStateTtl that this test uses, and the value I want to use does
indeed cause the test to fail. So, adjust the test to handle the case
where an extra retry occurs. We should remove this workaround once we
fix #1782.
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
This changes test__059 so that it’s guaranteed we’ll observe at least a
specified number of retries. This will become particularly useful when
we implement #1431, which will introduce variable retry delays, so that
we can check that the correct retry delay is being used for a given
retry.
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
Update test__059 to make assertions about the retryIn property of all
DISCONNECTED state changes, not just the first one. This is in
preparation for the retry delay becoming variable in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
Refactor test__061 to derive the number of expected retries from the
retry delays. This is in preparation for the retry delay becoming
variable in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 5, 2023
As we did for channel attach retries in e3459ff.

Resolves #1431.

Co-authored-by: Lawrence Forooghian <[email protected]>
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
As it stands, this class does nothing of worth. But when we introduce
backoff and jitter in #1431, I want to be able to test the logic that
resets the retry count when the channel transitions out of the SUSPENDED
<-> ATTACHING sequence, hence a separate class that I can unit test.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
We’ll use this when testing the incremental backoff and jitter behaviour
that we’ll implement in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
This bug describes how the library will sometimes perform an extra retry
when it should transition to the SUSPENDED state. This can cause
test__059 to fail depending on the exact chosen value of
connectionStateTtl. In #1431 I will be changing the value of
connectionStateTtl that this test uses, and the value I want to use does
indeed cause the test to fail. So, adjust the test to handle the case
where an extra retry occurs. We should remove this workaround once we
fix #1782.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
This changes test__059 so that it’s guaranteed we’ll observe at least a
specified number of retries. This will become particularly useful when
we implement #1431, which will introduce variable retry delays, so that
we can check that the correct retry delay is being used for a given
retry.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
Update test__059 to make assertions about the retryIn property of all
DISCONNECTED state changes, not just the first one. This is in
preparation for the retry delay becoming variable in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
Refactor test__061 to derive the number of expected retries from the
retry delays. This is in preparation for the retry delay becoming
variable in #1431.
lawrence-forooghian added a commit that referenced this issue Jul 11, 2023
As we did for channel attach retries in e3459ff.

Resolves #1431.

Co-authored-by: Lawrence Forooghian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improved functionality.
4 participants