Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new auto refresh token algorithm #564

Merged
merged 3 commits into from
Jan 23, 2023
Merged

Conversation

hf
Copy link
Contributor

@hf hf commented Dec 20, 2022

Adds a new algorithm that automatically refreshes the user's session.

Problems with the previous algorithm:

  • Complex retry logic
  • Difficult to read and reason about quickly
  • Used in-memory state while session state is not in memory -- local storage, cookies, etc.
  • Could produce endless retries
  • Not multi-tab resilient -- performed refresh from all open tabs at once
  • Not clock-skew resilient

New algorithm:

  • Runs every 10 seconds (AUTO_REFRESH_TICK_DURATION) -- tick
  • Each tick checks the session from the proper storage medium (not possible to have unsynced state)
  • If the expiration is within 3 ticks (AUTO_REFRESH_TICK_THRESHOLD) a refresh is attempted
  • If a refresh is not successful, another one will be attempted on the next tick
  • A failing refresh due to network errors will be retried exponentially but up to a maximum of 10 seconds
  • Auto refresh runs only on visible tabs, not on hidden tabs

Copy link
Member

@kangmingtay kangmingtay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in-memory state while session state is not in memory -- local storage, cookies, etc.

iirc, the previous algo didn't do this though, also how does the new approach change this?

Not clock-skew resilient

just based off reading the code, i don't see how the new approach helps to solve this issue either

Could produce endless retries

the new algo seems like it can produce endless retries too if the isRetryable function passed into retryable is incorrect right?

Just to clarify, if the AUTO_REFRESH_TICK_THRESHOLD = 3 and the AUTO_REFRESH_TICK_DURATION = 10s, then 3 ticks would be equivalent to 30s right? I'm not entirely convince how moving to the concept of "ticks" would change how things work because we're just creating a new unit of time (every 10seconds is 1 tick) and this complexity would have to be passed down to the developer initialising the client lib which seems unnecessary.

If you haven't tested this yet, it might be worth referencing this PR and testing it manually to see if it can handle those scenarios.

src/lib/helpers.ts Show resolved Hide resolved
src/GoTrueClient.ts Show resolved Hide resolved
@hf
Copy link
Contributor Author

hf commented Dec 21, 2022

Used in-memory state while session state is not in memory -- local storage, cookies, etc.

iirc, the previous algo didn't do this though, also how does the new approach change this?

The previous algorithm started a timeout when a token was refreshed. The timeout is an object in RAM while the access and refresh tokens are objects in local storage (or equivalent). So if the underlying data changes, the timeout would fire based off of wrong assumptions.

With this algorithm, the source-of-truth (local storage or equivalent) is being checked on each tick which allows other independent processes of the tabs to refresh a token without causing race conditions. For example, if you have two open tabs and one does a refresh of the session (with an MFA step up) the other tab in the background will now automatically synchronize. Previously the timeout would fire regardless of the concurrently changed data.

Not clock-skew resilient

just based off reading the code, i don't see how the new approach helps to solve this issue either

This algorithm is skew resilient because it doesn't rely on having an exactly synchronized clock. Ticks are wall time independent, meaning that the wall clock can change between ticks while they still run on 10 second intervals. Furthermore, the tick threshold defines the maximum clock skew acceptable and can easily be changed without running into other clock timing issues.

Could produce endless retries

the new algo seems like it can produce endless retries too if the isRetryable function passed into retryable is incorrect right?

Think of retryable as a separate helper function. The retry mechanism on refresh token is only invoked for a maximum duration of 10 seconds in very specific cases (where there's a fetch error -- typically a network issue). The previous algorithm was problematic because it was easy to make a bug (due to its complexity and duplication).

Just to clarify, if the AUTO_REFRESH_TICK_THRESHOLD = 3 and the AUTO_REFRESH_TICK_DURATION = 10s, then 3 ticks would be equivalent to 30s right? I'm not entirely convince how moving to the concept of "ticks" would change how things work because we're just creating a new unit of time (every 10seconds is 1 tick) and this complexity would have to be passed down to the developer initialising the client lib which seems unnecessary.

There's nothing passed to the developer. The API and behavior is equivalent. This PR just adds the ability to non-browser developers to properly initialize auto refresh behavior. For example, if you're building a React Native app there's no window visbilitychanged event; so even when your screen is off or the app is in the background a refresh timer would fire. Now we're giving those folks the ability to pause auto refresh when the app is in a background state.

Browsers remain with the same behavior, and this time it's better because it automatically pauses when the tab goes into the hidden state; preventing all open tabs from refreshing with the same token at once.

If you haven't tested this yet, it might be worth referencing this PR and testing it manually to see if it can handle those scenarios.

Yes.

src/GoTrueClient.ts Outdated Show resolved Hide resolved
@J0
Copy link
Contributor

J0 commented Jan 4, 2023

The timeout is an object in RAM while the access and refresh tokens are objects in local storage (or equivalent). So if the underlying data changes, the timeout would fire based off of wrong assumptions.

Do you mind expanding a little more on how the assumptions would change and lead to the timeout firing based off wrong assumptions? Under the impression that the timeout object would still reference objects in local storage (or equivalent) regardless of where it's located

This algorithm is skew resilient because it doesn't rely on having an exactly synchronized clock. Ticks are wall time independent, meaning that the wall clock can change between ticks while they still run on 10 second intervals. Furthermore, the tick threshold defines the maximum clock skew acceptable and can easily be changed without running into other clock timing issues.

Makes sense - quick sanity check - so does this mean that if we switch to 5 second intervals the tolerance would be 5 seconds? (e.g. if NTP indicates it is 10:10:10 and our clock is 4 seconds ahead at 10:10:14 we'd still be in sync albeit with the auto refresh firing 4s ahead of NTP)

@hf
Copy link
Contributor Author

hf commented Jan 4, 2023

The timeout is an object in RAM while the access and refresh tokens are objects in local storage (or equivalent). So if the underlying data changes, the timeout would fire based off of wrong assumptions.

Do you mind expanding a little more on how the assumptions would change and lead to the timeout firing based off wrong assumptions? Under the impression that the timeout object would still reference objects in local storage (or equivalent) regardless of where it's located

Think about a scenario where you have 2 tabs open. Both tabs have the old process for auto refresh started. The user signs in on tab A and a timeout is scheduled in 1 hour to refresh the token. But then the user switches to tab B (where they're already logged in and because of this, tab B also has a timeout scheduled in 1 hour). Now imagine the user clicks sign out on tab B, which clears its timeout and the global local storage. But tab A's timeout is still scheduled, even though there's nothing to refresh.

Thus we have two ways of solving this problem: explicitly synchronizing the state in all open tabs (which is difficult, error-prone and probably less maintainable), or implicitly by never actually having distributed state i.e. all tabs use local storage for all operations and don't keep any derived state in RAM.

@hf hf force-pushed the hf/add-new-auto-refresh-alg branch from 3138c21 to 62df471 Compare January 5, 2023 18:32
Co-authored-by: Joel Lee <[email protected]>
@hf hf merged commit 013afae into master Jan 23, 2023
@hf hf deleted the hf/add-new-auto-refresh-alg branch January 23, 2023 10:10
@github-actions
Copy link
Contributor

🎉 This PR is included in version 2.10.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

hf added a commit that referenced this pull request Jan 31, 2023
In #564 `setInterval` started being used, however the existence of this
type of object marks a NodeJS process as never finished which makes
tests hang. By calling
[`Timeout#unref()`](https://nodejs.org/api/timers.html#timeoutunref) the
interval will continue to run but not hang the process.

Fixes #597.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants