Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement Cloud communication reliability #31986

Closed
wants to merge 27 commits into from

Conversation

Gustrb
Copy link
Contributor

@Gustrb Gustrb commented Mar 13, 2024

Proposed changes (including videos or screenshots)

When a Workspace Registers with Cloud it gets a set of credentials it uses to generate access tokens and communicate with Cloud. Can read more about how this works in ADR3.

When a workspace owner needs to scale and deploys multiple Rocket.Chat instances to make up their deployment. Each workspace establishes a change stream request to mongo watching settings. Then updates its collection cache locally.

When a workspace goes to use an access token to talk to a cloud service it first evaluates if the token needs renewed. If it does it request a new one. Then updates a setting containing the access token. Then also another setting containing the expire.

The problem occures when latency enters the picture. If due to either longer mongo query or network condition the other workspaces don't see both of these changes happen and update its cache. An instance can be using an expired token instead of renewing because the setting cache hasn't been updated.

We've seen many cases of this happen and typically restarting workspace can resolve.

This PR moves all credential related data to the 'workspace_credentials' collection, and stop using settings.get to fetch the configs that can be changed on multiple server instances

Issue(s)

Steps to test or reproduce

Create a new workspace, on registering, it should create a new record inside the rocketchat_workspace_credentials collection, containing an empty date as expirationDate, an empty string as accessToken and an empty string as scopes.
To test everything else, you must use the cloud functions, not sure how to call them from RC, but you could enter the Marketplace screen, go to paid apps, open one of them, and check if there is a new token in the database

Further comments

This implements: https://adr.rocket.chat/0061

CONN-5

Copy link

changeset-bot bot commented Mar 13, 2024

🦋 Changeset detected

Latest commit: 97ec7cb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 32 packages
Name Type
@rocket.chat/meteor Minor
@rocket.chat/core-typings Minor
@rocket.chat/model-typings Minor
@rocket.chat/models Minor
@rocket.chat/apps Patch
@rocket.chat/core-services Patch
@rocket.chat/cron Patch
@rocket.chat/gazzodown Major
@rocket.chat/livechat Patch
@rocket.chat/rest-typings Minor
@rocket.chat/ui-contexts Major
@rocket.chat/account-service Patch
@rocket.chat/authorization-service Patch
@rocket.chat/ddp-streamer Patch
@rocket.chat/omnichannel-transcript Patch
@rocket.chat/presence-service Patch
@rocket.chat/queue-worker Patch
@rocket.chat/stream-hub-service Patch
@rocket.chat/api-client Patch
@rocket.chat/license Patch
@rocket.chat/omnichannel-services Patch
@rocket.chat/pdf-worker Patch
@rocket.chat/presence Patch
rocketchat-services Patch
@rocket.chat/instance-status Patch
@rocket.chat/fuselage-ui-kit Major
@rocket.chat/ddp-client Patch
@rocket.chat/ui-avatar Major
@rocket.chat/ui-client Major
@rocket.chat/ui-video-conf Major
@rocket.chat/uikit-playground Patch
@rocket.chat/web-ui-registration Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.64%. Comparing base (105a1eb) to head (97ec7cb).
Report is 56 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop   #31986      +/-   ##
===========================================
+ Coverage    55.48%   55.64%   +0.16%     
===========================================
  Files         2371     2426      +55     
  Lines        52062    53356    +1294     
  Branches     10651    10970     +319     
===========================================
+ Hits         28888    29692     +804     
- Misses       20636    21037     +401     
- Partials      2538     2627      +89     
Flag Coverage Δ
e2e 55.23% <ø> (+0.52%) ⬆️
unit 72.74% <ø> (-2.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@Gustrb Gustrb requested a review from geekgonecrazy March 13, 2024 18:00
@Gustrb Gustrb marked this pull request as ready for review March 18, 2024 18:40
@Gustrb Gustrb requested review from a team as code owners March 18, 2024 18:40
@Gustrb
Copy link
Contributor Author

Gustrb commented Mar 18, 2024

I think I should write a migration to move the 'workspace_public_key', 'workspace_registration_client_uri', 'workspace_id', 'workspace_name', 'cloud_workspace_access_token', 'cloud_workspace_access_token_expires_at' values from the Settings collection into the workspace_credentials collection right? @geekgonecrazy

@Gustrb Gustrb requested review from tassoevan and d-gubert March 18, 2024 22:29
@geekgonecrazy
Copy link
Contributor

geekgonecrazy commented Mar 25, 2024

I think client id and client secret should stay in settings and use the settings get that bypasses cache. Making the new collection only for access tokens

Actually every thing but workspace access token and its expire, and scopes should stay.

imo the fields on the new collection should be:

_id, scopes, expire, access_token

Then for speed the index could contain all values so it only ever hits mongo index and never needs to hit disk

@Gustrb
Copy link
Contributor Author

Gustrb commented Mar 25, 2024

Oooh, it makes a lot of sense now, will work on it now, thx

Copy link
Contributor

dionisio-bot bot commented Apr 3, 2024

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@Gustrb Gustrb requested review from ggazzo and tassoevan April 3, 2024 19:23
@Gustrb Gustrb modified the milestones: 6.7, 7.0 Apr 3, 2024
@casalsgh casalsgh modified the milestones: 7.0, 6.8 Apr 5, 2024
@casalsgh casalsgh removed this from the 6.8 milestone Apr 22, 2024
@debdutdeb debdutdeb self-requested a review April 22, 2024 20:29
@AllanPazRibeiro
Copy link
Contributor

Is possible to add some tests here?

Co-authored-by: Matheus Barbosa Silva <[email protected]>
@Gustrb
Copy link
Contributor Author

Gustrb commented May 17, 2024

@AllanPazRibeiro I guess we should find a way to add tests for our cloud services, but I am not sure if it is possible now, do you have any idea on how could we do it?

@AllanPazRibeiro
Copy link
Contributor

@AllanPazRibeiro I guess we should find a way to add tests for our cloud services, but I am not sure if it is possible now, do you have any idea on how could we do it?

Maybe you could add an e2e test for this endpoint apps/meteor/ee/server/apps/communication/rest.ts

@Gustrb Gustrb closed this Jul 22, 2024
@geekgonecrazy geekgonecrazy deleted the feature/cloud-certificate-setting-reliablitity branch July 23, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants