
fix: regulation-worker changes for oauth destinations #2730

Merged
9 commits merged into master on Nov 28, 2022

Conversation

sanpj2292
Contributor

@sanpj2292 sanpj2292 commented Nov 24, 2022

Description

Previously in regulation-worker, for OAuth destinations, as soon as the refresh-token request completed we would retry the regulation-worker job recursively. According to the source code this recursion could, in principle, repeat an unbounded number of times (in practice this is rarely seen).
To eliminate the risk of unbounded recursive calls for a job, we restrict the retry to a single attempt.
This PR includes that change.

Other changes included:

  • Added a configurable maxRetryAttempts for the OAuth case
  • Added a stat to help us understand whether requests to the control plane (made as part of the OAuth flow) are timing out
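The recursion cap described above can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's actual code: the simplified signature, the callTransformer stub, and the status values are assumptions (the real Delete/deleteWithRetry methods take ctx, job, and destination).

```go
package main

import "fmt"

// JobStatus is a simplified stand-in for model.JobStatus.
type JobStatus string

const (
	StatusComplete JobStatus = "complete"
	StatusFailed   JobStatus = "failed"
)

// APIManager carries only the fields needed to illustrate the retry cap.
type APIManager struct {
	MaxOAuthRefreshRetryAttempts int
	// callTransformer stands in for the real delete API call; it reports
	// the job status and whether the response signalled an expired token.
	callTransformer func() (status JobStatus, tokenExpired bool)
}

// deleteWithRetry retries after a token-expiry failure at most
// MaxOAuthRefreshRetryAttempts times, instead of recursing without bound.
func (api *APIManager) deleteWithRetry(currentAttempt int) JobStatus {
	status, tokenExpired := api.callTransformer()
	if tokenExpired && currentAttempt < api.MaxOAuthRefreshRetryAttempts {
		// In the real flow the token is refreshed here before retrying.
		return api.deleteWithRetry(currentAttempt + 1)
	}
	return status
}

func main() {
	calls := 0
	api := &APIManager{
		MaxOAuthRefreshRetryAttempts: 1,
		callTransformer: func() (JobStatus, bool) {
			calls++
			return StatusFailed, true // token always reported expired
		},
	}
	status := api.deleteWithRetry(0)
	fmt.Println(status, calls) // one initial call plus exactly one retry
}
```

With the default cap of 1, a perpetually expired token produces one initial call and one retry, then returns the last computed status instead of recursing forever.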

Notion Ticket

Notion

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@codecov

codecov bot commented Nov 24, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@9f006e5).
Patch coverage: 75.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2730   +/-   ##
=========================================
  Coverage          ?   46.74%           
=========================================
  Files             ?      298           
  Lines             ?    48951           
  Branches          ?        0           
=========================================
  Hits              ?    22881           
  Misses            ?    24605           
  Partials          ?     1465           
Impacted Files Coverage Δ
regulation-worker/internal/delete/api/api.go 67.53% <57.14%> (ø)
services/oauth/oauth.go 53.88% <76.92%> (ø)
regulation-worker/cmd/main.go 68.25% <100.00%> (ø)


☔ View full report at Codecov.

Four review threads on regulation-worker/internal/delete/api/api.go (outdated, resolved)
- update error message to include the actual response
- update naming
- make maxRetryAttempts configurable
- include jobId in logs
// prepares the payload from (job, destDetail) & makes an API call to the transformer.
// gets (status, failure_reason), which is converted to the appropriate model.Error & returned to the caller.
- func (api *APIManager) Delete(ctx context.Context, job model.Job, destination model.Destination) model.JobStatus {
+ func (api *APIManager) deleteWithRetry(ctx context.Context, job model.Job, destination model.Destination, retryAttempts int) model.JobStatus {
Contributor

retryAttempts -> retryAttempt or retryAttemptCount or something similar makes more sense, since it indicates how many attempts have happened so far, right?

Member


It can go either way:

  • count how many retries happened until now
  • current retry number

I'd prefer the latter.

Comment on lines +85 to +86
OAuth: OAuth,
MaxOAuthRefreshRetryAttempts: config.GetInt("RegulationWorker.oauth.maxRefreshRetryAttempts", 1),
Member


I'd not tie this variable to OAuth as it's a delete retry we are doing rather than a "refresh retry".
Moreover, we might retry a delete request due to other failure scenarios that could possibly succeed in the next retry.

Contributor Author


The retry is only required for the OAuth case, to prevent a failed job from being retried after the token's validity period has expired.


- if isOAuthEnabled && isTokenExpired(jobResp) {
+ if isOAuthEnabled && isTokenExpired(jobResp) && currentOauthRetryAttempt < api.MaxOAuthRefreshRetryAttempts {
Contributor


What will the jobStatus be when isOAuthEnabled && isTokenExpired && currentOauthRetryAttempt >= api.MaxOAuthRefreshRetryAttempts?

Contributor Author


Whatever was calculated by getJobStatus() in the last retry (when currentOauthRetry = api.MaxOAuthRefreshRetryAttempts - 1).

Contributor Author


There is already a retry mechanism whereby regulation-manager hands already-failed jobs back to regulation-worker to send to the destination, so I feel it should be OK that way. Let me know if you think otherwise.

Contributor


I will try to rephrase the question:
What will the transformer's HTTP response code be whenever it returns in its response body at least 1 entry with authErrorCategory=REFRESH_TOKEN? Is it deterministic or not?

Contributor Author


Whenever at least one entry with authErrorCategory = REFRESH_TOKEN is present in the response body, the transformer returns an HTTP status code of 500, as of today.
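The check the thread is discussing can be sketched as a scan of the response body for any entry carrying that error category. The JobResponse struct below is an assumed simplification of the transformer's per-user response entries, for illustration only:

```go
package main

import "fmt"

// JobResponse loosely mirrors one per-user entry in the transformer's
// delete response body; AuthErrorCategory is the field discussed above.
type JobResponse struct {
	AuthErrorCategory string
}

// isTokenExpired reports whether any entry signals a refresh-token
// failure, the case in which the transformer responds with HTTP 500.
func isTokenExpired(resp []JobResponse) bool {
	for _, r := range resp {
		if r.AuthErrorCategory == "REFRESH_TOKEN" {
			return true
		}
	}
	return false
}

func main() {
	resp := []JobResponse{
		{AuthErrorCategory: ""},
		{AuthErrorCategory: "REFRESH_TOKEN"},
	}
	fmt.Println(isTokenExpired(resp)) // at least one expired entry
}
```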

Contributor Author


When you say deterministic, are you asking how we identify perpetual refresh-token failures, i.e. the scenario wherein even after multiple retries (when api.MaxOAuthRefreshRetryAttempts > 1) we still get a refresh_token error from the transformer? Or is it something else?

Contributor


Just trying to make sure that the jobStatus will be model.JobStatusFailed if the token refresh fails or we have reached the maximum refresh attempts :)

@sanpj2292 sanpj2292 requested a review from atzoum November 25, 2022 09:26
@sanpj2292 sanpj2292 merged commit 0ed5a82 into master Nov 28, 2022