Implement Status Updater Retrying on Failures #1062
Ran into a tricky bug (and got help from Saylor): the fake client we use in tests may not honor a canceled context, so it was still getting and updating statuses even after its context had been canceled. This was a problem because when the context is canceled, the blocking |
Manually tested that the issue found in #563 still no longer occurs.
A couple things I'm a bit unsure of:
|
I wonder if we want to start a new pattern where we name the unit test files |
I think naming the unit test files |
Looks good, just a few small nits!
Proposed changes
Problem: NKG does not retry when a status update fails, so some resources may be left without up-to-date statuses.
Solution: Add retry logic with a small exponential backoff between attempts when a status update fails. Also add logic that lets the status updater exit gracefully when the NKG pod context is canceled.
Testing: Manually tested that the issue found in #563 no longer occurs. Added unit tests for the retry logic. Updated the tests so they correctly verify that the status updater stops updating statuses once the context is canceled.
Please focus on (optional): Feedback on how many times the status updater should retry before returning an error and moving on, and on what the starting backoff sleep time should be.
Closes #1016
Checklist
Before creating a PR, run through this checklist and mark each as complete.