feat: add reconciliation retries for CRs #423
🤖 I have created a release *beep* *boop*

---

## [0.22.0](v0.21.1...v0.22.0) (2024-05-22)

### Features

* add `expose` service entry for internal cluster traffic ([#356](#356)) ([1bde4cc](1bde4cc))
* add reconciliation retries for CRs ([#423](#423)) ([424b57b](424b57b))
* uds common renovate config ([#391](#391)) ([035786c](035786c))
* uds core docs ([#414](#414)) ([a35ca7b](a35ca7b))

### Bug Fixes

* mismatched exemption/policy for DropAllCapabilities ([#384](#384)) ([d8ec278](d8ec278))
* pepr mutation annotation overwrite ([#385](#385)) ([6e56b2a](6e56b2a))
* renovate config grouping, test-infra ([#411](#411)) ([05fd407](05fd407))
* renovate pepr comment ([#410](#410)) ([a825388](a825388))

### Miscellaneous

* **deps:** update keycloak ([#390](#390)) ([3e82c4e](3e82c4e))
* **deps:** update keycloak to v24.0.4 ([#397](#397)) ([c0420ea](c0420ea))
* **deps:** update keycloak to v24.0.4 ([#402](#402)) ([e454576](e454576))
* **deps:** update neuvector to v9.4 ([#381](#381)) ([20d4170](20d4170))
* **deps:** update pepr to 0.31.0 ([#360](#360)) ([fbd61ea](fbd61ea))
* **deps:** update prometheus-stack ([#348](#348)) ([49cb11a](49cb11a))
* **deps:** update prometheus-stack ([#392](#392)) ([2e656f5](2e656f5))
* **deps:** update uds to v0.10.4 ([#228](#228)) ([1750b23](1750b23))
* **deps:** update uds-k3d to v0.6.0 ([#398](#398)) ([288f009](288f009))
* **deps:** update velero ([#350](#350)) ([e7cb33e](e7cb33e))
* **deps:** update zarf to v0.33.2 ([#394](#394)) ([201a37b](201a37b))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Posting additional context on this change retroactively, since it was rather significant and resulted in a few issues. Retries were introduced here to handle a specific error we ran into during Pepr upgrades, when pods were cycling. With the introduction of ServiceMonitor generation in the operator, there is a flow where the watcher pod generates a ServiceMonitor that the admission pods then mutate. Across upgrades we encountered intermittent failures due to webhook timeouts: the watcher would fail to apply the ServiceMonitors, erroring out reconciliation of a Package on something that should be retryable (in a normal Helm/Zarf flow, multiple apply attempts would be made). Rather than introduce a targeted retry for just the ServiceMonitor behavior, we decided that a generic 5x retry on all Packages would potentially solve more intermittent issues (e.g., intermittent networking problems). This was reviewed synchronously and tested against a few scenarios where retries did resolve the issues. For history's sake, linking bugs introduced here:
## Description

Adds retry tracking to the Package CR status, plus logic to increment and handle retries. The operator currently attempts to reconcile a Package 5 times before failing.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request) followed
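The retry behavior described above can be sketched as a small status state machine. This is a minimal TypeScript sketch under stated assumptions, not the operator's implementation: the `retryAttempt` and `phase` fields, the phase names, and the helpers `nextStatus`, `reconcileUntilSettled`, and `flakyApply` are all hypothetical. Only the limit of 5 attempts comes from this PR. The simulated apply mimics the intermittent webhook timeouts seen during Pepr upgrades.

```typescript
// Minimal sketch of status-driven reconcile retries.
// HYPOTHETICAL: field names, phase values, and helpers are illustrative,
// not the operator's actual Package CR schema or code.
type Phase = "Pending" | "Retrying" | "Ready" | "Failed";

interface PackageStatus {
  phase: Phase;
  retryAttempt: number; // failed reconcile attempts recorded so far
}

// From the PR description: attempt reconciliation 5 times before failing.
const MAX_ATTEMPTS = 5;

// Computes the status to persist after one reconcile attempt.
// Success resets the counter; each failure increments it until the limit.
function nextStatus(current: PackageStatus, succeeded: boolean): PackageStatus {
  if (succeeded) {
    return { phase: "Ready", retryAttempt: 0 };
  }
  const attempts = current.retryAttempt + 1;
  return {
    phase: attempts < MAX_ATTEMPTS ? "Retrying" : "Failed",
    retryAttempt: attempts,
  };
}

// HYPOTHETICAL driver: in the operator, each attempt would be a separate
// reconcile of the Package; here the attempts simply run in a loop.
async function reconcileUntilSettled(
  apply: () => Promise<void>,
): Promise<PackageStatus> {
  let status: PackageStatus = { phase: "Pending", retryAttempt: 0 };
  while (status.phase === "Pending" || status.phase === "Retrying") {
    try {
      await apply();
      status = nextStatus(status, true);
    } catch {
      // Any error is retried: the policy is generic, not error-specific.
      status = nextStatus(status, false);
    }
  }
  return status;
}

// Simulated ServiceMonitor apply that hits a webhook timeout `failures` times.
function flakyApply(failures: number): () => Promise<void> {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) {
      throw new Error("webhook timeout");
    }
  };
}
```

For example, `reconcileUntilSettled(flakyApply(2))` settles as `Ready` on the third attempt, while `flakyApply(10)` exhausts all five attempts and ends in `Failed`. One plausible reason to keep the counter on the CR status rather than in memory is that it survives watcher pod restarts, which is relevant given that the failures occurred while Pepr pods were cycling.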