Describe the feature
Per the Flagger documentation (https://docs.flagger.app/usage/webhooks):
confirm-promotion hooks are executed before the promotion step. The canary promotion is paused until the hooks return HTTP 200. While the promotion is paused, Flagger will continue to run the metrics checks and rollout hooks.
This feature is great because it keeps the metric checks and load tests running while the promotion gate is closed, so that further errors can be detected.
However, it runs a complete evaluation cycle from the very beginning. Even if the promotion gate opens during the metric analysis, the Flagger controller still has to complete all metric analysis iterations, as described in the diagram below.
Proposed solution
To shorten the waiting time after the promotion step is approved, I would like to raise a PR to achieve the timeline below:
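To make the difference in waiting time concrete, here is a minimal sketch in Go. It is a simplified model based only on the description in this issue, not Flagger's actual scheduler code; the function `waitAfterApproval` and its parameters are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// waitAfterApproval returns how many analysis intervals pass between the
// promotion gate opening and the promotion itself, in a simplified model.
//
// iterations  - number of metric checks in one analysis cycle (assumed)
// approvedAt  - zero-based index of the check during which the gate opened
// finishCycle - true models the behaviour described in this issue (the
//               remaining checks of the cycle still run); false models the
//               proposal (promote on the next check)
func waitAfterApproval(iterations, approvedAt int, finishCycle bool) int {
	if finishCycle {
		// All remaining checks of the current cycle must complete first.
		return iterations - approvedAt
	}
	// Proposed: only the check that observes the open gate is needed.
	return 1
}

func main() {
	fmt.Println(waitAfterApproval(10, 3, true))  // 7 intervals
	fmt.Println(waitAfterApproval(10, 3, false)) // 1 interval
}
```

With a 10-check cycle and approval arriving during the fourth check, the model gives a wait of 7 intervals when the cycle must finish versus 1 interval under the proposal.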
Also, I noticed that during the "additional metric analysis" period, per the lines https://github.com/fluxcd/flagger/blob/main/pkg/controller/scheduler.go#L350-L351 , once the canary status has changed to "WaitingPromotion", the canary cannot be marked as failed and rolled back even if errors are detected. How should this conflict be handled?
So, I would like to propose triggering a rollback when the metric analysis fails while phase == WaitingPromotion.
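The proposed behaviour can be sketched as a small decision function. This is an illustrative sketch only, not the code in scheduler.go or in the PR: the types `Phase` and `Action`, the function `nextAction`, and the `failedChecks`/`threshold` parameters are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the canary phase.
type Phase string

const (
	PhaseProgressing      Phase = "Progressing"
	PhaseWaitingPromotion Phase = "WaitingPromotion"
)

// Action is the hypothetical outcome of one analysis tick.
type Action string

const (
	ActionContinue Action = "continue"
	ActionPromote  Action = "promote"
	ActionRollback Action = "rollback"
)

// nextAction decides what to do after a single metric check.
// failedChecks counts failures so far, including the current one;
// threshold is the number of failures that triggers a rollback.
func nextAction(phase Phase, metricsOK, gateOpen bool, failedChecks, threshold int) Action {
	// Proposed: a failed analysis rolls back in any phase,
	// including WaitingPromotion.
	if !metricsOK && failedChecks >= threshold {
		return ActionRollback
	}
	// Proposed: promote as soon as the gate is open and the check passes,
	// without finishing the remaining iterations.
	if phase == PhaseWaitingPromotion && gateOpen && metricsOK {
		return ActionPromote
	}
	// Otherwise keep running metric checks and rollout hooks.
	return ActionContinue
}

func main() {
	fmt.Println(nextAction(PhaseWaitingPromotion, false, false, 5, 5))
	fmt.Println(nextAction(PhaseWaitingPromotion, true, true, 0, 5))
	fmt.Println(nextAction(PhaseWaitingPromotion, true, false, 0, 5))
}
```

Checking the failure condition before the gate condition means an open gate never overrides a failed analysis, which is the conflict raised above.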
Any alternatives you've considered?
No, I could not find an alternative within the current code base.
I raised PR #1139 to resolve the issue.
I am open to comments and other ideas for achieving the target scenario.