proposal: FailurePolicy (or something of the sort) #2972

krancour · 2024-11-20T18:18:46Z

The conditions that precipitated this proposal were many Stages whose Promotion processes all attempt to push to the same branch. Unsurprisingly, this can create races between concurrent Promotions. In the time between one Promotion checking out the relevant branch and pushing a new commit to it, another Promotion may have pushed its own commit to that branch, creating a conflict that causes the first Promotion's git-push step to fail.

This is one of many reasons I strongly advocate using a dedicated branch per Stage as a sort of storage, but this issue isn't about the wisdom or folly of any particular approach. The scenario above is merely an accessible example of a Promotion failure that could be resolved simply by repeating the steps of the Promotion process, starting from step 0.

With Promotion processes being entirely user-defined, it's not really possible to build any intelligent recovery logic directly into the git-push step. It seems, however, that there is a range of simple and generic "FailurePolicies" that could be quite useful.

Some ideas for further discussion:

Start the Promotion again from step 0 (retry up to some limit)
Let the Promotion fail then automatically create a new one just like it (retry up to some limit)
Do nothing
Let the Promotion fail then automatically create a new one to return the Stage to its previous state (retry up to some limit)
Other...

Users could select a policy from these options and we can add more options over time.
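
To make the options above concrete, here is a rough sketch of how a policy could drive retries. It is Python for brevity and does not reflect any existing Kargo API: the `FailurePolicy` names, the `Outcome` type, `run_promotion`, and the follow-up strings are all hypothetical, and a step is modeled as a plain callable that raises on failure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class FailurePolicy(Enum):
    """Hypothetical policy names mirroring the ideas listed above."""
    RESTART = "Restart"    # rerun the same Promotion from step 0
    RECREATE = "Recreate"  # let it fail, then create an identical Promotion
    NONE = "None"          # do nothing
    ROLLBACK = "Rollback"  # let it fail, then promote back to the previous state


@dataclass
class Outcome:
    succeeded: bool
    attempts: int
    follow_up: str = ""  # hypothetical action for the controller to take next


Step = Callable[[], None]  # a step raises an exception when it fails


def run_promotion(steps: List[Step], policy: FailurePolicy,
                  max_retries: int = 3) -> Outcome:
    """Run all steps in order; on failure, consult the policy."""
    attempts = 0
    while True:
        attempts += 1
        try:
            for step in steps:
                step()
            return Outcome(True, attempts)
        except Exception:
            # Only RESTART reruns in place, and only up to the retry limit.
            if policy is FailurePolicy.RESTART and attempts <= max_retries:
                continue
            break
    # The other policies leave this Promotion failed and request a follow-up.
    follow_up = {
        FailurePolicy.RECREATE: "create-identical-promotion",
        FailurePolicy.ROLLBACK: "create-rollback-promotion",
    }.get(policy, "")
    return Outcome(False, attempts, follow_up)
```

In this sketch `max_retries` counts reruns after the first attempt, so a limit of 3 allows at most 4 attempts in total. Whether the limit belongs on the Promotion, the Stage, or the policy itself is left open here.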

Another complementary idea is for individual steps to be able to provide a hint in a failure result as to how best to proceed.
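
A minimal sketch of what such a hint might look like, again in Python and purely illustrative. `Hint`, `StepError`, `classify_push_failure`, and `resolve_action` are hypothetical names. The only thing taken from git itself is that a rejected push reports "fetch first" or "non-fast-forward" in its output; matching on those strings is an assumption made for the example, not a recommendation.

```python
from enum import Enum
from typing import Optional


class Hint(Enum):
    """Hypothetical hints a step could attach to a failure result."""
    RESTART_PROMOTION = "RestartPromotion"  # rerun from step 0
    RETRY_STEP = "RetryStep"                # rerun only the failed step
    ABORT = "Abort"                         # retrying will not help


class StepError(Exception):
    """A step failure that optionally carries a hint on how to proceed."""

    def __init__(self, message: str, hint: Optional[Hint] = None):
        super().__init__(message)
        self.hint = hint


def classify_push_failure(stderr: str) -> StepError:
    """Turn git-push output into a StepError (string matching is an assumption)."""
    if "fetch first" in stderr or "non-fast-forward" in stderr:
        # The branch moved underneath us; a fresh checkout is needed, so
        # retrying only the push would fail again.
        return StepError("push rejected", Hint.RESTART_PROMOTION)
    return StepError("push failed")  # no opinion; defer to the policy


def resolve_action(err: Exception, default: Hint) -> Hint:
    """Prefer the step's hint; fall back to the configured default."""
    hint = getattr(err, "hint", None)
    return hint if isinstance(hint, Hint) else default
```

The step only offers a hint here. The user-selected policy still supplies the default and could just as well be allowed to override the hint.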

We've heard many users ask for automatic rollbacks before, though there is no issue tracking that request. I would propose that this notion of FailurePolicies might be the correct angle from which to approach it.

cc @jessesuen and @hiddeco for input.

krancour added kind/enhancement area/controller needs/priority area/crds kind/proposal labels Nov 20, 2024

krancour mentioned this issue Nov 21, 2024

feat: allow configuration of retry attempts for Promotion steps #2940

Merged
