docs: add cron wf improvements proposal #11559

eduardodbr · 2023-08-11T19:38:15Z

Motivation

Cron workflow improvements proposal as prompted by #10620

Signed-off-by: eduardodbr <[email protected]>

terrytangyuan

cc @sarabala1979 @caelan-io @Kitanotori @wesleyscholl @tico24

Signed-off-by: eduardodbr <[email protected]>

caelan-io

Thanks @eduardodbr ! Looks real great. I left some suggestions.

caelan-io · 2023-08-26T16:20:47Z

docs/proposals/cron-wf-improvement-proposal.md

+
+One aspect that needs to be discussed is whether these configurations apply to the entire life of a cron Workflow or just in consecutive schedules. For example, if we configure a workflow to stop scheduling after 2 failures, I think it makes sense to have this applied when it fails twice consecutively. Otherwise, we can have 2 outages in different periods which will suspend the workflow. On the other hand, when configuring a workflow to run twice with success, it would make more sense to have it execute with success regardless of whether it is a consecutive success or not. If we have an outage after the first workflow succeeds, which translates into failed workflows, it should need to execute with success only once. So I think it would make sense to have:
+
+- maxFailures - maximum number of **consecutive failures** before stopping the scheduling of a workflow


I would propose adding another field to toggle between consecutive or total failures. It could be a boolean with default as false.

Maybe: consecFailures or consecFail

This would make the default logic align for both maxSuccess and maxFailures which is simpler logic for a new user to understand. If they want consecutive failures counted, then they can set consecFailures to true.

Maybe we could have the run strategy to be configurable by success or failure. For example:

RunStrategy: onSuccess: maxSuccess: 3 onFail: maxFail: 3 consecFail: false

this would allow us to further upgrade the configurable behavior in the future without breaking the schema. For example, if we decide to ship the back-off in another PR, you could just upgrade it to:

RunStrategy: onSuccess: maxSucess: 3 onFail: maxFail: 3 consecFail: true back-off: enabled: true factor: 2

However, I do agree with you, this may add a lot of complexity, and its best to keep it simple for now.

I like this suggestion of yours:

Maybe we could have the run strategy to be configurable by success or failure. this would allow us to further upgrade the configurable behavior in the future without breaking the schema. For example, if we decide to ship the back-off in another PR, you could just upgrade it

I'm up for this if you think it'll be simple enough to implement on the first attempt.

caelan-io · 2023-08-26T16:21:28Z

docs/proposals/cron-wf-improvement-proposal.md

+```
+RunStrategy:
+ maxSuccess:
+ maxFailures:


maybe maxFail for simplicity and align with "success vs. fail"

caelan-io · 2023-08-26T16:21:53Z

docs/proposals/cron-wf-improvement-proposal.md

+
+`maxSuccess` - defines how many child workflows must have success before suspending the workflow schedule
+
+`maxFailures` - defines how many child workflows must fail before suspending the workflow scheduling. This may contain `Failed` workflows, `Errored` workflows or spec errors.


very important to document this clearly in the docs when the feature is released.

caelan-io · 2023-08-26T16:22:42Z

docs/proposals/cron-wf-improvement-proposal.md

+    - Probably only makes sense if we can backup the state somewhere (like a BD). However, I don't have enough knowledge about workflow's architecture to tell how good of an idea this is.
+- If a CronWorkflow gets re-applied, does it maintain or reset the number of success/failures?
+
+    - I guess it should reset since a configuration change should be seen as a new start.


I agree with resetting state upon re-application of the CronWorkflow

caelan-io · 2023-08-26T16:25:01Z

docs/proposals/cron-wf-improvement-proposal.md

+- Manually re-create the workflow: could be an issue if the user has multiple cron workflows
+- Instead of stopping the workflow scheduling, introduce a back-off period as suggested by [#7291](https://github.com/argoproj/argo-workflows/issues/7291). Or maybe allow both configurations.
+
+I believe option 2 would allow the user to select if they want to stop scheduling or not. If they do, when cron workflows are wrongfully halted, they will need to manually start them again. If they don't, Argo will only introduce a back-off period between schedules to avoid rescheduling workflows that are just going to fail. Spec could look something like:


I agree with this, but I would say that we should probably ship this feature in a separate PR than the above requirements you outlined. It feels like this would add additional complexity.

caelan-io · 2023-08-26T16:26:59Z

docs/proposals/cron-wf-improvement-proposal.md

+
+## Introduction
+
+Currently, CronWorkflows are a great resource if we want to run recurring tasks to infinity. However, it is missing the ability to customize it, for example define how many times a workflow should run or how to handle multiple failures. I believe argo workflows would benefit of having more configuration options for cron workflows, to allow to change its behavior based on the result of its child’s success or failures. Below I present my thoughts on how we could improve them, but also some questions and concerns on how to properly do it.


I would also note here how a comparable tool, Airflow, includes this functionality so it is a strong precedent to enable it for Argo Workflows as well: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#triggering-dag-with-future-date

I didn't mention that because this proposal was aiming to be something wider than just allowing to run a workflow once (in the future or not). However, I agree that it is one of the most important features that Argo workflows is lacking. I believe a lot of people would need it more frequently than configuring the behavior of recurrent success/fail. This proposal aimed to allow both use cases. However, if we come to the conclusion that it may be to complex to implement all this behavior, I would suggest at least implementing the trigger once feature.

caelan-io · 2023-09-01T17:34:23Z

@terrytangyuan - what are your thoughts on moving forward with this?

terrytangyuan · 2023-09-11T21:34:56Z

This is on my list of items to review.

terrytangyuan · 2023-09-21T02:28:13Z

docs/proposals/cron-wf-improvement-proposal.md

+```yaml
+RunStrategy:
+ maxSuccess:
+ maxFailures:
+  value: # this would be optional
+  back-off:
+   enabled: true
+   factor: 2


What about something like the following so it's more extensible to other phases?

runStrategy: condition: "failure >= 3" backOff: factor: ...

eduardodbr · 2023-11-26T16:39:33Z

@caelan-io @terrytangyuan I have some time to come back to this and after your suggestions I suggest the following first approach:

New Spec field stopStrategy with a condition that is used to stop scheduling workflows. Controller stops scheduling workflows if the stopping condition is achieved:

stopStrategy:
   condition: "succeeded >= 1"

New status fields:
- Succeeded - counts the number of workflows that completed in phase Succeeded
- Failed - counts the number of workflows that failed to submit or completed in phase Error or Failed
- Completed - whether the stopping condition was achieved or not (is it worth it to have it as an annotation?)
Use the completed status field instead of changing the cronworkflow with “suspend=true” to avoid changing the cronworkflow spec, which conflicts with gitops users
Keywords allowed for stopping conditions are failed and succeeded:
- Failed - uses the status field failed
- Succeeded - uses the status field succeeded

stopStrategy:
 condition: "failed >= 3"

stopStrategy:
 condition: "succeeded >= 1"

Updating the stopping condition after the workflow is completed, will continue the normal cron workflow execution.

Just wondering if instead of naming it stopStrategy we could use something more generic. However, I feel that @terrytangyuan's suggestion runStrategy can be easily missed if the user just wants to schedule the workflow for a X amount of times.

If it is ok with you I can open a PR with the above changes.

terrytangyuan · 2023-11-28T17:42:11Z

I don't have a strong preference regarding the naming.

caelan-io · 2023-11-29T01:28:39Z

That naming convention sounds good to me! It'll help to see it in the code, though. I think it's worth moving forward with the PR as you've described here, and we can revisit if we see any issues with it.

@terrytangyuan - it looks like we'll need your approval to merge this enhancement proposal.

caelan-io · 2023-12-06T13:24:31Z

@terrytangyuan - can we merge since @eduardodbr has opened a PR that's under review already?

terrytangyuan · 2023-12-06T13:38:24Z

Sure

caelan-io · 2023-12-08T10:52:03Z

Thank you, Terry!

eduardodbr added 2 commits August 11, 2023 20:36

docs: add cron wf improvements proposal

4c75e8e

Signed-off-by: eduardodbr <[email protected]>

fix: make docs

379c46d

Signed-off-by: eduardodbr <[email protected]>

terrytangyuan reviewed Aug 17, 2023

View reviewed changes

fix: make docs

cb73db9

Signed-off-by: eduardodbr <[email protected]>

caelan-io suggested changes Aug 26, 2023

View reviewed changes

caelan-io approved these changes Sep 1, 2023

View reviewed changes

terrytangyuan reviewed Sep 21, 2023

View reviewed changes

eduardodbr mentioned this pull request Nov 30, 2023

feat: add stopStrategy to cron workflows #12305

Merged

terrytangyuan approved these changes Dec 6, 2023

View reviewed changes

terrytangyuan enabled auto-merge (squash) December 6, 2023 13:38

Merge branch 'main' into cron_wf_improvements_proposal

8b04a12

terrytangyuan merged commit 8859c92 into argoproj:main Dec 6, 2023
13 checks passed

eduardodbr mentioned this pull request Jan 23, 2024

REQUEST: New membership for eduardodbr argoproj/argoproj#272

Closed

5 tasks

agilgur5 added the area/cron-workflows label Feb 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add cron wf improvements proposal #11559

docs: add cron wf improvements proposal #11559

eduardodbr commented Aug 11, 2023

terrytangyuan left a comment

caelan-io left a comment

caelan-io Aug 26, 2023

eduardodbr Aug 31, 2023

caelan-io Sep 1, 2023

caelan-io Aug 26, 2023

caelan-io Aug 26, 2023

caelan-io Aug 26, 2023

caelan-io Aug 26, 2023

caelan-io Aug 26, 2023

eduardodbr Aug 31, 2023

caelan-io commented Sep 1, 2023

terrytangyuan commented Sep 11, 2023

terrytangyuan Sep 21, 2023

eduardodbr commented Nov 26, 2023

terrytangyuan commented Nov 28, 2023

caelan-io commented Nov 29, 2023

caelan-io commented Dec 6, 2023

terrytangyuan commented Dec 6, 2023

caelan-io commented Dec 8, 2023


		One aspect that needs to be discussed is whether these configurations apply to the entire life of a cron Workflow or just in consecutive schedules. For example, if we configure a workflow to stop scheduling after 2 failures, I think it makes sense to have this applied when it fails twice consecutively. Otherwise, we can have 2 outages in different periods which will suspend the workflow. On the other hand, when configuring a workflow to run twice with success, it would make more sense to have it execute with success regardless of whether it is a consecutive success or not. If we have an outage after the first workflow succeeds, which translates into failed workflows, it should need to execute with success only once. So I think it would make sense to have:

		- maxFailures - maximum number of consecutive failures before stopping the scheduling of a workflow


		`maxSuccess` - defines how many child workflows must have success before suspending the workflow schedule

		`maxFailures` - defines how many child workflows must fail before suspending the workflow scheduling. This may contain `Failed` workflows, `Errored` workflows or spec errors.


		## Introduction

		Currently, CronWorkflows are a great resource if we want to run recurring tasks to infinity. However, it is missing the ability to customize it, for example define how many times a workflow should run or how to handle multiple failures. I believe argo workflows would benefit of having more configuration options for cron workflows, to allow to change its behavior based on the result of its child’s success or failures. Below I present my thoughts on how we could improve them, but also some questions and concerns on how to properly do it.

docs: add cron wf improvements proposal #11559

docs: add cron wf improvements proposal #11559

Conversation

eduardodbr commented Aug 11, 2023

Motivation

terrytangyuan left a comment

Choose a reason for hiding this comment

caelan-io left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

caelan-io commented Sep 1, 2023

terrytangyuan commented Sep 11, 2023

Choose a reason for hiding this comment

eduardodbr commented Nov 26, 2023

terrytangyuan commented Nov 28, 2023

caelan-io commented Nov 29, 2023

caelan-io commented Dec 6, 2023

terrytangyuan commented Dec 6, 2023

caelan-io commented Dec 8, 2023