Adding ContinuousTimetable and support for @continuous schedule_interval #29909

Merged (14 commits) on Mar 16, 2023

SamWheating (Contributor) commented Mar 3, 2023

Closes: #29900

This introduces a new @continuous Timetable which will always try to start new DAGRuns.

The degree of parallelism can then be bounded with the max_active_runs parameter.

This is a little bit different from the currently available approaches:

  • Unlike schedule_interval="* * * * *", it has no notion of catchup, and it can hypothetically run more often than once a minute.
  • It's a lot lighter-weight (both in definition and execution) than using a TriggerDagRunOperator at the end of a DAG.
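
To make the "always try to start new DAGRuns, bounded by max_active_runs" behaviour concrete, here is a minimal toy simulation in plain Python. It is not Airflow code: the function name, the tick-based clock and the fixed run duration are all assumptions made for illustration, and the real scheduler is considerably more involved.

```python
from typing import List


def simulate_continuous(total_ticks: int, run_duration: int, max_active_runs: int) -> List[int]:
    """Return the ticks at which a toy 'continuous' scheduler starts runs.

    Hypothetical model: every run lasts exactly `run_duration` ticks, and the
    scheduler fills any free slot on each tick.
    """
    active_until: List[int] = []  # finish tick of every in-flight run
    started_at: List[int] = []
    for tick in range(total_ticks):
        # Forget runs that have finished by this tick.
        active_until = [end for end in active_until if end > tick]
        # Continuous behaviour: start a run whenever a slot is free.
        while len(active_until) < max_active_runs:
            active_until.append(tick + run_duration)
            started_at.append(tick)
    return started_at


# One slot and ten-tick runs: a new run begins as soon as the previous ends.
print(simulate_continuous(total_ticks=30, run_duration=10, max_active_runs=1))  # [0, 10, 20]
```

There is no backlog of missed intervals to work through in this model, which is the sense in which the approach has no catchup: the only question on each tick is whether a slot is free.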

Tested in Breeze with the following DAG:

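from datetime import datetime

from airflow.models import DAG
from airflow.operators.bash import BashOperator

# "@continuous" asks the scheduler to keep starting new runs of this DAG.
dag = DAG(
    "continuous_dag",
    schedule_interval="@continuous",
    start_date=datetime(2021, 1, 1),
    # Assumption: set explicitly so the active_runs_limit check described
    # below passes for the continuous timetable.
    max_active_runs=1,
)

# A single ten-second task, so consecutive runs are easy to tell apart.
task = BashOperator(task_id="the_task", dag=dag, bash_command="sleep 10", owner="nobody!")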

And it seemed to work well:

image

This PR also adds the active_runs_limit attribute to the Timetable protocol. At parse time, Airflow checks that a DAG's max_active_runs is no higher than this value and raises an AirflowException if it is. ContinuousTimetable uses this to ensure that only one DAG run is active at any given time (see discussion below).
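
The parse-time check described above can be sketched in plain Python roughly as follows. This is a simplified model, not the code merged in this PR: TimetableLike, ContinuousLike, UnlimitedLike, ConfigError and check_max_active_runs are hypothetical names, with ConfigError standing in for AirflowException.

```python
from typing import Optional, Protocol


class ConfigError(Exception):
    """Hypothetical stand-in for AirflowException."""


class TimetableLike(Protocol):
    # None means the timetable places no cap on concurrent runs.
    active_runs_limit: Optional[int]


class ContinuousLike:
    """Toy timetable that allows a single active run."""

    active_runs_limit = 1


class UnlimitedLike:
    """Toy timetable with no cap."""

    active_runs_limit = None


def check_max_active_runs(timetable: TimetableLike, max_active_runs: int) -> None:
    """Raise if the DAG asks for more concurrent runs than the timetable allows."""
    limit = timetable.active_runs_limit
    if limit is not None and max_active_runs > limit:
        raise ConfigError(
            f"max_active_runs={max_active_runs} exceeds the timetable limit of {limit}"
        )


check_max_active_runs(ContinuousLike(), 1)   # accepted
check_max_active_runs(UnlimitedLike(), 16)   # accepted, no cap to enforce
```

Putting the limit on the timetable rather than special-casing one schedule string means any other timetable could declare its own cap in the same way.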

@SamWheating SamWheating force-pushed the sw-continuous-scheduling-timetable branch from 9dd6c00 to 95de643 Compare March 3, 2023 23:23
uranusjr (Member) commented Mar 5, 2023

I’m not sure a new directive is worthwhile tbh.

potiuk (Member) commented Mar 5, 2023

Quoting uranusjr: "I’m not sure a new directive is worthwhile tbh."

I think it's an interesting pattern: a number of users have asked for it, and any attempt to do it without such a directive is pretty cumbersome. While I initially had the same thought, looking at how simple it will be for users, I think it is worthwhile to add it.

I generally also think we should have more built-in timetables that serve cases like this. The custom timetable interface is prohibitively complex even for an experienced Python developer, and testing it is next to impossible. A good example is CronTriggerTimetable, which, even though it was written by us, apparently exposes a race condition (#27399) where it occasionally loses a task.

So I think that, as a community, we should invest in a few more "generic", "robust" and "declaratively configurable" timetables that serve a number of common cases; otherwise we are asking too much effort of users who have to develop their own custom timetables.

This is just one example of such a case, and IMHO more should follow.

eladkal (Contributor) commented Mar 6, 2023

I think we also need to add it to the docs
https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html#built-in-timetables

@SamWheating SamWheating force-pushed the sw-continuous-scheduling-timetable branch 2 times, most recently from b90c917 to 1e671a5 Compare March 11, 2023 22:13
@SamWheating SamWheating force-pushed the sw-continuous-scheduling-timetable branch from 1e671a5 to e2dc10b Compare March 15, 2023 03:45
@SamWheating SamWheating force-pushed the sw-continuous-scheduling-timetable branch from c30d877 to bebb875 Compare March 16, 2023 04:22
@potiuk potiuk force-pushed the sw-continuous-scheduling-timetable branch from bebb875 to 179b597 Compare March 16, 2023 08:02
@potiuk potiuk merged commit c1aa4b9 into apache:main Mar 16, 2023
@eladkal eladkal added this to the Airflow 2.6.0 milestone Mar 16, 2023
@pierrejeambrun pierrejeambrun added the type:new-feature Changelog: New Features label Mar 22, 2023
Development

Successfully merging this pull request may close these issues.

Add continues scheduling option
6 participants