Adding ContinuousTimetable and support for @continuous schedule_interval #29909
I’m not sure a new directive is worthwhile tbh.
I think it's an interesting pattern - a number of users asked for it, and any attempt to do it without such a directive is pretty cumbersome. While I initially had the same thought, looking at how simple it will be for users to use it, I think it is worthwhile to add it. I generally also think we should have more built-in timetables that serve various cases like that. The custom timetable interface is prohibitively complex even for an experienced Python developer, and testing it is next to impossible. A good example of that is CronTriggerTimetable, which, even though it was written by us, apparently exposes a race condition (#27399) where it occasionally loses one task. So as a community I think we should invest in having a few more "generic", "robust" and "declaratively configurable" timetables that serve a number of common cases, as otherwise we are asking too much effort of our users to develop their own custom timetables. This is just one example of such a case, and IMHO more cases should follow.
I think we also need to add it to the docs.
Closes: #29900
This introduces a new @continuous Timetable which will always try to start new DAGRuns. The degree of parallelism can then be bounded with the max_active_runs parameter.
This is a little bit different from the currently available approaches, e.g. schedule_interval="* * * * *", and can hypothetically run more often.
Tested in Breeze with the following DAG:
And it seemed to work well:
This PR also adds the active_runs_limit attribute to the Timetable protocol. At parse time, Airflow checks that a DAG's max_active_runs is no higher than this value and raises an AirflowException if it is. The ContinuousTimetable uses this to ensure that only one DAG run is active at any given time (see discussion below).
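To illustrate the parse-time check described above, here is a minimal, self-contained sketch. It does not use Airflow itself: the Timetable, ContinuousTimetable and Dag classes below are toy stand-ins, DagConfigError is a hypothetical substitute for AirflowException, and the error message wording is invented for illustration. Only the attribute name active_runs_limit and the rule (max_active_runs must not exceed it) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


class DagConfigError(Exception):
    """Hypothetical stand-in for AirflowException in this sketch."""


class Timetable:
    """Toy base class; not Airflow's real Timetable protocol."""

    # None means the timetable places no cap on concurrent DAG runs.
    active_runs_limit: Optional[int] = None


class ContinuousTimetable(Timetable):
    """Toy version of the continuous timetable: at most one active run."""

    active_runs_limit = 1


@dataclass
class Dag:
    """Toy DAG that validates max_active_runs when it is constructed,
    mimicking a check performed at parse time."""

    dag_id: str
    timetable: Timetable
    max_active_runs: int

    def __post_init__(self) -> None:
        limit = self.timetable.active_runs_limit
        # Reject the DAG if it asks for more concurrency than the
        # timetable allows.
        if limit is not None and self.max_active_runs > limit:
            raise DagConfigError(
                f"DAG {self.dag_id!r}: max_active_runs={self.max_active_runs} "
                f"exceeds the timetable's active_runs_limit={limit}"
            )


if __name__ == "__main__":
    Dag("continuous_ok", ContinuousTimetable(), max_active_runs=1)
    try:
        Dag("continuous_bad", ContinuousTimetable(), max_active_runs=2)
    except DagConfigError as err:
        print(err)
```

Under these assumptions, a DAG on the continuous timetable is accepted only with max_active_runs=1, while a timetable that leaves active_runs_limit unset accepts any value.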