Allow timetable to slightly miss catchup cutoff #33404
Previously, with catchup=False, CronTriggerTimetable would aggressively cut off a run if the scheduler did not ask to schedule the next run immediately. This caused DAGs to seemingly mysteriously "miss" a run from time to time, because the scheduler inevitably has a very slight hiccup now and then. This change makes the timetable's non-catchup cutoff logic a little more lax, so that it only activates when the scheduler misses at least an entire interval. For example, for a daily cron, if the previous run happened at midnight on 2nd Jun (to cover 1st Jun), the timetable would still allow scheduling a run covering 2nd Jun if the scheduler asks for it some time during the 2nd, and would skip the 2nd Jun run entirely only if the scheduler fails to ask for a run during the entirety of the 2nd and only asks after midnight on the 3rd.
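To make the difference between the two cutoffs concrete, here is a rough sketch using only the standard library. It is not the Airflow implementation: the helper names (`align_to_prev`, `align_to_next`, `next_run_strict`, `next_run_lax`) are made up for illustration, the schedule is hard-coded to "daily at midnight UTC", and the dates are trigger times picked arbitrarily.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)


def align_to_prev(t: datetime) -> datetime:
    """Latest daily midnight tick at or before t (hypothetical helper)."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def align_to_next(t: datetime) -> datetime:
    """Earliest daily midnight tick at or after t (hypothetical helper)."""
    prev = align_to_prev(t)
    return prev if prev == t else prev + DAY


def next_run_strict(last_run: datetime, now: datetime) -> datetime:
    """Aggressive cutoff: a tick already in the past when the scheduler asks is dropped."""
    return max(last_run + DAY, align_to_next(now))


def next_run_lax(last_run: datetime, now: datetime) -> datetime:
    """Lax cutoff: the most recent tick is still allowed; only older ticks are dropped."""
    return max(last_run + DAY, align_to_prev(now))


# Assumed scenario: the last run was triggered at midnight on 2 Jun,
# so the next tick is due at midnight on 3 Jun.
last_run = datetime(2023, 6, 2, tzinfo=timezone.utc)
slightly_late = datetime(2023, 6, 3, 0, 0, 5, tzinfo=timezone.utc)  # asks 5 s late
a_day_late = datetime(2023, 6, 4, 0, 0, 5, tzinfo=timezone.utc)     # asks a full interval late

print(next_run_strict(last_run, slightly_late))  # 2023-06-04 00:00:00+00:00 (3 Jun tick lost)
print(next_run_lax(last_run, slightly_late))     # 2023-06-03 00:00:00+00:00 (3 Jun tick kept)
print(next_run_lax(last_run, a_day_late))        # 2023-06-04 00:00:00+00:00 (3 Jun tick skipped)
```

With the strict rule a five-second delay is enough to lose a tick, whereas the lax rule only drops a tick once the scheduler has been silent for a whole interval.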
Yes. I think it is a very good fix, even if it changes the first-run behaviour.
Thanks for the very detailed explanation, @uranusjr!
(cherry picked from commit a6299d4)
I am not sure, but I think it was the opposite problem. And I am not entirely sure what the expected CronTimetable behaviour is. It's not explained in https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/timetables/interval/index.html#airflow.timetables.interval.CronDataIntervalTimetable whether the previous schedule should be run when a full interval has not passed in this case. Ah, my bad. This actually IS described as the desired behaviour, in https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/timetables/trigger/index.html
But the description quoted by the user contradicts it, IMHO: https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html#the-time-when-a-dag-run-is-triggered
So yeah. I am also confused about what the expectations are here. @uranusjr, do you know?
@potiuk, @o-nikolas The original behavior of the CronTriggerTimetable was as described in the docs and as described as the expected behavior in #35647. If the DAG was enabled, the first DAG run would occur only once the next scheduled time had arrived - e.g. for schedule "0 13 * * *", if the DAG was enabled on December 30th at 13:01, the DAG would not run until December 31st at 13:00. However, there was a bug (#27399) where this could occasionally lead to Airflow not scheduling the DAG. The fix implemented here altered this behavior, so Airflow will now trigger a run for the previous scheduled time as soon as the DAG is enabled - e.g. for schedule "0 13 * * *", if the DAG was enabled on December 30th at 13:01, the DAG would run immediately for the December 30th schedule. The docs are wrong. I have been hoping to make a PR to fix them but haven't had a chance. For what it's worth, I think there is still value in a timetable that behaves the way the docs describe, though it seems that is more complicated than originally expected.
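The first-run difference in the "0 13 * * *" example can be sketched as follows. This is an illustration with invented helpers (`prev_tick`, `next_tick`, `first_run_before_fix`, `first_run_after_fix`), not Airflow code; it hard-codes a daily 13:00 UTC schedule, assumes no `start_date` restriction applies, and picks the year 2023 arbitrarily.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)


def prev_tick(now: datetime) -> datetime:
    """Latest 13:00 tick at or before now (stand-in for cron "0 13 * * *")."""
    tick = now.replace(hour=13, minute=0, second=0, microsecond=0)
    return tick if tick <= now else tick - DAY


def next_tick(now: datetime) -> datetime:
    """Earliest 13:00 tick at or after now."""
    tick = prev_tick(now)
    return tick if tick == now else tick + DAY


def first_run_before_fix(enabled_at: datetime) -> datetime:
    """Docs behaviour: wait for the next tick after the DAG is enabled."""
    return next_tick(enabled_at)


def first_run_after_fix(enabled_at: datetime) -> datetime:
    """Behaviour after this change: the most recent tick is still eligible."""
    return prev_tick(enabled_at)


# DAG enabled on December 30th at 13:01, one minute after the daily tick.
enabled_at = datetime(2023, 12, 30, 13, 1, tzinfo=timezone.utc)

print(first_run_before_fix(enabled_at))  # 2023-12-31 13:00:00+00:00
print(first_run_after_fix(enabled_at))   # 2023-12-30 13:00:00+00:00 (already due, runs immediately)
```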
In this case, I would say the behavior described in the docs should be the actual behavior. The fix made in this PR is great, but it shouldn't alter the behavior for the first triggered run. The flag
I agree with @shubham22 here. A DAG should not be triggered to run for the first time prior to its scheduled run time using CronTriggerTimetable. This is the point of catchup=False.
As discussed in #27399, I feel this is the more reasonable fix than #32921, even though it slightly changes the behaviour (specifically, the first ever run would start one interval earlier than previously). Quoting from the linked issue:
Fix #27399. Close #32921.