[AIRFLOW-3702] Add backfill option to run backwards #4676

Merged: 1 commit, Feb 14, 2019

@dima-asana dima-asana commented Feb 9, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    This adds an optional capability for the backfill CLI command to process dates in reverse order.
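
A minimal sketch of the idea, assuming a daily schedule. The helper name `backfill_run_dates` is hypothetical and this is not the code from the PR; it only illustrates that the option changes the processing order, not the set of dates.

```python
from datetime import datetime, timedelta


def backfill_run_dates(start_date, end_date, run_backwards=False):
    """Return daily run dates in [start_date, end_date].

    Hypothetical helper for illustration only; not Airflow's API.
    """
    days = (end_date - start_date).days
    dates = [start_date + timedelta(days=i) for i in range(days + 1)]
    # Reversing changes only the order in which dates are processed.
    return dates[::-1] if run_backwards else dates


dates = backfill_run_dates(
    datetime(2019, 2, 1), datetime(2019, 2, 3), run_backwards=True
)
print([d.strftime("%Y-%m-%d") for d in dates])
# prints ['2019-02-03', '2019-02-02', '2019-02-01']
```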

Tests

  • My PR adds the following unit tests:
    test_jobs::BackfillJobTest::test_backfill_enqueue_backwards

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and classes in the PR contain docstrings that explain what they do

Code Quality

  • Passes flake8

codecov-io commented Feb 9, 2019

Codecov Report

Merging #4676 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4676      +/-   ##
==========================================
+ Coverage   74.42%   74.43%   +0.01%     
==========================================
  Files         430      430              
  Lines       27972    27980       +8     
==========================================
+ Hits        20819    20828       +9     
+ Misses       7153     7152       -1
Impacted Files                        Coverage Δ
airflow/bin/cli.py                    67.22% <ø> (ø) ⬆️
airflow/models/__init__.py            92.84% <ø> (+0.05%) ⬆️
airflow/executors/base_executor.py    95.23% <100%> (+0.07%) ⬆️
airflow/jobs.py                       77.52% <100%> (+0.14%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b2eea1a...2d558f6.

@feng-tao
Member

@dima-asana, I have been pretty busy lately. Looking at your code, I think your PR is almost there. It only needs to handle the case where the DAG has depends_on_past set (we could throw an exception if reverse_backfill and depends_on_past are both true).

Do you want to update your PR? I think we could commit yours and close mine. What do you think?
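
A rough sketch of the suggested check, under stated assumptions: the `Task` class, `BackfillOrderError`, and `check_run_backwards` are hypothetical names for illustration, not Airflow's actual classes or the code merged in this PR. The reasoning is that a task with depends_on_past needs its previous run to finish first, which cannot happen when the newest date runs first.

```python
from dataclasses import dataclass
from typing import List


class BackfillOrderError(Exception):
    """Hypothetical exception type used only in this sketch."""


@dataclass
class Task:
    """Hypothetical stand-in for a task; only the fields the check needs."""
    task_id: str
    depends_on_past: bool = False


def check_run_backwards(tasks: List[Task], run_backwards: bool) -> None:
    """Reject a reverse backfill if any task depends on its previous run."""
    if not run_backwards:
        return
    blocking = [t.task_id for t in tasks if t.depends_on_past]
    if blocking:
        raise BackfillOrderError(
            "Cannot run backwards; tasks with depends_on_past: "
            + ", ".join(sorted(blocking))
        )


tasks = [Task("extract"), Task("load", depends_on_past=True)]
try:
    check_run_backwards(tasks, run_backwards=True)
except BackfillOrderError as err:
    print(err)
```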

@dima-asana dima-asana force-pushed the airflow-3702 branch 2 times, most recently from 22e5eaa to 81b9f75 Compare February 14, 2019 05:53
@dima-asana
Contributor Author

Sure, updated. Note that in addition to the suggested change for depends_on_past handling, I changed the executor queue dict to an ordered dict. I think that is necessary for this change to have any effect: I didn't realize that the executor queue was unordered until I tested this code more.
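
To illustrate why the ordering matters, here is a minimal sketch, not the executor code from the PR: the queue contents and the `drain` helper are made up for the example. On Python versions before 3.7 a plain dict does not guarantee insertion order, so tasks enqueued newest-first could be dequeued in any order. An `OrderedDict` keeps the enqueue order.

```python
from collections import OrderedDict


def drain(queue):
    """Pop entries in the order they were enqueued (FIFO) and return the dates."""
    order = []
    while queue:
        # popitem(last=False) removes the oldest inserted entry first.
        key, _command = queue.popitem(last=False)
        order.append(key[1])
    return order


# Hypothetical queue keyed by (task_id, execution_date), enqueued newest first.
queued = OrderedDict()
for date in ["2019-02-03", "2019-02-02", "2019-02-01"]:
    queued[("example_task", date)] = "command for " + date

drained = drain(queued)
print(drained)
# prints ['2019-02-03', '2019-02-02', '2019-02-01']
```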

Member

@feng-tao feng-tao left a comment

One small nit. LGTM. Thanks for the patch.

(
"if set, the backfill will run tasks from the most "
"recent day first "
"instead of throwing exceptions"),
Member

@feng-tao feng-tao Feb 14, 2019

How about adding one more line "this option will fail if the DAG depends on past."?

Contributor Author

done

@feng-tao feng-tao merged commit a9f9f1c into apache:master Feb 14, 2019
ashb pushed a commit to ashb/airflow that referenced this pull request Mar 6, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019

damnko commented May 25, 2020

Hi, can the run_backwards parameter be used when defining a DAG, like so?

from airflow import DAG
dag = DAG(
    dag_id="...",
    schedule_interval="@daily",
    run_backwards=True,
    ...
)

It seems this can only be used manually through the Airflow CLI, but I wanted to confirm. Is there any way to set up a DAG to run as a backfill job from the most recent DAG run to the oldest?
Thanks

@aashayVimeo

Hi @dima-asana @feng-tao, are there any updates on the above comment about running a DAG backwards from the DAG definition?

from airflow import DAG
dag = DAG(
    dag_id="...",
    schedule_interval="@daily",
    run_backwards=True,
    ...
)
