Add parsing context to DAG Parsing #25161

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

potiuk
Member

@potiuk potiuk commented Jul 19, 2022

Adds proper context in the form of context managers that set
environment variables to indicate whether the
DAG file is parsed in the context of the DAG processor or Task Execution,
and allows DAG_ID and TASK_ID to be retrieved easily.
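
A minimal sketch of the mechanism described above: a context manager that exports the IDs as environment variables, and a reader that retrieves them. The variable names and the `parsing_context` / `get_parsing_context` helpers here are hypothetical, chosen for illustration, and are not taken from this PR's diff.

```python
import os
from contextlib import contextmanager
from typing import Iterator, NamedTuple, Optional

# Hypothetical environment variable names used only in this sketch.
DAG_ID_VAR = "EXAMPLE_PARSING_CONTEXT_DAG_ID"
TASK_ID_VAR = "EXAMPLE_PARSING_CONTEXT_TASK_ID"


class ParsingContext(NamedTuple):
    """IDs of the DAG and task being executed, or None when parsing generally."""

    dag_id: Optional[str]
    task_id: Optional[str]


@contextmanager
def parsing_context(dag_id: str, task_id: str) -> Iterator[None]:
    """Expose dag_id/task_id through the environment for the enclosed block."""
    # Remember the previous values so nested or repeated use restores them.
    previous = {name: os.environ.get(name) for name in (DAG_ID_VAR, TASK_ID_VAR)}
    os.environ[DAG_ID_VAR] = dag_id
    os.environ[TASK_ID_VAR] = task_id
    try:
        yield
    finally:
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


def get_parsing_context() -> ParsingContext:
    """Read the current context; both fields are None outside task execution."""
    return ParsingContext(os.environ.get(DAG_ID_VAR), os.environ.get(TASK_ID_VAR))
```

Because the values travel through the environment, a child process started inside the `with` block inherits them, which is what lets a DAG file parsed in a subprocess see which task triggered the parse.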



@boring-cyborg boring-cyborg bot added area:CLI area:Scheduler including HA (high availability) scheduler kind:documentation labels Jul 19, 2022
@potiuk potiuk requested a review from pingzh July 19, 2022 15:15
@potiuk
Member Author

potiuk commented Jul 19, 2022

This is the first attempt to implement the "robust execution context" as a follow-up to #25121. I am not 100% sure I got everything, but I think it might be quite close. @pingzh, I know you have a ton of experience with various runners, and the way they are implemented is a bit "convoluted", so I'd appreciate a thorough review (note that it is based on #25121, which was documentation only, so only the last commit matters).

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch 2 times, most recently from 6787927 to 092b36c Compare July 19, 2022 15:57
@potiuk
Member Author

potiuk commented Jul 19, 2022

A slightly cleaner interface (also appropriate to be implemented as "future").

Contributor

@pingzh pingzh left a comment


I like this idea.

One thing: if https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run is released, we don't need to inject the context at the executor level, because the airflow tasks run --local process no longer parses the DAG file. This means we only need the context manager in the task_runner.

airflow/executors/base_executor.py (outdated, resolved)
airflow/utils/dag_parsing_context.py (outdated, resolved)
docs/apache-airflow/howto/dynamic-dag-generation.rst (outdated, resolved)
docs/apache-airflow/howto/dynamic-dag-generation.rst (outdated, resolved)
docs/apache-airflow/howto/dynamic-dag-generation.rst (outdated, resolved)
@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch from d79b073 to 8ee0d60 Compare July 22, 2022 18:48
@potiuk potiuk changed the title Robust context of dag parsing Add parsing context to DAG Parsing Jul 22, 2022
@potiuk
Member Author

potiuk commented Jul 22, 2022

Addressed all comments and got rid of the "non-robust" solution.

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch 4 times, most recently from b6dc941 to bb067a8 Compare July 24, 2022 19:05
Member

@ashb ashb left a comment


General point: let's mark anything we do here as experimental, as Dag Fetcher (AIP-5) may give us a nicer way of doing this.

@potiuk
Member Author

potiuk commented Jul 28, 2022

General point: let's mark anything we do here as experimental, as Dag Fetcher (AIP-5) may give us a nicer way of doing this.

We did already:

There is an experimental approach that you can take to optimize this behaviour. Note that it is not always
possible to use, for example when generation of subsequent DAGs depends on the previous DAGs or when
there are side effects of your DAG generation. Also, the code snippet below is pretty complex, and while
we tested it and it works in most circumstances, there might be cases where detection of the currently
parsed DAG will fail and it will revert to creating all the DAGs or fail. Use this solution with care and
test it thoroughly.
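
The optimization the quoted paragraph describes can be sketched roughly as follows: when the ID of the DAG being executed is known, only that DAG is built; when it is not known, all DAGs are built as usual. This is an illustrative sketch only. `generate_dags`, `build_dag`, and the environment variable name are hypothetical and do not reproduce the snippet from the Airflow documentation.

```python
import os
from typing import Callable, Dict, Iterable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical variable name; stands in for whatever carries the current DAG ID.
CURRENT_DAG_ID_VAR = "EXAMPLE_PARSING_CONTEXT_DAG_ID"


def current_dag_id_from_env() -> Optional[str]:
    """Return the DAG ID being executed, or None during a full parse."""
    return os.environ.get(CURRENT_DAG_ID_VAR)


def generate_dags(
    dag_ids: Iterable[str],
    build_dag: Callable[[str], T],
    current_dag_id: Optional[str] = None,
) -> Dict[str, T]:
    """Build every DAG, or only the one matching current_dag_id when it is set."""
    generated: Dict[str, T] = {}
    for dag_id in dag_ids:
        if current_dag_id is not None and dag_id != current_dag_id:
            # Skip the (possibly expensive) construction of unrelated DAGs.
            continue
        generated[dag_id] = build_dag(dag_id)
    return generated
```

As the quoted text warns, this only helps when each DAG can be built independently: if building one DAG depends on another having been built first, or has side effects, skipping iterations changes the result.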

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch 2 times, most recently from 850521b to de70620 Compare August 2, 2022 12:15
@potiuk
Member Author

potiuk commented Aug 2, 2022

I think it should get green this time

airflow/utils/dag_parsing_context.py (outdated, resolved)
docs/apache-airflow/howto/dynamic-dag-generation.rst (outdated, resolved)
airflow/executors/local_executor.py (outdated, resolved)
@ashb
Member

ashb commented Aug 2, 2022

We did already:

Ah, but it's not marked with the "experimental" sphinx tag https://github.com/apache/airflow/blame/main/docs/apache-airflow/listeners.rst#L31

@potiuk
Member Author

potiuk commented Aug 2, 2022

We did already:

Ah, but it's not marked with the "experimental" sphinx tag https://github.com/apache/airflow/blame/main/docs/apache-airflow/listeners.rst#L31

Added.

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch from de70620 to 5b669f8 Compare August 2, 2022 20:54
@potiuk
Member Author

potiuk commented Aug 2, 2022

The only remaining question is where to inject the parsing_context - @pingzh - any comments here?

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch 2 times, most recently from 059d9d0 to 5ec6f06 Compare August 3, 2022 12:09
@potiuk
Member Author

potiuk commented Aug 3, 2022

Now I think it should all be fine.

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch 2 times, most recently from 5936a4c to 1b082af Compare August 3, 2022 16:32
@potiuk
Member Author

potiuk commented Aug 3, 2022

Right. Finally Green.

Member

@ashb ashb left a comment


LGTM other than the compat issue

@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch from 1b082af to 467ef2f Compare August 4, 2022 10:17
Adds proper context in the form of context managers that set
environment variables to indicate whether the
DAG file is parsed in the context of the DAG processor or Task Execution,
and allows DAG_ID and TASK_ID to be retrieved easily.
@potiuk potiuk force-pushed the robust-context-of-dag-parsing branch from d56f68c to 36c9b75 Compare August 4, 2022 15:50
@potiuk potiuk merged commit 2bf3dc6 into apache:main Aug 4, 2022
@potiuk potiuk deleted the robust-context-of-dag-parsing branch August 4, 2022 15:51
@jedcunningham jedcunningham added this to the Airflow 2.4.0 milestone Aug 9, 2022
@jedcunningham jedcunningham added the type:new-feature Changelog: New Features label Aug 9, 2022
Labels
area:CLI area:Scheduler including HA (high availability) scheduler kind:documentation type:new-feature Changelog: New Features
5 participants