Support building DAGs out of topologically unsorted YAML files (#307)
Any YAML file that declares an upstream task after one of its downstream tasks would fail to build, whether or not it uses dynamic task mapping. Example of a DAG that would fail:

```
test_expand:
  default_args:
    owner: "custom_owner"
    start_date: 2 days
  description: "test expand"
  schedule_interval: "0 3 * * *"
  default_view: "graph"
  tasks:
    process:
      operator: airflow.operators.python_operator.PythonOperator
      python_callable_name: expand_task
      python_callable_file: $CONFIG_ROOT_DIR/expand_tasks.py
      partial:
        op_kwargs:
          test_id: "test"
      expand:
        op_args:
          request.output
      dependencies: [request]
    request:
      operator: airflow.operators.python.PythonOperator
      python_callable_name: example_task_mapping
      python_callable_file: $CONFIG_ROOT_DIR/expand_tasks.py
```

In this example, the upstream (parent) task `request` is defined after the downstream (child) task `process`. Before this change, building this DAG would fail.

To solve the problem, I sort the tasks topologically using Kahn's algorithm: https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm

Its asymptotic complexity is O(N + D), where N is the total number of tasks and D is the total number of dependencies, which seems acceptable.

An alternative to the current approach would be to create all the tasks without dependencies first and add the dependencies once every task has been created, similar to what we did in https://github.com/astronomer/astronomer-cosmos. However, this approach would require a bigger refactor of the DAG factory and may have issues with dynamic task mapping.

Closes: #225
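A minimal sketch of Kahn's algorithm applied to a `tasks:` mapping like the one in the example. The helper name `topologically_sort_tasks` and the plain-dict input (task name mapped to a config dict with an optional `dependencies` list) are assumptions for illustration; this is not dag-factory's actual implementation. It counts each task's unresolved dependencies, repeatedly emits tasks whose count drops to zero, and reports a cycle if any tasks are left over.

```python
from collections import deque
from typing import Any, Dict, List


def topologically_sort_tasks(tasks: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return task names ordered so every task comes after its dependencies.

    Hypothetical helper: `tasks` mirrors a YAML `tasks:` mapping where each
    task config may list upstream task names under "dependencies".
    """
    # In-degree: how many upstream tasks each task is still waiting on.
    in_degree = {name: 0 for name in tasks}
    # Reverse adjacency: upstream task name -> names of its downstream tasks.
    downstream: Dict[str, List[str]] = {name: [] for name in tasks}

    for name, config in tasks.items():
        for upstream in config.get("dependencies", []):
            if upstream not in tasks:
                raise ValueError(f"Task {name!r} depends on unknown task {upstream!r}")
            in_degree[name] += 1
            downstream[upstream].append(name)

    # Seed the queue with tasks that have no dependencies, in declaration order.
    queue = deque(name for name in tasks if in_degree[name] == 0)
    ordered: List[str] = []

    while queue:
        name = queue.popleft()
        ordered.append(name)
        # "Remove" this task's outgoing edges; children with no remaining
        # upstream tasks become ready to be emitted.
        for child in downstream[name]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Tasks that never reached in-degree 0 are part of a dependency cycle.
    if len(ordered) != len(tasks):
        remaining = sorted(set(tasks) - set(ordered))
        raise ValueError(f"Cyclic dependencies between tasks: {remaining}")

    return ordered


# `request` is declared after `process`, as in the failing YAML example.
example_tasks = {
    "process": {"dependencies": ["request"]},
    "request": {},
}
print(topologically_sort_tasks(example_tasks))  # ['request', 'process']
```

Each task is enqueued once and each dependency edge is visited once, which is where the O(N + D) bound comes from. Creating the operators in the returned order guarantees that an upstream task such as `request` exists before anything that references it (for example via `request.output`).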
Showing 5 changed files with 125 additions and 15 deletions.
```diff
@@ -1,8 +1,8 @@
-def example_task_mapping():
-    return [[1], [2], [3]]
+def make_list():
+    return [[1], [2], [3], [4]]
 
 
-def expand_task(x, test_id):
-    print(test_id)
-    print(x)
-    return [x]
+def consume_value(expanded_param, fixed_param):
+    print(fixed_param)
+    print(expanded_param)
+    return [expanded_param]
```
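The nested lists returned by `make_list` matter for dynamic task mapping: when `op_args` is expanded over the upstream output, each element becomes the positional-argument list of one mapped task instance. The plain-Python sketch below imitates that without Airflow. `simulate_expand` is a hypothetical stand-in for Airflow's mapping, and passing `fixed_param` through `partial` `op_kwargs` is an assumption about how the YAML would wire these functions (in the same shape as the `test_id` example above).

```python
def make_list():
    return [[1], [2], [3], [4]]


def consume_value(expanded_param, fixed_param):
    print(fixed_param)
    print(expanded_param)
    return [expanded_param]


def simulate_expand(expanded_op_args, fixed_op_kwargs):
    """Hypothetical stand-in for mapping a PythonOperator over op_args."""
    results = []
    for op_args in expanded_op_args:
        # One "mapped instance" per element: its op_args list is unpacked
        # positionally, and every instance shares the same partial kwargs.
        results.append(consume_value(*op_args, **fixed_op_kwargs))
    return results


mapped = simulate_expand(make_list(), {"fixed_param": "test"})
print(mapped)  # [[1], [2], [3], [4]]
```

With four inner lists, four instances run, each receiving a single `expanded_param` value alongside the shared `fixed_param`.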