Fix mini scheduler expansion of mapped task #27506

Merged (3 commits) on Nov 9, 2022

Contributor

@ephraimbuddy ephraimbuddy commented Nov 4, 2022

We have a case where the mini scheduler tries to expand a mapped task even though its upstream tasks are not yet done.

The mini scheduler extracts a partial subset of the DAG, and in the process some upstream tasks are dropped.
If a task in the subset is a mapped task, the expansion fails because it needs the upstream output to expand. When the expansion fails, the task is marked as `upstream_failed`, which in turn causes its downstream tasks to be marked as `upstream_failed`.

The solution is to ignore this error and not mark the mapped task as `upstream_failed` when the expansion fails and the DAG is a partial subset.

closes: #27449
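
For readers who want to see the failure mode in isolation, here is a minimal toy model of the behaviour described above. It is a sketch only and does not use Airflow's real classes: `ToyDag`, `ExpansionError`, `expand`, `resolve_state` and the `ignore_in_partial` switch are all hypothetical names made up for illustration, and the `partial` flag simply stands in for "this DAG is a partial subset".

```python
from dataclasses import dataclass


class ExpansionError(Exception):
    """Hypothetical error: the upstream output needed for expansion is unavailable."""


@dataclass
class ToyDag:
    """Hypothetical stand-in for a DAG; not Airflow's DAG class."""

    task_ids: set
    mapped_over: dict  # mapped task id -> upstream task id whose output it maps over
    partial: bool = False

    def partial_subset(self, keep):
        # Keep only the requested tasks and remember that this is a subset.
        return ToyDag(
            task_ids=set(keep) & self.task_ids,
            mapped_over=dict(self.mapped_over),
            partial=True,
        )


def expand(dag, task_id, outputs):
    """Return how many mapped instances to create, or raise ExpansionError."""
    source = dag.mapped_over[task_id]
    if source not in dag.task_ids or source not in outputs:
        raise ExpansionError(f"no output available from {source!r}")
    return len(outputs[source])


def resolve_state(dag, task_id, outputs, ignore_in_partial):
    """Decide what happens to a mapped task after an expansion attempt."""
    try:
        count = expand(dag, task_id, outputs)
    except ExpansionError:
        if ignore_in_partial and dag.partial:
            # Sketch of the fix: leave the state untouched so that a later
            # scheduling pass over the full DAG can try again.
            return None
        return "upstream_failed"
    return f"expanded into {count}"


full = ToyDag(
    task_ids={"extract", "slow_source", "mapped"},
    mapped_over={"mapped": "slow_source"},
)
# The subset drops "slow_source", which has not produced output yet.
subset = full.partial_subset({"extract", "mapped"})
outputs = {"extract": [1]}

before_fix = resolve_state(subset, "mapped", outputs, ignore_in_partial=False)
after_fix = resolve_state(subset, "mapped", outputs, ignore_in_partial=True)
```

In this toy model, `before_fix` is `"upstream_failed"` even though no upstream task actually failed, while `after_fix` is `None`, meaning the mapped task's state is left alone. The error is only swallowed when the DAG is a partial subset; on the full DAG a failed expansion is still reported.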

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Nov 4, 2022
@ephraimbuddy ephraimbuddy force-pushed the fix-mini-scheduler branch 3 times, most recently from fafa450 to c381035 Compare November 8, 2022 08:26
Member

@ashb ashb left a comment

Few minor points, but LGTM!

airflow/models/mappedoperator.py (outdated, resolved)
airflow/models/taskinstance.py (resolved)
airflow/models/taskinstance.py (outdated, resolved)
@ashb ashb added this to the Airflow 2.4.3 milestone Nov 9, 2022
@ephraimbuddy ephraimbuddy merged commit ed92e5d into apache:main Nov 9, 2022
@ephraimbuddy ephraimbuddy deleted the fix-mini-scheduler branch November 9, 2022 14:06
ephraimbuddy added a commit that referenced this pull request Nov 9, 2022
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit ed92e5d)
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 9, 2022
ephraimbuddy added a commit that referenced this pull request Nov 9, 2022
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit ed92e5d)
Labels
area:Scheduler including HA (high availability) scheduler type:bug-fix Changelog: Bug Fixes
Development

Successfully merging this pull request may close these issues.

Dynamic tasks marked as upstream_failed when none of their upstream tasks are failed or upstream_failed
4 participants