Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly check the existence of missing mapped TIs #25788

Merged
merged 5 commits into from
Aug 25, 2022

Conversation

ephraimbuddy
Copy link
Contributor

@ephraimbuddy ephraimbuddy commented Aug 18, 2022

The previous implementation of missing indexes was not right. Missing indexes
were being checked everytime that task_instance_scheduling_decision was called.
The missing tasks should only be revised after expanding of last resort for mapped tasks have been done. If we find that a task is in schedulable state and has already been expanded, we revise their indexes and ensure they are complete. Missing indexes are marked as removed.
This implementation allows the revision to be done in one place

Closes: #25681

airflow/models/dagrun.py Outdated Show resolved Hide resolved
airflow/models/dagrun.py Outdated Show resolved Hide resolved
airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ashb ashb changed the title Properly check the existence of missing indexes Properly check the existence of missing mapped TIs Aug 18, 2022
@ephraimbuddy ephraimbuddy requested a review from ashb August 18, 2022 09:54
airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch 2 times, most recently from db598ad to 51af409 Compare August 18, 2022 10:23
tests/models/test_dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch 4 times, most recently from 904bcb9 to bb06806 Compare August 18, 2022 21:10
@ephraimbuddy ephraimbuddy requested a review from uranusjr as a code owner August 18, 2022 21:10
airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch from c536b31 to 8516d24 Compare August 19, 2022 07:30
airflow/models/dagrun.py Outdated Show resolved Hide resolved
airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch from 8516d24 to 07249ad Compare August 19, 2022 08:02
@ephraimbuddy ephraimbuddy requested review from uranusjr and removed request for ashb August 24, 2022 15:29
airflow/models/dagrun.py Outdated Show resolved Hide resolved
Comment on lines +1066 to +1071
total_length = (
task.parse_time_mapped_ti_count
or task.run_time_mapped_ti_count(self.run_id, session=session)
or 0
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This induces an unnecessary query of parse_time_mapped_ti_count is 0. Reading how run_time_mapped_ti_count is used, some refactoring might be a good idea. I’ll do it after this PR is done.

airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch from 219ccfb to c2dc5e6 Compare August 25, 2022 08:59
@ephraimbuddy ephraimbuddy requested a review from uranusjr August 25, 2022 09:07
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very small corrections; this should be good otherwise.

airflow/models/dagrun.py Outdated Show resolved Hide resolved
airflow/models/dagrun.py Outdated Show resolved Hide resolved
airflow/models/dagrun.py Outdated Show resolved Hide resolved
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch from cc38cb2 to da6f608 Compare August 25, 2022 12:14
ephraimbuddy and others added 5 commits August 25, 2022 15:22
The previous implementation of missing indexes was not right. Missing indexes
were being checked everytime that `task_instance_scheduling_decision` was called.
The missing tasks should only be revised after expanding of last resort for mapped tasks have been done. If we find that a task is in schedulable state and has already been expanded, we revise their indexes and ensure they are complete. Missing indexes are marked as removed.
This implementation allows the revision to be done in one place
Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
@ephraimbuddy ephraimbuddy force-pushed the fix-unnecessary-calls branch from da6f608 to 23a546e Compare August 25, 2022 14:25
@ephraimbuddy ephraimbuddy merged commit db818ae into apache:main Aug 25, 2022
@ephraimbuddy ephraimbuddy deleted the fix-unnecessary-calls branch August 25, 2022 19:11
anja-istenic pushed a commit to anja-istenic/airflow that referenced this pull request Aug 29, 2022
Also a chapter was added to recommend taking a backup before
the migration.

Based on discussions and user input from apache#25866, apache#24526

Closes: apache#24526

Improve cleanup of temporary files in CI (apache#25957)

After recent change in Paralell execution, we start to have
infrequent "no space left on device" message - likely caused by
the /tmp/ generated files clogging the filesystem from multiple
runs. We could fix it by simply running cleanup after parallel
job always, but this is not good due to diagnostics needed
when debugging parallel runs locally so we need to have
a way to skip /tmp files deletion.

This PR fixes the problem twofold:

* cleanup breeze instructions which is run at the beginning of
  every job cleans also /tmp file
* the parallel jobs cleans after themselvs unless skipped.

Properly check the existence of missing mapped TIs (apache#25788)

The previous implementation of missing indexes was not correct. Missing indexes
were being checked every time that `task_instance_scheduling_decision` was called.
The missing tasks should only be revised after expanding of last resort for mapped tasks have been done. If we find that a task is in schedulable state and has already been expanded, we revise its indexes and ensure they are complete. Missing indexes are marked as removed.
This implementation allows the revision to be done in one place

Co-authored-by: Tzu-ping Chung <[email protected]>

Fix dataset_event_manager resolution (apache#25943)

Appears `__init__` is not invoked as part of `_run_raw_task` due to the way TI is refreshed from db.  Centralize dataset manager instantiation instead.

Fix unhashable issue with secrets.backend_kwargs and caching (apache#25970)

Resolves apache#25968

Fix response schema for list-mapped-task-instance (apache#25965)

update areActiveRuns, fix states (apache#25962)
@ashb ashb added this to the Airflow 2.4.0 milestone Sep 9, 2022
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler enters crash loop in certain cases with dynamic task mapping
4 participants