Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration #32313

Merged
merged 24 commits into from
Jul 15, 2023

Conversation

pragnareddye
Copy link
Contributor

This pull request introduces changes to allow users to set heartbeat expectations separately for LocalTaskJob,. his resolves a current limitation where all job types share a single configuration parameter for expected heartbeat time.

related to: #30908


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

This pull request introduces changes to allow users to set heartbeat expectations separately for LocalTaskJob,. his resolves a current limitation where all job types share a single configuration parameter for expected heartbeat time.


related to: apache#30908
@pragnareddye pragnareddye changed the title Refactor Job Heartbeat Parameter to Allow Per-Job Configuration Add Job Heartbeat Parameter to Allow Per-LocalTaskJob Configuration Jul 2, 2023
@pragnareddye pragnareddye changed the title Add Job Heartbeat Parameter to Allow Per-LocalTaskJob Configuration Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration Jul 2, 2023
@@ -158,7 +158,7 @@ def sigusr2_debug_handler(signum, frame):
try:
self.task_runner.start()

heartbeat_time_limit = conf.getint("scheduler", "scheduler_zombie_task_threshold")
heartbeat_time_limit = conf.getint("scheduler", "local_task_job_heartbeat_sec")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why a different heartbeat was used here instead of the jobs heartrate. Do we really need this configuration or we should use JOB_HEARTBEAT_SEC? It's already set for this localtaskjob, If there's no apparent reason for a different configuration, I think we should use self.job.heartrate for this without any other configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ephraim! For us atleast, when we are running our jobs, we have these default values set and having these default values set differently for different job types is very convenient for us. Anyways the idea of this is also coming from #30908. Definitely open to discuss however

@uranusjr
Copy link
Member

uranusjr commented Jul 5, 2023

There are static check failures. Please set up pre-commit as described in the contributor guides and clean up the errors.

@uranusjr
Copy link
Member

uranusjr commented Jul 7, 2023

Does this need a significant change news fragment? The change introduces a backward incompatibility—if the user had changed scheduler_heartbeat_sec, the local task job heartbeat would change when Airflow is upgraded; previously it’d follow the old config, but now the old config is not referenced.

@potiuk
Copy link
Member

potiuk commented Jul 7, 2023

Does this need a significant change news fragment? The change introduces a backward incompatibility—if the user had changed scheduler_heartbeat_sec, the local task job heartbeat would change when Airflow is upgraded; previously it’d follow the old config, but now the old config is not referenced.

Agree. Yeah.

@kaxil
Copy link
Member

kaxil commented Jul 9, 2023

We should make it a backwards-compatible change though

@pragnareddye
Copy link
Contributor Author

@kaxil We could potentially make it backwards compatible by checking if the new param local_task_job_heartbeat_sec is not set, then use the existing scheduler config. but this means local_task_job_heartbeat_sec won't have a default value set or set to 0

@kaxil
Copy link
Member

kaxil commented Jul 10, 2023

@kaxil We could potentially make it backwards compatible by checking if the new param local_task_job_heartbeat_sec is not set, then use the existing scheduler config. but this means local_task_job_heartbeat_sec won't have a default value set or set to 0

I think that is fine but we shouldn't break backwards compatibility here since we plan to include this in 2.7

@pragnareddye
Copy link
Contributor Author

pragnareddye commented Jul 13, 2023

@kaxil @potiuk @pankajkoti @uranusjr Made the fix backwards compatible. Build is passing, All good to go

Comment on lines 1198 to 1201
# The frequency (in seconds) at which the LocalTaskJob should send heartbeat signals to the
# scheduler to notify it's still alive. If this value is set to 0, the heartbeat interval will default
# to scheduler's `scheduler_zombie_task_threshold` config.
local_task_job_heartbeat_sec = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to want the heartbeat interval to be actually zero (i.e. as soon as possible)? I’m wondering if -1 ia better default.

Suggested change
# The frequency (in seconds) at which the LocalTaskJob should send heartbeat signals to the
# scheduler to notify it's still alive. If this value is set to 0, the heartbeat interval will default
# to scheduler's `scheduler_zombie_task_threshold` config.
local_task_job_heartbeat_sec = 0
# The frequency (in seconds) at which the LocalTaskJob should send heartbeat signals to the
# scheduler to notify it's still alive. If this value is set to 0, the heartbeat interval will default
# to the value of scheduler_zombie_task_threshold.
local_task_job_heartbeat_sec = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely considered -1 but if someone sets the value as 0 then we would have to consider that. Since value 0 as a heartbeat value doesn't work, I think 0 makes sense. What do you think?

@potiuk
Copy link
Member

potiuk commented Jul 15, 2023

LGTM. Good thinking about back-compatibility.

@potiuk potiuk merged commit 9b466bd into apache:main Jul 15, 2023
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Aug 2, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.0 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:new-feature Changelog: New Features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants