Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Heartbeat Parameter to Allow Per-LocalTaskJob Configuration #32313

Merged
merged 24 commits into from
Jul 15, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions airflow/config_templates/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2268,6 +2268,15 @@ scheduler:
type: string
example: ~
default: "5"
local_task_job_heartbeat_sec:
description: |
The frequency (in seconds) at which the LocalTaskJob should send heartbeat signals to the
scheduler to notify it's still alive. The value of this parameter should be less than the
scheduler's `scheduler_zombie_task_threshold` config.
version_added: ~
uranusjr marked this conversation as resolved.
Show resolved Hide resolved
type: int
example: ~
default: "5"
num_runs:
description: |
The number of times to try to schedule each DAG file
Expand Down
6 changes: 6 additions & 0 deletions airflow/config_templates/default_airflow.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -1173,6 +1173,12 @@ job_heartbeat_sec = 5
# how often the scheduler should run (in seconds).
scheduler_heartbeat_sec = 5

# The heartbeat frequency (in seconds) at which the scheduler will check the state of the LocalTaskJob.
# A high value could lead to the scheduler not noticing a "zombie" LocalTaskJob quickly,
# while a low value could mean more frequent database queries.
# This setting is used to detect tasks that hang or tasks that have been erroneously marked as running but are not actually running.
local_task_job_heartbeat_sec = 5
pragnareddye marked this conversation as resolved.
Show resolved Hide resolved

# The number of times to try to schedule each DAG file
# -1 indicates unlimited number
num_runs = -1
Expand Down
2 changes: 1 addition & 1 deletion airflow/jobs/local_task_job_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ def sigusr2_debug_handler(signum, frame):
try:
self.task_runner.start()

heartbeat_time_limit = conf.getint("scheduler", "scheduler_zombie_task_threshold")
heartbeat_time_limit = conf.getint("scheduler", "local_task_job_heartbeat_sec")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why a different heartbeat was used here instead of the jobs heartrate. Do we really need this configuration or we should use JOB_HEARTBEAT_SEC? It's already set for this localtaskjob, If there's no apparent reason for a different configuration, I think we should use self.job.heartrate for this without any other configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ephraim! For us atleast, when we are running our jobs, we have these default values set and having these default values set differently for different job types is very convenient for us. Anyways the idea of this is also coming from #30908. Definitely open to discuss however


# LocalTaskJob should not run callbacks, which are handled by TaskInstance._run_raw_task
# 1, LocalTaskJob does not parse DAG, thus cannot run callbacks
Expand Down