Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UX] Reset autostop timer whenever autostop config changes on remote machine #3205

Merged
merged 8 commits into from
Feb 29, 2024
Merged
20 changes: 7 additions & 13 deletions sky/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -2559,17 +2559,12 @@ def autostop(

- The cluster has restarted.

- An autostop is set when there is no active setting. (Namely, either
there's never any autostop setting set, or the previous autostop setting
was canceled.) This is useful for restarting the autostop timer.
- An autostop idle time is set.

Example: say a cluster without any autostop set has been idle for 1 hour,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep/update this scenario?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario shoud be similar as the new scenario we have. Do you think it is needed to keep this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help the users/readers. Even though the technical impl is the same, it's "setting autostop for the first time" vs. "resetting autostop" scenarios.

then an autostop of 30 minutes is set. The cluster will not be immediately
autostopped. Instead, the idleness timer only starts counting after the
autostop setting was set.

When multiple autostop settings are specified for the same cluster, the
last setting takes precedence.
Example: say a cluster with autostop set to 2 hours has been idle for 1
hour, then autostop is reset to 30 minutes. The cluster will not be
immediately autostopped. Instead, the idleness timer restarts counting
when the second autostop setting of 30 minutes was submitted.

Typical usage:

Expand All @@ -2581,9 +2576,8 @@ def autostop(
# Cancel autostop for a specific cluster.
sky autostop cluster_name --cancel
\b
# Since autostop was canceled in the last command, idleness will
# restart counting after this command.
sky autostop cluster_name -i 60
# Autodown this cluster after 60 minutes of idleness.
sky autostop cluster_name -i 60 --down
"""
if cancel and idle_minutes is not None:
raise click.UsageError(
Expand Down
8 changes: 3 additions & 5 deletions sky/skylet/autostop_lib.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,13 +54,11 @@ def get_autostop_config() -> AutostopConfig:
def set_autostop(idle_minutes: int, backend: Optional[str], down: bool) -> None:
boot_time = psutil.boot_time()
autostop_config = AutostopConfig(idle_minutes, boot_time, backend, down)
prev_autostop_config = get_autostop_config()
configs.set_config(_AUTOSTOP_CONFIG_KEY, pickle.dumps(autostop_config))
logger.debug(f'set_autostop(): idle_minutes {idle_minutes}, down {down}.')
if (prev_autostop_config.autostop_idle_minutes < 0 or
prev_autostop_config.boot_time != psutil.boot_time()):
concretevitamin marked this conversation as resolved.
Show resolved Hide resolved
# Either autostop never set, or has been canceled. Reset timer.
set_last_active_time_to_now()
# Reset timer whenever an autostop setting is submitted, i.e. the idle
# time will be counted from now.
set_last_active_time_to_now()


def set_autostopping_started() -> None:
Expand Down
2 changes: 1 addition & 1 deletion sky/skylet/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@
# cluster yaml is updated.
#
# TODO(zongheng,zhanghao): make the upgrading of skylet automatic?
SKYLET_VERSION = '7'
SKYLET_VERSION = '8'
# The version of the lib files that skylet/jobs use. Whenever there is an API
# change for the job_lib or log_lib, we need to bump this version, so that the
# user can be notified to update their SkyPilot version on the remote cluster.
Expand Down
11 changes: 5 additions & 6 deletions tests/test_smoke.py
Original file line number Diff line number Diff line change
Expand Up @@ -1713,7 +1713,7 @@ def test_autostop(generic_cloud: str):
f'sky status | grep {name} | grep "1m"',

# Ensure the cluster is not stopped early.
'sleep 20',
'sleep 40',
f's=$(sky status {name} --refresh); echo "$s"; echo; echo; echo "$s" | grep {name} | grep UP',

# Ensure the cluster is STOPPED.
Expand All @@ -1728,12 +1728,11 @@ def test_autostop(generic_cloud: str):
f'sky exec {name} tests/test_yamls/minimal.yaml',
f'sky logs {name} 2 --status',

# Test restarting the idleness timer via cancel + reset:
# Test restarting the idleness timer via reset:
f'sky autostop -y {name} -i 1', # Idleness starts counting.
'sleep 30', # Almost reached the threshold.
f'sky autostop -y {name} --cancel',
'sleep 40', # Almost reached the threshold.
f'sky autostop -y {name} -i 1', # Should restart the timer.
'sleep 30',
'sleep 40',
f's=$(sky status {name} --refresh); echo "$s"; echo; echo; echo "$s" | grep {name} | grep UP',
f'sleep {autostop_timeout}',
f's=$(sky status {name} --refresh); echo "$s"; echo; echo; echo "$s" | grep {name} | grep STOPPED',
Expand Down Expand Up @@ -1772,7 +1771,7 @@ def test_autodown(generic_cloud: str):
# Ensure autostop is set.
f'sky status | grep {name} | grep "1m (down)"',
# Ensure the cluster is not terminated early.
'sleep 30',
'sleep 40',
f's=$(sky status {name} --refresh); echo "$s"; echo; echo; echo "$s" | grep {name} | grep UP',
# Ensure the cluster is terminated.
f'sleep {autodown_timeout}',
Expand Down
Loading