
[k8s] sky down --purge does not work for k8s when switched to a new cluster #3093

Closed
romilbhardwaj opened this issue Feb 5, 2024 · 2 comments


@romilbhardwaj
Collaborator

romilbhardwaj commented Feb 5, 2024

sky down --purge for k8s stays stuck on Terminating 1 cluster when the kubeconfig has been switched to a new k8s cluster. This looks like a recent regression, since it works fine on 0.4.1.

Some initial digging indicates that this call to ray stop in the new provisioner gets stuck:

# Stop the ray autoscaler first to avoid the head node trying to
# re-launch the worker nodes, during the termination of the
# cluster.
try:
    # We do not check the return code, since Ray returns
    # non-zero return code when calling Ray stop,
    # even when the command was executed successfully.
    self.run_on_head(handle, 'ray stop --force')
except exceptions.FetchIPError:
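
One way to avoid the hang (a minimal sketch, not the actual fix that landed in #3043) is to bound the ray stop call with a timeout and, when --purge is set, treat a timeout like an unreachable head. The function name and exec_cmd below are hypothetical stand-ins, not SkyPilot's real API:

import subprocess

# Hypothetical sketch: bound `ray stop --force` with a timeout so that an
# unreachable head node (e.g. the kubeconfig now points at a different
# cluster) cannot hang teardown. `exec_cmd` stands in for however the
# provisioner reaches the head (ssh, kubectl exec, ...).
def stop_ray_with_timeout(exec_cmd, timeout=30):
    try:
        # Return code is deliberately ignored: `ray stop` can exit
        # non-zero even on success (see the snippet above).
        subprocess.run(exec_cmd + ['ray', 'stop', '--force'],
                       timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # With --purge, proceed to delete local cluster state anyway.
        return False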

Related to #3013.

Repro:

  1. Create GKE cluster
  2. sky launch -c test
  3. sky local up to switch k8s cluster identity
  4. sky down test --purge stays stuck.
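
A quick way to see what step 3 changes (an illustrative check assuming kubectl is on PATH, not part of the repro itself): after sky local up, the active kubectl context no longer points at the GKE cluster from step 1, so the exec into the old head pod in step 4 targets the wrong cluster and hangs:

import subprocess

# Illustrative check: print the active kubectl context. After step 3 it
# points at the local cluster rather than the GKE cluster from step 1,
# which is why the `ray stop` call in step 4 gets stuck.
ctx = subprocess.run(['kubectl', 'config', 'current-context'],
                     capture_output=True, text=True).stdout.strip()
print(f'Current kubectl context: {ctx}')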
@romilbhardwaj romilbhardwaj changed the title [k8s] sky down --purge does not work for k8s [k8s] sky down --purge does not work for k8s when switched to a new cluster Feb 5, 2024
@romilbhardwaj
Collaborator Author

Raised by a user - we should fix this ASAP.

@romilbhardwaj
Collaborator Author

Closed by #3043.
