[k8s][ux] Auto-exclude stale Kubernetes cloud #2807
Comments
This is also related to #3013
Going to self-assign and work on this!
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
I'm running into this after having renewed my k8s cert in my kube config. I can see the pods as unhealthy, so there might have been another issue. However, I'm unable to start new clusters on said k8s due to this error.
Update: It seems like the error is actually swallowing a real error - in my case
Thanks for the report @chris-aeviator - that sounds bad. Can you share the full output log and the commands you ran so I can reproduce it?
Nvm @chris-aeviator, I can reproduce this. Looks like a recent regression from #4443. Being fixed in #4514 - can you give that branch a try and see if it fixes your issue too?
I often terminate a Kubernetes cluster externally using the cloud console/CLI (e.g., `gcloud container clusters delete <cluster-name> --region us-central1-c`), but I forget to run `sky check` to update the list of enabled clouds.
As a result, the next `sky launch` fails.
We should consider printing a warning and continuing by either:
1. Excluding `Kubernetes` from the list of clouds considered by the optimizer
2. Removing `Kubernetes` from the list of enabled clouds stored in global user state.
Option 1 is less aggressive and doesn't require the user to re-run `sky check` in case it is a transient failure.
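To make option 1 concrete, here is a minimal sketch of the "warn and continue" behavior. It is not SkyPilot's actual code: `exclude_unreachable_clouds`, the `health_checks` mapping, and the plain-string cloud names are all hypothetical, invented for illustration. The idea is to probe each enabled cloud right before optimization, skip any that fail with a warning, and leave the stored list of enabled clouds untouched so a transient failure needs no `sky check` re-run.

```python
import warnings
from typing import Callable, Dict, List


def exclude_unreachable_clouds(
    enabled_clouds: List[str],
    health_checks: Dict[str, Callable[[], bool]],
) -> List[str]:
    """Return the clouds the optimizer should consider for this launch.

    Hypothetical helper: `health_checks` maps a cloud name to a cheap probe
    that returns True when the cloud is reachable. A probe that raises or
    returns False causes the cloud to be skipped with a warning. The input
    list (standing in for the stored enabled clouds) is never modified.
    """
    usable = []
    for cloud in enabled_clouds:
        check = health_checks.get(cloud)
        reason = ''
        try:
            # Clouds without a registered probe are assumed reachable.
            reachable = check() if check is not None else True
        except Exception as e:  # any probe error counts as unreachable
            reachable = False
            reason = f': {e}'
        if reachable:
            usable.append(cloud)
        else:
            warnings.warn(
                f'{cloud} is enabled but unreachable{reason}. Skipping it '
                'for this launch; run `sky check` to refresh enabled clouds.')
    return usable
```

With a probe that raises for `Kubernetes`, calling this on `['GCP', 'Kubernetes']` would return only `['GCP']` and emit one warning, while the original list stays as it was. Option 2 would differ by also writing the filtered list back to the global user state.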