
[k8s][ux] Auto-exclude stale Kubernetes cloud #2807

Open
romilbhardwaj opened this issue Nov 21, 2023 · 8 comments
Labels
k8s Kubernetes related items

Comments

@romilbhardwaj
Collaborator

I often terminate a Kubernetes cluster externally using the cloud console/CLI (e.g., gcloud container clusters delete <cluster-name> --zone us-central1-c), but I forget to run sky check to update the list of enabled clouds.

As a result, the next sky launch fails:

sky.exceptions.ResourcesUnavailableError: Timed out when trying to get node info from Kubernetes cluster. Please check if the cluster is healthy and retry.

We should consider printing a warning and continuing by either:

  1. Excluding Kubernetes from the list of clouds considered by the optimizer
  2. Removing Kubernetes from the list of enabled clouds stored in global user state.

Option 1 is less aggressive and doesn't require the user to re-run sky check in case the failure was transient.
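Option 1 could be implemented as a pre-optimization filter: probe each enabled cloud, warn about unreachable ones, and hand the optimizer only the healthy subset. A minimal sketch follows; the names (`check_cloud_healthy`, `clouds_for_optimizer`) are hypothetical illustrations, not SkyPilot's actual API.

```python
# Sketch of option 1: skip unreachable clouds for this invocation only,
# without mutating the persisted enabled-clouds list in global user state.
# All names below are hypothetical, not SkyPilot's real API.
import warnings


def check_cloud_healthy(cloud: str) -> bool:
    # Placeholder health probe. A real implementation would, e.g., try to
    # list nodes on the Kubernetes cluster with a short timeout.
    return cloud != "kubernetes-stale"


def clouds_for_optimizer(enabled_clouds: list[str]) -> list[str]:
    """Return only the clouds that pass a health check, warning on the rest."""
    usable = []
    for cloud in enabled_clouds:
        if check_cloud_healthy(cloud):
            usable.append(cloud)
        else:
            # Warn and continue instead of failing the whole launch.
            warnings.warn(
                f"{cloud} is unreachable; excluding it from optimization. "
                "Run `sky check` to refresh enabled clouds.")
    return usable
```

Because the filter never touches the stored enabled-clouds list, a transient outage recovers on the next launch without any user action, which is the advantage of option 1 over option 2.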

@Michaelvll
Collaborator

Michaelvll commented Feb 5, 2024

This is also related to #3013

@kbrgl
Contributor

kbrgl commented Feb 24, 2024

Going to self-assign and work on this!

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.


@github-actions github-actions bot added the Stale label Oct 23, 2024

github-actions bot commented Nov 3, 2024

This issue was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 3, 2024
@chris-aeviator

chris-aeviator commented Dec 29, 2024

I'm running into this after renewing my k8s cert in my kubeconfig. I can see the pods as unhealthy; there might have been another issue.

However, I'm unable to start new clusters on that k8s cluster due to this error.

Update

It seems like the error is actually swallowing a real error (in my case, BAD_BASE64_DECODE), which I can only see when executing the purge command.
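One way to surface this class of error directly is to decode the base64 `*-data` fields in the kubeconfig yourself and see which entry fails. The sketch below assumes the standard kubeconfig layout (`clusters`/`users` lists with `certificate-authority-data`, `client-certificate-data`, etc.); the helper itself is illustrative, not part of SkyPilot.

```python
# Sketch: find kubeconfig entries whose base64 certificate data is corrupt,
# which would otherwise surface as a BAD_BASE64_DECODE error deep in the
# TLS stack. The helper name is hypothetical; the field layout follows the
# standard kubeconfig schema.
import base64
import binascii


def find_bad_base64(kubeconfig: dict) -> list[str]:
    """Return names of clusters/users whose *-data fields fail to decode."""
    bad = []
    for section, key in (("clusters", "cluster"), ("users", "user")):
        for entry in kubeconfig.get(section, []):
            body = entry.get(key, {})
            for field, value in body.items():
                if field.endswith("-data"):
                    try:
                        # validate=True rejects non-alphabet characters
                        # instead of silently discarding them.
                        base64.b64decode(value, validate=True)
                    except binascii.Error:
                        bad.append(f"{section}/{entry.get('name')}: {field}")
    return bad
```

Running this over a parsed kubeconfig (e.g. loaded with PyYAML) pinpoints the corrupt field, which supports the suggestion above that the real underlying error should be propagated rather than swallowed by the generic timeout message.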

@romilbhardwaj
Collaborator Author

Thanks for the report @chris-aeviator - that sounds bad. Can you share the full output log and the commands you ran so I can reproduce it?

@romilbhardwaj
Collaborator Author

Nvm @chris-aeviator, I can reproduce this. Looks like a recent regression from #4443. Being fixed in #4514 - can you give that branch a try and see if it fixes your issue too?
