[k8s] Add sky status flag to query global Kubernetes status #4040
Updated to query job status from all running controllers:
Still needs cleanup and edge-case handling.
…o k8s_global_status # Conflicts: # sky/data/storage_utils.py
UX LGTM; quick nits:
I tried launching a managed job on the same shared k8s cluster, and the job loops forever in starting. Controller logs:
Fixed UX comments and added error handling.
LGTM, thanks @romilbhardwaj.
sky/jobs/core.py
Outdated
def queue_kubernetes(pod_name: str,
                     context: Optional[str] = None,
                     skip_finished: bool = False) -> List[Dict[str, Any]]:
    """Gets the jobs queue from a specific controller pod.
This naming is surprising. From the name I thought it's "gets the queue info for an entire k8s cluster". Maybe rename?
Updated to queue_from_kubernetes_pod
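For readers following the thread: the per-pod helper above returns the queue for one controller pod, while the PR as a whole reports jobs from all running controllers. A minimal sketch of how the two could fit together is below. It is not the code in this PR. The function name queue_from_all_controller_pods, the fetch_queue callback, and the controller_pod key are all hypothetical; the callback stands in for a per-pod helper with the same parameters as the signature shown above.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical type for a per-pod fetcher; it mirrors the parameters of the
# signature quoted in the review (pod_name, context, skip_finished).
FetchQueue = Callable[[str, Optional[str], bool], List[Dict[str, Any]]]


def queue_from_all_controller_pods(
        pod_names: List[str],
        fetch_queue: FetchQueue,
        context: Optional[str] = None,
        skip_finished: bool = False) -> List[Dict[str, Any]]:
    """Merges job queues from several controller pods (illustrative only)."""
    all_jobs: List[Dict[str, Any]] = []
    for pod_name in pod_names:
        try:
            jobs = fetch_queue(pod_name, context, skip_finished)
        except Exception as e:  # Assumption: one bad pod should not abort the rest.
            print(f'Skipping controller pod {pod_name}: {e}')
            continue
        for job in jobs:
            # Record which controller each job came from (hypothetical key).
            all_jobs.append({**job, 'controller_pod': pod_name})
    return all_jobs
```

Passing the fetcher in as an argument keeps the sketch independent of any real SkyPilot or Kubernetes call, and skipping unreachable pods is one possible way to handle the error cases mentioned earlier in the conversation.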
Adds a --kubernetes flag to sky status to show the global state of the Kubernetes cluster, including SkyPilot clusters created by other users. This helps users see what is currently running on a shared Kubernetes cluster.
Example:
TODO:
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh