Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] sky status --kubernetes docs #4064

Merged
merged 2 commits into from
Oct 14, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 53 additions & 4 deletions docs/source/reference/kubernetes/kubernetes-setup.rst
Original file line number Diff line number Diff line change
Expand Up @@ -261,9 +261,19 @@ You can also check the GPUs available on your nodes by running:
.. code-block:: console

$ sky show-gpus --cloud kubernetes
Kubernetes GPUs
GPU QTY_PER_NODE TOTAL_GPUS TOTAL_FREE_GPUS
L4 1, 2, 3, 4 8 6
H100 1, 2 4 2
L4 1, 2, 4 12 12
H100 1, 2, 4, 8 16 16

Kubernetes per node GPU availability
NODE_NAME GPU_NAME TOTAL_GPUS FREE_GPUS
my-cluster-0 L4 4 4
my-cluster-1 L4 4 4
my-cluster-2 L4 2 2
my-cluster-3 L4 2 2
my-cluster-4 H100 8 8
my-cluster-5 H100 8 8


.. _kubernetes-observability:
Expand All @@ -274,8 +284,47 @@ All SkyPilot tasks are run in pods inside a Kubernetes cluster. As a cluster adm
you can inspect running pods (e.g., with :code:`kubectl get pods -n namespace`) to check which
tasks are running and how many resources they are consuming on the cluster.

Additionally, you can also deploy tools such as the `Kubernetes dashboard <https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/>`_ for easily viewing and managing
SkyPilot tasks running on your cluster.
Below, we provide tips on how to monitor SkyPilot resources on your Kubernetes cluster.

.. _kubernetes-observability-skystatus:

SkyPilot CLI
romilbhardwaj marked this conversation as resolved.
Show resolved Hide resolved
^^^^^^^^^^^^

We provide a convenience command, :code:`sky status --kubernetes`, to view the status of all SkyPilot resources in the cluster.
romilbhardwaj marked this conversation as resolved.
Show resolved Hide resolved

Unlike :code:`sky status` which lists only the SkyPilot resources launched by the current user,
:code:`sky status --kubernetes` lists all SkyPilot resources in the cluster across all users.

.. code-block:: console

$ sky status --kubernetes
Kubernetes cluster state (context: mycluster)
SkyPilot clusters
USER NAME LAUNCHED RESOURCES STATUS
alice infer-svc-1 23 hrs ago 1x Kubernetes(cpus=1, mem=1, {'L4': 1}) UP
alice sky-jobs-controller-80b50983 2 days ago 1x Kubernetes(cpus=4, mem=4) UP
alice sky-serve-controller-80b50983 23 hrs ago 1x Kubernetes(cpus=4, mem=4) UP
bob dev 1 day ago 1x Kubernetes(cpus=2, mem=8, {'H100': 1}) UP
bob multinode-dev 1 day ago 2x Kubernetes(cpus=2, mem=2) UP
bob sky-jobs-controller-2ea485ea 2 days ago 1x Kubernetes(cpus=4, mem=4) UP

Managed jobs
In progress tasks: 1 STARTING
USER ID TASK NAME RESOURCES SUBMITTED TOT. DURATION JOB DURATION #RECOVERIES STATUS
alice 1 - eval 1x[CPU:1+] 2 days ago 49s 8s 0 SUCCEEDED
bob 4 - pretrain 1x[H100:4] 1 day ago 1h 1m 11s 1h 14s 0 SUCCEEDED
bob 3 - bigjob 1x[CPU:16] 1 day ago 1d 21h 11m 4s - 0 STARTING
bob 2 - failjob 1x[CPU:1+] 1 day ago 54s 9s 0 FAILED
bob 1 - shortjob 1x[CPU:1+] 2 days ago 1h 1m 19s 1h 16s 0 SUCCEEDED


.. _kubernetes-observability-dashboard:

Kubernetes Dashboard
^^^^^^^^^^^^^^^^^^^^
You can deploy tools such as the `Kubernetes dashboard <https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/>`_ to easily view and manage
SkyPilot resources on your cluster.

.. image:: ../../images/screenshots/kubernetes/kubernetes-dashboard.png
:width: 80%
Expand Down
Loading