[Core] Admin policy enforcement plugin #3966

Merged
merged 67 commits into from
Sep 24, 2024
Changes from 19 commits
cb28b8d
support policy hook
Michaelvll Sep 19, 2024
b64efa0
test task labels
Michaelvll Sep 19, 2024
cf89929
Add test for policy that sets labels
Michaelvll Sep 20, 2024
54c93ea
Fix comment
Michaelvll Sep 20, 2024
1d1c500
format
Michaelvll Sep 20, 2024
a0bdb2c
use -e to make test related files visible
Michaelvll Sep 20, 2024
543e66a
Add config.rst
Michaelvll Sep 20, 2024
520a2a1
Fix test
Michaelvll Sep 20, 2024
b533351
fix config rst
Michaelvll Sep 20, 2024
466f7fe
Apply policy to service
Michaelvll Sep 20, 2024
050dc7a
add policy for serving
Michaelvll Sep 20, 2024
31e0174
Add docs
Michaelvll Sep 20, 2024
0c74f2a
fix
Michaelvll Sep 20, 2024
48a6cc9
format
Michaelvll Sep 20, 2024
1ca5a8a
Update interface
Michaelvll Sep 20, 2024
14b2346
fix
Michaelvll Sep 21, 2024
cb39c73
Fix
Michaelvll Sep 21, 2024
1e3ddef
fix
Michaelvll Sep 21, 2024
aa87df7
Fix test config
Michaelvll Sep 21, 2024
28487a4
Fix mutated config
Michaelvll Sep 21, 2024
d1f0480
fix
Michaelvll Sep 21, 2024
f42ace5
Add policy doc
Michaelvll Sep 21, 2024
c04f3dc
rename
Michaelvll Sep 21, 2024
58f413c
minor
Michaelvll Sep 21, 2024
52053bd
Add additional arguments for autostop
Michaelvll Sep 21, 2024
4a4f682
fix mypy
Michaelvll Sep 21, 2024
a8d1c44
format
Michaelvll Sep 22, 2024
6c73d81
rejected message
Michaelvll Sep 22, 2024
247c0b8
format
Michaelvll Sep 22, 2024
f8a5a64
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
73a4581
Update sky/utils/policy_utils.py
Michaelvll Sep 22, 2024
d78a822
Fix
Michaelvll Sep 22, 2024
8cc963c
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
68275f6
Update examples/admin_policy/example_policy/example_policy/__init__.py
Michaelvll Sep 22, 2024
9644622
Update docs/source/reference/config.rst
Michaelvll Sep 22, 2024
17f8fa1
Address comments
Michaelvll Sep 22, 2024
07c4748
format
Michaelvll Sep 22, 2024
15f1062
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 22, 2024
994272b
changes in examples
Michaelvll Sep 22, 2024
3597dae
Fix enforce autostop
Michaelvll Sep 22, 2024
43a6088
Fix autostop enforcement
Michaelvll Sep 22, 2024
8770d0b
fix test
Michaelvll Sep 22, 2024
7984beb
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
d155d60
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
6ffa5ae
Update sky/admin_policy.py
Michaelvll Sep 23, 2024
a6dd900
wip
Michaelvll Sep 23, 2024
4274287
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
0609482
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
67552d7
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
7de757e
fix
Michaelvll Sep 23, 2024
8443ddc
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
7fbc30d
fix
Michaelvll Sep 23, 2024
92b68fc
fix
Michaelvll Sep 23, 2024
7d8af9a
Use sky.status for autostop
Michaelvll Sep 23, 2024
5b37f47
update policy
Michaelvll Sep 23, 2024
c7af310
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
cb232a8
fix policy.rst
Michaelvll Sep 23, 2024
5e9f544
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
deb4c92
Add comment
Michaelvll Sep 23, 2024
cbff59d
Fix logging
Michaelvll Sep 23, 2024
1fe350a
fix CI
Michaelvll Sep 23, 2024
2e8e41c
Update docs/source/cloud-setup/policy.rst
Michaelvll Sep 23, 2024
aae42ce
Use sphnix inline code
Michaelvll Sep 23, 2024
73c8fb7
Merge branch 'policy-hook' of github.com:skypilot-org/skypilot into p…
Michaelvll Sep 23, 2024
11bbd5e
Add comment
Michaelvll Sep 23, 2024
3630535
fix skypilot config file mounts for jobs and serve
Michaelvll Sep 23, 2024
e020dea
Merge branch 'master' of github.com:skypilot-org/skypilot into policy…
Michaelvll Sep 23, 2024
134 changes: 96 additions & 38 deletions docs/source/cloud-setup/policy.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,22 @@ Admin Policy Enforcement
========================


SkyPilot allows admins to enforce policies on users' SkyPilot usage by applying
custom validation and mutation logic on user's task and SkyPilot config.
SkyPilot provides an **admin policy** mechanism that admins can use to enforce certain policies on users' SkyPilot usage. An admin policy applies
custom validation and mutation logic to a user's tasks and SkyPilot config.

Example usage:

- Adds custom labels to all tasks [Link to below, fix case]
- Always Disable Public IP for AWS Tasks [Link to below]
- Enforce Autostop for all Tasks [Link to below]


To implement and use an admin policy:

- Admins write a simple Python package with a policy class that implements SkyPilot's ``sky.AdminPolicy`` interface;
- Admins distribute this package to users;
- Users simply set the ``admin_policy`` field in the SkyPilot config file ``~/.sky/config.yaml`` for the policy to go into effect.

In short, admins offers a Python package with a customized inheritance of SkyPilot's
``AdminPolicy`` interface, and a user just needs to set the ``admin_policy`` field in
the SkyPilot config ``~/.sky/config.yaml`` to enforce the policy to all their
tasks.
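
The ``admin_policy`` value is a dotted path to a class inside an installed package. As a rough illustration of what resolving such a path involves, here is a minimal sketch using only the standard library. The helper name ``load_policy_class`` is hypothetical and this is not SkyPilot's actual loader; it only shows why the package must be importable from the same Python environment.

```python
import importlib


def load_policy_class(dotted_path: str) -> type:
    """Resolve '<module>.<ClassName>' to the class object.

    Hypothetical helper for illustration only; SkyPilot's own loading
    code may differ.
    """
    module_path, _, class_name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError(
            f'Expected "<module>.<ClassName>", got {dotted_path!r}')
    # Raises ImportError if the package is not installed in this environment.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as e:
        raise ImportError(
            f'{class_name!r} not found in module {module_path!r}') from e
```

With the example package installed, a call such as ``load_policy_class('example_policy.AddLabelsPolicy')`` would be expected to return the policy class; if the package is missing, the import fails, which is the situation the hint below helps diagnose.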

Overview
--------
Expand All @@ -32,7 +41,7 @@ For example:
.. hint::

SkyPilot loads the policy from the given package in the same Python environment.
You can test the existance of the policy by running:
You can test the existence of the policy by running:

.. code-block:: bash

Expand All @@ -42,8 +51,8 @@ For example:
Admin-Side
~~~~~~~~~~

An admin can distribute the Python package to users with pre-defined policy. The
policy should follow the following interface:
An admin can distribute the Python package to users with a pre-defined policy. The
policy should implement the ``sky.AdminPolicy`` `interface <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_:

.. code-block:: python

Expand All @@ -52,44 +61,52 @@ policy should follow the following interface:
class MyPolicy(sky.AdminPolicy):
@classmethod
def validate_and_mutate(cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
# Logics for validate and modify user requests.
# Logic for validating and modifying user requests.
...
return sky.MutatedUserRequest(user_request.task,
user_request.skypilot_config)


``UserRequest`` and ``MutatedUserRequest`` are defined as follows:
``UserRequest`` and ``MutatedUserRequest`` are defined as follows (see `source code <https://github.com/skypilot-org/skypilot/blob/master/sky/admin_policy.py>`_ for more details):

.. code-block:: python

class UserRequest:
"""User request to the policy.
"""A user request.

A "user request" is defined as a `sky launch / exec` command or its API
equivalent.

It is a combination of a task, request options, and the global skypilot
config used to run a task, including `sky launch / exec / jobs launch / ..`.
`sky jobs launch / serve up` involves multiple launch requests, including
the launch of controller and clusters for a job (which can have multiple
tasks if it is a pipeline) or service replicas. Each launch is a separate
request.

This class wraps the underlying task, the global skypilot config used to run
a task, and the request options.

Args:
task: User specified task.
skypilot_config: Global skypilot config to be used in this request.
request_options: Request options. It can be None for jobs and
services.
request_options: Request options. It is None for jobs and services.
"""
task: sky.Task
skypilot_config: sky.Config
operation_args: sky.RequestOptions
task: 'sky.Task'
skypilot_config: 'sky.Config'
request_options: Optional['RequestOptions'] = None


class MutatedUserRequest:
task: sky.Task
skypilot_config: sky.Config
task: 'sky.Task'
skypilot_config: 'sky.Config'

That said, an ``AdminPolicy`` can mutate any fields of a user request, including
In other words, an ``AdminPolicy`` can mutate any fields of a user request, including
the :ref:`task <yaml-spec>` and the :ref:`global skypilot config <config-yaml>`,
giving admins a lot of flexibility to control users' SkyPilot usage.

An ``AdminPolicy`` is responsible to both validate and mutate user requests. If
An ``AdminPolicy`` can be used to both validate and mutate user requests. If
a request should be rejected, the policy should raise an exception.
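
To make the validate-or-mutate contract concrete, the following is a minimal sketch that runs without SkyPilot installed. ``FakeTask``, ``FakeUserRequest``, ``FakeMutatedUserRequest`` and ``NoSpotAndTagPolicy`` are hypothetical stand-ins invented for this illustration; a real policy would subclass ``sky.AdminPolicy`` and work with ``sky.UserRequest`` and ``sky.MutatedUserRequest`` instead. The label key and value are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FakeTask:
    """Hypothetical stand-in for sky.Task with only the fields used here."""
    use_spot: bool = False
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class FakeUserRequest:
    """Hypothetical stand-in for sky.UserRequest."""
    task: FakeTask
    skypilot_config: Dict[str, Any]


@dataclass
class FakeMutatedUserRequest:
    """Hypothetical stand-in for sky.MutatedUserRequest."""
    task: FakeTask
    skypilot_config: Dict[str, Any]


class NoSpotAndTagPolicy:
    """Rejects spot requests and tags every accepted task."""

    @classmethod
    def validate_and_mutate(
            cls, user_request: FakeUserRequest) -> FakeMutatedUserRequest:
        # Validation: reject the request by raising an exception.
        if user_request.task.use_spot:
            raise RuntimeError('Spot instances are not allowed by policy.')
        # Mutation: attach a label (key and value are made up).
        user_request.task.labels['team'] = 'ml-infra'
        return FakeMutatedUserRequest(
            task=user_request.task,
            skypilot_config=user_request.skypilot_config)
```

The same method handles both outcomes: an accepted request comes back (possibly modified) as the return value, and a rejected request never returns because the exception propagates to the caller.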

The ``sky.Config`` and ``sky.RequestOptions`` are defined as follows:
The ``sky.Config`` and ``sky.RequestOptions`` classes are defined as follows:

.. code-block:: python

Expand All @@ -110,11 +127,19 @@ The ``sky.Config`` and ``sky.RequestOptions`` are defined as follows:
"""Sets a value with nested keys."""
...

@dataclass
class RequestOptions:
"""Options a user specified in their request to SkyPilot."""
"""Request options for admin policy.

Args:
cluster_name: Name of the cluster to create/reuse.
cluster_running: Whether the cluster is running.
idle_minutes_to_autostop: If provided, the cluster will be set to
autostop after this many minutes of idleness.
down: If true, use autodown rather than autostop.
dryrun: Whether the request is a dry run.
"""
# Cluster name is None if not specified by the user.
cluster_name: Optional[str]
cluster_exists: bool
idle_minutes_to_autostop: Optional[int]
down: bool
dryrun: bool
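
The ``sky.Config`` listing above mentions setting a value with nested keys. As a rough illustration of what nested-key access means, here is a sketch on plain dictionaries. The free functions ``get_nested`` and ``set_nested`` are hypothetical helpers written for this example, not the ``sky.Config`` methods themselves, and the config keys in the usage note are assumptions.

```python
from typing import Any, Dict, Tuple


def get_nested(config: Dict[str, Any], keys: Tuple[str, ...],
               default: Any = None) -> Any:
    """Return config[k0][k1]..., or `default` if any key is missing."""
    current: Any = config
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def set_nested(config: Dict[str, Any], keys: Tuple[str, ...],
               value: Any) -> None:
    """Set config[k0][k1]... = value, creating intermediate dicts."""
    if not keys:
        raise ValueError('keys must not be empty')
    current = config
    for key in keys[:-1]:
        # Assumes any existing intermediate value is itself a dict.
        current = current.setdefault(key, {})
    current[keys[-1]] = value
```

For instance, ``set_nested(cfg, ('aws', 'use_internal_ips'), True)`` on an empty ``cfg`` leaves it as ``{'aws': {'use_internal_ips': True}}``, which is the kind of change a policy would make to the global config before returning it.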
Expand All @@ -123,6 +148,14 @@ The ``sky.Config`` and ``sky.RequestOptions`` are defined as follows:
Example Policies
----------------

We have provided a few example policies in `examples/admin_policy/example_policy <https://github.com/skypilot-org/skypilot/tree/master/examples/admin_policy/example_policy>`_. You can test these policies by installing the example policy package in your Python environment.

.. code-block:: bash

git clone https://github.com/skypilot-org/skypilot.git
cd skypilot
pip install examples/admin_policy/example_policy

Reject All
~~~~~~~~~~

Expand All @@ -138,7 +171,7 @@ Reject All

.. code-block:: yaml

admin_policy: examples.admin_policy.reject_all.RejectAllPolicy
admin_policy: example_policy.RejectAllPolicy


Add Kubernetes Labels for all Tasks
Expand All @@ -159,7 +192,7 @@ Add Kubernetes Labels for all Tasks

.. code-block:: yaml

admin_policy: examples.admin_policy.add_labels.AddLabelsPolicy
admin_policy: example_policy.AddLabelsPolicy


Always Disable Public IP for AWS Tasks
Expand All @@ -183,7 +216,7 @@ Always Disable Public IP for AWS Tasks

.. code-block:: yaml

admin_policy: examples.admin_policy.disable_public_ip.DisablePublicIPPolicy
admin_policy: example_policy.DisablePublicIPPolicy


Enforce Autostop for all Tasks
Expand All @@ -197,26 +230,51 @@ Enforce Autostop for all Tasks
@classmethod
def validate_and_mutate(
cls, user_request: sky.UserRequest) -> sky.MutatedUserRequest:
"""Enforces autostop for all tasks."""
"""Enforces autostop for all tasks.

Note that with this policy enforced, users can still change the autostop
setting for an existing cluster by using `sky autostop`.
"""
request_options = user_request.request_options

# Request options is None when a task is executed with `jobs launch` or
# `sky serve up`.
if request_options is None:
return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)

# Get the cluster record to operate on.
cluster_record = sky.status(request_options.cluster_name, refresh=True)

# Check if the user request should specify autostop settings.
need_autostop = False
if not cluster_record:
# Cluster does not exist
need_autostop = True
elif cluster_record[0]['status'] == sky.ClusterStatus.STOPPED:
# Cluster is stopped
need_autostop = True
elif cluster_record[0]['autostop'] < 0:
# Cluster is running but autostop is not set
need_autostop = True

# Check if the user request is setting autostop settings.
is_setting_autostop = False
idle_minutes_to_autostop = request_options.idle_minutes_to_autostop
# Enforce autostop/down to be set for all tasks for new clusters.
if not request_options.cluster_running and (
idle_minutes_to_autostop is None or
idle_minutes_to_autostop < 0):
raise RuntimeError('Autostop/down must be set for all newly '
'launched clusters.')
is_setting_autostop = (idle_minutes_to_autostop is not None and
idle_minutes_to_autostop >= 0)

# If the cluster requires autostop but the user request is not setting
# autostop settings, raise an error.
if need_autostop and not is_setting_autostop:
raise RuntimeError('Autostop/down must be set for all clusters.')

return sky.MutatedUserRequest(
task=user_request.task,
skypilot_config=user_request.skypilot_config)


.. code-block:: yaml

admin_policy: examples.admin_policy.enforce_autostop.EnforceAutostopPolicy
admin_policy: example_policy.EnforceAutostopPolicy
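
The decision made by the autostop policy above can be read as two independent checks: does the target cluster need an autostop setting, and does the request supply one? The sketch below restates that logic as pure functions so it can be exercised without a cloud account. The function names and the flattened boolean and integer parameters are assumptions made for this illustration; the policy itself derives them from ``sky.status`` and ``request_options``.

```python
from typing import Optional


def needs_autostop(cluster_exists: bool, cluster_stopped: bool,
                   current_autostop_minutes: int) -> bool:
    """Whether a request targeting this cluster must set autostop."""
    if not cluster_exists:
        # A new cluster will be launched.
        return True
    if cluster_stopped:
        # The cluster will be restarted.
        return True
    # Running cluster: a negative value means autostop is not set.
    return current_autostop_minutes < 0


def is_setting_autostop(idle_minutes_to_autostop: Optional[int]) -> bool:
    """Whether the request itself specifies a valid autostop value."""
    return (idle_minutes_to_autostop is not None and
            idle_minutes_to_autostop >= 0)


def check_request(cluster_exists: bool, cluster_stopped: bool,
                  current_autostop_minutes: int,
                  idle_minutes_to_autostop: Optional[int]) -> None:
    """Raise if autostop is required but the request does not set it."""
    if (needs_autostop(cluster_exists, cluster_stopped,
                       current_autostop_minutes) and
            not is_setting_autostop(idle_minutes_to_autostop)):
        raise RuntimeError('Autostop/down must be set for all clusters.')
```

Under this reading, a request against a running cluster that already has autostop configured passes even if the request omits the setting, while a launch of a new cluster without it is rejected.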
3 changes: 2 additions & 1 deletion docs/source/reference/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -87,10 +87,11 @@ Available fields and semantics:
# Default: false.
disable_ecc: false

# Custom policy to be applied to all tasks. (optional).
# Admin policy to be applied to all tasks. (optional).
#
# The policy class to be applied to all tasks, which can be used to validate
# and mutate user requests.
#
# This is useful for enforcing certain policies on all tasks, e.g.,
# add custom labels; enforce certain resource limits; etc.
#
1 change: 1 addition & 0 deletions examples/admin_policy/add_labels.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.AddLabelsPolicy
1 change: 0 additions & 1 deletion examples/admin_policy/config_label_config.yaml

This file was deleted.

1 change: 1 addition & 0 deletions examples/admin_policy/disable_public_ip.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
admin_policy: example_policy.DisablePublicIPPolicy
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
"""Example admin policy module and prebuilt policies."""

from example_policy.skypilot_policy import ConfigLabelPolicy
from example_policy.skypilot_policy import AddLabelsPolicy
from example_policy.skypilot_policy import DisablePublicIPPolicy
from example_policy.skypilot_policy import EnforceAutostopPolicy
from example_policy.skypilot_policy import RejectAllPolicy
from example_policy.skypilot_policy import TaskLabelPolicy
from example_policy.skypilot_policy import UseSpotForGPUPolicy